An overview of NOSQL (JFokus 2011)

NOSQL
- an overview -
JFokus 2011
Emil Eifrem @emileifrem
CEO, Neo Technology emil@neotechnology.com
#neo4j

So what’s the plan?
๏ Why NOSQL?

๏ The NOSQL landscape

๏ NOSQL challenges

๏ Conclusion


First off: the name

๏ WE ALL HATES IT, M’KAY?


NOSQL is NOT...
๏ NO to SQL

๏ NEVER SQL

NOSQL is simply

Not Only SQL


NOSQL - Why now?
Four trends


Trend 1: data set size
[Bar chart: 40 in 2007 vs. 988 in 2010. Source: IDC 2007]

Trend 2: Connectedness
[Chart: information connectivity over time, 1990 to 2020, spanning web 1.0, web 2.0 and “web 3.0”. Milestones in order of increasing connectivity: Text documents, Hypertext, RSS, Blogs, Wikis, User-generated content, Tagging, Folksonomies, RDF, Ontologies, Giant Global Graph (GGG)]

Trend 3: Semi-structure
๏ Individualization of content
• In the salary lists of the 1970s, all elements had exactly one job
• In the salary lists of the 2000s, we need 5 job columns! Or 8? Or 15?

๏ All encompassing “entire world views”
• Store more data about each entity
๏ Trend accelerated by the decentralization of content generation
that is the hallmark of the age of participation (“web 2.0”)


Aside: RDBMS performance
[Chart: performance (y-axis) against data complexity (x-axis). Lines: “Relational database” and “Requirement of application”. Workloads marked along the complexity axis: Salary List, Majority of Webapps, Social network, Semantic Trading; the last two are bracketed as “custom”]

Trend 4: Architecture

1980s: Application (<-- note lack of s)
[Diagram: one Application on top of one DB]

1990s: Database as integration hub
[Diagram: three Applications sharing one DB]

2000s: (moving towards) Decoupled services with their own backend
[Diagram: three Services, each with its own DB]

Why NOSQL Now?

๏ Trend 1: Size
๏ Trend 2: Connectedness
๏ Trend 3: Semi-structure
๏ Trend 4: Architecture

Four NOSQL categories


Category I: Key-Value stores
๏ Lineage:
• “Dynamo: Amazon’s Highly Available Key-Value Store” (2007)
๏ Data model:
• Global key-value mapping
• Think: Globally available HashMap/Dict/etc
๏ Examples:
• Project Voldemort
• Tokyo {Cabinet, Tyrant, etc}
• Riak
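
The “globally available HashMap” idea above can be sketched in a few lines of Java. This is a single-process illustration only: the `KeyValueStore` interface and the `InMemoryKeyValueStore` class are hypothetical names, not the API of Voldemort, Tokyo Cabinet or Riak, and a real store would partition and replicate the keys across machines.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface: the whole data model is get/put/delete by key.
interface KeyValueStore<K, V> {
    Optional<V> get(K key);
    void put(K key, V value);
    void delete(K key);
}

// Illustrative in-memory stand-in; not the API of any product named above.
class InMemoryKeyValueStore<K, V> implements KeyValueStore<K, V> {
    private final Map<K, V> data = new ConcurrentHashMap<>();

    @Override
    public Optional<V> get(K key) {
        return Optional.ofNullable(data.get(key));
    }

    @Override
    public void put(K key, V value) {
        data.put(key, value); // a later put for the same key overwrites the earlier one
    }

    @Override
    public void delete(K key) {
        data.remove(key);
    }
}

class KeyValueDemo {
    public static void main(String[] args) {
        KeyValueStore<String, String> store = new InMemoryKeyValueStore<>();
        store.put("user:42:name", "Mike");
        System.out.println(store.get("user:42:name").orElse("<missing>"));
        store.delete("user:42:name");
        System.out.println(store.get("user:42:name").isPresent());
    }
}
```

Note that the only way in is the key: there is no query by value, which is the flip side of the simple data model.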

Category I: Key-Value stores
๏ Strengths:
• Simple data model
• Great at scaling out horizontally
๏ Weaknesses:
• Simplistic data model
• Poor for complex data

Category II: ColumnFamily (BigTable) stores
๏ Lineage:
• “Bigtable: A Distributed Storage System for Structured Data” (2006)

๏ Data model:
• A big table, with column families
๏ Examples:
• HBase
• HyperTable
• Cassandra
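
One way to picture “a big table, with column families” is as nested sorted maps: row key, then column family, then column, then value. The sketch below is a toy under stated assumptions: the `ColumnFamilyTable` class and its methods are hypothetical, and it ignores versions/timestamps, persistence and distribution. It is not the API of HBase, HyperTable or Cassandra.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model: row key -> column family -> column -> value.
class ColumnFamilyTable {
    private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    void put(String rowKey, String family, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(column, value);
    }

    // Returns null when the row, family, or column is absent.
    String get(String rowKey, String family, String column) {
        return getFamily(rowKey, family).get(column);
    }

    // All columns of one family for one row; different rows may hold different columns.
    Map<String, String> getFamily(String rowKey, String family) {
        return rows.getOrDefault(rowKey, Collections.emptyMap())
                   .getOrDefault(family, Collections.emptyMap());
    }
}

class ColumnFamilyDemo {
    public static void main(String[] args) {
        ColumnFamilyTable table = new ColumnFamilyTable();
        table.put("user:mike", "profile", "name", "Mike");
        table.put("user:mike", "profile", "city", "Stockholm");
        table.put("user:mike", "stats", "logins", "17");
        System.out.println(table.getFamily("user:mike", "profile"));
    }
}
```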

Category III: Document databases
๏ Lineage:
• Lotus Notes
๏ Data model:
• Collections of documents
• A document is a key-value collection
๏ Examples:
• CouchDB
• MongoDB
• Terrastore

Document db: An example
๏ How would we model a blogging software?

๏ One stab:
• Represent each Blog as a Collection of Post documents
• Represent Comments as nested documents in the Post
documents
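
The modeling stab above can be pictured with plain `java.util` collections before any database is involved. In this sketch a post is a key-value map and its comments are nested maps inside it; the `BlogModelSketch` class, its helper methods, and the field names `author` and `text` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: documents as nested key-value collections.
class BlogModelSketch {
    // A post document with an initially empty list of nested comments.
    static Map<String, Object> newPost(String title, String body) {
        Map<String, Object> post = new LinkedHashMap<>();
        post.put("title", title);
        post.put("body", body);
        post.put("comments", new ArrayList<Map<String, Object>>());
        return post;
    }

    // Nest a comment document inside the post document.
    @SuppressWarnings("unchecked")
    static void addComment(Map<String, Object> post, String author, String text) {
        Map<String, Object> comment = new LinkedHashMap<>();
        comment.put("author", author);
        comment.put("text", text);
        ((List<Map<String, Object>>) post.get("comments")).add(comment);
    }

    public static void main(String[] args) {
        // The blog itself is a collection of post documents.
        List<Map<String, Object>> blog = new ArrayList<>();
        Map<String, Object> post = newPost("JFokus 2011", "A post about JFokus");
        addComment(post, "emil", "Nice talk!");
        blog.add(post);
        System.out.println(blog);
    }
}
```

Because comments live inside their post, reading a post with all its comments is a single lookup rather than a join.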


Document db: Creating a blog post
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;

import com.mongodb.Mongo;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
// ...
Mongo mongo = new Mongo( "localhost" ); // Connect to MongoDB
// ...
DB blogs = mongo.getDB( "blogs" ); // Access the blogs database
DBCollection myBlog = blogs.getCollection( "Thobe’s blog" );

DBObject blogPost = new BasicDBObject();
blogPost.put( "title", "JFokus 2011" );
blogPost.put( "pub_date", new Date() );
blogPost.put( "body", "Publishing a post about JFokus in my MongoDB blog!" );
blogPost.put( "tags", Arrays.asList( "conference", "names" ) );
blogPost.put( "comments", new ArrayList<DBObject>() ); // nested comment documents go here

myBlog.insert( blogPost );

Retrieving posts
// ...
import com.mongodb.DBCursor;
// ...

// db: assumed to be the DB handle obtained via mongo.getDB( "blogs" )
public Object getAllPosts( String blogName ) {
    DBCollection blog = db.getCollection( blogName );
    return renderPosts( blog.find() );
}

private Object renderPosts( DBCursor cursor ) {
    // order by publication date (descending)
    cursor = cursor.sort( new BasicDBObject( "pub_date", -1 ) );
    // ...
}

Category IV: Graph databases
๏ Lineage:
• Euler and graph theory
๏ Data model:
• Nodes with properties
• Typed relationships with properties
๏ Examples:
• Sones GraphDB
• InfiniteGraph
• Neo4j
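
The data model above (nodes with properties, typed relationships with properties) can be sketched with a couple of plain Java classes. `PgNode`, `PgRelationship` and the sample data are hypothetical illustrations, deliberately not the API of Neo4j, Sones GraphDB or InfiniteGraph.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node: a bag of properties plus its outgoing relationships.
class PgNode {
    final Map<String, Object> properties = new HashMap<>();
    final List<PgRelationship> outgoing = new ArrayList<>();

    PgNode with(String key, Object value) {
        properties.put(key, value);
        return this;
    }

    // Create a typed, directed relationship from this node to another one.
    PgRelationship relateTo(PgNode end, String type) {
        PgRelationship rel = new PgRelationship(this, end, type);
        outgoing.add(rel);
        return rel;
    }
}

// Hypothetical relationship: a type, two end points, and its own properties.
class PgRelationship {
    final PgNode start;
    final PgNode end;
    final String type;
    final Map<String, Object> properties = new HashMap<>();

    PgRelationship(PgNode start, PgNode end, String type) {
        this.start = start;
        this.end = end;
        this.type = type;
    }
}

class PropertyGraphDemo {
    public static void main(String[] args) {
        PgNode owner = new PgNode().with("name", "Alice");
        PgNode car = new PgNode().with("brand", "Volvo");
        PgRelationship owns = owner.relateTo(car, "OWNS");
        owns.properties.put("since", 2009); // relationships carry properties too
        System.out.println(owns.type + " -> " + owns.end.properties.get("brand"));
    }
}
```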

Property Graph model

[Diagram: a property graph.
Node: name: “James”, age: 32, twitter: “@spam”
Node: name: “Mary”, age: 35
Node: brand: “Volvo”, model: “V70”
Relationships: LOVES (twice), LIVES WITH, OWNS, DRIVES
Also shown: property type: “car”]

Graphs are whiteboard friendly

An application domain model outlined on a whiteboard or piece of paper would be translated to an ER-diagram, then normalized to fit a Relational Database. With a Graph Database the model from the whiteboard is implemented directly.

[Image: whiteboard sketch with the labels: thobe, Joe project blog, Wardrobe Strength, Hello Joe, Modularizing Jython, Neo4j performance analysis]
Image credits: Tobias Ivarsson

Graph db: Creating a social graph
GraphDatabaseService graphDb = new EmbeddedGraphDatabase(
        GRAPH_STORAGE_LOCATION );

Transaction tx = graphDb.beginTx();
try {
    Node mrAnderson = graphDb.createNode();
    mrAnderson.setProperty( "name", "Thomas Anderson" );
    mrAnderson.setProperty( "age", 29 );

    Node morpheus = graphDb.createNode();
    morpheus.setProperty( "name", "Morpheus" );
    morpheus.setProperty( "rank", "Captain" );

    // SocialGraphTypes: application-defined RelationshipType (not shown)
    Relationship friendship = mrAnderson.createRelationshipTo(
            morpheus, SocialGraphTypes.FRIENDSHIP );

    tx.success();
} finally {
    tx.finish();
}

Graph db: How do I know this person?
Node me = ...
Node you = ...

PathFinder<Path> shortestPathFinder = GraphAlgoFactory.shortestPath(
        Traversal.expanderForTypes(
                SocialGraphTypes.FRIENDSHIP, Direction.BOTH ),
        /* maximum depth: */ 4 );

Path shortestPath = shortestPathFinder.findSinglePath( me, you );

if ( shortestPath != null ) { // null when no path exists within the depth limit
    for ( Node friend : shortestPath.nodes() ) {
        System.out.println( friend.getProperty( "name" ) );
    }
}

Graph db: Recommend new friends
Node person = ...

TraversalDescription friendsOfFriends = Traversal.description()
        .expand( Traversal.expanderForTypes(
                SocialGraphTypes.FRIENDSHIP, Direction.BOTH ) )
        .prune( Traversal.pruneAfterDepth( 2 ) )
        .breadthFirst() // Visit my friends before their friends.
        // Visit a node at most once (don’t recommend direct friends)
        .uniqueness( Uniqueness.NODE_GLOBAL )
        .filter( new Predicate<Path>() {
            // Only return friends of friends
            public boolean accept( Path traversalPos ) {
                return traversalPos.length() == 2;
            }
        } );

for ( Node recommendation :
        friendsOfFriends.traverse( person ).nodes() ) {
    System.out.println( recommendation.getProperty( "name" ) );
}
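
To make the traversal above concrete without a database, here is a plain-Java sketch of the same idea: walk the friendship graph breadth first, visit each person at most once, stop expanding at depth 2, and keep only the people found at exactly depth 2. The adjacency-map representation and the `FriendRecommender` class are assumptions for illustration; this is not how Neo4j implements traversals.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FriendRecommender {
    // Record an undirected friendship in the adjacency map (hypothetical input format).
    static void befriend(Map<String, Set<String>> friends, String a, String b) {
        friends.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }

    // Breadth-first walk that returns everyone at exactly depth 2 from `person`.
    static List<String> recommend(Map<String, Set<String>> friends, String person) {
        List<String> recommendations = new ArrayList<>();
        Map<String, Integer> depth = new HashMap<>(); // also serves as the visited set
        Deque<String> queue = new ArrayDeque<>();
        depth.put(person, 0);
        queue.add(person);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            int d = depth.get(current);
            if (d == 2) {
                recommendations.add(current); // friend of a friend, not a direct friend
                continue;                     // prune: do not expand past depth 2
            }
            for (String friend : friends.getOrDefault(current, Collections.emptySet())) {
                if (!depth.containsKey(friend)) { // visit each person at most once
                    depth.put(friend, d + 1);
                    queue.add(friend);
                }
            }
        }
        return recommendations;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> friends = new HashMap<>();
        befriend(friends, "me", "anna");
        befriend(friends, "me", "bo");
        befriend(friends, "anna", "carl");
        befriend(friends, "bo", "dina");
        befriend(friends, "dina", "erik");
        System.out.println(recommend(friends, "me"));
    }
}
```

Because direct friends are reached at depth 1 and each person is visited once, they can never reappear at depth 2, which is the role the global node uniqueness plays in the traversal.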

Four emerging NOSQL categories

๏ Key-Value stores
๏ ColumnFamily stores
๏ Document databases
๏ Graph databases

Scaling to size vs. Scaling to complexity
[Chart: size (y-axis) against complexity (x-axis). Plotted in this order: Key/Value stores, ColumnFamily stores, Document databases, Graph databases. Annotation at Graph databases: billions of nodes and relationships. Annotation: “My subjective view: > 90% of use cases”]

NOSQL challenges?
๏ Mindshare
• But that’s also product usability (“how do you query it?”)
๏ Tool support
• Both devtime tools and runtime ops tools
• Standards may help?
• ... or maybe just time
๏ Middleware support

Middleware support?
๏ Lemme tell you the story about Mike and his restaurant site


Domain & data model

Restaurant
@Entity
public class Restaurant {
    @Id @GeneratedValue
    private Long id;
    private String name;
    private String city;
    private String state;
    private String zipCode;
    // ...
}

UserAccount
@Entity
@Table(name = "user_account")
public class UserAccount {
    @Id @GeneratedValue
    private Long id;
    private String userName;
    private String firstName;
    private String lastName;
    @Temporal(TemporalType.TIMESTAMP)
    private Date birthDate;
    @ManyToMany(cascade = CascadeType.ALL)
    private Set<Restaurant> favorites;
    // ...
}

Step I: Building a web site
[Diagram: Tomcat and MySQL on one box]

Step II: Whoa, ppl are actually using it?
[Diagram: Tomcat and MySQL, now on two boxes]

Step III: That’s a LOT of pages served...
[Diagram: Tomcat on n boxes; MySQL on 1 box]

Step IV: Our DB is completely overwhelmed...
[Diagram: Tomcat on n boxes; MySQL (m) and MySQL (s) on n boxes]

What does the site look like now?


Step V: Our DBs are STILL overwhelmed
๏ Turns out the problem is due to joins

๏ A while back Mike introduced a new feature
• Recommend restaurants based on the user’s friends (and friends
of friends)

• Whoa, recommendations aren’t just simple get and put!
• They’re killing us with joins
๏ What about sharding?
๏ What about SSDs?

Polyglot persistence (Not Only SQL)
๏ How did we get into this situation?

๏ Well, data sets are increasingly less uniform
๏ Parts of Mike’s data fit well in an RDBMS
๏ But parts of it are graph-shaped
• They fit much better in a graph database!
• And I’m sure that there are or will be very key-value-esque parts
of the dataset

๏ Simple, just store some of it in a graph db and some of it in MySQL!
But what does the code look like?

We were here

@Entity
public class Restaurant {
    // ...

@Entity
@Table(name = "user_account")
public class UserAccount {
    // ...
    private Set<Restaurant> favorites;

Then we added recommendations
@Entity
@NodeEntity(partial = true)
public class Restaurant {
    // ...

@Entity
@Table(name = "user_account")
@NodeEntity(partial = true)
public class UserAccount {
    // ...
    private Set<Restaurant> favorites;

    @GraphProperty
    String nickname;
    @RelatedTo(type = "friends",
            elementClass = UserAccount.class)
    Set<UserAccount> friends;
    @RelatedToVia(type = "recommends",
            elementClass = Recommendation.class)
    Iterable<Recommendation> recommendations;

Recommendation
@RelationshipEntity
public class Recommendation {
    @StartNode
    private UserAccount user;
    @EndNode
    private Restaurant restaurant;
    private int stars;
    private String comment;

And we’re back to talking about Middleware

๏ This example was from the Spring Data project
• Specifically Spring Data Graph
• Available at: http://www.springsource.org/spring-data
• Currently at M02; 1.0 will be released at the end of March
• Would LOVE some feedback if you’re a Spring user
  <--- like REALLY love, like Valentine’s day yesterday kinda love

๏ But this really should be available in any middleware stack
• Ask Red Hat / JBoss
• Ask Sun / Oracle
• Ask Microsoft

Conclusion
๏ There’s an explosion of ‘nosql’ databases out there
• Some are immature and experimental
• Some come out of years of battle-hardening in production
๏ NOSQL is about finding the right tool for the job
• Frequently that’s an RDBMS
• But increasingly commonly an RDBMS is not the perfect fit
๏ We will have heterogeneous data backends in the future
• Now the rest of the stack needs to step up and help developers
cope with that

Key takeaway

Not Only SQL
is here to stay

Key takeaway

Not Only SQL
is FUN - dl & play
around now!

Image credits: Lost! Sorry... :(

