This presentation is a review of the NoSQL space that I gave at the X Jornades de Programari Lliure in Barcelona.
You will find a complete review of the NoSQL movement, its use cases and technologies, and a special look at graph databases. And more....
Special thanks to @Hagenburger, @sbitxu, @jannis, and to the inspiration of the great @jimwebber and the amazing community.
1. NoSQL
Pere Urbón-Bayes
Moviepilot GmbH
@purbon
purbon@purbon.com
Thursday, 30 June 2011
2. What are we going to talk about?
Where are we, and where do we come from?
NoSQL. { “motivation” : “use cases” }
Graph databases.
....
3. {
"if_you":{
"are_the_master_of": [ "movies", "data analytics", "ruby", "git", "nosql" ],
"love":"recommendation systems",
"would_love_to_know_about":"graph_databases",
"believe_in":"open source"
},
"join_us":"true",
"contact_with":"jobs at moviepilot.com"
}
Moviepilot is a leading provider and discovery service for movies and series based in Berlin!
4. Coming and Going
History
1960s: Navigational databases.
1970s: Relational databases. Edgar Codd's relational algebra.
Late 1970s: SQL DBMSs.
SQL, DB2, Ingres,
PostgreSQL, Sybase,
7. Where are we now?
Is everything related?
[Timeline figure, 1990 to 2030. Labels: text files, RDBMS, blogs, tagging, folksonomies, social networks, RDF, linked data, business intelligence, semantic web.]
8. Where are we now?
[Timeline figure: 1980, 1990, 2000, 2010, 2020.]
9. How are our apps...?
Data warehousing and Business Intelligence.
Stream processing.
Text search.
Scientific processing.
Semi-(un)-structured data.
10. How are our apps...?
Need to scale horizontally.
Partition and replication.
OLTP and OLAP.
Web 2.0.
Performance, performance, performance.
Flexibility.
Big, even huge, datasets.
.....?
11. NoSQL
select fun, profit from real_world where relational=false and barcelona=true;
Carlo Strozzi, 1998.
Eric Evans (Rackspace) and Johan Oskarsson (last.fm), early 2009.
no:sql(east) 2009, no:sql(eu) 2010.
12. NoSQL
Ability to scale horizontally.
Replication and distribution.
Weaker concurrency model.
Smart use of resources.
Access through different endpoints.
Dynamic schema environment.
Leave more business logic to the app side.
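Horizontal scaling with replication can be sketched roughly as follows. This is a minimal illustration, not how any particular NoSQL system does it: the function names `shard_for` and `replicas_for`, the modulo placement scheme, and the sample keys are all assumptions made for the example.

```python
import hashlib
from typing import List


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard using a stable hash (unlike Python's salted hash())."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def replicas_for(key: str, num_shards: int, copies: int = 2) -> List[int]:
    """Return the primary shard plus the next (copies - 1) shards, wrapping around."""
    primary = shard_for(key, num_shards)
    return [(primary + i) % num_shards for i in range(copies)]


if __name__ == "__main__":
    # Hypothetical keys: each one lands on a primary shard and one replica.
    for user in ["alice", "bob", "carol"]:
        print(user, replicas_for(user, num_shards=4))
```

Plain modulo placement moves most keys when the number of shards changes; real systems often use schemes such as consistent hashing to limit that.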
14. Dismantle, store, rebuild, enjoy.
[Figure labels: brick, window, roof.]
Unstructured, structured, unstructured?
15. ACID
Atomicity: all operations are executed or none is.
Consistency: data is consistent after the transaction.
Isolation: transactions are independent.
Durability: changes persist, even after failures.
Helps: understanding the data; guaranteed persistence.
Hurts: horizontal scaling; high availability.
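Atomicity can be tried out with Python's built-in `sqlite3` module. The sketch below is only an illustration: the `accounts` table, the names in it, and the `transfer` helper are hypothetical and do not come from the talk.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("ana", 100), ("joan", 50)])
conn.commit()
```

Calling `transfer(conn, "ana", "joan", 500)` fails the balance check, so both updates are rolled back and the balances stay as they were.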
16. “There is a magic bullet!
It's called relaxing the requirements.”
- Evan Weaver, @evan
17. CAP
Consistency: each client has the same view.
Availability: all clients can read and write.
Partition tolerance: works well across different network partitions.
Only two!!!!
[Diagram: a C, A, P triangle labelled with mysql, redis, and riak.]
18. “You have database problem. You
research blog and HN. You start use
NoSQL product. Now you not
know anymore if you have problem.”
- Devops BORAT, @devops_borat
19. NoSQL systems.
Most common
Column DBs.
Document DBs.
Key-Value DBs.
Graph DBs.
Object DBs.
Other systems
XML Databases.
Grid Databases.
RDF.
....
20. Column Databases
A DBMS that stores its content by column rather than by row. This has advantages for data warehouses.
More efficient for aggregates and when the data is column oriented.
Suited for OLAP, and not so much for OLTP.
First implementations, early 1970s.
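The difference between row and column layouts can be sketched in a few lines. The movie records and the `to_columns` helper below are made up for the illustration; real column stores add compression and on-disk formats that this ignores.

```python
# Row-oriented layout: one record per row (hypothetical movie data).
rows = [
    {"title": "Movie A", "year": 2009, "rating": 8.0},
    {"title": "Movie B", "year": 2010, "rating": 6.0},
    {"title": "Movie C", "year": 2011, "rating": 7.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: the aggregate has to visit every full record.
avg_from_rows = sum(row["rating"] for row in rows) / len(rows)

# Column layout: the aggregate reads a single contiguous list.
avg_from_columns = sum(columns["rating"]) / len(columns["rating"])
```

Both averages are the same; the column layout only touches the `rating` values, which is why aggregates favour it.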
21. Apache Cassandra
Designed to handle very large amounts of data spread across multiple commodity servers.
High availability with no SPOF.
Born at Facebook, to power Inbox Search.
Hybrid system, between columns and rows.
Initial release 2008. Version 0.8.1, 28/06/11.
22. Key-Value Databases
Allow the user to store key-value pairs, where the key usually consists of a string and the value is a simple primitive.
Suited for use cases where properties and values are enough, e.g. profiles, logs, etc.
Eventually consistent, hierarchical, multivalued, etc.
First implementations, around 1980.
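The key-value model is small enough to sketch in memory. The `KeyValueStore` class and its method names are assumptions for the example and do not mirror the API of any real product.

```python
class KeyValueStore:
    """A toy in-memory key-value store: string keys, simple values."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value) -> None:
        """Store or overwrite the value under `key`."""
        self._data[key] = value

    def get(self, key: str, default=None):
        """Return the value for `key`, or `default` if it is missing."""
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        """Remove `key`; return True if it existed."""
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put("profile:purbon:city", "Berlin")  # hypothetical key naming scheme
```

There is no query language here: everything goes through the key, which is what keeps the model easy to partition.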
23. Redis.io
Open-source, networked, in-memory, persistent, journaled, key-value datastore.
Bindings for the major languages.
A data structure store.
Master-slave replication. High performance.
Initial release 2009. Version 2.2.7, 11/05/11.
24. Document Databases
A DBMS where the default unit of storage is a document: XML, JSON, YAML, .....
More complex than a key-value store.
Suited for multi-document apps: news, CVs, ...
Eventual consistency, limited atomicity and isolation.
One of the first: Lotus Notes, 1989.
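A document store can be pictured as a key-value store whose values are whole documents that can also be queried by field. The `DocumentStore` class below is a hypothetical in-memory sketch, with made-up sample documents, and not the interface of any real database.

```python
import json


class DocumentStore:
    """A toy in-memory document store holding JSON-compatible dicts."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert(self, doc: dict) -> int:
        """Store a copy of `doc` and return its generated id."""
        # Round-tripping through JSON rejects non-serializable values
        # and detaches the stored copy from the caller's dict.
        stored = json.loads(json.dumps(doc))
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = stored
        return doc_id

    def find(self, **criteria):
        """Return all documents whose fields equal every given criterion."""
        return [
            doc
            for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


db = DocumentStore()
# Documents in the same store do not have to share a schema.
db.insert({"type": "news", "title": "NoSQL in Barcelona", "tags": ["nosql"]})
db.insert({"type": "cv", "name": "John Smith", "skills": ["ruby", "git"]})
```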
25. OrientDB
Open source database written in Java.
Schema-[full,less,mix] modes.
Supports SQL, HTTP, REST and JSON. ACID compliant. Distributed and scalable.
Light and embeddable. Bindings for most languages.
Initial release 2010. Version 1.0rc2, 17/06/11.
26. Graph Databases
A database that uses graph structures with nodes, edges, and properties.
Suited for associative datasets; maps well onto object-oriented app structures. Avoids expensive joins.
Powerful for graph-like operations, such as shortest path, communities, etc.
First implementations around 2007.
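As an illustration of a graph-like operation, here is a minimal shortest-path sketch using breadth-first search over an adjacency dict. The `friends` graph and the function name are hypothetical; a graph database would run this kind of traversal natively over stored relationships.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return a shortest path (fewest edges) from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                if neighbor == goal:
                    # Walk back from the goal to the start, then reverse.
                    path = [goal]
                    while previous[path[-1]] is not None:
                        path.append(previous[path[-1]])
                    return path[::-1]
                queue.append(neighbor)
    return None


# Hypothetical directed "knows" graph.
friends = {
    "ana": ["joan", "marta"],
    "joan": ["pere"],
    "marta": ["pere"],
    "pere": ["laia"],
    "laia": [],
}
```

Each hop follows an edge directly, with no join against a separate table.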
28. What is a graph?
Graph G(V,E) where V = {v1,v2,...,vN} and E = {e1,e2,...,eM}
Directed / Undirected
Mixed
Multigraph
Weighted
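The definitions above can be made concrete with plain Python collections. This is only a sketch: the vertex names, the weights, and the two helper functions are assumptions for the example.

```python
# V: the set of vertices.
vertices = {"v1", "v2", "v3"}

# E: directed, weighted edges as (source, target, weight) triples.
# Using a list rather than a set allows parallel edges (a multigraph).
edges = [
    ("v1", "v2", 1.0),
    ("v1", "v2", 2.5),  # second edge between the same pair of vertices
    ("v2", "v3", 0.5),
]


def out_degree(vertex, edges):
    """Count the edges leaving `vertex` in a directed graph."""
    return sum(1 for source, _target, _weight in edges if source == vertex)


def undirected_pairs(edges):
    """Drop direction, weight, and multiplicity: the set of connected pairs."""
    return {frozenset((source, target)) for source, target, _weight in edges}
```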
30. Graph Databases
The Property Graph
Abstractions
Nodes and Relationships.
Properties on both.
John Smith liked http://www.example.com at 01/10/11
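The example sentence can be modelled as a tiny property graph: two nodes and one relationship, each carrying properties. The `Node` and `Relationship` classes and the `LIKED` type name are hypothetical, chosen only to illustrate the abstraction.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Two nodes and one relationship, with properties on all three.
john = Node(1, {"name": "John Smith"})
page = Node(2, {"url": "http://www.example.com"})
liked = Relationship(john, "LIKED", page, {"at": "01/10/11"})


def describe(rel: Relationship) -> str:
    """Render a LIKED relationship back into the sentence it models."""
    return (
        f"{rel.start.properties['name']} liked "
        f"{rel.end.properties['url']} at {rel.properties['at']}"
    )
```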
31. Graph Databases
Applications
Task planning
Scheduling
Process assignment
Routing
Logistics
Delivery
League planning optimization
Pattern recognition
Dependency analysis
Impact analysis
Network flow
Traffic analysis and optimization
Optimization of tasks
32. Graph Databases
Applications
Recommendations
Heuristics (PageRank)
Local
K-nearest neighbors
Walks
Search algorithms
Shooting stars
Shortest paths
Hammock functions
33. Graph Databases
Applications
Semantic web
RDF (OWL) Store
RDF-Sail
SPARQL
Linked data (Open Data)
Link analysis
Structure mining
34. Graph Databases
Vendors
Neo4J: Open source NoSQL graph database.
HyperGraphDB: An AI and semantic web graph database.
Infogrid: The Internet Graph database.
Sones: SaaS .NET graph database.
OrientDB: The Document-GraphDB.
FlockDB: The Twitter graphdb.
Pregel: Graph processing at Google.