The future is here, and the future is graph databases! Have a lot of interconnected data that you want to extract value and meaning from? Have too many joins that run too slowly? Want to do real-time recommendations? Read this!
Still using MySQL? Maybe you should reconsider.
1. Still using MySQL?
Maybe you should reconsider
Radu-Sebastian Amarie
Co-Founder @ Softbinator
Head of Engineering @ Findie.me
radu@softbinator.ro
#85
4. “Every 2 days we create as much information
as we did up to 2003”
– Eric Schmidt, Google
5. Data is more connected.
• Text (content)
• HyperText (added pointers)
• RSS (joined those pointers)
• Blogs (added pingbacks)
• Tagging (grouped related data)
• RDF (described connected data)
• GGG (content + pointers + relationships +
descriptions)
6. Data is much more connected.
Image: email address similarity between users from a subscriber list on Mailchimp.
You can read more here:
http://blog.mailchimp.com/digging-deeper-into-wavelength-and-egp-data-finding-interest-clusters-in-mailchimps-network/
7. Data is more Semi-Structured:
Think IMDb
How would you model the data of all the Movies ever
made?
8. Movies / Details (Title / Description / Storyline) / Cast
(and roles and names and relationship to other
characters) / Crew (positions: Producers / Director /
Director of Photography and 113 other roles) / Plot
Keywords / Taglines / Genres / Motion Picture Ratings
/ Sites / Countries / Countries Filmed In / Languages /
Dates / Budgets / Companies / Credits / Technical
Specs / Trivia / Goofs / Quotes / Reviews / Message
Boards / Ratings / Links to other ratings like
Metascore from MetaCritic / And all the relationships
between all the individual data.
14. How do we represent this data?
Relational Database
15. How do we represent this data?
Relational Database
GOOD FOR:
Well-understood data structures that don't change too frequently
Known problems involving discrete parts of the data, or minimal connectivity
Graph Database
GOOD FOR:
Dynamic systems where data topology is difficult to predict
Dynamic requirements that evolve with the business
Problems where data relationships contribute meaning & value
23. What can a GraphDB contain?
NODES:
• The objects in the graph
• Can have key-value properties
• Can be labeled
RELATIONSHIPS:
• Relate nodes by type and direction
• Can have key-value properties
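The node and relationship model above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not how Neo4j stores data; the class names, the sample user, the sample skill and the `years` property are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """An object in the graph: optional labels plus key-value properties."""
    labels: set[str] = field(default_factory=set)
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed edge from `start` to `end` with its own properties."""
    start: Node
    end: Node
    type: str
    properties: dict[str, Any] = field(default_factory=dict)


# Hypothetical sample data (not taken from the slides).
alice = Node(labels={"User"}, properties={"id": 1, "name": "Alice"})
python_skill = Node(labels={"Skill"}, properties={"name": "Python"})

# Direction matters: the relationship points from the user to the skill.
has_skill = Relationship(alice, python_skill, "HAS_SKILL", {"years": 3})
```

Note that the relationship carries its own properties, which a relational join table would need extra columns for.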
24. How do you query a graph?
By finding patterns.
33. MATCH (u:User {id: 1})-[:HAS_SKILL]->(s:Skill) RETURN s
SELECT skills.*, user_skill.*
FROM users
JOIN user_skill ON users.id = user_skill.user_id
JOIN skills ON user_skill.skill_id = skills.id
WHERE users.id = 1
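To try the relational side of this comparison, here is a minimal sketch that runs the join against an in-memory SQLite database using Python's standard `sqlite3` module. The table names come from the slide; the column layout and the sample rows are assumptions made for illustration, and the join condition uses `skills.id` to match the table name `skills`.

```python
import sqlite3

# In-memory database; the schema and rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE skills (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_skill (
        user_id INTEGER REFERENCES users(id),
        skill_id INTEGER REFERENCES skills(id)
    );
    INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO skills VALUES (10, 'Cypher'), (20, 'SQL'), (30, 'Go');
    INSERT INTO user_skill VALUES (1, 10), (1, 20), (2, 30);
""")

# Two joins are needed to walk from a user to their skills.
rows = conn.execute("""
    SELECT skills.name
    FROM users
    JOIN user_skill ON users.id = user_skill.user_id
    JOIN skills ON user_skill.skill_id = skills.id
    WHERE users.id = ?
    ORDER BY skills.name
""", (1,)).fetchall()

skill_names = [name for (name,) in rows]
print(skill_names)
```

The join table `user_skill` plays the role that the `HAS_SKILL` relationship plays in the Cypher pattern.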
34. Speed!!
“We found Neo4j to be literally thousands of times faster
than our prior MySQL solution, with queries that require
10 - 100 times less code. Today, Neo4j provides eBay with
functionality that was previously impossible.”
- Volker Pacher, Senior Developer
“Minutes to milliseconds” performance
Queries up to 1000x faster than RDBMS or other NoSQL
35. The Same Query using Cypher
MATCH (boss)-[:MANAGES*0..3]->(sub),
(sub)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate,
count(report) AS Total
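To make the variable-length patterns concrete, the following Python sketch mimics what the query computes on a small in-memory org chart. It is an illustration of the semantics, not how Neo4j executes Cypher. It assumes the hierarchy is a tree (each person has one manager), so counting reachable people matches counting paths; every name other than "John Doe" and the helper functions `within` and `report_counts` are hypothetical.

```python
from collections import deque

# Hypothetical org chart: manager -> direct reports (the MANAGES relationship).
MANAGES = {
    "John Doe": ["Ann", "Bob"],
    "Ann": ["Cid", "Dee"],
    "Cid": ["Eve"],
}


def within(start, min_hops, max_hops):
    """Names reachable from `start` via MANAGES in min_hops..max_hops steps."""
    found = []
    queue = deque([(start, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth >= min_hops:
            found.append(name)
        if depth < max_hops:
            for report in MANAGES.get(name, []):
                queue.append((report, depth + 1))
    return found


def report_counts(boss):
    """For each sub within 0..3 hops of boss, count reports 1..3 hops below."""
    totals = {}
    for sub in within(boss, 0, 3):       # (boss)-[:MANAGES*0..3]->(sub)
        count = len(within(sub, 1, 3))   # (sub)-[:MANAGES*1..3]->(report)
        if count:  # MATCH yields no row for a sub without reports
            totals[sub] = count
    return totals


totals = report_counts("John Doe")
print(totals)
```

Because `*0..3` allows zero hops, the boss also appears as a subordinate row with the count of everyone up to three levels below.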
Project Impact
Less time writing queries:
• More time understanding the answers
• Leaving time to ask the next question
Less time debugging queries:
• More time writing the next piece of code
• Improved quality of overall code base
Code that’s easier to read:
• Faster ramp-up for new project members
• Improved maintainability &
troubleshooting
46. Awesome community support
& Drivers:
.NET / Java / Spring / JavaScript / Python / Ruby / PHP / R / Go / C/C++
47. Recap
Neo4j is Great.
1. When you have a large social-driven project in which your data topology
is difficult to predict.
2. Your data is highly interconnected and you need those connections to get extra meaning
& value.
3. Your application evolves rapidly.
4. You want to be fast and write queries easily (Cypher became openCypher
in partnership with Oracle and Spark)
5. You want to be able to get recommendations directly from the Database.
50. Thanks to…
(for inspiration)
• Michael Hunger with http://www.slideshare.net/jexp/geekout-publish
• William Lyon with http://www.slideshare.net/neo4j/intro-to-neo4j-and-graph-databases
• William Lyon again with http://www.slideshare.net/neo4j/introducing-neo4j-30
• Max de Marzi with http://www.slideshare.net/maxdemarzi/introduction-to-graph-databases-12735789
Editor's notes
Or at least the amount of data. lol
Giant Global Graph (GGG) is a name coined in 2007 by the inventor of the World Wide Web, Tim Berners-Lee.
Movies
Cast, Crew, their categories, their relationships
Categories, subcategories, taxonomies.
Graph Databases. Why is this trending? It's clearly because they make life easier when working with interconnected data, especially when querying it.
Before January 2014 they launched Cypher in Neo4j 2, and then the graph world got a boost.
Key-Value Stores (things like Redis / RocksDB / etc.)
Simple data model and scalable, but you need to create your own "foreign keys" and it sucks for complex data.
Wide Column Family (things like Cassandra / HBase / Hypertable)
Great for semi-structured data, naturally indexed and scalable, but poor for interconnected data.
Document Data (MongoDB / CouchDB / etc.)
A collection of documents; documents are key-value collections. Index-centric, with lots of map-reduce.
Simple, powerful data model and scalable, but poor for interconnected data; the query model is limited to keys and indexes, with map-reduce for larger queries.
Relational (Why do they call it relational? LOL)
Great for structured data that doesn't change. Scalable.
Sucks for connected, complex data.
We can give you access here, after the presentation