Using MongoDB as a Graph Database 
Chris Clarke 
NoSQL Birmingham 
16th October 2014
Graphs 101 
For the uninitiated
John knows Jane
Jane ? John
John knows Jane, Jane knows John
RDF
Entity Property Value 
John knows Jane
Subject Predicate Object 
John knows Jane
Subject Predicate Object 
John knows Jane 
Jane knows John
Subject Predicate Object 
http://example.com/John foaf:knows http://example.com/Jane 
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
Subject Predicate Object
http://example.com/John foaf:knows http://example.com/Jane
http://example.com/John foaf:name "John"
http://example.com/John rdf:type foaf:Person
http://example.com/Jane foaf:name "Jane"
http://example.com/Jane rdf:type foaf:Person
http://example.com/Jane foaf:knows http://example.com/John
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
example:John rdf:type foaf:Person
example:Jane rdf:type foaf:Person
example:John foaf:knows example:Jane
example:Jane foaf:knows example:John
example:John foaf:name "John"
example:Jane foaf:name "Jane"
“WTF! Surely this is easier in JSON!” 
– Jack Fullstack
> db.people.find()
{
  _id: ObjectId('123'),
  name: 'John',
  knows: [ObjectId('456')]
},
{
  _id: ObjectId('456'),
  name: 'Jane',
  knows: [ObjectId('123')]
}
foaf:Person
Dataset A
example:John foaf:name "John"
Dataset B
example:John foaf:age 24
Dataset A+B
example:John foaf:name "John"
example:John foaf:age 24
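Because every triple names its full subject, merging two datasets amounts to a set union. A minimal sketch in plain JavaScript, assuming triples are held as `[subject, predicate, object]` arrays; the `mergeGraphs` helper is hypothetical and not part of any library:

```javascript
// Hypothetical helper: merge two graphs (arrays of [s, p, o] triples) as a set union.
function mergeGraphs(a, b) {
  const seen = new Set();
  const merged = [];
  for (const triple of [...a, ...b]) {
    const key = JSON.stringify(triple); // structural key so identical triples collapse
    if (!seen.has(key)) {
      seen.add(key);
      merged.push(triple);
    }
  }
  return merged;
}

const datasetA = [["example:John", "foaf:name", "John"]];
const datasetB = [["example:John", "foaf:age", 24]];

// Both statements about example:John survive; nothing has to be reconciled by hand.
const merged = mergeGraphs(datasetA, datasetB);
console.log(merged);
```

Merging a dataset with itself returns it unchanged, which is what makes repeated imports safe in this model.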
SPARQL 
An RDF Query Language
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email 
WHERE { 
?person a foaf:Person. 
?person foaf:name ?name. 
?person foaf:mbox ?email. 
} 
ORDER BY ?name 
LIMIT 50
CONSTRUCT: Graph
DESCRIBE: Graph
SELECT: Tabular
ASK: Boolean
Graphs and Talis 
A bit of history
Over time… 
• Our apps became popular. Last week they averaged 4M requests per day, peaking at 600k+ per hour
• Our dataset is growing in size: about 350M triples this week
• Our apps needed more queries, and more expensive queries
• Our in-house triple store was EoL and out of date
Project Tripod 
http://github.com/talis/tripod-php 
http://github.com/talis/tripod-node
System characteristics 
• 99:1 read:write 
• Well shared, tenant-based system. Our largest single customer has 35M triples
• Graph data structures and operations (merges, sub-graphs etc.) well entrenched in the codebase, over 2M lines of code (inc. libraries)
• Actually not that many distinct query shapes
Simple Queries, and how they 
influenced our core data 
model
DESCRIBE <http://example.com/John>
Give me all the triples about John as a graph
SELECT ?name ?age
WHERE {
  <http://example.com/John> foaf:name ?name .
  <http://example.com/John> foaf:age ?age .
}
Give me the properties name and age of John as tabular data
Subject Predicate Object
http://example.com/John foaf:knows http://example.com/Jane
http://example.com/John foaf:name "John"
http://example.com/John rdf:type foaf:Person
http://example.com/Jane foaf:name "Jane"
http://example.com/Jane rdf:type foaf:Person
http://example.com/Jane foaf:knows http://example.com/John
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
Concise Bound Description of http://example.com/John
http://example.com/John foaf:knows http://example.com/Jane
http://example.com/John foaf:name "John"
http://example.com/John rdf:type foaf:Person
Concise Bound Description of http://example.com/Jane
http://example.com/Jane foaf:name "Jane"
http://example.com/Jane rdf:type foaf:Person
http://example.com/Jane foaf:knows http://example.com/John
Concise Bound Description of http://example.com/John
http://example.com/John foaf:knows http://example.com/Jane
http://example.com/John foaf:name "John"
http://example.com/John rdf:type foaf:Person
{
  _id: "example:John",
  "foaf:knows": { u: "example:Jane" },
  "rdf:type": { u: "foaf:Person" },
  "foaf:name": { l: "John" }
}
u means the value is a URI, i.e. another node
l means the value is a literal text value
_id is the unique primary key. There can only be one John
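The mapping from triples to one document per subject can be sketched in a few lines. This is an illustration rather than tripod's actual code: `toCbdDocument` is a hypothetical helper, and storing repeated predicates as an array of values is an assumption the slides do not confirm.

```javascript
// Hypothetical helper: fold the triples about one subject into a single CBD-style document.
// Each triple is { s, p, o }, where o is { u: "..." } for a URI or { l: "..." } for a literal.
function toCbdDocument(subject, triples) {
  const doc = { _id: subject };
  for (const { s, p, o } of triples) {
    if (s !== subject) continue; // only statements about this subject belong in its CBD
    if (!(p in doc)) {
      doc[p] = o;                // first value for this predicate
    } else if (Array.isArray(doc[p])) {
      doc[p].push(o);            // third and later values
    } else {
      doc[p] = [doc[p], o];      // second value: promote to an array (assumption)
    }
  }
  return doc;
}

const triples = [
  { s: "example:John", p: "foaf:knows", o: { u: "example:Jane" } },
  { s: "example:John", p: "rdf:type", o: { u: "foaf:Person" } },
  { s: "example:John", p: "foaf:name", o: { l: "John" } },
  { s: "example:Jane", p: "foaf:name", o: { l: "Jane" } },
];

const johnDoc = toCbdDocument("example:John", triples);
console.log(johnDoc);
```

Keying the document on the subject URI is what lets a DESCRIBE of one resource become a single primary-key lookup.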
DESCRIBE <http://example.com/John>
mongo$ col.findOne({_id: "example:John"});
SELECT ?name ?age
WHERE {
  <http://example.com/John> foaf:name ?name .
  <http://example.com/John> foaf:age ?age .
}
mongo$ col.findOne({_id: "example:John"}, {"foaf:name.l": 1, "foaf:age.l": 1});
{ s: "example:John", p: "foaf:knows", o: { u: "example:Jane" } },
{ s: "example:John", p: "rdf:type", o: { u: "foaf:Person" } },
{ s: "example:John", p: "foaf:name", o: { l: "John" } },
DESCRIBE <http://example.com/John>
mongo$ var s = col.find({s: "example:John"});
mongo$ while (s.hasNext()) {
  addToGraph(s.next());
}
SELECT ?name ?age
WHERE {
  <http://example.com/John> foaf:name ?name .
  <http://example.com/John> foaf:age ?age .
}
mongo$ col.find({s: "example:John", p: "foaf:name"}, {"o": 1});
mongo$ col.find({s: "example:John", p: "foaf:age"}, {"o": 1});
{ s: "example:John", p: "foaf:knows", o: { u: "example:Jane" } },
{ s: "example:John", p: "rdf:type", o: { u: "foaf:Person" } },
{ s: "example:John", p: "foaf:name", o: { l: "John" } },
DESCRIBE ?person WHERE { ?person foaf:name "John" . }
mongo$ var s = col.find({p: "foaf:name", "o.l": "John"}); // BasicCursor = slow
{
  _id: "example:John",
  "foaf:knows": { u: "example:Jane" },
  "rdf:type": { u: "foaf:Person" },
  "foaf:name": { l: "John" }
}
DESCRIBE ?person WHERE { ?person foaf:name "John" . }
mongo$ col.ensureIndex({"foaf:name.l": 1});
mongo$ var s = col.find({"foaf:name.l": "John"}); // BTreeCursor = fast
Complex Queries
DESCRIBE <http://example.com/foo> ?sectionOrItem ?resource ?document ?authorList
?author ?usedBy ?creator ?libraryNote ?publisher
WHERE 
{ 
OPTIONAL 
{ 
<http://example.com/foo> resource:contains ?sectionOrItem . 
OPTIONAL 
{ 
?sectionOrItem resource:resource ?resource . 
OPTIONAL { ?resource dcterms:isPartOf ?document . } 
OPTIONAL 
{ 
?resource bibo:authorList ?authorList . 
OPTIONAL { ?authorList ?p ?author . } 
} 
OPTIONAL { ?resource dcterms:publisher ?publisher . } 
} 
OPTIONAL { ?libraryNote bibo:annotates ?sectionOrItem } 
} . 
OPTIONAL { <http://example.com/foo> resource:usedBy ?usedBy } . 
OPTIONAL { <http://example.com/foo> sioc:has_creator ?creator } 
}
“We don’t need dynamic queries” 
– Project Tripod Team, sometime 2012
Precomputed views 
Remember those from the RDBMS?
{
  _id: "example:John",
  "foaf:knows": { u: "example:Jane" },
  "rdf:type": { u: "foaf:Person" },
  "foaf:name": { l: "John" }
}
{
  _id: "example:Jane",
  "foaf:knows": { u: "example:John" },
  "rdf:type": { u: "foaf:Person" },
  "foaf:name": { l: "Jane" }
}
DESCRIBE example:John ?knownPerson
WHERE { example:John foaf:knows ?knownPerson . }
mongo$ var john = col.findOne({_id: "example:John"});
mongo$ var knows = [].concat(john["foaf:knows"]); // a single value or an array of values
mongo$ for (var i = 0; i < knows.length; i++) {
  var knownPerson = col.findOne({_id: knows[i].u}); // one extra query per known person
}
System characteristics 
• 99:1 read:write 
• Well shared, tenant-based system. Our largest single customer has 35M triples
• Graph data structures and operations (merges, sub-graphs etc.) well entrenched in the codebase, over 2M lines of code (inc. libraries).
• Actually not that many distinct query shapes.
{
  _id: { r: "example:John", t: "v_knows" },
  graphs: [{
    _id: "example:John",
    "foaf:knows": { u: "example:Jane" },
    "rdf:type": { u: "foaf:Person" },
    "foaf:name": { l: "John" }
  },
  {
    _id: "example:Jane",
    "foaf:knows": { u: "example:John" },
    "rdf:type": { u: "foaf:Person" },
    "foaf:name": { l: "Jane" }
  }]
}
DESCRIBE example:John ?knownPerson
WHERE { example:John foaf:knows ?knownPerson . }
mongo$ viewsCol.findOne({_id: {r: "example:John", t: "v_knows"}});
{
  _id: { r: "example:John", t: "v_knows" },
  graphs: [{
    _id: "example:John",
    "foaf:knows": { u: "example:Jane" },
    "rdf:type": { u: "foaf:Person" },
    "foaf:name": { l: "John" }
  },
  {
    _id: "example:Jane",
    "foaf:knows": { u: "example:John" },
    "rdf:type": { u: "foaf:Person" },
    "foaf:name": { l: "Jane" }
  }],
  _impactIndex: ["example:Jane", "example:John"]
}
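The slides do not spell out how `_impactIndex` is used. One plausible reading is that it lists every resource a view was built from, so that a write to any of them identifies which views have gone stale. A sketch under that assumption, using an in-memory array in place of the views collection; `findImpactedViews` and the `example:Mary` view are hypothetical:

```javascript
// Hypothetical in-memory stand-in for the views collection.
const views = [
  { _id: { r: "example:John", t: "v_knows" }, _impactIndex: ["example:Jane", "example:John"] },
  { _id: { r: "example:Mary", t: "v_knows" }, _impactIndex: ["example:Mary"] },
];

// Return the ids of every view whose impact index mentions a changed resource.
function findImpactedViews(allViews, changedResources) {
  const changed = new Set(changedResources);
  return allViews
    .filter((view) => view._impactIndex.some((resource) => changed.has(resource)))
    .map((view) => view._id);
}

// Jane changes: John's v_knows view embeds her CBD, so it needs regenerating.
const impacted = findImpactedViews(views, ["example:Jane"]);
console.log(impacted);
```

Against a real collection the same lookup would be a query on the `_impactIndex` field, which is presumably why it is stored as a flat array.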
View specification 
{ 
"_id":"v_knows", 
"type":["foaf:Person"], 
"from":"CBD_people", 
"joins":{ 
"foaf:knows":{}
} 
}
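To make the specification concrete, here is a rough sketch of how a view might be generated from it: start at the root CBD, follow each predicate named under "joins", and collect every CBD reached. It is a simplified illustration, not tripod's implementation. The `collectGraphs` and `buildView` helpers and the in-memory `cbds` map are hypothetical, and options such as "followSequence" and "maxJoins" are ignored.

```javascript
// Hypothetical in-memory stand-in for a CBD collection, keyed by _id.
const cbds = new Map([
  ["example:John", { _id: "example:John", "foaf:knows": { u: "example:Jane" }, "foaf:name": { l: "John" } }],
  ["example:Jane", { _id: "example:Jane", "foaf:knows": { u: "example:John" }, "foaf:name": { l: "Jane" } }],
]);

// Follow the predicates named in `joins`, collecting each CBD reached at most once.
function collectGraphs(id, joins, store, seen = new Map()) {
  const doc = store.get(id);
  if (!doc || seen.has(id)) return seen;
  seen.set(id, doc);
  for (const [predicate, nested] of Object.entries(joins || {})) {
    const values = [].concat(doc[predicate] || []); // one value or an array of values
    for (const value of values) {
      if (value.u) collectGraphs(value.u, nested.joins, store, seen); // only URIs can be joined
    }
  }
  return seen;
}

// Assemble a view document in the shape shown on the earlier slides.
function buildView(rootId, spec, store) {
  const graphs = collectGraphs(rootId, spec.joins, store);
  return {
    _id: { r: rootId, t: spec._id },
    graphs: [...graphs.values()],
    _impactIndex: [...graphs.keys()],
  };
}

const spec = { _id: "v_knows", type: ["foaf:Person"], from: "CBD_people", joins: { "foaf:knows": {} } };
const view = buildView("example:John", spec, cbds);
console.log(JSON.stringify(view, null, 2));
```

The join stops at Jane because the spec nests nothing under "foaf:knows"; a deeper spec would recurse through its own "joins" in the same way.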
More complex example 
{ 
"_id":"v_resources", 
"type":["resourcelist:Resource"], 
"from":"CBD_resources", 
"joins":{ 
"dct:partOf":{ 
"joins": { 
"bibo:authorList":{ 
"joins" : { 
"followSequence":{ 
"maxJoins":50 
} 
} 
}, 
"bibo:editorList":{ 
"joins" : { 
"followSequence":{ 
"maxJoins":50 
} 
} 
}, 
"dct:publisher":{} 
} 
}, 
"dct:isPartOf":{ 
"joins": { 
"bibo:authorList":{ 
"joins" : { 
"followSequence":{ 
"maxJoins":50 
} 
} 
}, 
"bibo:editorList":{ 
"joins" : { 
"followSequence":{ 
"maxJoins":50 
} 
} 
}, 
"dct:publisher":{} 
} 
}, 
"bibo:authorList":{ 
"joins" : { 
"followSequence":{ 
"maxJoins":50 
} 
} 
}, 
"bibo:editorList":{ 
"joins" : { 
"followSequence":{ 
"maxJoins":50 
} 
} 
}, 
"dct:publisher":{} 
} 
}
What about tabular data? 
• We also have tables and table specs 
• Conceptually the same as views 
• Instead of an array of graphs we have computed 
columns for complex tabular queries 
• You can page, limit, offset results just like you’d 
expect
{ 
"_id" : { 
"r" : “http://example.com/users/FC44E153-161C-C199-DBAB-4DDE13F76F9B/bookmarks/1ABE1B4B-A68C-90E4-41DB 
"type" : "t_user_resources" 
}, 
"value" : { 
"_impactIndex" : [ 
{ 
"r" : “http://example.com/users/FC44E153-161C-C199-DBAB-4DDE13F76F9B/bookmarks/1ABE1B4B-A68C-90E4 
"c" : "tenantContexts:DefaultGraph" 
}, 
{ 
"r" : "tenantResources:7AB1D8E3-5D74-D07F-41E7-56206CFEC8EE", 
"c" : "tenantContexts:DefaultGraph" 
} 
], 
"collection" : "http://example.com/users/FC44E153-161C-C199-DBAB-4DDE13F76F9B/bookmarks",
"createdDate" : "2011-02-08T15:59:45+00:00", 
"resourceUri" : "tenantResources:7AB1D8E3-5D74-D07F-41E7-56206CFEC8EE", 
"note" : "ELECTRONIC", 
"title" : "Feminism & psychology", 
"type" : [ 
"resourcelist:Resource", 
"bibo:Journal" 
] 
} 
}
Database layout 
talis-rs:PRIMARY> show collections 
CBD_config 
CBD_draft 
CBD_events 
CBD_jobs 
CBD_lists 
CBD_nodes 
CBD_resources 
CBD_reviews 
CBD_service 
CBD_user_lists 
CBD_user_resources 
CBD_users 
table_rows 
views 
CBD_* collections: r/w
table_rows, views: read only
Fast and slow saves, 
you decide.
Tripod save() 
• Based on change sets, you supply the old and new 
graphs 
• CBDs updated immediately. Write ahead transaction 
log for multi-CBD writes 
• Choice per save on whether to update views/tables 
sync or async (eventually consistent) 
• Async adds jobs to a Mongo based queue
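A change set is the difference between the old and new graphs: the triples to remove and the triples to add. A minimal sketch of that diff, assuming triples are `{ s, p, o }` objects built with a consistent key order; `diffGraphs` is a hypothetical helper and not tripod's API:

```javascript
// Hypothetical helper: diff two graphs (arrays of { s, p, o } triples) into a change set.
function diffGraphs(oldGraph, newGraph) {
  const key = (t) => JSON.stringify([t.s, t.p, t.o]); // structural identity of a triple
  const oldKeys = new Set(oldGraph.map(key));
  const newKeys = new Set(newGraph.map(key));
  return {
    removals: oldGraph.filter((t) => !newKeys.has(key(t))),
    additions: newGraph.filter((t) => !oldKeys.has(key(t))),
  };
}

const oldGraph = [
  { s: "example:John", p: "foaf:name", o: { l: "John" } },
  { s: "example:John", p: "rdf:type", o: { u: "foaf:Person" } },
];
const newGraph = [
  { s: "example:John", p: "foaf:name", o: { l: "Johnny" } },
  { s: "example:John", p: "rdf:type", o: { u: "foaf:Person" } },
  { s: "example:John", p: "foaf:age", o: { l: "24" } },
];

// The unchanged rdf:type triple appears in neither list.
const changeSet = diffGraphs(oldGraph, newGraph);
console.log(changeSet);
```

The subjects that appear in a change set are also the natural input for deciding which views and tables need regenerating, whether that happens synchronously or via the queue.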
Measure everything
Query volume 
complex vs. simple
Query volume 
graph vs. tabular
Query speed 
complex vs. simple graph query
Hardware 
• Real tin, 2x Dell low-end rack mount servers 
• 96GB RAM, 24 cores
• RAID-10 disks, non-SSD
• Keep ‘em on the same LAN as your app servers
• About the same to lease per month as a couple of c3.4xlarge (30GB, 32vCPU)
• We’re about to add a similar second cluster, 144GB
Why Mongo? 
RTFM, not HN comment feeds. 
But seriously it could have been n other document DBs
There’s lots more 
Search, named graphs (quads), data functions
Future roadmap 
• Multi-cluster <- IN PROGRESS 
• NodeJS port <- IN PROGRESS 
• Choose better solution for tlog, probably PostgreSQL 
• Background queue -> redis and resque 
• Chainable API 
• Spout of updates for Apache Storm 
• Versioned views/tables config
Aperture 
Annotate your models to persist to graph
tripod-php code… 
…same in aperture
@talis 
facebook.com/talisgroup 
+44 (0) 121 374 2740 
talis.com 
info@talis.com 
48 Frederick Street 
Birmingham 
B1 3HN

Using MongoDB as a graph database - 2014 redux

  • 1. Using MongoDB as a Graph Database Chris Clarke NoSQL Birmingham 16th October 2014
  • 2. Graphs 101 For the uninitiated
  • 4. John knows Jane Jane knows John John knows Jane
  • 6. John knows Jane Jane ? John John knows Jane
  • 7. John knows Jane Jane knows John knows John Jane knows
  • 8. RDF
  • 9. Entity Property Value John knows Jane
  • 10. Subject Predicate Object John knows Jane
  • 11. Subject Predicate Object John knows Jane Jane knows John
  • 12. Subject Predicate Object http://example.com/John foaf:knows http://example.com/Jane PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  • 13. Subject Predicate Object http://example.com/John http://example.com/John foaf:knows http://example.com/Jane foaf:name “John” http://example.com/John rdf:type foaf:Person http://example.com/Jane foaf:name “Jane” http://example.com/Jane rdf:type foaf:Person http://example.com/Jane foaf:knows http://example.com/John PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  • 14. foaf:Person rdf:type rdf:type foaf:knows example:John example:Jane foaf:knows foaf:name foaf:name “John” “Jane”
  • 15. “WTF! Surely this is easier in JSON!” – Jack Fullstack
  • 16. > db.people.find() { _id: ObjectID(‘123’), name: ‘John’ knows: [ObjectID(‘456’)] }, { _id: ObjectID(‘456’), name: ‘Jane’ knows: [ObjectID(‘123’)] }
  • 18. Dataset A Dataset B example:John foaf:name “John” example:John foaf:age 24
  • 19. Dataset A+B example:John foaf:name foaf:age “John” 24
  • 20. SPARQL An RDF Query Language
  • 21. PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name ?email WHERE { ?person a foaf:Person. ?person foaf:name ?name. ?person foaf:mbox ?email. } ORDER BY ?name LIMIT 50
  • 22. CONSTRUCT DESCRIBE SELECT ASK Graph Graph Tabular Boolean
  • 23. Graphs and Talis A bit of history
  • 24. Over time… • Our apps become popular. Last week, average 4M requests per day and at peak times 600k+ per hour • Our dataset is growing in size - about 350M triples this week • Our apps needed more queries and more expensive queries • Our in-house triple store was EoL and out of date
  • 25. Project Tripod http://github.com/talis/tripod-php http://github.com/talis/tripod-node
  • 26. System characteristics • 99:1 read:write • Well shared, tenant based system. Our largest single customer has 35M triples • Graph data structures and operations (merges, sub-graphs etc.) well entrenched in the codebase, over 2M lines code (inc. libraries) • Actually not that many distinct query shapes
  • 27. Simple Queries, and how they influenced our core data model
  • 28. DESCRIBE <http://example.com/John> Give me all the triples about John as a graph SELECT ?name ?age WHERE { <http://example.com/John> <foaf:name> ?name . <http://example.com/John> <foaf:age> ?age . } Give me properties name, age of John as tabular data
  • 29. Subject Predicate Object http://example.com/John http://example.com/John foaf:knows http://example.com/Jane foaf:name “John” http://example.com/John rdf:type foaf:Person http://example.com/Jane foaf:name “Jane” http://example.com/Jane rdf:type foaf:Person http://example.com/Jane foaf:knows http://example.com/John PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  • 30. Concise Bound Description of http://example.com/John http://example.com/John http://example.com/John foaf:knows http://example.com/Jane foaf:name “John” http://example.com/John rdf:type foaf:Person http://example.com/Jane foaf:name “Jane” http://example.com/Jane rdf:type foaf:Person http://example.com/Jane foaf:knows http://example.com/John Concise Bound Description of http://example.com/Jane
  • 31. Concise Bound Description of http://example.com/John http://example.com/John http://example.com/John foaf:knows http://example.com/Jane foaf:name “John” http://example.com/John rdf:type foaf:Person { _id: “example:John”, “foaf:knows”: { u: “example:Jane” }, “rdf:type”: { u: “foaf:Person” }, “foaf:name”: { l: “John” } }
  • 32. { _id: “example:John”, “foaf:knows”: { u: “example:Jane” }, “rdf:type”: { u: “foaf:Person” }, “foaf:name”: { l: “John” } }
  • 33. { _id: “example:John”, “foaf:knows”: { u: “example:Jane” }, “rdf:type”: { u: “foaf:Person” }, “foaf:name”: { l: “John” } } _id is the unique primary key. There can only be one John
  • 34. { _id: “example:John”, “foaf:knows”: { u: “example:Jane” }, “rdf:type”: { u: “foaf:Person” }, “foaf:name”: { l: “John” } } l means value is a literal text value _id is the unique primary key. There can only be one John
  • 35. { _id: “example:John”, “foaf:knows”: { u: “example:Jane” }, “rdf:type”: { u: “foaf:Person” }, “foaf:name”: { l: “John” } } u means value is a uri, or another node. l means value is a literal text value _id is the unique primary key. There can only be one John
  • 36. { _id: “example:John”, “foaf:knows”: { u: “example:Jane” }, “rdf:type”: { u: “foaf:Person” }, “foaf:name”: { l: “John” } } DESCRIBE <http://example.com/John> SELECT ?name ?age WHERE { <http://example.com/John> <foaf:name> ?name . <http://example.com/John> <foaf:age> ?age . }
  • 37. { _id: “example:John”, “foaf:knows”: { u: “example:Jane” }, “rdf:type”: { u: “foaf:Person” }, “foaf:name”: { l: “John” } } DESCRIBE <http://example.com/John> mongo$ col.findOne({_id:”example:John”}); SELECT ?name ?age WHERE { <http://example.com/John> <foaf:name> ?name . <http://example.com/John> <foaf:age> ?age . } mongo$ col.findOne({_id:”example:John”},{“foaf:name.l”:1,”foaf:age.l”:1});
  • 38. { s: “example:John, p: “foaf:knows” o: { u: “example:Jane” } }, { s: “example:John, p: “rdf:type” o: { u: “foaf:Person” } }, { s: “example:John, p: “foaf:name” o: { l: “John” } },
  • 39. { s: “example:John, p: “foaf:knows” o: { u: “example:Jane” } }, { s: “example:John, p: “rdf:type” o: { u: “foaf:Person” } }, { s: “example:John, p: “foaf:name” o: { l: “John” } }, DESCRIBE <http://example.com/John> mongo$ var s = col.find({s:”example:John”}); mongo$ while (s.hasNext()) { addToGraph(s.next()) } SELECT ?name ?age WHERE { <http://example.com/John> <foaf:name> ?name . <http://example.com/John> <foaf:age> ?age . } mongo$ col.find({s:”example:John”, p: “foaf:name”}},{“o”:1}); mongo$ col.find({s:”example:John”, p: “age”}},{“o”:1});
  • 40. { s: “example:John, p: “foaf:knows” o: { u: “example:Jane” } }, { s: “example:John, p: “rdf:type” o: { u: “foaf:Person” } }, { s: “example:John, p: “foaf:name” o: { l: “John” } }, DESCRIBE ?person WHERE { ?person <foaf:name> “John” . } mongo$ var s = col.find({p:”foaf:name”, o:”John”}); // BasicCursor = slow { _id: “example:John”, “foaf:knows”: { u: “example:Jane” }, “rdf:type”: { u: “foaf:Person” }, “foaf:name”: { l: “John” } } DESCRIBE ?person WHERE { ?person <foaf:name> “John” . } mongo$ col.ensureIndex({“foaf:name.u”:1}); mongo$ var s = col.find({“foaf:name.u”:”John”}); // BTreeCursor = fast
  • 42. DESCRIBE <http://example.com/foo> ?sectionOrItem ?resource ?document ?authorList ?author ?usedBy ?creator ?libraryNote ?publisher WHERE { OPTIONAL { <http://example.com/foo> resource:contains ?sectionOrItem . OPTIONAL { ?sectionOrItem resource:resource ?resource . OPTIONAL { ?resource dcterms:isPartOf ?document . } OPTIONAL { ?resource bibo:authorList ?authorList . OPTIONAL { ?authorList ?p ?author . } } OPTIONAL { ?resource dcterms:publisher ?publisher . } } OPTIONAL { ?libraryNote bibo:annotates ?sectionOrItem } } . OPTIONAL { <http://example.com/foo> resource:usedBy ?usedBy } . OPTIONAL { <http://example.com/foo> sioc:has_creator ?creator } }
  • 43. DESCRIBE <http://example.com/foo> ?sectionOrItem ?resource ?document ?authorList ?author ?usedBy ?creator ?libraryNote ?publisher WHERE { OPTIONAL { <http://example.com/foo> resource:contains ?sectionOrItem . OPTIONAL { ?sectionOrItem resource:resource ?resource . OPTIONAL { ?resource dcterms:isPartOf ?document . } OPTIONAL { ?resource bibo:authorList ?authorList . OPTIONAL { ?authorList ?p ?author . } } OPTIONAL { ?resource dcterms:publisher ?publisher . } } OPTIONAL { ?libraryNote bibo:annotates ?sectionOrItem } } . OPTIONAL { <http://example.com/foo> resource:usedBy ?usedBy } . OPTIONAL { <http://example.com/foo> sioc:has_creator ?creator } }
  • 44. “We don’t need dynamic queries” – Project Tripod Team, sometime 2012
  • 45. Precomputed views Remember those from the RDBMS?
  • 46. { _id: "example:John", "foaf:knows": { u: "example:Jane" }, "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "John" } } { _id: "example:Jane", "foaf:knows": { u: "example:John" }, "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "Jane" } } DESCRIBE example:John ?knownPerson WHERE { example:John foaf:knows ?knownPerson . } mongo$ var john = col.findOne({_id:"example:John"}); var known = [].concat(john["foaf:knows"]); for (var i=0; i < known.length; i++) { var knownPerson = col.findOne({_id: known[i].u}); }
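The cost of following a join by hand, as on this slide, can be made visible by counting lookups. The sketch below uses an in-memory stand-in for the collection rather than MongoDB itself; `makeCollection`, `values`, `describeWithKnown` and the extra person `example:Jack` are all hypothetical and exist only for illustration.

```javascript
// In-memory stand-in for a collection of CBD documents (not a MongoDB API).
// It counts lookups so the n+1 pattern is visible.
function makeCollection(docs) {
  const byId = new Map(docs.map((d) => [d._id, d]));
  return {
    queries: 0,
    findOne(query) {
      this.queries += 1;
      return byId.get(query._id) || null;
    },
  };
}

// Normalise a predicate value: absent, a single { u } object, or an array of them.
function values(v) {
  if (v === undefined) return [];
  return Array.isArray(v) ? v : [v];
}

// DESCRIBE example:John ?knownPerson, resolved by following foaf:knows one hop.
function describeWithKnown(col, id) {
  const root = col.findOne({ _id: id }); // 1 query for the root
  if (!root) return [];
  const graphs = [root];
  for (const ref of values(root["foaf:knows"])) {
    const known = col.findOne({ _id: ref.u }); // +1 query per known person
    if (known) graphs.push(known);
  }
  return graphs;
}

const col = makeCollection([
  { _id: "example:John", "foaf:knows": [{ u: "example:Jane" }, { u: "example:Jack" }] },
  { _id: "example:Jane", "foaf:knows": { u: "example:John" } },
  { _id: "example:Jack" },
]);

const graphs = describeWithKnown(col, "example:John");
console.log(graphs.length, col.queries);
```

Every additional person John knows adds one more round trip, which is the overhead a precomputed view is meant to remove.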
  • 47. System characteristics • 99:1 read:write • Well shared, tenant-based system. Our largest single customer has 35M triples • Graph data structures and operations (merges, sub-graphs etc.) well entrenched in the codebase, over 2M lines of code (inc. libraries). • Actually not that many distinct query shapes.
  • 48. { _id : { r: "example:John", t: "v_knows" }, graphs: [{ _id: "example:John", "foaf:knows": { u: "example:Jane" }, "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "John" } }, { _id: "example:Jane", "foaf:knows": { u: "example:John" }, "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "Jane" } }] } DESCRIBE example:John ?knownPerson WHERE { example:John foaf:knows ?knownPerson . } mongo$ viewsCol.findOne({_id: {r:"example:John",t:"v_knows"}})
  • 49. { _id : { r: "example:John", t: "v_knows" }, graphs: [{ _id: "example:John", "foaf:knows": { u: "example:Jane" }, "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "John" } }, { _id: "example:Jane", "foaf:knows": { u: "example:John" }, "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "Jane" } }], _impactIndex : ["example:Jane","example:John"] }
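The `_impactIndex` array lists every subject whose data ended up inside the view, so a save can work out which views need regenerating. A minimal in-memory sketch of that lookup follows; `impactedViews` and the `example:Jack` view are hypothetical, and how Tripod actually performs the lookup is not shown in the slides.

```javascript
// Views in the shape shown on the slide, reduced to the fields the lookup needs.
const views = [
  { _id: { r: "example:John", t: "v_knows" }, _impactIndex: ["example:Jane", "example:John"] },
  { _id: { r: "example:Jane", t: "v_knows" }, _impactIndex: ["example:Jane", "example:John"] },
  { _id: { r: "example:Jack", t: "v_knows" }, _impactIndex: ["example:Jack"] },
];

// Hypothetical helper: return the ids of views that contain any changed subject.
// Against a real collection, a query on the _impactIndex array field with the
// $in operator would express the same condition (an assumption about usage).
function impactedViews(allViews, changedSubjects) {
  const changed = new Set(changedSubjects);
  return allViews
    .filter((v) => v._impactIndex.some((s) => changed.has(s)))
    .map((v) => v._id);
}

const stale = impactedViews(views, ["example:Jane"]);
console.log(JSON.stringify(stale));
```

Changing Jane invalidates both John's and Jane's `v_knows` views, because each embeds the other's document.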
  • 50. View specification { "_id":"v_knows", "type":["foaf:Person"], "from":"CBD_people", "joins":{ "foaf:knows":{} } }
  • 51. More complex example { "_id":"v_resources", "type":["resourcelist:Resource"], "from":"CBD_resources", "joins":{ "dct:partOf":{ "joins": { "bibo:authorList":{ "joins" : { "followSequence":{ "maxJoins":50 } } }, "bibo:editorList":{ "joins" : { "followSequence":{ "maxJoins":50 } } }, "dct:publisher":{} } }, "dct:isPartOf":{ "joins": { "bibo:authorList":{ "joins" : { "followSequence":{ "maxJoins":50 } } }, "bibo:editorList":{ "joins" : { "followSequence":{ "maxJoins":50 } } }, "dct:publisher":{} } }, "bibo:authorList":{ "joins" : { "followSequence":{ "maxJoins":50 } } }, "bibo:editorList":{ "joins" : { "followSequence":{ "maxJoins":50 } } }, "dct:publisher":{} } }
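A view specification of this kind can be read as a tree of predicates to follow from the root resource. The sketch below is one hedged interpretation in plain JavaScript: `buildView` and `values` are hypothetical, the `type` and `from` keys are ignored, and the `followSequence`/`maxJoins` keywords are not handled because the slides do not define their semantics.

```javascript
// Normalise a predicate value: absent, a single object, or an array of objects.
function values(v) {
  if (v === undefined) return [];
  return Array.isArray(v) ? v : [v];
}

// Hypothetical interpreter: start at rootId and follow the nested "joins" tree,
// collecting every CBD document reached into one view document.
function buildView(cbds, rootId, spec) {
  const byId = new Map(cbds.map((d) => [d._id, d]));
  const seen = new Set();
  const graphs = [];

  function visit(id, joins) {
    const doc = byId.get(id);
    if (!doc) return; // dangling reference: nothing to embed
    if (!seen.has(id)) {
      seen.add(id);
      graphs.push(doc);
    }
    // Recursion depth is bounded by the depth of the spec, not of the data.
    for (const [predicate, child] of Object.entries(joins || {})) {
      for (const v of values(doc[predicate])) {
        if (v.u !== undefined) visit(v.u, child.joins);
      }
    }
  }

  visit(rootId, spec.joins);
  return { _id: { r: rootId, t: spec._id }, graphs, _impactIndex: [...seen].sort() };
}

const cbds = [
  { _id: "example:John", "foaf:knows": { u: "example:Jane" }, "foaf:name": { l: "John" } },
  { _id: "example:Jane", "foaf:knows": { u: "example:John" }, "foaf:name": { l: "Jane" } },
];
const spec = { _id: "v_knows", type: ["foaf:Person"], from: "CBD_people", joins: { "foaf:knows": {} } };

const view = buildView(cbds, "example:John", spec);
console.log(JSON.stringify(view._impactIndex));
```

Because the traversal is driven by the specification, the set of queries is fixed up front, which matches the trade-off of giving up dynamic queries.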
  • 52. What about tabular data? • We also have tables and table specs • Conceptually the same as views • Instead of an array of graphs we have computed columns for complex tabular queries • You can page, limit, offset results just like you’d expect
  • 53. { "_id" : { "r" : "http://example.com/users/FC44E153-161C-C199-DBAB-4DDE13F76F9B/bookmarks/1ABE1B4B-A68C-90E4-41DB "type" : "t_user_resources" }, "value" : { "_impactIndex" : [ { "r" : "http://example.com/users/FC44E153-161C-C199-DBAB-4DDE13F76F9B/bookmarks/1ABE1B4B-A68C-90E4 "c" : "tenantContexts:DefaultGraph" }, { "r" : "tenantResources:7AB1D8E3-5D74-D07F-41E7-56206CFEC8EE", "c" : "tenantContexts:DefaultGraph" } ], "collection" : "http://example.com/users/FC44E153-161C-C199-DBAB-4DDE13F76F9B/bookmarks", "createdDate" : "2011-02-08T15:59:45+00:00", "resourceUri" : "tenantResources:7AB1D8E3-5D74-D07F-41E7-56206CFEC8EE", "note" : "ELECTRONIC", "title" : "Feminism & psychology", "type" : [ "resourcelist:Resource", "bibo:Journal" ] } }
  • 54. Database layout talis-rs:PRIMARY> show collections CBD_config CBD_draft CBD_events CBD_jobs CBD_lists CBD_nodes CBD_resources CBD_reviews CBD_service CBD_user_lists CBD_user_resources CBD_users (r/w) table_rows views (read only)
  • 55. Fast and slow saves, you decide.
  • 56. Tripod save() • Based on change sets, you supply the old and new graphs • CBDs updated immediately. Write-ahead transaction log for multi-CBD writes • Choice per save on whether to update views/tables sync or async (eventually consistent) • Async adds jobs to a Mongo-based queue
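A change-set based save can be sketched as a diff between the old and new graphs, followed by a sync-or-async decision for the dependent views. Everything below is hypothetical and in-memory: `changeSet`, `save`, the `async` option and the plain array used as a queue are illustrative names, not Tripod's actual API, and the CBD write itself is not modelled.

```javascript
// Stable string key for a triple so triples can be compared by value.
function tripleKey(t) {
  return JSON.stringify([t.s, t.p, t.o]);
}

// Diff two graphs (arrays of { s, p, o } triples) into removals and additions.
function changeSet(oldGraph, newGraph) {
  const oldKeys = new Set(oldGraph.map(tripleKey));
  const newKeys = new Set(newGraph.map(tripleKey));
  return {
    removals: oldGraph.filter((t) => !newKeys.has(tripleKey(t))),
    additions: newGraph.filter((t) => !oldKeys.has(tripleKey(t))),
  };
}

// Hypothetical save: work out which subjects changed, then either mark the
// views as regenerated now (sync) or enqueue a job for later (async).
function save(oldGraph, newGraph, options, queue) {
  const cs = changeSet(oldGraph, newGraph);
  const changed = [...new Set([...cs.removals, ...cs.additions].map((t) => t.s))];
  if (options.async) {
    queue.push({ subjects: changed }); // eventually consistent
    return { changed, viewsUpdated: false };
  }
  return { changed, viewsUpdated: true };
}

const oldGraph = [
  { s: "example:John", p: "foaf:name", o: { l: "John" } },
  { s: "example:John", p: "rdf:type", o: { u: "foaf:Person" } },
];
const newGraph = [
  { s: "example:John", p: "foaf:name", o: { l: "Johnny" } },
  { s: "example:John", p: "rdf:type", o: { u: "foaf:Person" } },
  { s: "example:John", p: "foaf:age", o: { l: "24" } },
];

const queue = [];
const result = save(oldGraph, newGraph, { async: true }, queue);
console.log(JSON.stringify(result), queue.length);
```

The async path returns sooner but leaves views stale until the queued job runs, which is the speed versus consistency choice the slide describes.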
  • 58. Query volume complex vs. simple
  • 59. Query volume graph vs. tabular
  • 60. Query speed complex vs. simple graph query
  • 61. Hardware • Real tin, 2x Dell low-end rack mount servers • 96GB RAM, 24 cores • RAID-10 disks, non-SSD • Keep ‘em on the same LAN as your app servers • About the same to lease per month as a couple of c3.4xlarge (30GB, 32vCPU) • We’re about to add a similar second cluster, 144GB
  • 62. Why Mongo? RTFM, not HN comment feeds. But seriously it could have been n other document DBs
  • 63. There’s lots more Search, named graphs (quads), data functions
  • 64. Future roadmap • Multi-cluster <- IN PROGRESS • NodeJS port <- IN PROGRESS • Choose better solution for tlog, probably PostgreSQL • Background queue -> redis and resque • Chainable API • Spout of updates for Apache Storm • Versioned views/tables config
  • 65. Aperture Annotate your models to persist to graph
  • 66. Aperture Annotate your models to persist to graph
  • 68. @talis facebook.com/talisgroup +44 (0) 121 374 2740 talis.com info@talis.com 48 Frederick Street Birmingham B1 3HN

Editor's notes

  1. Not just mongodb Specific to our circumstances YMMV
  2. The theory part - remember I’m not a data scientist ;-)
  3. Ball and stick diagrams The balls are nodes and the sticks are named relationships between the nodes This is an undirected graph
  4. Ball and stick diagrams The balls are nodes and the sticks are named relationships between the nodes This is an undirected graph
  5. This is a directed graph directional relationship
  6. Doesn’t tell us Jane knows John
  7. A toolset to work with graph data. Directed graphs
  8. Values can be other Entities
  9. This is a triple
  10. The same node can be a subject or an object.
  11. In RDF subjects and properties are actually URIs that can be dereferenced Here the predicate is part of a public vocabulary called FOAF Billions of triples out there on the public internets defined using FOAF Namespacing - makes URIs shorter
  12. In RDF subjects and properties are actually URIs that can be dereferenced Here the predicate is part of a vocabulary called FOAF Billions of triples out there on the public internets defined using FOAF
  13. Here it is in ball and stick
  14. Yes, you can! Data schema only makes sense to you Not graph data Complex graphs quickly end in renormalisation hell, or many, many follow your nose queries
  15. Real data graphs quickly get complicated
  16. Really easy to merge datasets from different sources that talk ABOUT THE SAME THING Global identifiers via URIs
  17. Really easy to merge datasets from different sources that talk ABOUT THE SAME THING Global identifiers via URIs
  18. W3 standard
  19. SQL-like, to an extent. WHERE is Pattern matching, essentially joins UNIONS, Geo extensions, etc.
  20. 4 main query types
  21. We started working on our first application in 2008 Talis was 3 companies back then. One built a general purpose graph store, part of technical strategy to build on it RDF based, integrates other data sources from around the web
  22. We did caching for performance. Complicated! Data size outgrew our existing general-purpose technology stack and became hard to operate. Complex SPARQL queries are expensive on large data sets. In 2008 even low hundreds of ms from the DB was acceptable (with caching). Today we do 20 queries a page and expect single-digit ms or better. Our graph store was end-of-lifed.
  23. 2012 - project to replace generalised triple store with something more specific to our app FIND A NEW POD FOR OUR TRIPLES It’s a library. Currently implemented in php and parts ported to node. Sorry, our apps are php.
  24. We didn’t consider moving from the graph. You can’t just refactor the whole codebase to relational and flip a switch overnight and expect it to work. This was a moving target.
  25. Lots of very simple data These can be satisfied very easily and cheaply if you group all the immediate properties of a subject together “Concise Bound Description”
  26. Earlier example
  27. Graph theory concept: CBD
  28. Our data model: One document per CBD
  29. In more detail
  30. _id indexed by default
  31. Mega fast queries with single docs returned, no cursors. Micro secs on decent hardware.
  32. Mega fast queries with single docs returned, no cursors. Micro secs on decent hardware.
  33. Contrast that to most triple stores, they traditionally model the triple. Cayley being one of them
  34. Makes queries expensive. Have to deal with cursors with 1..n documents. Have to pluck values via multiple or complex queries.
  35. Gets worse when you want to find matches by value
  36. JOINS
  37. Typical complex query 9 “joins” Document databases don’t generally like joins. map reduce?!
  38. Only thing that changes in this query is the URI 9 joins in this query = expensive In the whole system we probably only had 20 queries that required joins
  39. A revelation from the data gods! The flexibility of SPARQL is great for the developer but, simply put, hard to scale. Thousands of hours of optimisation have gone into relational DBs’ query engines over decades. This is why in the old design we hid everything behind a cache
  40. Pre compute all possible answers to the query data storage cheap
  41. Without pre-computed views This is just a single join. Very messy. What if John knows 50 people? n+1 queries.
  42. We discard and re-generate views at write time There’s only about 20 of them in our whole app.
  43. Pre-computed typed views 1 query, ultra fast
  44. When we do a save we do a lookup to see which views might be impacted
  45. In our config Simple config lang with a few keywords This means we have to specify queries up front, not send them at run time
  46. Most complex in our system. 11 joins!
  47. This is a table row from our system Instead of graphs key/value pairs Note you can have multi-value cells (type). This was a limitation of SPARQL select for us.
  48. CBD collections are read-write for the developer table_rows, views read only, tripod driver manages regeneration In our system: 50M distinct CBDs 34M distinct views 23M distinct table rows roughly 800Mb per MT inc indexes and views etc.
  49. This brings us nicely onto saves. Trade speed with eventual consistency.
  50. Mongo doesn’t have transactions. TLog is a separate mongo cluster used to control transactions + rollback. Also allows us to update a nightly backup to the last applied transaction in the case of total data loss. TLog is in Mongo but a poor choice. Moving to Postgres. Async faster, but not consistent. Depends on situation Queue implemented in Mongo, moving to redis + resque (probably)
  51. Tripod has built in ability to collect stats. We use statsD+graphite
  52. A lot fewer tabular queries
  53. Scale on left is ms. This includes database, network to web server and the time marshalling into php objects. This is where the extra time is spent for views!
  54. Cost wise cloud just didn’t stack up for us, esp in 2012. Tin vs. Cloud like for like is more like 2x today, 8x cost in 2012 RAM is king here We’re adding a second cluster with 144Gb shortly PaaS is prohibitively expensive at scale. We had a support contract on the first cluster but on the second we’re going it alone.
  55. Don’t mention write preference or I will shoot you in the head. Looking for a document database not a graph database Evaluated Couch, Riak and Postgres - CouchBase was a new product, just merged with Memcache. Felt risky although map/reduce queries fitted well with views/tables - Riak. Features we liked were commercially licensed - Postgres - JSON datatype was primitive at the time, worth a second look today tho ServerDensity David Swiss Army Knife NoSQL Community - commercial - & developer Friendly API Ultimately not bound to it - swapping out parts as we scale There’s a lot of shit written about Mongo. Don’t read HN. Instead RTFM.
  56. But not enough time
  57. Mongo doesn’t have transactions. TLog is a separate mongo cluster used to control transactions + rollback. Also allows us to update a nightly backup to the last applied transaction in the case of total data loss. TLog is in Mongo but a poor choice. Moving to Postgres. Async faster, but not consistent. Depends on situation Queue implemented in Mongo, moving to redis + resque (probably) FINALLY before we go, a sneak peek of a project we’ve been working on to hide the graph entirely…
  58. Our app already worked natively with graphs But the model in most apps is not a graph Aperture is our new project built on top of tripod allowing you to hide the complexity of graphs Plain old php object
  59. Simply annotating it will