Storing the Family Tree with MongoDB
We’re going to talk about
MongoDB Intro & Fundamentals
MongoDB for Genealogy data
Scaling MongoDB for all the generations
The Family Tree
Storing a graph in MongoDB
Steve @sp
15+ years building the internet
Father, husband, skateboarder, genealogist at ❤
Chief Solutions Architect @
responsible for drivers, integrations, web & docs
Company behind MongoDB
Offices in NYC, Palo Alto, London & Dublin
100+ employees
Support, consulting, training
Mgt: Google/DoubleClick, Oracle, Apple, NetApp, Mark Logic

Well Funded: Sequoia, Union Square, Flybridge
Introduction to MongoDB
A bit of
history
1974
The relational database is created
1979   1994   1995
Computers in 1995
100 MHz Pentium
10BASE-T
16 MB RAM
200 MB HD
Cloud in 1995
(Windows 95 cloud wallpaper)
Cell Phones in 2012
Dual core 1.5 GHz
802.11n (300+ Mbps)
1 GB RAM
64 GB Solid State
MongoDB
Application
Document Oriented
{ author : "steve",
  date : new Date(),
  text : "About MongoDB...",
  tags : ["tech", "database"] }
High Performance
Fully Consistent
Horizontally Scalable
MongoDB philosophy
 Keep functionality when we can (key/value
 stores are great, but we need more)
 Non-relational (no joins) makes scaling
 horizontally practical
 Document data models are good
 Database technology should run anywhere
 virtualized, cloud, metal, etc
Under the hood
Written in C++
Runs nearly everywhere
Data serialized to BSON
Extensive use of memory-mapped files
i.e. read-through write-through
memory caching.
Database Landscape
[Chart: Scalability & Performance plotted against Depth of Functionality, positioning MemCache, MongoDB and RDBMS]
“MongoDB has the best features of key/value stores, document databases and relational databases in one.”
John Nunemaker
Relational made normalized data look like this
Category
• Name
• Url
Article
• Name
• Slug
• Publish date
• Text
User
• Name
• Email Address
Tag
• Name
• Url
Comment
• Comment
• Date
• Author
Document databases make normalized data look like this
Article
• Name
• Slug
• Publish date
• Text
• Author
• Comment[]
   • Comment
   • Date
   • Author
• Tag[]
   • Value
• Category[]
   • Value
User
• Name
• Email Address
But we’ve been using
a relational database
    for 40 years!
How do people store
documents in real life?
Think about a
doctor’s office
There are two ways they
could organize their files
Each document type in its own drawer
[Figure: filing cabinet with one drawer per document type: MRIs, X-rays, Lab, Invoices, History, Medications, Lab, Forms, plus an Index]
2. Group related records
Patient 1   Patient 2   Patient 3   ...
Vendor 1    Vendor 2    Vendor 3
Databases work the same way
Relational: each type in its own table (Category, Article, User, Tag, Comment)
Document: related records grouped together, like Patient 1 or Vendor 1 (Article with embedded Comment[], Tag[] and Category[]; User)
Terminology
RDBMS         ➜   Mongo
Table, View   ➜   Collection
Row           ➜   Document
Index         ➜   Index
Join          ➜   Embedded Document
Foreign Key   ➜   Reference
Partition     ➜   Shard
Why MongoDB
My Top 10 Reasons
10. Great developer experience
 9. Speaks your language
 8. Scale horizontally
 7. Fully consistent data w/atomic operations
 6. Memory caching integrated
 5. Open source
 4. Flexible, rich & structured data format not just K:V
 3. Ludicrously fast (without going plaid)
 2. Simplify infrastructure & application
 1. It’s web scale
MongoDB
Use Cases
CMS / Blog
Needs:
• Business needed modern data store for rapid development and
  scale

Solution:
• Use PHP & MongoDB

Results:
• Real time statistics
• All data, images, etc stored together
  easy access, easy deployment, easy high availability
• No need for complex migrations
• Enabled very rapid development and growth
Photo Meta-Data
Problem:
• Business needed more flexibility than Oracle could deliver

Solution:
• Use MongoDB instead of Oracle

Results:
• Developed application in one sprint cycle
• 500% cost reduction compared to Oracle
• 900% performance improvement compared to Oracle
Customer Analytics
Problem:
• Deal with massive data volume across all customer sites

Solution:
• Use MongoDB to replace Google Analytics / Omniture options

Results:
• Less than one week to build prototype and prove business case
• Rapid deployment of new features
Archiving
Why MongoDB:
• Existing application built on MySQL
• Lots of friction with RDBMS based archive storage
• Needed more scalable archive storage backend
Solution:
• Keep MySQL for active data (100mil)
• MongoDB for archive (2+ billion)
Results:
• No more alter table statements taking over 2 months to run
• Sharding fixed vertical scale problem
• Very happily looking at other places to use MongoDB
Online Dictionary
Problem:
• MySQL could not scale to handle their 5B+ documents

Solution:
• Switched from MySQL to MongoDB

Results:
• Massive simplification of code base
• Eliminated need for external caching system
• 20x performance improvement over MySQL
E-commerce
Problem:
• Multi-vertical E-commerce impossible to model (efficiently) in
  RDBMS

Solution:
• Switched from MySQL to MongoDB

Results:
• Massive simplification of code base
• Rapidly build, halving time to market (and cost)
• Eliminated need for external caching system
• 50x+ performance improvement over MySQL
Tons more
   MongoDB casts a wide net

  people keep coming up with
 new and brilliant ways to use it
In Good Company




   and 1000s more
MongoDB
Start with an object (or array, hash, dict, etc.)

place1 = {
    name : "10gen HQ",
 address : "578 Broadway 7th Floor",
    city : "New York",
     zip : "10011",
    tags : [ "business", "awesome" ]
}
Inserting the record
Initial Data Load

> db.places.insert(place1)
Querying
{
    name : "10gen HQ",
 address : "134 5th Avenue 3rd Floor",
    city : "New York",
     zip : "10011",
    tags : [ "business", "awesome" ]
}

> db.places.findOne({ zip: "10011",
            tags: "awesome" })

> db.places.find({tags: "business" })
Nested Documents
  { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author : "roger",
    date : "Sat Apr 24 2011 19:47:11",
    text : "About MongoDB...",
    tags : [ "tech", "databases" ],
    comments : [
      {
        author : "Fred",
        date : "Sat Apr 25 2010 20:51:03",
        text : "Best Post Ever!"
      }
    ]
  }
Object ID
> db.places.insert(place1)

object(MongoId)#4 (1) {
  ["$id"]=> string(24) "4e9cc76a4a1817fd21000000"
}

   4e9cc76a4a1817fd21000000
   |------||----||--||----|
      ts    mac  pid  inc
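As a rough illustration of the breakdown above, the 24 hex characters can be split by hand. This is a sketch that assumes the legacy layout shown on the slide (4-byte timestamp, 3-byte machine id, 2-byte process id, 3-byte counter); newer MongoDB versions replace the machine and process fields with a 5-byte random value. `parseObjectId` is a hypothetical helper, not a driver API.

```javascript
// Hypothetical helper: split a 24-character ObjectId hex string into the
// legacy fields shown on the slide (ts | mac | pid | inc).
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // 4 bytes: seconds since epoch
  return {
    timestamp: new Date(seconds * 1000),
    machine: hex.slice(8, 14),                   // 3 bytes, kept as hex
    pid: parseInt(hex.slice(14, 18), 16),        // 2 bytes
    counter: parseInt(hex.slice(18, 24), 16)     // 3 bytes
  };
}

const parts = parseObjectId("4e9cc76a4a1817fd21000000");
console.log(parts.timestamp.toISOString(), parts.machine, parts.pid, parts.counter);
```

Because the timestamp leads the id, sorting by `_id` roughly sorts by creation time.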
A More Complex Document

place1 = {
    name : "10gen HQ",
 address : "578 Broadway 7th Floor",
    city : "New York",
     zip : "10011",
    tags : [ "business", "awesome" ],
 latlong : [40.0,72.0],
    tips : [ { user : "ryan",
               time : ISODate("2011-06-26"),
                tip : "stop by for office hours"},
             {.....}]
}
Indexing & Adv Querying
// Index nested documents
db.posts.ensureIndex({ "comments.author":1 })
db.posts.find({'comments.author':'Fred'})

// Regular Expressions
db.posts.find({'comments.author': /^Fr/})

// Index on tags (multi-key index)
db.posts.ensureIndex({ tags: 1})
db.posts.find( { tags: 'tech' } )

// geospatial index
db.posts.ensureIndex({ "author.location": "2d" })
db.posts.find({"author.location":{$near:[22,42]}})
Updating
> db.places.update(
  {name : "10gen HQ"},
  { $push :
       { tips :
           { user : "nosh",
             time : ISODate("2011-07-14"),
              tip : "Office hours are great!"
           }
       }
  }
)

place1 = {
    name : "10gen HQ",
 address : "578 Broadway 7th Floor",
    city : "New York",
     zip : "10011",
    tags : [ "business", "awesome" ],
 latlong : [40.0,72.0],
    tips : [ { user : "ryan",
               time : ISODate("2011-06-26"),
                tip : "stop by for office hours on Wednesdays from 4-6pm"},
             { user : "nosh",
               time : ISODate("2011-07-14"),
                tip : "Office hours are great!"}
           ]
}
Atomic Operations
$set   $unset   $rename
$push   $pop   $pull
$addToSet   $in
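To make the difference between two of these operators concrete, here is a small in-memory imitation of `$push` and `$addToSet`. It is only a sketch: `applyUpdate` and `arrayField` are hypothetical helpers acting on a plain object, whereas the real operators run inside the server and are atomic per document.

```javascript
// Return the array stored at doc[field], creating it if the field is missing.
function arrayField(doc, field) {
  if (doc[field] === undefined) doc[field] = [];
  if (!Array.isArray(doc[field])) throw new Error(field + " is not an array");
  return doc[field];
}

// In-memory imitation of $push and $addToSet, for illustration only.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$push || {})) {
    arrayField(doc, field).push(value);            // $push always appends
  }
  for (const [field, value] of Object.entries(update.$addToSet || {})) {
    const target = arrayField(doc, field);
    // Simplification: values are compared by their JSON text.
    const wanted = JSON.stringify(value);
    if (!target.some((v) => JSON.stringify(v) === wanted)) target.push(value);
  }
  return doc;
}

const place = { name: "10gen HQ", tags: ["business", "awesome"] };
applyUpdate(place, { $push: { tags: "awesome" } });      // duplicate is kept
applyUpdate(place, { $addToSet: { tags: "awesome" } });  // already present, skipped
applyUpdate(place, { $addToSet: { tags: "nyc" } });      // new value, appended
console.log(place.tags);
```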
Cursors
$cursor = $c->find(array("foo" => "bar"));

foreach ($cursor as $id => $value) {
   echo "$id: ";
   var_dump( $value );
}

$a = iterator_to_array($cursor);
Paging
page_num = 3;
results_per_page = 10;

cursor = db.collection.find()
  .sort({ "ts" : -1 })
  .skip(page_num * results_per_page)
  .limit(results_per_page);
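Note that `page_num * results_per_page` treats page numbers as zero-based: page 3 skips 30 results. A minimal sketch of the same sort, skip and limit steps with one-based pages, run against a plain array instead of a collection (`pageOf` is a hypothetical helper):

```javascript
// Hypothetical helper mirroring sort().skip().limit() on a plain array.
// pageNum is 1-based here, so page 1 skips nothing.
function pageOf(items, pageNum, resultsPerPage) {
  const skip = (pageNum - 1) * resultsPerPage;
  return items
    .slice()                             // leave the caller's array untouched
    .sort((a, b) => b.ts - a.ts)         // newest first, like { "ts" : -1 }
    .slice(skip, skip + resultsPerPage); // skip, then limit
}

const items = Array.from({ length: 25 }, (_, i) => ({ ts: i + 1 }));
console.log(pageOf(items, 3, 10).map((d) => d.ts)); // last, partial page
```

Large skip values still make the server walk past every skipped document, so deep paging gets slower as the page number grows.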
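GridFS
Storing Files
Under 16 MB: fits in a single document
Storing Big Files
Over 16 MB: stored as chunks (16 MB is the per-document limit; the GridFS default chunk size is far smaller, 255 kB in current versions)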
Storing Big Files
Works with replicated and sharded systems
A better network FS
GridFS files are seamlessly sharded & replicated.
No OS constraints...
No file size limits
No naming constraints
No folder limits
Standard across different OSs
MongoDB automatically generates the MD5 hash of
the file
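The chunking and hashing idea can be sketched in a few lines of Node. This is an illustration only, not the driver's GridFS API: `toChunks` is a hypothetical helper and the tiny chunk size is chosen just for the demo. The `files_id`, `n` and `data` field names follow the chunk documents GridFS stores.

```javascript
const crypto = require("crypto");

// Illustration of the chunking idea only; real drivers expose their own
// GridFS classes. chunkSize is whatever the caller chooses.
function toChunks(fileId, data, chunkSize) {
  const chunks = [];
  for (let n = 0; n * chunkSize < data.length; n++) {
    chunks.push({
      files_id: fileId,                  // points back at the file document
      n: n,                              // chunk sequence number
      data: data.subarray(n * chunkSize, (n + 1) * chunkSize)
    });
  }
  const md5 = crypto.createHash("md5").update(data).digest("hex");
  return { file: { _id: fileId, length: data.length, chunkSize, md5 }, chunks };
}

const stored = toChunks("scan-1", Buffer.alloc(1000, 1), 256);
console.log(stored.chunks.length, stored.file.md5);
```

Reading the file back is the reverse: fetch the chunks for one `files_id` ordered by `n` and concatenate their `data`.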
MongoDB for Genealogy Data
Types of genealogy data
Events (birth, death, etc)
Official records
Census
Names
Relationships
Photographs
Diaries & letters
Ship passenger list
Occupation
and more
Challenges of genealogy data
Lots of possible data points... need flexible schema
Multiple versions of same data point (3 different dates for death date, 4 variations on name)
Data related to records
Multiple versions of same nodes (intelligent nondestructive merge needed)
Need to have metadata associated
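One way to picture a nondestructive merge is to keep every variant seen in either record rather than overwriting one with the other. A minimal sketch, assuming names are stored as first/middle/last variants as in the individual documents of this deck; `mergeNames` is a hypothetical helper:

```javascript
// Hypothetical nondestructive merge: keep every spelling seen in either
// record instead of overwriting one with the other.
function mergeNames(a, b) {
  const merged = {};
  for (const part of ["first", "middle", "last"]) {
    // concat accepts both a single string and an array of variants
    const variants = [].concat(a[part] || [], b[part] || []);
    merged[part] = [...new Set(variants)]; // drop duplicates, keep order
  }
  return merged;
}

const mergedName = mergeNames(
  { first: ["john"], middle: "peter", last: ["smith"] },
  { first: ["johannes", "john"], last: ["sandvik"] }
);
console.log(mergedName);
```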
Genealogy is changing
Recognize
0 @I2@ INDI
1 NAME Charles Phillip /Ingalls/
1 SEX M
1 BIRT
2 DATE 10 JAN 1836
2 PLAC Cuba, Allegheny, NY
1 DEAT
2 DATE 08 JUN 1902
2 PLAC De Smet, Kingsbury, Dakota Territory
1 FAMC @F2@
1 FAMS @F3@
0 @I3@ INDI
1 NAME Caroline Lake /Quiner/
1 SEX F
1 BIRT
2 DATE 12 DEC 1839
GEDCOM
File format, not a database
Handles the great variety of data well
Doesn’t really scale beyond a local user.
Doesn’t provide good mechanism for storing
external documents (birth certificates, etc).
Built to solve problem of sharing data
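GEDCOM's level numbers map naturally onto nested documents. The following is a minimal sketch of that mapping, not a complete GEDCOM implementation: it only handles the `level [@id@] TAG [value]` line shape, and `parseGedcom` is a hypothetical helper.

```javascript
// Minimal sketch: turn GEDCOM lines into nested objects using the level
// number at the start of each line.
function parseGedcom(text) {
  const records = [];
  const stack = [];                      // stack[level] = currently open node
  for (const line of text.split("\n")) {
    const m = line.trim().match(/^(\d+)\s+(@\w+@\s+)?(\w+)\s*(.*)$/);
    if (!m) continue;                    // skip blank or unrecognized lines
    const level = Number(m[1]);
    const node = { tag: m[3], value: m[4], children: [] };
    if (m[2]) node.id = m[2].trim();     // cross-reference id, e.g. "@I2@"
    if (level === 0) records.push(node);
    else if (stack[level - 1]) stack[level - 1].children.push(node);
    stack[level] = node;
    stack.length = level + 1;            // forget deeper nodes that are closed
  }
  return records;
}

const sample = [
  "0 @I2@ INDI",
  "1 NAME Charles Phillip /Ingalls/",
  "1 BIRT",
  "2 DATE 10 JAN 1836",
  "2 PLAC Cuba, Allegheny, NY"
].join("\n");
const [person] = parseGedcom(sample);
console.log(person.id, person.children.map((c) => c.tag));
```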
Genealogy & MongoDB

Genealogy is anything but rigid and fixed
Flexible schema fits genealogy data well
Packaging things together makes sense
Relating records doesn’t require a relational
database
Individual
• AFN
• Modification Date
• Name
   • First[]
   • Middle[]
   • Last[]
• Events[]
   • type
   • date
   • contributor[]
   • record[]
   • Location
      • city
      • state
      • county
      • country
      • coordinates[]
User
• Name
• Email Address
• Password
• Individual_id
Record
• contributor
• type
• thumbnail
• content
• description
• tags[]
Individual
individual = {
  _id : ObjectId("4f2978dfaa999d9db02618ce"),
  AFN : '1XYK-KQJ',
  name: {
     first: ['john', 'johannes'],
     middle: 'peter',
     last: ['smith', 'sandvik']
   }
}

db.individual.find(
  { "name.first" : "john", "name.middle" : "peter" })
Events
events : {
   death : {
    date : ISODate('1989-07-14'),
    location : {
      city: 'pensacola',
      state: 'fl',
      county: 'escambia',
      country: 'usa',
      coordinates : [30.26,87.12]},
    contributor : ObjectId("4eeac...691")}}

db.individual.find(
  {"events.death.date" : ISODate('1989-07-14')})

db.individual.find(
  {"events.death.location.coordinates" : { $near:[30,90]}})
Duplicate Events
events : {
  birth : [ {
      date : ISODate('1928-04-06'),
      location : {
        city: 'brattleboro',
        state: 'vt',
        county: 'windham',
        country: 'usa',
        coordinates : [42.51,72.34]},
      contributor : ObjectId("4ee...00000"),
      records: ObjectId("4ed8a...7b000000")
    },
    {
      date : ISODate('1928-04-16'),
      location : {
        city: 'brattleboro',
        state: 'vt',
        county: 'windham',
        country: 'usa',
        coordinates : [42.51,72.34]},
      contributor : ObjectId("4ee...37bb"),
      records: ObjectId("4eea...0000c8")
    }]
}
Duplicate Events
events : {
  birth : [ { date : ISODate('1928-04-06') },
            { date : ISODate('1928-04-16') } ]
}

db.individual.find(
  {"events.birth.date" : ISODate('1928-04-16')})

Same Query Works!!
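The same query works because a dotted path fans out through any array it meets on the way. A rough in-memory sketch of that behavior, limited to equality matching and using strings in place of dates; `valuesAt` and `matches` are hypothetical helpers, and events are assumed to be an object keyed by event type:

```javascript
// Collect every value reachable along a dotted path, fanning out whenever
// an array is met on the way. Simplified: equality matching only.
function valuesAt(value, path) {
  if (Array.isArray(value)) {
    return value.flatMap((item) => valuesAt(item, path));
  }
  if (path.length === 0) return [value];
  if (value === null || typeof value !== "object") return [];
  return valuesAt(value[path[0]], path.slice(1));
}

function matches(doc, dottedPath, expected) {
  return valuesAt(doc, dottedPath.split(".")).includes(expected);
}

const oneDate = { events: { birth: { date: "1928-04-16" } } };
const twoDates = {
  events: { birth: [{ date: "1928-04-06" }, { date: "1928-04-16" }] }
};
console.log(matches(oneDate, "events.birth.date", "1928-04-16"));  // true
console.log(matches(twoDates, "events.birth.date", "1928-04-16")); // true
```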
Multiple Events
marriage : [{
  date : ISODate('1939-08-11'),
  end_date : ISODate('1940-02-19'),
  to : ObjectId("4f297978aa999d9db02618cf"),
  location : {
    city: 'raleigh',
    state: 'nc',
    county: 'wake',
    country: 'usa',
    coordinates : [35.49,78.38]},
  contributor : ObjectId("4eeac...91537bb")},
{
  date : ISODate('1944-04-19'),
  to : ObjectId("4f2978dfaa999d9db02618ce"),
  location : {
    city: 'atlanta',
    state: 'ga',
    county: 'fulton',
    country: 'usa',
    coordinates : [33.45,84.23]},
  contributor : ObjectId("4eeb...37bb")}]
All together
individual = {
   _id : ObjectId("4f2978dfaa999d9db02618ce"),
   AFN : '1XYK-KQJ',
   name: {
      first: ['john', 'johannes'],
      middle: 'peter',
      last: ['smith', 'sandvik']
   },
   events : {
      birth : [
         {
             date : ISODate('1928-04-06'),
             location : {
                city: 'brattleboro',
                state: 'vt',
                county: 'windham',
                country: 'usa',
                coordinates : [42.51,72.34]
             },
             contributor : ObjectId("4eeabc958b691537bb000000"),
             records: ObjectId("4ed8aea7d8562f7d7b000000")
         },
         {
             date : ISODate('1928-04-16'),
             location : {
                city: 'brattleboro',
Records
record1 = {
   _id : ObjectId("4ed8aea7d8562f7d7b"),
   contributor : ObjectId("4eeab...1537bb"),
   type : 'birth certificate',
   thumbnail : BinData(0,"/9j/4AAQSkZJ...."),
   content : BinData(0,"j6b/Id11lWqs..."),
   tags : ['NY', 'certified'],
   description : "John's birth certificate"
}
Users
user = {
  _id : ObjectId("4eeabc958b691537bb"),
  username : 'spf13',
  email_address : 'genealogy@spf13.com',
  password : 'a.long.passphrase18',
  individual_id : ObjectId("4f2f...0ce"),
}
Scaling MongoDB for all the generations
Replica Sets
Example configurations:
• Primary, Secondary, Secondary
• Primary, Secondary, Arbiter
• Primary, Secondary, Secondary, Secondary, Secondary
Sharding
[Diagram: three App Servers, each with a MongoS router; three ConfigD config servers; four shards, each made up of three MongoD processes]
The Family Tree
It’s not a tree at all,
  It’s really a graph
     ... and an odd one at that
It would be easy if it
always looked like this
All sorts of mess
Step & adopted relationships
Duplicate nodes
Lots of missing nodes
Divorces and re-marriages
Multiple names for the same person
Multiple dates for the same event
How to make
sense of it all
Storing a graph in MongoDB
Graphs are important




Without them we couldn’t store family relationships
Trees / graphs
        in MongoDB
Since MongoDB data structures are
essentially objects, there is a good degree of
flexibility here.
Think of how you would structure them in
your application
Trees / graphs
        in MongoDB
Each node is stored as a document

Contains references to related nodes

What is “related” depends on your
application
References vs
         Relation
MongoDB uses references
Unlike foreign keys, references don’t
enforce integrity
Reference is really just a reference
For many applications a reference is
sufficient
Simple relationship
{ _id: "a" } { _id: "b" } { _id: "c" } { _id: "d" }
{ _id: "e", parents: ["a", "b" ]}
{ _id: "f", parents: ["c", "d" ]}
{ _id: "g", parents: ["e", "f" ]}

•Easy to access nodes in either direction
•Good for trees / graphs
•Can grab large sets
•Minimum amount of maintenance
•Balanced
•Implied relationships

//find all direct descendants of b:
> db.family.find({ parents : 'b'})

//find all ancestors of g:
var g = db.family.findOne({_id: 'g'});
ancestorFind = function(child) {
  var rv = [];
  if ( ! child.parents) return rv;
  var parents = db.family.find( { _id : { $in : child.parents } } ).toArray();
  rv = rv.concat(parents);
  for (var i in parents) {
    rv = rv.concat( ancestorFind(parents[i]));
  }
  return rv;
}
ancestorFind(g);

//find all descendants of b:
var b = db.family.find({ _id: 'b'}).toArray();
descendantsFind = function(par) {
  var rv = [];
  for (var i in par) {
    var k = db.family.find( { parents : par[i]._id }).toArray();
    rv = rv.concat(k);
    rv = rv.concat(descendantsFind(k));
  }
  return rv;
}
descendantsFind(b);
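The traversal idea on this slide can be tried without a database. The sketch below keeps the seven documents in a plain array standing in for the `family` collection (an assumption made so it runs on its own) and borrows the function names from the slide; each `filter` call corresponds to one query against the collection.

```javascript
// The seven documents from the slide, held in a plain array instead of a
// MongoDB collection.
const family = [
  { _id: "a" }, { _id: "b" }, { _id: "c" }, { _id: "d" },
  { _id: "e", parents: ["a", "b"] },
  { _id: "f", parents: ["c", "d"] },
  { _id: "g", parents: ["e", "f"] }
];

// Walk upward: look up each parent, then that parent's parents.
function ancestorFind(child) {
  const parents = family.filter((p) => (child.parents || []).includes(p._id));
  return parents.concat(...parents.map((p) => ancestorFind(p)));
}

// Walk downward: find documents that list this node as a parent.
function descendantsFind(node) {
  const kids = family.filter((k) => (k.parents || []).includes(node._id));
  return kids.concat(...kids.map((k) => descendantsFind(k)));
}

const g = family.find((d) => d._id === "g");
const b = family.find((d) => d._id === "b");
console.log(ancestorFind(g).map((d) => d._id));
console.log(descendantsFind(b).map((d) => d._id));
```

The sketch has no cycle protection and can return the same person twice when two branches share an ancestor, which real family data does produce.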
Bi-directional
 {   _id:   "a", children: ["e"] }
 {   _id:   "b", children: ["e"] }
 {   _id:   "c", children: ["f"] }
 {   _id:   "d", children: ["f"] }
 {   _id:   "e", children: ["g"], parents: ["a", "b" ]}
 {   _id:   "f", children: ["g"], parents: ["c", "d" ]}
 {   _id:   "g", children: [] , parents: ["e", "f"] }


•Doesn’t really add much beyond the first example
•More maintenance
•Duplication of each relationship
•Only real advantage is ability to grab all related
nodes (both directions) with one query.
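The extra maintenance comes from writing every link twice, once on each document. A minimal sketch of that bookkeeping, with a `Map` standing in for the collection; `addChild` is a hypothetical helper, and against MongoDB it would correspond to two separate updates:

```javascript
// Hypothetical helper: record one parent/child link on both documents.
// nodes is a Map from _id to document, standing in for a collection.
function addChild(nodes, parentId, childId) {
  const parent = nodes.get(parentId);
  const child = nodes.get(childId);
  if (!parent || !child) throw new Error("unknown node");
  parent.children = parent.children || [];
  child.parents = child.parents || [];
  // Both sides must be written, or the two views of the link disagree.
  if (!parent.children.includes(childId)) parent.children.push(childId);
  if (!child.parents.includes(parentId)) child.parents.push(parentId);
}

const nodes = new Map([
  ["a", { _id: "a" }], ["b", { _id: "b" }], ["e", { _id: "e" }]
]);
addChild(nodes, "a", "e");
addChild(nodes, "b", "e");
console.log(nodes.get("e").parents, nodes.get("a").children);
```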
Array of Ancestors
{   _id:   "a" }
{   _id:   "b" }
{   _id:   "c" }
{   _id:   "d" }
{   _id:   "e", ancestors: [ "a", "b" ], parents: ["a", "b" ]}
{   _id:   "f", ancestors: [ "c", "d" ], parents: ["c", "d" ]}
{   _id:   "g", ancestors: [ "a", "b", "c", "d", "e", "f" ], parents: ["e", "f"] }



Great for small trees (or subsets).
Could be used to store X generations of ancestors
Optimized for retrieving entire tree
Uses implied relationships
No help on specifics... is this person my grandson?
Easier retrieval at expense of costlier maintenance

//find all descendants of b:
> db.tree.find({ ancestors: 'b'})

//find all direct descendants of b:
> db.tree.find({ parents: 'b'})

//find all ancestors of g:
> g = db.tree.findOne( { _id: 'g' } )
> db.tree.find( { _id: { $in : g.ancestors } } )
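The costlier maintenance is mostly paid at insert time, when a new node's `ancestors` array has to be assembled from its parents' arrays. A sketch under that assumption, with a `Map` standing in for the `tree` collection; `withAncestors` is a hypothetical helper:

```javascript
// Hypothetical helper: build a child's ancestors array from its parents'
// own ancestors arrays at insert time. tree maps _id to document.
function withAncestors(tree, id, parentIds) {
  const ancestors = new Set();
  for (const pid of parentIds) {
    const parent = tree.get(pid);
    if (!parent) throw new Error("unknown parent " + pid);
    (parent.ancestors || []).forEach((a) => ancestors.add(a));
    ancestors.add(pid);                  // a parent is also an ancestor
  }
  const doc = { _id: id, ancestors: [...ancestors].sort(), parents: parentIds };
  tree.set(id, doc);
  return doc;
}

const tree = new Map(["a", "b", "c", "d"].map((id) => [id, { _id: id }]));
withAncestors(tree, "e", ["a", "b"]);
withAncestors(tree, "f", ["c", "d"]);
console.log(withAncestors(tree, "g", ["e", "f"]).ancestors);
```

Correcting a parent link later is the expensive case: every descendant's `ancestors` array has to be rebuilt.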
Relations (basic)
{   _id     : "b",
    relations : [
       {
         id      : "a",
         relation : "parent"},
       {
         id      : "c",
         relation : "grandparent"},
       {
         id      : "d",
         relation : "parent"}]}
Relations (detailed)
{   _id     : "b",
    relations : [
       {
         id      : "a",
         relation : "parent",
         type      : "mother",
         subtype : "biological" },
       {
         id      : "c",
         relation : "parent",
         type      : "father",
         subtype : "adopted"},
       {
         id      : "d",
         relation : "parent",
         type      : "father",
         subtype : "biological"}]}
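With the detailed form, specific questions become simple filters over the `relations` array. A small sketch using the document above as a plain object; `relationsOf` is a hypothetical helper, not a MongoDB call:

```javascript
// The "b" document from the detailed example, as a plain object.
const person = {
  _id: "b",
  relations: [
    { id: "a", relation: "parent", type: "mother", subtype: "biological" },
    { id: "c", relation: "parent", type: "father", subtype: "adopted" },
    { id: "d", relation: "parent", type: "father", subtype: "biological" }
  ]
};

// Hypothetical helper: ids of relations whose fields all equal the criteria.
function relationsOf(doc, criteria) {
  return doc.relations
    .filter((r) => Object.keys(criteria).every((k) => r[k] === criteria[k]))
    .map((r) => r.id);
}

console.log(relationsOf(person, { relation: "parent", subtype: "biological" })); // [ 'a', 'd' ]
console.log(relationsOf(person, { type: "father" }));                            // [ 'c', 'd' ]
```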
Shouldn’t I store my
family tree in a graph
     database?
   They are built to store trees after all
Graphs are great at
traversing deep in a tree

              • Is this node my
                relative?


              • Retrieve my paternal
                great, great, great,
                great grandpa
Unfortunately that’s not
how we commonly work
Typically we are working with a node and
its immediate neighbors
The significant majority of our operations
aren’t traversing

If those operations are
important, perhaps a
hybrid graph & document
solution makes sense
http://spf13.com
                           http://github.com/s
                           @spf13




Question
    download at mongodb.org
We’re hiring!! Contact us at jobs@10gen.com
MongoDB for Genealogy


  • 1. Storing the Family Tree with
  • 2. We’re going to talk about: MongoDB Intro & Fundamentals; MongoDB for Genealogy data; Scaling MongoDB for all the generations; The Family Tree; Storing a graph in MongoDB
  • 3. Steve @sp A 15+ years building the internet Father, husband, skateboarder, genealogist at ❤ Chief Solutions Architect @ responsible for drivers, integrations, web & docs
  • 4. Company behind MongoDB Offices in NYC, Palo Alto, London & Dublin 100+ employees Support, consulting, training Mgt: Google/DoubleClick, Oracle, Apple, NetApp, Mark Logic Well Funded: Sequoia, Union Square, Flybridge
  • 5. Introduction to MongoDB
  • 10. 1979
  • 11. 1979 1994
  • 12. 1979 1994 1995
  • 13. Computers in 1995: 100 MHz Pentium, 10BASE-T, 16 MB RAM, 200 MB HD
  • 14. Cloud in 1995 (Windows 95 cloud wallpaper)
  • 15. Cell Phones in 2012: dual-core 1.5 GHz, 802.11n (300+ Mbps), 1 GB RAM, 64 GB solid state
  • 16. MongoDB: Application; Document Oriented { author : "steve", date : new Date(), text : "About MongoDB...", tags : ["tech", "database"] }; High Performance; Fully Consistent; Horizontally Scalable
  • 17. MongoDB philosophy: Keep functionality when we can (key/value stores are great, but we need more). Non-relational (no joins) makes scaling horizontally practical. Document data models are good. Database technology should run anywhere: virtualized, cloud, metal, etc.
  • 18. Under the hood: Written in C++. Runs nearly everywhere. Data serialized to BSON. Extensive use of memory-mapped files, i.e. read-through, write-through memory caching.
  • 19. Database Landscape Scalability & Performance MemCache MongoDB RDBMS Depth of Functionality
  • 20. “MongoDB has the best features of key/value stores, document databases and relational databases in one.” John Nunemaker
  • 21. Relational made normalized data look like this: Category (Name, Url); Article (Name, Slug, Publish date, Text); User (Name, Email Address); Tag (Name, Url); Comment (Comment, Date, Author)
  • 22. Document databases make normalized data look like this: Article (Name, Slug, Publish date, Text, Author; Comment[]: Comment, Date, Author; Tag[]: Value; Category[]: Value); User (Name, Email Address)
  • 23. But we’ve been using a relational database for 40 years!
  • 24. How do people store documents in real life?
  • 25. Think about a doctor’s office. There are two ways they could organize their files.
  • 26. Each document type in its own drawer: MRIs, X-rays, Lab, Invoices, Index, History, Medications, Lab Forms
  • 29. 2. Group related records: Patient 1, Patient 2, Patient 3, ... Vendor 1, Vendor 2, Vendor 3
  • 31. Databases work the same way. Relational: Category (Name, Url); Article (Name, Slug, Publish date, Text); User (Name, Email Address); Tag (Name, Url); Comment (Comment, Date, Author). Document (Patient 1, Vendor 1): Article (Name, Slug, Publish date, Text, Author; Comment[]: Comment, Date, Author; Tag[]: Value; Category[]: Value); User (Name, Email Address)
  • 32. Terminology, RDBMS ➜ Mongo: Table, View ➜ Collection; Row ➜ Document; Index ➜ Index; Join ➜ Embedded Document; Foreign Key ➜ Reference; Partition ➜ Shard
  • 33. Why MongoDB, My Top 10 Reasons: 10. Great developer experience 9. Speaks your language 8. Scale horizontally 7. Fully consistent data w/atomic operations 6. Memory caching integrated 5. Open source 4. Flexible, rich & structured data format, not just K:V 3. Ludicrously fast (without going plaid) 2. Simplify infrastructure & application 1. It’s web scale
  • 36. CMS / Blog Needs: • Business needed modern data store for rapid development and scale Solution: • Use PHP & MongoDB Results: • Real time statistics • All data, images, etc stored together easy access, easy deployment, easy high availability • No need for complex migrations • Enabled very rapid development and growth
  • 37. Photo Meta-Data Problem: • Business needed more flexibility than Oracle could deliver Solution: • Use MongoDB instead of Oracle Results: • Developed application in one sprint cycle • 500% cost reduction compared to Oracle • 900% performance improvement compared to Oracle
  • 38. Customer Analytics Problem: • Deal with massive data volume across all customer sites Solution: • Use MongoDB to replace Google Analytics / Omniture options Results: • Less than one week to build prototype and prove business case • Rapid deployment of new features
  • 39. Archiving Why MongoDB: • Existing application built on MySQL • Lots of friction with RDBMS based archive storage • Needed more scalable archive storage backend Solution: • Keep MySQL for active data (100mil) • MongoDB for archive (2+ billion) Results: • No more alter table statements taking over 2 months to run • Sharding fixed vertical scale problem • Very happily looking at other places to use MongoDB
  • 40. Online Dictionary Problem: • MySQL could not scale to handle their 5B+ documents Solution: • Switched from MySQL to MongoDB Results: • Massive simplification of code base • Eliminated need for external caching system • 20x performance improvement over MySQL
  • 41. E-commerce Problem: • Multi-vertical E-commerce impossible to model (efficiently) in RDBMS Solution: • Switched from MySQL to MongoDB Results: • Massive simplification of code base • Rapidly build, halving time to market (and cost) • Eliminated need for external caching system • 50x+ performance improvement over MySQL
  • 42. Tons more: MongoDB casts a wide net. People keep coming up with new and brilliant ways to use it.
  • 43. In Good Company and 1000s more
  • 45. Start with an object (or array, hash, dict, etc.) place1 = { name : "10gen HQ", address : "578 Broadway 7th Floor", city : "New York", zip : "10011", tags : [ "business", "awesome" ] }
  • 46. Inserting the record (Initial Data Load) > db.places.insert(place1)
  • 47. Querying { name : "10gen HQ", address : "578 Broadway 7th Floor", city : "New York", zip : "10011", tags : [ "business", "awesome" ] } > db.places.findOne({ zip: "10011", tags: "awesome" }) > db.places.find({ tags: "business" })
  • 48. Nested Documents { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "roger", date : "Sat Apr 24 2010 19:47:11", text : "About MongoDB...", tags : [ "tech", "databases" ], comments : [ { author : "Fred", date : "Sun Apr 25 2010 20:51:03", text : "Best Post Ever!" } ] }
  • 49. Object ID > db.places.insert(place1) object(MongoId)#4 (1) { ["$id"]=> string(24) "4e9cc76a4a1817fd21000000" } 4e9cc76a4a1817fd21000000 |------||----||--||----| ts mac pid inc
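The field boundaries drawn under the ObjectId on slide 49 can be checked by slicing the hex string. This is a minimal sketch, assuming the legacy layout shown on the slide (4-byte timestamp, 3-byte machine id, 2-byte process id, 3-byte counter); `parseObjectId` is a hypothetical helper written for illustration, not a driver API.

```javascript
// Split a 24-character hex ObjectId into the four fields drawn on the slide.
// Assumption: legacy layout (ts = 8 hex chars, mac = 6, pid = 4, inc = 6).
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // seconds since the Unix epoch
  return {
    timestamp: seconds,
    created: new Date(seconds * 1000),
    machine: hex.slice(8, 14),
    pid: parseInt(hex.slice(14, 18), 16),
    counter: parseInt(hex.slice(18, 24), 16)
  };
}

const parsed = parseObjectId("4e9cc76a4a1817fd21000000");
console.log(parsed.created.toISOString(), parsed.machine, parsed.pid, parsed.counter);
```

Because the leading bytes are a timestamp, sorting on `_id` roughly sorts by creation time, which is often enough to avoid a separate "created" field.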
  • 50. A More Complex Document place1 = { name : "10gen HQ", address : "578 Broadway 7th Floor", city : "New York", zip : "10011", tags : [ "business", "awesome" ], latlong : [40.0,72.0], tips : [ { user : "ryan", time : 6/26/2011, tip : "stop by for office hours"}, {.....}] }
  • 51. Indexing & Adv Querying // Index nested documents db.posts.ensureIndex({ "comments.author":1 }) db.posts.find({'comments.author':'Fred'}) // Regular Expressions db.posts.find({'comments.author': /^Fr/}) // Index on tags (multi-key index) db.posts.ensureIndex({ tags: 1}) db.posts.find( { tags: 'tech' } ) // geospatial index db.posts.ensureIndex({ "author.location": "2d" }) db.posts.find({"author.location":{$near:[22,42]}})
  • 52. Updating
        > db.places.update(
            { name : "10gen HQ" },
            { $push : { tips : { user : "nosh", time : 7/14/2011, tip : "Office hours are great!" } } }
          )
        place1 = {
          name : "10gen HQ",
          address : "578 Broadway 7th Floor",
          city : "New York",
          zip : "10011",
          tags : [ "business", "awesome" ],
          latlong : [40.0,72.0],
          tips : [
            { user : "ryan", time : 6/26/2011, tip : "stop by for office hours on Wednesdays from 4-6pm" },
            { user : "nosh", time : 7/14/2011, tip : "Office hours are great!" }
          ]
        }
  • 54. Atomic Operations: $set, $unset, $rename, $push, $pop, $pull, $addToSet, $inc
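To make the difference between a few of these operators concrete, here is an in-memory illustration of their semantics. It is only a sketch: `applyUpdate` is a hypothetical helper, it handles top-level fields only (no dotted paths), and it says nothing about how the server applies updates atomically.

```javascript
// In-memory illustration of update-operator semantics on a plain object.
// applyUpdate is a hypothetical helper, not a MongoDB API.
function applyUpdate(doc, update) {
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, value] of Object.entries(fields)) {
      switch (op) {
        case "$set":
          doc[field] = value;
          break;
        case "$unset":
          delete doc[field];
          break;
        case "$push": // always appends, duplicates allowed
          doc[field] = doc[field] || [];
          doc[field].push(value);
          break;
        case "$addToSet": // appends only if an equal value is not already present
          doc[field] = doc[field] || [];
          if (!doc[field].some(v => JSON.stringify(v) === JSON.stringify(value))) {
            doc[field].push(value);
          }
          break;
        case "$pull": // removes every element equal to the value
          doc[field] = (doc[field] || []).filter(v => v !== value);
          break;
        default:
          throw new Error("unsupported operator: " + op);
      }
    }
  }
  return doc;
}

const place = { name: "10gen HQ", tags: ["business"] };
applyUpdate(place, { $addToSet: { tags: "business" } }); // no change
applyUpdate(place, { $addToSet: { tags: "awesome" } });  // appended
applyUpdate(place, { $push: { tags: "awesome" } });      // appended again
console.log(place.tags);
```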
  • 55. Cursors $cursor = $c->find(array("foo" => "bar")); foreach ($cursor as $id => $value) { echo "$id: "; var_dump( $value ); } $a = iterator_to_array($cursor);
  • 56. Paging page_num = 3; results_per_page = 10; cursor = db.collection.find() .sort({ "ts" : -1 }) .skip(page_num * results_per_page) .limit(results_per_page);
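The skip/limit arithmetic on the paging slide can be tried without a database. The sketch below mirrors sort, skip and limit on a plain array; `page` is a hypothetical helper. Note that with `skip(page_num * results_per_page)` the page number is effectively zero-based, so `page_num = 3` returns the fourth block of results.

```javascript
// In-memory equivalent of .sort({ ts: -1 }).skip(n).limit(m) on an array.
// page is a hypothetical helper used only for illustration.
function page(docs, pageNum, resultsPerPage) {
  const start = pageNum * resultsPerPage; // number of documents to skip
  return docs
    .slice()                              // copy so the input is not reordered
    .sort((a, b) => b.ts - a.ts)          // newest first
    .slice(start, start + resultsPerPage);
}

// 45 sample documents with increasing timestamps.
const docs = Array.from({ length: 45 }, (_, i) => ({ _id: i, ts: i }));

const fourthPage = page(docs, 3, 10); // skips the 30 newest documents
const lastPage = page(docs, 4, 10);   // only 5 documents remain
console.log(fourthPage.length, lastPage.length);
```

Large skip values still make the server walk past every skipped document, so deep paging is usually done by querying on the sort key (for example `ts` less than the last value seen) instead.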
  • 59. Storing Big Files: files larger than the 16 MB document limit are split into chunks and stored with GridFS
  • 60. Storing Big Files Works with replicated and
  • 61. A better network FS: GridFS files are seamlessly sharded & replicated. No OS constraints: no file size limits, no naming constraints, no folder limits. Standard across different OSs. MongoDB automatically generates the MD5 hash of the file.
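The chunking idea behind GridFS can be sketched in a few lines of Node.js. This is an illustration under stated assumptions, not the driver's implementation: `toGridFsDocs` is a hypothetical helper, the chunk size is a parameter chosen by the caller, and the field names (`files_id`, `n`, `data`, `length`, `chunkSize`, `md5`) follow the conventional `fs.files` / `fs.chunks` document shapes.

```javascript
const crypto = require("crypto");

// Split a Buffer into GridFS-style chunk documents plus one file document.
// toGridFsDocs is a hypothetical helper; nothing is written to a database.
function toGridFsDocs(filesId, buffer, chunkSize) {
  if (!(chunkSize > 0)) {
    throw new Error("chunkSize must be positive");
  }
  const chunks = [];
  for (let n = 0; n * chunkSize < buffer.length; n++) {
    const start = n * chunkSize;
    chunks.push({
      files_id: filesId,                              // points back at the file document
      n: n,                                           // chunk sequence number
      data: buffer.subarray(start, start + chunkSize) // last chunk may be shorter
    });
  }
  const file = {
    _id: filesId,
    length: buffer.length,
    chunkSize: chunkSize,
    md5: crypto.createHash("md5").update(buffer).digest("hex")
  };
  return { file, chunks };
}

const { file, chunks } = toGridFsDocs("birth-cert", Buffer.from("hello"), 2);
console.log(file.md5, chunks.length);
```

Reading the file back is the reverse: fetch the chunks for a `files_id` ordered by `n` and concatenate their `data`.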
  • 63. Types of genealogy data: Events (birth, death, etc); Official records; Census; Names; Relationships; Photographs; Diaries & letters; Ship passenger list; Occupation; and more
  • 64. Challenges of genealogy data: Lots of possible data points, so a flexible schema is needed. Multiple versions of the same data point (3 different dates for death date, 4 variations on name). Data related to records. Multiple versions of the same nodes (intelligent nondestructive merge needed). Need to have metadata associated.
  • 66. Recognize
        0 @I2@ INDI
        1 NAME Charles Phillip /Ingalls/
        1 SEX M
        1 BIRT
        2 DATE 10 JAN 1836
        2 PLAC Cuba, Allegheny, NY
        1 DEAT
        2 DATE 08 JUN 1902
        2 PLAC De Smet, Kingsbury, Dakota Territory
        1 FAMC @F2@
        1 FAMS @F3@
        0 @I3@ INDI
        1 NAME Caroline Lake /Quiner/
        1 SEX F
        1 BIRT
        2 DATE 12 DEC 1839
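GEDCOM's leading level numbers map naturally onto nested documents, which is part of why the format fits a document store. The following is a minimal sketch of turning GEDCOM lines (one record per line: level, optional @xref@, tag, optional value) into a tree. `parseGedcom` and the `{ tag, xref, value, children }` node shape are assumptions made for illustration, and continuation tags such as CONT/CONC are not handled.

```javascript
// Parse GEDCOM text into a tree of { tag, xref?, value?, children } nodes.
// parseGedcom is a hypothetical helper; it expects one GEDCOM record per line.
function parseGedcom(text) {
  const roots = [];
  const stack = []; // stack[level] holds the most recent node seen at that level
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (!line) continue;
    // level, optional @xref@, tag, optional value
    const m = /^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s+(.*))?$/.exec(line);
    if (!m) throw new Error("unparseable GEDCOM line: " + line);
    const level = Number(m[1]);
    const node = { tag: m[3], children: [] };
    if (m[2]) node.xref = m[2];
    if (m[4] !== undefined) node.value = m[4];
    if (level === 0) {
      roots.push(node);
    } else {
      const parent = stack[level - 1];
      if (!parent) throw new Error("level jump without a parent: " + line);
      parent.children.push(node);
    }
    stack[level] = node;
    stack.length = level + 1; // forget deeper levels from earlier records
  }
  return roots;
}

const sample = [
  "0 @I2@ INDI",
  "1 NAME Charles Phillip /Ingalls/",
  "1 SEX M",
  "1 BIRT",
  "2 DATE 10 JAN 1836",
  "2 PLAC Cuba, Allegheny, NY",
  "1 FAMC @F2@"
].join("\n");

const records = parseGedcom(sample);
console.log(JSON.stringify(records[0], null, 2));
```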
  • 67. GEDCOM: File format, not a database. Handles the great variety of data well. Doesn’t really scale beyond a local user. Doesn’t provide a good mechanism for storing external documents (birth certificates, etc). Built to solve the problem of sharing data.
  • 68. Genealogy & MongoDB: Genealogy is anything but rigid and fixed. Flexible schema fits genealogy data well. Packaging things together makes sense. Relating records doesn’t require a relational database.
  • 69. Individual: AFN, Modification Date; Name (First[], Middle[], Last[]); Events[] (type, date, contributor[], record[]); Location (city, state, county, country)
  • 70. Individual: AFN, Modification Date; Name (First[], Middle[], Last[]); Events[] (type, date, contributor[], record[]); Location (city, state, county, country, coordinates[]). User: Name, Email Address, Password, Individual_id. Record: contributor, type, thumbnail, content, description, tags[]
  • 71. Individual individual = { _id : ObjectId("4f2978dfaa999d9db02618ce"), AFN : '1XYK-KQJ', name: { first: ['john', 'johannes'], middle: 'peter', last: ['smith', 'sandvik'] } }
  • 72. Individual individual = { _id : ObjectId("4f2978dfaa999d9db02618ce"), AFN : '1XYK-KQJ', name: { first: ['john', 'johannes'], middle: 'peter', last: ['smith', 'sandvik'] } } db.individual.find({ "name.first" : "john", "name.middle" : "peter" })
  • 73. Events events : [ { death : { date : ISODate('1989-07-14'), location : { city: 'pensacola', state: 'fl', county: 'escambia', country: 'usa', coordinates : [30.26,87.12] }, contributor : ObjectId("4eeac...691") } } ]
  • 74. Events events : [ { death : { date : ISODate('1989-07-14'), location : { city: 'pensacola', state: 'fl', county: 'escambia', country: 'usa', coordinates : [30.26,87.12] }, contributor : ObjectId("4eeac...691") } } ] db.individual.find({ "events.death.date" : ISODate("1989-07-14") }) db.individual.find({ "events.death.location.coordinates" : { $near : [30,90] } })
  • 75. Duplicate Events events : [ { birth : [ { date : ISODate('1928-04-06'), location : { city: 'brattleboro', state: 'vt', county: 'windham', country: 'usa', coordinates : [42.51,72.34] }, contributor : ObjectId("4ee...00000"), records: ObjectId("4ed8a...7b000000") },
  • 76. Duplicate Events (continued) { date : ISODate('1928-04-16'), location : { city: 'brattleboro', state: 'vt', county: 'windham', country: 'usa', coordinates : [42.51,72.34] }, contributor : ObjectId("4ee...37bb"), records: ObjectId("4eea...0000c8") } ] } ]
  • 77. Duplicate Events events : [ { birth : [ { date : ISODate('1928-04-06') }, { date : ISODate('1928-04-16') } ] } ] db.individual.find({ "events.birth.date" : ISODate("1928-04-16") }) Same Query Works!!
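The reason the same query works for one birth record or several is that a dotted path fans out over any arrays it meets along the way. Here is a minimal in-memory sketch of that matching rule. `valuesAt` and `matches` are hypothetical helpers, dates are plain ISO strings so they compare with `===`, and numeric path components such as `birth.0` are not handled.

```javascript
// Collect every value reachable along a dotted path, fanning out over arrays.
// valuesAt and matches are hypothetical helpers that imitate the matching rule.
function valuesAt(value, path) {
  const keys = Array.isArray(path) ? path : path.split(".");
  if (Array.isArray(value)) {
    // An array is transparent: try the remaining path on each element.
    return value.flatMap(v => valuesAt(v, keys));
  }
  if (keys.length === 0) return [value];
  if (value === null || typeof value !== "object") return [];
  const [head, ...rest] = keys;
  return head in value ? valuesAt(value[head], rest) : [];
}

function matches(doc, path, wanted) {
  return valuesAt(doc, path).some(v => v === wanted);
}

// One birth date versus two competing birth dates for the same person.
const single = { events: [{ birth: { date: "1928-04-16" } }] };
const duplicate = {
  events: [{ birth: [{ date: "1928-04-06" }, { date: "1928-04-16" }] }]
};

console.log(matches(single, "events.birth.date", "1928-04-16"));
console.log(matches(duplicate, "events.birth.date", "1928-04-16"));
```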
  • 78. Multiple Events marriage : [ { date : ISODate('1939-08-11'), end_date : ISODate('1940-02-19'), to : ObjectId("4f297978aa999d9db02618cf"), location : { city: 'raleigh', state: 'nc', county: 'wake', country: 'usa', coordinates : [35.49,78.38] }, contributor : ObjectId("4eeac...91537bb") }, { date : ISODate('1944-04-19'), to : ObjectId("4f2978dfaa999d9db02618ce"), location : { city: 'atlanta', state: 'ga', county: 'fulton', country: 'usa', coordinates : [33.45,84.23] }, contributor : ObjectId("4eeb...37bb") } ]
  • 80. All together individual = { _id : ObjectId("4f2978dfaa999d9db02618ce"), AFN : '1XYK-KQJ', name: { first: ['john', 'johannes'], middle: 'peter', last: ['smith', 'sandvik'] }, events : [ { birth : [ { date : ISODate('1928-04-06'), location : { city: 'brattleboro', state: 'vt', county: 'windham', country: 'usa', coordinates : [42.51,72.34] }, contributor : ObjectId("4eeabc958b691537bb000000"), records: ObjectId("4ed8aea7d8562f7d7b000000") }, { date : ISODate('1928-04-16'), location : { city: 'brattleboro',
  • 81. Records record1 = { _id : ObjectId("4ed8aea7d8562f7d7b"), contributor : ObjectId("4eeab...1537bb"), type : 'birth certificate', thumbnail : BinData(0,"/9j/4AAQSkZJ...."), content : BinData(0,"j6b/Id11lWqs..."), tags : ['NY', 'certified'], description : "John's birth certificate" }
  • 82. Users user = { _id : ObjectId("4eeabc958b691537bb"), username : 'spf13', email_address : 'genealogy@spf13.com', password : 'a.long.passphrase18', individual_id : ObjectId("4f2f...0ce"), }
  • 83. Scaling MongoDB for all the generations
  • 84. Replica Sets Primary Primary Primary Secondary Secondary Secondary Secondary Arbiter Secondary Secondary Secondary
  • 85. Sharding App App App Server Server Server MongoS MongoS MongoS ConfigD ConfigD ConfigD MongoD MongoD MongoD MongoD MongoD MongoD MongoD MongoD MongoD MongoD MongoD MongoD
  • 87. It’s not a tree at all, it’s really a graph ... and an odd one at that
  • 88. It would be easy if it always looked like this
  • 90. All sorts of mess: Step & adopted relationships. Duplicate nodes. Lots of missing nodes. Divorces and re-marriages. Multiple names for the same person. Multiple dates for the same event.
  • 91. How to make sense of it all
  • 93. Graphs are important Without them we couldn’t store family relationships
  • 94. Trees / graphs in MongoDB: Since MongoDB data structures are essentially objects, there is a good degree of flexibility here. Think of how you would structure them in your application.
  • 95. Trees / graphs in MongoDB: Each node is stored as a document. Contains references to related nodes. What is “related” depends on your application.
  • 96. References vs Relation: MongoDB uses references. Unlike foreign keys, references don’t enforce integrity. Reference is really just a reference. For many applications a reference is sufficient.
  • 97. Simple relationship { _id: "a" } { _id: "b" } { _id: "c" } { _id: "d" } { _id: "e", parents: ["a", "b"] } { _id: "f", parents: ["c", "d"] } { _id: "g", parents: ["e", "f"] } • Easy to access nodes in either direction • Good for trees / graphs • Can grab large sets • Minimum amount of maintenance • Balanced • Implied relationships
  • 98. Simple relationship //find all direct descendants of b: > db.family.find({ parents: 'b' })
  • 99. Simple relationship //find all descendants of b: var b = db.family.find({ _id: 'b' }).toArray(); descendantsFind = function(par) { var rv = []; for (var i in par) { var k = db.family.find({ parents: par[i]._id }).toArray(); rv = rv.concat(k); rv = rv.concat(descendantsFind(k)); } return rv; } descendantsFind(b);
  • 100. Simple relationship //find all ancestors of g: var g = db.family.findOne({ _id: 'g' }); ancestorFind = function(child) { var rv = []; if (!child.parents) return rv; var parents = db.family.find({ _id: { $in: child.parents } }).toArray(); rv = rv.concat(parents); for (var i in parents) { rv = rv.concat(ancestorFind(parents[i])); } return rv; } ancestorFind(g);
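The parent-reference model from the "Simple relationship" slides can be tried without a database. The sketch below is one possible rendering, assuming an in-memory array in place of the `family` collection; `childrenOf`, `findAncestors` and `findDescendants` are illustrative names, and the `seen` set is an addition that the slides do not show.

```javascript
// The seven documents from the slide, held in memory instead of MongoDB.
const family = [
  { _id: "a" }, { _id: "b" }, { _id: "c" }, { _id: "d" },
  { _id: "e", parents: ["a", "b"] },
  { _id: "f", parents: ["c", "d"] },
  { _id: "g", parents: ["e", "f"] },
];
const byId = new Map(family.map((doc) => [doc._id, doc]));

// Stand-in for a query on the parents array: all direct children of `id`.
function childrenOf(id) {
  return family.filter((doc) => (doc.parents || []).includes(id));
}

// Walk upward through parent references, collecting every ancestor id.
function findAncestors(id, seen = new Set()) {
  const doc = byId.get(id);
  for (const parentId of (doc && doc.parents) || []) {
    if (!seen.has(parentId)) {
      seen.add(parentId);
      findAncestors(parentId, seen);
    }
  }
  return seen;
}

// Walk downward, one "children of" lookup per visited node.
function findDescendants(id, seen = new Set()) {
  for (const child of childrenOf(id)) {
    if (!seen.has(child._id)) {
      seen.add(child._id);
      findDescendants(child._id, seen);
    }
  }
  return seen;
}

console.log([...findAncestors("g")].sort());   // [ 'a', 'b', 'c', 'd', 'e', 'f' ]
console.log([...findDescendants("b")].sort()); // [ 'e', 'g' ]
```

Tracking visited ids keeps the walk from repeating work or looping when the data contains duplicate paths, which the messy family graphs described earlier make likely.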
  • 101. Bi-directional { _id: "a", children: ["e"] } { _id: "b", children: ["e"] } { _id: "c", children: ["f"] } { _id: "d", children: ["f"] } { _id: "e", children: ["g"], parents: ["a", "b"] } { _id: "f", children: ["g"], parents: ["c", "d"] } { _id: "g", children: [], parents: ["e", "f"] } • Doesn’t really add much beyond the first example • More maintenance • Duplication of each relationship • Only real advantage is the ability to grab all related nodes (both directions) with one query
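The extra maintenance in the bi-directional model comes from writing every relationship twice. The following sketch illustrates that cost under stated assumptions: a `Map` stands in for the collection, and `addChild` and `relatedTo` are hypothetical helpers. In MongoDB the same step would presumably be an insert plus an update on each parent (for instance with `$addToSet`), which the slides do not show.

```javascript
// In-memory stand-in for the collection, keyed by _id (hypothetical data).
const family = new Map([
  ["a", { _id: "a", children: [], parents: [] }],
  ["b", { _id: "b", children: [], parents: [] }],
]);

// Adding one child means touching the child AND every parent document.
function addChild(collection, childId, parentIds) {
  // Validate first so a bad id cannot leave a half-written relationship.
  for (const parentId of parentIds) {
    if (!collection.has(parentId)) throw new Error(`unknown parent: ${parentId}`);
  }
  collection.set(childId, { _id: childId, children: [], parents: [...parentIds] });
  for (const parentId of parentIds) {
    const parent = collection.get(parentId);
    if (!parent.children.includes(childId)) parent.children.push(childId);
  }
}

// The payoff: one document already lists neighbours in both directions.
function relatedTo(collection, id) {
  const doc = collection.get(id);
  return [...doc.parents, ...doc.children];
}

addChild(family, "e", ["a", "b"]);
addChild(family, "g", ["e"]);
console.log(family.get("a").children); // [ 'e' ]
console.log(relatedTo(family, "e"));   // [ 'a', 'b', 'g' ]
```

If either half of the write is skipped, the two directions disagree, which is the duplication risk the slide points out.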
  • 102. Array of Ancestors { _id: "a" } { _id: "b" } { _id: "c" } { _id: "d" } { _id: "e", ancestors: ["a", "b"], parents: ["a", "b"] } { _id: "f", ancestors: ["c", "d"], parents: ["c", "d"] } { _id: "g", ancestors: ["a", "b", "c", "d", "e", "f"], parents: ["e", "f"] } • Great for small trees (or subsets) • Could be used to store X generations of ancestors • Optimized for retrieving entire tree • Uses implied relationships • No help on specifics... is this person my grandson? • Easier retrieval at expense of costlier maintenance
  • 103. Array of Ancestors //find all descendants of b: > db.tree.find({ ancestors: 'b' }) //find all direct descendants of b: > db.tree.find({ parents: 'b' }) //find all ancestors of g: > g = db.tree.findOne({ _id: 'g' }) > db.tree.find({ _id: { $in: g.ancestors } })
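The "costlier maintenance" of the ancestor array shows up at write time: each new document has to copy its parents' ancestor lists. Here is a minimal in-memory sketch of that bookkeeping, assuming parents are always inserted before their children; `insertPerson` and `descendantsOf` are hypothetical helpers. Unlike the slide, root documents here carry empty `parents` and `ancestors` arrays to keep the code short.

```javascript
// In-memory stand-in for the `tree` collection, keyed by _id.
const tree = new Map();

// Build the ancestors array from the parents' own (already stored) arrays.
function insertPerson(collection, id, parentIds = []) {
  const ancestors = new Set(parentIds);
  for (const parentId of parentIds) {
    const parent = collection.get(parentId);
    for (const ancestorId of (parent && parent.ancestors) || []) {
      ancestors.add(ancestorId);
    }
  }
  collection.set(id, {
    _id: id,
    parents: [...parentIds],
    ancestors: [...ancestors].sort(),
  });
}

// Stand-in for the "find all descendants" query: match on the ancestors array.
function descendantsOf(collection, id) {
  return [...collection.values()]
    .filter((doc) => doc.ancestors.includes(id))
    .map((doc) => doc._id);
}

["a", "b", "c", "d"].forEach((id) => insertPerson(tree, id));
insertPerson(tree, "e", ["a", "b"]);
insertPerson(tree, "f", ["c", "d"]);
insertPerson(tree, "g", ["e", "f"]);

console.log(tree.get("g").ancestors);  // [ 'a', 'b', 'c', 'd', 'e', 'f' ]
console.log(descendantsOf(tree, "b")); // [ 'e', 'g' ]
```

Reads become a single lookup with no recursion, but discovering a new parent for an existing person would require rewriting the array on every descendant. The flat array also records no generation, so it cannot say whether a match is a grandchild or something more distant.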
  • 104. Relations (basic) { _id : "b", relations : [ { id : "a", relation : "parent"}, { id : "c", relation : "grandparent"}, { id : "d", relation : "parent"}]}
  • 105. Relations (detailed) { _id : "b", relations : [ { id : "a", relation : "parent", type : "mother", subtype : "biological" }, { id : "c", relation : "parent", type : "father", subtype : "adopted"}, { id : "d", relation : "parent", type : "father", subtype : "biological"}]}
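Storing each relation as a small sub-document makes it possible to filter on its attributes. The sketch below runs such a filter in memory over the document from the "Relations (detailed)" slide. The `relatedIds` helper is hypothetical, and the idea that a comparable MongoDB query would use `$elemMatch` on the `relations` array is an assumption; the slides show no query for this model.

```javascript
// The document from the slide, held as a plain object.
const person = {
  _id: "b",
  relations: [
    { id: "a", relation: "parent", type: "mother", subtype: "biological" },
    { id: "c", relation: "parent", type: "father", subtype: "adopted" },
    { id: "d", relation: "parent", type: "father", subtype: "biological" },
  ],
};

// Return the ids of relations whose fields match every key in `criteria`.
function relatedIds(doc, criteria) {
  return doc.relations
    .filter((rel) =>
      Object.entries(criteria).every(([key, value]) => rel[key] === value)
    )
    .map((rel) => rel.id);
}

console.log(relatedIds(person, { relation: "parent", subtype: "biological" })); // [ 'a', 'd' ]
console.log(relatedIds(person, { type: "father" }));                            // [ 'c', 'd' ]
```

All criteria must hold on the same array element, which is what distinguishes this from matching each field independently across the array.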
  • 106. Shouldn’t I store my family tree in a graph database? They are built to store trees, after all.
  • 107. Graph databases are great at traversing deep in a tree • Is this node my relative? • Retrieve my paternal great, great, great, great grandpa
  • 110. Unfortunately, that’s not how we commonly work. Typically we are working with a node and its immediate neighbors. The significant majority of our operations aren’t traversals. If those operations are important, perhaps a hybrid graph & document solution makes sense.
  • 111. http://spf13.com http://github.com/s @spf13 Questions? Download at mongodb.org. We’re hiring!! Contact us at jobs@10gen.com

Editor's notes

  9. Remember, in 1995 there were around 10,000 websites. Mosaic, Lynx, Mozilla (pre-Netscape) and IE 2.0 were the only web browsers. Apache (Dec ’95), Java (’96), PHP (June ’95), and .NET didn’t exist yet. Linux just barely (1.0 in ’94).
  16. By reducing the transactional semantics the database provides, one can still solve an interesting set of problems where performance is very important, and horizontal scaling then becomes easier.
  96. Store an array of the ids of the ancestors of a given document.