Schema Design
  Workshop
                            Sridhar Nanjundeswaran

                            Software Engineer, 10Gen
                               sridhar@10gen.com
                                    @snanjund




Wednesday, December 5, 12
Agenda

       • Part One - Basic Schema & Patterns
       • Part Two - Schema Design
       • Part Three - Sharding
       • Part Four - Replication




Why is schema design
       different?
       • In RDBMS design you ask "What answers do I have?"
       • In MongoDB design you ask "What questions will I have?"




Goals

       • Learn Data Modeling with MongoDB
       • Labs to try to solve problems
       • Understand implications of
        • Replication
        • Sharding

       Please, ask many, many questions!




Part One
       Basic Schema & Patterns




So why model data?




                            http://bit.ly/SSs7QB

Normalization
     • 1970 E. F. Codd introduces 1st Normal Form (1NF)
     • 1971 E. F. Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)
     • 1974 Codd & Boyce define Boyce-Codd Normal Form (BCNF)
     • 2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF)


     Goals:
     • Avoid anomalies when inserting, updating or deleting
     • Minimize redesign when extending the schema
     • Make the model informative to users
     • Avoid bias towards a particular style of query



                                                        * source : wikipedia
So today’s example will use...




                                 http://bit.ly/RyIOvO

Terminology
         RDBMS              MongoDB
         Table              Collection
         Row(s)             JSON Document
         Index              Index
         Join               Embedding & Linking
         Partition          Shard
         Partition Key      Shard Key



Schema Design
       Relational Database




Schema Design
       MongoDB: embedding and linking

Basic schema

     Design documents that simply map to your application

     > post = { author:     "Hergé",
                date:       ISODate("2011-09-18T09:56:06.298Z"),
                text:       "Destination Moon",
                tags:       ["comic", "movie"]
              }

     > db.blogs.save(post)




Find the document
   > db.blogs.find()

        { _id:              ObjectId("4c4ba5c0672c685e5e8aabf3"),
          author:           "Hergé",
          date:             ISODate("2011-09-18T09:56:06.298Z"),
          text:             "Destination Moon",
          tags:             [ "comic", "movie" ]
        }

   Notes:
   • _id must be unique within the collection, but can otherwise be almost anything you'd like
   • MongoDB will generate a default _id (an ObjectId) if one is not supplied




Add an index, find via Index

   Secondary index for “author”

   //   1 means ascending, -1 means descending
   > db.blogs.ensureIndex( { author: 1 } )

   > db.blogs.find( { author: 'Hergé' } )

        { _id:    ObjectId("4c4ba5c0672c685e5e8aabf3"),
          date:   ISODate("2011-09-18T09:56:06.298Z"),
          author: "Hergé",
        ... }




Examine the query plan

      > db.blogs.find( { author: "Hergé" } ).explain()
      {
          "cursor" : "BtreeCursor author_1",
          "nscanned" : 1,
          "nscannedObjects" : 1,
          "n" : 1,                  // number of objects returned
          "millis" : 5,             // how long it took
          "indexBounds" : {
              "author" : [
                  [
                      "Hergé",
                      "Hergé"
                  ]
              ]
          }
      }

Query operators
    Conditional operators:
     $ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
     $lt, $lte, $gt, $gte, ...

    // find posts with any tags
    > db.blogs.find( { tags: { $exists: true } } )

    Regular expressions:
    // posts where author starts with h
    > db.blogs.find( { author: /^h/i } )

    Counting:
    // number of posts written by Hergé
    > db.blogs.find( { author: "Hergé" } ).count()




Extending the Schema




                              http://bit.ly/PpjT1l

Extending the Schema
    > new_comment =
        { author: "Kyle",
          date:     new Date(),
          text:     "great book" }


    > db.blogs.update(
               { text: "Destination Moon" },
               { "$push": { comments: new_comment },   // add element to array
                 "$inc": { comments_count: 1 }         // increment counter
               } )

Extending the Schema
       > db.blogs.find( { author: "Hergé"} )

        { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
          author : "Hergé",
          date : ISODate("2011-09-18T09:56:06.298Z"),
          text : "Destination Moon",
          tags : [ "comic", "movie" ],
          comments : [
             {
               author : "Kyle",
               date : ISODate("2011-09-19T09:56:06.298Z"),
               text : "great book"
             }
          ],
          comments_count: 1
        }




Extending the Schema
    // create index on nested documents:
    > db.blogs.ensureIndex( { "comments.author": 1 } )

    > db.blogs.find( { "comments.author": "Kyle" } )

    // find last 5 posts:
    > db.blogs.find().sort( { date: -1 } ).limit(5)

    // most commented post:
    > db.blogs.find().sort( { comments_count: -1 } ).limit(1)


    When sorting, check if you need an index




Common Patterns




                            http://bit.ly/SNnt4z

Inheritance




                            http://bit.ly/T7MqUz

Single Table Inheritance -
       RDBMS
       select * from shapes;

               id    type     area   radius   length   width
               1     circle   3.14   1
               2     square   4               2
               3     rect     10              5        2




Single Table Inheritance -
  MongoDB
    > db.shapes.find()
     { _id: "1", type: "c", area: 3.14, radius: 1}
     { _id: "2", type: "s", area: 4,    length: 2}
     { _id: "3", type: "r", area: 10,   length: 5, width: 2}




                            missing values not
                                 stored!




Single Table Inheritance -
  MongoDB
    > db.shapes.find()
     { _id: "1", type: "c", area: 3.14, radius: 1}
     { _id: "2", type: "s", area: 4,    length: 2}
     { _id: "3", type: "r", area: 10,   length: 5, width: 2}

    // find shapes where radius > 0
    > db.shapes.find( { radius: { $gt: 0 } } )

    // create a sparse index: only documents that have a radius are indexed
    > db.shapes.ensureIndex( { radius: 1 }, { sparse: true } )



One to Many




                            http://bit.ly/Oqbt8z

One to Many

    One to Many relationships can specify
    • degree of association between objects
    • containment
    • life-cycle




One to Many
   Embedded Array
     • $slice operator to return a subset of comments
     • some queries are harder
           • e.g. find the latest comments across all blogs
   blogs: {
       author : "Hergé",
       date : ISODate("2011-09-18T09:56:06.298Z"),
       comments : [
         { author : "Kyle",
           date : ISODate("2011-09-19T09:56:06.298Z"),
           text : "great book" }
       ] }

   > db.blogs.find( { author: "Hergé" },
                    { comments: { $slice : 10 } } )



One to Many
   Normalized (2 collections)
   • most flexible
   • more queries

   blogs: { _id: 1000,
            author: "Hergé",
            date: ISODate("2011-09-18T09:56:06.298Z"),
            comments: [
                        { comment : 1 }
                      ] }

   comments : { _id : 1,
                blog: 1000,
                author : "Kyle",
                date : ISODate("2011-09-19T09:56:06.298Z") }

   // findOne returns a document (find would return a cursor)
   > blog = db.blogs.findOne( { text: "Destination Moon" } );
   > db.comments.find( { blog: blog._id } ).limit(5);


Many to Many




                            http://bit.ly/QTzhBF

Many - Many

  Example:

  • Blog can have many Tags
  • Tag can be used by many Blogs




Many - Many
   // Each Tag lists the "_id" of the Blog
   tags:
      { _id: 20,
        name: "comic", // Unique
        blog_ids: [ 10, 11, 12 ] }

      { _id: 30,
        name: "movie", // Unique
        blog_ids: [ 10 ] }

   // Each Blog lists the "tag" of the Tags
   // (links via unique key, in this case "tags", could be "_id")
   blogs:
      { _id: 10, name: "Destination Moon",
        tags: [ "comic", "movie" ] }




   // All Tags for a given Blog
   > db.tags.find( { blog_ids: 10 } )


Use _id or not?

 blogs:                          blogs:
  { _id: 10, name: "...",         { _id: 10, name: "...",
    tags: [ "comic", "movie" ]      tags: [ 20, 30 ]
  }                               }

 Pros:                           Pros:
 • Single query                  • Single update
 Cons:                           Cons:
 • Cascade any changes           • Second query required




Alternative
   // Each Blog lists the _id of the Tag
   blogs:
      { _id: 10, name: "Destination Moon",
        tag_ids: [ 20, 30 ] }

   // Association not stored on the Tag
   tags:
      { _id: 20,
        name: "comic" }

   // All Blogs for a given Tag
   > db.blogs.find( { tag_ids: 20 } )

   // All Tags for a given Blog
   > blog = db.blogs.findOne( { _id: 10 } )
   > db.tags.find({_id: {$in : blog.tag_ids}})



Many - Many
  Intersection Attributes
  Example:

  • Blog can have many Tags
  • Tag can be used by many Blogs
  • When a Tag is used, record the usage date




Many - Many
  Normalized
   // Each Blog lists the _id of the Tag
   blogs: { _id: 10, name: "...", tag_ids: [ 20, 30 ] }

   // Association not stored on the Tag
   tags: { _id: 20, name: "comic" }

   // Store the interaction and usage date
   usages: { blog_id: 10, // Blog _id
             tag_id : 20, // Tag _id
             usage: ISODate("2012-10-12...") }

   // Find the Tags for a Blog, with the usage date of each
   for (var c = db.usages.find( { blog_id: 10 } ); c.hasNext(); ) {
     var u = c.next();
     var t = db.tags.findOne( { _id: u.tag_id } );
     printjson( { tag: t.name, usage: u.usage } );
   }


Many - Many
  Intersection Attributes
   // Each Blog lists the Blog Usage Object
   blogs:
      { _id: 10, name: "Destination Moon",
        tags: [
          { tag: "comic", usage: ISODate("2012-10-12...") },
          { tag: "movie", usage: ISODate("2012-09-11...") }
        ] }

   // Find the Tags for a Blog
   > db.blogs.find( { _id: 10 }, { tags: 1} )

   Pros:
   • Usage object encapsulated where used
   Cons:
   • If updates allowed, changes will have to be cascaded

Summary

       • Schema design is the single biggest performance factor
       • More choices than in an RDBMS
       • Embedding, index design, shard keys




Part Two
       Schema Design




Lab #1
       Design Schema for Twitter

       • Model each user's activity stream
       • Users
               • Name, email address, display name
       • Tweets
               • Text
               • Who
               • Timestamp



Lab #1 - Solution A
       Two Collections
       // users - one doc per user
       { _id:      "alvin",
          email:   "alvin@10gen.com",
          display: "jonnyeight"
       }

       // tweets - one doc per user per tweet
       {
          user: "bob",
          for: "alvin",
          tweet: "20111209-1231",
          text: "Best Tweet Ever!",
          ts:    ISODate("2011-09-18T09:56:06.298Z")
       }




Lab #1 - Solution B
       Embedded Tweets
       // users - one doc per user with all tweets
       { _id:     "alvin",
         email:   "alvin@10gen.com",
         display: "jonnyeight",
         tweets: [
           {
             user:  "bob",
             tweet: "20111209-1231",
             text:  "Best Tweet Ever!",
             ts:    ISODate("2011-09-18T09:56:06.298Z")
           }
         ]
       }




Embedding
        • Great for read performance
        • One seek to load entire object
        • One roundtrip to database
        • Writes can be slow if adding to objects all the time




Linking or Embedding?


         Linking can make some queries easy

       // Find latest 50 tweets for "alvin"
       // (in the two-collection schema the recipient is stored in "for")
       > db.tweets.find( { for: "alvin" } )
                  .sort( { ts: -1 } )
                  .limit(50)



       But what effect does this have on the system?




[Figure: Collection 1 and Index 1 are mapped into Virtual Address Space 1, which is backed partly by Physical RAM and partly by Disk]

       • Virtual memory size (mapped): everything in the virtual address space
       • Resident memory size: the part currently held in physical RAM
       • Roughly 100 ns for a RAM access versus 10,000,000 ns (10 ms) for a disk seek

    > db.tweets.find( { for: "alvin" } )
               .sort( { ts: -1 } )
               .limit(10)

 Linking = Many seeks + random reads
 [Figure: the query causes three separate reads, labeled 1, 2 and 3]

    > db.tweets.find( { _id: "alvin" } )

 Embedding = Large Sequential Read
 [Figure: the query causes a single read, labeled 1]

Lab #2
       Alternative Schema

       • Display last 10 tweets from today
       • Use memory and disk seeks / IOPS efficiently




Lab #2 - Solution
       Buckets
           // tweets : one doc per user per day
           > db.tweets.findOne()

           {
                   _id:    "alvin-2011/12/09",
                   email: "alvin@10gen.com",
                   tweets: [
                      { user: "Bob",
                        tweet: "20111209-1231",
                        text: "Best Tweet Ever!" },
                      { author: "Joe",
                        tweet: "20111210-9025",
                        date:   "May 27 2011",
                        text:   "Stuck in traffic (again)" }
                   ]
           }
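
The bucket key combines the user name and the day. A minimal sketch of how an application might build that `_id` in JavaScript follows; the helper name `bucketId` is hypothetical, and using the UTC date as the day boundary is an assumption that the slides do not specify.

```javascript
// Hypothetical helper: build the per-user, per-day bucket _id
// in the "user-YYYY/MM/DD" form shown on the slide (UTC day assumed).
function bucketId(user, date) {
  function pad(n) { return (n < 10 ? "0" : "") + n; }
  return user + "-" +
         date.getUTCFullYear() + "/" +
         pad(date.getUTCMonth() + 1) + "/" +   // getUTCMonth() is zero-based
         pad(date.getUTCDate());
}

var id = bucketId("alvin", new Date(Date.UTC(2011, 11, 9, 12, 31)));
console.log(id);   // alvin-2011/12/09
```

Because the key is derived only from the user and the date, the application can compute it without a lookup before reading or updating a bucket.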


Lab #2 - Solution
       Last 10 Tweets
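    // $push appends, so the newest tweets are at the end of the array;
    // a negative $slice returns the last 10 elements
    > db.tweets.find( { _id: "alvin-2011/12/09" },
                    { tweets: { $slice : -10 } }
                  )
               .sort( { _id: -1 } )
               .limit(1)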




Lab #2 - Solution
       Adding a Tweet
       > tweet = { user: "Bob",
                   tweet: "20111209-1231",
                   text: "Best Tweet Ever!" }

       > db.tweets.update( { _id : "alvin-2011/12/09" },
                           { $push : { tweets : tweet } } );




Lab #2 - Solution
       Getting All Tweets
       // the trailing "-" keeps the prefix from matching other users, e.g. "alvina"
       > cursor = db.tweets.find( { _id : /^alvin-/ } )
                           .sort( { _id : -1 } )

       > while ( cursor.hasNext() ) {
             doc = cursor.next();
             for ( var i = 0; i < doc.tweets.length; i++ )
                   printjson( doc.tweets[i] )
         }




Lab #2 - Solution
       Deleting a Tweet
       > db.tweets.update(
          { _id: "alvin-2011/12/09" },
          { $pull: { tweets: { tweet: "20111209-1231" } } }
       )




  > db.tweets.find( { _id: "alvin-2011/12/09" },
                    { tweets: { $slice : -10 } } )
             .sort( { _id: -1 } )
             .limit(1)

     Bucket = 1 seek + 1 sequential read
     [Figure: the query causes a single read, labeled 1]

Trees




                            http://bit.ly/Oqc8Xs

Trees

    Hierarchical information

Trees

    Full Tree in Document

    { retweet: [
        { who: "Kyle", text: "...",
          retweet: [
             { who: "James", text: "...",
               retweet: [] }
          ] }
      ]
    }

    Pros: Single Document, Performance, Intuitive

    Cons: Hard to search, Partial Results, 16MB limit

Array of Ancestors

   // Tree: A is the root; B and E are children of A;
   //       C and D are children of B; F is a child of E
   // Store all Ancestors of a node
     { _id: "a" }
     { _id: "b", tree: [ "a" ],        retweet: "a" }
     { _id: "c", tree: [ "a", "b" ],   retweet: "b" }
     { _id: "d", tree: [ "a", "b" ],   retweet: "b" }
     { _id: "e", tree: [ "a" ],        retweet: "a" }
     { _id: "f", tree: [ "a", "e" ],   retweet: "e" }

   // find all direct retweets of "b"
   > db.tweets.find( { retweet: "b" } )

   // find all retweets of "e" anywhere in tree
   > db.tweets.find( { tree: "e" } )

   // find tweet history of f:
   > tweets = db.tweets.findOne( { _id: "f" } ).tree
   > db.tweets.find( { _id: { $in : tweets } } )
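
The `tree` array has to be computed when a node is written. A minimal in-memory sketch of that step, assuming the application already knows each node's direct parent (the `retweet` field); the `parents` map and both function names are hypothetical, chosen only for illustration.

```javascript
// In-memory sketch: derive the "tree" (ancestor) array from parent links.
// `parents` mirrors the "retweet" field on the slide.
var parents = { a: null, b: "a", c: "b", d: "b", e: "a", f: "e" };

function ancestors(id) {
  var path = [];
  for (var p = parents[id]; p; p = parents[p]) {
    path.unshift(p);   // root ends up first, direct parent last
  }
  return path;
}

function makeDoc(id) {
  var doc = { _id: id };
  if (parents[id]) {
    doc.tree = ancestors(id);
    doc.retweet = parents[id];
  }
  return doc;
}

console.log(JSON.stringify(makeDoc("f")));
// {"_id":"f","tree":["a","e"],"retweet":"e"}
```

The cost of this pattern is paid on writes: moving a node means recomputing `tree` for the node and all of its descendants.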

Trees as Paths

    // Tree: A is the root; B and E are children of A;
    //       C and D are children of B; F is a child of E
    Store hierarchy as a path expression
    • Separate each node by a delimiter, e.g. "/"
    • Use a prefix (regular expression) search to find parts of a tree
    { retweets: [
         { _id: "a", text: "initial tweet",
           path: "a" },
         { _id: "b", text: "retweet with comment",
           path: "a/b" },
         { _id: "c", text: "reply to retweet",
           path : "a/b/c"} ] }

    // Find the conversations "a" started
    > db.tweets.find( { path: /^a/i } )
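
One caveat with a bare prefix such as `/^a/i`: it also matches any path whose first segment merely begins with "a", for example "ab/x". A possible refinement, sketched in plain JavaScript, anchors the pattern on the delimiter. The helper name `subtreePattern` and the sample paths are hypothetical; dropping the `i` flag is a deliberate choice here, since MongoDB's documentation notes that case-insensitive regular expressions generally cannot use indexes effectively.

```javascript
// Hypothetical helper: build an anchored regex that matches a node and its
// descendants in a "/"-delimited path, but not siblings sharing a prefix.
function subtreePattern(path) {
  var escaped = path.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");  // escape regex metacharacters
  return new RegExp("^" + escaped + "(/|$)");
}

var paths = ["a", "a/b", "a/b/c", "ab", "ab/x", "z/a"];
var re = subtreePattern("a");
var matches = paths.filter(function (p) { return re.test(p); });
console.log(matches.join(","));   // a,a/b,a/b/c
```

The resulting `RegExp` object could be passed as the value of `path` in a `find` query in the same way as the literal on the slide.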




Queues & Workflows




                            http://bit.ly/QeNsPX

Lab #3
       Following Requests
       • Users are allowed to "follow" another user
               • User sends a "follow" request
               • Follower approves or not
               • Requests are timed out after 7 days
       • The approval is an async process




Lab #3 - Solution
  Queues & Workflows
    • Need to maintain order and state
    • Ensure that updates are atomic
     > db.approvals.insert(
          { inprogress: false,
            approved:    false,
            priority:    1,
            text:        "Hey Jim, want to follow you!"
          } );
    // find highest priority approval and mark as in-progress
    job = db.approvals.findAndModify({
                   query: { inprogress: false },
                   sort:    { priority: -1 },
                   update: { $set: { inprogress: true,
                                      started: new Date() } },
                   new: true})
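
To make the claim step concrete, here is a minimal in-memory sketch of what the `findAndModify` call does logically: select the highest-priority request that is not in progress, mark it, and return the modified document. It is only a single-threaded illustration with a hypothetical `claimNext` function and made-up sample data; the point of `findAndModify` is that the database performs the select and the update as one atomic step.

```javascript
// In-memory sketch of the claim step. Field names follow the slide.
function claimNext(queue, now) {
  var waiting = queue.filter(function (job) { return !job.inprogress; }); // query
  if (waiting.length === 0) { return null; }
  waiting.sort(function (a, b) { return b.priority - a.priority; });      // sort: { priority: -1 }
  var job = waiting[0];
  job.inprogress = true;     // $set: { inprogress: true,
  job.started = now;         //         started: new Date() }
  return job;                // new: true returns the modified document
}

var queue = [
  { _id: 1, priority: 1, inprogress: false },
  { _id: 2, priority: 5, inprogress: true },
  { _id: 3, priority: 3, inprogress: false }
];
var claimed = claimNext(queue, new Date());
console.log(claimed._id);   // 3
```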


Lab #3 - Solution
  Queues & Workflows
            { inprogress: true,                              // updated
              priority: 1,
              approved: false,
              started: ISODate("2011-09-18T09:56:06.298Z")   // added
            ...
            }




Lab #3 - Solution
  Queues & Workflows
    • Follower approves request
    // update approval after receiving approval
    > job = db.approvals.update(
                     { _id: "1234" },
                     { $set: { approved: true } } )

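    • System times out request after 7 days
    var limit = new Date();
    limit.setDate(limit.getDate() - 7);

    // requests started before the cutoff are more than 7 days old;
    // the last two arguments are upsert = false, multi = true
    > job = db.approvals.update(
                     { inprogress: true,
                       started: { $lt: limit } },
                     { $set: { approved: false } },
                     false, true )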
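
The direction of the date comparison is easy to get backwards, so a small in-memory check may help. In this sketch a request counts as timed out when it is still in progress and its `started` date is earlier than the cutoff (now minus seven days). The function name `isExpired` and the sample documents are hypothetical.

```javascript
// In-memory sketch of the 7-day timeout rule; documents are illustrative.
var DAY_MS = 24 * 60 * 60 * 1000;

function isExpired(request, now) {
  var cutoff = new Date(now.getTime() - 7 * DAY_MS);
  // Expired means: still in progress and started before the cutoff.
  return request.inprogress === true && request.started < cutoff;
}

var now = new Date(Date.UTC(2011, 8, 30));   // 30 September 2011
var requests = [
  { _id: 1, inprogress: true,  started: new Date(Date.UTC(2011, 8, 18)) },
  { _id: 2, inprogress: true,  started: new Date(Date.UTC(2011, 8, 28)) },
  { _id: 3, inprogress: false, started: new Date(Date.UTC(2011, 8, 1)) }
];
var expired = requests.filter(function (r) { return isExpired(r, now); })
                      .map(function (r) { return r._id; });
console.log(expired.join(","));   // 1
```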


Lab #4
       Voting

         Twitter meets Stack Overflow

       • Users can "vote" for a tweet
       • A user can "vote" once and only once
       • Need to display current votes




Lab #4 - Solution
       Votes
       // One document per voter per tweet
       > db.votes.insert(
             { tweet: "20111209-1231",
               voter: "alvin"
             } );

       // Unique index guarantees the user can't vote twice
       > db.votes.ensureIndex( { tweet: 1, voter: 1 },
                               { unique: true } );

       // Count will return the number of votes cast
       > db.votes.find({ tweet: "20111209-1231" }).count()




Count or Not?

       • Indexes in MongoDB do not store counts
       • The count has to be computed via an index scan
       // One summary document per tweet, no "voter" key
       > db.votes.update(
             { tweet: "20111209-1231",
               voter: { $exists: false } },
             { "$inc": { count: 1 } },
             true, false );

       // Return the count for the no "voter" document
       > db.votes.find( { tweet: "20111209-1231",
                          voter: { $exists: false } },
                        { count: 1, _id: 0} )


Lab #5
       Time Series
       • Record votes by
               • Day, Hour, Minute
       • Show time series of votes cast




Lab #5 - Solution A
       Time Series
       // Time series buckets, hour and minute sub-docs
       { _id: "20111209-1231",
         ts: ISODate("2011-12-09T00:00:00.000Z"),
         daily: 67,
         hourly: { 0: 23, 1: 14, 2: 19 ... 23: 72 },
         minute: { 0: 0, 1: 4, 2: 6 ... 1439: 0 }
       }




Lab #5 - Solution A
       Time Series
       // Add one to the last minute before midnight
       // (all counters go in a single $inc; a repeated key would overwrite the earlier ones)
       > db.votes.update(
          { _id: "20111209-1231",
            ts: ISODate("2011-12-09T00:00:00.000Z") },
          { $inc: { daily: 1,
                    "hourly.23": 1,
                    "minute.1439": 1 } } )


       What is the cost of updating the minute before
       midnight?




BSON Storage

       • Sequence of key/value pairs
       • NOT a hash map
       • Optimized to scan quickly


                      0 1 2 3 ... 1439

       • 1439 skips



BSON Storage
     • Can skip sub-documents


            hour:      0          1        ...        23
            minute:  0 ... 59  60 ... 119  ...  1380 ... 1439


     • 23 skips (hours) + 59 skips (minutes) = 82 skips
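
The arithmetic behind the two layouts can be sketched as follows, using the simplified model from these slides in which reaching a key costs one skip per earlier sibling key. The function names are hypothetical.

```javascript
// Count the keys skipped before reaching a given minute of the day,
// under the "one skip per earlier key" model used on the slides.
function flatSkips(minuteOfDay) {
  return minuteOfDay;                       // keys 0 .. minuteOfDay - 1
}

function nestedSkips(minuteOfDay) {
  var hour = Math.floor(minuteOfDay / 60);  // earlier hour sub-documents
  var minute = minuteOfDay % 60;            // earlier minutes within that hour
  return hour + minute;
}

console.log(flatSkips(1439), nestedSkips(1439));   // 1439 82
```

The worst case drops from 1439 skips to 82 because an entire hour sub-document is skipped in one step.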



Lab #5 - Solution B
       Time Series
       // Time series buckets, each hour a sub-document
       { _id: "20111209-1231",
         ts: ISODate("2011-12-09T00:00:00.000Z"),
         daily: 67,
         minute: { 0: { 0: 0, 1: 7, ... 59: 2 },
                    ...
                   23: { 0: 15,      ... 59: 6 } }
       }

       // Add one to the last minute before midnight
       > db.votes.update(
          { _id: "20111209-1231",
            ts: ISODate("2011-12-09T00:00:00.000Z") },
          { $inc: { daily: 1,
                    "minute.23.59": 1 } } )
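
In application code the dotted key has to be computed from the vote time. A minimal sketch, assuming hours and minutes are taken in UTC (the slides do not say which time zone is used) and using a hypothetical helper name `voteIncrement`. All counters are placed in one `$inc` object, because a JavaScript object literal that repeats the `$inc` key keeps only the last one.

```javascript
// Hypothetical helper: build one $inc document for the nested
// hour/minute layout, using the UTC hour and minute of the vote time.
function voteIncrement(when) {
  var inc = { daily: 1 };
  inc["minute." + when.getUTCHours() + "." + when.getUTCMinutes()] = 1;
  return { $inc: inc };
}

var update = voteIncrement(new Date(Date.UTC(2011, 11, 9, 23, 59, 30)));
console.log(JSON.stringify(update));
// {"$inc":{"daily":1,"minute.23.59":1}}
```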


Lab #6
       Inventory

       • User has a number of "votes" they can use




Lab #6 - Solution
  Inventory
       // Number of votes and who voted for
       { _id:   "alvin",
         votes: 42,
         voted_for: []
       }

       // Subtract a vote and add the voted for tweet
       // "20111209-1231"
       > db.user.update(
                 { _id: "alvin",
                   votes : { $gt : 0 },
                   voted_for: { $ne: "20111209-1231" } },
                 { "$push": { voted_for: "20111209-1231" },
                   "$inc": { votes: -1 }
                 } )
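
The pattern here is a guarded update: the query conditions act as preconditions, and the modification is applied only to a document that satisfies them. The in-memory sketch below mimics that logic in a single thread; the function name `castVote` is hypothetical. In MongoDB the match and the modification of a single document happen as one atomic operation, which this sketch does not attempt to reproduce.

```javascript
// In-memory sketch of the guarded update: the change is applied only when
// the query conditions hold (votes left, tweet not yet voted for).
function castVote(user, tweetId) {
  if (user.votes > 0 && user.voted_for.indexOf(tweetId) === -1) {
    user.voted_for.push(tweetId);   // $push
    user.votes -= 1;                // $inc: { votes: -1 }
    return true;                    // one document modified
  }
  return false;                     // no document matched
}

var alvin = { _id: "alvin", votes: 42, voted_for: [] };
var first = castVote(alvin, "20111209-1231");
var second = castVote(alvin, "20111209-1231");
console.log(first, second, alvin.votes);   // true false 41
```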


Lab #6 - Solution
  Inventory
       // After vote
       > db.user.findOne()
          { _id:   "alvin",
            votes: 41,                      // decremented
            voted_for: ["20111209-1231"]    // added
          }




Lab #7
       Statistic Buckets
       • Record referring web sites on customer sign up
       • Independent counter for each web site




Lab #7 - Solution A
       Statistic Buckets
       { _id: "alvin",
         referrers: [
               { domain: "www.google.co.uk", count: 4 },
               { domain: "www.yahoo.com",    count: 1 },
         ] }




Lab #7 - Solution A
       Statistic Buckets
       { _id: "alvin",
         referrers: [
               { domain: "www.google.co.uk", count: 4 },
               { domain: "www.yahoo.com",    count: 1 },
         ] }

       > db.referers.update(
           { "referrers.domain": "www.google.co.uk" },
           { $inc: { "referrers.$.count": 1 } } )




Lab #7 - Solution A
       Statistic Buckets
       { _id: "alvin",
         referrers: [
               { domain: "www.google.co.uk", count: 4 },
               { domain: "www.yahoo.com",    count: 1 },
         ] }

       > db.referers.update(
           { "referrers.domain": "www.google.co.uk" },
           { $inc: { "referrers.$.count": 1 } } )

       { _id: "alvin",
         referrers: [
               { domain: "www.google.co.uk", count: 5 },
               { domain: "www.yahoo.com",    count: 1 },
         ] }
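The positional `$` in the update stands for the array element that the query matched. A small emulation for one in-memory document, offered as a sketch only (`incReferrer` is a hypothetical name), shows both the matched and the unmatched case:

```javascript
// Emulates { $inc: { "referrers.$.count": 1 } } for one document:
// "$" refers to the first array element matched by the query.
function incReferrer(doc, domain) {
  const index = doc.referrers.findIndex((r) => r.domain === domain);
  if (index === -1) {
    return false; // the query matched no element; the document is unchanged
  }
  doc.referrers[index].count += 1;
  return true;
}

const stats = {
  _id: "alvin",
  referrers: [
    { domain: "www.google.co.uk", count: 4 },
    { domain: "www.yahoo.com", count: 1 },
  ],
};
console.log(incReferrer(stats, "www.google.co.uk")); // true
console.log(stats.referrers[0].count);               // 5
```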


Lab #7 - Solution A
       Statistic Buckets

       > db.referers.update(
           { "referrers.domain": "www.bing.com" },
           { $inc: {"referrers.$.count": 1 } }, false, true )

       What happens if a new referring site is used?




Lab #7 - Solution B
       Statistic Buckets
       // Need to replace dots with underscores
       { _id: "alvin",
         referrers:
            { "www_google_co_uk": 4,
              "www_yahoo_com": 1 },
            }

       // simple $inc will add www_bing_com if not present
       > db.referers.update(
            { _id: "alvin" },
            { $inc: { "referrers.www_bing_com": 1 } },
            true, false);
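The dot-to-underscore mapping is needed because a dot inside an update path is read as a step into a nested document. A possible helper for building the modifier, with hypothetical names `referrerKey` and `buildReferrerUpdate`:

```javascript
// Map a domain to a safe field name by replacing dots with underscores.
function referrerKey(domain) {
  return domain.replace(/\./g, "_");
}

// Build the $inc modifier for one referring domain.
function buildReferrerUpdate(domain) {
  const inc = {};
  inc["referrers." + referrerKey(domain)] = 1;
  return { $inc: inc };
}

console.log(JSON.stringify(buildReferrerUpdate("www.bing.com")));
// {"$inc":{"referrers.www_bing_com":1}}
```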




Part Three
       Sharding




What is Sharding

       • Ad-hoc partitioning
       • Consistent hashing
               • Amazon Dynamo
       • Range based partitioning
               • Google BigTable
               • Yahoo! PNUTS
               • MongoDB



MongoDB Sharding

       • Automatic partitioning and management
       • Range based
       • Convert to sharded system with no downtime
       • Fully consistent
       • No code changes required




Sharding - Range distribution
                     sh.shardCollection("mydb.tweets", { _id: 1 }, false)


                     shard01                     shard02                       shard03




Sharding - Range distribution


                     shard01      shard02   shard03

                            a-i     j-r      s-z




Sharding - Splits


                     shard01      shard02   shard03

                            a-i   ja-jz      s-z
                                   k-r




Sharding - Splits


                     shard01      shard02   shard03

                            a-i    ja-ji     s-z
                                   ji-js
                                  js-jw
                                   jz-r

Sharding - Auto Balancing


                     shard01      shard02   shard03

                            a-i    ja-ji     s-z
                                   ji-js
                     js-jw        js-jw
                                   jz-r      jz-r

Sharding - Auto Balancing


                     shard01      shard02   shard03

                            a-i    ja-ji     s-z
                                   ji-js
                     js-jw
                                             jz-r

Sharding for caching
                     96 GB Mem
                    3:1 Data/Mem


                     shard01

                            a-i
    300 GB Data




                            j-r
                            s-z

                            300 GB



Aggregate Horizontal Resources
                     96 GB Mem        96 GB Mem      96 GB Mem
                    1:1 Data/Mem     1:1 Data/Mem   1:1 Data/Mem


                     shard01         shard02        shard03

                            a-i         j-r           s-z
    300 GB Data




                            j-r
                            s-z

                            100 GB     100 GB        100 GB
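The data-to-memory ratios on these two slides follow from dividing the data evenly across shards. A short worked computation, assuming an even split (the function names are hypothetical):

```javascript
// Data held by each shard when the total is split evenly.
function dataPerShardGB(totalDataGB, shardCount) {
  return totalDataGB / shardCount;
}

// Ratio of data to memory on each shard.
function dataToMemRatio(totalDataGB, shardCount, memPerShardGB) {
  return dataPerShardGB(totalDataGB, shardCount) / memPerShardGB;
}

// One shard: 300 GB against 96 GB of memory, roughly 3:1.
console.log(dataToMemRatio(300, 1, 96));
// Three shards: 100 GB each against 96 GB of memory, roughly 1:1.
console.log(dataToMemRatio(300, 3, 96));
```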


Sharding Features
       • Shard data with no downtime
       • Automatic balancing as data is written
       • Commands routed (switched) to correct node
               • Inserts - must have the Shard Key
               • Updates - can have the Shard Key
               • Queries
                       • With Shard Key - routed to nodes
                       • Without Shard Key - scatter gather
               • Indexed / Sorted Queries
                       • With Shard Key - routed in order
                       • Without Shard Key - distributed sort merge
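The difference between a routed query and scatter gather can be sketched with a toy chunk table. Everything here is an assumption made for illustration: the chunk boundaries mirror the a-i / j-r / s-z diagram, only equality matches on the shard key are handled, and `targetShards` is a hypothetical name rather than a MongoDB API.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) of the shard key.
const chunks = [
  { min: "a", max: "j", shard: "shard01" },
  { min: "j", max: "s", shard: "shard02" },
  { min: "s", max: "{", shard: "shard03" }, // "{" sorts just after "z"
];

// Returns the list of shards a query has to visit.
function targetShards(query, shardKey) {
  if (Object.prototype.hasOwnProperty.call(query, shardKey)) {
    // Shard key present: route to the one chunk that covers the value.
    const value = query[shardKey];
    const chunk = chunks.find((c) => value >= c.min && value < c.max);
    return chunk ? [chunk.shard] : [];
  }
  // No shard key in the query: scatter gather across every shard.
  return [...new Set(chunks.map((c) => c.shard))];
}

console.log(targetShards({ _id: "alvin" }, "_id"));             // one shard
console.log(targetShards({ email: "alvin@10gen.com" }, "_id")); // all shards
```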

Lab #8
       Sharding Twitter Pictures

       User can upload pictures to Twitter feed

                { photo_id :   ???? , data : <binary> }



      What should photo_id be?
      How will photo_id be sharded?




Lab #8
       Sharding Key
               { photo_id :   ???? , data : <binary> }

               What’s the right key?
               • auto increment
               • MD5( data )
               • month() + MD5( data )




Right balanced access
  • Only have to keep a small portion in RAM
  • Right shard "hot"
  • Shard key types: Time Based, ObjectId, Auto Increment




Random access

   • Have to keep entire index in RAM
   • All shards "warm"
   • Shard key type: Hash




Segmented access

   • Have to keep some index in RAM
   • Some shards "warm"
   • Shard key type: Month + Hash
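The three candidate keys from Lab #8 can be generated as follows. This is a sketch that assumes a Node.js environment for the built-in `crypto` module; the helper names are hypothetical, and prefixing the year to the month is an added assumption so that keys keep sorting in time order across years.

```javascript
const crypto = require("crypto");

// MD5 of the photo data as a 32-character hex string.
function md5Hex(data) {
  return crypto.createHash("md5").update(data).digest("hex");
}

// 1. Auto increment: always growing, so inserts hit the right-most chunk.
let lastPhotoId = 0;
function autoIncrementKey() {
  lastPhotoId += 1;
  return lastPhotoId;
}

// 2. Hash: inserts spread randomly over the whole key range.
function hashKey(data) {
  return md5Hex(data);
}

// 3. Month + hash: random within a month, ordered between months.
function monthHashKey(data, when) {
  const month = String(when.getUTCMonth() + 1).padStart(2, "0");
  return when.getUTCFullYear() + month + "-" + md5Hex(data);
}

console.log(autoIncrementKey());
console.log(hashKey("photo bytes"));
console.log(monthHashKey("photo bytes", new Date("2012-12-05T00:00:00Z")));
```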




Lab #9
       Single Identities
       // Shard by _id
       ids:
       { _id :      "alvin",
         email:     "alvin@10gen.com",
         addresses: [ { state : "CA", country: "USA" },
                       { country: "UK" } ]
       }

       How would the following queries be executed?

       > db.ids.find( { _id: "alvin"} )
       > db.ids.find( { email: "alvin@10gen.com" } )




Sharding - Routed Query
                                           find({ _id: "alvin" })




                     shard01      shard02                         shard03

                            a-i    ja-ji                             s-z
                                   ji-js
                     js-jw
                                                                    jz-r

Sharding - Scatter Gather
                                       find({ email: "alvin@10gen.com" })




                     shard01      shard02                      shard03

                            a-i    ja-ji                         s-z
                                   ji-js
                     js-jw
                                                                 jz-r

Lab #9
       Multiple Identities

       User can have multiple identities
      • twitter name
      • email address
      • facebook name
      • etc.
      What is the best sharding key & schema design?




Lab #9 - Solution A
       Multiple Identities
               // Shard by _id
               { _id:        "alvin",
                 email:      "alvin@10gen.com",
                 fb:         "alvin.richards",   // facebook
                 li:         "alvin.j.richards", // linkedin
                 tweets:     [ ... ]
               }

                Lookup by _id hits 1 node
                Lookup by email, li or fb is scatter gather
                Cannot create a unique index on email, li or fb




Lab #9 - Solution B
       Multiple Identities
           identities
           { _id: { _id: "alvin"},           info: "1200-42"}
           { _id: { em: "alvin@10gen.com"}, info: "1200-42"}
           { _id: { li: "alvin.j.richards"}, info: "1200-42"}

           tweets
           { _id: "1200-42",
             tweets: [ ... ]
           }

           • Shard identities on { _id: 1}
           • Can create unique index on _id
           • Shard tweets on { _id: 1 }
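With this schema a lookup by any identity takes two exact-match steps, each of which can be routed to a single shard. The sketch below uses in-memory arrays as stand-ins for the two collections; `findTweetsByIdentity` is a hypothetical helper and the empty `tweets` array stands for the elided list on the slide.

```javascript
// In-memory stand-ins for the two collections on the slide.
const identitiesColl = [
  { _id: { _id: "alvin" }, info: "1200-42" },
  { _id: { em: "alvin@10gen.com" }, info: "1200-42" },
  { _id: { li: "alvin.j.richards" }, info: "1200-42" },
];
const tweetsColl = [{ _id: "1200-42", tweets: [] }];

function findTweetsByIdentity(kind, value) {
  // Step 1: exact match on the identity _id, e.g. { em: "alvin@10gen.com" }.
  const identity = identitiesColl.find(
    (doc) => Object.keys(doc._id)[0] === kind && doc._id[kind] === value
  );
  if (!identity) {
    return null;
  }
  // Step 2: exact match on the tweets _id stored in the info field.
  return tweetsColl.find((doc) => doc._id === identity.info) || null;
}

console.log(findTweetsByIdentity("em", "alvin@10gen.com"));
console.log(findTweetsByIdentity("li", "alvin.j.richards"));
```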




Sharding - Multiple Identities


                     shard01            shard02                shard03
                   em: a-q             em: r-z                _id: a-z


                   _id: "Min"-         li: d-r
                   "1100"
                  li: s-z              _id: "1100"-           _id: "1200"-
                                       "1200"                 "Max"
                                                              li: a-c


                                 ids             tweets
                                 collection      collection
Sharding - Multiple Identities
                                                 ids.find({ _id: { em: "alvin@10gen.com" } })




                     shard01            shard02                                    shard03
                   em: a-q             em: r-z                                   _id: a-z


                   _id: "Min"-         li: d-r
                   "1100"
                  li: s-z              _id: "1100"-                              _id: "1200"-
                                       "1200"                                    "Max"
                                                                                 li: a-c


                                 ids             tweets
                                 collection      collection
Sharding - Multiple Identities
                                                 ids.find({ _id: { em: "alvin@10gen.com" } })

                                                 tweets.find({ _id: "1200-42" })


                     shard01            shard02                                    shard03
                   em: a-q             em: r-z                                   _id: a-z


                   _id: "Min"-         li: d-r
                   "1100"
                  li: s-z              _id: "1100"-                              _id: "1200"-
                                       "1200"                                    "Max"
                                                                                 li: a-c


                                 ids             tweets
                                 collection      collection
Part Four
       Replication




Types of outage
       • Planned
               • Hardware upgrade
               • O/S or file-system tuning
               • Relocation of data to new file-system / storage
               • Software upgrade

       • Unplanned
               • Hardware failure
               • Data center failure
               • Region outage
               • Human error
               • Application corruption

Replica Sets

       • Data Protection
               • Multiple copies of the data
               • Spread across Data Centers, AZs
       • High Availability
               • Automated Failover
               • Automated Recovery




Replica Sets


               App          Write
                                     Primary
                                                Asynchronous
                            Read                 Replication

                                    Secondary
                            Read


                                    Secondary
                            Read




Replica Sets


               App          Write
                                     Primary
                            Read

                                    Secondary
                            Read


                                    Secondary
                            Read




Replica Sets


               App
                                     Primary

                            Write
                                     Primary    Automatic Election of
                                                    new Primary
                            Read

                                    Secondary
                            Read




Replica Sets


               App
                                    Recovering

                            Write                New primary serves
                                     Primary            data
                            Read

                                    Secondary
                            Read




Replica Sets


               App
                                    Secondary
                            Read

                            Write
                                     Primary
                            Read

                                    Secondary
                            Read




Elections

       During an election
       • Most up to date
       • Highest priority
       • Less than 10s behind failed Primary




Types of Durability with
       MongoDB
       • Fire and forget
       • Wait for error
       • Wait for fsync
       • Wait for journal sync
       • Wait for replication




Network Ack - Old Default
              Driver                Primary
                            write

                                              apply in memory




Get last error - New default
              Driver                       Primary
                               write
                            getLastError             apply in memory




Wait for Journal Sync
              Driver                       Primary
                               write
                            getLastError             apply in memory
                              j:true
                                                     Write to journal




Wait for replication
              Driver                       Primary                           Secondary
                               write
                            getLastError             apply in memory
                              w:2
                                                           replicate
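The durability levels in the preceding diagrams differ only in the options sent with `getLastError`. The mapping below is a sketch: the level names and the helper `getLastErrorCommand` are hypothetical, while `w`, `j` and `fsync` are the options the slides refer to. Newer drivers express the same choices as a write concern on the operation instead of a separate `getLastError` call.

```javascript
// getLastError options for each durability level named on the slides.
// null means the driver does not call getLastError at all.
const durabilityLevels = {
  fireAndForget: null,
  waitForError: { w: 1 },
  waitForFsync: { w: 1, fsync: true },
  waitForJournal: { w: 1, j: true },
  waitForReplication: { w: 2 },
};

// Build the command document sent after a write, or null if none is sent.
function getLastErrorCommand(level) {
  if (!Object.prototype.hasOwnProperty.call(durabilityLevels, level)) {
    throw new Error("unknown durability level: " + level);
  }
  const options = durabilityLevels[level];
  return options === null ? null : Object.assign({ getLastError: 1 }, options);
}

console.log(JSON.stringify(getLastErrorCommand("waitForJournal")));
// {"getLastError":1,"w":1,"j":true}
console.log(JSON.stringify(getLastErrorCommand("waitForReplication")));
// {"getLastError":1,"w":2}
```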




Tunable Data Durability
                            Memory   Journal   Secondary Other Data Center
    RDBMS

  network                                                            async
  ACK
     w=1

    w=1
   j=true                                                            sync

w="majority"
   w=n
w="myTag"

              Less                                                   More



Eventual Consistency
       Using Replicas for Reads
    Read preference
    • primary (only)
    • primaryPreferred
    • secondary (only)
    • secondaryPreferred
    • nearest
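How the five modes choose a member can be sketched with a toy selector. This is a simplification under stated assumptions: members are plain objects with a hypothetical `pingMs` field, and the lowest ping always wins, whereas real drivers apply their own latency rules. `pickMember` is not a driver API.

```javascript
// members: [{ name, state: "PRIMARY" | "SECONDARY", pingMs }]
function pickMember(members, mode) {
  const primary = members.find((m) => m.state === "PRIMARY") || null;
  const secondaries = members.filter((m) => m.state === "SECONDARY");
  // Lowest-latency member of a list, or null for an empty list.
  const closest = (list) =>
    list.length ? list.reduce((a, b) => (b.pingMs < a.pingMs ? b : a)) : null;

  switch (mode) {
    case "primary":
      return primary;
    case "primaryPreferred":
      return primary || closest(secondaries);
    case "secondary":
      return closest(secondaries);
    case "secondaryPreferred":
      return closest(secondaries) || primary;
    case "nearest":
      return closest(members);
    default:
      throw new Error("unknown read preference: " + mode);
  }
}

const replicaSet = [
  { name: "a", state: "PRIMARY", pingMs: 20 },
  { name: "b", state: "SECONDARY", pingMs: 5 },
  { name: "c", state: "SECONDARY", pingMs: 40 },
];
console.log(pickMember(replicaSet, "primary").name);   // a
console.log(pickMember(replicaSet, "secondary").name); // b
console.log(pickMember(replicaSet, "nearest").name);   // b
```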




Immediate Consistency

                      Thread #1      Primary

                            Insert     v1



                            Read        ✔
                      Update           v2




                            Read        ✔


Eventual Consistency

                      Thread #1      Primary   Secondary   Thread #2

                            Insert     v1
                                                                v1 does not
                                                                   exist
                            Read        ✔        ✖
                                                  v1
                                                                 reads v1
                      Update           v2
                                                 ✔
                            Read        ✔        ✖               reads v1
                                                  v2

                                                  ✔              reads v2
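The timeline above can be replayed with a toy model in which the secondary applies writes only when replication runs. The class and method names are hypothetical; this is an illustration of the stale-read window, not of MongoDB's replication internals.

```javascript
// Toy primary/secondary pair with explicit, delayed replication.
class ReplicaPair {
  constructor() {
    this.primary = new Map();
    this.secondary = new Map();
    this.pending = []; // writes not yet applied on the secondary
  }
  write(key, value) {
    this.primary.set(key, value);
    this.pending.push([key, value]);
  }
  replicate() {
    for (const [key, value] of this.pending) {
      this.secondary.set(key, value);
    }
    this.pending = [];
  }
  read(key, from) {
    const node = from === "secondary" ? this.secondary : this.primary;
    return node.get(key);
  }
}

const pair = new ReplicaPair();
pair.write("doc", "v1");
console.log(pair.read("doc", "primary"));   // v1
console.log(pair.read("doc", "secondary")); // undefined: v1 does not exist yet
pair.replicate();
pair.write("doc", "v2");
console.log(pair.read("doc", "secondary")); // v1: stale until replication runs
pair.replicate();
console.log(pair.read("doc", "secondary")); // v2
```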



Lab #10
       Replication

       Primary, Secondary or both?

      • Show the latest "votes" for a tweet and/or user
      • Changing your profile picture
      • Showing your thumbnail with a tweet




Summary

       • Schema design is different in MongoDB
       • Basic data design principles stay the same
       • Focus on how the application manipulates data
       • Rapidly evolve schema to meet your requirements
       • Consider sharding early
       • Understand the impact of eventual consistency


download at mongodb.org




                                 conferences, appearances, and meetups
                                                          http://www.10gen.com/events

                       Facebook              |   Twitter    |   LinkedIn
                   http://bit.ly/mongo>      |   @mongodb   |   http://linkd.in/joinmongo




MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump StartMongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
 

Plus de MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

MongoSV Schema Workshop

  • 1. Schema Design Workshop Sridhar Nanjundeswaran Software Engineer, 10Gen sridhar@10gen.com @snanjund Wednesday, December 5, 12
  • 2. Agenda • Part One - Basic Schema & Patterns • Part Two - Schema Design • Part Three - Sharding • Part Four - Replication
  • 3. Why is schema design different? • In RDBMS design, you ask "what answers do I have?" • In MongoDB design, you ask "what questions will I have?"
  • 4. Goals • Learn data modeling with MongoDB • Labs to try to solve problems • Understand implications of • Replication • Sharding Please, ask many, many questions!
  • 5. Part One Basic Schema & Patterns
  • 6. So why model data? http://bit.ly/SSs7QB
  • 7. Normalization
    • 1970 E.F. Codd introduces 1st Normal Form (1NF)
    • 1971 E.F. Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)
    • 1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)
    • 2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF)
    Goals:
    • Avoid anomalies when inserting, updating or deleting
    • Minimize redesign when extending the schema
    • Make the model informative to users
    • Avoid bias towards a particular style of query
    * source: Wikipedia
  • 8. So today’s example will use... http://bit.ly/RyIOvO
  • 9. Terminology
      RDBMS           MongoDB
      Table           Collection
      Row(s)          JSON Document
      Index           Index
      Join            Embedding & Linking
      Partition       Shard
      Partition Key   Shard Key
  • 10. Schema Design Relational Database
  • 11. Schema Design MongoDB
  • 12. Schema Design MongoDB linking
  • 13. Schema Design MongoDB embedding and linking
  • 14. Basic schema
    Design documents that simply map to your application
      > post = { author: "Hergé",
                 date: ISODate("2011-09-18T09:56:06.298Z"),
                 text: "Destination Moon",
                 tags: ["comic", "movie"] }
      > db.blogs.save(post)
  • 15. Find the document
      > db.blogs.find()
      { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
        author: "Hergé",
        date: ISODate("2011-09-18T09:56:06.298Z"),
        text: "Destination Moon",
        tags: [ "comic", "movie" ] }
    Notes:
    • ID must be unique, but can be anything you’d like
    • MongoDB will generate a default ID if one is not supplied
  • 16. Add an index, find via Index
    Secondary index for "author"
      // 1 means ascending, -1 means descending
      > db.blogs.ensureIndex( { author: 1 } )
      > db.blogs.find( { author: 'Hergé' } )
      { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
        date: ISODate("2011-09-18T09:56:06.298Z"),
        author: "Hergé",
        ... }
  • 19. Examine the query plan
      > db.blogs.find( { author: "Hergé" } ).explain()
      {
        "cursor" : "BtreeCursor author_1",
        "nscanned" : 1,
        "nscannedObjects" : 1,
        "n" : 1,          // Number of objects returned
        "millis" : 5,     // How long it took
        "indexBounds" : {
          "author" : [
            [ "Hergé", "Hergé" ]
          ]
        }
      }
  • 20. Query operators
    Conditional operators:
      $ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
      $lt, $lte, $gt, $gte, ...
      // find posts with any tags
      > db.blogs.find( { tags: { $exists: true } } )
    Regular expressions:
      // posts where author starts with h
      > db.blogs.find( { author: /^h/i } )
    Counting:
      // number of posts written by Hergé
      > db.blogs.find( { author: "Hergé" } ).count()
  • 21. Extending the Schema http://bit.ly/PpjT1l
  • 23. Extending the Schema
      > new_comment = { author: "Kyle",
                        date: new Date(),
                        text: "great book" }
      > db.blogs.update(
          { text: "Destination Moon" },
          { "$push": { comments: new_comment },   // add element to array
            "$inc": { comments_count: 1 } } )     // increment counter
  • 24. Extending the Schema
      > db.blogs.find( { author: "Hergé" } )
      { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
        author : "Hergé",
        date : ISODate("2011-09-18T09:56:06.298Z"),
        text : "Destination Moon",
        tags : [ "comic", "movie" ],
        comments : [
          { author : "Kyle",
            date : ISODate("2011-09-19T09:56:06.298Z"),
            text : "great book" }
        ],
        comments_count: 1 }
  • 25. Extending the Schema
      // create index on nested documents:
      > db.blogs.ensureIndex( { "comments.author": 1 } )
      > db.blogs.find( { "comments.author": "Kyle" } )
      // find last 5 posts:
      > db.blogs.find().sort( { date: -1 } ).limit(5)
      // most commented post:
      > db.blogs.find().sort( { comments_count: -1 } ).limit(1)
    When sorting, check if you need an index
  • 26. Common Patterns http://bit.ly/SNnt4z
  • 27. Inheritance http://bit.ly/T7MqUz
  • 29. Single Table Inheritance - RDBMS
      select * from shapes;
      id  type    area  radius  length  width
      1   circle  3.14  1
      2   square  4             2
      3   rect    10            5       2
  • 32. Single Table Inheritance - MongoDB
      > db.shapes.find()
      { _id: "1", type: "c", area: 3.14, radius: 1}
      { _id: "2", type: "s", area: 4, length: 2}
      { _id: "3", type: "r", area: 10, length: 5, width: 2}
    Missing values are not stored!
      // find shapes where radius > 0
      > db.shapes.find( { radius: { $gt: 0 } } )
      // create index
      > db.shapes.ensureIndex( { radius: 1 }, { sparse: true } )
    A sparse index holds only the values that are present!
  • 33. One to Many http://bit.ly/Oqbt8z
  • 34. One to Many One to Many relationships can specify • degree of association between objects • containment • life-cycle
  • 35. One to Many
    Embedded Array
    • $slice operator to return subset of comments
    • some queries harder
    • e.g. find latest comments across all blogs
      blogs: {
        author : "Hergé",
        date : ISODate("2011-09-18T09:56:06.298Z"),
        comments : [
          { author : "Kyle",
            date : ISODate("2011-09-19T09:56:06.298Z"),
            text : "great book" }
        ] }
      > db.blogs.find( { author: "Hergé" }, { comments: { $slice : 10 } } )
  • 36. One to Many
    Normalized (2 collections)
    • most flexible
    • more queries
      blogs: { _id: 1000,
               author: "Hergé",
               date: ISODate("2011-09-18T09:56:06.298Z"),
               comments: [ { comment : 1 } ] }
      comments : { _id : 1,
                   blog: 1000,
                   author : "Kyle",
                   date : ISODate("2011-09-19T09:56:06.298Z") }
      // findOne returns the document itself, so blog._id is defined
      > blog = db.blogs.findOne( { text: "Destination Moon" } );
      > db.comments.find( { blog: blog._id } ).limit(5);
  • 37. Many to Many http://bit.ly/QTzhBF
  • 38. Many - Many Example: • Blog can have many Tags • Tag can be used by many Blogs
  • 42. Many - Many
      // Each Tag lists the "_id" of the Blog
      tags:
        { _id: 20,
          name: "comic",   // Unique
          blog_ids: [ 10, 11, 12 ] }
        { _id: 30,
          name: "movie",   // Unique
          blog_ids: [ 10 ] }
      // Each Blog lists the "tag" of the Tags
      // (links via unique key, in this case "tags", could be "_id")
      blogs:
        { _id: 10,
          name: "Destination Moon",
          tags: [ "comic", "movie" ] }
      // All Tags for a given Blog
      > db.tags.find( { blog_ids: 10 } )
  • 43. Use _id or not?
    Link by tag name:
      blogs: { _id: 10, name: "...",
               tags: [ "comic", "movie" ] }
      Pros: • Single query
      Cons: • Cascade any changes
    Link by tag _id:
      blogs: { _id: 10, name: "...",
               tags: [ 10, 20 ] }
      Pros: • Single update
      Cons: • Second query required
  • 46. Alternative
      // Each Blog lists the _id of the Tag
      blogs:
        { _id: 10,
          name: "Destination Moon",
          tag_ids: [ 20, 30 ] }
      // Association not stored on the Tag
      tags:
        { _id: 20,
          name: "comic" }
      // All Blogs for a given Tag
      > db.blogs.find( { tag_ids: 20 } )
      // All Tags for a given Blog
      > blog = db.blogs.findOne( { _id: 10 } )
      > db.tags.find( { _id: { $in : blog.tag_ids } } )
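The two-step lookup for the tags of a blog can be modeled in plain JavaScript, without a database, to see what the second query does. This is only a sketch: the `blogs` and `tags` arrays and the helpers `tagsForBlog` and `blogsForTag` are hypothetical stand-ins for the collections and queries on the slide, and `includes` plays the role of `$in`.

```javascript
// In-memory stand-ins for the two collections (sample data is illustrative).
const blogs = [{ _id: 10, name: "Destination Moon", tag_ids: [20, 30] }];
const tags = [
  { _id: 20, name: "comic" },
  { _id: 30, name: "movie" },
  { _id: 40, name: "unused" },
];

// Step 1: load the blog. Step 2: fetch the tags whose _id is in blog.tag_ids.
function tagsForBlog(blogId) {
  const blog = blogs.find((b) => b._id === blogId);
  if (!blog) return [];
  return tags.filter((t) => blog.tag_ids.includes(t._id));
}

// The reverse direction needs a single pass over blogs.
function blogsForTag(tagId) {
  return blogs.filter((b) => b.tag_ids.includes(tagId));
}

console.log(tagsForBlog(10).map((t) => t.name));
console.log(blogsForTag(20).map((b) => b.name));
```

Because the association lives only on the blog, adding or removing a tag touches one document, at the price of the second lookup shown in `tagsForBlog`.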
  • 47. Many - Many Intersection Attributes Example: • Blog can have many Tags • Tag can be used by many Blogs • When a Tag is used, record the usage date
  • 48. Many - Many
    Normalized
      // Each Blog lists the _id of the Tag
      blogs: { _id: 10, name: "...", tag_ids: [ 20, 30 ] }
      // Association not stored on the Tag
      tags: { _id: 20, name: "comic" }
      // Store the interaction and usage date
      usages: { blog_id: 10,   // Blog _id
                tag_id : 20,   // Tag _id
                usage: ISODate("2012-10-12...") }
      // Find the Tags for a Blog
      for (var c = db.usages.find( { blog_id: 10 } ); c.hasNext(); ) {
        u = c.next();
        t = db.tags.findOne( { _id: u.tag_id } )
        printjson( u.usage );
      }
  • 49. Many - Many Intersection Attributes
      // Each Blog lists the Blog Usage Object
      blogs:
        { _id: 10,
          name: "Destination Moon",
          tags: [ { tag: "comic", usage: ISODate("2012-10-12...") },
                  { tag: "movie", usage: ISODate("2012-09-11...") } ] }
      // Find the Tags for a Blog
      > db.blogs.find( { _id: 10 }, { tags: 1 } )
    Pros: • Usage object encapsulated where used
    Cons: • If updates allowed, changes will have to be cascaded
  • 50. Summary • Schema design is the single biggest performance factor • More choices than in an RDBMS • Embedding, index design, shard keys
  • 51. Part Two Schema Design
  • 52. Lab #1 Design Schema for Twitter • Model each user's activity stream • Users • Name, email address, display name • Tweets • Text • Who • Timestamp
  • 53. Lab #1 - Solution A
    Two Collections
      // users - one doc per user
      { _id: "alvin",
        email: "alvin@10gen.com",
        display: "jonnyeight" }
      // tweets - one doc per user per tweet
      { user: "bob",
        for: "alvin",
        tweet: "20111209-1231",
        text: "Best Tweet Ever!",
        ts: ISODate("2011-09-18T09:56:06.298Z") }
  • 54. Lab #1 - Solution B
    Embedded Tweets
      // users - one doc per user with all tweets
      { _id: "alvin",
        email: "alvin@10gen.com",
        display: "jonnyeight",
        tweets: [
          { user: "bob",
            tweet: "20111209-1231",
            text: "Best Tweet Ever!",
            ts: ISODate("2011-09-18T09:56:06.298Z") }
        ] }
  • 55. Embedding • Great for read performance • One seek to load entire object • One roundtrip to database • Writes can be slow if adding to objects all the time
  • 56. Linking or Embedding?
    Linking can make some queries easy
      // Find latest 50 tweets for "alvin"
      // (tweets in Solution A are keyed to their recipient by the "for" field)
      > db.tweets.find( { for: "alvin" } )
                  .sort( { ts: -1 } )
                  .limit(50)
    But what effect does this have on the systems?
  • 57. [Diagram: Collection 1 and Index 1]
  • 58. [Diagram: Collection 1 and Index 1 mapped into Virtual Address Space 1] This is your virtual memory size (mapped)
  • 59. [Diagram: Collection 1 and Index 1 in Virtual Address Space 1, partly held in Physical RAM] This is your resident memory size
  • 60. [Diagram: Collection 1 and Index 1 in Virtual Address Space 1, backed by Physical RAM and Disk]
  • 61. [Diagram: as slide 60, with access times] Physical RAM: 100 ns; Disk: 10,000,000 ns
  • 62. [Diagram: Collection 1 and Index 1 in Virtual Address Space 1, Physical RAM and Disk; reads 1, 2 and 3 land in different places]
      > db.tweets.find( { for: "alvin" } )
                  .sort( { ts: -1 } )
                  .limit(10)
    Linking = Many seeks + random reads
  • 63. [Diagram: same layout; a single read 1]
      > db.tweets.find( { _id: "alvin" } )
    Embedding = Large Sequential Read
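A rough back-of-the-envelope model makes the difference between the two access patterns concrete. It is a sketch under loud assumptions: every random read is charged one disk access at the 10,000,000 ns figure from the latency slide, caching and transfer time are ignored, and the helper `estimateReadMs` and the seek counts (10 for linked tweets, 1 for an embedded document) are illustrative choices, not measurements.

```javascript
// Latency figures taken from the slide; assigning them to RAM and disk is an assumption.
const RAM_ACCESS_NS = 100;
const DISK_ACCESS_NS = 10000000;

// Estimated time in milliseconds if each seek costs one disk access.
function estimateReadMs(seeks) {
  return (seeks * DISK_ACCESS_NS) / 1e6;
}

const linkedMs = estimateReadMs(10);  // ten tweets stored in ten different places
const embeddedMs = estimateReadMs(1); // one document holding all ten tweets
const diskVsRam = DISK_ACCESS_NS / RAM_ACCESS_NS;

console.log({ linkedMs, embeddedMs, diskVsRam });
```

Under these assumptions the linked layout pays the disk penalty once per tweet, which is why the number of seeks, rather than the number of bytes, dominates the comparison.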
  • 64. Lab #2 Alternative Schema • Display last 10 tweets from today • Efficiently use memory and Disk seeks / IOPs
  • 65. Lab #2 - Solution
    Buckets
      // tweets : one doc per user per day
      > db.tweets.findOne()
      { _id: "alvin-2011/12/09",
        email: "alvin@10gen.com",
        tweets: [
          { user: "Bob",
            tweet: "20111209-1231",
            text: "Best Tweet Ever!" },
          { author: "Joe",
            tweet: "20111210-9025",
            date: "May 27 2011",
            text: "Stuck in traffic (again)" }
        ] }
  • 66. Lab #2 - Solution
    Last 10 Tweets
      // tweets are appended with $push, so a negative $slice returns the newest 10
      > db.tweets.find( { _id: "alvin-2011/12/09" },
                        { tweets: { $slice : -10 } } )
                  .sort( { _id: -1 } )
                  .limit(1)
  • 67. Lab #2 - Solution
    Adding a Tweet
      > tweet = { user: "Bob",
                  tweet: "20111209-1231",
                  text: "Best Tweet Ever!" }
      > db.tweets.update( { _id : "alvin-2011/12/09" },
                          { $push : { tweets : tweet } } );
  • 68. Lab #2 - Solution
    Getting All Tweets
      > cursor = db.tweets.find( { _id : /^alvin/ } ).sort( { _id : -1 } )
      > while ( cursor.hasNext() ) {
          doc = cursor.next();
          for ( var i = 0; i < doc.tweets.length; i++ )
            printjson( doc.tweets[i] )
        }
  • 69. Lab #2 - Solution
    Deleting a Tweet
      > db.tweets.update( { _id: "alvin-2011/12/09" },
                          { $pull: { tweets: { tweet: "20111209-1231" } } } )
  • 70. [Diagram: Collection 1 and Index 1 in Virtual Address Space 1, Physical RAM and Disk; a single read 1]
      > db.tweets.find( { _id: "alvin-2011/12/09" },
                        { tweets: { $slice : -10 } } )
                  .sort( { _id: -1 } )
                  .limit(1)
    Bucket = 1 seek + 1 sequential read
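The bucket pattern depends on the application building the same per-user, per-day key every time. A minimal sketch of that bookkeeping in plain JavaScript follows; the helpers `bucketId`, `addTweet` and `lastTweets`, the in-memory `Map`, and the choice of UTC for the day boundary are all assumptions made for illustration, not part of the lab solution.

```javascript
// Build the per-user, per-day bucket key, e.g. "alvin-2011/12/09" (UTC day).
function bucketId(user, date) {
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, "0");
  const d = String(date.getUTCDate()).padStart(2, "0");
  return `${user}-${y}/${m}/${d}`;
}

// In-memory equivalent of $push: append a tweet to the bucket's array.
function addTweet(buckets, user, date, tweet) {
  const id = bucketId(user, date);
  if (!buckets.has(id)) buckets.set(id, { _id: id, tweets: [] });
  buckets.get(id).tweets.push(tweet);
  return id;
}

// Newest n tweets of one bucket (appended last, so take them from the end).
function lastTweets(buckets, id, n) {
  const bucket = buckets.get(id);
  return bucket ? bucket.tweets.slice(-n) : [];
}

const buckets = new Map();
const day = new Date("2011-12-09T09:56:06.298Z");
for (let i = 1; i <= 12; i++) {
  addTweet(buckets, "alvin", day, { text: `tweet ${i}` });
}
console.log(lastTweets(buckets, "alvin-2011/12/09", 10).length);
```

Keeping the key format in one function avoids the situation where an insert and a later delete spell the same day differently and silently miss each other.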
  • 71. Trees http://bit.ly/Oqc8Xs
  • 72. Trees Hierarchical information
  • 73. Trees
    Full Tree in Document
      { retweet: [
          { who: "Kyle", text: "...",
            retweet: [
              { who: "James", text: "...",
                retweet: [] }
            ] }
        ] }
    Pros: Single Document, Performance, Intuitive
    Cons: Hard to search, Partial Results, 16MB limit
  • 75. Array of Ancestors
    [Diagram: tree with root A; B and E are children of A; C and D are children of B; F is a child of E]
      // Store all Ancestors of a node
      { _id: "a" }
      { _id: "b", tree: [ "a" ], retweet: "a" }
      { _id: "c", tree: [ "a", "b" ], retweet: "b" }
      { _id: "d", tree: [ "a", "b" ], retweet: "b" }
      { _id: "e", tree: [ "a" ], retweet: "a" }
      { _id: "f", tree: [ "a", "e" ], retweet: "e" }
      // find all direct retweets of "b"
      > db.tweets.find( { retweet: "b" } )
      // find all retweets of "e" anywhere in tree
      > db.tweets.find( { tree: "e" } )
      // find tweet history of f:
      > tweets = db.tweets.findOne( { _id: "f" } ).tree
      > db.tweets.find( { _id: { $in : tweets } } )
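The slide shows the stored documents but not how the `tree` array gets filled in. One way to think about it, sketched here in plain JavaScript against an in-memory `Map` rather than a real collection: a new retweet copies its parent's ancestors and appends the parent. The helpers `addRetweet` and `descendantsOf` are hypothetical, and the root is given an empty `tree` array for convenience (the slide's root document has no `tree` field).

```javascript
// In-memory stand-in for the tweets collection, keyed by _id.
const tweets = new Map([["a", { _id: "a", tree: [] }]]);

// A new retweet inherits its parent's ancestors plus the parent itself.
function addRetweet(id, parentId) {
  const parent = tweets.get(parentId);
  if (!parent) throw new Error(`unknown parent ${parentId}`);
  const doc = { _id: id, tree: [...parent.tree, parentId], retweet: parentId };
  tweets.set(id, doc);
  return doc;
}

// Equivalent of find({ tree: id }): every node that has id among its ancestors.
function descendantsOf(id) {
  return [...tweets.values()]
    .filter((t) => t.tree.includes(id))
    .map((t) => t._id);
}

addRetweet("b", "a");
addRetweet("c", "b");
addRetweet("d", "b");
addRetweet("e", "a");
addRetweet("f", "e");

console.log(tweets.get("f").tree, descendantsOf("b"));
```

The cost of the pattern shows up on writes: moving a subtree would mean rewriting the `tree` array of every node below it.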
  • 76. Trees as Paths
    [Diagram: the same tree of nodes A to F]
    Store hierarchy as a path expression
    • Separate each node by a delimiter, e.g. "/"
    • Use text search to find parts of a tree
      { retweets: [
          { _id: "a", text: "initial tweet", path: "a" },
          { _id: "b", text: "retweet with comment", path: "a/b" },
          { _id: "c", text: "reply to retweet", path : "a/b/c" } ] }
      // Find the conversations "a" started
      > db.tweets.find( { path: /^a/i } )
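A bare prefix pattern such as `/^a/i` matches any path that merely starts with the letter "a", so a node with a hypothetical id like "ab" would be swept into the results. A slightly stricter pattern anchors on the delimiter. The sketch below is plain JavaScript; `escapeRegExp` and `subtreePattern` are illustrative helper names, and the sample paths are made up.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Match the node itself or anything below it: "a", "a/b", "a/b/c", but not "ab".
function subtreePattern(path, delimiter = "/") {
  return new RegExp(
    "^" + escapeRegExp(path) + "(" + escapeRegExp(delimiter) + "|$)"
  );
}

const paths = ["a", "a/b", "a/b/c", "ab", "ab/x"];

// Loose prefix match versus delimiter-aware match.
console.log(paths.filter((p) => /^a/i.test(p)));
console.log(paths.filter((p) => subtreePattern("a").test(p)));
```

Dropping the case-insensitive flag also matters in practice, since a case-sensitive pattern anchored with `^` is the form that can make use of an index on `path`.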
  • 77. Queues & Workflows http://bit.ly/QeNsPX
  • 78. Lab #3 Following Requests • Users are allowed to "follow" another user • A user sends a "follow" request • Follower approves or not • Requests are timed out after 7 days • The approval is an async process
  • 80. Lab #3 - Solution
    Queues & Workflows
    • Need to maintain order and state
    • Ensure that updates are atomic
      > db.approvals.insert( { inprogress: false,
                               approved: false,
                               priority: 1,
                               text: "Hey Jim, want to follow you!" } );
      // find highest priority approval and mark as in-progress
      job = db.approvals.findAndModify({
              query: { inprogress: false },
              sort: { priority: -1 },
              update: { $set: { inprogress: true,
                                started: new Date() } },
              new: true })
  • 81. Lab #3 - Solution
    Queues & Workflows
      { inprogress: true,   // updated
        priority: 1,
        approved: false,
        started: ISODate("2011-09-18T09:56:06.298Z"),   // added
        ... }
  • 82. Lab #3 - Solution
    Queues & Workflows
    • Follower approves request
      // update approval after receiving approval
      > job = db.approvals.update(
          { _id: "1234" },
          { $set: { approved: true } } )
    • System times out request after 7 days
      > var limit = new Date(); limit.setDate(limit.getDate() - 7);
      // requests that started more than 7 days ago have started < limit
      > job = db.approvals.update(
          { inprogress: true, started: { $lt: limit } },
          { $set: { approved: false } } )
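The direction of the date comparison in the timeout step is easy to get backwards: a request has timed out when it started before the cutoff, not after it. The sketch below works that out in plain JavaScript with fixed dates. The helpers `timeoutCutoff`, `timedOutFilter` and `isTimedOut` are hypothetical names, and the cutoff is computed with millisecond arithmetic, which is an assumption that a "day" means exactly 24 hours.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Requests started before this instant are older than `days` days.
function timeoutCutoff(now, days = 7) {
  return new Date(now.getTime() - days * DAY_MS);
}

// Filter document for timed-out requests: in progress and started before the cutoff.
function timedOutFilter(now, days = 7) {
  return { inprogress: true, started: { $lt: timeoutCutoff(now, days) } };
}

// In-memory check of the same condition, for illustration.
function isTimedOut(request, now, days = 7) {
  return request.inprogress && request.started < timeoutCutoff(now, days);
}

const now = new Date("2011-09-25T09:56:06.298Z");
const oldRequest = { inprogress: true, started: new Date("2011-09-10T00:00:00.000Z") };
const freshRequest = { inprogress: true, started: new Date("2011-09-20T00:00:00.000Z") };

console.log(timedOutFilter(now).started.$lt.toISOString());
console.log(isTimedOut(oldRequest, now), isTimedOut(freshRequest, now));
```

Passing `now` in as a parameter, rather than calling `new Date()` inside the helper, keeps the rule testable with known dates.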
  • 83. Lab #4 Voting Twitter meets Stack Overflow • Users can "vote" for a tweet • A user can "vote" once and only once • Need to display current votes
  • 84. Lab #4 - Solution
    Votes
      // One document per voter per tweet
      > db.votes.insert( { tweet: "20111209-1231", voter: "alvin" } );
      // Unique index guarantees the user can't vote twice
      > db.votes.ensureIndex( { tweet: 1, voter: 1 }, { unique: true } );
      // Count will return the number of votes cast
      > db.votes.find( { tweet: "20111209-1231" } ).count()
  • 85. Count or Not? • MongoDB indexes do not store counts • The count has to be computed via an index scan // One summary document per tweet, no "voter" key > db.votes.update( { tweet: "20111209-1231", voter: { $exists: false } }, { "$inc": { count: 1 } }, true, false ); // Return the count for the no "voter" document > db.votes.find( { tweet: "20111209-1231", voter: { $exists: false } }, { count: 1, _id: 0} )
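The trade-off between counting vote documents and keeping a summary counter can be illustrated with a small in-memory sketch. `ToyVotes` is a hypothetical class: its `Set` plays the role of the unique index on tweet and voter, and its `Map` plays the role of the summary document.

```javascript
class ToyVotes {
  constructor() {
    this.seen = new Set();   // stands in for the unique { tweet, voter } index
    this.counts = new Map(); // stands in for the per-tweet summary document
  }

  // Returns false when the voter has already voted for this tweet.
  vote(tweet, voter) {
    const key = tweet + "|" + voter;
    if (this.seen.has(key)) return false; // duplicate key: rejected
    this.seen.add(key);
    this.counts.set(tweet, (this.counts.get(tweet) || 0) + 1);
    return true;
  }

  // Reads the precomputed counter instead of scanning the votes.
  count(tweet) {
    return this.counts.get(tweet) || 0;
  }
}

const votes = new ToyVotes();
console.log(votes.vote("20111209-1231", "alvin")); // true
console.log(votes.vote("20111209-1231", "alvin")); // false
console.log(votes.count("20111209-1231"));         // 1
```

The counter is only incremented when the insert succeeds, which is the ordering an application would also need to respect against the real collection.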
  • 86. Lab #5 Time Series • Records votes by • Day, Hour, Minute • Show time series of votes cast Wednesday, December 5, 12
  • 87. Lab #5 - Solution A Time Series // Time series buckets, hour and minute sub-docs { _id: "20111209-1231", ts: ISODate("2011-12-09T00:00:00.000Z"), daily: 67, hourly: { 0: 23, 1: 14, 2: 19 ... 23: 72 }, minute: { 0: 0, 1: 4, 2: 6 ... 1439: 0 } }
  • 88. Lab #5 - Solution A Time Series // Add one to the last minute before midnight > db.votes.update( { _id: "20111209-1231", ts: ISODate("2011-12-09T00:00:00.000Z") }, { $inc: { daily: 1, "hourly.23": 1, "minute.1439": 1 } } ) What is the cost of updating the minute before midnight?
  • 89. BSON Storage • Sequence of key/value pairs • NOT a hash map • Optimized to scan quickly • Keys 0, 1, 2, 3 ... 1439 are stored in order • Reaching key 1439 takes 1439 skips
  • 90. BSON Storage • Can skip sub-documents • Hour sub-documents 0 ... 23, each holding its own 60 minutes (0 ... 59, 60 ... 119, ..., 1380 ... 1439) • 23 skips (hours) + 59 skips (minutes) = 82 skips
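The skip counts on the two BSON Storage slides can be reproduced with a short sketch. It does not parse real BSON; it models a (sub-)document as an ordered list of keys, and `skipsToReach` is a hypothetical helper.

```javascript
// Model a BSON (sub-)document as an ordered list of keys: finding a key
// means stepping over every key stored before it.
function skipsToReach(keys, target) {
  const position = keys.indexOf(target);
  if (position === -1) throw new Error("key not found: " + target);
  return position; // number of keys skipped before the target
}

const range = (n) => Array.from({ length: n }, (_, i) => String(i));

// Flat layout: one sub-document with 1440 minute keys.
const flatSkips = skipsToReach(range(1440), "1439");

// Nested layout: 24 hour sub-documents, each holding 60 minute keys.
const nestedSkips =
  skipsToReach(range(24), "23") + skipsToReach(range(60), "59");

console.log(flatSkips, nestedSkips); // 1439 82
```

The worst case drops from 1439 skips to 82 because a whole hour sub-document is stepped over as a single element.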
  • 91. Lab #5 - Solution B Time Series // Time series buckets, each hour a sub-document { _id: "20111209-1231", ts: ISODate("2011-12-09T00:00:00.000Z"), daily: 67, minute: { 0: { 0: 0, 1: 7, ... 59: 2 }, ... 23: { 0: 15, ... 59: 6 } } } // Add one to the last minute before midnight > db.votes.update( { _id: "20111209-1231", ts: ISODate("2011-12-09T00:00:00.000Z") }, { $inc: { daily: 1, "minute.23.59": 1 } } )
  • 92. Lab #6 Inventory • User has a number of "votes" they can use Wednesday, December 5, 12
  • 93. Lab #6 - Solution Inventory // Number of votes and who voted for { _id: "alvin", votes: 42, voted_for: [] } // Subtract a vote and add the voted for tweet // "20111209-1231" > db.user.update( { _id: "alvin", votes : { $gt : 0}, voted_for: { $ne: "20111209-1231" }}, { "$push": { voted_for: "20111209-1231"}, "$inc": { votes: -1} } ) Wednesday, December 5, 12
  • 94. Lab #6 - Solution Inventory // After vote decremented > db.user.findOne() { _id: "alvin", votes: 41, voted_for: ["20111209-1231"] } (tweet added to voted_for)
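The guarded update in the Inventory lab works because the query conditions and the modification are applied to one document in one step. A minimal in-memory sketch of the same check-then-modify logic, with `castVote` as a hypothetical helper:

```javascript
// Mirrors the update's query: votes > 0 and tweet not already in voted_for.
// Mirrors the update's modifiers: $push to voted_for and $inc votes by -1.
function castVote(user, tweet) {
  if (user.votes <= 0 || user.voted_for.includes(tweet)) return false;
  user.voted_for.push(tweet);
  user.votes -= 1;
  return true;
}

const alvin = { _id: "alvin", votes: 42, voted_for: [] };
console.log(castVote(alvin, "20111209-1231")); // true
console.log(castVote(alvin, "20111209-1231")); // false
console.log(alvin.votes, alvin.voted_for);     // 41 [ '20111209-1231' ]
```

When the conditions fail, nothing is modified, which is how the real update behaves when its query matches no document.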
  • 95. Lab #7 Statistic Buckets • Record referring web sites on customer sign up • Independent counter for each web site Wednesday, December 5, 12
  • 96. Lab #7 - Solution A Statistic Buckets { _id: "alvin", referrers: [         { domain: "www.google.co.uk", count: 4 },         { domain: "www.yahoo.com", count: 1 }, ] } Wednesday, December 5, 12
  • 97. Lab #7 - Solution A Statistic Buckets { _id: "alvin", referrers: [         { domain: "www.google.co.uk", count: 4 },         { domain: "www.yahoo.com", count: 1 }, ] } > db.referers.update( { "referrers.domain": "www.google.co.uk" }, { $inc: { "referrers.$.count": 1 } } ) Wednesday, December 5, 12
  • 99. Lab #7 - Solution A Statistic Buckets { _id: "alvin", referrers: [         { domain: "www.google.co.uk", count: 4 },         { domain: "www.yahoo.com", count: 1 }, ] } > db.referers.update( { "referrers.domain": "www.google.co.uk" }, { $inc: { "referrers.$.count": 1 } } ) { _id: "alvin", referrers: [         { domain: "www.google.co.uk", count: 5 },         { domain: "www.yahoo.com", count: 1 }, ] } Wednesday, December 5, 12
  • 101. Lab #7 - Solution A Statistic Buckets > db.referers.update( { "referrers.domain": "www.bing.com" }, { $inc: {"referrers.$.count": 1 } }, false, true ) What happens if a new referring site is used? Wednesday, December 5, 12
  • 102. Lab #7 - Solution B Statistic Buckets // Need to replace dots with underscores { _id: "alvin", referrers:      { "www_google_co_uk": 4,        "www_yahoo_com": 1 }, } // simple $inc will add www_bing_com if not present > db.referers.update( { _id: "alvin" }, { $inc: { "referrers.www_bing_com": 1 } }, true, false); Wednesday, December 5, 12
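Solution B needs the dots in a domain replaced before the domain can be used as a field name, because a dot in an update path means "descend into a sub-document". A small sketch of building the update document; `referrerInc` is a hypothetical helper, not part of any driver.

```javascript
// Build the $inc document for one referring domain.
// Dots become underscores so the domain is a single field name.
function referrerInc(domain) {
  const key = "referrers." + domain.replace(/\./g, "_");
  return { $inc: { [key]: 1 } };
}

console.log(JSON.stringify(referrerInc("www.bing.com")));
// {"$inc":{"referrers.www_bing_com":1}}
```

One caveat of this encoding, stated as an assumption about the input: two domains that differ only in a dot versus an underscore would share a counter.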
  • 103. Part Three Sharding Wednesday, December 5, 12
  • 104. What is Sharding • Ad-hoc partitioning • Consistent hashing: Amazon Dynamo • Range based partitioning: Google BigTable, Yahoo! PNUTS, MongoDB
  • 105. MongoDB Sharding • Automatic partitioning and management • Range based • Convert to sharded system with no downtime • Fully consistent • No code changes required Wednesday, December 5, 12
  • 106. Sharding - Range distribution sh.shardCollection("mydb.tweets",  {_id:  1}  ,  false) shard01 shard02 shard03 Wednesday, December 5, 12
  • 107. Sharding - Range distribution [Diagram: shard01 holds a-i, shard02 holds j-r, shard03 holds s-z]
  • 108. Sharding - Splits [Diagram: chunk j-r splits into ja-jz and k-r; a-i and s-z are unchanged]
  • 109. Sharding - Splits [Diagram: further splits produce chunks ja-ji, ji-js, js-jw and jz-r alongside a-i and s-z]
  • 110. Sharding - Auto Balancing [Diagram: chunks js-jw and jz-r are shown migrating between shards]
  • 111. Sharding - Auto Balancing [Diagram: after balancing, chunks a-i, ja-ji, ji-js, js-jw, jz-r and s-z are spread across shard01, shard02 and shard03]
  • 113. Sharding for caching [Diagram: a single shard (shard01) with 96 GB memory holds all 300 GB of data (ranges a-i, j-r, s-z), a 3:1 data-to-memory ratio]
  • 114. Aggregate Horizontal Resources [Diagram: three shards (shard01 a-i, shard02 j-r, shard03 s-z), each with 96 GB memory and 100 GB of the 300 GB of data, a 1:1 data-to-memory ratio]
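The ratios on these two slides follow from simple arithmetic. The sketch below assumes data is spread evenly across shards, which the balancer aims for but does not guarantee; `dataToMemRatio` is a hypothetical helper.

```javascript
// Data-to-memory ratio per shard, assuming an even spread of data.
function dataToMemRatio(totalDataGB, memPerShardGB, shardCount) {
  return totalDataGB / shardCount / memPerShardGB;
}

console.log(dataToMemRatio(300, 96, 1).toFixed(1)); // "3.1", roughly 3:1
console.log(dataToMemRatio(300, 96, 3).toFixed(1)); // "1.0", roughly 1:1
```

Adding shards adds memory as well as disk, so the same 300 GB moves from mostly on disk to mostly cacheable.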
  • 115. Sharding Features • Shard data with no downtime • Automatic balancing as data is written • Commands routed (switched) to correct node • Inserts - must have the Shard Key • Updates - can have the Shard Key • Queries: with Shard Key - routed to nodes; without Shard Key - scatter gather • Indexed / Sorted Queries: with Shard Key - routed in order; without Shard Key - distributed sort merge
  • 116. Lab #8 Sharding Twitter Pictures User can upload pictures to Twitter feed { photo_id : ???? , data : <binary> } What should photo_id be? How will photo_id be sharded? Wednesday, December 5, 12
  • 117. Lab #8 Sharding Key { photo_id : ???? , data : <binary> } What’s the right key? • auto increment • MD5( data ) • month() + MD5( data ) Wednesday, December 5, 12
  • 118. Right balanced access • Only have to keep small portion in RAM • Time Based • Right shard "hot" • ObjectId • Auto Increment
  • 119. Random access • Have to keep entire index in RAM • All shards "warm" • Hash
  • 120. Segmented access • Have to keep some index in RAM • Some shards "warm" • Month + Hash
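The three access patterns correspond to the three candidate keys from Lab #8. The sketch below generates each kind of key for a few uploads using Node's built-in `crypto` module; the key formats (zero-padded counter, hex MD5, month prefix) are assumptions chosen for illustration, and `candidateKeys` is a hypothetical helper.

```javascript
const crypto = require("crypto");

const md5 = (s) => crypto.createHash("md5").update(s).digest("hex");

// Three candidate photo_id values for the same upload.
function candidateKeys(seq, data, month) {
  return {
    autoIncrement: String(seq).padStart(8, "0"), // always sorts last: right balanced
    hash: md5(data),                             // spread over the whole key range
    monthHash: month + "-" + md5(data),          // random within the current month
  };
}

const keys = [];
for (let i = 1; i <= 5; i++) {
  keys.push(candidateKeys(i, "photo-" + i, "2012-12"));
}

const isSorted = (xs) => xs.every((x, i) => i === 0 || xs[i - 1] <= x);

// New auto-increment keys always land at the right-hand end of the index.
console.log(isSorted(keys.map((k) => k.autoIncrement))); // true

// Hash keys carry no insertion order; month + hash keys share a prefix,
// so new writes touch only the current month's segment of the index.
console.log(keys.map((k) => k.hash.slice(0, 4)));
console.log(keys.map((k) => k.monthHash.slice(0, 12)));
```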
  • 121. Lab #9 Single Identities // Shard by _id ids: { _id : "alvin", email: "alvin@10gen.com", addresses: [ { state : "CA", country: "USA" }, { country: "UK" } ] } How would the following queries be executed? > db.ids.find( { _id: "alvin"} ) > db.ids.find( { email: "alvin@10gen.com" } ) Wednesday, December 5, 12
  • 122. Sharding - Routed Query find(  {  _id:  "alvin"}  ) shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw jz-r Wednesday, December 5, 12
  • 124. Sharding - Scatter Gather find(  {  email:  "alvin@10gen.com"  }  ) shard01 shard02 shard03 a-i ja-ji s-z ji-js js-jw jz-r Wednesday, December 5, 12
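The difference between a routed query and a scatter gather query comes down to whether the router can look the shard key up in its chunk map. A minimal sketch with a hypothetical three-chunk map and a hypothetical `targetShards` function; real routing is done by mongos against config metadata.

```javascript
// Hypothetical chunk map: each chunk covers [min, max) of the shard key (_id).
const chunks = [
  { min: "a", max: "j", shard: "shard01" },
  { min: "j", max: "s", shard: "shard02" },
  { min: "s", max: "{", shard: "shard03" }, // "{" sorts just after "z"
];
const allShards = [...new Set(chunks.map((c) => c.shard))];

// Return the shards a query must visit.
function targetShards(query) {
  if (typeof query._id === "string") {
    const chunk = chunks.find((c) => query._id >= c.min && query._id < c.max);
    return chunk ? [chunk.shard] : [];
  }
  return allShards; // no shard key in the query: scatter gather
}

console.log(targetShards({ _id: "alvin" }));             // [ 'shard01' ]
console.log(targetShards({ email: "alvin@10gen.com" })); // all three shards
```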
  • 126. Lab #9 Multiple Identities User can have multiple identities • twitter name • email address • facebook name • etc. What is the best sharding key & schema design? Wednesday, December 5, 12
  • 127. Lab #9 - Solution A Multiple Identities // Shard by _id { _id: "alvin", email: "alvin@10gen.com", fb: "alvin.richards", // facebook li: "alvin.j.richards", // linkedin tweets: [ ... ] } Lookup by _id hits 1 node Lookup by email, li or fb is scatter gather Cannot create a unique index on email, li or fb Wednesday, December 5, 12
  • 128. Lab #9 - Solution B Multiple Identities identities { _id: { _id: "alvin"}, info: "1200-42"} { _id: { em: "alvin@10gen.com"}, info: "1200-42"} { _id: { li: "alvin.j.richards"}, info: "1200-42"} tweets { _id: "1200-42", tweets: [ ... ] } • Shard identities on { _id: 1} • Can create unique index on _id • Shard tweets on { _id: 1 }
  • 129. Sharding - Multiple Identities [Diagram: ids and tweets collections spread across shard01, shard02 and shard03; ids chunks are em: a-q, em: r-z, _id: a-z, li: a-c, li: d-r and li: s-z; tweets chunks are _id: "Min"-"1100", "1100"-"1200" and "1200"-"Max"]
  • 130. Sharding - Multiple Identities ids.find({ _id: { em: "alvin@10gen.com" } }) [Diagram: same chunk layout; the lookup is routed to the shard holding the matching em chunk]
  • 131. Sharding - Multiple Identities ids.find({ _id: { em: "alvin@10gen.com" } }) tweets.find({ _id: "1200-42" }) [Diagram: same chunk layout; the second lookup is routed to the shard holding the tweets chunk "1200"-"Max"]
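The two-step lookup of Solution B can be sketched with two in-memory maps standing in for the two collections. The data is taken from the slides; `findByIdentity` is a hypothetical helper, and map keys being unique mirrors the unique `_id` index.

```javascript
// In-memory stand-ins for the two collections.
const identities = new Map([
  [JSON.stringify({ _id: "alvin" }), "1200-42"],
  [JSON.stringify({ em: "alvin@10gen.com" }), "1200-42"],
  [JSON.stringify({ li: "alvin.j.richards" }), "1200-42"],
]);
const tweets = new Map([["1200-42", { _id: "1200-42", tweets: [] }]]);

// Step 1: resolve the identity to an info id (routed by the identities shard key).
// Step 2: fetch the user document (routed by the tweets shard key).
function findByIdentity(kind, value) {
  const info = identities.get(JSON.stringify({ [kind]: value }));
  return info === undefined ? null : tweets.get(info) || null;
}

console.log(findByIdentity("em", "alvin@10gen.com")._id); // 1200-42
console.log(findByIdentity("fb", "nobody"));              // null
```

Each step includes its collection's shard key, so both are routed queries; the cost is two round trips instead of one scatter gather.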
  • 132. Part Four Replication Wednesday, December 5, 12
  • 133. Types of outage • Planned • Hardware upgrade • O/S or file-system tuning • Relocation of data to new file-system / storage • Software upgrade • Unplanned • Hardware failure • Data center failure • Region outage • Human error • Application corruption Wednesday, December 5, 12
  • 134. Replica Sets • Data Protection • Multiple copies of the data • Spread across Data Centers, AZs • High Availability • Automated Failover • Automated Recovery Wednesday, December 5, 12
  • 135. Replica Sets [Diagram: App writes to the Primary and can read from the Primary and both Secondaries; replication to the Secondaries is asynchronous]
  • 136. Replica Sets [Diagram: App writes to the Primary; reads are served by the Primary and both Secondaries]
  • 137. Replica Sets [Diagram: automatic election of a new Primary]
  • 138. Replica Sets [Diagram: one member is recovering; the new Primary serves data]
  • 139. Replica Sets [Diagram: the recovered member rejoins as a Secondary]
  • 140. Elections During an election • Most up to date • Highest priority • Less than 10s behind failed Primary Wednesday, December 5, 12
  • 141. Types of Durability with MongoDB • Fire and forget • Wait for error • Wait for fsync • Wait for journal sync • Wait for replication Wednesday, December 5, 12
  • 142. Network Ack - Old Default [Diagram: Driver sends write to Primary; Primary applies in memory]
  • 143. Get last error - New default [Diagram: Driver sends write, then getLastError; Primary applies in memory]
  • 144. Wait for Journal Sync [Diagram: Driver sends write, then getLastError with j:true; Primary applies in memory and writes to journal]
  • 145. Wait for replication [Diagram: Driver sends write, then getLastError with w:2; Primary applies in memory and replicates to Secondary]
  • 146. Tunable Data Durability Memory Journal Secondary Other Data Center RDBMS network async ACK w=1 w=1 j=true sync w="majority" w=n w="myTag" Less More Wednesday, December 5, 12
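The durability levels listed on the preceding slides map onto the `w`, `j` and `fsync` options of getLastError. The sketch below expresses that mapping as plain option objects; the level names are labels invented here for illustration, and `writeConcernFor` is a hypothetical helper rather than a driver API.

```javascript
// Map a durability level (names are illustrative) to getLastError options.
function writeConcernFor(level) {
  switch (level) {
    case "fire-and-forget":      return { w: 0 };              // no acknowledgement
    case "wait-for-error":       return { w: 1 };              // applied in memory on the primary
    case "wait-for-fsync":       return { w: 1, fsync: true }; // flushed to data files
    case "wait-for-journal":     return { w: 1, j: true };     // written to the journal
    case "wait-for-replication": return { w: 2 };              // primary plus one secondary
    case "wait-for-majority":    return { w: "majority" };     // majority of the replica set
    default: throw new Error("unknown level: " + level);
  }
}

console.log(writeConcernFor("wait-for-journal")); // { w: 1, j: true }
```

Moving down the list trades write latency for a stronger guarantee that the write survives a failure.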
  • 147. Eventual Consistency Using Replicas for Reads Read  preference • primary (only) • primaryPreferred • secondary (only) • secondaryPreferred • nearest Wednesday, December 5, 12
  • 148. Immediate Consistency Thread #1 Primary Insert v1 Read ✔ Update v2 Read ✔ Wednesday, December 5, 12
  • 149. Eventual Consistency [Diagram: Thread #1 inserts v1 on the Primary and reads it back; Thread #2, reading from the Secondary, first finds that v1 does not exist, then reads v1; after Thread #1 updates to v2, Thread #2 still reads v1 until replication catches up, then reads v2]
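The stale reads in the eventual consistency timeline can be reproduced with a toy model. `ToyReplicaSet` is a hypothetical class: writes land on the primary at once and reach the secondary only when `replicate()` is called, which stands in for asynchronous replication.

```javascript
class ToyReplicaSet {
  constructor() {
    this.primary = new Map();
    this.secondary = new Map();
    this.oplog = []; // writes not yet applied on the secondary
  }

  write(key, value) {
    this.primary.set(key, value);
    this.oplog.push([key, value]);
  }

  // Apply all pending writes to the secondary, in order.
  replicate() {
    for (const [key, value] of this.oplog) this.secondary.set(key, value);
    this.oplog = [];
  }

  read(key, preference = "primary") {
    const node = preference === "secondary" ? this.secondary : this.primary;
    return node.get(key);
  }
}

const rs = new ToyReplicaSet();
rs.write("doc", "v1");
console.log(rs.read("doc"), rs.read("doc", "secondary")); // v1 undefined
rs.replicate();
rs.write("doc", "v2");
console.log(rs.read("doc"), rs.read("doc", "secondary")); // v2 v1
rs.replicate();
console.log(rs.read("doc", "secondary"));                 // v2
```

Reads from the primary always see the latest write; reads from the secondary are correct only up to the last replication point.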
  • 150. Lab #10 Replication Primary, Secondary or both? • Show the latest "votes" for a tweet and/or user • Changing your profile picture • Showing your thumbnail with a tweet Wednesday, December 5, 12
  • 151. Summary • Schema design is different in MongoDB • Basic data design principles stay the same • Focus on how the application manipulates data • Rapidly evolve schema to meet your requirements • Consider sharding early • Understand the impact of eventual consistency
  • 152. download at mongodb.org conferences,  appearances,  and  meetups http://www.10gen.com/events Facebook                    |                  Twitter                  |                  LinkedIn http://bit.ly/mongo>   @mongodb http://linkd.in/joinmongo Wednesday, December 5, 12