Migrating to MongoDB
Why we moved from MySQL to Mongo
Getting to know Mongo
Demo app using Mongo with PHP
Reasons we looked for an
alternative to our RDBMS setup
Issues with our RDBMS setup

Architecture was highly distributed; the number of
databases was becoming an issue
Storing similar objects with different structure
Options for scalability
Storing files
Many DBs
In a MySQL server (with MyISAM)...
  1 database = 1 directory
  1 table = more than 1 file in DB directory
The filesystem limits the number of entries per directory, and
that limit is not very high
We had a mix of MySQL and SQLite databases spread
across a directory hierarchy
Many DBs
In a Mongo server ...
  No 1:1 relation between databases and files
  Stores data in a set of files pre-allocated with increasing
  size
  Number of files grows as needed
Using many collections within a single database
allowed us to move everything into the DB server
A “collection”?

 RDBMS model:
   Database has tables which hold records
   Records in a table all have the same structure
 Document-oriented storage
   Database has collections which hold documents
Objects with differing structure

 For example, events where attributes vary based on
 type of event
   Event A: from, att1
   Event B: from, att1, att2
   Event C: from, att3, att4
 What’s your schema for this?
tbl_events_A
id   from   Att1
1    Jim    1237
2    Dave   362
3    Bob    9283

tbl_events_B
id   from   Att1    Att2
1    Bill   2938    23
2    Jim    632     9
3    Hugh   12832   14

tbl_events_C
id   from   Att3      Att4
1    Bob    hello     7249
2    Bill   goodbye   23091
3    Jim    testing   2334
tbl_events
id   type   from   Att1     Att2    Att3     Att4
1     A     Jim    1237    NULL     NULL     NULL
2     A     Dave   362     NULL     NULL     NULL
3     B     Bill   2938     23      NULL     NULL
4     C     Bob    NULL    NULL     hello    7249
5     A     Bob    9283    NULL     NULL     NULL
6     C     Bill   NULL    NULL    goodbye   23091
7     B     Jim    632       9      NULL     NULL
8     B     Hugh   12832    14      NULL     NULL
9     C     Jim    NULL    NULL    testing   2334
tbl_events
id   type   from                    Attributes
1     A     Jim                  “{‘att1’:1237}”
2     A     Dave                  “{‘att1’:362}”
3     B     Bill            “{‘att1’:2938, ‘att2’:23}”
4     C     Bob           “{‘att3’:‘hello’, ‘att4’:7249}”
5     A     Bob                  “{‘att1’:9283}”
6     C     Bill        “{‘att3’:‘goodbye’, ‘att4’:23091}”
7     B     Jim              “{‘att1’:632, ‘att2’:9}”
8     B     Hugh           “{‘att1’:12832, ‘att2’:14}”
9     C     Jim          “{‘att3’:‘testing’, ‘att4’:2334}”
tbl_events
id   type   from
1     A     Jim
2     A     Dave
3     B     Bill
4     C     Bob
5     A     Bob
6     C     Bill
7     B     Jim
8     B     Hugh
9     C     Jim

tbl_events_attributes
id   eventId   name   value
1    1         att1   1237
2    2         att1   362
3    3         att1   2938
4    3         att2   23
5    4         att3   hello
6    4         att4   7249
7    5         att1   9283
8    6         att3   goodbye
9    6         att4   23091
10   7         att1   632
11   7         att2   9
...
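The attribute-table layout above pushes work onto the application: every read has to regroup flat rows into one object per event. The sketch below illustrates that step in plain JavaScript, with no database involved. The row shape (the result of joining the two tables) and the helper name `regroupEvents` are assumptions made for illustration, as is the idea that the generic `value` column stores everything as strings.

```javascript
// Rows as a join of tbl_events and tbl_events_attributes might return them.
// Assumption: the generic "value" column holds strings, so numbers lose their type.
const rows = [
  { eventId: 3, type: "B", from: "Bill", name: "att1", value: "2938" },
  { eventId: 3, type: "B", from: "Bill", name: "att2", value: "23" },
  { eventId: 4, type: "C", from: "Bob", name: "att3", value: "hello" },
  { eventId: 4, type: "C", from: "Bob", name: "att4", value: "7249" },
];

// Regroup the flat rows into one object per event (hypothetical helper).
function regroupEvents(joinedRows) {
  const byId = new Map();
  for (const row of joinedRows) {
    if (!byId.has(row.eventId)) {
      byId.set(row.eventId, { id: row.eventId, type: row.type, from: row.from });
    }
    // Each attribute row becomes one property on the event object.
    byId.get(row.eventId)[row.name] = row.value;
  }
  return Array.from(byId.values());
}

const events = regroupEvents(rows);
console.log(events);
```

The regrouped objects end up looking like documents, which is the shape a document store keeps natively.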
Objects with differing structure

 Document-oriented storage like Mongo is schema-less
   1 collection for all events
   Each document has the structure applicable for its
   type
   Can index common attributes for queries
events collection :

{id:1,   type:'A',   from:'Jim', att1:1237}
{id:2,   type:'A',   from:'Dave', att1:362}
{id:5,   type:'A',   from:'Bob', att1:9283}
{id:3,   type:'B',   from:'Bill', att1:2938, att2:23}
{id:7,   type:'B',   from:'Jim', att1:632, att2:9}
{id:8,   type:'B',   from:'Hugh', att1:12832, att2:14}
{id:4,   type:'C',   from:'Bob', att3:'hello', att4:7249}
{id:6,   type:'C',   from:'Bill', att3:'goodbye', att4:23091}
{id:9,   type:'C',   from:'Jim', att3:'testing', att4:2334}
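To make the "one collection, differing structures" idea concrete, here is a minimal sketch in plain JavaScript that treats an array as a stand-in for the events collection. It is not MongoDB code; the sample documents only follow the shape of the events above, and the filters are ordinary array operations.

```javascript
// An array standing in for the events collection: each document carries
// only the attributes that apply to its type.
const events = [
  { id: 1, type: "A", from: "Jim", att1: 1237 },
  { id: 3, type: "B", from: "Bill", att1: 2938, att2: 23 },
  { id: 4, type: "C", from: "Bob", att3: "hello", att4: 7249 },
  { id: 7, type: "B", from: "Jim", att1: 632, att2: 9 },
];

// Query on an attribute shared by every type.
const fromJim = events.filter((e) => e.from === "Jim");

// Query on an attribute only some types have: documents that lack it
// simply do not match, no NULL columns needed.
const bigAtt1 = events.filter((e) => e.att1 !== undefined && e.att1 > 1000);

console.log(fromJim.map((e) => e.id)); // [ 1, 7 ]
console.log(bigAtt1.map((e) => e.id)); // [ 1, 3 ]
```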
Options for scalability


 MySQL - Master-slave replication
 Mongo - Supports master-slave, replica pairs, master-master
 and ... auto-sharding
Storing files

 In MySQL, you can use a table with a BLOB field and
 other fields for file metadata
 Mongo has GridFS
   Built for storage of large objects
   Split into chunks, also stores metadata
> db.fs.files.findOne();
{
    "_id" : ObjectId("4b9525096b00bd59b95f791f"),
    "filename" : "user.png",
    "length" : 43717,
    "chunkSize" : 262144,
    "uploadDate" : "Mon Mar 08 2010 11:25:45 GMT-0500 (EST)",
    "md5" : "3f6fcd4c0a51655d392fe95a99c29140",
    "mimeType" : "image/png"
}
> db.fs.chunks.findOne();
{
    "_id" : ObjectId("4b952509c568bb9fc8e3cddb"),
    "files_id" : ObjectId("4b9525096b00bd59b95f791f"),
    "n" : 0,
    "data" : BinData type: 2 len: 43721
}
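The chunking idea can be sketched without a database: cut the file's bytes into pieces of `chunkSize` and number them with `n`, each pointing back to its file through `files_id`. The function name `toChunks` and the string ids below are hypothetical; real GridFS drivers do this for you.

```javascript
// Split a file's bytes into GridFS-style chunk documents (illustrative only).
function toChunks(filesId, data, chunkSize) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n++) {
    // subarray clamps the end index, so the last chunk may be shorter.
    chunks.push({ files_id: filesId, n: n, data: data.subarray(offset, offset + chunkSize) });
  }
  return chunks;
}

const chunkSize = 262144; // 256 KiB, the chunkSize shown in the fs.files document
const small = toChunks("file-1", new Uint8Array(43717), chunkSize);
const large = toChunks("file-2", new Uint8Array(600000), chunkSize);

console.log(small.length); // 1
console.log(large.length); // 3
console.log(large[2].data.length); // 75712
```

A 43,717-byte file such as user.png fits in a single chunk, which is why the shell output shows only `n : 0`.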
Getting to know MongoDB
Basic concepts
A database has collections which hold documents
Documents in a collection can have any structure
Documents are JSON objects, stored as BSON
Data types:
  all basic JSON types: string, integer, boolean,
  double, null, array, object
  Special types: date, object id, binary, regexp, code
Important differences

 Collections instead of tables
 ObjectID instead of primary keys
 References instead of foreign keys
 JavaScript code execution instead of stored
 procedures
 [NULL] instead of joins
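One practical difference between an ObjectID and an auto-increment primary key is that the ObjectID embeds its creation time. The sketch below assumes the standard layout in which the first 4 bytes (8 hex characters) are a big-endian Unix timestamp in seconds; the helper name `objectIdToDate` is made up for illustration. It decodes the `_id` of the `fs.files` document from the GridFS example.

```javascript
// Assumption: the first 8 hex characters of an ObjectId encode the creation
// time as seconds since the Unix epoch.
function objectIdToDate(hex) {
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdToDate("4b9525096b00bd59b95f791f");
console.log(created.toISOString()); // 2010-03-08T16:25:45.000Z
```

The decoded time is the same instant as the `uploadDate` of that file (11:25:45 GMT-0500).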
Inserting data
> doc = { author: 'joe',
  created : new Date('03-28-2009'),
  title : 'Yet another blog post',
  text : 'Here is the text...',
  tags : [ 'example', 'joe' ],
  comments : [
    { author: 'jim', comment: 'I disagree' },
    { author: 'nancy', comment: 'Good post' }
  ]
}
> db.posts.insert(doc);
Querying data
> db.posts.find();
> db.posts.find({'author':'joe'});
> db.posts.find({'comments.author':'nancy'});
> db.posts.find({'comments.comment': /disagree/i });

> db.posts.findOne({'comments.author':'nancy'});
> db.posts.find({'comments.author':'nancy'}).limit(5);

> db.posts.find({},{'author':true, 'tags':true});

> db.posts.find({'author':'nancy'}).sort({'created':1});
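Dotted keys such as `comments.author` reach into embedded documents and fan out over arrays. The following plain-JavaScript sketch emulates that matching for equality and regular expressions only; `valuesAt` and `matches` are hypothetical helpers, not part of any MongoDB API, and the real query engine handles many more cases.

```javascript
// Resolve a dotted path against a document, fanning out over arrays,
// and return every value reached.
function valuesAt(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const item of current) {
      const targets = Array.isArray(item) ? item : [item];
      for (const target of targets) {
        if (target !== null && typeof target === "object" && key in target) {
          next.push(target[key]);
        }
      }
    }
    current = next;
  }
  // A final array value (such as tags) matches on any of its elements.
  return current.flatMap((v) => (Array.isArray(v) ? v : [v]));
}

// True when any value at the path equals `wanted` or matches it as a RegExp.
function matches(doc, path, wanted) {
  return valuesAt(doc, path).some((v) =>
    wanted instanceof RegExp ? wanted.test(String(v)) : v === wanted
  );
}

const post = {
  author: "joe",
  tags: ["example", "joe"],
  comments: [
    { author: "jim", comment: "I disagree" },
    { author: "nancy", comment: "Good post" },
  ],
};

console.log(matches(post, "comments.author", "nancy")); // true
console.log(matches(post, "comments.comment", /disagree/i)); // true
console.log(matches(post, "tags", "example")); // true
console.log(matches(post, "comment.author", "nancy")); // false: there is no "comment" field
```

The last line shows why a misspelled path returns nothing rather than an error: with no schema, an unknown field is simply a field no document has.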
Querying - advanced features
  Support for OR conditions
  $ modifiers to introduce conditions
> db.posts.find({timestamp: {$gte:1268149684}});

  $where modifiers
> db.pictures.find({$where: function() {
    return this.creationTimestamp >= 1268149684; }});

  MapReduce
  Server-side code execution
> function getUniques() {
...   var uniques = [];
...   db.pictures.find({},{tags:true}).forEach(function(pic) {
...     pic.tags.forEach(function(tag) {
...       if (uniques.indexOf(tag) == -1) uniques.push(tag);
...     });
...   });
...   return uniques;
... }
> db.eval(getUniques);
[
    "firstTag",
    "thirdTag",
    "toto",
    "test",
    "comic",
    "secondTag"
]
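The MapReduce bullet above has no example on the slides, so here is a minimal sketch of the pattern in plain JavaScript: a map step emits key/value pairs, values are grouped by key, and a reduce step folds each group. The `mapReduce` helper, its callback signatures and the sample pictures are assumptions for illustration; MongoDB's own map functions use `this` and a global `emit` instead.

```javascript
// Tiny in-memory map-reduce: group emitted values by key, then reduce each group.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map(doc, emit));
  const result = {};
  for (const [key, values] of groups) result[key] = reduce(key, values);
  return result;
}

// Hypothetical sample data in the shape of the pictures collection.
const pictures = [
  { title: "a", tags: ["firstTag", "comic"] },
  { title: "b", tags: ["comic", "test"] },
  { title: "c", tags: ["comic"] },
];

// Count how many pictures carry each tag.
const tagCounts = mapReduce(
  pictures,
  (pic, emit) => pic.tags.forEach((tag) => emit(tag, 1)),
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);

console.log(tagCounts); // { firstTag: 1, comic: 3, test: 1 }
```

Compared with collecting unique tags in a loop, this form also yields the counts and lets the grouping work be distributed.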
Updating data
update( criteria, objNew, upsert, multi )
> db.myColl.update( { name: "Joe" }, { name: "Joe", age:
20 }, true, false );


save(object) - insert or update if _id exists
Update modifier operators

  $inc, $set, $unset, $push, $pushAll, $addToSet, $pop,
  $pull, $pullAll
> db.myColl.update({name:"Joe"}, { $set:{age:20}});

> db.posts.update({author:"Joe"},{$push:{tags:'hockey'}});

> db.posts.update({},{$addToSet:{tags:'hockey'}});
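The difference between passing a plain object and passing modifier operators is easy to miss: a plain object replaces the whole document (keeping `_id`), while modifiers change only the named fields. The sketch below emulates four of the operators on an in-memory object. `applyUpdate` is a hypothetical helper, the sample document is made up, and the emulation covers only top-level fields with primitive values.

```javascript
// Emulate a small subset of update semantics on a single in-memory document.
function applyUpdate(doc, update) {
  const usesModifiers = Object.keys(update).some((k) => k.startsWith("$"));
  if (!usesModifiers) {
    // Plain object: the document is replaced, only _id survives.
    return { _id: doc._id, ...update };
  }
  const result = { ...doc };
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount;
  }
  for (const [field, value] of Object.entries(update.$push || {})) {
    result[field] = [...(result[field] || []), value]; // always appends
  }
  for (const [field, value] of Object.entries(update.$addToSet || {})) {
    const current = result[field] || [];
    // Appends only when the value is not already present (primitives only here).
    result[field] = current.includes(value) ? current : [...current, value];
  }
  return result;
}

const joe = { _id: 1, name: "Joe", city: "Montreal", tags: ["hockey"] };

const replaced = applyUpdate(joe, { name: "Joe", age: 20 });
const patched = applyUpdate(joe, { $set: { age: 20 } });
const pushed = applyUpdate(joe, { $push: { tags: "hockey" } });
const added = applyUpdate(joe, { $addToSet: { tags: "hockey" } });

console.log(replaced); // { _id: 1, name: 'Joe', age: 20 }  (city and tags are gone)
console.log(patched.city); // Montreal
console.log(pushed.tags); // [ 'hockey', 'hockey' ]
console.log(added.tags); // [ 'hockey' ]
```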
Removing data
> db.things.remove({});    // removes all
> db.things.remove({n:1}); // removes all where n == 1
> db.things.remove({_id: myobject._id});
References
> p = db.postings.findOne();
{
    "_id" : ObjectId("4b866f08234ae01d21d89604"),
    "author" : "jim",
    "title" : "Brewing Methods"
}
> // get more info on author
> db.users.findOne( { _id : p.author } )
{ "_id" : "jim", "email" : "jim@gmail.com" }
>   x = { name : 'Biology' }
{   "name" : "Biology" }
>   db.courses.save(x)
>   x
{   "name" : "Biology", "_id" : ObjectId("4b0552b0f0da7d1eb6f126a1") }

> stu = { name : 'Joe', classes : [ new DBRef('courses', x._id) ] }
> db.students.save(stu)
> stu
{
        "name" : "Joe",
        "classes" : [
                 {
                        "$ref" : "courses",
                        "$id" : ObjectId("4b0552b0f0da7d1eb6f126a1")
                 }
        ],
        "_id" : ObjectId("4b0552e4f0da7d1eb6f126a2")
}
> stu.classes[0]
{ "$ref" : "courses", "$id" : ObjectId("4b0552b0f0da7d1eb6f126a1") }

> stu.classes[0].fetch()
{ "_id" : ObjectId("4b0552b0f0da7d1eb6f126a1"), "name" : "Biology" }
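Since there are no joins, following a reference always means a second lookup made by the client. The sketch below shows that step with in-memory Maps standing in for collections; `fetchRef`, the `db` object and the short string ids are all hypothetical, chosen only to mirror the `$ref` / `$id` shape shown above.

```javascript
// In-memory stand-in for a database: collection name -> Map of _id -> document.
const db = {
  courses: new Map([["c1", { _id: "c1", name: "Biology" }]]),
  students: new Map([
    ["s1", { _id: "s1", name: "Joe", classes: [{ $ref: "courses", $id: "c1" }] }],
  ]),
};

// Follow a DBRef-style reference with a second lookup; null when nothing is found.
function fetchRef(database, ref) {
  const collection = database[ref.$ref];
  return collection ? collection.get(ref.$id) || null : null;
}

const stu = db.students.get("s1");
const classes = stu.classes.map((ref) => fetchRef(db, ref));

console.log(classes); // [ { _id: 'c1', name: 'Biology' } ]
```

Nothing enforces that the referenced document exists, so callers need to handle a `null` result, unlike a foreign key constraint in an RDBMS.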
Limitations to keep in mind


 Namespace limit (about 24,000 collections and indexes per database)
 Database size is capped at about 2 GB on 32-bit systems ... use
 a 64-bit production system!
Licensing

   MongoDB is licensed under GNU AGPL v3.0; supported drivers are
   under the Apache License v2.0
   From www.mongodb.org/display/DOCS/Licensing :
If you are using a vanilla MongoDB server from either source or binary packages you
have NO obligations. You can ignore the rest of this page.
Hands-on example
SQL schema
pictures
  pictureId           int
  title               varchar
  creationTimestamp   int
  content             blob

tags
  pictureId           int
  tag                 varchar

users
  userId              int
  name                varchar

comments
  pictureId           int
  userId              int
  txt                 varchar
  creationTimestamp   int
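As a rough idea of where such a schema could end up in a document store, the sketch below folds the tags and comments tables into a single picture document and keeps users separate. This layout is an assumption made for illustration, not the demo code from the talk; all sample values and the `commenterNames` helper are hypothetical, and the binary `content` is left out because it would more naturally live in GridFS.

```javascript
// One possible document layout: tags and comments embedded in the picture.
const picture = {
  title: "Sunset",
  creationTimestamp: 1268149684,
  tags: ["firstTag", "comic"],
  comments: [
    { userId: 1, txt: "Nice shot", creationTimestamp: 1268150000 },
    { userId: 2, txt: "Agreed", creationTimestamp: 1268150100 },
  ],
};

// Users stay in their own collection and are referenced by userId.
const users = [
  { userId: 1, name: "jim" },
  { userId: 2, name: "nancy" },
];

// Resolve commenter names with a client-side lookup instead of a join.
function commenterNames(pic, allUsers) {
  const nameById = new Map(allUsers.map((u) => [u.userId, u.name]));
  return pic.comments.map((c) => nameById.get(c.userId));
}

console.log(commenterNames(picture, users)); // [ 'jim', 'nancy' ]
```

Reading a picture with its tags and comments then takes one query instead of three joined tables, at the cost of resolving user names separately.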
let’s see some code ...

ConFoo - Migrating To Mongo Db
