
MongoDB - A Document NoSQL Database


  1. MongoDB: A NoSQL Document-Oriented Database
  2. Agenda
     ● Relational DBs
     ● NoSQL
       – What, Why
       – Types
       – History
       – Features
     ● MongoDB
       – Indexes
       – Replication
       – Sharding
       – Querying
       – Mapping
       – MapReduce
     ● Use Case: RealNetworks
  3. Relational DBs
     ● Born in the 70s
       – storage is expensive
       – schemas are simple
     ● Based on the Relational Model
       – mathematical model for describing data structure
       – data represented in "tuples", grouped into "relations"
     ● Queries based on Relational Algebra
       – union, intersection, difference, Cartesian product, selection, projection, join, division
     ● Constraints
       – Foreign Keys, Primary Keys, Indexes
       – Domain Integrity (data types)
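Three of the relational operators listed above (selection, projection and join) can be illustrated on plain arrays. A minimal sketch, assuming tuples are modelled as JavaScript objects and relations as arrays; the `users`/`orders` data and the helpers `select`, `project` and `join` are hypothetical, not part of any database API.

```javascript
// Relations as arrays of tuples (plain objects); the sample data is hypothetical.
const users = [
  { user_id: "1", name: "ruben" },
  { user_id: "2", name: "harald" },
];
const orders = [
  { order_id: "a", user_id: "1" },
  { order_id: "b", user_id: "2" },
  { order_id: "c", user_id: "1" },
];

// Selection: keep the tuples that satisfy a predicate.
const select = (rel, pred) => rel.filter(pred);

// Projection: keep only the named attributes and drop duplicate tuples (relations are sets).
const project = (rel, attrs) => {
  const seen = new Set();
  const out = [];
  for (const t of rel) {
    const p = Object.fromEntries(attrs.map((a) => [a, t[a]]));
    const key = JSON.stringify(p);
    if (!seen.has(key)) {
      seen.add(key);
      out.push(p);
    }
  }
  return out;
};

// Equi-join on one shared attribute, implemented as a nested-loop join.
const join = (left, right, attr) =>
  left.flatMap((l) =>
    right.filter((r) => r[attr] === l[attr]).map((r) => ({ ...l, ...r }))
  );

// Which user placed order "c"?
const result = project(
  select(join(users, orders, "user_id"), (t) => t.order_id === "c"),
  ["name"]
);
console.log(result); // [ { name: 'ruben' } ]
```

An RDBMS answers the same question with `SELECT name FROM users JOIN orders USING (user_id) WHERE order_id = 'c'` and picks a better join algorithm than the nested loop used here.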
  4. Joins
  5. Relational DBs
     ● Normalization
       – minimize redundancy
       – avoid duplication
  6. Normalization
  7. Relational DBs - Transactions
     ● Atomicity
       – if one part of the transaction fails, the whole transaction fails
     ● Consistency
       – a transaction leaves the DB in a valid state
     ● Isolation
       – one transaction doesn't see the intermediate state of another
     ● Durability
       – once committed, a transaction is persisted
  8. Relational DBs - Use
  9. NoSQL – Why?
     ● Web 2.0
       – huge data volumes
       – need for speed
       – accessibility
     ● RDBMSs are difficult to scale
     ● Storage gets cheap
     ● Commodity machines get cheap
  10. NoSQL – What?
     ● Simple storage of data
     ● Looser consistency model (eventual consistency), in order to achieve:
       – higher availability
       – horizontal scaling
     ● No JOINs
     ● Optimized for big data, when no relational features are needed
  11. Vertical Scale vs. Horizontal Scale
     ● Vertical: a bigger machine (more CPU, RAM, disk)
     ● Horizontal: more (commodity) machines
  12. Vertical Scale vs. Horizontal Scale
     ● Horizontal scaling enforces parallel computing
  13. Eventual Consistency
     ● RDBMS: all users see a consistent view of the data
     ● ACID gets difficult when distributing data across nodes
     ● Eventual consistency: inconsistencies are transitory. The DB may be inconsistent at a given point in time, but it will eventually become consistent.
     ● BASE (in contrast to ACID)
       – Basically Available, Soft state, Eventually consistent
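How transitory inconsistencies disappear can be shown with three replicas that accept writes locally and later exchange state. This is a sketch of one generic technique, last-write-wins merging with hypothetical logical timestamps; it is not how MongoDB replicates (MongoDB secondaries replay the primary's oplog). The `Replica` class and its methods are illustrative names only.

```javascript
// Each replica stores key -> { value, ts }; timestamps are hypothetical logical clocks.
class Replica {
  constructor() {
    this.data = new Map();
  }
  // Local write: accepted immediately (availability), not yet seen by other replicas.
  write(key, value, ts) {
    this.merge(key, { value, ts });
  }
  read(key) {
    const entry = this.data.get(key);
    return entry === undefined ? undefined : entry.value;
  }
  // Last-write-wins: keep whichever entry carries the higher timestamp.
  merge(key, incoming) {
    const current = this.data.get(key);
    if (current === undefined || incoming.ts > current.ts) {
      this.data.set(key, incoming);
    }
  }
  // Anti-entropy step: push every local entry to another replica.
  syncTo(other) {
    for (const [key, entry] of this.data) other.merge(key, entry);
  }
}

const a = new Replica();
const b = new Replica();
const c = new Replica();

a.write("age", 36, 1);
b.write("age", 24, 2); // a later, concurrent write on another node

// Soft state: the replicas currently disagree.
console.log(a.read("age"), b.read("age"), c.read("age")); // 36 24 undefined

// One round of pairwise synchronisation.
const replicas = [a, b, c];
for (const x of replicas) for (const y of replicas) if (x !== y) x.syncTo(y);

// Eventually consistent: every replica now returns the same value.
console.log(a.read("age"), b.read("age"), c.read("age")); // 24 24 24
```

Reads between the writes and the synchronisation may return stale data; that window is exactly what BASE accepts in exchange for availability.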
  14. CAP Theorem
     ● Consistency: all nodes see the same data at the same time
     ● Availability: requests always get an immediate response
     ● Partition tolerance: the system continues to work, even if a part of it (e.g. the network between nodes) breaks
     ● A distributed system can guarantee at most two of the three
  15. NoSQL - History
     ● Term first used in 1998 by C. Strozzi to name his relational DB that didn't use SQL
     ● Term reused in 2009 by E. Evans to name the distributed DBs that didn't provide ACID
     ● Some people interpret it as "Not Only SQL"
     ● Should actually be called "NoRel" (non-relational)
  16. NoSQL – Some Features
     ● Auto-Sharding
     ● Replication
     ● Caching
     ● Dynamic Schema
  17. NoSQL - Types
     ● Document
       – "map" key-value, with a "document" (XML, JSON, PDF, ...) as value
       – MongoDB, CouchDB
     ● Key-Value
       – "map" key-value, with an "object" (Integer, String, Order, ...) as value
       – Cassandra, Dynamo, Voldemort
     ● Graph
       – data stored in a graph structure: nodes have pointers to adjacent ones
       – Neo4j
  18. MongoDB
     ● Open-source NoSQL document DB written in C++
     ● Started in 2009
     ● Commercial support by 10gen
     ● Name from "humongous" (huge)
     ● http://www.mongodb.org/
  19. MongoDB – Document Oriented
     ● No fixed document structure: schemaless
     ● Atomicity only at document level (no transactions across documents)
     ● Normalization is not easy to achieve:
       – Embed: +duplication, +performance
       – Reference: -duplication, +roundtrips
  20. MongoDB
     ● > db.users.save({ name: "ruben", surname: "inoto", age: 36 })
     ● > db.users.find()
       – { "_id" : ObjectId("519a3dd65f03c7847ca5f560"), "name" : "ruben", "surname" : "inoto", "age" : 36 }
     ● > db.users.update({ name: "ruben" }, { $set: { age: 24 } })
     ● Documents are stored in BSON (binary JSON) format
  21. MongoDB - Querying
     ● find(): returns a cursor over the matching documents
       – All users
         db.users.find()
       – User with id 42
         db.users.find({ _id: 42 })
       – Age between 20 and 30 (exclusive)
         db.users.find({ age: { $gt: 20, $lt: 30 } })
       – Subdocuments: ZIP 5026 (dotted field names must be quoted)
         db.users.find({ "address.zip": "5026" })
       – OR: named ruben or younger than 30
         db.users.find({ $or: [ { name: "ruben" }, { age: { $lt: 30 } } ] })
       – Projection: return only name and age (plus _id)
         db.users.find({}, { name: 1, age: 1 })
     ● Sample document:
       { "_id": 42, "name": "ruben", "surname": "inoto", "age": 36, "address": { "street": "Glaserstraße", "zip": "5026" } }
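To make the query semantics above concrete, here is a toy in-memory matcher for the same query shapes: equality, `$gt`/`$lt`, dotted paths, `$or` and an inclusion projection. It is a simplified sketch, not MongoDB's query engine: array matching, nested-document equality and `_id: 0` exclusion are not handled, and the second user document and the helper names (`getPath`, `matches`, `find`, ...) are hypothetical.

```javascript
// Hypothetical sample collection (only the first document comes from the slide).
const users = [
  { _id: 42, name: "ruben", surname: "inoto", age: 36,
    address: { street: "Glaserstraße", zip: "5026" } },
  { _id: 43, name: "harald", age: 25, address: { zip: "1010" } },
];

// Resolve a dotted path such as "address.zip" inside a nested document.
function getPath(doc, path) {
  return path.split(".").reduce((v, k) => (v == null ? undefined : v[k]), doc);
}

// Range operators only compare values of the same JavaScript type in this sketch.
const sameType = (a, b) => a != null && typeof a === typeof b;

// One field condition: an operator object ({ $gt: 20 }) or a literal (equality).
function matchCondition(value, cond) {
  if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
    return Object.entries(cond).every(([op, arg]) => {
      switch (op) {
        case "$gt": return sameType(value, arg) && value > arg;
        case "$lt": return sameType(value, arg) && value < arg;
        case "$gte": return sameType(value, arg) && value >= arg;
        case "$lte": return sameType(value, arg) && value <= arg;
        default: throw new Error(`unsupported operator ${op}`);
      }
    });
  }
  return value === cond;
}

// All top-level conditions must hold (implicit AND); $or needs at least one branch.
function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) =>
    key === "$or" ? cond.some((q) => matches(doc, q)) : matchCondition(getPath(doc, key), cond)
  );
}

// Inclusion projection on top-level fields; _id is always kept in this sketch.
function project(doc, fields) {
  const keys = Object.keys(fields);
  if (keys.length === 0) return doc;
  const out = { _id: doc._id };
  for (const k of keys) if (fields[k] && k in doc) out[k] = doc[k];
  return out;
}

function find(collection, query = {}, fields = {}) {
  return collection.filter((d) => matches(d, query)).map((d) => project(d, fields));
}

console.log(find(users, { age: { $gt: 20, $lt: 30 } }).map((d) => d._id)); // [ 43 ]
console.log(find(users, { "address.zip": "5026" }).map((d) => d._id)); // [ 42 ]
console.log(find(users, { $or: [{ name: "ruben" }, { age: { $lt: 30 } }] }).length); // 2
console.log(find(users, {}, { name: 1, age: 1 })[0]); // { _id: 42, name: 'ruben', age: 36 }
```

The type check in `sameType` mirrors why the slide's original document, which stored `"age" : "36"` as a string, would not be returned by a numeric range query such as `{ age: { $gt: 20 } }`.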
  22. MongoDB - Saving
     ● Insert
       – db.test.save({ _id: "42", name: "ruben" })
     ● Update
       – db.test.update({ _id: "42" }, { name: "harald" })
       – db.test.update({ _id: "42" }, { name: "harald", age: 39 })
       – without update operators, update() replaces the whole document (all fields except _id)
     ● Atomic operators ($inc)
       – db.test.update({ _id: "42" }, { $inc: { age: 1 } })
     ● Arrays
       – { _id: "48", name: "david", hobbies: [ "bike", "judo" ] }
       – Add an element to an array atomically ($push)
         db.test.update({ _id: "48" }, { $push: { hobbies: "swimming" } })
       – $each, $pop, $pull, $addToSet, ...
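The difference between a replacement update and an operator update is easy to miss. The following sketch applies update documents to an in-memory copy so the effect of each form is visible; `applyUpdate` is a hypothetical helper that only models the field-level result on top-level fields. It says nothing about atomicity, which MongoDB guarantees on the server per document.

```javascript
// Apply a MongoDB-style update document to a copy of `doc` (top-level fields only).
function applyUpdate(doc, update) {
  const keys = Object.keys(update);
  const operators = keys.filter((k) => k.startsWith("$"));
  if (operators.length === 0) {
    // Replacement document: every old field except _id is discarded.
    return { _id: doc._id, ...update };
  }
  if (operators.length !== keys.length) {
    throw new Error("cannot mix update operators and plain fields");
  }
  const out = JSON.parse(JSON.stringify(doc)); // deep copy of plain JSON data
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, arg] of Object.entries(fields)) {
      switch (op) {
        case "$set":
          out[field] = arg;
          break;
        case "$inc":
          out[field] = (out[field] ?? 0) + arg;
          break;
        case "$push":
          out[field] = [...(out[field] ?? []), arg];
          break;
        case "$addToSet": // append only if the value is not already present
          if (!(out[field] ?? []).includes(arg)) out[field] = [...(out[field] ?? []), arg];
          break;
        case "$pull": // remove every element equal to the value
          if (Array.isArray(out[field])) out[field] = out[field].filter((x) => x !== arg);
          break;
        default:
          throw new Error(`unsupported operator ${op}`);
      }
    }
  }
  return out;
}

const ruben = { _id: "42", name: "ruben", age: 36 };
console.log(applyUpdate(ruben, { name: "harald" })); // { _id: '42', name: 'harald' }
console.log(applyUpdate(ruben, { $set: { name: "harald" } })); // { _id: '42', name: 'harald', age: 36 }
console.log(applyUpdate(ruben, { $inc: { age: 1 } }).age); // 37

const david = { _id: "48", name: "david", hobbies: ["bike", "judo"] };
const pushed = applyUpdate(david, { $push: { hobbies: "swimming" } });
console.log(pushed.hobbies); // [ 'bike', 'judo', 'swimming' ]
console.log(applyUpdate(pushed, { $addToSet: { hobbies: "judo" } }).hobbies.length); // 3
```

The first call shows why `db.test.update({ _id: "42" }, { name: "harald" })` loses the `age` field, while `$set` keeps it.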
  23. MongoDB - Delete
     ● db.test.remove({ _id: "42" })
  24. MongoDB – Indexes
     ● Indexes on any attribute
       – > db.users.ensureIndex({ age: 1 })
     ● Compound indexes
       – > db.users.ensureIndex({ age: 1, name: 1 })
     ● Unique indexes
       – > db.users.ensureIndex({ name: 1 }, { unique: true })
     ● v2.4 and later → text indexing (search)
     ● From MongoDB 3.0 on, ensureIndex() is superseded by createIndex()
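A compound index such as `{ age: 1, name: 1 }` keeps its entries ordered by `age` first and by `name` within equal ages, which is why a range query on the leading field can be answered by a binary search followed by a short scan. The sketch below models the index as a sorted array; it is a simplification under that assumption (MongoDB uses B-trees), and `buildIndex`, `rangeOnPrefix` and the sample users are hypothetical.

```javascript
// Compare two compound keys field by field; directions: 1 = ascending, -1 = descending.
function compareKeys(a, b, directions) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -directions[i];
    if (a[i] > b[i]) return directions[i];
  }
  return 0;
}

// Model a compound index as a sorted array of { key: [...], id } entries.
function buildIndex(docs, spec) {
  const fields = Object.keys(spec); // e.g. ["age", "name"]
  const directions = Object.values(spec);
  const entries = docs.map((d) => ({ key: fields.map((f) => d[f]), id: d._id }));
  entries.sort((x, y) => compareKeys(x.key, y.key, directions));
  return { fields, directions, entries };
}

// Range (low, high), both exclusive, on the leading field; assumes it is ascending.
function rangeOnPrefix(index, low, high) {
  const { entries } = index;
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) { // binary search for the first entry with key[0] > low
    const mid = (lo + hi) >> 1;
    if (entries[mid].key[0] <= low) lo = mid + 1;
    else hi = mid;
  }
  const ids = [];
  for (let i = lo; i < entries.length && entries[i].key[0] < high; i++) {
    ids.push(entries[i].id);
  }
  return ids;
}

const users = [
  { _id: 1, name: "ruben", age: 36 },
  { _id: 2, name: "harald", age: 25 },
  { _id: 3, name: "david", age: 25 },
  { _id: 4, name: "anna", age: 41 },
];
const idx = buildIndex(users, { age: 1, name: 1 });
console.log(idx.entries.map((e) => e.key.join(":")).join(" ")); // 25:david 25:harald 36:ruben 41:anna
// Same result set as db.users.find({ age: { $gt: 20, $lt: 30 } }), read in index order:
console.log(rangeOnPrefix(idx, 20, 30)); // [ 3, 2 ]
```

Because the entries are not ordered by `name` alone, a query on `name` only cannot use this index for a range scan; field order in a compound index matters.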
  25. SQL → Mongo Mapping (I)
     SQL Statement                                     | Mongo Query Language
     CREATE TABLE users (a Number, b Number)           | implicit (created on first insert)
     INSERT INTO users VALUES (1, 1)                   | db.users.insert({ a: 1, b: 1 })
     SELECT a, b FROM users                            | db.users.find({}, { a: 1, b: 1 })
     SELECT * FROM users                               | db.users.find()
     SELECT * FROM users WHERE age = 33                | db.users.find({ age: 33 })
     SELECT * FROM users WHERE age = 33 ORDER BY name  | db.users.find({ age: 33 }).sort({ name: 1 })
  26. SQL → Mongo Mapping (II)
     SQL Statement                                     | Mongo Query Language
     SELECT * FROM users WHERE age > 33                | db.users.find({ age: { $gt: 33 } })
     CREATE INDEX myindexname ON users(name)           | db.users.ensureIndex({ name: 1 })
     SELECT * FROM users WHERE a = 1 AND b = 'q'       | db.users.find({ a: 1, b: "q" })
     SELECT * FROM users LIMIT 10 OFFSET 20            | db.users.find().limit(10).skip(20)
     SELECT * FROM users LIMIT 1                       | db.users.findOne()
     EXPLAIN PLAN FOR SELECT * FROM users WHERE z = 3  | db.users.find({ z: 3 }).explain()
     SELECT DISTINCT last_name FROM users              | db.users.distinct("last_name")
     SELECT COUNT(*) FROM users WHERE age > 30         | db.users.find({ age: { $gt: 30 } }).count()
  27. Embed vs Reference
  28. Relational
  29. Document
     user: { id: "1", name: "ruben" }
     order: {
       id: "a",
       user_id: "1",            ← referenced
       items: [                 ← embedded
         { product_id: "x", quantity: 10, price: 300 },
         { product_id: "y", quantity: 5, price: 300 }
       ]
     }
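The trade-off from slide 19 (embed: fewer roundtrips; reference: less duplication) shows up when the slide's order is read back. A small sketch, assuming each collection is a `Map` keyed by `id` and that every `findOne` call stands for one roundtrip to the server; `findOne` and the roundtrip counter are hypothetical stand-ins, not the driver API.

```javascript
// Two "collections" keyed by id; each lookup is counted as one server roundtrip.
let roundtrips = 0;
const usersById = new Map([["1", { id: "1", name: "ruben" }]]);
const ordersById = new Map([
  ["a", {
    id: "a",
    user_id: "1", // referenced: only the user's id is stored
    items: [      // embedded: line items live inside the order document
      { product_id: "x", quantity: 10, price: 300 },
      { product_id: "y", quantity: 5, price: 300 },
    ],
  }],
]);

function findOne(collection, id) {
  roundtrips += 1;
  return collection.get(id);
}

const order = findOne(ordersById, "a");
// Embedded items arrive with the order: no further query needed.
const total = order.items.reduce((sum, it) => sum + it.price * it.quantity, 0);
// The referenced user needs a second query (an application-side "join").
const user = findOne(usersById, order.user_id);

console.log(user.name, total, roundtrips); // ruben 4500 2
```

Embedding the user as well would save the second roundtrip, at the cost of copying the user's data into every order.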
  30. MongoDB – Replication (I)
     ● Master-slave replication: primary and secondary nodes
     ● Replica set: cluster of mongod instances that replicate amongst one another and ensure automated failover
     ● WriteConcern
  31. MongoDB – Replication (II)
     ● Adds redundancy
     ● Helps to ensure high availability (automatic failover)
     ● Simplifies backups
  32. WriteConcerns
     ● Errors Ignored
       – even network errors are ignored
     ● Unacknowledged (w: 0)
       – at least network errors are handled
     ● Acknowledged (w: 1)
       – constraint violations such as duplicate keys are reported (default)
     ● Journaled (j: true)
       – persisted to the journal log
     ● Replica ACK (w: n or w: "majority")
       – acknowledged by 1..n members, or by a majority
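The replica-acknowledgement levels boil down to counting how many members have applied a write. The sketch below models that check, assuming a replica set with `members` voting members where "majority" means more than half of them; `isSatisfied` and the `{ acks, journaled, members }` state object are hypothetical, and the real server also tracks timeouts (`wtimeout`) that are omitted here.

```javascript
// Majority of a replica set with n voting members: more than half.
const majority = (n) => Math.floor(n / 2) + 1;

// Hypothetical check: has a write reached the level requested by the write concern `wc`?
// acks = number of members (primary included) that have applied the write.
function isSatisfied(wc, { acks, journaled, members }) {
  const w = wc.w ?? 1;
  const required = w === "majority" ? majority(members) : w;
  if (required <= 0) return true; // unacknowledged: the client does not wait at all
  if (wc.j && !journaled) return false; // journaled level also needs the journal write
  return acks >= required;
}

const state = { acks: 2, journaled: true, members: 3 };
console.log(majority(3), majority(4), majority(5)); // 2 3 3
console.log(isSatisfied({ w: 0 }, state)); // true
console.log(isSatisfied({ w: 1, j: true }, state)); // true
console.log(isSatisfied({ w: "majority" }, state)); // true
console.log(isSatisfied({ w: 3 }, state)); // false
```

A four-member set still needs three acknowledgements for "majority", so adding an even-numbered member does not reduce the number of failures the set can tolerate.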
  33. MongoDB – Sharding (I)
     ● Scale out
     ● Distributes data to nodes automatically
     ● Balances data and load across machines
  34. MongoDB – Sharding (II)
     ● A sharded cluster is composed of:
       – Shards: hold the data
         ● either one mongod instance (the primary daemon process, which handles data requests) or a replica set
       – Config servers:
         ● mongod instances holding the cluster metadata
       – mongos instances:
         ● route application calls to the shards
     ● No single point of failure
  35. MongoDB – Sharding (III)
  36. MongoDB – Sharding (IV)
  37. MongoDB – Sharding (V)
     ● A collection has a shard key: existing field(s) present in all documents
     ● Documents are distributed according to shard-key ranges
     ● Within a shard, documents are partitioned into chunks
     ● Mongo tries to keep all chunks at the same size (chunks that grow too large are split)
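Range-based distribution means that routing a document is a lookup of the chunk whose range contains its shard-key value, and a range query only needs the shards whose chunks overlap it. A sketch of what a mongos does with the chunk metadata, assuming `age` as the shard key and three hypothetical chunks covering half-open ranges `[min, max)`; `-Infinity`/`Infinity` stand in for MongoDB's MinKey/MaxKey, and `routeKey`/`shardsForRange` are illustrative names.

```javascript
// Chunks cover contiguous, non-overlapping, sorted shard-key ranges [min, max).
// Boundaries and shard names are hypothetical.
const chunks = [
  { min: -Infinity, max: 20, shard: "shard0" },
  { min: 20, max: 40, shard: "shard1" },
  { min: 40, max: Infinity, shard: "shard2" },
];

// Route a single shard-key value to its chunk with a binary search.
function routeKey(chunks, key) {
  let lo = 0;
  let hi = chunks.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const c = chunks[mid];
    if (key < c.min) hi = mid - 1;
    else if (key >= c.max) lo = mid + 1;
    else return c;
  }
  throw new Error(`no chunk covers ${key}`);
}

// Targeted query: only shards owning a chunk that overlaps [low, high) are contacted.
function shardsForRange(chunks, low, high) {
  const shards = new Set();
  for (const c of chunks) if (c.min < high && low < c.max) shards.add(c.shard);
  return [...shards];
}

console.log(routeKey(chunks, 36).shard); // shard1
console.log(shardsForRange(chunks, 20, 30)); // [ 'shard1' ]
console.log(shardsForRange(chunks, 0, 100)); // [ 'shard0', 'shard1', 'shard2' ]
```

A query that does not filter on the shard key cannot be narrowed this way and has to be sent to every shard.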
  38. MongoDB – Sharding (VI)
     ● Shard balancing
       – when a shard has too many chunks, mongo moves chunks to other shards
     ● Only makes sense with huge amounts of data
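The balancing rule can be sketched as a loop that moves one chunk at a time from the fullest shard to the emptiest one until their difference drops below a threshold. This is a simplified model, assuming only chunk counts matter; the threshold of 2 is a hypothetical default (the real balancer's thresholds depend on the MongoDB version and the number of chunks), and `balance` is an illustrative name.

```javascript
// counts: shard name -> number of chunks. Returns the final counts and the planned moves.
function balance(counts, threshold = 2) {
  const limit = Math.max(2, threshold); // below 2, one chunk could bounce back and forth forever
  const c = { ...counts };
  const moves = [];
  while (true) {
    const names = Object.keys(c);
    const most = names.reduce((a, b) => (c[b] > c[a] ? b : a));
    const least = names.reduce((a, b) => (c[b] < c[a] ? b : a));
    if (c[most] - c[least] < limit) break; // balanced enough
    c[most] -= 1; // migrate one chunk from the fullest shard...
    c[least] += 1; // ...to the emptiest one
    moves.push(`${most} -> ${least}`);
  }
  return { counts: c, moves };
}

const plan = balance({ shard0: 10, shard1: 2, shard2: 3 });
console.log(plan.counts); // { shard0: 5, shard1: 5, shard2: 5 }
console.log(plan.moves.length); // 5
```

Each migration copies real data between machines, which is one reason sharding only pays off for large data sets.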
  39. Object Mappers
     ● C#, PHP, Scala, Erlang, Perl, Ruby
     ● Java
       – Morphia
       – Spring Data MongoDB
       – mongo-jackson-mapper
       – Jongo
     ● ...
  40. Jongo - Example
     DB db = new MongoClient().getDB("jongo");
     Jongo jongo = new Jongo(db);
     MongoCollection users = jongo.getCollection("users");

     User user = new User("ruben", "inoto", new Address("Musterstraße", "5026"));
     users.save(user);

     // Jongo query strings use a relaxed JSON syntax; string values still need quotes
     User ruben = users.findOne("{name: 'ruben'}").as(User.class);

     public class User {
         private String name;
         private String surname;
         private Address address;
         // constructors omitted; Jackson (used by Jongo) also needs a no-arg constructor
     }

     public class Address {
         private String street;
         private String zip;
         // constructors omitted
     }

     Stored document:
     { "_id" : ObjectId("51b0e1c4d78a1c14a26ada9e"), "name" : "ruben", "surname" : "inoto", "address" : { "street" : "Musterstraße", "zip" : "5026" } }
  41. TTL (Time To Live)
     ● Data with an expiry date
     ● After the specified time to live, the data is removed from the DB
     ● Implemented as an index
     ● Useful for logs, sessions, ...
     ● db.broadcastMessages.ensureIndex({ "createdAt": 1 }, { expireAfterSeconds: 3600 })
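The TTL index above expresses a simple rule: a document is removed once `expireAfterSeconds` have passed since the date in its indexed field. In MongoDB a background task applies this rule periodically (every 60 seconds), so expired documents can linger briefly. The sketch models one sweep over an in-memory array; `expireDocuments` and the sample messages are hypothetical.

```javascript
// One TTL sweep: keep documents whose indexed date is still within the time to live.
// Documents whose field is missing or not a Date never expire, as with a TTL index.
function expireDocuments(docs, field, expireAfterSeconds, now) {
  const cutoff = now.getTime() - expireAfterSeconds * 1000;
  return docs.filter((d) => !(d[field] instanceof Date && d[field].getTime() < cutoff));
}

const now = new Date("2013-06-01T12:00:00Z");
const broadcastMessages = [
  { _id: 1, text: "old", createdAt: new Date("2013-06-01T10:30:00Z") }, // 90 minutes old
  { _id: 2, text: "fresh", createdAt: new Date("2013-06-01T11:45:00Z") }, // 15 minutes old
  { _id: 3, text: "no date" }, // never expires
];

const kept = expireDocuments(broadcastMessages, "createdAt", 3600, now);
console.log(kept.map((d) => d._id)); // [ 2, 3 ]
```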
  42. MapReduce
     ● Programming model for processing large data sets with a parallel, distributed algorithm
     ● Handles complex aggregation tasks
     ● The problem is split into smaller tasks, which are distributed across nodes
     ● Map phase: selects the data
       – emits key-value pairs for each input document
       – values are grouped by key and passed to the reduce function
     ● Reduce phase: transforms the data
       – accepts two arguments: a key and the array of values emitted for it
       – reduces all the values associated with the key to a single object
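The three steps described above (map, group by key, reduce) fit in a few lines when everything runs in one process. A sketch, assuming `map` is called with `this` bound to the current document, as in MongoDB, but receives `emit` as an argument instead of using the shell's global `emit`; `mapReduce` and the `posts` data are hypothetical. Following MongoDB's documented behaviour, reduce is skipped for keys that emitted a single value.

```javascript
// Single-process model of map -> group by key -> reduce.
function mapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // map phase
  const results = [];
  for (const [key, values] of groups) {
    // reduce phase; keys with a single emitted value keep it unchanged
    results.push({ _id: key, value: values.length === 1 ? values[0] : reduce(key, values) });
  }
  return results;
}

const posts = [
  { title: "p1", tags: ["mongodb", "nosql"] },
  { title: "p2", tags: ["nosql"] },
  { title: "p3", tags: ["sql", "nosql"] },
];

// Count how often each tag is used.
const tagCounts = mapReduce(
  posts,
  function (emit) {
    for (const t of this.tags) emit(t, 1);
  },
  (key, values) => values.reduce((a, b) => a + b, 0)
);
console.log(tagCounts);
// [ { _id: 'mongodb', value: 1 }, { _id: 'nosql', value: 3 }, { _id: 'sql', value: 1 } ]
```

In a cluster the map calls run on the shards holding the data, and only the grouped values travel over the network to the reducers.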
  43. MapReduce
  44. MapReduce Use Example
     ● Problem: compute how much money each customer has paid across all their orders
  45. Solution - Relational
     SELECT customer_id, SUM(price * quantity)
     FROM orders
     GROUP BY customer_id

     order_id | customer_id | price | quantity
     a        | 1           | 350   | 2
     b        | 2           | 100   | 2
     c        | 1           | 20    | 1

     customer_id | total
     1           | 720
     2           | 200
  46. Solution - Sequential
     var orders = [
       { order_id: "a", customer_id: "1", price: 350, quantity: 2 },
       { order_id: "b", customer_id: "2", price: 100, quantity: 2 },
       { order_id: "c", customer_id: "1", price: 20, quantity: 1 }
     ];
     var customerTotals = new Map();
     for (var order of orders) {
       var newTotal = order.price * order.quantity;
       if (customerTotals.has(order.customer_id)) {
         newTotal += customerTotals.get(order.customer_id);
       }
       customerTotals.set(order.customer_id, newTotal);
     }
     // customerTotals: "1" → 720, "2" → 200
  47. Solution - MapReduce
     db.orders.insert([
       { order_id: "a", customer_id: "1", price: 350, quantity: 2 },
       { order_id: "b", customer_id: "2", price: 100, quantity: 2 },
       { order_id: "c", customer_id: "1", price: 20, quantity: 1 }
     ]);

     var mapOrders = function() {
       var totalPrice = this.price * this.quantity;
       emit(this.customer_id, totalPrice);
     };

     var reduceOrders = function(customerId, tempTotal) {
       return Array.sum(tempTotal);
     };

     db.orders.mapReduce(mapOrders, reduceOrders, { out: "map_reduce_orders" });

     > db.map_reduce_orders.find().pretty();
     { "_id" : "1", "value" : 720 }
     { "_id" : "2", "value" : 200 }
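MongoDB may call the reduce function several times for the same key, feeding earlier reduce results back in together with new values, so reduce must return the same kind of value it receives and give the same answer however the values are grouped. The sketch re-runs the slide's map and reduce in plain JavaScript and checks that property; replacing the shell helper `Array.sum` with `Array.prototype.reduce` and passing `emit` as an argument are adaptations for running outside the mongo shell.

```javascript
// The slide's orders, map and reduce, adapted to plain JavaScript.
const orders = [
  { order_id: "a", customer_id: "1", price: 350, quantity: 2 },
  { order_id: "b", customer_id: "2", price: 100, quantity: 2 },
  { order_id: "c", customer_id: "1", price: 20, quantity: 1 },
];
function mapOrders(emit) {
  emit(this.customer_id, this.price * this.quantity);
}
const reduceOrders = (customerId, totals) => totals.reduce((sum, t) => sum + t, 0);

// Collect the emitted values per customer.
const emitted = new Map();
for (const o of orders) {
  mapOrders.call(o, (k, v) => emitted.set(k, [...(emitted.get(k) ?? []), v]));
}

// Reduce all values at once, and in two stages as a "re-reduce" would.
const values1 = emitted.get("1"); // [ 700, 20 ]
const once = reduceOrders("1", values1);
const staged = reduceOrders("1", [reduceOrders("1", values1.slice(0, 1)), ...values1.slice(1)]);
console.log(once, staged, reduceOrders("2", emitted.get("2"))); // 720 720 200
```

A reduce that, for example, counted its input values instead of summing them would pass a single call but give wrong results once MongoDB re-reduces.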
  48. MapReduce
  49. Who is using Mongo?
     ● Craigslist
     ● SourceForge
     ● Disney
     ● The Guardian
     ● Forbes
     ● CERN
     ● ...
  50. "Real" Use Case – Android Notifications
     ● App to send "notifications" (messages) to devices with an installed RealNetworks application (Music, RBT)
     ● Scala, Scalatra, Lift, Jersey, Guice, Protocol Buffers
     ● MongoDB, Casbah, Salat
     ● Mongo collections
       – Devices: deviceId, msisdn, application
       – Messages: message, audience
       – SentMessages: deviceId, message, status
  51. Criticism
     ● Loss of data
       – especially in a cluster
  52. Conclusion
     ● Not a silver bullet
     ● Makes sense when:
       – eventual consistency is acceptable
       – prototyping
       – performance matters
       – the object model doesn't fit a relational DB
     ● Easy to learn
