2. Who am I?
• Internship at Provikmo
• 3 years 6 months
• Competitive cyclist
3. What is MongoDB?
• The leading NoSQL database (http://db-engines.com/en/)
• Open source
• Non-relational JSON document store
• BSON (Binary JSON)
• Dynamic schema
• Agile
• Scalable through replicaton and sharding
3
4. The leading NoSQL database
4
• LinkedIn Job Skills
• Google Search
• Indeed.com Trends
19. Data Models
• Flexible schema
• Collections do not enforce document structure
• Consider how applications will use your database
• No foreign keys, no joins
• Relationships between data
• Embedded documents
• References
• Documents require a unique _id field that acts as a primary key
19
20. Data Models
• Denormalized
• Better read performance
• Single atomic write operation
• Document growth
• Dot notation
20
Embedded Data Models
21. Data Model
• One-to-One Relationship
21
Embedded Data Models
{
_id: "infasla",
name: "Anthony Slabinck",
address: {
street: "123 Fake Street",
city: "Faketon",
state: "MA",
zip: "12345"
}
}
26. GridFS
• BSON-document size limit of 16MB
• Divides a file into parts, or chunks and stores each of those chunks as
a separate document
• Two collections
• File chunks
• File metadata
• Reassemble chunks as needed
26
27. Capped Collections
• Fixed-size collections
• Insert and retrieve documents based on insertion order
• Automatically removes the oldest document
• Ideal for logging
27
28. Aggregation
• Operations that process data records and return computed results
• Simplifies application code
• Limits resource requirements
• Aggregation modalities
• Aggregation pipelines
• Map-Reduce
• Single purpose aggregation operations
28
32. Indexes
• Efficient execution of queries
• Data structure
• Stores the value of a specific
field or set of fields, ordered by
value the field
• Create indexes that support
your common and user-facing
queries
32
33. Indexes
• Default _id
• Single Field
• Compound Index
• Multikey Index
• Geospatial Index
• Text Indexes
• Hashed Indexes
33
Types
37. Replication
• A group of mongod instances
that host the same data set
• Primary receives all write
operations
• Primary logs all changes in its
oplog
• Secondaries apply operations
from the primary
37
Replica set
41. Sharding
• What?
• Storing data across multiple machines
• When?
• High query rates exhaust the CPU capacity of the server
• Larger data sets exceed the storage capacity of a single machine
• Working set sizes larger than the system’s RAM stress the I/O capacity of
disk drives
41
44. Sharding
• Shards store the data
• Query Routers interface with
client applications and direct
operations
• Config servers store the
cluster’s metadata
44
Sharded cluster
45. Sharding
• Collection level
• Shard key
• Indexed field or an indexed
compound field that exists in
every document
• Chunks
• Range based partitioning
• Hash based partitioning
• Automatic balancing
45
Data partitioning
47. MongoDB at scale
• Cluster scale
• Distributing across 100+ nodes in multiple data centers
• Performance scale
• 100K+ database reads and writes per second while maintaining strict SLAs
• Data scale
• Storing 1B+ documents in the database
47
Metrics
48. Lower TCO
• Dev/Ops savings
• Ease of use
• Fast, iterative development
• Hardware savings
• Commodity hardware
• Scale out
• Software/Support savings
• No upfront licence
48
Relational database