NoSQL databases only unfold their full strength when you also embrace their concepts regarding usage and schema design. These slides give an overview of the features and concepts of MongoDB.
2. Introduction
● Name derived from humongous (= gigantic)
● NoSQL (= not only SQL) database
● Document oriented database
– documents stored as binary JSON (BSON)
● Ad-hoc queries
● Server-side JavaScript execution
● Aggregation / MapReduce
● High performance, availability, scalability
3. MongoDB: relational vs. document based (concepts)
SQL
Person
Id  Name     AddressId
1   Mueller  1
2   <null>   2
Address
Id  City     Street
1   Leipzig  Burgstr. 1
2   Dresden  <null>
MongoDB
Person
{
  _id: ObjectId("..."),
  Name: "Mueller",
  Address: {
    City: "Leipzig",
    Street: "Burgstr. 1",
  },
}, {
  _id: ObjectId("..."),
  Address: {
    City: "Dresden",
  },
}
Terminology (SQL → MongoDB)
● DB → DB
● Table → Collection
● Row → Document
● Column → Field (Key: Value)
● PK → PK
● FK / Relation → Embedded document
PK: primary key, FK: foreign key
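To make the mapping concrete, here is a minimal sketch in plain JavaScript (no database involved) that folds the relational rows into documents with an embedded address. The helper name `toDocuments` and the in-memory arrays are assumptions for illustration, not part of MongoDB.

```javascript
// Relational rows as shown in the SQL tables (in-memory stand-ins).
const personRows = [
  { id: 1, name: "Mueller", addressId: 1 },
  { id: 2, name: null, addressId: 2 },
];
const addressRows = [
  { id: 1, city: "Leipzig", street: "Burgstr. 1" },
  { id: 2, city: "Dresden", street: null },
];

// Hypothetical helper: resolve the foreign key and embed the address.
// Columns that are NULL are simply left out of the document.
function toDocuments(persons, addresses) {
  const byId = new Map(addresses.map((a) => [a.id, a]));
  return persons.map((p) => {
    const doc = { _id: p.id };
    if (p.name !== null) doc.Name = p.name;
    const a = byId.get(p.addressId);
    if (a) {
      doc.Address = {};
      if (a.city !== null) doc.Address.City = a.city;
      if (a.street !== null) doc.Address.Street = a.street;
    }
    return doc;
  });
}

const personDocs = toDocuments(personRows, addressRows);
console.log(JSON.stringify(personDocs, null, 2));
```

The foreign key disappears in the result: the relation is expressed by nesting, and missing values are expressed by missing fields.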
4. Relational vs. document based: syntax (1/3)
● SQL: SELECT * FROM Person;
– MongoDB: db.getCollection("Person").find();
● SQL: SELECT * FROM Person WHERE name = 'Mueller';
– MongoDB: db.Person.find({ "name": "Mueller" });
● SQL: SELECT * FROM Person WHERE name LIKE 'M%';
– MongoDB: db.Person.find({ "name": /^M/ });
● SQL: SELECT name FROM Person;
– MongoDB: db.Person.find({}, { name: 1, _id: 0 });
● SQL: SELECT DISTINCT name FROM Person WHERE name = 'Mueller';
– MongoDB: db.Person.distinct("name", { "name": "Mueller" });
5. Relational vs. document based: syntax (2/3)
● SQL: SELECT * FROM Person WHERE id > 10 AND name <> 'Mueller';
– MongoDB: db.Person.find({ $and: [ { _id: { $gt: ObjectId("...") } }, { name: { $ne: "Mueller" } } ] });
● SQL: SELECT p.name FROM Person p JOIN Address a ON p.address = a.id WHERE a.city = 'Leipzig' ORDER BY p.name DESC;
– MongoDB: db.Person.find({ "Address.city": "Leipzig" }, { name: 1, _id: 0 }).sort({ name: -1 });
● SQL: SELECT * FROM Person WHERE name IS NOT NULL;
– MongoDB: db.Person.find({ name: { $not: { $type: 10 }, $exists: true } });
● SQL: SELECT COUNT(*) FROM Person WHERE name = 'Mueller';
– MongoDB: db.Person.count({ name: "Mueller" });
– MongoDB: db.Person.find({ name: "Mueller" }).count();
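The IS NOT NULL translation needs two conditions because a document can either lack the field or hold an explicit null in it. The following plain JavaScript sketch mimics that condition on in-memory objects; the function name `nameIsNotNull` is made up for this illustration.

```javascript
// In-memory stand-in for the condition
//   { name: { $not: { $type: 10 }, $exists: true } }
// BSON type 10 is null; a missing field is a different case than a null value.
function nameIsNotNull(doc) {
  const exists = Object.prototype.hasOwnProperty.call(doc, "name"); // $exists: true
  const isNull = exists && doc.name === null;                       // $type: 10
  return exists && !isNull;
}

const docs = [{ name: "Mueller" }, { name: null }, { city: "Leipzig" }];
const matching = docs.filter(nameIsNotNull);
console.log(matching);
```

Only the first document passes: the second has a null value, the third has no name field at all.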
6. Relational vs. document based: syntax (3/3)
● SQL: UPDATE Person SET name = 'Müller' WHERE name = 'Mueller';
– MongoDB: db.Person.updateMany({ name: "Mueller" }, { $set: { name: "Müller" } });
● SQL: DELETE FROM Person WHERE name = 'Mueller';
– MongoDB: db.Person.remove({ name: "Mueller" });
● SQL: INSERT INTO Person (name, address) VALUES ('Mueller', 3);
– MongoDB: db.Person.insert({ name: "Mueller", Address: { … } });
● SQL: ALTER TABLE Person DROP COLUMN name;
– MongoDB: db.Person.updateMany({}, { $unset: { name: 1 } });
● SQL: DROP TABLE Person;
– MongoDB: db.Person.drop();
7. Schema design principles
● Principle of least cardinality
● Store what you query for
8. Schema design: embedded document
● Applicable for 1:1 and 1:n relations when n cannot get too large
● The embedded document cannot get too large
● The embedded document is not very likely to change
● Arrays that grow without bound should never be embedded
Address
{
  _id: ObjectId("..."),
  City: "Leipzig",
  Street: "Burgstr. 1",
  Person: [
    {
      Name: "Mueller",
    },
    {
      Name: "Schneider",
    },
  ]
}
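A rough in-memory sketch of what embedding means for the application: all data sits in one document, and growth of the embedded array has to be kept in check. The cap `MAX_EMBEDDED` and the helper `addResident` are assumptions of this sketch; MongoDB itself only enforces the overall document size limit.

```javascript
// Address document with its residents embedded (number id instead of ObjectId).
const address = {
  _id: 1,
  City: "Leipzig",
  Street: "Burgstr. 1",
  Person: [{ Name: "Mueller" }, { Name: "Schneider" }],
};

// Assumed application-level cap, chosen arbitrarily for this sketch.
const MAX_EMBEDDED = 100;

// Hypothetical helper: embed one more person, refusing unbounded growth.
function addResident(addressDoc, name) {
  if (addressDoc.Person.length >= MAX_EMBEDDED) {
    throw new Error("too many embedded persons; consider referencing instead");
  }
  addressDoc.Person.push({ Name: name });
  return addressDoc;
}

addResident(address, "Fischer");

// Everything needed to list the residents is already in the one document.
const names = address.Person.map((p) => p.Name);
console.log(names);
```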
9. Schema design: referencing
● Applicable for :n relations when n cannot get too large
● The referenced document is likely to change often in the future
● Many referenced documents are expected, so storing only the reference is cheaper
● Large referenced documents are expected, so storing only the reference is cheaper
● Arrays that grow without bound should never be embedded
● Address should be accessible on its own
Address
{
  _id: ObjectId("..."),
  City: "Leipzig",
  Street: "Burgstr. 1",
  Person: [
    ObjectId("..."), ObjectId("..."),
  ]
}
Person
{
  _id: ObjectId("..."),
  Name: "Mueller",
}
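With references, reading the residents of an address takes a second lookup. The sketch below shows that step on in-memory arrays; string ids stand in for ObjectId values, and `resolveReferences` is a hypothetical helper. Against a real collection the second step would roughly correspond to a find with an `$in` condition on `_id`.

```javascript
// In-memory stand-ins for the Person collection and one Address document.
const persons = [
  { _id: "p1", Name: "Mueller" },
  { _id: "p2", Name: "Schneider" },
  { _id: "p3", Name: "Fischer" },
];
const address = {
  _id: "a1",
  City: "Leipzig",
  Street: "Burgstr. 1",
  Person: ["p1", "p2"],
};

// Hypothetical helper: fetch all documents whose _id is in the reference list.
function resolveReferences(ids, collection) {
  const wanted = new Set(ids);
  return collection.filter((doc) => wanted.has(doc._id));
}

const residents = resolveReferences(address.Person, persons);
console.log(residents.map((p) => p.Name));
```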
10. Schema design: parent-referencing
● Applicable for :n relations when n can get very large (note: a MongoDB document is not allowed to exceed 16 MB)
● Joins are done at the application level
Address
{
  _id: ObjectId("..."),
  City: "Dubai",
  Street: "1 Sheikh Mohammed bin Rashid Blvd",
}
Person
{
  _id: ObjectId("..."),
  Name: "Mueller",
  Address: ObjectId("..."),
}
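An application-level join for parent-referencing can be sketched as two consecutive lookups. The code below uses in-memory arrays and string ids in place of ObjectId values; `residentsOf` is a made-up helper name.

```javascript
// In-memory stand-ins for both collections.
const addresses = [
  { _id: "a1", City: "Dubai", Street: "1 Sheikh Mohammed bin Rashid Blvd" },
];
const persons = [
  { _id: "p1", Name: "Mueller", Address: "a1" },
  { _id: "p2", Name: "Schneider", Address: "a1" },
  { _id: "p3", Name: "Fischer", Address: "a2" },
];

// Hypothetical helper performing the join in the application.
function residentsOf(city) {
  // Lookup 1: find the parent document.
  const address = addresses.find((a) => a.City === city);
  if (!address) return [];
  // Lookup 2: find all children pointing at the parent.
  return persons.filter((p) => p.Address === address._id).map((p) => p.Name);
}

console.log(residentsOf("Dubai"));
```

The address document stays small no matter how many persons point at it, which is the reason this layout suits very large n.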
11. Schema design: two-way referencing
● Applicable for m:n relations when n and m cannot get too large and the application needs to navigate from both ends
● Disadvantage: two update operations are needed when changing a reference
Address
{
  _id: ObjectId("..."),
  City: "Leipzig",
  Street: "Burgstr. 1",
  Person: [
    ObjectId("..."), ObjectId("..."),
  ]
}
Person
{
  _id: ObjectId("..."),
  Name: "Mueller",
  Address: [
    ObjectId("..."), ObjectId("..."),
  ]
}
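The cost of two-way referencing shows up when a relation changes: both documents have to be touched. A minimal in-memory sketch, with `link` and `unlink` as hypothetical helper names and string ids in place of ObjectId values:

```javascript
// In-memory stand-ins for one document of each collection.
const person = { _id: "p1", Name: "Mueller", Address: [] };
const address = { _id: "a1", City: "Leipzig", Street: "Burgstr. 1", Person: [] };

// Hypothetical helper: adding a relation updates both documents.
function link(personDoc, addressDoc) {
  if (!personDoc.Address.includes(addressDoc._id)) personDoc.Address.push(addressDoc._id);
  if (!addressDoc.Person.includes(personDoc._id)) addressDoc.Person.push(personDoc._id);
}

// Hypothetical helper: removing the relation updates both documents again.
function unlink(personDoc, addressDoc) {
  personDoc.Address = personDoc.Address.filter((id) => id !== addressDoc._id);
  addressDoc.Person = addressDoc.Person.filter((id) => id !== personDoc._id);
}

link(person, address);
console.log(person.Address, address.Person);
```

If only one of the two writes succeeds, the two sides disagree, so the application has to be prepared to repair such states.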
12. Schema design: denormalization
● Queries are expected to filter by certain fields of the referenced document, so including these fields in the host document saves an additional query at the application level
● Disadvantage: two update operations for each duplicated field
● Disadvantage: additional memory consumption
Address
{
  _id: ObjectId("..."),
  City: "Leipzig",
  Street: "Burgstr. 1",
}
Person
{
  _id: ObjectId("..."),
  Name: "Mueller",
  Address: [
    {
      id: ObjectId("..."),
      city: "Leipzig",
    }, ...
  ]
}
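The trade-off of denormalization can be sketched in a few lines: filtering by the duplicated field needs no second lookup, while changing it means writing to the original and to every copy. The arrays and the helpers `personsInCity` and `renameCity` are assumptions for illustration.

```javascript
// In-memory stand-ins; string ids replace ObjectId values.
const addresses = [{ _id: "a1", City: "Leipzig", Street: "Burgstr. 1" }];
const persons = [
  { _id: "p1", Name: "Mueller", Address: [{ id: "a1", city: "Leipzig" }] },
  { _id: "p2", Name: "Schneider", Address: [{ id: "a2", city: "Dresden" }] },
];

// Filtering by the duplicated field works on the Person documents alone.
function personsInCity(city) {
  return persons
    .filter((p) => p.Address.some((a) => a.city === city))
    .map((p) => p.Name);
}

// Changing the city touches the Address document and every duplicated copy.
// Returns the number of places that had to be written.
function renameCity(addressId, newCity) {
  let touched = 0;
  for (const a of addresses) {
    if (a._id === addressId) { a.City = newCity; touched++; }
  }
  for (const p of persons) {
    for (const ref of p.Address) {
      if (ref.id === addressId) { ref.city = newCity; touched++; }
    }
  }
  return touched;
}

console.log(personsInCity("Leipzig"));
```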
13. Schema design: bucketing
● Applicable for :n relations when n can get very large and the application is expected to use pagination anyway
● The DB schema already creates the chunks that the application will later query for
Address
{
  _id: ObjectId("..."),
  City: "Leipzig",
  Street: "Burgstr. 1",
}
Person
{
  _id: ObjectId("..."),
  Address: ObjectId("..."),
  Page: 13,
  Count: 50,
  Persons: [
    { Name: "Mueller" }, ...
  ]
}
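Bucketing can be pictured as slicing the long list of persons into fixed-size pages before storing them. The sketch below builds such bucket documents in memory; the helper `toBuckets`, the 1-based page numbering, and the string address id are assumptions of this example.

```javascript
// Hypothetical helper: split persons into bucket documents of at most bucketSize entries.
function toBuckets(addressId, persons, bucketSize) {
  const buckets = [];
  for (let i = 0; i < persons.length; i += bucketSize) {
    const chunk = persons.slice(i, i + bucketSize);
    buckets.push({
      Address: addressId,
      Page: buckets.length + 1, // assumed: pages are numbered from 1
      Count: chunk.length,
      Persons: chunk,
    });
  }
  return buckets;
}

// 120 generated persons, stored in pages of 50.
const people = Array.from({ length: 120 }, (_, i) => ({ Name: "Person " + (i + 1) }));
const buckets = toBuckets("a1", people, 50);
console.log(buckets.map((b) => [b.Page, b.Count]));
```

Showing page 3 in the application then means reading exactly one bucket document.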
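16. Map-Reduce
● More control than the aggregation framework, but slower
var map = function() {
  // emit one { cities: [...] } value per person not named "Fischer"
  if (this.name != "Fischer") emit(this.name, { cities: [this.Address.city] });
};
var reduce = function(key, values) {
  // merge the distinct cities; the result has the same shape as the emitted
  // values, because reduce may be called again on its own output
  var distinct = [];
  values.forEach(function(value) {
    value.cities.forEach(function(city) {
      if (distinct.indexOf(city) == -1) distinct.push(city);
    });
  });
  return { cities: distinct };
};
var finalize = function(key, reduced) {
  // turn the merged list into the number of distinct cities
  return reduced.cities.length;
};
db.Person.mapReduce(map, reduce, {
  out: "PersonCityCount2",
  finalize: finalize
});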
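The data flow of such a job can be tried out without a server. The following sketch is a simplified in-memory imitation, not MongoDB's implementation: the runner `runMapReduce` and the sample documents are made up, and `emit` is passed in as an argument instead of being a global. Two details are worth noting: `for...of` iterates over the values (whereas `for...in` would iterate over array indices), and the reduce step returns the same shape as the emitted values so it can safely be applied repeatedly.

```javascript
// Hypothetical in-memory runner: group emitted values by key, reduce, finalize.
function runMapReduce(docs, map, reduce, finalize) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map.call(doc, emit));
  const result = {};
  for (const [key, values] of groups) {
    // As in MongoDB, reduce is skipped for keys with a single value.
    const reduced = values.length > 1 ? reduce(key, values) : values[0];
    result[key] = finalize(key, reduced);
  }
  return result;
}

const docs = [
  { name: "Mueller", Address: { city: "Leipzig" } },
  { name: "Mueller", Address: { city: "Dresden" } },
  { name: "Mueller", Address: { city: "Leipzig" } },
  { name: "Schneider", Address: { city: "Leipzig" } },
  { name: "Fischer", Address: { city: "Berlin" } },
];

const map = function (emit) {
  if (this.name !== "Fischer") emit(this.name, { cities: [this.Address.city] });
};
const reduce = (key, values) => {
  const distinct = [];
  for (const value of values) {
    for (const city of value.cities) {
      if (!distinct.includes(city)) distinct.push(city);
    }
  }
  return { cities: distinct };
};
const finalize = (key, reduced) => reduced.cities.length;

const counts = runMapReduce(docs, map, reduce, finalize);
console.log(counts);
```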
17. Indexes
● Default _id index, assuring uniqueness
● Single field index: db.Person.createIndex( { name: 1 } );
● Compound index: db.Address.createIndex( { city: 1, street: -1 } );
– The index sorts first ascending by city, then descending by street
– The index is also used when a query filters only by a prefix of the indexed fields (here: city)
● Multikey index: db.Person.createIndex( { "Address.city": 1 } )
– Indexes content stored in arrays; an index entry is created for each array element
● Geospatial index
● Text index
● Hashed index
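Why a compound index helps with its leading field but not with a trailing field alone can be seen from the order of its entries. The sketch below sorts a few sample addresses the way a key pattern of city ascending, street descending would, and then looks at where matching entries end up. The sample data and the helper names are invented for this illustration.

```javascript
const addresses = [
  { city: "Leipzig", street: "Burgstr. 1" },
  { city: "Dresden", street: "Hauptstr. 5" },
  { city: "Leipzig", street: "Ringstr. 2" },
  { city: "Dresden", street: "Burgstr. 1" },
];

// Ordering that corresponds to the key pattern { city: 1, street: -1 }.
function byCityAscStreetDesc(a, b) {
  if (a.city !== b.city) return a.city < b.city ? -1 : 1;
  if (a.street !== b.street) return a.street > b.street ? -1 : 1;
  return 0;
}

const indexOrder = [...addresses].sort(byCityAscStreetDesc);

// Positions of the matching entries within the sorted index.
const positions = (matches) =>
  indexOrder.map((entry, i) => (matches(entry) ? i : -1)).filter((i) => i >= 0);

const leipzig = positions((e) => e.city === "Leipzig");      // filter on the leading field
const burgstr = positions((e) => e.street === "Burgstr. 1"); // filter on the trailing field only
console.log(leipzig, burgstr);
```

Entries for one city form a single contiguous range that can be found by a range scan, while entries for one street are scattered across the index.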
18. Index properties
● Uniqueness: insertion of a duplicate field value will be rejected
● Partial index: indexes only documents matching certain filter criteria
● Sparse index: indexes only documents that have the indexed field
● TTL index: automatically removes documents after a certain time
● Query optimization: use db.MyCollection.find({ … }).explain() to check whether a query is answered using an index, and how many documents still had to be scanned
● Covered queries: if a query only contains indexed fields, the results are delivered directly from the index without scanning or materializing any documents
● Index intersection: different indexes can be applied to cover different parts of a query
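The difference between a sparse and a partial index is which documents receive an index entry at all. The in-memory sketch below only imitates that selection; the sample documents and the filter "age of at least 18" are assumptions. In createIndex, these properties are requested with the `sparse` and `partialFilterExpression` options.

```javascript
const persons = [
  { _id: 1, name: "Mueller", age: 42 },
  { _id: 2, age: 17 },
  { _id: 3, name: "Schneider", age: 15 },
];

// Sparse index on "name": only documents that contain the field get an entry.
const sparseEntries = persons.filter((p) => "name" in p).map((p) => p._id);

// Partial index with the assumed filter age >= 18: only matching documents get an entry.
const partialEntries = persons.filter((p) => p.age >= 18).map((p) => p._id);

console.log(sparseEntries, partialEntries);
```

A smaller index costs less memory and less work on writes, at the price that it can only answer queries whose results are guaranteed to lie within the indexed subset.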
19. Storage engines
● WiredTiger has been available since MongoDB 3.0 and is the default storage engine since MongoDB 3.2
– Locking at document level enables concurrent writes on a collection
– Durability is ensured via a write-ahead transaction log and checkpoints (journaling)
– Supports compression of collections and indexes (via snappy or zlib)
● MMAPv1 was the default storage engine before MongoDB 3.2
– Since MongoDB 3.0 it supports locking at collection level, before that only at database level
– Useful for selective updates, as WiredTiger always replaces the whole document in an update operation
22. Transactions
● ACID → MongoDB is compliant with this only at document level
– Atomicity
– Consistency
– Isolation
– Durability
● CAP → MongoDB assures CP
– Consistency
– Availability
– Partition tolerance
● BASE: Basically Available, Soft state, Eventual consistency
● MongoDB versions before 4.0 do not support multi-document transactions; there, multi-document updates can be performed via two-phase commit
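The idea behind a two-phase commit built from single-document updates can be sketched as follows. This is a simplified in-memory illustration, not a complete or production-ready protocol: the account example, the state names, and all helper names are assumptions. The point is that every step changes exactly one document and records the transaction id, so an interrupted transfer can be detected and resumed without applying a change twice.

```javascript
const accounts = {
  A: { balance: 100, pendingTransactions: [] },
  B: { balance: 20, pendingTransactions: [] },
};

// Apply a change at most once per transaction id, so a retry is harmless.
function applyOnce(account, txId, delta) {
  if (account.pendingTransactions.includes(txId)) return false;
  account.balance += delta;
  account.pendingTransactions.push(txId);
  return true;
}

// Remove the marker once the transaction is known to be fully applied.
function clearPending(account, txId) {
  account.pendingTransactions = account.pendingTransactions.filter((id) => id !== txId);
}

function transfer(tx) {
  tx.state = "pending";
  // Phase 1: apply the change to each document and mark it with the transaction id.
  applyOnce(accounts[tx.source], tx.id, -tx.value);
  applyOnce(accounts[tx.destination], tx.id, tx.value);
  tx.state = "applied";
  // Phase 2: remove the markers and close the transaction.
  clearPending(accounts[tx.source], tx.id);
  clearPending(accounts[tx.destination], tx.id);
  tx.state = "done";
  return tx;
}

const tx = transfer({ id: "t1", source: "A", destination: "B", value: 30, state: "initial" });
console.log(tx.state, accounts.A.balance, accounts.B.balance);
```

A recovery job would look for transactions stuck in the pending or applied state and continue from there; that part is left out here.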
25. Real world examples
● Who uses MongoDB
● Case studies
● Arctic TimeSeries and Tick store
● uptime
MongoDB in Code for Germany projects
● Politik bei uns (open council information system): scraped city council data is stored in a MongoDB database according to the OParl format; see also data, web API and OParl client
26. When to choose, when not to choose
● Choose
– Mass data processing, like event data
– Dynamic schema
● Not to choose
– Static schema with a lot of relations
– Strict transaction requirements
27. Links
● MongoDB Schema Simulation
● 6 Rules of Thumb for MongoDB Schema Design
● MongoDB Aggregation
● MongoDB Indexes
● Sharding
● MongoDB University
● Why Relational Databases are not the Cure-All