5. SQL
● Simplicity
● Uniform representation
● Runtime schema modifications
SELECT DISTINCT p.LastName, p.FirstName
FROM Person.Person AS p
JOIN HumanResources.Employee AS e
ON e.BusinessEntityID = p.BusinessEntityID
WHERE 5000.00 IN
(SELECT Bonus
FROM Sales.SalesPerson AS sp
WHERE e.BusinessEntityID = sp.BusinessEntityID);
7. Strong consistency
SQL features like:
● Foreign and primary keys, unique fields
● ACID (atomicity, consistency, isolation, durability) transactions
● Business transactions ~ system transactions
19. Aggregate-oriented Databases
● Document databases implement the idea of an aggregate-oriented database.
● An aggregate is a storage atom.
● Aggregate-oriented databases are closer to the application domain.
● Operations within an aggregate are guaranteed to be atomic.
● An aggregate can be replicated or sharded efficiently.
● Major question: to embed or not to embed.
23. MongoDB Basics
MongoDB is a document-oriented DBMS
MongoDB is a client-server DBMS
JSON/JavaScript is the major language used to access it
MongoDB = Collections + Indexes
24. Collections
Name
Documents
Indexes
Two documents from the same collection might be completely different.
Simple creation (a collection is created implicitly on first insert).
25. Document
Identifier (_id)
Body in JSON (internally BSON)
{
"fullName" : "Fedor Buhankin",
"course" : 5,
"university" : "ONPU",
"faculty" : "IKS",
"_id" : { "$oid" : "5071c043cc93742e0d0e9cc7" },
"homeAddress" : "Ukraine, Odessa 23/34",
"averageAssessment" : 5,
"subjects" : [
"math",
"literature",
"drawing",
"psychology"
]
}
● Major building blocks: scalar values, maps, and lists
● Any part of the document can be indexed
● Max document size is 16 MB
37. Queries between collections
● Remember: MongoDB has no JOINs
● Approach 1: perform multiple queries (lazy loading)
● Approach 2: use the MapReduce framework
● Approach 3: use the Aggregation Framework
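Approach 1 can be sketched in plain JavaScript, with arrays standing in for the two collections (the customer/order names and fields below are hypothetical):

```javascript
// Sketch of approach 1: an application-side "join" via multiple queries.
// Plain arrays stand in for the customers and orders collections.
const customers = [
  { _id: 1, name: "Kaktus" },
  { _id: 2, name: "Medvedev" }
];
const orders = [
  { _id: 10, customerId: 1, total: 50 },
  { _id: 11, customerId: 1, total: 20 },
  { _id: 12, customerId: 2, total: 99 }
];

// Query 1 fetches the customer; query 2 lazily fetches its orders.
function findCustomerWithOrders(customerId) {
  const customer = customers.find(c => c._id === customerId);
  const customerOrders = orders.filter(o => o.customerId === customerId);
  return { ...customer, orders: customerOrders };
}

console.log(findCustomerWithOrders(1).orders.length); // 2
```

Against a real database the two lookups would be two round trips, which is the price paid for the missing JOIN.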
38. MapReduce Framework
● Used to perform complex grouping over collection documents
● Able to operate over multiple collections
● Uses the MapReduce pattern
● Uses the JavaScript language
● Supports a sharded environment
● The result is similar to a materialized view
39. Map Reduce Concept
[Diagram: every input element a1..an is passed through map, producing b1..bn;
reduce then folds the mapped values b1..bn into a single result c.]
f_map : A → B      f_reduce : B[ ] → C
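The concept above, sketched as plain JavaScript over an array; squaring and summing are arbitrary example choices for f_map and f_reduce:

```javascript
// f_map : A → B, f_reduce : B[] → C, applied to a1..an.
const input = [1, 2, 3, 4, 5, 6];                        // a1..an
const fMap = a => a * a;                                 // A → B
const fReduce = bs => bs.reduce((acc, b) => acc + b, 0); // B[] → C

const mapped = input.map(fMap);  // b1..bn
const result = fReduce(mapped);  // c
console.log(result); // 91
```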
40. How it works
[Diagram:
Input: collection X; implement the MAP and REDUCE functions.
MAP step: execute the MAP function, marking each document with a specific color (its grouping key).
REDUCE step: execute the REDUCE function, merging each colored set into a single element.
Output: the merged results.]
41. Count the number of orders for each customer
db.customers_orders.remove();
mapUsers = function() {
  emit( this.customerId, {count: 1, customerId: this.customerId} );
};
reduce = function(key, values) {
  var result = {count: 0, customerId: key};
  values.forEach(function(value) {
    result.count += value.count;
  });
  return result;
};
db.customers.mapReduce(mapUsers, reduce,
  {"out": {"replace": "customers_orders"}});
Output: [ {count:123, customerId:1}, {count:33, customerId:2} ]
42. Aggregation and Aggregation Framework
● Simplifies the most common MapReduce operations, such as grouping by criteria
● Restriction on the pipeline result size is 16 MB
● Supports a sharded environment (Aggregation Framework only)
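Grouping by criteria, the operation the framework simplifies, sketched in plain JavaScript; the orders data is hypothetical, and the comment shows a rough mongo-shell equivalent:

```javascript
// Counting documents per customerId, i.e. what a $group stage expresses.
// Rough mongo-shell equivalent:
//   db.orders.aggregate([{ $group: { _id: "$customerId", count: { $sum: 1 } } }]);
const orders = [
  { customerId: 1 }, { customerId: 1 }, { customerId: 2 }
];

const counts = {};
for (const order of orders) {
  counts[order.customerId] = (counts[order.customerId] || 0) + 1;
}
console.log(counts); // { '1': 2, '2': 1 }
```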
45. Access via API
Use Official MongoDB Java Driver (just include mongo.jar)
Mongo m = new Mongo();
// or
Mongo m = new Mongo( "localhost" );
// or
Mongo m = new Mongo( "localhost" , 27017 );
// or, to connect to a replica set, supply a seed list of members
Mongo m = new Mongo(Arrays.asList(new ServerAddress("localhost", 27017),
new ServerAddress("localhost", 27018),
new ServerAddress("localhost", 27019)));
DB db = m.getDB( "mydb" );
DBCollection coll = db.getCollection("customers");
ArrayList list = new ArrayList();
list.add(new BasicDBObject("city", "Odessa"));
BasicDBObject doc= new BasicDBObject();
doc.put("name", "Kaktus");
doc.put("billingAddress", list);
coll.insert(doc);
46. Closer to the Domain model
● Morphia http://code.google.com/p/morphia/
● Spring Data for MongoDB http://www.springsource.org/spring-data/mongodb
Major features:
● Type-safe, POJO-centric model
● Annotation-based mapping behavior
● Good performance
● DAO templates
● Simple criteria API
47. Example with Morphia
@Entity("Customers")
class Customer {
@Id ObjectId id; // auto-generated, if not set (see ObjectId)
@Indexed String name; // value types are automatically persisted
List<Address> billingAddress; // by default fields are @Embedded
Key<Customer> bestFriend; // reference to an external document
@Reference List<Customer> partners = new ArrayList<Customer>(); // refs are
                                     // stored and loaded automatically
// ... getters and setters
// Lifecycle methods -- Pre/PostLoad, Pre/PostPersist...
@PostLoad void postLoad(DBObject dbObj) { ... }
}
Morphia morphia = new Morphia();
morphia.map(Customer.class);
Datastore ds = morphia.createDatastore(new Mongo(), "tempDB");
Key<Customer> newCustomer = ds.save(new Customer("Kaktus", ...));
Customer customer = ds.find(Customer.class).field("name").equal("Medvedev").get();
48. To embed or not to embed
● Separate collections are good if you need to select individual documents, need more control over querying, or have huge documents.
● Embedded documents are good when you want the entire document and its size is predictable. Embedded documents give the best read performance.
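The two options side by side, as hypothetical customer/order documents:

```javascript
// Embedded: orders live inside the customer document, so one read fetches
// the whole aggregate and updates to it are atomic.
const embeddedCustomer = {
  _id: 1,
  name: "Kaktus",
  orders: [ { total: 50 }, { total: 20 } ]
};

// Referenced: orders are separate documents pointing back at the customer,
// which suits orders that are queried on their own or grow without bound.
const customer = { _id: 1, name: "Kaktus" };
const orders = [
  { _id: 10, customerId: 1, total: 50 },
  { _id: 11, customerId: 1, total: 20 }
];

console.log(embeddedCustomer.orders.length === orders.length); // true
```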
49. Schema migration
● Schemaless
● Main focus is how the application will behave when a new field has been added
● Incremental migration technique (version field)
Use cases:
– removing a field
– renaming fields
– refactoring an aggregate
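The incremental migration technique can be sketched as follows: each document carries a version field, and the application upgrades a document lazily when it reads it. The field names and the v1-to-v2 rename below are hypothetical.

```javascript
// Lazy, incremental schema migration keyed on a schemaVersion field.
const migrations = {
  // v1 -> v2: rename "name" to "fullName"
  1: function(doc) {
    doc.fullName = doc.name;
    delete doc.name;
    doc.schemaVersion = 2;
    return doc;
  }
};

// Apply every pending migration in order until the document is current.
function upgrade(doc) {
  while (migrations[doc.schemaVersion]) {
    doc = migrations[doc.schemaVersion](doc);
  }
  return doc;
}

const oldDoc = { schemaVersion: 1, name: "Fedor Buhankin" };
console.log(upgrade(oldDoc).fullName); // "Fedor Buhankin"
```

Because documents are upgraded one at a time on read, old and new versions can coexist in the same collection during the migration window.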
50. Data Consistency
● Transactional consistency
– domain design should take aggregate atomicity into account
● Replication consistency
– take the inconsistency window into account (sticky sessions)
● Eventual consistency
● Accept the CAP theorem
– it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: consistency, availability, and partition tolerance.
52. Scaling options
● Autosharding
● Master-Slave replication
● Replica Set clusterization
● Sharding + Replica Set
53. Sharding
● MongoDB supports autosharding
● Just specify the shard key and pattern
● Sharding increases write throughput
● The major way of scaling the system
54. Master-Slave replication
● One master, many slaves
● Slaves might be hidden or can be used for reads
● Master-Slave replication increases read throughput and provides reliability
55. Replica Set clusterization
● The replica set automatically elects a primary (master)
● The master shares the same state with all replicas
● Limitation: at most 12 nodes
● WriteConcern option
● Benefits:
– failover and reliability
– distributing the read load
– maintenance without downtime
57. MongoDB Criticism
● Reports of data loss under heavy-write configurations
● No atomic operations over multiple documents
When not to use:
● Heavy cross-document atomic operations
● Queries against a varying aggregate structure
58. Tips
● Do not use auto-increment ids
● Short names are preferred
● By default DAO methods are async
● Think twice about collection design
● Use atomic modifications for a document
59. Out of scope
● MapReduce options
● Indexes
● Capped collections