Presented by Mike Obrebski, Senior Solution Architect, Nuxeo
MongoDB can be used in the Nuxeo Platform as a replacement for traditional SQL databases. Nuxeo's content repository, which is the cornerstone of this open source software platform, can now rely completely on MongoDB for data storage. This presentation will explain the motivation for using MongoDB and will discuss different implementation strategies. In this session, you will learn more about migrating to MongoDB and how we were able to achieve performance gains.
3. NUXEO
Nuxeo
we provide a Platform that developers can use to build highly customized Content Applications
we provide components, and the tools to assemble them
everything we do is open source (for real)
various customers - various use cases
me: developer & CTO - joined the Nuxeo project 10+ years ago
Track game builds
Electronic Flight Bags
Central repository for Models
Food industry PLM
https://github.com/nuxeo
5. DOCUMENT REPOSITORY
Storage abstraction: be able to choose the right storage
depending on the constraints
depending on the environment
Manage Content Model
Schemas, Mixins, facets
Manage Data level Security
Document level permissions
Blob level permissions
Versioning
Life-Cycle
Blob management
Efficient storage & CDN
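A storage abstraction of this kind can be pictured as a small backend contract that the repository codes against, with the concrete store picked by configuration at deployment time. The following is a minimal sketch under that assumption, not Nuxeo's code (Nuxeo itself is written in Java): `StorageBackend`, `InMemoryBackend`, `Repository`, `open_repository` and the `BACKENDS` registry are hypothetical names, and the in-memory class only stands in for a real SQL or MongoDB backend.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, Type


class StorageBackend(ABC):
    """Hypothetical contract that every storage backend implements."""

    @abstractmethod
    def save(self, doc_id: str, state: Dict[str, Any]) -> None:
        """Persist the full state of one document."""

    @abstractmethod
    def load(self, doc_id: str) -> Optional[Dict[str, Any]]:
        """Return a copy of a document's state, or None if it does not exist."""


class InMemoryBackend(StorageBackend):
    """Stand-in for a real store (SQL, MongoDB, ...), for illustration only."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, Any]] = {}

    def save(self, doc_id: str, state: Dict[str, Any]) -> None:
        self._data[doc_id] = dict(state)  # keep a copy, not the caller's dict

    def load(self, doc_id: str) -> Optional[Dict[str, Any]]:
        state = self._data.get(doc_id)
        return dict(state) if state is not None else None


class Repository:
    """Application-facing API: it only sees the StorageBackend contract."""

    def __init__(self, backend: StorageBackend) -> None:
        self._backend = backend

    def create(self, doc_id: str, title: str) -> None:
        self._backend.save(doc_id, {"dc:title": title})

    def title(self, doc_id: str) -> Optional[str]:
        state = self._backend.load(doc_id)
        return None if state is None else state.get("dc:title")


# Deployment-time choice: a configuration value selects the backend class.
BACKENDS: Dict[str, Type[StorageBackend]] = {"memory": InMemoryBackend}


def open_repository(backend_name: str) -> Repository:
    return Repository(BACKENDS[backend_name]())


repo = open_repository("memory")
repo.create("doc-1", "Hello")
print(repo.title("doc-1"))  # Hello
```

Adding another backend would mean one more class implementing `save` and `load` plus one more entry in `BACKENDS`; `Repository` and the code calling it stay unchanged.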
6. HISTORY: NUXEO REPOSITORY & STORAGE
2006: Nuxeo Repository is based on ZODB (Python / Zope based)
This is not JSON in NoSQL, but Python serialization in ObjectDB
Concurrency and performance issues, bad transaction handling
2007: Nuxeo Platform 5.1 - Apache Jackrabbit (JCR based)
Mix SQL + Java Serialization + Lucene
Transaction and consistency issues
2009: Nuxeo 5.2 - Nuxeo VCS
SQL-based repository: MVCC & ACID
very reliable, but some use cases cannot fit in a SQL DB!
2014: Nuxeo 5.9 - Nuxeo DBS
Document Based Storage repository
MongoDB is the reference backend
[Diagram: Object DB, Document DB, SQL DB]
7. FROM SQL TO NOSQL
Understanding the motivations for moving to MongoDB
8. SQL BASED REPOSITORY - VCS
The Search API is the most used:
search is the main scalability challenge
9. KEY LIMITATIONS OF THE SQL APPROACH
Impedance issue
storing Documents in tables is not easy
requires Caching and Lazy loading
Scalability
Document repository can become very large (versions, workflows ...)
scaling out SQL DB is very complex (and never transparent)
Concurrency model
Heavy writes are an issue (Quotas, Inheritance)
Hard to maintain good Read & Write performances
10. NEED A DIFFERENT STORAGE MODEL!
12. NO SQL WITH MONGODB
No Impedance issue
One Nuxeo Document = One MongoDB Document
No Scalability issue for CRUD
native distributed architecture allows scale out
No Concurrency performance issue
Document Level "Transactions"
No application level cache is needed
No need to manage invalidations
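To make the impedance point concrete, the sketch below shows what storing one nested document in tables implies, compared with keeping it as a single record. It is a minimal illustration under assumptions: the field names, the `to_rows` mapping and the `hierarchy` table name are hypothetical and do not reproduce Nuxeo's actual VCS schema.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[str, Dict[str, Any]]  # (table name, column values)

# Hypothetical document: scalar fields, a list of strings and a list of maps.
doc: Dict[str, Any] = {
    "ecm:id": "doc-1",
    "dc:title": "Trailer",
    "dc:subjects": ["sports", "news"],
    "files:files": [
        {"name": "trailer.mp4", "length": 1024},
        {"name": "poster.jpg", "length": 64},
    ],
}


def to_rows(document: Dict[str, Any]) -> List[Row]:
    """Split one document into table rows: scalars go in a main row, every
    list item becomes its own row in a child table keyed by parent id + position."""
    doc_id = document["ecm:id"]
    main: Dict[str, Any] = {"id": doc_id}
    children: List[Row] = []
    for key, value in document.items():
        if key == "ecm:id":
            continue
        if isinstance(value, list):
            for pos, item in enumerate(value):
                row: Dict[str, Any] = {"parent_id": doc_id, "pos": pos}
                row.update(item if isinstance(item, dict) else {"item": item})
                children.append((key, row))
        else:
            main[key] = value
    return [("hierarchy", main)] + children


rows = to_rows(doc)
tables = {table for table, _ in rows}
print(len(rows), "rows in", len(tables), "tables")  # 5 rows in 3 tables

# In a document store the same state is one record, written and read as a unit.
collection: Dict[str, Dict[str, Any]] = {doc["ecm:id"]: doc}
```

Reading the document back from tables means reassembling those rows, which is where caching, lazy loading and cache invalidation come in; the single record needs none of that.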
13. THAT'S WHY WE INTEGRATED MONGODB
let's see the technical details
21. CONSISTENCY CHALLENGES
Unitary Document Operations are safe
No impedance issue
Large batch updates are not much of an issue
SQL DBs do not like long-running transactions anyway
Multi-document transactions are an issue
Workflows are a typical use case
Isolation issue
Other transactions can see intermediate states
Possible interleaving
Find a way to mitigate consistency issues
Transactions cannot span multiple documents
22. MITIGATING CONSISTENCY ISSUES
Transient State Manager
Run all operations in Memory
Flush to MongoDB as late as possible
Populate an Undo Log
Replay backward in case of Rollback
Recover partial Transaction Management
Commit / Rollback model
But complete isolation is not possible
Need to flush transient state for queries
"uncommitted" changes are visible to others
"read uncommitted" at best
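The mitigation strategy above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not Nuxeo's actual Transient State Manager: a plain dict stands in for the MongoDB collection, and `TransientSession` with its `set`, `flush`, `query`, `commit` and `rollback` methods is a hypothetical API. Changes stay in memory until a flush, each flush appends the previous values to an undo log, and rollback replays that log backward. The `query` method flushes first, which is where isolation is lost.

```python
from typing import Any, Callable, Dict, List, Tuple

Store = Dict[str, Dict[str, Any]]  # doc id -> document state (stands in for MongoDB)
_MISSING = object()  # marks "no previous value" in the undo log


class TransientSession:
    """Hypothetical transient state manager: in-memory changes plus an undo log."""

    def __init__(self, store: Store) -> None:
        self.store = store
        self.pending: Dict[str, Dict[str, Any]] = {}  # not yet flushed
        # (doc id, key or None for "document created", previous value)
        self.undo_log: List[Tuple[str, Any, Any]] = []

    def set(self, doc_id: str, key: str, value: Any) -> None:
        self.pending.setdefault(doc_id, {})[key] = value  # memory only

    def flush(self) -> None:
        """Write pending changes to the store as late as possible, logging undo info."""
        for doc_id, changes in self.pending.items():
            if doc_id not in self.store:
                self.store[doc_id] = {}
                self.undo_log.append((doc_id, None, _MISSING))
            doc = self.store[doc_id]
            for key, value in changes.items():
                self.undo_log.append((doc_id, key, doc.get(key, _MISSING)))
                doc[key] = value
        self.pending.clear()

    def query(self, predicate: Callable[[Dict[str, Any]], bool]) -> List[str]:
        self.flush()  # the store is queried, so transient state must be flushed first
        return sorted(d for d, state in self.store.items() if predicate(state))

    def commit(self) -> None:
        self.flush()
        self.undo_log.clear()

    def rollback(self) -> None:
        """Drop pending changes and replay the undo log backward."""
        self.pending.clear()
        for doc_id, key, previous in reversed(self.undo_log):
            if key is None:
                del self.store[doc_id]  # document was created by this session
            elif previous is _MISSING:
                del self.store[doc_id][key]
            else:
                self.store[doc_id][key] = previous
        self.undo_log.clear()


store: Store = {"wf-1": {"state": "draft"}}
tx = TransientSession(store)
tx.set("wf-1", "state", "review")
tx.set("task-1", "assignee", "alice")
visible = tx.query(lambda s: s.get("state") == "review")  # forces a flush
seen_by_others = store["wf-1"]["state"]  # uncommitted change is already in the store
tx.rollback()
print(visible, seen_by_others, store)
# ['wf-1'] review {'wf-1': {'state': 'draft'}}
```

Between the flush triggered by `query` and the rollback, any other reader of `store` sees the `review` state: this is the "read uncommitted" behavior listed on the slide.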
23. WHEN TO USE MONGODB OVER TRADITIONAL SQL?
25. THERE IS NOT ONE UNIQUE SOLUTION
Use each storage solution for what it does the best
SQL DB
store content in an ACID way
consistency over availability
MongoDB
store content in a BASE way
availability over consistency
elasticsearch
provide powerful and scalable queries
Storage does not impact the application: this can be a deployment choice!
ACID: Atomic, Consistent, Isolated, Durable
BASE: Basically Available, Soft state, Eventually consistent
27. HUGE REPOSITORY - HEAVY LOADING
Massive amount of Documents
x00,000,000
Automatic versioning
create a version for each single change
Write intensive access
daily imports or updates
recursive updates (quotas, inheritance)
SQL DB collapses (on commodity hardware)
MongoDB handles the volume
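Recursive updates are what make this load write-intensive: when a quota (or any inherited value) is maintained on every folder, one change at a leaf rewrites every ancestor up to the root. A minimal sketch, assuming a hypothetical folder tree and a hypothetical `add_blob` helper that keeps a byte counter per folder:

```python
from typing import Dict, List, Optional

# Hypothetical folder tree: each node points to its parent (None for the root).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "videos": "root",
    "videos/2016": "videos",
    "videos/2016/ep1": "videos/2016",
}

# Bytes used under each node, maintained on every folder like a quota counter.
quota: Dict[str, int] = {node: 0 for node in PARENT}


def add_blob(path: str, size: int) -> List[str]:
    """Add `size` bytes at `path`, propagate the total to every ancestor and
    return the documents that had to be written."""
    written: List[str] = []
    node: Optional[str] = path
    while node is not None:
        quota[node] += size
        written.append(node)
        node = PARENT[node]
    return written


writes = add_blob("videos/2016/ep1", 500)
print(len(writes), writes)
# 4 ['videos/2016/ep1', 'videos/2016', 'videos', 'root']
```

Every upload anywhere in the tree writes the root, so concurrent daily imports all contend on the same few documents; with automatic versioning, each change also creates a version on top of these writes.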
29. BENCHMARKING READ + WRITE
Read & Write Operations are competing
Write Operations are not blocked
Setup: Nuxeo on c4.xlarge, database on c4.2xlarge
[Chart: read/write benchmark, SQL]
30. DATA LOADING OVERFLOW
Lots of lazy loading
Very large Objects = lots of fragments
lots of lazy loading = latency issues
Cache thrashing issue
SQL mapping requires caching
read lots of documents inside a single transaction
MongoDB has no impedance mismatch
no lazy loading
fast loading of big documents
no need for 2nd level cache
Side effects of impedance mismatch
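The lazy-loading cost can be illustrated with a toy model that counts round trips. This is a minimal sketch, assuming a very large object split into 20 fragments; `LazyDocument` and `CountingFetcher` are hypothetical names, and the fetcher only simulates a database call.

```python
from typing import Any, Callable, Dict

# Hypothetical "very large object": 20 fragments (e.g. one per schema table).
FRAGMENTS: Dict[str, Dict[str, Any]] = {f"schema{i}": {"value": i} for i in range(20)}


class CountingFetcher:
    """Stands in for the database; each call represents one SQL round trip."""

    def __init__(self) -> None:
        self.round_trips = 0

    def __call__(self, fragment: str) -> Dict[str, Any]:
        self.round_trips += 1
        return FRAGMENTS[fragment]


class LazyDocument:
    """Loads a fragment only when one of its properties is first read."""

    def __init__(self, fetch: Callable[[str], Dict[str, Any]]) -> None:
        self._fetch = fetch
        self._loaded: Dict[str, Dict[str, Any]] = {}

    def get(self, fragment: str, key: str) -> Any:
        if fragment not in self._loaded:  # cache miss: one more round trip
            self._loaded[fragment] = self._fetch(fragment)
        return self._loaded[fragment].get(key)


fetcher = CountingFetcher()
doc = LazyDocument(fetcher)
total = sum(doc.get(name, "value") for name in FRAGMENTS)
print(total, fetcher.round_trips)  # 190 20
```

Reading every property costs one round trip per fragment, each adding latency, and the per-document `_loaded` map is the kind of cache that thrashes when a transaction reads many such documents. With one MongoDB document per Nuxeo document, the same read is a single fetch.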
31. BENCHMARKING IMPEDANCE EFFECT
Process 20,000 documents
700 documents/s with SQL backend (cold cache)
6,000 documents/s with MongoDB / mmapv1: x9
11,000 documents/s with MongoDB / WiredTiger: x15
Process 100,000 documents
750 documents/s with SQL backend (cold cache)
9,500 documents/s with MongoDB / mmapv1: x13
11,500 documents/s with MongoDB / WiredTiger: x15
Process 200,000 documents
750 documents/s with SQL backend (cold cache)
14,000 documents/s with MongoDB / mmapv1: x18
11,000 documents/s with MongoDB / WiredTiger: x15
processing benchmark based on a real use case
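The multipliers on the slide are rounded; recomputing them from the raw throughput figures takes a few lines. The script uses only the numbers reported on the slide and adds no new measurements.

```python
from typing import Dict

# Throughput in documents/s, copied from the benchmark figures above.
RESULTS: Dict[int, Dict[str, int]] = {
    20_000: {"sql": 700, "mmapv1": 6_000, "WiredTiger": 11_000},
    100_000: {"sql": 750, "mmapv1": 9_500, "WiredTiger": 11_500},
    200_000: {"sql": 750, "mmapv1": 14_000, "WiredTiger": 11_000},
}


def speedups() -> Dict[int, Dict[str, float]]:
    """Speed-up of each MongoDB storage engine over the SQL backend."""
    return {
        docs: {engine: r[engine] / r["sql"] for engine in ("mmapv1", "WiredTiger")}
        for docs, r in RESULTS.items()
    }


for docs, ratios in speedups().items():
    parts = ", ".join(f"{engine} x{ratio:.1f}" for engine, ratio in ratios.items())
    print(f"{docs:,} docs: {parts}")
# 20,000 docs: mmapv1 x8.6, WiredTiger x15.7
# 100,000 docs: mmapv1 x12.7, WiredTiger x15.3
# 200,000 docs: mmapv1 x18.7, WiredTiger x14.7
```

The unrounded ratios show that the slide's multipliers are approximations (for instance 11,000 / 700 is about 15.7, shown as x15), and that mmapv1 gains on larger runs while WiredTiger stays at roughly 11,000 to 11,500 documents/s.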
32. ROBUST ARCHITECTURE
native distributed architecture
ReplicaSet : data redundancy & fault tolerance
Geographically Redundant Replica Set : host data on multiple hosting sites
[Diagram: replica set spread over two active hosting sites]
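Fault tolerance in a replica set comes from majority voting: a primary can only be elected while a strict majority of the voting members can reach each other. The arithmetic below is a minimal sketch of that rule, useful when planning member counts and their placement across sites; the site names and layouts are hypothetical, and the sketch treats every member as a voting member and ignores priorities and other replica set settings.

```python
from typing import Dict


def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: a strict majority of voting members."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """Members that can be lost while a primary can still be elected."""
    return voting_members - majority(voting_members)


def survives_site_loss(members_per_site: Dict[str, int], lost_site: str) -> bool:
    """True if the members outside `lost_site` still form a majority."""
    total = sum(members_per_site.values())
    return total - members_per_site[lost_site] >= majority(total)


for n in (3, 4, 5):
    print(n, majority(n), fault_tolerance(n))
# 3 2 1
# 4 3 1
# 5 3 2

two_sites = {"site_a": 2, "site_b": 1}  # hypothetical layout
three_sites = {"site_a": 2, "site_b": 2, "site_c": 1}
print(survives_site_loss(two_sites, "site_a"))  # False: 1 of 3 left
print(survives_site_loss(three_sites, "site_a"))  # True: 3 of 5 left
```

With only two sites, the site holding most members is a single point of failure for elections; a third site, even with a single member, lets the set survive the loss of any one site.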
34. A REAL LIFE EXAMPLE - CONTEXT
Who: US Network Carrier
Goal: Provide VOD services
Requirements:
store videos
manage meta-data
manage workflows
generate thumbs
generate conversions
manage availability
They chose Nuxeo to build their Video repository
35. A REAL LIFE EXAMPLE - CHALLENGES
Very Large Objects:
lots of meta-data (dublincore, ADI, ratings ...)
Massive daily updates
updates on rights and availability
Need to track all changes
prove what was the availability for a given date
looks like a good use case for MongoDB
lots of data + lots of updates
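Proving what the availability was on a given date amounts to an as-of lookup over the recorded changes. This is a minimal sketch, assuming every change is kept as a dated record sorted oldest first (the slide does not say how the project stored its history); `availability_on` and the sample history are hypothetical.

```python
import bisect
from datetime import date
from typing import List, Optional, Tuple

History = List[Tuple[date, bool]]  # (date the change took effect, available?)

# Hypothetical availability history of one video, oldest change first.
history: History = [
    (date(2016, 1, 1), True),
    (date(2016, 3, 15), False),
    (date(2016, 4, 1), True),
]


def availability_on(hist: History, day: date) -> Optional[bool]:
    """Return the state in force on `day`: the latest change dated on or before it,
    or None if the history starts after `day`."""
    dates = [changed for changed, _ in hist]
    i = bisect.bisect_right(dates, day)  # number of changes dated <= day
    return hist[i - 1][1] if i else None


print(availability_on(history, date(2016, 3, 20)))  # False
```

Keeping every change as a new record, rather than overwriting the current state, is what makes the past provable, and it is also why this use case produces so much data.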
36. A REAL LIFE EXAMPLE - MONGODB CHOICE
because they have a good use case for MongoDB
Lots of large objects, lots of updates
because they wanted to use MongoDB
change work habits (open source, NoSQL)
doing a project with MongoDB is cool
they chose MongoDB
they are happy with it!
37. ANY QUESTIONS?
Thank You !
https://github.com/nuxeo
http://www.nuxeo.com/careers/