USING MONGODB TO BUILD A FAST AND SCALABLE CONTENT REPOSITORY
SOME CONTEXT
What we Do and What Problems We Try to Solve
NUXEO
we provide a Platform that developers can use to build highly customized Content Applications
we provide components, and the tools to assemble them
everything we do is open source (for real)
various customers - various use cases
me: developer & CTO - joined the Nuxeo project 10+ years ago
Example use cases: track game builds, Electronic Flight Bags, central repository for Models, food industry PLM
https://github.com/nuxeo
Document Oriented Database: store JSON Documents
Document Repository: manage Document attributes, hierarchy, blobs, security, lifecycle, versions
DOCUMENT REPOSITORY
Storage abstraction: be able to choose the right storage
depending on the constraints
depending on the environment
Manage Content Model
Schemas, Mixins, facets
Manage Data level Security
Document level permissions
Blob level permissions
Versioning
Life-Cycle
Blob management
Efficient storage & CDN
HISTORY: NUXEO REPOSITORY & STORAGE
2006: Nuxeo Repository is based on ZODB (Python / Zope based)
This is not JSON in NoSQL, but Python serialization in an Object DB
Concurrency and performance issues, bad transaction handling
2007: Nuxeo Platform 5.1 - Apache JackRabbit (JCR based)
Mix of SQL + Java Serialization + Lucene
Transaction and consistency issues
2009: Nuxeo 5.2 - Nuxeo VCS
SQL based repository: MVCC & ACID
very reliable, but some use cases cannot fit in a SQL DB!
2014: Nuxeo 5.9 - Nuxeo DBS
Document Based Storage repository
MongoDB is the reference backend
[Diagram labels: Object DB, Document DB, SQL DB]
FROM SQL TO NOSQL
Understanding the motivations for moving to MongoDB
SQL BASED REPOSITORY - VCS
The Search API is the most used: search is the main scalability challenge
KEY LIMITATIONS OF THE SQL APPROACH
Impedance issue
storing Documents in tables is not easy
requires Caching and Lazy loading
Scalability
Document repository can become very large (versions, workflows ...)
scaling out a SQL DB is very complex (and never transparent)
Concurrency model
Heavy write is an issue (Quotas, Inheritance)
Hard to maintain good Read & Write performance
NEED A DIFFERENT STORAGE MODEL!
FROM SQL TO NOSQL
NOSQL WITH MONGODB
No Impedance issue
One Nuxeo Document = One MongoDB Document
No Scalability issue for CRUD
native distributed architecture allows scale out
No Concurrency performance issue
Document Level "Transactions"
No application level cache is needed
No need to manage invalidations
THAT'S WHY WE INTEGRATED MONGODB
let's see the technical details
INTEGRATING MONGODB
Inside nuxeo-dbs storage adapter
DOCUMENT BASED STORAGE & MONGODB
STORING NUXEO DOCUMENTS IN MONGODB
{
  "ecm:id": "52a7352b-041e-49ed-8676-328ce90cc103",
  "ecm:primaryType": "MyFile",
  "ecm:majorVersion": NumberLong(2),
  "ecm:minorVersion": NumberLong(0),
  "dc:title": "My Document",
  "dc:contributors": [ "bob", "pete", "mary" ],
  "dc:created": ISODate("2014-07-03T12:15:07+0200"),
  ...
  "cust:primaryAddress": {
    "street": "1 rue René Clair", "zip": "75018", "city": "Paris", "country": "France" },
  "files:files": [
    { "name": "doc.txt", "length": 1234, "mime-type": "text/plain",
      "data": "0111fefdc8b14738067e54f30e568115" },
    { "name": "doc.pdf", "length": 29344, "mime-type": "application/pdf",
      "data": "20f42df3221d61cb3e6ab8916b248216" }
  ],
  "ecm:acp": [
    { "name": "local",
      "acl": [ { "grant": false, "perm": "Write", "user": "bob" },
               { "grant": true, "perm": "Read", "user": "members" } ] }
  ],
  ...
}
40+ fields by default (depends on config)
18 indexes
HIERARCHY
ecm:parentId: parent-child relationship
ecm:ancestorIds: recursion optimized through an array
• Maintained by the framework (create, delete, move, copy)
{ ...
  "ecm:parentId": "3d7efffe-e36b-44bd-8d2e-d8a70c233e9d",
  "ecm:ancestorIds": [ "00000000-0000-0000-0000-000000000000",
                       "4f5c0e28-86cf-47b3-8269-2db2d8055848",
                       "3d7efffe-e36b-44bd-8d2e-d8a70c233e9d" ]
... }
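As a rough illustration of why the ancestor array removes the need for recursion, here is a minimal in-memory sketch in plain JavaScript. The field names come from the slide; the helpers `makeChild` and `descendantsOf` are hypothetical and are not part of Nuxeo or MongoDB.

```javascript
// Hypothetical sketch: field names are from the slide, helper names are made up.
const ROOT_ID = "00000000-0000-0000-0000-000000000000";

// A child's ancestor list is its parent's ancestor list plus the parent itself,
// so the array can be maintained at creation time without walking the tree.
function makeChild(parent, id) {
  return {
    "ecm:id": id,
    "ecm:parentId": parent["ecm:id"],
    "ecm:ancestorIds": [...parent["ecm:ancestorIds"], parent["ecm:id"]],
  };
}

// All descendants of a folder: one membership test per document, no recursion.
// In the mongo shell the matching filter would be {"ecm:ancestorIds": folderId},
// because an equality match on an array field matches any element.
function descendantsOf(docs, folderId) {
  return docs.filter((doc) => doc["ecm:ancestorIds"].includes(folderId));
}

const root = { "ecm:id": ROOT_ID, "ecm:parentId": null, "ecm:ancestorIds": [] };
const workspace = makeChild(root, "4f5c0e28-86cf-47b3-8269-2db2d8055848");
const folder = makeChild(workspace, "3d7efffe-e36b-44bd-8d2e-d8a70c233e9d");
const file = makeChild(folder, "52a7352b-041e-49ed-8676-328ce90cc103");
const docs = [root, workspace, folder, file];

console.log(file["ecm:ancestorIds"]);
console.log(descendantsOf(docs, workspace["ecm:id"]).map((d) => d["ecm:id"]));
```

The trade-off is that a move or copy has to rewrite the array on every descendant, which is why the slide notes that the framework maintains it.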
SECURITY
Generic ACP stored in the ecm:acp field
Precomputed Read ACLs to avoid post-filtering on search
• Simple set of identities having access
• Semantic restrictions on blocking
• Maintained by the framework
• Search matches if intersection
ecm:racl: ["Management", "Supervisors", "bob"]
db.default.find({"ecm:racl": {"$in": ["bob", "members", "Everyone"]}})
{ ...
  "ecm:acp": [ {
    "name": "local",
    "acl": [ { "grant": false, "perm": "Write", "user": "bob" },
             { "grant": true, "perm": "Read", "user": "members" } ] } ]
... }
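The read-ACL idea can be sketched as follows, in plain JavaScript with no database. This is a simplified illustration, not Nuxeo's algorithm: the hypothetical `computeReadAcl` only collects identities that are granted Read and deliberately ignores the blocking (deny) semantics the slide mentions. `canRead` is also a made-up name; it mirrors what the `$in` filter does on the server.

```javascript
// Simplified, hypothetical sketch: collects identities granted "Read".
// It ignores deny/blocking entries, which the real framework has to handle.
function computeReadAcl(acp) {
  const identities = new Set();
  for (const acl of acp) {
    for (const ace of acl.acl) {
      if (ace.grant && ace.perm === "Read") {
        identities.add(ace.user);
      }
    }
  }
  return [...identities];
}

// A document is visible when the user's identities (user name plus groups)
// intersect the precomputed list. This is the same test as
// {"ecm:racl": {"$in": principals}} in the mongo shell.
function canRead(doc, principals) {
  return doc["ecm:racl"].some((identity) => principals.includes(identity));
}

const acp = [
  {
    name: "local",
    acl: [
      { grant: false, perm: "Write", user: "bob" },
      { grant: true, perm: "Read", user: "members" },
    ],
  },
];

const doc = { "ecm:acp": acp, "ecm:racl": computeReadAcl(acp) };
console.log(doc["ecm:racl"]);
console.log(canRead(doc, ["bob", "members", "Everyone"]));
console.log(canRead(doc, ["alice"]));
```

Because the list is computed when permissions change rather than at query time, a search never needs to load a document just to discard it afterwards.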
SEARCH
db.default.find({
  $and: [
    { "dc:title": { $in: ["Workspaces", "Sections"] } },
    { "ecm:racl": { "$in": ["bob", "members", "Everyone"] } }
  ]
})
SELECT * FROM Document WHERE dc:title = 'Sections' OR dc:title = 'Workspaces'
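The query above can be read as "user filter AND security filter". A small sketch of that composition follows, assuming a hypothetical helper `withReadSecurity`; how the real storage adapter builds its filters is not shown in the slides.

```javascript
// Hypothetical helper: wraps any user-level filter with the read-ACL clause,
// so security is applied by the database instead of by post-filtering results.
function withReadSecurity(userFilter, principals) {
  return { $and: [userFilter, { "ecm:racl": { $in: principals } }] };
}

// An OR of equality tests on the same field can be expressed as a single $in.
const titleFilter = { "dc:title": { $in: ["Workspaces", "Sections"] } };
const secured = withReadSecurity(titleFilter, ["bob", "members", "Everyone"]);

console.log(JSON.stringify(secured, null, 2));
```

The resulting object has the same shape as the argument passed to `db.default.find` on the slide.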
CONSISTENCY CHALLENGES
Unitary Document Operations are safe
No impedance issue
Large batch updates are not much of an issue
SQL DBs do not like long-running transactions anyway
Multi-document transactions are an issue
Workflows are a typical use case
Isolation issue
Other transactions can see intermediate states
Possible interleaving
Find a way to mitigate consistency issues
Transactions cannot span multiple documents
MITIGATING CONSISTENCY ISSUES
Transient State Manager
Run all operations in memory
Flush to MongoDB as late as possible
Populate an Undo Log
Replay backward in case of Rollback
Recover partial Transaction Management
Commit / Rollback model
But complete isolation is not possible
Need to flush transient state for queries
"uncommitted" changes are visible to others
"read uncommitted" at best
WHEN TO USE MONGODB OVER TRADITIONAL SQL?
MONGODB REPOSITORY
Typical use cases
THERE IS NOT ONE UNIQUE SOLUTION
Use each storage solution for what it does best
SQL DB
store content in an ACID way (Atomic, Consistent, Isolated, Durable)
consistency over availability
MongoDB
store content in a BASE way (Basic Availability, Soft state, Eventually consistent)
availability over consistency
Elasticsearch
provide powerful and scalable queries
Storage does not impact the application: this can be a deployment choice!
IDEAL USE CASES FOR MONGODB
HUGE REPOSITORY - HEAVY LOADING
Massive amount of Documents
x00,000,000
Automatic versioning
create a version for each single change
Write intensive access
daily imports or updates
recursive updates (quotas, inheritance)
SQL DB collapses (on commodity hardware)
MongoDB handles the volume
BENCHMARKING MASS IMPORT
[Chart: mass import benchmark on commodity hardware; labels: SQL, SQL with tuning, 7x faster]
BENCHMARKING READ + WRITE
[Chart: read + write benchmark, C4.xlarge (Nuxeo) and C4.2xlarge (DB); labels: SQL, "Read & Write operations are competing", "Write operations are not blocked"]
DATA LOADING OVERFLOW
Side effects of impedance mismatch
Lots of lazy loading
Very large Objects = lots of fragments
lots of lazy loading = latency issues
Cache thrashing issue
SQL mapping requires caching
read lots of documents inside a single transaction
MongoDB has no impedance mismatch
no lazy loading
fast loading of big documents
no need for 2nd level cache
BENCHMARKING IMPEDANCE EFFECT
Process 20,000 documents
700 documents/s with SQL backend (cold cache)
6,000 documents/s with MongoDB / mmapv1: x9
11,000 documents/s with MongoDB / wiredTiger: x15
Process 100,000 documents
750 documents/s with SQL backend (cold cache)
9,500 documents/s with MongoDB / mmapv1: x13
11,500 documents/s with MongoDB / wiredTiger: x15
Process 200,000 documents
750 documents/s with SQL backend (cold cache)
14,000 documents/s with MongoDB / mmapv1: x18
11,000 documents/s with MongoDB / wiredTiger: x15
Processing benchmark based on a real use case
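To put the throughput figures in wall-clock terms, the arithmetic for the 20,000-document run can be sketched as below. The rates are the ones quoted above; the function and variable names are made up for the example.

```javascript
// Rates (documents per second) quoted for the 20,000-document run.
const rates = { sql: 700, mmapv1: 6000, wiredTiger: 11000 };
const totalDocs = 20000;

// Elapsed time is simply volume divided by throughput.
function secondsToProcess(docs, docsPerSecond) {
  return docs / docsPerSecond;
}

const elapsed = Object.fromEntries(
  Object.entries(rates).map(([backend, rate]) => [backend, secondsToProcess(totalDocs, rate)])
);

for (const [backend, seconds] of Object.entries(elapsed)) {
  console.log(`${backend}: ${seconds.toFixed(1)} s`);
}
```

With these inputs the SQL run takes a little under half a minute, while both MongoDB runs finish in a few seconds.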
ROBUST ARCHITECTURE
native distributed architecture
ReplicaSet: data redundancy & fault tolerance
Geographically Redundant Replica Set: host data on multiple hosting sites
[Diagram labels: active, active]
A REAL LIFE EXAMPLE
A REAL LIFE EXAMPLE - CONTEXT
Who: US Network Carrier
Goal: Provide VOD services
Requirements:
store videos
manage meta-data
manage workflows
generate thumbs
generate conversions
manage availability
They chose Nuxeo to build their Video repository
A REAL LIFE EXAMPLE - CHALLENGES
Very Large Objects:
lots of meta-data (dublincore, ADI, ratings ...)
Massive daily updates
updates on rights and availability
Need to track all changes
prove what the availability was on a given date
looks like a good use case for MongoDB
lots of data + lots of updates
A REAL LIFE EXAMPLE - MONGODB CHOICE
because they have a good use case for MongoDB
Lots of large objects, lots of updates
because they wanted to use MongoDB
change work habits (open source, NoSQL)
doing a project with MongoDB is cool
they chose MongoDB
they are happy with it!
ANY QUESTIONS?
Thank You!
https://github.com/nuxeo
http://www.nuxeo.com/careers/

Using MongoDB to Build a Fast and Scalable Content Repository

  • 1. USING MONGODBUSING MONGODB TO BUILD A FAST AND SCALABLETO BUILD A FAST AND SCALABLE CONTENT REPOSITORYCONTENT REPOSITORY
  • 2. SOME CONTEXTSOME CONTEXT What we Do and What Problems We Try to Solve
  • 3. NUXEONUXEO Nuxeo ​we provide a Platform that developers can use to build highly customized Content Applications we provide components, and the tools to assemble them everything we do is open source (for real) various customers - various use cases me: developer & CTO - joined the Nuxeo project 10+ years ago Track game builds Electronic Flight Bags Central repository for Models Food industry PLM https://github.com/nuxeo
  • 4. Document Oriented Database Document Repository Store JSON Documents Manage Document attributes, hierarchy, blobs, security, lifecycle, versions
  • 5. DOCUMENT REPOSITORYDOCUMENT REPOSITORY Storage abstraction : be able to choose the right storage ​depending on the constraints depending on the environment Manage Content Model Schemas, Mixins, facets ​​Manage Data level Security ​Document level permissions Blob level permissions Versioning Life-Cycle Blob management ​Efficient storage & CDN
  • 6. HISTORY : NUXEO REPOSITORY & STORAGEHISTORY : NUXEO REPOSITORY & STORAGE 2006: Nuxeo Repository is based on ZODB (Python / Zope based) This is not JSON in NoSQL, but Python serialization in ObjectDB Conccurency and performances issues, Bad transaction handling 2007: Nuxeo Platform 5.1 - Apache JackRabbit (JCR based) Mix SQL + Java Serialization + Lucene Transaction and consistency issues 2009: Nuxeo 5.2 - Nuxeo VCS SQL based repository : MVCC & ACID very reliable, but some use cases can not fit in a SQL DB ! 2014: Nuxeo 5.9 - Nuxeo DBS Document Based Storage repository MongoDB is the reference backend​ Object DB Document DB SQL DB
  • 7. FROM SQL TO NOSQLFROM SQL TO NOSQL Understanding the motivations for moving to MongoDB
  • 8. SQL BASED REPOSITORY - VCSSQL BASED REPOSITORY - VCS Search API is the most used : search is the main scalability challenge
  • 9. KEY LIMITATIONS OF THE SQL APPROACHKEY LIMITATIONS OF THE SQL APPROACH Impedance issue storing Documents in tables is not easy requires Caching and Lazy loading Scalability Document repository can become very large (versions, workflows ...) ​scaling out SQL DB is very complex (and never transparent) Concurrency model Heavy write is an issue (Quotas, Inheritance) ​​Hard to maintain good Read & Write performances
  • 10. NEED A DIFFERENT STORAGE MODEL !NEED A DIFFERENT STORAGE MODEL !
  • 11. FROM SQL TO NO SQLFROM SQL TO NO SQL
  • 12. NO SQL WITH MONGODBNO SQL WITH MONGODB No Impedance issue One Nuxeo Document = One MongoDB Document No Scalability issue for CRUD ​native distributed architecture allows scale out No Concurrency performance issue ​​Document Level "Transactions" No application level cache is needed No need to manage invalidations
  • 13. THAT'S WHY WE INTEGRATED MONGODBTHAT'S WHY WE INTEGRATED MONGODB let's see the technical details
  • 15. DOCUMENT BASE STORAGE & MONGODBDOCUMENT BASE STORAGE & MONGODB
  • 16. DOCUMENT BASE STORAGE & MONGODBDOCUMENT BASE STORAGE & MONGODB
  • 17. STORING NUXEO DOCUMENTS IN MONGODBSTORING NUXEO DOCUMENTS IN MONGODB { "ecm:id":"52a7352b-041e-49ed-8676-328ce90cc103", "ecm:primaryType":"MyFile", "ecm:majorVersion":NumberLong(2), "ecm:minorVersion":NumberLong(0), "dc:title":"My Document", "dc:contributors":[ "bob", "pete", "mary" ], "dc:created": ISODate("2014-07-03T12:15:07+0200"), ... "cust:primaryAddress":{ "street":"1 rue René Clair", "zip":"75018", "city":"Paris", "country":"France"}, "files:files":[ { "name":"doc.txt", "length":1234, "mime-type":"plain/text", "data":"0111fefdc8b14738067e54f30e568115" }, { "name":"doc.pdf", "length":29344, "mime-type":"application/pdf", "data":"20f42df3221d61cb3e6ab8916b248216" } ], "ecm:acp":[ { name:"local", acl:[ { "grant":false, "perm":"Write", "user":"bob"}, { "grant":true, "perm":"Read", "user":"members" } ] }] ... } 40+ fields by default ​depends on config 18 indexes
  • 18. HIERARCHY: The parent-child relationship is stored in ecm:parentId. Recursion is optimized through the ecm:ancestorIds array. Both are maintained by the framework (create, delete, move, copy).
      {
        ...
        "ecm:parentId": "3d7efffe-e36b-44bd-8d2e-d8a70c233e9d",
        "ecm:ancestorIds": [
          "00000000-0000-0000-0000-000000000000",
          "4f5c0e28-86cf-47b3-8269-2db2d8055848",
          "3d7efffe-e36b-44bd-8d2e-d8a70c233e9d"
        ]
        ...
      }
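The bookkeeping described on this slide can be sketched in memory, without a database. This is only an illustration under assumptions: the helper names (`childFields`, `isDescendantOf`, `moveSubtree`) are hypothetical and are not Nuxeo APIs, and the real framework also handles delete and copy.

```javascript
// In-memory sketch of the ecm:parentId / ecm:ancestorIds bookkeeping.
// All helper names are hypothetical; they are not part of Nuxeo.

// Fields a new child receives: the parent's ancestors plus the parent itself.
function childFields(parent) {
  return {
    "ecm:parentId": parent["ecm:id"],
    "ecm:ancestorIds": [...(parent["ecm:ancestorIds"] || []), parent["ecm:id"]],
  };
}

// Descendant test without recursion: one membership check on the array.
// A MongoDB shell equivalent would be db.default.find({"ecm:ancestorIds": folderId}).
function isDescendantOf(doc, folderId) {
  return (doc["ecm:ancestorIds"] || []).includes(folderId);
}

// Moving a document replaces the ancestor prefix of the document and of
// every document in its subtree.
function moveSubtree(docs, docId, newParent) {
  const moved = docs.find((d) => d["ecm:id"] === docId);
  const oldPrefixLength = moved["ecm:ancestorIds"].length;
  const newPrefix = childFields(newParent)["ecm:ancestorIds"];
  for (const d of docs) {
    if (d === moved || isDescendantOf(d, docId)) {
      d["ecm:ancestorIds"] = [...newPrefix, ...d["ecm:ancestorIds"].slice(oldPrefixLength)];
    }
  }
  moved["ecm:parentId"] = newParent["ecm:id"];
}
```

The trade-off is that reads of a whole subtree become a single indexed array lookup, while a move has to rewrite every descendant.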
  • 19. SECURITY: Generic ACP stored in the ecm:acp field. Precomputed Read ACLs, stored in ecm:racl, avoid post-filtering on search: a simple set of identities having access; semantic restrictions on blocking; maintained by the framework; a search matches if the user's identities intersect that set.
      ecm:racl: ["Management", "Supervisors", "bob"]
      db.default.find({"ecm:racl": {"$in": ["bob", "members", "Everyone"]}})
      {
        ...
        "ecm:acp": [
          { "name": "local",
            "acl": [
              { "grant": false, "perm": "Write", "user": "bob" },
              { "grant": true, "perm": "Read", "user": "members" }
            ] }
        ]
        ...
      }
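The "matches if intersection" rule is what the `$in` filter on an array field expresses. A minimal in-memory sketch of that rule follows; the function name `matchesReadAcl` is hypothetical, and how the framework computes `ecm:racl` from the ACP is not shown.

```javascript
// Hypothetical helper modelling the precomputed read ACL check.
// It mirrors {"ecm:racl": {"$in": principals}}: the document matches when
// its set of allowed identities intersects the caller's identities.
function matchesReadAcl(doc, principals) {
  const allowed = new Set(doc["ecm:racl"] || []);
  return principals.some((p) => allowed.has(p));
}

const sample = { "ecm:racl": ["Management", "Supervisors", "bob"] };
const bobCanRead = matchesReadAcl(sample, ["bob", "members", "Everyone"]);
const aliceCanRead = matchesReadAcl(sample, ["alice", "members", "Everyone"]);
```

Because the set is precomputed and stored on each document, the security check becomes part of the query itself instead of a filter applied to the results afterwards.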
  • 20. SEARCH: The Nuxeo query
      SELECT * FROM Document WHERE dc:title = 'Sections' OR dc:title = 'Workspaces'
    becomes, with the read ACL filter added:
      db.default.find({
        $and: [
          {"dc:title": {$in: ["Workspaces", "Sections"]}},
          {"ecm:racl": {"$in": ["bob", "members", "Everyone"]}}
        ]
      })
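The shape of that filter can be built with two small helpers. This is a sketch of the structure only, not the actual Nuxeo query translator; `anyOf` and `withReadAcl` are hypothetical names.

```javascript
// Hypothetical helpers showing how the MongoDB filter above is assembled.

// "field = a OR field = b" on the same field collapses to a single $in.
function anyOf(field, values) {
  return { [field]: { $in: values } };
}

// Every search is AND-ed with the caller's read ACL filter.
function withReadAcl(filter, principals) {
  return { $and: [filter, anyOf("ecm:racl", principals)] };
}

const query = withReadAcl(
  anyOf("dc:title", ["Workspaces", "Sections"]),
  ["bob", "members", "Everyone"]
);
// In the MongoDB shell, the result could be passed as db.default.find(query).
```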
  • 21. CONSISTENCY CHALLENGES: Unitary Document operations are safe: no impedance issue. Large batch updates are not much of an issue: SQL DBs do not like long-running transactions anyway. Multi-document transactions are an issue, because transactions cannot span multiple documents; workflows are a typical use case. Isolation issue: other transactions can see intermediate states, and interleaving is possible. We need a way to mitigate consistency issues.
  • 22. MITIGATING CONSISTENCY ISSUES: Transient State Manager: run all operations in memory and flush to MongoDB as late as possible. Populate an Undo Log: replay it backward in case of rollback. This recovers partial transaction management with a commit/rollback model. But complete isolation is not possible: the transient state must be flushed for queries, so "uncommitted" changes are visible to others. This is "read uncommitted" at best.
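The transient state plus undo log idea can be illustrated with a small in-memory model. This is a simplified sketch under assumptions, not the Nuxeo implementation: the class name `TransientState` is hypothetical, and a `Map` stands in for the MongoDB collection.

```javascript
// Hypothetical sketch: `store` is a Map standing in for the MongoDB collection.
class TransientState {
  constructor(store) {
    this.store = store;       // shared state, visible to other "transactions"
    this.pending = new Map(); // changes kept in memory until flushed
    this.undoLog = [];        // previous values of everything already flushed
  }

  set(id, doc) {
    this.pending.set(id, doc);
  }

  get(id) {
    return this.pending.has(id) ? this.pending.get(id) : this.store.get(id);
  }

  // Write pending changes to the store, recording what they overwrite.
  // After a flush, other readers can see these uncommitted changes.
  flush() {
    for (const [id, doc] of this.pending) {
      this.undoLog.push({ id, existed: this.store.has(id), previous: this.store.get(id) });
      this.store.set(id, doc);
    }
    this.pending.clear();
  }

  commit() {
    this.flush();
    this.undoLog = [];
  }

  // Drop unflushed changes, then replay the undo log backward.
  rollback() {
    this.pending.clear();
    while (this.undoLog.length > 0) {
      const { id, existed, previous } = this.undoLog.pop();
      if (existed) this.store.set(id, previous);
      else this.store.delete(id);
    }
  }
}
```

Between `flush()` and `rollback()` the store exposes the intermediate values, which is the "read uncommitted" limitation named on the slide.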
  • 23. WHEN TO USE MONGODB OVER TRADITIONAL SQL?
  • 25. THERE IS NOT ONE UNIQUE SOLUTION: Use each storage solution for what it does best. SQL DBs store content in an ACID way (Atomic, Consistent, Isolated, Durable): consistency over availability. MongoDB stores content in a BASE way (Basically Available, Soft state, Eventually consistent): availability over consistency. Elasticsearch provides powerful and scalable queries. The storage choice does not impact the application: it can be a deployment choice!
  • 26. IDEAL USE CASES FOR MONGODB
  • 27. HUGE REPOSITORY - HEAVY LOADING: Massive amount of Documents: x00,000,000 (hundreds of millions). Automatic versioning: a version is created for every single change. Write-intensive access: daily imports or updates, recursive updates (quotas, inheritance). A SQL DB collapses (on commodity hardware); MongoDB handles the volume.
  • 28. BENCHMARKING MASS IMPORT: [Chart; surviving labels: SQL, SQL with tuning, commodity hardware, 7x faster]
  • 29. BENCHMARKING READ + WRITE: Read and write operations are competing; write operations are not blocked. Setup: c4.xlarge (Nuxeo), c4.2xlarge (DB). [Chart; surviving label: SQL]
  • 30. DATA LOADING OVERFLOW: Side effects of the impedance mismatch. Lots of lazy loading: very large objects mean lots of fragments, and lots of lazy loading creates latency issues. Cache thrashing issue: the SQL mapping requires caching, and reading lots of documents inside a single transaction thrashes the cache. MongoDB has no impedance mismatch: no lazy loading, fast loading of big documents, no need for a 2nd-level cache.
  • 31. BENCHMARKING IMPEDANCE EFFECT: Processing benchmark based on a real use case.
      Process 20,000 documents: 700 documents/s with SQL backend (cold cache); 6,000 documents/s with MongoDB/mmapv1 (x9); 11,000 documents/s with MongoDB/wiredTiger (x15)
      Process 100,000 documents: 750 documents/s with SQL backend (cold cache); 9,500 documents/s with MongoDB/mmapv1 (x13); 11,500 documents/s with MongoDB/wiredTiger (x15)
      Process 200,000 documents: 750 documents/s with SQL backend (cold cache); 14,000 documents/s with MongoDB/mmapv1 (x18); 11,000 documents/s with MongoDB/wiredTiger (x15)
  • 32. ROBUST ARCHITECTURE: Native distributed architecture. ReplicaSet: data redundancy and fault tolerance. Geographically Redundant Replica Set: host data on multiple hosting sites. [Diagram label: active/active]
  • 33. A REAL LIFE EXAMPLE
  • 34. A REAL LIFE EXAMPLE - CONTEXT: Who: US Network Carrier. Goal: provide VOD services. Requirements: store videos, manage metadata, manage workflows, generate thumbnails, generate conversions, manage availability. They chose Nuxeo to build their video repository.
  • 35. A REAL LIFE EXAMPLE - CHALLENGES: Very large objects: lots of metadata (Dublin Core, ADI, ratings ...). Massive daily updates: updates on rights and availability. Need to track all changes: prove what the availability was on a given date. This looks like a good use case for MongoDB: lots of data + lots of updates.
  • 36. A REAL LIFE EXAMPLE - MONGODB CHOICE: They chose MongoDB because they have a good use case for it (lots of large objects, lots of updates) and because they wanted to use MongoDB: change work habits (open source, NoSQL), and doing a project with MongoDB is cool. They are happy with it!
  • 37. ANY QUESTIONS? Thank You! https://github.com/nuxeo http://www.nuxeo.com/careers/