The Nuxeo Platforms used to store its content and metadata in a SQL Database. Adding a new NoSQL backend, like MongoDB, was a good opportunity to review several abstractions and fully transparently adapt to SQL and NoSQL. This presentation compares the two models and reviews how high level concepts (security, relations, search...) are implemented in each model.
6. Nuxeo Core — Rich Documents
• Complex properties and lists of them
• primaryAddress = { street = "1 rue René Clair", zip = "75018",
city = "Paris", country = "France" }
• files = [
• { name = "doc.txt", length = 1234, mime-type = "plain/text",
data = 0111fefdc8b14738067e54f30e568115 }
• { name = "doc.pdf", length = 29344, mime-type = "application/pdf",
data = 20f42df3221d61cb3e6ab8916b248216 }
]
7. Nuxeo Core — Rich Operations
• CRUD
• Create
• Retrieve
• Update
• Delete
• Move
• Copy
• ... but in a Hierarchy
8. Nuxeo Core — Rich Features
• Security based on ACLs and inheritance
• block bob for Write, allow members for Read
• Proxies (multi-filing)
• Versioning
• Placeless documents (versions, tags, relations...)
• Facets (dynamic typing)
• Locking
• Search (NXQL)
SELECT * FROM File WHERE files/*/name = 'doc.txt'
9. Nuxeo Core — Hierarchy
• Parent-child relationship
• Recursion
• Find all the children to change something
• Lifecycle state
• Security
• Search on a subset of the hierarchy
• ... AND ecm:path STARTSWITH '/workspaces/receipts'
11. Storage — SQL
• Stores data in a set of JOINed tables
• Star schema, around the main hierarchy
• Lists as JOINed table with item/pos
• Complex properties as sub-documents (children)
• Lists of complex properties as ordered sub-documents
• Id generated by application or database
• String / native UUID / serial integer
15. Storage — MongoDB
• Standard JSON documents
• Property names fully prefixed
• Lists as arrays of scalars
• Complex properties as sub-documents
• Complex lists as arrays of sub-documents
• Id generated by MongoDB
• Counter using findAndModify, $inc and returnNew
18. Hierarchy — SQL
• Parent-child relationship
• hierarchy.parentid column
• Recursion optimized through ancestors table
• For each document list all its ancestors
• Maintained by database triggers (create, delete, move, copy)
• Alternative for PostgreSQL: array column with all ancestors
22. Proxies — SQL
• Reference to target document
• proxies.targetid column
• Holds only hierarchy-based information, no content
• Parent, name, ACL...
• Additional JOIN during search
23. Proxies — MongoDB
• Copy of the target document
• ecm:proxyTargetId field
• Target document knows who's pointing to it
• ecm:proxyIds field
• Maintained by framework
• Copy needs to be kept up to date when target changes
• Maintained by framework
24. Proxies — Semantics
• What to do when:
• Target removed (→ forbid)
• Proxy removed
• Proxy + target removed at the same time (→ ok)
• Target copied
• Proxy copied (→ new proxy to original target)
• Proxy + target copied at the same time (todo)
25. Security — SQL
• Generic ACP stored in acls table
• Precomputed Read ACLs needed for search
• Ordered list of identities having access, with blocking
["Management", "Supervisors", "-Temps", "bob"]
• Read ACLs are given an identifier
• Identities having access to which Read ACL is precomputed
• Maintained by database triggers
• Search matches using JOIN
28. Security — MongoDB
• Generic ACP stored in ecm:acp field
• Precomputed Read ACLs needed for search
• Simple set of identities having access
ecm:racl: ["Management", "Supervisors", "bob"]!
• Semantic restrictions on blocking
• Maintained by framework
• Search matches if intersection
{"ecm:racl": {"$in": ["bob", "members", "Everyone"]}}
29. Search — SQL
• Translated from NXQL to SQL
• JOIN of all required star/list/complex properties tables
• Additional UNION + JOINs for proxies
• Additional JOIN for security
• Can have correlations (reuse same JOIN)
• Fulltext index(es) on fulltext.simpletext /
fulltext.binarytext columns
30. • Translated from NXQL to MongoDB syntax
• Proxies queried directly
• Security queried by set intersection
• One fulltext index for ecm:fulltextSimple /
ecm:fulltextBinary fields
• Some limitations
Search — MongoDB
31. Search — MongoDB Limitations
• Only one fulltext search per query, restrictions on position
• No generic boolean NOT, must be pushed down as
negative operators
• Search is field/value based
• No multi-field operators (title = description,
expirationDate > modificationDate)
• No multi-field arithmetic (amount + bonus < 1000)
• Subdocument correlation with $elemMatch is less generic than
full JOINs
32. Transactions — SQL
• Standard SQL database capabilities
• Atomic commit
• Two-phase commit (prepare/commit) also useable, although
costly
• Rollback
• Transient data is data modified in the database but not
yet committed
• Transient data is visible along committed data for retrieval and
search
33. Transactions — MongoDB
• No atomic commit beyond a single document
• Commit using a big batch of create/delete/update
accumulated in-memory
• Not atomic, others can see partial state
• No transient space
• Emulate transient space in-memory, flush at commit time
• All accesses and searches must check the transient space as
well as MongoDB
34. Transactions — MongoDB
• No rollback
• Rollback by dropping the in-memory transient space
• Operations involving several documents in relation
• Move, delete, copy, ancestors or recursion checks
• Using transient space + MongoDB for them is too complex
• Flush to MongoDB before doing them (commit)
• Must be able to be rolled back if needed (transaction
compensation)
• Others can see state that's eventually invalid
35. MongoDB — Restrictions
• Eventual consistency and no transactions
• Prevents strong checks
• Duplicate name in a folder
• Move creating cycles
• Remove target before proxy
• Create document in a deleted folder
• Prevents full consistency of hierarchical processing
• Read ACLs, quotas
• Needs background jobs that check consistency
36. MongoDB — Features
• Bulk operations
• Map-reduce for aggregations
• Quotas / count / folder content last modified
• Conditional updates
• Locks
• Prevent dirty writes
• GridFS to store binaries
• Sharding
38. Future Work
• DBS used for more services
• Directories / Vocabularies / User database
• Audit log
• DBS for other backends
• Elasticsearch
• Redis
• PostgreSQL / JSON
• Other...