December 1, 2015
Azure Group
Topic: Introduction to DocumentDB
Speaker: Vincent-Philippe Lauzon, Microsoft
Azure DocumentDB is a NoSQL database. In this introduction to DocumentDB, you will see:
• What a NoSQL database is
• How DocumentDB compares with the other Azure databases
• How DocumentDB compares with other NoSQL databases
• How to create and manage a DocumentDB database
• How to use it (tools + C#)
• Security
• Performance / Capacity
Vincent-Philippe Lauzon is a Microsoft Azure Solution Architect & Machine Learning / Senior Consultant at CGI. You can read his blog at http://vincentlauzon.com and follow him on Twitter at https://twitter.com/vplauzon
4. No SQL
NO SQL → Not Only SQL
Other than Tabular / Relational model
Less / No up-front (schema) design
Easier to scale horizontally (cluster)
Which makes them a better match for big data
Each Product makes different tradeoffs
Younger & less complete feature set
5. No SQL on Azure
Fully Managed
Table Storage (Key Value)
Redis (Key Value)
Hadoop HBase (Wide Column)
Hadoop Hive (ad hoc tables)
DocumentDB (Document)
Through Marketplace
MongoDB (Document)
CouchBase (Document)
Cassandra (Wide Column)
Neo4J (Graph)
6. Azure DocumentDB
NoSQL document database as-a-service
Query JSON docs: whole docs are indexed
Familiar languages: SQL & JavaScript
Fast / Predictable Performance (SSD)
Tunable consistency
Flexible document schema without sacrificing queryability
Will be available on Azure Stack (on premise)
9. Collections
Collections are not tables
Unit of partitioning / Scaling Unit
Transaction boundary
No enforced schema, flexible
Queries or updates stay within one collection
Size of 10 GB
For more, you need to shard through multiple collections
e.g. Spill-over, Range
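The two sharding approaches named on this slide can be sketched on the client side. This is a minimal illustration in plain Python, not the DocumentDB SDK: the `RangeResolver` and `SpilloverResolver` classes, the collection names and the capacity figure are all hypothetical.

```python
from bisect import bisect_right


class RangeResolver:
    """Route a document to a collection by comparing its key to range boundaries."""

    def __init__(self, boundaries, collections):
        # Each boundary is the first key of the next collection,
        # so n collections need n - 1 sorted boundaries.
        if len(collections) != len(boundaries) + 1:
            raise ValueError("need exactly one more collection than boundaries")
        self.boundaries = list(boundaries)
        self.collections = list(collections)

    def resolve(self, key):
        return self.collections[bisect_right(self.boundaries, key)]


class SpilloverResolver:
    """Fill collections in order, skipping any that cannot hold the next write."""

    def __init__(self, collections, capacity):
        self.collections = list(collections)
        self.capacity = capacity
        self.used = [0] * len(self.collections)

    def resolve_for_write(self, size):
        for i, name in enumerate(self.collections):
            if self.used[i] + size <= self.capacity:
                self.used[i] += size
                return name
        raise RuntimeError("all collections are full; add another collection")


# Hypothetical collection names and sizes, for illustration only.
by_name = RangeResolver(["h", "q"], ["users-a-g", "users-h-p", "users-q-z"])
spill = SpilloverResolver(["logs-1", "logs-2"], capacity=10)
```

With range sharding, a read for a known key goes to a single collection. With spill-over, writes are simple but a read generally has to fan out across every collection, since nothing in the key says where a document landed.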
11. More on querying
Visit the Querying Playground:
https://www.documentdb.com/sql/demo
Use the cheat sheet
http://aka.ms/docdbcheatsheet
Data Migration Tool:
https://azure.microsoft.com/en-us/documentation/articles/documentdb-import-data/
12. Querying limitation
Within a collection
No inter-document joins (yet?)
Besides filtering, only ORDER BY is supported
No aggregation yet
No COUNT
No GROUP BY
No SUM, AVG, etc.
SQL for queries only
No batch UPDATE or DELETE or CREATE
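Since the query language offers no aggregates, one workaround is to aggregate on the client over the documents a filtered query returns. The sketch below assumes the results are already available as a list of Python dictionaries; the `count_by` and `average` helpers and the sample order documents are hypothetical.

```python
from collections import defaultdict


def count_by(docs, key):
    """Client-side stand-in for COUNT(*) ... GROUP BY key."""
    counts = defaultdict(int)
    for doc in docs:
        counts[doc[key]] += 1
    return dict(counts)


def average(docs, field):
    """Client-side stand-in for AVG(field); returns None when nothing matches."""
    values = [doc[field] for doc in docs if field in doc]
    return sum(values) / len(values) if values else None


# Hypothetical documents, as they might come back from a filtered query.
orders = [
    {"id": "1", "customer": "contoso", "total": 40.0},
    {"id": "2", "customer": "fabrikam", "total": 10.0},
    {"id": "3", "customer": "contoso", "total": 70.0},
]
```

This only scales as far as the result set fits comfortably on the client, so it pays to filter as much as possible in the query itself.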
13. Indexing
Every property is indexed!
Unless you opt out
You can opt-out selectively
Leave out paths
Per collection (policy) or per document
Indexing mode: consistent vs lazy
Kind: hash, range & spatial
Automatic vs manual
You might want to fiddle with it: indexes take space
You can now change them online!
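As an illustration of opting out selectively, the sketch below builds an indexing policy as a Python dictionary. The property names follow the JSON policy format as I understand it at the time of writing and should be checked against the current documentation; the `build_indexing_policy` helper and the `/description/*` path are hypothetical.

```python
import json


def build_indexing_policy(excluded_paths, lazy=False):
    """Return a policy that indexes every path except the ones listed."""
    return {
        # Consistent: index updated with the write. Lazy: updated asynchronously.
        "indexingMode": "lazy" if lazy else "consistent",
        "automatic": True,
        "includedPaths": [
            {
                "path": "/*",
                "indexes": [
                    {"kind": "Range", "dataType": "Number", "precision": -1},
                    {"kind": "Hash", "dataType": "String", "precision": 3},
                ],
            }
        ],
        # Assumption: large free-text properties are good candidates to leave out.
        "excludedPaths": [{"path": p} for p in excluded_paths],
    }


policy = build_indexing_policy(["/description/*"], lazy=True)
print(json.dumps(policy, indent=2))
```

Excluding a path saves index space, at the cost of not being able to filter efficiently on that property.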
17. DocumentDB at Microsoft
over 425 million unique users
store 20TB of JSON document data
under 15ms writes and single digit ms reads
store for 40+ app / device combinations
available globally to serve all markets
user data store
18. Other objects
DocumentDB account → Databases
Databases → Users → Permissions
Databases → Collections → Documents → Attachments
Collections → Stored procedures (JS), Triggers (JS), User-defined functions (JS)
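The resource hierarchy can be made concrete by building resource paths from IDs. This sketch assumes ID-based links of the form `dbs/{database}/colls/{collection}/...`; the helper names and the sample IDs are hypothetical, and the exact link format should be verified against the REST documentation.

```python
# Path segment used for each kind of resource that lives under a collection.
SEGMENTS = {
    "document": "docs",
    "sproc": "sprocs",
    "trigger": "triggers",
    "udf": "udfs",
}


def collection_link(database_id, collection_id):
    """Path of a collection inside a database."""
    return "dbs/{}/colls/{}".format(database_id, collection_id)


def child_link(database_id, collection_id, kind, resource_id):
    """Path of a document, stored procedure, trigger or UDF in a collection."""
    base = collection_link(database_id, collection_id)
    return "{}/{}/{}".format(base, SEGMENTS[kind], resource_id)
```

For example, `child_link("shop", "orders", "document", "42")` yields a path that nests the document under its collection and database, mirroring the hierarchy on the slide.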
21. Data Modeling – Polymorphism
Put every document type in a collection
Discriminate on document type somehow
documentType property or other mechanism
Use collections as scaling units, not as categorization units
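A small sketch of the discriminator idea: two document types share one collection and are told apart by a `documentType` property. The sample documents and the `of_type` helper are hypothetical; the SQL string shows roughly what the equivalent server-side filter might look like.

```python
# Hypothetical documents of two types stored in the same collection.
collection = [
    {"id": "c1", "documentType": "customer", "name": "Contoso"},
    {"id": "o1", "documentType": "order", "customerId": "c1", "total": 40.0},
    {"id": "o2", "documentType": "order", "customerId": "c1", "total": 70.0},
]

# Roughly the filter one would send to the service instead of filtering locally.
ORDERS_QUERY = "SELECT * FROM c WHERE c.documentType = 'order'"


def of_type(docs, document_type):
    """Return only the documents whose discriminator matches."""
    return [d for d in docs if d.get("documentType") == document_type]
```

Because every property is indexed by default, filtering on the discriminator needs no extra setup.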
22. Data Modeling – Denormalization
Optimize for read (no inter-doc joins)
Embed relationships in document
One-to-few relationships
Data changing infrequently
Data that is integral to documents
When embedding provides better read performance
Make sure the pattern fits: you read much more than you write
Gone wrong: blog posts & comments
Leverage x-doc transaction (stored procs)
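A one-to-few relationship embedded in its parent might look like the sketch below. The customer document, its property names and the `cities` helper are hypothetical.

```python
# A customer has a handful of addresses that rarely change and are always
# read together with the customer, so they are embedded in the document.
customer = {
    "id": "c1",
    "documentType": "customer",
    "name": "Contoso",
    "addresses": [
        {"kind": "billing", "city": "Toronto"},
        {"kind": "shipping", "city": "Ottawa"},
    ],
}


def cities(doc):
    """Read the embedded addresses without any second lookup."""
    return [address["city"] for address in doc.get("addresses", [])]
```

The blog post and comments case goes wrong with this shape because the embedded array is unbounded and written often, so the parent document keeps growing and every new comment rewrites it.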
23. Data Modeling – Normalization
Normalize
One-to-many (unbounded)
Many-to-many
Frequent changes
If nothing in here fits: stick with relational
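For an unbounded one-to-many relationship, a normalized shape keeps each child in its own document and points back to the parent by ID. The sketch below uses hypothetical post and comment documents; `comments_for` stands in for the second query a client would issue, since there are no inter-document joins.

```python
# Posts and comments are separate documents; each comment references its post.
documents = [
    {"id": "p1", "documentType": "post", "title": "Hello DocumentDB"},
    {"id": "k1", "documentType": "comment", "postId": "p1", "text": "Nice intro"},
    {"id": "k2", "documentType": "comment", "postId": "p1", "text": "Thanks"},
    {"id": "k3", "documentType": "comment", "postId": "p2", "text": "Unrelated"},
]


def comments_for(docs, post_id):
    """Resolve the relationship on the client with a second lookup."""
    return [
        d for d in docs
        if d.get("documentType") == "comment" and d.get("postId") == post_id
    ]
```

Adding a comment now creates one small document instead of rewriting the post, at the price of an extra round trip when reading a post with its comments.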
24. Integration within Azure
Indexer for Azure Search
https://azure.microsoft.com/en-us/documentation/articles/search-howto-connecting-azure-sql-database-to-azure-search-using-indexers-2015-02-28/
Power BI:
https://azure.microsoft.com/en-us/blog/unleashing-insights-from-data-in-documentdb-with-power-bi/
Data Factory: both source & sink
Sink in Stream Analytics:
https://azure.microsoft.com/en-us/blog/azure-stream-analytics-and-documentdb-for-your-iot-application/