5. 5
Agenda
Date Time Webinar
25-Mai-2016 11:00 CET Introduction à NoSQL
09-Juin-2016 14:00 CET Votre première application MongoDB
23-Juin-2016 14:00 CET Schema Design – Pensez Documents
13-Jullet-2016 14:00 CET Advanced Indexing : Text et Geo-Spatial Indexes
28-Juillet-2016 14:00 CET Introduction au Framework d’Aggregation
01-Sept-2016 14:00 CET Déploiement en production
12. 12
Bases clé / valeur
• Simple tableau associatif : 1 clé 1 valeur
• Requêtes sur la clé primaire uniquement
• Très rapides pour les requêtes simples
• Pas adapté pour les “reverse lookups”
Key Value
12345 4567.3456787
12346 { addr1 : “The Grange”, addr2: “Dublin” }
12347 “top secret password”
12358 “Shopping basket value : 24560”
12787 12345
13. 13
Revision : Bases relationnelles
• Données stockées par ligne (RDBMS traditionnelles, ie. MySQL)
• Chaque lecture récupère une ligne entière
• Les lectures nécessitant une ou deux colonnes ne sont pas
optimisées.
ID Name Salary Start Date
1 Joe D $24000 1/Jun/1970
2 Peter J $28000 1/Feb/1972
3 Phil G $23000 1/Jan/1973
1 Joe D $24000 1/Jun/1970 2 Peter J $28000 1/Feb/1972 3 Phil G $23000 1/Jan/1973
14. 14
Bases de données orientées colonnes
1 2 3
ID Name Salary Start Date
1 Joe D $24000 1/Jun/1970
2 Peter J $28000 1/Feb/1972
3 Phil G $23000 1/Jan/1973
Joe D Peter J Phil G $24000 $28000 $23000 1/Jun/1970 1/Feb/1972 1/Jan/1973
15. 15
Quel est l’intérêt ?
• L’ensemble des données d’une colonne peut être récupéré de
façon très efficace.
• La compression est bien meilleure
• Les opérations de lectures récupèrent plus de données par accès
• En cas de besoin de quelques colonnes, pas besoin de lire
l’ensemble des lignes.
• MAIS:
– L’update et le delete sont très coûteux
• Append only de préférence
• Plus adapté à OLAP qu’à OLTP
16. 16
Graph Stores
• Stocke des graphs (arrêtes et sommets)
• Par exemple : réseau social
• Conçu pour optimiser la traversée des graphs
• Parfait pour représenter les connexions
• Peut être implémenté dans un key/value store
• Si le cas d’usage n’est pas “graph”, vous n’avez pas besoin d’une
base Graph.
17. 17
Multi-Model Databases
• Combine plusieurs modèles d’accès / stockage
• Souvent une base graph + “quelque chose d’autre”
• Permet d’éviter d’avoir recours à plusieurs bases (complexe de
garantir la cohérence – “Polyglot Persistance”)
• Derniers arrivés dans le monde NoSQL
18. 18
Document Store
• Documents != PDFs, Microsoft Word ou HTML
• Structures hiérarchiques définies en JSON (Javascript Object Notation)
{
name : “Remi Forest”,
title : “Senior Solution Architect”,
Address : {
address1 : “Latin Hall”,
address2 : “Golden Lane”,
eircode : “D09 N623”,
}
expertise: [ “MongoDB”, “Python”, “Javascript” ],
employee_number : 320,
location : [ 53.34, -6.26 ]
}
20. 20
MongoDB comprend les documents JSON
• JSON database depuis le début
• Stocke le JSON sous format binaire (BSON)
• Efficace pour les transferts sur le réseau
• MongoDB peut indexer n’importe quel partie du document
• (Nous couvrirons ça plus tard)
21. 21
Pourquoi des Documents?
• Schema dynamique
• Elimination de la couche Object/Relational Mapping
• Dénormalisation des données pour améliorer la performance
22. 22
Pourquoi des Documents?
• Schema dynamique
• Elimination de la couche Object/Relational Mapping
• Dénormalisation des données pour améliorer la performance
23. 23
Modèles de requètes
Rich
Queries
• Quelle est la voiture de Paul
• Qui avait une voiture entre 1970 et 1980
Geospatial
• Qui possède une voiture dans les 5km
autour de la place de la Concorde ?
Text Search
• Quels sont les voitures avec des sièges
en cuir ?
Aggregation
• Quelle est la valeur moyenne des
véhicules par département ?
Map
Reduce
• Quelle est l’évolution des couleurs en
fonction de l’année ? (est-ce que le rose
est tendance en Chine ?)
32. 32
Scalabilité horizontale – Sharding
• La clé de sharding partitionne les données
• MongoDB distribue automatiquement les donnée
• Les shards peuvent être ajoutés à la volée
• L’équilibrage se fait en tâche de fond
• La clé de sharding est fixe
• La clé permet de diriger les requêtes vers les shards appropriés
• Une query sans la clé de sharding est envoyée à tous les shards
34. 34
Routage des requêtes
• Avec un cluster shardé, on utilise une couche de routage pour
distribuer les requêtes.
• C’est effectué par le process MongoS (Mongo Shard Router)
• C’est un process stateless
• On peut en lancer autant que nécessaire
• Typiquement un process par serveur d’application
35. 35
En résumé
• Pourquoi NoSQL
• Les différents types de bases NoSQL
• Vue d’ensemble de MongoDB
• Disponibilité / Durabilité – MongoDB Replica Sets
• Scalabilité – MongoDB Sharding
36. 36
Prochain Webinar – Votre première application MongoDB
• 9 Juin 2016 – 14:00 CET.
• Inscrivez-vous !
• Apprenez à construire votre première application MongoDB
• Création des bases et des collections
• Requêtes
• Création d’indexes
• Introduction aux critères de performance
• Feedbacks : back-to-basics@mongodb.com
Delighted to have you here. Hope you can make it to all the sessions. Sessions will be recorded so we can send them out afterwards so don’t worry if you miss one.
If you have questions please pop them in the sidebar.
A lot of people expect us to come in and bash relational database or say we don’t think they’re good. And that’s simply not true.
Relational databases has laid the foundation for what you’d want out of a database, and we absolutely think there are capabilities that remain critical today
Expressive query language & secondary Indexes. Users should be able to access and manipulate their data in sophisticated ways – and you need a query language that let’s you do all that out of the box. Indexes are a critical part of providing efficient access to data. We believe these are table stakes for a database.
Strong consistency. Strong consistency has become second nature for how we think about building applications, and for good reason. The database should always provide access to the most up-to-date copy of the data. Strong consistency is the right way to design a database.
Enterprise Management and Integrations. Finally, databases are just one piece of the puzzle, and they need to fit into the enterprise IT stack. Organizations need a database that can be secured, monitored, automated, and integrated with their existing IT infrastructure and staff, such as operations teams, DBAs, and data analysts.
But of course the world has changed a lot since the 1980s when the relational database first came about.
First of all, data and risk are significantly up.
In terms of data
90% data created in last 2 years - think about that for a moment, of all the data ever created, 90% of it was in the last 2 years
80% of enterprise data is unstructured - this is data that doesn’t fit into the neat tables of a relational database
Unstructured data is growing 2X rate of structured data
At the same time, risks of running a database are higher than ever before. You are now faced with:
More users - Apps have shifted from small internal departmental system with thousands of users to large external audiences with millions of users
No downtime - It’s no longer the case that apps only need to be available during standard business hours. They must be up 24/7.
All across the globe - your users are everywhere, and they are always connected
On the other hand, time and costs are way down.
There’s less time to build apps than ever before. You’re being asked to:
Ship apps in a few months not years - Development methods have shifted from a waterfall process to an iterative process that ships new functionality in weeks and in some cases multiple times per day at companies like Facebook and Amazon.
And costs are way down too. Companies want to:
Pay for value over time - Companies have shifted to open-source business and SaaS models that allow them to pay for value over time
Use cloud and commodity resources - to reduce the time to provision their infrastructure, and to lower their total cost of ownership
Because the relational database was not designed for modern applications, starting about 10 years ago a number of companies began to build their own databases that are fundamentally different. The market calls these NoSQL.
NoSQL databases were designed for this new world…
Flexibility. All of them have some kind of flexible data model to allow for faster iteration and to accommodate the data we see dominating modern applications. While they all have different approaches, what they have in common is they want to be more flexible.
Scalability + Performance. Similarly, they were all built with a focus on scalability, so they all include some form of sharding or partitioning. And they're all designed to deliver great performance. Some are better at reads, some are better at writes, but more or less they all strive to have better performance than a relational database.
Always-On Global Deployments. Lastly, NoSQL databases are designed for highly available systems that provide a consistent, high quality experience for users all over the world. They are designed to run on many computers, and they include replication to automatically synchronize the data across servers, racks, and data centers.
However, when you take a closer look at these NoSQL systems, it turns out they have thrown out the baby with the bathwater. They have sacrificed the core database capabilities you’ve come to expect and rely on in order to build fully functional apps, like rich querying and secondary indexes, strong consistency, and enterprise management.
MongoDB was built to address the way the world has changed while preserving the core database capabilities required to build modern applications.
Our vision is to leverage the work that Oracle and others have done over the last 40 years to make relational databases what they are today, and to take the reins from here. We pick up where they left off, incorporating the work that internet pioneers like Google and Amazon did to address the requirements of modern applications.
MongoDB is the only database that harnesses the innovations of NoSQL and maintains the foundation of relational databases – and we call this our Nexus Architecture.
Think redis, memcached or Couchbase.
Column stores you know and love, HP Vertica, Cassandra.
Rich queries, text search, geospatial, aggregation, mapreduce are types of things you can build based on the richness of the query model.