NoSQL databases
STATE OF THE ART
4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 1
I - Overview
What is NoSQL?
(typically) NoSQL is …
Non-relational
Distributed
Horizontally scalable
Big data
Performant
Open source
Relational VS NoSQL
Property                          Relational            NoSQL
Performance for high data volume  Low                   High
Horizontal scalability            Complex, error-prone  Simple
Flexibility                       Low                   High
Consistency                       Strong (ACID)         Eventual (BASE)
Indexing                          Multiple columns      Single column
Data duplication                  Not possible          Allowed
Standard query language           Yes                   No
Data model                        Single                Multiple
II - Models
Main NoSQL database models
Key-value
Document
Column
Graph
Key-value store. Data model
KEYS   VALUES
Key 1  Value 1
Key 2  Value 2
Key 3  Value 3
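The key-to-value mapping can be sketched with an in-memory Python dictionary. This is only an illustration of the data model, not the API of any real key-value database; the class name `KeyValueStore` and its methods are hypothetical.

```python
from typing import Any, Optional


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, key: str, value: Any) -> None:
        # A write overwrites any previous value stored under the same key.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Lookup is by exact key only; the value is opaque to the store.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op.
        self._data.pop(key, None)


kv_store = KeyValueStore()
kv_store.put("Key 1", "Value 1")
kv_store.put("Key 2", "Value 2")
print(kv_store.get("Key 1"))  # Value 1
print(kv_store.get("Key 9"))  # None
```

Because the store never looks inside a value, the only query it can answer is "what is stored under this key".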
Key-value store. Characteristics
PROS
Frequent reads / writes
Simple data model
Rapid query execution
CONS
Small reads / writes
Simple data model
Poor query capabilities
Key-value store. Implementations
Document store. Data model
Document 1 – ID 1 (JSON)
{
  "id": "1",
  "name": "foo",
  "attributeX": "bar"
}
Document 2 – ID 2 (JSON)
{
  "id": "2",
  "name": "bar"
}
Document 3 – ID 3 (XML)
<element>
  <name>A</name>
  <content>
    <type>B</type>
    <color>red</color>
  </content>
</element>
Document 4 – ID 4 (XML)
<element>
  <name>B</name>
  <value>5</value>
</element>
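A minimal sketch of how documents with different fields can live in one collection and be queried by field. The helper `find` is hypothetical and stands in for the richer query languages that real document stores provide.

```python
import json

# Two documents in the same collection with different sets of fields.
collection = [
    {"id": "1", "name": "foo", "attributeX": "bar"},
    {"id": "2", "name": "bar"},
]


def find(docs, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


# Query on a field that every document has.
print(json.dumps(find(collection, name="foo")))
# Query on a field that only some documents have.
print(find(collection, attributeX="bar")[0]["id"])  # 1
```

Unlike a key-value store, the store can look inside each document, which is what makes queries on arbitrary fields possible.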
Document store. Characteristics
PROS
Flexible
Object in single document
Rich querying capabilities
CONS
No joins
Document store. Implementations
Column store. Data model
Column Family
Row1 (Row Key1)
  Column1: name1 : value1, timestamp1
  Column2: name2 : value2, timestamp2
  ColumnN: nameN : valueN, timestampN
Row2 (Row Key2)
  Column1: name1 : value1, timestamp1
  Column3: name3 : value3, timestamp3
  ColumnM: nameM : valueM, timestampM
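The column family layout can be sketched as a nested mapping: row key, then column name, then a (value, timestamp) pair. The rule that a newer timestamp wins over an older one is an assumption borrowed from how several wide-column stores resolve conflicting writes; the function `put` is hypothetical.

```python
from collections import defaultdict

# row key -> column name -> (value, timestamp)
column_family = defaultdict(dict)


def put(row_key, name, value, timestamp):
    """Store a column, keeping the write with the newest timestamp (assumed rule)."""
    current = column_family[row_key].get(name)
    if current is None or timestamp >= current[1]:
        column_family[row_key][name] = (value, timestamp)


put("RowKey1", "name1", "value1", 1)
put("RowKey1", "name2", "value2", 2)
put("RowKey2", "name1", "value1", 1)
put("RowKey2", "name3", "value3", 3)  # rows need not share the same columns
put("RowKey1", "name1", "stale", 0)   # older timestamp, ignored

print(sorted(column_family["RowKey1"]))   # ['name1', 'name2']
print(column_family["RowKey1"]["name1"])  # ('value1', 1)
```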
Column store. Data model
Super Column Family
Row1 (Row Key1)
  SuperColumnX
    name1 : value1, timestamp1
    …
    nameN : valueN, timestampN
  SuperColumnY
    name1 : value1, timestamp1
    …
    nameM : valueM, timestampM
Column store. Characteristics
PROS
Large volumes of data (in dynamic columns)
Fast queries on columns (usually reads)
CONS
Slow queries on rows (usually writes)
Column store. Implementations
Graph store. Data model
[Diagram: six nodes, Node1 to Node6, connected by six edges, Edge1 to Edge6, with properties Property1, Property2 and Property3 attached to graph elements.]
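A graph of nodes and edges can be sketched as an adjacency list, with a breadth-first search standing in for a graph-like query. The wiring between the six nodes below is an assumption made for illustration, since the connections in the original diagram are not recoverable; `shortest_path` is a hypothetical helper.

```python
from collections import deque

# Hypothetical directed edges between the six nodes (assumed wiring).
edges = {
    "Node1": ["Node2", "Node3"],
    "Node2": ["Node4"],
    "Node3": ["Node4", "Node5"],
    "Node4": ["Node6"],
    "Node5": ["Node6"],
    "Node6": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search returning one shortest path as a list, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


print(shortest_path(edges, "Node1", "Node6"))
```

The traversal follows edges directly from node to node, which is the access pattern graph stores are built around.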
Graph store. Characteristics
PROS
Network modelling
Graph-like queries
Rapid deep traversal
Fully ACID
CONS
No sharding
Poor horizontal scalability
Complex data model
Graph store. Implementations
Other NoSQL database models
• Multimodel: based on a few other models
• Object-oriented: follows OOP principles
• MultiValue: multi-valued attributes
• Time series: optimized to manage time series data
• … and many more
Comparison of NoSQL models *
Model       Performance  Scalability      Flexibility  Complexity  Functionality
Key-value   high         high             high         none        variable (none)
Document    high         variable (high)  high         low         variable (low)
Column      high         high             moderate     low         minimal
Graph       variable     variable         high         high        graph theory
Relational  variable     variable         low          moderate    relational algebra
* Summary of a presentation by Ben Scofield: https://www.slideshare.net/bscofield/nosql-codemash-2010
Comparison by data size / complexity
[Figure: the Key-value, Column, Document and Graph models plotted against two axes, data size and data complexity.]
III – Software
Criteria for evaluation
Popularity rank *
Data model
Consistency
Availability
Concurrency
Scalability
Querying
* According to the DB-Engines ranking https://db-engines.com/en/ranking (April 2017). Relational DBMSs were discarded.
TOP 4 Systems
Rank  System         Data model
1     MongoDB        Document
2     Cassandra      Column + key-value
3     Redis          In-memory key-value
4     Elasticsearch  Document (search engine)
Consistency
MongoDB
• Configurable
• Strong by default
Cassandra
• Configurable
Redis
• Eventual
Elasticsearch
• Configurable
• Consistent, with options
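One common way "configurable" consistency is expressed in replicated stores is through the number of replicas that must acknowledge a write (W) and a read (R) out of N copies: when R + W > N, every read overlaps the most recent acknowledged write. The sketch below only checks that condition; the function name is hypothetical and it does not reflect the configuration API of any of the four systems.

```python
def is_strongly_consistent(n_replicas: int, write_acks: int, read_acks: int) -> bool:
    """True when every read quorum must overlap every write quorum (R + W > N)."""
    if not (1 <= write_acks <= n_replicas and 1 <= read_acks <= n_replicas):
        raise ValueError("acknowledgement counts must be between 1 and n_replicas")
    return read_acks + write_acks > n_replicas


print(is_strongly_consistent(3, 2, 2))  # True: quorum reads and writes overlap
print(is_strongly_consistent(3, 1, 1))  # False: a read may miss the latest write
```

Lower acknowledgement counts trade consistency for latency and availability, which is the tuning knob the word "configurable" refers to here.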
Availability
MongoDB
• Replicated
Cassandra
• Distributed
Redis
• Replicated
Elasticsearch
• Replicated
High availability
Concurrency
MongoDB
• Multi-granularity locking (MGL)
Cassandra
• Multiversion concurrency control (MVCC)
Redis
• Optimistic concurrency control (OCC)
Elasticsearch
• Optimistic concurrency control (OCC)
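Optimistic concurrency control can be sketched with a version number per record: a writer takes no lock, and its write is accepted only if the record still has the version the writer read. The class and method names below are hypothetical and do not correspond to the Redis or Elasticsearch client APIs.

```python
class VersionConflict(Exception):
    """Raised when a record changed since the caller last read it."""


class VersionedStore:
    """Toy store illustrating optimistic concurrency control (hypothetical API)."""

    def __init__(self):
        self._records = {}  # key -> (value, version)

    def read(self, key):
        # Unknown keys are reported as (None, 0) so a first write expects version 0.
        return self._records.get(key, (None, 0))

    def write(self, key, value, expected_version):
        _, current_version = self.read(key)
        # No lock is taken; the write succeeds only if nobody else wrote first.
        if current_version != expected_version:
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        self._records[key] = (value, current_version + 1)
        return current_version + 1


occ_store = VersionedStore()
_, seen_version = occ_store.read("doc")
occ_store.write("doc", "first", seen_version)       # accepted, version becomes 1
try:
    occ_store.write("doc", "second", seen_version)  # stale version 0, rejected
except VersionConflict as exc:
    print("conflict:", exc)
```

A rejected writer is expected to re-read the record and retry, so conflicts cost a retry rather than a wait on a lock.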
Scalability
MongoDB
• High (automatic data sharding)
Cassandra
• High (automatic addition / removal of nodes in cluster)
Redis
• Poor
Elasticsearch
• High (dynamic sharding on live cluster)
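Sharding spreads records over several nodes by deriving a shard from each key. The sketch below uses a stable hash and a modulo, which is an assumed and deliberately simple scheme; it is not how any of the four systems is stated to place data, and the function name `shard_for` is hypothetical.

```python
import hashlib


def shard_for(key: str, n_shards: int) -> int:
    """Map a key to a shard index using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then fold it onto the shards.
    return int.from_bytes(digest[:8], "big") % n_shards


keys = ["user:1", "user:2", "user:3", "user:4"]
placement = {key: shard_for(key, 3) for key in keys}
print(placement)
```

With a plain modulo, changing the number of shards moves most keys, which is one reason adding or removing nodes on a live cluster needs a more careful placement scheme than this one.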
Querying
MongoDB
• Internal API (MapReduce)
• Complex query support
Cassandra
• Internal API, CQL (SQL-like)
• Complex query support
Redis
• By key or value range
• Rapid
• No complex queries
Elasticsearch
• Own query language (Query DSL)
• Full text search, filters
IV – Geospatial
GIS (geographic information system)
Idea behind GIS « magic »
Geospatial data
Geohash API
GIS support
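A geohash turns a latitude/longitude pair into a short string so that nearby points tend to share a prefix, which lets a store index locations like ordinary keys. The sketch below implements the public geohash encoding; the function name `geohash_encode` is hypothetical and is not the API of a particular database.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, length: int = 8) -> str:
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits alternate between the two axes, starting with longitude
    while len(chars) < length:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


print(geohash_encode(42.6, -5.6, 5))  # ezs42
```

Shortening the hash widens the cell it denotes, so a prefix search over hashes behaves like a coarse "within this area" query.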
Available solutions
Solutions
New document format GeoJSON (MongoDB)
GeoMesa + Apache Spark (Hadoop)
CQL extension (Cassandra)
GeoCouch extension (CouchDB)
Fast I/O in-memory geospatial operations (Redis)
Library Neo4j Spatial (Neo4j)
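To illustrate the first item in the list, a GeoJSON geometry is a small JSON object with a `type` and a `coordinates` array ordered as longitude, then latitude. The surrounding document below, including the field names `name` and `location` and the sample place, is an assumption made for the example.

```python
import json

# A document embedding a GeoJSON Point; coordinates are [longitude, latitude].
place = {
    "name": "Lille",  # hypothetical field and sample location
    "location": {"type": "Point", "coordinates": [3.0573, 50.6292]},
}

# The document serializes to plain JSON, so it fits a document store as-is.
print(json.dumps(place))
```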
V - Conclusion

NoSQL databases

  • 1. NoSQL databases STATE OF THE ART 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 1
  • 2. I - Overview 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 2
  • 3. What is NoSQL? 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 3
  • 4. (typically) NoSQL is … Non-relational Distributed Horizontally scalable Big data Performant Open source 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 4
  • 5. Relational VS NoSQL Property Relational NoSQL Performance for high data volume Low High Horizontal scalability Complex, error-prone Simple Flexibility Low High Consistency Strong (ACID) Eventual (BASE) Indexing Multiple columns Single column Data duplication Not possible Allowed Standard query language Yes No Data model Single Multiple 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 5
  • 6. II - Models 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 6
  • 7. Main NoSQL database models Key-value Document Column Graph 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 7
  • 8. Key-value store. Data model Key 1 Key 2 Key 3 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 8 Value 1 Value 2 Value 3 KEYS VALUES
  • 9. Key-value store. Characteristics PROS Frequent reads / writes Simple data model Rapid query execution CONS 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 9 Small reads / writes Simple data model Poor query capabilities
  • 10. Key-value store. Implementations 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 10
  • 11. Document store. Data model 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 11 Document 1 – ID 1 { id: ‘1’ name: ‘foo’ attributeX: ‘bar’ } JSON Document 2 – ID 2 { id: ‘2’ name: ‘bar’ } JSON Document 3 – ID 3 <element> <name>A</name> <content> <type>B</type> <color>red</color> </content> </element> XML Document 4 – ID 4 <element> <name>B</name> <value>5</value> </element> XML
  • 12. Document store. Characteristics Flexible Object in single document Rich querying capabilities 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 12 PROS CONS No joins
  • 13. Document store. Implementations 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 13
  • 14. Column store. Data model 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 14 Column Family Row1 Row2 Row Key1 Row Key2 Column1 name1 : value1 timestamp1 Column2 name2 : value2 timestamp2 ColumnN nameN : valueN timestampN Column1 name1 : value1 timestamp1 Column3 name3 : value3 timestamp3 ColumnM nameM : valueM timestampM
  • 15. Column store. Data model 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 15 Super Column Family Row1 Row Key1 SuperColumnX … name1 value1 time stamp1 nameN valueN time stampN SuperColumnY … name1 value1 time stamp1 nameM valueM time stamp M
  • 16. Column store. Characteristics Large number of data (in dynamic columns) Fast queries on columns (usually reads) 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 16 PROS CONS Slow queries on rows (usually writes)
  • 17. Column store. Implementations 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 17
  • 18. Graph store. Data model 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 18 Node1 Node2 Node4 Node3 Node6 Node5 Edge1 Property1 Property2 Property3 Edge2 Edge3 Edge4 Edge5 Edge6
  • 19. Graph store. Characteristics Network modelling Graph-like queries Rapid deep traversal Fully ACID 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 19 PROS CONS No sharding Poor horizontal scalability Complex data model
  • 20. Graph store. Implementations 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 20
  • 21. Other NoSQL database models • Based on few other modelsMultimodel • Follows OOP principlesObject-oriented • Mutli-valued attributesMultiValue • Optimized to managa time series dataTime series • …And many more 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 21
  • 22. Comparison of NoSQL models * 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 22 Model Performance Scalability Flexibility Complexity Functionality Key-value high high high none variable (none) Document high variable (high) high low variable (low) Column high high moderate low minimal Graph variable variable high high graph theory Relational variable variable low moderate relational algebra * Summary of a presentation by Ben Scofield: https://www.slideshare.net/bscofield/nosql-codemash-2010
  • 23. Comparison by data size / complexity 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 23 Key-value Column Document Graph Data size Data complexity
  • 24. III – Software 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 24
  • 25. Criteria for evaluation Popularity rank * Data model Consistency Availability Concurrency Scalability Querying 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 25 * According to DB-Engines ranking https://db-engines.com/en/ranking (April 2017). Relational DBMSs where discarded.
  • 26. TOP 4 Systems 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 26 MongoDB Cassandra Redis Elasticsearch 1 2 3 4 Document Column + key-value In-memory key-value Document (search engine)
  • 27. Consistency 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 27 MongoDB • Configurable • Strong by default Cassandra • Configurable Redis • Eventual Elasticsearch • Configurable • Consistent, with options
  • 28. Availability 4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 28 MongoDB • Replicated Cassandra • Distributed Redis • Replicated Elasticsearch • Replicated High availability
  • 29. Concurrency. MongoDB: multi-granularity locking (MGL). Cassandra: multiversion concurrency control (MVCC). Redis: optimistic concurrency control (OCC). Elasticsearch: optimistic concurrency control (OCC).
  • 30. Scalability. MongoDB: high (automatic data sharding). Cassandra: high (automatic addition / removal of nodes in cluster). Redis: poor. Elasticsearch: high (dynamic sharding on live cluster).
  • 31. Querying. MongoDB: internal API (MapReduce), complex query support. Cassandra: internal API, SQL-like CQL, complex query support. Redis: by key or value range, rapid, no complex queries. Elasticsearch: own query language (Query DSL), full-text search, filters.
  • 32. IV – Geospatial
  • 33. GIS (geographic information system)
  • 34.
  • 35. Idea behind GIS « magic »: Geospatial data + Geohash + API → GIS support
  • 36. Available solutions
  • 37. Solutions: new document format GeoJSON (MongoDB); GeoMesa + Apache Spark (Hadoop); CQL extension (Cassandra); GeoCouch extension (CouchDB); fast I/O in-memory geospatial operations (Redis); Neo4j Spatial library (Neo4j)
  • 38. V - Conclusion
  • 39.

Editor's notes

  1. A quick look at NoSQL.
  2. NoSQL = Not Only SQL … OK, but what kind of properties does it have?
  3. Normally, with quite a few exceptions, NoSQL systems satisfy the following list. All of them are non-relational, and one could argue that this is the main difference. They are distributed, meaning that they run on clusters of machines, and therefore they should be horizontally scalable: one can easily add a new node to the cluster without the time-consuming process of restructuring the database. NoSQL systems are mostly designed to store massive volumes of data while keeping a high level of performance. Usually, they are open source.
  4. NoSQL is designed to work with big data and still show high levels of performance. Relational DBs, in contrast, work well only until they have to deal with large amounts of data. Unlike NoSQL systems, relational DBs require hard work to scale horizontally. Flexibility here means the ease of INSERT / UPDATE operations: in the relational case data must be in a predefined form, while in NoSQL it may be in an arbitrary form. Relational DBs are always ACID (atomicity, consistency, isolation, durability), whereas NoSQL proposes the concept of eventual consistency, known as BASE (Basically Available, Soft state, Eventual consistency). There are many differences, but two very important ones are the standard query language and the single data model.
  5. Since NoSQL comes in a number of different models, we have to review them.
  6. These are the 4 main models of NoSQL databases that we are going to study in detail. First, key-value (KV) is just a dictionary (a collection of KV pairs). Next we have Document, which is a collection of documents in formats such as JSON, XML and others. A Column DB consists of column families that contain column collections varying in name and size; we will see this later. Finally, the graph model focuses on connections between entities.
  7. The data model is straightforward: a collection of KV pairs, where each key has exactly one corresponding value. Keys are used as indexes and values may contain any data.
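The key-value model described in this note can be sketched in a few lines of Python. This is only a toy in-memory illustration; the class name `KeyValueStore` and its methods are hypothetical and do not correspond to the API of any particular product.

```python
class KeyValueStore:
    """Toy in-memory key-value store: one opaque value per key."""

    def __init__(self):
        self._data = {}  # the key is the only index

    def put(self, key, value):
        # Writing to an existing key replaces its single value.
        self._data[key] = value

    def get(self, key, default=None):
        # Lookup is by exact key only; the store never inspects values.
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:1", {"name": "foo"})  # a value may hold any data
store.put("user:1", {"name": "bar"})  # same key: the value is overwritten
store.put("counter", 42)
```

Because the store cannot look inside values, anything beyond lookup by key has to be done by the application, which matches the "poor query capabilities" listed on the slide.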
  8. Query execution is rapid because of the simple model and the use of keys as indexes.
  9. The main elements of these databases are documents, which are hierarchical tree data structures. Each document is identified by an indexed key (a unique identifier that may be a string, URI or path). Information about a given object is stored in a single document, unlike in relational databases, where it is scattered over different tables. Documents may be of different types (JSON, XML, etc.).
  10. They may offer an API that enables users to query documents based on their internal structure and content. There are no joins: instead, one has to collect connected data manually.
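As a rough illustration of notes 9 and 10, the sketch below keeps documents in a Python dictionary indexed by ID, queries them by their content, and performs a "join" by hand. The helper names `find` and `orders_with_customer`, and the `orders` collection, are made up for this example and are not taken from any document database.

```python
# Documents indexed by their unique ID; each one may have different fields.
documents = {
    "1": {"id": "1", "name": "foo", "attributeX": "bar"},
    "2": {"id": "2", "name": "bar"},
}

# A second, hypothetical collection that references documents by ID.
orders = {
    "o1": {"id": "o1", "customer_id": "2", "total": 30},
}


def find(collection, **criteria):
    """Return the documents whose fields match every given criterion."""
    return [
        doc for doc in collection.values()
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


def orders_with_customer(orders, customers):
    """Manual 'join': look up each referenced document by its key."""
    return [
        {**order, "customer": customers.get(order["customer_id"])}
        for order in orders.values()
    ]
```

Here `find(documents, name="bar")` selects documents by content rather than by key, while `orders_with_customer(orders, documents)` shows the extra work the application does when the store has no join operation.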
  11. The central elements of the database are columns. A column contains a name (unique identifier), a value (the data itself) and a timestamp (which makes it possible to determine whether the content is valid, i.e. up to date). Then we have a row, with a row key and an associated set of columns. A collection of rows forms a column family. Each row of the column family may contain a different number of columns and, additionally, the column names may differ from row to row.
  12. It is also possible to have supercolumns: a supercolumn is a column whose value is a map of columns.
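The structure described in notes 11 and 12 can be sketched as nested dictionaries. The classes below are hypothetical, and the rule that a write with a newer timestamp wins is an assumption made for the example (the notes only say the timestamp tells whether content is up to date).

```python
import time
from dataclasses import dataclass


@dataclass
class Column:
    name: str         # unique identifier within the row
    value: object     # the data itself (a dict of Columns for a supercolumn)
    timestamp: float  # used to decide which write is the most recent


class ColumnFamily:
    """Toy column family: row key -> {column name -> Column}."""

    def __init__(self):
        self.rows = {}

    def insert(self, row_key, name, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        row = self.rows.setdefault(row_key, {})
        current = row.get(name)
        # Assumption for this sketch: the write with the newest timestamp wins.
        if current is None or ts >= current.timestamp:
            row[name] = Column(name, value, ts)

    def get(self, row_key, name):
        column = self.rows.get(row_key, {}).get(name)
        return None if column is None else column.value


users = ColumnFamily()
users.insert("alice", "email", "old@example.org", timestamp=1)
users.insert("alice", "email", "new@example.org", timestamp=2)
users.insert("alice", "email", "stale@example.org", timestamp=1)  # older: ignored
users.insert("bob", "phone", "555-0100", timestamp=1)
# Supercolumn: the value is itself a map of columns.
users.insert("bob", "address", {"city": Column("city", "Lille", 1)}, timestamp=1)
```

Note that the rows for `alice` and `bob` hold different numbers of columns with different names, which is the flexibility the note describes.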
  13. Fast queries on columns: for example, if I'm looking at a database of Sales and I want to see how Price has changed over time, I need to look at the Price field for a lot of records, so it's nice to have those stored together in one column. Slow queries on rows: on the contrary, the query that the column store doesn't like is something like "show me all the information about a particular Sale" or "add a Sale to the database". Here you want lots of fields, but for a small number of rows (just one).
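The trade-off in this note can be made concrete with a small sketch that stores the same Sales records row by row and column by column. The sample data and the function names `average_price` and `fetch_sale` are invented for the illustration.

```python
# Row-oriented layout: one record per sale.
rows = [
    {"id": 1, "product": "pen", "price": 1.5},
    {"id": 2, "product": "ink", "price": 4.0},
    {"id": 3, "product": "pad", "price": 2.5},
]

# Column-oriented layout: one list per field, values kept together.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def average_price(cols):
    """Column-friendly query: reads a single column and nothing else."""
    prices = cols["price"]
    return sum(prices) / len(prices)


def fetch_sale(cols, index):
    """Row-style query: has to touch every column to rebuild one record."""
    return {name: values[index] for name, values in cols.items()}
```

`average_price(columns)` only scans the `price` list, whereas `fetch_sale(columns, 1)` visits each column once per field, which is why single-record reads and inserts are the expensive case in this layout.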
  14. This type of database uses graph structures to represent, store and manage data. A graph database has the concepts of edges, nodes and properties. The relationships (edges) link entities (nodes) directly using pointers (unlike in relational databases). Properties can be applied to both nodes and edges, and they help to query data.
  15. Well-suited for network modelling, such as social networks. Supports graph-like queries, such as searching for the shortest path between nodes. The use of pointers makes it possible to retrieve connected data in one operation (instead of searching through the data and using join operations, as in the relational approach); this enables rapid and deep traversal of the graph structure. Unlike other NoSQL models, graph databases fully support ACID properties. They do not support data sharding, meaning that all data must be stored on a single server; hence, horizontal scalability is poor.
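A graph-like query such as the shortest path mentioned in this note can be sketched with a breadth-first search over an adjacency list. The small social graph and the function name `shortest_path` are assumptions for the example; a real graph database would run such a traversal through its own query interface.

```python
from collections import deque

# Each node points directly at its neighbours (an unweighted social graph).
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: returns a path with the fewest edges, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # goal is not reachable from start
```

Each step follows the stored neighbour references directly, which is the pointer-chasing traversal the note contrasts with repeated join operations.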
  16. As we have seen before, there are lots of different systems on the market. Now we will take a look at only a few of them and try to evaluate them. For that we need some criteria…
  17. We selected the top 4 systems, which are … They use the corresponding models …
  18. All systems have configurable consistency, except Redis.
  19. Replicated means that data is divided into several replica sets (shards); in this case a master-slave model is usually used. Distributed means that each node in the cluster is responsible for a given data set. All of them are highly available.
  20. Concurrency control ensures that correct results are generated for concurrent operations, while getting those results as quickly as possible. Each system uses a different method to ensure concurrency. MGL locks objects that contain other objects; it exploits the hierarchical nature of MongoDB documents. MVCC takes a different approach: each user connected to the database sees a snapshot of the database at a particular instant in time. Any changes made by a writer will not be seen by other users of the database until the transaction has been committed. OCC assumes that multiple transactions can frequently complete without interfering with each other. While running, transactions use data resources without acquiring locks on those resources. Before committing, each transaction verifies that no other transaction has modified the data it has read. If the check reveals conflicting modifications, the committing transaction rolls back and can be restarted.
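The OCC behaviour described in this note (read without locking, validate before commit, restart on conflict) can be sketched with a version number per key. The classes `VersionedStore` and `ConflictError` and the helper `increment` are hypothetical; they do not reflect how Redis or Elasticsearch implement it internally.

```python
class ConflictError(Exception):
    """Raised when the data changed since the transaction read it."""


class VersionedStore:
    """Toy store keeping a version counter next to each value."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        # No lock is taken; the caller remembers the version it saw.
        return self._data.get(key, (0, None))

    def commit(self, key, expected_version, new_value):
        current_version, _ = self.read(key)
        # Validation step: nobody else may have written in the meantime.
        if current_version != expected_version:
            raise ConflictError(f"{key!r} was modified concurrently")
        self._data[key] = (current_version + 1, new_value)


def increment(store, key, retries=3):
    """Read-modify-write that restarts when validation fails."""
    for _ in range(retries):
        version, value = store.read(key)
        try:
            store.commit(key, version, (value or 0) + 1)
            return
        except ConflictError:
            continue  # roll back (nothing was written) and retry
    raise ConflictError(f"gave up on {key!r} after {retries} attempts")
```

A commit made with a stale version is rejected rather than blocked, so this approach works best when conflicts are rare, as the note assumes.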
  21. In a full-text search, a search engine examines all of the words in every stored document as it tries to match search criteria (for example, text specified by a user). MapReduce is a programming model, with an associated implementation, for processing and generating big data sets with a parallel, distributed algorithm on a cluster.
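Both ideas in this note can be illustrated together by building a word index in map / shuffle / reduce steps and then searching it. This is a single-process imitation of the MapReduce model, not a distributed implementation, and the use of an inverted index for the search is an assumption of the sketch; the sample documents and function names are invented.

```python
from collections import defaultdict

docs = {
    1: "NoSQL stores scale out",
    2: "graph stores link nodes",
    3: "NoSQL graph",
}


def map_phase(doc_id, text):
    """Map: emit one (word, doc_id) pair per word in the document."""
    for word in text.lower().split():
        yield word, doc_id


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, doc_ids):
    """Reduce: collapse each group into a sorted list of unique document IDs."""
    return word, sorted(set(doc_ids))


pairs = [pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)]
index = dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())


def search(index, query):
    """Return the IDs of documents containing every word of the query."""
    sets = [set(index.get(word, [])) for word in query.lower().split()]
    return sorted(set.intersection(*sets)) if sets else []
```

On a cluster, the map and reduce calls would run in parallel on different nodes; here they run sequentially, which keeps the data flow visible.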
  22. This is interesting for us in the context of the DataBio project, in which Softeam participates.
  23. GIS allows you to record a map with a geospatial referencing system such as longitude or latitude and then to add additional layers of other information. Layers can be linked. Analysis of the information can then be undertaken using the statistical and analytical tools that are provided as part of the GIS. It is possible to provide visual representations of data. These representations can often reveal patterns and trends that might otherwise have gone unnoticed without the use of GIS techniques. Use cases: mapping of data (visual representation of data on a map); proximity analysis (distance between objects, points, polygons etc.); finding clusters; finding the nearest object; finding what is in a given area.
  24. Taxi manager example implemented with GeoMesa, which is used in Hadoop-based NoSQL systems.
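Two of the use cases listed in this note, proximity analysis and finding the nearest object, can be sketched with the haversine great-circle distance. The spherical-Earth radius, the approximate city coordinates and the helper names below are assumptions of the example rather than part of any GIS product.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean radius; the Earth is treated as a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest(point, candidates):
    """'Find nearest': name of the candidate closest to the point."""
    return min(candidates, key=lambda name: haversine_km(*point, *candidates[name]))


def within(point, candidates, radius_km):
    """'What's in area?': names of the candidates inside the given radius."""
    return sorted(
        name for name, position in candidates.items()
        if haversine_km(*point, *position) <= radius_km
    )


# Approximate coordinates, used only as sample data.
lille = (50.6292, 3.0573)
cities = {"Paris": (48.8566, 2.3522), "Brussels": (50.8503, 4.3517)}
```

With this data, `nearest(lille, cities)` and `within(lille, cities, 100)` both scan every candidate; a database would use a spatial index to avoid that full scan.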
  25. A NoSQL database must: support geospatial data; store the geohash as an integer index (e.g. a quadtree, R-tree or Hilbert curve index) converted from 2D, 3D or 4D coordinates and time; and provide an API / query language to work with the data. As a result, the NoSQL DB can be used in GIS.
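To make the "geohash as integer index" idea in this note concrete, the sketch below encodes a 2D coordinate with the standard geohash scheme: longitude and latitude intervals are halved alternately and the resulting bits are interleaved into one integer, which can also be written in base 32. It covers only the 2D case, and it is the plain geohash (a Z-order style index), not a quadtree, R-tree or Hilbert curve implementation. The function names are chosen for this example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_bits(lat, lon, nbits):
    """Interleave longitude/latitude bisection bits into a single integer."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    value = 0
    for i in range(nbits):
        if i % 2 == 0:  # even bit positions refine the longitude interval
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:  # odd bit positions refine the latitude interval
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = (value << 1) | int(bit)
    return value


def geohash(lat, lon, precision=5):
    """Base-32 text form of the same index, 5 bits per character."""
    nbits = precision * 5
    value = geohash_bits(lat, lon, nbits)
    return "".join(
        BASE32[(value >> shift) & 31] for shift in range(nbits - 5, -1, -5)
    )
```

Points that are close together usually share a long common prefix, so a range scan over the sorted integer (or string) index retrieves nearby data; points on either side of a cell boundary are the known exception to this.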