Beyond relational databases
NoSQL, NewSQL, TimeSeries DB
Grégory Boissinot, January 2015
v3
Objectives
Understand the dominance of relational databases
Know that alternative technologies exist for differing needs
Get enough background on how NoSQL databases work
Learn that other movements exist
Presentation Content
RDBMS
Stability
Some RDBMS problems
Unsuitable use cases with RDBMS
NoSQL
Why the emergence of this movement?
Transactions and scalability issues
NoSQL types
Relational Databases: maturity already achieved
Timeline: Files DB → Hierarchical DB → Network DB → Relational DB (time axis marked at 1970)
RDBMS (Relational Database Management System)
Classic way to store data in the world of enterprise applications
Often used for all database needs
A powerful tool that has been in use for several decades
Providing persistence, concurrency control
Accessible from many programming languages
Mostly standard
Widely understood
The degree of standardisation is enough to keep things familiar
SQL used as an integration mechanism between applications
ACID transactions to modify multiple rows and multiple tables
Atomic, Consistent, Isolated, Durable
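A minimal sketch of an ACID transaction touching several rows and two tables, using Python's built-in sqlite3 module. The `account` and `transfer_log` tables and the `transfer` helper are hypothetical names chosen for illustration; they do not come from the slides.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst and log it: all three writes or none."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute("INSERT INTO transfer_log (src, dst, amount) VALUES (?, ?, ?)",
                         (src, dst, amount))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit made
        # earlier in the same transaction has been rolled back as well.
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.execute("CREATE TABLE transfer_log (src INTEGER, dst INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok = transfer(conn, 1, 2, 30)       # fits within the balance
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1
balances = dict(conn.execute("SELECT id, balance FROM account"))
log_count = conn.execute("SELECT COUNT(*) FROM transfer_log").fetchone()[0]
```

The second transfer credits account 2 before the debit fails, so the unchanged balance of account 2 afterwards is what illustrates atomicity.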
RDBMS Schema & Normalization
Relational databases require an explicitly defined schema
A schema is a specification that describes the structure of an object
Data normalization is the process of organizing data into tables in such a way as to reduce the potential for data anomalies (inconsistencies in the data)
Joining process
Data often has to be read from multiple tables: a join operation is performed on the data.
Joins are easy to express in SQL syntax
As tables grow, a join operation takes longer because more data blocks need to be read
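As an illustration of normalized tables read back through a join, here is a small sketch with sqlite3. The `customer` and `orders` tables and their rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    total INTEGER NOT NULL
);
INSERT INTO customer VALUES (1, 'Fabio'), (2, 'Ana');
INSERT INTO orders VALUES (99, 1, 40), (100, 1, 60), (101, 2, 15);
""")

# The customer name is stored once; each order only references it by id,
# so reading "orders per customer" needs a join across both tables.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.total)
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.name
""").fetchall()
```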
RDBMS - stable for several decades
Stability of RDBMS (time axis: … 1980) through changes in languages, architectures, platforms and processes
Some RDBMS Problems
● SCALE OUT IS HARD (limited scale)
● RIGID SCHEMA
● IMPEDANCE MISMATCH
● BAD COST CONTROL
Relational Model Example
Everything is normalized
No data is repeated in multiple tables
We have referential integrity
RIGID SCHEMA
Changing a relational database schema is hard
The relational model is a set of structured data: tables with tuples and relations
A tuple is a limited data structure
● We can’t use List or Map
● We can’t nest one tuple within another to get nested records
It promotes data normalization
● No data is duplicated
● We have referential integrity
Data is modeled independently of its usage
It lets us think of data manipulation as operations that
● take tuples as input
● return tuples
RIGID SCHEMA
A relational database used as an integration DB
Widely used in the 1980s
For a relational database, SQL is used as an integration mechanism between applications
● Simple
● Transactional
● Triggers are available (implementation specific)
Shared database integration style
Relational databases are not designed to run on clusters
It is cheaper and more effective to scale horizontally by buying lots of machines, but with an RDBMS this requires DBA expertise
With a relational database, scaling usually means buying a bigger machine
SCALE OUT IS HARD (with RDBMS)
Difference between the relational model and the in-memory data structures
A lot of application development effort is spent on mapping data between in-memory data structures and a relational database
IMPEDANCE MISMATCH
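A small sketch of the mapping effort behind the impedance mismatch: one nested in-memory object has to be flattened into rows for two tables, then rebuilt. The `Order` and `OrderItem` classes and both helper functions are hypothetical, written only to illustrate the point.

```python
from dataclasses import dataclass, field

@dataclass
class OrderItem:
    product: str
    quantity: int

@dataclass
class Order:
    id: int
    customer: str
    items: list = field(default_factory=list)  # nested list: no tuple equivalent

def to_rows(order):
    """Flatten one in-memory Order into rows for an orders and an items table."""
    order_row = (order.id, order.customer)
    item_rows = [(order.id, item.product, item.quantity) for item in order.items]
    return order_row, item_rows

def from_rows(order_row, item_rows):
    """Rebuild the nested object from flat rows (what an ORM does for us)."""
    order_id, customer = order_row
    items = [OrderItem(product, quantity)
             for (owner_id, product, quantity) in item_rows
             if owner_id == order_id]
    return Order(order_id, customer, items)

order = Order(99, "Fabio", [OrderItem("book", 2), OrderItem("pen", 5)])
order_row, item_rows = to_rows(order)
rebuilt = from_rows(order_row, item_rows)
```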
Attempts to help map data
● OODBMS
● ORM (JPA, Hibernate, etc.)
● iBATIS
● Spring Data
● jOOQ
IMPEDANCE MISMATCH
Costs are often difficult to control with a relational database
BAD COST CONTROL
Multiple criteria
● Number of users accessing the database
● Number of servers
● Volume of data
Unsuitable use cases for RDBMS
● Unpredictable data (accepts entries of any form and size): user or session data, logs, sensor data from IoT
● Connected data: social data, recommendation systems
● Real-time analytics
Always context-dependent
● Performance
● Responsiveness
Why NoSQL?
A new challenger for a new world!
There's a huge demand for things other than SQL
NoSQL favors new factors
● Scalability
● Flexibility
● Cost control
● Availability
Arrival of the Internet and new web application needs
● Large volume of read and write operations
● Low-latency response time
● High availability
Supporting large volumes of data: an old objective
Several actors have already addressed this in the past
● Oracle RAC
● SQL Server
New use cases with huge amounts of data
● Influence of Google and Amazon (adopters of large clusters)
● New NoSQL products: Google → BigTable, Amazon → Dynamo
NoSQL and the BigData Galaxy
A combination of V
NoSQL: a movement
Driven by a set of common characteristics
● Open source
● Not using a relational database
● Running well on clusters
● Schemaless
NoSQL: very ill-defined
Not Only SQL
Polyglot Persistence
M. Fowler’s approach
NoSQL databases types
Key-Value database
Document database
Column Family database
Graph databases
Key-Value database
Based on distributed hash tables
● 3 operations: set, get, delete
Data is kept in RAM (cache) or persisted on SSD or disk (true DB)
Many examples: Ehcache, Memcached, Redis, Amazon DynamoDB, Riak (from Basho), Voldemort, ...
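The three operations can be sketched on top of a toy hash-partitioned store. This is an in-process illustration only: the `KeyValueStore` class is hypothetical, its "nodes" are plain dictionaries, and real products add networking, persistence and replication.

```python
import hashlib

class KeyValueStore:
    """Toy key-value store: each key is hashed to one of several 'nodes'."""

    def __init__(self, node_count=3):
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash of the key decides which node owns it.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def set(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)

    def delete(self, key):
        """Remove the key; return True if it existed."""
        return self._node_for(key).pop(key, None) is not None

store = KeyValueStore(node_count=3)
store.set("session:42", {"user": "Fabio", "cart": ["book"]})
value = store.get("session:42")
deleted = store.delete("session:42")
missing = store.get("session:42")
```

The value is opaque to the store: it is only ever reached through its key.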
Document database
A document is a set of ordered key-value pairs
Any document may differ from all previously inserted documents
⇒ Document databases are designed to accommodate variations in documents within a collection
Collections are groups of similar documents
Document database
Similar to key-value DBs, except that the value is semi-structured, with arbitrary nested data and a varying format
Document DBs enable you to query and filter based on elements of the document
Sharding can be based on a field that is not the key
Secondary indexes on nested fields
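A rough sketch of how a document collection can answer queries on a nested field, with and without a secondary index. The `Collection` class and the dotted-path convention are assumptions made for this example and do not mirror any specific product's API; indexed values are assumed to be hashable scalars.

```python
def get_path(doc, path):
    """Follow a dotted path such as 'address.city' into nested dicts."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

class Collection:
    def __init__(self):
        self.docs = {}      # document id -> document
        self.indexes = {}   # path -> {value -> set of document ids}

    def _index_doc(self, index, path, doc_id, doc):
        value = get_path(doc, path)
        if value is not None:
            index.setdefault(value, set()).add(doc_id)

    def create_index(self, path):
        index = {}
        for doc_id, doc in self.docs.items():
            self._index_doc(index, path, doc_id, doc)
        self.indexes[path] = index

    def insert(self, doc_id, doc):
        self.docs[doc_id] = doc
        for path, index in self.indexes.items():
            self._index_doc(index, path, doc_id, doc)

    def find(self, path, value):
        if path in self.indexes:  # index lookup instead of a full scan
            ids = self.indexes[path].get(value, set())
        else:
            ids = {i for i, d in self.docs.items() if get_path(d, path) == value}
        return sorted(ids)

people = Collection()
people.insert(1, {"name": "Fabio", "address": {"city": "Paris"}})
people.insert(2, {"name": "Mary", "tags": ["vip"]})  # different shape, same collection
scan_result = people.find("address.city", "Paris")
people.create_index("address.city")
people.insert(3, {"name": "Bob", "address": {"city": "Paris", "zip": "75001"}})
indexed_result = people.find("address.city", "Paris")
```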
Column-oriented database
Row-based systems are designed to efficiently return data for an entire row
Column-oriented systems are more efficient when an aggregate needs to be
computed over many rows but only for a small subset of all columns of data
Examples: BigTable, HBase, Druid
Cassandra is a hybrid between a key-value and a column-oriented database
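Column-oriented serialization (the values of one column are stored together, each tagged with its row id):
10:001,12:002,11:003,22:004;
Smith:001,Jones:002,Johnson:003,Jones:004;
Joe:001,Mary:002,Cathy:003,Bob:004;
40000:001,50000:002,44000:003,55000:004;
Row-oriented serialization (the values of one row are stored together):
001:10,Smith,Joe,40000;
002:12,Jones,Mary,50000;
003:11,Johnson,Cathy,44000;
004:22,Jones,Bob,55000;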
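The same four records can be laid out both ways in memory to show why an aggregate over one column is cheaper in a column layout. The column names (`emp_id`, `last_name`, `first_name`, `salary`) are guesses at what the values represent; the slides do not name them.

```python
# Row layout: one tuple per record (row id, then the four values).
rows = [
    (1, 10, "Smith", "Joe", 40000),
    (2, 12, "Jones", "Mary", 50000),
    (3, 11, "Johnson", "Cathy", 44000),
    (4, 22, "Jones", "Bob", 55000),
]

# Column layout: one list per column, positions aligned on the row order.
columns = {
    "emp_id": [r[1] for r in rows],
    "last_name": [r[2] for r in rows],
    "first_name": [r[3] for r in rows],
    "salary": [r[4] for r in rows],
}

# Row store: every full record is visited to pick out a single field.
total_from_rows = sum(record[4] for record in rows)

# Column store: only the one column needed by the aggregate is read.
total_from_columns = sum(columns["salary"])

# Fetching one whole record is the opposite case: trivial in the row
# layout, but it needs one lookup per column in the column layout.
second_record = tuple(columns[name][1] for name in columns)
```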
Graph DB
No need to create tables to model many-to-many relations
Instead, the relations are modeled explicitly using edges
Several use cases: social graphs, maps, etc.
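A minimal sketch of relations stored as explicit edges, with a "friends of friends" traversal of the kind used in social graphs. The `Graph` class and the sample names are illustrative only.

```python
from collections import defaultdict

class Graph:
    """Undirected graph: a many-to-many relation is simply a set of edges."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_edge(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def neighbors(self, node):
        return set(self.edges.get(node, set()))

    def friends_of_friends(self, node):
        """Nodes two hops away that are not already direct neighbors."""
        direct = self.neighbors(node)
        two_hops = set()
        for friend in direct:
            two_hops |= self.neighbors(friend)
        return two_hops - direct - {node}

social = Graph()
social.add_edge("alice", "bob")
social.add_edge("alice", "dave")
social.add_edge("bob", "carol")
social.add_edge("dave", "carol")
social.add_edge("carol", "erin")
suggestions = sorted(social.friends_of_friends("alice"))
```

No join table is needed: following an edge is a direct lookup from a node to its neighbors.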
NoSQL advantages
● Schemaless
● Scalability
● Rich content
● Cost control
Favor scale-out over scale-up
Scale up: with an RDBMS, adding CPU, memory or processors raises migration issues, and buying a new server may cause downtime
Scale out: with NoSQL, adding a server often has no impact; NoSQL databases are designed to use the servers available in a cluster with minimal intervention by a DBA
Scalability
Flexible schema
Schemaless
Denormalization keeps data that is frequently used together in the same document (embedded document)
Aggregate-oriented NoSQL DBs promote denormalization, which eliminates, or at least reduces, the need for joins
This improves query performance over more normalized models (a join is a costly operation)
Denormalization
Schemaless / schema-free
Aggregate Data Model
A more complex structure than a set of tuples
An aggregate is a collection of related objects that we wish to treat as a unit for data manipulation, management, and consistency
(from Eric Evans’s DDD)
● We can think in terms of a complex record that allows lists, maps and other data structures to be nested inside it
● We like to update aggregates with atomic operations
RICH CONTENT
Aggregate Data Model Example
● The customer contains a list of billing addresses
● The order contains a list of order items, a shipping address, and payments
● The payment itself contains a billing address for that payment
A single address appears 3 times, but instead of using an id it is copied each time
We like to communicate with our data storage in terms of aggregates
RICH CONTENT
Aggregate Models
A different approach from the relational data model
● Relational databases have no concept of aggregate (they are aggregate-ignorant)
● With aggregates, there is often no need for joins
RICH CONTENT
Aggregate Boundaries
Two aggregates: Customer and Order
Links between aggregates are relationships
Instead of using an id, the same data can be stored several times (e.g. the address)
We can draw our aggregate boundaries differently
// Customer
{
  "id": 1,
  "name": "Fabio",
  "billingAddress": [
    { "city": "Paris" }
  ]
}
// Orders
{
  "id": 99,
  "customerId": 1,
  "orderItems": [ .. ],
  "shippingAddress": [ { "city": "Paris" } ],
  "orderPayment": [
    {
      "billingAddress": [ { "city": "Paris" } ],
      ….
    }
  ]
}
RICH CONTENT
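The two aggregates can be sketched as nested Python structures stored whole under a key. The contents of `orderItems` and the `save`/`load` helpers are invented for the example, since the slide elides those details.

```python
import copy
import json

address = {"city": "Paris"}

# Each aggregate carries its own copy of the address instead of an id.
customer = {"id": 1, "name": "Fabio", "billingAddress": [copy.deepcopy(address)]}
order = {
    "id": 99,
    "customerId": 1,  # link between the two aggregates
    "orderItems": [{"productId": 27, "quantity": 1}],  # hypothetical item
    "shippingAddress": [copy.deepcopy(address)],
    "orderPayment": [{"billingAddress": [copy.deepcopy(address)]}],
}

store = {}  # stands in for an aggregate-oriented database

def save(kind, aggregate):
    """Write the whole aggregate in one operation, under one key."""
    store[f"{kind}:{aggregate['id']}"] = json.dumps(aggregate)

def load(kind, aggregate_id):
    """Read the whole aggregate back with a single lookup, no join."""
    return json.loads(store[f"{kind}:{aggregate_id}"])

save("customer", customer)
save("order", order)
loaded_order = load("order", 99)
address_copies = sum(text.count('"Paris"') for text in store.values())
```

Updating the address later means touching every copy, which is the price paid for avoiding joins.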
Aggregates, the trade-off
Benefits
● Solves the impedance mismatch
● Easier to work on a cluster (the aggregate is the unit for replication and sharding)
Drawbacks
● Aggregate-oriented NoSQL databases generally do not support atomicity that spans multiple aggregates
● Not suited to all needs (e.g. analyzing product sales over the last months)
RICH CONTENT
Aggregate with NoSQL types
Key-Value and Document databases are strongly aggregate-oriented
With key-value DBs
● the aggregate is opaque (blob)
● the aggregate can be any type of object
● the aggregate is only accessed by its key
With document DBs, we can see a structure in the aggregate
● we define structure on the data
● we can submit queries based on fields
Aggregates: not a systematic solution
Advanced data denormalization with Redis
NoSQL products are often free of charge
COST CONTROL
The major open-source products are free
● No license fee
● No pricing policy based on the number of users
● No pricing policy based on the number of servers
Most companies behind NoSQL products provide commercial support and advanced (frequently indispensable) monitoring tools, together with SaaS solutions
Sharding & Replication
Sharding (or partitioning, depending on the product)
● Data is divided into disjoint sets
● To scale out
Replication
● Data is duplicated (on different nodes)
● To ensure high availability
Both: each shard is replicated
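A simplified sketch of combining both techniques: a hash picks the shard for each key, and each shard is copied to the next nodes in a ring. The function names and the "next nodes in the ring" placement rule are assumptions made for the example, not a description of a particular product.

```python
import hashlib

def shard_for(key, shard_count):
    """Sharding: a stable hash assigns each key to exactly one shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

def replica_nodes(shard, node_count, replication_factor):
    """Replication: the shard lives on its home node and the following ones."""
    return [(shard + offset) % node_count for offset in range(replication_factor)]

def placement(keys, node_count, replication_factor):
    """Return, for each node, the set of keys it holds."""
    nodes = {node: set() for node in range(node_count)}
    for key in keys:
        shard = shard_for(key, node_count)
        for node in replica_nodes(shard, node_count, replication_factor):
            nodes[node].add(key)
    return nodes

keys = [f"user:{i}" for i in range(50)]
nodes = placement(keys, node_count=4, replication_factor=2)

# Every key is stored on exactly two nodes...
copies_per_key = {key: sum(key in held for held in nodes.values()) for key in keys}

# ...so losing any single node leaves all the data reachable.
survivors = set().union(*(held for node, held in nodes.items() if node != 0))
```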
Sharding: benefits and costs
We shard data to allow scale-out
● Scale up means using a more powerful machine
● Scale out means using more machines
Scale out to increase
● The throughput, the total amount of data, or ...
The main cost of sharding comes from distributed locks and transactions
● Giving up transactions and relying on atomic operations on aggregates is one way to approach linear horizontal scalability
Replication: the way to achieve HA
Replication can be
● Synchronous or asynchronous
○ A trade-off between performance and consistency
● Master/slave or peer-to-peer
○ Master/slave makes locks easier to implement (no distributed lock needed)
○ Peer-to-peer is better for HA (no election when a failure occurs)
Main motivation
● Mostly to increase high availability
Example 1: sharding and primary/slaves replicas
Copy schema from old
commercial presentation
(page 40, CVAT)
Example 2: Sharding and p2p replicas
Cassandra is well suited to write-intensive applications
Mainly because each node only performs appends on the file system
Tunable consistency
Focus on Cassandra with P2P architecture
CAP Theorem
Distributed databases cannot have consistency (C), availability (A) and partition tolerance (P) at the same time
Consistency: a read is guaranteed to return the most recent write for a given client
Availability: every request received by a non-failing node in the system must result in a response
Partition tolerance: the system continues to operate despite arbitrary partitioning due to network failures
Also known as Brewer’s theorem
CAP theorem gotchas
Consistent != global state
There are several definitions of consistency. Here it is more about linearizability: finding a point of view (an order of events that respects causality) where the final state is correct
Availability != Liveness
A failing node does not remove the availability property, but a dead system is not very useful.
Because a read-only system is more convenient, we prefer “CP” to “CA” for distributed systems.
Networks are not reliable
NoSQL Quorum to the rescue
A quorum is the number of servers that must respond to a read or write operation for the operation to be considered successful.
A big enough quorum is often required to ensure the desired consistency
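One common way to size quorums, sketched here as an assumption rather than something stated on the slide, is to require R + W > N for N replicas, so that every read quorum overlaps every write quorum. The replica contents below are made-up (version, value) pairs.

```python
from itertools import combinations

def quorums_overlap(n, w, r):
    """True when any r replicas must include at least one of the w written ones."""
    return r + w > n

def read_latest(responses):
    """Among the replicas that answered, keep the value with the highest version."""
    return max(responses, key=lambda versioned: versioned[0])[1]

# N = 3 replicas; a write with W = 2 reached the first two, the third is stale.
replicas = [(2, "new"), (2, "new"), (1, "old")]

# R = 2: whichever two replicas answer, the read sees the latest write.
results_r2 = {read_latest(subset) for subset in combinations(replicas, 2)}

# R = 1: the read may hit only the stale replica.
results_r1 = {read_latest(subset) for subset in combinations(replicas, 1)}
```

Lowering R or W makes operations faster and more available, at the cost of possibly stale reads.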
Availability & Consistency in Distributed Databases
We often sacrifice consistency for scalability, availability or performance
However, many enterprise use cases need (strong) consistency
Eventual Consistency
“There may be times when the data is inconsistent”
Eventually consistent means that some replicas might be inconsistent for some period of time but will become consistent at some point
Two Phase Commit (2PC)
A two-phase commit is a protocol for a transaction that requires writing data to two (or more) separate locations: every location first prepares the write, then all of them commit it
Helps ensure consistency
With 2PC, the DB favors consistency, at the risk of the most recent data not being available for a brief period of time
While the 2PC is executing, transactions take longer. The updated data is delayed until the 2PC finishes (locks are held longer)
Favor Consistency over availability
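A bare-bones sketch of the two phases, with an in-process coordinator and no timeouts, logging or crash recovery. The `Participant` class and the `two_phase_commit` function are hypothetical names used for illustration.

```python
class Participant:
    """One storage location taking part in the transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.committed = {}
        self.staged = None

    def prepare(self, key, value):
        """Phase 1: vote yes and stage the write, or vote no."""
        if not self.can_commit:
            return False
        self.staged = (key, value)  # a real participant would also hold a lock
        return True

    def commit(self):
        """Phase 2: make the staged write visible."""
        key, value = self.staged
        self.committed[key] = value
        self.staged = None

    def abort(self):
        self.staged = None

def two_phase_commit(participants, key, value):
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [p.prepare(key, value) for p in participants]
    # Phase 2: commit everywhere only if all voted yes, otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False

a, b = Participant("a"), Participant("b")
first = two_phase_commit([a, b], "stock", 10)

b.can_commit = False  # simulate a participant that cannot accept the write
second = two_phase_commit([a, b], "stock", 9)
```

Between prepare and commit the new value is not readable anywhere, which is the brief unavailability the slide mentions.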
BASE Transactions for NoSQL
BA: Basically available
S: Soft state
E: Eventual consistency
BA: There can be partial failures in some parts of the distributed system while the rest of the system continues to function
S: Data may eventually be overwritten with more recent data (this property overlaps with eventual consistency)
E: There may be times when the database is in an inconsistent state
Schemaless in depth
Schemaless DBs do not require formal structure specification
It doesn’t make sense to require data modelers to specify all possible document
fields prior to building and populating the database
Caution: schemaless doesn’t mean no schema
The schema is often implicit in the code
Polymorphic Schema
“Polymorphic” derives from Greek and literally means “many shapes”
Each document can have a different structure
Created dynamically when the document is inserted
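A short sketch of what "the schema is implicit in the code" can look like: the database accepts any shape, and the reading code decides which shapes it understands. The field names and the `display_name` helper are invented for the example.

```python
def display_name(doc):
    """The implicit schema: this function encodes which shapes are accepted."""
    if "name" in doc:
        return doc["name"]
    if "firstName" in doc and "lastName" in doc:
        return f"{doc['firstName']} {doc['lastName']}"
    return "(unknown)"

# Three documents of the same collection, each with a different structure.
users = [
    {"id": 1, "name": "Fabio"},
    {"id": 2, "firstName": "Mary", "lastName": "Jones", "tags": ["vip"]},
    {"id": 3},
]

names = [display_name(user) for user in users]
```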
Which NoSQL database ?
Multiple criteria
- Volume of reads and writes (throughput)
- Tolerance for inconsistent data in replicas
- The nature of relations between entities and how that
affects query patterns
- Availability and disaster recovery requirements
- The need for flexibility in data models
- Latency requirement
- Volume of data
Quiz - NoSQL DB use cases
● Applications that use JSON data structures: ?
● Frequent small reads and writes along with simple data models: ?
● Caching data from relational DBs to improve performance: ?
● Applications that are geographically distributed over multiple data centers: ?
● Social networking: ?
Additional key-value DB use cases
● Backend support for websites with high volumes of reads and writes
● Storing large objects such as images and audio files
● Tracking transient attributes in a web application, such as a shopping cart
Additional document DB use cases
● Applications that use JSON data structures
● Tracking variable types of metadata
● Storing configuration and user information for mobile applications
Additional column-family DB use cases
● Applications with the potential for truly large volumes of data, such as hundreds of terabytes
● Applications with dynamic fields
Additional graph DB use cases
● Network and IT infrastructure management
● Recommending products and services
Quiz - NoSQL DB use cases (answers)
● Applications that use JSON data structures: document DBs such as MongoDB
● Frequent small reads and writes along with simple data models: key-value DBs such as Redis
● Caching data from relational DBs to improve performance: key-value DBs such as Redis
● Applications that are geographically distributed over multiple data centers: column-family DBs such as Cassandra
● Social networking: graph DBs such as Neo4j
NewSQL movement
The coexistence of RDBMS and NoSQL features in the same product
NewSQL is a class of modern RDBMSs that seek to provide
● The same scalable performance as NoSQL systems for read-write workloads
● The ACID guarantees of a traditional relational database system
TimeSeries DB
● Consists of a sequence of values or events changing over time
○ Data is recorded at regular intervals
● Widely used within microservices architectures and with DDD approaches
● Applications
○ Financial: stock price, inflation
○ Biomedical: blood pressure
○ Meteorological: precipitation
● Already several technologies
○ Druid
○ InfluxDB
○ Redis
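A typical operation on such data is downsampling: averaging points per fixed time bucket. The sketch below uses made-up (timestamp in seconds, value) readings and a hypothetical `downsample` helper; dedicated time-series databases provide this kind of query natively.

```python
from collections import defaultdict

def downsample(points, bucket_seconds):
    """Average (timestamp, value) points per fixed-width time bucket."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets[bucket_start].append(value)
    return {start: sum(values) / len(values)
            for start, values in sorted(buckets.items())}

# One reading every 30 seconds, e.g. a blood pressure sensor.
points = [(0, 120.0), (30, 122.0), (60, 118.0), (90, 121.0), (120, 119.0)]
per_minute = downsample(points, bucket_seconds=60)
```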
Treat the database as an application database
The responsibility for database integrity is put in the service
With an application database, the database is only accessed by a single application codebase ⇒ a single team / a single application
Only the team needs to know the database structure
We favor communication between applications through web services
This gives more freedom to choose a database
Polyglot Persistence
Several DBs technologies for a single application
● We use the service-wrapping pattern for each DB
● Developers want different APIs for different problems
● Most organizations already have a mix of data storage technologies for different circumstances
Suitable for a microservices architecture
● Each service manages its own data
○ Data consistency is delegated to the service
● Each service is an independent functional unit
Conclusion
Four factors favor NoSQL usage: scalability, cost, flexibility and availability
RDBMS and SQL are going to continue to exist
The solution is likely to be a hybrid of multiple technologies
The choice always depends on your needs
RDBMS remains a good choice in many scenarios (strong legacy, critical data, etc.)
We are entering a world of Polyglot Persistence
Appendix - Reference List (Books)

Beyond Relational Databases

  • 1. Beyond relational databases NoSQL, NewSQL, TimeSeries DB Grégory BoissinotJanuary 2015 v3
  • 2. Objectives Understand the dominance of relational databases Know the existence of alternative technologies for differing needs Provide you enough background on how NoSQL databases work Make you know the existence of others movements
  • 3. Presentation Content RDBMS Stability Some RDBMS problems Unsuitable use cases with RDBMS NoSQL Why the emergence of this movement? Transactions and scalability issues NoSQL types
  • 4. Relational Databases: already achievement of maturity Files DB Hierarchical DB Network DB Relational DB temps 1970
  • 5. RDBMS (Relational Database Management System) Classic way to store data in the world of enterprise applications Often used for all database needs A powerful tool used for many more decades Providing persistence, concurrency control Accessible from many programming languages Mostly standard Widely understood The degree of standardisation is enough to keep things familiar SQL used as an integration mechanism between applications ACID transactions to modify multiple rows and multiple tables Atomic, Consistent, Isolated, etc Durable
  • 6. RDBMS Schema & Normalization Relational databases require an explicitly defined schema A schema is a specification that describes the structure of an object Data normalization is the process of organizing data into tables in such way to reduce the potential for data anomalies (an inconsistency in the data)
  • 7. Joining process: data often needs to be read from multiple tables, so a join operation is performed on the data. A join is very easy to express in SQL syntax. As tables grow, the join operation takes longer because more data blocks need to be read
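To make the join on slide 7 concrete, here is a minimal sketch using Python's standard `sqlite3` module. The `customer` and `orders` tables, their columns and the sample rows are assumptions made for illustration; they do not come from the slides.

```python
import sqlite3


def customers_with_orders():
    """Read data from two normalized tables with a single SQL join."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(id),
            total REAL
        );
        INSERT INTO customer VALUES (1, 'Fabio'), (2, 'Mary');
        INSERT INTO orders VALUES (99, 1, 40.0), (100, 1, 10.0), (101, 2, 25.0);
        """
    )
    # The customer name is stored once; the join brings it next to each order.
    rows = conn.execute(
        """
        SELECT c.name, o.id, o.total
        FROM customer AS c
        JOIN orders AS o ON o.customer_id = c.id
        ORDER BY o.id
        """
    ).fetchall()
    conn.close()
    return rows
```

The query itself is short, which is the slide's point about syntax; the cost shows up at execution time, when both tables have to be read and matched.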
  • 8. RDBMS: stable for several decades. RDBMS have remained stable through changes in languages, architectures, platforms and processes (time axis: … 1980)
  • 9. Some RDBMS Problems: SCALE OUT IS HARD (limited scale); RIGID SCHEMA; IMPEDANCE MISMATCH; BAD COST CONTROL
  • 10. Relational Model Example: everything is normalized. No data is repeated in multiple tables. We have referential integrity. RIGID SCHEMA
  • 11. Changing a relational database schema is hard. The relational model is a set of structured data: tables with tuples and relations. A tuple is a limited data structure: we can't use a List or a Map, and we can't nest one tuple within another to get nested records. The model promotes data normalization: no data is duplicated and we have referential integrity. Data is modeled independently of its usage. This lets us think of data manipulation as operations that take tuples as input and return tuples. RIGID SCHEMA
  • 12. A relational database used as an integration DB (shared database integration style): widely used in the 1980s. For a relational database, SQL is used as an integration mechanism between applications ● Simple ● Transactional ● Triggers are available (implementation specific)
  • 13. Relational databases are not designed to run on clusters. With a relational database, scaling means buying a bigger machine, although it is cheaper and more effective to scale horizontally by buying lots of machines; doing so with an RDBMS requires DBA expertise. SCALE OUT IS HARD (with RDBMS)
  • 14. Difference between the relational model and in-memory data structures: a lot of application development effort is spent on mapping data between in-memory data structures and a relational database. IMPEDANCE MISMATCH
  • 15. Attempts to help map data: OODBMS, ORM (JPA, Hibernate, etc.), iBATIS, Spring Data, jOOQ. IMPEDANCE MISMATCH
  • 16. It is often difficult to control cost with a relational database. Multiple criteria ● Number of users accessing the database ● Number of servers ● Volume of data. BAD COST CONTROL
  • 17. Unsuitable use cases for RDBMS. Unpredictable data (accepts entries of any form and size): user or session data, logs, sensor data from IoT. Connected data: social data, recommendation systems. Real-time analytics. Always context dependent: performance, responsiveness
  • 18. Why NoSQL? A new challenger for a new world! There's a huge demand for things other than SQL
  • 19. Scalability NoSQL favors new factors Arrival of Internet and new Web Application needs ● Large volume of read and write operations ● Low Latency response time ● High availability Flexibility Cost Control Availability
  • 20. Supporting large volumes of data: an old objective. Several actors have already addressed this in the past: Oracle RAC, SQL Server. New use cases with huge amounts of data; influence of Google and Amazon (adopters of large clusters). New NoSQL products: Google → BigTable, Amazon → Dynamo
  • 21. NoSQL and the BigData Galaxy A combination of V
  • 22. NoSQL: a movement driven by a set of common characteristics: open source; not using a relational database; running well on clusters; schemaless
  • 23. NoSQL: very ill-defined Not Only SQL Polyglot Persistence M.Fowler approach
  • 24. NoSQL databases types Key-Value database Document database Column Family database Graph databases
  • 25. Key-Value database: based on distributed hash tables ● 3 operations: set, get, delete. Data is held in RAM (cache) or persisted on SSD or disk (true DB). Many examples: Ehcache, Memcached, Redis, Amazon DynamoDB, Riak, Voldemort, Basho, ...
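The three operations named on slide 25 can be sketched with a plain in-memory dictionary. The `KeyValueStore` class below is a hypothetical illustration of the access pattern only; it is not the API of any product listed on the slide, and it ignores distribution and persistence.

```python
class KeyValueStore:
    """Toy single-node key-value store: the value is opaque to the store."""

    def __init__(self):
        self._data = {}  # stands in for the distributed hash table

    def set(self, key, value):
        """Store (or overwrite) the value under the given key."""
        self._data[key] = value

    def get(self, key, default=None):
        """Return the value for key, or default when the key is absent."""
        return self._data.get(key, default)

    def delete(self, key):
        """Remove the key; return True if it existed, False otherwise."""
        if key in self._data:
            del self._data[key]
            return True
        return False
```

Because every access goes through the key, the store never needs to understand the value, which is what makes this model easy to spread across many machines.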
  • 26. Document database: a document is a set of ordered key-value pairs. Any document can differ from all previously inserted documents ⇒ document databases are designed to accommodate variations in documents within a collection. Collections are groups of similar documents
  • 27. Document database: similar to key-value DBs, except that the value is semi-structured, with arbitrary, nested data and a varying format. Document DBs enable you to query and filter based on elements. Sharding can be based on a field that is not the key. Secondary indexes can be defined on nested columns
  • 28. Column-oriented database: row-based systems are designed to efficiently return data for an entire row. Column-oriented systems are more efficient when an aggregate needs to be computed over many rows but only for a small subset of all columns of data. Examples: BigTable, HBase, Druid. Cassandra is a hybrid between a key-value and a column-oriented database. Column-oriented layout: 10:001,12:002,11:003,22:004; Smith:001,Jones:002,Johnson:003,Jones:004; Joe:001,Mary:002,Cathy:003,Bob:004; 40000:001,50000:002,44000:003,55000:004; Row-oriented layout: 001:10,Smith,Joe,40000; 002:12,Jones,Mary,50000; 003:11,Johnson,Cathy,44000; 004:22,Jones,Bob,55000;
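The two serializations shown on slide 28 can be reproduced from the same four records. The sketch below is illustrative only: the function names are hypothetical, and real column stores add compression and indexing that are not modeled here.

```python
# The four records from the slide: (row id, id, last name, first name, salary).
ROWS = [
    ("001", 10, "Smith", "Joe", 40000),
    ("002", 12, "Jones", "Mary", 50000),
    ("003", 11, "Johnson", "Cathy", 44000),
    ("004", 22, "Jones", "Bob", 55000),
]


def row_layout(rows):
    """Serialize one string per row: all values of a record stay together."""
    out = []
    for row_id, *values in rows:
        joined = ",".join(str(v) for v in values)
        out.append(f"{row_id}:{joined};")
    return out


def column_layout(rows):
    """Serialize one string per column: all values of a field stay together."""
    row_ids = [row_id for row_id, *_ in rows]
    columns = zip(*[values for _, *values in rows])  # transpose rows to columns
    out = []
    for column in columns:
        joined = ",".join(f"{v}:{rid}" for v, rid in zip(column, row_ids))
        out.append(f"{joined};")
    return out


def total_salary(rows):
    """An aggregate over one field: only the last column has to be scanned."""
    return sum(row[-1] for row in rows)
```

With the column layout, computing `total_salary` touches a single serialized string, whereas the row layout forces a read of every record.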
  • 29. Graph DB: no need to create tables to model many-to-many relations. Instead, they are explicitly modeled using edges. Several use cases: social graphs, maps, etc.
  • 31. Favor scale-out over scale-up. Scale out: with NoSQL, adding a server often has no impact; NoSQL databases are designed to use the servers available in a cluster with minimal intervention by a DBA. Scale up: with an RDBMS, adding CPU or memory raises migration issues, and buying a new server may cause downtime. SCALABILITY
  • 32. Flexible schema Schemaless Denormalization keeps data that is frequently used together in the document Embedded document
  • 33. All NoSQL DBs promote denormalization, which eliminates, or at least reduces, the need for joins. This improves query performance over more normalized models (a join is a costly operation). Denormalization, schemaless, schema-free
  • 34. Aggregate Data Model: a more complex structure than a set of tuples. An aggregate is a collection of related objects that we wish to treat as a unit for data manipulation, management and consistency (from Eric Evans's DDD) ● We can think in terms of a complex record that allows lists, maps and other data structures to be nested inside it ● We like to update aggregates with atomic operations. RICH CONTENT
  • 35. Aggregate Data Model Example ● The customer contains a list of billing addresses; The order contains a list of: order items, a shipping address, and payments The payment itself contains a billing address for that payment A single address appears 3 times, but instead of using an id it is copied each time We like to communicate with our data storage in terms of aggregates RICH CONTENT
  • 36. Aggregate Models: a different approach from the relational data model ● Relational databases don't have the concept of an aggregate (they are aggregate-ignorant) ● With aggregates, there is often no need for joins. RICH CONTENT
  • 37. Aggregate Boundaries: two aggregates, Customer and Order. Links between aggregates are relationships. Instead of using an id, the same data can be stored several times (e.g. the address). We could draw our aggregate boundaries differently. //Customer { "id": 1, "name": "Fabio", "billingAddress": [ { "city": "Paris" } ] } //Orders { "id": 99, "customerId": 1, "orderItems": [ ..], "shippingAddress": [ {"city": "Paris"} ], "orderPayment": [ { "billingAddress": [ {"city": "Paris"} ], …. } ] } RICH CONTENT
  • 38. Aggregates, the trade-off: they solve the impedance mismatch and are easier to work with on a cluster (the aggregate is the unit for replication and sharding). However, NoSQL databases generally don't support atomicity that spans multiple aggregates, and aggregates are not suited to all needs (e.g. analyzing product sales over the last months). RICH CONTENT
  • 39. Aggregates with NoSQL types: Key-Value and Document databases are strongly aggregate-oriented. With key-value DBs, the aggregate is opaque (a blob): it can be any type of object and is only accessed by the key. With Document DBs, we can see a structure in the aggregate: we define structure on the data and can submit queries based on fields
  • 40. Aggregate : not a systematic solution Advanced data denormalization with Redis
  • 41. NoSQL databases are often free of cost. The major open-source products are free: no licence fee, no pricing policy based on the number of users, no pricing policy based on the number of servers. Most companies behind NoSQL products provide commercial support and advanced (frequently indispensable) monitoring tools, together with SaaS solutions. COST CONTROL
  • 42. Sharding & Replication. Sharding (or partitioning, depending on the product) ● Data is divided into disjoint sets ● To scale out. Replication ● Duplicate the data (on different nodes) ● To ensure high availability. Both: each shard is replicated
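The idea of dividing data into disjoint sets can be sketched with hash-based shard selection. This is one common strategy chosen here for illustration; the slides do not say which strategy any given product uses, and the function names are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards):
    """Split keys into disjoint shards; every key lands in exactly one shard."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards
```

A plain modulo like this reshuffles most keys when `num_shards` changes, which is why production systems often prefer consistent hashing or range-based partitioning; that refinement is outside the scope of this sketch.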
  • 43. Sharding: benefits and costs. We shard data to allow scale-out ● Scale up means using a more powerful machine ● Scale out means using more machines. Scale out to increase ● The throughput or the total amount of data or ... The main cost of sharding comes from distributed locks and transactions ● Giving up transactions (TX) and relying on atomic operations on aggregates is one way to achieve linear horizontal scalability
  • 44. Replication: the way to achieve HA. Replication can be ● Synchronous or asynchronous ○ A trade-off between performance and consistency ● Master/slaves or peer-to-peer ○ master/slaves is better for implementing locks (non-distributed) ○ peer-to-peer is better for HA (no election when a failure occurs). Main motivation ● Mostly to increase high availability
  • 45. Example 1: sharding and primary/slaves replicas Copy schema from old commercial presentation (page 40, CVAT)
  • 46. Example 2: Sharding and p2p replicas
  • 47. Cassandra is well suited for write intensive applications Mainly because each node performs APPENDS on the file systems Tunable consistency Focus on Cassandra with P2P architecture
  • 48. CAP Theorem: distributed databases cannot have consistency (C), availability (A) and partition tolerance (P) at the same time. Consistency: a read is guaranteed to return the most recent write for a given client. Availability: every request received by a non-failing node in the system must result in a response. Partition tolerance: the system continues to operate despite arbitrary partitioning due to network failures. Also known as Brewer's theorem
  • 49. CAP theorem gotchas. Consistent != global state: there are several definitions of consistency. It is more about linearization: finding a point of view (that is, an order of events that respects causality) from which the final state is correct. Availability != liveness: a failing node does not remove the availability property, but a dead system is not very useful. Because a read-only system is more convenient, we will prefer “CP” to “CA” for distributed systems. Networks are not reliable
  • 50. NoSQL: quorum to the rescue. A quorum is the number of servers that must respond to a read or write operation for the operation to be considered OK. A big enough quorum is often required to ensure the desired consistency
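What "big enough" means can be illustrated with the condition commonly cited for replicated stores with tunable quorums: with N replicas, a read quorum R and a write quorum W overlap in at least one replica when R + W > N. The slide does not state this rule; it is added here as a well-known rule of thumb, and the function names are hypothetical.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when any read quorum shares at least one replica with any write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("quorum sizes must be between 1 and the replica count")
    return r + w > n


def majority(n: int) -> int:
    """Smallest quorum size such that two quorums of that size always overlap."""
    return n // 2 + 1
```

For example, with three replicas, reading from two and writing to two guarantees that every read contacts at least one replica holding the latest acknowledged write, while reading from one and writing to one does not.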
  • 51. Availability & Consistency in Distributed Databases: we often sacrifice consistency for scalability, availability or performance. However, many enterprise use cases need (strong) consistency. Eventual consistency: “There may be times when the data is inconsistent”. Eventually consistent means that some replicas might be inconsistent for some period of time but will become consistent at some point
  • 52. Two-Phase Commit (2PC): a two-phase commit is a transaction that requires writing data to two separate locations. It helps ensure consistency. With 2PC, the DB favors consistency, at the risk of the most recent data not being available for a brief period of time. While the 2PC is executing, transactions take longer: the updated data is delayed until the 2PC finishes (locks are held for longer). Favor consistency over availability
  • 53. BASE Transactions for NoSQL: BA = Basically Available, S = Soft state, E = Eventually consistent. BA: there can be partial failure in some parts of the distributed system while the rest of the system continues to function. S: data may eventually be overwritten with more recent data (this property overlaps with eventual consistency). E: there may be times when the database is in an inconsistent state
  • 54. Schemaless in depth: schemaless DBs do not require a formal structure specification. It doesn't make sense to require data modelers to specify all possible document fields prior to building and populating the database. Caution: schemaless doesn't mean no schema. The schema is often implicit in the code
  • 55. Polymorphic Schema: “polymorphic” is derived from Greek and literally means “many shapes”. Each document can have a different structure, created dynamically when the document is inserted
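A minimal sketch of a polymorphic schema, using a plain Python list as a stand-in for a document collection. The helper names and the sample documents are hypothetical; the point is that each document brings its own structure at insert time, while the code that reads the documents carries the implicit schema.

```python
def insert(collection, document):
    """Add a document as-is: no structure is declared beforehand."""
    collection.append(dict(document))


def fields_in_use(collection):
    """Union of all field names seen so far across the collection."""
    seen = set()
    for document in collection:
        seen.update(document)
    return seen


def label(document):
    """Reader code holds the implicit schema: it must handle every shape."""
    return document.get("title") or document.get("name") or "<untitled>"
```

Inserting a book with `title` and `pages` next to a shirt with `name` and `sizes` works without any migration, but `label` has to know about both shapes, which is the sense in which schemaless does not mean no schema.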
  • 56. Which NoSQL database? Multiple criteria - Volume of reads and writes (throughput) - Tolerance for inconsistent data in replicas - The nature of relations between entities and how that affects query patterns - Availability and disaster recovery requirements - The need for flexibility in data models - Latency requirements - Volume of data
  • 57. Quiz: NoSQL DB use cases. Applications that use JSON data structures? Frequent small reads and writes along with simple data models? Caching data from relational DBs to improve performance? Applications that are geographically distributed over multiple data centers? Social networking?
  • 58. Additional Key-Value DB use cases: backend support for websites with high volumes of reads and writes → Key-Value DBs. Storing large objects such as images and audio files → Key-Value DBs. Tracking transient attributes in a web application, such as a shopping cart → Key-Value DBs
  • 59. Additional Document DB use cases: applications that use JSON data structures → Document DBs. Tracking variable types of metadata → Document DBs. Storing configuration and user information for mobile applications → Document DBs
  • 60. Additional Column family DB use cases: applications with the potential for truly large volumes of data, such as hundreds of terabytes → Column family DBs. Applications with dynamic fields → Column family DBs
  • 61. Additional Graph DB use cases: network and IT infrastructure management → Graph DBs. Recommending products and services → Graph DBs
  • 62. Quiz: NoSQL DB use cases (answers). Applications that use JSON data structures → Document DBs such as MongoDB. Frequent small reads and writes along with simple data models → Key-Value DBs such as Redis. Caching data from relational DBs to improve performance → Key-Value DBs such as Redis. Applications that are geographically distributed over multiple data centers → Column DBs such as Cassandra. Social networking → Graph DBs such as Neo4j
  • 63. NewSQL movement: the co-existence of RDBMS and NoSQL features in the same product. NewSQL is a class of modern RDBMSs that seek to provide the same scalable performance as NoSQL systems for read-write workloads, together with the ACID guarantees of a traditional relational database system
  • 64. TimeSeries DB ● Consists of a sequence of values or events changing with time ○ Data is recorded at regular intervals ● Widely used within microservices architectures and with DDD approaches ● Applications ○ Financial: stock price, inflation ○ Biomedical: blood pressure ○ Meteorological: precipitation ● Several technologies already exist ○ DruidDB ○ InfluxDB ○ Redis
  • 65. Treat the database as an application database: the responsibility for database integrity is put in the service. With an application database, the database is only accessed by a single application codebase ⇒ a single team / a single application. Only that team needs to know the database structure. We favor communication between applications through Web Services. This gives more freedom to choose a database
  • 66. Polyglot Persistence: several DB technologies for a single application ● We use the service wrapping pattern for each DB ● Developers want different APIs for different problems ● Most organizations currently have a mix of data storage technologies for different circumstances
  • 67. Suitable for Microservices Architecture ● Each Service manages its own data ○ The data consistency is delegated to the service ● Each is an independent functional unit
  • 68. Conclusion: four factors favor NoSQL usage: scalability, cost, flexibility and availability. RDBMS and SQL are going to continue to exist. The solution is likely to be a hybrid of multiple technologies. The choice always depends on your needs. RDBMS remain a good choice in many scenarios (strong legacy, critical data, etc.). We are entering a world of Polyglot Persistence
  • 69. Appendix - Reference List: Books