DOTNETMÁLAGA // MalagaMakers // 5th Nov 2015
• Relational vs. NoSQL
• Definitions and examples
• Other database classifications
• 9 Databases in 40 minutes!
• Polyglot Persistence
• Some statistics
• Summary
What is NoSQL?
SQL
Commercial example: Oracle | OS example: (Oracle) MySQL
NoSQL
“Mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.”
“Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally
scalable.”
NoSQL systems are also sometimes called "Not only SQL".
SQL? ACID? Relations? Distributed?
Commercial example: DynamoDB | OS example: MongoDB
NewSQL
Modern relational database management systems that seek to provide the same
scalable performance of NoSQL systems for online transaction processing (OLTP)
read-write workloads while still maintaining the ACID guarantees of a traditional
database system.
OS example: VoltDB
NoSQL vs. SQL vs. NewSQL
Wikipedia
No-sql.org
More Database classifications
On premises vs. Cloud “As a service” (Azure DocumentDB)
Memory / Disk vs. Only in memory (OrigoDB, Redis, SQL Server)
OLTP vs. OLAP
Databases vs. Not a database but a data store (Zookeeper, Kafka)
CAP classifications
And more…
In action…
Key-value stores (Redis)
Document stores (RavenDB …ok, MongoDB)
Wide column stores (Cassandra)
Graph DBMS (Neo4j)
Search engines (Elastic Search)
Time Series DBMS (InfluxDB)
Event Stores (Event Store)
MultiModel (OrientDB)
Relational DBMS (MS SQL Server 2016)
Use cases…
Show latest items
Count items
Leaderboards
Unique items
Pub/Sub
Queues
Cache
As the main database
Key Value
Some C# code
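The slide's C# code is not part of this transcript. As a rough illustration of how the use cases listed above map onto simple data structures, here is a minimal Python sketch. It uses in-memory stand-ins only, not a Redis client, and every name in it (`record_visit`, `add_score`, `top`) is hypothetical.

```python
from collections import Counter, deque

# In-memory stand-ins for the kinds of values a key-value store can hold.
latest_items = deque(maxlen=3)   # latest items: capped list, newest first
page_views = Counter()           # count items: one counter per key
unique_visitors = set()          # unique items: a set ignores duplicates
scores = {}                      # leaderboard: member -> score


def record_visit(user: str, page: str) -> None:
    """Update the latest-items list, the per-page counter and the visitor set."""
    latest_items.appendleft(page)
    page_views[page] += 1
    unique_visitors.add(user)


def add_score(player: str, points: int) -> None:
    """Add points to a player's running total."""
    scores[player] = scores.get(player, 0) + points


def top(n: int) -> list:
    """Return the n best (player, score) pairs, highest score first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


record_visit("ana", "/home")
record_visit("ben", "/docs")
record_visit("ana", "/docs")
record_visit("ana", "/blog")
record_visit("ben", "/home")
add_score("ana", 30)
add_score("ben", 50)
add_score("ana", 25)
```

A real data structure server keeps these structures on the server side, so several application instances can share them.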
Use cases…
Log data
Product catalog
Metadata / asset management
CMS
Prototyping
As the main database
Document Store
Some Javascript (Meteor) code…
Use cases…
Time series analytics
Huge # writes
As the main database
(for big data storage!)
Wide Column
Some CQL + C# code…
CQL vs. Internal structure (Cassandra CLI)
cqlsh:test> SELECT * FROM tweets;

 user         | time                     | lat    | long    | tweet
--------------+--------------------------+--------+---------+---------------------
 softwaredoug | 2013-07-13 08:21:54-0400 | 38.162 | -78.549 | Having chest pain.
 softwaredoug | 2013-07-21 12:15:27-0400 | 38.093 | -78.573 | Speedo self shot.
 jnbrymn      | 2013-06-29 20:53:15-0400 | 38.092 | -78.453 | I like programming.
 jnbrymn      | 2013-07-14 22:55:45-0400 | 38.073 | -78.659 | Who likes cats?
 jnbrymn      | 2013-07-24 06:23:54-0400 | 38.073 | -78.647 | My coffee is cold.
[default@test] list tweets;
-------------------
RowKey: softwaredoug
=> (column=2013-07-13 08:21:54-0400:, value=, timestamp=1374673155373000)
=> (column=2013-07-13 08:21:54-0400:lat, value=4218a5e3, timestamp=1374673155373000)
=> (column=2013-07-13 08:21:54-0400:long, value=c29d1917, timestamp=1374673155373000)
=> (column=2013-07-13 08:21:54-0400:tweet, value=486176696e67206368657374207061696e2e, timestamp=1374673155373000)
=> (column=2013-07-21 12:15:27-0400:, value=, timestamp=1374673155407000)
=> (column=2013-07-21 12:15:27-0400:lat, value=42185f3b, timestamp=1374673155407000)
=> (column=2013-07-21 12:15:27-0400:long, value=c29d2560, timestamp=1374673155407000)
=> (column=2013-07-21 12:15:27-0400:tweet, value=53706565646f2073656c662073686f742e, timestamp=1374673155407000)
-------------------
RowKey: jnbrymn
=> (column=2013-06-29 20:53:15-0400:, value=, timestamp=1374673155419000)
=> (column=2013-06-29 20:53:15-0400:lat, value=42185e35, timestamp=1374673155419000)
=> (column=2013-06-29 20:53:15-0400:long, value=c29ce7f0, timestamp=1374673155419000)
=> (column=2013-06-29 20:53:15-0400:tweet, value=49206c696b652070726f6772616d6d696e672e, timestamp=1374673155419000)
=> (column=2013-07-14 22:55:45-0400:, value=, timestamp=1374673155434000)
=> (column=2013-07-14 22:55:45-0400:lat, value=42184ac1, timestamp=1374673155434000)
=> (column=2013-07-14 22:55:45-0400:long, value=c29d5168, timestamp=1374673155434000)
=> (column=2013-07-14 22:55:45-0400:tweet, value=57686f206c696b657320636174733f, timestamp=1374673155434000)
=> (column=2013-07-24 06:23:54-0400:, value=, timestamp=1374673155485000)
user – partition key
time – clustering key
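The two listings above show the same data twice. CQL presents one row per tweet, while the CLI shows one wide row per partition key, with the clustering key folded into each column name and the values stored as raw bytes. The following Python sketch regroups and decodes the first cell group from the CLI listing. The table schema is not shown on the slide, so treating `lat`/`long` as 4-byte big-endian floats and `tweet` as UTF-8 text is an assumption, and the helper names are hypothetical.

```python
import struct

# Cells copied from the first partition of the CLI listing:
# (clustering key, column name) -> hex-encoded value.
RAW_CELLS = {
    ("2013-07-13 08:21:54-0400", "lat"): "4218a5e3",
    ("2013-07-13 08:21:54-0400", "long"): "c29d1917",
    ("2013-07-13 08:21:54-0400", "tweet"): "486176696e67206368657374207061696e2e",
}


def decode_float(hex_value: str) -> float:
    """Decode a 4-byte big-endian IEEE 754 float (assumed column type)."""
    return struct.unpack(">f", bytes.fromhex(hex_value))[0]


def decode_text(hex_value: str) -> str:
    """Decode UTF-8 text (assumed column type)."""
    return bytes.fromhex(hex_value).decode("utf-8")


def to_cql_rows(partition_key: str, cells: dict) -> list:
    """Regroup wide-row cells into one dict per clustering key, as cqlsh shows them."""
    rows = {}
    for (time, column), hex_value in cells.items():
        row = rows.setdefault(time, {"user": partition_key, "time": time})
        decoder = decode_text if column == "tweet" else decode_float
        row[column] = decoder(hex_value)
    # Cells within a partition are ordered by clustering key.
    return [rows[time] for time in sorted(rows)]


if __name__ == "__main__":
    for row in to_cql_rows("softwaredoug", RAW_CELLS):
        print(row["user"], row["time"], round(row["lat"], 3),
              round(row["long"], 3), row["tweet"])
```

Under those assumptions the decoded values line up with the first row of the `SELECT` output, which suggests the CLI view is simply the storage-level form of the same table.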
Use cases…
General data management
Network and IT operations
Recommendation engines
Fraud detection
Social networks
Graph DBs
Just a few slides remaining…
Some C# code…
Some C# code… log4net + ElasticSearch + Kibana
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "LogEvent": {
      "properties": {
        "timeStamp": {
          "type": "date",
          "format": "dateOptionalTime"
        },
        "message": {
          "type": "string"
        },
        "messageObject": {
          "type": "object"
        },
        "exception": {
          "type": "object"
        },
        ….
Two general-purpose ElasticSearch libraries for .NET:
• NEST – high level
• Elasticsearch.Net – low level
C# + InfluxDB + Grafana + … IoT?
InfluxDB + Grafana <> ElasticSearch + Kibana
Time series (metrics) <> Structured data, e.g. logs
CQRS
https://msdn.microsoft.com/en-us/library/jj591559.aspx
CQRS…
WITH an ORM WITH Event Store
https://msdn.microsoft.com/en-us/library/jj591559.aspx
Too good to be true…?
http://orientdb.com/why-orientdb/
The Beast
• SQL and NoSQL (JSON support)
• In-Memory tables
• Row level security
• Always Encrypted
• Query Store
• PolyBase → Hadoop / Azure blob storage
Polyglot persistence
Any decent sized enterprise will have a variety of
different data storage technologies for different
kinds of data
before…
https://engineering.linkedin.com/architecture/brief-history-scaling-linkedin
after…
https://engineering.linkedin.com/architecture/brief-history-scaling-linkedin
Some stats
(from DB-Engines.com)
Key Takeaways
Always think about the schema
(even with schemaless DBs)
Best DB? “It depends”
• Prototyping?
• Domain?
• How the data is going to be used?
Most of us don’t work with “big data” but “small or medium”
DOTNETMÁLAGA
MálagaMakers
Docker images used
spotify/cassandra
balsamiq/docker-elasticsearch
balsamiq/docker-kibana
tutum/influxdb
neo4j/neo4j
wkruse/eventstore
redis
Resources
Different DB images: https://www.thoughtworks.com/insights/blog/nosql-databases-overview
Polyglot persistence images: http://www.slideshare.net/mongodb/webinar-mongodb-and-polyglot-persistence-architecture
DATABASE NAME    AVAILABLE FOR WINDOWS?
Redis            Yes (C)
MongoDB          Yes (C++)
Cassandra        Yes (Java)
Neo4j            Yes (Java)
ElasticSearch    Yes (Java)
InfluxDB         Yes (Go)
EventStore       Yes
OrientDB         Yes (Java)
SQL Server       Yes (C++)

Data stores: beyond relational databases


Editor's notes

  1. • Welcome
     • About me & Sequel Business Solutions
     • Thanks to MalagaMakers & dotnetMalaga
  2. • We only have 1 hour, and we have a lot to talk about
     • Start with a brief review of several key concepts we are going to work with:
       o Difference between the “trending” NoSQL movement and old-school SQL
       o Review of 9 different databases (this talk could have been called “9 databases in 1 hour”, but I didn’t know how many I was going to review when I set up the meeting on meetup.com)
       o Docker
     • Briefly talk about polyglot persistence
     • Statistics about databases
     • Caveats:
       o Not exhaustive
       o Tools / info on how to choose
       o Don’t ask me “what is the difference between X and Y”
  3. • How many developers are in this room?
     • What is NoSQL?
     • A few ambiguous definitions:
       o Not using the relational model
       o Running well on clusters
       o Mostly open-source
       o Built for the 21st century web estates
       o Schema-less
     • We can always find an example of a NoSQL database that violates these statements
  4. • I have tried to summarise what NoSQL is in this slide by copying definitions from the internet
     • Not sure if it is clear yet…
     • Notice there is even a new generation of databases called NewSQL
  5. • Databases can run on your local PC or cluster, or in the cloud as a service (SaaS). An example of a cloud database is Azure DocumentDB, where you can run JavaScript directly inside the database engine. You can scale storage and throughput linearly with cost via combinable units as your application grows, or tune consistency via levels (strong, session, eventual) to suit application scenarios, all from the Azure web management console. – SLIDER
     • In some circumstances you might prefer to keep data in memory, with no need to save it to disk. An in-memory database can be restored from disk on startup. Microsoft SQL Server lets you keep some tables in memory, which makes them extremely fast. – CONTENTION
     • OLTP is online transaction processing. This is what we usually do when users read from and write to our database through the application. OLAP is online analytical processing: multidimensional analysis of business data, complex calculations, etc.
     • Some “data stores” cannot be considered databases but help with the management of data. A couple of examples: ZooKeeper keeps data in distributed environments, usually for synchronization; Kafka is a pub/sub messaging system that works like a transaction log. These two serve as a base for other data storage systems.
     • CAP is a classification in which consistency, availability and partition tolerance are the 3 corners of a triangle and we can only have 2. Some authors say these concepts are too strict and no longer relevant.
  6. • I’ve just put this here to show you how Microsoft scores in the Gartner Magic Quadrant, although this probably means nothing to you.
  7. • These are the databases we are going to cover. How many of them sound familiar?
     • I will try to do a very quick demo of some aspect of each database so you can get at least “a feeling” for what these DBs are about. I was going to use RavenDB as the example, but then this presentation would be too “Microsoft”.
  8. • Key-value databases index everything by a particular key, so in theory you cannot have 2 entries with the same key.
     • Redis in particular is used in a lot of big companies, and is especially useful for speeding up legacy applications, for example.
     • Redis is called a data structure server, as the “value” part can be a list, map, string, binary, etc.
     • Redis is fast
  9. Redis is called the data structure server.

     Key-value stores are the simplest NoSQL data stores to use from an API perspective. The client can get the value for a key, put a value for a key, or delete a key from the data store. The value is a blob that the data store just stores, without caring or knowing what's inside; it's the responsibility of the application to understand what was stored. Since key-value stores always use primary-key access, they generally have great performance and can be easily scaled.

     Some of the popular key-value databases are Riak, Redis (often referred to as a data structure server), Memcached and its flavors, Berkeley DB, HamsterDB (especially suited for embedded use), Amazon DynamoDB (not open-source), Project Voldemort and Couchbase.

     Not all key-value databases are the same; there are major differences between these products. For example, Memcached data is not persistent while Riak data is, and such features matter when implementing certain solutions. Let's say we need to cache user preferences. Implementing the cache in Memcached means that when the node goes down, all the data is lost and needs to be refreshed from the source system. If we store the same data in Riak, we may not need to worry about losing data, but we must also consider how to update stale data. It's important not only to decide on a key-value database based on your requirements, but also to choose which key-value database.
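The get / put / delete API described above is small enough to sketch in a few lines of Python. This is a toy in-memory model, not any product's client API: the class name, the `ttl_seconds` parameter and the injectable clock are all assumptions made for illustration. The time-to-live models the cache use case, where stale entries should expire rather than live forever.

```python
import time


class KeyValueStore:
    """Toy in-memory key-value store: opaque values, optional time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expiry time or None)
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def put(self, key, value, ttl_seconds=None):
        """Store a value; the store never inspects what is inside it."""
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        """Return the value for a key, or default if missing or expired."""
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return default
        return value

    def delete(self, key):
        """Remove a key; return True if it was present."""
        return self._data.pop(key, None) is not None


if __name__ == "__main__":
    store = KeyValueStore()
    store.put("user:42:prefs", b'{"theme": "dark"}', ttl_seconds=60)
    print(store.get("user:42:prefs"))
```

Because everything lives in a Python dict, this sketch loses its data when the process stops, which is the Memcached-style trade-off the note describes. A persistent store would additionally write each change to disk.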
  10. DEMO – Redis PUB SUB · Redis Desktop · Run 2 instances · This is a bit of code in C#: store a string, a TTL, JSON. You can also store lists, sets, etc. and do operations with them.
  11. · This was one of the first well-known NoSQL databases · Most used document database
  12. · The documents have an ID · Documents can contain other documents; they are similar to each other but do not have to be exactly the same. · Documents can contain references to other documents (but there are no joins). Documents are the main concept in document databases. The database stores and retrieves documents, which can be XML, JSON, BSON, and so on. These documents are self-describing, hierarchical tree data structures which can consist of maps, collections, and scalar values. The documents stored are similar to each other but do not have to be exactly the same. Document databases store documents in the value part of the key-value store; think about document databases as key-value stores where the value is examinable. Document databases such as MongoDB provide a rich query language and constructs such as databases, indexes, etc., allowing for an easier transition from relational databases. Some of the popular document databases we have seen are MongoDB, CouchDB, Terrastore, OrientDB, RavenDB, and of course the well-known and often reviled Lotus Notes, which uses document storage.
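The phrase "key-value stores where the value is examinable" can be sketched in a few lines of Python. This is an assumption-laden toy, not MongoDB's query language: the sample documents and the helper names `get_path` and `find` are invented for illustration.

```python
# Two documents in the same collection: similar, but not identical in shape.
documents = {
    "order:1": {
        "customer": {"name": "Ana", "city": "Málaga"},
        "items": [{"sku": "A1", "qty": 2}],
    },
    "order:2": {
        "customer": {"name": "Luis"},  # no city here
        "items": [],
        "coupon": "WELCOME",           # extra field only this document has
    },
}


def get_path(doc, path):
    """Follow a dotted path such as 'customer.city'; return None if any part is missing."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, path, expected):
    """Return the IDs of the documents whose field at `path` equals `expected`."""
    return [doc_id for doc_id, doc in collection.items()
            if get_path(doc, path) == expected]
```

Here `find(documents, "customer.city", "Málaga")` looks inside each value, something a plain key-value store would not do. A real document database would use indexes instead of scanning every document.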
  13. DEMO · Meteor · MongoDB visualizer · For this example I am going to use Meteor, a framework for creating web applications based on Node.js and MongoDB. The whole thing is fully integrated, and some code is even shared between client and server. Meteor is reactive at its foundations: changes made in one session are immediately reflected in other sessions.
  14. · Wide column databases are those that can handle millions of columns without any trouble. · Cassandra in particular is known for its high availability via clustering and for its read and write performance. · It is being used by monsters like eBay, Spotify and Netflix · Netflix – 50 clusters, 750 nodes, in AWS. Nearly all film metadata is there, plus user ratings and recommendations · Spotify – Playlist storage, like a version control system, more than 1 billion playlists, > 40k requests per second, concurrent changes
  15. 1) Why should I choose C*?
  a. Linear scalability: throughput scales "almost" linearly with the number of nodes.
  b. Almost unbounded extensibility: there is no limit, or at least a huge limit, on the number of nodes you can have in a cluster.
  c. Operational simplicity due to the masterless architecture. This feature, although quite transparent for developers, is a key selling point. Having suffered when manually installing a Hadoop cluster, I happen to love the deployment simplicity of C*: only one process per node, no moving parts.
  d. High availability. C* clearly trades consistency for availability, so you can expect something like 99.99% uptime. A strong selling point for critical businesses that need to be up all the time.
  e. Support for multiple data centers out of the box. Again, on the operational side, it's a great feature if you plan a worldwide deployment.
  That's all I can see for now.
  2) Why shouldn't I choose C*?
  a. Need for strong consistency most of the time. Although you can perform all requests with consistency level ALL, it's clearly not the best use of C*. You'll suffer from higher latency and reduced availability. Even the new "lightweight transaction" feature is not meant to be used on a large scale.
  b. Very complicated and changing queries. Denormalizing is great when you know ahead of time exactly how you'll query your data. Once done, any new way of querying will require new coding and new tables to support it.
  c. Ridiculously small data load. I've seen people in prod using C* for only 200 GB because they want to be trendy and use bleeding-edge technologies. They'd be better off using a classical RDBMS solution that fits their load perfectly.
  16. The main principle in designing a table is not the relationship of the table to other tables, as it is in relational database modeling. Data in Cassandra is often arranged as one query per table, and data is repeated amongst many tables · CQL exposes a Cassandra DB in a very similar way to SQL (but there are no joins) · Sets, lists, maps – we can easily store denormalised data in a row · Speed up reads by writing in several places – in the same way an index is automatically maintained by the database, we are responsible for keeping all the column families (tables) in sync
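The "one query per table, and we keep the tables in sync ourselves" idea can be sketched in plain Python. This does not use Cassandra or CQL: the class `PlaylistStore` and its two dictionaries stand in for two hypothetical tables, each shaped for exactly one query.

```python
from collections import defaultdict


class PlaylistStore:
    """Toy model of query-driven, denormalised tables (not a Cassandra driver)."""

    def __init__(self):
        # "Table" 1 answers: which playlists does this user have?
        self.playlists_by_user = defaultdict(list)
        # "Table" 2 answers: which playlists exist for this genre?
        self.playlists_by_genre = defaultdict(list)

    def add_playlist(self, user, genre, name):
        """Write the same row to every table. Keeping them in sync is the application's job."""
        row = {"user": user, "genre": genre, "name": name}
        self.playlists_by_user[user].append(row)
        self.playlists_by_genre[genre].append(row)

    def for_user(self, user):
        """Read path for query 1: a single lookup, no join."""
        return [row["name"] for row in self.playlists_by_user.get(user, [])]

    def for_genre(self, genre):
        """Read path for query 2: a single lookup, no join."""
        return [row["name"] for row in self.playlists_by_genre.get(genre, [])]
```

Each read is a single lookup because the write already did the work twice. A third kind of query (say, by creation date) would need a third table and a change to `add_playlist`, which is the cost mentioned in the notes above.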
  17. ·         Just wanted to show you how the rows in CQL are actually stored as columns in Cassandra ·         Partition key + Clustering key
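A rough mental model of "partition key + clustering key", sketched in Python under simplifying assumptions: the partition key picks one wide row, and inside it the entries are kept sorted by clustering key. The class `WideRowTable` is hypothetical and ignores everything about real storage (nodes, replication, on-disk format).

```python
import bisect
from collections import defaultdict


class WideRowTable:
    """Toy model: partition key -> entries kept sorted by clustering key."""

    def __init__(self):
        self._partitions = defaultdict(list)  # partition key -> [(clustering key, columns)]

    def insert(self, partition_key, clustering_key, columns):
        """Insert keeping clustering order; an existing clustering key is overwritten (upsert)."""
        cells = self._partitions[partition_key]
        keys = [key for key, _ in cells]
        index = bisect.bisect_left(keys, clustering_key)
        if index < len(cells) and cells[index][0] == clustering_key:
            cells[index] = (clustering_key, columns)
        else:
            cells.insert(index, (clustering_key, columns))

    def read_partition(self, partition_key):
        """Return every entry of one partition, already ordered by clustering key."""
        return list(self._partitions.get(partition_key, []))
```

With a sensor ID as partition key and a timestamp as clustering key, all readings of one sensor sit together and come back in time order without any sorting at read time.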
  18. · Graph databases are very trendy lately, and Neo4j is one of the most famous ones. · They are used by companies like Meetic and InfoJobs (to store relationships)
  19. ·         We have nodes with properties and then we have the relationships between nodes. ·         Use cases
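Nodes with properties plus typed relationships can be sketched with two plain Python structures. This is not Neo4j or Cypher: the sample data, the relationship types `KNOWS` and `WORKS_AT`, and the function names are all invented for illustration.

```python
# Nodes: an ID plus a bag of properties.
nodes = {
    "ana": {"label": "Person", "city": "Málaga"},
    "luis": {"label": "Person", "city": "Sevilla"},
    "eva": {"label": "Person", "city": "Madrid"},
    "acme": {"label": "Company"},
}

# Relationships: (start node, type, end node).
relationships = [
    ("ana", "KNOWS", "luis"),
    ("luis", "KNOWS", "eva"),
    ("ana", "WORKS_AT", "acme"),
]


def neighbours(node_id, rel_type):
    """Follow outgoing relationships of one type from a node."""
    return [end for start, kind, end in relationships
            if start == node_id and kind == rel_type]


def friends_of_friends(node_id):
    """People two KNOWS hops away, excluding direct friends and the person themselves."""
    direct = set(neighbours(node_id, "KNOWS"))
    second = {fof for friend in direct for fof in neighbours(friend, "KNOWS")}
    return sorted(second - direct - {node_id})
```

A "friends of friends" question like this is the typical use case: in a relational database it needs a self-join per hop, while a graph database just walks the relationships.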
  20. DEMO
  21. · This is a very specific database: even though it can be used as the single database in an application, its main objective is to perform searches, and that is how it is used in those two companies. We usually feed Elasticsearch from another data source, for example by setting up replication between CouchDB (a document store) and Elasticsearch.
  22. · I am going to show you another example: the analysis of logs for an application. There are commercial cloud solutions like Raygun.io where you can log and analyse anything, but you can get something similar in house in .NET with the Elasticsearch logger for log4net (similar to log4j). · We can then visualise the results with Kibana, which is a generic data visualizer for Elasticsearch.
  23. · Time series databases are used by companies like SoundCloud or DigitalOcean (hosting provider). · They store a series of data entries that change with time, for example, temperature readings from sensors. · Particularly good at answering queries based on time windows, etc. · They can be easily replaced by a database like PostgreSQL if the size of your data is not HUGE. There is one built on top of HBase called OpenTSDB.
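A "query based on a time window" can be illustrated with a small Python sketch. It is not InfluxDB's query language; the function name `window_mean` and the half-open window `[start, end)` are choices made here for illustration, and the points are assumed to be already sorted by timestamp.

```python
from bisect import bisect_left
from statistics import mean


def window_mean(points, start, end):
    """Average the values whose timestamp falls in [start, end).

    `points` is a list of (timestamp, value) pairs sorted by timestamp.
    Returns None when the window contains no points.
    """
    timestamps = [timestamp for timestamp, _ in points]
    # Binary search finds both edges of the window without scanning every point.
    first = bisect_left(timestamps, start)
    last = bisect_left(timestamps, end)
    values = [value for _, value in points[first:last]]
    return mean(values) if values else None
```

For temperature readings taken every 60 seconds, `window_mean(readings, 60, 180)` averages only the readings at 60 and 120. Keeping data ordered by time is what makes this kind of query cheap, and a time series database is built around that ordering.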
  24. ·         In the same way Kibana is there for ElasticSearch, for a few time series databases (not just InfluxDB) we have Grafana.
  25. ·         This is another very specific database as it cannot be used as the main database and is not a general purpose database. ·         Its mission is to store events in an event sourcing application. ·         If we have a bank account there are 2 ways to store information about that bank account: o   Store the amount of money you have o   Store the movements that lead to that final scenario – This is what these DBs are for (DEPOSIT / WITHDRAWAL)
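The bank account example can be sketched in Python as follows. This is not the Event Store API: the `Event` class and the `balance` function are hypothetical, and amounts are kept in cents as an assumption to avoid floating point rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One immutable movement on the account."""
    kind: str    # "DEPOSIT" or "WITHDRAWAL"
    amount: int  # in cents (an assumption of this sketch)


def balance(events):
    """Rebuild the current balance by replaying every movement in order."""
    total = 0
    for event in events:
        if event.kind == "DEPOSIT":
            total += event.amount
        elif event.kind == "WITHDRAWAL":
            total -= event.amount
        else:
            raise ValueError(f"unknown event kind: {event.kind}")
    return total


# The stored data is the list of movements, not the final amount.
movements = [
    Event("DEPOSIT", 10_000),
    Event("WITHDRAWAL", 3_000),
    Event("DEPOSIT", 500),
]
```

Replaying all of `movements` gives the current balance, and replaying only a prefix of the list gives the balance at an earlier point in time, which is information the "store the amount" approach throws away.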
  26. Event sourcing is typically used with the CQRS pattern, in which you separate reads and writes into two different systems / objects / repositories.
  27. CQRS can be done with or without event sourcing. An event sourcing database has no DELETE action: events are only ever appended.
  28. · We have seen a few different DBs, each one strong in a particular area, but… what is stopping DB creators from mixing characteristics from different databases? · For example, OrientDB is a graph database AND a document database.
  29. · On paper everything looks great, but I have read it is a bit buggy. · I don’t have any demos (sorry)
  30. ·         I didn’t want to leave this without quickly going through the existing relational databases
  31. https://redmondmag.com/articles/2015/06/03/features-to-sql-server-2016.aspx
  32. · Who is the guy at the top left? Martin Fowler · We have seen a few databases in the last hour, but if our system is big enough, what we will usually end up with is not just one database but a combination of them, so that we get the most out of each.
  33. ·         LinkedIn is an example where we can see graph databases, relational databases, etc. ·         Need to be careful as the system might become messy
  34. · There are things like Kafka to help organise the mess; here it is used as the main pipe to move data that will end up in a data store.
  35. · These are some graphs taken from a web site that show o How popular each database (and database type) is o How “trendy” each database type is. A triplestore or RDF store is a purpose-built database for the storage and retrieval of triples through semantic queries. A triple is a data entity composed of subject-predicate-object, like "Bob is 35" or "Bob knows Fred". Much like a relational database, one stores information in a triplestore and retrieves it via a query language. Unlike a relational database, a triplestore is optimized for the storage and retrieval of triples. In addition to queries, triples can usually be imported/exported using Resource Description Framework (RDF) and other formats.
  36. Relational – think about schema. NoSQL – think about schema! · What is going to happen in a few years when a new developer joins the company and has to maintain the application? There is no best database (unless we have created it ourselves from scratch :)), and it all depends on what we want to do o Domain (what the business is about) o How we want to extract the data o Do we know everything? For example, Event Stores are very good in terms of “new ways to explore the data”, as we are able to rebuild different things from the event series. Lastly, remember that hardware is much more powerful than we think; most of us are probably working with small or medium data rather than “big data”.