Data stores: beyond relational databases

DOTNETMÁLAGA // MalagaMakers // 5th Nov 2015

• Relational vs. NoSQL
• Definitions and examples
• Other database classifications
• 9 Databases in 40 minutes!
• Polyglot Persistence
• Some statistics
• Summary

SQL
Commercial example: Oracle | Open-source example: MySQL (Oracle)
NoSQL
“Mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.”
“Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable.”
NoSQL systems are also sometimes called "Not only SQL".
SQL? ACID? Relations? Distributed?
Commercial example: DynamoDB | Open-source example: MongoDB
NewSQL
Modern relational database management systems that seek to provide the same scalable performance as NoSQL systems for online transaction processing (OLTP) read-write workloads while still maintaining the ACID guarantees of a traditional database system.
Open-source example: VoltDB
NoSQL vs. SQL vs. NewSQL
Wikipedia
No-sql.org

More Database classifications
On premises vs. Cloud “As a service” (Azure DocumentDB)
Memory / Disk vs. Only in memory (OrigoDB, Redis, SQL Server)
OLTP vs. OLAP
Databases vs. Not a database but a data store (Zookeeper, Kafka)
CAP classifications

In action…
Key-value stores (Redis)
Document stores (RavenDB …ok, MongoDB)
Wide column stores (Cassandra)
Graph DBMS (Neo4j)
Search engines (Elasticsearch)
Time Series DBMS (InfluxDB)
Event Stores (Event Store)
Multi-model (OrientDB)
Relational DBMS (MS SQL Server 2016)

Use cases…
Show latest items
Count items
Leaderboards
Unique items
Pub/Sub
Queues
Cache
As the main database
Key Value
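
Several of these use cases map onto simple data-structure operations. The sketch below models three of them (latest items, unique items, leaderboards) in plain Python, with no Redis server involved. The class and method names are hypothetical; the Redis commands named in the comments are only pointers to the operations such patterns are commonly built on.

```python
from collections import deque


class TinyKeyValueModel:
    """In-memory stand-in for three key-value patterns (not a Redis client)."""

    def __init__(self, latest_size=3):
        self.latest = deque(maxlen=latest_size)  # capped list, like LPUSH + LTRIM
        self.unique = set()                      # distinct members, like SADD
        self.scores = {}                         # member -> score, like a sorted set

    def record_visit(self, user, page):
        self.latest.appendleft(page)   # newest first; the oldest entry drops off
        self.unique.add(user)          # duplicates are ignored
        self.scores[user] = self.scores.get(user, 0) + 1  # like ZINCRBY

    def latest_items(self):
        return list(self.latest)

    def unique_count(self):
        return len(self.unique)        # like SCARD

    def leaderboard(self, top=3):
        # Highest score first; breaking ties by name is this sketch's own choice.
        ranked = sorted(self.scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:top]


store = TinyKeyValueModel(latest_size=3)
for user, page in [("ana", "/a"), ("bob", "/b"), ("ana", "/c"), ("ana", "/d")]:
    store.record_visit(user, page)
```

With a real key-value store the same three structures would live on the server, so several application instances could share them.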

Use cases…
Log data
Product catalog
Metadata / asset management
CMS
Prototyping
Document Store

Some JavaScript (Meteor) code…

Use cases…
Time series analytics
Huge number of writes
(for big data storage!)
Wide Column

CQL vs. Internal structure (Cassandra CLI)
cqlsh:test> SELECT * FROM tweets;
 user         | time                     | lat    | long    | tweet
--------------+--------------------------+--------+---------+---------------------
 softwaredoug | 2013-07-13 08:21:54-0400 | 38.162 | -78.549 | Having chest pain.
 softwaredoug | 2013-07-21 12:15:27-0400 | 38.093 | -78.573 | Speedo self shot.
 jnbrymn      | 2013-06-29 20:53:15-0400 | 38.092 | -78.453 | I like programming.
 jnbrymn      | 2013-07-14 22:55:45-0400 | 38.073 | -78.659 | Who likes cats?
 jnbrymn      | 2013-07-24 06:23:54-0400 | 38.073 | -78.647 | My coffee is cold.
[default@test] list tweets;
-------------------
RowKey: softwaredoug
=> (column=2013-07-13 08:21:54-0400:, value=,
timestamp=1374673155373000)
=> (column=2013-07-13 08:21:54-0400:lat, value=4218a5e3,
timestamp=1374673155373000)
=> (column=2013-07-13 08:21:54-0400:long, value=c29d1917,
timestamp=1374673155373000)
=> (column=2013-07-13 08:21:54-0400:tweet,
value=486176696e67206368657374207061696e2e, timestamp=1374673155373000)
=> (column=2013-07-21 12:15:27-0400:, value=,
timestamp=1374673155407000)
=> (column=2013-07-21 12:15:27-0400:lat, value=42185f3b,
timestamp=1374673155407000)
timestamp=1374673155407000)
=> (column=2013-07-21 12:15:27-0400:tweet,
value=53706565646f2073656c662073686f742e, timestamp=1374673155407000)
-------------------
RowKey: jnbrymn
=> (column=2013-06-29 20:53:15-0400:, value=,
timestamp=1374673155419000)
=> (column=2013-06-29 20:53:15-0400:lat, value=42185e35,
timestamp=1374673155419000)
=> (column=2013-06-29 20:53:15-0400:long, value=c29ce7f0,
timestamp=1374673155419000)
=> (column=2013-06-29 20:53:15-0400:tweet,
value=49206c696b652070726f6772616d6d696e672e,
timestamp=1374673155419000)
=> (column=2013-07-14 22:55:45-0400:, value=,
timestamp=1374673155434000)
=> (column=2013-07-14 22:55:45-0400:lat, value=42184ac1,
timestamp=1374673155434000)
timestamp=1374673155434000)
=> (column=2013-07-14 22:55:45-0400:tweet,
value=57686f206c696b657320636174733f, timestamp=1374673155434000)
=> (column=2013-07-24 06:23:54-0400:, value=,
timestamp=1374673155485000)
user: partition key | time: clustering key
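
The relationship between the two listings above can be sketched in a few lines of Python: rows sharing a partition key collapse into one wide row, and every remaining column becomes a cell whose name is prefixed by the clustering key. This is only a conceptual model under simplifying assumptions: the function name is hypothetical, values are kept as plain Python values instead of hex-encoded bytes, and per-cell write timestamps are omitted.

```python
def to_internal_cells(rows, partition_key, clustering_key):
    """Group CQL-style rows by partition key and flatten each row into cells.

    Each cell name is "<clustering value>:<column name>", mirroring entries
    such as "column=2013-07-13 08:21:54-0400:lat" in the CLI listing.
    """
    partitions = {}
    for row in rows:
        cells = partitions.setdefault(row[partition_key], {})
        prefix = row[clustering_key]
        cells[f"{prefix}:"] = ""  # row marker cell with an empty value
        for name, value in row.items():
            if name not in (partition_key, clustering_key):
                cells[f"{prefix}:{name}"] = value
    # Keep cells sorted by name; plain string order stands in for clustering order.
    return {key: dict(sorted(cells.items())) for key, cells in partitions.items()}


tweets = [
    {"user": "softwaredoug", "time": "2013-07-13 08:21:54-0400",
     "lat": 38.162, "long": -78.549, "tweet": "Having chest pain."},
    {"user": "jnbrymn", "time": "2013-07-14 22:55:45-0400",
     "lat": 38.073, "long": -78.659, "tweet": "Who likes cats?"},
    {"user": "jnbrymn", "time": "2013-06-29 20:53:15-0400",
     "lat": 38.092, "long": -78.453, "tweet": "I like programming."},
]
internal = to_internal_cells(tweets, partition_key="user", clustering_key="time")
```

Reading all tweets of one user then touches a single partition whose cells are already ordered by time, which is why this layout suits time series workloads.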

Use cases…
General data management
Network and IT operations
Recommendation engines
Fraud detection
Social networks
Graph DBs
Just a few slides remaining…

Some C# code… log4net + Elasticsearch + Kibana
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "LogEvent": {
      "properties": {
        "timeStamp": {
          "type": "date",
          "format": "dateOptionalTime"
        },
        "message": {
          "type": "string"
        },
        "messageObject": {
          "type": "object"
        },
        "exception": {
          "type": "object"
        },
….
Two general-purpose Elasticsearch libraries for .NET:
• NEST: high level
• Elasticsearch.Net: low level

C# + InfluxDB + Grafana + … IoT?
InfluxDB + Grafana vs. Elasticsearch + Kibana
Time series (metrics) vs. structured data, e.g. logs
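
To make the "metrics" side concrete: a time series point is usually a measurement name, a few tags, one or more numeric fields and a timestamp, whereas a log entry is a richer structured document. The sketch below formats such a point in the style of InfluxDB's line protocol. It is a simplified illustration, not a client library: the function name, the sample measurement and the tag names are all made up, and escaping of spaces, commas and quotes is deliberately left out.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as 'measurement,tag=v field=v timestamp'.

    Simplified: no escaping of spaces, commas or quotes in names and values.
    """
    def format_field(value):
        if isinstance(value, bool):   # check bool first: bool is a subclass of int
            return "true" if value else "false"
        if isinstance(value, int):
            return f"{value}i"        # integers carry an 'i' suffix
        if isinstance(value, float):
            return repr(value)
        return f'"{value}"'           # strings are double-quoted

    tag_part = "".join(f",{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(f"{key}={format_field(value)}" for key, value in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


# Hypothetical IoT reading: one temperature sample from one sensor.
point = to_line_protocol(
    "temperature",
    {"sensor": "s1", "room": "lab"},
    {"value": 21.5, "ok": True},
    1446724800000000000,
)
```

Tags are meant for values you filter and group by, fields for the measured numbers themselves.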

CQRS
https://msdn.microsoft.com/en-us/library/jj591559.aspx

CQRS…
With an ORM | With Event Store
https://msdn.microsoft.com/en-us/library/jj591559.aspx
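
The "with Event Store" variant keeps an append-only sequence of events on the write side and derives the current state by replaying them on the read side. A minimal in-memory sketch of that idea follows; the bank-account domain, the event names and the classes are hypothetical, and the code does not use the Event Store client API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    kind: str    # "Deposited" or "Withdrawn" (hypothetical event names)
    amount: int


@dataclass
class InMemoryEventStream:
    """Append-only list of events for one aggregate (write side)."""
    events: list = field(default_factory=list)

    def append(self, event):
        self.events.append(event)


def project_balance(events):
    """Read side: fold the events into the current balance."""
    balance = 0
    for event in events:
        if event.kind == "Deposited":
            balance += event.amount
        elif event.kind == "Withdrawn":
            balance -= event.amount
    return balance


stream = InMemoryEventStream()
stream.append(Event("Deposited", 100))
stream.append(Event("Withdrawn", 30))
stream.append(Event("Deposited", 5))
balance = project_balance(stream.events)
```

Because nothing is overwritten, replaying only a prefix of the stream gives the state at an earlier point in time.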

Too good to be true…?
http://orientdb.com/why-orientdb/

The Beast (MS SQL Server 2016)
• SQL and NoSQL (JSON support)
• In-Memory tables
• Row level security
• Always Encrypted
• Query Store
• PolyBase → Hadoop / Azure Blob storage

Polyglot persistence
Any decent sized enterprise will have a variety of different data storage technologies for different kinds of data

before…
https://engineering.linkedin.com/architecture/brief-history-scaling-linkedin

after…
https://engineering.linkedin.com/architecture/brief-history-scaling-linkedin

Some stats
(from DB-Engines.com)

Key Takeaways
Always think about the schema
(even with schemaless DBs)
Best DB? “It depends”
• Prototyping?
• Domain?
• How is the data going to be used?
Most of us don’t work with “big data” but with “small or medium” data

Docker images used
spotify/cassandra
balsamiq/docker-elasticsearch
balsamiq/docker-kibana
tutum/influxdb
neo4j/neo4j
wkruse/eventstore
redis

Resources
Different DB images: https://www.thoughtworks.com/insights/blog/nosql-databases-overview
Polyglot persistence images: http://www.slideshare.net/mongodb/webinar-mongodb-and-polyglot-persistence-architecture
Database name | Available for Windows? (written in)
--------------+------------------------------------
Redis         | Yes (C)
MongoDB       | Yes (C++)
Cassandra     | Yes (Java)
Neo4j         | Yes (Java)
Elasticsearch | Yes (Java)
InfluxDB      | Yes (Go)
EventStore    | Yes
OrientDB      | Yes (Java)
SQL Server    | Yes (C++)
