SlideShare une entreprise Scribd logo
1  sur  21
Télécharger pour lire hors ligne
What database?
- a practical guide to selection from
NoSQL, SQL and Polyglot data stores
Regunath B
twitter.com/RegunathB
github.com/regunathb
Engineering @ CureFit-HealthFace, ex-Flipkart Infra services, Built Aadhaar
State of the Database Landscape - the options
What to look for?
Over-the-hood considerations
Data Manipulation
SQL[1]
KV operations put(k,v)
get(k)
remove(k)
Graph query
g.V().has(‘name','hercules').out('father').out('father').values('name')
Query
Language
Bulk Processing
• Loading
• Data Export &
Transfer
ACID properties
1.0
Atomicity
Consistency
Isolation
Durability
2.0
(For Scaling)
Associative
Commutative
Idempotent
Distributed
Transaction Support
Data Staleness
Ordering
Surviving Crashes
Transaction Support (Limited)
Relaxed Ordering (High- Throughput) - CRDTs [2]
Atleast-Once delivery (Eventually-Consistent)
High-Availability
Property
Impact
Wire-protocol, Standard
interfaces
• Wire Protocol
• Custom protocols over TCP/IP, Http, gRPC
• Support popular database protocols
• Postgres - e.g. CockroachDB
• Memcached - e.g. Couch base
• Standard Interfaces
• JDBC - e.g. Apache Hive, Phoenix for HBase, Vitess
Schema(less) support
Schema is
not Evil
• Why Schema-less
• Sparse-Metric and Entity-Attribute-Value storage needs
• Frequent changes
• Why Schema
• Understanding structure of data
• Referential Data integrity, Quality of data controlled by Data
dictionary
• In-between
• Schema-less but require Column Indexes (e.g. ColumnFamily
model of KV stores)
CAP theorem critique
• “one example of a fundamental trade-off between safety and
liveness in fault-prone systems” [3]
• Too simplistic [4]
• Choice of CA impractical mostly (Single node database), critique
therefore applies to CP or AP.
• CAP-Availability and CAP-Consistency is a spectrum and not binary
• e.g. AP-Reads, AP-Writes, Strong Consistency vs. Eventual
Consistency
• Define application tradeoffs, validate impact on NFRs - Latency,
Throughput
• Good starting point for considering Polyglot persistence
X
Polyglot Persistence - Tiered Data stores
Source: Aadhaar technology white-paper
Polyglot Persistence - CQRS
Source: Microsoft MSDN
Source: Flipkart catalog Write & Read[5]
Polyglot Persistence - pluggable storage,
secondary indices
• Healthcare Graph data
(Conditions, Symptoms) on
Apache Titan
• Mostly Read-only queries - Point
lookups, one-hop traversals
• AP-Read data (Storage engine :
Cassandra)
• Also query by properties of
Vertex/Edge(Secondary indices
in ES)
Source: CureFit Symptoms & Conditions datastore
Others
• Performance benchmarks - Latency, Throughput,
Concurrency - e.g. Graph DBs benchmark [6]
• Operations & Maintenance - e.g. MySQL as
backend data store for Facebook TAO [7], LinkedIn
Espresso [8]
• Support - Paid (single vendor vs. multiple),
Community (size, composition)
• Hosted service - on public clouds as a managed
service
What to look for?
Under-the-hood considerations
Database Type
• Relational
• All field values of a row stored together
• Common storage formats: BTree
• Better suited for OLTP
• Columnar
• All values of a column stored together
• More efficient data compression
• OLAP queries perform better
Source:
https://gerardnico.com/wiki/relation/structure/column_store
Database Type
• Document
• Sub-class of a KV store
• Often hierarchical (DB -> Collection ->
Document)
• Often have challenges in optimising
storage - due to lack of Data
Dictionary (schema-free)
• KV
• Often RAM based
• Durability through replication(sync)
and persistence to disk
• Preference for LSM over in-place
updates when designed for SSD
Source:
https://blog.mlab.com/2014/01/how-big-is-your-mongodb/
Source:
http://www.aerospike.com/technologies/
Data Organisation
• B-Tree
• Better suited for in-place updates
• Log Structured Merged (LSM)
• Better suited for high insert volume
• Better suited for SSD (for reducing write
amplification)
• Achieve high data locality of reference
through good row-key design [9]
Source: http://www.programering.com/a/MTMwAzMwATM.html
Source: http://www.cyanny.com/2014/03/13/hbase-architecture-
analysis-part1-logical-architecture/
Replication, Consensus
• Replication
• Sync vs. Async
• No. of Replicas, Min. Replicas, Journalling, Guaranteed writes
with hinted handoff
• Single master read-write(CP) vs. Replica reads(AP)
• Consensus
• Used in
• Leader election
• Committing transactions/Log replication
• Strength of protocol - Paxos, Raft, Zab etc.
• Jepsen Tests (https://jepsen.io/) - Tests ‘Safety’ of distributed
databases
• e.g. CockroachDB, MongoDB, VoltDB, Solr, Elastic Search etc
Source:https://martin.kleppmann.com
Source: https://raft.github.io/raft.pdf
Operations
• Data export & restore (RPO, RTO) - Disaster Recovery(DR)
• Tools for full export vs incremental snapshots
• Tools for restoring from exports, logs
• Piggy-back on XDC replication support to create continuous/ongoing
backup&restore
• Large scale data migration [10]
• Mean Time to Recovery (MTTR) - Node failure/Minor outages
• e.g. promoting hot-standby to master
• Tools to detect failure, validate data, promote new master/leader
Cost
• Disk-Memory ratio
• Database architecture to support disk storage, size of on-disk data w.r.t RAM
• Compute required
• No. of compute nodes required to keep data on-line
• Power Consumption
• SSD based databases generally more energy efficient than HDD
• Density of storage
• Relevant when storing large data over extended periods of time
• e.g. Aadhaar enrolment raw data, Facebook photos [11]
DB-specific Optimisations to leverage RAM, reduce Disk I/O
• Data block-cache/buffer-pool
• Reduces disk I/O
• Provides lower latency on repeat reads
• Provides potentially lower latency for
reads on high data locality of reference
• Bloom Filters
• Reduces disk I/O and row scanning in
random key lookups
Source: https://sematext.com
Source: Cloudera
References
• [1] - Google Spanner becoming a SQL System
• [2] - CRDTs in Riak
• [3] - Perspectives on the CAP Theorem
• [4] - Martin Kleppmann CP or AP
• [5] - Flipkart Catalog System, Datastore
• [6] - Do We Need Specialised Graph Databases?
• [7] - Facebook TAO social graph data store
• [8] - LinkedIn Espresso
• [9] - Facebook style notifications using HBase
• [10] - Flipkart DC migration
• [11] - Facebook cold storage system

Contenu connexe

Tendances

HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon
 
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems Cloudera, Inc.
 
SQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefullySQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefullyMd Kamaruzzaman
 
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...✔ Eric David Benari, PMP
 
The Holy Grail of Data Analytics
The Holy Grail of Data AnalyticsThe Holy Grail of Data Analytics
The Holy Grail of Data AnalyticsDan Lynn
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...DataStax
 
Amazon RedShift - Ianni Vamvadelis
Amazon RedShift - Ianni VamvadelisAmazon RedShift - Ianni Vamvadelis
Amazon RedShift - Ianni Vamvadelishuguk
 
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Amazon Web Services
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...
Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...
Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...Fwdays
 
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...Amazon Web Services
 
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...Data Con LA
 
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index StructuresHBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index StructuresCloudera, Inc.
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8MongoDB
 
Splice Machine Overview
Splice Machine OverviewSplice Machine Overview
Splice Machine OverviewKunal Gupta
 
Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Zhenxiao Luo
 

Tendances (20)

HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
 
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
 
SQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefullySQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefully
 
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
 
Hadoop and HBase @eBay
Hadoop and HBase @eBayHadoop and HBase @eBay
Hadoop and HBase @eBay
 
The Holy Grail of Data Analytics
The Holy Grail of Data AnalyticsThe Holy Grail of Data Analytics
The Holy Grail of Data Analytics
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
 
Amazon RedShift - Ianni Vamvadelis
Amazon RedShift - Ianni VamvadelisAmazon RedShift - Ianni Vamvadelis
Amazon RedShift - Ianni Vamvadelis
 
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Introduction to Dremio
Introduction to DremioIntroduction to Dremio
Introduction to Dremio
 
Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...
Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...
Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...
 
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...
 
Presto: SQL-on-anything
Presto: SQL-on-anythingPresto: SQL-on-anything
Presto: SQL-on-anything
 
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
 
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index StructuresHBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8
 
Splice Machine Overview
Splice Machine OverviewSplice Machine Overview
Splice Machine Overview
 
Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017
 

En vedette

E commerce data migration in moving systems across data centres
E commerce data migration in moving systems across data centres E commerce data migration in moving systems across data centres
E commerce data migration in moving systems across data centres Regunath B
 
Unique identification authority of india uid
Unique identification authority of india   uidUnique identification authority of india   uid
Unique identification authority of india uidAjit Dadresa
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Regunath B
 
Aesop change data propagation
Aesop change data propagationAesop change data propagation
Aesop change data propagationRegunath B
 
Hadoop at aadhaar
Hadoop at aadhaarHadoop at aadhaar
Hadoop at aadhaarRegunath B
 
Building tiered data stores using aesop to bridge sql and no sql systems
Building tiered data stores using aesop to bridge sql and no sql systemsBuilding tiered data stores using aesop to bridge sql and no sql systems
Building tiered data stores using aesop to bridge sql and no sql systemsRegunath B
 
Building the Flipkart phantom
Building the Flipkart phantomBuilding the Flipkart phantom
Building the Flipkart phantomRegunath B
 
Facebook style notifications using hbase and event streams
Facebook style notifications using hbase and event streamsFacebook style notifications using hbase and event streams
Facebook style notifications using hbase and event streamsRegunath B
 
practical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome thempractical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome themsaipriyadonthula
 
Oss as a competitive advantage
Oss as a competitive advantageOss as a competitive advantage
Oss as a competitive advantageRegunath B
 
Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)Ali Raw
 

En vedette (14)

E commerce data migration in moving systems across data centres
E commerce data migration in moving systems across data centres E commerce data migration in moving systems across data centres
E commerce data migration in moving systems across data centres
 
Unique identification authority of india uid
Unique identification authority of india   uidUnique identification authority of india   uid
Unique identification authority of india uid
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3
 
Uid
UidUid
Uid
 
Aesop change data propagation
Aesop change data propagationAesop change data propagation
Aesop change data propagation
 
Hadoop at aadhaar
Hadoop at aadhaarHadoop at aadhaar
Hadoop at aadhaar
 
Building tiered data stores using aesop to bridge sql and no sql systems
Building tiered data stores using aesop to bridge sql and no sql systemsBuilding tiered data stores using aesop to bridge sql and no sql systems
Building tiered data stores using aesop to bridge sql and no sql systems
 
Building the Flipkart phantom
Building the Flipkart phantomBuilding the Flipkart phantom
Building the Flipkart phantom
 
Facebook style notifications using hbase and event streams
Facebook style notifications using hbase and event streamsFacebook style notifications using hbase and event streams
Facebook style notifications using hbase and event streams
 
Aadhaar
AadhaarAadhaar
Aadhaar
 
Srikanth Nadhamuni
Srikanth NadhamuniSrikanth Nadhamuni
Srikanth Nadhamuni
 
practical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome thempractical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome them
 
Oss as a competitive advantage
Oss as a competitive advantageOss as a competitive advantage
Oss as a competitive advantage
 
Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)
 

Similaire à What database

So You Want to Build a Data Lake?
So You Want to Build a Data Lake?So You Want to Build a Data Lake?
So You Want to Build a Data Lake?David P. Moore
 
StreamHorizon overview
StreamHorizon overviewStreamHorizon overview
StreamHorizon overviewStreamHorizon
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld
 
NoSQL – Data Center Centric Application Enablement
NoSQL – Data Center Centric Application EnablementNoSQL – Data Center Centric Application Enablement
NoSQL – Data Center Centric Application EnablementDATAVERSITY
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarKognitio
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Apache Tajo - An open source big data warehouse
Apache Tajo - An open source big data warehouseApache Tajo - An open source big data warehouse
Apache Tajo - An open source big data warehousehadoopsphere
 
Tableau on Hadoop Meet Up: Advancing from Extracts to Live Connect
Tableau on Hadoop Meet Up: Advancing from Extracts to Live ConnectTableau on Hadoop Meet Up: Advancing from Extracts to Live Connect
Tableau on Hadoop Meet Up: Advancing from Extracts to Live ConnectRemy Rosenbaum
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Kudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast DataKudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast Datamichaelguia
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)Marco Gralike
 
Dissecting Scalable Database Architectures
Dissecting Scalable Database ArchitecturesDissecting Scalable Database Architectures
Dissecting Scalable Database Architectureshypertable
 
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Data Con LA
 
Productionizing Hadoop - New Lessons Learned
Productionizing Hadoop - New Lessons LearnedProductionizing Hadoop - New Lessons Learned
Productionizing Hadoop - New Lessons LearnedCloudera, Inc.
 
Engineering practices in big data storage and processing
Engineering practices in big data storage and processingEngineering practices in big data storage and processing
Engineering practices in big data storage and processingSchubert Zhang
 
Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceSATOSHI TAGOMORI
 

Similaire à What database (20)

So You Want to Build a Data Lake?
So You Want to Build a Data Lake?So You Want to Build a Data Lake?
So You Want to Build a Data Lake?
 
StreamHorizon overview
StreamHorizon overviewStreamHorizon overview
StreamHorizon overview
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right
 
NoSQL – Data Center Centric Application Enablement
NoSQL – Data Center Centric Application EnablementNoSQL – Data Center Centric Application Enablement
NoSQL – Data Center Centric Application Enablement
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Apache Tajo - An open source big data warehouse
Apache Tajo - An open source big data warehouseApache Tajo - An open source big data warehouse
Apache Tajo - An open source big data warehouse
 
Tableau on Hadoop Meet Up: Advancing from Extracts to Live Connect
Tableau on Hadoop Meet Up: Advancing from Extracts to Live ConnectTableau on Hadoop Meet Up: Advancing from Extracts to Live Connect
Tableau on Hadoop Meet Up: Advancing from Extracts to Live Connect
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Redshift overview
Redshift overviewRedshift overview
Redshift overview
 
Kudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast DataKudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast Data
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
An AMIS overview of database 12c
An AMIS overview of database 12cAn AMIS overview of database 12c
An AMIS overview of database 12c
 
An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)
 
Dissecting Scalable Database Architectures
Dissecting Scalable Database ArchitecturesDissecting Scalable Database Architectures
Dissecting Scalable Database Architectures
 
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
 
Cassandra training
Cassandra trainingCassandra training
Cassandra training
 
Productionizing Hadoop - New Lessons Learned
Productionizing Hadoop - New Lessons LearnedProductionizing Hadoop - New Lessons Learned
Productionizing Hadoop - New Lessons Learned
 
Engineering practices in big data storage and processing
Engineering practices in big data storage and processingEngineering practices in big data storage and processing
Engineering practices in big data storage and processing
 
Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data Service
 

Dernier

Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlkumarajju5765
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 

Dernier (20)

Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 

What database

  • 1. What database? - a practical guide to selection from NoSQL, SQL and Polyglot data stores Regunath B twitter.com/RegunathB github.com/regunathb Engineering @ CureFit-HealthFace, ex-Flipkart Infra services, Built Aadhaar
  • 2. State of the Database Landscape - the options
  • 3. What to look for? Over-the-hood considerations
  • 4. Data Manipulation SQL[1] KV operations put(k,v) get(k) remove(k) Graph query g.V().has(‘name','hercules').out('father').out('father').values('name') Query Language Bulk Processing • Loading • Data Export & Transfer
  • 5. ACID properties 1.0 Atomicity Consistency Isolation Durability 2.0 (For Scaling) Associative Commutative Idempotent Distributed Transaction Support Data Staleness Ordering Surviving Crashes Transaction Support (Limited) Relaxed Ordering (High- Throughput) - CRDTs [2] Atleast-Once delivery (Eventually-Consistent) High-Availability Property Impact
  • 6. Wire-protocol, Standard interfaces • Wire Protocol • Custom protocols over TCP/IP, Http, gRPC • Support popular database protocols • Postgres - e.g. CockroachDB • Memcached - e.g. Couch base • Standard Interfaces • JDBC - e.g. Apache Hive, Phoenix for HBase, Vitess
  • 7. Schema(less) support Schema is not Evil • Why Schema-less • Sparse-Metric and Entity-Attribute-Value storage needs • Frequent changes • Why Schema • Understanding structure of data • Referential Data integrity, Quality of data controlled by Data dictionary • In-between • Schema-less but require Column Indexes (e.g. ColumnFamily model of KV stores)
  • 8. CAP theorem critique • “one example of a fundamental trade-off between safety and liveness in fault-prone systems” [3] • Too simplistic [4] • Choice of CA impractical mostly (Single node database), critique therefore applies to CP or AP. • CAP-Availability and CAP-Consistency is a spectrum and not binary • e.g. AP-Reads, AP-Writes, Strong Consistency vs. Eventual Consistency • Define application tradeoffs, validate impact on NFRs - Latency, Throughput • Good starting point for considering Polyglot persistence X
  • 9. Polyglot Persistence - Tiered Data stores Source: Aadhaar technology white-paper
  • 10. Polyglot Persistence - CQRS Source: Microsoft MSDN Source: Flipkart catalog Write & Read[5]
  • 11. Polyglot Persistence - pluggable storage, secondary indices • Healthcare Graph data (Conditions, Symptoms) on Apache Titan • Mostly Read-only queries - Point lookups, one-hop traversals • AP-Read data (Storage engine : Cassandra) • Also query by properties of Vertex/Edge(Secondary indices in ES) Source: CureFit Symptoms & Conditions datastore
  • 12. Others • Performance benchmarks - Latency, Throughput, Concurrency - e.g. Graph DBs benchmark [6] • Operations & Maintenance - e.g. MySQL as backend data store for Facebook TAO [7], LinkedIn Espresso [8] • Support - Paid (single vendor vs. multiple), Community (size, composition) • Hosted service - on public clouds as a managed service
  • 13. What to look for? Under-the-hood considerations
  • 14. Database Type • Relational • All field values of a row stored together • Common storage formats: BTree • Better suited for OLTP • Columnar • All values of a column stored together • More efficient data compression • OLAP queries perform better Source: https://gerardnico.com/wiki/relation/structure/column_store
  • 15. Database Type • Document • Sub-class of a KV store • Often hierarchical (DB -> Collection -> Document) • Often have challenges in optimising storage - due to lack of Data Dictionary (schema-free) • KV • Often RAM based • Durability through replication(sync) and persistence to disk • Preference for LSM over in-place updates when designed for SSD Source: https://blog.mlab.com/2014/01/how-big-is-your-mongodb/ Source: http://www.aerospike.com/technologies/
  • 16. Data Organisation • B-Tree • Better suited for in-place updates • Log Structured Merged (LSM) • Better suited for high insert volume • Better suited for SSD (for reducing write amplification) • Achieve high data locality of reference through good row-key design [9] Source: http://www.programering.com/a/MTMwAzMwATM.html Source: http://www.cyanny.com/2014/03/13/hbase-architecture- analysis-part1-logical-architecture/
  • 17. Replication, Consensus • Replication • Sync vs. Async • No. of Replicas, Min. Replicas, Journalling, Guaranteed writes with hinted handoff • Single master read-write(CP) vs. Replica reads(AP) • Consensus • Used in • Leader election • Committing transactions/Log replication • Strength of protocol - Paxos, Raft, Zab etc. • Jepsen Tests (https://jepsen.io/) - Tests ‘Safety’ of distributed databases • e.g. CockroachDB, MongoDB, VoltDB, Solr, Elastic Search etc Source:https://martin.kleppmann.com Source: https://raft.github.io/raft.pdf
  • 18. Operations • Data export & restore (RPO, RTO) - Disaster Recovery(DR) • Tools for full export vs incremental snapshots • Tools for restoring from exports, logs • Piggy-back on XDC replication support to create continuous/ongoing backup&restore • Large scale data migration [10] • Mean Time to Recovery (MTTR) - Node failure/Minor outages • e.g. promoting hot-standby to master • Tools to detect failure, validate data, promote new master/leader
  • 19. Cost • Disk-Memory ratio • Database architecture to support disk storage, size of on-disk data w.r.t RAM • Compute required • No. of compute nodes required to keep data on-line • Power Consumption • SSD based databases generally more energy efficient than HDD • Density of storage • Relevant when storing large data over extended periods of time • e.g. Aadhaar enrolment raw data, Facebook photos [11]
  • 20. DB-specific Optimisations to leverage RAM, reduce Disk I/O • Data block-cache/buffer-pool • Reduces disk I/O • Provides lower latency on repeat reads • Provides potentially lower latency for reads on high data locality of reference • Bloom Filters • Reduces disk I/O and row scanning in random key lookups Source: https://sematext.com Source: Cloudera
  • 21. References • [1] - Google Spanner becoming a SQL System • [2] - CRDTs in Riak • [3] - Perspectives on the CAP Theorem • [4] - Martin Kleppmann CP or AP • [5] - Flipkart Catalog System, Datastore • [6] - Do We Need Specialised Graph Databases? • [7] - Facebook TAO social graph data store • [8] - LinkedIn Espresso • [9] - Facebook style notifications using HBase • [10] - Flipkart DC migration • [11] - Facebook cold storage system