Vishnu Rao
MySQL Enthusiast
Doodle maker
Senior Data Engineer @ DataSpark
Formerly @ flipkart.com
The comma-separated list...
● Hadoop, HBase, RocksDB
● MySQL, MariaDB, Postgres
● Cassandra, MongoDB
● Druid, Redis, MemSQL
● Elasticsearch, Solr
● CockroachDB, CouchDB
● Vertica, Infobright
● Redshift, DynamoDB
● S3, OpenStack Swift ...
The FUN-damental Question:
Which one should I use?
Demystifying Datastores
Let's try to look at the problem from the point of view of the database.
First, let's play some baseball...
Base 0: The Data Itself
● Rows having columns
● Key-value
○ Key-blob (think: an object)
○ Key-document (think: JSON / XML)
● Graph (nodes/edges, somewhat like key-value)
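To make the four shapes concrete, here is a minimal sketch that holds the same order (values taken from the sample row used later in this deck) in each form. The variable names are hypothetical and plain Python dicts stand in for the stores; this is an illustration of the data models, not of any particular datastore's API.

```python
import json

# 1. Row having columns: a fixed set of named fields.
row = {"order_id": "order-id-123", "customer": "customer-1",
       "bill_amount": 5, "address": "Bugis Street", "tax": 1, "items": 3}

# 2a. Key-blob: the store sees only opaque bytes; the application decodes them.
blob_store = {"order-id-123": repr(row).encode("utf-8")}

# 2b. Key-document: the value is structured text (JSON), so fields inside it are addressable.
doc_store = {"order-id-123": json.dumps(row)}

# 3. Graph: nodes plus edges; each node's adjacency list is itself a key-value lookup.
graph = {
    "customer-1": ["order-id-123"],    # customer-1 -(placed)-> order-id-123
    "order-id-123": ["Bugis Street"],  # order-id-123 -(ships to)-> Bugis Street
    "Bugis Street": [],
}

def neighbours(node):
    """Follow the edges leaving a node (a plain key lookup underneath)."""
    return graph.get(node, [])

print(json.loads(doc_store["order-id-123"])["items"])  # 3
print(neighbours("customer-1"))                        # ['order-id-123']
```

The graph case shows why the slide calls it "kind of like key-value": traversing one hop is just looking up a node's key.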
Base 1: How is the Data Stored?
Let's consider a sample data record/row:
order-id-123 | customer-1 | $5 bill amount | Bugis Street | $1 tax | 3 items
Each field is a column/attribute; the order id is a possible primary-key column.
Approach 1
● Store all columns of the row side by side (i.e., TOGETHER) on disk.
● This is generally referred to as a ROW-based datastore.
● Useful for use cases like "showing the ENTIRE order on the UI".
● The entire row can typically be fetched in a single disk access.
order-id-123 | customer-1 | $5 bill amount | Bugis Street | $1 tax | 3 items
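A minimal sketch of the row-based layout, assuming a hypothetical fixed-width record format (field widths are made up for illustration). Every column of a row is packed next to the others, so reading the whole order is one contiguous slice at a computed offset, which is the property the slide describes. Real engines use pages and variable-length records; this only shows the idea.

```python
import struct

# Hypothetical layout: order_id (16 bytes), customer (16 bytes), address (32 bytes),
# then bill_amount, tax, items as 4-byte little-endian ints. No padding ("<").
ROW_FORMAT = struct.Struct("<16s16s32siii")

def write_rows(rows):
    """Lay each row's columns side by side, one row after another, in one buffer."""
    buf = bytearray()
    for r in rows:
        buf += ROW_FORMAT.pack(r["order_id"].encode(), r["customer"].encode(),
                               r["address"].encode(),
                               r["bill_amount"], r["tax"], r["items"])
    return bytes(buf)

def _text(raw):
    # struct pads short strings with NUL bytes; strip them when decoding.
    return raw.rstrip(b"\0").decode()

def read_row(buf, index):
    """Fetch one ENTIRE row with a single contiguous slice."""
    start = index * ROW_FORMAT.size
    order_id, customer, address, bill, tax, items = ROW_FORMAT.unpack(
        buf[start:start + ROW_FORMAT.size])
    return {"order_id": _text(order_id), "customer": _text(customer),
            "address": _text(address), "bill_amount": bill, "tax": tax, "items": items}

rows = [
    {"order_id": "order-id-123", "customer": "customer-1", "address": "Bugis Street",
     "bill_amount": 5, "tax": 1, "items": 3},
    {"order_id": "order-id-121", "customer": "customer-1", "address": "Bugis Street",
     "bill_amount": 2, "tax": 2, "items": 1},
]
buf = write_rows(rows)
print(read_row(buf, 0)["items"])  # 3
```

The flip side is visible too: summing `items` across all rows with this layout means slicing through every full record, including columns the query never needed.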
Approach 2
● Store columns SEPARATELY, so that they can be accessed independently.
● This is generally referred to as a COLUMN-based datastore.
● Queries like AVG(billing_amount) or SUM(items):
○ Instead of fetching entire rows, fetch only the columns needed for the computation.
○ I.e., less data fetched from disk = REDUCED IO.
Two sample orders, stored column by column:
order id: order-id-123, order-id-121
customer: customer-1, customer-1
bill amount: $5, $2
address: Bugis Street, Bugis Street
tax: $1, $2
items: 3, 1
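A minimal sketch of the column-based idea using the two orders above: each list plays the role of one column file, and a hypothetical counter records how many stored values a query touches. The names are illustrative; the point is that AVG(bill_amount) reads 2 of the 12 stored values instead of whole rows.

```python
# Two sample orders stored column by column: each list stands in for one column file.
columns = {
    "order_id":    ["order-id-123", "order-id-121"],
    "customer":    ["customer-1", "customer-1"],
    "bill_amount": [5, 2],
    "address":     ["Bugis Street", "Bugis Street"],
    "tax":         [1, 2],
    "items":       [3, 1],
}
reads = {"values": 0}  # how many stored values queries have touched so far

def read_column(name):
    """Read a single column; no other column is touched."""
    values = columns[name]
    reads["values"] += len(values)
    return values

def avg(name):
    values = read_column(name)
    return sum(values) / len(values)

def total(name):
    return sum(read_column(name))

all_values = sum(len(col) for col in columns.values())
result = avg("bill_amount")
print(result)                                            # 3.5
print(reads["values"], "of", all_values, "values read")  # 2 of 12 values read
```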
● What other optimisations does a column store allow?
○ Imagine 4 rows with a column, say 'age':
■ Row 1 - 28
■ Row 2 - 30
■ Row 3 - 28
■ Row 4 - 28
● While storing on disk, if you SORT and then store, you can also think of compression:
28, 28, 28, 30 (sorted -> good for search now)
28(3), 30 (now compressed -> 28 stored once)
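The "28(3), 30" notation above is run-length encoding of a sorted column. A minimal sketch with the `age` values from the slide (function names are hypothetical):

```python
from itertools import groupby

def sort_and_compress(values):
    """Sort a column, then run-length encode it: each distinct value is stored once with its count."""
    return [(value, len(list(run))) for value, run in groupby(sorted(values))]

def decompress(runs):
    """Expand (value, count) pairs back into the sorted column."""
    return [value for value, count in runs for _ in range(count)]

age = [28, 30, 28, 28]  # Row 1..Row 4 from the slide
runs = sort_and_compress(age)
print(runs)              # [(28, 3), (30, 1)]
print(decompress(runs))  # [28, 28, 28, 30]
```

One caveat: sorting a single column on its own loses which row each value came from, so a real column store has to keep that mapping, for example by sorting every column of the table by the same key.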
Typically:
● MySQL / Postgres = ROW-based
● Vertica / Infobright / Druid = COLUMN-based
Approach 2.5
● Store groups of columns TOGETHER, but store each group separately.
● This is generally referred to as a COLUMN-family-based datastore.
Logically group the columns. Typically: HBase / Cassandra.
(Figure: the sample row order-id-123, customer-1, $5 bill amount, Bugis Street, $1 tax, 3 items, split into logical column groups.)
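A minimal sketch of column families, assuming a hypothetical grouping of the sample row into "order", "billing" and "address" families (the slide does not name the groups). Each family gets its own store, and within a family a row's columns sit together. This mirrors the idea only; it is not HBase's or Cassandra's actual storage format.

```python
# Hypothetical grouping; the slide only says "logically group the columns".
FAMILIES = {
    "order":   ["order_id", "customer"],
    "billing": ["bill_amount", "tax", "items"],
    "address": ["address"],
}

# One separate store per family; inside a family, a row's columns are kept together.
family_stores = {family: {} for family in FAMILIES}

def put(row_key, row):
    """Split the row into its families and write each group to its own store."""
    for family, cols in FAMILIES.items():
        family_stores[family][row_key] = {col: row[col] for col in cols}

def get(row_key, family):
    """Read one family only, e.g. just the billing columns for a report."""
    return family_stores[family][row_key]

put("order-id-123", {"order_id": "order-id-123", "customer": "customer-1",
                     "bill_amount": 5, "tax": 1, "items": 3, "address": "Bugis Street"})
print(get("order-id-123", "billing"))  # {'bill_amount': 5, 'tax': 1, 'items': 3}
```

This sits between the two earlier approaches: a billing report reads one family (like a column store), while the columns inside that family come back together (like a row store).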
Base 2: The Indexing
● What kind of data structure is used for the index?
○ B-tree, inverted index, fractal tree, clustered key, bitmap, or no index at all?
● Certain types of queries favour certain indexes:
○ Range queries favour B-trees; insert-heavy workloads favour fractal trees.
● What's the index loading mechanism?
○ E.g., Redis is memory-bound: its data set is kept in RAM.
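A minimal sketch of why query shape matters for index choice. A sorted Python list searched with `bisect` stands in for a B-tree (a simplification, not a real B-tree): because keys are kept in order, a range query is "find the first key >= low, then scan forward". An inverted index (term to row ids, the structure behind Lucene-based engines such as Elasticsearch and Solr) answers "which rows contain this word" instead. The order `order-id-125` and the address "Orchard Road" are hypothetical sample data.

```python
import bisect

class OrderedIndex:
    """Keys kept in sorted order so range scans are cheap (stand-in for a B-tree)."""

    def __init__(self):
        self.keys = []
        self.row_ids = []

    def insert(self, key, row_id):
        # O(n) list insert: one reason real engines use B-trees, or
        # write-optimised structures such as fractal trees, instead.
        pos = bisect.bisect_right(self.keys, key)
        self.keys.insert(pos, key)
        self.row_ids.insert(pos, row_id)

    def range_query(self, low, high):
        """Row ids whose key lies in [low, high]."""
        start = bisect.bisect_left(self.keys, low)
        end = bisect.bisect_right(self.keys, high)
        return self.row_ids[start:end]

def build_inverted_index(docs):
    """Inverted index: term -> set of row ids whose text contains that term."""
    index = {}
    for row_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(row_id)
    return index

by_amount = OrderedIndex()
for amount, order in [(5, "order-id-123"), (2, "order-id-121"), (8, "order-id-125")]:
    by_amount.insert(amount, order)
print(by_amount.range_query(2, 5))  # ['order-id-121', 'order-id-123']

addresses = build_inverted_index({"order-id-123": "Bugis Street",
                                  "order-id-125": "Orchard Road"})
print(addresses["bugis"])           # {'order-id-123'}
```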
Base 3: The Theorem (CAP)
● Most datastores scale out via
○ Horizontal scaling
○ Sharding
● So here is the catch: in the event of a network partition,
○ How are consistency and availability handled?
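A toy sketch of the catch, assuming a hypothetical primary/replica pair and informal "CP" / "AP" mode labels; it does not model any real datastore's replication protocol. Keys are placed on shards with a stable hash (CRC32). When the replica is cut off, the store must either refuse the write (stay consistent) or accept it and let the copies diverge for a while (stay available).

```python
import zlib

NUM_SHARDS = 3  # hypothetical cluster size

def shard_for(key):
    """Pick a shard with a stable hash (CRC32), so every client agrees on placement."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS

class Node:
    def __init__(self):
        self.data = {}
        self.reachable = True

def replicated_write(primary, replica, key, value, mode):
    """Write to a primary and its replica; behaviour under a partition depends on mode."""
    if not replica.reachable:
        if mode == "CP":
            # Favour consistency: refuse the write rather than let the copies diverge.
            raise RuntimeError("partition: write rejected to stay consistent")
        # Favour availability: accept now; the replica stays stale until it catches up.
        primary.data[key] = value
        return "accepted (replica stale)"
    primary.data[key] = value
    replica.data[key] = value
    return "accepted"

primary, replica = Node(), Node()
print(replicated_write(primary, replica, "order-id-123", 5, "CP"))  # accepted
replica.reachable = False  # simulate a network partition
print(replicated_write(primary, replica, "order-id-123", 6, "AP"))  # accepted (replica stale)
try:
    replicated_write(primary, replica, "order-id-123", 7, "CP")
except RuntimeError as err:
    print(err)  # partition: write rejected to stay consistent
print(primary.data["order-id-123"], replica.data["order-id-123"])  # 6 5
```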
Base 4: Apart from the CAP Theorem
● ACID?
○ Transaction commit/rollback support
● BASE?
○ Basically Available, Soft state, Eventual consistency?
● Can I do joins if the data is sharded?
○ What about distribution awareness?
● The query interface (a major concern?)
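A minimal sketch of what "transaction commit/rollback support" buys you, using Python's built-in `sqlite3` module with an in-memory database (the `orders` table is a hypothetical example). Used as a context manager, a `sqlite3` connection commits if the block succeeds and rolls back if it raises, so a failed second statement also undoes the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, bill_amount INTEGER, items INTEGER)")
conn.execute("INSERT INTO orders VALUES ('order-id-123', 5, 3)")
conn.commit()

try:
    with conn:  # commits on success, rolls back if the block raises
        conn.execute("UPDATE orders SET items = items + 1 WHERE order_id = 'order-id-123'")
        # Duplicate primary key -> IntegrityError -> the UPDATE above is rolled back too.
        conn.execute("INSERT INTO orders VALUES ('order-id-123', 9, 9)")
except sqlite3.IntegrityError:
    pass

items = conn.execute("SELECT items FROM orders WHERE order_id = 'order-id-123'").fetchone()[0]
print(items)  # 3 (the partial update did not survive)
```

A BASE-style store would typically not offer this all-or-nothing guarantee across multiple writes, which is exactly the question this base asks you to check.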
The bases...
So, try to cover the bases and decide which ones you need.
PS: There is no silver bullet.
Thank you.
Vishnu Rao
jaihind213
sweetweet213
mash213.wordpress.com
linkedin.com/in/213vishnu
