NOSQL Data Stores
                                    Not Only SQL




Tuesday, September 21, 2010
Data Store

                    Super Set
                    Relational Databases
                    Key Value Stores
                    Document Stores
                    Column Family Stores

Design This Schema
                    Student
                    Address
                    Course
                    Score


Scalable huh??

                    Use case: this schema has to serve the entire student
                    community of the world
                    One big server?? How big?
                    More than one server? How will that work?



WHY NOSQL ?

                    Scalability : Horizontal
                    Relational databases do not distribute well
                    NOSQL : Distributed, Flexible Schema, Relaxing Consistency



Issues with Relational DB

                              Scalability
                              Replication : Scaling by duplication
                              Partitioning(Sharding) : Scaling by division



Replication
                    Master - Slave
                              1 write = N writes (N is the number of slaves)
                              Faster reads (can read from N nodes)
                              Critical reads go to the master (application aware)
                              Limited by high data volumes (every node holds all the data)


Replication
                    Multi - Master
                              Adding more masters
                              Conflict resolution O(n^3) or O(n^2)




Partitioning(Sharding)
                    Scales reads as well as writes
                    Application needs to be partition aware
                    Broken relationships: Cartesian products across shards??
                    Referential integrity can no longer be enforced
                    Rebalancing


Consistent Hashing
                    Hash Ring (Or Clock Face)




                    [Figures: balanced distribution; distribution after adding a new node]
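
A minimal sketch of the hash ring, assuming MD5 as the hash and 100 virtual points per node. Both choices are illustrative, and the names `HashRing`, `ring_position`, and the node labels are hypothetical. It shows the property the two figures illustrate: after a node is added, the only keys that move are the ones the new node takes over.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a point on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes    # points per node; assumed value
        self._points = []       # sorted ring positions
        self._owner = {}        # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            point = ring_position(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key):
        """Walk clockwise from the key's position to the next node point."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, ring_position(key))
        return self._owner[self._points[idx % len(self._points)]]


keys = [f"student:{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

With plain `hash(key) % number_of_nodes`, going from three to four nodes would remap most keys; here `moved` contains only keys now owned by `node-d`.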
Common Sharding Schemes
                    Vertical Partitioning
                    Range Based Partitioning
                    Hash Based Partitioning
                    Directory Based Partitioning
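
As a rough illustration, three of these schemes can be written as routing functions that map a key to a shard. The shard names, range boundaries, and directory contents below are made up for the example. Vertical partitioning is not shown because it splits by table or feature rather than by key.

```python
import bisect
import zlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical shard names


def range_shard(student_id: int, upper_bounds=(1000, 2000)) -> str:
    """Range based: ids below 1000 -> shard-0, 1000..1999 -> shard-1, rest -> shard-2."""
    return SHARDS[bisect.bisect_right(upper_bounds, student_id)]


def hash_shard(key: str) -> str:
    """Hash based: a stable hash of the key, modulo the shard count."""
    return SHARDS[zlib.crc32(key.encode("utf-8")) % len(SHARDS)]


# Directory based: an explicit lookup table decides where each key lives.
DIRECTORY = {"student:42": "shard-2"}


def directory_shard(key: str, default: str = "shard-0") -> str:
    return DIRECTORY.get(key, default)
```

Range based routing keeps neighbouring keys together (good for range scans, prone to hot spots), hash based routing spreads load evenly but scatters ranges, and a directory allows arbitrary placement at the cost of an extra lookup.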




Can live without !!
                    UPDATE and DELETE
                              Loss of Information
                              Can be modeled as INSERT with versioning
                              Filter out inactive records
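
A small sketch of this idea using Python's built-in `sqlite3`. The table `address_log` and its columns are hypothetical. An "update" becomes an INSERT of a new version, a "delete" becomes an INSERT of an inactive version, and reads filter on the latest version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE address_log ("
    " student_id INTEGER, version INTEGER, address TEXT, active INTEGER,"
    " PRIMARY KEY (student_id, version))"
)


def write(student_id, address, active=1):
    """Append a new version instead of modifying a row in place."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM address_log WHERE student_id = ?",
        (student_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO address_log VALUES (?, ?, ?, ?)",
        (student_id, latest + 1, address, active),
    )


def current(student_id):
    """Return the latest address, or None if it is missing or inactive."""
    row = conn.execute(
        "SELECT address, active FROM address_log"
        " WHERE student_id = ? ORDER BY version DESC LIMIT 1",
        (student_id,),
    ).fetchone()
    if row is None or row[1] == 0:
        return None
    return row[0]


write(1, "12 Main St")
write(1, "34 Oak Ave")            # stands in for UPDATE
after_update = current(1)
write(1, None, active=0)          # stands in for DELETE
after_delete = current(1)
history = conn.execute(
    "SELECT COUNT(*) FROM address_log WHERE student_id = 1"
).fetchone()[0]
```

Nothing is lost: all three versions remain in `address_log`, so the full history can still be queried.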




Avoid JOINS
                    Expensive; fail across partitions
                    How to avoid?
                              Denormalize
                              Storage is cheap now
                              Burden of consistency shifts to the application
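
A sketch of what denormalizing looks like in practice, using plain Python dictionaries as stand-ins for stored records (the student and course data are invented for the example). The course title is copied into every score entry, so a report needs no join, but a rename must now touch every copy.

```python
courses = {"c1": {"title": "Databases"}}

# Each student record embeds a copy of the course title next to the score.
students = {
    "s1": {"name": "Asha",
           "scores": [{"course_id": "c1", "course_title": "Databases", "score": 91}]},
    "s2": {"name": "Ravi",
           "scores": [{"course_id": "c1", "course_title": "Databases", "score": 78}]},
}


def report(student_id):
    """Read path: a single record lookup, no join against courses."""
    student = students[student_id]
    return [(entry["course_title"], entry["score"]) for entry in student["scores"]]


def rename_course(course_id, new_title):
    """Write path: the application has to keep every copy consistent."""
    courses[course_id]["title"] = new_title
    for student in students.values():
        for entry in student["scores"]:
            if entry["course_id"] == course_id:
                entry["course_title"] = new_title


report_before = report("s1")
rename_course("c1", "Database Systems")
report_after = report("s2")
```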


Still need ACID ??
                    Atomicity: single-key atomicity is enough
                    Consistency: CAP theorem
                              Can only get any two of Consistency, Availability,
                              Partition Tolerance
                    Isolation: no stronger than read committed (single key)
                    Durability: survive node failures through peer replication

Fixed Schema
                    Schema comes before Data
                    Modifying Schema is essential
                              Adding new features
                    Modifying Schema is hard
                              Locking of rows(Add/Modify a column)
                              Locking of table(Add/Remove index)

Model this!!
                    Hierarchical Data
                    Graphs




Desired Characteristics
                    High Scalability
                              Add nodes incrementally
                              No Diminishing Returns
                    High Availability
                              No single point of failure
                              Agnostic to node failures

Desired Characteristics
                    High Performance
                              Fast operations
                              Non-blocking writes
                    Consistency
                              Strong consistency is not required
                              Eventual consistency, read-your-writes consistency

Desired Characteristics
                    Deployment Flexibility
                              Add/Remove node automatically
                              NO DFS or shared storage
                              Should work with commodity heterogeneous hardware
                    Modeling Flexibility
                              Key-value pairs, hierarchical and graph data

Desired Characteristics
                    Query Flexibility
                              Multi Gets
                              Range Queries
                              Upserts




Inspiration
                    Memcached
                              In-memory Key Value
                              Blazing Fast
                              Near-linear horizontal scalability (nodes share nothing)




Key Value Stores
                    Simple Data Model
                    Amazon Dynamo
                    Amazon S3
                    Project Voldemort
                    Redis
                    Scalaris and many others

Amazon Dynamo
                    Internal to Amazon
                    Distributed K-V store
                              Opaque Values
                    Partitioning
                              A variant of consistent hashing
                              Hash Ring division

Amazon Dynamo
                    Partitioning
                              Mapping Communication via Gossip protocol
                              Eventually consistent view of mappings
                    Replication
                              Each key is replicated on N nodes
                              Preference List

Amazon Dynamo
                    Replication
                              Read/Write through Coordinator nodes
                              Configurations
                                 N = number of replicas
                                 W = min. nodes that must ACK the receipt of a WRITE
                                 R = min. nodes contacted for a READ
                                 R + W > N ensures that read and write quorums overlap
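
The overlap claim can be checked by brute force for small clusters. The helper name `quorums_always_overlap` is made up for this sketch; it enumerates every possible write set of size W and read set of size R among N replicas and checks that they always share at least one node.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """True if every read set of size r intersects every write set of size w."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


strict = quorums_always_overlap(3, 2, 2)   # R + W = 4 > 3
sloppy = quorums_always_overlap(3, 1, 1)   # R + W = 2 <= 3
```

When the sets always overlap, at least one node contacted by a read has seen the latest acknowledged write; with R + W <= N a read can miss it entirely.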
Amazon Dynamo
                    Tuning (N,R,W)
                              Increasing W means more replicas must acknowledge each write (more durable, slower writes)
                              Increasing R means stronger consistency but slower reads
                              Typical values for Amazon apps: (N, R, W) = (3, 2, 2)




Amazon Dynamo
                    Consistency
                              Eventually consistent
                              Uses Object versioning via Vector Clocks
                    Consistency Protocol
                              Return all versions
                              Reconcile divergent versions
                              The reconciled version, which supersedes the current ones, is written back
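
A minimal sketch of versioning with vector clocks, with clocks represented as plain dictionaries from node name to counter. The function names, the node names `sx`, `sy`, `sz`, and the shopping-cart values are assumptions for the example, and the merge rule (set union) is application specific.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def descends(a, b):
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def prune(versions):
    """Drop versions dominated by another one; concurrent siblings survive."""
    return [
        (clock, value)
        for clock, value in versions
        if not any(other != clock and descends(other, clock) for other, _ in versions)
    ]


def reconcile(siblings, merge_values, node):
    """Merge divergent siblings into one version that supersedes all of them."""
    merged = {}
    for clock, _ in siblings:
        for name, count in clock.items():
            merged[name] = max(merged.get(name, 0), count)
    return increment(merged, node), merge_values([value for _, value in siblings])


v1 = ({"sx": 1}, {"book"})
v2 = (increment(v1[0], "sy"), {"book", "pen"})   # written via node sy
v3 = (increment(v1[0], "sz"), {"book", "ink"})   # concurrent write via node sz

siblings = prune([v1, v2, v3])                   # v1 is an ancestor of both
clock, cart = reconcile(siblings, lambda values: set().union(*values), "sx")
```

A read returns both `v2` and `v3` because neither clock descends from the other; the reconciled clock descends from both, so writing it back replaces them.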
Amazon Dynamo
                    Handling Temporary Failures
                              Hinted Handoff
                    Handling Permanent Failures
                              Node sync (replica synchronization using Merkle trees)




Amazon Dynamo
                    Ring membership
                              Add/Remove node needs rebalancing
                    Failure Detection
                              Gossip about failures
                              Periodically check peers' availability and gossip the result


Other K-V Stores
                    Check out others too. Worth a read and try.
                    S3,Voldemort,Redis,Scalaris.




Document Stores
                    A step beyond K-V stores
                    Value is a full-blown record (document)
                    Document is not opaque (it exposes a structure that
                    operations can use)
                    Each document can have a different schema, e.g. JSON
                    Relations are possible
                              One-to-many and many-to-many
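
To make this concrete, here is a sketch with two JSON-style documents of different shape, held as Python dictionaries. All field names and values are invented. The one-to-many relation (a student's scores) is embedded, the many-to-many relation (students and courses) uses references, and `get_path` shows the store looking inside a document rather than treating it as an opaque blob.

```python
import json

students = [
    # This document has an address and course references ...
    {"_id": "s1", "name": "Asha", "address": {"city": "Pune"},
     "course_ids": ["c1", "c2"]},
    # ... while this one has embedded scores and no address at all.
    {"_id": "s2", "name": "Ravi",
     "scores": [{"course_id": "c1", "score": 78}]},
]
courses = {"c1": {"title": "Databases"}, "c2": {"title": "Networks"}}


def get_path(doc, path, default=None):
    """Follow a dotted path such as 'address.city' inside a document."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def course_titles(student):
    """Resolve the many-to-many references by hand (no join in the store)."""
    return [courses[cid]["title"] for cid in student.get("course_ids", [])]


encoded = json.dumps(students[0])   # what would be sent to the store
```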
Document Stores
                    Mostly similar to a relational DB (except no upfront schema)
                    Amazon SimpleDB
                    Apache CouchDB
                    Riak
                    MongoDB


MongoDB
                    We use MongoDB in a large automated translation system
                    Data Model
                              Key-value, the value being binary-serialized JSON (BSON)
                                 4 MB limit per BSON document
                                 For larger objects use GridFS
                              Collections: roughly like a table
                              B-trees used for indexes
MongoDB
                    Storage
                              Uses memory-mapped files (cache controlled by the OS VMM)
                    Writes
                              In-place updates
                              Partial updates
                              Single-document atomic updates

MongoDB
                    Queries
                              JSON-style query syntax (powered by a JavaScript engine)
                              Support for conditional operators, regex, etc.
                              Cursor support
                              Query optimizer
                    Map-Reduce over a collection
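
To give a feel for the JSON-style syntax, the following toy matcher evaluates query documents against a list of dictionaries. It is not MongoDB's implementation and does not use a MongoDB driver; it only mimics the shape of queries built from the `$gt`, `$lt`, `$in`, and `$regex` operators, and the sample data is invented. The generator returned by `find` loosely plays the role of a cursor.

```python
import re

# Operator name -> predicate(field_value, operator_argument)
OPS = {
    "$gt": lambda value, arg: value is not None and value > arg,
    "$lt": lambda value, arg: value is not None and value < arg,
    "$in": lambda value, arg: value in arg,
    "$regex": lambda value, arg: isinstance(value, str)
    and re.search(arg, value) is not None,
}


def matches(doc, query):
    """A document matches when every field condition in the query holds."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            if not all(OPS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            return False
    return True


def find(collection, query):
    """Yield matching documents lazily, one at a time."""
    return (doc for doc in collection if matches(doc, query))


scores = [
    {"student": "Asha", "course": "Databases", "score": 91},
    {"student": "Ravi", "course": "Databases", "score": 78},
    {"student": "Asif", "course": "Networks", "score": 85},
]
top = [d["student"] for d in find(scores, {"course": "Databases", "score": {"$gt": 80}})]
starts_with_as = [d["student"] for d in find(scores, {"student": {"$regex": "^As"}})]
```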

MongoDB
                    Replication
                              Master Slave
                              Replica Pairs
                              Master - Master




MongoDB
                    Partitioning
                              Auto-sharding is done through chunks (50 MB max)
                              Easy node addition
                              Auto balancing
                              No single point of failure
                              Automatic failover

Column Family Stores
                    Sparse, Distributed, Persistent, Multi-Dimensional sorted Map
                    Column Keys are grouped into sets called column-families
                    BigTable
                    HBase
                    Cassandra


Big Table Column Family




Cassandra
                    Combines the distributed architecture of Dynamo with the
                    column-family data model of BigTable




Cassandra
                    Data Model: multi-dimensional map indexed by a key
                              Each app has its own keyspace
                              Key can be an arbitrarily long string; indexed by Cassandra
                              Column: an attribute of a record; timestamped
                              Column family: a grouping of columns, similar to a
                              relational table
                              Super column: a list of columns
Cassandra
                    Data Model
                              A column family contains either columns or super
                              columns
                              KeySpace.ColumnFamily.Key.[SuperColumn].Column
                    Sorting
                              Data is sorted at write time
                              Columns are sorted within their row by column name
                              (pluggable sorting providers)
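
The addressing scheme `KeySpace.ColumnFamily.Key.Column` can be pictured as nested maps. The sketch below is an in-memory toy, not Cassandra's storage engine: the class `Row`, the `insert` helper, and the sample data are hypothetical, and super columns are left out. It keeps column names sorted at write time and lets the write with the newer timestamp win.

```python
import bisect


class Row:
    def __init__(self):
        self.names = []   # column names, kept sorted as they are written
        self.cells = {}   # column name -> (value, timestamp)

    def put(self, name, value, timestamp):
        if name not in self.cells:
            bisect.insort(self.names, name)          # sort at write time
            self.cells[name] = (value, timestamp)
        elif timestamp >= self.cells[name][1]:       # newer timestamp wins
            self.cells[name] = (value, timestamp)

    def slice(self, start, finish):
        """Return (name, value) pairs for column names in [start, finish]."""
        lo = bisect.bisect_left(self.names, start)
        hi = bisect.bisect_right(self.names, finish)
        return [(name, self.cells[name][0]) for name in self.names[lo:hi]]


store = {}  # keyspace -> column family -> row key -> Row


def insert(keyspace, column_family, key, column, value, timestamp):
    families = store.setdefault(keyspace, {})
    rows = families.setdefault(column_family, {})
    rows.setdefault(key, Row()).put(column, value, timestamp)


insert("School", "Students", "s1", "name", "Asha", 1)
insert("School", "Students", "s1", "city", "Pune", 1)
insert("School", "Students", "s1", "city", "Mumbai", 2)   # newer, replaces Pune
insert("School", "Students", "s1", "city", "Delhi", 1)    # stale, ignored
insert("School", "Students", "s1", "age", 21, 1)

row = store["School"]["Students"]["s1"]
a_to_d = row.slice("a", "d")
```

Because names are already sorted, a slice over a range of column names is a cheap contiguous read.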
Cassandra
                    Partitioning: mostly like Dynamo
                              Consistent hashing with an order-preserving hash function
                              Uses the Chord approach to load balancing (Dynamo uses
                              virtual nodes)




Cassandra
                    Replication
                              Coordinator nodes and preference lists, as in Dynamo
                              Datacenter-aware, rack-aware, or rack-unaware placement
                              Rack-aware placement uses ZooKeeper
                              Membership based on Scuttlebutt, an anti-entropy gossip protocol


Cassandra
                    Failure Detection
                              Modified version of the accrual failure detector
                    Failure Handling
                              Same as hinted handoff in Dynamo
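
An accrual detector outputs a suspicion level (usually called phi) instead of a yes/no verdict. The sketch below assumes heartbeat gaps follow an exponential distribution, which is a simplifying assumption for this example; the class and method names are hypothetical. Under that assumption the probability of a gap longer than t is exp(-t / mean), so phi = -log10 of that probability = t / (mean * ln 10).

```python
import math


class AccrualDetector:
    def __init__(self):
        self.intervals = []   # observed gaps between heartbeats
        self.last = None      # arrival time of the latest heartbeat

    def heartbeat(self, now):
        if self.last is not None:
            self.intervals.append(now - self.last)
        self.last = now

    def phi(self, now):
        """Suspicion level: grows the longer the peer stays silent."""
        if not self.intervals:
            raise ValueError("need at least two heartbeats")
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last
        return elapsed / (mean * math.log(10))


detector = AccrualDetector()
for t in (0.0, 1.0, 2.0, 3.0):       # one heartbeat per second
    detector.heartbeat(t)

phi_now = detector.phi(3.0)           # just heard from the peer
phi_late = detector.phi(13.0)         # ten seconds of silence
```

The application picks a threshold on phi, trading detection speed against the chance of wrongly suspecting a slow but healthy node.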




Cassandra
                    Write
                              Write to the commit log, followed by an update to the
                              memtable
                              Dedicated disk for the commit log (makes writes sequential)
                              No seeks, always sequential, so blazing fast
                              Atomic within a column family
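
The write path can be sketched as an append to a log file followed by an in-memory update, with the memtable flushed to a sorted, immutable table once it grows. `TinyStore`, the flush threshold of three entries, and the JSON log format are assumptions made for this toy; real flushed tables live on disk, while here they stay in memory for brevity.

```python
import json
import os
import tempfile


class TinyStore:
    def __init__(self, log_path, memtable_limit=3):
        self.log = open(log_path, "a", encoding="utf-8")
        self.memtable = {}            # recent writes, in memory
        self.sstables = []            # flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.log.write(json.dumps([key, value]) + "\n")   # 1. sequential append
        self.log.flush()
        self.memtable[key] = value                        # 2. memtable update
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Freeze the memtable into a table sorted by key."""
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def read(self, key):
        """Check the memtable first, then flushed tables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            found = dict(table)
            if key in found:
                return found[key]
        return None

    def close(self):
        self.log.close()


path = os.path.join(tempfile.mkdtemp(), "commit.log")
store = TinyStore(path)
for i, city in enumerate(["Pune", "Delhi", "Agra", "Goa"]):
    store.write(f"s{i}", city)
store.write("s0", "Mumbai")           # newer value shadows the flushed one
latest_s0 = store.read("s0")
flushed_s1 = store.read("s1")
store.close()
```

No existing data is rewritten on the write path: the log only grows at the end, and a newer value simply shadows the older one at read time.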

Cassandra
                    Read
                              Like Dynamo for deciding which nodes serve the read
                              Like BigTable at the storage level




Thanks!!!
                    With due regards to Reddy Raja for the invitation.



