SlideShare une entreprise Scribd logo
1  sur  69
Télécharger pour lire hors ligne
The NoSQL Ecosystem



        Adam Marcus
         MIT CSAIL
marcua@csail.mit.edu / @marcua
About Me
     ●
         Social Computing + Database Systems
     ●
         Easily Distracted: Wrote The NoSQL Ecosystem in
         The Architecture of Open Source Applications   1




1
    http://www.aosabook.org/en/nosql.html
History, Compressed




Late 1990s                         Today
History, Compressed




 Buy RAM

Late 1990s                         Today
History, Compressed




 Buy RAM Wrap RDBMSs

Late 1990s                         Today
History, Compressed




 Buy RAM Wrap RDBMSs   NoSQL

Late 1990s                         Today
History, Compressed


                          Not Only SQL




 Buy RAM Wrap RDBMSs   NoSQL

Late 1990s                          Today
History, Compressed




 Buy RAM Wrap RDBMSs   NoSQL   Commercialize

Late 1990s                            Today
History, Compressed




 Buy RAM Wrap RDBMSs   NoSQL   Commercialize

Late 1990s                            Today
The List, So You Don't Yell at Me

                HBase                  CouchDB
                               Neo4j
     Cassandra          Riak            InfoGrid
BerkeleyDB      Voldemort      HyperTable
       Redis
                         MongoDB        Sones
AllegroGraph HyperGraphDB
                                 DEX    FlockDB
             VertexDB
                                    Oracle NoSQL
Tokyo Cabinet       MemcacheDB
The List, So You Don't Yell at Me
      Marcus' Law of Databases:
           HBase             CouchDB
                             Neo4j
     Cassandra        Riak           InfoGrid
       The number of persistence
                      HyperTable
BerkeleyDB doubles every 1.5 years
            Voldemort
    options
       Redis
                       MongoDB       Sones
AllegroGraph HyperGraphDB
                               DEX   FlockDB
           VertexDB
                                 Oracle NoSQL
Tokyo Cabinet    MemcacheDB
The List, So You Don't Yell at Me
      Marcus' Law of Databases:
           HBase             CouchDB
                           Neo4j
     Cassandra      Riak           InfoGrid
       The number of persistence
                      HyperTable
BerkeleyDB doubles every 1.5 years
            Voldemort
    options
        Redis
                     MongoDB       Sones
                  Corollary:
AllegroGraph HyperGraphDB
          I'm sorry I missedDEX FlockDB
                             your
             persistence optionOracle NoSQL
            VertexDB
Tokyo Cabinet    MemcacheDB
interesting properties
  real-world usage
     takeaways
      questions
interesting properties
  real-world usage
     takeaways
      questions
Conventional Wisdom on NoSQL
●
    Key-based data model
●
    Sloppy schema
●
    Single-key transactions
●
    In-app joins
●
    Eventually consistent
Exceptions are More Interesting
●
    Data Model
●
    Query Model
●
    Transactions
●
    Consistency
Data Model
●
    Usually key-based...
    ●
        Column-families
    ●
        Documents
    ●
        Data structures
Data Model
●
    Usually key-based...
    ●
        Column-families
    ●
        Documents
    ●
        Data structures
●
    ...but not always
    ●
        Graph stores
Query Model
●
    Redis: data structure-specific operations
●
    CouchDB, Riak: MapReduce
●
    Cassandra, MongoDB: SQL-like languages, no
    joins or transactions
Query Model
●
    Redis: data structure-specific operations
●
    CouchDB, Riak: MapReduce
●
    Cassandra, MongoDB: SQL-like languages, no
    joins or transactions
●
    Third-party
    ●
        High-level: PigLatin, HiveQL
    ●
        Library: Cascading, Crunch
    ●
        Streaming: Flume, Kafka, S4, Scribe
Transactions
●
    Full ACID for single key
●
    Redis: multi-key single-node transactions
Consistency
●   Strong: Appears that all replicas see all writes
●   Eventual: Replicas may have different, divergent versions
Consistency
●   Strong: Appears that all replicas see all writes
●   Eventual: Replicas may have different, divergent versions

●   Dynamo: strong or eventual (quorum size)
Consistency
  ●   Strong: Appears that all replicas see all writes
  ●   Eventual: Replicas may have different, divergent versions

  ●   Dynamo: strong or eventual (quorum size)


  ●   PNUTs: Timeline consistency
  ●   ...many consistency models!




See: http://www.allthingsdistributed.com/2007/12/eventually_consistent.html
interesting properties
  real-world usage
     takeaways
      questions
NoSQL Use-cases
●
    Cassandra
●
    HBase
●
    MongoDB
Cassandra
●
    BigTable data model: key   column family
●
    Dynamo sharding model: consistent hashing
●
    Eventual or strong consistency
Cassandra at Netflix
  ●
      Transitioned from Oracle
  ●
      Store customer profiles, customer:movie watch
      log, and detailed usage logging
         Note: no multi-record locking (e.g., bank transfer)
      In-datacenter: 3 replicas, per-app consistency
         read/write quorum = 1, ~1ms latency
         read/write quorum = 2, ~3ms latency


“Replicating Datacenter Oracle with Global Apache Cassandra on AWS” by Adrian Cockcroft
http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra
Cassandra at Netflix
  ●
      Transitioned from Oracle
  ●
      Store customer profiles, customer:movie watch
      log, and detailed usage logging
      ●
          Note: no multi-record locking (e.g., bank transfer)
      In-datacenter: 3 replicas, per-app consistency
          read/write quorum = 1, ~1ms latency
          read/write quorum = 2, ~3ms latency


“Replicating Datacenter Oracle with Global Apache Cassandra on AWS” by Adrian Cockcroft
http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra
Cassandra at Netflix
  ●
      Transitioned from Oracle
  ●
      Store customer profiles, customer:movie watch
      log, and detailed usage logging
      ●
          Note: no multi-record locking (e.g., bank transfer)
  ●
      In-datacenter: 3 replicas, per-app consistency




“Replicating Datacenter Oracle with Global Apache Cassandra on AWS” by Adrian Cockcroft
http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra
Cassandra at Netflix (cont'd)
●
    Benefit: async inter-datacenter replication
●
    Benefit: no downtime for schema changes
●
    Benefit: hooks for live backups
HBase
●
    Data model: key       column family
●
    Sharding model: range partitioning
●
    Strong consistency


    Applications
      Logging events/crawls, storing analytics
      Twitter: replicate data from MySQL, Hadoop analytics
      Facebook Messages
HBase
●
    Data model: key         column family
●
    Sharding model: range partitioning
●
    Strong consistency


●
    Applications
    ●
        Logging events/crawls, storing analytics
    ●
        Twitter: replicate data from MySQL, Hadoop analytics
    ●
        Facebook Messages
HBase for Facebook Messages
●
    Cassandra/Dynamo eventual consistency was
    difficult to program against


    Benefit: simple consistency model
    Benefit: flexible data model
    Benefit: simple sharding, load balancing,
    replication
HBase for Facebook Messages
●
    Cassandra/Dynamo eventual consistency was
    difficult to program against


●
    Benefit: simple consistency model
●
    Benefit: flexible data model
●
    Benefit: simple sharding, load balancing,
    replication
MongoDB
●
    Document-based data model
●
    Range-based partitioning
●
    Consistency depends on how you use it
MongoDB: Two use-cases
●
    Archiving at Craigslist
    ●
        2.2B historical posts, semi-structured
    ●
        Relatively large blobs: avg 2KB, max > 4 MB
MongoDB: Two use-cases
●
    Archiving at Craigslist
    ●
        2.2B historical posts, semi-structured
    ●
        Relatively large blobs: avg 2KB, max > 4 MB
●
    Checkins at Foursquare
    ●
        Geospatial indexing
    ●
        Small location-based updates, sharded on user
interesting properties
  real-world usage
     takeaways
      questions
Takeaways
●
    Developer accessibility
●
    Ecosystem of reuse
●
    Soft spot
Developer Accessibility
Developer Accessibility
    devops




@DEVOPS_BORAT
Developer Accessibility
    devops          self-taught dev




@DEVOPS_BORAT
A Different Five-Minute Rule



What does a first-time user of your system
experience in their first five minutes?
(idea credit: Justin Sheehy, Basho)
Redis
http://simonwillison.net/2009/Oct/22/redis/
PostgreSQL
http://www.postgresql.org/docs/9.1/interactive/sql-createtable.html
Cassandra
http://www.datastax.com/docs/0.7/getting_started/using_cli
Accessibility is Forever
Beyond five minutes:
●
    changing schemas
●
    scaling up
●
    modifying topology
Accessibility is Forever
Beyond five minutes:
●
    changing schemas
●
    scaling up
●
    modifying topology


       An accessible data store will ease
        first-time users in, and support
       experienced users as they expand
Ecosystem of Reuse
Spectrum of Reusability


Monolithic System         System Kernel
Spectrum of Reusability


Monolithic System            System Kernel


MongoDB
 Redis


 Use as
provided
Spectrum of Reusability


Monolithic System            System Kernel


MongoDB      Cassandra
 Redis       Voldemort


 Use as     Pick between
provided    components
Spectrum of Reusability


Monolithic System                System Kernel


MongoDB      Cassandra   ZooKeeper
 Redis       Voldemort    LevelDB


 Use as     Pick between Reusable
provided    components components
Spectrum of Reusability


Monolithic System                System Kernel


MongoDB      Cassandra   ZooKeeper
 Redis       Voldemort    LevelDB     riak_core


 Use as     Pick between Reusable     Build your
provided    components components    own system
And finally, a soft spot
polyglot persistence
            =
right tool for right task
Polyglot persistence: Where's the data?




“HBase at Mendeley,” Dan Harvey
http://www.slideshare.net/danharvey/hbase-at-mendeley
Polyglot persistence: Where's the data?




“HBase at Mendeley,” Dan Harvey
http://www.slideshare.net/danharvey/hbase-at-mendeley
Polyglot persistence: Where's the data?




Aid developers in reasoning about data consistency
          across multiple storage engines
“HBase at Mendeley,” Dan Harvey
http://www.slideshare.net/danharvey/hbase-at-mendeley
interesting properties
  real-world usage
     takeaways
      questions
●
    Systems: Polyglot persistence + data consistency?
●
    Systems: Polyglot persistence + data consistency?
●
    Operations: Availability/consistency/latency tradeoffs in
    datacenters?
●
    Systems: Polyglot persistence + data consistency?
●
    Operations: Availability/consistency/latency tradeoffs in
    datacenters?
●
    Accessibility: RDBMS five-minute usability?
●
    Systems: Polyglot persistence + data consistency?
●
    Operations: Availability/consistency/latency tradeoffs in
    datacenters?
●
    Accessibility: RDBMS five-minute usability?
●
    Accessibility: Scale-up wizard for storage systems?
●
    Systems: Polyglot persistence + data consistency?
●
    Operations: Availability/consistency/latency tradeoffs in
    datacenters?
●
    Accessibility: RDBMS five-minute usability?
●
    Accessibility: Scale-up wizard for storage systems?
●
    Comparisons: Design or implementation?
●
    Systems: Polyglot persistence + data consistency?
●
    Operations: Availability/consistency/latency tradeoffs in
    datacenters?
●
    Accessibility: RDBMS five-minute usability?
●
    Accessibility: Scale-up wizard for storage systems?
●
    Comparisons: Design or implementation?
●
    Future: Next-generation NoSQL stores?
Thank You!
●
    Systems: Polyglot persistence + data consistency?
●
    Operations: Availability/consistency/latency tradeoffs in
    datacenters?
●
    Accessibility: RDBMS five-minute usability?
●
    Accessibility: Scale-up wizard for storage systems?
●
    Comparisons: Design or implementation?
●
    Future: Next-generation NoSQL stores?

                    Adam Marcus
            marcua@csail.mit.edu / @marcua
Photo Credits
http://www.flickr.com/photos/dhannah/457765
7955/
http://www.flickr.com/photos/27316226@N02/3
000888100/sizes/m/in/photostream/
http://www.texample.net/tikz/examples/valenti
ne-heart/
http://voltdb.com/sites/default/files/VCE_button
.png

Contenu connexe

Tendances

MyCassandra (Full English Version)
MyCassandra (Full English Version)MyCassandra (Full English Version)
MyCassandra (Full English Version)Shun Nakamura
 
Java BigData Full Stack Development (version 2.0)
Java BigData Full Stack Development (version 2.0)Java BigData Full Stack Development (version 2.0)
Java BigData Full Stack Development (version 2.0)Alexey Zinoviev
 
Demystfying nosql databases
Demystfying nosql databasesDemystfying nosql databases
Demystfying nosql databasesMike King
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandraBrian Enochson
 
MyCassandra: A Cloud Storage Supporting both Read Heavy and Write Heavy Workl...
MyCassandra: A Cloud Storage Supporting both Read Heavy and Write Heavy Workl...MyCassandra: A Cloud Storage Supporting both Read Heavy and Write Heavy Workl...
MyCassandra: A Cloud Storage Supporting both Read Heavy and Write Heavy Workl...Shun Nakamura
 
Non Relational Databases
Non Relational DatabasesNon Relational Databases
Non Relational DatabasesChris Baglieri
 
M|18 How to use MyRocks with MariaDB Server
M|18 How to use MyRocks with MariaDB ServerM|18 How to use MyRocks with MariaDB Server
M|18 How to use MyRocks with MariaDB ServerMariaDB plc
 
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...DataStax
 
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookTech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookThe Hive
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017 Karan Singh
 
Scaling up and accelerating Drupal 8 with NoSQL
Scaling up and accelerating Drupal 8 with NoSQLScaling up and accelerating Drupal 8 with NoSQL
Scaling up and accelerating Drupal 8 with NoSQLOSInet
 
Some key value stores using log-structure
Some key value stores using log-structureSome key value stores using log-structure
Some key value stores using log-structureZhichao Liang
 
Red Hat Storage - Introduction to GlusterFS
Red Hat Storage - Introduction to GlusterFSRed Hat Storage - Introduction to GlusterFS
Red Hat Storage - Introduction to GlusterFSGlusterFS
 
When is MyRocks good?
When is MyRocks good? When is MyRocks good?
When is MyRocks good? Alkin Tezuysal
 
MongoDB Pros and Cons
MongoDB Pros and ConsMongoDB Pros and Cons
MongoDB Pros and Consjohnrjenson
 

Tendances (20)

MyCassandra (Full English Version)
MyCassandra (Full English Version)MyCassandra (Full English Version)
MyCassandra (Full English Version)
 
Java BigData Full Stack Development (version 2.0)
Java BigData Full Stack Development (version 2.0)Java BigData Full Stack Development (version 2.0)
Java BigData Full Stack Development (version 2.0)
 
Demystfying nosql databases
Demystfying nosql databasesDemystfying nosql databases
Demystfying nosql databases
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandra
 
Log Structured Merge Tree
Log Structured Merge TreeLog Structured Merge Tree
Log Structured Merge Tree
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
MyCassandra: A Cloud Storage Supporting both Read Heavy and Write Heavy Workl...
MyCassandra: A Cloud Storage Supporting both Read Heavy and Write Heavy Workl...MyCassandra: A Cloud Storage Supporting both Read Heavy and Write Heavy Workl...
MyCassandra: A Cloud Storage Supporting both Read Heavy and Write Heavy Workl...
 
Non Relational Databases
Non Relational DatabasesNon Relational Databases
Non Relational Databases
 
M|18 How to use MyRocks with MariaDB Server
M|18 How to use MyRocks with MariaDB ServerM|18 How to use MyRocks with MariaDB Server
M|18 How to use MyRocks with MariaDB Server
 
Wmware NoSQL
Wmware NoSQLWmware NoSQL
Wmware NoSQL
 
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
 
NoSQL Technology
NoSQL TechnologyNoSQL Technology
NoSQL Technology
 
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookTech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017
 
Scaling up and accelerating Drupal 8 with NoSQL
Scaling up and accelerating Drupal 8 with NoSQLScaling up and accelerating Drupal 8 with NoSQL
Scaling up and accelerating Drupal 8 with NoSQL
 
Some key value stores using log-structure
Some key value stores using log-structureSome key value stores using log-structure
Some key value stores using log-structure
 
Red Hat Storage - Introduction to GlusterFS
Red Hat Storage - Introduction to GlusterFSRed Hat Storage - Introduction to GlusterFS
Red Hat Storage - Introduction to GlusterFS
 
When is MyRocks good?
When is MyRocks good? When is MyRocks good?
When is MyRocks good?
 
MongoDB Pros and Cons
MongoDB Pros and ConsMongoDB Pros and Cons
MongoDB Pros and Cons
 
No sq lv1_0
No sq lv1_0No sq lv1_0
No sq lv1_0
 

En vedette

The New Rules for the Public Solicitation for Funding
The New Rules for the Public Solicitation for FundingThe New Rules for the Public Solicitation for Funding
The New Rules for the Public Solicitation for Fundingtiearizona
 
A Focus on Efficiency
A Focus on EfficiencyA Focus on Efficiency
A Focus on EfficiencyFirat Demirel
 
Health Connectivity Sylvia
Health Connectivity SylviaHealth Connectivity Sylvia
Health Connectivity Sylviapfrawley
 
Aztie Econ Stimulus Gsp Consulting
Aztie Econ Stimulus   Gsp ConsultingAztie Econ Stimulus   Gsp Consulting
Aztie Econ Stimulus Gsp Consultingtiearizona
 
ethology TiE AZ August-2013
ethology TiE AZ August-2013ethology TiE AZ August-2013
ethology TiE AZ August-2013tiearizona
 
Stimulus Presentation for TiE AZ June 2009
Stimulus Presentation for TiE AZ June 2009Stimulus Presentation for TiE AZ June 2009
Stimulus Presentation for TiE AZ June 2009tiearizona
 
Ramakrishnan Keynote Ladis2009
Ramakrishnan Keynote Ladis2009Ramakrishnan Keynote Ladis2009
Ramakrishnan Keynote Ladis2009yarapavan
 

En vedette (7)

The New Rules for the Public Solicitation for Funding
The New Rules for the Public Solicitation for FundingThe New Rules for the Public Solicitation for Funding
The New Rules for the Public Solicitation for Funding
 
A Focus on Efficiency
A Focus on EfficiencyA Focus on Efficiency
A Focus on Efficiency
 
Health Connectivity Sylvia
Health Connectivity SylviaHealth Connectivity Sylvia
Health Connectivity Sylvia
 
Aztie Econ Stimulus Gsp Consulting
Aztie Econ Stimulus   Gsp ConsultingAztie Econ Stimulus   Gsp Consulting
Aztie Econ Stimulus Gsp Consulting
 
ethology TiE AZ August-2013
ethology TiE AZ August-2013ethology TiE AZ August-2013
ethology TiE AZ August-2013
 
Stimulus Presentation for TiE AZ June 2009
Stimulus Presentation for TiE AZ June 2009Stimulus Presentation for TiE AZ June 2009
Stimulus Presentation for TiE AZ June 2009
 
Ramakrishnan Keynote Ladis2009
Ramakrishnan Keynote Ladis2009Ramakrishnan Keynote Ladis2009
Ramakrishnan Keynote Ladis2009
 

Similaire à The NoSQL Ecosystem

Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesshnkr_rmchndrn
 
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...JPoint'15 Mom, I so wish Hibernate for my NoSQL database...
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...Alexey Zinoviev
 
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...Felix Gessert
 
No Sql Introduction
No Sql IntroductionNo Sql Introduction
No Sql IntroductionDingding Ye
 
Introduction to NoSql
Introduction to NoSqlIntroduction to NoSql
Introduction to NoSqlOmid Vahdaty
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Chris Richardson
 
Cassandra for mission critical data
Cassandra for mission critical dataCassandra for mission critical data
Cassandra for mission critical dataOleksandr Semenov
 
SQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, Egypt
SQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, EgyptSQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, Egypt
SQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, EgyptChris Richardson
 
Mongo db transcript
Mongo db transcriptMongo db transcript
Mongo db transcriptfoliba
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataRoger Xia
 
Nosql databases for the .net developer
Nosql databases for the .net developerNosql databases for the .net developer
Nosql databases for the .net developerJesus Rodriguez
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQLbalwinders
 

Similaire à The NoSQL Ecosystem (20)

Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
 
NoSQL and MongoDB
NoSQL and MongoDBNoSQL and MongoDB
NoSQL and MongoDB
 
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
 
Drop acid
Drop acidDrop acid
Drop acid
 
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...JPoint'15 Mom, I so wish Hibernate for my NoSQL database...
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...
 
NoSQL
NoSQLNoSQL
NoSQL
 
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
 
No SQL Technologies
No SQL TechnologiesNo SQL Technologies
No SQL Technologies
 
No Sql Introduction
No Sql IntroductionNo Sql Introduction
No Sql Introduction
 
Nosql seminar
Nosql seminarNosql seminar
Nosql seminar
 
Introduction to NoSql
Introduction to NoSqlIntroduction to NoSql
Introduction to NoSql
 
NoSQL
NoSQLNoSQL
NoSQL
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)
 
Cassandra for mission critical data
Cassandra for mission critical dataCassandra for mission critical data
Cassandra for mission critical data
 
SQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, Egypt
SQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, EgyptSQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, Egypt
SQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, Egypt
 
Mongo db transcript
Mongo db transcriptMongo db transcript
Mongo db transcript
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
 
NoSql Databases
NoSql DatabasesNoSql Databases
NoSql Databases
 
Nosql databases for the .net developer
Nosql databases for the .net developerNosql databases for the .net developer
Nosql databases for the .net developer
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 

Dernier

Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 

Dernier (20)

Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 

The NoSQL Ecosystem

  • 1. The NoSQL Ecosystem Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua
  • 2. About Me ● Social Computing + Database Systems ● Easily Distracted: Wrote The NoSQL Ecosystem in The Architecture of Open Source Applications 1 1 http://www.aosabook.org/en/nosql.html
  • 4. History, Compressed Buy RAM Late 1990s Today
  • 5. History, Compressed Buy RAM Wrap RDBMSs Late 1990s Today
  • 6. History, Compressed Buy RAM Wrap RDBMSs NoSQL Late 1990s Today
  • 7. History, Compressed Not Only SQL Buy RAM Wrap RDBMSs NoSQL Late 1990s Today
  • 8. History, Compressed Buy RAM Wrap RDBMSs NoSQL Commercialize Late 1990s Today
  • 9. History, Compressed Buy RAM Wrap RDBMSs NoSQL Commercialize Late 1990s Today
  • 10. The List, So You Don't Yell at Me HBase CouchDB Neo4j Cassandra Riak InfoGrid BerkeleyDB Voldemort HyperTable Redis MongoDB Sones AllegroGraph HyperGraphDB DEX FlockDB VertexDB Oracle NoSQL Tokyo Cabinet MemcacheDB
  • 11. The List, So You Don't Yell at Me Marcus' Law of Databases: HBase CouchDB Neo4j Cassandra Riak InfoGrid The number of persistence HyperTable BerkeleyDB doubles every 1.5 years Voldemort options Redis MongoDB Sones AllegroGraph HyperGraphDB DEX FlockDB VertexDB Oracle NoSQL Tokyo Cabinet MemcacheDB
  • 12. The List, So You Don't Yell at Me Marcus' Law of Databases: HBase CouchDB Neo4j Cassandra Riak InfoGrid The number of persistence HyperTable BerkeleyDB doubles every 1.5 years Voldemort options Redis MongoDB Sones Corollary: AllegroGraph HyperGraphDB I'm sorry I missedDEX FlockDB your persistence optionOracle NoSQL VertexDB Tokyo Cabinet MemcacheDB
  • 13. interesting properties real-world usage takeaways questions
  • 14. interesting properties real-world usage takeaways questions
  • 15. Conventional Wisdom on NoSQL ● Key-based data model ● Sloppy schema ● Single-key transactions ● In-app joins ● Eventually consistent
  • 16. Exceptions are More Interesting ● Data Model ● Query Model ● Transactions ● Consistency
  • 17. Data Model ● Usually key-based... ● Column-families ● Documents ● Data structures
  • 18. Data Model ● Usually key-based... ● Column-families ● Documents ● Data structures ● ...but not always ● Graph stores
  • 19. Query Model ● Redis: data structure-specific operations ● CouchDB, Riak: MapReduce ● Cassandra, MongoDB: SQL-like languages, no joins or transactions
  • 20. Query Model ● Redis: data structure-specific operations ● CouchDB, Riak: MapReduce ● Cassandra, MongoDB: SQL-like languages, no joins or transactions ● Third-party ● High-level: PigLatin, HiveQL ● Library: Cascading, Crunch ● Streaming: Flume, Kafka, S4, Scribe
  • 21. Transactions ● Full ACID for single key ● Redis: multi-key single-node transactions
  • 22. Consistency ● Strong: Appears that all replicas see all writes ● Eventual: Replicas may have different, divergent versions
  • 23. Consistency ● Strong: Appears that all replicas see all writes ● Eventual: Replicas may have different, divergent versions ● Dynamo: strong or eventual (quorum size)
  • 24. Consistency ● Strong: Appears that all replicas see all writes ● Eventual: Replicas may have different, divergent versions ● Dynamo: strong or eventual (quorum size) ● PNUTs: Timeline consistency ● ...many consistency models! See: http://www.allthingsdistributed.com/2007/12/eventually_consistent.html
  • 25. interesting properties real-world usage takeaways questions
  • 26. NoSQL Use-cases ● Cassandra ● HBase ● MongoDB
  • 27. Cassandra ● BigTable data model: key column family ● Dynamo sharding model: consistent hashing ● Eventual or strong consistency
  • 28. Cassandra at Netflix ● Transitioned from Oracle ● Store customer profiles, customer:movie watch log, and detailed usage logging Note: no multi-record locking (e.g., bank transfer) In-datacenter: 3 replicas, per-app consistency read/write quorum = 1, ~1ms latency read/write quorum = 2, ~3ms latency “Replicating Datacenter Oracle with Global Apache Cassandra on AWS” by Adrian Cockcroft http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra
  • 29. Cassandra at Netflix ● Transitioned from Oracle ● Store customer profiles, customer:movie watch log, and detailed usage logging ● Note: no multi-record locking (e.g., bank transfer) In-datacenter: 3 replicas, per-app consistency read/write quorum = 1, ~1ms latency read/write quorum = 2, ~3ms latency “Replicating Datacenter Oracle with Global Apache Cassandra on AWS” by Adrian Cockcroft http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra
  • 30. Cassandra at Netflix ● Transitioned from Oracle ● Store customer profiles, customer:movie watch log, and detailed usage logging ● Note: no multi-record locking (e.g., bank transfer) ● In-datacenter: 3 replicas, per-app consistency “Replicating Datacenter Oracle with Global Apache Cassandra on AWS” by Adrian Cockcroft http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra
  • 31. Cassandra at Netflix (cont'd) ● Benefit: async inter-datacenter replication ● Benefit: no downtime for schema changes ● Benefit: hooks for live backups
  • 32. HBase ● Data model: key column family ● Sharding model: range partitioning ● Strong consistency Applications Logging events/crawls, storing analytics Twitter: replicate data from MySQL, Hadoop analytics Facebook Messages
  • 33. HBase ● Data model: key column family ● Sharding model: range partitioning ● Strong consistency ● Applications ● Logging events/crawls, storing analytics ● Twitter: replicate data from MySQL, Hadoop analytics ● Facebook Messages
  • 34. HBase for Facebook Messages ● Cassandra/Dynamo eventual consistency was difficult to program against Benefit: simple consistency model Benefit: flexible data model Benefit: simple sharding, load balancing, replication
  • 35. HBase for Facebook Messages ● Cassandra/Dynamo eventual consistency was difficult to program against ● Benefit: simple consistency model ● Benefit: flexible data model ● Benefit: simple sharding, load balancing, replication
  • 36. MongoDB ● Document-based data model ● Range-based partitioning ● Consistency depends on how you use it
  • 37. MongoDB: Two use-cases ● Archiving at Craigslist ● 2.2B historical posts, semi-structured ● Relatively large blobs: avg 2KB, max > 4 MB
  • 38. MongoDB: Two use-cases ● Archiving at Craigslist ● 2.2B historical posts, semi-structured ● Relatively large blobs: avg 2KB, max > 4 MB ● Checkins at Foursquare ● Geospatial indexing ● Small location-based updates, sharded on user
  • 39. interesting properties real-world usage takeaways questions
  • 40. Takeaways ● Developer accessibility ● Ecosystem of reuse ● Soft spot
  • 42. Developer Accessibility devops @DEVOPS_BORAT
  • 43. Developer Accessibility devops self-taught dev @DEVOPS_BORAT
  • 44. A Different Five-Minute Rule What does a first-time user of your system experience in their first five minutes? (idea credit: Justin Sheehy, Basho)
  • 48. Accessibility is Forever Beyond five minutes: ● changing schemas ● scaling up ● modifying topology
  • 49. Accessibility is Forever Beyond five minutes: ● changing schemas ● scaling up ● modifying topology An accessible data store will ease first-time users in, and support experienced users as they expand
  • 51. Spectrum of Reusability Monolithic System System Kernel
  • 52. Spectrum of Reusability Monolithic System System Kernel MongoDB Redis Use as provided
  • 53. Spectrum of Reusability Monolithic System System Kernel MongoDB Cassandra Redis Voldemort Use as Pick between provided components
  • 54. Spectrum of Reusability Monolithic System System Kernel MongoDB Cassandra ZooKeeper Redis Voldemort LevelDB Use as Pick between Reusable provided components components
  • 55. Spectrum of Reusability Monolithic System System Kernel MongoDB Cassandra ZooKeeper Redis Voldemort LevelDB riak_core Use as Pick between Reusable Build your provided components components own system
  • 56. And finally, a soft spot
  • 57. polyglot persistence = right tool for right task
  • 58. Polyglot persistence: Where's the data? “HBase at Mendeley,” Dan Harvey http://www.slideshare.net/danharvey/hbase-at-mendeley
  • 59. Polyglot persistence: Where's the data? “HBase at Mendeley,” Dan Harvey http://www.slideshare.net/danharvey/hbase-at-mendeley
  • 60. Polyglot persistence: Where's the data? Aid developers in reasoning about data consistency across multiple storage engines “HBase at Mendeley,” Dan Harvey http://www.slideshare.net/danharvey/hbase-at-mendeley
  • 61. interesting properties real-world usage takeaways questions
  • 62. Systems: Polyglot persistence + data consistency?
  • 63. Systems: Polyglot persistence + data consistency? ● Operations: Availability/consistency/latency tradeoffs in datacenters?
  • 64. Systems: Polyglot persistence + data consistency? ● Operations: Availability/consistency/latency tradeoffs in datacenters? ● Accessibility: RDBMS five-minute usability?
  • 65. Systems: Polyglot persistence + data consistency? ● Operations: Availability/consistency/latency tradeoffs in datacenters? ● Accessibility: RDBMS five-minute usability? ● Accessibility: Scale-up wizard for storage systems?
  • 66. Systems: Polyglot persistence + data consistency? ● Operations: Availability/consistency/latency tradeoffs in datacenters? ● Accessibility: RDBMS five-minute usability? ● Accessibility: Scale-up wizard for storage systems? ● Comparisons: Design or implementation?
  • 67. Systems: Polyglot persistence + data consistency? ● Operations: Availability/consistency/latency tradeoffs in datacenters? ● Accessibility: RDBMS five-minute usability? ● Accessibility: Scale-up wizard for storage systems? ● Comparisons: Design or implementation? ● Future: Next-generation NoSQL stores?
  • 68. Thank You! ● Systems: Polyglot persistence + data consistency? ● Operations: Availability/consistency/latency tradeoffs in datacenters? ● Accessibility: RDBMS five-minute usability? ● Accessibility: Scale-up wizard for storage systems? ● Comparisons: Design or implementation? ● Future: Next-generation NoSQL stores? Adam Marcus marcua@csail.mit.edu / @marcua