NOSQL, COUCHDB AND THE CLOUD
Brad Anderson
Cloudant




          1
BRAD ANDERSON

• BS, Hotel Management
• Restaurant chain data - econometric modeling, BI/DW
• Open source - trac, dsource.org, couchdb
• NOSQLEast 2009
• Cloudant
• http://twitter.com/boorad


                                2
AGENDA

• NOSQL
• COUCHDB
  • Erlang
• Cloud
  • Dynamo
  • MapReduce


                   3
IF YOU

• don’t have ‘medium data’ or ‘big data’
• are cool with 25K LOC object-relational mappers
• love an ops challenge
• are okay paying Uncle Larry



                              4
IF NOT




http://www.bigfatmoneybags.com/blog/wp-content/uploads/2009/12/screwed.jpg

                                                                             5
RELATIONAL DATABASES

[Image caption: RDBMS 1970-2010]

• Rigid schema / ORM fun
• Scale up
• Everything is a nail

http://www.flickr.com/photos/36041246@N00/3419197777/

                                                       6
SCALING RDBMS

• Replication Sucks
  • master-slave
  • master-master
• Partitioning Sucks
  • vertical (by functional area)
  • horizontal (by some key, say time)
• Caching sort of works


                                        7
OKAY, NOT SCREWED




http://www.bigfatmoneybags.com/blog/wp-content/uploads/2009/12/screwed.jpg

                                                                             8
RELATIONAL DATABASES

[Image caption: RDBMS 1970-]

• Not dead
• Just have a ‘smell’ for certain tasks

http://www.flickr.com/photos/36041246@N00/3419197777/

                                                       9
NOSQL

NOT ONLY SQL
A moniker for different data storage systems solving very different problems, all where a relational database is not the right fit.

                         10
RIGHT FIT

• Google indexes 400 Pb / day (2007)
• CERN, LHC generates 100 Pb / sec
• Unique data created each year (IDC, 2007)
  • 2007: 40 Eb
  • 2010: 988 Eb (exponential growth)
• Flightcaster


                                     11
FOUR CATEGORIES

• Key/Value Stores
  • Dynomite, Voldemort, Tokyo
• Document Stores
  • CouchDB, MongoDB
• Column Stores / BigTable
  • HBase, Hypertable, Cassandra
• Graph Databases
  • Neo4j, AllegroGraph, VertexDB


                                       12
BIG TAKEAWAY

[Figure, two builds: first, a single function above many separate data blocks; then many small units, each pairing a function with its own data.]

Bring the function to the data
                                13
14
HUH? ERLANG?

• Programming language created at Ericsson (20 yrs old now)
• Designed for scalable, long-lived systems
• Compiled, Functional, Dynamically Typed, Open Source




                                  15
3 BIGGIES

• Massively Concurrent
  • green threads, very lightweight != OS threads
• Seamlessly Distributed
  • node = OS process = VM, processes can live anywhere
• Fault Tolerant
  • 99.9999999% = 32ms downtime per year - AXD301
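The downtime figure on this slide follows from simple arithmetic. A minimal sketch, assuming a 365-day year; the function name is illustrative and does not come from the deck:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a non-leap year


def downtime_ms_per_year(nines: int) -> float:
    """Expected downtime in milliseconds per year for an availability
    written with `nines` nines (9 nines means 99.9999999%)."""
    unavailable_fraction = 10.0 ** (-nines)
    return SECONDS_PER_YEAR * 1000 * unavailable_fraction


# Nine nines works out to roughly 31.5 ms, which the slide rounds to 32 ms.
print(round(downtime_ms_per_year(9), 1))
```

For comparison, calling the same function with `nines=3` gives about 8.76 hours per year, which shows how steep the scale is.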


                                         16
Official Beta!

Apache CouchDB


          17
COUCHDB

• Schema-free document database server
• Robust, highly concurrent, fault-tolerant
• RESTful JSON API
• Futon web admin console
• MapReduce system for generating custom views
• Bi-directional incremental replication
• couchapp: lightweight HTML+JavaScript apps served directly from CouchDB using views to transform JSON
                                   18
FROM INTEREST TO ADOPTION

• 100+ production users
• 3 books being written
• Vibrant, open community
• Active commercial development
• Rapidly maturing
                               19
OF THE WEB

Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP ... this is what the software of the future looks like.

Jacob Kaplan-Moss
October 17 2007

http://jacobian.org/writing/of-the-web/
                       20
DOCUMENTS

• Documents are JSON Objects
• Underscore-prefixed fields are reserved
• Documents can have binary attachments
• MVCC _rev deterministically generated from doc content
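To make the document shape concrete, here is a small sketch of what such a JSON object might look like when handled from Python. The `author` and `title` fields and the `_rev` value are made-up placeholders, and `split_fields` is a hypothetical helper, not part of CouchDB:

```python
import json

# Hypothetical document: only _id and _rev are CouchDB-reserved names here;
# the revision string is a placeholder in the usual "<number>-<hash>" shape.
doc = {
    "_id": "mydocid",
    "_rev": "1-0123456789abcdef0123456789abcdef",
    "author": "boorad",
    "title": "NoSQL, CouchDB and the Cloud",
}


def split_fields(document):
    """Separate reserved (underscore-prefixed) fields from application fields."""
    reserved = {k: v for k, v in document.items() if k.startswith("_")}
    user = {k: v for k, v in document.items() if not k.startswith("_")}
    return reserved, user


reserved, user = split_fields(doc)
print(json.dumps(doc, indent=2))
```

Keeping application data out of the underscore namespace avoids clashing with fields the server manages itself.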
                               21
ROBUST

• Never overwrite previously committed data
• In the event of a server crash or power failure, just restart CouchDB -- there is no “repair”
• Take snapshots with “cp”
• Configurable levels of durability: can choose to fsync after every update, or less often to gain better throughput



                                22
CONCURRENT

• Erlang approach: lightweight processes to model the natural concurrency in a problem
• For CouchDB that means one process per TCP connection
• Lock-free architecture; each process works with an MVCC snapshot of a DB.
• Performance degrades gracefully under heavy concurrent load


                               23
REST API
• Create
 PUT /mydb/mydocid

• Retrieve
 GET /mydb/mydocid

• Update
 PUT /mydb/mydocid

• Delete
 DELETE /mydb/mydocid
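
The four calls above can be sketched with Python's standard library. This only builds the requests and does not send them; the base URL (a local CouchDB on port 5984), the document body, and the `1-abc` revision are assumptions for illustration:

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumption: a local CouchDB instance


def build_request(method, db, doc_id, body=None, rev=None):
    """Build (but do not send) an HTTP request for a single document."""
    url = f"{BASE_URL}/{db}/{doc_id}"
    if rev is not None:
        url += f"?rev={rev}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


create = build_request("PUT", "mydb", "mydocid", body={"title": "hello"})
retrieve = build_request("GET", "mydb", "mydocid")
# An update is a PUT whose body carries the current _rev (placeholder here).
update = build_request("PUT", "mydb", "mydocid",
                       body={"_rev": "1-abc", "title": "hi"})
# A delete names the revision being deleted via the rev query parameter.
delete = build_request("DELETE", "mydb", "mydocid", rev="1-abc")
```

Passing any of these objects to `urllib.request.urlopen` would send it to a running server. Create and update share the same verb and URL; the revision in the body is what distinguishes them.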

                        24
25
VIEWS

• Custom, persistent representations of document data
• “Close to the metal” -- no dynamic queries in production, so you know exactly what you’re getting
• Generated using MapReduce functions written in JavaScript (and other languages)
• Each view must have a map function and may also have a reduce function
• Leverages view collation, rich view query API
                                 26
DOCUMENTS BY AUTHOR
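
The image for this slide did not survive in the transcript. As a stand-in, here is a Python simulation of what a "documents by author" view might compute. Real CouchDB views are typically JavaScript functions stored in a design document; the sample documents and the `query_view` helper below are hypothetical:

```python
from collections import defaultdict

docs = [  # hypothetical documents
    {"_id": "a", "author": "alice", "title": "Intro"},
    {"_id": "b", "author": "bob", "title": "Views"},
    {"_id": "c", "author": "alice", "title": "Replication"},
    {"_id": "d", "title": "No author"},
]


def map_by_author(doc):
    """Map step: emit one (key, value) row per document that has an author."""
    if "author" in doc:
        yield doc["author"], 1


def reduce_count(values):
    """Reduce step: count the rows that share a key."""
    return sum(values)


def query_view(documents, map_fn, reduce_fn=None):
    """Run the map over every document, sort rows by key, optionally reduce."""
    rows = sorted((k, v) for d in documents for k, v in map_fn(d))
    if reduce_fn is None:
        return rows
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


print(query_view(docs, map_by_author))
print(query_view(docs, map_by_author, reduce_count))
```

Documents without an `author` field simply emit nothing, which is how a view filters as well as indexes.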




         27
INCREMENTAL

• Computing a view can be expensive, so CouchDB saves the result in a B-tree and keeps it up-to-date
• Leaf nodes store map results, inner nodes store reductions of children
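
The idea of caching reductions in inner nodes can be sketched with a toy tree. This is a plain binary tree held in a list, not CouchDB's actual B-tree, and it assumes the number of leaves is a power of two; the class and its counter are illustrative only:

```python
class ReductionTree:
    """Toy binary tree: leaves hold map values, inner nodes cache reductions."""

    def __init__(self, values, reduce_fn=sum):
        self.n = len(values)  # assumption: a power of two, for simplicity
        self.reduce_fn = reduce_fn
        self.recomputed = 0  # counts inner-node recomputations, for illustration
        # nodes[n:] are the leaves, nodes[1:n] are inner nodes, nodes[0] is unused
        self.nodes = [None] * self.n + list(values)
        for i in range(self.n - 1, 0, -1):
            self._recompute(i)

    def _recompute(self, i):
        children = [self.nodes[2 * i], self.nodes[2 * i + 1]]
        self.nodes[i] = self.reduce_fn(children)
        self.recomputed += 1

    def update(self, index, value):
        """Change one leaf and refresh only the inner nodes on its path to the root."""
        i = index + self.n
        self.nodes[i] = value
        i //= 2
        while i >= 1:
            self._recompute(i)
            i //= 2

    def total(self):
        """The root holds the reduction over all leaves."""
        return self.nodes[1]


tree = ReductionTree([1] * 8)
tree.recomputed = 0
tree.update(3, 5)
print(tree.total(), tree.recomputed)
```

With eight leaves, a single update touches three inner nodes rather than all seven, which is the saving the slide describes.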




 http://horicky.blogspot.com/2008/10/couchdb-implementation.html
                               28
REPLICATION

• Peer-based, bi-directional replication using normal HTTP calls
• Mediated by a replicator process which can live on the source, target, or somewhere else entirely
• Replicate a subset of documents in a DB meeting criteria defined in a custom filter function (coming soon)
• Applications (_design documents) replicate along with the data
• Ideal for offline applications -- “ground computing”
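
Because replication runs over ordinary HTTP, triggering it amounts to a POST to the `_replicate` endpoint with a JSON body naming a source and a target. A minimal sketch that builds the requests without sending them; the host names and database name are assumptions:

```python
import json
import urllib.request


def build_replication_request(couch_url, source, target):
    """Build (but do not send) a POST to /_replicate on the mediating server."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        f"{couch_url}/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


LOCAL = "http://localhost:5984"                # assumption: local CouchDB
REMOTE_DB = "http://remote.example.com:5984/mydb"  # hypothetical remote database

# Bi-directional replication is expressed as two one-way replications.
push = build_replication_request(LOCAL, "mydb", REMOTE_DB)
pull = build_replication_request(LOCAL, REMOTE_DB, "mydb")
```

The server that receives the POST acts as the replicator, which matches the slide's point that the mediating process can live on the source, the target, or a third machine.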
                                   29
CLOUD




  30
SHOWROOM
A cluster of couches

Help me, this name sucks!

          31
ARCHITECTURE

• Each cluster is a ring of nodes (Dynamo, Dynomite)
• Any node can handle request (consistent hashing)
  • O(1), with a hop
• nodes own partitions (ring is divided)
• data are distributed evenly across partitions and replicas
• mapreduce functions are passed to nodes for execution
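
A rough sketch of how any node could locate the owners of a document without asking a coordinator: hash the document id to one of a fixed number of partitions, then map the partition to consecutive positions on the ring. This is a simplified hash-to-partition lookup in the spirit of Dynamo, not the deck's actual placement scheme; the MD5 hash, the modulo mapping, and both function names are assumptions:

```python
import hashlib


def partition_for(doc_id, q):
    """Map a document id to one of 2**q partitions (hash choice is an assumption)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** q)


def nodes_for(partition, node_count, n):
    """Return the n consecutive ring positions assumed to hold the partition."""
    first = partition % node_count
    return [(first + i) % node_count for i in range(n)]


# Every node computes the same answer locally, so a request can land anywhere
# and be forwarded in a single hop.
p = partition_for("blah", q=5)
print(p, nodes_for(p, node_count=4, n=3))
```

The lookup itself is constant-time arithmetic, which is where the "O(1), with a hop" bullet comes from.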
RESEARCH

• Google’s MapReduce, http://bit.ly/bJbyq5
• Amazon’s Dynamo, http://bit.ly/b7FlsN
• CAP theorem, http://bit.ly/bERr2H
CLUSTER CONTROLS

• N - Replication
• Q - Partitions = 2^Q
• R - Read Quorum
• W - Write Quorum

• These constants define the cluster
N

Trade-off: Throughput vs. Consistency, Durability

N = Number of replicas per item stored in cluster
Q

Trade-off: Throughput vs. Scalability

2^Q = Number of partitions (shards) in cluster
T = Number of nodes in cluster
2^Q / T = Number of partitions per node
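
A quick worked example of the formula above. The values Q = 5 and T = 4 are hypothetical and chosen only to keep the numbers small:

```python
def partitions_per_node(q, node_count):
    """2^Q partitions spread evenly across T nodes."""
    return (2 ** q) / node_count


# Q = 5 gives 32 partitions; on 4 nodes that is 8 partitions per node.
print(partitions_per_node(q=5, node_count=4))
```

Doubling the node count halves the partitions each node owns, so Q bounds how far the cluster can be spread before nodes run out of partitions to take.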
R

Trade-off: Latency vs. Consistency

R = Number of successful reads before returning value(s) to client
W

Trade-off: Latency vs. Durability

W = Number of successful writes before returning ‘success’ to client
[Figure, shown as a sequence of builds: a Load Balancer in front of a ring of nodes labeled Node 1, Node 2, Node 3, Node 4 ... Node 24. Each node holds a run of consecutive partitions: Node 1 holds A B C D, Node 2 holds B C D E, Node 3 holds C D E F, Node 4 holds D E F G, Node 24 holds X Y Z A.]

request

    PUT http://boorad.cloudant.com/dbname/blah?w=2

hash(blah) = E

N=3
W=2
R=2

node down
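
The walkthrough uses N=3, W=2, R=2 and ends with a node going down. A minimal sketch of why the write can still succeed, and of the R + W > N condition described in the Dynamo paper; both function names are illustrative and the up/down lists are made-up inputs:

```python
def quorum_write(replica_up, w):
    """Return True if at least w replicas acknowledge the write.

    replica_up lists, per replica, whether that node is reachable.
    """
    acks = sum(1 for up in replica_up if up)
    return acks >= w


def overlapping_quorums(n, r, w):
    """True when every read quorum must intersect every write quorum."""
    return r + w > n


# One of three replicas is down: two acknowledgements still satisfy W=2.
print(quorum_write([True, True, False], w=2))
# With N=3, R=2, W=2 a read always overlaps the latest successful write.
print(overlapping_quorums(n=3, r=2, w=2))
```

Lowering W or R reduces latency but, once R + W is no longer greater than N, a read may miss the most recent write.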
RESULT

• For standalone or cluster
  • one REST API
  • one URL
• For cluster
  • redundant data
  • distributed queries
  • scale out

                                40
DEMO?




  41
QUESTIONS?
CREDITS

• Emil Eifrem, http://bit.ly/5D40WQ
• Sergio Bossa, http://bit.ly/c9UoRZ
• Cliff Moon, http://bit.ly/bX887c




                                   43

More Related Content

What's hot

Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Gavin Heavyside
 
Free Software and the Future of Database Technology
Free Software and the Future of Database TechnologyFree Software and the Future of Database Technology
Free Software and the Future of Database Technologyelliando dias
 
Big Data Laboratory
Big Data LaboratoryBig Data Laboratory
Big Data LaboratoryJ Singh
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messagesyarapavan
 
Deploying Grid Services Using Apache Hadoop
Deploying Grid Services Using Apache HadoopDeploying Grid Services Using Apache Hadoop
Deploying Grid Services Using Apache HadoopAllen Wittenauer
 
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Cloudera, Inc.
 
Common and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopCommon and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopBrock Noland
 
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedInHadoop Performance at LinkedIn
Hadoop Performance at LinkedInAllen Wittenauer
 
Hadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment EvolutionHadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment EvolutionBenoit Perroud
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Chris Richardson
 
Extending Spring for Custom Usage
Extending Spring for Custom UsageExtending Spring for Custom Usage
Extending Spring for Custom UsageJoshua Long
 
Partitioning CCGrid 2012
Partitioning CCGrid 2012Partitioning CCGrid 2012
Partitioning CCGrid 2012Weiwei Chen
 
HBase @ Twitter
HBase @ TwitterHBase @ Twitter
HBase @ Twitterctrezzo
 
Yes sql08 inmemorydb
Yes sql08 inmemorydbYes sql08 inmemorydb
Yes sql08 inmemorydbDaniel Austin
 
Hadoop: A Hands-on Introduction
Hadoop: A Hands-on IntroductionHadoop: A Hands-on Introduction
Hadoop: A Hands-on IntroductionClaudio Martella
 
The Anatomy Of The Google Architecture Fina Lv1.1
The Anatomy Of The Google Architecture Fina Lv1.1The Anatomy Of The Google Architecture Fina Lv1.1
The Anatomy Of The Google Architecture Fina Lv1.1Hassy Veldstra
 
Google App Engine, Groovy and Gaelyk presentation at the Paris JUG
Google App Engine, Groovy and Gaelyk presentation at the Paris JUGGoogle App Engine, Groovy and Gaelyk presentation at the Paris JUG
Google App Engine, Groovy and Gaelyk presentation at the Paris JUGGuillaume Laforge
 
What's behind facebook
What's behind facebookWhat's behind facebook
What's behind facebookAjen 陳
 

What's hot (20)

Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010
 
Free Software and the Future of Database Technology
Free Software and the Future of Database TechnologyFree Software and the Future of Database Technology
Free Software and the Future of Database Technology
 
Big Data Laboratory
Big Data LaboratoryBig Data Laboratory
Big Data Laboratory
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messages
 
Deploying Grid Services Using Apache Hadoop
Deploying Grid Services Using Apache HadoopDeploying Grid Services Using Apache Hadoop
Deploying Grid Services Using Apache Hadoop
 
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
 
Cloud computing era
Cloud computing eraCloud computing era
Cloud computing era
 
Common and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopCommon and unique use cases for Apache Hadoop
Common and unique use cases for Apache Hadoop
 
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedInHadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
 
Hadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment EvolutionHadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment Evolution
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)
 
Extending Spring for Custom Usage
Extending Spring for Custom UsageExtending Spring for Custom Usage
Extending Spring for Custom Usage
 
Introduction to h base
Introduction to h baseIntroduction to h base
Introduction to h base
 
Partitioning CCGrid 2012
Partitioning CCGrid 2012Partitioning CCGrid 2012
Partitioning CCGrid 2012
 
HBase @ Twitter
HBase @ TwitterHBase @ Twitter
HBase @ Twitter
 
Yes sql08 inmemorydb
Yes sql08 inmemorydbYes sql08 inmemorydb
Yes sql08 inmemorydb
 
Hadoop: A Hands-on Introduction
Hadoop: A Hands-on IntroductionHadoop: A Hands-on Introduction
Hadoop: A Hands-on Introduction
 
The Anatomy Of The Google Architecture Fina Lv1.1
The Anatomy Of The Google Architecture Fina Lv1.1The Anatomy Of The Google Architecture Fina Lv1.1
The Anatomy Of The Google Architecture Fina Lv1.1
 
Google App Engine, Groovy and Gaelyk presentation at the Paris JUG
Google App Engine, Groovy and Gaelyk presentation at the Paris JUGGoogle App Engine, Groovy and Gaelyk presentation at the Paris JUG
Google App Engine, Groovy and Gaelyk presentation at the Paris JUG
 
What's behind facebook
What's behind facebookWhat's behind facebook
What's behind facebook
 

DevNation Atlanta

  • 1. NOSQL, COUCHDB AND THE CLOUD Brad Anderson Cloudant 1
  • 2. BRAD ANDERSON • BS Hotel Management • Restaurant Chain Data - econometric modeling, BI/DW • Open Source - trac, dsource.org, couchdb • NOSQLEast 2009 • Cloudant • http://twitter.com/boorad 2
  • 3. AGENDA • NOSQL • COUCHDB • Erlang • Cloud • Dynamo • MapReduce 3
  • 4. IF YOU • don’t have ‘medium data’ or ‘big data’ • are cool with 25K loc object-relational mappers • love an ops challenge • are okay paying Uncle Larry 4
  • 7. RELATIONAL DATABASES • Rigid Schema / ORM fun • Scale Up • Everything is a Nail • (image caption: RDBMS 1970-2010) http://www.flickr.com/photos/36041246@N00/3419197777/ 6
  • 8. SCALING RDBMS • Replication Sucks • master-slave • master-master • Partitioning Sucks • vertical (by functional area) • horizontal (by some key, say time) • Caching sort of works 7
  • 10. RELATIONAL DATABASES • Not Dead • Just have a ‘smell’ for certain tasks • (image caption: RDBMS 1970-) http://www.flickr.com/photos/36041246@N00/3419197777/ 9
  • 11. NOSQL NOT ONLY SQL A moniker for different data storage systems solving very different problems, all where a relational database is not the right fit. 10
  • 12. RIGHT FIT • Google indexes 400 Pb / day (2007) • CERN, LHC generates 100 Pb / sec • Unique data created each year (IDC, 2007) • 2007: 40 Eb • 2010: 988 Eb (exponential growth) • Flightcaster 11
  • 13. FOUR CATEGORIES • Key/Value Stores • Dynomite, Voldemort, Tokyo • Document Stores • CouchDB, MongoDB • Column Stores / BigTable • HBase, Hypertable, Cassandra • Graph Databases • Neo4j, AllegroGraph, VertexDB 12
  • 14. BIG TAKEAWAY [Diagram: a single function surrounded by many data items] 13
  • 15. BIG TAKEAWAY [Diagram: many functions placed among the data items] Bring the function to the data 13
  • 16. 14
  • 17. HUH? ERLANG? • Programming Language created at Ericsson (20 yrs old now) • Designed for scalable, long-lived systems • Compiled, Functional, Dynamically Typed, Open Source 15
  • 18. 3 BIGGIES • Massively Concurrent • green threads, very lightweight != os threads • Seamlessly Distributed • node = os process = VM, processes can live anywhere • Fault Tolerant • 99.9999999% availability = 32 ms downtime per year (AXD301) 16
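The nine-nines figure on the "3 BIGGIES" slide can be checked with a little arithmetic. The sketch below assumes an average year of 365.25 days; the function name is illustrative and not from the talk.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumption: average year length

def downtime_ms_per_year(availability_percent: float) -> float:
    """Return the yearly downtime, in milliseconds, implied by an availability percentage."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * SECONDS_PER_YEAR * 1000.0

print(round(downtime_ms_per_year(99.9999999)))  # prints 32
```

Each additional nine divides the allowed downtime by ten, which is why nine nines lands in the tens of milliseconds.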
  • 19. Official Beta! Apache CouchDB 17
  • 20. COUCHDB • Schema-free document database server • Robust, highly concurrent, fault-tolerant • RESTful JSON API • Futon web admin console • MapReduce system for generating custom views • Bi-directional incremental replication • couchapp: lightweight HTML+JavaScript apps served directly from CouchDB using views to transform JSON 18
  • 21. FROM INTEREST TO ADOPTION • 100+ production users • Active commercial development • 3 books being written • Rapidly maturing • Vibrant, open community 19
  • 22. OF THE WEB Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP ... this is what the software of the future looks like. Jacob Kaplan-Moss October 17 2007 http://jacobian.org/writing/of-the-web/ 20
  • 23. DOCUMENTS • Documents are JSON Objects • Underscore-prefixed fields are reserved • Documents can have binary attachments • MVCC _rev deterministically generated from doc content 21
  • 24. ROBUST • Never overwrite previously committed data • In the event of a server crash or power failure, just restart CouchDB -- there is no “repair” • Take snapshots with “cp” • Configurable levels of durability: can choose to fsync after every update, or less often to gain better throughput 22
  • 25. CONCURRENT • Erlang approach: lightweight processes to model the natural concurrency in a problem • For CouchDB that means one process per TCP connection • Lock-free architecture; each process works with an MVCC snapshot of a DB. • Performance degrades gracefully under heavy concurrent load 23
  • 26. REST API • Create PUT /mydb/mydocid • Retrieve GET /mydb/mydocid • Update PUT /mydb/mydocid • Delete DELETE /mydb/mydocid 24
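The four verbs on the "REST API" slide map directly onto HTTP requests. The sketch below only builds the requests with Python's standard library and does not send them; the base URL (CouchDB's default port on localhost) and the helper names are assumptions for illustration. It also reflects that CouchDB expects the current `_rev` when updating or deleting an existing document.

```python
import json
from urllib.request import Request

BASE = "http://localhost:5984"  # assumption: a local CouchDB on its default port

def create(db, doc_id, doc):
    """PUT /db/docid with a JSON body creates the document."""
    body = json.dumps(doc).encode("utf-8")
    return Request(f"{BASE}/{db}/{doc_id}", data=body, method="PUT",
                   headers={"Content-Type": "application/json"})

def retrieve(db, doc_id):
    """GET /db/docid returns the document as JSON."""
    return Request(f"{BASE}/{db}/{doc_id}", method="GET")

def update(db, doc_id, doc, rev):
    """Update is the same PUT, but the body carries the current _rev."""
    return create(db, doc_id, {**doc, "_rev": rev})

def delete(db, doc_id, rev):
    """DELETE /db/docid?rev=... removes the given revision."""
    return Request(f"{BASE}/{db}/{doc_id}?rev={rev}", method="DELETE")

req = update("mydb", "mydocid", {"title": "hello"}, rev="1-abc")
print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would issue the call against a running server.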
  • 27. 25
  • 28. VIEWS • Custom, persistent representations of document data • “Close to the metal” -- no dynamic queries in production, so you know exactly what you’re getting • Generated using MapReduce functions written in JavaScript (and other languages) • Each view must have a map function and may also have a reduce function • Leverages view collation, rich view query API 26
  • 30. INCREMENTAL • Computing a view can be expensive, so CouchDB saves the result in a B-tree and keeps it up-to-date • Leaf nodes store map results, inner nodes store reductions of children http://horicky.blogspot.com/2008/10/couchdb-implementation.html 28
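The "VIEWS" and "INCREMENTAL" slides can be illustrated with a toy map/reduce view. CouchDB views are normally written in JavaScript; the Python below is only a sketch of the idea, and the document shape (a `tags` field) and function names are made up for the example. The last lines show why inner B-tree nodes can store reductions of their children: reducing partial results gives the same answer as reducing all the rows at once.

```python
from itertools import groupby

def map_doc(doc):
    """Map step: emit one (key, value) row per tag, like emit(tag, 1)."""
    for tag in doc.get("tags", []):
        yield (tag, 1)

def reduce_values(values):
    """Reduce step: sum the counts. Also valid as a re-reduce over partial sums."""
    return sum(values)

def query_view(docs):
    """Build the view: map every doc, sort rows by key, reduce each key group."""
    rows = sorted(row for doc in docs for row in map_doc(doc))
    return {
        key: reduce_values([value for _, value in group])
        for key, group in groupby(rows, key=lambda row: row[0])
    }

docs = [
    {"_id": "a", "tags": ["erlang", "couchdb"]},
    {"_id": "b", "tags": ["couchdb"]},
]
print(query_view(docs))  # {'couchdb': 2, 'erlang': 1}

# Two leaves hold map results; an inner node stores the reduction of its children.
leaf_1 = reduce_values([1, 1])
leaf_2 = reduce_values([1])
print(reduce_values([leaf_1, leaf_2]))  # 3
```

When a document changes, only the leaf holding its rows and the inner nodes on the path to the root would need recomputing, which is what keeps the view cheap to maintain.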
  • 31. REPLICATION • Peer-based, bi-directional replication using normal HTTP calls • Mediated by a replicator process which can live on the source, target, or somewhere else entirely • Replicate a subset of documents in a DB meeting criteria defined in a custom filter function (coming soon) • Applications (_design documents) replicate along with the data • Ideal for offline applications -- “ground computing” 29
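Because replication runs over normal HTTP calls, triggering it is just another request. The sketch below builds, without sending, a POST to CouchDB's `_replicate` endpoint; the server address, database names, and helper name are assumptions for illustration. Bi-directional replication is two such requests with source and target swapped.

```python
import json
from urllib.request import Request

def replication_request(server, source, target):
    """Build a POST /_replicate request asking `server` to copy source into target."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return Request(f"{server}/_replicate", data=body, method="POST",
                   headers={"Content-Type": "application/json"})

server = "http://localhost:5984"          # assumption: local CouchDB
remote = "http://example.com:5984/mydb"   # hypothetical remote database URL

push = replication_request(server, "mydb", remote)  # local -> remote
pull = replication_request(server, remote, "mydb")  # remote -> local
print(push.get_method(), push.full_url)
```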
  • 33. SHOWROOM A cluster of couches 31
  • 34. Help me, this name sucks! SHOWROOM A cluster of couches 31
  • 35. ARCHITECTURE • Each cluster is a ring of nodes (Dynamo, Dynomite) • Any node can handle request (consistent hashing) • O(1), with a hop • nodes own partitions (ring is divided) • data are distributed evenly across partitions and replicas • mapreduce functions are passed to nodes for execution
  • 36. RESEARCH • Google’s MapReduce, http://bit.ly/bJbyq5 • Amazon’s Dynamo, http://bit.ly/b7FlsN • CAP theorem, http://bit.ly/bERr2H
  • 37. CLUSTER CONTROLS • N - Replication • Q - Partitions = 2^Q • R - Read Quorum • W - Write Quorum • These constants define the cluster
  • 38. N: Consistency, Throughput, Durability • N = Number of replicas per item stored in cluster
  • 39. Q: Throughput, Scalability • 2^Q = Number of partitions (shards) in cluster • T = Number of nodes in cluster • 2^Q / T = Number of partitions per node
  • 40. R: Latency, Consistency • R = Number of successful reads before returning value(s) to client
  • 41. W: Latency, Durability • W = Number of successful writes before returning ‘success’ to client
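The four constants N, Q, R, and W can be gathered into one small configuration object. The sketch below is illustrative only: the class and method names are invented, and the `R + W > N` overlap check comes from the Dynamo design the talk cites, not from these slides. The partitions-per-node figure uses the slide's own formula, 2^Q / T.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClusterConfig:
    n: int  # replicas per item stored in the cluster
    q: int  # exponent: the cluster has 2**q partitions (shards)
    r: int  # successful reads required before answering the client
    w: int  # successful writes required before reporting success

    @property
    def partitions(self) -> int:
        return 2 ** self.q

    def partitions_per_node(self, node_count: int) -> float:
        """2^Q / T, where T is the number of nodes in the cluster."""
        return self.partitions / node_count

    def reads_overlap_writes(self) -> bool:
        """True when every read quorum shares at least one replica with every write quorum."""
        return self.r + self.w > self.n

config = ClusterConfig(n=3, q=8, r=2, w=2)  # q=8 is a hypothetical value
print(config.partitions)               # 256
print(config.partitions_per_node(4))   # 64.0
print(config.reads_overlap_writes())   # True
```

Lowering R or W reduces latency, at the cost of consistency or durability respectively, which is the trade-off the slides label.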
  • 42. [Diagram: a load balancer in front of a ring of nodes (Node 1, 2, 3, 4 ... 24), each node holding several lettered partitions]
  • 43. request PUT http://boorad.cloudant.com/dbname/blah?w=2 [same ring diagram]
  • 44. request PUT http://boorad.cloudant.com/dbname/blah?w=2 [same ring diagram]
  • 45. request PUT http://boorad.cloudant.com/dbname/blah?w=2 • hash(blah) = E [same ring diagram]
  • 46. request PUT http://boorad.cloudant.com/dbname/blah?w=2 • N=3 W=2 R=2 • hash(blah) = E [same ring diagram]
  • 47. request PUT http://boorad.cloudant.com/dbname/blah?w=2 • N=3 W=2 R=2 • hash(blah) = E [same ring diagram]
  • 48. request PUT http://boorad.cloudant.com/dbname/blah?w=2 • N=3 W=2 R=2 • hash(blah) = E • node down [same ring diagram]
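The request walk-through on these slides (hash the document id, find its partition, write to N replicas, succeed once W of them acknowledge) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Cloudant's actual placement logic: the MD5-based hash, the modulo placement, and all function names are hypothetical.

```python
import hashlib

def partition_for(key: str, q: int) -> int:
    """Hash the document id onto one of 2**q partitions (hash choice is an assumption)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % (2 ** q)

def replica_nodes(partition: int, nodes: list, n: int) -> list:
    """Pick n consecutive nodes on the ring, starting at the partition's position."""
    start = partition % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(n)]

def write(key: str, nodes: list, down: set, q: int = 3, n: int = 3, w: int = 2) -> bool:
    """Return True when at least w of the n replica nodes acknowledge the write."""
    targets = replica_nodes(partition_for(key, q), nodes, n)
    acks = [node for node in targets if node not in down]
    return len(acks) >= w

nodes = ["node1", "node2", "node3", "node4"]
targets = replica_nodes(partition_for("blah", 3), nodes, 3)
print(write("blah", nodes, down={targets[0]}))  # True: two of three replicas still acknowledge
```

With N=3 and W=2, losing one replica node does not fail the write, which matches the "node down" step in the diagram; losing two of the three would.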
  • 49. RESULT • For standalone or cluster • one REST API • one URL • For cluster • redundant data • distributed queries • scale out 40
  • 52. CREDITS • Emil Eifrem, http://bit.ly/5D40WQ • Sergio Bossa, http://bit.ly/c9UoRZ • Cliff Moon, http://bit.ly/bX887c 43

Editor's Notes

  1. You are not Google
  2. Right Tool for the Job
  3. small companies, huge data; information from data
  4. 20 yrs old, open source since 1998. like a mobile telephone grid. compiled (but to bytecode for a VM). open source
  5. Cluster Of Unreliable Commodity Hardware