SlideShare une entreprise Scribd logo
1  sur  61
Télécharger pour lire hors ligne
Inside the Cassandra Database




                                 Jonathan Ellis
                            jbellis@riptano.com / @spyced
Saturday, August 21, 2010
Saturday, August 21, 2010
Why NoSQL?



    ✤    Relational databases don’t scale

    ✤    The relational model maps poorly to some problems

    ✤    Relational databases are slow




Saturday, August 21, 2010
Myth 1



    ✤    “NoSQL is for people who don’t understand {SQL,
         denormalization, query tuning, ...}”
          ✤    Similarly: “Only users of [database X] are turning to NoSQL
               databases, because X sucks.”




Saturday, August 21, 2010
eBay: NoSQL pioneer

    ✤    “BASE is diametrically opposed to ACID. Where ACID is
         pessimistic and forces consistency at the end of every
         operation, BASE is optimistic and accepts that the
         database consistency will be in a state of flux. Although
         this sounds impossible to cope with, in reality it is quite
         manageable and leads to levels of scalability that cannot
         be obtained with ACID.”
          ✤    --BASE: An Acid Alternative, Dan Pritchett, eBay

          ✤    See also: The eBay Architecture, Randy Shoup and Dan Pritchett

Saturday, August 21, 2010
(“The eBay Architecture,” Randy Shoup and Dan Pritchett)

Saturday, August 21, 2010
Myth 2




    ✤    “NoSQL is nothing new because we had key/value
         databases like bdb years ago.”




Saturday, August 21, 2010
Myth 3




    ✤    “Only huge web 2.0 sites like Facebook and Twitter care
         about scalability.”




Saturday, August 21, 2010
Cassandra



    ✤    Relational databases don’t scale

    ✤    The relational model maps poorly to some problems

    ✤    Relational databases are slow




Saturday, August 21, 2010
Saturday, August 21, 2010
Saturday, August 21, 2010
Saturday, August 21, 2010
Saturday, August 21, 2010
Saturday, August 21, 2010
Saturday, August 21, 2010
Use cases

    ✤    Digital Reasoning: NLP + entity analytics

    ✤    OpenX: largest publisher-side ad network in the world

    ✤    Cloudkick: performance data & aggregation

    ✤    SimpleGEO: location-as-API

    ✤    Ooyala: video analytics and business intelligence

    ✤    ngmoco: massively multiplayer game worlds

Saturday, August 21, 2010
Cassandra



    ✤    Relational databases don’t scale

    ✤    The relational model maps poorly to some problems

    ✤    Relational databases are slow




Saturday, August 21, 2010
Saturday, August 21, 2010
Cassandra memory-aware index


                            <key 127>    <row data 0>
                            <key 255>    <row data 1>
                               ...            ...
                                        <row data 127>
                                              ...
                                        <row data 255>
                                              ...




Saturday, August 21, 2010
Reader
           Writer           Memtable




               Commitlog




The Log-Structured Merge-Tree,
Bigtable: A Distributed Storage System
for Structured Data


Saturday, August 21, 2010
Saturday, August 21, 2010
Durable



    ✤    Write to commitlog
          ✤    fsync is cheap since it’s append-only

    ✤    Write to memtable

    ✤    [amortized] flush memtable to sstable



Saturday, August 21, 2010
Scaling




Saturday, August 21, 2010
W   A




                            T

                                    L




Saturday, August 21, 2010
W   A




                                        F



                            T

                                    L




Saturday, August 21, 2010
W           A




                                                F
                                    (A-L]


                            T

                                            L




Saturday, August 21, 2010
W           A




                                    (A-F]       F



                            T
                                    (F-L]
                                            L




Saturday, August 21, 2010
Key “C”

                                W   A




                                        F



                            T

                                    L




Saturday, August 21, 2010
Reliability



    ✤    No single points of failure, clean design

    ✤    Multiple datacenters

    ✤    Monitorable




Saturday, August 21, 2010
Saturday, August 21, 2010
Some headlines


    ✤    “Resyncing Broken MySQL Replication”

    ✤    “How To Repair MySQL Replication”

    ✤    “Fixing Broken MySQL Database Replication”

    ✤    “Replication on Linux broken after db restore”

    ✤    “MySQL :: Repairing broken replication”


Saturday, August 21, 2010
The opposite of heroes




    ✤    “If your software wakes someone up at 4 AM to fix it,
         you’re doing it wrong.”




Saturday, August 21, 2010
Good architecture solves multiple
    problems at once


    ✤    Availability in single datacenter

    ✤    Availablility in multiple datacenters

    ✤    Built-in repair rather than an afterthought
          ✤    or worse, having to restart replication entirely




Saturday, August 21, 2010
Y
                                                    Key “C”
                                        A
                                W




                            U
                                                F




                                T
                                            L

                                    P



Saturday, August 21, 2010
Y
                                            A
                                W




                            U
                                                    F




                                T
                                                L
                                        P



Saturday, August 21, 2010
Y
                                        A
                                W




                            U
                                                F




                                T
                                            L

                                    P



Saturday, August 21, 2010
Y
                                                Key “C”
                                        A
                                W




                            U
                                            F



                                T
                                        L
                                    P



Saturday, August 21, 2010
Y
                                                    Key “C”
                                        A
                                W




                            U
                                                F




                                T
                                            L

                                    P



Saturday, August 21, 2010
Y
                                                    Key “C”
                                        A
                                W




                            U
                                                F




                                T
                                            L

                                    P



Saturday, August 21, 2010
Y
                                                    Key “C”
                                        A
                                W




                            U
                                                F




                                T
                                            L

                                    P



Saturday, August 21, 2010
Y
                                                    Key “C”
                                        A
                                W




                            U
                                                F




                                T
                                            L

                                    P



Saturday, August 21, 2010
Y
                                                    Key “C”
                                        A
                                W




                            U
                                                F




                                        X
                                T
                                            L

                                    P



Saturday, August 21, 2010
Y
                                                       Key “C”
                                           A
                                W




                            U
                                                   F




                                           X
                                T   hint
                                               L

                                    P



Saturday, August 21, 2010
Y
                                                       Key “C”
                                           A
                                W




                            U
                                                   F




                                           X
                                T   hint
                                               L

                                    P



Saturday, August 21, 2010
Y
                                        A
                                W




                            U
                                                F




                                T
                                            L

                                    P



Saturday, August 21, 2010
Y
                                        A
                                W




                            U
                                                F




                                T
                                            L

                                    P



Saturday, August 21, 2010
Y
                                        A
                                W




                            U
                                                F




                                T
                                            L

                                    P



Saturday, August 21, 2010
Saturday, August 21, 2010
Eventual Tuneable consistency



    ✤    ONE, QUORUM, ALL

    ✤    R+W>N

    ✤    Choose availability vs consistency (and latency)




Saturday, August 21, 2010
Monitorable




Saturday, August 21, 2010
Events




Saturday, August 21, 2010
1500
                                                                                       mails sent

          1125



            750



            375



                 0
                            Jan     Feb             Apr            May       Jun         Jul
                            (0.5)   (0.5.1)   Mar   (0.6, 0.6.1)   (0.6.2)   (0.6.3)     (0.6.4)




Saturday, August 21, 2010
Data model




    ✤    Twitter: “Fifteen months ago, it took two weeks to
         perform ALTER TABLE on the statuses [tweets] table.”




Saturday, August 21, 2010
ColumnFamilies
                            Columns




Saturday, August 21, 2010
Saturday, August 21, 2010
Cassandra “indexes” are
    materialized views
 SELECT * FROM tweets
 WHERE user_id IN (SELECT follower FROM followers WHERE user_id = ?)
 ORDER BY time_tweeted DESC
 LIMIT 40




             followers                                           timeline

                                                      ?
                                     SORT




   ?                                                      uuid:tweet



                            tweets

Saturday, August 21, 2010
Analytics in Cassandra




    ✤    @afex: “Cassandra + Pig (Hadoop) is very exciting. A 7
         line script to analyze data from my entire cluster
         transparently, with no ETL? Yes, please”




Saturday, August 21, 2010
TaskTracker




                            JobTracker




Saturday, August 21, 2010
Coming in 0.7 [beta1 released 13/8]

    ✤    Secondary indexes and expressions

    ✤    Large (> 2GB) rows

    ✤    More control over replica placement

    ✤    Online schema changes

    ✤    Flow control

    ✤    Dynamic routing around slow nodes

Saturday, August 21, 2010
More




    ✤    http://wiki.apache.org/cassandra/
         ArticlesAndPresentations

    ✤    http://wiki.apache.org/cassandra/ArchitectureInternals




Saturday, August 21, 2010
Questions




Saturday, August 21, 2010

Contenu connexe

En vedette

Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
DataStax
 

En vedette (10)

Cassandra
CassandraCassandra
Cassandra
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark Meetup
 
NoSQL Database- cassandra column Base DB
NoSQL Database- cassandra column Base DBNoSQL Database- cassandra column Base DB
NoSQL Database- cassandra column Base DB
 
DAT202 Optimizing your Cassandra Database on AWS - AWS re: Invent 2012
DAT202 Optimizing your Cassandra Database on AWS - AWS re: Invent 2012DAT202 Optimizing your Cassandra Database on AWS - AWS re: Invent 2012
DAT202 Optimizing your Cassandra Database on AWS - AWS re: Invent 2012
 
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache Cassandra
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed Database
 
Evaluating Apache Cassandra as a Cloud Database
Evaluating Apache Cassandra as a Cloud DatabaseEvaluating Apache Cassandra as a Cloud Database
Evaluating Apache Cassandra as a Cloud Database
 
Best Presentation About Infosys
Best Presentation About InfosysBest Presentation About Infosys
Best Presentation About Infosys
 

Plus de jbellis

Cassandra 2.1
Cassandra 2.1Cassandra 2.1
Cassandra 2.1
jbellis
 
Tokyo cassandra conference 2014
Tokyo cassandra conference 2014Tokyo cassandra conference 2014
Tokyo cassandra conference 2014
jbellis
 
Cassandra Summit EU 2013
Cassandra Summit EU 2013Cassandra Summit EU 2013
Cassandra Summit EU 2013
jbellis
 
London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0
jbellis
 
Cassandra Summit 2013 Keynote
Cassandra Summit 2013 KeynoteCassandra Summit 2013 Keynote
Cassandra Summit 2013 Keynote
jbellis
 
Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012
jbellis
 
Top five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solutionTop five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solution
jbellis
 
State of Cassandra 2012
State of Cassandra 2012State of Cassandra 2012
State of Cassandra 2012
jbellis
 
Massively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache CassandraMassively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache Cassandra
jbellis
 
Cassandra 1.1
Cassandra 1.1Cassandra 1.1
Cassandra 1.1
jbellis
 
Pycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from JavaPycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from Java
jbellis
 
Apache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterpriseApache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterprise
jbellis
 
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
jbellis
 
Cassandra at High Performance Transaction Systems 2011
Cassandra at High Performance Transaction Systems 2011Cassandra at High Performance Transaction Systems 2011
Cassandra at High Performance Transaction Systems 2011
jbellis
 
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
jbellis
 
What python can learn from java
What python can learn from javaWhat python can learn from java
What python can learn from java
jbellis
 

Plus de jbellis (20)

Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
 
Data day texas: Cassandra and the Cloud
Data day texas: Cassandra and the CloudData day texas: Cassandra and the Cloud
Data day texas: Cassandra and the Cloud
 
Cassandra Summit 2015
Cassandra Summit 2015Cassandra Summit 2015
Cassandra Summit 2015
 
Cassandra summit keynote 2014
Cassandra summit keynote 2014Cassandra summit keynote 2014
Cassandra summit keynote 2014
 
Cassandra 2.1
Cassandra 2.1Cassandra 2.1
Cassandra 2.1
 
Tokyo cassandra conference 2014
Tokyo cassandra conference 2014Tokyo cassandra conference 2014
Tokyo cassandra conference 2014
 
Cassandra Summit EU 2013
Cassandra Summit EU 2013Cassandra Summit EU 2013
Cassandra Summit EU 2013
 
London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0
 
Cassandra Summit 2013 Keynote
Cassandra Summit 2013 KeynoteCassandra Summit 2013 Keynote
Cassandra Summit 2013 Keynote
 
Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012
 
Top five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solutionTop five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solution
 
State of Cassandra 2012
State of Cassandra 2012State of Cassandra 2012
State of Cassandra 2012
 
Massively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache CassandraMassively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache Cassandra
 
Cassandra 1.1
Cassandra 1.1Cassandra 1.1
Cassandra 1.1
 
Pycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from JavaPycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from Java
 
Apache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterpriseApache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterprise
 
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
 
Cassandra at High Performance Transaction Systems 2011
Cassandra at High Performance Transaction Systems 2011Cassandra at High Performance Transaction Systems 2011
Cassandra at High Performance Transaction Systems 2011
 
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
 
What python can learn from java
What python can learn from javaWhat python can learn from java
What python can learn from java
 

Dernier

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Dernier (20)

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

Cassandra FrOSCon 10

  • 1. Inside the Cassandra Database Jonathan Ellis jbellis@riptano.com / @spyced Saturday, August 21, 2010
  • 3. Why NoSQL? ✤ Relational databases don’t scale ✤ The relational model maps poorly to some problems ✤ Relational databases are slow Saturday, August 21, 2010
  • 4. Myth 1 ✤ “NoSQL is for people who don’t understand {SQL, denormalization, query tuning, ...}” ✤ Similarly: “Only users of [database X] are turning to NoSQL databases, because X sucks.” Saturday, August 21, 2010
  • 5. eBay: NoSQL pioneer ✤ “BASE is diametrically opposed to ACID. Where ACID is pessimistic and forces consistency at the end of every operation, BASE is optimistic and accepts that the database consistency will be in a state of flux. Although this sounds impossible to cope with, in reality it is quite manageable and leads to levels of scalability that cannot be obtained with ACID.” ✤ --BASE: An Acid Alternative, Dan Pritchett, eBay ✤ See also: The eBay Architecture, Randy Shoup and Dan Pritchett Saturday, August 21, 2010
  • 6. (“The eBay Architecture,” Randy Shoup and Dan Pritchett) Saturday, August 21, 2010
  • 7. Myth 2 ✤ “NoSQL is nothing new because we had key/value databases like bdb years ago.” Saturday, August 21, 2010
  • 8. Myth 3 ✤ “Only huge web 2.0 sites like Facebook and Twitter care about scalability.” Saturday, August 21, 2010
  • 9. Cassandra ✤ Relational databases don’t scale ✤ The relational model maps poorly to some problems ✤ Relational databases are slow Saturday, August 21, 2010
  • 16. Use cases ✤ Digital Reasoning: NLP + entity analytics ✤ OpenX: largest publisher-side ad network in the world ✤ Cloudkick: performance data & aggregation ✤ SimpleGEO: location-as-API ✤ Ooyala: video analytics and business intelligence ✤ ngmoco: massively multiplayer game worlds Saturday, August 21, 2010
  • 17. Cassandra ✤ Relational databases don’t scale ✤ The relational model maps poorly to some problems ✤ Relational databases are slow Saturday, August 21, 2010
  • 19. Cassandra memory-aware index <key 127> <row data 0> <key 255> <row data 1> ... ... <row data 127> ... <row data 255> ... Saturday, August 21, 2010
  • 20. Reader Writer Memtable Commitlog The Log-Structured Merge-Tree, Bigtable: A Distributed Storage System for Structured Data Saturday, August 21, 2010
  • 22. Durable ✤ Write to commitlog ✤ fsync is cheap since it’s append-only ✤ Write to memtable ✤ [amortized] flush memtable to sstable Saturday, August 21, 2010
  • 24. W A T L Saturday, August 21, 2010
  • 25. W A F T L Saturday, August 21, 2010
  • 26. W A F (A-L] T L Saturday, August 21, 2010
  • 27. W A (A-F] F T (F-L] L Saturday, August 21, 2010
  • 28. Key “C” W A F T L Saturday, August 21, 2010
  • 29. Reliability ✤ No single points of failure, clean design ✤ Multiple datacenters ✤ Monitorable Saturday, August 21, 2010
  • 31. Some headlines ✤ “Resyncing Broken MySQL Replication” ✤ “How To Repair MySQL Replication” ✤ “Fixing Broken MySQL Database Replication” ✤ “Replication on Linux broken after db restore” ✤ “MySQL :: Repairing broken replication” Saturday, August 21, 2010
  • 32. The opposite of heroes ✤ “If your software wakes someone up at 4 AM to fix it, you’re doing it wrong.” Saturday, August 21, 2010
  • 33. Good architecture solves multiple problems at once ✤ Availability in single datacenter ✤ Availablility in multiple datacenters ✤ Built-in repair rather than an afterthought ✤ or worse, having to restart replication entirely Saturday, August 21, 2010
  • 34. Y Key “C” A W U F T L P Saturday, August 21, 2010
  • 35. Y A W U F T L P Saturday, August 21, 2010
  • 36. Y A W U F T L P Saturday, August 21, 2010
  • 37. Y Key “C” A W U F T L P Saturday, August 21, 2010
  • 38. Y Key “C” A W U F T L P Saturday, August 21, 2010
  • 39. Y Key “C” A W U F T L P Saturday, August 21, 2010
  • 40. Y Key “C” A W U F T L P Saturday, August 21, 2010
  • 41. Y Key “C” A W U F T L P Saturday, August 21, 2010
  • 42. Y Key “C” A W U F X T L P Saturday, August 21, 2010
  • 43. Y Key “C” A W U F X T hint L P Saturday, August 21, 2010
  • 44. Y Key “C” A W U F X T hint L P Saturday, August 21, 2010
  • 45. Y A W U F T L P Saturday, August 21, 2010
  • 46. Y A W U F T L P Saturday, August 21, 2010
  • 47. Y A W U F T L P Saturday, August 21, 2010
  • 49. Eventual Tuneable consistency ✤ ONE, QUORUM, ALL ✤ R+W>N ✤ Choose availability vs consistency (and latency) Saturday, August 21, 2010
  • 52. 1500 mails sent 1125 750 375 0 Jan Feb Apr May Jun Jul (0.5) (0.5.1) Mar (0.6, 0.6.1) (0.6.2) (0.6.3) (0.6.4) Saturday, August 21, 2010
  • 53. Data model ✤ Twitter: “Fifteen months ago, it took two weeks to perform ALTER TABLE on the statuses [tweets] table.” Saturday, August 21, 2010
  • 54. ColumnFamilies Columns Saturday, August 21, 2010
  • 56. Cassandra “indexes” are materialized views SELECT * FROM tweets WHERE user_id IN (SELECT follower FROM followers WHERE user_id = ?) ORDER BY time_tweeted DESC LIMIT 40 followers timeline ? SORT ? uuid:tweet tweets Saturday, August 21, 2010
  • 57. Analytics in Cassandra ✤ @afex: “Cassandra + Pig (Hadoop) is very exciting. A 7 line script to analyze data from my entire cluster transparently, with no ETL? Yes, please” Saturday, August 21, 2010
  • 58. TaskTracker JobTracker Saturday, August 21, 2010
  • 59. Coming in 0.7 [beta1 released 13/8] ✤ Secondary indexes and expressions ✤ Large (> 2GB) rows ✤ More control over replica placement ✤ Online schema changes ✤ Flow control ✤ Dynamic routing around slow nodes Saturday, August 21, 2010
  • 60. More ✤ http://wiki.apache.org/cassandra/ ArticlesAndPresentations ✤ http://wiki.apache.org/cassandra/ArchitectureInternals Saturday, August 21, 2010