Cassandra from the trenches:
      migrating Netflix
          Jason Brown
    Senior Software Engineer
             Netflix
       @jasobrown jasedbrown@gmail.com

      http://www.linkedin.com/in/jasedbrown
History, 2008
• In the beginning, there was the webapp
  – And a database
  – In one datacenter
• Then we grew, and grew, and grew
  – More databases, all conjoined
  – Database links, PL/SQL, Materialized views
  – Multi-Master replication (MMR)
• Then it melted down
  – Couldn’t ship DVDs for ~3 days
History, 2009
• Time to rethink everything
  – Abandon our datacenter
  – Ditch the monolithic webapp
  – Migrate single point of failure database to …
History, 2010
• SimpleDB/S3
  – Managed by Amazon, not us
  – Got us started with NoSQL in the cloud
  – Problems:
     • High latency, rate limiting (throttling)
     • (no) auto-sharding, no backups
Shiny new toy (2011)
• We switched to Cassandra
  – Similar to SimpleDB, with limits removed
  – Dynamo-model appealed to us
  – Column-based, key-value data model seemed
    sufficient for most needs
  – Performance looked great (rudimentary tests)
Data Modeling -
  Where the rubber meets the road
About Netflix’s AB Testing
• Basic concepts
  – Test – An experiment where several competing
    behaviors are implemented and compared
  – Cell – different experiences within a test that are
    being compared against each other
  – Allocation – a customer-specific assignment to a
    cell within a test
Data Modeling - background
• AB has two sets of data
  – metadata about tests
  – allocations
AB - allocations
• Single table to hold allocations
  – Currently at > 1 billion records
  – Plus indices!
• One record for every test that every customer
  is allocated into
• Unique constraint on customer/test
AB – relational model
• Typical parent-child table relationship
• Not updated frequently, so service can cache
Data modeling in Cassandra
• Everywhere I looked, the Internet told me to
  understand my data use patterns

• Identify the questions that you need to
  answer from the data

• Know how to query your data set and make
  the persistence model match
Identifying the AB questions that need
            to be answered
• High traffic
  – get all allocations for a customer
• Low traffic
  – get count of customers in test/cell
  – find all customers in a test/cell
  – find all customers in a test who were added within
    a date range
Modeling allocations in Cassandra
• Read all allocations for a customer
  – as fast as possible
• Find all customers in a test/cell
  – reverse index
• Get count of customers in test/cell
  – count the entries in the reverse index
Denormalization - HOWTO
• No real world examples
  – ‘Normalization is for sissies’, Pat Helland


• Denormalize allocations per customer
  – Trivial with a schema-less database
Denormalized allocations
• normalized data




• denormalized (sparse) data
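
The slide's diagrams did not survive extraction, so here is a minimal sketch of the idea, assuming hypothetical customer ids and the `<testId>:<field>` naming used later in the deck. It folds one-record-per-allocation data into a single sparse row per customer; the names `normalized` and `denormalize` are illustrative only.

```python
from collections import defaultdict

# Hypothetical normalized records: (customer, test id, cell, enabled),
# one record for every test a customer is allocated into.
normalized = [
    ("cust-1", 42, 2, "Y"),
    ("cust-1", 47, 0, "Y"),
    ("cust-2", 42, 1, "Y"),
]


def denormalize(rows):
    """Fold all allocations of a customer into one sparse, wide row."""
    wide = defaultdict(dict)
    for customer, test_id, cell, enabled in rows:
        wide[customer][f"{test_id}:cell"] = cell
        wide[customer][f"{test_id}:enabled"] = enabled
    return dict(wide)


allocations = denormalize(normalized)
```

Each customer row only carries columns for the tests that customer is in, which is what makes the data sparse: `cust-2` has no `47:*` columns at all.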
Implementing allocations
• An allocation for a customer has a handful of
  data points, so they can logically be grouped
  together

• Avoided blobs, json or otherwise

• Using a standard column family, with
  composite columns
Composite columns
• Composite columns are sorted by each ‘token’
  in name

• Allocation column naming convention
  – <testId>:<field>
  – 42:cell = 2
  – 42:enabled = Y
  – 47:cell = 0
  – 47:enabled = Y
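
A rough sketch of why the sort order matters, assuming composite names can be modeled as `(testId, field)` tuples (the real ordering depends on the comparator configured for the column family). Because columns sort component by component, all fields of one test sit next to each other and can be read as one contiguous slice. `columns_for_test` is a hypothetical helper, not a client API.

```python
from bisect import bisect_left

# One customer's row; composite column names modeled as (testId, field).
row = {
    (47, "enabled"): "Y",
    (42, "cell"): 2,
    (47, "cell"): 0,
    (42, "enabled"): "Y",
}

# Sorted by the first component (testId), then by the second (field).
sorted_names = sorted(row)


def columns_for_test(test_id):
    """Return the contiguous slice of columns whose first component is test_id."""
    start = bisect_left(sorted_names, (test_id,))
    end = bisect_left(sorted_names, (test_id + 1,))
    return [(name, row[name]) for name in sorted_names[start:end]]
```

Reading the whole row in order returns every allocation for the customer, which is the high-traffic query identified earlier.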
Modeling AB metadata in Cassandra
• Explored several models, including json
  blobs, spreading across multiple CFs, differing
  degrees of denormalization
• Reverse index to identify all tests for loading
Implementing metadata
• One CF, one row for all of a test’s data
  – Every data point is a column – no blobs
• Composite columns
  – type:id:field
     • Types = base info, cells, allocation plans
     • Id = cell number, allocation plan (gu)id
     • Field = type-specific
        – Base info = test name, description, enabled
        – Cell’s name / description
        – Plan’s start/end dates, country to allocate to
Implementing indices
• Cassandra’s secondary indices vs. hand-built
  and maintained alternate indices

• Secondary indices work great on uniform data
  between rows

• But sparse column data not easy to index
Hand-built Indices, 1

• Reverse index
  – Test/cell (key) to custIds (columns)
     • Column value is timestamp
• Updating index when allocating a customer
  into test (double write)
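
The double write can be sketched with two in-memory dictionaries standing in for the two column families. This is an illustration under assumed names (`allocate`, `customers_in`, `customers_added_between`) and integer timestamps, not the service's actual code.

```python
allocations = {}    # customer id -> {"<testId>:<field>": value}
reverse_index = {}  # (test id, cell) -> {customer id: allocation timestamp}


def allocate(customer_id, test_id, cell, timestamp):
    # Write 1: the customer's own allocation row.
    row = allocations.setdefault(customer_id, {})
    row[f"{test_id}:cell"] = cell
    row[f"{test_id}:enabled"] = "Y"
    # Write 2: the reverse index, keyed by test/cell, column value = timestamp.
    reverse_index.setdefault((test_id, cell), {})[customer_id] = timestamp


def customers_in(test_id, cell):
    """Find all customers in a test/cell by reading one index row."""
    return sorted(reverse_index.get((test_id, cell), {}))


def customers_added_between(test_id, start, end):
    """Find customers added to any cell of a test within [start, end]."""
    found = set()
    for (tid, _cell), entries in reverse_index.items():
        if tid == test_id:
            found.update(c for c, ts in entries.items() if start <= ts <= end)
    return sorted(found)


allocate("cust-1", 42, 2, 100)
allocate("cust-2", 42, 2, 200)
allocate("cust-3", 42, 1, 300)
```

The two writes are not atomic in this sketch, so a failure between them leaves the index out of step with the customer row, which is one reason a hand-built index may need rebuilding.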
Hand-built indices, 2
• Counter column family
  – Test/cell to count of customers in test columns
  – Mutate on allocating a customer into test
• Counters are not idempotent!
• Mutates need to write to every node that
  hosts that key
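
To illustrate the idempotency point, the sketch below simulates a client that retries a request because the first attempt timed out even though it had been applied. The key `"42:2"` and both helper functions are hypothetical; the plain dictionary only stands in for a counter column.

```python
counter = {"42:2": 0}      # test/cell -> count of customers
members = {"42:2": set()}  # test/cell -> customer ids (reverse-index style)


def increment(key):
    # Not idempotent: every application changes the result.
    counter[key] += 1


def add_member(key, customer_id):
    # Idempotent: applying the same write twice leaves the same state.
    members[key].add(customer_id)


# The first attempt succeeds but the client sees a timeout and retries.
for _attempt in range(2):
    increment("42:2")
    add_member("42:2", "cust-1")
```

After the retry the counter reports two customers while the index still holds one, so counting index entries stays correct where the counter drifts.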
Index rebuilding
• To keep the index consistent, it needs to be
  rebuilt occasionally
• Even Oracle needs to have its indices rebuilt
Into the real world
Cassandra Java clients
• Hector
  – github.com/rantav/hector
• Astyanax
  – Developed at Netflix (Eran Landau)
  – github.com/netflix
• Cassie (Scala)
  – Developed at Twitter
  – https://github.com/twitter/cassie
Astyanax features
•   Clean object model
•   Node discovery
•   Node quarantine
•   Request failover/retry
•   JMX Monitoring
•   Connection pooling
•   Future execution
Astyanax code example, 1
Astyanax code example, 2
Astyanax code example, 3
Astyanax connection pools, 1
• Round Robin uses coordinator node
Astyanax connection pooling, 2
• Token aware knows where the data resides for
  point reads
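
The difference between the two pool types can be sketched as follows. This is not Astyanax code: the three-node ring, its tokens, and the MD5-based `token_for` function are assumptions made for illustration, loosely modeled on hash-partitioned rings.

```python
import hashlib
from bisect import bisect_left
from itertools import cycle

# Hypothetical ring: each node owns the token range ending at its token.
RING = [
    (2**127 // 3, "node-a"),
    (2 * 2**127 // 3, "node-b"),
    (2**127 - 1, "node-c"),
]
TOKENS = [token for token, _host in RING]


def token_for(key):
    """Hash a row key onto the ring (assumed MD5-based partitioning)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**127


def token_aware_host(key):
    """Pick the node that owns the key, so no extra coordinator hop is needed."""
    index = bisect_left(TOKENS, token_for(key)) % len(RING)
    return RING[index][1]


_round_robin = cycle(host for _token, host in RING)


def round_robin_host():
    """Pick the next node in turn; it coordinates and forwards the request."""
    return next(_round_robin)
```

With round robin, the chosen node usually has to forward the request to a replica; the token-aware choice sends a point read straight to a node that holds the row.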
Astyanax latency aware
• Samples response times from Cassandra
  nodes
• Favors faster responding nodes in pool
• Use with token aware connection pooling
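
One way to picture latency awareness is a per-node moving average of sampled response times, with the pool preferring the lowest score among candidate nodes. The class below is a sketch under that assumption; `LatencyTracker`, its `alpha` smoothing factor, and the scoring rule are hypothetical and not taken from Astyanax.

```python
class LatencyTracker:
    """Keeps an exponentially weighted moving average of latency per host."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # weight given to the newest sample
        self.scores = {}    # host -> smoothed latency in milliseconds

    def record(self, host, latency_ms):
        previous = self.scores.get(host)
        if previous is None:
            self.scores[host] = latency_ms
        else:
            self.scores[host] = self.alpha * latency_ms + (1 - self.alpha) * previous

    def pick(self, candidates):
        # Hosts without samples score 0.0 so that they get tried at least once.
        return min(candidates, key=lambda host: self.scores.get(host, 0.0))
```

Combined with token awareness, `candidates` would be the replicas that own the key, so a replica that is slow (for example, while compacting) is passed over in favor of a faster one.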
Allocation mutates
• AB allocations are immutable, so we need to
  prevent mutating
• Oracle - unique table constraint
• Cassandra - read before write
  – data race!
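
The race can be shown with a deterministic interleaving of two clients that both follow read-before-write. The dictionary `store` and the helper names are stand-ins for illustration; there is no real concurrency here, only the ordering that causes the problem.

```python
store = {}  # (customer id, test id) -> cell


def read(customer_id, test_id):
    return store.get((customer_id, test_id))


def write(customer_id, test_id, cell):
    store[(customer_id, test_id)] = cell


# Both clients read before either one writes.
seen_by_a = read("cust-1", 42)
seen_by_b = read("cust-1", 42)

# Each client saw no allocation, so each believes it is safe to write.
if seen_by_a is None:
    write("cust-1", 42, 1)
if seen_by_b is None:
    write("cust-1", 42, 2)  # silently overwrites client A's allocation

final_cell = read("cust-1", 42)
```

Neither client sees an error, yet the supposedly immutable allocation changed from cell 1 to cell 2, which a unique constraint in Oracle would have rejected.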
Running Cassandra
• Compactions happen
  – how Cassandra is maintained
  – Mutations are written to memory (Memtable)
  – Flushed to disk (SSTable) on triggering threshold
  – Eventually, Cassandra merges SSTables as data for
    individual rows becomes scattered
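
The write path described above can be sketched as a toy store. It is a simplification under stated assumptions: the flush threshold counts rows rather than bytes, SSTables are plain dictionaries, and `TinyStore` is a hypothetical name, not a Cassandra class.

```python
class TinyStore:
    """Toy memtable / SSTable / compaction model."""

    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold  # rows held before flushing
        self.memtable = {}                      # row key -> {column: value}
        self.sstables = []                      # flushed tables, oldest first

    def write(self, row_key, column, value):
        self.memtable.setdefault(row_key, {})[column] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.memtable:
            self.sstables.append(self.memtable)
            self.memtable = {}

    def read(self, row_key):
        # A row may be scattered over several SSTables; newer data wins.
        merged = {}
        for table in self.sstables + [self.memtable]:
            merged.update(table.get(row_key, {}))
        return merged

    def compact(self):
        # Merge every SSTable so each row lives in one place again.
        merged = {}
        for table in self.sstables:
            for row_key, columns in table.items():
                merged.setdefault(row_key, {}).update(columns)
        self.sstables = [merged] if merged else []
```

Before compaction a read has to consult every SSTable that might hold part of the row; afterwards it touches one, which is why compaction helps reads even though it costs I/O while it runs.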
Compactions, 2
• Latency spikes happen, especially on read-heavy
  systems
  – Everything can slow down
  – Throttling in newer Cassandra versions helps
  – Astyanax avoids this problem with latency
    awareness
Tunings, 1
• Key and row caches
  – Left unbounded can consume JVM memory
    needed for normal work
  – Latencies will spike as the JVM fights for free
    memory
  – Off-heap row cache is better but still maintains
    data structures on-heap
Tunings, 2
• mmap() as in-memory cache
  – When the Cassandra process is terminated, mmap
    pages are returned to the free list
• Row cache helps at startup
Tunings, 3
• Sizing memtable flushes for optimizing
  compactions
  – Easier when writes are uniformly distributed
    over time, since flush patterns are then simpler
    to reason about
  – Best to optimize flushes based on memtable
    size, not time
Tunings, 4
• Sharding
  – If a single row has disproportionately high
    gets/mutates, the nodes holding it will become
    hot spots
  – If a row grows too large, it can’t fit into memory
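
One common remedy, offered here as an assumption rather than something the slides spell out, is to split a hot or oversized row across several row keys by appending a bucket number. The shard count and the helper names below are hypothetical.

```python
import zlib

NUM_SHARDS = 4  # hypothetical; chosen per column family in practice


def shard_key(test_id, cell, customer_id):
    """Row key for the index shard that holds this customer's entry."""
    bucket = zlib.crc32(customer_id.encode()) % NUM_SHARDS
    return f"{test_id}:{cell}:{bucket}"


def all_shard_keys(test_id, cell):
    """Every row key a reader must fetch to see the whole test/cell."""
    return [f"{test_id}:{cell}:{bucket}" for bucket in range(NUM_SHARDS)]
```

Writes spread over several rows, and therefore over more nodes, at the cost of reads having to fetch and merge all shards.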
Takeaways
• Netflix is making all of our components
  distributed and fault tolerant as we grow
  domestically and internationally.

• Cassandra is a core piece of our cloud
  infrastructure.

• Netflix is open sourcing its cloud
  platform, including Cassandra support
終わり(The End)


• Q&A



        @jasobrown jasedbrown@gmail.com

        http://www.linkedin.com/in/jasedbrown
References
• Pat Helland, ‘Normalization Is for Sissies’
  http://blogs.msdn.com/b/pathelland/archive/2007/07/23/normalization-is-for-sissies.aspx
