Cassandra at Spotify




                       7th of March 2012
About this talk
  An introduction to Spotify, our service, and our persistent storage needs
  What Cassandra brings
  What we have learned
  What I would have liked to have known a year ago


  Not a comparison between different NoSQL solutions
  Not a hands-on introduction to Cassandra
  We work with physical hardware for production
Noa Resare
  Stockholm, Sweden
  Service Reliability Engineering
  noa@spotify.com
  @blippie
Spotify — all music, all the time
  A better user experience than file sharing.
  Native desktop and mobile clients.
  Custom backend, built for performance and scalability.


  12 markets. More than ten million users.
  3 datacenters.
  Tens of gigabits of data pushed per datacenter.
  Backend systems that support a large set of innovative features.
Innovative features in practice
   Playlist
   Should be simple, right?
   A named list of tracks
   It gets more complicated
   Keep multiple devices in sync
   Support nested playlists
   Offline editing on multiple devices
   Changes pushed to connected devices
   Scale: more than half a billion lists currently in the system
   About 10 kHz at peak traffic
   Resulting storage requirements:
   Full history
   Really fast access to latest version number and content
Suggested solutions
  Flat files
  We don’t need ACID
  Linux page cache kicks ass.
  (Not really)
  SQL
  Tried and true. Facebook does this
  Simple Key-Value store
  Tokyo Cabinet, some experience
  Clustered Key-Value store
  Evaluated a lot; the final contestants were HBase and Cassandra
Enter Cassandra
  Solves a large subset of storage related problems
  Sharding, replication
  No single point of failure
  Ability to make the performance/reliability tradeoff per request
  Free software
  Active community, commercial backing




  66 + 18 + 9 + 28 production nodes
  About twenty nodes for various testing clusters
  Datasets ranging from 8 TB down to a few gigabytes.
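The per-request performance/reliability tradeoff mentioned above comes down to quorum arithmetic: with replication factor rf, a read that waits for r replicas and a write that waits for w replicas are guaranteed to overlap when r + w > rf. A minimal sketch (class and method names are ours, not Cassandra's API):

```java
// Illustrates the consistency arithmetic behind per-request tunable
// consistency levels. With replication factor rf, reads touching r
// replicas and writes touching w replicas overlap iff r + w > rf.
public class ConsistencyMath {
    /** True if every read is guaranteed to see the latest acknowledged write. */
    public static boolean isStronglyConsistent(int rf, int r, int w) {
        return r + w > rf;
    }
}
```

For example, QUORUM reads plus QUORUM writes with RF=3 give 2 + 2 > 3 (consistent), while ONE + ONE gives 1 + 1 ≤ 3: fast, but a read may see stale data until anti-entropy catches up.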
Cassandra key concepts, on a node
  Log structured storage
  Sorted string table — SSTable
  Immutable files on disk
  Compaction — Many to one, merge sort




                  Memtable  (in memory)
                      |
                      v  flush
            SSTable   SSTable   SSTable  (immutable, on disk)
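The write path above can be sketched in a few lines. This is a toy model (names are ours, not Cassandra's code): writes land in a sorted in-memory map (the memtable), a flush snapshots it into an immutable key-sorted table (an SSTable), and compaction merges many tables into one, with the newest value per key winning; a TreeMap stands in for the streaming merge sort.

```java
import java.util.*;

// Toy sketch of log-structured storage: memtable -> flush -> SSTables,
// then many-to-one compaction with last-write-wins semantics.
public class LogStructured {
    /** Flush: snapshot the sorted memtable into an immutable, key-ordered table. */
    public static List<Map.Entry<String, String>> flush(NavigableMap<String, String> memtable) {
        return new ArrayList<>(memtable.entrySet()); // already sorted by key
    }

    /** Compaction: merge SSTables oldest-to-newest; later tables overwrite earlier keys. */
    public static List<Map.Entry<String, String>> compact(
            List<List<Map.Entry<String, String>>> sstables) {
        NavigableMap<String, String> merged = new TreeMap<>();
        for (List<Map.Entry<String, String>> table : sstables)
            for (Map.Entry<String, String> e : table)
                merged.put(e.getKey(), e.getValue()); // newest value wins
        return new ArrayList<>(merged.entrySet());
    }
}
```

Because SSTables are immutable and sorted, writes are pure sequential I/O, and a read needs at most one seek per table that might hold the key.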
Cassandra key concepts, in a cluster
  Clusters of nodes form a ring, ordered by key
  All data is typically written to several nodes (the Replication Factor)
  Rings can be expanded in production
  Gossip detects nodes being up / down / joining
  Anti-entropy mechanisms
  Many read operations can be done sequentially
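Replica placement on the ring can be sketched like this (our illustration, not Cassandra's code): each node owns a token position, and a key lives on the first Replication Factor nodes at or after its token, wrapping around the end of the ring.

```java
import java.util.*;

// Toy consistent-hash ring: nodes sit at token positions, and a key's
// replicas are the first `replicationFactor` distinct nodes found by
// walking clockwise from the key's token, wrapping around.
public class TokenRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public void addNode(int token, String name) {
        ring.put(token, name);
    }

    public List<String> replicasFor(int keyToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        Iterator<String> it = ring.tailMap(keyToken).values().iterator();
        while (replicas.size() < replicationFactor && replicas.size() < ring.size()) {
            if (!it.hasNext()) it = ring.values().iterator(); // wrap past the end
            String node = it.next();
            if (!replicas.contains(node)) replicas.add(node);
        }
        return replicas;
    }
}
```

This is why rings can be expanded in production: adding a node at a new token only moves the keys between it and its predecessor, leaving the rest of the ring untouched.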
Cassandra, winning!
  Major upgrades without service interruptions (in theory)
  Crazy fast writes
  Not just because you have a hardware RAID card that is good at lying to you
  Somewhat predictable number of seeks needed for read
  Knows that sequential I/O is faster than random I/O
  In case of inconsistencies, knows what to do
  Replacing broken nodes straightforward
  Cross datacenter replication support
  Tinker friendly
  Readable code
Let me tell you a story
  Latest stable kernel from Debian Squeeze 2.6.32-5
  What happens after 209 days of uptime?
  Load average around 120.
  No CPU activity reported by top

   Mattias de Zalenski:

   log((209 days) / (1 nanosecond)) / log(2) = 54.0034557

   (2^54) nanoseconds = 208.499983 days

   Somewhere nanosecond values are shifted ten bits?




  Downtime for payment
  Downtime for account creation
  No downtime for cassandra backed systems
Backups
  A few terabytes of live data, many nodes. Painful.
  Inefficient: a copy of the on-disk structure, at least 3 times the data
  Non-compacted. Possibly a few tens of old versions.
  Initially, only full backups (pre 0.8)
Our solution to backups
  Separate datacenter for backups with RF=1
  Beware: tricky
  Once removed from production performance considerations
  Application level incremental backups
  Soon: Cassandra incremental backups
Solid state is a game changer
  Large datasets, light read load
  Small datasets, heavy read load
  I Can Haz superlarge SSD?
  No.
  With small disks, on-disk data structure size matters a lot




  Our plan:
  Leveled compaction strategy, new in 1.0
  Hack Cassandra to support configurable data directories per keyspace.
  Our patch is integrated in Cassandra 1.1
Some unpleasant surprises
  Immaturity
  Hector: mutations larger than 15 MB caused connection drops in Thrift
  Broken on-disk Bloom filters in 0.8 made for a very painful upgrade to 1.0
  Small disk, high load, very possible to get into an Out Of Disk condition
  Logging is lacking
Spot the bug
  Hector java cassandra driver:
  private AtomicInteger counter = new AtomicInteger();

  private Server getNextServer() {
      counter.compareAndSet(16384, 0);
      return servers[counter.getAndIncrement() % servers.length];
  }


  Race condition
  java.lang.ArrayIndexOutOfBoundsException
  After close to 2**31 requests
  Took about 5 days
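The bug: the compareAndSet reset can be skipped past under concurrency, so after roughly 2^31 increments the counter overflows to negative values, the modulo goes negative, and the array index throws. A race-free variant might look like this (our sketch; Hector's actual fix may differ): never reset the counter, just map the increment to a non-negative index.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Race-free round-robin server selection. Instead of trying to reset the
// counter (racy), let it wrap freely and use floorMod, which returns a
// non-negative remainder even for negative (overflowed) counter values.
public class RoundRobin {
    private final AtomicInteger counter = new AtomicInteger();
    private final String[] servers;

    public RoundRobin(String[] servers) {
        this.servers = servers;
    }

    public String getNextServer() {
        // floorMod handles Integer.MIN_VALUE .. -1 after the counter wraps
        int i = Math.floorMod(counter.getAndIncrement(), servers.length);
        return servers[i];
    }
}
```

The single getAndIncrement is the only shared-state operation, so there is no window between a check and an update for another thread to slip through.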
Conclusions
  In the 0.6–1.0 timeframe, both development engineers and operations are needed
  You need to keep an eye on bugs filed and be part of the community
  Exotic stuff (such as asymmetrically sized datacenters) is tricky
  Lots of things get fixed; you need to keep up with upstream
  You need to integrate with monitoring and graphing




  Consider it a toolkit for constructing solutions.
Questions? Answers.

Colin Carter - LSPs and APIs
 

Recently uploaded

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 

Recently uploaded (20)

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

Cassandra nyc

  • 1. Cassandra at Spotify 7th of March 2012
  • 3. About this talk An introduction to Spotify, our service and our persistent storage needs
  • 4. About this talk An introduction to Spotify, our service and our persistent storage needs What Cassandra brings
  • 5. About this talk An introduction to Spotify, our service and our persistent storage needs What Cassandra brings What we have learned
  • 6. About this talk An introduction to Spotify, our service and our persistent storage needs What Cassandra brings What we have learned What I would have liked to have known a year ago
  • 7. About this talk An introduction to Spotify, our service and our persistent storage needs What Cassandra brings What we have learned What I would have liked to have known a year ago
  • 8. About this talk An introduction to Spotify, our service and our persistent storage needs What Cassandra brings What we have learned What I would have liked to have known a year ago Not a comparison between different NoSQL solutions
  • 9. About this talk An introduction to Spotify, our service and our persistent storage needs What Cassandra brings What we have learned What I would have liked to have known a year ago Not a comparison between different NoSQL solutions Not a hands-on introduction to Cassandra
  • 10. About this talk An introduction to Spotify, our service and our persistent storage needs What Cassandra brings What we have learned What I would have liked to have known a year ago Not a comparison between different NoSQL solutions Not a hands-on introduction to Cassandra We work with physical hardware for production
  • 12. Noa Resare Stockholm, Sweden
  • 13. Noa Resare Stockholm, Sweden Service Reliability Engineering
  • 14. Noa Resare Stockholm, Sweden Service Reliability Engineering noa@spotify.com
  • 15. Noa Resare Stockholm, Sweden Service Reliability Engineering noa@spotify.com @blippie
  • 16. Spotify — all music, all the time
  • 17. Spotify — all music, all the time A better user experience than file sharing.
  • 18. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients.
  • 19. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients. Custom backend, built for performance and scalability.
  • 20. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients. Custom backend, built for performance and scalability.
  • 21. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients. Custom backend, built for performance and scalability. 12 markets. More than ten million users.
  • 22. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients. Custom backend, built for performance and scalability. 12 markets. More than ten million users. 3 datacenters.
  • 23. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients. Custom backend, built for performance and scalability. 12 markets. More than ten million users. 3 datacenters. Tens of gigabits of data pushed per datacenter.
  • 24. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients. Custom backend, built for performance and scalability. 12 markets. More than ten million users. 3 datacenters. Tens of gigabits of data pushed per datacenter. Backend systems that support a large set of innovative features.
  • 26. Innovative features in practice Playlist
  • 27. Innovative features in practice Playlist Should be simple, right?
  • 28. Innovative features in practice Playlist Should be simple, right? A named list of tracks
  • 29. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated
  • 30. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated Keep multiple devices in sync
  • 31. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated Keep multiple devices in sync Support nested playlists
  • 32. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated Keep multiple devices in sync Support nested playlists Offline editing on multiple devices
  • 33. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated Keep multiple devices in sync Support nested playlists Offline editing on multiple devices Changes pushed to connected devices
  • 34. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated Keep multiple devices in sync Support nested playlists Offline editing on multiple devices Changes pushed to connected devices Scale. More than half a billion lists currently in the system
  • 35. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated Keep multiple devices in sync Support nested playlists Offline editing on multiple devices Changes pushed to connected devices Scale. More than half a billion lists currently in the system About 10 kHz at peak traffic.
  • 36. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated Keep multiple devices in sync Support nested playlists Offline editing on multiple devices Changes pushed to connected devices Scale. More than half a billion lists currently in the system About 10 kHz at peak traffic. Resulting storage requirements:
  • 37. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated Keep multiple devices in sync Support nested playlists Offline editing on multiple devices Changes pushed to connected devices Scale. More than half a billion lists currently in the system About 10 kHz at peak traffic. Resulting storage requirements: Full history
  • 38. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated Keep multiple devices in sync Support nested playlists Offline editing on multiple devices Changes pushed to connected devices Scale. More than half a billion lists currently in the system About 10 kHz at peak traffic. Resulting storage requirements: Full history Really fast access to latest version number and content
  • 39. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated Keep multiple devices in sync Support nested playlists Offline editing on multiple devices Changes pushed to connected devices Scale. More than half a billion lists currently in the system About 10 kHz at peak traffic. Resulting storage requirements: Full history Really fast access to latest version number and content
  • 41. Suggested solutions Flat files
  • 42. Suggested solutions Flat files We don’t need ACID
  • 43. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass.
  • 44. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really)
  • 45. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really) SQL
  • 46. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really) SQL Tried and true. Facebook does this
  • 47. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really) SQL Tried and true. Facebook does this Simple Key-Value store
  • 48. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really) SQL Tried and true. Facebook does this Simple Key-Value store Tokyo Cabinet, some experience
  • 49. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really) SQL Tried and true. Facebook does this Simple Key-Value store Tokyo Cabinet, some experience Clustered Key-Value store
  • 50. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really) SQL Tried and true. Facebook does this Simple Key-Value store Tokyo Cabinet, some experience Clustered Key-Value store Evaluated a lot, end-game contestants HBase and Cassandra
  • 52. Enter Cassandra Solves a large subset of storage related problems
  • 53. Enter Cassandra Solves a large subset of storage related problems Sharding, replication
  • 54. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure
  • 55. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Ability to make the performance/reliability tradeoff per request
  • 56. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Ability to make the performance/reliability tradeoff per request Free software
  • 57. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Ability to make the performance/reliability tradeoff per request Free software Active community, commercial backing
  • 58. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Ability to make the performance/reliability tradeoff per request Free software Active community, commercial backing
  • 59. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Ability to make the performance/reliability tradeoff per request Free software Active community, commercial backing
  • 60. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Ability to make the performance/reliability tradeoff per request Free software Active community, commercial backing 66 + 18 + 9 + 28 production nodes
  • 61. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Ability to make the performance/reliability tradeoff per request Free software Active community, commercial backing 66 + 18 + 9 + 28 production nodes About twenty nodes for various testing clusters
  • 62. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Ability to make the performance/reliability tradeoff per request Free software Active community, commercial backing 66 + 18 + 9 + 28 production nodes About twenty nodes for various testing clusters Datasets ranging from 8T to a few gigs.
  • 63. Cassandra key concepts, on a node Log structured storage Sorted string table — SSTable Immutable files on disk Compaction — Many to one, merge sort Memtable SSTable SSTable SSTable
  • 64. Cassandra key concepts, In a cluster Clusters of nodes in a ring by key order All data typically written to several nodes, Replication Factor Rings can be expanded in production Gossip, detects nodes being up / down / joining Anti Entropy mechanisms Many read operations can be done sequentially
  • 66. Cassandra, winning! Major upgrades without service interruptions (in theory)
  • 67. Cassandra, winning! Major upgrades without service interruptions (in theory) Crazy fast writes
  • 68. Cassandra, winning! Major upgrades without service interruptions (in theory) Crazy fast writes Not just because you have a hardware RAID card that is good at lying to you
  • 69. Cassandra, winning! Major upgrades without service interruptions (in theory) Crazy fast writes Not just because you have a hardware RAID card that is good at lying to you Somewhat predictable number of seeks needed for read
  • 70. Cassandra, winning! Major upgrades without service interruptions (in theory) Crazy fast writes Not just because you have a hardware RAID card that is good at lying to you Somewhat predictable number of seeks needed for read Knows that sequential I/O faster than random I/O
  • 71. Cassandra, winning! Major upgrades without service interruptions (in theory) Crazy fast writes Not just because you have a hardware RAID card that is good at lying to you Somewhat predictable number of seeks needed for read Knows that sequential I/O faster than random I/O In case of inconsistencies, knows what to do
  • 72. Cassandra, winning! Major upgrades without service interruptions (in theory) Crazy fast writes Not just because you have a hardware RAID card that is good at lying to you Somewhat predictable number of seeks needed for read Knows that sequential I/O faster than random I/O In case of inconsistencies, knows what to do Replacing broken nodes straightforward
  • 73. Cassandra, winning! Major upgrades without service interruptions (in theory) Crazy fast writes Not just because you have a hardware RAID card that is good at lying to you Somewhat predictable number of seeks needed for read Knows that sequential I/O faster than random I/O In case of inconsistencies, knows what to do Replacing broken nodes straightforward Cross datacenter replication support
  • 74. Cassandra, winning! Major upgrades without service interruptions (in theory) Crazy fast writes Not just because you have a hardware RAID card that is good at lying to you Somewhat predictable number of seeks needed for read Knows that sequential I/O faster than random I/O In case of inconsistencies, knows what to do Replacing broken nodes straightforward Cross datacenter replication support Tinker friendly
  • 75. Cassandra, winning! Major upgrades without service interruptions (in theory) Crazy fast writes Not just because you have a hardware RAID card that is good at lying to you Somewhat predictable number of seeks needed for read Knows that sequential I/O faster than random I/O In case of inconsistencies, knows what to do Replacing broken nodes straightforward Cross datacenter replication support Tinker friendly Readable code
  • 76. Cassandra, winning! Major upgrades without service interruptions (in theory) Crazy fast writes Not just because you have a hardware RAID card that is good at lying to you Somewhat predictable number of seeks needed for read Knows that sequential I/O faster than random I/O In case of inconsistencies, knows what to do Replacing broken nodes straightforward Cross datacenter replication support Tinker friendly Readable code
  • 77. Let me tell you a story
  • 78. Let me tell you a story Latest stable kernel from Debian Squeeze 2.6.32-5
  • 79. Let me tell you a story Latest stable kernel from Debian Squeeze 2.6.32-5 What happens after 209 days of uptime?
  • 80. Let me tell you a story Latest stable kernel from Debian Squeeze 2.6.32-5 What happens after 209 days of uptime? Load average around 120.
  • 81. Let me tell you a story Latest stable kernel from Debian Squeeze 2.6.32-5 What happens after 209 days of uptime? Load average around 120. No CPU activity reported by top
  • 82. Let me tell you a story Latest stable kernel from Debian Squeeze 2.6.32-5 What happens after 209 days of uptime? Load average around 120. No CPU activity reported by top Mattias de Zalenski: log((209 days) / (1 nanoseconds)) / log(2) = 54.0034557 (2^54) nanoseconds = 208.499983 days Somewhere nanosecond values are shifted ten bits?
  • 83. Let me tell you a story Latest stable kernel from Debian Squeeze 2.6.32-5 What happens after 209 days of uptime? Load average around 120. No CPU activity reported by top Mattias de Zalenski: log((209 days) / (1 nanoseconds)) / log(2) = 54.0034557 (2^54) nanoseconds = 208.499983 days Somewhere nanosecond values are shifted ten bits? Downtime for payment
  • 84. Let me tell you a story Latest stable kernel from Debian Squeeze 2.6.32-5 What happens after 209 days of uptime? Load average around 120. No CPU activity reported by top Mattias de Zalenski: log((209 days) / (1 nanoseconds)) / log(2) = 54.0034557 (2^54) nanoseconds = 208.499983 days Somewhere nanosecond values are shifted ten bits? Downtime for payment Downtime for account creation
  • 85. Let me tell you a story Latest stable kernel from Debian Squeeze 2.6.32-5 What happens after 209 days of uptime? Load average around 120. No CPU activity reported by top Mattias de Zalenski: log((209 days) / (1 nanoseconds)) / log(2) = 54.0034557 (2^54) nanoseconds = 208.499983 days Somewhere nanosecond values are shifted ten bits? Downtime for payment Downtime for account creation No downtime for cassandra backed systems
  • 87. Backups A few terabytes of live data, many nodes. Painful.
  • 88. Backups A few terabytes of live data, many nodes. Painful. Inefficient. Copy of on disk structure, at least 3 times the data
  • 89. Backups A few terabytes of live data, many nodes. Painful. Inefficient. Copy of on disk structure, at least 3 times the data Non-compacted. Possibly a few tens of old versions.
  • 90. Backups A few terabytes of live data, many nodes. Painful. Inefficient. Copy of on disk structure, at least 3 times the data Non-compacted. Possibly a few tens of old versions. Initially, only full backups (pre 0.8)
  • 91. Backups A few terabytes of live data, many nodes. Painful. Inefficient. Copy of on disk structure, at least 3 times the data Non-compacted. Possibly a few tens of old versions. Initially, only full backups (pre 0.8)
  • 92. Our solution to backups
  • 93. Our solution to backups Separate datacenter for backups with RF=1
  • 94. Our solution to backups Separate datacenter for backups with RF=1 Beware: tricky
  • 95. Our solution to backups Separate datacenter for backups with RF=1 Beware: tricky Once removed from production performance considerations
  • 96. Our solution to backups Separate datacenter for backups with RF=1 Beware: tricky Once removed from production performance considerations Application level incremental backups
  • 97. Our solution to backups Separate datacenter for backups with RF=1 Beware: tricky Once removed from production performance considerations Application level incremental backups Soon: Cassandra incremental backups
  • 98. Solid state is a game changer
  • 99. Solid state is a game changer Large datasets, light read load
  • 100. Solid state is a game changer Large datasets, light read load Small datasets, heavy read load
  • 101. Solid state is a game changer Large datasets, light read load Small datasets, heavy read load I Can Haz superlarge SSD?
  • 102. Solid state is a game changer Large datasets, light read load Small datasets, heavy read load I Can Haz superlarge SSD? No.
  • 103. Solid state is a game changer Large datasets, light read load Small datasets, heavy read load I Can Haz superlarge SSD? No. With small disks, on disk datastructure size matters a lot
  • 104. Solid state is a game changer Large datasets, light read load Small datasets, heavy read load I Can Haz superlarge SSD? No. With small disks, on disk datastructure size matters a lot
  • 105. Solid state is a game changer Large datasets, light read load Small datasets, heavy read load I Can Haz superlarge SSD? No. With small disks, on disk datastructure size matters a lot
  • 106. Solid state is a game changer Large datasets, light read load Small datasets, heavy read load I Can Haz superlarge SSD? No. With small disks, on disk datastructure size matters a lot Our plan:
  • 107. Solid state is a game changer Large datasets, light read load Small datasets, heavy read load I Can Haz superlarge SSD? No. With small disks, on disk datastructure size matters a lot Our plan: Leveled compaction strategy, new in 1.0
  • 108. Solid state is a game changer Large datasets, light read load Small datasets, heavy read load I Can Haz superlarge SSD? No. With small disks, on disk datastructure size matters a lot Our plan: Leveled compaction strategy, new in 1.0 Hack Cassandra to have configurable datadirs per keyspace.
  • 109. Solid state is a game changer Large datasets, light read load Small datasets, heavy read load I Can Haz superlarge SSD? No. With small disks, on disk datastructure size matters a lot Our plan: Leveled compaction strategy, new in 1.0 Hack Cassandra to have configurable datadirs per keyspace. Our patch is integrated in Cassandra 1.1
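In the Cassandra 1.0 era, switching a column family to leveled compaction was done per column family from cassandra-cli; a sketch (the column family name is illustrative):

```
update column family playlists
  with compaction_strategy = 'LeveledCompactionStrategy';
```

Leveled compaction keeps total on-disk size close to the live data size, at the cost of more compaction I/O, which is exactly the trade-off that favors small SSDs over large spinning disks.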
  • 113. Some unpleasant surprises Immaturity Hector: mutations larger than 15 MB. Connection drops in Thrift.
  • 114. Some unpleasant surprises Immaturity Hector: mutations larger than 15 MB. Connection drops in Thrift. Broken on-disk bloom filters in 0.8. Very painful upgrade to 1.0
  • 115. Some unpleasant surprises Immaturity Hector: mutations larger than 15 MB. Connection drops in Thrift. Broken on-disk bloom filters in 0.8. Very painful upgrade to 1.0 Small disks, high load: very possible to get into an Out Of Disk condition
  • 116. Some unpleasant surprises Immaturity Hector: mutations larger than 15 MB. Connection drops in Thrift. Broken on-disk bloom filters in 0.8. Very painful upgrade to 1.0 Small disks, high load: very possible to get into an Out Of Disk condition Logging is lacking
  • 118. Spot the bug Hector Java Cassandra driver:
  • 119. Spot the bug Hector Java Cassandra driver: private AtomicInteger counter = new AtomicInteger(); private Server getNextServer() { counter.compareAndSet(16384, 0); return servers[counter.getAndIncrement() % servers.length]; }
  • 120. Spot the bug Hector Java Cassandra driver: private AtomicInteger counter = new AtomicInteger(); private Server getNextServer() { counter.compareAndSet(16384, 0); return servers[counter.getAndIncrement() % servers.length]; } Race condition
  • 121. Spot the bug Hector Java Cassandra driver: private AtomicInteger counter = new AtomicInteger(); private Server getNextServer() { counter.compareAndSet(16384, 0); return servers[counter.getAndIncrement() % servers.length]; } Race condition java.lang.ArrayIndexOutOfBoundsException
  • 122. Spot the bug Hector Java Cassandra driver: private AtomicInteger counter = new AtomicInteger(); private Server getNextServer() { counter.compareAndSet(16384, 0); return servers[counter.getAndIncrement() % servers.length]; } Race condition java.lang.ArrayIndexOutOfBoundsException After close to 2**31 requests
  • 123. Spot the bug Hector Java Cassandra driver: private AtomicInteger counter = new AtomicInteger(); private Server getNextServer() { counter.compareAndSet(16384, 0); return servers[counter.getAndIncrement() % servers.length]; } Race condition java.lang.ArrayIndexOutOfBoundsException After close to 2**31 requests Took about 5 days
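The bug on the slide: the compareAndSet reset races with concurrent increments, so the counter can slip past 16384 and keep climbing until it wraps past Integer.MAX_VALUE, at which point the modulo of a negative number yields a negative array index. One race-free fix (a sketch, not Hector's actual patch) is to drop the reset entirely and mask off the sign bit:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobin {
    private final String[] servers;
    private final AtomicInteger counter = new AtomicInteger();

    public RoundRobin(String[] servers) {
        this.servers = servers;
    }

    public String getNextServer() {
        // Masking with Integer.MAX_VALUE keeps the index non-negative
        // even after the counter wraps (MIN_VALUE & MAX_VALUE == 0),
        // so no reset is needed and there is nothing to race on.
        int i = counter.getAndIncrement() & Integer.MAX_VALUE;
        return servers[i % servers.length];
    }
}
```

The single atomic getAndIncrement is the only shared-state operation, so two threads can never observe a half-applied reset the way they could with the compareAndSet-then-increment pair.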
  • 125. Conclusions In the 0.6-1.0 timeframe, development engineers and operations are needed
  • 126. Conclusions In the 0.6-1.0 timeframe, development engineers and operations are needed You need to keep an eye on bugs created, be part of the community
  • 127. Conclusions In the 0.6-1.0 timeframe, development engineers and operations are needed You need to keep an eye on bugs created, be part of the community Exotic stuff (such as asymmetrically sized datacenters) is tricky
  • 128. Conclusions In the 0.6-1.0 timeframe, development engineers and operations are needed You need to keep an eye on bugs created, be part of the community Exotic stuff (such as asymmetrically sized datacenters) is tricky Lots of things get fixed. You need to keep up with upstream
  • 129. Conclusions In the 0.6-1.0 timeframe, development engineers and operations are needed You need to keep an eye on bugs created, be part of the community Exotic stuff (such as asymmetrically sized datacenters) is tricky Lots of things get fixed. You need to keep up with upstream You need to integrate with monitoring and graphing
  • 132. Conclusions In the 0.6-1.0 timeframe, development engineers and operations are needed You need to keep an eye on bugs created, be part of the community Exotic stuff (such as asymmetrically sized datacenters) is tricky Lots of things get fixed. You need to keep up with upstream You need to integrate with monitoring and graphing Consider it a toolkit for constructing solutions.
