SlideShare une entreprise Scribd logo
1  sur  33
Télécharger pour lire hors ligne
Project Voldemort
    Jay Kreps




        6/20/2009   1
Where was it born?

• LinkedIn’s Data & Analytics Team
   • Analysis & Research
   • Hadoop and data pipeline
   • Search
   • Social Graph
• Caltrain
• Very lenient boss
Two Cheers for the relational data model

• The relational view is a triumph of computer
  science, but…
• Pasting together strings to get at your data is
  silly
• Hard to build re-usable data structures
• Don’t hide the memory hierarchy!
   • Good: Filesystem API
   • Bad: SQL, some RPCs
Services Break Relational DBs


• No real joins
• Lots of denormalization
• ORM is pointless
• Most constraints, triggers, etc disappear
• Making data access APIs cachable means lots of
simple GETs
• No-downtime releases are painful
• LinkedIn isn’t horizontally partitioned
• Latency is key
Other Considerations

•Who is responsible for performance (engineers?
DBA? site operations?)
•Can you do capacity planning?
•Can you simulate the problem early in the design
phase?
•How do you do upgrades?
•Can you mock your database?
Some problems we wanted to solve


•   Application Examples
     • People You May Know
     • Item-Item Recommendations
     • Type ahead selection
     • Member and Company Derived Data
     • Network statistics
     • Who Viewed My Profile?
     • Relevance data
     • Crawler detection
•   Some data is batch computed and served as read only
•   Some data is very high write load
•   Voldemort is only for real-time problems
•   Latency is key
Some constraints

• Data set is large and persistent
    • Cannot be all in memory
    • Must partition data between machines
• 90% of caching tiers are fixing problems that shouldn’t exist
• Need control over system availability and data durability
    • Must replicate data on multiple machines
• Cost of scalability can’t be too high
• Must support diverse usages
Inspired By Amazon Dynamo & Memcached

•   Amazon’s Dynamo storage system
     • Works across data centers
     • Eventual consistency
     • Commodity hardware
     • Not too hard to build
    Memcached
     – Actually works
     – Really fast
     – Really simple
    Decisions:
     – Multiple reads/writes
     – Consistent hashing for data distribution
     – Key-Value model
     – Data versioning
Priorities

1.   Performance and scalability
2.   Actually works
3.   Community
4.   Data consistency
5.   Flexible & Extensible
6.   Everything else
Why Is This Hard?

• Failures in a distributed system are much more
  complicated
   • A can talk to B does not imply B can talk to A
   • A can talk to B does not imply C can talk to B
• Getting a consistent view of the cluster is as hard as
  getting a consistent view of the data
• Nodes will fail and come back to life with stale data
• I/O has high request latency variance
• I/O on commodity disks is even worse
• Intermittent failures are common
• User must be isolated from these problems
• There are fundamental trade-offs between availability and
  consistency
Voldemort Design




• Layered design
• One interface for all
layers:
    • put/get/delete
    • Each layer
    decorates the
    next
    • Very flexible
    • Easy to test
Voldemort Physical Deployment
Client API


• Key-value only
• Rich values give denormalized one-many relationships
• Four operations: PUT, GET, GET_ALL, DELETE
• Data is organized into “stores”, i.e. tables
• Key is unique to a store
• For PUT and DELETE you can specify the version you
  are updating
• Simple optimistic locking to support multi-row updates
  and consistent read-update-delete
Versioning & Conflict Resolution


• Vector clocks for consistency
   • A partial order on values
   • Improved version of optimistic locking
   • Comes from best known distributed system paper
     “Time, Clocks, and the Ordering of Events in a
     Distributed System”
• Conflicts resolved at read time and write time
• No locking or blocking necessary
• Vector clocks resolve any non-concurrent writes
• User can supply strategy for handling concurrent writes
• Tradeoffs when compared to Paxos or 2PC
Vector Clock Example



   # two servers simultaneously fetch a value
   [client 1] get(1234) => {"name":"jay", "email":"jkreps@linkedin.com"}
   [client 2] get(1234) => {"name":"jay", "email":"jkreps@linkedin.com"}

   # client 1 modifies the name and does a put
   [client 1] put(1234), {"name":"jay2", "email":"jkreps@linkedin.com"})

   # client 2 modifies the email and does a put
   [client 2] put(1234, {"name":"jay3", "email":"jay.kreps@gmail.com"})

   # We now have the following conflicting versions:
   {"name":"jay", "email":"jkreps@linkedin.com"}
   {"name":"jay kreps", "email":"jkreps@linkedin.com"}
   {"name":"jay", "email":"jay.kreps@gmail.com"}
Serialization

• Really important--data is forever
• But really boring!
• Many ways to do it
  • Compressed JSON, Protocol Buffers,
    Thrift
  • They all suck!
• Bytes <=> objects <=> strings?
• Schema-free?
• Support real data structures
Routing


• Routing layer turns a single GET, PUT, or DELETE into
  multiple, parallel operations
• Client- or server-side
• Data partitioning uses a consistent hashing variant
   • Allows for incremental expansion
   • Allows for unbalanced nodes (some servers may be
      better)
• Routing layer handles repair of stale data at read time
• Easy to add domain specific strategies for data
  placement
   • E.g. only do synchronous operations on nodes in
      the local data center
Routing Parameters


• N - The replication factor (how many copies of each
  key-value pair we store)
• R - The number of reads required
• W - The number of writes we block for
• If R+W > N then we have a quorum-like algorithm, and
  we will read our writes
Routing Algorithm
              •   To route a GET:
                   • Calculate an ordered preference list of N nodes that
                      handle the given key, skipping any known-failed
                      nodes
                   • Read from the first R
                   • If any reads fail continue down the preference list
                      until R reads have completed
                   • Compare all fetched values and repair any nodes
                      that have stale data
              •   To route a PUT/DELETE:
                   • Calculate an ordered preference list of N nodes that
                      handle the given key, skipping any failed nodes
                   • Create a latch with W counters
                   • Issue the N writes, and decrement the counter when
                      each is complete
                   • Block until W successful writes occur
Routing With Failures


• Load balancing is in the software
     • either server or client
• No master
• View of server state may be inconsistent (A may think B
is down, C may disagree)
• If a write fails put it somewhere else
• A node that gets one of these failed writes will attempt to
deliver it to the failed node periodically until the node
returns
• Value may be read-repaired first, but delivering stale
data will be detected from the vector clock and ignored
• All requests must have aggressive timeouts
Network Layer


• Network is the major bottleneck in many uses
• Client performance turns out to be harder than server
(client must wait!)
• Server is also a Client
• Two implementations
    • HTTP + servlet container
    • Simple socket protocol + custom server
• HTTP server is great, but http client is 5-10X slower
• Socket protocol is what we use in production
• Blocking IO and new non-blocking connectors
Persistence


• Single machine key-value storage is a commodity
• All disk data structures are bad in different ways
• Btrees are still the best all-purpose structure
• Huge variety of needs
• SSDs may completely change this layer
• Plugins are better than tying yourself to a single strategy
Persistence II

 • A good Btree takes 2 years to get right, so we just use BDB
 • Even so, data corruption really scares me
 • BDB, MySQL, and mmap’d file implementations
     • Also 4 others that are more specialized
 • In-memory implementation for unit testing (or caching)
 • Test suite for conformance to interface contract
 • No flush on write is a huge, huge win
 • Have a crazy idea you want to try?
State of the Project

• Active mailing list
• 4-5 regular committers outside LinkedIn
• Lots of contributors
• Equal contribution from in and out of LinkedIn
• Project basics
     • IRC
     • Some documentation
     • Lots more to do
• > 300 unit tests that run on every checkin (and pass)
• Pretty clean code
• Moved to GitHub (by popular demand)
• Production usage at a half dozen companies
• Not a LinkedIn project anymore
• But LinkedIn is really committed to it (and we are hiring to work on it)
Glaring Weaknesses

• Not nearly enough documentation
• Need a rigorous performance and multi-machine
failure tests running NIGHTLY
• No online cluster expansion (without reduced
guarantees)
• Need more clients in other languages (Java and
python only, very alpha C++ in development)
• Better tools for cluster-wide control and
monitoring
Example of LinkedIn’s usage


• 4 Clusters, 4 teams
    • Wide variety of data sizes, clients, needs
• My team:
    • 12 machines
    • Nice servers
    • 300M operations/day
    • ~4 billion events in 10 stores (one per event type)
• Other teams: news article data, email related data, UI
settings
• Some really terrifying projects on the horizon
Hadoop and Voldemort sitting in a tree…
• Now a completely different problem: Big batch data
processing
• One major focus of our batch jobs is relationships,
matching, and relevance
• Many types of matching people, jobs, questions, news
articles, etc
• O(N^2) :-(
• End result is hundreds of gigabytes or terrabytes of
output
• cycles are threatening to get rapid
• Building an index of this size is a huge operation
• Huge impact on live request latency
1.   Index build runs 100% in Hadoop
2.   MapReduce job outputs Voldemort Stores to HDFS
3.   Nodes all download their pre-built stores in parallel
4.   Atomic swap to make the data live
5.   Heavily optimized storage engine for read-only data
6.   I/O Throttling on the transfer to protect the live servers
Some performance numbers

• Production stats
    • Median: 0.1 ms
    • 99.9 percentile GET: 3 ms
• Single node max throughput (1 client node, 1 server
node):
    • 19,384 reads/sec
    • 16,559 writes/sec
• These numbers are for mostly in-memory problems
Some new & upcoming things


 • New
    • Python client
    • Non-blocking socket server
    • Alpha round on online cluster expansion
    • Read-only store and Hadoop integration
    • Improved monitoring stats
 • Future
    • Publish/Subscribe model to track changes
    • Great performance and integration tests
Shameless promotion

• Check it out: project-voldemort.com
• We love getting patches.
• We kind of love getting bug reports.
• LinkedIn is hiring, so you can work on this full time.
    • Email me if interested
    • jkreps@linkedin.com
The End

Contenu connexe

Tendances

Presto best practices for Cluster admins, data engineers and analysts
Presto best practices for Cluster admins, data engineers and analystsPresto best practices for Cluster admins, data engineers and analysts
Presto best practices for Cluster admins, data engineers and analystsShubham Tagra
 
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookTech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookThe Hive
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Ceph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing GuideCeph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing GuideKaran Singh
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Vinoth Chandar
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcachedJurriaan Persyn
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte DataProblems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte DataJignesh Shah
 
Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachDatabricks
 
A complete guide to azure storage
A complete guide to azure storageA complete guide to azure storage
A complete guide to azure storageHimanshu Sahu
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Introduction to Amazon Relational Database Service (Amazon RDS)
Introduction to Amazon Relational Database Service (Amazon RDS)Introduction to Amazon Relational Database Service (Amazon RDS)
Introduction to Amazon Relational Database Service (Amazon RDS)Amazon Web Services
 
The Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the SameThe Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the SameCloudera, Inc.
 
Best Practices – Extreme Performance with Data Warehousing on Oracle Database
Best Practices – Extreme Performance with Data Warehousing on Oracle DatabaseBest Practices – Extreme Performance with Data Warehousing on Oracle Database
Best Practices – Extreme Performance with Data Warehousing on Oracle DatabaseEdgar Alejandro Villegas
 
Stability Patterns for Microservices
Stability Patterns for MicroservicesStability Patterns for Microservices
Stability Patterns for Microservicespflueras
 

Tendances (20)

HBase Low Latency
HBase Low LatencyHBase Low Latency
HBase Low Latency
 
Presto best practices for Cluster admins, data engineers and analysts
Presto best practices for Cluster admins, data engineers and analystsPresto best practices for Cluster admins, data engineers and analysts
Presto best practices for Cluster admins, data engineers and analysts
 
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookTech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Ceph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing GuideCeph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing Guide
 
HBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and CompactionHBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and Compaction
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived
 
Log Structured Merge Tree
Log Structured Merge TreeLog Structured Merge Tree
Log Structured Merge Tree
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte DataProblems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
 
Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT Approach
 
A complete guide to azure storage
A complete guide to azure storageA complete guide to azure storage
A complete guide to azure storage
 
Windows Azure Storage – Architecture View
Windows Azure Storage – Architecture ViewWindows Azure Storage – Architecture View
Windows Azure Storage – Architecture View
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Introduction to Amazon Relational Database Service (Amazon RDS)
Introduction to Amazon Relational Database Service (Amazon RDS)Introduction to Amazon Relational Database Service (Amazon RDS)
Introduction to Amazon Relational Database Service (Amazon RDS)
 
The Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the SameThe Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the Same
 
Best Practices – Extreme Performance with Data Warehousing on Oracle Database
Best Practices – Extreme Performance with Data Warehousing on Oracle DatabaseBest Practices – Extreme Performance with Data Warehousing on Oracle Database
Best Practices – Extreme Performance with Data Warehousing on Oracle Database
 
Stability Patterns for Microservices
Stability Patterns for MicroservicesStability Patterns for Microservices
Stability Patterns for Microservices
 

En vedette

Introducción a Voldemort - Innova4j
Introducción a Voldemort - Innova4jIntroducción a Voldemort - Innova4j
Introducción a Voldemort - Innova4jInnova4j
 
Voldemort on Solid State Drives
Voldemort on Solid State DrivesVoldemort on Solid State Drives
Voldemort on Solid State DrivesVinoth Chandar
 
Composing and Executing Parallel Data Flow Graphs wth Shell Pipes
Composing and Executing Parallel Data Flow Graphs wth Shell PipesComposing and Executing Parallel Data Flow Graphs wth Shell Pipes
Composing and Executing Parallel Data Flow Graphs wth Shell PipesVinoth Chandar
 
Sha 2 기반 인증서 업그레이드 이해
Sha 2 기반 인증서 업그레이드 이해Sha 2 기반 인증서 업그레이드 이해
Sha 2 기반 인증서 업그레이드 이해InGuen Hwang
 
Riak: A friendly key/value store for the web.
Riak: A friendly key/value store for the web.Riak: A friendly key/value store for the web.
Riak: A friendly key/value store for the web.codefluency
 
Características MONGO DB
Características MONGO DBCaracterísticas MONGO DB
Características MONGO DBmaxfontana90
 
Memcached의 확장성 개선
Memcached의 확장성 개선Memcached의 확장성 개선
Memcached의 확장성 개선NAVER D2
 
Banco de Dados Não Relacionais vs Banco de Dados Relacionais
Banco de Dados Não Relacionais vs Banco de Dados RelacionaisBanco de Dados Não Relacionais vs Banco de Dados Relacionais
Banco de Dados Não Relacionais vs Banco de Dados Relacionaisalexculpado
 

En vedette (11)

Introducción a Voldemort - Innova4j
Introducción a Voldemort - Innova4jIntroducción a Voldemort - Innova4j
Introducción a Voldemort - Innova4j
 
Bluetube
BluetubeBluetube
Bluetube
 
Voldemort on Solid State Drives
Voldemort on Solid State DrivesVoldemort on Solid State Drives
Voldemort on Solid State Drives
 
Project Voldemort
Project VoldemortProject Voldemort
Project Voldemort
 
Composing and Executing Parallel Data Flow Graphs wth Shell Pipes
Composing and Executing Parallel Data Flow Graphs wth Shell PipesComposing and Executing Parallel Data Flow Graphs wth Shell Pipes
Composing and Executing Parallel Data Flow Graphs wth Shell Pipes
 
Redis at the guardian
Redis at the guardianRedis at the guardian
Redis at the guardian
 
Sha 2 기반 인증서 업그레이드 이해
Sha 2 기반 인증서 업그레이드 이해Sha 2 기반 인증서 업그레이드 이해
Sha 2 기반 인증서 업그레이드 이해
 
Riak: A friendly key/value store for the web.
Riak: A friendly key/value store for the web.Riak: A friendly key/value store for the web.
Riak: A friendly key/value store for the web.
 
Características MONGO DB
Características MONGO DBCaracterísticas MONGO DB
Características MONGO DB
 
Memcached의 확장성 개선
Memcached의 확장성 개선Memcached의 확장성 개선
Memcached의 확장성 개선
 
Banco de Dados Não Relacionais vs Banco de Dados Relacionais
Banco de Dados Não Relacionais vs Banco de Dados RelacionaisBanco de Dados Não Relacionais vs Banco de Dados Relacionais
Banco de Dados Não Relacionais vs Banco de Dados Relacionais
 

Similaire à Voldemort Nosql

Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Don Demcsak
 
FP Days: Down the Clojure Rabbit Hole
FP Days: Down the Clojure Rabbit HoleFP Days: Down the Clojure Rabbit Hole
FP Days: Down the Clojure Rabbit HoleChristophe Grand
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitterRoger Xia
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...xlight
 
Scaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQLScaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQLRichard Schneeman
 
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013Amazon Web Services
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDBFoundationDB
 
Scaling a High Traffic Web Application: Our Journey from Java to PHP
Scaling a High Traffic Web Application: Our Journey from Java to PHPScaling a High Traffic Web Application: Our Journey from Java to PHP
Scaling a High Traffic Web Application: Our Journey from Java to PHP120bi
 
Scaling High Traffic Web Applications
Scaling High Traffic Web ApplicationsScaling High Traffic Web Applications
Scaling High Traffic Web ApplicationsAchievers Tech
 
Navigating Transactions: ACID Complexity in Modern Databases
Navigating Transactions: ACID Complexity in Modern DatabasesNavigating Transactions: ACID Complexity in Modern Databases
Navigating Transactions: ACID Complexity in Modern DatabasesShivji Kumar Jha
 
Navigating Transactions: ACID Complexity in Modern Databases- Mydbops Open So...
Navigating Transactions: ACID Complexity in Modern Databases- Mydbops Open So...Navigating Transactions: ACID Complexity in Modern Databases- Mydbops Open So...
Navigating Transactions: ACID Complexity in Modern Databases- Mydbops Open So...Mydbops
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core ConceptsJon Haddad
 
Kafka Summit SF 2017 - Running Kafka for Maximum Pain
Kafka Summit SF 2017 - Running Kafka for Maximum PainKafka Summit SF 2017 - Running Kafka for Maximum Pain
Kafka Summit SF 2017 - Running Kafka for Maximum Painconfluent
 
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Bob Pusateri
 
DataFrames: The Extended Cut
DataFrames: The Extended CutDataFrames: The Extended Cut
DataFrames: The Extended CutWes McKinney
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudyJohn Adams
 

Similaire à Voldemort Nosql (20)

Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
 
FP Days: Down the Clojure Rabbit Hole
FP Days: Down the Clojure Rabbit HoleFP Days: Down the Clojure Rabbit Hole
FP Days: Down the Clojure Rabbit Hole
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Scaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQLScaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQL
 
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013
 
noSQL choices
noSQL choicesnoSQL choices
noSQL choices
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDB
 
Scaling a High Traffic Web Application: Our Journey from Java to PHP
Scaling a High Traffic Web Application: Our Journey from Java to PHPScaling a High Traffic Web Application: Our Journey from Java to PHP
Scaling a High Traffic Web Application: Our Journey from Java to PHP
 
Scaling High Traffic Web Applications
Scaling High Traffic Web ApplicationsScaling High Traffic Web Applications
Scaling High Traffic Web Applications
 
Navigating Transactions: ACID Complexity in Modern Databases
Navigating Transactions: ACID Complexity in Modern DatabasesNavigating Transactions: ACID Complexity in Modern Databases
Navigating Transactions: ACID Complexity in Modern Databases
 
Navigating Transactions: ACID Complexity in Modern Databases- Mydbops Open So...
Navigating Transactions: ACID Complexity in Modern Databases- Mydbops Open So...Navigating Transactions: ACID Complexity in Modern Databases- Mydbops Open So...
Navigating Transactions: ACID Complexity in Modern Databases- Mydbops Open So...
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Kafka Summit SF 2017 - Running Kafka for Maximum Pain
Kafka Summit SF 2017 - Running Kafka for Maximum PainKafka Summit SF 2017 - Running Kafka for Maximum Pain
Kafka Summit SF 2017 - Running Kafka for Maximum Pain
 
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
 
DataFrames: The Extended Cut
DataFrames: The Extended CutDataFrames: The Extended Cut
DataFrames: The Extended Cut
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 

Plus de elliando dias

Clojurescript slides
Clojurescript slidesClojurescript slides
Clojurescript slideselliando dias
 
Why you should be excited about ClojureScript
Why you should be excited about ClojureScriptWhy you should be excited about ClojureScript
Why you should be excited about ClojureScriptelliando dias
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structureselliando dias
 
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de containerelliando dias
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agilityelliando dias
 
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Librarieselliando dias
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!elliando dias
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Webelliando dias
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduinoelliando dias
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorceryelliando dias
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Designelliando dias
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makeselliando dias
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.elliando dias
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebookelliando dias
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Studyelliando dias
 

Plus de elliando dias (20)

Clojurescript slides
Clojurescript slidesClojurescript slides
Clojurescript slides
 
Why you should be excited about ClojureScript
Why you should be excited about ClojureScriptWhy you should be excited about ClojureScript
Why you should be excited about ClojureScript
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structures
 
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de container
 
Geometria Projetiva
Geometria ProjetivaGeometria Projetiva
Geometria Projetiva
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agility
 
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Libraries
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!
 
Ragel talk
Ragel talkRagel talk
Ragel talk
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Web
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduino
 
Minicurso arduino
Minicurso arduinoMinicurso arduino
Minicurso arduino
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorcery
 
Rango
RangoRango
Rango
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Design
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makes
 
Hadoop + Clojure
Hadoop + ClojureHadoop + Clojure
Hadoop + Clojure
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebook
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Study
 

Dernier

Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...Sheetaleventcompany
 
A DAY IN THE LIFE OF A SALESMAN / WOMAN
A DAY IN THE LIFE OF A  SALESMAN / WOMANA DAY IN THE LIFE OF A  SALESMAN / WOMAN
A DAY IN THE LIFE OF A SALESMAN / WOMANIlamathiKannappan
 
Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Dave Litwiller
 
Call Girls From Pari Chowk Greater Noida ❤️8448577510 ⊹Best Escorts Service I...
Call Girls From Pari Chowk Greater Noida ❤️8448577510 ⊹Best Escorts Service I...Call Girls From Pari Chowk Greater Noida ❤️8448577510 ⊹Best Escorts Service I...
Call Girls From Pari Chowk Greater Noida ❤️8448577510 ⊹Best Escorts Service I...lizamodels9
 
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...daisycvs
 
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesMysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesDipal Arora
 
Phases of Negotiation .pptx
 Phases of Negotiation .pptx Phases of Negotiation .pptx
Phases of Negotiation .pptxnandhinijagan9867
 
Dr. Admir Softic_ presentation_Green Club_ENG.pdf
Dr. Admir Softic_ presentation_Green Club_ENG.pdfDr. Admir Softic_ presentation_Green Club_ENG.pdf
Dr. Admir Softic_ presentation_Green Club_ENG.pdfAdmir Softic
 
Famous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st CenturyFamous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st Centuryrwgiffor
 
Mondelez State of Snacking and Future Trends 2023
Mondelez State of Snacking and Future Trends 2023Mondelez State of Snacking and Future Trends 2023
Mondelez State of Snacking and Future Trends 2023Neil Kimberley
 
Call Girls in Delhi, Escort Service Available 24x7 in Delhi 959961-/-3876
Call Girls in Delhi, Escort Service Available 24x7 in Delhi 959961-/-3876Call Girls in Delhi, Escort Service Available 24x7 in Delhi 959961-/-3876
Call Girls in Delhi, Escort Service Available 24x7 in Delhi 959961-/-3876dlhescort
 
Call Girls In Noida 959961⊹3876 Independent Escort Service Noida
Call Girls In Noida 959961⊹3876 Independent Escort Service NoidaCall Girls In Noida 959961⊹3876 Independent Escort Service Noida
Call Girls In Noida 959961⊹3876 Independent Escort Service Noidadlhescort
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayNZSG
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableSeo
 
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...lizamodels9
 
Uneak White's Personal Brand Exploration Presentation
Uneak White's Personal Brand Exploration PresentationUneak White's Personal Brand Exploration Presentation
Uneak White's Personal Brand Exploration Presentationuneakwhite
 
Falcon's Invoice Discounting: Your Path to Prosperity
Falcon's Invoice Discounting: Your Path to ProsperityFalcon's Invoice Discounting: Your Path to Prosperity
Falcon's Invoice Discounting: Your Path to Prosperityhemanthkumar470700
 

Dernier (20)

Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
 
(Anamika) VIP Call Girls Napur Call Now 8617697112 Napur Escorts 24x7
(Anamika) VIP Call Girls Napur Call Now 8617697112 Napur Escorts 24x7(Anamika) VIP Call Girls Napur Call Now 8617697112 Napur Escorts 24x7
(Anamika) VIP Call Girls Napur Call Now 8617697112 Napur Escorts 24x7
 
A DAY IN THE LIFE OF A SALESMAN / WOMAN
A DAY IN THE LIFE OF A  SALESMAN / WOMANA DAY IN THE LIFE OF A  SALESMAN / WOMAN
A DAY IN THE LIFE OF A SALESMAN / WOMAN
 
Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
 
Call Girls From Pari Chowk Greater Noida ❤️8448577510 ⊹Best Escorts Service I...
Call Girls From Pari Chowk Greater Noida ❤️8448577510 ⊹Best Escorts Service I...Call Girls From Pari Chowk Greater Noida ❤️8448577510 ⊹Best Escorts Service I...
Call Girls From Pari Chowk Greater Noida ❤️8448577510 ⊹Best Escorts Service I...
 
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
 
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesMysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
 
Phases of Negotiation .pptx
 Phases of Negotiation .pptx Phases of Negotiation .pptx
Phases of Negotiation .pptx
 
Dr. Admir Softic_ presentation_Green Club_ENG.pdf
Dr. Admir Softic_ presentation_Green Club_ENG.pdfDr. Admir Softic_ presentation_Green Club_ENG.pdf
Dr. Admir Softic_ presentation_Green Club_ENG.pdf
 
Famous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st CenturyFamous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st Century
 
Mondelez State of Snacking and Future Trends 2023
Mondelez State of Snacking and Future Trends 2023Mondelez State of Snacking and Future Trends 2023
Mondelez State of Snacking and Future Trends 2023
 
Call Girls in Delhi, Escort Service Available 24x7 in Delhi 959961-/-3876
Call Girls in Delhi, Escort Service Available 24x7 in Delhi 959961-/-3876Call Girls in Delhi, Escort Service Available 24x7 in Delhi 959961-/-3876
Call Girls in Delhi, Escort Service Available 24x7 in Delhi 959961-/-3876
 
Call Girls In Noida 959961⊹3876 Independent Escort Service Noida
Call Girls In Noida 959961⊹3876 Independent Escort Service NoidaCall Girls In Noida 959961⊹3876 Independent Escort Service Noida
Call Girls In Noida 959961⊹3876 Independent Escort Service Noida
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 May
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
 
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
 
Uneak White's Personal Brand Exploration Presentation
Uneak White's Personal Brand Exploration PresentationUneak White's Personal Brand Exploration Presentation
Uneak White's Personal Brand Exploration Presentation
 
Falcon's Invoice Discounting: Your Path to Prosperity
Falcon's Invoice Discounting: Your Path to ProsperityFalcon's Invoice Discounting: Your Path to Prosperity
Falcon's Invoice Discounting: Your Path to Prosperity
 

Voldemort Nosql

  • 1. Project Voldemort Jay Kreps 6/20/2009 1
  • 2. Where was it born? • LinkedIn’s Data & Analytics Team • Analysis & Research • Hadoop and data pipeline • Search • Social Graph • Caltrain • Very lenient boss
  • 3. Two Cheers for the relational data model • The relational view is a triumph of computer science, but… • Pasting together strings to get at your data is silly • Hard to build re-usable data structures • Don’t hide the memory hierarchy! • Good: Filesystem API • Bad: SQL, some RPCs
  • 4.
  • 5. Services Break Relational DBs • No real joins • Lots of denormalization • ORM is pointless • Most constraints, triggers, etc disappear • Making data access APIs cachable means lots of simple GETs • No-downtime releases are painful • LinkedIn isn’t horizontally partitioned • Latency is key
  • 6. Other Considerations •Who is responsible for performance (engineers? DBA? site operations?) •Can you do capacity planning? •Can you simulate the problem early in the design phase? •How do you do upgrades? •Can you mock your database?
  • 7. Some problems we wanted to solve • Application Examples • People You May Know • Item-Item Recommendations • Type ahead selection • Member and Company Derived Data • Network statistics • Who Viewed My Profile? • Relevance data • Crawler detection • Some data is batch computed and served as read only • Some data is very high write load • Voldemort is only for real-time problems • Latency is key
  • 8. Some constraints • Data set is large and persistent • Cannot be all in memory • Must partition data between machines • 90% of caching tiers are fixing problems that shouldn’t exist • Need control over system availability and data durability • Must replicate data on multiple machines • Cost of scalability can’t be too high • Must support diverse usages
  • 9. Inspired By Amazon Dynamo & Memcached • Amazon’s Dynamo storage system • Works across data centers • Eventual consistency • Commodity hardware • Not too hard to build Memcached – Actually works – Really fast – Really simple Decisions: – Multiple reads/writes – Consistent hashing for data distribution – Key-Value model – Data versioning
  • 10. Priorities 1. Performance and scalability 2. Actually works 3. Community 4. Data consistency 5. Flexible & Extensible 6. Everything else
  • 11. Why Is This Hard? • Failures in a distributed system are much more complicated • A can talk to B does not imply B can talk to A • A can talk to B does not imply C can talk to B • Getting a consistent view of the cluster is as hard as getting a consistent view of the data • Nodes will fail and come back to life with stale data • I/O has high request latency variance • I/O on commodity disks is even worse • Intermittent failures are common • User must be isolated from these problems • There are fundamental trade-offs between availability and consistency
  • 12. Voldemort Design • Layered design • One interface for all layers: • put/get/delete • Each layer decorates the next • Very flexible • Easy to test
  • 14. Client API • Key-value only • Rich values give denormalized one-many relationships • Four operations: PUT, GET, GET_ALL, DELETE • Data is organized into “stores”, i.e. tables • Key is unique to a store • For PUT and DELETE you can specify the version you are updating • Simple optimistic locking to support multi-row updates and consistent read-update-delete
  • 15. Versioning & Conflict Resolution • Vector clocks for consistency • A partial order on values • Improved version of optimistic locking • Comes from best known distributed system paper “Time, Clocks, and the Ordering of Events in a Distributed System” • Conflicts resolved at read time and write time • No locking or blocking necessary • Vector clocks resolve any non-concurrent writes • User can supply strategy for handling concurrent writes • Tradeoffs when compared to Paxos or 2PC
  • 16. Vector Clock Example # two servers simultaneously fetch a value [client 1] get(1234) => {"name":"jay", "email":"jkreps@linkedin.com"} [client 2] get(1234) => {"name":"jay", "email":"jkreps@linkedin.com"} # client 1 modifies the name and does a put [client 1] put(1234), {"name":"jay2", "email":"jkreps@linkedin.com"}) # client 2 modifies the email and does a put [client 2] put(1234, {"name":"jay3", "email":"jay.kreps@gmail.com"}) # We now have the following conflicting versions: {"name":"jay", "email":"jkreps@linkedin.com"} {"name":"jay kreps", "email":"jkreps@linkedin.com"} {"name":"jay", "email":"jay.kreps@gmail.com"}
  • 17. Serialization • Really important--data is forever • But really boring! • Many ways to do it • Compressed JSON, Protocol Buffers, Thrift • They all suck! • Bytes <=> objects <=> strings? • Schema-free? • Support real data structures
  • 18. Routing • Routing layer turns a single GET, PUT, or DELETE into multiple, parallel operations • Client- or server-side • Data partitioning uses a consistent hashing variant • Allows for incremental expansion • Allows for unbalanced nodes (some servers may be better) • Routing layer handles repair of stale data at read time • Easy to add domain specific strategies for data placement • E.g. only do synchronous operations on nodes in the local data center
  • 19. Routing Parameters • N - The replication factor (how many copies of each key-value pair we store) • R - The number of reads required • W - The number of writes we block for • If R+W > N then we have a quorum-like algorithm, and we will read our writes
  • 20. Routing Algorithm • To route a GET: • Calculate an ordered preference list of N nodes that handle the given key, skipping any known-failed nodes • Read from the first R • If any reads fail continue down the preference list until R reads have completed • Compare all fetched values and repair any nodes that have stale data • To route a PUT/DELETE: • Calculate an ordered preference list of N nodes that handle the given key, skipping any failed nodes • Create a latch with W counters • Issue the N writes, and decrement the counter when each is complete • Block until W successful writes occur
  • 21. Routing With Failures • Load balancing is in the software • either server or client • No master • View of server state may be inconsistent (A may think B is down, C may disagree) • If a write fails put it somewhere else • A node that gets one of these failed writes will attempt to deliver it to the failed node periodically until the node returns • Value may be read-repaired first, but delivering stale data will be detected from the vector clock and ignored • All requests must have aggressive timeouts
  • 22. Network Layer • Network is the major bottleneck in many uses • Client performance turns out to be harder than server (client must wait!) • Server is also a Client • Two implementations • HTTP + servlet container • Simple socket protocol + custom server • HTTP server is great, but http client is 5-10X slower • Socket protocol is what we use in production • Blocking IO and new non-blocking connectors
  • 23. Persistence • Single machine key-value storage is a commodity • All disk data structures are bad in different ways • Btrees are still the best all-purpose structure • Huge variety of needs • SSDs may completely change this layer • Plugins are better than tying yourself to a single strategy
  • 24. Persistence II • A good Btree takes 2 years to get right, so we just use BDB • Even so, data corruption really scares me • BDB, MySQL, and mmap’d file implementations • Also 4 others that are more specialized • In-memory implementation for unit testing (or caching) • Test suite for conformance to interface contract • No flush on write is a huge, huge win • Have a crazy idea you want to try?
  • 25. State of the Project • Active mailing list • 4-5 regular committers outside LinkedIn • Lots of contributors • Equal contribution from in and out of LinkedIn • Project basics • IRC • Some documentation • Lots more to do • > 300 unit tests that run on every checkin (and pass) • Pretty clean code • Moved to GitHub (by popular demand) • Production usage at a half dozen companies • Not a LinkedIn project anymore • But LinkedIn is really committed to it (and we are hiring to work on it)
  • 26. Glaring Weaknesses • Not nearly enough documentation • Need a rigorous performance and multi-machine failure tests running NIGHTLY • No online cluster expansion (without reduced guarantees) • Need more clients in other languages (Java and python only, very alpha C++ in development) • Better tools for cluster-wide control and monitoring
  • 27. Example of LinkedIn’s usage • 4 Clusters, 4 teams • Wide variety of data sizes, clients, needs • My team: • 12 machines • Nice servers • 300M operations/day • ~4 billion events in 10 stores (one per event type) • Other teams: news article data, email related data, UI settings • Some really terrifying projects on the horizon
  • 28. Hadoop and Voldemort sitting in a tree… • Now a completely different problem: Big batch data processing • One major focus of our batch jobs is relationships, matching, and relevance • Many types of matching people, jobs, questions, news articles, etc • O(N^2) :-( • End result is hundreds of gigabytes or terrabytes of output • cycles are threatening to get rapid • Building an index of this size is a huge operation • Huge impact on live request latency
  • 29. 1. Index build runs 100% in Hadoop 2. MapReduce job outputs Voldemort Stores to HDFS 3. Nodes all download their pre-built stores in parallel 4. Atomic swap to make the data live 5. Heavily optimized storage engine for read-only data 6. I/O Throttling on the transfer to protect the live servers
  • 30. Some performance numbers • Production stats • Median: 0.1 ms • 99.9 percentile GET: 3 ms • Single node max throughput (1 client node, 1 server node): • 19,384 reads/sec • 16,559 writes/sec • These numbers are for mostly in-memory problems
  • 31. Some new & upcoming things • New • Python client • Non-blocking socket server • Alpha round on online cluster expansion • Read-only store and Hadoop integration • Improved monitoring stats • Future • Publish/Subscribe model to track changes • Great performance and integration tests
  • 32. Shameless promotion • Check it out: project-voldemort.com • We love getting patches. • We kind of love getting bug reports. • LinkedIn is hiring, so you can work on this full time. • Email me if interested • jkreps@linkedin.com