Video and slides synchronized, mp3 and slide download available at http://bit.ly/11hucXn.
Jeremy Edberg presents the data stores used by Netflix and Reddit, some of the best practices and lessons for surviving outages. Filmed at qconsf.com.
Jeremy Edberg is currently the Reliability Architect for Netflix, the largest video streaming service in the world. Before that he ran Reddit, an online community for sharing and discussing interesting things on the internet that does more than two billion page views a month. Both run their entire operations on Amazon’s EC2. Jeremy has keynoted at conferences such as PyCon and Cloud Connect.
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Building a Reliable Data Store
1. Jeremy Edberg
QconSF 2012
Tweet @jedberg with feedback!
2. Watch the video with slide
synchronization on InfoQ.com!
http://www.infoq.com/presentations
/Reliable-Data-Store
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
3. Presented at QCon San Francisco
www.qconsf.com
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
6. Agenda
• CAP theory and how it applies to reliability
• How reddit and Netflix maintain reliable
data stores
• Best Practices
• War stories -- surviving real outages
Tweet @jedberg with feedback!
7. CAP Theorem
• Consistent
• Available
• Partition-resistant
Tweet @jedberg with feedback!
13. The problem with CAP
• Daniel Abadi had a problem with CAP
• The weightings were uneven
• A is essential in all scenarios
• C is more important than P
• Latency wasn’t accounted for at all
Tweet @jedberg with feedback!
14. PACELC
If there is a partition (P) how does the system
tradeoff between availability and consistency (A
and C); else (E) when the system is running as
normal in the absence of partitions, how does
the system tradeoff between latency (L) and
consistency (C)?
Tweet @jedberg with feedback!
31. Sharding
• reddit split writes across four master databases
• Links/Accounts/Subreddits, Comments,Votes
and Misc
• Each has at least one slave in another zone
• Avoid reading from the master if possible
• Wrote their own database access layer, called
the “thing” layer
Tweet @jedberg with feedback!
32. Sample Schema
link_thing
int id
timestamp date
int ups
int downs
bool deleted
bool spam
link_data
int thing_id
string name
string value
char kind
Tweet @jedberg with feedback!
33. The thing layer
• Postgres is used like a key/value store
• Thing table has denormalized data
• Data table has arbitrary keys
• Lots of indexes tuned for our specific
queries
• Thing and data tables are on the same box,
but don’t have to be
Tweet @jedberg with feedback!
34. I love memcache
I make heavy use of memcached
Tweet @jedberg with feedback!
46. A/B Testing
Online Data Offline Data
Test Cell allocation Test tracking
Test Metadata Retention
Start/End date Fraction Viewed
UI Directives Pages Viewed
Tweet @jedberg with feedback!
50. More Things Netflix
Stores in Cassandra
• Video Quality
• Network issues
• Usage History
• Playback Errors
Tweet @jedberg with feedback!
51. Service based
architecture
Tweet @jedberg with feedback!
52. Netflix on AWS
2012 2012 2012
IPv6 IPv6 IPv6
Tweet @jedberg with feedback!
53. Abstraction
• Data sources are abstracted away behind
restful interfaces
• Each application owns its own consistency
• Each application can scale independently
based on load
Tweet @jedberg with feedback!
61. How it works
• Replication factor
• Quorum reads / writes
• Bloom Filter for fast negative lookups
• Immutable files for fast writes
• Seed nodes
• Multi-region
• Gossip protocol
Tweet @jedberg with feedback!
62. Cassandra Benefits
• Fast writes
• Fast negative lookups
• Easy incremental scalability
• Distributed -- No SPoF
Tweet @jedberg with feedback!
63. Why Cassandra?
• Availability over consistency
• Writes over reads
• We know Java
• Open source + support
Tweet @jedberg with feedback!
79. Leveraging Mutli-region
• 100% uptime is theoretically possible.
• You have to replicate your data
• This will cost money
Tweet @jedberg with feedback!
80. Other options
• Backup datacenter
• Backup provider
Tweet @jedberg with feedback!
82. The Monkey Theory
• Simulate things that go wrong
• Find things that are different
Tweet @jedberg with feedback!
83. The simian army
• Chaos -- Kills random instances
• Latency -- Slows the network down
• Conformity -- Looks for outliers
• Doctor -- Looks for passing health checks
• Janitor -- Cleans up unused resources
• Howler -- Yells about bad things
Tweet @jedberg with feedback!
86. Automate all the things!
• Application startup
• Configuration
• Code deployment
• System deployment
Tweet @jedberg with feedback!
87. Incident Reviews
Ask the key questions:
• What went wrong?
• How could we have detected it sooner?
• How could we have prevented it?
• How can we prevent this class of problem
in the future?
• How can we improve our behavior for next
time?
Tweet @jedberg with feedback!
88. The Netflix way
• Everything is “built for three”
• Fully automated build tools to test and
make packages
• Fully automated machine image bakery
• Fully automated image deployment
Tweet @jedberg with feedback!
89. All systems choices
assume some part will
fail at some point.
Tweet @jedberg with feedback!
90. Best Practices
• Keep data in multiple Availability Zones / DCs
• Avoid keeping state on a single instance
Tweet @jedberg with feedback!
91. Best Practices
• Isolated Services
• Three Balanced AZs
• Triple replicated persistence
• Isolated Regions
Tweet @jedberg with feedback!
92. Best Practices
• Don’t trust your dependencies
• Have good fallbacks
• Use circuit breakers/dependency
commands
Tweet @jedberg with feedback!
93. Best Practices
• Be generous in what you accept and stingy
in what you give
Tweet @jedberg with feedback!
94. Best Practices
• Hope for the best, assume the worst
Tweet @jedberg with feedback!
98. June 29th Outage
• Due to a severe storm, power went out in
one AZ
• Netflix did not do well because of a bug in
our internal mid-tier load balancer
• However, Cassandra held up just fine!
Tweet @jedberg with feedback!
99. October 29th Outage
• EBS degradation in one Zone
• We did much better this time
• Cassandra just kept running
• MySql not as well, but fallbacks kicked in
Tweet @jedberg with feedback!
100. Hurricane Sandy
The outage that never was
Tweet @jedberg with feedback!
101. Just a quick reminder...
(Some of) Netflix is open source:
https://netflix.github.com/
Tweet @jedberg with feedback!
102. Another reminder...
reddit is also open source
https://github.com/reddit
patches are now being accepted!
Tweet @jedberg with feedback!
103. Netflix is hiring
http://jobs.netflix.com/jobs.html
- or -
email talent@netflix.com and
tell them jedberg sent you
Tweet @jedberg with feedback!