5. Objects
▪ Millions of status updates posted each day
▪ Billions of photos uploaded to the site each month
▪ Billions of pieces of content (web links, news stories,
blog posts, notes, photo albums, etc.) shared each week
▪ Average user has 130 friends on the site
▪ Billions of friend-graph edges
▪ Average user clicks the Like button on multiple pieces of content each
month
6. Infrastructure
▪ Thousands of servers in several data centers in two regions
▪ Web servers
▪ DB servers
▪ Memcache servers
▪ Other services
7. The scale of memcache @ facebook
▪ Memcache Ops/s
▪ over M gets/sec
▪ over M sets/sec
▪ over T cached items
▪ over Tbytes
▪ Network IO
▪ peak rx Mpkts/s, GB/s
▪ peak tx Mpkts/s, GB/s
8. A typical memcache server’s P.O.V.
▪ Network I/O
▪ rx Kpkts/s, MB/s
▪ tx Kpkts/s, MB/s
▪ Memcache OPS
▪ K gets/s
▪ K sets/s
▪ M items
All rates are 1-day moving averages
10.–14. [Globe visualization]
• When Mark Zuckerberg and his roommates started Facebook in a Harvard dorm in 2004, they put everyone
on one server
• As Facebook grew, they could scale like a traditional site by just adding servers
• Even as the site grew beyond Harvard to Stanford, Columbia, and thousands of other campuses, each was a
separate network that could be served by an isolated set of servers
• But as people connected more between schools, the model changed, and the big change came
when Facebook opened to everyone in September 2006
• [For globe]: That led to people being connected everywhere around the world, not just on a single college
campus
• [For globe]: This visualization shows accepted friend requests animating from requesting friend to
accepting friend
15. Scaling Facebook: Interconnected data
Bob
• On Facebook, the data required to serve your home page or
any other page is incredibly interconnected
• Your data can’t sit on one server or cluster of servers, because
almost every piece of content on Facebook requires
information about your network of friends
• And the average user has 130 friends
• As we scale, we have to be able to quickly pull data across all
of our servers, wherever it’s stored
16. Scaling Facebook: Interconnected data
Bob Brian
17. Scaling Facebook: Interconnected data
Felicia Bob Brian
18. Memcache Rules of the Game
▪ GET object from memcache
▪ on miss, query database and SET object into memcache (rules sketched below)
▪ Update database row and DELETE object in memcache
▪ No derived objects in memcache
▪ Every memcache object maps to persisted data in the database
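In Python, these rules amount to the cache-aside sketch below. This is illustrative, not Facebook code: mc is assumed to be a memcached client exposing get/set/delete (as in pymemcache or python-memcached), and query_row/update_row are hypothetical database accessors.

```python
# Cache-aside, per the rules above: reads go through memcache,
# writes update the database and invalidate the cached copy.

def read_object(mc, db, key):
    """GET from memcache; on a miss, query the DB and SET."""
    obj = mc.get(key)
    if obj is None:
        obj = db.query_row(key)   # hypothetical DB accessor
        mc.set(key, obj)
    return obj

def write_object(mc, db, key, value):
    """Update the DB row, then DELETE the object in memcache."""
    db.update_row(key, value)     # hypothetical DB accessor
    mc.delete(key)                # next reader repopulates the cache
```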
22. Phatty Phatty Multiget (notes)
▪ The PHP runtime is single-threaded and synchronous
▪ To get good performance for data-parallel operations like
retrieving info for all friends, it’s necessary to dispatch memcache
get requests in parallel (see the sketch below)
▪ Initially we just used polling I/O in PHP
▪ Later we switched to true asynchronous I/O in a PHP C extension
▪ In both cases the result was reduced latency through parallelism
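A rough Python sketch of the idea (the deck's actual implementation was PHP plus a C extension): group the requested keys by the server that owns them, then issue one batched get per server concurrently. The server_for helper and per-server clients with a python-memcached-style get_multi are assumptions for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def parallel_multiget(clients, server_for, keys):
    # Group keys by the memcache server that owns them.
    by_server = defaultdict(list)
    for key in keys:
        by_server[server_for(key)].append(key)

    # Dispatch one batched get per server, all in flight at once.
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(by_server))) as pool:
        futures = [pool.submit(clients[s].get_multi, ks)
                   for s, ks in by_server.items()]
        for f in futures:
            results.update(f.result())
    return results
```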
24.–26. [Memcache pools]
[diagram, animated over three slides: a PHP client fanning out to several memcache pools]
Different objects have different sizes and access patterns. We began creating
memcache pools to segregate different kinds of objects for better cache efficiency and
memory utilization.
27. Pools and Threads (notes)
▪ Privacy objects are small but have poor hit rates
▪ User profiles are large but have good hit rates
▪ We achieve better overall caching by segregating different classes
of objects into different pools of memcache servers
▪ Memcache was originally a classic single-threaded unix daemon
▪ This meant we needed to run multiple instances, each with a fraction of the RAM, on
each memcache server
▪ multiplying the number of connections to each box
▪ and multiplying the meta-data overhead
▪ We needed a multi-threaded service
29. Connections and Congestion (notes)
▪ As we added web servers, the number of connections to each memcache box
grew
▪ Each web server ran many PHP processes
▪ Each memcache box had thousands of TCP connections
▪ UDP could reduce the number of connections
▪ As we added users and features, the number of keys per multiget
increased
▪ Popular people and groups
▪ Platform and FBML
▪ We began to see incast congestion on our ToR switches: many memcache
servers answering the same multiget at once can overflow a top-of-rack
switch’s buffers
30. Serialization and Compression
▪ We noticed our short profiles weren’t so short
▪ KB-scale PHP-serialized objects
▪ fb-serialization
▪ based on the Thrift wire format
▪ faster and smaller than PHP serialization
▪ gzcompress serialized strings (sketch below)
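A Python analogue of the same trick: PHP's gzcompress emits zlib-format data, so zlib.compress is its direct counterpart, with pickle standing in for PHP/fb-serialization here.

```python
import pickle
import zlib

def to_cache_value(obj):
    # Serialize, then compress, before the memcache set.
    return zlib.compress(pickle.dumps(obj))

def from_cache_value(blob):
    # Decompress, then deserialize, after the memcache get.
    return pickle.loads(zlib.decompress(blob))
```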
32. Multiple Datacenters
[diagram: SC Web and SF Web tiers, each with its own memcache tier (SC Memcache, SF Memcache), both backed by a single SC MySQL tier]
33. Multiple Datacenters
[diagram: as above, with a memcache proxy between each web tier and the SC/SF memcache tiers, still one SC MySQL tier]
34. Multiple Datacenters (notes)
▪ In the early days we had two data-centers
▪ The one we were about to turn off
▪ The one we were about to turn on
▪ Eventually we outgrew a single data-center
▪ Still only one master database tier
▪ The rules of the game require that after an update we
broadcast deletes to all tiers (sketch below)
▪ The mcproxy era begins
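A minimal sketch of the broadcast that mcproxy automates, assuming one client handle per memcache tier; the tier list is illustrative.

```python
def broadcast_delete(tiers, key):
    """After a DB update, invalidate the object in every tier."""
    for mc in tiers:      # e.g. one client per datacenter's memcache
        mc.delete(key)
```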
35. Multiple Regions
[diagram: West Coast (SC Web, SC Memcache, SC MySQL master) and East Coast (VA Web, VA Memcache, VA MySQL), with a memcache proxy alongside the databases]
36. Multiple Regions
[diagram: as above, plus SF Web on the West Coast; memcache proxies sit between the web tiers and the SC/SF/VA memcache tiers]
37. Multiple Regions
[diagram: as above, with MySQL replication from the SC MySQL master to the VA MySQL slave]
38. Multiple Regions (notes)
▪ Latency to east coast and European users was/is terrible
▪ So we deployed a slave DB tier in Ashburn, VA
▪ The slave DB syncs with the master via the MySQL binlog
▪ This introduces a race condition: a VA read can repopulate the cache
with stale data before the replicated update arrives
▪ mcproxy to the rescue again
▪ Add a memcache-delete pragma to MySQL update and insert ops (sketch below)
▪ Added a thread to the slave mysqld to dispatch deletes on the east coast
via mcproxy
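Shipping the delete with the replicated statement fixes the ordering: the invalidation can't land before the row it invalidates. A hedged sketch of the two halves; the pragma format and helper names below are invented for illustration, not Facebook's actual syntax.

```python
import re

MC_PRAGMA = "/* MCDELETE %s */"

def annotate(sql, keys):
    """Master side: attach the memcache keys to the replicated SQL."""
    return "%s %s" % (sql, MC_PRAGMA % ",".join(keys))

def on_replicated_statement(sql, local_mc):
    """Slave side: after applying the row, dispatch the local deletes."""
    match = re.search(r"/\* MCDELETE (.*?) \*/", sql)
    if match:
        for key in match.group(1).split(","):
            local_mc.delete(key)
```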
45. Replicated Keys (notes)
▪ Viral groups and applications cause hot keys
▪ More gets than a single memcache server can process
▪ (Remember the rules of the game!)
▪ On a miss, that means more queries than a single DB server can process
▪ That means that group or application is effectively down
▪ Creating key aliases allows us to add server capacity
▪ Hot keys are published to all web servers
▪ Each web server picks an alias for gets
▪ get key:xxx => get key:xxx#N
▪ Each web server deletes all aliases
46. Memcache Rules of the Game
▪ New Rule
▪ If a key is hot, pick an alias and fetch that for reads (sketch below)
▪ Delete all aliases on updates
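A sketch of the new rule in Python. NUM_ALIASES and the hot-key set are illustrative: reads pick one alias at random, spreading a hot key across many servers; updates delete the base key and every alias so no replica keeps serving stale data.

```python
import random

NUM_ALIASES = 8
HOT_KEYS = {"group:viral"}          # published to all web servers

def get_hot(mc, key):
    if key in HOT_KEYS:
        key = "%s#%d" % (key, random.randrange(NUM_ALIASES))  # key:xxx#N
    return mc.get(key)

def delete_hot(mc, key):
    mc.delete(key)
    if key in HOT_KEYS:
        for n in range(NUM_ALIASES):
            mc.delete("%s#%d" % (key, n))   # delete all aliases
```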
47. Mirrored Pools
[diagram: a general pool with wide fanout across shards 1…n, mirrored by specialized replica pools that hold a subset of the data on fewer shards]
48. Mirrored Pools (notes)
▪ As our memcache tier grows, the number of keys per packet decreases
▪ on a small tier, a multiget’s keys pack densely into few packets;
spread across many servers, it approaches one packet per server
▪ More network traffic
▪ More memcache server kernel interrupts per request
▪ Confirmed Info: critical account meta-data
▪ Have you confirmed your account?
▪ Are you a minor?
▪ Pulled from large user-profile objects
▪ Since we just need a few bytes of data for many users
49. Hot Misses
▪ [animation]
50. Hot Misses (notes)
▪ Remember the rules of the game
▪ update and delete
▪ miss, query, and set
▪ When the object is very, very popular, that query rate can kill a
database server
▪ We need flow control!
51. Memcache Rules of the Game
▪ For hot keys, on a miss grab a mutex before issuing the db query (sketch below)
▪ memcache-add a per-object mutex
▪ key:xxx => key:xxx#mutex
▪ If the add succeeds, do the query
▪ If the add fails (because the mutex already exists), back off and try again
▪ After the set, delete the mutex
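A sketch of this rule in Python, relying on memcached's add being atomic: it succeeds only if the key doesn't already exist, so exactly one caller wins and queries the database. Assumes a client whose add returns whether the key was stored (e.g. pymemcache with noreply disabled); query_row is a hypothetical DB accessor.

```python
import time

def read_with_mutex(mc, db, key, attempts=50):
    for _ in range(attempts):
        obj = mc.get(key)
        if obj is not None:
            return obj
        # add is atomic: only one process acquires the mutex.
        if mc.add(key + "#mutex", 1, expire=5):
            try:
                obj = db.query_row(key)   # only the winner hits the DB
                mc.set(key, obj)
                return obj
            finally:
                mc.delete(key + "#mutex")
        time.sleep(0.01)                  # lost the race: back off, retry
    raise RuntimeError("timed out waiting for " + key)
```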
52. Hot Deletes
▪ [hot groups graphics]
53. Hot Deletes (notes)
▪ We’re not out of the woods yet
▪ The cache mutex doesn’t work for frequently updated objects
▪ like membership lists and walls for viral groups and applications
▪ Each process that acquires the mutex finds that the object has been
deleted again
▪ ...and again
▪ ...and again
54. Rules of the Game: Caching Intent
▪ Each memcache server is in the perfect position to detect and
mitigate contention
▪ Record misses
▪ Record deletes
▪ Serve stale data
▪ Serve lease-ids
▪ Don’t allow updates without a valid lease-id (sketch below)
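The deck doesn't spell out the protocol, but the shape is roughly the sketch below. Stock memcached of this era had no lease commands; get_with_lease, get_stale, and set_if_lease_valid are invented names for the behaviors the slide lists (record misses and deletes, serve stale data, hand out lease-ids, reject writes without a valid lease).

```python
def read_with_lease(mc, db, key):
    # On a miss the server records the intent and returns a lease-id;
    # concurrent missers get no lease and are served stale data instead.
    obj, lease_id = mc.get_with_lease(key)        # invented API
    if obj is not None:
        return obj
    if lease_id is None:
        return mc.get_stale(key)                  # invented API
    obj = db.query_row(key)                       # hypothetical accessor
    mc.set_if_lease_valid(key, obj, lease_id)     # invented API: the set
    return obj                                    # is dropped if a delete
                                                  # invalidated the lease
```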
56. Shaping Memcache Traffic
▪ mcproxy as router
▪ admission control
▪ tunneling inter-datacenter traffic
57. Cache Hierarchies
▪ Warming up Cold Clusters
▪ Proxies for Cacheless Clusters
58. Big Low Latency Clusters
▪ Bigger Clusters are Better
▪ Low Latency is Better
▪ UDP
▪ Proxy Facebook Architecture
59. Worse is Better
▪ Richard Gabriel’s famous essay contrasted
▪ ITS and Unix
▪ LISP and C
▪ MIT and New Jersey
http://www.jwz.org/doc/worse-is-better.html
60. Why Memcache Works
▪ Uniform, low latency with partial results is a better user
experience
▪ memcache provides a few robust primitives
▪ key-to-server mapping
▪ parallel I/O
▪ flow-control
▪ traffic shaping
▪ that allow ad hoc solutions to a wide range of scaling issues
We started with simple, obvious improvements.
As we grew we deployed less obvious improvements...
But they’ve remained pretty simple
61. (c) Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved.