Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2hgpOEw.
Preslav Le talks about how Dropbox’s infrastructure has evolved over the years and what it looks like today. He also covers the challenges, lessons learned, and tips for addressing massive scale, consistency, architecture, MySQL, Memcache, and more. Filmed at qconsf.com.
Preslav Le has been a software engineer at Dropbox for the past 3 years, contributing to various aspects of Dropbox’s infrastructure, including traffic, performance, and storage. He was part of the core on-call and storage on-call rotations, handling high-urgency, real-world issues that ranged from bad code pushes to complete datacenter outages.
Watch the video with slide synchronization on InfoQ.com: https://www.infoq.com/presentations/dropbox-infrastructure
Presented at QCon San Francisco
www.qconsf.com
18. BLOCK DATA IN S3
[Architecture diagram. Components: clients, load balancers (LB), nginx, metaservers, blockservers, notification server, Memcached, DB, S3, async processing. Region labels: AWS, Dropbox’s datacenters, AWS.]
19. METADATA IN MYSQL
[Same architecture diagram as slide 18. Region labels: AWS, Dropbox’s datacenters, Dropbox’s datacenters.]
20. 1. FETCH METADATA
[Architecture diagram as on slide 18, highlighting: clients, LB, metaserver, Memcached, DB. Region labels: AWS, Dropbox’s datacenters.]
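The fetch-metadata step highlights a metaserver that consults Memcached before the database. A minimal sketch of that look-aside (cache-aside) pattern follows; plain dicts stand in for Memcached and MySQL, and the class and key names are hypothetical, not Dropbox’s code.

```python
from typing import Dict, Optional


class LookAsideCache:
    """Cache-aside reads in front of a database.

    Plain dicts stand in for Memcached and MySQL (an assumption for
    illustration); this is not Dropbox's metaserver code.
    """

    def __init__(self, cache: Dict[str, str], db: Dict[str, str]) -> None:
        self.cache = cache
        self.db = db
        self.db_reads = 0  # how many lookups had to go to the database

    def get(self, key: str) -> Optional[str]:
        value = self.cache.get(key)
        if value is not None:
            return value  # cache hit: no database round trip
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value  # fill the cache for later readers
        return value

    def put(self, key: str, value: str) -> None:
        self.db[key] = value       # the database remains the source of truth
        self.cache.pop(key, None)  # invalidate instead of updating the cached copy


store = LookAsideCache(cache={}, db={"/photos/cat.jpg": "rev1"})
first = store.get("/photos/cat.jpg")   # miss: reads the DB and fills the cache
second = store.get("/photos/cat.jpg")  # hit: served from the cache
```

Invalidating on write keeps the sketch simple, but a production system also has to handle the race where a slow reader writes a stale value back into the cache after the invalidation.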
21. 2. DOWNLOAD BLOCKS
[Architecture diagram as on slide 18, highlighting: clients, LB, blockserver, S3. Region labels: AWS, Dropbox’s datacenters.]
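Slides 18 to 21 separate file metadata (metaserver, MySQL) from block data (blockserver, S3). The sketch below shows one way such a split can work, under loudly stated assumptions: files are cut into fixed-size blocks, each block is named by its SHA-256 hash, the metadata is just the ordered list of hashes, and the client downloads only blocks it does not already have. The block size, function names, and dict stand-ins for S3 are all hypothetical.

```python
import hashlib
from typing import Dict, List

# Hypothetical block size; the talk does not state one here.
BLOCK_SIZE = 4 * 1024 * 1024


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut file contents into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def upload(data: bytes, blockstore: Dict[str, bytes], block_size: int = BLOCK_SIZE) -> List[str]:
    """Store each block under its SHA-256 hash; return the block list kept as metadata."""
    block_list = []
    for block in split_into_blocks(data, block_size):
        digest = hashlib.sha256(block).hexdigest()
        blockstore.setdefault(digest, block)  # identical blocks are stored once
        block_list.append(digest)
    return block_list


def download(block_list: List[str], blockstore: Dict[str, bytes], local: Dict[str, bytes]) -> int:
    """Fetch only the blocks missing locally; return how many were fetched."""
    fetched = 0
    for digest in block_list:
        if digest not in local:
            local[digest] = blockstore[digest]  # one blockserver request per missing block
            fetched += 1
    return fetched


def reassemble(block_list: List[str], local: Dict[str, bytes]) -> bytes:
    """Rebuild the file from locally held blocks, in metadata order."""
    return b"".join(local[d] for d in block_list)


s3: Dict[str, bytes] = {}  # stand-in for S3 behind the blockservers
blocks = upload(b"abcabcxyz", s3, block_size=3)
local_cache: Dict[str, bytes] = {}
fetched = download(blocks, s3, local_cache)
```

Because the repeated `abc` block hashes to the same name, it is stored and fetched only once.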
22. 3. WAIT FOR NOTIFICATIONS
[Architecture diagram as on slide 18, highlighting: clients, notification server, metaserver. Region labels: AWS, Dropbox’s datacenters.]
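The wait-for-notifications step can be pictured as long polling: a client holds a request open until something it cares about changes, then goes back to the metaserver for the new metadata. This is a minimal sketch assuming a long-poll design; the namespace and version-counter model is hypothetical and not taken from the talk.

```python
import threading
from typing import Dict


class NotificationServer:
    """Long-poll sketch: each waiting client blocks until its namespace changes or it times out.

    Namespaces and version counters are hypothetical stand-ins.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._versions: Dict[str, int] = {}

    def publish(self, namespace: str) -> None:
        """Called when metadata for `namespace` changes."""
        with self._cond:
            self._versions[namespace] = self._versions.get(namespace, 0) + 1
            self._cond.notify_all()  # every waiter wakes and re-checks its own namespace

    def wait_for_change(self, namespace: str, seen_version: int, timeout: float) -> bool:
        """True if `namespace` moved past `seen_version`; False if the timeout expired first."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._versions.get(namespace, 0) > seen_version,
                timeout=timeout,
            )


server = NotificationServer()
threading.Timer(0.05, server.publish, args=("ns1",)).start()  # a change arrives shortly
changed = server.wait_for_change("ns1", seen_version=0, timeout=2.0)
idle = server.wait_for_change("ns2", seen_version=0, timeout=0.05)
```

On `True` the client would fetch fresh metadata (step 1 again); on `False` it simply opens a new poll.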
23. PYTHON EVERYWHERE
[Same architecture diagram as slide 18. Region labels: AWS, Dropbox’s datacenters.]
26. SCALING DATABASES
[Diagram components: metaserver and Memcached; a MySQL master with two MySQL replicas; sharded MySQL (shard0, shard1, …, shardN), each shard with one master and two replicas.]
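With MySQL split into shards that each have one master and replicas, every query first has to pick a host. A minimal routing sketch follows, assuming a hypothetical scheme: a user ID picks the shard by modulo, writes go to that shard’s master, and reads go to a random replica. The talk does not say which sharding key or placement function Dropbox used.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Shard:
    master: str
    replicas: List[str] = field(default_factory=list)


class ShardRouter:
    """Pick a MySQL host for a query from a sharding key (hypothetical modulo scheme)."""

    def __init__(self, shards: List[Shard], rng: Optional[random.Random] = None) -> None:
        self.shards = shards
        self.rng = rng or random.Random()

    def shard_for(self, user_id: int) -> Shard:
        return self.shards[user_id % len(self.shards)]

    def host_for(self, user_id: int, write: bool) -> str:
        shard = self.shard_for(user_id)
        if write or not shard.replicas:
            return shard.master  # all writes go to the shard's master
        return self.rng.choice(shard.replicas)  # reads are spread over replicas


shards = [
    Shard(f"shard{i}-master", [f"shard{i}-replica{j}" for j in (1, 2)])
    for i in range(3)
]
router = ShardRouter(shards, random.Random(0))
write_host = router.host_for(4, write=True)
read_host = router.host_for(4, write=False)
```

Two caveats follow from this design: changing `len(shards)` remaps most keys, which is one reason resharding is painful, and reads from replicas can lag behind the master.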
27. HORIZONTAL SCALING
[Diagram: many metaservers and sharded MySQL (shard0, shard1, …, shardN), each shard with one master and two replicas.]
28. CONNECTIONS
[Diagram: many metaservers and sharded MySQL (shard0, shard1, …, shardN), each shard with one master and two replicas.]
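One plausible reading of the connections slide is that a full mesh between app processes and MySQL servers grows multiplicatively as both tiers scale out. The arithmetic below makes that concrete; every fleet size is hypothetical, not a figure from the talk.

```python
def full_mesh_connections(app_hosts: int, procs_per_host: int, shards: int, mysql_per_shard: int) -> int:
    """Total client connections if every app process keeps one connection to every MySQL server."""
    return app_hosts * procs_per_host * shards * mysql_per_shard


def connections_per_mysql_server(app_hosts: int, procs_per_host: int) -> int:
    """Each MySQL server sees one connection from every app process."""
    return app_hosts * procs_per_host


# Hypothetical fleet sizes, not figures from the talk.
total = full_mesh_connections(app_hosts=1000, procs_per_host=10, shards=100, mysql_per_shard=3)
per_server = connections_per_mysql_server(app_hosts=1000, procs_per_host=10)
```

Here `total` is 3,000,000 and each MySQL server carries 10,000 connections. Adding metaservers raises the per-server count linearly, so scaling the app tier horizontally runs into MySQL’s per-server connection ceiling (the `max_connections` setting).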
29. SQL PROXY
[Diagram: many metaservers and sharded MySQL (shard0, shard1, …, shardN), each shard with one master and two replicas, with a tier of SQL Proxy instances added.]
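A SQL proxy tier can cut the connection count by multiplexing many callers over a small pool of backend connections. The sketch below shows only that pooling idea; the talk does not describe the proxy’s internals, and `FakeConnection` and `ConnectionPool` are hypothetical stand-ins rather than a MySQL client.

```python
import queue
import threading
from contextlib import contextmanager
from typing import Iterator, List


class FakeConnection:
    """Stand-in for a backend MySQL connection (hypothetical)."""

    def __init__(self, conn_id: int) -> None:
        self.conn_id = conn_id

    def execute(self, sql: str) -> str:
        return f"conn{self.conn_id}: {sql}"


class ConnectionPool:
    """A fixed number of backend connections shared by many concurrent callers."""

    def __init__(self, size: int) -> None:
        self._idle: "queue.Queue[FakeConnection]" = queue.Queue()
        for i in range(size):
            self._idle.put(FakeConnection(i))

    @contextmanager
    def connection(self, timeout: float) -> Iterator[FakeConnection]:
        conn = self._idle.get(timeout=timeout)  # raises queue.Empty if none frees up in time
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back, even on error


pool = ConnectionPool(size=2)  # 2 backend connections shared by 20 callers below
results: List[str] = []
lock = threading.Lock()


def handle_request(n: int) -> None:
    with pool.connection(timeout=5.0) as conn:
        row = conn.execute(f"SELECT {n}")
    with lock:
        results.append(row)


threads = [threading.Thread(target=handle_request, args=(n,)) for n in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All 20 requests complete, yet the backend only ever sees two connections.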
33–35. PLAYBOOK
1. Check for ongoing deployments or newly enabled features
2. Check for recently started background jobs
3. DBA on-call, please help!
37–40.
• Slow queries would adversely impact performance across the board
• More features => managing more independent MySQL databases
• Reactively (re)sharding individual databases as they hit capacity
• Impacted developer productivity
41. SCALABLE METADATA STORE DESIGNED FOR MULTI-TENANCY
2013 – Present
42. SHARDING AND CACHING BEHIND THE SCENES
54. NEW SERVICE: FILE JOURNAL
[Diagram: metaservers, a tier of File Journal instances, and sharded MySQL (shard0, shard1, …, shardN), each shard with one master and two replicas.]
55. SHARD FAILURE
[Diagram: metaservers, File Journal instances, and sharded MySQL (shard0, shard1, …, shardN), highlighting the shard1 master.]
57. LONG TIMEOUTS
[Diagram: metaservers, File Journal instances, and sharded MySQL (shard0, shard1, …, shardN), highlighting the shard1 master.]
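One common mitigation for long timeouts, though not necessarily the one presented in the talk, is to give each request a fixed time budget and cap every downstream call by whatever budget remains, failing fast once it is spent. This is a minimal sketch; `Deadline`, `DeadlineExceeded`, and the numbers are hypothetical.

```python
import time
from typing import Callable


class DeadlineExceeded(Exception):
    """Raised instead of starting a call the caller can no longer wait for."""


class Deadline:
    """A per-request time budget; each downstream call gets at most what is left."""

    def __init__(self, budget_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at = clock() + budget_s

    def remaining(self) -> float:
        return max(0.0, self._expires_at - self._clock())

    def timeout_for(self, per_call_cap_s: float) -> float:
        """Timeout for the next call: the smaller of a per-call cap and the remaining budget."""
        left = self.remaining()
        if left == 0.0:
            raise DeadlineExceeded("request budget spent; fail fast")
        return min(per_call_cap_s, left)


# A fake clock makes the example deterministic.
now = [0.0]
deadline = Deadline(budget_s=1.0, clock=lambda: now[0])
first_timeout = deadline.timeout_for(0.3)   # plenty of budget left: capped at 0.3 s
now[0] = 0.9
second_timeout = deadline.timeout_for(0.3)  # only about 0.1 s of budget remains
```

A call to a failed shard then ties up its worker for at most the remaining budget instead of a long default timeout.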
58. RUN OUT OF WORKERS
[Diagram: metaservers, File Journal instances, and sharded MySQL (shard0, shard1, …, shardN), highlighting the shard1 master and the File Journal instances.]
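Little’s law (average requests in flight = arrival rate × mean time in system) shows why a single slow shard can exhaust a service’s workers. The numbers below are hypothetical: 1,000 requests/s per host, 10 ms per healthy request, and 1% of requests routed to a failed shard that waits out a 10 s timeout.

```python
def busy_workers(arrival_rate: float, fast_latency_s: float,
                 slow_fraction: float, slow_latency_s: float) -> float:
    """Little's law, L = lambda * W: average number of requests in flight (busy workers).

    W is the mean time a request holds a worker, mixing fast requests with
    requests stuck on a degraded shard until their timeout fires.
    """
    mean_latency = (1 - slow_fraction) * fast_latency_s + slow_fraction * slow_latency_s
    return arrival_rate * mean_latency


# Hypothetical numbers, not measurements from the talk.
healthy = busy_workers(1000, 0.010, 0.00, 10.0)   # about 10 workers busy
degraded = busy_workers(1000, 0.010, 0.01, 10.0)  # about 110 workers busy
```

If a host had, say, 64 workers (also hypothetical), the degraded case would leave none free, so requests for healthy shards would queue behind the stuck ones. Shortening the timeout shrinks `slow_latency_s`, which ties this slide to the previous one.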
59. CASCADING FAILURE
[Diagram: metaservers, File Journal instances, and sharded MySQL (shard0, shard1, …, shardN), highlighting the shard1 master, the File Journal instances, and the metaservers.]
68. HOW TO PREVENT CASCADING FAILURE?
[Diagram of services: meta-client, meta-web, meta-api, meta-mobile, File Journal, Search, Auth service, Block Routing, Edgestore, Presence & Notifications, Cape, blockserver, Magic Pocket, Blockservice, Riviera, Thumbnail service, …]
69. BANDAID: PER-ROUTE ISOLATION
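Per-route isolation can be sketched in the bulkhead style: each route gets its own concurrency limit, and a route that is already full rejects new requests immediately instead of consuming shared workers. This is a minimal illustration, not Bandaid’s implementation; the route names, limits, and `PerRouteLimiter` class are hypothetical.

```python
import threading
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class RouteOverloaded(Exception):
    """Raised when a route has no free slots: a fast error instead of a queued request."""


class PerRouteLimiter:
    """Independent concurrency limits per route, so one slow route cannot starve the rest."""

    def __init__(self, limits: Dict[str, int], default_limit: int) -> None:
        self._default_limit = default_limit
        self._sems: Dict[str, threading.BoundedSemaphore] = {
            route: threading.BoundedSemaphore(n) for route, n in limits.items()
        }
        self._lock = threading.Lock()

    def _sem(self, route: str) -> threading.BoundedSemaphore:
        with self._lock:  # create per-route semaphores lazily and safely
            if route not in self._sems:
                self._sems[route] = threading.BoundedSemaphore(self._default_limit)
            return self._sems[route]

    def call(self, route: str, fn: Callable[[], T]) -> T:
        sem = self._sem(route)
        if not sem.acquire(blocking=False):
            raise RouteOverloaded(route)
        try:
            return fn()
        finally:
            sem.release()


limiter = PerRouteLimiter({"/slow": 1}, default_limit=4)
started, release = threading.Event(), threading.Event()


def stuck() -> str:
    started.set()
    release.wait()  # simulates a request stuck on a degraded backend
    return "done"


worker = threading.Thread(target=limiter.call, args=("/slow", stuck))
worker.start()
started.wait()
try:
    limiter.call("/slow", lambda: "second")
    slow_rejected = False
except RouteOverloaded:
    slow_rejected = True
fast_result = limiter.call("/fast", lambda: "ok")  # other routes are unaffected
release.set()
worker.join()
```

While `/slow` is saturated its extra requests fail immediately, and `/fast` keeps being served.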