Building Highly-resilient Systems at Pinterest

Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/pinterest-resilient-systems

Presented at QCon San Francisco
www.qconsf.com

Highly Resilient Systems at Pinterest
Yongsheng Wu

Engineering Manager of Storage & Caching, Pinterest

Email: yongsheng@pinterest.com

Pinterest: www.pinterest.com/yswu

Nov 17, 2015

Our mission is to help people
discover and do what they love

50+ billion Pins
categorized by 100+ million Pinners into
> 1 billion Boards

Tens of thousands of
instances on AWS

Dynamic Service Discovery
Realtime Configuration
Caching
Persistent Storage
Async Processing

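Client Retries
[Diagram: clients of service foo hold a fixed list of bar servers (bar 1, 2, 3) and retry across bar 1, bar 2 and bar 3. When bar 4 is added, the diagram marks it with "?": the clients' list still reads bar 1, 2, 3.]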
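
The retry slides can be read as the following minimal sketch. It assumes the foo client holds a fixed host list and moves to the next host on a connection error. The names `call_with_retries`, `BAR_HOSTS` and `fake_request` are hypothetical and do not come from the talk.

```python
from typing import Callable, List, Sequence, Tuple


class AllHostsFailed(Exception):
    """Raised when every host in the list has been tried without success."""


def call_with_retries(hosts: Sequence[str], request: Callable[[str], str]) -> str:
    """Try each host in order and return the first successful response."""
    errors: List[Tuple[str, Exception]] = []
    for host in hosts:
        try:
            return request(host)
        except ConnectionError as exc:
            errors.append((host, exc))
    raise AllHostsFailed(errors)


# Hypothetical static host list baked into the foo client.
BAR_HOSTS = ["bar1", "bar2", "bar3"]


def fake_request(host: str) -> str:
    # Simulated transport: bar1 is down, every other host answers.
    if host == "bar1":
        raise ConnectionError(f"{host} unreachable")
    return f"ok from {host}"


result = call_with_retries(BAR_HOSTS, fake_request)
```

A host that is not in the list, such as a newly launched bar4, never receives traffic in this model, which is the gap the "?" on the slide points at.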

Zookeeper
• Highly reliable distributed coordination
• Hierarchical namespace: ZNode
• Persistent Node
• Ephemeral Node
• Sequence Node
• Watch
• Node Children Changed
• Node Created
• Node Data Changed
• Node Deleted
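
To make the combination of ephemeral nodes and child watches concrete, here is a toy in-memory model. It is a sketch under assumptions, not the ZooKeeper client API: `ToyRegistry` and its methods are hypothetical, and unlike classic ZooKeeper watches (which fire once and must be re-registered) the callbacks here stay registered for brevity.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class ToyRegistry:
    """In-memory stand-in for a znode tree; NOT the ZooKeeper API."""

    def __init__(self) -> None:
        # path -> {child name: owning session id}
        self._children: Dict[str, Dict[str, str]] = defaultdict(dict)
        self._watchers: Dict[str, List[Callable[[List[str]], None]]] = defaultdict(list)

    def create_ephemeral(self, path: str, child: str, session: str) -> None:
        """Register a child that lives only as long as its session."""
        self._children[path][child] = session
        self._notify(path)

    def close_session(self, session: str) -> None:
        """Drop every ephemeral child owned by the session and notify watchers."""
        for path, kids in list(self._children.items()):
            gone = [name for name, owner in kids.items() if owner == session]
            for name in gone:
                del kids[name]
            if gone:
                self._notify(path)

    def watch_children(self, path: str, callback: Callable[[List[str]], None]) -> None:
        """Deliver the current children now and again on every change."""
        self._watchers[path].append(callback)
        callback(sorted(self._children[path]))

    def _notify(self, path: str) -> None:
        for callback in self._watchers[path]:
            callback(sorted(self._children[path]))


registry = ToyRegistry()
seen: List[List[str]] = []
registry.watch_children("/discovery/bar/prod", seen.append)
registry.create_ephemeral("/discovery/bar/prod", "bar1", session="s1")
registry.create_ephemeral("/discovery/bar/prod", "bar2", session="s2")
registry.close_session("s2")  # bar2's session ends, so its node disappears
```

The watcher sees the server list grow when a server registers and shrink when its session ends, without any change to the client's configuration.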

Capacity Addition
[Diagram: bar 1 is registered under /discovery/bar/prod in Zookeeper and foo apps hold the server list "bar 1". When bar 2 starts and registers as "bar 2", the foo apps' list becomes "bar 1, bar 2".]

Capacity Reduction
[Diagram: bar 1 and bar 2 are registered under /discovery/bar/prod and foo apps hold "bar 1, bar 2". When bar 2 goes away, its entry disappears from Zookeeper and the foo apps' list becomes "bar 1".]

Zookeeper Failure
[Diagram: bar 1 and bar 2 are registered under /discovery/bar/prod in Zookeeper and foo apps hold "bar 1, bar 2"; the slide considers Zookeeper itself failing.]

Zookeeper with Observers
[Diagram: Observers carry a copy of /discovery/bar/prod (bar 1, bar 2) alongside Zookeeper; foo apps hold "bar 1, bar 2".]

Zookeeper with Observers Failure
[Diagram: Zookeeper and the Observers both carry /discovery/bar/prod (bar 1, bar 2) and foo apps hold "bar 1, bar 2"; the slide considers this setup failing.]

Avoid complete reliance on any
single system, even if it is a highly
reliable distributed system

Zookeeper with Observers Protected by Local Persistent ServerSet
[Diagram: Zookeeper and the Observers both carry /discovery/bar/prod (bar 1, bar 2). ZUM keeps a local persistent copy of the bar server set (bar 1, bar 2), and foo apps hold "bar 1, bar 2".]
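
One way to picture a local persistent server set is a file that is rewritten whenever discovery answers and simply left alone when it does not. The sketch below assumes that behaviour; the file name, the JSON format and the functions `persist_server_set`, `load_server_set` and `refresh` are hypothetical and are not taken from the talk.

```python
import json
import os
import tempfile
from typing import Callable, List


def persist_server_set(path: str, servers: List[str]) -> None:
    """Write to a temp file, then rename, so readers never see a partial file."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(servers, handle)
    os.replace(tmp_path, path)


def load_server_set(path: str) -> List[str]:
    with open(path) as handle:
        return json.load(handle)


def refresh(path: str, fetch: Callable[[], List[str]]) -> List[str]:
    """Update the local file when discovery answers; otherwise keep the last known set."""
    try:
        servers = fetch()
    except ConnectionError:
        return load_server_set(path)
    persist_server_set(path, servers)
    return servers


workdir = tempfile.mkdtemp()
local_file = os.path.join(workdir, "bar.serverset")  # hypothetical file name


def healthy_discovery() -> List[str]:
    return ["bar1", "bar2"]


def broken_discovery() -> List[str]:
    raise ConnectionError("discovery unavailable")


first = refresh(local_file, healthy_discovery)
second = refresh(local_file, broken_discovery)  # served from the local file
```

The trade-off in this model is staleness: while discovery is down, the application keeps the last known list and does not see servers that were added or removed in the meantime.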

Challenges
• Rapid planned capacity reduction
• Gradual loss of instances

Typical Service Setup
[Diagram: Client, Server, Config, Cache, DB]

• Decider
• Experiment
• Rate limiting
• Failover
• … …

Zookeeper Powered Realtime Configuration Management
[Diagram: the Config Admin Console writes decider settings under /config/decider in Zookeeper (setting_1: 53, setting_2: 0, setting_3: 100, setting_4: 78, … …); foo apps hold {"setting_1": 53, … …}.]

Zookeeper, Observers, S3 Based Realtime Configuration Management
[Diagram sequence: the components are the Config Admin Console, AWS S3, Zookeeper (/config), Observers (/config), ZUM, a decider box and foo apps. The slides step through one update:]
• Start: Zookeeper and the Observers both show decider: v0; the decider box and foo apps hold v0: {"setting_1": 53, … …}
• The Config Admin Console produces v1: {"setting_1": 54, … …}
• Zookeeper /config changes to decider: v1
• Observers /config changes to decider: v1, and ZUM holds v1: {"setting_1": 54, … …}
• foo apps hold v1: {"setting_1": 54, … …}
• The decider box holds v1: {"setting_1": 54, … …}
• End: Zookeeper and the Observers show decider: v1 and foo apps hold v1
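
The sequence can be sketched as a version pointer plus a separate content store: the coordination layer carries only a small version label, and the content is fetched by that label. This is a minimal model under that assumption; `ConfigAgent`, `on_version_change` and the `name/version` key layout are hypothetical, and a plain dict stands in for the blob store.

```python
from typing import Dict, Optional


class ConfigAgent:
    """Hypothetical agent that follows a version pointer and fetches content by version."""

    def __init__(self, blob_store: Dict[str, dict]) -> None:
        self.blob_store = blob_store  # stand-in for the content store
        self.version: Optional[str] = None
        self.content: dict = {}

    def on_version_change(self, name: str, version: str) -> bool:
        """Apply the new version if its content exists; keep the old one otherwise."""
        key = f"{name}/{version}"
        if version == self.version or key not in self.blob_store:
            return False
        self.content = self.blob_store[key]
        self.version = version
        return True


blob_store = {"decider/v0": {"setting_1": 53}}
agent = ConfigAgent(blob_store)
agent.on_version_change("decider", "v0")

blob_store["decider/v1"] = {"setting_1": 54}   # content is published first
agent.on_version_change("decider", "v1")       # then the pointer flips to v1
ignored = agent.on_version_change("decider", "v2")  # pointer without content
```

Publishing the content before flipping the pointer means an agent never observes a version it cannot fetch, and a pointer to missing content leaves the previous configuration in place.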

Challenges
• Staggered rollout
• Configuration of huge size

Caching
[Diagram: processes proc 1 … proc k on host Foo 1 reach cache nodes bar cache 1 … bar cache m through a consistent hash ring.]
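
A consistent hash ring like the one on this slide can be sketched as follows. This is a generic textbook version with virtual nodes, not the client library used at Pinterest; the MD5-based hash, the 100 virtual nodes per server and the node names are assumptions.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def _hash(key: str) -> int:
    """Map a string to a 64-bit point on the ring."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: Iterable[str], replicas: int = 100) -> None:
        self.replicas = replicas            # virtual nodes per cache server
        self._owner: Dict[int, str] = {}    # ring point -> cache server
        self._points: List[int] = []        # sorted ring points
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def lookup(self, key: str) -> str:
        """Return the server owning the first ring point clockwise from the key."""
        index = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[index]]


ring = HashRing(["bar-cache-1", "bar-cache-2", "bar-cache-3", "bar-cache-4"])
keys = [f"pin:{i}" for i in range(1000)]
before = {key: ring.lookup(key) for key in keys}
ring.remove("bar-cache-4")
after = {key: ring.lookup(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

Only keys that lived on the removed server change owner; keys on the remaining servers stay where they were.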

Caching
[Diagram: processes proc 1 … proc k on each of the hosts Foo 1, Foo 2, … Foo n all connect to bar cache 1 … bar cache m. Too many connections!]

Twemproxy
[Diagram: each host Foo 1, Foo 2, … Foo n has a Nutcracker between its processes (proc 1 … proc k) and the cache nodes bar cache 1 … bar cache m.]

Cache Inconsistency
[Diagram: Nutcracker in front of bar cache 1, bar cache 2, … bar cache n. One node holds foo: 1; after "set foo 2" the pool also holds foo: 2, so two different values for foo exist at once.]

McRouter
[Diagram: McRouter in front of cache 1, cache 2, … cache n. No ring reshuffle.]

McRouter
Pros
• No inconsistency caused by node joining/leaving the pool
• No cascading failures in case of excessive load caused by hot keys
Cons
• Cache misses

Replicated Pools - Reads
[Diagram: McRouter in front of two pools, the ac Pool (cache 1, cache 2, … cache n) and the de Pool (cache 1', cache 2', … cache n'), handling "get foo".]

Replicated Pools - Invalidation
[Diagram: McRouter in front of the ac Pool (cache 1, cache 2, … cache n) and the de Pool (cache 1', cache 2', … cache n'), handling "delete foo".]
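
A minimal sketch of replicated pools, assuming reads are served from one pool and invalidations are sent to every pool. That routing is an assumption made for illustration; the class `ReplicatedCache` is hypothetical and plain dicts stand in for the cache pools. Only the pool names "ac" and "de" come from the slides.

```python
from typing import Dict, List, Optional


class ReplicatedCache:
    """Toy model: read from one pool, invalidate in all pools."""

    def __init__(self, pools: Dict[str, Dict[str, str]], local_pool: str) -> None:
        self.pools = pools
        self.local_pool = local_pool

    def get(self, key: str) -> Optional[str]:
        # Assumption: reads only touch the local pool.
        return self.pools[self.local_pool].get(key)

    def set(self, key: str, value: str) -> None:
        self.pools[self.local_pool][key] = value

    def delete(self, key: str) -> List[str]:
        """Remove the key from every pool; return the pools that held it."""
        touched = []
        for name, pool in self.pools.items():
            if pool.pop(key, None) is not None:
                touched.append(name)
        return touched


pools = {"ac": {"foo": "1"}, "de": {"foo": "1"}}
cache = ReplicatedCache(pools, local_pool="ac")
value_before = cache.get("foo")
touched = cache.delete("foo")
value_after = cache.get("foo")
```

If a delete reached only one pool, the other pool would keep serving the stale value, which is why the invalidation has to reach both.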

[Diagram: McRouter with the two pools (cache 1, cache 2, … cache n and cache 1', cache 2', … cache n') and a Log.]

[Diagram: the same setup, with the Log followed by Singer, kafka, tailer and PinLater.]

Challenges
• Build the feedback loop from persistent layer to
caching layer
• Move to multiple geographic regions

Sharding
[Diagram: a 64-bit ID made up of three fields: Shard Id, Type, Local Id.]
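
Packing the three fields into one 64-bit integer can be sketched with shifts and masks. The slide does not give the field widths; the 16/10/36 split below (with the top 2 bits unused) is an assumption, and `pack_id` and `unpack_id` are hypothetical names.

```python
from typing import Tuple

# Assumed field widths; the slide only says the ID is 64 bits in total.
SHARD_BITS = 16
TYPE_BITS = 10
LOCAL_BITS = 36


def pack_id(shard_id: int, type_id: int, local_id: int) -> int:
    """Combine shard, type and local IDs into a single integer."""
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    if not 0 <= type_id < (1 << TYPE_BITS):
        raise ValueError("type_id out of range")
    if not 0 <= local_id < (1 << LOCAL_BITS):
        raise ValueError("local_id out of range")
    return (shard_id << (TYPE_BITS + LOCAL_BITS)) | (type_id << LOCAL_BITS) | local_id


def unpack_id(value: int) -> Tuple[int, int, int]:
    """Recover (shard_id, type_id, local_id) from a packed ID."""
    local_id = value & ((1 << LOCAL_BITS) - 1)
    type_id = (value >> LOCAL_BITS) & ((1 << TYPE_BITS) - 1)
    shard_id = (value >> (TYPE_BITS + LOCAL_BITS)) & ((1 << SHARD_BITS) - 1)
    return shard_id, type_id, local_id


packed = pack_id(shard_id=3429, type_id=1, local_id=7075733)
fields = unpack_id(packed)
```

Because the shard is encoded in the ID itself, a service can find the right shard from the ID alone, with no lookup table.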

Read from Slave
[Diagram: Clients call DataServices 1, 2, … n, which sit in front of Master 1/Slave 1 … Master m/Slave m.]
Read from slave after the master has failed health checks over a certain period of time.
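
The "failing health check over a certain period of time" rule can be sketched as a small router that remembers when the master first started failing. The 30-second threshold and the class `ReplicaRouter` are hypothetical; timestamps are passed in explicitly so the example is deterministic.

```python
from typing import Optional


class ReplicaRouter:
    """Route reads to the slave once the master has been unhealthy long enough."""

    def __init__(self, grace_seconds: float) -> None:
        self.grace_seconds = grace_seconds
        self._failing_since: Optional[float] = None

    def record_health_check(self, healthy: bool, now: float) -> None:
        if healthy:
            self._failing_since = None       # any success resets the clock
        elif self._failing_since is None:
            self._failing_since = now        # first failure of a streak

    def read_target(self, now: float) -> str:
        failing = self._failing_since
        if failing is not None and now - failing >= self.grace_seconds:
            return "slave"
        return "master"


router = ReplicaRouter(grace_seconds=30)     # hypothetical threshold
router.record_health_check(healthy=False, now=0)
early = router.read_target(now=10)           # still within the grace period
router.record_health_check(healthy=False, now=20)
late = router.read_target(now=35)            # failing for 35 s, past the threshold
router.record_health_check(healthy=True, now=40)
recovered = router.read_target(now=41)
```

Waiting for a sustained failure rather than a single failed check avoids flapping between master and slave on transient errors.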

Realtime Configuration Management Powered Failover
[Diagram: ShardConfig changes reach the DataServices process through ZUM, driving failover from Master to Slave across Master 1/Slave 1 … Master m/Slave m.]

Other Persistence Stores
UMetaStore
• Key value store based on HBase
Zen
• Graph store: nodes and edges
• Flexible schema
• Custom index
• Both HBase and MySQL
Future
• RocksDB

Challenges
• Automated failover with MySQL replica set
• HBase 1.x Upgrade
• Move to multiple geographic regions

Replication and failover are the
key ingredients for building highly
resilient storage and caching
systems

Async Processing
Use Cases
• Acknowledge success right away and perform non-time-sensitive
actions at a later time
• Schedule and execute a large number of jobs
Benefits
• Faster response time
• More resilient to dependent system failures

Pyres Limitations
• No mechanism for success acknowledgement
• No visibility into status of individual job types
• No support for scheduled job execution at a
specific time in the future
• Rate limiting and retries are hard to manage
• Redis is the only supported storage backend

PinLater
Asynchronous Processing System
[Diagram: Clients enqueue jobs to PinLater Servers; Worker Pools (including Infra Worker Pools) dequeue and ACK jobs; the PinLater Servers sit on a Master/Slave Storage Backend.]

PinLater Job State Transition
• Pending → Running: Dequeued
• Running → Succeeded: Done
• Running → Failed: Failed (no more retries)
• Running → Pending: Failed (retries left)/Claim Timeout
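
The job states on this slide can be sketched as a small state machine. This is an illustration only, not PinLater's implementation: the class `Job` and its methods are hypothetical, and treating a claim timeout as not consuming a retry is an assumption.

```python
from enum import Enum


class JobState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


class Job:
    def __init__(self, retries_left: int) -> None:
        self.state = JobState.PENDING
        self.retries_left = retries_left

    def _require(self, expected: JobState) -> None:
        if self.state is not expected:
            raise RuntimeError(f"expected {expected.name}, got {self.state.name}")

    def dequeue(self) -> None:
        """Pending -> Running when a worker claims the job."""
        self._require(JobState.PENDING)
        self.state = JobState.RUNNING

    def ack(self, succeeded: bool) -> None:
        """Running -> Succeeded, back to Pending, or Failed, depending on the outcome."""
        self._require(JobState.RUNNING)
        if succeeded:
            self.state = JobState.SUCCEEDED
        elif self.retries_left > 0:
            self.retries_left -= 1
            self.state = JobState.PENDING
        else:
            self.state = JobState.FAILED

    def claim_timeout(self) -> None:
        """Running -> Pending when the worker never ACKs (assumed not to use a retry)."""
        self._require(JobState.RUNNING)
        self.state = JobState.PENDING


job = Job(retries_left=1)
job.dequeue()
job.ack(succeeded=False)       # one retry left, so the job goes back to Pending
state_after_first_failure = job.state
job.dequeue()
job.ack(succeeded=False)       # no retries left, so the job ends in Failed
```

Because a job can return to Pending and run again, the same job body may execute more than once.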

PinLater Job Requirements
Idempotency
Commutativity
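
Why these two properties matter can be shown with a job whose effect is a set insertion: running it twice, or in a different order, gives the same result. The follower example and the function `apply_follow` are hypothetical and only illustrate the two requirements.

```python
from functools import reduce
from itertools import permutations


def apply_follow(followers: frozenset, user: str) -> frozenset:
    """Idempotent and commutative job body: adding a user to a set."""
    return followers | {user}


# "alice" is delivered twice, as could happen after a retry or a claim timeout.
deliveries = ["alice", "bob", "alice"]

# Apply the jobs in every possible order and collect the distinct outcomes.
outcomes = {
    reduce(apply_follow, order, frozenset())
    for order in permutations(deliveries)
}
```

A job that appended to a list instead would produce a different result for each duplicate and each ordering, so it would not be safe to retry.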

PinLater
Asynchronous Processing System
[Diagram: Clients enqueue to PinLater Server 1, 2, … n, backed by Master 1/Slave 1 … Master m/Slave m; Worker Pool 1 … Worker Pool k process the jobs.]

Challenges
• Multi-tenancy with fair failure isolation
• Fault-tolerant async job enqueuer

Use async processing as much as
possible to deliver faster
response time and make request
handling more robust

Learnings
• Avoid complete reliance on any single system, even
if it is a highly reliable distributed system
• Replication and failover are the key ingredients for
building highly resilient storage and caching
systems
• Use async processing as much as possible to
deliver faster response time and make request
handling more robust

Failure Testing
• Be explicit with scope
• Failure Modes
• Sandbox testing
• Manual testing
• Automated simulation
• Testing on production
• AWS is doing it for us all the time
• Simian Army

