Flink Forward San Francisco 2022.
Pinterest is a visual discovery engine that serves over 433 million users. Stream processing allows us to unlock value from real-time data for pinners. At Pinterest, we have adopted Flink as our unified stream processing engine. In this talk, we will share our journey in building a stream processing platform with Flink and how we onboarded critical use cases to the platform. Pinterest has supported 90+ near-real-time streaming applications. We will cover the problem statement, how we evaluated potential solutions, and our decision to build the framework.
by
Rainie Li & Kanchi Masalia
12. PinStats Analytic
Use case
“Overall, users … cited that currently they have difficulties monitoring content performance due to a lack of real-time data being available, which they find frustrating.”
13. Creator Content
Use cases
Fast user signals: Make user content signals available quickly after content creation
Safety: Reduce levels of unsafe content as close to content creation time as possible
Diagram labels: Content Creation; Audience Targeting; Content Understanding; Quality; Interests & Annotations; Embeddings; Performance
17. Xenon Jobs / Hermez workloads
154
Production Xenon use cases
>90
179
Deployments every day
18. Highlights
Stability and Tier 1 support
● Enhanced JSS State Machine
● Supported job level dedicated S3 buckets
User experience
● Hermez supported most recent checkpoint deployment
● Hermez supported kill job and distributed shell
● Enriched savepoint information on Hermez
● Track daily & monthly deployment success rate
Metrics
● Job submission latency
19. Xenon Job Management Service
Monitoring
● Job Status
● Critical metrics (QPS)
● Checkpointing health
● Job/task health
● Notify users
Auto Recovery
Auto recover failed jobs from:
● Last completed checkpoint
● Most recent savepoint
● Fresh State
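The three recovery sources above can be read as a fallback chain. The sketch below is a minimal illustration of that idea, assuming the sources are tried in the order listed on the slide; the slide does not state the actual policy. `JobSnapshots`, `choose_restore_point`, and the S3 paths are hypothetical names for illustration, not part of Xenon or Flink.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class JobSnapshots:
    """Hypothetical record of the restore points known for one job."""
    last_completed_checkpoint: Optional[str] = None  # e.g. a checkpoint path
    most_recent_savepoint: Optional[str] = None      # e.g. a savepoint path


def choose_restore_point(snapshots: JobSnapshots) -> Tuple[str, Optional[str]]:
    """Return (source, path), trying sources in the assumed slide order."""
    if snapshots.last_completed_checkpoint:
        return ("checkpoint", snapshots.last_completed_checkpoint)
    if snapshots.most_recent_savepoint:
        return ("savepoint", snapshots.most_recent_savepoint)
    # Nothing to restore from: restart the job with fresh (empty) state.
    return ("fresh", None)


if __name__ == "__main__":
    job = JobSnapshots(most_recent_savepoint="s3://example-bucket/sp-3")
    print(choose_restore_point(job))
```

A real service would also need to verify that the chosen checkpoint or savepoint is still readable before resubmitting the job.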
AZ Failure Resilience
Auto failover jobs to backup clusters in different AZs when the primary cluster/AZ goes down
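The failover rule above can be sketched as a small selection function. This is only an illustration under stated assumptions: the `Cluster` type, the `healthy` flag, and the cluster and AZ names are hypothetical, and the slides do not describe how health is detected or how backups are ranked.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cluster:
    """Hypothetical view of a cluster as seen by the job management service."""
    name: str
    az: str
    healthy: bool


def choose_cluster(primary: Cluster, backups: List[Cluster]) -> Optional[Cluster]:
    """Stay on the primary while it is healthy; otherwise pick the first
    healthy backup that lives in a different availability zone."""
    if primary.healthy:
        return primary
    for candidate in backups:
        if candidate.healthy and candidate.az != primary.az:
            return candidate
    return None  # no safe target; a real service would notify the job owner


if __name__ == "__main__":
    primary = Cluster("primary-1", "az-a", healthy=False)
    backups = [Cluster("backup-1", "az-a", True), Cluster("backup-2", "az-b", True)]
    print(choose_cluster(primary, backups))
```

Skipping backups in the same AZ as the failed primary is the design choice that matters here: a whole-AZ outage would otherwise send the job to a cluster that is also down.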
25. VIP Navboost Signal (Map Transforms, Async RPC calls, Backfill)
● User code focuses only on business logic. ✅
● Tune Flink operators using configs. ✅
● ROI: Kappa architecture - roadmap to shutting down an $800K double compute GPU cluster for visual-search batch. 🚧
Diagram labels: Xenon; Flink Application; Code; Config
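The split between user code and operator tuning can be illustrated with a small sketch. It assumes a layered config in which per-operator settings override job-wide defaults; the config keys (`defaults`, `operators`, `parallelism`, `async_timeout_ms`) and the operator names are hypothetical and are not Xenon's actual format.

```python
from typing import Any, Dict

# Hypothetical built-in defaults applied when the config says nothing.
BUILTIN_DEFAULTS: Dict[str, Any] = {"parallelism": 1, "async_timeout_ms": 1000}


def resolve_operator_settings(job_config: Dict[str, Any], operator_name: str) -> Dict[str, Any]:
    """Merge settings: built-in defaults < job-wide defaults < per-operator overrides."""
    settings = dict(BUILTIN_DEFAULTS)
    settings.update(job_config.get("defaults", {}))
    settings.update(job_config.get("operators", {}).get(operator_name, {}))
    return settings


def boost_signal(record: Dict[str, Any]) -> Dict[str, Any]:
    """Business logic only: the function knows nothing about tuning knobs."""
    return {**record, "score": record["clicks"] * 2}


if __name__ == "__main__":
    config = {
        "defaults": {"parallelism": 4},
        "operators": {"async_rpc": {"parallelism": 16, "async_timeout_ms": 500}},
    }
    print(resolve_operator_settings(config, "async_rpc"))
    print(resolve_operator_settings(config, "map_transform"))
    print(boost_signal({"pin_id": 1, "clicks": 3}))
```

With this shape, changing parallelism or a timeout is a config edit rather than a code change, which is the benefit the slide points at.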
37. Questions?
Anumol Sebastian
Chenqi Liu
Hannah Chen
Divye Kapoor
Kanchi Masalia
Lu Niu
Rainie Li
Teja Thotapalli
Nishant More
Samuel Bahr
Heng Zhang
Kevin Browne
Sergii Marchenko
Ashish Jhaveri
Dinesh Kumar Sekar
Chen Qin
Shaowen Wang
YOU?!