Flink Forward San Francisco 2019: Realtime Store Visit Predictions at Scale - Luca Giovagnoli

This talk aims to inspire attendees with a multidisciplinary Flink application in which several fields come together: geospatial clustering algorithms, a gradient boosting ML model, and cutting-edge stream processing, all in the same talk. And if you are wondering how to fit all of this into your service-oriented architecture (SOA), the answer is Async I/O.
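
Flink's Async I/O operator keeps many requests to an external service in flight at once instead of blocking on each one. The same pattern can be approximated outside Flink with Python's `asyncio`. This is a minimal sketch, not Yelp's implementation: `score_visit` is a hypothetical stand-in for the Python model service, and `MAX_IN_FLIGHT` is a hypothetical cap playing the role of the capacity parameter of Flink's async operator.

```python
import asyncio
from typing import List

MAX_IN_FLIGHT = 100  # hypothetical cap on concurrent requests


async def score_visit(visit_id: int) -> float:
    """Hypothetical stand-in for a network call to the model service."""
    await asyncio.sleep(0.01)  # simulated network latency
    return (visit_id % 10) / 10


async def score_all(visit_ids: List[int]) -> List[float]:
    """Score visits concurrently, keeping at most MAX_IN_FLIGHT calls open."""
    gate = asyncio.Semaphore(MAX_IN_FLIGHT)

    async def bounded(visit_id: int) -> float:
        async with gate:
            return await score_visit(visit_id)

    # gather() returns results in input order, similar to Flink's ordered mode.
    return list(await asyncio.gather(*(bounded(v) for v in visit_ids)))


if __name__ == "__main__":
    scores = asyncio.run(score_all(list(range(1000))))
    print(len(scores), scores[:3])  # 1000 [0.0, 0.1, 0.2]
```

With 100 calls in flight, 1,000 simulated 10 ms requests finish in roughly 0.1 s instead of 10 s sequentially; the semaphore keeps the downstream service from being flooded.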

After introducing our product use case (real-time notifications about nearby local businesses), we'll dive into the big data challenges. The talk describes a Visit Detection algorithm we built to cluster raw GPS pings into Visits, using Flink state management and custom processing constructs (custom Windows, Triggers and Evictors). Finally, we discuss a real-time machine learning model that predicts the correct nearby business, leveraging Flink's Async I/O at scale.
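
The arrival/departure logic on the clustering slides (a distance threshold d, a wider exit threshold d + ε, and a minimum dwell of t minutes) can be illustrated with a small state machine. This is a minimal sketch under stated assumptions, not the talk's actual algorithm: pings are assumed to arrive in event-time order as (minute, x, y) with coordinates in meters, distance is measured from the first ping of the candidate visit, and the threshold values and the names `VisitDetector` and `Visit` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical thresholds; the talk does not publish the real values.
D_METERS = 50.0    # "stay" radius d
EPSILON = 20.0     # hysteresis margin ε (a visit ends only beyond d + ε)
MIN_DWELL = 5      # dwell threshold t, in minutes


@dataclass
class Visit:
    kind: str            # "ArrivalVisit" or "DepartureVisit"
    start: int           # minute of the visit's first ping
    end: Optional[int]   # None stands for +inf (visit still open)


class VisitDetector:
    """Turns one user's time-ordered pings into visit events (illustrative only)."""

    def __init__(self) -> None:
        self.anchor: Optional[Tuple[int, float, float]] = None  # first ping of candidate visit
        self.last_inside: Optional[int] = None  # latest ping within d of the anchor
        self.arrived = False                    # ArrivalVisit already emitted?

    def on_ping(self, minute: int, x: float, y: float) -> List[Visit]:
        events: List[Visit] = []
        if self.anchor is None:
            self._restart(minute, x, y)
            return events
        start, ax, ay = self.anchor
        dist = math.hypot(x - ax, y - ay)
        if dist < D_METERS:
            self.last_inside = minute
            if not self.arrived and minute - start > MIN_DWELL:
                self.arrived = True
                events.append(Visit("ArrivalVisit", start, None))
        elif dist > D_METERS + EPSILON:
            if self.arrived:
                events.append(Visit("DepartureVisit", start, self.last_inside))
            self._restart(minute, x, y)
        # Pings between d and d + ε change nothing, so GPS jitter near the
        # boundary does not flip the user in and out of a visit.
        return events

    def _restart(self, minute: int, x: float, y: float) -> None:
        self.anchor = (minute, x, y)
        self.last_inside = minute
        self.arrived = False


if __name__ == "__main__":
    detector = VisitDetector()
    # 03:21, 03:27, 03:30 and 03:55 as minutes after midnight; x, y in meters.
    for minute, x, y in [(201, 0, 0), (207, 10, 0), (210, 5, 5), (235, 500, 0)]:
        for event in detector.on_ping(minute, x, y):
            print(event)
```

Fed the slide's ping times with made-up positions, the sketch prints `Visit(kind='ArrivalVisit', start=201, end=None)` and then `Visit(kind='DepartureVisit', start=201, end=210)`, mirroring the (03:21, +inf) and (03:21, 03:30) labels. The gap between d and d + ε is a hysteresis band: a single noisy ping just outside d does not end the visit.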

Flink enabled us to scale complex algorithms to thousands of operations per second and to power hundreds of thousands of push notifications daily. It proved a clearly superior alternative: its performance brought Yelp significant cost savings and let us move away from Python alternatives that were hard to scale.


  1. Realtime Store Visit Predictions at Scale: Luca Giovagnoli (luca@yelp.com)
  2. Outline: Product, Algorithm, Flink Implementation, Impact
  3. Yelp's Mission: Connecting people with great local businesses.
  4. Nearby businesses: The Wreck Room, St. Francis Memorial Hospital, Trader Joe's, Creator
  5. Candidate businesses for the input location "Luca @ California & Hyde St.": Creator, Trader Joe's, The Wreck Room, St. Francis Hospital, ... Which one is Luca visiting? An XGBoost ML model answers the question.
  6. System overview: ~10^3 pings/s flow from the Data Pipeline into the Flink app, which clusters them into store visits (Flink Clustering) and runs online model inference against a Python (Py) service via Async I/O, followed by dispatch.
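  7. Clustering: figure of distance (m) over time for pings at 03:21, 03:27, 03:30 and 03:55, marking thresholds of d meters and d + ε and a dwell of more than t minutes; an annotation links 03:27 to the ProcessWindowFunction. Emitted visits: ArrivalVisit (03:21, +inf) and DepartureVisit (03:21, 03:30).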
  8. GlobalWindow: figure of distance (m) over time for pings at 03:21, 03:30, 03:55, 04:10 and 04:29, marking thresholds d and d + ε. Emitted visits: ArrivalVisit [03:21, +inf), DepartureVisit [03:21, 03:30], ArrivalVisit [04:10, +inf).
  9. Custom Trigger: in a GlobalWindow holding pings at 03:21 and 03:30, the OnAnyPingTrigger fires when a ping arrives.
  10. Count Evictor: the GlobalWindow holds pings at 03:21 and 03:30 (event time); the CountEvictor removes the oldest ping, leaving only 03:30.
  11. Mobiles go offline; mobile clocks can be off. (Picture from https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101 by Tyler Akidau)
  12. LatePingEvictor: figure comparing event time and processing time in the GlobalWindow, with event times 03:10, 03:21 and 03:55 and processing times 03:22, 03:24 and 03:57; the LatePingEvictor drops one ping, shown as the evicted ping.
  13. Model scoring for the input location "Luca @ California & Hyde St.": each candidate business gets features (Distance, Directions, ...) and a confidence score.

      Candidate business     Distance  Directions  ...  Confidence score
      Creator                0.001     0.026       ...  0.99
      Trader Joe's           0.021     0.071       ...  0.13
      St. Francis Hospital   ...       ...         ...  ...
  14. Origin story: fragile in-house state; no concurrency, no scaling; an at-least-once guarantee, which led to duplicate pushes.
  15. Impact: x10 increase in visit recall; up to ~10^3 ML predictions/sec; 14 Flink instances, down from ~10^2 Python instances; ~5x cheaper.
  16. We're Hiring! www.yelp.com/careers/
  17. @YelpEngineering, engineeringblog.yelp.com, github.com/yelp, yelp.com/careers
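
Slides 9 to 12 combine a GlobalWindow with a custom trigger that fires on every ping, a CountEvictor that removes the oldest ping, and a LatePingEvictor for out-of-order pings. The sketch below mimics that combination in plain Python rather than with the Flink API. The class `PingWindow`, the `keep_last` size, and the rule that a ping older than the newest event time seen counts as late are all assumptions for illustration; the talk does not spell out the exact eviction rules.

```python
from typing import Callable, List, Optional


class PingWindow:
    """Plain-Python imitation of one user's GlobalWindow; not Flink API code."""

    def __init__(self, keep_last: int, on_fire: Callable[[List[int]], None]) -> None:
        if keep_last < 1:
            raise ValueError("keep_last must be at least 1")
        self.keep_last = keep_last          # CountEvictor-like size (hypothetical)
        self.on_fire = on_fire              # stands in for the window function
        self.pings: List[int] = []          # buffered event times, in minutes
        self.max_event_time: Optional[int] = None
        self.evicted: List[int] = []        # late pings that were dropped

    def add(self, event_time: int) -> None:
        # LatePingEvictor-like step (assumed rule): a ping older than the newest
        # event time already seen arrived out of order, so drop it. Unlike a real
        # Flink trigger, this sketch does not fire for dropped pings.
        if self.max_event_time is not None and event_time < self.max_event_time:
            self.evicted.append(event_time)
            return
        self.max_event_time = event_time
        self.pings.append(event_time)
        # OnAnyPingTrigger-like step: evaluate the window on every accepted ping.
        self.on_fire(list(self.pings))
        # CountEvictor-like step: keep only the newest keep_last pings.
        del self.pings[:-self.keep_last]


if __name__ == "__main__":
    fired: List[List[int]] = []
    window = PingWindow(keep_last=1, on_fire=fired.append)
    # Arrival (processing-time) order: 03:21, then a late 03:10, then 03:55.
    for event_time in (201, 190, 235):
        window.add(event_time)
    print(fired)           # [[201], [201, 235]]
    print(window.evicted)  # [190]
```

With `keep_last=1`, a window holding 03:21 and 03:30 fires once and then keeps only 03:30, as on the Count Evictor slide; the late 03:10 ping never reaches the window function, so visit detection only ever sees pings in event-time order.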
