Prem Santosh
pxu@yelp.com
@premsantosh
Creating millions of user sessions using complex event processing
Agenda
● Motivation
● Sessionizer Specs
● Sessionizer
● Issues
○ Bot traffic
○ Duplicate sessions
○ Stale topics
○ S3 Checkpointing / Throttling
○ Flink Internal Timer State
Yelp’s Mission
Connecting people with great local businesses.
Why Sessions?
● Fundamental element of engagement with Yelp
● Understand intent of users
● Generate other product metrics using sessions
Motivation
[Figure: Raw Data → Batch Processing → Sessions]
● Slow
● Expensive
● One day late
● Sessions crossing midnight
Sessionizer Specs
● 1 Master
○ M4.xlarge (4 CPU, 16GB)
● 6 Core
○ M4.16xlarge (6 * 64 CPUs, 6 * 256GB)
● 6 input Kafka topics
● Total input throughput ~25K msg/sec
User Session
[Figure: Activity A → Activity B → Activity C → Activity D, with UserID = XYZ]
Sessionizer
[Figure: Event Logs → Sessionizer → Sessionized Logs]
Sessionizer Operators
KeyBy: UserID
.window(EventTimeSessionWindows)
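The two operators above (key by user, then an event-time session window) can be modelled in a few lines of plain Python. This is a rough sketch of the idea, not Flink code: the 30-minute gap is an assumption read off the "t2 + 30" labels on later slides, and `sessionize` is a hypothetical helper name.

```python
from collections import defaultdict

GAP_MINUTES = 30  # assumed inactivity gap, inferred from the "t2 + 30" labels


def sessionize(events, gap=GAP_MINUTES):
    """Group (user_id, minute) events into per-user sessions.

    Events are first keyed by user. Within one user, a new session starts
    whenever more than `gap` minutes pass without activity. How an event
    exactly on the boundary is treated is a choice made by this sketch.
    """
    by_user = defaultdict(list)
    for user_id, ts in events:
        by_user[user_id].append(ts)

    sessions = {}
    for user_id, stamps in by_user.items():
        stamps.sort()  # event time order, regardless of arrival order
        user_sessions = []
        current = [stamps[0]]
        for ts in stamps[1:]:
            if ts - current[-1] > gap:
                user_sessions.append(current)
                current = [ts]
            else:
                current.append(ts)
        user_sessions.append(current)
        sessions[user_id] = user_sessions
    return sessions


if __name__ == "__main__":
    logs = [("XYZ", 0), ("XYZ", 10), ("abc", 5), ("XYZ", 50)]
    print(sessionize(logs))
```

In a real streaming job the sessions are not computed over a complete list like this; windows are closed incrementally as the watermark advances, which is where the issues on the following slides come from.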
Never ending bot sessions
[Figure: timeline with events at t1 and t2, then No Activity until t2 + 30]
Never ending bot sessions
[Figure: timeline with events at t1, t2, t3]
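A small plain-Python model shows why steady bot traffic never closes a gap-based session. The slides do not say how this was fixed; the `max_duration` cap below is one possible mitigation and purely an assumption of this sketch, as are the function name and the numbers.

```python
def split_sessions(timestamps, gap=30, max_duration=None):
    """Split minute timestamps into sessions.

    A session closes after more than `gap` minutes of inactivity. If
    `max_duration` is given, a session is also closed once the next event
    would make it span more than that many minutes (hypothetical cap, not
    taken from the slides).
    """
    sessions = []
    current = []
    for ts in sorted(timestamps):
        if current:
            idle_too_long = ts - current[-1] > gap
            spans_too_long = (
                max_duration is not None and ts - current[0] > max_duration
            )
            if idle_too_long or spans_too_long:
                sessions.append(current)
                current = []
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions


if __name__ == "__main__":
    # A bot that sends one event every 10 minutes for a whole day.
    bot = list(range(0, 24 * 60, 10))
    print(len(split_sessions(bot)))                    # one endless session
    print(len(split_sessions(bot, max_duration=240)))  # several capped sessions
```

Without the cap, the whole day of bot activity stays in a single open session, because the inactivity gap is never reached.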
Duplicate Sessions
Session ID                          Number of Events
31cca9e6-4207-a323-36c121deaf73     45
95bff7c34-4de5-bc10-ef969a876cf7    27
31cca9e6-4207-a323-36c121deaf73     46
…..                                 ….
Duplicate Sessions
[Figure: timeline marking t1, t2, No Activity until t2 + 30, t2 + 30 + allowedLateness, and a Late Event]
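The duplicate rows (same session ID, 45 events and then 46) are what you would expect when a window fires once at its end and fires again for a late event that arrives within the allowed lateness. The sketch below replays that behaviour in plain Python for a single window with a fixed end; it is a simplification (real session windows can also be extended by a late event), and `run_window` and the numbers are hypothetical.

```python
def run_window(window_end, allowed_lateness, stream, session_id="session-1"):
    """Replay `stream` against one window and return what it emits.

    `stream` is a list of ("event", ts) and ("watermark", ts) items in
    arrival order. The window fires when the watermark reaches its end,
    and fires again for every late event that arrives before the
    watermark passes window_end + allowed_lateness.
    """
    count = 0
    watermark = float("-inf")
    fired = False
    emitted = []
    for kind, ts in stream:
        if kind == "watermark":
            watermark = max(watermark, ts)
            if not fired and watermark >= window_end:
                fired = True
                emitted.append((session_id, count))
        elif ts < window_end:  # event belongs to this window
            if watermark >= window_end + allowed_lateness:
                continue  # too late: the window state is gone, drop it
            count += 1
            if fired:
                # Late but within allowed lateness: same session, new count.
                emitted.append((session_id, count))
    return emitted


if __name__ == "__main__":
    stream = [
        ("event", 0), ("event", 5),
        ("watermark", 40),   # window [0, 35) fires
        ("event", 7),        # late event, still within allowed lateness
        ("watermark", 100),
        ("event", 8),        # arrives after allowed lateness, dropped
    ]
    print(run_window(window_end=35, allowed_lateness=20, stream=stream))
```

Any consumer that treats each emitted row as a new session will double count unless it deduplicates on the session ID.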
Stale Topics
● Event time processing
● Event time watermark: to signal progress in event time.
● Watermarks are crucial when events can be out-of-order
Stale Topics
[Figure: Kafka, Sessionizer, Window, Output; Watermark = min(T1, T1, T1)]
Stale Topics
[Figure: Kafka, Sessionizer, Window; Watermark = min(T5, T3, T5)]
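The min(...) rule in the figures can be made concrete with a tiny plain-Python model: the operator's watermark is the minimum over its inputs, so one topic stuck at T3 holds every window back. Topic names and the integer timestamps are made up for illustration.

```python
def combined_watermark(per_topic):
    """The operator's watermark is the minimum watermark over all inputs."""
    return min(per_topic.values())


def can_fire(window_end, per_topic):
    """A window may fire only once the combined watermark reaches its end."""
    return combined_watermark(per_topic) >= window_end


if __name__ == "__main__":
    watermarks = {"topic_a": 5, "topic_b": 3, "topic_c": 5}
    print(combined_watermark(watermarks))       # held back by topic_b
    print(can_fire(4, watermarks))              # window ending at 4 cannot fire
    watermarks["topic_b"] = 5
    print(can_fire(4, watermarks))              # fires once topic_b catches up
```

If `topic_b` stops receiving data entirely, the combined watermark never advances and no sessions are emitted.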
Stale Topics (Solution)
● Internal state stores the time T since the last watermark was seen.
● If T > 10 mins, override and set the watermark
Stale Topics (Solution)
[Figure: Kafka, Sessionizer, Window, Output; Watermark = T5 - maxOutOfOrderness]
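One way to read the override rule is sketched below in plain Python: remember when each topic last reported a watermark, and once a topic has been silent for more than 10 minutes, stop waiting for it and use the newest watermark minus maxOutOfOrderness. The class, its method names, and the choice of "newest watermark" as the T5 in the formula are assumptions of this sketch, not the actual implementation.

```python
IDLE_TIMEOUT_S = 10 * 60  # "T > 10 mins" from the slide, in seconds


class WatermarkTracker:
    """Tracks per-topic watermarks and overrides the minimum for stale topics."""

    def __init__(self, topics, max_out_of_orderness, idle_timeout=IDLE_TIMEOUT_S):
        self.max_out_of_orderness = max_out_of_orderness
        self.idle_timeout = idle_timeout
        self.watermarks = {t: None for t in topics}  # last watermark per topic
        self.last_seen = {t: None for t in topics}   # wall-clock time it arrived

    def on_watermark(self, topic, watermark, now):
        self.watermarks[topic] = watermark
        self.last_seen[topic] = now

    def current(self, now):
        seen = {t: w for t, w in self.watermarks.items() if w is not None}
        if not seen:
            return None
        lowest = min(seen.values())
        stale = [t for t in seen if now - self.last_seen[t] > self.idle_timeout]
        if not stale:
            return lowest  # normal case: minimum over all inputs
        # Override: stop waiting for stale topics. Never go below the plain
        # minimum, so the override can only move the watermark forward.
        return max(lowest, max(seen.values()) - self.max_out_of_orderness)


if __name__ == "__main__":
    tracker = WatermarkTracker(["a", "b", "c"], max_out_of_orderness=1)
    for topic in ("a", "b", "c"):
        tracker.on_watermark(topic, 3, now=0)
    tracker.on_watermark("a", 5, now=500)
    tracker.on_watermark("c", 5, now=500)
    print(tracker.current(now=500))  # b is quiet but not yet stale
    print(tracker.current(now=700))  # b is stale, watermark is overridden
```

The tradeoff is that events which later show up on the stale topic with old timestamps may be treated as late.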
S3 Throttling / Checkpointing
● Due to the large ingestion rate, we were checkpointing frequently.
● This caused us to be throttled by S3, causing checkpoint failures
Fixed by FLINK-9061
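As far as we can tell, FLINK-9061 is the ticket about adding entropy to S3 checkpoint paths so that writes spread over many key prefixes instead of hammering one. The plain-Python sketch below only illustrates that general idea; the marker string, the function name, and the example path are hypothetical, and this is not Flink's implementation.

```python
import secrets
import string

ENTROPY_MARKER = "_entropy_"  # hypothetical placeholder in the configured path
_ALPHABET = string.ascii_lowercase + string.digits


def inject_entropy(path, length=4, marker=ENTROPY_MARKER):
    """Replace the first marker in `path` with random characters.

    Each checkpoint file then lands under a different key prefix, which
    spreads the request load. Paths without the marker are returned
    unchanged.
    """
    if marker not in path:
        return path
    token = "".join(secrets.choice(_ALPHABET) for _ in range(length))
    return path.replace(marker, token, 1)


if __name__ == "__main__":
    template = "s3://bucket/_entropy_/checkpoints/chk-42"
    for _ in range(3):
        print(inject_entropy(template))
```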
Flink Internal Timer State
● Slow data structure to store timer state
○ HeapInternalTimerService: O(n) deletion operation
○ High CPU usage
○ At the 50 million session mark → super slow processing
○ FLINK-9423
● Stores timers in-memory
○ HeapInternalTimerService is the only implementation
○ Session count ↑ results in OOM
○ FLINK-9485: RocksDB implementation
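To see why O(n) timer deletion hurts at tens of millions of sessions, compare it with a heap that also keeps a position index, so a timer can be found in O(1) and removed in O(log n). The plain-Python class below is a generic indexed binary heap written for illustration; it is not necessarily what FLINK-9423 implements, and all names are hypothetical.

```python
class IndexedTimerHeap:
    """Min-heap of (timestamp, key) timers with a position index."""

    def __init__(self):
        self._heap = []  # list of (timestamp, key)
        self._pos = {}   # (timestamp, key) -> index in self._heap

    def __len__(self):
        return len(self._heap)

    def peek(self):
        """Return the earliest timer, or None if there are no timers."""
        return self._heap[0] if self._heap else None

    def add(self, timestamp, key):
        timer = (timestamp, key)
        if timer in self._pos:
            return False  # already registered
        self._heap.append(timer)
        self._pos[timer] = len(self._heap) - 1
        self._sift_up(len(self._heap) - 1)
        return True

    def remove(self, timestamp, key):
        """Delete a timer in O(log n) instead of scanning all timers."""
        i = self._pos.pop((timestamp, key), None)
        if i is None:
            return False
        last = self._heap.pop()
        if i < len(self._heap):  # removed timer was not in the last slot
            self._heap[i] = last
            self._pos[last] = i
            self._sift_up(i)
            self._sift_down(i)
        return True

    def _swap(self, i, j):
        h = self._heap
        h[i], h[j] = h[j], h[i]
        self._pos[h[i]] = i
        self._pos[h[j]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self._heap[i] < self._heap[parent]:
                self._swap(i, parent)
                i = parent
            else:
                break

    def _sift_down(self, i):
        n = len(self._heap)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self._heap[child] < self._heap[smallest]:
                    smallest = child
            if smallest == i:
                break
            self._swap(i, smallest)
            i = smallest


if __name__ == "__main__":
    timers = IndexedTimerHeap()
    for ts, user in [(30, "a"), (10, "b"), (20, "c"), (40, "d")]:
        timers.add(ts, user)
    timers.remove(10, "b")  # session for user "b" was extended, drop old timer
    print(timers.peek())
```

Session windows delete and re-register a timer every time a session is extended, so the cost of `remove` is paid on nearly every event. This sketch still keeps everything in memory, which is the second problem listed above.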
Thank You!
pxu@yelp.com
@premsantosh
www.yelp.com/careers/
We're Hiring!
@YelpEngineering
fb.com/YelpEngineers
engineeringblog.yelp.com
github.com/yelp

Flink Forward San Francisco 2019: Creating millions of user sessions using Complex Event Processing - Prem Santosh Udaya Shankar

Editor's Notes

  1. Use EMR (2000 C5 instances). Costs around $400 (it's conservative). Because it's slow, it delays other metrics that are calculated from sessions.
  2. Explain a user session. How does a session end?
  3. Explain a user session
  4. Explain a user session
  5. Add an intro slide before every issue. Think about a different name.
  6. Mention the tradeoff here if infrastructure is down. Find the ticket which Flink was implementing.
  7. [Social Media Slide]