Battle of the Stream Processing Titans – Flink versus RisingWave

Karin Wolok, Project Elevate
Yingjun Wu, RisingWave Labs

About Karin
• Developer Relations Consultant (ProjectElevate.io)
• Ex-StarTree
• Ex-Neo4j
• Formerly ran campaigns for renowned individuals and organizations such as Eminem, Live Nation, ReMax, and Novartis
• Conference speaker who has presented at over 50 conferences globally

About Yingjun
• Founder and CEO of RisingWave Labs
• Ex-AWS Redshift
• Ex-IBM Almaden Research Center
• PhD, National University of Singapore
• Visiting PhD, Carnegie Mellon University

Background
• People need real-time insights
• Examples: stock market monitoring, inventory management, parcel tracking, web clickstream
[Figure: business value plotted against freshness (sub-second, seconds, minutes, hours, days), with regions labeled batch processing and stream processing]

Batch Processing vs. Stream Processing
Batch processing
• User-initiated computation
• Full computation
Stream processing
• Event-initiated computation
• Incremental computation
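
The contrast between full and incremental computation can be sketched in a few lines of Python. This is only an illustration: the event shape (a key and an amount) and the names `batch_totals` and `StreamingTotals` are assumptions made for the example, not part of Flink or RisingWave.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, int]  # (key, amount): a hypothetical event shape


def batch_totals(events: Iterable[Event]) -> Dict[str, int]:
    """Full computation: rescan every stored event each time a user asks."""
    totals: Dict[str, int] = {}
    for key, amount in events:
        totals[key] = totals.get(key, 0) + amount
    return totals


class StreamingTotals:
    """Incremental computation: each arriving event updates the result in place."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = {}

    def on_event(self, event: Event) -> None:
        key, amount = event
        self.totals[key] = self.totals.get(key, 0) + amount


events: List[Event] = [("ad-1", 3), ("ad-2", 5), ("ad-1", 4)]

streaming = StreamingTotals()
for e in events:
    streaming.on_event(e)  # event-initiated: work is done when the event arrives

# Both approaches agree on the result; they differ in when the work happens.
print(batch_totals(events) == streaming.totals)
```

The batch function does work proportional to the whole history on every request, while the streaming class does a constant amount of work per event and always holds a fresh answer.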

History of Stream Processing
• Research prototypes: NiagaraCQ, STREAM, Aurora, Borealis
[Figure: timeline spanning 2000, 2005, 2010, 2015, 2020]

• Streaming regime: stream processing framework, streaming database
• Batching regime: batch processing framework, data warehouse
• Counterparts: stream processing framework and batch processing framework; streaming database and data warehouse

Flink vs. RisingWave
• Applications and use cases
• User interface
• Internal architecture

Applications and Use Cases
[Figure: applications placed along a latency scale running from 1 microsecond through 1 millisecond, 1 second, 1 minute, and 1 hour to 1 day]
• Applications shown: high-frequency trading, fraud detection, IoT computing, ads recommendation, stock dashboarding, delivery app, inventory tracking, ML training, data science, accounting, network monitoring, travel booking

• Streaming ETL
  • Continuously ingest data from upstream systems, perform transformations, and deliver results to downstream systems
• Streaming analytics
  • Monitoring, alerting, automation, etc.
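
The ingest, transform, deliver loop of streaming ETL can be sketched with Python generators. The record fields (`user`, `page`), the function names, and the in-memory list standing in for a downstream system are all assumptions made for illustration.

```python
import json
from typing import Dict, Iterable, Iterator, List


def ingest(raw_lines: Iterable[str]) -> Iterator[Dict]:
    """Extract: parse records one at a time as they arrive from upstream."""
    for line in raw_lines:
        yield json.loads(line)


def transform(records: Iterable[Dict]) -> Iterator[Dict]:
    """Transform: drop records without a user and normalize field names."""
    for record in records:
        if "user" not in record:
            continue
        yield {"user_id": record["user"], "page": record.get("page", "/")}


def deliver(records: Iterable[Dict], sink: List[Dict]) -> None:
    """Load: hand each result to the downstream system as soon as it is ready."""
    for record in records:
        sink.append(record)


# A hypothetical upstream feed; in practice this would be an unbounded source.
upstream = ['{"user": "u1", "page": "/home"}', '{"page": "/orphan"}', '{"user": "u2"}']
downstream: List[Dict] = []
deliver(transform(ingest(upstream)), downstream)
print(downstream)
```

Because each stage is a generator, a record flows through all three stages before the next one is read, which mirrors the continuous nature of the pipeline.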

• Streaming ETL
[Figure: data flows from upstream databases, messaging systems, and file systems to downstream serving systems, databases, messaging systems, and file systems]

User Interface
Flink
• MapReduce-style API, SQL/Python wrapper
• Flink job to represent a data processing pipeline
• Each Flink job is independent
RisingWave
• PostgreSQL-compatible, Python UDF
• Materialized view to represent a data processing pipeline
• Materialized views can be dependent
[Figure: three independent Flink jobs beside a dependency graph of materialized views MV1 to MV6]
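
What it means for pipelines to depend on one another can be shown with a toy model. The `View` class below is hypothetical and is not how RisingWave implements materialized views; it only shows that a view built on another view reuses the upstream result instead of recomputing it.

```python
from typing import Callable, List, Optional, Tuple

Row = Tuple[str, int]  # (ad_id, clicks): a hypothetical row shape


class View:
    """Toy materialized view: stores its result and forwards changes downstream."""

    def __init__(self, name: str, fn: Callable[[Row], Optional[Row]]) -> None:
        self.name = name
        self.fn = fn  # per-row transformation; returning None filters the row out
        self.rows: List[Row] = []
        self.calls = 0  # number of rows this view has processed
        self.downstream: List["View"] = []

    def derive(self, name: str, fn: Callable[[Row], Optional[Row]]) -> "View":
        """Create a view that depends on this one."""
        child = View(name, fn)
        self.downstream.append(child)
        return child

    def on_row(self, row: Row) -> None:
        self.calls += 1
        out = self.fn(row)
        if out is None:
            return
        self.rows.append(out)
        for child in self.downstream:
            child.on_row(out)  # dependents see only what this view emits


mv1 = View("mv1", lambda r: r if r[1] > 0 else None)              # filter
mv2 = mv1.derive("mv2", lambda r: (r[0], r[1] * 2))               # built on mv1
mv3 = mv1.derive("mv3", lambda r: r if r[0] == "ad-1" else None)  # built on mv1

for row in [("ad-1", 3), ("ad-2", 0), ("ad-2", 5)]:
    mv1.on_row(row)

print(mv1.rows, mv2.rows, mv3.rows)
```

The filter in `mv1` runs once per input row and both dependents share its output. With independent jobs, each job would have to repeat that filter over the raw input.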

Internal Architecture
• Execution performance
• Failure recovery
• Elastic scaling
State management

• Consider joining two data streams
  • Impression stream: Impression (adId, impressionTime)
  • Click stream: Click (adId, clickTime)
  • Output (adId, impressionTime, clickTime)
• State: hash table for impression stream, hash table for click stream
• How to manage internal states?
[Figure: the two streams feed a join whose state is one hash table per stream, with a "Burst!" callout]
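
The join on this slide can be sketched as a symmetric hash join, which keeps one hash table per input stream. The class and method names are assumptions made for illustration, and timestamps are plain integers.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Output = Tuple[str, int, int]  # (adId, impressionTime, clickTime)


class ImpressionClickJoin:
    """Symmetric hash join: every event is stored, then probed against the other side."""

    def __init__(self) -> None:
        # State: one hash table per stream, keyed by adId.
        self.impressions: DefaultDict[str, List[int]] = defaultdict(list)
        self.clicks: DefaultDict[str, List[int]] = defaultdict(list)

    def on_impression(self, ad_id: str, impression_time: int) -> List[Output]:
        self.impressions[ad_id].append(impression_time)
        return [(ad_id, impression_time, c) for c in self.clicks.get(ad_id, [])]

    def on_click(self, ad_id: str, click_time: int) -> List[Output]:
        self.clicks[ad_id].append(click_time)
        return [(ad_id, i, click_time) for i in self.impressions.get(ad_id, [])]

    def state_size(self) -> int:
        """Total number of events held in both hash tables."""
        return (sum(len(v) for v in self.impressions.values())
                + sum(len(v) for v in self.clicks.values()))


join = ImpressionClickJoin()
join.on_impression("ad-1", 100)        # no click stored yet, so no output
print(join.on_click("ad-1", 105))      # matches the stored impression
print(join.state_size())               # both events stay in state
```

Nothing in this sketch ever evicts an entry, so the state grows with every event. A burst on either stream inflates the hash tables quickly, which is why the question of where and how to keep state matters.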

• MapReduce style, compute-storage coupled: optimized for performance
• Cloud-native style, compute-storage decoupled: optimized for cost-efficiency
[Figure: where state sits relative to compute (EC2) and storage (S3) in each style]

Internal Architecture (Failure Recovery)
[Figure: two designs, each with compute nodes above persistent storage]
• Compute nodes hold the states and persistent storage holds a checkpoint: recover from checkpoint
• Persistent storage holds the states ("state as checkpoint") and compute nodes hold caches: read from remote state
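
The two recovery paths can be contrasted with a toy model. Everything here is a simplification under stated assumptions: `RemoteStore` is a dictionary standing in for persistent storage, writes go straight through to it, and the class names are hypothetical rather than taken from either system.

```python
from typing import Dict, Optional


class RemoteStore:
    """Stand-in for persistent storage such as S3 (a plain dict here)."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.reads = 0  # counts remote reads

    def get(self, key: str) -> Optional[int]:
        self.reads += 1
        return self.data.get(key)

    def put(self, key: str, value: int) -> None:
        self.data[key] = value


class CachedStateNode:
    """Decoupled design: state lives in remote storage, the node keeps a cache."""

    def __init__(self, store: RemoteStore) -> None:
        self.store = store
        self.cache: Dict[str, Optional[int]] = {}

    def read(self, key: str) -> Optional[int]:
        if key not in self.cache:
            self.cache[key] = self.store.get(key)  # fetch on demand
        return self.cache[key]

    def write(self, key: str, value: int) -> None:
        self.cache[key] = value
        self.store.put(key, value)  # write-through, a simplification


class LocalStateNode:
    """Coupled design: state is local, persistent storage holds a checkpoint."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}

    def checkpoint(self) -> Dict[str, int]:
        return dict(self.state)

    @classmethod
    def recover(cls, checkpoint: Dict[str, int]) -> "LocalStateNode":
        node = cls()
        node.state = dict(checkpoint)  # the whole state is copied back first
        return node


# Decoupled: a replacement node starts with an empty cache and reads lazily.
store = RemoteStore()
node = CachedStateNode(store)
for i in range(1000):
    node.write(f"k{i}", i)
replacement = CachedStateNode(store)
value = replacement.read("k7")

# Coupled: the replacement node restores every entry before it can resume.
local = LocalStateNode()
local.state = {f"k{i}": i for i in range(1000)}
recovered = LocalStateNode.recover(local.checkpoint())

print(len(replacement.cache), len(recovered.state))
```

In this model the replacement node in the decoupled design serves its first read after fetching a single key, while the coupled design copies the full checkpoint back before processing resumes.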

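Internal Architecture (Elastic Scaling)
[Figure: two designs, each with compute nodes above persistent storage and a "Scale out" arrow; in one, compute nodes hold the states and storage holds a checkpoint; in the other, storage holds the states and compute nodes hold caches]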

Summary
Applications and use cases
• Flink and RisingWave: streaming ETL and streaming analytics
User interface
• Flink: low-level abstractions (Java) and high-level wrappers (SQL and Python)
• RisingWave: PostgreSQL-style SQL with Python UDF support
• Flink: use Flink jobs to represent stream processing pipelines; Flink jobs are independent
• RisingWave: use materialized views to represent stream processing pipelines; materialized views can be dependent with resource sharing enabled
Internal architecture
• Flink: optimized for performance; slow in failure recovery; slow in elastic scaling
• RisingWave: optimized for cost-efficiency; fast in failure recovery; fast in elastic scaling

Thanks! Q&A?
risingwave.com/slack
