Battle of the Stream Processing Titans – Flink versus RisingWave
Karin Wolok, Project Elevate
Yingjun Wu, RisingWave Labs
About Karin
• Developer Relations Consultant (ProjectElevate.io)
• Ex-StarTree
• Ex-Neo4j
• Formerly ran campaigns for renowned individuals and organizations such as Eminem, Live Nation, ReMax, and Novartis
• Conference speaker; has presented at over 50 conferences globally
About Yingjun
• Founder and CEO of RisingWave Labs
• Ex-AWS Redshift
• Ex-IBM Almaden Research Center
• PhD, National University of Singapore
• Visiting PhD, Carnegie Mellon University
Background
• People need real-time insights
• Examples: stock market monitoring, inventory management, parcel tracking, web clickstream
[Figure: business value plotted against freshness (sub-second, seconds, minutes, hours, days), with regions labeled "Batch processing" and "Stream processing"]
Batch Processing vs. Stream Processing
• Batch processing: user-initiated computation; full computation
• Stream processing: event-initiated computation; incremental computation
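To make the contrast concrete, here is a minimal Python sketch. It is not taken from the talk: the names `batch_total` and `StreamingTotal` are hypothetical, and a running sum stands in for any query. The batch function rescans all data when a user asks; the streaming class does a small update whenever an event arrives.

```python
from typing import List


def batch_total(events: List[int]) -> int:
    """Batch style: user-initiated, recomputes over the full dataset."""
    return sum(events)


class StreamingTotal:
    """Stream style: event-initiated, updates the previous result."""

    def __init__(self) -> None:
        self.total = 0

    def on_event(self, value: int) -> int:
        # Incremental computation: constant work per event, no rescan.
        self.total += value
        return self.total


events = [5, 3, 8]
streaming = StreamingTotal()
# Each arriving event produces a fresh result immediately.
running = [streaming.on_event(v) for v in events]
```

Both approaches end at the same answer; the difference is when the work happens and how much of it is repeated.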
History of Stream Processing
• Research prototypes: NiagaraCQ, STREAM, Aurora, Borealis
[Figure: timeline from 2000 to 2020 in five-year steps]
• Streaming regime: stream processing framework; streaming database
• Batching regime: batch processing framework; data warehouse
• Counterparts: the stream processing framework corresponds to the batch processing framework, and the streaming database corresponds to the data warehouse
Flink vs. RisingWave
• Applications and use cases
• User interface
• Internal architecture
Applications and Use Cases
[Figure: use cases placed along a latency axis (1 microsecond, 1 millisecond, 1 second, 1 minute, 1 hour, 1 day): high-frequency trading, fraud detection, IoT computing, ads recommendation, stock dashboarding, delivery app, inventory tracking, ML training, data science, accounting, network monitoring, travel booking]
• Streaming ETL
  • Continuously ingest data from upstream systems, perform transformations, and deliver results to downstream systems
• Streaming analytics
  • Monitoring, alerting, automation, etc.
[Figure: upstream sources (databases, messaging systems, file systems) feed downstream targets (serving systems, databases, messaging systems, file systems)]
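The streaming ETL pattern can be sketched with plain Python generators. This is an illustration only, not code for Flink or RisingWave: the functions `ingest`, `transform`, and `deliver`, the event fields, and the in-memory lists standing in for upstream and downstream systems are all assumptions.

```python
from typing import Dict, Iterable, Iterator, List

Event = Dict[str, object]


def ingest(source: Iterable[Event]) -> Iterator[Event]:
    """Pull events one at a time from an upstream system (here: a list)."""
    for event in source:
        yield event


def transform(events: Iterable[Event]) -> Iterator[Event]:
    """Drop events without an amount and normalize the remaining fields."""
    for event in events:
        if event.get("amount") is None:
            continue
        yield {
            "user": str(event["user"]).lower(),
            "amount_cents": int(round(float(event["amount"]) * 100)),
        }


def deliver(events: Iterable[Event], sink: List[Event]) -> None:
    """Push each result to a downstream system (here: a list)."""
    for event in events:
        sink.append(event)


upstream = [
    {"user": "Alice", "amount": 12.5},
    {"user": "Bob", "amount": None},
    {"user": "Carol", "amount": 3.0},
]
downstream: List[Event] = []
deliver(transform(ingest(upstream)), downstream)
```

Because each stage is a generator, an event flows through all three steps as soon as it arrives rather than waiting for a full batch.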
User Interface
Flink
• MapReduce-style API, with SQL/Python wrappers
• A Flink job represents a data processing pipeline
• Each Flink job is independent
RisingWave
• PostgreSQL-compatible, with Python UDFs
• A materialized view represents a data processing pipeline
• Materialized views can depend on one another
[Figure: three separate Flink jobs on one side; materialized views MV1 to MV6 on the other]
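What "dependent" means for materialized views can be shown with a toy Python model. This is not RisingWave's implementation or API: the classes `ClicksPerAd` and `PopularAds` and the threshold are hypothetical. The second view is defined on top of the first, so it reuses the first view's maintained result instead of reading the raw stream again.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class ClicksPerAd:
    """Toy 'MV1': click count per ad, maintained incrementally."""

    def __init__(self) -> None:
        self.rows: DefaultDict[str, int] = defaultdict(int)
        self.dependents: List["PopularAds"] = []

    def on_click(self, ad_id: str) -> None:
        self.rows[ad_id] += 1
        # Propagate only the changed row to views built on this one.
        for view in self.dependents:
            view.on_upstream_change(ad_id, self.rows[ad_id])


class PopularAds:
    """Toy 'MV2': ads whose count in MV1 has reached a threshold."""

    def __init__(self, upstream: ClicksPerAd, threshold: int) -> None:
        self.threshold = threshold
        self.rows: Set[str] = set()
        upstream.dependents.append(self)

    def on_upstream_change(self, ad_id: str, count: int) -> None:
        if count >= self.threshold:
            self.rows.add(ad_id)


mv1 = ClicksPerAd()
mv2 = PopularAds(mv1, threshold=2)
for ad in ["a", "b", "a"]:
    mv1.on_click(ad)
```

With independent jobs, the second pipeline would have to recount clicks itself; here the counting work is done once and shared.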
Internal Architecture
• Execution performance
• Failure recovery
• Elastic scaling
State management
Internal Architecture
• Consider joining two data streams
  • Impression stream: Impression (adId, impressionTime)
  • Click stream: Click (adId, clickTime)
• Output (adId, impressionTime, clickTime)
• State: a hash table for the impression stream and a hash table for the click stream
• Burst!
How to manage internal states?
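A streaming join that keeps one hash table per input can be sketched as follows. This is a minimal, unbounded version for illustration; the class `ImpressionClickJoin` and its method names are hypothetical, and real systems add windowing or expiry so the tables do not grow forever.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Output = Tuple[str, int, int]  # (adId, impressionTime, clickTime)


class ImpressionClickJoin:
    """Joins impressions and clicks on adId as events arrive."""

    def __init__(self) -> None:
        # State: one hash table per input stream, keyed by adId.
        self.impressions: DefaultDict[str, List[int]] = defaultdict(list)
        self.clicks: DefaultDict[str, List[int]] = defaultdict(list)

    def on_impression(self, ad_id: str, impression_time: int) -> List[Output]:
        # Store the event, then probe the other stream's table.
        self.impressions[ad_id].append(impression_time)
        return [(ad_id, impression_time, c) for c in self.clicks.get(ad_id, [])]

    def on_click(self, ad_id: str, click_time: int) -> List[Output]:
        self.clicks[ad_id].append(click_time)
        return [(ad_id, i, click_time) for i in self.impressions.get(ad_id, [])]

    def state_size(self) -> int:
        """Number of buffered events across both hash tables."""
        return sum(len(v) for v in self.impressions.values()) + sum(
            len(v) for v in self.clicks.values()
        )


join = ImpressionClickJoin()
results: List[Output] = []
results += join.on_impression("ad1", 100)
results += join.on_click("ad1", 105)
results += join.on_click("ad2", 107)
results += join.on_impression("ad2", 110)
```

Every event is retained in state whether or not it has matched yet, which is why a burst of input translates directly into a burst of state.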
Internal Architecture
• Flink: MapReduce style, compute-storage coupled. Optimized for performance!
• RisingWave: cloud-native style, compute-storage decoupled. Optimized for cost-efficiency!
[Figure: placement of state relative to compute (EC2) and storage (S3) in the two designs]
Internal Architecture (Failure Recovery)
[Figure: compute nodes above persistent storage for both designs. In one, states sit on the compute nodes and a checkpoint sits in persistent storage; in the other, states sit in persistent storage ("state as checkpoint") and the compute nodes hold caches.]
• States on compute nodes: recover from checkpoint
• State as checkpoint: read from remote state
Internal Architecture (Elastic Scaling)
[Figure: the same two layouts, each marked "Scale out"]
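The difference between "recover from checkpoint" and "read from remote state" can be illustrated with a toy Python model. It does not describe how either system actually recovers: the two classes, the dictionary standing in for persistent storage, and the entry counts are all assumptions made for the sketch.

```python
from typing import Dict


class LocalStateOperator:
    """State lives on the compute node; the checkpoint is a separate copy."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}

    def recover(self, checkpoint: Dict[str, int]) -> int:
        # The whole checkpoint is copied back before processing resumes.
        self.state = dict(checkpoint)
        return len(self.state)  # entries transferred during recovery


class RemoteStateOperator:
    """State lives in persistent storage; the node only keeps a cache."""

    def __init__(self, remote_store: Dict[str, int]) -> None:
        self.remote_store = remote_store
        self.cache: Dict[str, int] = {}
        self.remote_reads = 0

    def recover(self) -> int:
        # Nothing to copy: start with an empty cache and fill it on demand.
        self.cache = {}
        return 0

    def get(self, key: str) -> int:
        if key not in self.cache:
            self.cache[key] = self.remote_store[key]
            self.remote_reads += 1
        return self.cache[key]


persistent = {f"ad{i}": i for i in range(1000)}
local = LocalStateOperator()
copied_on_recovery = local.recover(persistent)
remote = RemoteStateOperator(persistent)
copied_on_remote_recovery = remote.recover()
value = remote.get("ad7")
```

In this model the first design pays for recovery up front in proportion to state size, while the second resumes immediately and pays a remote read on each cache miss.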
Summary
Applications and use cases
• Flink and RisingWave: streaming ETL and streaming analytics
User interface
• Flink: low-level abstractions (Java) and high-level wrappers (SQL and Python)
• RisingWave: PostgreSQL-style SQL with Python UDF support
• Flink: Flink jobs represent stream processing pipelines; Flink jobs are independent
• RisingWave: materialized views represent stream processing pipelines; materialized views can be dependent, with resource sharing enabled
Internal architecture
• Flink: optimized for performance; slow in failure recovery; slow in elastic scaling
• RisingWave: optimized for cost-efficiency; fast in failure recovery; fast in elastic scaling
Thanks! Q&A?
risingwave.com/slack

Battle of the Stream Processing Titans – Flink versus RisingWave

  • 1. Battle of the Stream Processing Titans – Flink versus RisingWave • Karin Wolok (Project Elevate) & Yingjun Wu (RisingWave Labs)
  • 2. About Karin • Developer Relations Consultant (ProjectElevate.io) • Ex-StarTree • Ex-Neo4j • Formerly ran campaigns for renowned individuals and orgs like Eminem, Live Nation, ReMax, and Novartis • Conference speaker, presented at over 50 conferences globally
  • 3. About Yingjun • Founder and CEO of RisingWave Labs • Ex-AWS Redshift • Ex-IBM Almaden Research Center • PhD, National University of Singapore • Visiting PhD, Carnegie Mellon University
  • 4. Background • People need real-time insights • Stock market monitoring, inventory management, parcel tracking, web clickstream
  • 5. Background • People need real-time insights • Chart axes: freshness (sub-second, seconds, minutes, hours, days) and business value • Stock market monitoring, inventory management, parcel tracking, web clickstream
  • 6. Background • People need real-time insights • Chart axes: freshness (sub-second, seconds, minutes, hours, days) and business value • Chart label: batch processing
  • 7. Background • People need real-time insights • Chart axes: freshness (sub-second, seconds, minutes, hours, days) and business value • Chart label: batch processing
  • 8. Background • People need real-time insights • Chart axes: freshness (sub-second, seconds, minutes, hours, days) and business value • Chart labels: batch processing, stream processing
  • 9. Batch Processing vs. Stream Processing • Batch processing: user-initiated computation, full computation • Stream processing: event-initiated computation, incremental computation
  • 10. History of Stream Processing • Timeline: 2000, 2005, 2010, 2015, 2020 • Research prototypes: NiagaraCQ, STREAM, Aurora, Borealis
  • 11. History of Stream Processing • Timeline: 2000, 2005, 2010, 2015, 2020 • Research prototypes: NiagaraCQ, STREAM, Aurora, Borealis
  • 12. Streaming regime: stream processing framework, streaming database
  • 13. Streaming regime: stream processing framework, streaming database • Batching regime: batch processing framework, data warehouse • Counterparts: stream processing framework and batch processing framework; streaming database and data warehouse
  • 14. Flink vs. RisingWave • Applications and use cases • User interface • Internal architecture
  • 15. Applications and Use Cases • Latency scale: 1 microsecond, 1 millisecond, 1 second, 1 minute, 1 hour, 1 day • Use cases: high-frequency trading, fraud detection, IoT computing, ads recommendation, stock dashboarding, delivery app, inventory tracking, ML training, data science, accounting, network monitoring, travel booking
  • 16. Applications and Use Cases • Streaming ETL: continuously ingest data from upstream systems, perform transformations, and deliver results to downstream systems • Streaming analytics: monitoring, alerting, automation, etc.
  • 17. Applications and Use Cases • Streaming ETL: continuously ingest data from upstream systems, perform transformations, and deliver results to downstream systems • Streaming analytics: monitoring, alerting, automation, etc. • Diagram, upstream: databases, messaging systems, file systems
  • 18. Applications and Use Cases • Streaming ETL: continuously ingest data from upstream systems, perform transformations, and deliver results to downstream systems • Streaming analytics: monitoring, alerting, automation, etc. • Diagram, upstream: databases, messaging systems, file systems • Diagram, downstream: serving systems, databases, messaging systems, file systems
  • 19. User Interface • Flink: MapReduce-style API, SQL/Python wrapper • RisingWave: PostgreSQL-compatible, Python UDF
  • 20. User Interface • Flink: MapReduce-style API, SQL/Python wrapper; a Flink job represents a data processing pipeline • RisingWave: PostgreSQL-compatible, Python UDF; a materialized view represents a data processing pipeline
  • 21. User Interface • Flink: MapReduce-style API, SQL/Python wrapper; a Flink job represents a data processing pipeline; each Flink job is independent (diagram: Flink job1, Flink job2, Flink job3) • RisingWave: PostgreSQL-compatible, Python UDF; a materialized view represents a data processing pipeline; materialized views can be dependent (diagram: MV1, MV2, MV3, MV4, MV5, MV6)
  • 22. Internal Architecture • Execution performance • Failure recovery • Elastic scaling • State management
  • 23. Internal Architecture • Consider joining two data streams: an impression stream and a click stream • Inputs: Impression (adId, impressionTime), Click (adId, clickTime) • Output: (adId, impressionTime, clickTime) • State: a hash table for the click stream and a hash table for the impression stream • How to manage internal states?
  • 24. Internal Architecture • Consider joining two data streams: an impression stream and a click stream • Inputs: Impression (adId, impressionTime), Click (adId, clickTime) • Output: (adId, impressionTime, clickTime) • State: a hash table for the click stream and a hash table for the impression stream • Burst! • How to manage internal states?
  • 25. Internal Architecture • Flink: MapReduce style, compute-storage coupled • RisingWave: cloud-native style, compute-storage decoupled • Diagram labels: State, Storage (S3), Compute (EC2)
  • 26. Internal Architecture • Flink: MapReduce style, compute-storage coupled; optimized for performance! • RisingWave: cloud-native style, compute-storage decoupled; optimized for cost-efficiency! • Diagram labels: State, Storage (S3), Compute (EC2)
  • 27. Internal Architecture (Failure Recovery) • Diagram labels: compute nodes, persistent storage, states, checkpoint, cache, “state as checkpoint”
  • 28. Internal Architecture (Failure Recovery) • Diagram labels: compute nodes, persistent storage, states, checkpoint, cache, “state as checkpoint” • Read from remote state • Recover from checkpoint
  • 29. Internal Architecture (Elastic Scaling) • Diagram labels: compute nodes, persistent storage, states, checkpoint, cache, “state as checkpoint” • Scale out
  • 30. Summary (Flink versus RisingWave)
    • Applications and use cases (both): streaming ETL and streaming analytics
    • User interface
      • Flink: low-level abstractions (Java) and high-level wrappers (SQL and Python); Flink jobs represent stream processing pipelines; Flink jobs are independent
      • RisingWave: PostgreSQL-style SQL with Python UDF support; materialized views represent stream processing pipelines; materialized views can be dependent, with resource sharing enabled
    • Internal architecture
      • Flink: optimized for performance; slow in failure recovery; slow in elastic scaling
      • RisingWave: optimized for cost-efficiency; fast in failure recovery; fast in elastic scaling
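
The two-stream join on slides 23 and 24 (impressions joined with clicks on adId, with one hash table of state per input) can be sketched roughly as follows. This is a minimal in-memory illustration, not how Flink or RisingWave implements joins; the class name `SymmetricHashJoin` and its methods are hypothetical, and there is no windowing or state eviction, which is why a burst of events makes the state grow without bound.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Toy join of an impression stream and a click stream on adId.

    Hypothetical sketch: each input keeps its own hash table (the "state"
    on the slides), and every arriving event probes the other side's table.
    """

    def __init__(self):
        self.impressions = defaultdict(list)  # adId -> [impressionTime, ...]
        self.clicks = defaultdict(list)       # adId -> [clickTime, ...]

    def on_impression(self, ad_id, impression_time):
        # Store the event in this side's state, then probe the click state.
        self.impressions[ad_id].append(impression_time)
        return [(ad_id, impression_time, click_time)
                for click_time in self.clicks.get(ad_id, [])]

    def on_click(self, ad_id, click_time):
        # Store the event in this side's state, then probe the impression state.
        self.clicks[ad_id].append(click_time)
        return [(ad_id, impression_time, click_time)
                for impression_time in self.impressions.get(ad_id, [])]

    def state_size(self):
        # Total number of events retained across both hash tables.
        return (sum(len(v) for v in self.impressions.values())
                + sum(len(v) for v in self.clicks.values()))


join = SymmetricHashJoin()
first = join.on_impression("ad1", 100)   # no click seen yet, so no output
second = join.on_click("ad1", 105)       # matches the stored impression
third = join.on_click("ad2", 110)        # click arrives before its impression
fourth = join.on_impression("ad2", 120)  # matches the stored click
```

Each output row has the shape (adId, impressionTime, clickTime) from the slide. Because every event is retained, `state_size()` grows with the input, which is the state-management question the deck raises.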