© 2019 Ververica
Aljoscha Krettek – Software Engineer, Flink PMC
Stephan Ewen – CTO, Flink PMC/Chair
Towards Flink 2.0: Rethinking the Stack and APIs
to unify Batch & Streaming
This is joint work with many members of
the Apache Flink community
Timo, Dawid, Shaoxuan, Kurt, Guowei, Becket, Jincheng, Fabian, Till, Andrey, Gary, Piotr, Stefan, etc.
And many others …
Some of this presents work that is in
progress in the Flink community. Other
things are planned and/or have design
documents. Some were discussed at one
point or another on the mailing lists or in
person.
This represents our understanding of the
current state. It is not a fixed roadmap:
Flink is an open-source Apache project.
Batch and Streaming
Everything Is a Stream
What changes faster? Data or Query?
• “batch”: data changes slowly compared to fast-changing queries
– ad-hoc queries, data exploration, ML training and (hyper) parameter tuning
• “streaming”: data changes fast, application logic is long-lived
– continuous applications, data pipelines, standing queries, anomaly detection, ML evaluation, …
Latency vs. Completeness (for geeks)
Figure: the Star Wars films ordered by processing time (release year) versus event time (in-story order): Episode IV (1977), Episode V (1980), Episode VI (1983), Episode I (1999), Episode II (2002), Episode III (2005), Episode VII (2015), Rogue One, “III.5” (2016), Episode VIII (2017).
Latency vs. Completeness (more formally)
*from the excellent Streaming 101 by Tyler Akidau:
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-
Latency vs. Completeness
• Bounded/Batch
– Data is as complete as it gets within that batch job
– No fine latency control
• Unbounded/Streaming
– Trade-off of latency versus completeness
– Processing-time timers for latency control
What does this mean for processing?
Figure: a timeline of data running from older to more recent, held in files, a message bus, or …
Stream-style processing
Figure: a timeline from older to more recent, annotated with the watermark and the unprocessed data.
Batch-style processing
Figure: a timeline from older to more recent, annotated with the watermark and the unprocessed data.
Batch and Streaming Processing Styles
Figure: a pipeline of S, M, R and S stages, shown twice.
• more batch-y: some stages are running while others are not running; can do things one-by-one
• more stream-y: all stages are running; everything is always-on
So this is the reason why we have
different APIs, different batch and
stream processing systems?
• Different requirements
• Optimization potential for batch and
streaming
• Also: historic developments and slow-
changing organizations
💡
Current Stack
Runtime
Tasks / JobGraph / Network Interconnect / Fault Tolerance
DataSet
“Operators“ and Drivers / Operator Graph / Visitors
DataStream
StreamOperators / StreamTransformation Graph* /
Monolithic Translators
Table API / SQL
Logical Nodes* / Different translation paths
What could be improved?
From a system design/code quality/architecture/development perspective
• Each API has its own internal graph representation → code duplication
• Multiple translation components between the different graphs → code duplication
– DataStream API has an intermediate graph structure: StreamTransformation → StreamGraph → JobGraph
• Separate (incompatible) operator implementations
– DataStream API has StreamOperator, DataSet API has Drivers → two map operators, two flatMap operators
– These are run by different lower-level Tasks
– DataSet operators are optimized for different requirements than DataStream operators
• Table API is translated to two distinct lower-level APIs → two different translation stacks
– “project operator” for DataStream and for DataSet
• Connectors for each API are separate → a whole bunch of connectors all over the place
What does this mean for users?
• You have to decide between DataSet and DataStream when writing a job
– Two (slightly) different APIs, with different capabilities
– Different set of supported connectors: no Kafka DataSet connector, no HBase DataStream connector
– Different performance characteristics
– Different fault-tolerance behavior
– Different scheduling logic
• With Table API, you only have to learn one API
– Still, the set of supported connectors depends on the underlying execution API
– Feature set depends on whether there is an implementation for your underlying API
• You cannot combine more batch-y with more stream-y sources/sinks
• A “soft problem”: with two stacks of everything, less developer power will go into
each individual stack → fewer features, worse performance, more bugs that are
fixed more slowly
Future Stack
DataSet
“Operators“ and Drivers / Operator
Graph / Visitors
DataStream
StreamOperator /
StreamTransformation Graph* /
Monolithic Translators
Batch is a subset of streaming!
Can’t we just?
✅❌
Done!
🥳
Unifying the Batch and Streaming APIs
• DataStream API functionality is already a superset of DataSet API functionality
• We need to introduce BoundedStream to harness optimization potential; the semantics
are clear from earlier:
– No processing-time timers
– Watermark “jumps” from –Infinity to +Infinity at end of processing
• DataStream translation and runtime (operators) need to be enhanced to use the
added optimization potential
• Streaming execution is the generic case that always works, “batch” enables
additional “optimization rules”: bounded operators, different scheduling → we get
feature parity automatically ✅
• Sources need to be unified as well → see later
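The watermark semantics of a bounded stream can be illustrated with a small sketch. This is not Flink code: `BoundedWatermarkSketch` and its methods are hypothetical names, timestamps are assumed to be non-negative and in order, and a window is assumed to fire once the watermark reaches its last timestamp. The same operator logic runs in both modes; only the watermark schedule differs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch (not Flink API): counts events per tumbling event-time window. */
class BoundedWatermarkSketch {
    private final long windowSize;
    // window start -> event count, sorted so that windows fire in time order
    private final TreeMap<Long, Integer> openWindows = new TreeMap<>();
    private final List<String> fired = new ArrayList<>();

    BoundedWatermarkSketch(long windowSize) {
        this.windowSize = windowSize;
    }

    void onEvent(long timestamp) {
        long windowStart = timestamp - (timestamp % windowSize);
        openWindows.merge(windowStart, 1, Integer::sum);
    }

    /** Fires every open window whose last timestamp is at or before the watermark. */
    void onWatermark(long watermark) {
        while (!openWindows.isEmpty()
                && openWindows.firstKey() + windowSize - 1 <= watermark) {
            Map.Entry<Long, Integer> w = openWindows.pollFirstEntry();
            fired.add("[" + w.getKey() + "," + (w.getKey() + windowSize) + ")=" + w.getValue());
        }
    }

    /** Streaming style: a watermark follows every event, so results appear incrementally. */
    static List<String> runStreaming(long[] timestamps, long windowSize) {
        BoundedWatermarkSketch op = new BoundedWatermarkSketch(windowSize);
        for (long ts : timestamps) {
            op.onEvent(ts);
            op.onWatermark(ts - 1);
        }
        return op.fired;
    }

    /** Bounded style: no intermediate watermarks, one jump to "+Infinity" at the end. */
    static List<String> runBounded(long[] timestamps, long windowSize) {
        BoundedWatermarkSketch op = new BoundedWatermarkSketch(windowSize);
        for (long ts : timestamps) {
            op.onEvent(ts);
        }
        op.onWatermark(Long.MAX_VALUE);
        return op.fired;
    }

    public static void main(String[] args) {
        long[] timestamps = {1, 4, 12, 15, 27};
        System.out.println("streaming so far: " + runStreaming(timestamps, 10));
        System.out.println("bounded:          " + runBounded(timestamps, 10));
    }
}
```

In the streaming run the last window stays open because no later watermark arrives; in the bounded run the final watermark closes every window, which is why the data is "as complete as it gets".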
A typical “unified” use case: Bootstrapping state
* See for example Bootstrapping State in Flink by Gregory Fee https://sf-2018.flink-
Figure: a “stream” source and a “batch” source both feed a stateful operation; the graph has a batch-y part and a stream-y part.
• We have a streaming use
case
• We want to bootstrap the
state of some operations
from a historical source
• First execute bounded parts
of the graph, then start the
rest
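A minimal sketch of the bootstrapping idea, assuming a per-key counter as the stateful operation. `BootstrapSketch`, `bootstrap` and `onLiveEvent` are hypothetical names for illustration and not part of Flink: the bounded historical input is consumed completely first, and only then does the operator start handling live events.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not Flink API): a keyed counter bootstrapped from history. */
class BootstrapSketch {
    private final Map<String, Long> totals = new HashMap<>();

    /** Batch-y part: absorb the bounded historical source without emitting anything. */
    void bootstrap(Iterable<String> historicalKeys) {
        for (String key : historicalKeys) {
            totals.merge(key, 1L, Long::sum);
        }
    }

    /** Stream-y part: update the state and emit the running total for each live event. */
    String onLiveEvent(String key) {
        long total = totals.merge(key, 1L, Long::sum);
        return key + "=" + total;
    }

    public static void main(String[] args) {
        BootstrapSketch op = new BootstrapSketch();
        op.bootstrap(Arrays.asList("a", "a", "b")); // first: the bounded part of the graph
        System.out.println(op.onLiveEvent("a"));    // then: the rest
        System.out.println(op.onLiveEvent("c"));
    }
}
```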
Under-the-hood Changes
Graph Representation / DAG
• StreamTransformation/StreamGraph need to be beefed up to carry the
additional information about boundedness
• Translation, scheduling, deployment, memory management and network stack
need to take this into account
Operator / Task
• StreamOperator needs to support batch-style execution → see next slide
• Network stack must eventually support blocking inputs
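One way to picture a graph that carries boundedness information is sketched below. The names (`BoundednessSketch`, `Node`, `Boundedness`) are hypothetical and the propagation rule is an assumption made for illustration: a source declares its boundedness, and any other node is unbounded as soon as one of its inputs is unbounded.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch (not Flink API): graph nodes that carry boundedness. */
class BoundednessSketch {
    enum Boundedness { BOUNDED, UNBOUNDED }

    static final class Node {
        final String name;
        final Boundedness declared; // set for sources only, null for other nodes
        final List<Node> inputs;

        private Node(String name, Boundedness declared, List<Node> inputs) {
            this.name = name;
            this.declared = declared;
            this.inputs = inputs;
        }

        static Node source(String name, Boundedness declared) {
            return new Node(name, declared, Collections.<Node>emptyList());
        }

        /** Assumes every non-source node has at least one input. */
        static Node operator(String name, Node... inputs) {
            return new Node(name, null, Arrays.asList(inputs));
        }

        /** A node is unbounded if any input is unbounded; sources report what they declared. */
        Boundedness boundedness() {
            if (inputs.isEmpty()) {
                return declared;
            }
            for (Node input : inputs) {
                if (input.boundedness() == Boundedness.UNBOUNDED) {
                    return Boundedness.UNBOUNDED;
                }
            }
            return Boundedness.BOUNDED;
        }
    }

    public static void main(String[] args) {
        Node history = Node.source("history", Boundedness.BOUNDED);
        Node live = Node.source("live", Boundedness.UNBOUNDED);
        Node map = Node.operator("map", history);
        Node join = Node.operator("join", map, live);
        System.out.println(map.name + ": " + map.boundedness());
        System.out.println(join.name + ": " + join.boundedness());
    }
}
```

With this information available, translation and scheduling could treat the bounded part of the graph differently from the rest.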
Selective Push Model Operator FLINK-11875
batch: pull-based operator (or Driver)
streaming: push-based StreamOperator
• StreamOperator needs additional API to tell the runtime which input to consume
• Network stack/graph needs to be enhanced to deal with blocking inputs 😱
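The idea of a push-based operator that tells the runtime which input to consume can be sketched with a two-input hash join. All names here (`SelectiveInputSketch`, `nextSelection`, `Input`, `Rec`) are made up for illustration and are not taken from FLINK-11875: records are still pushed into the operator, but the driving loop asks the operator which input it wants next, so the probe side stays blocked until the build side has ended.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical sketch (not Flink API): an operator that selects its next input. */
class SelectiveInputSketch {
    enum Input { FIRST, SECOND }

    static final class Rec {
        final int key;
        final String value;
        Rec(int key, String value) { this.key = key; this.value = value; }
    }

    /** Builds a hash table from the first input, then probes it with the second. */
    static final class HashJoinOperator {
        private final Map<Integer, String> buildTable = new HashMap<>();
        private boolean buildFinished = false;
        final List<String> output = new ArrayList<>();

        /** The additional API: which input should the runtime push from next? */
        Input nextSelection() { return buildFinished ? Input.SECOND : Input.FIRST; }

        void processFirst(Rec r) { buildTable.put(r.key, r.value); }

        void endFirst() { buildFinished = true; }

        void processSecond(Rec r) {
            String match = buildTable.get(r.key);
            if (match != null) {
                output.add(r.key + ":" + match + "/" + r.value);
            }
        }
    }

    /** Stand-in for the runtime: honors the operator's selection on every step. */
    static List<String> run(Queue<Rec> first, Queue<Rec> second) {
        HashJoinOperator op = new HashJoinOperator();
        while (true) {
            if (op.nextSelection() == Input.FIRST) {
                Rec r = first.poll();
                if (r == null) { op.endFirst(); } else { op.processFirst(r); }
            } else {
                Rec r = second.poll();
                if (r == null) { break; }
                op.processSecond(r);
            }
        }
        return op.output;
    }

    public static void main(String[] args) {
        Queue<Rec> build = new ArrayDeque<>();
        build.add(new Rec(1, "a"));
        build.add(new Rec(2, "b"));
        Queue<Rec> probe = new ArrayDeque<>();
        probe.add(new Rec(2, "x"));
        probe.add(new Rec(3, "y"));
        probe.add(new Rec(1, "z"));
        System.out.println(run(build, probe));
    }
}
```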
Current Source Interfaces
InputFormat (batch)
– createInputSplits(): splits
– openSplit(split)
– assignInputSplit()
– nextRecord(): T
– closeCurrentSplit()
SourceFunction (streaming)
– run(OutputContext)
– close()
Batch InputFormat Processing
Figure: three TaskManagers and one JobManager; (1) request split, (2) send split, (3) process split.
• Splits are assigned to TaskManagers by the JobManager, which runs a copy of
the InputFormat → Flink knows about splits and can be clever about scheduling,
be reactive
• Splits can be processed in arbitrary order
• Split processing pulls records from the InputFormat
• InputFormat knows nothing about watermarks, timestamps, checkpointing → bad
for streaming
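The request/send/process cycle can be sketched as a small simulation. Everything here is hypothetical (`SplitAssignmentSketch`, `SplitAssigner`, the tick-based timing model) and only illustrates why lazy, request-driven assignment lets the system react to uneven reader speed: a reader that finishes sooner simply asks for the next split sooner.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not Flink API): splits are handed out on request. */
class SplitAssignmentSketch {
    /** Plays the JobManager role: owns the list of pending splits. */
    static final class SplitAssigner {
        private final Deque<String> pending;

        SplitAssigner(List<String> splits) { this.pending = new ArrayDeque<>(splits); }

        /** (1) request split, (2) send split; returns null when no splits are left. */
        synchronized String requestSplit() { return pending.poll(); }
    }

    /**
     * Simulates readers that need ticksPerSplit[i] time units per split. Whichever
     * reader becomes free first requests the next split (ties go to the lower index).
     */
    static Map<String, List<String>> simulate(
            List<String> splits, String[] readers, int[] ticksPerSplit) {
        SplitAssigner assigner = new SplitAssigner(splits);
        long[] busyUntil = new long[readers.length];
        Map<String, List<String>> processed = new LinkedHashMap<>();
        for (String reader : readers) {
            processed.put(reader, new ArrayList<>());
        }
        while (true) {
            int next = 0;
            for (int i = 1; i < readers.length; i++) {
                if (busyUntil[i] < busyUntil[next]) {
                    next = i;
                }
            }
            String split = assigner.requestSplit();
            if (split == null) {
                break;
            }
            processed.get(readers[next]).add(split);   // (3) process split
            busyUntil[next] += ticksPerSplit[next];
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("s1", "s2", "s3", "s4", "s5", "s6");
        System.out.println(simulate(splits, new String[] {"tm-fast", "tm-slow"}, new int[] {1, 3}));
    }
}
```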
Stream SourceFunction Processing
• Sources have a run loop that they manage completely on their own
• Sources have flexibility and can efficiently work with the source system: batch
accesses, dealing with multiple topics from one consumer, threading model,
etc.
• Flink does not know what’s going on inside and can’t be clever about it
• Sources have to implement their own per-partition watermarking, idleness
tracking, what have you
Figure: three TaskManagers, each told to (1) do your thing.
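To give a feel for the bookkeeping each source has to carry on its own today, here is a sketch of per-partition watermarking with idleness tracking. The class and method names are hypothetical, and the rules are simplifying assumptions: the source watermark is the minimum of the highest timestamps seen per partition, and a partition that has not produced a record for longer than a timeout is treated as idle and ignored.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not Flink API): watermark bookkeeping inside a source. */
class PartitionWatermarkSketch {
    private final Map<String, Long> maxTimestamp = new HashMap<>();
    private final Map<String, Long> lastSeenAt = new HashMap<>();
    private final long idleTimeout;

    PartitionWatermarkSketch(long idleTimeout) {
        this.idleTimeout = idleTimeout;
    }

    /** Records the event timestamp of a record and the wall-clock time it arrived. */
    void onRecord(String partition, long eventTimestamp, long now) {
        maxTimestamp.merge(partition, eventTimestamp, Long::max);
        lastSeenAt.put(partition, now);
    }

    /**
     * Minimum of the per-partition maxima over all non-idle partitions.
     * Returns Long.MIN_VALUE if no partition is currently active.
     */
    long currentWatermark(long now) {
        long min = Long.MAX_VALUE;
        boolean anyActive = false;
        for (Map.Entry<String, Long> e : maxTimestamp.entrySet()) {
            if (now - lastSeenAt.get(e.getKey()) > idleTimeout) {
                continue; // idle partition: must not hold back the watermark
            }
            anyActive = true;
            min = Math.min(min, e.getValue());
        }
        return anyActive ? min : Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        PartitionWatermarkSketch tracker = new PartitionWatermarkSketch(100);
        tracker.onRecord("p0", 50, 0);
        tracker.onRecord("p1", 20, 0);
        System.out.println(tracker.currentWatermark(10));  // p1 holds the watermark back
        tracker.onRecord("p0", 90, 150);
        System.out.println(tracker.currentWatermark(150)); // p1 is idle by now
    }
}
```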
A New (unified) Source Interface
• This must support both batch and streaming use cases, allow Flink to be clever, be
able to deal with event-time, watermarks, source idiosyncrasies, and enable
snapshotting
• This should enable new features: generic idleness detection, event-time alignment
FLIP-27
Source
– createSplitEnumerator()
– createSplitReader()
SplitEnumerator
– discoverNewSplits()
– nextSplit()
– snapshotState()
– isDone()
SplitReader
– addSplit()
– hasAvailable(): Future
– snapshotState()
– emitNext(Context): Status
* FLINK-10886: Event-time alignment for sources; Jamie Grier (Lyft) contributed the first parts of this
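The listing above gives only method names, so the following is a sketch under loud assumptions rather than the FLIP-27 interface itself. The type parameters, argument types and return types are guesses; `hasAvailable()`, `snapshotState()` and `discoverNewSplits()` are left out for brevity; `emitNext` takes a plain `Consumer` instead of a `Context`. The in-memory `ListEnumerator` and `ListReader` are purely illustrative. What the sketch is meant to show is the division of labour: the enumerator knows the splits, the reader reads them, and a driver loop outside the source decides when each is called.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical sketch: method names follow the slide, all signatures are assumptions. */
class UnifiedSourceSketch {
    enum Status { MORE_AVAILABLE, NOTHING_AVAILABLE }

    interface SplitEnumerator<SplitT> {
        SplitT nextSplit();
        boolean isDone(); // a bounded source is done once all splits are handed out
    }

    interface SplitReader<T, SplitT> {
        void addSplit(SplitT split);
        Status emitNext(Consumer<T> output);
    }

    interface Source<T, SplitT> {
        SplitEnumerator<SplitT> createSplitEnumerator();
        SplitReader<T, SplitT> createSplitReader();
    }

    /** Illustrative bounded enumerator: each split is just a list of numbers. */
    static final class ListEnumerator implements SplitEnumerator<List<Integer>> {
        private final Queue<List<Integer>> remaining;
        ListEnumerator(List<List<Integer>> splits) { remaining = new ArrayDeque<>(splits); }
        public List<Integer> nextSplit() { return remaining.poll(); }
        public boolean isDone() { return remaining.isEmpty(); }
    }

    /** Illustrative reader: emits at most one buffered record per call. */
    static final class ListReader implements SplitReader<Integer, List<Integer>> {
        private final Queue<Integer> buffered = new ArrayDeque<>();
        public void addSplit(List<Integer> split) { buffered.addAll(split); }
        public Status emitNext(Consumer<Integer> output) {
            Integer next = buffered.poll();
            if (next != null) {
                output.accept(next);
            }
            return buffered.isEmpty() ? Status.NOTHING_AVAILABLE : Status.MORE_AVAILABLE;
        }
    }

    /** Stand-in for the runtime: it, not the source, drives split and record fetching. */
    static <T, S> List<T> drive(SplitEnumerator<S> enumerator, SplitReader<T, S> reader) {
        List<T> out = new ArrayList<>();
        while (!enumerator.isDone()) {
            reader.addSplit(enumerator.nextSplit());
            while (reader.emitNext(out::add) == Status.MORE_AVAILABLE) {
                // keep emitting until the reader has nothing buffered
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> splits = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3));
        System.out.println(drive(new ListEnumerator(splits), new ListReader()));
    }
}
```

Because the loop lives outside the source, the runtime has a place to add checkpointing, per-split watermarks or idleness handling without each connector re-implementing them.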
A New (unified) Source Interface: Execution Style I
Figure: three TaskManagers and one JobManager; (1) request split, (2) send split, (3) process split.
• Splits are assigned to TaskManagers by the JobManager, which runs a copy of
the SplitEnumerator → Flink knows about splits and can be clever about
scheduling, be reactive
• Splits can be processed in arbitrary order
• Split processing is driven by the TaskManager working with SplitReader
• SplitReader emits watermarks but Flink deals with idleness, per-split
watermarking
A New (unified) Source Interface: Execution Style II
Table API / SQL
• API: easy, it’s already unified ✅
• Translation and runtime (operators) need to be enhanced to use the added
optimization potential but use the StreamOperator for both batch and streaming
style execution
• Streaming execution is the generic case that always works, “batch” enables
additional “optimization rules”: bounded operators, different scheduling → we get
feature parity automatically ✅
• Sources will be unified from the unified source interface ✅
• This is already available in the Blink fork (by Alibaba), FLINK-11439 is the effort of
getting that into Flink
The Future Stack
Runtime
StreamTransformation DAG / StreamOperator
DataStream: “Physical” Application API
Table API / SQL: Declarative API
Closing
Summary
• Semantics of unified processing are quite clear already
• For some things, work is ongoing and there are design documents (FLIPs)
• Some things are farther in the future
• This project requires changes in all components/layers of Flink:
– API, deployment, network stack, scheduling, fault tolerance
• You can follow all of this on the public mailing lists and FLIPs!
• The Flink tech stack is going to be quite nice! 😎
www.ververica.com @VervericaData
aljoscha@ververica.com
stephan@ververica.com
Thank you!
Questions?

Batch Processing at Scale with Flink & IcebergFlink Forward
 

Flink Forward San Francisco 2019: Towards Flink 2.0: Rethinking the stack and APIs to unify Batch & Stream - Stephan Ewen & Aljoscha Krettek

  • 1. © 2019 Ververica Aljoscha Krettek – Software Engineer, Flink PMC Stephan Ewen – CTO, Flink PMC/Chair Towards Flink 2.0: Rethinking the Stack and APIs to unify Batch & Streaming
  • 2. This is joint work with many members of the Apache Flink community: Timo, Dawid, Shaoxuan, Kurt, Guowei, Becket, Jincheng, Fabian, Till, Andrey, Gary, Piotr, Stefan, and many others.
  • 3. Some of this presents work that is in progress in the Flink community. Other things are planned and/or have design documents. Some were discussed at one point or another on the mailing lists or in person. This represents our understanding of the current state; it is not a fixed roadmap. Flink is an open-source Apache project.
  • 4. Batch and Streaming
  • 6. What changes faster? Data or Query?
      • “batch”: data changes slowly compared to fast-changing queries (ad-hoc queries, data exploration, ML training and (hyper)parameter tuning)
      • “streaming”: data changes fast, application logic is long-lived (continuous applications, data pipelines, standing queries, anomaly detection, ML evaluation, …)
  • 7. Latency vs. Completeness (for geeks). Star Wars films by processing time (release year) versus event time (episode): 1977 Episode IV, 1980 Episode V, 1983 Episode VI, 1999 Episode I, 2002 Episode II, 2005 Episode III, 2015 Episode VII, 2016 Rogue One (III.5), 2017 Episode VIII.
  • 8. Latency vs. Completeness (more formally). *From the excellent Streaming 101 by Tyler Akidau: https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-
  • 9. Latency vs. Completeness
      • Bounded/Batch: data is as complete as it gets within that batch job; no fine latency control
      • Unbounded/Streaming: trade-off of latency versus completeness; processing-time timers for latency control
  • 10. What does this mean for processing? [Figure: data from files, a message bus, or … on a timeline from older to more recent]
  • 11. Stream-style processing [Figure: timeline from older to more recent, with the watermark and the unprocessed data marked]
  • 12. Batch-style processing [Figure: timeline from older to more recent, with the watermark and the unprocessed data marked]
  • 13. Batch and Streaming Processing Styles [Figure: two pipelines of S, M, R and S stages. More batch-y: stages are running or not running, can do things one-by-one. More stream-y: everything is always-on, running.]
  • 14. So this is the reason why we have different APIs, different batch and stream processing systems? 💡
      • Different requirements
      • Optimization potential for batch and streaming
      • Also: historic developments and slow-changing organizations
  • 16. Runtime: Tasks / JobGraph / Network Interconnect / Fault Tolerance. DataSet: “Operators” and Drivers / Operator Graph / Visitors. DataStream: StreamOperators / StreamTransformation Graph* / Monolithic Translators. Table API / SQL: Logical Nodes* / Different translation paths.
  • 17. What could be improved? (From a system design/code quality/architecture/development perspective)
      • Each API has its own internal graph representation → code duplication
      • Multiple translation components between the different graphs → code duplication
          – DataStream API has an intermediate graph structure: StreamTransformation → StreamGraph → JobGraph
      • Separate (incompatible) operator implementations
          – DataStream API has StreamOperator, DataSet API has Drivers → two map operators, two flatMap operators
          – These are run by different lower-level Tasks
          – DataStream operators are optimized for different requirements than DataSet operators
      • Table API is translated to two distinct lower-level APIs → two different translation stacks
          – “project operator” for DataStream and for DataSet
      • Connectors for each API are separate → a whole bunch of connectors all over the …
  • 18. What does this mean for users?
      • You have to decide between DataSet and DataStream when writing a job
          – Two (slightly) different APIs, with different capabilities
          – Different set of supported connectors: no Kafka DataSet connector, no HBase DataStream connector
          – Different performance characteristics
          – Different fault-tolerance behavior
          – Different scheduling logic
      • With Table API, you only have to learn one API
          – Still, the set of supported connectors depends on the underlying execution API
          – Feature set depends on whether there is an implementation for your underlying API
      • You cannot combine more batch-y with more stream-y sources/sinks
      • A “soft problem”: with two stacks of everything, less developer power will go into each individual stack → fewer features, worse performance, more bugs that are fixed more slowly
  • 20. Batch is a subset of streaming! Can’t we just? ✅❌ Done! 🥳 [Figure: the DataSet stack (“Operators” and Drivers / Operator Graph / Visitors) next to the DataStream stack (StreamOperator / StreamTransformation Graph* / Monolithic Translators)]
  • 21. Unifying the Batch and Streaming APIs
      • DataStream API functionality is already a superset of DataSet API functionality
      • We need to introduce BoundedStream to harness optimization potential; semantics are clear from earlier:
          – No processing-time timers
          – Watermark “jumps” from –Infinity to +Infinity at end of processing
      • DataStream translation and runtime (operators) need to be enhanced to use the added optimization potential
      • Streaming execution is the generic case that always works; “batch” enables additional “optimization rules”: bounded operators, different scheduling → we get feature parity automatically ✅
      • Sources need to be unified as well → see later
  • 22. A typical “unified” use case: Bootstrapping state
      • We have a streaming use case
      • We want to bootstrap the state of some operations from a historical source
      • First execute bounded parts of the graph, then start the rest
      [Figure: a “batch” source (batch-y part) and a “stream” source (stream-y part) feeding a stateful operation]
      * See for example Bootstrapping State in Flink by Gregory Fee https://sf-2018.flink-
  • 23. Under-the-hood Changes
      • Graph Representation / DAG
          – StreamTransformation/StreamGraph need to be beefed up to carry the additional information about boundedness
          – Translation, scheduling, deployment, memory management and network stack need to take this into account
      • Operator / Task
          – StreamOperator needs to support batch-style execution → see next slide
          – Network stack must eventually support blocking inputs
  • 24. Selective Push Model Operator (FLINK-11875)
      • batch: pull-based operator (or Driver)
      • streaming: push-based StreamOperator
      • StreamOperator needs additional API to tell the runtime which input to consume
      • Network stack/graph needs to be enhanced to deal with blocking inputs 😱
  • 25. Current Source Interfaces
      • InputFormat (batch): createInputSplits(): splits, openSplit(split), assignInputSplit(), nextRecord(): T, closeCurrentSplit()
      • SourceFunction (streaming): run(OutputContext), close()
  • 26. Batch InputFormat Processing [Figure: three TaskManagers and a JobManager: (1) request split, (2) send split, (3) process split]
      • Splits are assigned to TaskManagers by the JobManager, which runs a copy of the InputFormat → Flink knows about splits and can be clever about scheduling, be reactive
      • Splits can be processed in arbitrary order
      • Split processing pulls records from the InputFormat
      • InputFormat knows nothing about watermarks, timestamps, checkpointing → bad for streaming
  • 27. Stream SourceFunction Processing [Figure: three TaskManagers: (1) do your thing]
      • Sources have a run-loop that they manage completely on their own
      • Sources have flexibility and can efficiently work with the source system: batch accesses, dealing with multiple topics from one consumer, threading model, etc.
      • Flink does not know what’s going on inside and can’t be clever about it
      • Sources have to implement their own per-partition watermarking, idleness tracking, what have you
  • 28. A New (unified) Source Interface (FLIP-27)
      • This must support both batch and streaming use cases, allow Flink to be clever, be able to deal with event-time, watermarks, source idiosyncrasies, and enable snapshotting
      • This should enable new features: generic idleness detection, event-time alignment
      • Source: createSplitEnumerator(), createSplitReader
      • SplitEnumerator: discoverNewSplits(), nextSplit(), snapshotState(), isDone()
      • SplitReader: addSplit(), hasAvailable(): Future, snapshotState(), emitNext(Context): Status
      * FLINK-10886: Event-time alignment for sources; Jamie Grier (Lyft) contributed the first parts of this
  • 29. A New (unified) Source Interface: Execution Style I [Figure: three TaskManagers and a JobManager: (1) request split, (2) send split, (3) process split]
      • Splits are assigned to TaskManagers by the JobManager, which runs a copy of the SplitEnumerator → Flink knows about splits and can be clever about scheduling, be reactive
      • Splits can be processed in arbitrary order
      • Split processing is driven by the TaskManager working with SplitReader
      • SplitReader emits watermarks but Flink deals with idleness, per-split watermarking
  • 30. A New (unified) Source Interface: Execution Style II
  • 31. Table API / SQL
      • API: easy, it’s already unified ✅
      • Translation and runtime (operators) need to be enhanced to use the added optimization potential but use the StreamOperator for both batch and streaming style execution
      • Streaming execution is the generic case that always works; “batch” enables additional “optimization rules”: bounded operators, different scheduling → we get feature parity automatically ✅
      • Sources will be unified from the unified source interface ✅
      • This is already available in the Blink fork (by Alibaba); FLINK-11439 is the effort of getting that into Flink
  • 32. The Future Stack. Layers: Table API / SQL (Declarative API); DataStream (“Physical” Application API); StreamTransformation DAG / StreamOperator; Runtime.
  • 34. Summary
      • Semantics of unified processing are quite clear already
      • For some things, work is ongoing and there are design documents (FLIPs)
      • Some things are farther in the future
      • This project requires changes in all components/layers of Flink:
          – API, deployment, network stack, scheduling, fault tolerance
      • You can follow all of this on the public mailing lists and FLIPs!
      • The Flink tech stack is going to be quite nice! 😎
  • 35. www.ververica.com, @VervericaData, aljoscha@ververica.com, stephan@ververica.com
  • 36. Thank you! Questions?
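
The split-based source model from slides 28 and 29 can be illustrated with a small, self-contained sketch. The interfaces below borrow the method names `nextSplit()`, `isDone()`, `addSplit()` and `emitNext()` from the slide, but everything else (the `RangeSplit` type, the simplified signatures, the `SourceDriver` loop) is a hypothetical stand-in and not Flink's actual API. The point is only to show the division of labor: an enumerator hands out splits, a reader emits records one at a time, and the runtime drives both.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Queue;
import java.util.function.Consumer;

// Toy stand-ins named after the slide; these are NOT Flink's real classes.
final class RangeSplit {
    final int from; // inclusive
    final int to;   // exclusive
    RangeSplit(int from, int to) { this.from = from; this.to = to; }
}

interface SplitEnumerator {
    Optional<RangeSplit> nextSplit(); // next unassigned split, if any
    boolean isDone();                 // true once a bounded source has no more splits
}

interface SplitReader {
    void addSplit(RangeSplit split);
    boolean emitNext(Consumer<Integer> out); // emits at most one record; false if none was available
}

// Enumerates the integers [0, total) as splits of at most splitSize records.
final class RangeEnumerator implements SplitEnumerator {
    private final Queue<RangeSplit> pending = new ArrayDeque<>();

    RangeEnumerator(int total, int splitSize) {
        for (int start = 0; start < total; start += splitSize) {
            pending.add(new RangeSplit(start, Math.min(start + splitSize, total)));
        }
    }

    public Optional<RangeSplit> nextSplit() { return Optional.ofNullable(pending.poll()); }
    public boolean isDone() { return pending.isEmpty(); }
}

// Reads its assigned splits one record per call, so the caller controls the pace.
final class RangeReader implements SplitReader {
    private final Queue<RangeSplit> assigned = new ArrayDeque<>();
    private RangeSplit current;
    private int position;

    public void addSplit(RangeSplit split) { assigned.add(split); }

    public boolean emitNext(Consumer<Integer> out) {
        while (current == null || position >= current.to) {
            current = assigned.poll();          // move on to the next assigned split
            if (current == null) return false;  // nothing assigned right now
            position = current.from;
        }
        out.accept(position++);
        return true;
    }
}

final class SourceDriver {
    // Plays the role of the runtime: drain the reader, then request another split.
    // An unbounded enumerator would never report isDone(), so this loop is for bounded input only.
    static List<Integer> runBounded(SplitEnumerator enumerator, SplitReader reader) {
        List<Integer> records = new ArrayList<>();
        while (true) {
            if (reader.emitNext(records::add)) continue;
            Optional<RangeSplit> split = enumerator.nextSplit();
            if (split.isPresent()) {
                reader.addSplit(split.get());
            } else if (enumerator.isDone()) {
                break;
            }
        }
        return records;
    }
}
```

Calling `SourceDriver.runBounded(new RangeEnumerator(5, 2), new RangeReader())` reads the splits [0,2), [2,4) and [4,5) in turn. Because the reader emits one record per call instead of running its own loop, the caller stays in control of when records are produced, which is the property the slides ask for so that Flink can "be clever" about watermarks and idleness.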

Editor's notes

  1. Streaming: keep up with real time, with some extra capacity for catch-up; receive data roughly in order as produced; latency is important. Batch: fast-forward through months/years of history; massively parallel unordered reads; throughput is most important.
  2. Time in the data stream must be quasi-monotonic to produce time progress (watermarks). Always have close-to-latest incremental results. Resource requirements change over time. Recovery must catch up very fast.
  3. Order of time in data does not matter (parallel unordered reads). Bulk operations (2-phase hash/sort). Longer time for recovery (no low-latency SLA). Resource requirements change fast throughout the execution of a single job.
  4. Understanding this difference will help later, when we discuss scheduling changes.
  5. Possibly put these on separate slides, with fewer words. Or even some graphics.
  6. Possibly put these on separate slides, with fewer words. Or even some graphics.
  7. There are some quirks when you use DataStream for batch: a groupReduce would be a window with a GlobalWindow; MapPartition would have to finalize things in close(); joins would have to specify a global window. Of course, state requirements are bad for the naïve approach, i.e. large state and inefficient access patterns. Joins and grouping can be a lot faster with specific algorithms: hash join, merge join, etc.
  8. Recall the earlier processing-styles slide: batch wants step by step, streaming is all at once. This has been mentioned a lot. Lyft gave a talk about this at the last Flink Forward.
  9. For example, a different window operator and different join implementations. The scheduling stuff and networking would be a whole talk on their own. Memory management is another issue.
  10. Pull-based operators are how most databases were/are implemented. Note how the pull model enables hash join, merge join, … Side inputs benefit from a pull-based model. Bring the dog-drinking-from-hose example, also for the Join operator. This will allow porting batch operators/algorithms to StreamOperator.
  11. Note that this nicely jibes with the pull-based model. Enables the things we need for batch.
  12. Mention the dog with the hose. Sources just keep spitting out records as fast as they can.
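
Note 10 and slide 24 say that letting an operator tell the runtime which input to consume is what makes batch algorithms such as hash join possible on a push-based operator. A minimal sketch of that idea follows. All names here (`KeyedValue`, `HashJoinOperator`, `SelectiveRuntime`, the `nextInput()` convention of returning 1 or 2) are hypothetical and chosen for illustration; this is not the API proposed in FLINK-11875.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical record type: a join key plus a payload.
final class KeyedValue {
    final int key;
    final String value;
    KeyedValue(int key, String value) { this.key = key; this.value = value; }
}

// A two-input operator that is still pushed records, but selects which input it wants next.
final class HashJoinOperator {
    private final Map<Integer, List<String>> buildTable = new HashMap<>();
    private boolean buildFinished = false;

    // 1 = build side, 2 = probe side. The probe side stays blocked until the build side has ended.
    int nextInput() { return buildFinished ? 2 : 1; }

    void processBuild(KeyedValue element) {
        buildTable.computeIfAbsent(element.key, k -> new ArrayList<>()).add(element.value);
    }

    void endOfBuild() { buildFinished = true; }

    void processProbe(KeyedValue element, List<String> out) {
        for (String match : buildTable.getOrDefault(element.key, Collections.<String>emptyList())) {
            out.add(match + "|" + element.value);
        }
    }
}

final class SelectiveRuntime {
    // Plays the role of the runtime: asks the operator which input to read, then pushes one record.
    static List<String> run(HashJoinOperator op, Iterator<KeyedValue> build, Iterator<KeyedValue> probe) {
        List<String> out = new ArrayList<>();
        while (true) {
            if (op.nextInput() == 1) {
                if (build.hasNext()) {
                    op.processBuild(build.next());
                } else {
                    op.endOfBuild(); // bounded input exhausted: the operator may switch sides
                }
            } else if (probe.hasNext()) {
                op.processProbe(probe.next(), out);
            } else {
                break;
            }
        }
        return out;
    }
}
```

With a build side of (1, "a"), (2, "b"), (1, "c") and a probe side of (1, "x"), (3, "y"), (2, "z"), the runtime first pushes all build records, then the probe records, and the join emits "a|x", "c|x" and "b|z". A plain push-based operator would receive both inputs interleaved and would have to buffer the probe side itself, which is the "dog drinking from the hose" situation the notes refer to.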