SlideShare une entreprise Scribd logo
1  sur  59
Télécharger pour lire hors ligne
Ufuk Celebi
uce@apache.org
HUG London
October 15, 2015
Streaming Data Flow with
Apache Flink
Recent History
April ‘14 December ‘14
v0.5 v0.6 v0.7
April ‘15
Project
Incubation
Top Level
Project
v0.8 v0.9
Currently moving towards 0.10 and 1.0 release.
What is Flink?
Streaming
Topologies
Stream
Time
Window
Count
Low Latency
Long Batch Pipelines
Resource Utilization
1.2
1.4
1.5
1.2
0.8
0.9
1.0
0.8
Rating Matrix User Matrix Item Matrix
1.5
1.7
1.2
0.6
1.0
1.1
0.8
0.4
W X Y ZW X Y Z
A
B
C
D
4.0
4.5
5.0
3.5
2.0
3.5
4.0
2.0
1.0
= X
User
Machine Learning
Iterative Algorithms
Graph Analysis
53
1 2
4
0.5
0.2 0.9
0.3
0.1
0.4
0.7
Mutable State
Overview
Deployment

Local (Single JVM) · Cluster (Standalone, YARN)
DataStream API
Unbounded Data
DataSet API
Bounded Data
Runtime
Distributed Streaming Data Flow
Libraries
Machine Learning · Graph Processing · SQL-like API
Stream Processing
Real world data is unbounded and is pushed to
systems.
BatchStreaming
Stream Platform Architecture
Server
Logs
Trxn
Logs
Sensor
Logs
Downstream
Systems
Flink
– Analyze and correlate streams
– Create derived streams
Kafka
– Gather and backup streams
– Offer streams
Cornerstones of Flink
Low Latency for fast results.
High Throughput to handle many events per second.
Exactly-once guarantees for correct results.
Intuitive APIs for productivity.
DataStream API
StreamExecutionEnvironment env = StreamExecutionEnvironment

.getExecutionEnvironment()
DataStream<String> data = env.fromElements(
"O Romeo, Romeo! wherefore art thou Romeo?”, ...);
// DataStream Windowed WordCount
DataStream<Tuple2<String, Integer>> counts = data
.flatMap(new SplitByWhitespace()) // (word, 1)
.keyBy(0) // [word, [1, 1, …]] for 10 seconds
.timeWindow(Time.of(10, TimeUnit.SECONDS))
.sum(1); // sum per word per 10 second window
counts.print();
env.execute();
DataStream API
StreamExecutionEnvironment env = StreamExecutionEnvironment

.getExecutionEnvironment()
DataStream<String> data = env.fromElements(
"O Romeo, Romeo! wherefore art thou Romeo?”, ...);
// DataStream Windowed WordCount
DataStream<Tuple2<String, Integer>> counts = data
.flatMap(new SplitByWhitespace()) // (word, 1)
.keyBy(0) // [word, [1, 1, …]] for 10 seconds
.timeWindow(Time.of(10, TimeUnit.SECONDS))
.sum(1); // sum per word per 10 second window
counts.print();
env.execute();
DataStream API
StreamExecutionEnvironment env = StreamExecutionEnvironment

.getExecutionEnvironment()
DataStream<String> data = env.fromElements(
"O Romeo, Romeo! wherefore art thou Romeo?”, ...);
// DataStream Windowed WordCount
DataStream<Tuple2<String, Integer>> counts = data
.flatMap(new SplitByWhitespace()) // (word, 1)
.keyBy(0) // [word, [1, 1, …]] for 10 seconds
.timeWindow(Time.of(10, TimeUnit.SECONDS))
.sum(1); // sum per word per 10 second window
counts.print();
env.execute();
DataStream API
StreamExecutionEnvironment env = StreamExecutionEnvironment

.getExecutionEnvironment()
DataStream<String> data = env.fromElements(
"O Romeo, Romeo! wherefore art thou Romeo?”, ...);
// DataStream Windowed WordCount
DataStream<Tuple2<String, Integer>> counts = data
.flatMap(new SplitByWhitespace()) // (word, 1)
.keyBy(0) // [word, [1, 1, …]] for 10 seconds
.timeWindow(Time.of(10, TimeUnit.SECONDS))
.sum(1); // sum per word per 10 second window
counts.print();
env.execute();
DataStream API
StreamExecutionEnvironment env = StreamExecutionEnvironment

.getExecutionEnvironment()
DataStream<String> data = env.fromElements(
"O Romeo, Romeo! wherefore art thou Romeo?”, ...);
// DataStream Windowed WordCount
DataStream<Tuple2<String, Integer>> counts = data
.flatMap(new SplitByWhitespace()) // (word, 1)
.keyBy(0) // [word, [1, 1, …]] for 10 seconds
.timeWindow(Time.of(10, TimeUnit.SECONDS))
.sum(1); // sum per word per 10 second window
counts.print();
env.execute();
DataStream API
StreamExecutionEnvironment env = StreamExecutionEnvironment

.getExecutionEnvironment()
DataStream<String> data = env.fromElements(
"O Romeo, Romeo! wherefore art thou Romeo?”, ...);
// DataStream Windowed WordCount
DataStream<Tuple2<String, Integer>> counts = data
.flatMap(new SplitByWhitespace()) // (word, 1)
.keyBy(0) // [word, [1, 1, …]] for 10 seconds
.timeWindow(Time.of(10, TimeUnit.SECONDS))
.sum(1); // sum per word per 10 second window
counts.print();
env.execute();
DataStream API
StreamExecutionEnvironment env = StreamExecutionEnvironment

.getExecutionEnvironment()
DataStream<String> data = env.fromElements(
"O Romeo, Romeo! wherefore art thou Romeo?”, ...);
// DataStream Windowed WordCount
DataStream<Tuple2<String, Integer>> counts = data
.flatMap(new SplitByWhitespace()) // (word, 1)
.keyBy(0) // [word, [1, 1, …]] for 10 seconds
.timeWindow(Time.of(10, TimeUnit.SECONDS))
.sum(1); // sum per word per 10 second window
counts.print();
env.execute();
DataStream API
StreamExecutionEnvironment env = StreamExecutionEnvironment

.getExecutionEnvironment()
DataStream<String> data = env.fromElements(
"O Romeo, Romeo! wherefore art thou Romeo?”, ...);
// DataStream Windowed WordCount
DataStream<Tuple2<String, Integer>> counts = data
.flatMap(new SplitByWhitespace()) // (word, 1)
.keyBy(0) // [word, [1, 1, …]] for 10 seconds
.timeWindow(Time.of(10, TimeUnit.SECONDS))
.sum(1); // sum per word per 10 second window
counts.print();
env.execute();
Pipelining
s1 t1 w1
s2 t2 w2
Source Tokenizer Window Count
Complete Pipeline Online Concurrently.
Pipelining
s1 t1 w1
s2 t2 w2
Source Tokenizer Window Count
Complete Pipeline Online Concurrently.
Chained tasks
Pipelining
s1
s2 t2 w2
t1 w1
Source Tokenizer Window Count
Complete Pipeline Online Concurrently.
Chained tasks Pipelined Shuffle
Streaming Fault Tolerance
At Least Once
• Ensure that all operators see all events.
Exactly Once
• Ensure that all operators see all events.
• Do not perform duplicates updates to operator state.
Streaming Fault Tolerance
At Least Once
• Ensure that all operators see all events.
Exactly Once
• Ensure that all operators see all events.
• Do not perform duplicates updates to operator state.
Flink guarantees exactly once processing.
Distributed Snaphots
Barriers flow through the topology in line with data.
Flink guarantees exactly once processing.
Part of
snapshot
Distributed Snaphots
Barriers flow through the topology in line with data.
Flink guarantees exactly once processing.
Part of
snapshot
Distributed Snaphots
Barriers flow through the topology in line with data.
Flink guarantees exactly once processing.
Part of
snapshot
Distributed Snaphots
Barriers flow through the topology in line with data.
Flink guarantees exactly once processing.
Part of
snapshot
Distributed Snaphots
Barriers flow through the topology in line with data.
Flink guarantees exactly once processing.
Part of
snapshot
Distributed Snaphots
Barriers flow through the topology in line with data.
Flink guarantees exactly once processing.
Part of
snapshot
Distributed Snaphots
Barriers flow through the topology in line with data.
Flink guarantees exactly once processing.
Part of
snapshot
Distributed Snaphots
Barriers flow through the topology in line with data.
Flink guarantees exactly once processing.
Part of
snapshot
Distributed Snaphots
Barriers flow through the topology in line with data.
Flink guarantees exactly once processing.
Part of
snapshot
Distributed Snaphots
Barriers flow through the topology in line with data.
Flink guarantees exactly once processing.
Part of
snapshot
Distributed Snaphots
Flink guarantees exactly once processing.


JobManager
Master
State Backend
Ceckpoint Data
Source 1: State 1:
Source 2: State 2:
Source 3: Sink 1:
Source 4: Sink 2:
Offset: 6791
Offset: 7252
Offset: 5589
Offset: 6843
Distributed Snaphots
Flink guarantees exactly once processing.


JobManager
Master
State Backend
Ceckpoint Data
Source 1: State 1:
Source 2: State 2:
Source 3: Sink 1:
Source 4: Sink 2:
Offset: 6791
Offset: 7252
Offset: 5589
Offset: 6843
Start Checkpoint
Message
Distributed Snaphots
Flink guarantees exactly once processing.


JobManager
Master
State Backend
Ceckpoint Data
Source 1: 6791 State 1:
Source 2: 7252 State 2:
Source 3: 5589 Sink 1:
Source 4: 6843 Sink 2:
Emit Barriers
Acknowledge with
Position
Distributed Snaphots
Flink guarantees exactly once processing.


JobManager
Master
State Backend
Ceckpoint Data
Source 1: 6791 State 1:
Source 2: 7252 State 2:
Source 3: 5589 Sink 1:
Source 4: 6843 Sink 2:
Received barrier
at each input
Distributed Snaphots
Flink guarantees exactly once processing.


JobManager
Master
State Backend
Ceckpoint Data
Source 1: 6791 State 1:
Source 2: 7252 State 2:
Source 3: 5589 Sink 1:
Source 4: 6843 Sink 2:
s1 Write Snapshot
of its state
Received barrier
at each input
Distributed Snaphots
Flink guarantees exactly once processing.


JobManager
Master
State Backend
Ceckpoint Data
Source 1: 6791 State 1: PTR1
Source 2: 7252 State 2: PTR2
Source 3: 5589 Sink 1:
Source 4: 6843 Sink 2:
s1
Acknowledge with
pointer to state
s2
Distributed Snaphots
Flink guarantees exactly once processing.


JobManager
Master
State Backend
Ceckpoint Data
Source 1: 6791 State 1: PTR1
Source 2: 7252 State 2: PTR2
Source 3: 5589 Sink 1: ACK
Source 4: 6843 Sink 2: ACK
s1 s2
Acknowledge Checkpoint
Received barrier
at each input
Distributed Snaphots
Flink guarantees exactly once processing.


JobManager
Master
State Backend
Ceckpoint Data
Source 1: 6791 State 1: PTR1
Source 2: 7252 State 2: PTR2
Source 3: 5589 Sink 1: ACK
Source 4: 6843 Sink 2: ACK
s1 s2
Operator State
User-defined state
• Flink’s transformations are long running operators
• Feel free to keep objects around
• Hooks to include into system’s checkpoint
Windowed streams
• Time, count, and data-driven windows
• Managed by the system
Batch on Streaming
DataStream API
Unbounded Data
DataSet API
Bounded Data
Runtime
Distributed Streaming Data Flow
Libraries
Machine Learning · Graph Processing · SQL-like API
Batch on Streaming
Run a bounded stream (data set) on

a stream processor.
Bounded
data set
Unbounded
data stream
Batch on Streaming
Stream Windows
Pipelined
Data Exchange
Global View
Pipelined or Blocking
Data Exchange
Infinite Streams Finite Streams
Run a bounded stream (data set) on

a stream processor.
Batch Pipelines
Batch Pipelines
Data exchange

is mostly streamed
Batch Pipelines
Data exchange

is mostly streamed
Some operators block
(e.g. sort, hash table)
DataSet API
ExecutionEnvironment env = ExecutionEnvironment

.getExecutionEnvironment()
DataSet<String> data = env.fromElements(
"O Romeo, Romeo! wherefore art thou Romeo?”, ...);
// DataSet WordCount
DataSet<Tuple2<String, Integer>> counts = data
.flatMap(new SplitByWhitespace()) // (word, 1)
.groupBy(0) // [word, [1, 1, …]]
.sum(1); // sum per word for all occurrences
counts.print();
DataStream API
ExecutionEnvironment env = ExecutionEnvironment

.getExecutionEnvironment()
DataSet<String> data = env.fromElements(
"O Romeo, Romeo! wherefore art thou Romeo?”, ...);
// DataSet WordCount
DataSet<Tuple2<String, Integer>> counts = data
.flatMap(new SplitByWhitespace()) // (word, 1)
.groupBy(0) // [word, [1, 1, …]]
.sum(1); // sum per word for all occurrences
counts.print();
DataStream API
ExecutionEnvironment env = ExecutionEnvironment

.getExecutionEnvironment()
DataSet<String> data = env.fromElements(
"O Romeo, Romeo! wherefore art thou Romeo?”, ...);
// DataSet WordCount
DataSet<Tuple2<String, Integer>> counts = data
.flatMap(new SplitByWhitespace()) // (word, 1)
.groupBy(0) // [word, [1, 1, …]]
.sum(1); // sum per word for all occurrences
counts.print();
DataStream API
ExecutionEnvironment env = ExecutionEnvironment

.getExecutionEnvironment()
DataSet<String> data = env.fromElements(
"O Romeo, Romeo! wherefore art thou Romeo?”, ...);
// DataSet WordCount
DataSet<Tuple2<String, Integer>> counts = data
.flatMap(new SplitByWhitespace()) // (word, 1)
.groupBy(0) // [word, [1, 1, …]]
.sum(1); // sum per word for all occurrences
counts.print();
DataStream API
ExecutionEnvironment env = ExecutionEnvironment

.getExecutionEnvironment()
DataSet<String> data = env.fromElements(
"O Romeo, Romeo! wherefore art thou Romeo?”, ...);
// DataSet WordCount
DataSet<Tuple2<String, Integer>> counts = data
.flatMap(new SplitByWhitespace()) // (word, 1)
.groupBy(0) // [word, [1, 1, …]]
.sum(1); // sum per word for all occurrences
counts.print();
DataStream API
ExecutionEnvironment env = ExecutionEnvironment

.getExecutionEnvironment()
DataSet<String> data = env.fromElements(
"O Romeo, Romeo! wherefore art thou Romeo?”, ...);
// DataSet WordCount
DataSet<Tuple2<String, Integer>> counts = data
.flatMap(new SplitByWhitespace()) // (word, 1)
.groupBy(0) // [word, [1, 1, …]]
.sum(1); // sum per word for all occurrences
counts.print();
DataStream API
ExecutionEnvironment env = ExecutionEnvironment

.getExecutionEnvironment()
DataSet<String> data = env.fromElements(
"O Romeo, Romeo! wherefore art thou Romeo?”, ...);
// DataSet WordCount
DataSet<Tuple2<String, Integer>> counts = data
.flatMap(new SplitByWhitespace()) // (word, 1)
.groupBy(0) // [word, [1, 1, …]]
.sum(1); // sum per word for all occurrences
counts.print();
Batch-specific optimizations
Managed memory
• On- and off-heap memory
• Internal operators (e.g. join or sort) with out-of-core
support
• Serialization stack for user-types
Cost-based optimizer
• Program adapts to changing data size
Getting Started
Project Page: http://flink.apache.org
Getting Started
Project Page: http://flink.apache.org
Quickstarts: Java & Scala API
Getting Started
Project Page: http://flink.apache.org
Docs: Programming Guides
Getting Started
Project Page: http://flink.apache.org
Get Involved: Mailing Lists, Stack Overflow, IRC, …
Blogs
http://flink.apache.org/blog
http://data-artisans.com/blog
Twitter
@ApacheFlink
Mailing lists
(news|user|dev)@flink.apache.org
Apache Flink

Contenu connexe

Tendances

Introduction to rx java for android
Introduction to rx java for androidIntroduction to rx java for android
Introduction to rx java for androidEsa Firman
 
Intro to ReactiveCocoa
Intro to ReactiveCocoaIntro to ReactiveCocoa
Intro to ReactiveCocoakleneau
 
An Introduction to Reactive Cocoa
An Introduction to Reactive CocoaAn Introduction to Reactive Cocoa
An Introduction to Reactive CocoaSmartLogic
 
Functional Reactive Programming (FRP): Working with RxJS
Functional Reactive Programming (FRP): Working with RxJSFunctional Reactive Programming (FRP): Working with RxJS
Functional Reactive Programming (FRP): Working with RxJSOswald Campesato
 
ClojureScript for the web
ClojureScript for the webClojureScript for the web
ClojureScript for the webMichiel Borkent
 
ClojureScript loves React, DomCode May 26 2015
ClojureScript loves React, DomCode May 26 2015ClojureScript loves React, DomCode May 26 2015
ClojureScript loves React, DomCode May 26 2015Michiel Borkent
 
Deep Dumpster Diving
Deep Dumpster DivingDeep Dumpster Diving
Deep Dumpster DivingRonnBlack
 
Introduction to RxJS
Introduction to RxJSIntroduction to RxJS
Introduction to RxJSBrainhub
 
ReactiveCocoa in Practice
ReactiveCocoa in PracticeReactiveCocoa in Practice
ReactiveCocoa in PracticeOutware Mobile
 
Processing large-scale graphs with Google(TM) Pregel
Processing large-scale graphs with Google(TM) PregelProcessing large-scale graphs with Google(TM) Pregel
Processing large-scale graphs with Google(TM) PregelArangoDB Database
 
RxJava applied [JavaDay Kyiv 2016]
RxJava applied [JavaDay Kyiv 2016]RxJava applied [JavaDay Kyiv 2016]
RxJava applied [JavaDay Kyiv 2016]Igor Lozynskyi
 
Intro to RxJava/RxAndroid - GDG Munich Android
Intro to RxJava/RxAndroid - GDG Munich AndroidIntro to RxJava/RxAndroid - GDG Munich Android
Intro to RxJava/RxAndroid - GDG Munich AndroidEgor Andreevich
 
Luis Atencio on RxJS
Luis Atencio on RxJSLuis Atencio on RxJS
Luis Atencio on RxJSLuis Atencio
 

Tendances (20)

Introduction to rx java for android
Introduction to rx java for androidIntroduction to rx java for android
Introduction to rx java for android
 
Intro to ReactiveCocoa
Intro to ReactiveCocoaIntro to ReactiveCocoa
Intro to ReactiveCocoa
 
An Introduction to Reactive Cocoa
An Introduction to Reactive CocoaAn Introduction to Reactive Cocoa
An Introduction to Reactive Cocoa
 
Rxjs ppt
Rxjs pptRxjs ppt
Rxjs ppt
 
Functional Reactive Programming (FRP): Working with RxJS
Functional Reactive Programming (FRP): Working with RxJSFunctional Reactive Programming (FRP): Working with RxJS
Functional Reactive Programming (FRP): Working with RxJS
 
ClojureScript for the web
ClojureScript for the webClojureScript for the web
ClojureScript for the web
 
bluespec talk
bluespec talkbluespec talk
bluespec talk
 
ClojureScript loves React, DomCode May 26 2015
ClojureScript loves React, DomCode May 26 2015ClojureScript loves React, DomCode May 26 2015
ClojureScript loves React, DomCode May 26 2015
 
Parallel streams in java 8
Parallel streams in java 8Parallel streams in java 8
Parallel streams in java 8
 
Deep Dumpster Diving
Deep Dumpster DivingDeep Dumpster Diving
Deep Dumpster Diving
 
Introduction to RxJS
Introduction to RxJSIntroduction to RxJS
Introduction to RxJS
 
ReactiveCocoa in Practice
ReactiveCocoa in PracticeReactiveCocoa in Practice
ReactiveCocoa in Practice
 
Processing large-scale graphs with Google(TM) Pregel
Processing large-scale graphs with Google(TM) PregelProcessing large-scale graphs with Google(TM) Pregel
Processing large-scale graphs with Google(TM) Pregel
 
Full Stack Clojure
Full Stack ClojureFull Stack Clojure
Full Stack Clojure
 
Presto overview
Presto overviewPresto overview
Presto overview
 
RxJava applied [JavaDay Kyiv 2016]
RxJava applied [JavaDay Kyiv 2016]RxJava applied [JavaDay Kyiv 2016]
RxJava applied [JavaDay Kyiv 2016]
 
Rxjava meetup presentation
Rxjava meetup presentationRxjava meetup presentation
Rxjava meetup presentation
 
Rxjs ngvikings
Rxjs ngvikingsRxjs ngvikings
Rxjs ngvikings
 
Intro to RxJava/RxAndroid - GDG Munich Android
Intro to RxJava/RxAndroid - GDG Munich AndroidIntro to RxJava/RxAndroid - GDG Munich Android
Intro to RxJava/RxAndroid - GDG Munich Android
 
Luis Atencio on RxJS
Luis Atencio on RxJSLuis Atencio on RxJS
Luis Atencio on RxJS
 

Similaire à Streaming Dataflow with Apache Flink

Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015Till Rohrmann
 
When Web Services Go Bad
When Web Services Go BadWhen Web Services Go Bad
When Web Services Go BadSteve Loughran
 
Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...
Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...
Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...Flink Forward
 
Going Reactive with Relational Databases
Going Reactive with Relational DatabasesGoing Reactive with Relational Databases
Going Reactive with Relational DatabasesIvaylo Pashov
 
Tornado Web Server Internals
Tornado Web Server InternalsTornado Web Server Internals
Tornado Web Server InternalsPraveen Gollakota
 
Apache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorApache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorAljoscha Krettek
 
Loophole: Timing Attacks on Shared Event Loops in Chrome
Loophole: Timing Attacks on Shared Event Loops in ChromeLoophole: Timing Attacks on Shared Event Loops in Chrome
Loophole: Timing Attacks on Shared Event Loops in Chromecgvwzq
 
Unbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxiniUnbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxiniMonal Daxini
 
K. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward KeynoteK. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward KeynoteFlink Forward
 
Ufuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemUfuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemFlink Forward
 
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogicRakuten Group, Inc.
 
CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016
CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016
CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016Cloud Native Day Tel Aviv
 
Leveraging open source tools to gain insight into OpenStack Swift
Leveraging open source tools to gain insight into OpenStack SwiftLeveraging open source tools to gain insight into OpenStack Swift
Leveraging open source tools to gain insight into OpenStack SwiftDmitry Sotnikov
 
Journey through the ML model deployment to production by Stanko Kuveljic
Journey through the ML model deployment to production by Stanko KuveljicJourney through the ML model deployment to production by Stanko Kuveljic
Journey through the ML model deployment to production by Stanko KuveljicSmartCat
 
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...Flink Forward
 
Journey through the ML model deployment to production @DSC5
Journey through the ML model deployment to production @DSC5Journey through the ML model deployment to production @DSC5
Journey through the ML model deployment to production @DSC5SmartCat
 
A journey through the machine learning model deployment to production
A journey through the machine learning model deployment to productionA journey through the machine learning model deployment to production
A journey through the machine learning model deployment to productionInstitute of Contemporary Sciences
 
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteAdvanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteStreamNative
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 

Similaire à Streaming Dataflow with Apache Flink (20)

Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
 
When Web Services Go Bad
When Web Services Go BadWhen Web Services Go Bad
When Web Services Go Bad
 
Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...
Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...
Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...
 
Going Reactive with Relational Databases
Going Reactive with Relational DatabasesGoing Reactive with Relational Databases
Going Reactive with Relational Databases
 
Tornado Web Server Internals
Tornado Web Server InternalsTornado Web Server Internals
Tornado Web Server Internals
 
Apache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorApache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream Processor
 
Loophole: Timing Attacks on Shared Event Loops in Chrome
Loophole: Timing Attacks on Shared Event Loops in ChromeLoophole: Timing Attacks on Shared Event Loops in Chrome
Loophole: Timing Attacks on Shared Event Loops in Chrome
 
Flink. Pure Streaming
Flink. Pure StreamingFlink. Pure Streaming
Flink. Pure Streaming
 
Unbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxiniUnbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxini
 
K. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward KeynoteK. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward Keynote
 
Ufuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemUfuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one System
 
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic
 
CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016
CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016
CI Provisioning with OpenStack - Gidi Samuels - OpenStack Day Israel 2016
 
Leveraging open source tools to gain insight into OpenStack Swift
Leveraging open source tools to gain insight into OpenStack SwiftLeveraging open source tools to gain insight into OpenStack Swift
Leveraging open source tools to gain insight into OpenStack Swift
 
Journey through the ML model deployment to production by Stanko Kuveljic
Journey through the ML model deployment to production by Stanko KuveljicJourney through the ML model deployment to production by Stanko Kuveljic
Journey through the ML model deployment to production by Stanko Kuveljic
 
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...
 
Journey through the ML model deployment to production @DSC5
Journey through the ML model deployment to production @DSC5Journey through the ML model deployment to production @DSC5
Journey through the ML model deployment to production @DSC5
 
A journey through the machine learning model deployment to production
A journey through the machine learning model deployment to productionA journey through the machine learning model deployment to production
A journey through the machine learning model deployment to production
 
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteAdvanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 

Plus de huguk

Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifactahuguk
 
ether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introhuguk
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoophuguk
 
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...huguk
 
Extracting maximum value from data while protecting consumer privacy. Jason ...
Extracting maximum value from data while protecting consumer privacy.  Jason ...Extracting maximum value from data while protecting consumer privacy.  Jason ...
Extracting maximum value from data while protecting consumer privacy. Jason ...huguk
 
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM WatsonIntelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watsonhuguk
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLhuguk
 
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...huguk
 
Jonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & PitchingJonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & Pitchinghuguk
 
Signal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News MonitoringSignal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News Monitoringhuguk
 
Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your StartupDean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startuphuguk
 
Peter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapultPeter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapulthuguk
 
Cytora: Real-Time Political Risk Analysis
Cytora:  Real-Time Political Risk AnalysisCytora:  Real-Time Political Risk Analysis
Cytora: Real-Time Political Risk Analysishuguk
 
Cubitic: Predictive Analytics
Cubitic: Predictive AnalyticsCubitic: Predictive Analytics
Cubitic: Predictive Analyticshuguk
 
Bird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made SocialBird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made Socialhuguk
 
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine IntelligenceAiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligencehuguk
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive huguk
 
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...huguk
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthyhuguk
 
Fast real-time approximations using Spark streaming
Fast real-time approximations using Spark streamingFast real-time approximations using Spark streaming
Fast real-time approximations using Spark streaminghuguk
 

Plus de huguk (20)

Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
 
ether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp intro
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
 
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
 
Extracting maximum value from data while protecting consumer privacy. Jason ...
Extracting maximum value from data while protecting consumer privacy.  Jason ...Extracting maximum value from data while protecting consumer privacy.  Jason ...
Extracting maximum value from data while protecting consumer privacy. Jason ...
 
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM WatsonIntelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
 
Jonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & PitchingJonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & Pitching
 
Signal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News MonitoringSignal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News Monitoring
 
Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your StartupDean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startup
 
Peter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapultPeter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapult
 
Cytora: Real-Time Political Risk Analysis
Cytora:  Real-Time Political Risk AnalysisCytora:  Real-Time Political Risk Analysis
Cytora: Real-Time Political Risk Analysis
 
Cubitic: Predictive Analytics
Cubitic: Predictive AnalyticsCubitic: Predictive Analytics
Cubitic: Predictive Analytics
 
Bird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made SocialBird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made Social
 
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine IntelligenceAiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligence
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive
 
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
 
Fast real-time approximations using Spark streaming
Fast real-time approximations using Spark streamingFast real-time approximations using Spark streaming
Fast real-time approximations using Spark streaming
 

Dernier

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 

Dernier (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Streaming Dataflow with Apache Flink

  • 1. Ufuk Celebi uce@apache.org HUG London October 15, 2015 Streaming Data Flow with Apache Flink
  • 2. Recent History April ‘14 December ‘14 v0.5 v0.6 v0.7 April ‘15 Project Incubation Top Level Project v0.8 v0.9 Currently moving towards 0.10 and 1.0 release.
  • 3. What is Flink? Streaming Topologies Stream Time Window Count Low Latency Long Batch Pipelines Resource Utilization 1.2 1.4 1.5 1.2 0.8 0.9 1.0 0.8 Rating Matrix User Matrix Item Matrix 1.5 1.7 1.2 0.6 1.0 1.1 0.8 0.4 W X Y ZW X Y Z A B C D 4.0 4.5 5.0 3.5 2.0 3.5 4.0 2.0 1.0 = X User Machine Learning Iterative Algorithms Graph Analysis 53 1 2 4 0.5 0.2 0.9 0.3 0.1 0.4 0.7 Mutable State
  • 4. Overview Deployment
 Local (Single JVM) · Cluster (Standalone, YARN) DataStream API Unbounded Data DataSet API Bounded Data Runtime Distributed Streaming Data Flow Libraries Machine Learning · Graph Processing · SQL-like API
  • 5. Stream Processing Real world data is unbounded and is pushed to systems. BatchStreaming
  • 6. Stream Platform Architecture Server Logs Trxn Logs Sensor Logs Downstream Systems Flink – Analyze and correlate streams – Create derived streams Kafka – Gather and backup streams – Offer streams
  • 7. Cornerstones of Flink Low Latency for fast results. High Throughput to handle many events per second. Exactly-once guarantees for correct results. Intuitive APIs for productivity.
  • 8. DataStream API StreamExecutionEnvironment env = StreamExecutionEnvironment
 .getExecutionEnvironment() DataStream<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?”, ...); // DataStream Windowed WordCount DataStream<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) // (word, 1) .keyBy(0) // [word, [1, 1, …]] for 10 seconds .timeWindow(Time.of(10, TimeUnit.SECONDS)) .sum(1); // sum per word per 10 second window counts.print(); env.execute();
  • 9. DataStream API StreamExecutionEnvironment env = StreamExecutionEnvironment
 .getExecutionEnvironment() DataStream<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?”, ...); // DataStream Windowed WordCount DataStream<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) // (word, 1) .keyBy(0) // [word, [1, 1, …]] for 10 seconds .timeWindow(Time.of(10, TimeUnit.SECONDS)) .sum(1); // sum per word per 10 second window counts.print(); env.execute();
  • 10. DataStream API StreamExecutionEnvironment env = StreamExecutionEnvironment
 .getExecutionEnvironment() DataStream<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?”, ...); // DataStream Windowed WordCount DataStream<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) // (word, 1) .keyBy(0) // [word, [1, 1, …]] for 10 seconds .timeWindow(Time.of(10, TimeUnit.SECONDS)) .sum(1); // sum per word per 10 second window counts.print(); env.execute();
  • 11. DataStream API StreamExecutionEnvironment env = StreamExecutionEnvironment
 .getExecutionEnvironment() DataStream<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?”, ...); // DataStream Windowed WordCount DataStream<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) // (word, 1) .keyBy(0) // [word, [1, 1, …]] for 10 seconds .timeWindow(Time.of(10, TimeUnit.SECONDS)) .sum(1); // sum per word per 10 second window counts.print(); env.execute();
  • 12. DataStream API StreamExecutionEnvironment env = StreamExecutionEnvironment
 .getExecutionEnvironment() DataStream<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?”, ...); // DataStream Windowed WordCount DataStream<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) // (word, 1) .keyBy(0) // [word, [1, 1, …]] for 10 seconds .timeWindow(Time.of(10, TimeUnit.SECONDS)) .sum(1); // sum per word per 10 second window counts.print(); env.execute();
  • 13. DataStream API StreamExecutionEnvironment env = StreamExecutionEnvironment
 .getExecutionEnvironment() DataStream<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?”, ...); // DataStream Windowed WordCount DataStream<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) // (word, 1) .keyBy(0) // [word, [1, 1, …]] for 10 seconds .timeWindow(Time.of(10, TimeUnit.SECONDS)) .sum(1); // sum per word per 10 second window counts.print(); env.execute();
  • 14. DataStream API StreamExecutionEnvironment env = StreamExecutionEnvironment
 .getExecutionEnvironment() DataStream<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?”, ...); // DataStream Windowed WordCount DataStream<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) // (word, 1) .keyBy(0) // [word, [1, 1, …]] for 10 seconds .timeWindow(Time.of(10, TimeUnit.SECONDS)) .sum(1); // sum per word per 10 second window counts.print(); env.execute();
  • 15. DataStream API StreamExecutionEnvironment env = StreamExecutionEnvironment
 .getExecutionEnvironment() DataStream<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?”, ...); // DataStream Windowed WordCount DataStream<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) // (word, 1) .keyBy(0) // [word, [1, 1, …]] for 10 seconds .timeWindow(Time.of(10, TimeUnit.SECONDS)) .sum(1); // sum per word per 10 second window counts.print(); env.execute();
  • 16. Pipelining s1 t1 w1 s2 t2 w2 Source Tokenizer Window Count Complete Pipeline Online Concurrently.
  • 17. Pipelining s1 t1 w1 s2 t2 w2 Source Tokenizer Window Count Complete Pipeline Online Concurrently. Chained tasks
  • 18. Pipelining s1 s2 t2 w2 t1 w1 Source Tokenizer Window Count Complete Pipeline Online Concurrently. Chained tasks Pipelined Shuffle
  • 19.
  • 20. Streaming Fault Tolerance At Least Once • Ensure that all operators see all events. Exactly Once • Ensure that all operators see all events. • Do not perform duplicates updates to operator state.
  • 21. Streaming Fault Tolerance At Least Once • Ensure that all operators see all events. Exactly Once • Ensure that all operators see all events. • Do not perform duplicates updates to operator state. Flink guarantees exactly once processing.
  • 22. Distributed Snaphots Barriers flow through the topology in line with data. Flink guarantees exactly once processing. Part of snapshot
  • 23. Distributed Snaphots Barriers flow through the topology in line with data. Flink guarantees exactly once processing. Part of snapshot
  • 24. Distributed Snaphots Barriers flow through the topology in line with data. Flink guarantees exactly once processing. Part of snapshot
  • 25. Distributed Snaphots Barriers flow through the topology in line with data. Flink guarantees exactly once processing. Part of snapshot
  • 26. Distributed Snaphots Barriers flow through the topology in line with data. Flink guarantees exactly once processing. Part of snapshot
  • 27. Distributed Snaphots Barriers flow through the topology in line with data. Flink guarantees exactly once processing. Part of snapshot
  • 28. Distributed Snaphots Barriers flow through the topology in line with data. Flink guarantees exactly once processing. Part of snapshot
  • 29. Distributed Snaphots Barriers flow through the topology in line with data. Flink guarantees exactly once processing. Part of snapshot
  • 30. Distributed Snaphots Barriers flow through the topology in line with data. Flink guarantees exactly once processing. Part of snapshot
  • 31. Distributed Snaphots Barriers flow through the topology in line with data. Flink guarantees exactly once processing. Part of snapshot
  • 32. Distributed Snaphots Flink guarantees exactly once processing. 
 JobManager Master State Backend Ceckpoint Data Source 1: State 1: Source 2: State 2: Source 3: Sink 1: Source 4: Sink 2: Offset: 6791 Offset: 7252 Offset: 5589 Offset: 6843
  • 33. Distributed Snaphots Flink guarantees exactly once processing. 
 JobManager Master State Backend Ceckpoint Data Source 1: State 1: Source 2: State 2: Source 3: Sink 1: Source 4: Sink 2: Offset: 6791 Offset: 7252 Offset: 5589 Offset: 6843 Start Checkpoint Message
  • 34. Distributed Snaphots Flink guarantees exactly once processing. 
 JobManager Master State Backend Ceckpoint Data Source 1: 6791 State 1: Source 2: 7252 State 2: Source 3: 5589 Sink 1: Source 4: 6843 Sink 2: Emit Barriers Acknowledge with Position
  • 35. Distributed Snaphots Flink guarantees exactly once processing. 
 JobManager Master State Backend Ceckpoint Data Source 1: 6791 State 1: Source 2: 7252 State 2: Source 3: 5589 Sink 1: Source 4: 6843 Sink 2: Received barrier at each input
  • 36. Distributed Snaphots Flink guarantees exactly once processing. 
 JobManager Master State Backend Ceckpoint Data Source 1: 6791 State 1: Source 2: 7252 State 2: Source 3: 5589 Sink 1: Source 4: 6843 Sink 2: s1 Write Snapshot of its state Received barrier at each input
  • 37. Distributed Snaphots Flink guarantees exactly once processing. 
 JobManager Master State Backend Ceckpoint Data Source 1: 6791 State 1: PTR1 Source 2: 7252 State 2: PTR2 Source 3: 5589 Sink 1: Source 4: 6843 Sink 2: s1 Acknowledge with pointer to state s2
  • 38. Distributed Snaphots Flink guarantees exactly once processing. 
 JobManager Master State Backend Ceckpoint Data Source 1: 6791 State 1: PTR1 Source 2: 7252 State 2: PTR2 Source 3: 5589 Sink 1: ACK Source 4: 6843 Sink 2: ACK s1 s2 Acknowledge Checkpoint Received barrier at each input
  • 39. Distributed Snaphots Flink guarantees exactly once processing. 
 JobManager Master State Backend Ceckpoint Data Source 1: 6791 State 1: PTR1 Source 2: 7252 State 2: PTR2 Source 3: 5589 Sink 1: ACK Source 4: 6843 Sink 2: ACK s1 s2
  • 40. Operator State User-defined state • Flink’s transformations are long running operators • Feel free to keep objects around • Hooks to include into system’s checkpoint Windowed streams • Time, count, and data-driven windows • Managed by the system
  • 41. Batch on Streaming DataStream API Unbounded Data DataSet API Bounded Data Runtime Distributed Streaming Data Flow Libraries Machine Learning · Graph Processing · SQL-like API
  • 42. Batch on Streaming Run a bounded stream (data set) on
 a stream processor. Bounded data set Unbounded data stream
  • 43. Batch on Streaming Stream Windows Pipelined Data Exchange Global View Pipelined or Blocking Data Exchange Infinite Streams Finite Streams Run a bounded stream (data set) on
 a stream processor.
  • 46. Batch Pipelines Data exchange
 is mostly streamed Some operators block (e.g. sort, hash table)
  • 47. DataSet API ExecutionEnvironment env = ExecutionEnvironment
 .getExecutionEnvironment() DataSet<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?”, ...); // DataSet WordCount DataSet<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) // (word, 1) .groupBy(0) // [word, [1, 1, …]] .sum(1); // sum per word for all occurrences counts.print();
  • 48. DataStream API ExecutionEnvironment env = ExecutionEnvironment
 .getExecutionEnvironment() DataSet<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?”, ...); // DataSet WordCount DataSet<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) // (word, 1) .groupBy(0) // [word, [1, 1, …]] .sum(1); // sum per word for all occurrences counts.print();
  • 49. DataStream API ExecutionEnvironment env = ExecutionEnvironment
 .getExecutionEnvironment() DataSet<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?”, ...); // DataSet WordCount DataSet<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) // (word, 1) .groupBy(0) // [word, [1, 1, …]] .sum(1); // sum per word for all occurrences counts.print();
  • 50. DataStream API ExecutionEnvironment env = ExecutionEnvironment
 .getExecutionEnvironment() DataSet<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?”, ...); // DataSet WordCount DataSet<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) // (word, 1) .groupBy(0) // [word, [1, 1, …]] .sum(1); // sum per word for all occurrences counts.print();
  • 51. DataStream API ExecutionEnvironment env = ExecutionEnvironment
 .getExecutionEnvironment() DataSet<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?”, ...); // DataSet WordCount DataSet<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) // (word, 1) .groupBy(0) // [word, [1, 1, …]] .sum(1); // sum per word for all occurrences counts.print();
  • 52. DataStream API ExecutionEnvironment env = ExecutionEnvironment
 .getExecutionEnvironment() DataSet<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?”, ...); // DataSet WordCount DataSet<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) // (word, 1) .groupBy(0) // [word, [1, 1, …]] .sum(1); // sum per word for all occurrences counts.print();
  • 53. DataStream API ExecutionEnvironment env = ExecutionEnvironment
 .getExecutionEnvironment() DataSet<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?”, ...); // DataSet WordCount DataSet<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) // (word, 1) .groupBy(0) // [word, [1, 1, …]] .sum(1); // sum per word for all occurrences counts.print();
  • 54. Batch-specific optimizations Managed memory • On- and off-heap memory • Internal operators (e.g. join or sort) with out-of-core support • Serialization stack for user-types Cost-based optimizer • Program adapts to changing data size
  • 55. Getting Started Project Page: http://flink.apache.org
  • 56. Getting Started Project Page: http://flink.apache.org Quickstarts: Java & Scala API
  • 57. Getting Started Project Page: http://flink.apache.org Docs: Programming Guides
  • 58. Getting Started Project Page: http://flink.apache.org Get Involved: Mailing Lists, Stack Overflow, IRC, …