SlideShare une entreprise Scribd logo
1  sur  45
Télécharger pour lire hors ligne
The Flink Big Data 
Analytics Platform 
Marton Balassi, Gyula Fora" 
{mbalassi, gyfora}@apache.org 
Hungarian Academy of Sciences
What is Apache Flink? 
• Started in 2009 by the Berlin-based database 
research groups 
• In the Apache Incubator since mid 2014 
• 61 contributors as of the 0.7.0 release 
Open 
Source 
• Fast, general purpose distributed data processing 
system 
• Combining batch and stream processing 
• Up to 100x faster than Hadoop 
Fast + 
Reliable 
• Programming APIs for Java and Scala 
• Tested on large clusters 
Ready to 
use 
11/18/2014 Flink - M. Balassi & Gy. Fora 2
What is Apache Flink? 
Master 
Worker 
Worker 
Flink 
Cluster 
Analytical Program 
Flink 
Client 
& 
Op0mizer 
11/18/2014 Flink - M. Balassi & Gy. Fora 3
This Talk 
• Introduction to Flink 
• API overview 
• Distinguishing Flink 
• Flink from a user perspective 
• Performance 
• Flink roadmap and closing 
11/18/2014 Flink - M. Balassi & Gy. Fora 4
Open source Big Data Landscape 
Hive 
Mahout 
MapReduce 
Crunch 
Flink 
Spark 
Storm 
Yarn 
Mesos 
HDFS 
Cascading 
Tez 
Pig 
Applica3ons 
Data 
processing 
engines 
App 
and 
resource 
management 
Storage, 
streams 
HBase 
KaCa 
… 
11/18/2014 Flink - M. Balassi & Gy. Fora 5
Flink stack 
Common 
API 
Storage 
Streams 
Hybrid 
Batch/Streaming 
Run0me 
Files 
HDFS 
S3 
Cluster 
Manager 
Na0ve YARN 
EC2 
Flink 
Op0mizer 
Scala 
API 
(batch) 
Graph 
API 
(„Spargel“) 
JDBC 
Rabbit 
Redis 
Azure 
KaCa 
MQ 
… 
Java 
Collec0ons 
Streams 
Builder 
Apache 
Tez 
Python 
API 
Java 
API 
(streaming) 
Apache 
MRQL 
Batch 
Streaming 
Java 
API 
(batch) 
Local 
Execu0on 
11/18/2014 Flink - M. Balassi & Gy. Fora 6
Flink APIs
Programming model 
Data abstractions: Data Set, Data Stream 
DataSet/Stream 
A 
A (1) 
A (2) 
DataSet/Stream 
B 
B (1) 
B (2) 
C (1) 
C (2) 
X 
X 
Y 
Y 
Program 
Parallel Execution 
X Y 
Operator X Operator Y 
DataSet/Stream 
C 
11/18/2014 Flink - M. Balassi & Gy. Fora 8
Flexible pipelines 
Map, FlatMap, MapPartition, Filter, Project, Reduce, 
ReduceGroup, Aggregate, Distinct, Join, CoGoup, Cross, 
Iterate, Iterate Delta, Iterate-Vertex-Centric, Windowing 
Reduce 
Join 
Map 
Reduce 
Map 
Iterate 
Source 
Sink 
Source 
11/18/2014 Flink - M. Balassi & Gy. Fora 9
WordCount, Java API 
DataSet<String> text = env.readTextFile(input); 
DataSet<Tuple2<String, Integer>> result = text 
.flatMap((str, out) -> { 
for (String token : value.split("W")) { 
out.collect(new Tuple2<>(token, 1)); 
}) 
.groupBy(0) 
.sum(1); 
11/18/2014 Flink - M. Balassi & Gy. Fora 10
WordCount, Scala API 
val input = env.readTextFile(input); 
val words = input flatMap { line => line.split("W+") } 
val counts = words groupBy { word => word } count() 
11/18/2014 Flink - M. Balassi & Gy. Fora 11
WordCount, Streaming API 
DataStream<String> text = env.readTextFile(input); 
DataStream<Tuple2<String, Integer>> result = text 
.flatMap((str, out) -> { 
for (String token : value.split("W")) { 
out.collect(new Tuple2<>(token, 1)); 
}) 
.groupBy(0) 
.sum(1); 
11/18/2014 Flink - M. Balassi & Gy. Fora 12
Is there anything beyond WordCount? 
11/18/2014 Flink - M. Balassi & Gy. Fora 13
Beyond Key/Value Pairs 
// outputs pairs of pages and impressions 
class Impression { 
public String url; 
public long count; 
} 
class Page { 
public String url; 
public String topic; 
} 
DataSet<Page> pages = ...; 
DataSet<Impression> impressions = ...; 
DataSet<Impression> aggregated = 
impressions 
.groupBy("url") 
.sum("count"); 
pages.join(impressions).where("url").equalTo("url") 
.print() 
// outputs pairs of matching pages and impressions 
11/18/2014 Flink - M. Balassi & Gy. Fora 14
Preview: Logical Types 
DataSet<Row> dates = env.readCsv(...).as("order_id", "date"); 
DataSet<Row> sessions = env.readCsv(...).as("id", "session"); 
DataSet<Row> joined = dates 
.join(session).where("order_id").equals("id"); 
joined.groupBy("date").reduceGroup(new SessionFilter()) 
class SessionFilter implements GroupReduceFunction<SessionType> { 
public void reduce(Iterable<SessionType> value, Collector out){ 
... 
} 
} 
public class SessionType { 
public String order_id; 
public Date date; 
public String session; 
} 
11/18/2014 Flink - M. Balassi & Gy. Fora 15
Distinguishing Flink
Hybrid batch/streaming runtime 
• True data streaming on the runtime layer 
• No unnecessary synchronization steps 
• Batch and stream processing seamlessly integrated 
• Easy testing and development for both approaches 
11/18/2014 Flink - M. Balassi & Gy. Fora 17
Flink Streaming 
• Most Data Set operators are also available for 
Data Streams 
• Temporal and streaming specific operators 
– Windowing semantics (applying transformation on chunks of data) 
– Window reduce, join, cross etc. 
• Connectors for different data sources 
– Kafka, Flume, RabbitMQ, Twitter etc. 
• Support for iterative stream processing 
11/18/2014 Flink - M. Balassi & Gy. Fora 18
Flink Streaming 
//Build new model every minute 
DataStream<Double[]> model= env 
.addSource(new TrainingDataSource()) 
.window(60000) 
.reduceGroup(new ModelBuilder()); 
//Predict new data using the most up-to-date model 
DataStream<Integer> prediction = env 
.addSource(new NewDataSource()) 
.connect(model) 
.map(new Predictor()); 
M 
N. Data Output 
P 
T. Data 
11/18/2014 Flink - M. Balassi & Gy. Fora 19
Lambda architecture 
Want to do some dev-ops? 
Source: https://www.mapr.com/developercentral/lambda-architecture 
11/18/2014 Flink - M. Balassi & Gy. Fora 20
Lambda architecture in Flink 
- One System 
- One API 
- One cluster 
11/18/2014 Flink - M. Balassi & Gy. Fora 21
Iterations in other systems 
Client 
Loop 
outside 
the 
system 
Step Step Step Step Step 
Client 
Loop 
outside 
the 
system 
Step Step Step Step Step 
11/18/2014 Flink - M. Balassi & Gy. Fora 22
Iterations in Flink 
Streaming dataflow 
with feedback 
map 
red. 
join 
join 
System is iteration-aware, performs automatic optimization 
11/18/2014 Flink - M. Balassi & Gy. Fora 23
Dependability 
JVM Heap 
• Flink manages its own memory 
• Caching and data processing happens in a dedicated 
memory fraction 
• System never breaks the 
JVM heap, gracefully spills 
Unmanaged 
Heap 
Flink 
Managed 
Heap 
Network 
Buffers 
User 
Code 
Hashing/Sor0ng/Caching 
Shuffles/Broadcasts 
11/18/2014 Flink - M. Balassi & Gy. Fora 24
Operating on" 
Serialized Data 
• serializes data every time 
à Highly robust, never gives up on you 
• works on objects, RDDs may be stored serialized 
àSerialization considered slow, only when needed 
• makes serialization really cheap: 
à partial deserialization, operates on serialized form 
à Efficient and robust! 
11/18/2014 Flink - M. Balassi & Gy. Fora 25
Flink from a user perspective
Flink programs run everywhere 
Cluster 
(Batch) 
Local 
Debugging 
Cluster 
(Streaming) 
Fink 
Run3me 
or 
Apache 
Tez 
As 
Java 
Collec0on 
Programs 
Embedded 
(e.g., 
Web 
Container) 
11/18/2014 Flink - M. Balassi & Gy. Fora 27
Migrate Easily 
Flink out-of-the-box supports 
• Hadoop data types (writables) 
• Hadoop Input/Output Formats 
• Hadoop functions and object model 
Input Map Reduce Output 
S DataSet Red DataSet Join DataSet 
Output 
DataSet Map DataSet 
Input 
11/18/2014 Flink - M. Balassi & Gy. Fora 28
Little tuning or configuration required 
• Requires no memory thresholds to configure 
– Flink manages its own memory 
• Requires no complicated network configs 
– Pipelining engine requires much less memory for data exchange 
• Requires no serializers to be configured 
– Flink handles its own type extraction and data representation 
• Programs can be adjusted to data automatically 
– Flink’s optimizer can choose execution strategies automatically 
11/18/2014 Flink - M. Balassi & Gy. Fora 29
Understanding Programs 
Visualizes the operations and the data 
movement of programs 
Analyze after execution 
Screenshot from Flink’s plan visualizer 
11/18/2014 Flink - M. Balassi & Gy. Fora 30
Understanding Programs 
Analyze after execution (times, stragglers, …) 
11/18/2014 Flink - M. Balassi & Gy. Fora 31
Performance
Distributed Grep 
Filter 
Term 1 
HDFS 
Filter 
Term 2 
Filter 
Term 3 
Matches 
Matches 
Matches 
• 1 TB of data (log files) 
• 24 machines with 
• 32 GB Memory 
• Regular HDDs 
• HDFS 2.4.0 
• Flink 0.7-incubating- 
SNAPSHOT 
• Spark 1.2.0-SNAPSHOT 
èFlink up to 2.5x faster 
11/18/2014 Flink - M. Balassi & Gy. Fora 33
Distributed Grep: Flink Execution 
11/18/2014 Flink - M. Balassi & Gy. Fora 34
Distributed Grep: Spark Execution 
Spark executes the job in 3 stages: 
Filter 
HDFS Term 1 Matches 
Stage 1 
Filter 
Stage 2 HDFS Term 2 Matches 
Filter 
Stage 3 HDFS Term 3 Matches 
11/18/2014 Flink - M. Balassi & Gy. Fora 35
Spark in-memory pinning 
Filter 
Term 1 
Cache in-memory 
to avoid reading the 
data for each filter 
HDFS 
Filter 
Term 2 
Filter 
Term 3 
Matches 
Matches 
Matches 
In- 
Memory 
RDD 
JavaSparkContext sc = new JavaSparkContext(conf); 
JavaRDD<String> file = sc.textFile(inFile).persist(StorageLevel.MEMORY_AND_DISK()); 
for(int p = 0; p < patterns.length; p++) { 
final String pattern = patterns[p]; 
JavaRDD<String> res = file.filter(new Function<String, Boolean>() { … }); } 
11/18/2014 Flink - M. Balassi & Gy. Fora 36
Spark in-memory pinning 
37 minutes 
9 
minutes 
RDD is 100% in-memory 
Spark starts spilling 
RDD to disk 
11/18/2014 Flink - M. Balassi & Gy. Fora 37
PageRank 
• Dataset: 
– Twitter Follower Graph 
– 41,652,230 vertices (users) 
– 1,468,365,182 edges 
(followings) 
– 12 GB input data 
11/18/2014 Flink - M. Balassi & Gy. Fora 38
PageRank results 
11/18/2014 Flink - M. Balassi & Gy. Fora 39
PageRank on Flink with " 
Delta Iterations 
• The algorithm runs 60 iterations until convergence (runtime 
includes convergence check) 
100 minutes 
8.5 minutes 
11/18/2014 Flink - M. Balassi & Gy. Fora 40
Again, explaining the difference 
On average, a (delta) interation runs for 6.7 seconds 
Flink (Bulk): 48 sec. 
Spark (Bulk): 99 sec. 
11/18/2014 Flink - M. Balassi & Gy. Fora 41
Streaming performance 
11/18/2014 Flink - M. Balassi & Gy. Fora 42
Flink Roadmap 
• Flink has a major release every 3 months," 
with >=1 big-fixing releases in-between 
• Finer-grained fault tolerance 
• Logical (SQL-like) field addressing 
• Python API 
• Flink Streaming, Lambda architecture support 
• Flink on Tez 
• ML on Flink (e.g., Mahout DSL) 
• Graph DSL on Flink 
• … and more 
11/18/2014 Flink - M. Balassi & Gy. Fora 43
11/18/2014 Flink - M. Balassi & Gy. Fora 44
http://flink.incubator.apache.org 
github.com/apache/incubator-flink 
@ApacheFlink 
Marton Balassi, Gyula Fora" 
{mbalassi, gyfora}@apache.org

Contenu connexe

Tendances

SICS: Apache Flink Streaming
SICS: Apache Flink StreamingSICS: Apache Flink Streaming
SICS: Apache Flink StreamingTuri, Inc.
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkFabian Hueske
 
Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Fabian Hueske
 
Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015Andra Lungu
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System OverviewFlink Forward
 
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data CompanionS. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data CompanionFlink Forward
 
Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Stephan Ewen
 
Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonApache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonStephan Ewen
 
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)Robert Metzger
 
Apache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsApache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsStephan Ewen
 
Fabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on FlinkFabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on FlinkFlink Forward
 
Marton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream ProcessingMarton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream ProcessingFlink Forward
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapKostas Tzoumas
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Apache Flink Taiwan User Group
 
January 2015 HUG: Apache Flink: Fast and reliable large-scale data processing
January 2015 HUG: Apache Flink:  Fast and reliable large-scale data processingJanuary 2015 HUG: Apache Flink:  Fast and reliable large-scale data processing
January 2015 HUG: Apache Flink: Fast and reliable large-scale data processingYahoo Developer Network
 
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Robert Metzger
 
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaMohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaFlink Forward
 
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache FlinkAlbert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache FlinkFlink Forward
 

Tendances (20)

SICS: Apache Flink Streaming
SICS: Apache Flink StreamingSICS: Apache Flink Streaming
SICS: Apache Flink Streaming
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
 
Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.
 
Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System Overview
 
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data CompanionS. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
 
Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016
 
Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonApache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World London
 
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
 
Apache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsApache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and Friends
 
Fabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on FlinkFabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on Flink
 
Marton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream ProcessingMarton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream Processing
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
 
The Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache FlinkThe Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache Flink
 
January 2015 HUG: Apache Flink: Fast and reliable large-scale data processing
January 2015 HUG: Apache Flink:  Fast and reliable large-scale data processingJanuary 2015 HUG: Apache Flink:  Fast and reliable large-scale data processing
January 2015 HUG: Apache Flink: Fast and reliable large-scale data processing
 
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
 
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaMohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
 
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache FlinkAlbert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache FlinkUnified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
 

En vedette

K. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward KeynoteK. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward KeynoteFlink Forward
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 
Apache Flink Training: DataStream API Part 1 Basic
 Apache Flink Training: DataStream API Part 1 Basic Apache Flink Training: DataStream API Part 1 Basic
Apache Flink Training: DataStream API Part 1 BasicFlink Forward
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinFlink Forward
 
Fabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and BytesFabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and BytesFlink Forward
 
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-ComposeSimon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-ComposeFlink Forward
 
Assaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at ScaleAssaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at ScaleFlink Forward
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLFlink Forward
 
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Flink Forward
 
Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Flink Forward
 
Kamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed ExperimentsKamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed ExperimentsFlink Forward
 
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache FlinkTill Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache FlinkFlink Forward
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Stephan Ewen
 
Matthias J. Sax – A Tale of Squirrels and Storms
Matthias J. Sax – A Tale of Squirrels and StormsMatthias J. Sax – A Tale of Squirrels and Storms
Matthias J. Sax – A Tale of Squirrels and StormsFlink Forward
 
Anwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in FlinkAnwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in FlinkFlink Forward
 
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache FlinkSuneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache FlinkFlink Forward
 
Michael Häusler – Everyday flink
Michael Häusler – Everyday flinkMichael Häusler – Everyday flink
Michael Häusler – Everyday flinkFlink Forward
 
Apache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityApache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityFabian Hueske
 
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeChris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeFlink Forward
 
Apache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API BasicsApache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API BasicsFlink Forward
 

En vedette (20)

K. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward KeynoteK. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward Keynote
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Apache Flink Training: DataStream API Part 1 Basic
 Apache Flink Training: DataStream API Part 1 Basic Apache Flink Training: DataStream API Part 1 Basic
Apache Flink Training: DataStream API Part 1 Basic
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
 
Fabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and BytesFabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and Bytes
 
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-ComposeSimon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
 
Assaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at ScaleAssaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at Scale
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
 
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
 
Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced
 
Kamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed ExperimentsKamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed Experiments
 
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache FlinkTill Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
 
Matthias J. Sax – A Tale of Squirrels and Storms
Matthias J. Sax – A Tale of Squirrels and StormsMatthias J. Sax – A Tale of Squirrels and Storms
Matthias J. Sax – A Tale of Squirrels and Storms
 
Anwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in FlinkAnwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in Flink
 
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache FlinkSuneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
 
Michael Häusler – Everyday flink
Michael Häusler – Everyday flinkMichael Häusler – Everyday flink
Michael Häusler – Everyday flink
 
Apache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityApache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce Compatibility
 
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeChris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
 
Apache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API BasicsApache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API Basics
 

Similaire à Flink Apachecon Presentation

Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkSlim Baltagi
 
Enabling Exploratory Analysis of Large Data with Apache Spark and R
Enabling Exploratory Analysis of Large Data with Apache Spark and REnabling Exploratory Analysis of Large Data with Apache Spark and R
Enabling Exploratory Analysis of Large Data with Apache Spark and RDatabricks
 
Apache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureApache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureGyula Fóra
 
Federated Queries Across Both Different Storage Mediums and Different Data En...
Federated Queries Across Both Different Storage Mediums and Different Data En...Federated Queries Across Both Different Storage Mediums and Different Data En...
Federated Queries Across Both Different Storage Mediums and Different Data En...VMware Tanzu
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksSlim Baltagi
 
carrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIcarrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIYoni Davidson
 
Data Analysis With Apache Flink
Data Analysis With Apache FlinkData Analysis With Apache Flink
Data Analysis With Apache FlinkDataWorks Summit
 
Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)Aljoscha Krettek
 
ALM Search Presentation for the VSS Arch Council
ALM Search Presentation for the VSS Arch CouncilALM Search Presentation for the VSS Arch Council
ALM Search Presentation for the VSS Arch CouncilSunita Shrivastava
 
Neo4j Vision and Roadmap
Neo4j Vision and Roadmap Neo4j Vision and Roadmap
Neo4j Vision and Roadmap Neo4j
 
Unified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache FlinkUnified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache FlinkSlim Baltagi
 
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...UA DevOps Conference
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapePaco Nathan
 
New directions for Apache Spark in 2015
New directions for Apache Spark in 2015New directions for Apache Spark in 2015
New directions for Apache Spark in 2015Databricks
 
Using DuckDB ArrowFlight to Power a Feature Store
Using DuckDB ArrowFlight to Power a Feature StoreUsing DuckDB ArrowFlight to Power a Feature Store
Using DuckDB ArrowFlight to Power a Feature StoreAlbaTorrado
 
WarsawITDays_ ApacheNiFi202
WarsawITDays_ ApacheNiFi202WarsawITDays_ ApacheNiFi202
WarsawITDays_ ApacheNiFi202Timothy Spann
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFramesDatabricks
 

Similaire à Flink Apachecon Presentation (20)

Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
 
Enabling Exploratory Analysis of Large Data with Apache Spark and R
Enabling Exploratory Analysis of Large Data with Apache Spark and REnabling Exploratory Analysis of Large Data with Apache Spark and R
Enabling Exploratory Analysis of Large Data with Apache Spark and R
 
Apache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureApache Flink: Past, Present and Future
Apache Flink: Past, Present and Future
 
Federated Queries Across Both Different Storage Mediums and Different Data En...
Federated Queries Across Both Different Storage Mediums and Different Data En...Federated Queries Across Both Different Storage Mediums and Different Data En...
Federated Queries Across Both Different Storage Mediums and Different Data En...
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
 
carrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIcarrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-API
 
Data Analysis With Apache Flink
Data Analysis With Apache FlinkData Analysis With Apache Flink
Data Analysis With Apache Flink
 
Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)
 
ALM Search Presentation for the VSS Arch Council
ALM Search Presentation for the VSS Arch CouncilALM Search Presentation for the VSS Arch Council
ALM Search Presentation for the VSS Arch Council
 
Neo4j Vision and Roadmap
Neo4j Vision and Roadmap Neo4j Vision and Roadmap
Neo4j Vision and Roadmap
 
Flink internals web
Flink internals web Flink internals web
Flink internals web
 
Unified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache FlinkUnified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache Flink
 
Bds session 13 14
Bds session 13 14Bds session 13 14
Bds session 13 14
 
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscape
 
New directions for Apache Spark in 2015
New directions for Apache Spark in 2015New directions for Apache Spark in 2015
New directions for Apache Spark in 2015
 
Using DuckDB ArrowFlight to Power a Feature Store
Using DuckDB ArrowFlight to Power a Feature StoreUsing DuckDB ArrowFlight to Power a Feature Store
Using DuckDB ArrowFlight to Power a Feature Store
 
WarsawITDays_ ApacheNiFi202
WarsawITDays_ ApacheNiFi202WarsawITDays_ ApacheNiFi202
WarsawITDays_ ApacheNiFi202
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
 

Plus de Gyula Fóra

Real-time analytics as a service at King
Real-time analytics as a service at King Real-time analytics as a service at King
Real-time analytics as a service at King Gyula Fóra
 
RBea: Scalable Real-Time Analytics at King
RBea: Scalable Real-Time Analytics at KingRBea: Scalable Real-Time Analytics at King
RBea: Scalable Real-Time Analytics at KingGyula Fóra
 
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Gyula Fóra
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemGyula Fóra
 
Real-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop SummitReal-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop SummitGyula Fóra
 

Plus de Gyula Fóra (6)

Real-time analytics as a service at King
Real-time analytics as a service at King Real-time analytics as a service at King
Real-time analytics as a service at King
 
RBea: Scalable Real-Time Analytics at King
RBea: Scalable Real-Time Analytics at KingRBea: Scalable Real-Time Analytics at King
RBea: Scalable Real-Time Analytics at King
 
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
Real-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop SummitReal-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop Summit
 
Flink Streaming
Flink StreamingFlink Streaming
Flink Streaming
 

Dernier

Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computationsit20ad004
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts ServiceCall Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Servicejennyeacort
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/managementakshesh doshi
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 

Dernier (20)

Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computation
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
 
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts ServiceCall Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/management
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 

Flink Apachecon Presentation

  • 1. The Flink Big Data Analytics Platform Marton Balassi, Gyula Fora" {mbalassi, gyfora}@apache.org Hungarian Academy of Sciences
  • 2. What is Apache Flink? • Started in 2009 by the Berlin-based database research groups • In the Apache Incubator since mid 2014 • 61 contributors as of the 0.7.0 release Open Source • Fast, general purpose distributed data processing system • Combining batch and stream processing • Up to 100x faster than Hadoop Fast + Reliable • Programming APIs for Java and Scala • Tested on large clusters Ready to use 11/18/2014 Flink - M. Balassi & Gy. Fora 2
  • 3. What is Apache Flink? Master Worker Worker Flink Cluster Analytical Program Flink Client & Op0mizer 11/18/2014 Flink - M. Balassi & Gy. Fora 3
  • 4. This Talk • Introduction to Flink • API overview • Distinguishing Flink • Flink from a user perspective • Performance • Flink roadmap and closing 11/18/2014 Flink - M. Balassi & Gy. Fora 4
  • 5. Open source Big Data Landscape Hive Mahout MapReduce Crunch Flink Spark Storm Yarn Mesos HDFS Cascading Tez Pig Applica3ons Data processing engines App and resource management Storage, streams HBase KaCa … 11/18/2014 Flink - M. Balassi & Gy. Fora 5
  • 6. Flink stack Common API Storage Streams Hybrid Batch/Streaming Run0me Files HDFS S3 Cluster Manager Na0ve YARN EC2 Flink Op0mizer Scala API (batch) Graph API („Spargel“) JDBC Rabbit Redis Azure KaCa MQ … Java Collec0ons Streams Builder Apache Tez Python API Java API (streaming) Apache MRQL Batch Streaming Java API (batch) Local Execu0on 11/18/2014 Flink - M. Balassi & Gy. Fora 6
  • 8. Programming model Data abstractions: Data Set, Data Stream DataSet/Stream A A (1) A (2) DataSet/Stream B B (1) B (2) C (1) C (2) X X Y Y Program Parallel Execution X Y Operator X Operator Y DataSet/Stream C 11/18/2014 Flink - M. Balassi & Gy. Fora 8
  • 9. Flexible pipelines Map, FlatMap, MapPartition, Filter, Project, Reduce, ReduceGroup, Aggregate, Distinct, Join, CoGoup, Cross, Iterate, Iterate Delta, Iterate-Vertex-Centric, Windowing Reduce Join Map Reduce Map Iterate Source Sink Source 11/18/2014 Flink - M. Balassi & Gy. Fora 9
  • 10. WordCount, Java API DataSet<String> text = env.readTextFile(input); DataSet<Tuple2<String, Integer>> result = text .flatMap((str, out) -> { for (String token : value.split("W")) { out.collect(new Tuple2<>(token, 1)); }) .groupBy(0) .sum(1); 11/18/2014 Flink - M. Balassi & Gy. Fora 10
  • 11. WordCount, Scala API val input = env.readTextFile(input); val words = input flatMap { line => line.split("W+") } val counts = words groupBy { word => word } count() 11/18/2014 Flink - M. Balassi & Gy. Fora 11
  • 12. WordCount, Streaming API DataStream<String> text = env.readTextFile(input); DataStream<Tuple2<String, Integer>> result = text .flatMap((str, out) -> { for (String token : value.split("W")) { out.collect(new Tuple2<>(token, 1)); }) .groupBy(0) .sum(1); 11/18/2014 Flink - M. Balassi & Gy. Fora 12
  • 13. Is there anything beyond WordCount? 11/18/2014 Flink - M. Balassi & Gy. Fora 13
  • 14. Beyond Key/Value Pairs // outputs pairs of pages and impressions class Impression { public String url; public long count; } class Page { public String url; public String topic; } DataSet<Page> pages = ...; DataSet<Impression> impressions = ...; DataSet<Impression> aggregated = impressions .groupBy("url") .sum("count"); pages.join(impressions).where("url").equalTo("url") .print() // outputs pairs of matching pages and impressions 11/18/2014 Flink - M. Balassi & Gy. Fora 14
  • 15. Preview: Logical Types DataSet<Row> dates = env.readCsv(...).as("order_id", "date"); DataSet<Row> sessions = env.readCsv(...).as("id", "session"); DataSet<Row> joined = dates .join(session).where("order_id").equals("id"); joined.groupBy("date").reduceGroup(new SessionFilter()) class SessionFilter implements GroupReduceFunction<SessionType> { public void reduce(Iterable<SessionType> value, Collector out){ ... } } public class SessionType { public String order_id; public Date date; public String session; } 11/18/2014 Flink - M. Balassi & Gy. Fora 15
  • 17. Hybrid batch/streaming runtime • True data streaming on the runtime layer • No unnecessary synchronization steps • Batch and stream processing seamlessly integrated • Easy testing and development for both approaches 11/18/2014 Flink - M. Balassi & Gy. Fora 17
  • 18. Flink Streaming • Most Data Set operators are also available for Data Streams • Temporal and streaming specific operators – Windowing semantics (applying transformation on chunks of data) – Window reduce, join, cross etc. • Connectors for different data sources – Kafka, Flume, RabbitMQ, Twitter etc. • Support for iterative stream processing 11/18/2014 Flink - M. Balassi & Gy. Fora 18
  • 19. Flink Streaming //Build new model every minute DataStream<Double[]> model= env .addSource(new TrainingDataSource()) .window(60000) .reduceGroup(new ModelBuilder()); //Predict new data using the most up-to-date model DataStream<Integer> prediction = env .addSource(new NewDataSource()) .connect(model) .map(new Predictor()); M N. Data Output P T. Data 11/18/2014 Flink - M. Balassi & Gy. Fora 19
  • 20. Lambda architecture Want to do some dev-ops? Source: https://www.mapr.com/developercentral/lambda-architecture 11/18/2014 Flink - M. Balassi & Gy. Fora 20
  • 21. Lambda architecture in Flink - One System - One API - One cluster 11/18/2014 Flink - M. Balassi & Gy. Fora 21
  • 22. Iterations in other systems Client Loop outside the system Step Step Step Step Step Client Loop outside the system Step Step Step Step Step 11/18/2014 Flink - M. Balassi & Gy. Fora 22
  • 23. Iterations in Flink Streaming dataflow with feedback map red. join join System is iteration-aware, performs automatic optimization 11/18/2014 Flink - M. Balassi & Gy. Fora 23
  • 24. Dependability JVM Heap • Flink manages its own memory • Caching and data processing happens in a dedicated memory fraction • System never breaks the JVM heap, gracefully spills Unmanaged Heap Flink Managed Heap Network Buffers User Code Hashing/Sor0ng/Caching Shuffles/Broadcasts 11/18/2014 Flink - M. Balassi & Gy. Fora 24
  • 25. Operating on" Serialized Data • serializes data every time à Highly robust, never gives up on you • works on objects, RDDs may be stored serialized àSerialization considered slow, only when needed • makes serialization really cheap: à partial deserialization, operates on serialized form à Efficient and robust! 11/18/2014 Flink - M. Balassi & Gy. Fora 25
  • 26. Flink from a user perspective
  • 27. Flink programs run everywhere Cluster (Batch) Local Debugging Cluster (Streaming) Fink Run3me or Apache Tez As Java Collec0on Programs Embedded (e.g., Web Container) 11/18/2014 Flink - M. Balassi & Gy. Fora 27
  • 28. Migrate Easily Flink out-of-the-box supports • Hadoop data types (writables) • Hadoop Input/Output Formats • Hadoop functions and object model Input Map Reduce Output S DataSet Red DataSet Join DataSet Output DataSet Map DataSet Input 11/18/2014 Flink - M. Balassi & Gy. Fora 28
  • 29. Little tuning or configuration required • Requires no memory thresholds to configure – Flink manages its own memory • Requires no complicated network configs – Pipelining engine requires much less memory for data exchange • Requires no serializers to be configured – Flink handles its own type extraction and data representation • Programs can be adjusted to data automatically – Flink’s optimizer can choose execution strategies automatically 11/18/2014 Flink - M. Balassi & Gy. Fora 29
  • 30. Understanding Programs Visualizes the operations and the data movement of programs Analyze after execution Screenshot from Flink’s plan visualizer 11/18/2014 Flink - M. Balassi & Gy. Fora 30
  • 31. Understanding Programs Analyze after execution (times, stragglers, …) 11/18/2014 Flink - M. Balassi & Gy. Fora 31
  • 33. Distributed Grep Filter Term 1 HDFS Filter Term 2 Filter Term 3 Matches Matches Matches • 1 TB of data (log files) • 24 machines with • 32 GB Memory • Regular HDDs • HDFS 2.4.0 • Flink 0.7-incubating- SNAPSHOT • Spark 1.2.0-SNAPSHOT èFlink up to 2.5x faster 11/18/2014 Flink - M. Balassi & Gy. Fora 33
  • 34. Distributed Grep: Flink Execution 11/18/2014 Flink - M. Balassi & Gy. Fora 34
  • 35. Distributed Grep: Spark Execution Spark executes the job in 3 stages: Filter HDFS Term 1 Matches Stage 1 Filter Stage 2 HDFS Term 2 Matches Filter Stage 3 HDFS Term 3 Matches 11/18/2014 Flink - M. Balassi & Gy. Fora 35
  • 36. Spark in-memory pinning Filter Term 1 Cache in-memory to avoid reading the data for each filter HDFS Filter Term 2 Filter Term 3 Matches Matches Matches In- Memory RDD JavaSparkContext sc = new JavaSparkContext(conf); JavaRDD<String> file = sc.textFile(inFile).persist(StorageLevel.MEMORY_AND_DISK()); for(int p = 0; p < patterns.length; p++) { final String pattern = patterns[p]; JavaRDD<String> res = file.filter(new Function<String, Boolean>() { … }); } 11/18/2014 Flink - M. Balassi & Gy. Fora 36
  • 37. Spark in-memory pinning 37 minutes 9 minutes RDD is 100% in-memory Spark starts spilling RDD to disk 11/18/2014 Flink - M. Balassi & Gy. Fora 37
  • 38. PageRank • Dataset: – Twitter Follower Graph – 41,652,230 vertices (users) – 1,468,365,182 edges (followings) – 12 GB input data 11/18/2014 Flink - M. Balassi & Gy. Fora 38
  • 39. PageRank results 11/18/2014 Flink - M. Balassi & Gy. Fora 39
  • 40. PageRank on Flink with " Delta Iterations • The algorithm runs 60 iterations until convergence (runtime includes convergence check) 100 minutes 8.5 minutes 11/18/2014 Flink - M. Balassi & Gy. Fora 40
  • 41. Again, explaining the difference On average, a (delta) interation runs for 6.7 seconds Flink (Bulk): 48 sec. Spark (Bulk): 99 sec. 11/18/2014 Flink - M. Balassi & Gy. Fora 41
  • 42. Streaming performance 11/18/2014 Flink - M. Balassi & Gy. Fora 42
  • 43. Flink Roadmap • Flink has a major release every 3 months," with >=1 big-fixing releases in-between • Finer-grained fault tolerance • Logical (SQL-like) field addressing • Python API • Flink Streaming, Lambda architecture support • Flink on Tez • ML on Flink (e.g., Mahout DSL) • Graph DSL on Flink • … and more 11/18/2014 Flink - M. Balassi & Gy. Fora 43
  • 44. 11/18/2014 Flink - M. Balassi & Gy. Fora 44
  • 45. http://flink.incubator.apache.org github.com/apache/incubator-flink @ApacheFlink Marton Balassi, Gyula Fora" {mbalassi, gyfora}@apache.org