SlideShare une entreprise Scribd logo
The Flink Big Data 
Analytics Platform 
Marton Balassi, Gyula Fora" 
{mbalassi, gyfora}@apache.org 
Hungarian Academy of Sciences
What is Apache Flink? 
• Started in 2009 by the Berlin-based database 
research groups 
• In the Apache Incubator since mid 2014 
• 61 contributors as of the 0.7.0 release 
Open 
Source 
• Fast, general purpose distributed data processing 
system 
• Combining batch and stream processing 
• Up to 100x faster than Hadoop 
Fast + 
Reliable 
• Programming APIs for Java and Scala 
• Tested on large clusters 
Ready to 
use 
11/18/2014 Flink - M. Balassi & Gy. Fora 2
What is Apache Flink? 
Master 
Worker 
Worker 
Flink 
Cluster 
Analytical Program 
Flink 
Client 
& 
Op0mizer 
11/18/2014 Flink - M. Balassi & Gy. Fora 3
This Talk 
• Introduction to Flink 
• API overview 
• Distinguishing Flink 
• Flink from a user perspective 
• Performance 
• Flink roadmap and closing 
11/18/2014 Flink - M. Balassi & Gy. Fora 4
Open source Big Data Landscape 
Hive 
Mahout 
MapReduce 
Crunch 
Flink 
Spark 
Storm 
Yarn 
Mesos 
HDFS 
Cascading 
Tez 
Pig 
Applica3ons 
Data 
processing 
engines 
App 
and 
resource 
management 
Storage, 
streams 
HBase 
KaCa 
… 
11/18/2014 Flink - M. Balassi & Gy. Fora 5
Flink stack 
Common 
API 
Storage 
Streams 
Hybrid 
Batch/Streaming 
Run0me 
Files 
HDFS 
S3 
Cluster 
Manager 
Na0ve YARN 
EC2 
Flink 
Op0mizer 
Scala 
API 
(batch) 
Graph 
API 
(„Spargel“) 
JDBC 
Rabbit 
Redis 
Azure 
KaCa 
MQ 
… 
Java 
Collec0ons 
Streams 
Builder 
Apache 
Tez 
Python 
API 
Java 
API 
(streaming) 
Apache 
MRQL 
Batch 
Streaming 
Java 
API 
(batch) 
Local 
Execu0on 
11/18/2014 Flink - M. Balassi & Gy. Fora 6
Flink APIs
Programming model 
Data abstractions: Data Set, Data Stream 
DataSet/Stream 
A 
A (1) 
A (2) 
DataSet/Stream 
B 
B (1) 
B (2) 
C (1) 
C (2) 
X 
X 
Y 
Y 
Program 
Parallel Execution 
X Y 
Operator X Operator Y 
DataSet/Stream 
C 
11/18/2014 Flink - M. Balassi & Gy. Fora 8
Flexible pipelines 
Map, FlatMap, MapPartition, Filter, Project, Reduce, 
ReduceGroup, Aggregate, Distinct, Join, CoGoup, Cross, 
Iterate, Iterate Delta, Iterate-Vertex-Centric, Windowing 
Reduce 
Join 
Map 
Reduce 
Map 
Iterate 
Source 
Sink 
Source 
11/18/2014 Flink - M. Balassi & Gy. Fora 9
WordCount, Java API 
DataSet<String> text = env.readTextFile(input); 
DataSet<Tuple2<String, Integer>> result = text 
.flatMap((str, out) -> { 
for (String token : value.split("W")) { 
out.collect(new Tuple2<>(token, 1)); 
}) 
.groupBy(0) 
.sum(1); 
11/18/2014 Flink - M. Balassi & Gy. Fora 10
WordCount, Scala API 
val input = env.readTextFile(input); 
val words = input flatMap { line => line.split("W+") } 
val counts = words groupBy { word => word } count() 
11/18/2014 Flink - M. Balassi & Gy. Fora 11
WordCount, Streaming API 
DataStream<String> text = env.readTextFile(input); 
DataStream<Tuple2<String, Integer>> result = text 
.flatMap((str, out) -> { 
for (String token : value.split("W")) { 
out.collect(new Tuple2<>(token, 1)); 
}) 
.groupBy(0) 
.sum(1); 
11/18/2014 Flink - M. Balassi & Gy. Fora 12
Is there anything beyond WordCount? 
11/18/2014 Flink - M. Balassi & Gy. Fora 13
Beyond Key/Value Pairs 
// outputs pairs of pages and impressions 
class Impression { 
public String url; 
public long count; 
} 
class Page { 
public String url; 
public String topic; 
} 
DataSet<Page> pages = ...; 
DataSet<Impression> impressions = ...; 
DataSet<Impression> aggregated = 
impressions 
.groupBy("url") 
.sum("count"); 
pages.join(impressions).where("url").equalTo("url") 
.print() 
// outputs pairs of matching pages and impressions 
11/18/2014 Flink - M. Balassi & Gy. Fora 14
Preview: Logical Types 
DataSet<Row> dates = env.readCsv(...).as("order_id", "date"); 
DataSet<Row> sessions = env.readCsv(...).as("id", "session"); 
DataSet<Row> joined = dates 
.join(session).where("order_id").equals("id"); 
joined.groupBy("date").reduceGroup(new SessionFilter()) 
class SessionFilter implements GroupReduceFunction<SessionType> { 
public void reduce(Iterable<SessionType> value, Collector out){ 
... 
} 
} 
public class SessionType { 
public String order_id; 
public Date date; 
public String session; 
} 
11/18/2014 Flink - M. Balassi & Gy. Fora 15
Distinguishing Flink
Hybrid batch/streaming runtime 
• True data streaming on the runtime layer 
• No unnecessary synchronization steps 
• Batch and stream processing seamlessly integrated 
• Easy testing and development for both approaches 
11/18/2014 Flink - M. Balassi & Gy. Fora 17
Flink Streaming 
• Most Data Set operators are also available for 
Data Streams 
• Temporal and streaming specific operators 
– Windowing semantics (applying transformation on chunks of data) 
– Window reduce, join, cross etc. 
• Connectors for different data sources 
– Kafka, Flume, RabbitMQ, Twitter etc. 
• Support for iterative stream processing 
11/18/2014 Flink - M. Balassi & Gy. Fora 18
Flink Streaming 
//Build new model every minute 
DataStream<Double[]> model= env 
.addSource(new TrainingDataSource()) 
.window(60000) 
.reduceGroup(new ModelBuilder()); 
//Predict new data using the most up-to-date model 
DataStream<Integer> prediction = env 
.addSource(new NewDataSource()) 
.connect(model) 
.map(new Predictor()); 
M 
N. Data Output 
P 
T. Data 
11/18/2014 Flink - M. Balassi & Gy. Fora 19
Lambda architecture 
Want to do some dev-ops? 
Source: https://www.mapr.com/developercentral/lambda-architecture 
11/18/2014 Flink - M. Balassi & Gy. Fora 20
Lambda architecture in Flink 
- One System 
- One API 
- One cluster 
11/18/2014 Flink - M. Balassi & Gy. Fora 21
Iterations in other systems 
Client 
Loop 
outside 
the 
system 
Step Step Step Step Step 
Client 
Loop 
outside 
the 
system 
Step Step Step Step Step 
11/18/2014 Flink - M. Balassi & Gy. Fora 22
Iterations in Flink 
Streaming dataflow 
with feedback 
map 
red. 
join 
join 
System is iteration-aware, performs automatic optimization 
11/18/2014 Flink - M. Balassi & Gy. Fora 23
Dependability 
JVM Heap 
• Flink manages its own memory 
• Caching and data processing happens in a dedicated 
memory fraction 
• System never breaks the 
JVM heap, gracefully spills 
Unmanaged 
Heap 
Flink 
Managed 
Heap 
Network 
Buffers 
User 
Code 
Hashing/Sor0ng/Caching 
Shuffles/Broadcasts 
11/18/2014 Flink - M. Balassi & Gy. Fora 24
Operating on" 
Serialized Data 
• serializes data every time 
à Highly robust, never gives up on you 
• works on objects, RDDs may be stored serialized 
àSerialization considered slow, only when needed 
• makes serialization really cheap: 
à partial deserialization, operates on serialized form 
à Efficient and robust! 
11/18/2014 Flink - M. Balassi & Gy. Fora 25
Flink from a user perspective
Flink programs run everywhere 
Cluster 
(Batch) 
Local 
Debugging 
Cluster 
(Streaming) 
Fink 
Run3me 
or 
Apache 
Tez 
As 
Java 
Collec0on 
Programs 
Embedded 
(e.g., 
Web 
Container) 
11/18/2014 Flink - M. Balassi & Gy. Fora 27
Migrate Easily 
Flink out-of-the-box supports 
• Hadoop data types (writables) 
• Hadoop Input/Output Formats 
• Hadoop functions and object model 
Input Map Reduce Output 
S DataSet Red DataSet Join DataSet 
Output 
DataSet Map DataSet 
Input 
11/18/2014 Flink - M. Balassi & Gy. Fora 28
Little tuning or configuration required 
• Requires no memory thresholds to configure 
– Flink manages its own memory 
• Requires no complicated network configs 
– Pipelining engine requires much less memory for data exchange 
• Requires no serializers to be configured 
– Flink handles its own type extraction and data representation 
• Programs can be adjusted to data automatically 
– Flink’s optimizer can choose execution strategies automatically 
11/18/2014 Flink - M. Balassi & Gy. Fora 29
Understanding Programs 
Visualizes the operations and the data 
movement of programs 
Analyze after execution 
Screenshot from Flink’s plan visualizer 
11/18/2014 Flink - M. Balassi & Gy. Fora 30
Understanding Programs 
Analyze after execution (times, stragglers, …) 
11/18/2014 Flink - M. Balassi & Gy. Fora 31
Performance
Distributed Grep 
Filter 
Term 1 
HDFS 
Filter 
Term 2 
Filter 
Term 3 
Matches 
Matches 
Matches 
• 1 TB of data (log files) 
• 24 machines with 
• 32 GB Memory 
• Regular HDDs 
• HDFS 2.4.0 
• Flink 0.7-incubating- 
SNAPSHOT 
• Spark 1.2.0-SNAPSHOT 
èFlink up to 2.5x faster 
11/18/2014 Flink - M. Balassi & Gy. Fora 33
Distributed Grep: Flink Execution 
11/18/2014 Flink - M. Balassi & Gy. Fora 34
Distributed Grep: Spark Execution 
Spark executes the job in 3 stages: 
Filter 
HDFS Term 1 Matches 
Stage 1 
Filter 
Stage 2 HDFS Term 2 Matches 
Filter 
Stage 3 HDFS Term 3 Matches 
11/18/2014 Flink - M. Balassi & Gy. Fora 35
Spark in-memory pinning 
Filter 
Term 1 
Cache in-memory 
to avoid reading the 
data for each filter 
HDFS 
Filter 
Term 2 
Filter 
Term 3 
Matches 
Matches 
Matches 
In- 
Memory 
RDD 
JavaSparkContext sc = new JavaSparkContext(conf); 
JavaRDD<String> file = sc.textFile(inFile).persist(StorageLevel.MEMORY_AND_DISK()); 
for(int p = 0; p < patterns.length; p++) { 
final String pattern = patterns[p]; 
JavaRDD<String> res = file.filter(new Function<String, Boolean>() { … }); } 
11/18/2014 Flink - M. Balassi & Gy. Fora 36
Spark in-memory pinning 
37 minutes 
9 
minutes 
RDD is 100% in-memory 
Spark starts spilling 
RDD to disk 
11/18/2014 Flink - M. Balassi & Gy. Fora 37
PageRank 
• Dataset: 
– Twitter Follower Graph 
– 41,652,230 vertices (users) 
– 1,468,365,182 edges 
(followings) 
– 12 GB input data 
11/18/2014 Flink - M. Balassi & Gy. Fora 38
PageRank results 
11/18/2014 Flink - M. Balassi & Gy. Fora 39
PageRank on Flink with " 
Delta Iterations 
• The algorithm runs 60 iterations until convergence (runtime 
includes convergence check) 
100 minutes 
8.5 minutes 
11/18/2014 Flink - M. Balassi & Gy. Fora 40
Again, explaining the difference 
On average, a (delta) interation runs for 6.7 seconds 
Flink (Bulk): 48 sec. 
Spark (Bulk): 99 sec. 
11/18/2014 Flink - M. Balassi & Gy. Fora 41
Streaming performance 
11/18/2014 Flink - M. Balassi & Gy. Fora 42
Flink Roadmap 
• Flink has a major release every 3 months," 
with >=1 big-fixing releases in-between 
• Finer-grained fault tolerance 
• Logical (SQL-like) field addressing 
• Python API 
• Flink Streaming, Lambda architecture support 
• Flink on Tez 
• ML on Flink (e.g., Mahout DSL) 
• Graph DSL on Flink 
• … and more 
11/18/2014 Flink - M. Balassi & Gy. Fora 43
11/18/2014 Flink - M. Balassi & Gy. Fora 44
http://flink.incubator.apache.org 
github.com/apache/incubator-flink 
@ApacheFlink 
Marton Balassi, Gyula Fora" 
{mbalassi, gyfora}@apache.org

Contenu connexe

Tendances

SICS: Apache Flink Streaming
SICS: Apache Flink StreamingSICS: Apache Flink Streaming
SICS: Apache Flink Streaming
Turi, Inc.
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
Fabian Hueske
 
Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.
Fabian Hueske
 
Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015
Andra Lungu
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System Overview
Flink Forward
 
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data CompanionS. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
Flink Forward
 
Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016
Stephan Ewen
 
Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonApache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World London
Stephan Ewen
 
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
Robert Metzger
 
Apache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsApache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and Friends
Stephan Ewen
 
Fabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on FlinkFabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on Flink
Flink Forward
 
Marton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream ProcessingMarton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream Processing
Flink Forward
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
Kostas Tzoumas
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Apache Flink Taiwan User Group
 
The Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache FlinkThe Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache Flink
DataWorks Summit/Hadoop Summit
 
January 2015 HUG: Apache Flink: Fast and reliable large-scale data processing
January 2015 HUG: Apache Flink:  Fast and reliable large-scale data processingJanuary 2015 HUG: Apache Flink:  Fast and reliable large-scale data processing
January 2015 HUG: Apache Flink: Fast and reliable large-scale data processing
Yahoo Developer Network
 
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Robert Metzger
 
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaMohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Flink Forward
 
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache FlinkAlbert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
Flink Forward
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache FlinkUnified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
DataWorks Summit/Hadoop Summit
 

Tendances (20)

SICS: Apache Flink Streaming
SICS: Apache Flink StreamingSICS: Apache Flink Streaming
SICS: Apache Flink Streaming
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
 
Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.
 
Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System Overview
 
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data CompanionS. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
 
Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016
 
Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonApache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World London
 
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
 
Apache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsApache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and Friends
 
Fabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on FlinkFabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on Flink
 
Marton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream ProcessingMarton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream Processing
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
 
The Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache FlinkThe Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache Flink
 
January 2015 HUG: Apache Flink: Fast and reliable large-scale data processing
January 2015 HUG: Apache Flink:  Fast and reliable large-scale data processingJanuary 2015 HUG: Apache Flink:  Fast and reliable large-scale data processing
January 2015 HUG: Apache Flink: Fast and reliable large-scale data processing
 
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
 
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaMohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
 
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache FlinkAlbert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache FlinkUnified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
 

En vedette

K. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward KeynoteK. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward Keynote
Flink Forward
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
Till Rohrmann
 
Apache Flink Training: DataStream API Part 1 Basic
 Apache Flink Training: DataStream API Part 1 Basic Apache Flink Training: DataStream API Part 1 Basic
Apache Flink Training: DataStream API Part 1 Basic
Flink Forward
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Flink Forward
 
Fabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and BytesFabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and Bytes
Flink Forward
 
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-ComposeSimon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Flink Forward
 
Assaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at ScaleAssaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at Scale
Flink Forward
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Flink Forward
 
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Flink Forward
 
Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced
Flink Forward
 
Kamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed ExperimentsKamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed Experiments
Flink Forward
 
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache FlinkTill Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Flink Forward
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
Stephan Ewen
 
Matthias J. Sax – A Tale of Squirrels and Storms
Matthias J. Sax – A Tale of Squirrels and StormsMatthias J. Sax – A Tale of Squirrels and Storms
Matthias J. Sax – A Tale of Squirrels and Storms
Flink Forward
 
Anwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in FlinkAnwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in Flink
Flink Forward
 
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache FlinkSuneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Flink Forward
 
Michael Häusler – Everyday flink
Michael Häusler – Everyday flinkMichael Häusler – Everyday flink
Michael Häusler – Everyday flink
Flink Forward
 
Apache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityApache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce Compatibility
Fabian Hueske
 
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeChris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Flink Forward
 
Apache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API BasicsApache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API Basics
Flink Forward
 

En vedette (20)

K. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward KeynoteK. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward Keynote
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Apache Flink Training: DataStream API Part 1 Basic
 Apache Flink Training: DataStream API Part 1 Basic Apache Flink Training: DataStream API Part 1 Basic
Apache Flink Training: DataStream API Part 1 Basic
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
 
Fabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and BytesFabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and Bytes
 
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-ComposeSimon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
 
Assaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at ScaleAssaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at Scale
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
 
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
 
Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced
 
Kamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed ExperimentsKamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed Experiments
 
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache FlinkTill Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
 
Matthias J. Sax – A Tale of Squirrels and Storms
Matthias J. Sax – A Tale of Squirrels and StormsMatthias J. Sax – A Tale of Squirrels and Storms
Matthias J. Sax – A Tale of Squirrels and Storms
 
Anwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in FlinkAnwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in Flink
 
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache FlinkSuneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
 
Michael Häusler – Everyday flink
Michael Häusler – Everyday flinkMichael Häusler – Everyday flink
Michael Häusler – Everyday flink
 
Apache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityApache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce Compatibility
 
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeChris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
 
Apache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API BasicsApache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API Basics
 

Similaire à Flink Apachecon Presentation

Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Slim Baltagi
 
Enabling Exploratory Analysis of Large Data with Apache Spark and R
Enabling Exploratory Analysis of Large Data with Apache Spark and REnabling Exploratory Analysis of Large Data with Apache Spark and R
Enabling Exploratory Analysis of Large Data with Apache Spark and R
Databricks
 
Apache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureApache Flink: Past, Present and Future
Apache Flink: Past, Present and Future
Gyula Fóra
 
Federated Queries Across Both Different Storage Mediums and Different Data En...
Federated Queries Across Both Different Storage Mediums and Different Data En...Federated Queries Across Both Different Storage Mediums and Different Data En...
Federated Queries Across Both Different Storage Mediums and Different Data En...
VMware Tanzu
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
Slim Baltagi
 
carrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIcarrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-API
Yoni Davidson
 
Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)
Aljoscha Krettek
 
Data Analysis With Apache Flink
Data Analysis With Apache FlinkData Analysis With Apache Flink
Data Analysis With Apache Flink
DataWorks Summit
 
ALM Search Presentation for the VSS Arch Council
ALM Search Presentation for the VSS Arch CouncilALM Search Presentation for the VSS Arch Council
ALM Search Presentation for the VSS Arch Council
Sunita Shrivastava
 
Neo4j Vision and Roadmap
Neo4j Vision and Roadmap Neo4j Vision and Roadmap
Neo4j Vision and Roadmap
Neo4j
 
Flink internals web
Flink internals web Flink internals web
Flink internals web
Kostas Tzoumas
 
Unified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache FlinkUnified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache Flink
Slim Baltagi
 
Bds session 13 14
Bds session 13 14Bds session 13 14
Bds session 13 14
Infinity Tech Solutions
 
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
UA DevOps Conference
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
Kostas Tzoumas
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscape
Paco Nathan
 
New directions for Apache Spark in 2015
New directions for Apache Spark in 2015New directions for Apache Spark in 2015
New directions for Apache Spark in 2015
Databricks
 
Using DuckDB ArrowFlight to Power a Feature Store
Using DuckDB ArrowFlight to Power a Feature StoreUsing DuckDB ArrowFlight to Power a Feature Store
Using DuckDB ArrowFlight to Power a Feature Store
AlbaTorrado
 
WarsawITDays_ ApacheNiFi202
WarsawITDays_ ApacheNiFi202WarsawITDays_ ApacheNiFi202
WarsawITDays_ ApacheNiFi202
Timothy Spann
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
Databricks
 

Similaire à Flink Apachecon Presentation (20)

Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
 
Enabling Exploratory Analysis of Large Data with Apache Spark and R
Enabling Exploratory Analysis of Large Data with Apache Spark and REnabling Exploratory Analysis of Large Data with Apache Spark and R
Enabling Exploratory Analysis of Large Data with Apache Spark and R
 
Apache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureApache Flink: Past, Present and Future
Apache Flink: Past, Present and Future
 
Federated Queries Across Both Different Storage Mediums and Different Data En...
Federated Queries Across Both Different Storage Mediums and Different Data En...Federated Queries Across Both Different Storage Mediums and Different Data En...
Federated Queries Across Both Different Storage Mediums and Different Data En...
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
 
carrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIcarrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-API
 
Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)
 
Data Analysis With Apache Flink
Data Analysis With Apache FlinkData Analysis With Apache Flink
Data Analysis With Apache Flink
 
ALM Search Presentation for the VSS Arch Council
ALM Search Presentation for the VSS Arch CouncilALM Search Presentation for the VSS Arch Council
ALM Search Presentation for the VSS Arch Council
 
Neo4j Vision and Roadmap
Neo4j Vision and Roadmap Neo4j Vision and Roadmap
Neo4j Vision and Roadmap
 
Flink internals web
Flink internals web Flink internals web
Flink internals web
 
Unified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache FlinkUnified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache Flink
 
Bds session 13 14
Bds session 13 14Bds session 13 14
Bds session 13 14
 
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscape
 
New directions for Apache Spark in 2015
New directions for Apache Spark in 2015New directions for Apache Spark in 2015
New directions for Apache Spark in 2015
 
Using DuckDB ArrowFlight to Power a Feature Store
Using DuckDB ArrowFlight to Power a Feature StoreUsing DuckDB ArrowFlight to Power a Feature Store
Using DuckDB ArrowFlight to Power a Feature Store
 
WarsawITDays_ ApacheNiFi202
WarsawITDays_ ApacheNiFi202WarsawITDays_ ApacheNiFi202
WarsawITDays_ ApacheNiFi202
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
 

Plus de Gyula Fóra

Real-time analytics as a service at King
Real-time analytics as a service at King Real-time analytics as a service at King
Real-time analytics as a service at King
Gyula Fóra
 
RBea: Scalable Real-Time Analytics at King
RBea: Scalable Real-Time Analytics at KingRBea: Scalable Real-Time Analytics at King
RBea: Scalable Real-Time Analytics at King
Gyula Fóra
 
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Gyula Fóra
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
Gyula Fóra
 
Real-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop SummitReal-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop Summit
Gyula Fóra
 
Flink Streaming
Flink StreamingFlink Streaming
Flink Streaming
Gyula Fóra
 

Plus de Gyula Fóra (6)

Real-time analytics as a service at King
Real-time analytics as a service at King Real-time analytics as a service at King
Real-time analytics as a service at King
 
RBea: Scalable Real-Time Analytics at King
RBea: Scalable Real-Time Analytics at KingRBea: Scalable Real-Time Analytics at King
RBea: Scalable Real-Time Analytics at King
 
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
Real-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop SummitReal-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop Summit
 
Flink Streaming
Flink StreamingFlink Streaming
Flink Streaming
 

Dernier

社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
NABLAS株式会社
 
The Rise of Python in Finance,Automating Trading Strategies: _.pdf
The Rise of Python in Finance,Automating Trading Strategies: _.pdfThe Rise of Python in Finance,Automating Trading Strategies: _.pdf
The Rise of Python in Finance,Automating Trading Strategies: _.pdf
Riya Sen
 
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
45unexpected
 
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeliveryBDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
erynsouthern
 
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion dataTowards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Samuel Jackson
 
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
revolutionary575
 
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
kuldeepsharmaks8120
 
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
fatima shekh$A17
 
Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)
Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)
Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)
Alireza Kamrani
 
potential development of the A* search algorithm specifically
potential development of the A* search algorithm specificallypotential development of the A* search algorithm specifically
potential development of the A* search algorithm specifically
huseindihon
 
Experience, Excellence & Commitment are the characteristics that describe Fla...
Experience, Excellence & Commitment are the characteristics that describe Fla...Experience, Excellence & Commitment are the characteristics that describe Fla...
Experience, Excellence & Commitment are the characteristics that describe Fla...
kittycrispy617
 
DataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptxDataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptx
Kanchana Weerasinghe
 
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
kinni singh$A17
 
Histology of Muscle types histology o.ppt
Histology of Muscle types histology o.pptHistology of Muscle types histology o.ppt
Histology of Muscle types histology o.ppt
SamanArshad11
 
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
bhupeshkumar0889
 
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
satpalsheravatmumbai
 
High Girls Call Mohali 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Mohali 000XX00000 Provide Best And Top Girl Service And No1 i...High Girls Call Mohali 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Mohali 000XX00000 Provide Best And Top Girl Service And No1 i...
gargjiya84
 
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
tanupasswan6
 
potential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in generalpotential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in general
huseindihon
 
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
ginni singh$A17
 

Dernier (20)

社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
 
The Rise of Python in Finance,Automating Trading Strategies: _.pdf
The Rise of Python in Finance,Automating Trading Strategies: _.pdfThe Rise of Python in Finance,Automating Trading Strategies: _.pdf
The Rise of Python in Finance,Automating Trading Strategies: _.pdf
 
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
 
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeliveryBDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
 
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion dataTowards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
 
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
 
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
 
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
 
Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)
Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)
Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)
 
potential development of the A* search algorithm specifically
potential development of the A* search algorithm specificallypotential development of the A* search algorithm specifically
potential development of the A* search algorithm specifically
 
Experience, Excellence & Commitment are the characteristics that describe Fla...
Experience, Excellence & Commitment are the characteristics that describe Fla...Experience, Excellence & Commitment are the characteristics that describe Fla...
Experience, Excellence & Commitment are the characteristics that describe Fla...
 
DataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptxDataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptx
 
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
 
Histology of Muscle types histology o.ppt
Histology of Muscle types histology o.pptHistology of Muscle types histology o.ppt
Histology of Muscle types histology o.ppt
 
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
 
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
 
High Girls Call Mohali 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Mohali 000XX00000 Provide Best And Top Girl Service And No1 i...High Girls Call Mohali 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Mohali 000XX00000 Provide Best And Top Girl Service And No1 i...
 
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
 
potential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in generalpotential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in general
 
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
 

Flink Apachecon Presentation

  • 1. The Flink Big Data Analytics Platform Marton Balassi, Gyula Fora" {mbalassi, gyfora}@apache.org Hungarian Academy of Sciences
  • 2. What is Apache Flink? • Started in 2009 by the Berlin-based database research groups • In the Apache Incubator since mid 2014 • 61 contributors as of the 0.7.0 release Open Source • Fast, general purpose distributed data processing system • Combining batch and stream processing • Up to 100x faster than Hadoop Fast + Reliable • Programming APIs for Java and Scala • Tested on large clusters Ready to use 11/18/2014 Flink - M. Balassi & Gy. Fora 2
  • 3. What is Apache Flink? Master Worker Worker Flink Cluster Analytical Program Flink Client & Op0mizer 11/18/2014 Flink - M. Balassi & Gy. Fora 3
  • 4. This Talk • Introduction to Flink • API overview • Distinguishing Flink • Flink from a user perspective • Performance • Flink roadmap and closing 11/18/2014 Flink - M. Balassi & Gy. Fora 4
  • 5. Open source Big Data Landscape Hive Mahout MapReduce Crunch Flink Spark Storm Yarn Mesos HDFS Cascading Tez Pig Applica3ons Data processing engines App and resource management Storage, streams HBase KaCa … 11/18/2014 Flink - M. Balassi & Gy. Fora 5
  • 6. Flink stack Common API Storage Streams Hybrid Batch/Streaming Run0me Files HDFS S3 Cluster Manager Na0ve YARN EC2 Flink Op0mizer Scala API (batch) Graph API („Spargel“) JDBC Rabbit Redis Azure KaCa MQ … Java Collec0ons Streams Builder Apache Tez Python API Java API (streaming) Apache MRQL Batch Streaming Java API (batch) Local Execu0on 11/18/2014 Flink - M. Balassi & Gy. Fora 6
  • 8. Programming model Data abstractions: Data Set, Data Stream DataSet/Stream A A (1) A (2) DataSet/Stream B B (1) B (2) C (1) C (2) X X Y Y Program Parallel Execution X Y Operator X Operator Y DataSet/Stream C 11/18/2014 Flink - M. Balassi & Gy. Fora 8
  • 9. Flexible pipelines Map, FlatMap, MapPartition, Filter, Project, Reduce, ReduceGroup, Aggregate, Distinct, Join, CoGoup, Cross, Iterate, Iterate Delta, Iterate-Vertex-Centric, Windowing Reduce Join Map Reduce Map Iterate Source Sink Source 11/18/2014 Flink - M. Balassi & Gy. Fora 9
  • 10. WordCount, Java API DataSet<String> text = env.readTextFile(input); DataSet<Tuple2<String, Integer>> result = text .flatMap((str, out) -> { for (String token : value.split("W")) { out.collect(new Tuple2<>(token, 1)); }) .groupBy(0) .sum(1); 11/18/2014 Flink - M. Balassi & Gy. Fora 10
  • 11. WordCount, Scala API val input = env.readTextFile(input); val words = input flatMap { line => line.split("W+") } val counts = words groupBy { word => word } count() 11/18/2014 Flink - M. Balassi & Gy. Fora 11
  • 12. WordCount, Streaming API DataStream<String> text = env.readTextFile(input); DataStream<Tuple2<String, Integer>> result = text .flatMap((str, out) -> { for (String token : value.split("W")) { out.collect(new Tuple2<>(token, 1)); }) .groupBy(0) .sum(1); 11/18/2014 Flink - M. Balassi & Gy. Fora 12
  • 13. Is there anything beyond WordCount? 11/18/2014 Flink - M. Balassi & Gy. Fora 13
  • 14. Beyond Key/Value Pairs // outputs pairs of pages and impressions class Impression { public String url; public long count; } class Page { public String url; public String topic; } DataSet<Page> pages = ...; DataSet<Impression> impressions = ...; DataSet<Impression> aggregated = impressions .groupBy("url") .sum("count"); pages.join(impressions).where("url").equalTo("url") .print() // outputs pairs of matching pages and impressions 11/18/2014 Flink - M. Balassi & Gy. Fora 14
  • 15. Preview: Logical Types DataSet<Row> dates = env.readCsv(...).as("order_id", "date"); DataSet<Row> sessions = env.readCsv(...).as("id", "session"); DataSet<Row> joined = dates .join(session).where("order_id").equals("id"); joined.groupBy("date").reduceGroup(new SessionFilter()) class SessionFilter implements GroupReduceFunction<SessionType> { public void reduce(Iterable<SessionType> value, Collector out){ ... } } public class SessionType { public String order_id; public Date date; public String session; } 11/18/2014 Flink - M. Balassi & Gy. Fora 15
  • 17. Hybrid batch/streaming runtime • True data streaming on the runtime layer • No unnecessary synchronization steps • Batch and stream processing seamlessly integrated • Easy testing and development for both approaches 11/18/2014 Flink - M. Balassi & Gy. Fora 17
  • 18. Flink Streaming • Most Data Set operators are also available for Data Streams • Temporal and streaming specific operators – Windowing semantics (applying transformation on chunks of data) – Window reduce, join, cross etc. • Connectors for different data sources – Kafka, Flume, RabbitMQ, Twitter etc. • Support for iterative stream processing 11/18/2014 Flink - M. Balassi & Gy. Fora 18
  • 19. Flink Streaming //Build new model every minute DataStream<Double[]> model= env .addSource(new TrainingDataSource()) .window(60000) .reduceGroup(new ModelBuilder()); //Predict new data using the most up-to-date model DataStream<Integer> prediction = env .addSource(new NewDataSource()) .connect(model) .map(new Predictor()); M N. Data Output P T. Data 11/18/2014 Flink - M. Balassi & Gy. Fora 19
  • 20. Lambda architecture Want to do some dev-ops? Source: https://www.mapr.com/developercentral/lambda-architecture 11/18/2014 Flink - M. Balassi & Gy. Fora 20
  • 21. Lambda architecture in Flink - One System - One API - One cluster 11/18/2014 Flink - M. Balassi & Gy. Fora 21
  • 22. Iterations in other systems Client Loop outside the system Step Step Step Step Step Client Loop outside the system Step Step Step Step Step 11/18/2014 Flink - M. Balassi & Gy. Fora 22
  • 23. Iterations in Flink Streaming dataflow with feedback map red. join join System is iteration-aware, performs automatic optimization 11/18/2014 Flink - M. Balassi & Gy. Fora 23
  • 24. Dependability JVM Heap • Flink manages its own memory • Caching and data processing happens in a dedicated memory fraction • System never breaks the JVM heap, gracefully spills Unmanaged Heap Flink Managed Heap Network Buffers User Code Hashing/Sor0ng/Caching Shuffles/Broadcasts 11/18/2014 Flink - M. Balassi & Gy. Fora 24
  • 25. Operating on" Serialized Data • serializes data every time à Highly robust, never gives up on you • works on objects, RDDs may be stored serialized àSerialization considered slow, only when needed • makes serialization really cheap: à partial deserialization, operates on serialized form à Efficient and robust! 11/18/2014 Flink - M. Balassi & Gy. Fora 25
  • 26. Flink from a user perspective
  • 27. Flink programs run everywhere Cluster (Batch) Local Debugging Cluster (Streaming) Fink Run3me or Apache Tez As Java Collec0on Programs Embedded (e.g., Web Container) 11/18/2014 Flink - M. Balassi & Gy. Fora 27
  • 28. Migrate Easily Flink out-of-the-box supports • Hadoop data types (writables) • Hadoop Input/Output Formats • Hadoop functions and object model Input Map Reduce Output S DataSet Red DataSet Join DataSet Output DataSet Map DataSet Input 11/18/2014 Flink - M. Balassi & Gy. Fora 28
  • 29. Little tuning or configuration required • Requires no memory thresholds to configure – Flink manages its own memory • Requires no complicated network configs – Pipelining engine requires much less memory for data exchange • Requires no serializers to be configured – Flink handles its own type extraction and data representation • Programs can be adjusted to data automatically – Flink’s optimizer can choose execution strategies automatically 11/18/2014 Flink - M. Balassi & Gy. Fora 29
  • 30. Understanding Programs Visualizes the operations and the data movement of programs Analyze after execution Screenshot from Flink’s plan visualizer 11/18/2014 Flink - M. Balassi & Gy. Fora 30
  • 31. Understanding Programs Analyze after execution (times, stragglers, …) 11/18/2014 Flink - M. Balassi & Gy. Fora 31
  • 33. Distributed Grep Filter Term 1 HDFS Filter Term 2 Filter Term 3 Matches Matches Matches • 1 TB of data (log files) • 24 machines with • 32 GB Memory • Regular HDDs • HDFS 2.4.0 • Flink 0.7-incubating- SNAPSHOT • Spark 1.2.0-SNAPSHOT èFlink up to 2.5x faster 11/18/2014 Flink - M. Balassi & Gy. Fora 33
  • 34. Distributed Grep: Flink Execution 11/18/2014 Flink - M. Balassi & Gy. Fora 34
  • 35. Distributed Grep: Spark Execution Spark executes the job in 3 stages: Filter HDFS Term 1 Matches Stage 1 Filter Stage 2 HDFS Term 2 Matches Filter Stage 3 HDFS Term 3 Matches 11/18/2014 Flink - M. Balassi & Gy. Fora 35
  • 36. Spark in-memory pinning Filter Term 1 Cache in-memory to avoid reading the data for each filter HDFS Filter Term 2 Filter Term 3 Matches Matches Matches In- Memory RDD JavaSparkContext sc = new JavaSparkContext(conf); JavaRDD<String> file = sc.textFile(inFile).persist(StorageLevel.MEMORY_AND_DISK()); for(int p = 0; p < patterns.length; p++) { final String pattern = patterns[p]; JavaRDD<String> res = file.filter(new Function<String, Boolean>() { … }); } 11/18/2014 Flink - M. Balassi & Gy. Fora 36
  • 37. Spark in-memory pinning 37 minutes 9 minutes RDD is 100% in-memory Spark starts spilling RDD to disk 11/18/2014 Flink - M. Balassi & Gy. Fora 37
  • 38. PageRank • Dataset: – Twitter Follower Graph – 41,652,230 vertices (users) – 1,468,365,182 edges (followings) – 12 GB input data 11/18/2014 Flink - M. Balassi & Gy. Fora 38
  • 39. PageRank results 11/18/2014 Flink - M. Balassi & Gy. Fora 39
  • 40. PageRank on Flink with " Delta Iterations • The algorithm runs 60 iterations until convergence (runtime includes convergence check) 100 minutes 8.5 minutes 11/18/2014 Flink - M. Balassi & Gy. Fora 40
  • 41. Again, explaining the difference On average, a (delta) interation runs for 6.7 seconds Flink (Bulk): 48 sec. Spark (Bulk): 99 sec. 11/18/2014 Flink - M. Balassi & Gy. Fora 41
  • 42. Streaming performance 11/18/2014 Flink - M. Balassi & Gy. Fora 42
  • 43. Flink Roadmap • Flink has a major release every 3 months," with >=1 big-fixing releases in-between • Finer-grained fault tolerance • Logical (SQL-like) field addressing • Python API • Flink Streaming, Lambda architecture support • Flink on Tez • ML on Flink (e.g., Mahout DSL) • Graph DSL on Flink • … and more 11/18/2014 Flink - M. Balassi & Gy. Fora 43
  • 44. 11/18/2014 Flink - M. Balassi & Gy. Fora 44
  • 45. http://flink.incubator.apache.org github.com/apache/incubator-flink @ApacheFlink Marton Balassi, Gyula Fora" {mbalassi, gyfora}@apache.org