Till Rohrmann
Flink committer
trohrmann@apache.org
@stsffap
Machine Learning with Apache Flink
What is Flink
§  Large-scale data processing engine
§  Easy and powerful APIs for batch and real-time
streaming analysis (Java / Scala)
§  Backed by a very robust execution backend
•  with true streaming capabilities,
•  custom memory manager,
•  native iteration execution,
•  and a cost-based optimizer.
Technology inside Flink
§  Technology inspired by compilers +
MPP databases + distributed systems
§  For ease of use, reliable performance,
and scalability
case class Path(from: Long, to: Long)

val tc = edges.iterate(10) {
  paths: DataSet[Path] =>
    val next = paths
      .join(edges)
      .where("to")
      .equalTo("from") {
        (path, edge) =>
          Path(path.from, edge.to)
      }
      .union(paths)
      .distinct()
    next
}
[Figure: components of the Flink stack across pre-flight (client), master, and workers: cost-based optimizer, type extraction stack, memory manager, out-of-core algorithms, real-time streaming, task scheduling, recovery metadata, data serialization stack, streaming network stack, ...]
How do you use Flink?
Example: WordCount
case class Word(word: String, frequency: Int)

val env = ExecutionEnvironment.getExecutionEnvironment

val lines = env.readTextFile(...)

lines
  .flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .groupBy("word").sum("frequency")
  .print()

env.execute()
Flink has mirrored Java and Scala APIs that offer the same
functionality, including by-name addressing.
Flink API in a Nutshell
§  map, flatMap, filter,
groupBy, reduce,
reduceGroup,
aggregate, join,
coGroup, cross,
project, distinct, union,
iterate, iterateDelta, ...
§  All Hadoop input
formats are supported
§  API similar for data sets
and data streams with
slightly different
operator semantics
§  Window functions for
data streams
§  Counters,
accumulators, and
broadcast variables
Machine learning with Flink
Does ML work like that?
More realistic scenario!
Machine learning pipelines
§  Pipelining inspired by scikit-learn
§  Transformer: Modify data
§  Learner: Train a model
§  Reusable components
§  Lets you quickly build ML pipelines
§  Model inherits pipeline of learner
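
The pipeline idea can be sketched in plain Scala, without any Flink dependency. Everything below is a hypothetical illustration: the `Transformer`, `Learner`, and `Model` traits, `PolynomialFeatures`, and `LastFeatureLearner` are assumed names and are not the Flink-ML API. The sketch shows how chaining a transformer with a learner yields a new learner, and how the fitted model inherits the transformation.

```scala
// Hypothetical, simplified interfaces; not the actual Flink-ML API.
object PipelineSketch {
  type Example = (Vector[Double], Double) // (features, label)

  trait Model { def predict(x: Vector[Double]): Double }
  trait Learner { def fit(data: Seq[Example]): Model }

  // A transformer modifies the data.
  trait Transformer { self =>
    def transform(data: Seq[Example]): Seq[Example]

    // Chaining a transformer with a learner yields a new learner.
    def chain(learner: Learner): Learner = new Learner {
      def fit(data: Seq[Example]): Model = {
        val inner = learner.fit(self.transform(data))
        // The model inherits the pipeline: it applies the same
        // transformation to its input before predicting.
        new Model {
          def predict(x: Vector[Double]): Double =
            inner.predict(self.transform(Seq((x, 0.0))).head._1)
        }
      }
    }
  }

  // Maps a single feature x to (x, x^2, ..., x^degree).
  case class PolynomialFeatures(degree: Int) extends Transformer {
    def transform(data: Seq[Example]): Seq[Example] =
      data.map { case (x, y) =>
        (Vector.tabulate(degree)(d => math.pow(x.head, d + 1)), y)
      }
  }

  // Toy learner: least-squares fit of one weight on the last feature.
  object LastFeatureLearner extends Learner {
    def fit(data: Seq[Example]): Model = {
      val num = data.map { case (x, y) => x.last * y }.sum
      val den = data.map { case (x, _) => x.last * x.last }.sum
      val w = num / den
      new Model { def predict(x: Vector[Double]): Double = w * x.last }
    }
  }
}
```

With training data drawn from y = 3x², `PolynomialFeatures(2).chain(LastFeatureLearner).fit(data)` learns the weight on the squared feature, and the resulting model accepts raw, untransformed inputs.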
Linear regression in polynomial space
val polynomialBase = PolynomialBase()
val learner = MultipleLinearRegression()

val pipeline = polynomialBase.chain(learner)

val trainingDS = env.fromCollection(trainingData)

val parameters = ParameterMap()
  .add(PolynomialBase.Degree, 3)
  .add(MultipleLinearRegression.Stepsize, 0.002)
  .add(MultipleLinearRegression.Iterations, 100)

val model = pipeline.fit(trainingDS, parameters)
[Figure: Input Data → Polynomial Base Mapper → Multiple Linear Regression → Linear Model]
Current state of Flink-ML
§  Existing learners
•  Multiple linear regression
•  Alternating least squares
•  Communication efficient distributed dual
coordinate ascent (PR pending)
§  Feature transformer
•  Polynomial base feature mapper
§  Tooling
Distributed linear algebra
§  Linear algebra universal
language for data
analysis
§  High-level abstraction
§  Fast prototyping
§  Pre- and post-processing
step
Example: Gaussian non-negative matrix
factorization
§  Given input matrix V, find W and H such that
$$V \approx WH$$
§  Iterative approximation
$$H_{t+1} = H_t \ast \left( W_t^T V \,/\, W_t^T W_t H_t \right)$$
$$W_{t+1} = W_t \ast \left( V H_{t+1}^T \,/\, W_t H_{t+1} H_{t+1}^T \right)$$
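var i = 0
var H: CheckpointedDrm[Int] = randomMatrix(k, V.numCols)
var W: CheckpointedDrm[Int] = randomMatrix(V.numRows, k)

while (i < maxIterations) {
  // * and / are element-wise, %*% is the matrix product. All three have
  // the same precedence in Scala, so the numerator and the denominator
  // need explicit parentheses to match the update formulas.
  H = H * ((W.t %*% V) / (W.t %*% W %*% H))
  W = W * ((V %*% H.t) / (W %*% H %*% H.t))
  i += 1
}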
Why is Flink a good fit for ML?
Flink’s features
§  Stateful iterations
•  Keep state across iterations
§  Delta iterations
•  Limit computation to elements which matter
§  Pipelining
•  Avoiding materialization of large
intermediate state
CoCoA
$$\min_{w \in \mathbb{R}^d} P(w) := \left[ \frac{\lambda}{2} \lVert w \rVert^2 + \frac{1}{n} \sum_{i=1}^{n} \ell_i\left( w^T x_i \right) \right]$$
Bulk Iterations
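[Figure: bulk iteration dataflow. The initial solution becomes the partial solution; the step function (operators X and Y, with access to other datasets) computes the next partial solution, which replaces the previous one; the last partial solution is the iteration result.]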
Delta iterations
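[Figure: delta iteration dataflow. Starting from an initial solution and an initial workset, the step function (operators A, B, X, and Y, with access to other datasets) consumes the workset and the partial solution and produces a delta set and the next workset; the deltas are merged into the partial solution, the workset is replaced, and the final partial solution is the iteration result.]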
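
The difference between bulk and delta iterations can be illustrated with a small sketch in plain Scala collections. It does not use Flink's iteration operators; the object name, the graph representation, and the choice of connected components by minimum-label propagation are assumptions made for illustration. Both functions return the final labels and the number of elements processed in each round.

```scala
// Plain-collections illustration; not Flink's iteration operators.
object DeltaIterationSketch {
  // Undirected graph as adjacency sets.
  type Graph = Map[Int, Set[Int]]

  // Bulk iteration: every vertex is recomputed in every round and the
  // whole partial solution is replaced.
  def bulk(graph: Graph): (Map[Int, Int], List[Int]) = {
    var labels = graph.keys.map(v => v -> v).toMap
    var processed = List.empty[Int]
    var changed = true
    while (changed) {
      val next = graph.map { case (v, ns) =>
        v -> (ns.map(labels) + labels(v)).min
      }
      processed = processed :+ graph.size
      changed = next != labels
      labels = next
    }
    (labels, processed)
  }

  // Delta iteration: the workset holds only the vertices whose label
  // changed in the previous round; only they send messages.
  def delta(graph: Graph): (Map[Int, Int], List[Int]) = {
    var solution = graph.keys.map(v => v -> v).toMap
    var workset: Set[Int] = graph.keySet
    var processed = List.empty[Int]
    while (workset.nonEmpty) {
      processed = processed :+ workset.size
      // Each workset vertex proposes its label to its neighbours.
      val candidates = for {
        v <- workset.toSeq
        n <- graph(v).toSeq
      } yield n -> solution(v)
      // Delta set: neighbours that receive a smaller label.
      val deltaSet = candidates
        .groupBy(_._1)
        .map { case (n, cs) => n -> cs.map(_._2).min }
        .filter { case (n, label) => label < solution(n) }
      solution = solution ++ deltaSet // merge deltas
      workset = deltaSet.keySet       // replace workset
    }
    (solution, processed)
  }
}
```

On a graph made of a path 1-2-3-4 and a separate edge 5-6, both variants find the same components, but the bulk version touches all six vertices in each of its rounds while the workset of the delta version shrinks from round to round.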
Effect of delta iterations
[Figure: number of elements updated (y-axis, 0 to 45,000,000) per iteration (x-axis, iterations 1 to 61).]
Iteration performance
[Figure: bar chart of runtime in minutes (y-axis, 0 to 60) for Hadoop MapReduce, Flink bulk, and Flink delta, with bars for 30 and 61 iterations.]
61 iterations and 30 iterations of PageRank on a Twitter follower graph with Hadoop MapReduce and Flink using bulk and delta iterations
How to factorize really large
matrices?
Collaborative Filtering
§  Recommend items based on users with
similar preferences
§  Latent factor models capture underlying
characteristics of items and preferences
of users
§  Predicted preference:
$$\hat{r}_{u,i} = x_u^T y_i$$
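
As a minimal sketch of the predicted preference, assuming user and item factors are stored as plain arrays of equal length (the object and function names are hypothetical):

```scala
object PreferenceSketch {
  // Predicted preference: dot product of the user factor vector x_u
  // and the item factor vector y_i.
  def predict(xu: Array[Double], yi: Array[Double]): Double = {
    require(xu.length == yi.length, "factor vectors must have the same length")
    xu.zip(yi).map { case (a, b) => a * b }.sum
  }
}
```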
Matrix factorization
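$$\min_{X,Y} \sum_{r_{u,i} \neq 0} \left( r_{u,i} - x_u^T y_i \right)^2 + \lambda \left( \sum_u n_u \lVert x_u \rVert^2 + \sum_i n_i \lVert y_i \rVert^2 \right)$$
$$R \approx X^T Y$$
[Figure: the rating matrix R approximated by the product of the factor matrices X and Y.]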
Alternating least squares
§  Fixing one matrix gives a quadratic form
§  Solution is guaranteed to decrease the overall cost function
§  To calculate $x_u$, all rated item vectors and ratings are needed
$$x_u = \left( Y S^u Y^T + \lambda n_u I \right)^{-1} Y r_u^T$$
$$S^u_{ii} = \begin{cases} 1 & \text{if } r_{u,i} \neq 0 \\ 0 & \text{else} \end{cases}$$
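
The closed-form update for a user vector follows from setting the gradient of the per-user cost to zero while Y is held fixed. A short derivation, assuming that Y holds the item vectors $y_i$ as columns and that $r_u$ is row u of R (consistent with $R \approx X^T Y$):

```latex
\begin{aligned}
f(x_u) &= \sum_{i:\, r_{u,i} \neq 0} \left( r_{u,i} - x_u^T y_i \right)^2
          + \lambda n_u \lVert x_u \rVert^2 \\
\nabla f(x_u) &= -2 \sum_{i:\, r_{u,i} \neq 0} \left( r_{u,i} - x_u^T y_i \right) y_i
          + 2 \lambda n_u x_u = 0 \\
\left( \sum_{i:\, r_{u,i} \neq 0} y_i y_i^T + \lambda n_u I \right) x_u
       &= \sum_{i:\, r_{u,i} \neq 0} r_{u,i}\, y_i \\
x_u &= \left( Y S^u Y^T + \lambda n_u I \right)^{-1} Y r_u^T
\end{aligned}
```

The last step uses $\sum_{i:\, r_{u,i} \neq 0} y_i y_i^T = Y S^u Y^T$ and, because unrated entries of $r_u$ are zero, $\sum_{i:\, r_{u,i} \neq 0} r_{u,i}\, y_i = Y r_u^T$.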
Data partitioning
Naïve ALS
case class Rating(userID: Int, itemID: Int, rating: Double)
case class ColumnVector(columnIndex: Int, vector: Array[Double])

val items: DataSet[ColumnVector] = _
val ratings: DataSet[Rating] = _

// Generate tuples of items with their ratings
val uVA = items.join(ratings).where(0).equalTo(1) {
  (item, ratingEntry) => {
    val Rating(uID, _, rating) = ratingEntry
    (uID, rating, item.vector)
  }
}
Naïve ALS contd.
uVA.groupBy(0).reduceGroup {
  vectors => {
    var uID = -1
    val matrix = FloatMatrix.zeros(factors, factors)
    val vector = FloatMatrix.zeros(factors)
    var n = 0

    for((id, rating, v) <- vectors) {
      uID = id
      vector += rating * v
      matrix += outerProduct(v, v)
      n += 1
    }

    for(idx <- 0 until factors) {
      matrix(idx, idx) += lambda * n
    }

    new ColumnVector(uID, Solve(matrix, vector))
  }
}
Problems of naïve ALS
§  Problem:
•  Item vectors are sent redundantly → high network load
§  Solution:
•  Blocking of user and item vectors to share common data
•  Avoids blown-up intermediate state
Data partitioning
Performance comparison
•  40 node GCE cluster, highmem-8
•  10 ALS iterations with 50 latent factors
[Figure: runtime in minutes (y-axis, 0 to 900) versus number of non-zero entries in billions (x-axis, 0 to 30) for Blocked ALS, Blocked ALS highmem-16, and Naive ALS; data points are annotated 1h, 2.5h, 5.5h, and 14h.]
Table 2
Entries in billion Naive Join Naive Join Broadcast Broadcast
80 0.08 201.326 3.35543333333333 190.723 3.17871666666667
Streaming machine learning
Why is streaming ML important?
§  Spam detection in mails
§  Patterns might change over time
§  Retraining of model necessary
§  Best solution: Online models
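
An online model updates itself with every labeled example instead of being retrained from scratch. The sketch below uses an online perceptron as a stand-in; the slides do not name a specific algorithm, so the choice of model, the feature encoding, and all names are assumptions made for illustration.

```scala
object OnlineModelSketch {
  // Online perceptron: labels are +1 (spam) and -1 (not spam).
  final case class Perceptron(weights: Vector[Double], bias: Double) {
    def predict(x: Vector[Double]): Int = {
      val score = weights.zip(x).map { case (w, v) => w * v }.sum + bias
      if (score >= 0) 1 else -1
    }

    // The model changes only when it misclassifies the new example,
    // so it can follow patterns that shift over time.
    def update(x: Vector[Double], label: Int): Perceptron =
      if (predict(x) == label) this
      else Perceptron(
        weights.zip(x).map { case (w, v) => w + label * v },
        bias + label)
  }

  // Consumes a stream of (features, label) pairs one element at a time.
  def train(stream: Iterator[(Vector[Double], Int)], dim: Int): Perceptron =
    stream.foldLeft(Perceptron(Vector.fill(dim)(0.0), 0.0)) {
      case (model, (x, y)) => model.update(x, y)
    }
}
```

Because `train` folds over an iterator, the same update rule applies whether the examples come from a finite collection or from an unbounded source.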
Applications
§  Spam detection
§  Recommendation
§  News feed
personalization
§  Credit card fraud
detection
Apache SAMOA
§  Scalable Advanced Massive Online
Analysis
§  Distributed streaming machine learning
framework
§  Incubation at the Apache Software
Foundation
§  Runs on multiple streaming processing
engines (S4, Storm, Samza)
§  Support for Flink is pending (open pull request)
Supported algorithms
§  Classification: Vertical
Hoeffding Tree
§  Clustering: CluStream
§  Regression: Adaptive
Model Rules
§  Frequent pattern mining:
PARMA
Closing
Flink-ML Outlook
§  Support more algorithms
§  Support for distributed linear algebra
§  Integration with streaming machine learning
§  Interactive programs and Zeppelin
flink.apache.org
@ApacheFlink

Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
 
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
 
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in BerlinInteractive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 

Dernier

Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...masabamasaba
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...masabamasaba
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationJuha-Pekka Tolvanen
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburgmasabamasaba
 

Dernier (20)

Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security Program
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 

Machine Learning with Apache Flink at Stockholm Machine Learning Group

  • 2. What is Flink
      § Large-scale data processing engine
      § Easy and powerful APIs for batch and real-time streaming analysis (Java / Scala)
      § Backed by a very robust execution backend
          • with true streaming capabilities,
          • custom memory manager,
          • native iteration execution,
          • and a cost-based optimizer.
  • 3. Technology inside Flink
      § Technology inspired by compilers + MPP databases + distributed systems
      § For ease of use, reliable performance, and scalability

      case class Path(from: Long, to: Long)
      val tc = edges.iterate(10) {
        paths: DataSet[Path] =>
          val next = paths
            .join(edges)
            .where("to")
            .equalTo("from") {
              (path, edge) =>
                Path(path.from, edge.to)
            }
            .union(paths)
            .distinct()
          next
      }

      [Diagram: components across the stages Pre-flight (client), Master, and Workers: cost-based optimizer, type extraction stack, memory manager, out-of-core algos, real-time streaming, task scheduling, recovery metadata, data serialization stack, streaming network stack, ...]
  • 4. How do you use Flink?
  • 5. Example: WordCount

      import org.apache.flink.api.scala._

      case class Word(word: String, frequency: Int)

      val env = ExecutionEnvironment.getExecutionEnvironment

      val lines = env.readTextFile(...)

      lines
        // split each line into words and emit one Word per occurrence
        .flatMap { line => line.split(" ").map(word => Word(word, 1)) }
        // group by the field name and sum the counts
        .groupBy("word").sum("frequency")
        .print()

      env.execute()

      Flink has mirrored Java and Scala APIs that offer the same functionality, including by-name addressing.
  • 6. Flink API in a Nutshell
      § map, flatMap, filter, groupBy, reduce, reduceGroup, aggregate, join, coGroup, cross, project, distinct, union, iterate, iterateDelta, ...
      § All Hadoop input formats are supported
      § API similar for data sets and data streams with slightly different operator semantics
      § Window functions for data streams
      § Counters, accumulators, and broadcast variables
  • 8. Does ML work like that?
  • 10. Machine learning pipelines
      § Pipelining inspired by scikit-learn
      § Transformer: Modify data
      § Learner: Train a model
      § Reusable components
      § Lets you quickly build ML pipelines
      § Model inherits pipeline of learner
  • 11. Linear regression in polynomial space

      val polynomialBase = PolynomialBase()
      val learner = MultipleLinearRegression()

      val pipeline = polynomialBase.chain(learner)

      val trainingDS = env.fromCollection(trainingData)

      val parameters = ParameterMap()
        .add(PolynomialBase.Degree, 3)
        .add(MultipleLinearRegression.Stepsize, 0.002)
        .add(MultipleLinearRegression.Iterations, 100)

      val model = pipeline.fit(trainingDS, parameters)

      [Diagram: Input Data → Polynomial Base Mapper → Multiple Linear Regression → Linear Model]
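
The chaining idea from slides 10 and 11 can be sketched with plain Scala collections. This is only an illustration under stated assumptions: `PipelineSketch`, its `Transformer` and `Learner` traits, and the toy gradient-descent learner are hypothetical names invented here, not the Flink-ML API. The point it shows is that the fitted model applies the same transformer before predicting, which is what "model inherits pipeline of learner" means.

```scala
// Hypothetical sketch, not the Flink-ML API.
object PipelineSketch {
  // A fitted model maps a feature vector to a prediction.
  type Model = Vector[Double] => Double
  // Training data: (features, label) pairs.
  type Data = Seq[(Vector[Double], Double)]

  def dot(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (p, q) => p * q }.sum

  trait Learner { def fit(data: Data): Model }

  trait Transformer { self =>
    def transform(x: Vector[Double]): Vector[Double]

    // Chaining yields a learner that transforms its input before training.
    def chain(learner: Learner): Learner = new Learner {
      def fit(data: Data): Model = {
        val inner = learner.fit(data.map { case (x, y) => (self.transform(x), y) })
        // The model inherits the pipeline: it transforms before predicting.
        x => inner(self.transform(x))
      }
    }
  }

  // Maps every feature v to (v, v^2, ..., v^degree).
  final case class PolynomialBase(degree: Int) extends Transformer {
    def transform(x: Vector[Double]): Vector[Double] =
      x.flatMap(v => (1 to degree).map(d => math.pow(v, d)))
  }

  // Toy linear regression trained by batch gradient descent on the mean squared error.
  final case class LinearRegression(stepsize: Double, iterations: Int) extends Learner {
    def fit(data: Data): Model = {
      val dim = data.head._1.length
      var w = Vector.fill(dim)(0.0)
      var b = 0.0
      for (_ <- 0 until iterations) {
        val errs = data.map { case (x, y) => (x, dot(w, x) + b - y) }
        val gw = (0 until dim).map(j => errs.map { case (x, e) => e * x(j) }.sum / data.size)
        val gb = errs.map(_._2).sum / data.size
        w = w.zip(gw).map { case (wi, gi) => wi - stepsize * gi }
        b -= stepsize * gb
      }
      x => dot(w, x) + b
    }
  }
}
```

A call such as `PipelineSketch.PolynomialBase(2).chain(PipelineSketch.LinearRegression(0.1, 2000)).fit(data)` returns a model that accepts raw, untransformed inputs.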
  • 12. Current state of Flink-ML
      § Existing learners
          • Multiple linear regression
          • Alternating least squares
          • Communication efficient distributed dual coordinate ascent (PR pending)
      § Feature transformer
          • Polynomial base feature mapper
      § Tooling
  • 13. Distributed linear algebra
      § Linear algebra universal language for data analysis
      § High-level abstraction
      § Fast prototyping
      § Pre- and post-processing step
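  • 14. Example: Gaussian non-negative matrix factorization
      § Given input matrix V, find W and H such that $V \approx WH$
      § Iterative approximation ($\ast$ and $/$ are element-wise):
          $H_{t+1} = H_t \ast \left( W_t^T V \,/\, W_t^T W_t H_t \right)$
          $W_{t+1} = W_t \ast \left( V H_{t+1}^T \,/\, W_t H_{t+1} H_{t+1}^T \right)$

      var i = 0
      var H: CheckpointedDrm[Int] = randomMatrix(k, V.numCols)
      var W: CheckpointedDrm[Int] = randomMatrix(V.numRows, k)

      while (i < maxIterations) {
        // %*%, * and / share one precedence level in Scala, so numerator
        // and denominator are parenthesized explicitly
        H = H * ((W.t %*% V) / (W.t %*% W %*% H))
        W = W * ((V %*% H.t) / (W %*% H %*% H.t))
        i += 1
      }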
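
The multiplicative updates on slide 14 can be tried locally on small dense matrices. The following is a minimal single-machine sketch in plain Scala, not the distributed linear algebra DSL shown on the slide; `NmfSketch`, its helper functions, the random initialization, and the small `eps` guard against division by zero are assumptions made for illustration.

```scala
// Hypothetical single-machine sketch of the updates on slide 14.
object NmfSketch {
  type Matrix = Array[Array[Double]]

  def matmul(a: Matrix, b: Matrix): Matrix = {
    val bt = b.transpose
    a.map(row => bt.map(col => row.zip(col).map { case (x, y) => x * y }.sum))
  }

  // Element-wise a * (num / den); eps guards against division by zero.
  def update(a: Matrix, num: Matrix, den: Matrix, eps: Double = 1e-12): Matrix =
    a.indices.map { i =>
      a(i).indices.map(j => a(i)(j) * num(i)(j) / (den(i)(j) + eps)).toArray
    }.toArray

  // Frobenius norm of V - WH.
  def frobeniusError(v: Matrix, w: Matrix, h: Matrix): Double = {
    val wh = matmul(w, h)
    math.sqrt(v.indices.flatMap(i => v(i).indices.map(j => math.pow(v(i)(j) - wh(i)(j), 2))).sum)
  }

  def factorize(v: Matrix, k: Int, maxIterations: Int, seed: Long = 42L): (Matrix, Matrix) = {
    val rnd = new scala.util.Random(seed)
    // Strictly positive random start, so all entries stay non-negative.
    var w: Matrix = Array.fill(v.length, k)(rnd.nextDouble() + 0.1)
    var h: Matrix = Array.fill(k, v(0).length)(rnd.nextDouble() + 0.1)
    for (_ <- 0 until maxIterations) {
      val wt = w.transpose
      h = update(h, matmul(wt, v), matmul(matmul(wt, w), h)) // H_{t+1}
      val ht = h.transpose
      w = update(w, matmul(v, ht), matmul(matmul(w, h), ht)) // W_{t+1}, uses H_{t+1}
    }
    (w, h)
  }
}
```

For a small non-negative matrix `v`, `NmfSketch.factorize(v, k, maxIterations)` returns the pair `(W, H)`, and `NmfSketch.frobeniusError(v, w, h)` measures how close `WH` is to `V`.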
  • 15. Why is Flink a good fit for ML?
  • 16. Flink’s features
      § Stateful iterations
          • Keep state across iterations
      § Delta iterations
          • Limit computation to elements which matter
      § Pipelining
          • Avoiding materialization of large intermediate state
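
To make "limit computation to elements which matter" concrete, here is a minimal sketch of a delta iteration written with plain Scala collections. It does not use Flink's `iterateDelta` operator; the choice of connected components as the algorithm and every name in `DeltaIterationSketch` are assumptions made for illustration. The solution set is the state kept across iterations, and the workset holds only the vertices whose label changed in the previous round.

```scala
// Hypothetical sketch of the delta-iteration idea, not Flink's iterateDelta.
object DeltaIterationSketch {
  // Returns the component label of every vertex and the workset size per iteration.
  def connectedComponents(vertices: Set[Int], edges: Seq[(Int, Int)]): (Map[Int, Int], List[Int]) = {
    // Undirected adjacency lists.
    val neighbours: Map[Int, Seq[Int]] =
      (edges ++ edges.map(_.swap)).groupBy(_._1).map { case (v, es) => v -> es.map(_._2) }

    // Solution set: vertex -> smallest label seen so far (kept across iterations).
    var solution: Map[Int, Int] = vertices.map(v => v -> v).toMap
    // Workset: only the vertices whose label changed in the previous iteration.
    var workset: Map[Int, Int] = solution
    var sizes = List.empty[Int]

    while (workset.nonEmpty) {
      sizes = workset.size :: sizes
      // Each changed vertex proposes its label to its neighbours.
      val candidates = for {
        (v, label) <- workset.toSeq
        n <- neighbours.getOrElse(v, Seq.empty)
      } yield n -> label
      // Keep the smallest proposal per vertex, and only if it improves the solution set.
      val delta = candidates.groupBy(_._1)
        .map { case (v, cs) => v -> cs.map(_._2).min }
        .filter { case (v, label) => label < solution(v) }
      solution = solution ++ delta
      workset = delta
    }
    (solution, sizes.reverse)
  }
}
```

On the vertices 1 to 5 with edges (1,2), (2,3), and (4,5), the workset shrinks from 5 to 3 to 1 elements before the loop stops, so later iterations touch only a fraction of the graph.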
  • 20. Effect of delta iterations
      [Plot: number of elements updated (y-axis, 0 to 45,000,000) per iteration (x-axis, 1 to 61)]
  • 21. Iteration performance
      [Bar chart: time in minutes (0 to 60) for Hadoop MapReduce, Flink bulk, and Flink delta; series: 30 iterations, 61 iterations]
      61 iterations and 30 iterations of PageRank on a Twitter follower graph with Hadoop MapReduce and Flink using bulk and delta iterations
  • 22. How to factorize really large matrices?
  • 23. Collaborative Filtering
      § Recommend items based on users with similar preferences
      § Latent factor models capture underlying characteristics of items and preferences of user
      § Predicted preference: $\hat{r}_{u,i} = x_u^T y_i$
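  • 24. Matrix factorization
      $\min_{X,Y} \sum_{r_{u,i} \neq 0} \left( r_{u,i} - x_u^T y_i \right)^2 + \lambda \left( \sum_u n_u \lVert x_u \rVert^2 + \sum_i n_i \lVert y_i \rVert^2 \right)$
      $R \approx X^T Y$
      [Diagram: rating matrix R approximated by the product of the factor matrices X and Y]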
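
The objective on slide 24 can be evaluated directly for small inputs, which helps when checking an implementation. The sketch below is plain Scala under stated assumptions: `AlsObjectiveSketch` and its function names are hypothetical, the factor vectors are held in in-memory maps keyed by user and item ID, and $n_u$ and $n_i$ are taken to be the number of ratings of user $u$ and item $i$.

```scala
// Hypothetical sketch: evaluates the regularized squared error from slide 24.
object AlsObjectiveSketch {
  final case class Rating(userID: Int, itemID: Int, rating: Double)

  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (p, q) => p * q }.sum

  def objective(
      ratings: Seq[Rating],          // only the observed (non-zero) ratings
      x: Map[Int, Array[Double]],    // user factors x_u
      y: Map[Int, Array[Double]],    // item factors y_i
      lambda: Double): Double = {
    // Sum of squared errors over the observed ratings.
    val squaredError =
      ratings.map(r => math.pow(r.rating - dot(x(r.userID), y(r.itemID)), 2)).sum
    // n_u and n_i: number of ratings per user and per item (assumption).
    val nu = ratings.groupBy(_.userID).map { case (u, rs) => u -> rs.size }
    val ni = ratings.groupBy(_.itemID).map { case (i, rs) => i -> rs.size }
    val regularization =
      nu.map { case (u, n) => n * dot(x(u), x(u)) }.sum +
        ni.map { case (i, n) => n * dot(y(i), y(i)) }.sum
    squaredError + lambda * regularization
  }
}
```

Evaluating `objective` before and after a round of factor updates gives a quick check that the cost does not grow.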
  • 25. Alternating least squares
      § Fixing one matrix gives a quadratic form
          $x_u = \left( Y S^u Y^T + \lambda n_u I \right)^{-1} Y r_u^T$
          $S^u_{ii} = 1$ if $r_{u,i} \neq 0$, else $0$
      § Solution is guaranteed to decrease the overall cost function
      § To calculate $x_u$, all rated item vectors and ratings are needed
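
The per-user update on slide 25 amounts to building a small linear system and solving it. Below is a minimal sketch in plain Scala, assuming the rated item vectors and ratings of one user are already collected in memory; `AlsUpdateSketch`, `updateUser`, and the hand-written Gaussian elimination are illustrative choices, not the solver used in the deck. It relies on $Y S^u Y^T = \sum_{i\,\text{rated}} y_i y_i^T$ and $Y r_u^T = \sum_{i\,\text{rated}} r_{u,i}\, y_i$.

```scala
// Hypothetical sketch of the user-vector update from slide 25.
object AlsUpdateSketch {
  // Solves A x = b by Gaussian elimination with partial pivoting.
  def solve(a: Array[Array[Double]], b: Array[Double]): Array[Double] = {
    val n = b.length
    val m = a.indices.map(i => a(i) :+ b(i)).toArray // augmented copy [A | b]
    for (col <- 0 until n) {
      val pivot = (col until n).maxBy(r => math.abs(m(r)(col)))
      val tmp = m(col); m(col) = m(pivot); m(pivot) = tmp
      for (row <- col + 1 until n) {
        val f = m(row)(col) / m(col)(col)
        for (j <- col to n) m(row)(j) -= f * m(col)(j)
      }
    }
    // Back substitution.
    val x = new Array[Double](n)
    for (row <- n - 1 to 0 by -1) {
      val s = (row + 1 until n).map(j => m(row)(j) * x(j)).sum
      x(row) = (m(row)(n) - s) / m(row)(row)
    }
    x
  }

  // rated: (item vector y_i, rating r_ui) for every item the user rated.
  def updateUser(rated: Seq[(Array[Double], Double)], lambda: Double): Array[Double] = {
    val k = rated.head._1.length
    val nu = rated.size                 // n_u: number of ratings of this user
    val a = Array.ofDim[Double](k, k)   // accumulates sum of y_i y_i^T
    val b = new Array[Double](k)        // accumulates sum of r_ui * y_i
    for ((y, r) <- rated; p <- 0 until k) {
      b(p) += r * y(p)
      for (q <- 0 until k) a(p)(q) += y(p) * y(q)
    }
    for (p <- 0 until k) a(p)(p) += lambda * nu // + lambda * n_u * I
    solve(a, b)
  }
}
```

Because only the rated items enter the sums, each user update needs exactly the item vectors and ratings listed in the last bullet of the slide.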
  • 27. Naïve ALS

      case class Rating(userID: Int, itemID: Int, rating: Double)
      case class ColumnVector(columnIndex: Int, vector: Array[Double])

      val items: DataSet[ColumnVector] = _
      val ratings: DataSet[Rating] = _

      // Generate tuples of items with their ratings
      val uVA = items.join(ratings).where(0).equalTo(1) {
        (item, ratingEntry) => {
          val Rating(uID, _, rating) = ratingEntry
          (uID, rating, item.vector)
        }
      }

  • 28. Naïve ALS contd.

      uVA.groupBy(0).reduceGroup {
        vectors => {
          var uID = -1
          val matrix = FloatMatrix.zeros(factors, factors)
          val vector = FloatMatrix.zeros(factors)
          var n = 0

          for ((id, rating, v) <- vectors) {
            uID = id
            vector += rating * v
            matrix += outerProduct(v, v)
            n += 1
          }

          for (idx <- 0 until factors) {
            matrix(idx, idx) += lambda * n
          }

          new ColumnVector(uID, Solve(matrix, vector))
        }
      }
  • 29. Problems of naïve ALS
      § Problem:
          • Item vectors are sent redundantly → High network load
      § Solution:
          • Blocking of user and item vectors to share common data
          • Avoids blown up intermediate state
  • 31. Performance comparison
      • 40 node GCE cluster, highmem-8
      • 10 ALS iterations with 50 latent factors
      [Plot: runtime in minutes (0 to 900) against number of non-zero entries in billions (0 to 30); series: Blocked ALS, Blocked ALS highmem-16, Naive ALS; data labels: 5.5h, 14h, 2.5h, 1h]
      [Embedded chart data, Table 2 (column layout not recoverable): Entries in billion, Naive Join, Naive Join, Broadcast, Broadcast; values: 80, 0.08, 201.326, 3.35543333333333, 190.723, 3.17871666666667]
  • 33. Why is streaming ML important?
      § Spam detection in mails
      § Patterns might change over time
      § Retraining of model necessary
      § Best solution: Online models
  • 34. Applications
      § Spam detection
      § Recommendation
      § News feed personalization
      § Credit card fraud detection
  • 35. Apache SAMOA
      § Scalable Advanced Massive Online Analysis
      § Distributed streaming machine learning framework
      § Incubation at the Apache Software Foundation
      § Runs on multiple streaming processing engines (S4, Storm, Samza)
      § Support for Flink is a pending pull request
  • 36. Supported algorithms
      § Classification: Vertical Hoeffding Tree
      § Clustering: CluStream
      § Regression: Adaptive Model Rules
      § Frequent pattern mining: PARMA
  • 38. Flink-ML Outlook
      § Support more algorithms
      § Support for distributed linear algebra
      § Integration with streaming machine learning
      § Interactive programs and Zeppelin