SlideShare a Scribd company logo
1 of 67
Download to read offline
real time big data management
Albert Bifet (@abifet)
Paris, 7 October 2015
albert.bifet@telecom-paristech.fr
data streams
Big Data & Real Time
1
hadoop
Hadoop architecture deals with datasets, not
data streams
2
apache s4
Apache S4
3
apache s4
4
apache storm
Storm from Twitter
5
apache storm
Stream, Spout, Bolt, Topology
6
apache storm
Stream, Spout, Bolt, Topology
7
apache storm
Storm characteristics for real-time data processing workloads:
1 Fast
2 Scalable
3 Fault-tolerant
4 Reliable
5 Easy to operate
8
apache kafka from linkedin
Apache Kafka is a fast, scalable, durable, and fault-tolerant
publish-subscribe messaging system.
9
apache kafka from linkedin
Apache Kafka is a fast, scalable, durable, and fault-tolerant
publish-subscribe messaging system.
9
apache samza from linkedin
Storm and Samza are fairly similar. Both systems provide:
1 a partitioned stream model,
2 a distributed execution environment,
3 an API for stream processing,
4 fault tolerance,
5 Kafka integration
10
apache spark streaming
Spark Streaming is an extension of Spark that allows
processing data stream using micro-batches of data.
11
apache flink motivation
apache flink motivation
1 Real time computation: streaming computation
2 Fast, as there is not need to write to disk
3 Easy to write code
13
real time computation: streaming computation
MapReduce Limitations
Example
How compute in real time (latency less than 1 second):
1 predictions
2 frequent items as Twitter hashtags
3 sentiment analysis
14
easy to write code
case class Word ( word : String , frequency : I n t )
DataSet API (batch):
val lines : DataSet [ String ] = env . readTextFile ( . . . )
lines . flatMap { l i n e => l i n e . s p l i t ( ” ” )
.map( word => Word( word , 1 ) ) }
. groupBy ( ” word ” ) . sum( ” frequency ” )
. p r i n t ( )
15
easy to write code
case class Word ( word : String , frequency : I n t )
DataSet API (batch):
val lines : DataSet [ String ] = env . readTextFile ( . . . )
lines . flatMap { l i n e => l i n e . s p l i t ( ” ” )
.map( word => Word( word , 1 ) ) }
. groupBy ( ” word ” ) . sum( ” frequency ” )
. p r i n t ( )
DataStream API (streaming):
val lines : DataStream [ String ] = env . fromSocketStream ( . . . )
lines . flatMap { l i n e => l i n e . s p l i t ( ” ” )
.map( word => Word( word , 1 ) ) }
. window ( Time . of (5 ,SECONDS ) ) . every ( Time . of (1 ,SECONDS) )
. groupBy ( ” word ” ) . sum( ” frequency ” )
. p r i n t ( )
16
what is apache flink?
Figure 1: Apache Flink Overview
17
batch and streaming engines
Figure 2: Batch, streaming and hybrid data processing engines.
18
batch comparison
Figure 3: Comparison between Hadoop, Spark And Flink.
19
streaming comparison
Figure 4: Comparison between Storm, Spark And Flink.
20
scala language
• What is Scala?
• object oriented
• functional
• How is Scala?
• Scala is compatible
• Scala is concise
• Scala is high-level
• Scala is statically typed
21
short course on scala
• Easy to use: includes an interpreter
• Variables:
• val: immutable (preferable)
• var: mutable
• Scala treats everything as objects with methods
• Scala has first-class functions
• Functions:
def max( x : Int , y : I n t ) : I n t = {
i f ( x > y ) x
else y
}
def max2( x : Int , y : I n t ) = i f ( x > y ) x else y
22
short course on scala
• Functional:
args . foreach ( ( arg : String ) => p r i n t l n ( arg ) )
args . foreach ( arg => p r i n t l n ( arg ) )
args . foreach ( p r i n t l n )
• Imperative:
for ( arg <− args ) p r i n t l n ( arg )
• Scala achieves a conceptual simplicity by treating
everything, from arrays to expressions, as objects with
methods.
( 1 ) . + ( 2 )
greetStrings (0) = ” Hello ”
greetStrings . update (0 , ” Hello ” )
val numNames2 = Array . apply ( ” zero ” , ”one ” , ”two ” )23
short course on scala
• Array: mutable sequence of objects that share the same type
• List: immutable sequence of objects that share the same
type
• Tuple: immutable sequence of objects that does not share
the same type
val pair = (99 , ” Luftballons ” )
p r i n t l n ( pair . _1 )
p r i n t l n ( pair . _2 )
24
short course on scala
• Sets and maps
var jetSet = Set ( ” Boeing ” , ” Airbus ” )
jetSet += ” Lear ”
p r i n t l n ( jetSet . contains ( ” Cessna ” ) )
import scala . collection . mutable .Map
val treasureMap = Map[ Int , String ] ( )
treasureMap += (1 −> ”Go to island . ” )
treasureMap += (2 −> ” Find big X on ground . ” )
treasureMap += (3 −> ” Dig . ” )
p r i n t l n ( treasureMap ( 2 ) )
25
short course on scala
• Functional style
• Does not contain any var
def printArgs ( args : Array [ String ] ) : Unit = {
var i = 0
while ( i < args . length ) {
p r i n t l n ( args ( i ) )
i += 1
}
}
def printArgs ( args : Array [ String ] ) : Unit = {
for ( arg <− args )
p r i n t l n ( arg )
}
def printArgs ( args : Array [ String ] ) : Unit = {
args . foreach ( p r i n t l n )
}
def formatArgs ( args : Array [ String ] ) = args . mkString ( ”  n ” )
p r i n t l n ( formatArgs ( args ) ) 26
short course on scala
• Prefer vals, immutable objects, and methods without side
effects.
• Use vars, mutable objects, and methods with side effects
when you have a specific need and justification for them.
• In a Scala program, a semicolon at the end of a statement is
usually optional.
• A singleton object definition looks like a class definition,
except instead of the keyword class you use the keyword
object .
• Scala provides a trait, scala.Application:
object FallWinterSpringSummer extends Application {
for ( season <− List ( ” f a l l ” , ” winter ” , ” spring ” ) )
p r i n t l n ( season + ” : ”+ calculate ( season ) )
}
27
short course on scala
• Scala has first-class functions: you can write down
functions as unnamed literals and then pass them around as
values.
( x : I n t ) => x + 1
• Short forms of function literals
someNumbers . f i l t e r ( ( x : I n t ) => x > 0)
someNumbers . f i l t e r ( ( x ) => x > 0)
someNumbers . f i l t e r ( x => x > 0)
someNumbers . f i l t e r ( _ > 0)
someNumbers . foreach ( x => p r i n t l n ( x ) )
someNumbers . foreach ( p r i n t l n _ )
28
short course on scala
• Zipping lists: zip and unzip
• The zip operation takes two lists and forms a list of pairs:
• A useful special case is to zip a list with its index. This is done
most efficiently with the zipWithIndex method,
• Mapping over lists: map , flatMap and foreach
• Filtering lists: filter , partition , find , takeWhile , dropWhile ,
and span
• Folding lists: /: and : or foldLeft and foldRight.
( z / : List ( a , b , c ) ) ( op )
equals
op ( op ( op ( z , a ) , b ) , c )
• Sorting lists: sortWith
29
apache flink architecture
references
Apache Flink Documentation
http://dataartisans.github.io/flink-training/ 31
api introduction
Flink programs
1 Input from source
2 Apply operations
3 Output to source
32
batch and streaming apis
1 DataSet API
• Example: Map/Reduce paradigm
2 DataStream API
• Example: Live Stock Feed
33
streaming and batch comparison
34
architecture overview
Figure 5: The JobManager is the coordinator of the Flink system
TaskManagers are the workers that execute parts of the parallel
programs.
35
client
1 Optimize
2 Construct job graph
3 Pass job graph to job manager
4 Retrieve job results
36
job manager
1 Parallelization: Create Execution Graph
2 Scheduling: Assign tasks to task managers
3 State: Supervise the execution
37
task manager
1 Operations are split up into tasks depending on the specified
parallelism
2 Each parallel instance of an operation runs in a separate
task slot
3 The scheduler may run several tasks from different
operators in one task slot
38
component stack
39
component stack
1 API layer: implements multiple APIs that create operator
DAGs for their programs. Each API needs to provide utilities
(serializers, comparators) that describe the interaction
between its data types and the runtime.
2 Optimizer and common api layer: takes programs in the form
of operator DAGs. The operators are specific (e.g., Map,
Join, Filter, Reduce, …), but are data type agnostic.
3 Runtime layer: receives a program in the form of a
JobGraph. A JobGraph is a generic parallel data flow with
arbitrary tasks that consume and produce data streams.
40
flink topologies
Flink programs
1 Input from source
2 Apply operations
3 Output to source
41
sources (selection)
• Collection-based
• fromCollection
• fromElements
• File-based
• TextInputFormat
• CsvInputFormat
• Other
• SocketInputFormat
• KafkaInputFormat
• Databases
42
sinks (selection)
• File-based
• TextOutputFormat
• CsvOutputFormat
• PrintOutput
• Others
• SocketOutputFormat
• KafkaOutputFormat
• Databases
43
apache flink algorithns
flink skeleton program
1 Obtain an ExecutionEnvironment,
2 Load/create the initial data,
3 Specify transformations on this data,
4 Specify where to put the results of your computations,
5 Trigger the program execution
45
java wordcount example
public class WordCountExample {
public s ta t ic void main ( String [ ] args ) throws Exception {
f i n a l ExecutionEnvironment env =
ExecutionEnvironment . getExecutionEnvironment ( ) ;
DataSet < String > text = env . fromElements (
”Who’ s there ? ” ,
” I think I hear them . Stand , ho ! Who’ s there ? ” ) ;
DataSet <Tuple2 < String , Integer >> wordCounts = text
. flatMap (new L i n e S p l i t t e r ( ) )
. groupBy (0)
.sum( 1 ) ;
wordCounts . p r i n t ( ) ;
}
. . . .
}
46
java wordcount example
public class WordCountExample {
public s ta tic void main ( String [ ] args ) throws Exception {
. . . .
}
public s ta tic class L i n e S p l i t t e r implements
FlatMapFunction < String , Tuple2 < String , Integer >> {
@Override
public void flatMap ( String line , Collector <Tuple2 < String ,
Integer >> out ) {
for ( String word : l i n e . s p l i t ( ” ” ) ) {
out . collect (new Tuple2 < String , Integer >(word , 1 ) ) ;
}
}
}
}
47
scala wordcount example
import org . apache . f l i n k . api . scala . _
object WordCount {
def main ( args : Array [ String ] ) {
val env = ExecutionEnvironment . getExecutionEnvironment
val text = env . fromElements (
”Who’ s there ? ” ,
” I think I hear them . Stand , ho ! Who’ s there ? ” )
val counts = text . flatMap
{ _ . toLowerCase . s p l i t ( ”  W+”) f i l t e r { _ . nonEmpty } }
.map { ( _ , 1) }
. groupBy (0)
.sum(1)
counts . p r i n t ( )
}
}
48
java 8 wordcount example
public class WordCountExample {
public s tat ic void main ( String [ ] args ) throws Exception {
f i n a l ExecutionEnvironment env =
ExecutionEnvironment . getExecutionEnvironment ( ) ;
DataSet < String > text = env . fromElements (
”Who’ s there ?”
” I think I hear them . Stand , ho ! Who’ s there ? ” ) ;
text .map( l i n e −> l i n e . s p l i t ( ” ” ) )
. flatMap ( ( String [ ] wordArray ,
Collector <Tuple2 < String , Integer >> out )
−> Arrays . stream ( wordArray )
. forEach ( t −> out . collect (new Tuple2 < >( t , 1 ) ) )
)
. groupBy (0)
.sum(1)
. p r i n t ( ) ;
}
}
49
data streams algorithns
flink skeleton program
1 Obtain an StreamExecutionEnvironment,
2 Load/create the initial data,
3 Specify transformations on this data,
4 Specify where to put the results of your computations,
5 Trigger the program execution
51
java wordcount example
public class StreamingWordCount {
public st a t i c void main ( String [ ] args ) {
StreamExecutionEnvironment env =
StreamExecutionEnvironment . getExecutionEnvironment ( ) ;
DataStream <Tuple2 < String , Integer >> dataStream = env
. socketTextStream ( ” localhost ” , 9999)
. flatMap (new S p l i t t e r ( ) )
. groupBy (0)
.sum( 1 ) ;
dataStream . p r i n t ( ) ;
env . execute ( ” Socket Stream WordCount ” ) ;
}
. . . .
}
52
java wordcount example
public class StreamingWordCount {
public st a t i c void main ( String [ ] args ) throws Exception {
. . . .
}
public st a ti c class S p l i t t e r implements
FlatMapFunction < String , Tuple2 < String , Integer >> {
@Override
public void flatMap ( String sentence ,
Collector <Tuple2 < String , Integer >> out ) throws Exception
for ( String word : sentence . s p l i t ( ” ” ) ) {
out . collect (new Tuple2 < String , Integer >(word , 1 ) ) ;
}
}
}
}
53
scala wordcount example
object WordCount {
def main ( args : Array [ String ] ) {
val env = StreamExecutionEnvironment . getExecutionEnvironment
val text = env . socketTextStream ( ” localhost ” , 9999)
val counts = text . flatMap { _ . toLowerCase . s p l i t ( ”  W+”)
f i l t e r { _ . nonEmpty } }
.map { ( _ , 1) }
. groupBy (0)
.sum(1)
counts . p r i n t
env . execute ( ” Scala Socket Stream WordCount ” )
}
}
54
obtain an streamexecutionenvironment
The StreamExecutionEnvironment is the basis for all Flink programs.
StreamExecutionEnvironment . getExecutionEnvironment
StreamExecutionEnvironment . createLocalEnvironment ( parallelism )
StreamExecutionEnvironment . createRemoteEnvironment ( host : String ,
port : String , parallelism : Int , j a r F i l e s : String *)
env . socketTextStream ( host , port )
env . fromElements ( elements . . . )
env . addSource ( sourceFunction )
55
specify transformations on this data
• Map
• FlatMap
• Filter
• Reduce
• Fold
• Union
56
3) specify transformations on this data
Map
Takes one element and produces one element.
data .map { x => x . toInt }
57
3) specify transformations on this data
FlatMap
Takes one element and produces zero, one, or more elements.
data . flatMap { str => str . s p l i t ( ” ” ) }
58
3) specify transformations on this data
Filter
Evaluates a boolean function for each element and retains
those for which the function returns true.
data . f i l t e r { _ > 1000 }
59
3) specify transformations on this data
Reduce
Combines a group of elements into a single element by
repeatedly combining two elements into one. Reduce may be
applied on a full data set, or on a grouped data set.
data . reduce { _ + _ }
60
3) specify transformations on this data
Union
Produces the union of two data sets.
data . union ( data2 )
61
window operators
The user has different ways of using the result of a window
operation:
• windowedDataStream.flatten() - streams the results
element wise and returns a DataStream<T> where T is the
type of the underlying windowed stream
• windowedDataStream.getDiscretizedStream() -
returns a DataStream<StreamWindow<T» for applying
some advanced logic on the stream windows itself.
• Calling any window transformation further transforms the
windows, while preserving the windowing logic
dataStream . window ( Time . of (5 , TimeUnit .SECONDS) )
. every ( Time . of (1 , TimeUnit .SECONDS ) ) ;
dataStream . window ( Count . of (100))
. every ( Time . of (1 , TimeUnit .MINUTES ) ) ;
6 62
gelly: flink graph api
• Gelly is a Java Graph API for Flink.
• In Gelly, graphs can be transformed and modified using
high-level functions similar to the ones provided by the
batch processing API.
• In Gelly, a Graph is represented by a DataSet of vertices and
a DataSet of edges.
• The Graph nodes are represented by the Vertex type. A
Vertex is defined by a unique ID and a value.
// create a new vertex with a Long ID and a String value
Vertex <Long , String > v = new Vertex <Long , String >(1L , ” foo ” ) ;
// create a new vertex with a Long ID and no value
Vertex <Long , NullValue > v =
new Vertex <Long , NullValue >(1L , NullValue . getInstance ( ) ) ;
63
gelly: flink graph api
• The graph edges are represented by the Edge type.
• An Edge is defined by a source ID (the ID of the source
Vertex), a target ID (the ID of the target Vertex) and an
optional value.
• The source and target IDs should be of the same type as the
Vertex IDs. Edges with no value have a NullValue value type.
Edge<Long , Double > e = new Edge<Long , Double >(1L , 2L , 0 . 5 ) ;
// reverse the source and target of this edge
Edge<Long , Double > reversed = e . reverse ( ) ;
Double weight = e . getValue ( ) ; // weight = 0.5
64
table api - relational queries
• Flink provides an API that allows specifying operations
using SQL-like expressions.
• Instead of manipulating DataSet or DataStream you work
with Table on which relational operations can be performed.
import org . apache . f l i n k . api . scala . _
import org . apache . f l i n k . api . scala . table . _
case class WC( word : String , count : I nt )
val input = env . fromElements (WC( ” hello ” , 1) ,
WC( ” hello ” , 1) , WC( ” ciao ” , 1))
val expr = input . toTable
val result = expr . groupBy ( ’ word )
. select ( ’ word , ’ count .sum as ’ count ) . toDataSet [WC]
65

More Related Content

What's hot

MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 Albert Bifet
 
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsLeveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsAlbert Bifet
 
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Spark Summit
 
Speaker Diarization
Speaker DiarizationSpeaker Diarization
Speaker DiarizationHONGJOO LEE
 
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEEuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEHONGJOO LEE
 
Mining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAMining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAAlbert Bifet
 
Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream miningAlbert Bifet
 
Time Series Analysis for Network Secruity
Time Series Analysis for Network SecruityTime Series Analysis for Network Secruity
Time Series Analysis for Network Secruitymrphilroth
 
Distributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta MeetupDistributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta MeetupSri Ambati
 
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16MLconf
 
TensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache SparkTensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache SparkDatabricks
 
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...Andrii Gakhov
 
New zealand bloom filter
New zealand bloom filterNew zealand bloom filter
New zealand bloom filterxlight
 
Probabilistic data structures
Probabilistic data structuresProbabilistic data structures
Probabilistic data structuresshrinivasvasala
 
Probabilistic data structures. Part 3. Frequency
Probabilistic data structures. Part 3. FrequencyProbabilistic data structures. Part 3. Frequency
Probabilistic data structures. Part 3. FrequencyAndrii Gakhov
 
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...Spark Summit
 
Tutorial 9 (bloom filters)
Tutorial 9 (bloom filters)Tutorial 9 (bloom filters)
Tutorial 9 (bloom filters)Kira
 
Bloom filter
Bloom filterBloom filter
Bloom filterwang ping
 

What's hot (20)

MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016
 
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsLeveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data Streams
 
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
 
Speaker Diarization
Speaker DiarizationSpeaker Diarization
Speaker Diarization
 
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEEuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
 
Mining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAMining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOA
 
Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream mining
 
Time Series Analysis for Network Secruity
Time Series Analysis for Network SecruityTime Series Analysis for Network Secruity
Time Series Analysis for Network Secruity
 
Distributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta MeetupDistributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta Meetup
 
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
 
TensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache SparkTensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache Spark
 
Realtime Analytics
Realtime AnalyticsRealtime Analytics
Realtime Analytics
 
Europy17_dibernardo
Europy17_dibernardoEuropy17_dibernardo
Europy17_dibernardo
 
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
 
New zealand bloom filter
New zealand bloom filterNew zealand bloom filter
New zealand bloom filter
 
Probabilistic data structures
Probabilistic data structuresProbabilistic data structures
Probabilistic data structures
 
Probabilistic data structures. Part 3. Frequency
Probabilistic data structures. Part 3. FrequencyProbabilistic data structures. Part 3. Frequency
Probabilistic data structures. Part 3. Frequency
 
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
 
Tutorial 9 (bloom filters)
Tutorial 9 (bloom filters)Tutorial 9 (bloom filters)
Tutorial 9 (bloom filters)
 
Bloom filter
Bloom filterBloom filter
Bloom filter
 

Viewers also liked

Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataAlbert Bifet
 
Using Ruby to do Map/Reduce with Hadoop
Using Ruby to do Map/Reduce with HadoopUsing Ruby to do Map/Reduce with Hadoop
Using Ruby to do Map/Reduce with HadoopJames Kebinger
 
Efficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream ClassifiersEfficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream ClassifiersAlbert Bifet
 
Bigdata analytics-twitter
Bigdata analytics-twitterBigdata analytics-twitter
Bigdata analytics-twitterdfilppi
 
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...Rich Heimann
 
Apache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkApache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkAlbert Bifet
 
Real Time Analytics for Big Data - A twitter inspired case study
Real Time Analytics for Big Data - A twitter inspired case studyReal Time Analytics for Big Data - A twitter inspired case study
Real Time Analytics for Big Data - A twitter inspired case studyUri Cohen
 
SAP HANA in Healthcare: Real-Time Big Data Analysis
SAP HANA in Healthcare: Real-Time Big Data AnalysisSAP HANA in Healthcare: Real-Time Big Data Analysis
SAP HANA in Healthcare: Real-Time Big Data AnalysisSAP Technology
 
NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)Kevin Weil
 
Real-time Big Data Processing with Storm
Real-time Big Data Processing with StormReal-time Big Data Processing with Storm
Real-time Big Data Processing with Stormviirya
 
ML on Big Data: Real-Time Analysis on Time Series
ML on Big Data: Real-Time Analysis on Time SeriesML on Big Data: Real-Time Analysis on Time Series
ML on Big Data: Real-Time Analysis on Time SeriesSigmoid
 
Alfresco in few points - Search Tutorial
Alfresco in few points - Search TutorialAlfresco in few points - Search Tutorial
Alfresco in few points - Search TutorialPASCAL Jean Marie
 
Hadoop Map Reduce 程式設計
Hadoop Map Reduce 程式設計Hadoop Map Reduce 程式設計
Hadoop Map Reduce 程式設計Wei-Yu Chen
 
Machine Learning on Big Data
Machine Learning on Big DataMachine Learning on Big Data
Machine Learning on Big DataMax Lin
 
Big Data in Real-Time at Twitter
Big Data in Real-Time at TwitterBig Data in Real-Time at Twitter
Big Data in Real-Time at Twitternkallen
 
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜Takahiro Inoue
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reducerantav
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningLars Marius Garshol
 

Viewers also liked (20)

Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
The Big 5- Twitter
The Big 5- TwitterThe Big 5- Twitter
The Big 5- Twitter
 
Using Ruby to do Map/Reduce with Hadoop
Using Ruby to do Map/Reduce with HadoopUsing Ruby to do Map/Reduce with Hadoop
Using Ruby to do Map/Reduce with Hadoop
 
Efficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream ClassifiersEfficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream Classifiers
 
Bigdata analytics-twitter
Bigdata analytics-twitterBigdata analytics-twitter
Bigdata analytics-twitter
 
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
 
Apache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkApache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache Flink
 
Real Time Analytics for Big Data - A twitter inspired case study
Real Time Analytics for Big Data - A twitter inspired case studyReal Time Analytics for Big Data - A twitter inspired case study
Real Time Analytics for Big Data - A twitter inspired case study
 
SAP HANA in Healthcare: Real-Time Big Data Analysis
SAP HANA in Healthcare: Real-Time Big Data AnalysisSAP HANA in Healthcare: Real-Time Big Data Analysis
SAP HANA in Healthcare: Real-Time Big Data Analysis
 
NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)
 
Real-time Big Data Processing with Storm
Real-time Big Data Processing with StormReal-time Big Data Processing with Storm
Real-time Big Data Processing with Storm
 
ML on Big Data: Real-Time Analysis on Time Series
ML on Big Data: Real-Time Analysis on Time SeriesML on Big Data: Real-Time Analysis on Time Series
ML on Big Data: Real-Time Analysis on Time Series
 
Alfresco in few points - Search Tutorial
Alfresco in few points - Search TutorialAlfresco in few points - Search Tutorial
Alfresco in few points - Search Tutorial
 
Hadoop Map Reduce 程式設計
Hadoop Map Reduce 程式設計Hadoop Map Reduce 程式設計
Hadoop Map Reduce 程式設計
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Machine Learning on Big Data
Machine Learning on Big DataMachine Learning on Big Data
Machine Learning on Big Data
 
Big Data in Real-Time at Twitter
Big Data in Real-Time at TwitterBig Data in Real-Time at Twitter
Big Data in Real-Time at Twitter
 
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 

Similar to Real Time Big Data Management

Functional Programming in Java 8 - Lambdas and Streams
Functional Programming in Java 8 - Lambdas and StreamsFunctional Programming in Java 8 - Lambdas and Streams
Functional Programming in Java 8 - Lambdas and StreamsCodeOps Technologies LLP
 
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data EcosystemWprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data EcosystemSages
 
Spark: Taming Big Data
Spark: Taming Big DataSpark: Taming Big Data
Spark: Taming Big DataLeonardo Gamas
 
Introduction to parallel and distributed computation with spark
Introduction to parallel and distributed computation with sparkIntroduction to parallel and distributed computation with spark
Introduction to parallel and distributed computation with sparkAngelo Leto
 
(How) can we benefit from adopting scala?
(How) can we benefit from adopting scala?(How) can we benefit from adopting scala?
(How) can we benefit from adopting scala?Tomasz Wrobel
 
Functional Programming With Scala
Functional Programming With ScalaFunctional Programming With Scala
Functional Programming With ScalaKnoldus Inc.
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and MonoidsHugo Gävert
 
New Features in JDK 8
New Features in JDK 8New Features in JDK 8
New Features in JDK 8Martin Toshev
 
Modern Programming in Java 8 - Lambdas, Streams and Date Time API
Modern Programming in Java 8 - Lambdas, Streams and Date Time APIModern Programming in Java 8 - Lambdas, Streams and Date Time API
Modern Programming in Java 8 - Lambdas, Streams and Date Time APIGanesh Samarthyam
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimizationg3_nittala
 
The Scala Programming Language
The Scala Programming LanguageThe Scala Programming Language
The Scala Programming Languageleague
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flinkmxmxm
 
あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法x1 ichi
 

Similar to Real Time Big Data Management (20)

Spark workshop
Spark workshopSpark workshop
Spark workshop
 
Functional Programming in Java 8 - Lambdas and Streams
Functional Programming in Java 8 - Lambdas and StreamsFunctional Programming in Java 8 - Lambdas and Streams
Functional Programming in Java 8 - Lambdas and Streams
 
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data EcosystemWprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
 
Spark: Taming Big Data
Spark: Taming Big DataSpark: Taming Big Data
Spark: Taming Big Data
 
Introduction to parallel and distributed computation with spark
Introduction to parallel and distributed computation with sparkIntroduction to parallel and distributed computation with spark
Introduction to parallel and distributed computation with spark
 
Scala @ TomTom
Scala @ TomTomScala @ TomTom
Scala @ TomTom
 
(How) can we benefit from adopting scala?
(How) can we benefit from adopting scala?(How) can we benefit from adopting scala?
(How) can we benefit from adopting scala?
 
Pune Clojure Course Outline
Pune Clojure Course OutlinePune Clojure Course Outline
Pune Clojure Course Outline
 
Functional Programming With Scala
Functional Programming With ScalaFunctional Programming With Scala
Functional Programming With Scala
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and Monoids
 
Introducing scala
Introducing scalaIntroducing scala
Introducing scala
 
New Features in JDK 8
New Features in JDK 8New Features in JDK 8
New Features in JDK 8
 
Scala Bootcamp 1
Scala Bootcamp 1Scala Bootcamp 1
Scala Bootcamp 1
 
Modern Programming in Java 8 - Lambdas, Streams and Date Time API
Modern Programming in Java 8 - Lambdas, Streams and Date Time APIModern Programming in Java 8 - Lambdas, Streams and Date Time API
Modern Programming in Java 8 - Lambdas, Streams and Date Time API
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimization
 
The Scala Programming Language
The Scala Programming LanguageThe Scala Programming Language
The Scala Programming Language
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
 
What is new in Java 8
What is new in Java 8What is new in Java 8
What is new in Java 8
 
Seminar on MATLAB
Seminar on MATLABSeminar on MATLAB
Seminar on MATLAB
 
あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法
 

More from Albert Bifet

Multi-label Classification with Meta-labels
Multi-label Classification with Meta-labelsMulti-label Classification with Meta-labels
Multi-label Classification with Meta-labelsAlbert Bifet
 
Pitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themPitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themAlbert Bifet
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real TimeAlbert Bifet
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real TimeAlbert Bifet
 
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsPAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsAlbert Bifet
 
Sentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataSentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataAlbert Bifet
 
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsFast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsAlbert Bifet
 
Moa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsMoa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsAlbert Bifet
 
MOA : Massive Online Analysis
MOA : Massive Online AnalysisMOA : Massive Online Analysis
MOA : Massive Online AnalysisAlbert Bifet
 
New ensemble methods for evolving data streams
New ensemble methods for evolving data streamsNew ensemble methods for evolving data streams
New ensemble methods for evolving data streamsAlbert Bifet
 
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Albert Bifet
 
Adaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAdaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAlbert Bifet
 
Adaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAdaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAlbert Bifet
 
Mining Implications from Lattices of Closed Trees
Mining Implications from Lattices of Closed TreesMining Implications from Lattices of Closed Trees
Mining Implications from Lattices of Closed TreesAlbert Bifet
 
Kalman Filters and Adaptive Windows for Learning in Data Streams
Kalman Filters and Adaptive Windows for Learning in Data StreamsKalman Filters and Adaptive Windows for Learning in Data Streams
Kalman Filters and Adaptive Windows for Learning in Data StreamsAlbert Bifet
 

More from Albert Bifet (15)

Multi-label Classification with Meta-labels
Multi-label Classification with Meta-labelsMulti-label Classification with Meta-labels
Multi-label Classification with Meta-labels
 
Pitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themPitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid them
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real Time
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real Time
 
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsPAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
 
Sentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataSentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming Data
 
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsFast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data Streams
 
Moa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsMoa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data Streams
 
MOA : Massive Online Analysis
MOA : Massive Online AnalysisMOA : Massive Online Analysis
MOA : Massive Online Analysis
 
New ensemble methods for evolving data streams
New ensemble methods for evolving data streamsNew ensemble methods for evolving data streams
New ensemble methods for evolving data streams
 
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
 
Adaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAdaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data Streams
 
Adaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAdaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent Patterns
 
Mining Implications from Lattices of Closed Trees
Mining Implications from Lattices of Closed TreesMining Implications from Lattices of Closed Trees
Mining Implications from Lattices of Closed Trees
 
Kalman Filters and Adaptive Windows for Learning in Data Streams
Kalman Filters and Adaptive Windows for Learning in Data StreamsKalman Filters and Adaptive Windows for Learning in Data Streams
Kalman Filters and Adaptive Windows for Learning in Data Streams
 

Recently uploaded

20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdfkhraisr
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...HyderabadDolls
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...HyderabadDolls
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numberssuginr1
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.pptibrahimabdi22
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themeitharjee
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...kumargunjan9515
 

Recently uploaded (20)

20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 

Real Time Big Data Management

  • 1. real time big data management Albert Bifet (@abifet) Paris, 7 October 2015 albert.bifet@telecom-paristech.fr
  • 2. data streams Big Data & Real Time 1
  • 3. hadoop Hadoop architecture deals with datasets, not data streams 2
  • 7. apache storm Stream, Spout, Bolt, Topology 6
  • 8. apache storm Stream, Spout, Bolt, Topology 7
  • 9. apache storm Storm characteristics for real-time data processing workloads: 1 Fast 2 Scalable 3 Fault-tolerant 4 Reliable 5 Easy to operate 8
  • 10. apache kafka from linkedin Apache Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system. 9
  • 11. apache kafka from linkedin Apache Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system. 9
  • 12. apache samza from linkedin Storm and Samza are fairly similar. Both systems provide: 1 a partitioned stream model, 2 a distributed execution environment, 3 an API for stream processing, 4 fault tolerance, 5 Kafka integration 10
  • 13. apache spark streaming Spark Streaming is an extension of Spark that allows processing data stream using micro-batches of data. 11
  • 15. apache flink motivation 1 Real time computation: streaming computation 2 Fast, as there is not need to write to disk 3 Easy to write code 13
  • 16. real time computation: streaming computation MapReduce Limitations Example How compute in real time (latency less than 1 second): 1 predictions 2 frequent items as Twitter hashtags 3 sentiment analysis 14
  • 17. easy to write code case class Word ( word : String , frequency : I n t ) DataSet API (batch): val lines : DataSet [ String ] = env . readTextFile ( . . . ) lines . flatMap { l i n e => l i n e . s p l i t ( ” ” ) .map( word => Word( word , 1 ) ) } . groupBy ( ” word ” ) . sum( ” frequency ” ) . p r i n t ( ) 15
  • 18. easy to write code case class Word ( word : String , frequency : I n t ) DataSet API (batch): val lines : DataSet [ String ] = env . readTextFile ( . . . ) lines . flatMap { l i n e => l i n e . s p l i t ( ” ” ) .map( word => Word( word , 1 ) ) } . groupBy ( ” word ” ) . sum( ” frequency ” ) . p r i n t ( ) DataStream API (streaming): val lines : DataStream [ String ] = env . fromSocketStream ( . . . ) lines . flatMap { l i n e => l i n e . s p l i t ( ” ” ) .map( word => Word( word , 1 ) ) } . window ( Time . of (5 ,SECONDS ) ) . every ( Time . of (1 ,SECONDS) ) . groupBy ( ” word ” ) . sum( ” frequency ” ) . p r i n t ( ) 16
  • 19. what is apache flink? Figure 1: Apache Flink Overview 17
  • 20. batch and streaming engines Figure 2: Batch, streaming and hybrid data processing engines. 18
  • 21. batch comparison Figure 3: Comparison between Hadoop, Spark And Flink. 19
  • 22. streaming comparison Figure 4: Comparison between Storm, Spark And Flink. 20
  • 23. scala language • What is Scala? • object oriented • functional • How is Scala? • Scala is compatible • Scala is concise • Scala is high-level • Scala is statically typed 21
  • 24. short course on scala • Easy to use: includes an interpreter • Variables: • val: immutable (preferable) • var: mutable • Scala treats everything as objects with methods • Scala has first-class functions • Functions: def max( x : Int , y : I n t ) : I n t = { i f ( x > y ) x else y } def max2( x : Int , y : I n t ) = i f ( x > y ) x else y 22
  • 25. short course on scala • Functional: args . foreach ( ( arg : String ) => p r i n t l n ( arg ) ) args . foreach ( arg => p r i n t l n ( arg ) ) args . foreach ( p r i n t l n ) • Imperative: for ( arg <− args ) p r i n t l n ( arg ) • Scala achieves a conceptual simplicity by treating everything, from arrays to expressions, as objects with methods. ( 1 ) . + ( 2 ) greetStrings (0) = ” Hello ” greetStrings . update (0 , ” Hello ” ) val numNames2 = Array . apply ( ” zero ” , ”one ” , ”two ” )23
  • 26. short course on scala • Array: mutable sequence of objects that share the same type • List: immutable sequence of objects that share the same type • Tuple: immutable sequence of objects that does not share the same type val pair = (99 , ” Luftballons ” ) p r i n t l n ( pair . _1 ) p r i n t l n ( pair . _2 ) 24
  • 27. short course on scala • Sets and maps var jetSet = Set ( ” Boeing ” , ” Airbus ” ) jetSet += ” Lear ” p r i n t l n ( jetSet . contains ( ” Cessna ” ) ) import scala . collection . mutable .Map val treasureMap = Map[ Int , String ] ( ) treasureMap += (1 −> ”Go to island . ” ) treasureMap += (2 −> ” Find big X on ground . ” ) treasureMap += (3 −> ” Dig . ” ) p r i n t l n ( treasureMap ( 2 ) ) 25
  • 28. short course on scala • Functional style • Does not contain any var def printArgs ( args : Array [ String ] ) : Unit = { var i = 0 while ( i < args . length ) { p r i n t l n ( args ( i ) ) i += 1 } } def printArgs ( args : Array [ String ] ) : Unit = { for ( arg <− args ) p r i n t l n ( arg ) } def printArgs ( args : Array [ String ] ) : Unit = { args . foreach ( p r i n t l n ) } def formatArgs ( args : Array [ String ] ) = args . mkString ( ” n ” ) p r i n t l n ( formatArgs ( args ) ) 26
  • 29. short course on scala • Prefer vals, immutable objects, and methods without side effects. • Use vars, mutable objects, and methods with side effects when you have a specific need and justification for them. • In a Scala program, a semicolon at the end of a statement is usually optional. • A singleton object definition looks like a class definition, except instead of the keyword class you use the keyword object . • Scala provides a trait, scala.Application: object FallWinterSpringSummer extends Application { for ( season <− List ( ” f a l l ” , ” winter ” , ” spring ” ) ) p r i n t l n ( season + ” : ”+ calculate ( season ) ) } 27
  • 30. short course on scala • Scala has first-class functions: you can write down functions as unnamed literals and then pass them around as values. ( x : I n t ) => x + 1 • Short forms of function literals someNumbers . f i l t e r ( ( x : I n t ) => x > 0) someNumbers . f i l t e r ( ( x ) => x > 0) someNumbers . f i l t e r ( x => x > 0) someNumbers . f i l t e r ( _ > 0) someNumbers . foreach ( x => p r i n t l n ( x ) ) someNumbers . foreach ( p r i n t l n _ ) 28
  • 31. short course on scala • Zipping lists: zip and unzip • The zip operation takes two lists and forms a list of pairs: • A useful special case is to zip a list with its index. This is done most efficiently with the zipWithIndex method, • Mapping over lists: map , flatMap and foreach • Filtering lists: filter , partition , find , takeWhile , dropWhile , and span • Folding lists: /: and : or foldLeft and foldRight. ( z / : List ( a , b , c ) ) ( op ) equals op ( op ( op ( z , a ) , b ) , c ) • Sorting lists: sortWith 29
  • 34. api introduction Flink programs 1 Input from source 2 Apply operations 3 Output to source 32
  • 35. batch and streaming apis 1 DataSet API • Example: Map/Reduce paradigm 2 DataStream API • Example: Live Stock Feed 33
  • 36. streaming and batch comparison 34
  • 37. architecture overview Figure 5: The JobManager is the coordinator of the Flink system TaskManagers are the workers that execute parts of the parallel programs. 35
  • 38. client 1 Optimize 2 Construct job graph 3 Pass job graph to job manager 4 Retrieve job results 36
  • 39. job manager 1 Parallelization: Create Execution Graph 2 Scheduling: Assign tasks to task managers 3 State: Supervise the execution 37
  • 40. task manager 1 Operations are split up into tasks depending on the specified parallelism 2 Each parallel instance of an operation runs in a separate task slot 3 The scheduler may run several tasks from different operators in one task slot 38
  • 42. component stack 1 API layer: implements multiple APIs that create operator DAGs for their programs. Each API needs to provide utilities (serializers, comparators) that describe the interaction between its data types and the runtime. 2 Optimizer and common api layer: takes programs in the form of operator DAGs. The operators are specific (e.g., Map, Join, Filter, Reduce, …), but are data type agnostic. 3 Runtime layer: receives a program in the form of a JobGraph. A JobGraph is a generic parallel data flow with arbitrary tasks that consume and produce data streams. 40
  • 43. flink topologies Flink programs 1 Input from source 2 Apply operations 3 Output to source 41
  • 44. sources (selection) • Collection-based • fromCollection • fromElements • File-based • TextInputFormat • CsvInputFormat • Other • SocketInputFormat • KafkaInputFormat • Databases 42
  • 45. sinks (selection) • File-based • TextOutputFormat • CsvOutputFormat • PrintOutput • Others • SocketOutputFormat • KafkaOutputFormat • Databases 43
  • 47. flink skeleton program 1 Obtain an ExecutionEnvironment, 2 Load/create the initial data, 3 Specify transformations on this data, 4 Specify where to put the results of your computations, 5 Trigger the program execution 45
  • 48. java wordcount example public class WordCountExample { public s ta t ic void main ( String [ ] args ) throws Exception { f i n a l ExecutionEnvironment env = ExecutionEnvironment . getExecutionEnvironment ( ) ; DataSet < String > text = env . fromElements ( ”Who’ s there ? ” , ” I think I hear them . Stand , ho ! Who’ s there ? ” ) ; DataSet <Tuple2 < String , Integer >> wordCounts = text . flatMap (new L i n e S p l i t t e r ( ) ) . groupBy (0) .sum( 1 ) ; wordCounts . p r i n t ( ) ; } . . . . } 46
  • 49. java wordcount example public class WordCountExample { public s ta tic void main ( String [ ] args ) throws Exception { . . . . } public s ta tic class L i n e S p l i t t e r implements FlatMapFunction < String , Tuple2 < String , Integer >> { @Override public void flatMap ( String line , Collector <Tuple2 < String , Integer >> out ) { for ( String word : l i n e . s p l i t ( ” ” ) ) { out . collect (new Tuple2 < String , Integer >(word , 1 ) ) ; } } } } 47
  • 50. scala wordcount example import org . apache . f l i n k . api . scala . _ object WordCount { def main ( args : Array [ String ] ) { val env = ExecutionEnvironment . getExecutionEnvironment val text = env . fromElements ( ”Who’ s there ? ” , ” I think I hear them . Stand , ho ! Who’ s there ? ” ) val counts = text . flatMap { _ . toLowerCase . s p l i t ( ” W+”) f i l t e r { _ . nonEmpty } } .map { ( _ , 1) } . groupBy (0) .sum(1) counts . p r i n t ( ) } } 48
  • 51. java 8 wordcount example public class WordCountExample { public s tat ic void main ( String [ ] args ) throws Exception { f i n a l ExecutionEnvironment env = ExecutionEnvironment . getExecutionEnvironment ( ) ; DataSet < String > text = env . fromElements ( ”Who’ s there ?” ” I think I hear them . Stand , ho ! Who’ s there ? ” ) ; text .map( l i n e −> l i n e . s p l i t ( ” ” ) ) . flatMap ( ( String [ ] wordArray , Collector <Tuple2 < String , Integer >> out ) −> Arrays . stream ( wordArray ) . forEach ( t −> out . collect (new Tuple2 < >( t , 1 ) ) ) ) . groupBy (0) .sum(1) . p r i n t ( ) ; } } 49
  • 53. flink skeleton program 1 Obtain an StreamExecutionEnvironment, 2 Load/create the initial data, 3 Specify transformations on this data, 4 Specify where to put the results of your computations, 5 Trigger the program execution 51
  • 54. java wordcount example public class StreamingWordCount { public st a t i c void main ( String [ ] args ) { StreamExecutionEnvironment env = StreamExecutionEnvironment . getExecutionEnvironment ( ) ; DataStream <Tuple2 < String , Integer >> dataStream = env . socketTextStream ( ” localhost ” , 9999) . flatMap (new S p l i t t e r ( ) ) . groupBy (0) .sum( 1 ) ; dataStream . p r i n t ( ) ; env . execute ( ” Socket Stream WordCount ” ) ; } . . . . } 52
  • 55. java wordcount example public class StreamingWordCount { public st a t i c void main ( String [ ] args ) throws Exception { . . . . } public st a ti c class S p l i t t e r implements FlatMapFunction < String , Tuple2 < String , Integer >> { @Override public void flatMap ( String sentence , Collector <Tuple2 < String , Integer >> out ) throws Exception for ( String word : sentence . s p l i t ( ” ” ) ) { out . collect (new Tuple2 < String , Integer >(word , 1 ) ) ; } } } } 53
  • 56. scala wordcount example object WordCount { def main ( args : Array [ String ] ) { val env = StreamExecutionEnvironment . getExecutionEnvironment val text = env . socketTextStream ( ” localhost ” , 9999) val counts = text . flatMap { _ . toLowerCase . s p l i t ( ” W+”) f i l t e r { _ . nonEmpty } } .map { ( _ , 1) } . groupBy (0) .sum(1) counts . p r i n t env . execute ( ” Scala Socket Stream WordCount ” ) } } 54
  • 57. obtain an streamexecutionenvironment The StreamExecutionEnvironment is the basis for all Flink programs. StreamExecutionEnvironment . getExecutionEnvironment StreamExecutionEnvironment . createLocalEnvironment ( parallelism ) StreamExecutionEnvironment . createRemoteEnvironment ( host : String , port : String , parallelism : Int , j a r F i l e s : String *) env . socketTextStream ( host , port ) env . fromElements ( elements . . . ) env . addSource ( sourceFunction ) 55
  • 58. specify transformations on this data • Map • FlatMap • Filter • Reduce • Fold • Union 56
  • 59. 3) specify transformations on this data Map Takes one element and produces one element. data .map { x => x . toInt } 57
  • 60. 3) specify transformations on this data FlatMap Takes one element and produces zero, one, or more elements. data . flatMap { str => str . s p l i t ( ” ” ) } 58
  • 61. 3) specify transformations on this data Filter Evaluates a boolean function for each element and retains those for which the function returns true. data . f i l t e r { _ > 1000 } 59
  • 62. 3) specify transformations on this data Reduce Combines a group of elements into a single element by repeatedly combining two elements into one. Reduce may be applied on a full data set, or on a grouped data set. data . reduce { _ + _ } 60
  • 63. 3) specify transformations on this data Union Produces the union of two data sets. data . union ( data2 ) 61
  • 64. window operators The user has different ways of using the result of a window operation: • windowedDataStream.flatten() - streams the results element wise and returns a DataStream<T> where T is the type of the underlying windowed stream • windowedDataStream.getDiscretizedStream() - returns a DataStream<StreamWindow<T» for applying some advanced logic on the stream windows itself. • Calling any window transformation further transforms the windows, while preserving the windowing logic dataStream . window ( Time . of (5 , TimeUnit .SECONDS) ) . every ( Time . of (1 , TimeUnit .SECONDS ) ) ; dataStream . window ( Count . of (100)) . every ( Time . of (1 , TimeUnit .MINUTES ) ) ; 6 62
  • 65. gelly: flink graph api • Gelly is a Java Graph API for Flink. • In Gelly, graphs can be transformed and modified using high-level functions similar to the ones provided by the batch processing API. • In Gelly, a Graph is represented by a DataSet of vertices and a DataSet of edges. • The Graph nodes are represented by the Vertex type. A Vertex is defined by a unique ID and a value. // create a new vertex with a Long ID and a String value Vertex <Long , String > v = new Vertex <Long , String >(1L , ” foo ” ) ; // create a new vertex with a Long ID and no value Vertex <Long , NullValue > v = new Vertex <Long , NullValue >(1L , NullValue . getInstance ( ) ) ; 63
  • 66. gelly: flink graph api • The graph edges are represented by the Edge type. • An Edge is defined by a source ID (the ID of the source Vertex), a target ID (the ID of the target Vertex) and an optional value. • The source and target IDs should be of the same type as the Vertex IDs. Edges with no value have a NullValue value type. Edge<Long , Double > e = new Edge<Long , Double >(1L , 2L , 0 . 5 ) ; // reverse the source and target of this edge Edge<Long , Double > reversed = e . reverse ( ) ; Double weight = e . getValue ( ) ; // weight = 0.5 64
  • 67. table api - relational queries • Flink provides an API that allows specifying operations using SQL-like expressions. • Instead of manipulating DataSet or DataStream you work with Table on which relational operations can be performed. import org . apache . f l i n k . api . scala . _ import org . apache . f l i n k . api . scala . table . _ case class WC( word : String , count : I nt ) val input = env . fromElements (WC( ” hello ” , 1) , WC( ” hello ” , 1) , WC( ” ciao ” , 1)) val expr = input . toTable val result = expr . groupBy ( ’ word ) . select ( ’ word , ’ count .sum as ’ count ) . toDataSet [WC] 65