Spark Streaming with C*
jacek.lewandowski@datastax.com
…applies where you need 
near-realtime data analysis
Spark vs Spark Streaming
Spark: zillions of bytes, static dataset
Spark Streaming: gigabytes per second, stream of data
What can you do with it?
Sources: applications, sensors, web, mobile phones
Uses: intrusion detection, malfunction detection, site analytics, network metrics analysis, fraud detection, dynamic process optimisation, recommendations, location based ads, log processing, supply chain planning, sentiment analysis, spying
Almost Whatever Source You Want
Almost Whatever Destination You Want
so, let’s see how it works
DStream - a continuous sequence of micro batches
[Diagram: a DStream drawn as a row of μBatches, each one an ordinary RDD]
Processing of DStream = processing of μBatches, RDDs
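The idea that a DStream is just a sequence of μBatches can be imitated with plain Scala collections. This is a minimal sketch, not Spark code: `Message`, `toMicroBatches` and the 1000 ms batch length are hypothetical names and values chosen for illustration, and the timestamp stands for the arrival time at the receiver.

```scala
// Plain-collections illustration only; none of these names come from Spark.
final case class Message(arrivalMs: Long, payload: String)

// Cut a list of messages into fixed-length micro-batches keyed by batch start time.
def toMicroBatches(messages: Seq[Message], batchMs: Long): Seq[(Long, Seq[String])] =
  messages
    .groupBy(m => m.arrivalMs / batchMs)      // batch index for every message
    .toSeq
    .sortBy { case (index, _) => index }      // restore time order
    .map { case (index, ms) => (index * batchMs, ms.map(_.payload)) }

val received = Seq(
  Message(100L, "a"), Message(900L, "b"), Message(1200L, "c"), Message(2500L, "d")
)

val microBatches = toMicroBatches(received, batchMs = 1000L)

// Processing the "stream" means applying the same function to every micro-batch.
val processed = microBatches.map { case (start, batch) => (start, batch.map(_.toUpperCase)) }
```

In this toy model an interval without messages simply produces no batch, which is a simplification.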
Receiver: interface between different stream sources and Spark
Block Manager (inside the Spark memory boundary): replication and building μBatches
[Diagram: incoming messages 1 to 9 pass through the Receiver into the Block Manager as blocks of input data]
[Diagram: a μBatch is made of blocks, shown as partitions]
Ingestion from multiple sources
[Diagram: three parallel "Receiving, μBatch building" stages feeding μBatches onto a timeline marked 0s, 1s, 2s]
A well-worn example
• ingestion of text messages
• splitting them into separate words
• counting the occurrences of words within 5-second windows
• saving word counts from the last 5 seconds to Cassandra every 5 seconds, and displaying the first few results on the console
how to do that ? 
well…
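Yes, it is that easy
import org.apache.spark.SparkContext._            // sortByKey on pair RDDs (Spark 1.x)
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream
import com.datastax.spark.connector.streaming._   // saveToCassandra on DStreams

case class WordCount(time: Long, word: String, count: Int)

// stream is assumed to be a DStream[(String, String)] of (key, paragraph) pairs
val paragraphs: DStream[String] = stream.map { case (_, paragraph) => paragraph }
val words: DStream[String] = paragraphs.flatMap(_.split("""\s+"""))
val wordCounts: DStream[(String, Long)] = words.countByValue()
val topWordCounts: DStream[WordCount] = wordCounts.transform { (rdd, time) =>
  val mappedWordCounts: RDD[(Int, WordCount)] = rdd.map {
    case (word, count) =>
      (count.toInt, WordCount(time.milliseconds, word, count.toInt))
  }
  // sort by count, highest first, and keep only the WordCount values
  val topWordCountsRDD: RDD[WordCount] = mappedWordCounts
    .sortByKey(ascending = false).values
  topWordCountsRDD
}
topWordCounts.saveToCassandra("meetup", "word_counts")
topWordCounts.print()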
DStream stateless operators 
(quick recap) 
• map 
• flatMap 
• filter 
• repartition 
• union 
• count 
• countByValue 
• reduce 
• reduceByKey 
• joins 
• cogroup 
• transform 
• transformWith
DStream[Bean].count()
[Diagram: count turns each 1s μBatch into its element count, e.g. 4 and 3]
DStream[Orange].union(DStream[Apple])
[Diagram: union merges the corresponding 1s μBatches of both streams]
Other stateless operations 
• join(DStream[(K, W)]) 
• leftOuterJoin(DStream[(K, W)]) 
• rightOuterJoin(DStream[(K, W)]) 
• cogroup(DStream[(K, W)]) 
are applied on pairs of corresponding μBatches
transform, transformWith 
• DStream[T].transform(RDD[T] => RDD[U]): DStream[U] 
• DStream[T].transformWith(DStream[U], (RDD[T], RDD[U]) => RDD[V]): DStream[V] 
allow you to create new stateless operators
DStream[Blue].transformWith 
(DStream[Red], …): DStream[Violet] 
1-A 2-A 3-A 
1-B 2-B 3-B 
1-A x 1-B 2-A x 2-B 3-A x 3-B
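The pairing of corresponding μBatches can be sketched with plain Scala collections. This is not the Spark API: the `transformWith` function below is a hypothetical stand-in that treats a stream as a `Seq` of batches and applies a user function to each aligned pair.

```scala
// Plain-collections stand-in: a "stream" is a Seq of batches, a batch is a Seq of elements.
def transformWith[T, U, V](left: Seq[Seq[T]], right: Seq[Seq[U]])(
    f: (Seq[T], Seq[U]) => Seq[V]): Seq[Seq[V]] =
  left.zip(right).map { case (l, r) => f(l, r) }   // batch i meets batch i, never batch j

val blue = Seq(Seq("1-A"), Seq("2-A"), Seq("3-A"))
val red  = Seq(Seq("1-B"), Seq("2-B"), Seq("3-B"))

// The user function sees one batch from each stream and returns the combined batch.
val violet = transformWith(blue, red) { (a, b) =>
  for (x <- a; y <- b) yield s"$x x $y"
}
// violet == Seq(Seq("1-A x 1-B"), Seq("2-A x 2-B"), Seq("3-A x 3-B"))
```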
Windowing
[Diagram: timeline 0s to 7s with the window and slide intervals marked]
By default: window = slide = μBatch duration
Windowing
[Diagram: timeline 0s to 7s with the window and slide intervals marked]
The resulting DStream consists of 3-second μBatches
Each resulting μBatch overlaps the preceding one by 1 second
Windowing
[Diagram: input messages 1 to 8; consecutive windows contain messages 1 to 6 and 3 to 8]
A μBatch appears in the output stream every 1s
It contains messages collected during 3s
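The window and slide arithmetic can be reproduced with plain Scala collections. This is a minimal sketch, assuming two messages arrive per second and each inner `Seq` is one 1-second μBatch; `windowed` is a hypothetical helper, not a Spark function.

```scala
// Each inner Seq stands for one 1-second μBatch; two messages arrive per second.
val microBatches: Seq[Seq[Int]] = Seq(Seq(1, 2), Seq(3, 4), Seq(5, 6), Seq(7, 8))

// window and slide are expressed as a number of μBatches.
// Every output batch concatenates the `window` most recent input batches.
def windowed[A](batches: Seq[Seq[A]], window: Int, slide: Int): List[Seq[A]] =
  batches.sliding(window, slide).map(_.flatten).toList

// window = 3s, slide = 1s
val windows = windowed(microBatches, window = 3, slide = 1)
// windows == List(Seq(1, 2, 3, 4, 5, 6), Seq(3, 4, 5, 6, 7, 8))
```

The sketch only emits full windows, so it says nothing about how the first, partially filled windows are handled in a real stream.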
DStream window operators 
• window(Duration, Duration) 
• countByWindow(Duration, Duration) 
• reduceByWindow((T, T) => T, Duration, Duration)
• countByValueAndWindow(Duration, Duration) 
• groupByKeyAndWindow(Duration, Duration) 
• reduceByKeyAndWindow((V, V) => V, Duration, Duration)
Let’s modify the example
• ingestion of text messages
• splitting them into separate words
• counting the occurrences of words within 10-second windows
• saving word counts from the last 10 seconds to Cassandra every 2 seconds, and displaying the first few results on the console
Yes, it is still easy to do
import org.apache.spark.streaming.Seconds

case class WordCount(time: Long, word: String, count: Int)

// stream is assumed to be a DStream[(String, String)] of (key, paragraph) pairs
val paragraphs: DStream[String] = stream.map { case (_, paragraph) => paragraph }
val words: DStream[String] = paragraphs.flatMap(_.split("""\s+"""))
// window of 10 seconds, sliding every 2 seconds
val wordCounts: DStream[(String, Long)] = words.countByValueAndWindow(Seconds(10), Seconds(2))
val topWordCounts: DStream[WordCount] = wordCounts.transform { (rdd, time) =>
  val mappedWordCounts: RDD[(Int, WordCount)] = rdd.map {
    case (word, count) =>
      (count.toInt, WordCount(time.milliseconds, word, count.toInt))
  }
  // sort by count, highest first, and keep only the WordCount values
  val topWordCountsRDD: RDD[WordCount] = mappedWordCounts
    .sortByKey(ascending = false).values
  topWordCountsRDD
}
topWordCounts.saveToCassandra("meetup", "word_counts")
topWordCounts.print()
DStream stateful operator
• DStream[(K, V)].updateStateByKey(f: (Seq[V], Option[S]) => Option[S]): DStream[(K, S)]
New μBatch: (A, 1) (B, 2) (A, 3) (C, 4) (A, 5) (B, 6)
Previous state: (A, 7) (B, 8) (C, 9)
• R1 = f(Seq(1, 3, 5), Some(7))
• R2 = f(Seq(2, 6), Some(8))
• R3 = f(Seq(4), Some(9))
New state: (A, R1) (B, R2) (C, R3)
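One step of this state update can be traced with plain Scala collections. This is a sketch under stated assumptions, not the Spark implementation: `updateStateStep` is a hypothetical helper, the state is a simple `Map`, and a running sum is picked as the update function only for illustration.

```scala
// Plain-collections stand-in for a single updateStateByKey step.
def updateStateStep[K, V, S](batch: Seq[(K, V)], state: Map[K, S])(
    f: (Seq[V], Option[S]) => Option[S]): Map[K, S] = {
  // Collect the new values of the current batch per key, keeping arrival order.
  val newValues: Map[K, Seq[V]] =
    batch.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
  // Call f for every key that has new values or previous state.
  val keys = newValues.keySet ++ state.keySet
  keys.flatMap { k =>
    f(newValues.getOrElse(k, Seq.empty), state.get(k)).map(k -> _)  // None drops the key
  }.toMap
}

val batch    = Seq("A" -> 1, "B" -> 2, "A" -> 3, "C" -> 4, "A" -> 5, "B" -> 6)
val previous = Map("A" -> 7, "B" -> 8, "C" -> 9)

// Illustrative update function: add the new values to the previous total.
val sumUpdate: (Seq[Int], Option[Int]) => Option[Int] =
  (values, old) => Some(old.getOrElse(0) + values.sum)

val next = updateStateStep(batch, previous)(sumUpdate)
// next == Map("A" -> 16, "B" -> 16, "C" -> 13)
```

With the sum as update function, R1 = 7 + 1 + 3 + 5 = 16, R2 = 8 + 2 + 6 = 16 and R3 = 9 + 4 = 13.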
Total word count example
case class WordCount(time: Long, word: String, count: Int)

// Add the counts from the current μBatch to the running total of a word
def update(counts: Seq[Long], state: Option[Long]): Option[Long] = {
  val sum = counts.sum
  Some(state.getOrElse(0L) + sum)
}

// updateStateByKey requires checkpointing to be enabled on the StreamingContext
val totalWords: DStream[(String, Long)] =
  stream.map { case (_, paragraph) => paragraph }
    .flatMap(_.split("""\s+"""))
    .countByValue()
    .updateStateByKey(update)

val topTotalWordCounts: DStream[WordCount] =
  totalWords.transform((rdd, time) =>
    rdd.map { case (word, count) =>
      (count, WordCount(time.milliseconds, word, count.toInt))
    }.sortByKey(ascending = false).values
  )

topTotalWordCounts.saveToCassandra("meetup", "word_counts_total")
topTotalWordCounts.print()
Obtaining DStreams 
• ZeroMQ 
• Kinesis 
• HDFS compatible file system 
• Akka actor 
• Twitter 
• MQTT 
• Kafka 
• Socket 
• Flume 
• …
Particular DStreams are available in separate modules
GroupId           ArtifactId                         Latest Version
org.apache.spark  spark-streaming-kinesis-asl_2.10   1.1.0
org.apache.spark  spark-streaming-mqtt_2.10          1.1.0
org.apache.spark  spark-streaming-zeromq_2.10        1.1.0
org.apache.spark  spark-streaming-flume_2.10         1.1.0
org.apache.spark  spark-streaming-flume-sink_2.10    1.1.0
org.apache.spark  spark-streaming-kafka_2.10         1.1.0
org.apache.spark  spark-streaming-twitter_2.10       1.1.0
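Pulling in one of these modules might look like the following build fragment. It is only a sketch: it assumes an sbt build with a Scala 2.10 `scalaVersion`, so that `%%` resolves to the `_2.10` artifacts listed above, and the choice of the Kafka module and the explicit `spark-streaming` core dependency are illustrative.

```scala
// build.sbt fragment (assumes scalaVersion is a 2.10.x release)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming"       % "1.1.0",  // core streaming module
  "org.apache.spark" %% "spark-streaming-kafka" % "1.1.0"   // Kafka DStream
)
```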
If something goes wrong…
Fault tolerance
• The sequence of transformations is known to Spark Streaming
• μBatches are replicated once they are received
• Lost data can be recomputed
But there are pitfalls
• Spark replicates blocks, not single messages
• It is up to a particular receiver to decide whether to form the block from a single message or to collect more messages before pushing the block
• The data collected in the receiver before the block is pushed will be lost in case of failure of the receiver
• Typical tradeoff - efficiency vs fault tolerance
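The tradeoff can be made concrete with a toy model of a receiver that buffers messages before pushing a block. `BufferingReceiver` and `feedThenFail` are hypothetical names for illustration and are not part of Spark's receiver API; only pushed blocks are considered safe here, since blocks are what Spark replicates.

```scala
// Toy model of a receiver that collects messages into blocks before pushing them.
final class BufferingReceiver(blockSize: Int) {
  private var buffer = Vector.empty[String]          // collected, not yet pushed
  private var pushed = Vector.empty[Vector[String]]  // blocks handed over for replication

  def receive(message: String): Unit = {
    buffer = buffer :+ message
    if (buffer.size >= blockSize) {                  // block is full: push it
      pushed = pushed :+ buffer
      buffer = Vector.empty
    }
  }

  // Simulates a receiver failure and returns the messages that were lost with it.
  def fail(): Vector[String] = {
    val lost = buffer
    buffer = Vector.empty
    lost
  }

  def pushedBlocks: Vector[Vector[String]] = pushed
}

// Feeds all messages to a fresh receiver, then fails it: (pushed blocks, lost messages).
def feedThenFail(blockSize: Int, messages: Seq[String]): (Vector[Vector[String]], Vector[String]) = {
  val receiver = new BufferingReceiver(blockSize)
  messages.foreach(receiver.receive)
  val lost = receiver.fail()
  (receiver.pushedBlocks, lost)
}

val messages  = Vector("m1", "m2", "m3", "m4", "m5")
val blockwise = feedThenFail(blockSize = 3, messages)  // fewer pushes, two messages lost
val single    = feedThenFail(blockSize = 1, messages)  // one push per message, nothing lost
```

With a block size of 3, `m4` and `m5` are still in the buffer when the receiver fails; with a block size of 1 nothing is lost, at the price of five separate pushes.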
Built-in receivers breakdown
Pushing single messages   Can do both   Pushing whole blocks
Kafka                     Akka          RawNetworkReceiver
Twitter                   Custom        ZeroMQ
Socket
MQTT
Thank you ! 
Questions? 
http://spark.apache.org/ 
https://github.com/datastax/spark-cassandra-connector 
http://cassandra.apache.org/ 
http://www.datastax.com/

Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...karishmasinghjnh
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 

Dernier (20)

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 

Spark Streaming with Cassandra

  • 1. spark streaming with C* jacek.lewandowski@datastax.com
  • 2. …applies where you need near-realtime data analysis
  • 3. Spark vs Spark Streaming: Spark handles zillions of bytes in a static dataset; Spark Streaming handles gigabytes per second in a stream of data
  • 4. What can you do with it? Sources: applications, sensors, web, mobile phones. Uses: intrusion detection, malfunction detection, site analytics, network metrics analysis, fraud detection, dynamic process optimisation, recommendations, location based ads, log processing, supply chain planning, sentiment analysis, spying
  • 6. Almost whatever source you want, almost whatever destination you want
  • 9. so, let’s see how it works
  • 10. DStream: a continuous sequence of micro batches. Each μBatch is an ordinary RDD, so processing of a DStream = processing of μBatches (RDDs)
  • 11. Receiver: the interface between different stream sources and Spark. [Diagram: messages 1 to 9 arriving at the receiver]
  • 12. Receiver and Block Manager, with the Block Manager inside the Spark memory boundary
  • 13. Block Manager: replication and building μBatches
  • 14. [Diagram: the Block Manager inside the Spark memory boundary]
  • 15. The Block Manager holds blocks of input data (messages 1 to 9)
  • 16. A μBatch is made of blocks
  • 17. [Diagram: a μBatch made of blocks holding messages 1 to 9]
  • 18. A μBatch made of blocks, shown as three partitions
  • 20. Ingestion from multiple sources: receiving and μBatch building happen separately for each source
  • 21. Ingestion from multiple sources. [Diagram: three receivers, each receiving and building μBatches, with μBatches placed on a 0s, 1s, 2s timeline]
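The receiving path described on the slides above (messages, then blocks, then μBatches, then partitions) can be modelled with plain Scala collections. This is only a sketch under stated assumptions, not Spark code: `Message`, `groupIntoBatches`, and the block and batch intervals passed to it are hypothetical names and values chosen for illustration.

```scala
// Plain-Scala model of how received messages become blocks and μBatches.
// Nothing here is Spark API; all names and intervals are illustrative assumptions.
object BlockModel {
  final case class Message(timestampMs: Long, payload: String)

  // Groups messages into blocks by block interval, then blocks into μBatches
  // by batch interval. Result: batch index -> blocks in time order, where each
  // block stands for one partition of the μBatch.
  def groupIntoBatches(
      messages: Seq[Message],
      blockIntervalMs: Long,
      batchIntervalMs: Long): Map[Long, Seq[Seq[Message]]] = {
    messages
      .groupBy(m => m.timestampMs / blockIntervalMs) // block index per message
      .toSeq
      .sortBy(_._1) // blocks in time order
      .groupBy { case (blockIdx, _) => blockIdx * blockIntervalMs / batchIntervalMs }
      .map { case (batchIdx, blocks) => batchIdx -> blocks.map(_._2) }
  }
}
```

With a 200 ms block interval and a 1 s batch interval, messages stamped 0, 100, 250 and 900 ms land in three blocks of the first μBatch, so that μBatch would have three partitions.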
  • 22. A well-worn example
      • ingest text messages
      • split them into separate words
      • count the occurrences of words within 5-second windows
      • every 5 seconds, save the word counts from the last 5 seconds to Cassandra and display the first few results on the console
  • 23. How to do that? Well…
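  • 24. Yes, it is that easy

        import com.datastax.spark.connector.streaming._ // adds saveToCassandra to DStream
        import org.apache.spark.SparkContext._          // pair RDD operations such as sortByKey
        import org.apache.spark.rdd.RDD
        import org.apache.spark.streaming.dstream.DStream

        case class WordCount(time: Long, word: String, count: Int)

        // `stream` is assumed to be a DStream of (key, paragraph) pairs created elsewhere
        val paragraphs: DStream[String] = stream.map { case (_, paragraph) => paragraph }
        val words: DStream[String] = paragraphs.flatMap(_.split("""\s+"""))
        val wordCounts: DStream[(String, Long)] = words.countByValue()

        val topWordCounts: DStream[WordCount] = wordCounts.transform { (rdd, time) =>
          val mappedWordCounts: RDD[(Int, WordCount)] = rdd.map {
            case (word, count) => (count.toInt, WordCount(time.milliseconds, word, count.toInt))
          }
          // sort by count, highest first, and keep only the WordCount values
          mappedWordCounts.sortByKey(ascending = false).values
        }

        topWordCounts.saveToCassandra("meetup", "word_counts")
        topWordCounts.print()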
  • 25. DStream stateless operators (quick recap) • map • flatMap • filter • repartition • union • count • countByValue • reduce • reduceByKey • joins • cogroup • transform • transformWith
  • 29. Other stateless operations • join(DStream[(K, W)]) • leftOuterJoin(DStream[(K, W)]) • rightOuterJoin(DStream[(K, W)]) • cogroup(DStream[(K, W)]) are applied on pairs of corresponding μBatches
  • 30. transform, transformWith • DStream[T].transform(RDD[T] => RDD[U]): DStream[U] • DStream[T].transformWith(DStream[U], (RDD[T], RDD[U]) => RDD[V]): DStream[V] allow you to create new stateless operators
  • 31. DStream[Blue].transformWith(DStream[Red], …): DStream[Violet]. [Diagram: μBatches 1-A, 2-A, 3-A of the first stream and 1-B, 2-B, 3-B of the second are combined pairwise into 1-A x 1-B, 2-A x 2-B, 3-A x 3-B]
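The pairwise behaviour of transformWith, and of the joins that "are applied on pairs of corresponding μBatches", can be sketched without Spark. In this minimal model a `Batch` stands in for an RDD and a `BatchStream` for a DStream; both aliases and the `TransformWithModel` object are hypothetical names, not part of any library.

```scala
// Plain-Scala model of transformWith: the function is applied to each pair of
// μBatches that cover the same batch interval. Not Spark API.
object TransformWithModel {
  type Batch[T] = Seq[T]              // stands in for one RDD
  type BatchStream[T] = Seq[Batch[T]] // stands in for a DStream

  def transformWith[T, U, V](left: BatchStream[T], right: BatchStream[U])(
      f: (Batch[T], Batch[U]) => Batch[V]): BatchStream[V] =
    left.zip(right).map { case (l, r) => f(l, r) }

  // A join expressed through transformWith: keys only match within the
  // same pair of μBatches, never across batch intervals.
  def join[K, A, B](
      left: BatchStream[(K, A)],
      right: BatchStream[(K, B)]): BatchStream[(K, (A, B))] =
    transformWith(left, right) { (l, r) =>
      for ((k1, a) <- l; (k2, b) <- r if k1 == k2) yield (k1, (a, b))
    }
}
```

The consequence worth noticing is that a key present in the first μBatch of one stream and only in the second μBatch of the other produces no joined output in this model.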
  • 34. Windowing: by default, window = slide = μBatch duration. [Diagram: window and slide marked on a 0s to 7s timeline]
  • 37. Windowing with a window longer than the slide: the resulting DStream consists of 3-second μBatches, and each resulting μBatch overlaps the preceding one by 1 second
  • 40. Windowing: a μBatch appears in the output stream every 1s and contains messages collected during 3s. [Diagram: windows covering messages 1 to 6 and 3 to 8]
  • 42. DStream window operators • window(Duration, Duration) • countByWindow(Duration, Duration) • reduceByWindow((T, T) => T, Duration, Duration) • countByValueAndWindow(Duration, Duration) • groupByKeyAndWindow(Duration, Duration) • reduceByKeyAndWindow((V, V) => V, Duration, Duration)
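How a window length and a slide interval select μBatches can be modelled on plain sequences. This is a sketch, not the Spark implementation: `WindowModel.window` is a hypothetical helper, and both durations are counted in whole batch intervals, which is an assumption made to keep the model small.

```scala
// Plain-Scala model of windowing over a sequence of μBatches. Not Spark API.
object WindowModel {
  // batches: one element per batch interval, oldest first.
  // windowLen and slide are counted in batch intervals.
  def window[T](batches: Seq[Seq[T]], windowLen: Int, slide: Int): Seq[Seq[T]] = {
    require(windowLen > 0 && slide > 0, "window and slide must be positive")
    // A window is emitted at every `slide`-th batch boundary and covers the
    // last `windowLen` batches (fewer at the very start of the stream).
    (slide to batches.length by slide).map { end =>
      batches.slice(math.max(0, end - windowLen), end).flatten
    }
  }
}
```

With 1-second batches, `window(batches, 3, 1)` models an output μBatch every second that holds 3 seconds of messages, while `window(batches, 1, 1)` returns the input unchanged, matching the default of window = slide = μBatch duration.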
  • 43. Let’s modify the example
      • ingest text messages
      • split them into separate words
      • count the occurrences of words within 10-second windows
      • every 2 seconds, save the word counts from the last 10 seconds to Cassandra and display the first few results on the console
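  • 44. Yes, it is still easy to do

        import com.datastax.spark.connector.streaming._ // adds saveToCassandra to DStream
        import org.apache.spark.SparkContext._          // pair RDD operations such as sortByKey
        import org.apache.spark.rdd.RDD
        import org.apache.spark.streaming.Seconds
        import org.apache.spark.streaming.dstream.DStream

        case class WordCount(time: Long, word: String, count: Int)

        // `stream` is assumed to be a DStream of (key, paragraph) pairs created elsewhere
        val paragraphs: DStream[String] = stream.map { case (_, paragraph) => paragraph }
        val words: DStream[String] = paragraphs.flatMap(_.split("""\s+"""))
        // 10-second window, recomputed every 2 seconds
        val wordCounts: DStream[(String, Long)] =
          words.countByValueAndWindow(Seconds(10), Seconds(2))

        val topWordCounts: DStream[WordCount] = wordCounts.transform { (rdd, time) =>
          val mappedWordCounts: RDD[(Int, WordCount)] = rdd.map {
            case (word, count) => (count.toInt, WordCount(time.milliseconds, word, count.toInt))
          }
          // sort by count, highest first, and keep only the WordCount values
          mappedWordCounts.sortByKey(ascending = false).values
        }

        topWordCounts.saveToCassandra("meetup", "word_counts")
        topWordCounts.print()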
  • 45. DStream stateful operator
      • DStream[(K, V)].updateStateByKey(f: (Seq[V], Option[S]) => Option[S]): DStream[(K, S)]
      • Example: new pairs (A, 1), (B, 2), (A, 3), (C, 4), (A, 5), (B, 6); previous state A = 7, B = 8, C = 9
      • R1 = f(Seq(1, 3, 5), Some(7))
      • R2 = f(Seq(2, 6), Some(8))
      • R3 = f(Seq(4), Some(9))
      • Resulting state: A = R1, B = R2, C = R3
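The update step can be traced by hand with a small model. The sketch below is plain Scala, not Spark: `StateModel.step` is a hypothetical helper that applies an update function of the same shape as the one `updateStateByKey` takes to a single μBatch, and `sum` is one possible update function, chosen here as an assumption.

```scala
// Plain-Scala model of one updateStateByKey step for a single μBatch. Not Spark API.
object StateModel {
  // `update` receives the new values for a key and the previous state, if any.
  def step[K, V, S](state: Map[K, S], batch: Seq[(K, V)])(
      update: (Seq[V], Option[S]) => Option[S]): Map[K, S] = {
    val newValues: Map[K, Seq[V]] =
      batch.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    // Every key with new values or existing state is visited;
    // returning None from `update` drops the key from the state.
    (state.keySet ++ newValues.keySet).flatMap { k =>
      update(newValues.getOrElse(k, Seq.empty), state.get(k)).map(k -> _)
    }.toMap
  }

  // A running-sum update function: previous total plus the new values.
  def sum(counts: Seq[Long], state: Option[Long]): Option[Long] =
    Some(state.getOrElse(0L) + counts.sum)
}
```

Applied to the values on the slide with `sum` as the update function, the new state is A = 16, B = 16, C = 13.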
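  • 46. Total word count example

        import com.datastax.spark.connector.streaming._     // adds saveToCassandra to DStream
        import org.apache.spark.SparkContext._              // pair RDD operations such as sortByKey
        import org.apache.spark.streaming.StreamingContext._ // pair DStream operations such as updateStateByKey
        import org.apache.spark.streaming.dstream.DStream

        case class WordCount(time: Long, word: String, count: Int)

        // adds the counts from the current μBatch to the running total for a word
        def update(counts: Seq[Long], state: Option[Long]): Option[Long] = {
          val sum = counts.sum
          Some(state.getOrElse(0L) + sum)
        }

        // `stream` is assumed to be a DStream of (key, paragraph) pairs created elsewhere;
        // updateStateByKey requires checkpointing to be enabled on the StreamingContext
        val totalWords: DStream[(String, Long)] =
          stream.map { case (_, paragraph) => paragraph }
            .flatMap(_.split("""\s+"""))
            .countByValue()
            .updateStateByKey(update _)

        val topTotalWordCounts: DStream[WordCount] = totalWords.transform { (rdd, time) =>
          rdd.map { case (word, count) =>
            (count, WordCount(time.milliseconds, word, count.toInt))
          }.sortByKey(ascending = false).values
        }

        topTotalWordCounts.saveToCassandra("meetup", "word_counts_total")
        topTotalWordCounts.print()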
  • 47. Obtaining DStreams • ZeroMQ • Kinesis • HDFS compatible file system • Akka actor • Twitter • MQTT • Kafka • Socket • Flume • …
  • 48. Particular DStreams are available in separate modules

        GroupId             ArtifactId                          Latest Version
        org.apache.spark    spark-streaming-kinesis-asl_2.10    1.1.0
        org.apache.spark    spark-streaming-mqtt_2.10           1.1.0
        org.apache.spark    spark-streaming-zeromq_2.10         1.1.0
        org.apache.spark    spark-streaming-flume_2.10          1.1.0
        org.apache.spark    spark-streaming-flume-sink_2.10     1.1.0
        org.apache.spark    spark-streaming-kafka_2.10          1.1.0
        org.apache.spark    spark-streaming-twitter_2.10        1.1.0
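Pulling one of these modules into a project might look like the following sbt fragment. It is a sketch that assumes an sbt build on Scala 2.10 with the 1.1.0 artifacts listed above; the exact `scalaVersion` and the choice of the Kafka module are assumptions made for illustration.

```scala
// build.sbt fragment (assumes Scala 2.10 and the 1.1.0 artifacts from the list above)
scalaVersion := "2.10.4" // assumed patch version; any 2.10.x matches the _2.10 suffix

libraryDependencies ++= Seq(
  // core streaming module
  "org.apache.spark" %% "spark-streaming" % "1.1.0",
  // one source-specific module; %% appends the _2.10 suffix
  "org.apache.spark" %% "spark-streaming-kafka" % "1.1.0"
)
```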
  • 49. If something goes wrong…
  • 50. Fault tolerance • The sequence of transformations is known to Spark Streaming • μBatches are replicated once they are received • Lost data can be recomputed
  • 51. But there are pitfalls • Spark replicates blocks, not single messages • It is up to a particular receiver to decide whether to form the block from a single message or to collect more messages before pushing the block • The data collected in the receiver before the block is pushed will be lost in case of failure of the receiver • Typical tradeoff - efficiency vs fault tolerance
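The tradeoff between efficiency and fault tolerance can be made concrete with a toy receiver. This is a plain-Scala sketch, not a Spark `Receiver`: the `ReceiverModel.Receiver` class and its `blockSize` parameter are hypothetical, and pushed blocks are simply assumed to be safely replicated.

```scala
// Plain-Scala model of a receiver that buffers messages before pushing a block.
// Not Spark API; pushed blocks are assumed to be replicated, the buffer is not.
object ReceiverModel {
  final case class Receiver(
      blockSize: Int,
      buffer: Vector[String] = Vector.empty,
      pushed: Vector[Vector[String]] = Vector.empty) {

    // Buffers the message and pushes a block once `blockSize` messages are collected.
    def receive(msg: String): Receiver = {
      val filled = buffer :+ msg
      if (filled.size >= blockSize) copy(buffer = Vector.empty, pushed = pushed :+ filled)
      else copy(buffer = filled)
    }

    // Messages that would be lost if the receiver failed right now.
    def lostOnFailure: Vector[String] = buffer
  }
}
```

With `blockSize = 1` every message is pushed immediately and nothing sits in the buffer, at the cost of one block per message. With `blockSize = 3`, after five messages one block has been pushed and the last two messages would be lost if the receiver failed.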
  • 52. Built-in receivers breakdown Pushing single messages Can do both Pushing whole blocks Kafka Akka RawNetworkReceiver Twitter Custom ZeroMQ Socket MQTT
  • 53. Thank you! Questions? http://spark.apache.org/ https://github.com/datastax/spark-cassandra-connector http://cassandra.apache.org/ http://www.datastax.com/