SlideShare une entreprise Scribd logo
1  sur  50
Have Your Cake
and
Eat It Too
Architectures for Batch and Stream
Processing
Speaker name // Speaker title
2
Stuff We’ll Talk About
• Why do we need both streams and batches
• Why is it a problem?
• Stream-Only Patterns (i.e. Kappa Architecture)
• Lambda-Architecture Technologies
– SummingBird
– Apache Spark
– Apache Flink
– Bring-your-own-framework
3©2014 Cloudera, Inc. All rights reserved.
• 15 years of moving data
• Formerly consultant
• Now Cloudera Engineer:
– Sqoop Committer
– Kafka
– Flume
• @gwenshap
About Me
4
Why Streaming
and Batch
©2014 Cloudera, Inc. All rights reserved.
5
Batch Processing
• Store data somewhere
• Read large chunks of data
• Do something with data
• Sometimes store results
6
Batch Examples
• Analytics
• ETL / ELT
• Training machine learning models
• Recommendations
Click to enter confidentiality information
7
Stream Processing
• Listen to incoming events
• Do something with each event
• Maybe store events / results
Click to enter confidentiality information
8
Stream Processing Examples
• Anomaly detection, alerts
• Monitoring, SLAs
• Operational intelligence
• Analytics, dashboards
• ETL
Click to enter confidentiality information
9
Streaming & Batch
Click to enter confidentiality information
Alerts
Monitoring, SLAs
Operational Intelligence
Risk Analysis
Anomaly
detection
Analytics
ETL
10
Four Categories
• Streams Only
• Batch Only
• Can be done in both
• Must be done in both
Click to enter confidentiality information
ETL
Some Analytics
11
ETL
Most Stream Processing projects I see involve few simple
transformations.
• Currency conversion
• JSON to Avro
• Field extraction
• Joining a stream to a static data set
• Aggregate on window
• Identifying change in trend
• Document indexing
Click to enter confidentiality information
12
Batch || Streaming
• Efficient:
– Lower CPU utilization
– Better network and disk throughput
– Fewer locks and waits
• Easier administration
• Easier integration with RDBMS
• Existing expertise
• Existing tools
• Real-time information
Click to enter confidentiality information
13
The Problem
©2014 Cloudera, Inc. All rights reserved.
14
We Like
• Efficiency
• Scalability
• Fault Tolerance
• Recovery from errors
• Experimenting with different
approaches
• Debuggers
• Cookies
Click to enter confidentiality information
15
But…
We don’t like
Maintaining two applications
That do the same thing
Click to enter confidentiality information
16
Do we really need to maintain same app
twice?
Yes, because:
• We are not sure about requirements
• We sometimes need to re-process
with very high efficiency
Not really:
• Different apps for batch and
streaming
• Can re-process with streams
• Can error-correct with streams
• Can maintain one code-base
for batches and streams
Click to enter confidentiality information
17
Stream-Only
Patterns
(Kappa
Architecture)
Click to enter confidentiality information
18
DWH Example
Click to enter confidentiality information
OLTP DB
Sensors,
Logs
DWH
Fact Table
(Partitioned)
Real Time
Fact Tables
Dimensio
n
Dimensio
n
Dimensio
n
Views
Aggregat
es
App 1:
Stream
processing
App 2:
Occasional load
19
We need to fix older data
Click to enter confidentiality information
0 1 2 3 4 5 6 7 8 9
1
0
1
1
1
2
1
3
Streaming
App v1
Streaming
App v2
Real-Time
Table
Replacement
Partition
Partitioned
Fact Table
20
We need to fix older data
Click to enter confidentiality information
0 1 2 3 4 5 6 7 8 9
1
0
1
1
1
2
1
3
Streaming
App v1
Streaming
App v2
Real-Time
Table
Replacement
Partition
Partitioned
Fact Table
21
We need to fix older data
Click to enter confidentiality information
0 1 2 3 4 5 6 7 8 9
1
0
1
1
1
2
1
3
Streaming
App v2
Real-Time
Table
22
Lambda-
Architecture
Technologies
Click to enter confidentiality information
23
WordCount in Scala
source.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_+_)
.print()
24
SummingBird
25
MapReduce was great because…
Very simple abstraction:
- Map
- Shuffle
- Reduce
- Type-safe
And it has simpler abstractions on top.
26
SummingBird
• Multi-stage MapReduce
• Run on Hadoop, Spark, Storm
• Very easy to combine
batch and streaming results
Click to enter confidentiality information
27
API
• Platform – Storm, Scalding, Spark…
• Producer.source(Platform) <- get data
• Producer – collection of events
• Transformations – map, filter, merge, leftJoin (lookup)
• Output – write(sink), sumByKey(store)
• Store – contains aggregate for each key, and reduce operation
Click to enter confidentiality information
28
Associative Reduce
Click to enter confidentiality information
29
WordCount SummingBird
def wordCount[P <: Platform[P]]
(source: Producer[P, String], store: P#Store[String, Long]) =
source.flatMap { sentence =>
toWords(sentence).map(_ -> 1L)
}.sumByKey(store)
val stormTopology = Storm.remote(“stormName”).plan(wordCount)
val hadoopJob = Scalding(“scaldingName”).plan(wordCount)
Click to enter confidentiality information
30
SparkStreaming
31
First, there was the RDD
• Spark is its own execution engine
• With high-level API
• RDDs are sharded collections
• Can be mapped, reduced, grouped,
filtered, etc
32
DStream
DStream
DStream
Spark Streaming
Confidentiality Information Goes Here
Single Pass
Source Receiver RDD
Source Receiver RDD
RDD
Filter Count Print
Source Receiver RDD
RDD
RDD
Single Pass
Filter Count Print
Pre-first
Batch
First
Batch
Second
Batch
33
DStream
DStream
DStreamSpark Streaming
Confidentiality Information Goes Here
Single Pass
Source Receiver RDD
Source Receiver RDD
RDD
Filter Count
Print
Source Receiver RDD
RDD
RDD
Single Pass
Filter Count
Pre-first
Batch
First
Batch
Second
Batch
Stateful
RDD 1
Print
Stateful
RDD 2
Stateful
RDD 1
34
Compared to SummingBird
Differences:
• Micro-batches
• Completely new execution model
• Real joins
• Reduce is not limited to Monads
• SparkStreaming has Richer API
• Summingbird can aggregate batch
and stream to one dataset
• SparkStreaming runs in debugger
Similarities:
• Almost same code will run in batch
and streams
• Use of Scala
• Use of functional programing
concepts
Click to enter confidentiality information
35
Spark Example
©2014 Cloudera, Inc. All rights reserved.
1. val conf = new SparkConf().setMaster("local[2]”)
2. val sc = new SparkContext(conf)
3. val lines = sc.textFile(path, 2)
4. val words = lines.flatMap(_.split(" "))
5. val pairs = words.map(word => (word, 1))
6. val wordCounts = pairs.reduceByKey(_ + _)
7. wordCounts.print()
36
Spark Streaming Example
©2014 Cloudera, Inc. All rights reserved.
1. val conf = new SparkConf().setMaster("local[2]”)
2. val ssc = new StreamingContext(conf, Seconds(1))
3. val lines = ssc.socketTextStream("localhost", 9999)
4. val words = lines.flatMap(_.split(" "))
5. val pairs = words.map(word => (word, 1))
6. val wordCounts = pairs.reduceByKey(_ + _)
7. wordCounts.print()
8. ssc.start()
37
Apache Flink
38
Execution Model
You don’t want to know.
39
Flink vs SparkStreaming
Differences:
• Flink is event-by-event streaming,
events go through pipeline.
• SparkStreaming has good
integration with Hbase as state store
• “checkpoint barriers”
• Optimization based on strong typing
• Flink is newer than SparkStreaming,
there is less production experience
Similarities:
• Very similar APIs
• Built-in stream-specific operators
(windows)
• Exactly once guarantees through
checkpoints of offsets and state
(Flink is limited to small state for
now)
40
WordCount Batch
val env = ExecutionEnvironment.getExecutionEnvironment
val text = getTextDataSet(env)
val counts = text.flatMap { _.toLowerCase.split("W+") filter {
_.nonEmpty } }
.map { (_, 1) } .groupBy(0)
.sum(1)
counts.print()
env.execute(“Wordcount Example”)
41
WordCount Streaming
val env = ExecutionEnvironment.getExecutionEnvironment
val text = env.socketTextStream(host, port)
val counts = text.flatMap { _.toLowerCase.split("W+") filter {
_.nonEmpty } }
.map { (_, 1) } .groupBy(0)
.sum(1)
counts.print()
env.execute(“Wordcount Example”)
42
Bring Your Own
Framework
43
If the requirements are simple…
44
How difficult it is to parallelize
transformations?
Simple transformations
Are simple
45
Just add Kafka
Kafka is a reliable data source
You can read
Batches
Microbatches
Streams
Also allows for re-partitioning
Click to enter confidentiality information
46
Cluster management
• Managing cluster resources used to be difficult
• Now:
– YARN
– Mesos
– Docker
– Kubernetes
Click to enter confidentiality information
47
So your app should…
• Allocate resources and track tasks with YARN / Mesos
• Read from Kafka (however often you want)
• Do simple transformations
• Write to Kafka / Hbase
• How difficult can it possibly be?
Click to enter confidentiality information
48
Parting Thoughts
Click to enter confidentiality information
49
Good engineering lessons
• DRY – do you really need same code twice?
• Error correction is critical
• Reliability guarantees are critical
• Debuggers are really nice
• Latency / Throughput trade-offs
• Use existing expertise
• Stream processing is about patterns
Thank you

Contenu connexe

Tendances

How Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and StormHow Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and StormEdureka!
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaJoe Stein
 
Kafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtimeKafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtimeGuido Schmutz
 
Putting Kafka Into Overdrive
Putting Kafka Into OverdrivePutting Kafka Into Overdrive
Putting Kafka Into OverdriveTodd Palino
 
101 ways to configure kafka - badly (Kafka Summit)
101 ways to configure kafka - badly (Kafka Summit)101 ways to configure kafka - badly (Kafka Summit)
101 ways to configure kafka - badly (Kafka Summit)Henning Spjelkavik
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planningconfluent
 
Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Gwen (Chen) Shapira
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataRahul Jain
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Amazon Web Services
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streamingdatamantra
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Michael Noll
 
Apache Kafka
Apache KafkaApache Kafka
Apache KafkaJoe Stein
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupSnehal Nagmote
 

Tendances (20)

Spark streaming + kafka 0.10
Spark streaming + kafka 0.10Spark streaming + kafka 0.10
Spark streaming + kafka 0.10
 
Kafka for DBAs
Kafka for DBAsKafka for DBAs
Kafka for DBAs
 
kafka for db as postgres
kafka for db as postgreskafka for db as postgres
kafka for db as postgres
 
How Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and StormHow Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and Storm
 
Spark+flume seattle
Spark+flume seattleSpark+flume seattle
Spark+flume seattle
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
 
Kafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtimeKafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtime
 
Putting Kafka Into Overdrive
Putting Kafka Into OverdrivePutting Kafka Into Overdrive
Putting Kafka Into Overdrive
 
101 ways to configure kafka - badly (Kafka Summit)
101 ways to configure kafka - badly (Kafka Summit)101 ways to configure kafka - badly (Kafka Summit)
101 ways to configure kafka - badly (Kafka Summit)
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
 
Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streaming
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
Kafka at scale facebook israel
Kafka at scale   facebook israelKafka at scale   facebook israel
Kafka at scale facebook israel
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code Meetup
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 

En vedette

Kien thuc ho tro 1
Kien thuc ho tro 1Kien thuc ho tro 1
Kien thuc ho tro 1qdai2008
 
Pismo preporuke-Aida B
Pismo preporuke-Aida BPismo preporuke-Aida B
Pismo preporuke-Aida BAida Botonjic
 
Die lokale Google Suche: Gegenwart und Zukunft (Local SEO)
Die lokale Google Suche: Gegenwart und Zukunft (Local SEO)Die lokale Google Suche: Gegenwart und Zukunft (Local SEO)
Die lokale Google Suche: Gegenwart und Zukunft (Local SEO)Andreas Gruss
 
Anarquia y republica conservadora
Anarquia y republica conservadoraAnarquia y republica conservadora
Anarquia y republica conservadoraAndrea Aguilera
 
Segregation of duties in SAP @ ISACA Pune presentation on 18.4.2015
Segregation of duties in SAP @ ISACA Pune presentation on 18.4.2015 Segregation of duties in SAP @ ISACA Pune presentation on 18.4.2015
Segregation of duties in SAP @ ISACA Pune presentation on 18.4.2015 CA CISA Jayjit Biswas
 
Minor connectors & rests & rest seats /certified fixed orthodontic courses by...
Minor connectors & rests & rest seats /certified fixed orthodontic courses by...Minor connectors & rests & rest seats /certified fixed orthodontic courses by...
Minor connectors & rests & rest seats /certified fixed orthodontic courses by...Indian dental academy
 
Nyc kafka meetup 2015 - when bad things happen to good kafka clusters
Nyc kafka meetup 2015 - when bad things happen to good kafka clustersNyc kafka meetup 2015 - when bad things happen to good kafka clusters
Nyc kafka meetup 2015 - when bad things happen to good kafka clustersGwen (Chen) Shapira
 
Topic modeling using big data analytics
Topic modeling using big data analyticsTopic modeling using big data analytics
Topic modeling using big data analyticsFarheen Nilofer
 
Kafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupKafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupGwen (Chen) Shapira
 
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)Sebastian Ruder
 
Streaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data MeetupStreaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data MeetupGwen (Chen) Shapira
 
Uk job market update - Demand for staff rises at strongest
Uk job market update - Demand for staff rises at strongestUk job market update - Demand for staff rises at strongest
Uk job market update - Demand for staff rises at strongestSteven Jagger
 
Britto&arana2014
Britto&arana2014Britto&arana2014
Britto&arana2014Cesar Arana
 
MAQUINARIA
MAQUINARIAMAQUINARIA
MAQUINARIArexdan98
 

En vedette (18)

Kien thuc ho tro 1
Kien thuc ho tro 1Kien thuc ho tro 1
Kien thuc ho tro 1
 
Ipad manual del_usuario
Ipad manual del_usuarioIpad manual del_usuario
Ipad manual del_usuario
 
Pismo preporuke-Aida B
Pismo preporuke-Aida BPismo preporuke-Aida B
Pismo preporuke-Aida B
 
Los Cinco Sentidos
Los Cinco SentidosLos Cinco Sentidos
Los Cinco Sentidos
 
Die lokale Google Suche: Gegenwart und Zukunft (Local SEO)
Die lokale Google Suche: Gegenwart und Zukunft (Local SEO)Die lokale Google Suche: Gegenwart und Zukunft (Local SEO)
Die lokale Google Suche: Gegenwart und Zukunft (Local SEO)
 
Anarquia y republica conservadora
Anarquia y republica conservadoraAnarquia y republica conservadora
Anarquia y republica conservadora
 
Segregation of duties in SAP @ ISACA Pune presentation on 18.4.2015
Segregation of duties in SAP @ ISACA Pune presentation on 18.4.2015 Segregation of duties in SAP @ ISACA Pune presentation on 18.4.2015
Segregation of duties in SAP @ ISACA Pune presentation on 18.4.2015
 
Minor connectors & rests & rest seats /certified fixed orthodontic courses by...
Minor connectors & rests & rest seats /certified fixed orthodontic courses by...Minor connectors & rests & rest seats /certified fixed orthodontic courses by...
Minor connectors & rests & rest seats /certified fixed orthodontic courses by...
 
Sleazy
SleazySleazy
Sleazy
 
Nyc kafka meetup 2015 - when bad things happen to good kafka clusters
Nyc kafka meetup 2015 - when bad things happen to good kafka clustersNyc kafka meetup 2015 - when bad things happen to good kafka clusters
Nyc kafka meetup 2015 - when bad things happen to good kafka clusters
 
Is hadoop for you
Is hadoop for youIs hadoop for you
Is hadoop for you
 
Topic modeling using big data analytics
Topic modeling using big data analyticsTopic modeling using big data analytics
Topic modeling using big data analytics
 
Kafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupKafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn Meetup
 
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)
 
Streaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data MeetupStreaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data Meetup
 
Uk job market update - Demand for staff rises at strongest
Uk job market update - Demand for staff rises at strongestUk job market update - Demand for staff rises at strongest
Uk job market update - Demand for staff rises at strongest
 
Britto&arana2014
Britto&arana2014Britto&arana2014
Britto&arana2014
 
MAQUINARIA
MAQUINARIAMAQUINARIA
MAQUINARIA
 

Similaire à Have your cake and eat it too

Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingHari Shreedharan
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingHari Shreedharan
 
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Cloudera, Inc.
 
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)Spark Summit
 
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCPSimpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCPDaniel Zivkovic
 
Visual Mapping of Clickstream Data
Visual Mapping of Clickstream DataVisual Mapping of Clickstream Data
Visual Mapping of Clickstream DataDataWorks Summit
 
SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017Jags Ramnarayan
 
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData
 
Building an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult StepsBuilding an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult StepsDigitalOcean
 
Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Sid Anand
 
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...Data Con LA
 
Spark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingSpark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingJack Gudenkauf
 
Kubecon seattle 2018 workshop slides
Kubecon seattle 2018 workshop slidesKubecon seattle 2018 workshop slides
Kubecon seattle 2018 workshop slidesWeaveworks
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformApache Apex
 
What's New in Apache Spark 2.3 & Why Should You Care
What's New in Apache Spark 2.3 & Why Should You CareWhat's New in Apache Spark 2.3 & Why Should You Care
What's New in Apache Spark 2.3 & Why Should You CareDatabricks
 
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Value Association
 
Web Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectWeb Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectSaltlux Inc.
 
Apache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault ToleranceApache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault ToleranceSachin Aggarwal
 
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...Landon Robinson
 

Similaire à Have your cake and eat it too (20)

Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
 
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
 
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
 
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCPSimpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
 
Visual Mapping of Clickstream Data
Visual Mapping of Clickstream DataVisual Mapping of Clickstream Data
Visual Mapping of Clickstream Data
 
SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017
 
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
 
Building an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult StepsBuilding an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult Steps
 
Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)
 
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
 
Spark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingSpark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream Processing
 
Kubecon seattle 2018 workshop slides
Kubecon seattle 2018 workshop slidesKubecon seattle 2018 workshop slides
Kubecon seattle 2018 workshop slides
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Spark etl
Spark etlSpark etl
Spark etl
 
What's New in Apache Spark 2.3 & Why Should You Care
What's New in Apache Spark 2.3 & Why Should You CareWhat's New in Apache Spark 2.3 & Why Should You Care
What's New in Apache Spark 2.3 & Why Should You Care
 
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICS
 
Web Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectWeb Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC Project
 
Apache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault ToleranceApache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault Tolerance
 
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
 

Plus de Gwen (Chen) Shapira

Velocity 2019 - Kafka Operations Deep Dive
Velocity 2019  - Kafka Operations Deep DiveVelocity 2019  - Kafka Operations Deep Dive
Velocity 2019 - Kafka Operations Deep DiveGwen (Chen) Shapira
 
Lies Enterprise Architects Tell - Data Day Texas 2018 Keynote
Lies Enterprise Architects Tell - Data Day Texas 2018  Keynote Lies Enterprise Architects Tell - Data Day Texas 2018  Keynote
Lies Enterprise Architects Tell - Data Day Texas 2018 Keynote Gwen (Chen) Shapira
 
Gluecon - Kafka and the service mesh
Gluecon - Kafka and the service meshGluecon - Kafka and the service mesh
Gluecon - Kafka and the service meshGwen (Chen) Shapira
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Gwen (Chen) Shapira
 
Papers we love realtime at facebook
Papers we love   realtime at facebookPapers we love   realtime at facebook
Papers we love realtime at facebookGwen (Chen) Shapira
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Gwen (Chen) Shapira
 
Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereGwen (Chen) Shapira
 
Scaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureScaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureGwen (Chen) Shapira
 
Data Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopData Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopGwen (Chen) Shapira
 
Scaling etl with hadoop shapira 3
Scaling etl with hadoop   shapira 3Scaling etl with hadoop   shapira 3
Scaling etl with hadoop shapira 3Gwen (Chen) Shapira
 
Visualizing database performance hotsos 13-v2
Visualizing database performance   hotsos 13-v2Visualizing database performance   hotsos 13-v2
Visualizing database performance hotsos 13-v2Gwen (Chen) Shapira
 

Plus de Gwen (Chen) Shapira (18)

Velocity 2019 - Kafka Operations Deep Dive
Velocity 2019  - Kafka Operations Deep DiveVelocity 2019  - Kafka Operations Deep Dive
Velocity 2019 - Kafka Operations Deep Dive
 
Lies Enterprise Architects Tell - Data Day Texas 2018 Keynote
Lies Enterprise Architects Tell - Data Day Texas 2018  Keynote Lies Enterprise Architects Tell - Data Day Texas 2018  Keynote
Lies Enterprise Architects Tell - Data Day Texas 2018 Keynote
 
Gluecon - Kafka and the service mesh
Gluecon - Kafka and the service meshGluecon - Kafka and the service mesh
Gluecon - Kafka and the service mesh
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
 
Papers we love realtime at facebook
Papers we love   realtime at facebookPapers we love   realtime at facebook
Papers we love realtime at facebook
 
Kafka reliability velocity 17
Kafka reliability   velocity 17Kafka reliability   velocity 17
Kafka reliability velocity 17
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017
 
Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be there
 
Fraud Detection Architecture
Fraud Detection ArchitectureFraud Detection Architecture
Fraud Detection Architecture
 
R for hadoopers
R for hadoopersR for hadoopers
R for hadoopers
 
Scaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureScaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding Failure
 
Incredible Impala
Incredible Impala Incredible Impala
Incredible Impala
 
Data Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopData Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for Hadoop
 
Scaling etl with hadoop shapira 3
Scaling etl with hadoop   shapira 3Scaling etl with hadoop   shapira 3
Scaling etl with hadoop shapira 3
 
Ssd collab13
Ssd   collab13Ssd   collab13
Ssd collab13
 
Integrated dwh 3
Integrated dwh 3Integrated dwh 3
Integrated dwh 3
 
Visualizing database performance hotsos 13-v2
Visualizing database performance   hotsos 13-v2Visualizing database performance   hotsos 13-v2
Visualizing database performance hotsos 13-v2
 
Flexible Design
Flexible DesignFlexible Design
Flexible Design
 

Dernier

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...karishmasinghjnh
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 

Dernier (20)

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 

Have your cake and eat it too

  • 1. Have Your Cake and Eat It Too Architectures for Batch and Stream Processing Speaker name // Speaker title
  • 2. 2 Stuff We’ll Talk About • Why do we need both streams and batches • Why is it a problem? • Stream-Only Patterns (i.e. Kappa Architecture) • Lambda-Architecture Technologies – SummingBird – Apache Spark – Apache Flink – Bring-your-own-framework
  • 3. 3©2014 Cloudera, Inc. All rights reserved. • 15 years of moving data • Formerly consultant • Now Cloudera Engineer: – Sqoop Committer – Kafka – Flume • @gwenshap About Me
  • 4. 4 Why Streaming and Batch ©2014 Cloudera, Inc. All rights reserved.
  • 5. 5 Batch Processing • Store data somewhere • Read large chunks of data • Do something with data • Sometimes store results
  • 6. 6 Batch Examples • Analytics • ETL / ELT • Training machine learning models • Recommendations Click to enter confidentiality information
  • 7. 7 Stream Processing • Listen to incoming events • Do something with each event • Maybe store events / results Click to enter confidentiality information
  • 8. 8 Stream Processing Examples • Anomaly detection, alerts • Monitoring, SLAs • Operational intelligence • Analytics, dashboards • ETL Click to enter confidentiality information
  • 9. 9 Streaming & Batch Click to enter confidentiality information Alerts Monitoring, SLAs Operational Intelligence Risk Analysis Anomaly detection Analytics ETL
  • 10. 10 Four Categories • Streams Only • Batch Only • Can be done in both • Must be done in both Click to enter confidentiality information ETL Some Analytics
  • 11. 11 ETL Most Stream Processing projects I see involve few simple transformations. • Currency conversion • JSON to Avro • Field extraction • Joining a stream to a static data set • Aggregate on window • Identifying change in trend • Document indexing Click to enter confidentiality information
  • 12. 12 Batch || Streaming • Efficient: – Lower CPU utilization – Better network and disk throughput – Fewer locks and waits • Easier administration • Easier integration with RDBMS • Existing expertise • Existing tools • Real-time information Click to enter confidentiality information
  • 13. 13 The Problem ©2014 Cloudera, Inc. All rights reserved.
  • 14. 14 We Like • Efficiency • Scalability • Fault Tolerance • Recovery from errors • Experimenting with different approaches • Debuggers • Cookies Click to enter confidentiality information
  • 15. 15 But… We don’t like Maintaining two applications That do the same thing Click to enter confidentiality information
  • 16. 16 Do we really need to maintain same app twice? Yes, because: • We are not sure about requirements • We sometimes need to re-process with very high efficiency Not really: • Different apps for batch and streaming • Can re-process with streams • Can error-correct with streams • Can maintain one code-base for batches and streams Click to enter confidentiality information
  • 18. 18 DWH Example Click to enter confidentiality information OLTP DB Sensors, Logs DWH Fact Table (Partitioned) Real Time Fact Tables Dimensio n Dimensio n Dimensio n Views Aggregat es App 1: Stream processing App 2: Occasional load
  • 19. 19 We need to fix older data Click to enter confidentiality information 0 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 Streaming App v1 Streaming App v2 Real-Time Table Replacement Partition Partitioned Fact Table
  • 20. 20 We need to fix older data Click to enter confidentiality information 0 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 Streaming App v1 Streaming App v2 Real-Time Table Replacement Partition Partitioned Fact Table
  • 21. 21 We need to fix older data Click to enter confidentiality information 0 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 Streaming App v2 Real-Time Table
  • 23. 23 WordCount in Scala source.flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByKey(_+_) .print()
  • 25. 25 MapReduce was great because… Very simple abstraction: - Map - Shuffle - Reduce - Type-safe And it has simpler abstractions on top.
  • 26. 26 SummingBird • Multi-stage MapReduce • Run on Hadoop, Spark, Storm • Very easy to combine batch and streaming results Click to enter confidentiality information
  • 27. 27 API • Platform – Storm, Scalding, Spark… • Producer.source(Platform) <- get data • Producer – collection of events • Transformations – map, filter, merge, leftJoin (lookup) • Output – write(sink), sumByKey(store) • Store – contains aggregate for each key, and reduce operation Click to enter confidentiality information
  • 28. 28 Associative Reduce Click to enter confidentiality information
  • 29. 29 WordCount SummingBird def wordCount[P <: Platform[P]] (source: Producer[P, String], store: P#Store[String, Long]) = source.flatMap { sentence => toWords(sentence).map(_ -> 1L) }.sumByKey(store) val stormTopology = Storm.remote(“stormName”).plan(wordCount) val hadoopJob = Scalding(“scaldingName”).plan(wordCount) Click to enter confidentiality information
  • 31. 31 First, there was the RDD • Spark is its own execution engine • With high-level API • RDDs are sharded collections • Can be mapped, reduced, grouped, filtered, etc
  • 32. 32 DStream DStream DStream Spark Streaming Confidentiality Information Goes Here Single Pass Source Receiver RDD Source Receiver RDD RDD Filter Count Print Source Receiver RDD RDD RDD Single Pass Filter Count Print Pre-first Batch First Batch Second Batch
  • 33. 33 DStream DStream DStreamSpark Streaming Confidentiality Information Goes Here Single Pass Source Receiver RDD Source Receiver RDD RDD Filter Count Print Source Receiver RDD RDD RDD Single Pass Filter Count Pre-first Batch First Batch Second Batch Stateful RDD 1 Print Stateful RDD 2 Stateful RDD 1
  • 34. 34 Compared to SummingBird Differences: • Micro-batches • Completely new execution model • Real joins • Reduce is not limited to Monads • SparkStreaming has Richer API • Summingbird can aggregate batch and stream to one dataset • SparkStreaming runs in debugger Similarities: • Almost same code will run in batch and streams • Use of Scala • Use of functional programing concepts Click to enter confidentiality information
  • 35. 35 Spark Example ©2014 Cloudera, Inc. All rights reserved. 1. val conf = new SparkConf().setMaster("local[2]”) 2. val sc = new SparkContext(conf) 3. val lines = sc.textFile(path, 2) 4. val words = lines.flatMap(_.split(" ")) 5. val pairs = words.map(word => (word, 1)) 6. val wordCounts = pairs.reduceByKey(_ + _) 7. wordCounts.print()
  • 36. 36 Spark Streaming Example ©2014 Cloudera, Inc. All rights reserved. 1. val conf = new SparkConf().setMaster("local[2]”) 2. val ssc = new StreamingContext(conf, Seconds(1)) 3. val lines = ssc.socketTextStream("localhost", 9999) 4. val words = lines.flatMap(_.split(" ")) 5. val pairs = words.map(word => (word, 1)) 6. val wordCounts = pairs.reduceByKey(_ + _) 7. wordCounts.print() 8. ssc.start()
  • 39. 39 Flink vs SparkStreaming Differences: • Flink is event-by-event streaming, events go through pipeline. • SparkStreaming has good integration with Hbase as state store • “checkpoint barriers” • Optimization based on strong typing • Flink is newer than SparkStreaming, there is less production experience Similarities: • Very similar APIs • Built-in stream-specific operators (windows) • Exactly once guarantees through checkpoints of offsets and state (Flink is limited to small state for now)
  • 40. 40 WordCount Batch val env = ExecutionEnvironment.getExecutionEnvironment val text = getTextDataSet(env) val counts = text.flatMap { _.toLowerCase.split("W+") filter { _.nonEmpty } } .map { (_, 1) } .groupBy(0) .sum(1) counts.print() env.execute(“Wordcount Example”)
  • 41. 41 WordCount Streaming val env = ExecutionEnvironment.getExecutionEnvironment val text = env.socketTextStream(host, port) val counts = text.flatMap { _.toLowerCase.split("W+") filter { _.nonEmpty } } .map { (_, 1) } .groupBy(0) .sum(1) counts.print() env.execute(“Wordcount Example”)
  • 43. 43 If the requirements are simple…
  • 44. 44 How difficult it is to parallelize transformations? Simple transformations Are simple
  • 45. 45 Just add Kafka Kafka is a reliable data source You can read Batches Microbatches Streams Also allows for re-partitioning Click to enter confidentiality information
  • 46. 46 Cluster management • Managing cluster resources used to be difficult • Now: – YARN – Mesos – Docker – Kubernetes Click to enter confidentiality information
  • 47. 47 So your app should… • Allocate resources and track tasks with YARN / Mesos • Read from Kafka (however often you want) • Do simple transformations • Write to Kafka / Hbase • How difficult can it possibly be? Click to enter confidentiality information
  • 48. 48 Parting Thoughts Click to enter confidentiality information
  • 49. 49 Good engineering lessons • DRY – do you really need same code twice? • Error correction is critical • Reliability guarantees are critical • Debuggers are really nice • Latency / Throughput trade-offs • Use existing expertise • Stream processing is about patterns

Notes de l'éditeur

  1. This gives me a lot of perspective regarding the use of Hadoop
  2. Algebird has tons of associative reducers