Data Stream Processing – Concepts and Frameworks
Matthias Niehoff
AGENDA
•Basic Ideas
•Typical Problems
•Streaming Frameworks
•Current Innovations
•Recommendations
Basic Ideas
Data Stream Processing – Why and what is it?
Current Situation of Dealing with (Big) Data
•Batch Layer
•Speed Layer
Sources for Streaming Data
Streaming data does not come only from the frequently mentioned IoT area; many other industries produce it as well. Strictly speaking, any data can be viewed as a stream. Some of the most popular use cases and examples are:
•IoT Sensor Data: Industrial Machines, Consumer Electronics, Agriculture
•Click Streams: Online Shops, Self-Service Portals, Comparison Portals
•Monitoring: System Health, Traffic between Systems, Resource Utilization
•Online Gaming: Gamer Interactions, Reward Systems, Custom Content & Experiences
•Automotive Industry: Vehicle Tracking, Predictive Maintenance, Routing Information
•Financial Transactions: Fraud Detection, Trade Monitoring and Management
Distributed Stream Processing
•Endless & Continuous Data
•Speed & Realtime
•Distributed & Scalable
First step – Microbatching
[Figure: Source → Microbatches → Processing → Sink]
Native Streaming
[Figure: Source → Processing → Sink]
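The difference between the two processing styles can be sketched in a few lines. This is a minimal illustration, not how any particular framework implements it: the function names are hypothetical, and batches are cut by record count for simplicity, whereas real microbatching engines typically cut them by time interval.

```python
from typing import Callable, Iterable, Iterator, List


def native_streaming(source: Iterable[int],
                     process: Callable[[int], int]) -> Iterator[int]:
    """Hand every record to the processing step as soon as it arrives."""
    for record in source:
        yield process(record)


def microbatching(source: Iterable[int], batch_size: int,
                  process_batch: Callable[[List[int]], List[int]]) -> Iterator[int]:
    """Collect records into small batches and process each batch as a unit."""
    batch: List[int] = []
    for record in source:
        batch.append(record)
        if len(batch) == batch_size:
            yield from process_batch(batch)
            batch = []
    if batch:  # flush the last, possibly incomplete batch
        yield from process_batch(batch)
```

With microbatching, the first result can only appear once the first batch is full, which is where the additional latency comes from.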
Typical Problems
and the way frameworks tackle them
Time
Order
Event time vs processing time
[Figure: the same events on two timelines, event time and processing time; axis t in minutes, 1 to 9]
Windowing - Slicing data into chunks
•Window types: Tumbling Window, Sliding Window, Session Window
•Trigger types: Time Trigger, Count Trigger, Content Trigger
Tumbling & Sliding Windows
Input: 4 5 3 6 1 5 9 2 8 6 7 2
•Tumbling windows (sum): 18 17 23
•Sliding windows (sum): 18 17 23, plus the overlapping windows 15 25
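The sums on this slide can be reproduced with a small sketch. It assumes count-based windows of 4 elements and, for the sliding case, a slide of 2 elements; both values are inferred from the numbers shown and are not stated in the deck. The function names are illustrative only.

```python
from typing import List

VALUES = [4, 5, 3, 6, 1, 5, 9, 2, 8, 6, 7, 2]


def tumbling_sums(values: List[int], size: int) -> List[int]:
    """Non-overlapping windows: each element belongs to exactly one window."""
    return [sum(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


def sliding_sums(values: List[int], size: int, slide: int) -> List[int]:
    """Overlapping windows: a new window starts every `slide` elements."""
    return [sum(values[i:i + size])
            for i in range(0, len(values) - size + 1, slide)]


print(tumbling_sums(VALUES, 4))    # [18, 17, 23]
print(sliding_sums(VALUES, 4, 2))  # [18, 15, 17, 25, 23]
```

A tumbling window is the special case of a sliding window whose slide equals its size.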
Session Window
[Figure: event timelines for user 1 and user 2, with a logout and a delayed event]
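A session window groups the events of one user that lie close together in time. The sketch below assumes a gap-based definition (a session ends when no event follows within `gap` minutes); the event layout and function name are hypothetical. It recomputes sessions from all events, whereas a streaming engine would merge sessions incrementally as delayed events arrive.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (user, event time in minutes)


def session_windows(events: List[Event], gap: int) -> Dict[str, List[List[int]]]:
    """Group each user's events into sessions separated by more than `gap`."""
    by_user: Dict[str, List[int]] = defaultdict(list)
    for user, event_time in events:
        by_user[user].append(event_time)

    sessions: Dict[str, List[List[int]]] = {}
    for user, times in by_user.items():
        times.sort()  # order by event time so delayed events fall into place
        user_sessions = [[times[0]]]
        for event_time in times[1:]:
            if event_time - user_sessions[-1][-1] <= gap:
                user_sessions[-1].append(event_time)  # continues the session
            else:
                user_sessions.append([event_time])    # starts a new session
        sessions[user] = user_sessions
    return sessions
```

Because the grouping works on event time, an event that arrives late still ends up in the session it belongs to.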
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
"[...] stop trying to groom unbounded datasets into finite pools of information that eventually become complete, and instead live and breathe under the assumption that we will never know if or when we have seen all of our data, only that new data will arrive, old data may be retracted [...]"
http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
Watermarks
•Part 1 of "When will the result be calculated?"
•Watermark over all received data
•A watermark of 10:00 means "It is assumed that all data up to 10:00 has now arrived"
•Fixed watermark
•Heuristic watermark
•A window is materialized/processed when the watermark reaches the end of the window
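How a watermark decides when a window is materialized can be sketched as follows. The heuristic used here (highest event time seen so far minus an assumed maximum delay) is only one common choice and is an assumption of this example, as are the function and parameter names. Events that arrive after their window has already fired are simply reported as dropped.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, int]  # (event time, value), listed in arrival order


def run_with_watermark(events: List[Event], window_size: int,
                       max_delay: int) -> Tuple[List[Tuple[int, int]], List[Event]]:
    """Sum values per event-time window; fire a window once the watermark passes its end."""
    open_windows: Dict[int, int] = {}   # window start -> running sum
    fired: List[Tuple[int, int]] = []   # (window start, final sum)
    dropped: List[Event] = []
    watermark = float("-inf")
    for event_time, value in events:
        start = event_time - event_time % window_size
        if start + window_size <= watermark:
            dropped.append((event_time, value))  # window already materialized
        else:
            open_windows[start] = open_windows.get(start, 0) + value
        # Heuristic watermark: assume nothing older than this is still in flight.
        watermark = max(watermark, event_time - max_delay)
        for window_start in sorted(open_windows):
            if window_start + window_size <= watermark:
                fired.append((window_start, open_windows.pop(window_start)))
    return fired, dropped
```

A larger `max_delay` drops fewer late events but delays every result, which is the correctness versus latency trade-off behind the choice of watermark.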
[Figure: watermark, plotted as event time against processing time]
Trigger
•Trigger types: Event Time, Processing Time, Count, Content, Composite
•Part 2 of "When will the result be calculated?"
•Triggers an (additional) materialization of the window
•Example
•every 10 minutes (in processing time)
•& when the watermark reaches the end of the window
•& with each delayed event
•but only for an additional 15 minutes in processing time (allowed lateness)
Accumulators
Joining the individual (triggered) results
•every result on its own (discarding)
•Results based on each other (accumulating)
•Results based on each other & correction of the old
result (accumulating & retracting)
Accumulators
Input panes: 5 2 | 8 3 | 4
Pane | Discarding | Accumulating | Accumulating & Retracting
(5,2) | 7 | 7 | 7
(8,3) | 11 | 18 | 18, -7
(4) | 4 | 22 | 22, -18
Last value | 4 | 22 | 22
Total sum | 22 | 47 | 22
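The three accumulation modes can be reproduced for the panes (5,2), (8,3) and (4) with a short sketch. The function name and the result layout are illustrative assumptions; each inner list holds what one trigger firing would emit downstream.

```python
from typing import Dict, List


def emit_panes(panes: List[List[int]]) -> Dict[str, List[List[int]]]:
    """Return what each accumulation mode emits per pane."""
    results: Dict[str, List[List[int]]] = {
        "discarding": [],
        "accumulating": [],
        "accumulating_retracting": [],
    }
    running_total = 0
    for index, pane in enumerate(panes):
        pane_sum = sum(pane)
        previous_total = running_total
        running_total += pane_sum
        results["discarding"].append([pane_sum])         # only the new data
        results["accumulating"].append([running_total])  # everything so far
        if index == 0:
            results["accumulating_retracting"].append([running_total])
        else:
            # new total plus a retraction of the previously emitted total
            results["accumulating_retracting"].append([running_total, -previous_total])
    return results


def total(emitted: List[List[int]]) -> int:
    """What a downstream consumer gets when it sums up everything it received."""
    return sum(sum(firing) for firing in emitted)
```

Summing all emitted values downstream gives 22 for discarding, 47 for accumulating and 22 for accumulating & retracting; only the accumulating mode double counts.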
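Watermarks, Trigger, Accumulators
cf. https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
// Simplified notation as used in the article above
input
  // 60-minute fixed windows
  .apply(Window.into(FixedWindows.of(Duration.standardMinutes(60)))
    .triggering(
      AtWatermark()                                               // fire when the watermark reaches the end of the window
        .withEarlyFirings(AtPeriod(Duration.standardMinutes(10))) // plus every 10 minutes before that
        .withLateFirings(AtCount(1)))                             // plus once for each delayed event
    .withAllowedLateness(Duration.standardMinutes(15))            // accept delayed events for 15 more minutes
    .discardingFiredPanes())                                      // every firing emits only the new data
  .apply(Sum.integersPerKey());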
Stream ~ Table Model
•Aggregating a stream over time yields a table
•Changes to a table over time yield a stream
•The table is updated by every entry in the stream
•Every new entry triggers a computation
•Retention period for late events (cf. allowed lateness)
•Stream/Table ⊆ Dataflow
[Figure: the stream (key1, value1), (key2, value2), (key1, value3) updates a table with the columns key, value and count: first key1/value1/1, then key2/value2/1 is added, finally key1 becomes value3/2]
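The relationship between stream and table can be sketched with the entries from the figure. This is a minimal illustration with hypothetical names: folding the stream yields the table, and recording every table change yields a stream again.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str]          # (key, value)
Row = Tuple[str, int]            # (latest value, update count)
Change = Tuple[str, str, int]    # (key, value, update count)


def apply_stream(stream: List[Entry]) -> Tuple[Dict[str, Row], List[Change]]:
    """Fold a stream into a table and record every change to the table."""
    table: Dict[str, Row] = {}
    changelog: List[Change] = []
    for key, value in stream:
        count = table[key][1] + 1 if key in table else 1
        table[key] = (value, count)            # the entry updates the table
        changelog.append((key, value, count))  # the update is itself a stream entry
    return table, changelog


table, changelog = apply_stream(
    [("key1", "value1"), ("key2", "value2"), ("key1", "value3")])
print(table)  # {'key1': ('value3', 2), 'key2': ('value2', 1)}
```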
Stateful Processing
State & Window Processing
•Non-trivial applications usually need some kind of (temporarily) persistent state
•e.g. aggregations over a longer time, counters, slowly refreshing metadata
•held in memory, can be stored on disk
•interesting: partitioning, rescaling, node failure?
[Figure: an operation with its state]
State Implementations
•State is partitioned most of the time
•Distributed over multiple nodes
•Number of nodes might change
•State must be fault-tolerant
•State access must be fast
•Storage backend
•native/custom-built: e.g. in Spark Streaming
•existing tools: RocksDB in Kafka Streams
•pluggable: Flink, with RocksDB among others
•Carbone et al. (2017), State Management in Flink, http://www.vldb.org/pvldb/vol10/p1718-carbone.pdf
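Partitioned state and a changing number of nodes can be illustrated with plain hash partitioning. This is a deliberately simple sketch under stated assumptions: the function names are hypothetical, and rescaling here moves all state through one place, which real frameworks avoid with more refined schemes.

```python
import zlib
from typing import Dict, List


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition; crc32 is stable across processes, unlike hash()."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def distribute(state: Dict[str, int], num_partitions: int) -> List[Dict[str, int]]:
    """Split keyed state so that every key lives on exactly one partition."""
    partitions: List[Dict[str, int]] = [{} for _ in range(num_partitions)]
    for key, value in state.items():
        partitions[partition_for(key, num_partitions)][key] = value
    return partitions


def rescale(partitions: List[Dict[str, int]], new_num_partitions: int) -> List[Dict[str, int]]:
    """Redistribute existing state when the number of nodes changes."""
    merged: Dict[str, int] = {}
    for partition in partitions:
        merged.update(partition)
    return distribute(merged, new_num_partitions)
```

After `rescale`, records for a key must be routed with the same `partition_for` and the new partition count, otherwise an operator would look for state on the wrong node.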
Data Lookup
Lookup Additional Data
[Figure: Queue → Processing → Results, with Processing looking up Metadata]
Lookup - Remote Read
[Figure: Node 1 and Node 2 read from the Queue and query the remote Metadata store for each lookup]
Lookup - Local Read
[Figure: Node 1 and Node 2 read from the Queue and look up a local copy of the Metadata]
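The cost difference between remote and local reads can be sketched by counting round trips. `MetadataStore` is a hypothetical stand-in for an external metadata service, and loading a full snapshot up front is only one way to obtain a local copy.

```python
from typing import Dict, List, Optional, Tuple


class MetadataStore:
    """Stand-in for a remote metadata service; counts the round trips it serves."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key: str) -> Optional[str]:
        self.round_trips += 1
        return self._data.get(key)

    def snapshot(self) -> Dict[str, str]:
        self.round_trips += 1
        return dict(self._data)


def enrich_remote(events: List[str], store: MetadataStore) -> List[Tuple[str, Optional[str]]]:
    """Remote read: one round trip per event."""
    return [(event, store.get(event)) for event in events]


def enrich_local(events: List[str], store: MetadataStore) -> List[Tuple[str, Optional[str]]]:
    """Local read: load the metadata once, then look it up in memory."""
    local_copy = store.snapshot()
    return [(event, local_copy.get(event)) for event in events]
```

The local copy avoids network latency per event but can become stale, so it needs some way to receive metadata updates.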
Deployment &
Runtime Environment
Runtime Environment - Cluster vs. Library
YARN
Monitoring
•Framework dependent: UI, REST APIs, Metrics
•Scheduler
•Own logging: technical, business
•Java "classics": JMX, Profiler
Delivery Guarantees
Guarantees
•Levels: at-most-once, at-least-once, exactly-once
•Mechanisms: Record Acknowledgement, Micro Batching, Snapshots/Checkpoints, Changelogs
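The effect of the guarantees can be shown with a toy pipeline that crashes once. The sketch only models record acknowledgement: acknowledging before processing gives at-most-once, acknowledging afterwards gives at-least-once. The exactly-once effect is approximated here by an idempotent sink that ignores repeated records, which is one of several possible approaches; all names are hypothetical.

```python
from typing import List


def run_pipeline(records: List[str], crash_on: str, ack_first: bool) -> List[str]:
    """Replay `records` into a sink; the worker crashes once while handling `crash_on`.

    ack_first=True  -> acknowledge, then process (at-most-once)
    ack_first=False -> process, then acknowledge (at-least-once)
    """
    sink: List[str] = []
    crashed = False
    position = 0  # index of the next unacknowledged record
    while position < len(records):
        record = records[position]
        if ack_first:
            position += 1            # acknowledged before any work is done
        if record == crash_on and not crashed:
            crashed = True
            if not ack_first:
                sink.append(record)  # the write succeeded, but the ack was lost
            continue                 # restart from the last acknowledged position
        sink.append(record)
        if not ack_first:
            position += 1            # acknowledged only after the write
    return sink


def deduplicate(sink: List[str]) -> List[str]:
    """Idempotent sink: keep the first write per record, ignore repeats."""
    return list(dict.fromkeys(sink))


print(run_pipeline(["a", "b", "c"], "b", ack_first=True))   # ['a', 'c']
print(run_pipeline(["a", "b", "c"], "b", ack_first=False))  # ['a', 'b', 'b', 'c']
```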
Streaming Frameworks
Helping you implement your solution
"... an execution engine designed for unbounded data sets, and nothing more" (Tyler Akidau)
T. Akidau et al. (2015): The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing - http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
Apache Spark
•Open Source (2010) & Apache project (2013)
•Unified Batch & Stream Processing
•Wide distribution, especially in Hadoop environments
•Batch: RDD as base, DataFrames and DataSets as optimization
•Streaming: DStream & Structured Streaming
Apache Spark Streaming
•Microbatching
•Similar, partly unified programming model to batch processing
•State and window operations
•Missing support for event time
Apache Spark Structured Streaming
•DataSets/DataFrames for streaming processing
•DataStream as an ever-growing table
•Unified API
•Limited support for event time operations
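// Batch: read a bounded JSON file and write the result once
val batchDs = sparkSession
  .read
  .json("someFile.json")
batchDs
  .write
  .json("otherFile.json")

// Streaming: the same API shape, applied to an unbounded Kafka source
val streamDs = sparkSession
  .readStream
  .format("kafka")
  .option("...", "...")
  .load()
streamDs
  .writeStream
  .outputMode("append") // "complete" is only accepted for queries with an aggregation
  .format("console")
  .start()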
Apache Flink
•Started as research project in 2010 (Stratosphere), Apache project since 2014
•Low latency streaming and high throughput batch processing
•Streaming first
•Flexible state and window handling
•Rich support for event time handling
Apache Kafka Streams API
•Only a library, no runtime environment
•Requires Kafka cluster ( >= 0.10)
•Uses Kafka consumer technologies for
•Ordering
•Partitioning
•Scaling
•Source & sink: Kafka topics only
•Kafka Connect for sources & sinks
Current developments
The latest promises and features
Queryable State
•Known as
•Queryable state (Flink)
•Interactive Queries (Kafka Streams)
•Still low level
•Data lifecycle
•(De)Serialization
•Partitioned state discovery
[Figure: a query interface reads the state of an operation]
Streaming SQL
•Use SQL to query Streaming Data
•time-varying relations, e.g. [12:00, 12:00)
•query on multiple points in time
•Standard ANSI SQL + some extensions
•SELECT TABLE, SELECT STREAM
•WINDOWS
•TRIGGERS
•Supported by
•Flink
•Kafka Streams (KSQL)
•https://s.apache.org/streaming-sql-strata-nyc
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
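What the query computes can be spelled out in a few lines of plain code. This sketch assumes attempts are given as (card_number, timestamp in seconds) pairs and evaluates all of them at once; a streaming SQL engine would instead update the result continuously as attempts arrive. The function name is illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def possible_fraud(attempts: Iterable[Tuple[str, int]],
                   window_seconds: int = 5,
                   threshold: int = 3) -> Dict[Tuple[str, int], int]:
    """Count attempts per card and 5-second tumbling window, keep counts above 3."""
    counts: Counter = Counter()
    for card_number, timestamp in attempts:
        window_start = timestamp - timestamp % window_seconds  # WINDOW TUMBLING
        counts[(card_number, window_start)] += 1               # GROUP BY + count(*)
    return {key: count for key, count in counts.items() if count > threshold}  # HAVING
```

The result is keyed by card number and window start, so the same card can show up once per suspicious window.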
Recommendations
Or at least some hints when choosing a framework
Spark Streaming might be an option if
•Spark is already used for batch processing
•Hadoop, and therefore YARN, is used
•A huge community is important
•Scala is not a problem
•Latency is not an important criterion *
•Event time handling is not needed *
•* these change with Structured Streaming
•event time support
•reduced microbatching overhead
Flink is good for...
•flexible event time processing
•watermarks
•trigger
•accumulator
•connectivity to the most important peripheral systems
•low latency stream processing
•excellent state handling
And finally Kafka Streams, if you...
•want an easy deployment
•already have a scheduler/microservice platform
•need low latency and high throughput
•need event time support
•want a lightweight start in streaming
•already use Kafka
•are fine with making Kafka your central backbone
Comparison
Criterion | Spark Streaming | Flink | Kafka Streams
Engine | Microbatching | Native | Native
Programming model | Declarative | Declarative | Declarative
Guarantees | Exactly-Once | Exactly-Once | Exactly-Once
Event time handling | No/Yes* | Yes | Yes
State storage | Own | Pluggable | RocksDB + Topic
Community & ecosystem | Big | Medium | Big
Deployment | Cluster | Cluster | Library
Monitoring | UI, REST API, Dropwizard Metrics | UI, Metrics (JMX, Ganglia), REST API | Kafka Tools, Confluent Control Center, JMX
* Yes with Structured Streaming
A word on
•Apache Beam: high-level API for different streaming runners, e.g. Google Cloud Dataflow, Flink and Spark Streaming
•Google Cloud Dataflow: cloud streaming by Google
•Apex: YARN-based with a static topology which can be changed at runtime
•Flume: logfile shipping, especially into HDFS
•Storm/Heron: streaming pioneer by Twitter, Heron as a successor with the same API
Take aways
•Streaming is not easy
•(Event) Time
•State
•Deployment
•Correctness
•Different concepts and implementations
•Be aware of
•Monitoring
•„Overkill“
•Ongoing research and development
Stay connected!
Our mission – to promote agile development, innovation and technology – extends through everything we do.
Address
codecentric AG
Hochstraße 11
42697 Solingen
Germany
Contact Info
E-Mail: matthias.niehoff@codecentric.de
Twitter: @matthiasniehoff
www.codecentric.de
Debunking Common Myths in Stream Processing
 
Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek
 
Docker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackDocker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic Stack
 
BigDataSpain 2016: Introduction to Apache Apex
BigDataSpain 2016: Introduction to Apache ApexBigDataSpain 2016: Introduction to Apache Apex
BigDataSpain 2016: Introduction to Apache Apex
 


Data Stream Processing - Concepts and Frameworks

  • 1. Data Stream Processing – Concepts and Frameworks, Matthias Niehoff
  • 2. Agenda: Basic Ideas, Typical Problems, Streaming Frameworks, Current Innovations, Recommendations
  • 3. Basic Ideas: Data Stream Processing – why and what is it?
  • 4. Current Situation of Dealing with (Big) Data: Batch Layer, Speed Layer
  • 5. Sources for Streaming Data. Sources of streaming data are not limited to the frequently mentioned IoT area; streaming data arises in many other industries as well. Strictly speaking, any data can be viewed as a stream. Some of the most popular use cases and examples are:
      • IoT Sensor Data: Industrial Machines, Consumer Electronics, Agriculture
      • Click Streams: Online Shops, Self-Service Portals, Comparison Portals
      • Monitoring: System Health, Traffic between Systems, Resource Utilization
      • Online Gaming: Gamer Interactions, Reward Systems, Custom Content & Experiences
      • Automotive Industry: Vehicle Tracking, Predictive Maintenance, Routing Information
      • Financial Transactions: Fraud Detection, Trade Monitoring and Management
  • 10. First step – Microbatching (diagram: Source, Processing, Sink, Microbatches)
  • 12. Typical Problems and the way frameworks tackle them
  • 15. Event time vs. processing time (diagram: event timeline and processing timeline, t in minutes, 1 to 9)
  • 16. Windowing – slicing data into chunks
      • Window types: Tumbling Window, Sliding Window, Session Window
      • Triggers: Time Trigger, Count Trigger, Content Trigger
  • 17. Tumbling & Sliding Windows
      • Input: 4 5 3 6 1 5 9 2 8 6 7 2
      • Tumbling windows (sum): 18, 17, 23
      • Sliding windows (sum): 18, 17, 23 and, for the overlapping windows in between, 15, 25
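The sums on this slide can be reproduced with a small sketch. It assumes count-based windows of 4 elements and, for the sliding case, a slide of 2 elements; neither number is stated on the slide, both are inferred from the sums shown. The function names are illustrative only.

```python
from typing import List


def tumbling_sums(values: List[int], size: int) -> List[int]:
    """Sum each non-overlapping window of `size` consecutive elements."""
    return [sum(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


def sliding_sums(values: List[int], size: int, slide: int) -> List[int]:
    """Sum each window of `size` elements, advancing by `slide` elements."""
    return [sum(values[i:i + size])
            for i in range(0, len(values) - size + 1, slide)]


events = [4, 5, 3, 6, 1, 5, 9, 2, 8, 6, 7, 2]
print(tumbling_sums(events, size=4))          # [18, 17, 23]
print(sliding_sums(events, size=4, slide=2))  # [18, 15, 17, 25, 23]
```

With a slide equal to the window size, the sliding window degenerates into the tumbling window, which is why 18, 17 and 23 appear in both results.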
  • 18. Session Window (diagram: time axis with sessions of user 1 and user 2, a logout, and a delayed event marked "?")
  • 19. Session Window (diagram: time axis with sessions of user 1 and user 2, a logout, and the delayed event)
  • 20. The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
      "[...] stop trying to groom unbounded datasets into finite pools of information that eventually become complete, and instead live and breathe under the assumption that we will never know if or when we have seen all of our data, only that new data will arrive, old data may be retracted [...]"
      http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
  • 21. Watermarks
      • Part 1 of "When will the result be calculated?"
      • Watermark of all received data
      • A watermark of 10:00 means "It is assumed that all data up to 10:00 has now arrived"
      • Fixed watermark
      • Heuristic watermark
      • A window is materialized/processed when the watermark reaches the end of the window
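A minimal sketch of a heuristic watermark, under assumptions that are not from the slides: event times are whole minutes, windows are 10 minutes long, and the heuristic is "largest event time seen so far minus a fixed 2 minutes". A window fires once the watermark has reached its end; events arriving after that are simply dropped here, since allowed lateness is not modelled.

```python
WINDOW = 10     # assumed window length in minutes
MAX_DELAY = 2   # assumed bound on out-of-orderness used by the heuristic


def run(events):
    """events: (event_time_minute, value) pairs in arrival order.

    Returns (fired, dropped): fired holds (window_start, sum) in firing
    order, dropped holds events whose window had already been materialized.
    """
    sums = {}                      # window start -> running sum (open windows)
    fired, dropped = [], []
    watermark = float("-inf")
    for event_time, value in events:
        start = (event_time // WINDOW) * WINDOW
        if start + WINDOW <= watermark:
            dropped.append((event_time, value))   # too late, window is closed
        else:
            sums[start] = sums.get(start, 0) + value
        # Heuristic watermark: never moves backwards.
        watermark = max(watermark, event_time - MAX_DELAY)
        # Materialize every open window whose end the watermark has reached.
        for s in sorted(w for w in sums if w + WINDOW <= watermark):
            fired.append((s, sums.pop(s)))
    return fired, dropped


arrivals = [(1, 4), (7, 5), (11, 3), (9, 6), (13, 1), (8, 2), (25, 9)]
fired, dropped = run(arrivals)
print(fired)    # [(0, 15), (10, 4)]
print(dropped)  # [(8, 2)]
```

The event at minute 9 arrives out of order but within the assumed bound, so it still counts towards the first window; the event at minute 8 arrives after the watermark has passed 10 and is lost, which is the gap that triggers and allowed lateness address.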
  • 23. Trigger
      • Trigger types: Content, Event Time, Processing Time, Count, Composite
      • Part 2 of "When will the result be calculated?"
      • Triggers an (additional) materialization of the window
      • Example:
          • every 10 minutes (in processing time)
          • and when the watermark reaches the end of the window
          • and with each delayed event
          • but only for an additional 15 minutes in processing time (allowed lateness)
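The composite trigger from the example can be sketched as a small step-driven simulation for a single window. Everything beyond the three rules on the slide is an assumption: the window ends at event-time minute 30, the `Step` type is hypothetical, early firings happen regardless of whether new elements arrived, and allowed lateness is measured in processing time from the on-time firing.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    processing_time: int   # minutes of processing time since the window opened
    watermark: int         # current watermark in event-time minutes
    new_elements: int = 0  # elements for this window arriving in this step


WINDOW_END = 30        # assumed event-time end of the window
EARLY_PERIOD = 10      # early firing every 10 minutes of processing time
ALLOWED_LATENESS = 15  # late firings only within 15 minutes after on-time firing


def firings(steps: List[Step]) -> List[Tuple[int, str]]:
    """Return (processing_time, kind) for every materialization decision."""
    out = []
    last_early = 0
    on_time_at = None  # processing time at which the watermark passed the end
    for step in steps:
        if on_time_at is None:
            if step.watermark >= WINDOW_END:
                out.append((step.processing_time, "on-time"))
                on_time_at = step.processing_time
            elif step.processing_time - last_early >= EARLY_PERIOD:
                out.append((step.processing_time, "early"))
                last_early = step.processing_time
        elif step.new_elements > 0:
            if step.processing_time - on_time_at <= ALLOWED_LATENESS:
                out.append((step.processing_time, "late"))
            else:
                out.append((step.processing_time, "dropped"))
    return out


timeline = [Step(10, 9, 3), Step(20, 21, 2), Step(31, 30, 1),
            Step(40, 38, 1), Step(50, 47, 1)]
print(firings(timeline))
# [(10, 'early'), (20, 'early'), (31, 'on-time'), (40, 'late'), (50, 'dropped')]
```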
  • 24. Accumulators: joining the individual (triggered) results
      • Every result on its own (discarding)
      • Results based on each other (accumulating)
      • Results based on each other, plus correction of the old result (accumulating & retracting)
  • 25. Accumulators. Input with pane boundaries: 5 2 | 8 3 | 4
      Pane        | Discarding | Accumulating | Accumulating & Retracting
      (5,2)       | 7          | 7            | 7
      (8,3)       | 11         | 18           | 18, -7
      (4)         | 4          | 22           | 22, -18
      Last value  | 4          | 22           | 22
      Total sum   | 22         | 47           | 22
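The table can be recomputed with a short sketch of the three accumulation modes. The function names are illustrative; each returns, per pane, the list of values emitted downstream.

```python
def discarding(panes):
    """Each firing emits only the sum of its own pane."""
    return [[sum(p)] for p in panes]


def accumulating(panes):
    """Each firing emits the running total over all panes so far."""
    out, total = [], 0
    for p in panes:
        total += sum(p)
        out.append([total])
    return out


def accumulating_and_retracting(panes):
    """Each firing emits the new total plus a retraction of the previous one."""
    out, total = [], 0
    for p in panes:
        previous = total
        total += sum(p)
        out.append([total] if not out else [total, -previous])
    return out


def downstream_sum(emitted):
    """What a consumer gets if it naively adds up everything it receives."""
    return sum(v for pane in emitted for v in pane)


panes = [(5, 2), (8, 3), (4,)]
for mode in (discarding, accumulating, accumulating_and_retracting):
    emitted = mode(panes)
    print(mode.__name__, emitted, "total:", downstream_sum(emitted))
```

A consumer that sums everything it receives gets the correct 22 in discarding and in accumulating & retracting mode, but 47 in plain accumulating mode; a consumer that only keeps the last value gets the correct 22 only in the two accumulating modes.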
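  • 26. Watermarks, Trigger, Accumulators (cf. https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102)
      input
        // fixed (tumbling) windows of 60 minutes
        .apply(Window.into(FixedWindows.of(Duration.standardMinutes(60)))
                     .triggering(
                       // fire when the watermark passes the end of the window
                       AtWatermark()
                         // early firing every 10 minutes
                         .withEarlyFirings(AtPeriod(Duration.standardMinutes(10)))
                         // one additional firing per late element
                         .withLateFirings(AtCount(1)))
                     // accept late elements for 15 more minutes
                     .withAllowedLateness(Duration.standardMinutes(15))
                     // each firing emits only its own pane
                     .discardingFiredPanes())
        .apply(Sum.integersPerKey());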
  • 27. Stream ~ Table Model
      • Aggregating a stream over time yields a table
      • Changes to a table over time yield a stream
      • The table is updated by every entry in the stream
      • Every new entry triggers a computation
      • Retention period for late events (cf. allowed lateness)
      • Stream/Table ⊆ Dataflow
      • Example stream: (key1, value1), (key2, value2), (key1, value3)
      • Resulting table states (key, value, update count):
          • after (key1, value1): key1 value1 1
          • after (key2, value2): key1 value1 1; key2 value2 1
          • after (key1, value3): key1 value3 2; key2 value2 1
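Both directions of the stream/table relationship can be illustrated with the example from this slide. This is a sketch with hypothetical helper names: the table is a plain dictionary from key to (value, update count), and the reverse direction derives a change stream by comparing consecutive table snapshots.

```python
def stream_to_table(stream):
    """Apply each (key, value) entry as an upsert and keep a snapshot per step."""
    table = {}       # key -> (latest value, update count)
    snapshots = []
    for key, value in stream:
        _, count = table.get(key, (None, 0))
        table[key] = (value, count + 1)
        snapshots.append(dict(table))
    return table, snapshots


def table_to_stream(snapshots):
    """Emit a (key, value) change for every row that differs between snapshots."""
    previous, changes = {}, []
    for snapshot in snapshots:
        for key, row in snapshot.items():
            if previous.get(key) != row:
                changes.append((key, row[0]))
        previous = snapshot
    return changes


stream = [("key1", "value1"), ("key2", "value2"), ("key1", "value3")]
table, snapshots = stream_to_table(stream)
print(table)                       # {'key1': ('value3', 2), 'key2': ('value2', 1)}
print(table_to_stream(snapshots))  # the original stream again
```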
  • 29. State & Window Processing (diagram: operation with state)
      • Non-trivial applications mostly need some kind of (temporarily) persistent state
      • e.g. aggregations over a longer time, counters, slowly refreshing metadata
      • Held in memory, can be stored on disk
      • Interesting: partitioning, rescaling, node failure?
  • 30. State Implementations
      • State is partitioned most of the time
      • Distributed over multiple nodes
      • Number of nodes might change
      • State must be fault-tolerant
      • State access must be fast
      • Storage backend
          • native/self-built: e.g. in Spark Streaming
          • existing tools: RocksDB in Kafka Streams
          • pluggable: Flink, among others also RocksDB
      • Carbone et al. (2017), State Management in Flink, http://www.vldb.org/pvldb/vol10/p1718-carbone.pdf
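Why a changing number of nodes makes partitioned state interesting can be shown with a naive sketch. It assumes plain hash-modulo partitioning with a stable CRC32 hash, which is an illustration only and not the scheme of any of the frameworks discussed; the helper names and sensor keys are made up.

```python
import zlib
from typing import Dict, List


def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash partitioning: the same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(keys: List[str], num_partitions: int) -> Dict[int, List[str]]:
    """Group keys (and therefore their state) by owning partition."""
    partitions: Dict[int, List[str]] = {p: [] for p in range(num_partitions)}
    for key in keys:
        partitions[partition_for(key, num_partitions)].append(key)
    return partitions


def keys_to_move(keys: List[str], old_count: int, new_count: int) -> List[str]:
    """Keys whose state has to be shipped to another node when rescaling."""
    return [k for k in keys
            if partition_for(k, old_count) != partition_for(k, new_count)]


keys = [f"sensor-{i}" for i in range(12)]
print(assign(keys, 2))
print(assign(keys, 3))
print(keys_to_move(keys, 2, 3))
```

With plain modulo partitioning, a large share of the keys typically changes owner when a node is added, so their state has to move; this is one reason production systems use more refined assignment schemes.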
  • 33. Lookup - Remote Read (diagram: Queue, Metadata, Node 1, Node 2)
  • 34. Lookup - Local Read (diagram: Queue, Metadata, Node 1, Node 2)
  • 36. Runtime Environment - Cluster vs. Library (diagram: YARN)
  • 39. Guarantees
      • Delivery semantics: at-most-once, at-least-once, exactly-once
      • Mechanisms: Record Acknowledgement, Micro Batching, Snapshots/Checkpoints, Changelogs
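The difference between the first two guarantees can be sketched with record acknowledgement and a single simulated crash. This is a toy model under stated assumptions, not how any particular framework implements it: records are processed sequentially, the source replays from the first unacknowledged record after a crash, and the only choice is whether a record is acknowledged before or after it is processed.

```python
def run(records, crash_at, ack_first):
    """Process `records`, crashing once while handling index `crash_at`.

    ack_first=True  -> acknowledge, then process (at-most-once)
    ack_first=False -> process, then acknowledge (at-least-once)
    """
    processed, acked, crashed = [], set(), False
    i = 0
    while i < len(records):
        if ack_first:
            acked.add(i)
            if i == crash_at and not crashed:
                crashed = True
                i = max(acked) + 1       # restart after the last acked record
                continue
            processed.append(records[i])
        else:
            processed.append(records[i])
            if i == crash_at and not crashed:
                crashed = True
                i = max(acked) + 1 if acked else 0   # replay unacked records
                continue
            acked.add(i)
        i += 1
    return processed


records = ["a", "b", "c"]
print(run(records, crash_at=1, ack_first=True))    # ['a', 'c']  record lost
print(run(records, crash_at=1, ack_first=False))   # ['a', 'b', 'b', 'c']  duplicate
# Deduplicating by record identity on top of at-least-once gives an
# exactly-once result in this toy model (assumes records are unique).
print(list(dict.fromkeys(run(records, 1, False))))  # ['a', 'b', 'c']
```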
  • 41. Streaming Frameworks: helping you implement your solution
  • 42. Tyler Akidau: "... an execution engine designed for unbounded data sets, and nothing more". T. Akidau et al. (2015): The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing - http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
  • 43. Apache Spark
      • Open Source (2010) & Apache project (2013)
      • Unified Batch & Stream Processing
      • Wide distribution, especially in Hadoop environments
      • Batch: RDD as base, DataFrames and DataSets as optimization
      • Streaming: DStream & Structured Streaming
  • 44. Apache Spark Streaming
      • Microbatching
      • Similar, partly unified, programming model as with batch processing
      • State and window operations
      • Missing support for event time
  • 45. Apache Spark Structured Streaming
      • DataSets/DataFrames for stream processing
      • Data stream as an ever-growing table
      • Unified API
      • Limited support for event time operations
      Batch:
        val ds = sparkSession
          .read
          .json("someFile.json")
        ds.write
          .json("otherFile.json")
      Streaming:
        val ds = sparkSession
          .readStream
          .format("kafka")
          .option("...", "...")
          .load
        ds.writeStream
          .outputMode("complete")
          .format("console")
          .start()
  • 46. Apache Flink
      • Started as research project in 2010 (Stratosphere), Apache project since 2014
      • Low latency streaming and high throughput batch processing
      • Streaming first
      • Flexible state and window handling
      • Rich support for event time handling
  • 47. Apache Kafka Streams API
      • Only a library, no runtime environment
      • Requires Kafka cluster (>= 0.10)
      • Uses Kafka consumer technologies for
          • Ordering
          • Partitioning
          • Scaling
      • Source & sink: Kafka topics only
      • Kafka Connect for sources & sinks
  • 48. Current developments: the latest promises and features
  • 49. Queryable State (diagram: operation with state and a query interface)
      • Known as
          • Queryable state (Flink)
          • Interactive Queries (Kafka Streams)
      • Still low level
          • Data lifecycle
          • (De)Serialization
          • Partitioned state discovery
  • 50. Streaming SQL
      • Use SQL to query streaming data
      • Time-varying relations, i.e. [12:00, 12:00)
      • Query on multiple points in time
      • Standard ANSI SQL + some extensions
          • SELECT TABLE, SELECT STREAM
          • WINDOWS
          • TRIGGERS
      • Supported by
          • Flink
          • Kafka Streams (KSQL)
      • https://s.apache.org/streaming-sql-strata-nyc
      CREATE TABLE possible_fraud AS
        SELECT card_number, count(*)
        FROM authorization_attempts
        WINDOW TUMBLING (SIZE 5 SECONDS)
        GROUP BY card_number
        HAVING count(*) > 3;
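What the query on this slide computes can be spelled out procedurally. The sketch below is not KSQL and ignores late data and continuous updates; it assumes timestamps in whole seconds and made-up card numbers, and simply counts authorization attempts per card and 5-second tumbling window, keeping the groups with more than 3 attempts.

```python
from collections import Counter


def possible_fraud(attempts, window_seconds=5, threshold=3):
    """attempts: (timestamp_seconds, card_number) pairs.

    Returns {(window_start, card_number): count} for every card with more
    than `threshold` attempts inside one tumbling window.
    """
    counts = Counter()
    for timestamp, card_number in attempts:
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, card_number)] += 1
    return {group: n for group, n in counts.items() if n > threshold}


attempts = [(0, "1111"), (1, "1111"), (2, "1111"), (4, "1111"),
            (3, "2222"), (6, "1111"), (7, "1111")]
print(possible_fraud(attempts))  # {(0, '1111'): 4}
```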
  • 51. Recommendations, or at least some hints when choosing a framework
  • 52. Spark Streaming might be an option if
      • Spark is already used for batch processing
      • Hadoop, and therefore YARN, is used
      • A huge community is important
      • Scala is not a problem
      • Latency is not an important criterion *
      • Event time handling is not needed *
      * These change in Structured Streaming: event time support, reduced microbatching overhead
  • 53. Flink is good for
      • flexible event time processing
          • watermarks
          • triggers
          • accumulators
      • connectivity to the most important peripheral systems
      • low latency stream processing
      • excellent state handling
  • 54. And finally Kafka Streams, if
      • you want an easy deployment
      • you already have a scheduler/microservice platform
      • you need low latency and high throughput
      • you need event time support
      • you want a lightweight start in streaming
      • you already use Kafka
      • you are fine with making Kafka your central backbone
  • 55. Comparison (column order inferred from the preceding slides: Spark Streaming, Flink, Kafka Streams)
                            | Spark Streaming                    | Flink                                  | Kafka Streams
      Engine                | Microbatching                      | Native                                 | Native
      Programming model     | Declarative                        | Declarative                            | Declarative
      Guarantees            | Exactly-Once                       | Exactly-Once                           | Exactly-Once
      Event time handling   | No/Yes*                            | Yes                                    | Yes
      State storage         | Own                                | Pluggable                              | RocksDB + Topic
      Community & ecosystem | Big                                | Medium                                 | Big
      Deployment            | Cluster                            | Cluster                                | Library
      Monitoring            | UI, REST API, Dropwizard Metrics   | UI, Metrics (JMX, Ganglia), REST API   | Kafka Tools, Confluent Control Center, JMX
  • 56. A word on
      • Apache Beam: high-level API for different streaming runners, e.g. Google Cloud Dataflow, Flink and Spark Streaming
      • Google Cloud Dataflow: cloud streaming by Google
      • Apex: YARN-based with a static topology which can be changed at runtime
      • Flume: logfile shipping, especially into HDFS
      • Storm/Heron: streaming pioneer by Twitter, Heron as a successor with the same API
  • 57. Takeaways
      • Streaming is not easy
          • (Event) Time
          • State
          • Deployment
          • Correctness
      • Different concepts and implementations
      • Be aware of
          • Monitoring
          • "Overkill"
      • Ongoing research and development
  • 58. Contact Info. Stay connected!
      • codecentric AG, Hochstraße 11, 42697 Solingen, Germany
      • E-Mail: matthias.niehoff@codecentric.de
      • Twitter: @matthiasniehoff
      • www.codecentric.de
      • Our mission – to promote agile development, innovation and technology – extends through everything we do.