Streaming Engines for Big Data
Spark Streaming: a case study
Stavros Kontopoulos
Senior Software Engineer @ Lightbend, M.Sc.
21st October 2016, Thessaloniki
#VoxxedDaysThessaloniki
Who Am I?
Fast Data Team Engineer @ Lightbend
OSS contributor (Apache Spark on Mesos)
https://github.com/skonto
● A bit of history...
● Streaming Engines for Big Data
○ Key concepts - Design Considerations
○ Modern analysis of infinite streams
○ Streaming Engines Examples
○ Which one to use?
● Spark Streaming: A Case Study
○ DStream API
○ Structured Streaming
Who likes history?
Why Streaming?
Big Data - The story
● A decade ago, people started looking at the problem of how to process
massive data sets (Velocity, Variety, Volume).
● The Apache Hadoop project appeared at that time and became the golden
solution for batch processing on commodity hardware. It later grew into
an ecosystem of several other projects: Pig, Hive, HBase etc.
Timeline (2003 to present):
● 2003: GFS paper
● 2004: MapReduce paper
● 2006: Hadoop project, 0.1.0 release
● 2009: Hadoop sorts 1 petabyte
● 2010: HBase, Pig, Hive graduate
● 2013: Spark on YARN by Cloudera, YARN in production
● 2014: Hadoop 2.4, 2.5, 2.6 releases
● 2015: Hadoop 2.7 release
Big Data - The story
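[Figure: MAP-REDUCE data flow. Inputs X, Y, Z go through MAP tasks, a SHUFFLE groups the intermediate keys (A, B, A), and REDUCE tasks produce the outputs Q and W.]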
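The MAP, SHUFFLE and REDUCE steps named on this slide can be sketched in a few lines of plain Python. This is a single-process illustration, not Hadoop code: the word-count task and the function names `map_phase`, `shuffle` and `reduce_phase` are assumptions chosen for the example.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key (done over the network in a cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)

def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["a b a", "b c"])
print(counts)
```

In a real cluster each phase runs in parallel on many machines; only the shuffle moves data between them.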
Big Data - The story
Hadoop pros/cons
● Batch jobs usually take hours, if not days, to complete. In many applications
that is no longer acceptable.
● Traditionally the focus is on throughput rather than latency. Frameworks like
Hadoop were designed with that in mind.
● Accuracy is the best you can get, since all the data is available when the job runs.
Big Data - The story
● Giuseppe DeCandia et al., "Dynamo: Amazon's highly available key-value
store", changed the database world in 2007.
● NoSQL databases, along with general systems like Hadoop, solve problems that
cannot be solved with traditional RDBMSs.
● Technology facts: cheap memory, SSDs, HDDs are the new tape, more CPUs
over more powerful CPUs.
Big Data - The story
● Disruptive companies need to use ML and the latest information to reach
smart decisions sooner.
● And so we need streaming in the enterprise. We no longer talk about Big
Data only: it's Fast Data first.
Use cases: searching, recommendations, real-time financial activities, fraud detection.
Big Data - The story
OpsClarity Report Summary:
● 92% plan to increase their investment in stream processing applications in the
next year
● 79% plan to reduce or eliminate investment in batch processing
● 32% use real time analysis to power core customer-facing applications
● 44% agreed that it is tedious to correlate issues across the pipeline
● 68% identified lack of experience and underlying complexity of new data
frameworks as their barrier to adoption
http://info.opsclarity.com/2016-fast-data-streaming-applications-report.html
Key Concepts
Streams
● A stream is a flow of data. The flow consists of ephemeral data elements
flowing from a source to a sink.
● Streams become useful when a set of operations/transformations is applied
to them.
● A stream can be infinite or finite in size. This translates to the notions of
bounded/unbounded data.
Stream Processing
Stream processing: processing done on an (un)bounded data stream. Not all of the
data is available when processing starts.
[Figure: Source -> Processing -> Sink]
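The source, processing and sink roles can be illustrated with Python generators, which also show why an unbounded stream has to be processed element by element. This is a toy sketch: the integer source, the squaring step and the `limit` parameter (which only exists so the demo terminates) are assumptions, not part of any engine's API.

```python
import itertools

def source():
    """Unbounded source: yields integers forever (stand-in for incoming events)."""
    n = 0
    while True:
        yield n
        n += 1

def processing(stream):
    """Processing: a per-element transformation applied as elements flow by."""
    for element in stream:
        yield element * element

def sink(stream, limit):
    """Sink: consumes elements; `limit` stops the demo after a few of them."""
    return list(itertools.islice(stream, limit))

result = sink(processing(source()), limit=5)
print(result)
```

Nothing is computed until the sink pulls elements, and at no point does the whole stream have to exist in memory.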
Stream Processing
Multiple streams
[Figure: Source 1 and Source 2 -> Processing -> Sink]
Stream Processing
Processing can be…
● Stream management: connect, iterate...
● Data manipulation: map, flatMap…
● Input/Output
A graph is the abstraction for defining how all the pieces are put together and how
data flows between them. Some systems use a DAG.
[Figure: example processing graph with the nodes Map, Reduce, Count, Distinct, DFS, DB and DFS]
Stream Processing - Parallelism
[Figure: Source -> partitioner -> parallel map tasks -> Sink]
Stream Processing - Execution Model
Map your graph to an execution plan and run it.
Execution Model Abstractions: Job, Task etc.
Actors: JobManager, TaskManager.
Where do the TaskManager and the tasks run? Threads, nodes etc.
Important: code runs close to the data. The task code, along with any
dependencies, is serialized and sent over the network, and the results are
communicated back to the application.
Stream vs Batch Processing
Batch processing is processing done on a finite data set with all data available.
There are two types of engines, batch and streaming, and both can actually be used
for both types of processing!
Streaming Applications
User code that materializes streams and applies stream processing.
...
...
Streaming Engines for Big Data
Streaming engines allow you to build streaming applications:
[Figure: streaming engine + API -> Streaming App]
Streaming engines for Big Data provide in addition:
● A rich ecosystem built around them, for example connectors for common
sources, outputs to different sinks etc.
● Fault tolerance, scalability (cluster management support), management of
stragglers
● ML, Graph and CEP processing capabilities
Streaming Engines for Big Data
A big data system at minimum needs:
● A data processing framework, e.g. a streaming engine.
● A Distributed File System.
Designing A Streaming Engine
Design Considerations of A Streaming Engine
● Strong consistency. If a machine fails, how are my results
affected?
○ Exactly once processing.
○ Checkpointing
● Appropriate semantics for integrating time. Late data?
● API (Language Support, DAG, SQL Support etc)
Design Considerations of A Streaming Engine
● Execution Model - integration with cluster manager(s)
● Elasticity - Dynamic allocation
● Performance: Throughput vs Latency
● Libraries for CEP, Graph, ML, SQL based processing
Design Considerations of A Streaming Engine
● Deployment modes: local vs cluster mode
● Streaming vs batch mode: does the code look the same?
● Logging
● Local state management
● Support for session state
Design Considerations of A Streaming Engine
● Backpressure
● Off Heap Management
● Caching
● Security
● UI
● CLI env for interactive sessions
State of the Art Stream Analysis
Analyzing Infinite Data Streams
● Recent advances in streaming are a result of pioneering work:
○ MillWheel: Fault-Tolerant Stream Processing at Internet Scale, VLDB
2013.
○ The Dataflow Model: A Practical Approach to Balancing Correctness,
Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data
Processing, Proceedings of the VLDB Endowment, vol. 8 (2015), pp.
1792-1803
Analyzing Infinite Data Streams
● Two cases for processing:
○ Single event processing: event transformation, trigger an alarm on an error event
○ Event aggregations: summary statistics, group-by, join and similar queries. For example
compute the average temperature for the last 5 minutes from a sensor data stream.
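The two cases can be contrasted in a short plain-Python sketch. The `(timestamp_seconds, temperature)` event layout, the threshold and the function names are assumptions made for illustration.

```python
def alarms(events, threshold):
    """Single-event processing: each event is judged on its own."""
    return [(ts, temp) for ts, temp in events if temp > threshold]

def average_last(events, now, span_seconds=300):
    """Event aggregation: average over events with now - span < ts <= now."""
    recent = [temp for ts, temp in events if now - span_seconds < ts <= now]
    return sum(recent) / len(recent) if recent else None

# Hypothetical sensor readings: (timestamp in seconds, temperature)
readings = [(0, 20.0), (100, 22.0), (250, 30.0), (400, 24.0)]

print(alarms(readings, threshold=25.0))   # needs no history
print(average_last(readings, now=400))    # needs the last 5 minutes of events
```

The alarm needs no state at all, while the average forces the engine to keep (or summarize) the events of the last five minutes. That is what makes aggregations the harder case.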
Analyzing Infinite Data Streams
● Event aggregation introduces the concept of windowing with respect to the
notion of time selected:
○ Event time (the time that events happen): Important for most use cases where context and
correctness matter at the same time. Example: billing applications, anomaly detection.
○ Processing time (the time they are observed during processing): Use cases where I only care
about what I process in a window. Example: accumulated clicks on a page per second.
○ System Arrival or Ingestion time (the time that events arrived at the streaming system).
● Ideally event time = Processing time. Reality is: there is skew.
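A small sketch of why the choice of time matters: the same events are counted per 10-second window, once by event time and once by processing time. The event values are made up, and the field and function names are assumptions for the example.

```python
from collections import Counter

# Each event carries the time it happened and the time the engine observed it.
# The third event is delayed by 8 seconds.
events = [
    {"event_time": 1, "processing_time": 2},
    {"event_time": 4, "processing_time": 5},
    {"event_time": 9, "processing_time": 17},
    {"event_time": 12, "processing_time": 13},
]

def counts_per_window(events, time_field, size=10):
    """Count events per window of `size` seconds, keyed by window start."""
    counts = Counter((e[time_field] // size) * size for e in events)
    return dict(sorted(counts.items()))

def skew(event):
    """Skew: how long after it happened the event was observed."""
    return event["processing_time"] - event["event_time"]

by_event_time = counts_per_window(events, "event_time")
by_processing_time = counts_per_window(events, "processing_time")
print(by_event_time, by_processing_time, [skew(e) for e in events])
```

The delayed event lands in the first window under event time but in the second under processing time, so the two notions of time give different answers for the same data.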
Time in Modern Data Stream Analysis
Windows come in different flavors:
● Tumbling windows: discretize a stream into non-overlapping windows.
○ E.g. report all distinct users every 10 seconds.
● Sliding windows: slide over the stream of data, so consecutive windows overlap.
○ E.g. report all distinct users for the last 10 minutes every 1 minute.
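Window assignment for both flavors can be sketched as follows. Windows are half-open intervals `[start, end)` aligned to multiples of the window size or slide; that alignment, the integer timestamps and the function names are assumptions of this sketch, not the API of a particular engine.

```python
from collections import defaultdict

def tumbling_window(ts, size):
    """Return the single [start, end) window of length `size` that contains ts."""
    start = (ts // size) * size
    return (start, start + size)

def sliding_windows(ts, size, slide):
    """Return every [start, end) window of length `size`, starting at a
    multiple of `slide`, that contains ts."""
    windows = []
    start = (ts // slide) * slide      # latest window start that is <= ts
    while start > ts - size:           # window still covers ts
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)

def distinct_users_tumbling(events, size):
    """events: iterable of (timestamp, user). Returns {window: set of users}."""
    result = defaultdict(set)
    for ts, user in events:
        result[tumbling_window(ts, size)].add(user)
    return dict(result)

clicks = [(1, "ann"), (3, "bob"), (7, "ann"), (12, "cat")]
print(distinct_users_tumbling(clicks, size=10))
print(sliding_windows(25, size=10, slide=5))
```

With a tumbling window every event belongs to exactly one window. With a 10-minute window sliding every minute, each event belongs to ten windows, which is why sliding windows cost more state.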
Analyzing Infinite Data Streams
● Watermarks: a watermark indicates that no elements with a timestamp older than or
equal to the watermark timestamp should arrive for the specific window of data.
○ Allows us to mark late data. Late data can either be added to the window or discarded.
● Triggers: decide when the window is evaluated or purged.
○ Allows complex logic for window processing
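A minimal sketch of marking late data with a watermark. It assumes one simple heuristic, namely that the watermark trails the largest event timestamp seen so far by a fixed `max_delay`; real engines offer other strategies, and the function name is hypothetical.

```python
def split_late(timestamps, max_delay):
    """Process event timestamps in arrival order; return (on_time, late).

    Assumption: the watermark trails the largest timestamp seen so far by
    `max_delay`. An event whose timestamp is older than or equal to the
    current watermark is marked late.
    """
    watermark = float("-inf")
    on_time, late = [], []
    for ts in timestamps:
        if ts <= watermark:
            late.append(ts)
        else:
            on_time.append(ts)
        watermark = max(watermark, ts - max_delay)
    return on_time, late

on_time, late = split_late([1, 2, 10, 3, 8, 11], max_delay=5)
print(on_time, late)
```

Here the event with timestamp 3 arrives after the watermark has advanced to 5 and is marked late, while 8 is out of order but still ahead of the watermark. What happens to late events (add them to the window or discard them) is a separate policy decision.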
Analyzing Infinite Data Streams
● Apache Beam is the open source successor of Google’s DataFlow
● It is becoming the standard API for streaming. It provides the advanced
semantics that current streaming applications need.
Streaming Engines for Big Data
OSS:
● Apache Flink
● Apache Spark Streaming
● Apache Storm
● Apache Samza
● Apache Apex
● Apache Kafka Streams (Confluent Platform)
● Akka Streams/Gearpump
● Apache Beam
Cloud:
● Amazon Kinesis
● Google Dataflow
Streaming Engines for Big Data - Pick one
Many criteria: use case at hand, existing infrastructure, performance, customer
support, cloud vendor, features
We recommend looking first at:
● Apache Flink for low latency and advanced semantics
● Apache Spark for its maturity and rich set of functionality: ML, SQL, GraphX
● Apache Kafka Streams for simple data transformations from and back to
Kafka topics
Apache Spark 2.0
Spark in a Nutshell
Apache Spark: a memory-optimized distributed computing framework.
It supports caching data in memory to speed up computations.
Spark in a Nutshell - RDDs
Spark represents a bounded dataset as an RDD (Resilient Distributed Dataset).
An RDD can be seen as an immutable distributed collection.
Two types of operations can be applied on an RDD: transformations like map
and actions like collect.
Transformations are lazy, while actions trigger computation on the cluster.
Operations like groupBy cause a shuffle of data across the network.
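The difference between lazy transformations and actions can be illustrated with a toy class in plain Python. `LazyDataset` is a hypothetical stand-in written for this example. It is not Spark's RDD and runs in a single process, but it follows the same idea: `map` only records what to do, `collect` does the work.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them on an action."""

    def __init__(self, data, ops=()):
        self._data = tuple(data)   # immutable source data
        self._ops = tuple(ops)     # pending transformations (the lineage)

    def map(self, fn):
        """Transformation: returns a new dataset and computes nothing."""
        return LazyDataset(self._data, self._ops + (fn,))

    def collect(self):
        """Action: applies every recorded transformation and returns the result."""
        out = list(self._data)
        for fn in self._ops:
            out = [fn(x) for x in out]
        return out

calls = []

def to_int(s):
    calls.append(s)   # record that the function actually ran
    return int(s)

numbers = LazyDataset(["1", "7", "3"]).map(to_int)
ran_before_action = len(calls)   # map alone computed nothing
result = numbers.collect()       # the action triggers the computation
print(ran_before_action, result)
```

Because a transformation returns a new dataset and keeps the list of pending operations, the original data is never modified and a lost result can be recomputed from its lineage.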
Spark in a Nutshell - Deployment Mode
Spark in a Nutshell - Basic Components
Spark Batch Sample
Word Count
https://github.com/skonto/talks/tree/master/voxxed-days-thess-2016
Spark in a nutshell - Key Features
Dynamic Allocation
Memory management (Project Tungsten + off heap operations)
Cluster managers: Yarn, StandAlone, Mesos
Scala, Python, Java, R
Micro-batch engine
SQL API, ML library, GraphX
Monitoring UI
Spark Streaming
Two flavors of Streaming:
● DStream API Spark 1.X -> mature API
● Structured Streaming (Alpha), Spark 2.0 -> Don’t go to production yet
“Based on Spark SQL. User does not need to
reason about streaming end to end”
Spark Streaming DStream API
Discretizes the stream into batches based on batchDuration (the batch interval),
which is configured once.
Provides exactly-once semantics with Kafka Direct for DStreams, or with the WAL
enabled for reliable receivers/drivers, plus checkpointing for driver context
recovery.
Many of the transformations and actions available on an RDD are available on a
DStream as well.
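Discretization into micro-batches can be sketched in plain Python. This is not the DStream API: the `(timestamp, value)` event layout, integer timestamps and the names `discretize` and `report_max` are assumptions. The per-batch step mirrors the `rdd.map(data => data.toInt).max()` fragment shown on the DStream example slide.

```python
def discretize(events, batch_duration):
    """Group (timestamp, value) events into consecutive batches.

    Returns a list of (batch_start, [values]) in time order. Intervals with
    no events between the first and last batch are kept as empty batches.
    Timestamps and batch_duration are assumed to be integers.
    """
    if not events:
        return []
    batches = {}
    for ts, value in events:
        start = (ts // batch_duration) * batch_duration
        batches.setdefault(start, []).append(value)
    first, last = min(batches), max(batches)
    return [(s, batches.get(s, []))
            for s in range(first, last + batch_duration, batch_duration)]

def report_max(batch):
    """Per-batch computation: parse the strings and take the maximum."""
    return max(int(v) for v in batch) if batch else None

stream = [(0, "3"), (1, "9"), (5, "7")]
micro_batches = discretize(stream, batch_duration=2)
maxima = [report_max(values) for _, values in micro_batches]
print(micro_batches, maxima)
```

Each batch is then processed like a small bounded dataset, which is why RDD-style operations carry over. It also means the latency of a result cannot be lower than the batch interval.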
Spark Structured Streaming
● Integrates with DF and Dataset API (Spark SQL) for structured queries
● Allows for end-to-end exactly once for specific sources/sinks (HDFS/S3)
○ Requires replayable sources and idempotent sinks
● Input is sent to a query and output of the query is written to a sink.
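Why replayable sources and idempotent sinks together give end-to-end exactly-once results can be shown with a small simulation. The classes and the crash model below are hypothetical and exist only for this sketch. The "crash" happens after a record has been written but before its offset has been committed.

```python
class ReplayableSource:
    """Source that can re-read records from any offset (like a log)."""

    def __init__(self, records):
        self._records = list(records)

    def read_from(self, offset):
        return list(enumerate(self._records))[offset:]

class IdempotentSink:
    """Sink keyed by offset: writing the same record twice changes nothing."""

    def __init__(self):
        self.rows = {}

    def write(self, offset, value):
        self.rows[offset] = value

def run(source, sink, committed, crash_at=None):
    """Write records starting at offset `committed`; return the new committed offset.

    If crash_at is set, the run stops right after writing that offset but
    before committing it, so a restart replays that record.
    """
    for offset, value in source.read_from(committed):
        sink.write(offset, value.upper())
        if offset == crash_at:
            return committed       # crash: the write happened, the commit did not
        committed = offset + 1     # commit progress
    return committed

source = ReplayableSource(["a", "b", "c", "d"])
sink = IdempotentSink()
committed_after_crash = run(source, sink, committed=0, crash_at=2)
committed = run(source, sink, committed_after_crash)   # restart replays offset 2
print(committed_after_crash, committed, sink.rows)
```

The record at offset 2 is processed twice, yet the sink ends up with exactly one row per record. With a sink that simply appended, the replay would have produced a duplicate.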
Two types of output implemented:
● Complete Mode - The entire updated Result Table will be written to the external storage. It is up to the storage connector to
decide how to handle writing of the entire table.
● Append Mode - Only the new rows appended in the Result Table since the last trigger will be written to the external storage.
This is applicable only on the queries where existing rows in the Result Table are not expected to change.
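The difference between the two output modes can be reduced to what is handed to the sink on each trigger. This is a conceptual sketch with a Python list standing in for the Result Table. The function names are assumptions and not Spark API.

```python
def complete_output(result_table):
    """Complete mode: hand the whole result table to the sink on every trigger."""
    return list(result_table)

def append_output(result_table, rows_already_written):
    """Append mode: hand over only the rows appended since the last trigger."""
    return list(result_table[rows_already_written:])

# Result table after two triggers; rows are only ever appended, never changed.
after_trigger_1 = ["row1", "row2"]
after_trigger_2 = ["row1", "row2", "row3"]

complete_1 = complete_output(after_trigger_1)
complete_2 = complete_output(after_trigger_2)
append_1 = append_output(after_trigger_1, rows_already_written=0)
append_2 = append_output(after_trigger_2, rows_already_written=len(after_trigger_1))
print(complete_2, append_2)
```

Append mode only works when earlier rows never change. An aggregation that updates existing rows would need the whole table rewritten, which is what Complete mode does.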
Spark Structured Streaming - Not Yet Implemented
● More Sources/Sinks
● Watermarks
● Late data management
● State Sessions
DStream API Example
reportMax
rdd.map(data => data.toInt).max()
https://github.com/skonto/talks/tree/master/voxxed-days-thess-2016
DStream API Example
reportMax
rdd.map(data => data.toInt).max()
CheckPointing: get or create the streaming context.
All streaming code goes here.
https://github.com/skonto/talks/tree/master/voxxed-days-thess-2016
Spark SQL - Batch
https://github.com/skonto/talks/tree/master/voxxed-days-thess-2016
Structured Streaming
● Session creation is the same as in the batch case
● readStream instead of read
● writeStream instead of write
● The code that computes the mean is the same as in batch
https://github.com/skonto/talks/tree/master/voxxed-days-thess-2016
Thank You!
Questions?

Contenu connexe

Tendances

A primer on building real time data-driven products
A primer on building real time data-driven productsA primer on building real time data-driven products
A primer on building real time data-driven productsLars Albertsson
 
An introduction to the WSO2 Analytics Platform
An introduction to the WSO2 Analytics Platform   An introduction to the WSO2 Analytics Platform
An introduction to the WSO2 Analytics Platform Sriskandarajah Suhothayan
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and AnalyticsSrinath Perera
 
Improve your SQL workload with observability
Improve your SQL workload with observabilityImprove your SQL workload with observability
Improve your SQL workload with observabilityOVHcloud
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learningRajesh Muppalla
 
How Spark is Enabling the New Wave of Converged Applications
How Spark is Enabling  the New Wave of Converged ApplicationsHow Spark is Enabling  the New Wave of Converged Applications
How Spark is Enabling the New Wave of Converged ApplicationsMapR Technologies
 
Monitoring in Big Data Frameworks @ Big Data Meetup, Timisoara, 2015
Monitoring in Big Data Frameworks @ Big Data Meetup, Timisoara, 2015Monitoring in Big Data Frameworks @ Big Data Meetup, Timisoara, 2015
Monitoring in Big Data Frameworks @ Big Data Meetup, Timisoara, 2015Institute e-Austria Timisoara
 
Parallel Sequence Generator
Parallel Sequence GeneratorParallel Sequence Generator
Parallel Sequence GeneratorRim Moussa
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroDaniel Marcous
 
The Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceThe Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceRevolution Analytics
 
ironSource Atom BigData Berlin
ironSource Atom BigData BerlinironSource Atom BigData Berlin
ironSource Atom BigData BerlinShimon Tolts
 
Protecting privacy in practice
Protecting privacy in practiceProtecting privacy in practice
Protecting privacy in practiceLars Albertsson
 
10 ways to stumble with big data
10 ways to stumble with big data10 ways to stumble with big data
10 ways to stumble with big dataLars Albertsson
 
The State of Postgres | Strata San Jose 2018 | Umur Cubukcu
The State of Postgres | Strata San Jose 2018 | Umur CubukcuThe State of Postgres | Strata San Jose 2018 | Umur Cubukcu
The State of Postgres | Strata San Jose 2018 | Umur CubukcuCitus Data
 
Data Analytics at Altocloud
Data Analytics at Altocloud Data Analytics at Altocloud
Data Analytics at Altocloud Altocloud
 
Open core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageOpen core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageJulien Le Dem
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartMukesh Singh
 
Data pipelines observability: OpenLineage & Marquez
Data pipelines observability:  OpenLineage & MarquezData pipelines observability:  OpenLineage & Marquez
Data pipelines observability: OpenLineage & MarquezJulien Le Dem
 

Tendances (20)

A primer on building real time data-driven products
A primer on building real time data-driven productsA primer on building real time data-driven products
A primer on building real time data-driven products
 
An introduction to the WSO2 Analytics Platform
An introduction to the WSO2 Analytics Platform   An introduction to the WSO2 Analytics Platform
An introduction to the WSO2 Analytics Platform
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and Analytics
 
Improve your SQL workload with observability
Improve your SQL workload with observabilityImprove your SQL workload with observability
Improve your SQL workload with observability
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
How Spark is Enabling the New Wave of Converged Applications
How Spark is Enabling  the New Wave of Converged ApplicationsHow Spark is Enabling  the New Wave of Converged Applications
How Spark is Enabling the New Wave of Converged Applications
 
Monitoring in Big Data Frameworks @ Big Data Meetup, Timisoara, 2015
Monitoring in Big Data Frameworks @ Big Data Meetup, Timisoara, 2015Monitoring in Big Data Frameworks @ Big Data Meetup, Timisoara, 2015
Monitoring in Big Data Frameworks @ Big Data Meetup, Timisoara, 2015
 
Parallel Sequence Generator
Parallel Sequence GeneratorParallel Sequence Generator
Parallel Sequence Generator
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to hero
 
The Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceThe Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data Science
 
Asd 2015
Asd 2015Asd 2015
Asd 2015
 
ironSource Atom BigData Berlin
ironSource Atom BigData BerlinironSource Atom BigData Berlin
ironSource Atom BigData Berlin
 
Protecting privacy in practice
Protecting privacy in practiceProtecting privacy in practice
Protecting privacy in practice
 
10 ways to stumble with big data
10 ways to stumble with big data10 ways to stumble with big data
10 ways to stumble with big data
 
The State of Postgres | Strata San Jose 2018 | Umur Cubukcu
The State of Postgres | Strata San Jose 2018 | Umur CubukcuThe State of Postgres | Strata San Jose 2018 | Umur Cubukcu
The State of Postgres | Strata San Jose 2018 | Umur Cubukcu
 
parallel OLAP
parallel OLAPparallel OLAP
parallel OLAP
 
Data Analytics at Altocloud
Data Analytics at Altocloud Data Analytics at Altocloud
Data Analytics at Altocloud
 
Open core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageOpen core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineage
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @Lendingkart
 
Data pipelines observability: OpenLineage & Marquez
Data pipelines observability:  OpenLineage & MarquezData pipelines observability:  OpenLineage & Marquez
Data pipelines observability: OpenLineage & Marquez
 

En vedette

Apache Spark Use case for Education Industry
Apache Spark Use case for Education IndustryApache Spark Use case for Education Industry
Apache Spark Use case for Education IndustryVinayak Agrawal
 
Cancer Outlier Pro file Analysis using Apache Spark
Cancer Outlier Profile Analysis using Apache SparkCancer Outlier Profile Analysis using Apache Spark
Cancer Outlier Pro file Analysis using Apache SparkMahmoud Parsian
 
How Totango uses Apache Spark
How Totango uses Apache SparkHow Totango uses Apache Spark
How Totango uses Apache SparkOren Raboy
 
Getting Apache Spark Customers to Production
Getting Apache Spark Customers to ProductionGetting Apache Spark Customers to Production
Getting Apache Spark Customers to ProductionCloudera, Inc.
 
Kodu Game Lab e Project Spark
Kodu Game Lab e Project SparkKodu Game Lab e Project Spark
Kodu Game Lab e Project SparkFabrício Catae
 
Fighting Fraud with Apache Spark
Fighting Fraud with Apache SparkFighting Fraud with Apache Spark
Fighting Fraud with Apache SparkMiklos Christine
 
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015Modern Data Stack France
 
Not Your Father's Database: How to Use Apache Spark Properly in Your Big Data...
Not Your Father's Database: How to Use Apache Spark Properly in Your Big Data...Not Your Father's Database: How to Use Apache Spark Properly in Your Big Data...
Not Your Father's Database: How to Use Apache Spark Properly in Your Big Data...Databricks
 
Building a Turbo-fast Data Warehousing Platform with Databricks
Building a Turbo-fast Data Warehousing Platform with DatabricksBuilding a Turbo-fast Data Warehousing Platform with Databricks
Building a Turbo-fast Data Warehousing Platform with DatabricksDatabricks
 
Lambda Architectures in Practice
Lambda Architectures in PracticeLambda Architectures in Practice
Lambda Architectures in PracticeC4Media
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraJoe Stein
 
Real Time BOM Explosions with Apache Solr and Spark
Real Time BOM Explosions with Apache Solr and SparkReal Time BOM Explosions with Apache Solr and Spark
Real Time BOM Explosions with Apache Solr and SparkQAware GmbH
 
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and DruidPulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and DruidTony Ng
 
Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012
Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012
Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012Treasure Data, Inc.
 
Spark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit
 
Kafka for data scientists
Kafka for data scientistsKafka for data scientists
Kafka for data scientistsJenn Rawlins
 
Wrangling Big Data in a Small Tech Ecosystem
Wrangling Big Data in a Small Tech EcosystemWrangling Big Data in a Small Tech Ecosystem
Wrangling Big Data in a Small Tech EcosystemShalin Hai-Jew
 
Streaming datasets for personalization
Streaming datasets for personalizationStreaming datasets for personalization
Streaming datasets for personalizationShriya Arora
 
Kafka Streams: The Stream Processing Engine of Apache Kafka
Kafka Streams: The Stream Processing Engine of Apache KafkaKafka Streams: The Stream Processing Engine of Apache Kafka
Kafka Streams: The Stream Processing Engine of Apache KafkaEno Thereska
 
Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016Ram Sriharsha
 

En vedette (20)

Apache Spark Use case for Education Industry
Apache Spark Use case for Education IndustryApache Spark Use case for Education Industry
Apache Spark Use case for Education Industry
 
Cancer Outlier Pro file Analysis using Apache Spark
Cancer Outlier Profile Analysis using Apache SparkCancer Outlier Profile Analysis using Apache Spark
Cancer Outlier Pro file Analysis using Apache Spark
 
How Totango uses Apache Spark
How Totango uses Apache SparkHow Totango uses Apache Spark
How Totango uses Apache Spark
 
Getting Apache Spark Customers to Production
Getting Apache Spark Customers to ProductionGetting Apache Spark Customers to Production
Getting Apache Spark Customers to Production
 
Kodu Game Lab e Project Spark
Kodu Game Lab e Project SparkKodu Game Lab e Project Spark
Kodu Game Lab e Project Spark
 
Fighting Fraud with Apache Spark
Fighting Fraud with Apache SparkFighting Fraud with Apache Spark
Fighting Fraud with Apache Spark
 
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015
 
Not Your Father's Database: How to Use Apache Spark Properly in Your Big Data...
Not Your Father's Database: How to Use Apache Spark Properly in Your Big Data...Not Your Father's Database: How to Use Apache Spark Properly in Your Big Data...
Not Your Father's Database: How to Use Apache Spark Properly in Your Big Data...
 
Building a Turbo-fast Data Warehousing Platform with Databricks
Building a Turbo-fast Data Warehousing Platform with DatabricksBuilding a Turbo-fast Data Warehousing Platform with Databricks
Building a Turbo-fast Data Warehousing Platform with Databricks
 
Lambda Architectures in Practice
Lambda Architectures in PracticeLambda Architectures in Practice
Lambda Architectures in Practice
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
 
Real Time BOM Explosions with Apache Solr and Spark
Real Time BOM Explosions with Apache Solr and SparkReal Time BOM Explosions with Apache Solr and Spark
Real Time BOM Explosions with Apache Solr and Spark
 
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and DruidPulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
 
Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012
Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012
Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012
 
Spark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos Erotocritou
 
Kafka for data scientists
Kafka for data scientistsKafka for data scientists
Kafka for data scientists
 
Wrangling Big Data in a Small Tech Ecosystem
Wrangling Big Data in a Small Tech EcosystemWrangling Big Data in a Small Tech Ecosystem
Wrangling Big Data in a Small Tech Ecosystem
 
Streaming datasets for personalization
Streaming datasets for personalizationStreaming datasets for personalization
Streaming datasets for personalization
 
Kafka Streams: The Stream Processing Engine of Apache Kafka
Kafka Streams: The Stream Processing Engine of Apache KafkaKafka Streams: The Stream Processing Engine of Apache Kafka
Kafka Streams: The Stream Processing Engine of Apache Kafka
 
Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016
 

Similaire à Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data

Streaming analytics state of the art
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data

  • 1. Streaming Engines for Big Data Spark Streaming: a case study Stavros Kontopoulos Senior Software Engineer @ Lightbend, M.Sc. 21st October 2016, Thessaloniki #VoxxedDaysThessaloniki
  • 2. Who Am I? Fast Data Team Engineer @ Lightbend. OSS contributor (Apache Spark on Mesos). https://github.com/skonto
  • 3. ● A bit of history... ● Streaming Engines for Big Data ○ Key concepts - Design Considerations ○ Modern analysis of infinite streams ○ Streaming Engines Examples ○ Which one to use? ● Spark Streaming: A Case Study ○ DStream API ○ Structured Streaming
  • 6. Big Data - The story ● About a decade ago people started looking at the problem of how to process massive data sets (Velocity, Variety, Volume). ● The Apache Hadoop project appeared at that time and became the golden solution for batch processing running on commodity hardware. It later became an ecosystem of several other projects: Pig, Hive, HBase, etc. [Timeline: 2003 GFS paper; 2004 MapReduce paper; 2006 Hadoop project, 0.1.0 release; 2009 Hadoop sorts 1 petabyte; 2010 HBase, Pig, Hive graduate; 2013 Spark on YARN by Cloudera, YARN in production; 2014 Hadoop 2.4, 2.5, 2.6 releases; 2015 Hadoop 2.7 release; present]
  • 7. Big Data - The story [Diagram: MapReduce data flow. Inputs X, Y, Z go through MAP tasks, a SHUFFLE step (intermediate keys A, B, A) and REDUCE tasks that produce outputs Q, W]
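The map, shuffle and reduce steps named on this slide can be sketched in a few lines of plain Python. This is only an illustrative, single-process word count, not Hadoop code; the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `map_reduce` and the sample records are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(record):
    """MAP: emit a (word, 1) pair for every word in one input record."""
    return [(word, 1) for word in record.split()]


def shuffle_phase(pairs):
    """SHUFFLE: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """REDUCE: combine all values of one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run the three phases in sequence on an in-memory list of records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


# Hypothetical input: three small text records.
counts = map_reduce(["a b a", "b c", "a"])
print(counts)
```

In a real cluster the map and reduce calls would run in parallel on different machines and the shuffle would move data over the network; here everything runs in one process to keep the data flow visible.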
  • 8. Big Data - The story Hadoop pros/cons ● Batch jobs usually take hours, if not days, to complete; in many applications that is no longer acceptable. ● Traditionally the focus is on throughput rather than latency. Frameworks like Hadoop were designed with that in mind. ● Accuracy is the best you can get.
  • 9. Big Data - The story ● Giuseppe DeCandia et al., "Dynamo: Amazon's highly available key-value store", changed the database world in 2007. ● NoSQL databases, along with general systems like Hadoop, solve problems that cannot be solved with traditional RDBMSs. ● Technology facts: cheap memory, SSDs, HDDs are the new tape, more CPUs over more powerful CPUs.
  • 10. Big Data - The story ● Disruptive companies need to utilize ML and the latest information to come up with smart decisions sooner. ● And so we need streaming in the enterprise… We no longer talk about Big Data only; it's Fast Data first. Use cases: searching, recommendations, real-time financial activities, fraud detection.
  • 11. Big Data - The story OpsClarity Report Summary: ● 92% plan to increase their investment in stream processing applications in the next year ● 79% plan to reduce or eliminate investment in batch processing ● 32% use real time analysis to power core customer-facing applications ● 44% agreed that it is tedious to correlate issues across the pipeline ● 68% identified lack of experience and underlying complexity of new data frameworks as their barrier to adoption http://info.opsclarity.com/2016-fast-data-streaming-applications-report.html
  • 13. Streams ● A stream is a flow of data. The flow consists of ephemeral data elements flowing from a source to a sink. ● Streams become useful when a set of operations/transformations is applied to them. ● A stream can be infinite or finite in size. This translates to the notions of bounded/unbounded data.
  • 14. Stream Processing Stream processing: processing done on an (un)bounded data stream. Not all data are available. [Diagram labels: Source, Processing, Sink]
  • 16. Stream Processing Processing can be… ● Stream management: connect, iterate... ● Data manipulation: map, flatmap… ● Input/Output. A graph is the abstraction for defining how all the pieces are put together and how data flows between them. Some systems use a DAG. [Diagram labels: Map, Reduce, Count, Distinct, DFS, DB, DFS]
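As a rough illustration of such a graph, the sketch below chains a source, a flatmap, a map and a sink using plain Python generators, so elements flow through the stages one at a time. The stage names (`source`, `flat_map_stream`, `map_stream`, `sink`) and the sample input are assumptions for this example and do not belong to any particular engine's API.

```python
def source(lines):
    """Source node: emit elements one by one."""
    for line in lines:
        yield line


def flat_map_stream(fn, stream):
    """flatmap node: each input element may produce zero or more outputs."""
    for element in stream:
        yield from fn(element)


def map_stream(fn, stream):
    """map node: each input element produces exactly one output."""
    for element in stream:
        yield fn(element)


def sink(stream):
    """Sink node: drain the stream and materialize the results."""
    return list(stream)


# Wire the graph: source -> flatmap (split into words) -> map (upper-case) -> sink.
words = flat_map_stream(str.split, source(["spark flink", "storm"]))
upper = map_stream(str.upper, words)
result = sink(upper)
print(result)
```

Nothing is computed until the sink pulls elements, which loosely mirrors how an engine first builds the graph and only then executes it.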
  • 17. Stream Processing - Parallelism [Diagram labels: Source, partitioner, map, map, Sink]
  • 18. Stream Processing - Execution Model Map your graph to an execution plan and run it. Execution model abstractions: Job, Task, etc. Actors: JobManager, TaskManager. Where do the TaskManager and the Tasks run? Threads, nodes, etc. Important: code runs close to the data. Serialize the task code along with any dependencies and send it over the network; communicate the results back to the application.
  • 19. Stream vs Batch Processing Batch processing is processing done on a finite data set with all data available. There are two types of engines, batch and streaming, and either can actually be used for both types of processing!
  • 20. Streaming Applications User code that materializes streams and applies stream processing.
  • 21. Streaming Engines for Big Data Streaming engines allow building streaming applications. Streaming engines for Big Data provide in addition: ● A rich ecosystem built around them, for example connectors for common sources and outputs to different sinks. ● Fault tolerance, scalability (cluster management support), management of stragglers. ● ML, Graph and CEP processing capabilities + API. [Diagram label: Streaming App]
  • 22. Streaming Engines for Big Data A big data system at minimum needs: ● A data processing framework, e.g. a streaming engine. ● A distributed file system.
  • 24. Design Considerations of A Streaming Engine ● Strong consistency. If a machine fails, how are my results affected? ○ Exactly-once processing ○ Checkpointing ● Appropriate semantics for integrating time. Late data? ● API (language support, DAG, SQL support, etc.)
  • 25. Design Considerations of A Streaming Engine ● Execution Model - integration with cluster manager(s) ● Elasticity - Dynamic allocation ● Performance: Throughput vs Latency ● Libraries for CEP, Graph, ML, SQL based processing
  • 26. Design Considerations of A Streaming Engine ● Deployment modes: local vs cluster mode ● Streaming vs Batch mode, Code looks the same? ● Logging ● Local state management ● Support for session state
  • 27. Design Considerations of A Streaming Engine ● Backpressure ● Off Heap Management ● Caching ● Security ● UI ● CLI env for interactive sessions
  • 28. State of the Art Stream Analysis
  • 29. Analyzing Infinite Data Streams ● Recent advances in streaming are a result of pioneering work: ○ MillWheel: Fault-Tolerant Stream Processing at Internet Scale, VLDB 2013. ○ The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing, Proceedings of the VLDB Endowment, vol. 8 (2015), pp. 1792-1803
  • 30. Analyzing Infinite Data Streams ● Two cases for processing: ○ Single event processing: event transformation, triggering an alarm on an error event. ○ Event aggregations: summary statistics, group-by, join and similar queries. For example, compute the average temperature for the last 5 minutes from a sensor data stream.
  • 31. Analyzing Infinite Data Streams ● Event aggregation introduces the concept of windowing with respect to the notion of time selected: ○ Event time (the time at which events happen): important for most use cases where context and correctness matter at the same time. Example: billing applications, anomaly detection. ○ Processing time (the time at which events are observed during processing): use cases where I only care about what I process in a window. Example: accumulated clicks on a page per second. ○ System arrival or ingestion time (the time at which events arrive at the streaming system). ● Ideally event time = processing time. In reality there is skew.
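The effect of the skew between event time and processing time can be shown with a small, self-contained Python sketch. The events, the 10-second window size and the helper `window_start` are made up for illustration; each event carries both the time it happened and the time it was processed.

```python
from collections import Counter

WINDOW = 10  # window size in seconds (assumed for this example)

# Hypothetical events as (event_time, processing_time) pairs, in seconds.
# The third event happened at t=8 but was only processed at t=13.
events = [(1, 2), (4, 5), (8, 13), (12, 14)]


def window_start(timestamp, size=WINDOW):
    """Start of the fixed-size window that contains the timestamp."""
    return timestamp - (timestamp % size)


# Count events per window under each notion of time.
by_event_time = Counter(window_start(e) for e, _ in events)
by_processing_time = Counter(window_start(p) for _, p in events)

print(dict(by_event_time))
print(dict(by_processing_time))
```

The delayed event is counted in the first window when grouping by event time and in the second window when grouping by processing time, which is why the choice matters for use cases such as billing.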
  • 32. Time in Modern Data Stream Analysis Windows come in different flavors: ● Tumbling windows discretize a stream into non-overlapping windows. ○ E.g. report all distinct users every 10 seconds. ● Sliding windows slide over the stream of data. ○ E.g. report all distinct users for the last 10 minutes every 1 minute.
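A minimal sketch of how an element's timestamp could be assigned to tumbling and sliding windows, in plain Python. The function names, the half-open `[start, end)` window convention and the sample events are assumptions for this example; real engines differ in details such as window alignment and offsets.

```python
from collections import defaultdict


def tumbling_window(timestamp, size):
    """Return the single [start, end) window that contains the timestamp."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


def sliding_windows(timestamp, size, slide):
    """Return every [start, end) window of length `size`, starting at
    multiples of `slide`, that contains the timestamp."""
    windows = []
    start = timestamp - (timestamp % slide)  # latest window start <= timestamp
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


# Distinct users per 10-second tumbling window (hypothetical events).
events = [(1, "ann"), (3, "bob"), (9, "ann"), (12, "cat")]
distinct_users = defaultdict(set)
for ts, user in events:
    distinct_users[tumbling_window(ts, 10)].add(user)

print(dict(distinct_users))
# With a 10-minute window sliding every minute, one element belongs to 10 windows.
print(len(sliding_windows(1000, size=600, slide=60)))
```

A tumbling window is the special case of a sliding window whose slide equals its size, so each element then falls into exactly one window.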
  • 33. Analyzing Infinite Data Streams
    ● Watermarks: a watermark indicates that no more elements with a timestamp older than or equal to the watermark timestamp should arrive for the specific window of data.
      ○ Allows us to mark late data. Late data can either be added to the window or discarded.
    ● Triggers: decide when the window is evaluated or purged.
      ○ Allow complex logic for window processing.
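To make the watermark idea concrete, here is a small sketch in plain Python. The policy used (the watermark trails the largest event time seen so far by a fixed `max_delay`) is one simple heuristic chosen for illustration, not the strategy of any particular engine; the function name and sample data are hypothetical.

```python
def split_late_events(events, max_delay):
    """Classify events as on-time or late using a simple heuristic watermark.

    The watermark trails the largest event time seen so far by `max_delay`
    seconds (an assumed policy; real engines offer several strategies).
    """
    watermark = float("-inf")
    on_time, late = [], []
    for ts, value in events:  # events are listed in arrival order
        if ts <= watermark:
            late.append((ts, value))  # timestamp older than or equal to the watermark
        else:
            on_time.append((ts, value))
        watermark = max(watermark, ts - max_delay)  # the watermark never moves backwards
    return on_time, late


arrivals = [(10, "a"), (12, "b"), (20, "c"), (11, "d"), (18, "e"), (15, "f")]
on_time, late = split_late_events(arrivals, max_delay=5)
```

What happens to the `late` list is an application decision: it can be merged into the window it belongs to (producing a corrected result) or dropped, as the slide notes.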
  • 34. Analyzing Infinite Data Streams
    ● Apache Beam is the open source successor of Google’s Dataflow.
    ● It is becoming the standard API for streaming. It provides the advanced semantics that current streaming applications need.
  • 35. Streaming Engines for Big Data
    OSS:
    ● Apache Flink
    ● Apache Spark Streaming
    ● Apache Storm
    ● Apache Samza
    ● Apache Apex
    ● Apache Kafka Streams (Confluent Platform)
    ● Akka Streams/Gearpump
    ● Apache Beam
    Cloud:
    ● Amazon Kinesis
    ● Google Dataflow
  • 36. Streaming Engines for Big Data - Pick one
    Many criteria: use case at hand, existing infrastructure, performance, customer support, cloud vendor, features.
    Recommended to look at first:
    ● Apache Flink for low latency and advanced semantics
    ● Apache Spark for its maturity and rich set of functionality: ML, SQL, GraphX
    ● Apache Kafka Streams for simple data transformations from and back to Kafka topics
  • 38. Spark in a Nutshell
    Apache Spark: a memory-optimized distributed computing framework. Supports caching of data in memory to speed up computations.
  • 39. Spark in a Nutshell - RDDs
    Spark represents a bounded dataset as an RDD (Resilient Distributed Dataset).
    An RDD can be seen as an immutable distributed collection.
    Two types of operations can be applied to an RDD: transformations like map and actions like collect.
    Transformations are lazy, while actions trigger computation on the cluster.
    Operations like groupBy cause a shuffle of data across the network.
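The lazy-transformation versus action distinction can be illustrated with a toy class in plain Python. `LazyDataset` is a hypothetical stand-in written for this sketch, not Spark's RDD API: it runs on a single machine and only mimics the idea that `map` records work while `collect` performs it.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them only on an action."""

    def __init__(self, source, transforms=()):
        self._source = source
        self._transforms = tuple(transforms)

    def map(self, fn):
        # Transformation: returns a new immutable dataset and computes nothing.
        return LazyDataset(self._source, self._transforms + (fn,))

    def collect(self):
        # Action: applies the recorded transformations and returns the results.
        out = []
        for item in self._source:
            for fn in self._transforms:
                item = fn(item)
            out.append(item)
        return out


calls = []


def parse(text):
    calls.append(text)  # side effect used only to observe when work happens
    return int(text)


dataset = LazyDataset(["1", "2", "3"]).map(parse).map(lambda n: n * 10)
calls_before_action = list(calls)  # still empty: nothing has run yet
collected = dataset.collect()      # the action triggers the computation
```

Deferring the work until an action is what lets an engine see the whole chain of transformations before deciding how to execute it.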
  • 40. Spark in a Nutshell - Deployment Mode
  • 41. Spark in a Nutshell - Basic Components
  • 42. Spark Batch Sample - Word Count https://github.com/skonto/talks/tree/master/voxxed-days-thess-2016
  • 43. Spark in a Nutshell - Key Features
    Dynamic allocation
    Memory management (Project Tungsten + off-heap operations)
    Cluster managers: YARN, Standalone, Mesos
    Languages: Scala, Python, Java, R
    Micro-batch engine
    SQL API, ML library, GraphX
    Monitoring UI
  • 44. Spark Streaming
    Two flavors of streaming:
    ● DStream API, Spark 1.x -> mature API
    ● Structured Streaming (alpha), Spark 2.0 -> don’t go to production yet
      “Based on Spark SQL. User does not need to reason about streaming end to end”
  • 45. Spark Streaming - DStream API
    Discretizes the stream based on batchDuration (the batch interval), which is configured once.
    Provides exactly-once semantics with the Kafka direct approach for DStreams, or with the WAL (write-ahead log) enabled for reliable receivers/drivers, plus checkpointing for driver context recovery.
    Many of the transformations and actions available on an RDD are available on a DStream as well.
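The discretization step can be sketched in plain Python as follows. This is an illustration of the micro-batch idea only, not Spark code: the function name `discretize`, the millisecond timestamps, and the sample records are assumptions, and empty intervals are skipped here for brevity. The per-batch maximum mirrors the `rdd.map(data => data.toInt).max()` call from the talk's DStream example.

```python
from itertools import groupby


def discretize(records, batch_duration_ms):
    """Yield (batch_start_ms, values) micro-batches from (arrival_ms, value) records.

    Records are assumed to be sorted by arrival time. Intervals with no
    records are skipped in this sketch.
    """
    def batch_index(record):
        return record[0] // batch_duration_ms

    for index, group in groupby(records, key=batch_index):
        yield index * batch_duration_ms, [value for _, value in group]


# Hypothetical input: string payloads arriving at the given millisecond offsets.
records = [(500, "3"), (1200, "17"), (1900, "8"), (2100, "4"), (5000, "42")]

# Per-batch maximum after parsing each payload to an integer.
batch_max = {
    start: max(int(value) for value in values)
    for start, values in discretize(records, batch_duration_ms=2000)
}
```

Because the batch interval is fixed up front, latency is bounded below by `batch_duration_ms`: a record is only processed once the batch it belongs to has closed.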
  • 46. Spark Structured Streaming
    ● Integrates with the DataFrame and Dataset APIs (Spark SQL) for structured queries
    ● Allows for end-to-end exactly-once for specific sources/sinks (HDFS/S3)
      ○ Requires replayable sources and idempotent sinks
    ● Input is sent to a query and the output of the query is written to a sink.
    Two types of output are implemented:
    ● Complete Mode - The entire updated Result Table is written to external storage. It is up to the storage connector to decide how to handle writing the entire table.
    ● Append Mode - Only the new rows appended to the Result Table since the last trigger are written to external storage. This applies only to queries where existing rows in the Result Table are not expected to change.
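The difference between the two output modes can be sketched in plain Python. This is a conceptual model under stated assumptions, not Spark's API: the function names, the word-count query used for Complete Mode, and the row-wise uppercase transform used for Append Mode are all hypothetical choices for illustration.

```python
from collections import Counter


def run_complete_mode(triggers):
    """After each trigger, emit the entire updated result table (running word counts)."""
    counts = Counter()
    outputs = []
    for batch in triggers:
        counts.update(batch)
        outputs.append(dict(counts))  # the whole table is written every time
    return outputs


def run_append_mode(triggers):
    """After each trigger, emit only the rows added since the previous trigger."""
    result_table = []
    outputs = []
    for batch in triggers:
        new_rows = [word.upper() for word in batch]  # row-wise transform, no aggregation
        result_table.extend(new_rows)  # existing rows never change
        outputs.append(new_rows)
    return outputs


triggers = [["spark", "flink"], ["spark"], ["beam", "spark"]]
complete_out = run_complete_mode(triggers)
append_out = run_append_mode(triggers)
```

The aggregation needs Complete Mode because an earlier row (the count for "spark") keeps changing, while the row-wise transform fits Append Mode because rows already emitted stay final.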
  • 47. Spark Structured Streaming - Not Yet Implemented
    ● More sources/sinks
    ● Watermarks
    ● Late data management
    ● State Sessions
  • 48. DStream API Example. Code callout: reportMax, rdd.map(data => data.toInt).max(). Source: https://github.com/skonto/talks/tree/master/voxxed-days-thess-2016
  • 49. DStream API Example. Code callouts: reportMax, rdd.map(data => data.toInt).max(); checkpointing: get or create the streaming context; all streaming code goes here. Source: https://github.com/skonto/talks/tree/master/voxxed-days-thess-2016
  • 50. Spark SQL - Batch https://github.com/skonto/talks/tree/master/voxxed-days-thess-2016
  • 51. Structured Streaming
    ● The mean code is the same as in batch
    ● readStream instead of read
    ● writeStream instead of write
    ● Session creation is the same as in the batch case
    https://github.com/skonto/talks/tree/master/voxxed-days-thess-2016
  • 53. References
    1. http://data-artisans.com/batch-is-a-special-case-of-streaming/
    2. http://www.slideshare.net/rolandkuhn/reactive-streams
    3. https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
    4. https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
    5. http://www.slideshare.net/FlinkForward/flink-case-study-capital-one
    6. http://flink.apache.org/poweredby.html
    7. https://en.wikipedia.org/wiki/Apache_Hadoop
    8. http://data-artisans.com/how-apache-flink-enables-new-streaming-applications-part-1/
    9. http://data-artisans.com/batch-is-a-special-case-of-streaming/
    10. https://databricks.com/blog/2015/01/15/improved-driver-fault-tolerance-and-zero-data-loss-in-spark-streaming.html
    11. Ellen Friedman & Kostas Tzoumas, Introduction to Apache Flink, O'Reilly, 2016
    12. http://spark.apache.org/docs/latest/sql-programming-guide.html
    13. https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html