Check out these resources:
Dean’s book
Webinars
etc.
Fast Data Architectures for Streaming Applications
Getting Answers Now from Data Sets that Never End
By Dean Wampler, Ph.D., VP of Fast Data Engineering
lightbend.com/products/fast-data-platform
Streaming Engines in Context…
Classic Batch Architecture: Hadoop
[Diagram: classic Hadoop batch architecture]
• Ingest: Flume (logs), Sqoop (DBs)
• Cluster: YARN Resource Manager and Node Managers
• Batch compute: MapReduce, Spark, …
• Persistence: S3, HDFS, disk, SQL/NoSQL, search
New Streaming, “Fast Data” Architecture
(but it also supports batch)
[Diagram: Fast Data architecture, with numbered callouts 1-11]
• Cluster management: Mesos, Kubernetes, YARN, …; cloud, on premise, …
• Data sources: logs, sockets, REST
• Kafka Cluster, with a ZooKeeper (ZK) Cluster
• Microservices: RP, Go, Node.js, …
• Low latency and mini-batch: Spark Streaming
• Batch: Spark, …
• Low latency: Flink, Kafka Streams, Akka Streams, Beam, …
• Persistence: S3, HDFS, disk, SQL/NoSQL, search
• Why Kafka?
  • Before: producers and consumers are connected directly: N * M links
  • After: producers and consumers connect through Kafka: N + M links
  [Diagram labels: Service 1, Service 2, Service 3, Services, Log & Other Files, Internet]
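To make the link counts above concrete, here is a minimal sketch in Scala. It assumes the worst case in which every producer talks to every consumer directly, compared with one link per service to a shared log such as Kafka. The names `LinkCount`, `directLinks`, and `brokeredLinks` are hypothetical and do not come from the slides.

```scala
// Hypothetical helper names; the slide only gives the N * M and N + M totals.
object LinkCount {
  /** Point-to-point: every producer connects to every consumer. */
  def directLinks(producers: Int, consumers: Int): Int = producers * consumers

  /** Brokered: each producer and each consumer keeps one link to the shared log. */
  def brokeredLinks(producers: Int, consumers: Int): Int = producers + consumers

  def main(args: Array[String]): Unit = {
    for ((n, m) <- Seq((3, 3), (10, 10), (50, 20))) {
      println(s"N=$n, M=$m: direct=${directLinks(n, m)}, via Kafka=${brokeredLinks(n, m)}")
    }
  }
}
```

The gap grows quickly: the direct count is a product, so adding one more consumer adds N links, while the brokered count grows by one.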
Streaming Engines
Features to Consider
• Low latency? How low?
• High volume? How high?
• Which kinds of data processing, analytics?
• Process data in bulk or individually?
  • Bulk processing of records?
  • Individual processing of events?
• Preferred application architecture?
• Low latency? How low?
  • Real real time? Picoseconds to microseconds
    www.spacex.com/news
  • < 100 microseconds?
    tradinghub.co/watch-list-for-mar-26th-2015/
    www.usa.philips.com/
  • < 10 milliseconds?
    money.cnn.com/2017/05/12/pf/credit-card-mistakes/index.html
  • < 100s of milliseconds?
    github.com/keen/dashboards
    coursera.org/learn/machine-learning
  • < 1 second to minutes
    [Diagrams: ETL (Logs, Kafka with Raw Logs Topic and Parsed Logs Topic, Kafka Streams Job); model training (Data storage, Model Training, Model Serving, Other Logic)]
  • > 1 minute?
    • Use short batch jobs
• High volume? How high?
  • < 10K - 100K per second?
    drdobbs.com/web-development/soa-web-services-and-restful-systems/199902676
  • > 1M per second?
    https://store.nest.com/product/thermostat/
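• Which kinds of data processing, analytics?
  • SQL?
SELECT COUNT(*)
FROM `my-iot-data`
GROUP BY `zip-code`

// The same query as a streaming job with the Spark DataFrame API
val input = spark.readStream
  .format("parquet")
  .load("my-iot-data")   // file sources also need .schema(...) when streaming
input.groupBy("zip-code")
  .count()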
  • “Dataflow”?
import org.apache.spark.SparkContext

val sc = new SparkContext("local[*]", "Inverted Idx")
sc.textFile("data/crawl")
  .map { line =>
    // Each input line is "path<TAB>text"
    val Array(path, text) = line.split("\t", 2); (path, text)
  } flatMap {
    // Emit one (word, path) pair per word
    case (path, text) => text.split("""\W+""").map((_, path))
  } map {
    case (w, p) => ((w, p), 1)
  } reduceByKey {
    // Sum the counts per (word, path) pair
    case (n1, n2) => n1 + n2
  }
// … (the slide cuts off the remaining `map` step here)
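The pipeline above is cut off on the slide, so its final steps are not known. As a rough illustration of where an inverted-index dataflow of this shape usually ends up, the sketch below uses plain Scala collections instead of Spark so that it runs without a cluster. The last grouping step (word to a list of `(path, count)` pairs), the lower-casing of words, and the `InvertedIndex.build` name are assumptions, not the author's code.

```scala
// A sketch with plain Scala collections (no Spark); names are hypothetical.
object InvertedIndex {
  /** Builds word -> Seq((path, count)) from lines shaped like "path<TAB>text". */
  def build(lines: Seq[String]): Map[String, Seq[(String, Int)]] =
    lines
      .map(_.split("\t", 2))
      .collect { case Array(path, text) => (path, text) } // skip malformed lines
      .flatMap { case (path, text) =>
        // One (word, path) pair per word occurrence
        text.split("""\W+""").toSeq.filter(_.nonEmpty).map(w => (w.toLowerCase, path))
      }
      .groupBy(identity)   // collect identical (word, path) pairs
      .toSeq
      .map { case ((word, path), hits) => (word, (path, hits.size)) }
      .groupBy(_._1)       // assumed final step: regroup by word
      .map { case (word, entries) => (word, entries.map(_._2).sortBy(_._1)) }

  def main(args: Array[String]): Unit = {
    val lines = Seq("a.txt\thello world hello", "b.txt\tworld")
    build(lines).toSeq.sortBy(_._1).foreach { case (w, hits) => println(s"$w -> $hits") }
  }
}
```

The `groupBy(identity)` plus `hits.size` pair plays the role that `map { ... => (..., 1) }` followed by `reduceByKey` plays in the Spark version.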
  • ETL?
    [Diagram: ETL with Logs, Kafka (Raw Logs Topic, Parsed Logs Topic), and a Kafka Streams Job]
  • Train and serve ML models?
    [Diagram: Data storage, Model Training, Model Serving, Other Logic]
• Process data in bulk or individually?
  • Individual events (i.e., CEP, complex event processing).
  • Records in bulk (i.e., each datum’s identity is unimportant).
  [Diagram, individual events: Microservices; Events, Router Actor, Service Actor 1, Service Actor 2, …, SA11-SA13, SA21-SA23]
  [Diagram, bulk records:]
SELECT COUNT(*)
FROM `my-iot-data`
GROUP BY `zip-code`
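The distinction above can be sketched in a few lines of Scala. This is an illustration under stated assumptions, not code from the talk: the `Reading` type, its fields, and the 30-degree limit are all hypothetical. Individual processing looks at each event and may act on it, while bulk processing only cares about an aggregate, as in the `COUNT(*) ... GROUP BY` query.

```scala
object BulkVsIndividual {
  // Hypothetical record type; the field names are assumptions for illustration.
  final case class Reading(deviceId: String, zipCode: String, temperature: Double)

  /** Individual processing: each event is examined on its own and may trigger an action. */
  def alertFor(reading: Reading, limit: Double): Option[Reading] =
    if (reading.temperature > limit) Some(reading) else None

  /** Bulk processing: only the aggregate matters, like COUNT(*) grouped by zip code. */
  def countByZip(readings: Seq[Reading]): Map[String, Int] =
    readings.groupBy(_.zipCode).map { case (zip, group) => (zip, group.size) }

  def main(args: Array[String]): Unit = {
    val readings = Seq(
      Reading("t1", "60601", 21.5),
      Reading("t2", "60601", 35.0),
      Reading("t3", "94105", 19.0)
    )
    // Per-event path: every reading is checked individually.
    readings.flatMap(alertFor(_, limit = 30.0)).foreach(r => println(s"ALERT: $r"))
    // Bulk path: the identity of each reading is irrelevant to the result.
    println(countByZip(readings))
  }
}
```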
• Preferred application architecture?
  • Streaming library in an app?
  • Distributed services running your job?
  [Diagram: Mini-batch: Spark Streaming. Low latency: Flink, Kafka Streams, Akka Streams, Beam, …]
Best of Breed Streaming Engines
• Apache Beam
  • (Formerly Google Dataflow)
  • Define your flows; run with Flink, Spark, etc.
  • Beam is defining the state of the art for streaming semantics
• Apache Flink
  • Low-latency streaming
  • Best Beam runner
  • SQL, ML, etc.
• Apache Spark
  • Best known; large community
  • Batch, mini-batch, and new low-latency streaming
  • SQL, ML, etc.
• Akka Streams
  • Low-latency streaming
  • Rich dataflow language
  • Rich APIs for microservices, data sources and sinks
  • Excellent for model serving
• Kafka Streams
  • Read, write Kafka topics
  • Stream and Table abstractions
  • SQL on streams
• Spark or Flink?
  • Best for massive data sets
  • Rich analytics
• Akka Streams or Kafka Streams
  • Best for microservice integration
  • Wider flexibility
For more information on
Lightbend Fast Data Platform:
lightbend.com/fast-data-platform

Akka, Spark or Kafka? Selecting The Right Streaming Engine For the Job

  • 1.
  • 2. Check out these resources: Dean’s book Webinars etc. Fast Data Architectures 
 for Streaming Applications Getting Answers Now from Data Sets that Never End By Dean Wampler, Ph. D., VP of Fast Data Engineering 2 lightbend.com/products/fast-data-platform
  • 3. Streaming Engines in Context…
  • 10. New Streaming, “Fast Data” Architecture (but it also supports batch)
  • 11. Mesos, Kubernetes, YARN, … Cloud, on premise, … Logs Sockets REST ZooKeeper Cluster ZK Low Latency and Mini-batch Spark Streaming Batch Spark … Low Latency Flink Ka9a Streams Akka Streams Beam … Persistence S3 HDFS DiskDiskDisk SQL/ NoSQL Search 1 5 6 3 11 KaFa Cluster Ka9a Microservices RP Go Node.js … 2 4 7 8 9 10 Beam
  • 16. • Why Kafka? Before: producers and consumers (Service 1, Log & Other Files, Internet, Service 2, Service 3, other services) are connected by N * M links. After: the same producers and consumers are connected by N + M links.
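The link counts on this slide can be checked with a little arithmetic. The sketch below is illustrative only: `LinkCount` and its method names are hypothetical, and it assumes N producers and M consumers that either all connect to each other directly or each connect once to a shared log.

```scala
// Hypothetical helper, not from the slides: counts links for the two topologies.
object LinkCount {
  /** Before: every producer connects directly to every consumer. */
  def pointToPoint(producers: Int, consumers: Int): Int = producers * consumers

  /** After: every producer and every consumer connects once to a shared log. */
  def viaSharedLog(producers: Int, consumers: Int): Int = producers + consumers

  def main(args: Array[String]): Unit = {
    val (n, m) = (4, 3) // assumed sizes, chosen only for the example
    println(s"direct: ${pointToPoint(n, m)} links, via log: ${viaSharedLog(n, m)} links")
  }
}
```

The gap widens quickly: the direct topology grows with the product of the two counts, the shared-log topology only with their sum.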
  • 22. • Low latency? How low? • High volume? How high? • Which kinds of data processing, analytics? • Process data in bulk or individually? • Bulk processing of records? • Individual processing of events? • Preferred application architecture?
  • 23. • Low latency? How low? www.spacex.com/news
  • 24. • Low latency? How low? • Real real time? pico- to microseconds www.spacex.com/news
  • 25. • Low latency? How low? • < 100 microseconds? tradinghub.co/watch-list-for-mar-26th-2015/ www.usa.philips.com/
  • 26. • Low latency? How low? • < 10 milliseconds? money.cnn.com/2017/05/12/pf/credit-card-mistakes/index.html
  • 27. • Low latency? How low? • < 100s milliseconds? github.com/keen/dashboards coursera.org/learn/machine-learning
  • 28. • Low latency? How low? • < 1 second to minutes [Diagram labels: ETL; Logs; Kafka; Raw Logs Topic; Parsed Logs Topic; Kafka Streams Job. Model training: storage; Data; Model Training; Model Serving; Other Logic]
  • 29. • Low latency? How low? • > 1 minute? • Use short batch jobs
  • 30. • High volume? How high?
  • 31. • High volume? How high? • < 10K-100K per second? drdobbs.com/web-development/soa-web-services-and-restful-systems/199902676
  • 32. • High volume? How high? • > 1M per second? https://store.nest.com/product/thermostat/
  • 33. • Which kinds of data processing, analytics? • SQL? SELECT COUNT(*) FROM my-iot-data GROUP BY zip-code; val input = spark.read.format("parquet").stream("my-iot-data"); input.groupBy("zip-code").count()
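To make the group-and-count query concrete without a Spark cluster, here is a minimal sketch using plain Scala collections. It is not the Spark API: the `Reading` type, its fields, and the sample values are assumptions introduced for illustration, since the slide names only a `zip-code` column.

```scala
object ZipCodeCounts {
  // Hypothetical record type; the slide only names a `zip-code` column.
  final case class Reading(zipCode: String, temperature: Double)

  /** In-memory analogue of SELECT COUNT(*) ... GROUP BY zip-code. */
  def countByZip(readings: Seq[Reading]): Map[String, Int] =
    readings.groupBy(_.zipCode).map { case (zip, rs) => zip -> rs.size }

  def main(args: Array[String]): Unit = {
    // Made-up sample data, not from the talk.
    val sample = Seq(Reading("60601", 21.5), Reading("60601", 22.0), Reading("94105", 18.3))
    countByZip(sample).toSeq.sortBy(_._1).foreach { case (zip, n) => println(s"$zip: $n") }
  }
}
```

A streaming engine would run the same aggregation incrementally over data that never ends, which is where windowing and state management come in.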
  • 34. • Which kinds of data processing, analytics? • “Dataflow”? val sc = new SparkContext("local[*]", "Inverted Idx") sc.textFile("data/crawl").map { line => val Array(path, text) = line.split("\t", 2); (path, text) }.flatMap { case (path, text) => text.split("""\W+""").map((_, path)) }.map { case (w, p) => ((w, p), 1) }.reduceByKey { case (n1, n2) => n1 + n2 }.map {
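The slide's code is cut off after the last `map {`. As a rough illustration of where such a dataflow typically ends up, the sketch below builds a small inverted index with plain Scala collections rather than Spark. The final grouping by word is an assumption about the missing steps, and `InvertedIndex.build` is a hypothetical name.

```scala
object InvertedIndex {
  /** Builds word -> (path -> count) from tab-separated "path<TAB>text" lines.
    * Lines without a tab throw a MatchError, as the slide's pattern match would.
    */
  def build(lines: Seq[String]): Map[String, Map[String, Int]] = {
    val pairs: Seq[(String, String)] = lines
      .map { line =>
        val Array(path, text) = line.split("\t", 2)
        (path, text)
      }
      .flatMap { case (path, text) =>
        // Split on runs of non-word characters; drop empty tokens.
        text.split("""\W+""").filter(_.nonEmpty).map(w => (w, path)).toSeq
      }

    // Stands in for map { ((w, p), 1) } followed by reduceByKey(_ + _).
    val counts: Map[(String, String), Int] =
      pairs.groupBy(identity).map { case (key, hits) => key -> hits.size }

    // Assumed final step: regroup the ((word, path), n) entries by word.
    counts
      .groupBy { case ((word, _), _) => word }
      .map { case (word, entries) =>
        word -> entries.map { case ((_, path), n) => path -> n }
      }
  }
}
```

For example, `InvertedIndex.build(Seq("a.txt\thello world hello", "b.txt\thello"))` maps `hello` to counts for both files and `world` to `a.txt` only.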
  • 35. • Which kinds of data processing, analytics? • ETL? [Diagram labels: ETL; Logs; Kafka; Raw Logs Topic; Parsed Logs Topic; Kafka Streams Job]
  • 36. • Which kinds of data processing, analytics? • Train and serve ML models? storage Data Model Training Model Serving Other Logic
  • 37. • Process data in bulk or individually? • Individual events (i.e., CEP). • Records in bulk (i.e., each datum’s identity is unimportant). [Diagram labels: Microservices; Events; Router Actor; Service Actor 1, Service Actor 2, …; SA11, SA12, SA13, SA21, SA22, SA23] SELECT COUNT(*) FROM my-iot-data GROUP BY zip-code
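The distinction between the two processing styles can be sketched in a few lines of plain Scala. Everything here is hypothetical (the `Event` type, the threshold rule, the mean): it only contrasts logic that cares which event it is looking at with logic that cares only about an aggregate.

```scala
object BulkVsIndividual {
  // Hypothetical event type for illustration only.
  final case class Event(id: Long, zipCode: String, value: Double)

  /** Individual processing: each event is inspected on its own and its
    * identity matters (here, the ids of events above a threshold).
    */
  def overThreshold(events: Seq[Event], threshold: Double): Seq[Long] =
    events.collect { case e if e.value > threshold => e.id }

  /** Bulk processing: only the aggregate matters, not which event contributed. */
  def meanValue(events: Seq[Event]): Option[Double] =
    if (events.isEmpty) None else Some(events.map(_.value).sum / events.size)
}
```

The first style maps naturally onto per-event handlers such as actors or microservices. The second maps onto SQL-like queries over many records.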
  • 38. • Preferred application architecture? • Streaming library in an app? • Distributed services running your job? [Diagram labels: Mini-batch: Spark Streaming. Low latency: Flink, Kafka Streams, Akka Streams, Beam, …]
  • 39. Best of Breed Streaming Engines
  • 40. [Architecture diagram, streaming engines highlighted] Low latency and mini-batch: Spark Streaming. Batch: Spark, …. Low latency: Flink, Kafka Streams, Akka Streams, Beam, …. Persistence: S3, HDFS, disks, SQL/NoSQL, Search. Kafka cluster.
  • 41. • Apache Beam • (Formerly Google Dataflow) • Define your flows; run with Flink, Spark, etc. • Beam is defining the state of the art for streaming semantics
  • 42.
  • 43. • Apache Flink • Low-latency streaming • Best Beam runner • SQL, ML, etc.
  • 44. • Apache Spark • Best known; large community • Batch, mini-batch, and new low-latency streaming • SQL, ML, etc.
  • 45. • Akka Streams • Low-latency streaming • Rich dataflow language • Rich APIs for microservices, data sources and sinks • Excellent for model serving
  • 46. • Kafka Streams • Read, write Kafka topics • Stream and Table abstractions • SQL on streams
  • 47. • Spark or Flink? • Best for massive data sets • Rich analytics • Akka Streams or Kafka Streams? • Best for microservice integration • Wider flexibility
  • 48. Check out these resources: Dean’s book, webinars, etc. Fast Data Architectures for Streaming Applications: Getting Answers Now from Data Sets that Never End. By Dean Wampler, Ph.D., VP of Fast Data Engineering. lightbend.com/products/fast-data-platform
  • 49. For more information on Lightbend Fast Data Platform: lightbend.com/fast-data-platform