SlideShare une entreprise Scribd logo
1  sur  28
Télécharger pour lire hors ligne
Data Streaming Technology
Overview
Dan Lynn, CEO
CodeFutures
www.codefutures.com
@danklynn
Agenda
¨  What is stream processing?
¨  Why should you care?
¨  Architecture Patterns
¨  Technologies
¨  Q&A
What is Stream Processing?
Streaming
Complex Event Processing
Event Sourcing
Change Data Capture
Reactive programming
Oh my!
What is Stream Processing?
Generally speaking, streaming is…
processing events
in the order they occur.
Ok, why should you care?
Because time happens in
order.
Databases already know this.
Master SlaveReplicationBinary log
files
Binary log
files
Binary log
files
ter Slave
Binary log
files
Binary log
files
Binary log
files
Databases already know this.
Databases already know this.
Binary log
files
Binary log
files
Binary log
files
Binary log
files
Binary log
files
Binary log
files
0 1 2 3 4 5 6 7 8 9 . . .
INSERT INTO employees …
UPDATE employees
SET salary = 1 MILLION DOLLARS
Databases already know this.
Replication is streaming
Master Slave
Binary log
files
Binary log
files
Binary log
files
What if you could leverage this?
Master
Binary log
files
Binary log
files
Binary log
files
Message
Queues
Web Servers
Devices
Binary log
files
Binary log
files
Binary log
files
Binary log
files
Binary log
files
Binary log
files
What if you could leverage this?
Master
Binary log
files
Binary log
files
Binary log
files
Message
Queues
Web Servers
Devices
Binary log
files
Binary log
files
Binary log
files
Binary log
files
Binary log
files
Binary log
files
What if you could leverage this?
Master
Binary log
files
Binary log
files
Binary log
files
Message
Queues
Web Servers
Devices
Binary log
files
Binary log
files
Binary log
files
Binary log
files
Binary log
files
Binary log
files
Stream Processing
Architecture Patterns
Lambda Architecture
Binary log
files
Binary log
files
events
Batch Computation
(hadoop)
Speed (real-time) Layer
Serving Layer
Kappa Architecture
Binary log
files
Binary log
files
events
Batch Computation
(hadoop)
Speed (real-time) Layer
Serving Layer
Replay-able Log
Technologies
RabbitMQ
¨  http://www.rabbitmq.com/
¨  History
¤  Long history of successful deployments
¨  How it works
¤  Erlang-based implementation of the Advanced Message
Queuing Protocol (AMQP).
¤  Scales up well, but difficult to scale-out.
Apache Kafka
¨  http://kafka.apache.org/
¨  History
¤  Developed at LinkedIn. Healthy developer community.
¨  How it works
¤  Built on the log-oriented processing concept.
¤  Partitions topics using a leader-follower model,
coordinated by Zookeeper.
¤  Low level and mid-level processing API.
Apache Storm
¨  https://storm.apache.org/
¨  History
¤  Created by Nathan Marz et al. while working at
Twitter.
¨  How it works
¤  Processes events in parallel from a variety of sources.
¤  “At-least-once” reliability semantics.
¤  Structures processing as a graph of incremental
processing called a Topology.
¤  Blend of Java and Clojure
Apache Storm
¨  Gotchas
¤  Debugging can become complex.
¤  Finicky configuration
¨  Neat integrations
¤  Summingbird – Advanced analytics
¤  Trident – Cleaner API, stateful processing
¨  See also:
¤  http://www.slideshare.net/DanLynn1/storm-as-deep-into-
realtime-data-processing-as-you-can-get-in-30-minutes
Apache Samza
¨  http://samza.apache.org/
¨  History
¤  Originated at LinkedIn. Active development by
Confluent.
¨  How it works
¤  Built on top of Kafka and YARN
¤  Provides a clean streaming API to logically connect
different streams of data
¤  Blend of Java and Scala
Apache Spark (Streaming)
¨  http://spark.apache.org/
¨  History
¤  Originated at UC Berkley AMPLAB.
¤  Very active community.
¨  How it works
¤  Structures processing as a DAG of parallel computation.
¤  Can run on YARN or Mesos
¤  Stream processing is supported using small “micro-
batches” (~1 second)
¤  Developed in Scala. Java and Python supported.
LinkedIn Databus
¨  https://github.com/linkedin/databus
¨  History
¤  Originated at LinkedIn. Precursor to Kafka.
¨  How it works
¤  Based of of “Database Log mining” concept.
¤  Follows the replication log of source database(s)
¤  Uses “Data Relays” to replicate and transform data into
other repositories.
AgilData (obligatory sales slide)
¨  http://www.agildata.com/
¨  History
¤  Successor to dbShards, CodeFutures successful relational
sharding product.
¨  How it works
¤  Extends dbShards high performance replication &
partitioning infrastructure to handle stream processing.
¤  Very simple SQL-based API (w/ pluggable UDFs)
¤  Advanced execution planner.
Hailstorm
¨  https://github.com/hailstorm-hs/hailstorm
¨  History
¤  Haskell-based improvement upon Storm.
¨  How it works
¤  Improves the implementation of exactly-once semantics in
Storm by leveraging Kafka and Zookeeper.
¨  Very early but promising.
dan.lynn@codefutures.com
@danklynn
Thanks!

Contenu connexe

Tendances

Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaDataWorks Summit
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinFlink Forward
 
QCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingQCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingPaco Nathan
 
Suneel Marthi - Deep Learning with Apache Flink and DL4J
Suneel Marthi - Deep Learning with Apache Flink and DL4JSuneel Marthi - Deep Learning with Apache Flink and DL4J
Suneel Marthi - Deep Learning with Apache Flink and DL4JFlink Forward
 
Improvements to Flink & it's Applications in Alibaba Search
Improvements to Flink & it's Applications in Alibaba SearchImprovements to Flink & it's Applications in Alibaba Search
Improvements to Flink & it's Applications in Alibaba SearchDataWorks Summit/Hadoop Summit
 
Christian Kreuzfeld – Static vs Dynamic Stream Processing
Christian Kreuzfeld – Static vs Dynamic Stream ProcessingChristian Kreuzfeld – Static vs Dynamic Stream Processing
Christian Kreuzfeld – Static vs Dynamic Stream ProcessingFlink Forward
 
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)Spark Summit
 
SICS: Apache Flink Streaming
SICS: Apache Flink StreamingSICS: Apache Flink Streaming
SICS: Apache Flink StreamingTuri, Inc.
 
Stateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory SpeedStateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory SpeedJamie Grier
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsSlim Baltagi
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesPaco Nathan
 
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey KharlamovRUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey KharlamovBig Data Spain
 
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaDataWorks Summit
 
Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014StampedeCon
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaHelena Edelson
 

Tendances (20)

Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache Kafka
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
 
QCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingQCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark Streaming
 
Suneel Marthi - Deep Learning with Apache Flink and DL4J
Suneel Marthi - Deep Learning with Apache Flink and DL4JSuneel Marthi - Deep Learning with Apache Flink and DL4J
Suneel Marthi - Deep Learning with Apache Flink and DL4J
 
Flink Streaming
Flink StreamingFlink Streaming
Flink Streaming
 
Improvements to Flink & it's Applications in Alibaba Search
Improvements to Flink & it's Applications in Alibaba SearchImprovements to Flink & it's Applications in Alibaba Search
Improvements to Flink & it's Applications in Alibaba Search
 
Christian Kreuzfeld – Static vs Dynamic Stream Processing
Christian Kreuzfeld – Static vs Dynamic Stream ProcessingChristian Kreuzfeld – Static vs Dynamic Stream Processing
Christian Kreuzfeld – Static vs Dynamic Stream Processing
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
 
SICS: Apache Flink Streaming
SICS: Apache Flink StreamingSICS: Apache Flink Streaming
SICS: Apache Flink Streaming
 
Stateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory SpeedStateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory Speed
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming Analytics
 
LinkedIn
LinkedInLinkedIn
LinkedIn
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case Studies
 
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey KharlamovRUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
 
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
 
Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
 

En vedette

Agile Lab_BigData_Meetup_AKKA
Agile Lab_BigData_Meetup_AKKAAgile Lab_BigData_Meetup_AKKA
Agile Lab_BigData_Meetup_AKKAPaolo Platter
 
The end of polling : why and how to transform a REST API into a Data Streamin...
The end of polling : why and how to transform a REST API into a Data Streamin...The end of polling : why and how to transform a REST API into a Data Streamin...
The end of polling : why and how to transform a REST API into a Data Streamin...Audrey Neveu
 
Agile Lab_BigData_Meetup
Agile Lab_BigData_MeetupAgile Lab_BigData_Meetup
Agile Lab_BigData_MeetupPaolo Platter
 
MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)
MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)
MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)MongoDB
 
iBeacon - I vantaggi nel mondo retail
iBeacon - I vantaggi nel mondo retailiBeacon - I vantaggi nel mondo retail
iBeacon - I vantaggi nel mondo retailEmanuele Cucci
 
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about..."Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...Kai Wähner
 
Big data & hadoop framework
Big data & hadoop frameworkBig data & hadoop framework
Big data & hadoop frameworkTu Pham
 
Introduzione ai Big Data e alla scienza dei dati - Big Data
Introduzione ai Big Data e alla scienza dei dati - Big DataIntroduzione ai Big Data e alla scienza dei dati - Big Data
Introduzione ai Big Data e alla scienza dei dati - Big DataVincenzo Manzoni
 
Big Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureBig Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureOdinot Stanislas
 
Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?MapR Technologies
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkDataWorks Summit
 
R-IoT Retail IoT - Come le IoT influenzano il mercato del Retail
R-IoT Retail IoT - Come le IoT influenzano il mercato del RetailR-IoT Retail IoT - Come le IoT influenzano il mercato del Retail
R-IoT Retail IoT - Come le IoT influenzano il mercato del Retailriot-markets
 
Introduzione ai Big Data e alla scienza dei dati
Introduzione ai Big Data e alla scienza dei datiIntroduzione ai Big Data e alla scienza dei dati
Introduzione ai Big Data e alla scienza dei datiVincenzo Manzoni
 
Apache Gearpump next-gen streaming engine
Apache Gearpump next-gen streaming engineApache Gearpump next-gen streaming engine
Apache Gearpump next-gen streaming engineTianlun Zhang
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming AnalyticsGuido Schmutz
 
QCON 2015: Gearpump, Realtime Streaming on Akka
QCON 2015: Gearpump, Realtime Streaming on AkkaQCON 2015: Gearpump, Realtime Streaming on Akka
QCON 2015: Gearpump, Realtime Streaming on AkkaSean Zhong
 

En vedette (20)

18 Data Streams
18 Data Streams18 Data Streams
18 Data Streams
 
Data API 2.0
Data API 2.0Data API 2.0
Data API 2.0
 
Agile Lab_BigData_Meetup_AKKA
Agile Lab_BigData_Meetup_AKKAAgile Lab_BigData_Meetup_AKKA
Agile Lab_BigData_Meetup_AKKA
 
The end of polling : why and how to transform a REST API into a Data Streamin...
The end of polling : why and how to transform a REST API into a Data Streamin...The end of polling : why and how to transform a REST API into a Data Streamin...
The end of polling : why and how to transform a REST API into a Data Streamin...
 
Agile Lab_BigData_Meetup
Agile Lab_BigData_MeetupAgile Lab_BigData_Meetup
Agile Lab_BigData_Meetup
 
MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)
MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)
MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)
 
iBeacon - I vantaggi nel mondo retail
iBeacon - I vantaggi nel mondo retailiBeacon - I vantaggi nel mondo retail
iBeacon - I vantaggi nel mondo retail
 
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about..."Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
 
Big data & hadoop framework
Big data & hadoop frameworkBig data & hadoop framework
Big data & hadoop framework
 
Introduzione ai Big Data e alla scienza dei dati - Big Data
Introduzione ai Big Data e alla scienza dei dati - Big DataIntroduzione ai Big Data e alla scienza dei dati - Big Data
Introduzione ai Big Data e alla scienza dei dati - Big Data
 
Big Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureBig Data and Implications on Platform Architecture
Big Data and Implications on Platform Architecture
 
Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?
 
Open source data ingestion
Open source data ingestionOpen source data ingestion
Open source data ingestion
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache Flink
 
R-IoT Retail IoT - Come le IoT influenzano il mercato del Retail
R-IoT Retail IoT - Come le IoT influenzano il mercato del RetailR-IoT Retail IoT - Come le IoT influenzano il mercato del Retail
R-IoT Retail IoT - Come le IoT influenzano il mercato del Retail
 
Introduzione ai Big Data e alla scienza dei dati
Introduzione ai Big Data e alla scienza dei datiIntroduzione ai Big Data e alla scienza dei dati
Introduzione ai Big Data e alla scienza dei dati
 
Apache Gearpump next-gen streaming engine
Apache Gearpump next-gen streaming engineApache Gearpump next-gen streaming engine
Apache Gearpump next-gen streaming engine
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming Analytics
 
QCON 2015: Gearpump, Realtime Streaming on Akka
QCON 2015: Gearpump, Realtime Streaming on AkkaQCON 2015: Gearpump, Realtime Streaming on Akka
QCON 2015: Gearpump, Realtime Streaming on Akka
 
Real Time Big Data Framework
Real Time Big Data FrameworkReal Time Big Data Framework
Real Time Big Data Framework
 

Similaire à Data Streaming Technology Overview

Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Alluxio, Inc.
 
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and KafkaDatabricks
 
Operating and Supporting Delta Lake in Production
Operating and Supporting Delta Lake in ProductionOperating and Supporting Delta Lake in Production
Operating and Supporting Delta Lake in ProductionDatabricks
 
Data Pipeline for The Big Data/Data Science OKC
Data Pipeline for The Big Data/Data Science OKCData Pipeline for The Big Data/Data Science OKC
Data Pipeline for The Big Data/Data Science OKCMark Smith
 
Spark Streaming @ Scale (Clicktale)
Spark Streaming @ Scale (Clicktale)Spark Streaming @ Scale (Clicktale)
Spark Streaming @ Scale (Clicktale)Yuval Itzchakov
 
Scalable Stream Processing with Apache Samza
Scalable Stream Processing with Apache SamzaScalable Stream Processing with Apache Samza
Scalable Stream Processing with Apache SamzaPrateek Maheshwari
 
Real-World Pulsar Architectural Patterns
Real-World Pulsar Architectural PatternsReal-World Pulsar Architectural Patterns
Real-World Pulsar Architectural PatternsDevin Bost
 
Logical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupLogical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupGianmario Spacagna
 
Analitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAMAnalitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAMjavier ramirez
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Demi Ben-Ari
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Codemotion
 
Drupal Performance : DrupalCamp North
Drupal Performance : DrupalCamp NorthDrupal Performance : DrupalCamp North
Drupal Performance : DrupalCamp NorthPhilip Norton
 
Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureTimothy Spann
 
Lightbend Fast Data Platform
Lightbend Fast Data PlatformLightbend Fast Data Platform
Lightbend Fast Data PlatformLightbend
 
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...Zhenzhong Xu
 
Data Processing with Apache Spark Meetup Talk
Data Processing with Apache Spark Meetup TalkData Processing with Apache Spark Meetup Talk
Data Processing with Apache Spark Meetup TalkEren Avşaroğulları
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineMonal Daxini
 
Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7Jack Gudenkauf
 
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­ticaA noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­ticaData Con LA
 
Cloud Best Practices
Cloud Best PracticesCloud Best Practices
Cloud Best PracticesEric Bottard
 

Similaire à Data Streaming Technology Overview (20)

Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
 
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
 
Operating and Supporting Delta Lake in Production
Operating and Supporting Delta Lake in ProductionOperating and Supporting Delta Lake in Production
Operating and Supporting Delta Lake in Production
 
Data Pipeline for The Big Data/Data Science OKC
Data Pipeline for The Big Data/Data Science OKCData Pipeline for The Big Data/Data Science OKC
Data Pipeline for The Big Data/Data Science OKC
 
Spark Streaming @ Scale (Clicktale)
Spark Streaming @ Scale (Clicktale)Spark Streaming @ Scale (Clicktale)
Spark Streaming @ Scale (Clicktale)
 
Scalable Stream Processing with Apache Samza
Scalable Stream Processing with Apache SamzaScalable Stream Processing with Apache Samza
Scalable Stream Processing with Apache Samza
 
Real-World Pulsar Architectural Patterns
Real-World Pulsar Architectural PatternsReal-World Pulsar Architectural Patterns
Real-World Pulsar Architectural Patterns
 
Logical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupLogical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetup
 
Analitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAMAnalitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAM
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
 
Drupal Performance : DrupalCamp North
Drupal Performance : DrupalCamp NorthDrupal Performance : DrupalCamp North
Drupal Performance : DrupalCamp North
 
Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azure
 
Lightbend Fast Data Platform
Lightbend Fast Data PlatformLightbend Fast Data Platform
Lightbend Fast Data Platform
 
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...
 
Data Processing with Apache Spark Meetup Talk
Data Processing with Apache Spark Meetup TalkData Processing with Apache Spark Meetup Talk
Data Processing with Apache Spark Meetup Talk
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipeline
 
Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7
 
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­ticaA noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
 
Cloud Best Practices
Cloud Best PracticesCloud Best Practices
Cloud Best Practices
 

Plus de Dan Lynn

Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dan Lynn
 
The Holy Grail of Data Analytics
The Holy Grail of Data AnalyticsThe Holy Grail of Data Analytics
The Holy Grail of Data AnalyticsDan Lynn
 
Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dan Lynn
 
Hands on with Apache Spark
Hands on with Apache SparkHands on with Apache Spark
Hands on with Apache SparkDan Lynn
 
AgilData - How I Learned to Stop Worrying and Evolve with On-Demand Schemas
AgilData - How I Learned to Stop Worrying and Evolve  with On-Demand SchemasAgilData - How I Learned to Stop Worrying and Evolve  with On-Demand Schemas
AgilData - How I Learned to Stop Worrying and Evolve with On-Demand SchemasDan Lynn
 
Data decay and the illusion of the present
Data decay and the illusion of the presentData decay and the illusion of the present
Data decay and the illusion of the presentDan Lynn
 
Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.Dan Lynn
 
Storing and manipulating graphs in HBase
Storing and manipulating graphs in HBaseStoring and manipulating graphs in HBase
Storing and manipulating graphs in HBaseDan Lynn
 
Storm: The Real-Time Layer - GlueCon 2012
Storm: The Real-Time Layer  - GlueCon 2012Storm: The Real-Time Layer  - GlueCon 2012
Storm: The Real-Time Layer - GlueCon 2012Dan Lynn
 
When it rains: Prepare for scale with Amazon EC2
When it rains: Prepare for scale with Amazon EC2When it rains: Prepare for scale with Amazon EC2
When it rains: Prepare for scale with Amazon EC2Dan Lynn
 

Plus de Dan Lynn (10)

Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
 
The Holy Grail of Data Analytics
The Holy Grail of Data AnalyticsThe Holy Grail of Data Analytics
The Holy Grail of Data Analytics
 
Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016
 
Hands on with Apache Spark
Hands on with Apache SparkHands on with Apache Spark
Hands on with Apache Spark
 
AgilData - How I Learned to Stop Worrying and Evolve with On-Demand Schemas
AgilData - How I Learned to Stop Worrying and Evolve  with On-Demand SchemasAgilData - How I Learned to Stop Worrying and Evolve  with On-Demand Schemas
AgilData - How I Learned to Stop Worrying and Evolve with On-Demand Schemas
 
Data decay and the illusion of the present
Data decay and the illusion of the presentData decay and the illusion of the present
Data decay and the illusion of the present
 
Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.
 
Storing and manipulating graphs in HBase
Storing and manipulating graphs in HBaseStoring and manipulating graphs in HBase
Storing and manipulating graphs in HBase
 
Storm: The Real-Time Layer - GlueCon 2012
Storm: The Real-Time Layer  - GlueCon 2012Storm: The Real-Time Layer  - GlueCon 2012
Storm: The Real-Time Layer - GlueCon 2012
 
When it rains: Prepare for scale with Amazon EC2
When it rains: Prepare for scale with Amazon EC2When it rains: Prepare for scale with Amazon EC2
When it rains: Prepare for scale with Amazon EC2
 

Dernier

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 

Dernier (20)

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 

Data Streaming Technology Overview

  • 1. Data Streaming Technology Overview Dan Lynn, CEO CodeFutures www.codefutures.com @danklynn
  • 2. Agenda ¨  What is stream processing? ¨  Why should you care? ¨  Architecture Patterns ¨  Technologies ¨  Q&A
  • 3. What is Stream Processing? Streaming Complex Event Processing Event Sourcing Change Data Capture Reactive programming Oh my!
  • 4. What is Stream Processing? Generally speaking, streaming is… processing events in the order they occur.
  • 5. Ok, why should you care?
  • 7. Databases already know this. Master SlaveReplicationBinary log files Binary log files Binary log files
  • 8. ter Slave Binary log files Binary log files Binary log files Databases already know this.
  • 9. Databases already know this. Binary log files Binary log files Binary log files
  • 10. Binary log files Binary log files Binary log files 0 1 2 3 4 5 6 7 8 9 . . . INSERT INTO employees … UPDATE employees SET salary = 1 MILLION DOLLARS Databases already know this.
  • 11. Replication is streaming Master Slave Binary log files Binary log files Binary log files
  • 12. What if you could leverage this? Master Binary log files Binary log files Binary log files Message Queues Web Servers Devices Binary log files Binary log files Binary log files Binary log files Binary log files Binary log files
  • 13. What if you could leverage this? Master Binary log files Binary log files Binary log files Message Queues Web Servers Devices Binary log files Binary log files Binary log files Binary log files Binary log files Binary log files
  • 14. What if you could leverage this? Master Binary log files Binary log files Binary log files Message Queues Web Servers Devices Binary log files Binary log files Binary log files Binary log files Binary log files Binary log files Stream Processing
  • 16. Lambda Architecture Binary log files Binary log files events Batch Computation (hadoop) Speed (real-time) Layer Serving Layer
  • 17. Kappa Architecture Binary log files Binary log files events Batch Computation (hadoop) Speed (real-time) Layer Serving Layer Replay-able Log
  • 19. RabbitMQ ¨  http://www.rabbitmq.com/ ¨  History ¤  Long history of successful deployments ¨  How it works ¤  Erlang-based implementation of the Advanced Message Queuing Protocol (AMQP). ¤  Scales up well, but difficult to scale-out.
  • 20. Apache Kafka ¨  http://kafka.apache.org/ ¨  History ¤  Developed at LinkedIn. Healthy developer community. ¨  How it works ¤  Built on the log-oriented processing concept. ¤  Partitions topics using a leader-follower model, coordinated by Zookeeper. ¤  Low level and mid-level processing API.
  • 21. Apache Storm ¨  https://storm.apache.org/ ¨  History ¤  Created by Nathan Marz et al. while working at Twitter. ¨  How it works ¤  Processes events in parallel from a variety of sources. ¤  “At-least-once” reliability semantics. ¤  Structures processing as a graph of incremental processing called a Topology. ¤  Blend of Java and Clojure
  • 22. Apache Storm ¨  Gotchas ¤  Debugging can become complex. ¤  Finicky configuration ¨  Neat integrations ¤  Summingbird – Advanced analytics ¤  Trident – Cleaner API, stateful processing ¨  See also: ¤  http://www.slideshare.net/DanLynn1/storm-as-deep-into- realtime-data-processing-as-you-can-get-in-30-minutes
  • 23. Apache Samza ¨  http://samza.apache.org/ ¨  History ¤  Originated at LinkedIn. Active development by Confluent. ¨  How it works ¤  Built on top of Kafka and YARN ¤  Provides a clean streaming API to logically connect different streams of data ¤  Blend of Java and Scala
  • 24. Apache Spark (Streaming) ¨  http://spark.apache.org/ ¨  History ¤  Originated at UC Berkley AMPLAB. ¤  Very active community. ¨  How it works ¤  Structures processing as a DAG of parallel computation. ¤  Can run on YARN or Mesos ¤  Stream processing is supported using small “micro- batches” (~1 second) ¤  Developed in Scala. Java and Python supported.
  • 25. LinkedIn Databus ¨  https://github.com/linkedin/databus ¨  History ¤  Originated at LinkedIn. Precursor to Kafka. ¨  How it works ¤  Based of of “Database Log mining” concept. ¤  Follows the replication log of source database(s) ¤  Uses “Data Relays” to replicate and transform data into other repositories.
  • 26. AgilData (obligatory sales slide) ¨  http://www.agildata.com/ ¨  History ¤  Successor to dbShards, CodeFutures successful relational sharding product. ¨  How it works ¤  Extends dbShards high performance replication & partitioning infrastructure to handle stream processing. ¤  Very simple SQL-based API (w/ pluggable UDFs) ¤  Advanced execution planner.
  • 27. Hailstorm ¨  https://github.com/hailstorm-hs/hailstorm ¨  History ¤  Haskell-based improvement upon Storm. ¨  How it works ¤  Improves the implementation of exactly-once semantics in Storm by leveraging Kafka and Zookeeper. ¨  Very early but promising.