SlideShare une entreprise Scribd logo
1  sur  107
Apache Samza*
Stream Processing at LinkedIn
Chris Riccomini
9/27/2013

* Incubating
Stream Processing?
0 ms

Response latency
0 ms

Response latency

Synchronous
0 ms

Response latency

Synchronous

Later. Possibly much later.
0 ms

Response latency
Milliseconds to minutes
Synchronous

Later. Possibly much later.
Newsfeed
News
Ad Relevance
Email
Search Indexing Pipeline
Metrics and Monitoring
Motivation
Real-time Feeds
•
•
•
•

User activity
Metrics
Monitoring
Database Changes
Real-time Feeds
• 10+ billion writes per day
• 172,000 messages per second (average)
• 55+ billion messages per day to real-time
consumers
Stream Processing is Hard
•
•
•
•
•
•

Partitioning
State
Re-processing
Failure semantics
Joins to services or database
Non-determinism
Samza Concepts
&
Architecture
Streams
Partition 0

Partition 1

Partition 2
Streams
Partition 0

1
2
3
4
5
6

Partition 1

1
2
3
4
5

Partition 2

1
2
3
4
5
6
7
Streams
Partition 0

1
2
3
4
5
6

Partition 1

1
2
3
4
5

Partition 2

1
2
3
4
5
6
7
Streams
Partition 0

1
2
3
4
5
6

Partition 1

1
2
3
4
5

Partition 2

1
2
3
4
5
6
7
Streams
Partition 0

1
2
3
4
5
6

Partition 1

1
2
3
4
5

Partition 2

1
2
3
4
5
6
7
Streams
Partition 0

1
2
3
4
5
6

Partition 1

1
2
3
4
5

Partition 2

1
2
3
4
5
6
7
Streams
Partition 0

1
2
3
4
5
6

Partition 1

1
2
3
4
5

Partition 2

1
2
3
4
5
6
7

next append
Tasks
Partition 0
Tasks
Partition 0

Task 1
Tasks
Partition 0

class PageKeyViewsCounterTask implements StreamTask {
public void process(IncomingMessageEnvelope envelope,
MessageCollector collector,
TaskCoordinator coordinator) {
GenericRecord record = ((GenericRecord) envelope.getMsg());
String pageKey = record.get("page-key").toString();
int newCount = pageKeyViews.get(pageKey).incrementAndGet();
collector.send(countStream, pageKey, newCount);
}
}
Tasks
Partition 0

class PageKeyViewsCounterTask implements StreamTask {
public void process(IncomingMessageEnvelope envelope,
MessageCollector collector,
TaskCoordinator coordinator) {
GenericRecord record = ((GenericRecord) envelope.getMsg());
String pageKey = record.get("page-key").toString();
int newCount = pageKeyViews.get(pageKey).incrementAndGet();
collector.send(countStream, pageKey, newCount);
}
}
Tasks
Partition 0

class PageKeyViewsCounterTask implements StreamTask {
public void process(IncomingMessageEnvelope envelope,
MessageCollector collector,
TaskCoordinator coordinator) {
GenericRecord record = ((GenericRecord) envelope.getMsg());
String pageKey = record.get("page-key").toString();
int newCount = pageKeyViews.get(pageKey).incrementAndGet();
collector.send(countStream, pageKey, newCount);
}
}
Tasks
Partition 0

class PageKeyViewsCounterTask implements StreamTask {
public void process(IncomingMessageEnvelope envelope,
MessageCollector collector,
TaskCoordinator coordinator) {
GenericRecord record = ((GenericRecord) envelope.getMsg());
String pageKey = record.get("page-key").toString();
int newCount = pageKeyViews.get(pageKey).incrementAndGet();
collector.send(countStream, pageKey, newCount);
}
}
Tasks
Partition 0

class PageKeyViewsCounterTask implements StreamTask {
public void process(IncomingMessageEnvelope envelope,
MessageCollector collector,
TaskCoordinator coordinator) {
GenericRecord record = ((GenericRecord) envelope.getMsg());
String pageKey = record.get("page-key").toString();
int newCount = pageKeyViews.get(pageKey).incrementAndGet();
collector.send(countStream, pageKey, newCount);
}
}
Tasks
Partition 0

class PageKeyViewsCounterTask implements StreamTask {
public void process(IncomingMessageEnvelope envelope,
MessageCollector collector,
TaskCoordinator coordinator) {
GenericRecord record = ((GenericRecord) envelope.getMsg());
String pageKey = record.get("page-key").toString();
int newCount = pageKeyViews.get(pageKey).incrementAndGet();
collector.send(countStream, pageKey, newCount);
}
}
Tasks
Partition 0

class PageKeyViewsCounterTask implements StreamTask {
public void process(IncomingMessageEnvelope envelope,
MessageCollector collector,
TaskCoordinator coordinator) {
GenericRecord record = ((GenericRecord) envelope.getMsg());
String pageKey = record.get("page-key").toString();
int newCount = pageKeyViews.get(pageKey).incrementAndGet();
collector.send(countStream, pageKey, newCount);
}
}
Tasks
Partition 0

class PageKeyViewsCounterTask implements StreamTask {
public void process(IncomingMessageEnvelope envelope,
MessageCollector collector,
TaskCoordinator coordinator) {
GenericRecord record = ((GenericRecord) envelope.getMsg());
String pageKey = record.get("page-key").toString();
int newCount = pageKeyViews.get(pageKey).incrementAndGet();
collector.send(countStream, pageKey, newCount);
}
}
Tasks
Partition 0

class PageKeyViewsCounterTask implements StreamTask {
public void process(IncomingMessageEnvelope envelope,
MessageCollector collector,
TaskCoordinator coordinator) {
GenericRecord record = ((GenericRecord) envelope.getMsg());
String pageKey = record.get("page-key").toString();
int newCount = pageKeyViews.get(pageKey).incrementAndGet();
collector.send(countStream, pageKey, newCount);
}
}
Tasks
Partition 0

class PageKeyViewsCounterTask implements StreamTask {
public void process(IncomingMessageEnvelope envelope,
MessageCollector collector,
TaskCoordinator coordinator) {
GenericRecord record = ((GenericRecord) envelope.getMsg());
String pageKey = record.get("page-key").toString();
int newCount = pageKeyViews.get(pageKey).incrementAndGet();
collector.send(countStream, pageKey, newCount);
}
}
Tasks
Partition 0

Task 1
Tasks
Page Views - Partition 0

1
2
3
4
PageKeyViews
CounterTask

Partition 0

Partition 1

Output Count Stream
Tasks
Page Views - Partition 0

1
2
3
4
PageKeyViews
CounterTask

Partition 0

Partition 1

Output Count Stream
Tasks
Page Views - Partition 0

1
2
3
4
PageKeyViews
CounterTask

Partition 0

Partition 1

Output Count Stream
Tasks
Page Views - Partition 0

1
2
3
4
PageKeyViews
CounterTask

Output Count Stream

Partition 0
Partition 1
Tasks
Page Views - Partition 0

1
2
3
4
PageKeyViews
CounterTask

Output Count Stream

Partition 0
Partition 1
Tasks
Page Views - Partition 0

1
2
3
4
PageKeyViews
CounterTask

Output Count Stream

Partition 0
Partition 1
Tasks
Page Views - Partition 0

1
2
3
4
PageKeyViews
CounterTask

Output Count Stream
Partition 0

Partition 1
Tasks
Page Views - Partition 0

1
2
3
4
PageKeyViews
CounterTask

Output Count Stream
Partition 0

Partition 1
Tasks
Page Views - Partition 0

1
2
3
4
PageKeyViews
CounterTask

Checkpoint
Stream

2
Output Count Stream

Partition 1
Partition 0

Partition 1
Tasks
Page Views - Partition 0

1
2
3
4
PageKeyViews
CounterTask

Checkpoint
Stream

2
Output Count Stream

Partition 1
Partition 0

Partition 1
Tasks
Page Views - Partition 0

1
2
3
4
PageKeyViews
CounterTask

Checkpoint
Stream

2
Output Count Stream

Partition 1
Partition 0

Partition 1
Tasks
Page Views - Partition 0

1
2
3
4
PageKeyViews
CounterTask

Checkpoint
Stream

2
Output Count Stream

Partition 1
Partition 0
Partition 1
Tasks
Page Views - Partition 0

1
2
3
4
PageKeyViews
CounterTask

Checkpoint
Stream

2
Output Count Stream

Partition 1
Partition 0
Partition 1
Tasks
Page Views - Partition 0

1
2
3
4
PageKeyViews
CounterTask

Checkpoint
Stream

2
Output Count Stream

Partition 1
Partition 0
Partition 1
Tasks
Page Views - Partition 0

1
2
3
4
PageKeyViews
CounterTask

Checkpoint
Stream

2
Output Count Stream

Partition 1
Partition 0
Partition 1
Tasks
Page Views - Partition 0

1
2
3
4
PageKeyViews
CounterTask

Checkpoint
Stream

2
Output Count Stream

Partition 1
Partition 0
Partition 1
Tasks
Page Views - Partition 0

1
2
3
4
PageKeyViews
CounterTask

Checkpoint
Stream

2
Output Count Stream

Partition 1
Partition 0
Partition 1
Jobs
Stream A

Task 1

Task 2

Stream B

Task 3
Jobs
Stream A

Task 1

Stream B

Task 2

Stream C

Task 3
Jobs
AdViews

Task 1

AdClicks

Task 2

AdClickThroughRate

Task 3
Jobs
AdViews

Task 1

AdClicks

Task 2

AdClickThroughRate

Task 3
Jobs
Stream A

Task 1

Stream B

Task 2

Stream C

Task 3
Dataflow
Stream A

Stream B

Job 1

Stream D

Job 2

Stream E

Job 3

Stream B

Stream C
Dataflow
Stream A

Stream B

Job 1

Stream D

Job 2

Stream E

Job 3

Stream B

Stream C
YARN
Jobs
Stream A

Task 1

Task 2

Stream B

Task 3
Containers
Stream A

Task 1

Task 2

Stream B

Task 3
Containers
Stream A

Samza Container 1

Stream B

Samza Container 2
Containers

Samza Container 1

Samza Container 2
YARN
Host 1

Samza Container 1

Host 2

Samza Container 2
YARN
Host 1

Host 2

NodeManager

NodeManager

Samza Container 1

Samza Container 2
YARN
Host 1

Host 2

NodeManager

NodeManager

Samza Container 1

Samza Container 2

Samza YARN AM
YARN
Host 1

Host 2

NodeManager

NodeManager

Samza Container 1

Kafka Broker

Samza Container 2

Samza YARN AM

Kafka Broker
YARN
Host 1

Host 2

NodeManager

NodeManager

MapReduce
Container

HDFS

MapReduce
YARN AM

MapReduce
Container

HDFS
YARN
Host 1
Stream A

NodeManager

Samza Container 1
Samza Container 1

Kafka Broker
Stream C

Samza
Container 2
YARN
Host 1
Stream A

NodeManager

Samza Container 1
Samza Container 1

Kafka Broker
Stream C

Samza
Container 2
YARN
Host 1
Stream A

NodeManager

Samza Container 1
Samza Container 1

Kafka Broker
Stream C

Samza
Container 2
YARN
Host 1
Stream A

NodeManager

Samza Container 1
Samza Container 1

Kafka Broker
Stream C

Samza
Container 2
YARN
Host 1

Host 2

NodeManager

NodeManager

Samza Container 1

Kafka Broker

Samza Container 2

Samza YARN AM

Kafka Broker
CGroups
Host 1

Host 2

NodeManager

NodeManager

Samza Container 1

Kafka Broker

Samza Container 2

Samza YARN AM

Kafka Broker
(Not Running) Multi-Framework
Host 1

Host 2

NodeManager

NodeManager

Samza Container 1

Kafka

MapReduce
Container

Samza YARN AM

HDFS
Stateful Processing
SELECT
col1,
count(*)
FROM
stream1
INNER JOIN
stream2
ON
stream1.col3 = stream2.col3
WHERE
col2 > 20
GROUP BY
col1
ORDER BY
count(*) DESC
LIMIT 50;
SELECT
col1,
count(*)
FROM
stream1
INNER JOIN
stream2
ON
stream1.col3 = stream2.col3
WHERE
col2 > 20
GROUP BY
col1
ORDER BY
count(*) DESC
LIMIT 50;
SELECT
col1,
count(*)
FROM
stream1
INNER JOIN
stream2
ON
stream1.col3 = stream2.col3
WHERE
col2 > 20
GROUP BY
col1
ORDER BY
count(*) DESC
LIMIT 50;
SELECT
col1,
count(*)
FROM
stream1
INNER JOIN
stream2
ON
stream1.col3 = stream2.col3
WHERE
col2 > 20
GROUP BY
col1
ORDER BY
count(*) DESC
LIMIT 10;
How do people do this?
Remote Stores
Stream A

Task 1

Task 2

Task 3

Key-Value
Store
Stream B
Remote RPC is slow
• Stream: ~500k records/sec/container
• DB: << less
Online vs. Async
No undo
• Database state is non-deterministic
• Can’t roll back mutations if task crashes
Tables & Streams
put(a, w)
put(b, x)
Database

put(a, y)

put(b, z)

Time
Stateful Tasks
Stream A

Task 1

Task 2

Stream B

Task 3
Stateful Tasks
Stream A

Task 1

Task 2

Stream B

Task 3
Stateful Tasks
Stream A

Task 1

Task 2

Stream B

Task 3

Changelog Stream
Stateful Tasks
Stream A

Task 1

Task 2

Stream B

Task 3

Changelog Stream
Stateful Tasks
Stream A

Task 1

Task 2

Stream B

Task 3

Changelog Stream
Stateful Tasks
Stream A

Task 1

Task 2

Stream B

Task 3

Changelog Stream
Stateful Tasks
Stream A

Task 1

Task 2

Stream B

Task 3

Changelog Stream
Stateful Tasks
Stream A

Task 1

Task 2

Stream B

Task 3

Changelog Stream
Stateful Tasks
Stream A

Task 1

Task 2

Stream B

Task 3

Changelog Stream
Stateful Tasks
Stream A

Task 1

Task 2

Stream B

Task 3

Changelog Stream
Stateful Tasks
Stream A

Task 1

Task 2

Stream B

Task 3

Changelog Stream
Stateful Tasks
Stream A

Task 1

Task 2

Stream B

Task 3

Changelog Stream
Stateful Tasks
Stream A

Task 1

Task 2

Stream B

Task 3

Changelog Stream
Stateful Tasks
Stream A

Task 1

Task 2

Stream B

Task 3

Changelog Stream
Key-Value Store
•
•
•
•

put(table_name, key, value)
get(table_name, key)
delete(table_name, key)
range(table_name, key1, key2)
Whew!
Let’s be Friends!
• We are incubating, and you can help!
• Get up and running in 5 minutes
http://bit.ly/hello-samza
• Grab some newbie JIRAs
http://bit.ly/samza_newbie_issues

Contenu connexe

Tendances

Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com confluent
 
HBaseCon2017 Data Product at AirBnB
HBaseCon2017 Data Product at AirBnBHBaseCon2017 Data Product at AirBnB
HBaseCon2017 Data Product at AirBnBHBaseCon
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingKostas Tzoumas
 
Principles in Data Stream Processing | Matthias J Sax, Confluent
Principles in Data Stream Processing | Matthias J Sax, ConfluentPrinciples in Data Stream Processing | Matthias J Sax, Confluent
Principles in Data Stream Processing | Matthias J Sax, ConfluentHostedbyConfluent
 
Stream Processing using Samza SQL
Stream Processing using Samza SQLStream Processing using Samza SQL
Stream Processing using Samza SQLSamarth Shetty
 
Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...
Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...
Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...confluent
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewenconfluent
 
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...Ververica
 
Top Ten Kafka® Configs
Top Ten Kafka® ConfigsTop Ten Kafka® Configs
Top Ten Kafka® Configsconfluent
 
Apache HBase at Airbnb
Apache HBase at Airbnb Apache HBase at Airbnb
Apache HBase at Airbnb HBaseCon
 
The State of Stream Processing
The State of Stream ProcessingThe State of Stream Processing
The State of Stream Processingconfluent
 
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward
 
Bootstrapping Microservices with Kafka, Akka and Spark
Bootstrapping Microservices with Kafka, Akka and SparkBootstrapping Microservices with Kafka, Akka and Spark
Bootstrapping Microservices with Kafka, Akka and SparkAlex Silva
 
Closing the Loop in Extended Reality with Kafka Streams and Machine Learning ...
Closing the Loop in Extended Reality with Kafka Streams and Machine Learning ...Closing the Loop in Extended Reality with Kafka Streams and Machine Learning ...
Closing the Loop in Extended Reality with Kafka Streams and Machine Learning ...confluent
 
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)confluent
 
Unified Stream & Batch Processing with Apache Flink (Hadoop Summit Dublin 2016)
Unified Stream & Batch Processing with Apache Flink (Hadoop Summit Dublin 2016)Unified Stream & Batch Processing with Apache Flink (Hadoop Summit Dublin 2016)
Unified Stream & Batch Processing with Apache Flink (Hadoop Summit Dublin 2016)ucelebi
 
Apache Flink: Streaming Done Right @ FOSDEM 2016
Apache Flink: Streaming Done Right @ FOSDEM 2016Apache Flink: Streaming Done Right @ FOSDEM 2016
Apache Flink: Streaming Done Right @ FOSDEM 2016Till Rohrmann
 
Log everything! @DC13
Log everything! @DC13Log everything! @DC13
Log everything! @DC13DECK36
 

Tendances (20)

Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com
 
HBaseCon2017 Data Product at AirBnB
HBaseCon2017 Data Product at AirBnBHBaseCon2017 Data Product at AirBnB
HBaseCon2017 Data Product at AirBnB
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
Principles in Data Stream Processing | Matthias J Sax, Confluent
Principles in Data Stream Processing | Matthias J Sax, ConfluentPrinciples in Data Stream Processing | Matthias J Sax, Confluent
Principles in Data Stream Processing | Matthias J Sax, Confluent
 
Stream Processing using Samza SQL
Stream Processing using Samza SQLStream Processing using Samza SQL
Stream Processing using Samza SQL
 
Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...
Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...
Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
 
Way to kafka connect
Way to kafka connectWay to kafka connect
Way to kafka connect
 
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
 
Top Ten Kafka® Configs
Top Ten Kafka® ConfigsTop Ten Kafka® Configs
Top Ten Kafka® Configs
 
Apache HBase at Airbnb
Apache HBase at Airbnb Apache HBase at Airbnb
Apache HBase at Airbnb
 
A look at Flink 1.2
A look at Flink 1.2A look at Flink 1.2
A look at Flink 1.2
 
The State of Stream Processing
The State of Stream ProcessingThe State of Stream Processing
The State of Stream Processing
 
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
 
Bootstrapping Microservices with Kafka, Akka and Spark
Bootstrapping Microservices with Kafka, Akka and SparkBootstrapping Microservices with Kafka, Akka and Spark
Bootstrapping Microservices with Kafka, Akka and Spark
 
Closing the Loop in Extended Reality with Kafka Streams and Machine Learning ...
Closing the Loop in Extended Reality with Kafka Streams and Machine Learning ...Closing the Loop in Extended Reality with Kafka Streams and Machine Learning ...
Closing the Loop in Extended Reality with Kafka Streams and Machine Learning ...
 
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)
 
Unified Stream & Batch Processing with Apache Flink (Hadoop Summit Dublin 2016)
Unified Stream & Batch Processing with Apache Flink (Hadoop Summit Dublin 2016)Unified Stream & Batch Processing with Apache Flink (Hadoop Summit Dublin 2016)
Unified Stream & Batch Processing with Apache Flink (Hadoop Summit Dublin 2016)
 
Apache Flink: Streaming Done Right @ FOSDEM 2016
Apache Flink: Streaming Done Right @ FOSDEM 2016Apache Flink: Streaming Done Right @ FOSDEM 2016
Apache Flink: Streaming Done Right @ FOSDEM 2016
 
Log everything! @DC13
Log everything! @DC13Log everything! @DC13
Log everything! @DC13
 

En vedette

Samza at LinkedIn: Taking Stream Processing to the Next Level
Samza at LinkedIn: Taking Stream Processing to the Next LevelSamza at LinkedIn: Taking Stream Processing to the Next Level
Samza at LinkedIn: Taking Stream Processing to the Next LevelMartin Kleppmann
 
London hug-samza
London hug-samzaLondon hug-samza
London hug-samzahuguk
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInChris Riccomini
 
Benchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per nodeBenchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per nodeTao Feng
 
Apache Samza: Reliable Stream Processing Atop Apache Kafka and Hadoop YARN
Apache Samza: Reliable Stream Processing Atop Apache Kafka and Hadoop YARNApache Samza: Reliable Stream Processing Atop Apache Kafka and Hadoop YARN
Apache Samza: Reliable Stream Processing Atop Apache Kafka and Hadoop YARNblueboxtraveler
 

En vedette (6)

Samza at LinkedIn: Taking Stream Processing to the Next Level
Samza at LinkedIn: Taking Stream Processing to the Next LevelSamza at LinkedIn: Taking Stream Processing to the Next Level
Samza at LinkedIn: Taking Stream Processing to the Next Level
 
London hug-samza
London hug-samzaLondon hug-samza
London hug-samza
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedIn
 
Samza la hug
Samza la hugSamza la hug
Samza la hug
 
Benchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per nodeBenchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per node
 
Apache Samza: Reliable Stream Processing Atop Apache Kafka and Hadoop YARN
Apache Samza: Reliable Stream Processing Atop Apache Kafka and Hadoop YARNApache Samza: Reliable Stream Processing Atop Apache Kafka and Hadoop YARN
Apache Samza: Reliable Stream Processing Atop Apache Kafka and Hadoop YARN
 

Similaire à Apache Incubator Samza: Stream Processing at LinkedIn

LinkedIn-Teradata Summit feb 25, 2015
LinkedIn-Teradata Summit feb 25, 2015LinkedIn-Teradata Summit feb 25, 2015
LinkedIn-Teradata Summit feb 25, 2015Navina Ramesh
 
Samza tech talk_2015 - huawei
Samza tech talk_2015 - huaweiSamza tech talk_2015 - huawei
Samza tech talk_2015 - huaweiYi Pan
 
#SUGCON 2015 Sitecore Monitoring
#SUGCON 2015 Sitecore Monitoring#SUGCON 2015 Sitecore Monitoring
#SUGCON 2015 Sitecore Monitoringchriswoj
 
Samza 0.13 meetup slide v1.0.pptx
Samza 0.13 meetup slide   v1.0.pptxSamza 0.13 meetup slide   v1.0.pptx
Samza 0.13 meetup slide v1.0.pptxYi Pan
 
Delta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the HoodDelta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the HoodDatabricks
 
Samza at LinkedIn
Samza at LinkedInSamza at LinkedIn
Samza at LinkedInVenu Ryali
 
stream-processing-at-linkedin-with-apache-samza
stream-processing-at-linkedin-with-apache-samzastream-processing-at-linkedin-with-apache-samza
stream-processing-at-linkedin-with-apache-samzaAbhishek Shivanna
 
Monitoring und Metriken im Wunderland
Monitoring und Metriken im WunderlandMonitoring und Metriken im Wunderland
Monitoring und Metriken im WunderlandD
 
1404 app dev series - session 8 - monitoring & performance tuning
1404   app dev series - session 8 - monitoring & performance tuning1404   app dev series - session 8 - monitoring & performance tuning
1404 app dev series - session 8 - monitoring & performance tuningMongoDB
 
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingCloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingDoiT International
 
SVR17: Data-Intensive Computing on Windows HPC Server with the ...
SVR17: Data-Intensive Computing on Windows HPC Server with the ...SVR17: Data-Intensive Computing on Windows HPC Server with the ...
SVR17: Data-Intensive Computing on Windows HPC Server with the ...butest
 
SVR17: Data-Intensive Computing on Windows HPC Server with the ...
SVR17: Data-Intensive Computing on Windows HPC Server with the ...SVR17: Data-Intensive Computing on Windows HPC Server with the ...
SVR17: Data-Intensive Computing on Windows HPC Server with the ...butest
 
Stream processing with Apache Flink - Maximilian Michels Data Artisans
Stream processing with Apache Flink - Maximilian Michels Data ArtisansStream processing with Apache Flink - Maximilian Michels Data Artisans
Stream processing with Apache Flink - Maximilian Michels Data ArtisansEvention
 
Continuous Application with Structured Streaming 2.0
Continuous Application with Structured Streaming 2.0Continuous Application with Structured Streaming 2.0
Continuous Application with Structured Streaming 2.0Anyscale
 
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...confluent
 
Kakfa summit london 2019 - the art of the event-streaming app
Kakfa summit london 2019 - the art of the event-streaming appKakfa summit london 2019 - the art of the event-streaming app
Kakfa summit london 2019 - the art of the event-streaming appNeil Avery
 
Kafka summit SF 2019 - the art of the event-streaming app
Kafka summit SF 2019 - the art of the event-streaming appKafka summit SF 2019 - the art of the event-streaming app
Kafka summit SF 2019 - the art of the event-streaming appNeil Avery
 
The art of the event streaming application: streams, stream processors and sc...
The art of the event streaming application: streams, stream processors and sc...The art of the event streaming application: streams, stream processors and sc...
The art of the event streaming application: streams, stream processors and sc...confluent
 
Big Data-Driven Applications with Cassandra and Spark
Big Data-Driven Applications  with Cassandra and SparkBig Data-Driven Applications  with Cassandra and Spark
Big Data-Driven Applications with Cassandra and SparkArtem Chebotko
 
Yogesh kumar kushwah represent’s
Yogesh kumar kushwah represent’sYogesh kumar kushwah represent’s
Yogesh kumar kushwah represent’sYogesh Kushwah
 

Similaire à Apache Incubator Samza: Stream Processing at LinkedIn (20)

LinkedIn-Teradata Summit feb 25, 2015
LinkedIn-Teradata Summit feb 25, 2015LinkedIn-Teradata Summit feb 25, 2015
LinkedIn-Teradata Summit feb 25, 2015
 
Samza tech talk_2015 - huawei
Samza tech talk_2015 - huaweiSamza tech talk_2015 - huawei
Samza tech talk_2015 - huawei
 
#SUGCON 2015 Sitecore Monitoring
#SUGCON 2015 Sitecore Monitoring#SUGCON 2015 Sitecore Monitoring
#SUGCON 2015 Sitecore Monitoring
 
Samza 0.13 meetup slide v1.0.pptx
Samza 0.13 meetup slide   v1.0.pptxSamza 0.13 meetup slide   v1.0.pptx
Samza 0.13 meetup slide v1.0.pptx
 
Delta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the HoodDelta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the Hood
 
Samza at LinkedIn
Samza at LinkedInSamza at LinkedIn
Samza at LinkedIn
 
stream-processing-at-linkedin-with-apache-samza
stream-processing-at-linkedin-with-apache-samzastream-processing-at-linkedin-with-apache-samza
stream-processing-at-linkedin-with-apache-samza
 
Monitoring und Metriken im Wunderland
Monitoring und Metriken im WunderlandMonitoring und Metriken im Wunderland
Monitoring und Metriken im Wunderland
 
1404 app dev series - session 8 - monitoring & performance tuning
1404   app dev series - session 8 - monitoring & performance tuning1404   app dev series - session 8 - monitoring & performance tuning
1404 app dev series - session 8 - monitoring & performance tuning
 
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingCloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
 
SVR17: Data-Intensive Computing on Windows HPC Server with the ...
SVR17: Data-Intensive Computing on Windows HPC Server with the ...SVR17: Data-Intensive Computing on Windows HPC Server with the ...
SVR17: Data-Intensive Computing on Windows HPC Server with the ...
 
SVR17: Data-Intensive Computing on Windows HPC Server with the ...
SVR17: Data-Intensive Computing on Windows HPC Server with the ...SVR17: Data-Intensive Computing on Windows HPC Server with the ...
SVR17: Data-Intensive Computing on Windows HPC Server with the ...
 
Stream processing with Apache Flink - Maximilian Michels Data Artisans
Stream processing with Apache Flink - Maximilian Michels Data ArtisansStream processing with Apache Flink - Maximilian Michels Data Artisans
Stream processing with Apache Flink - Maximilian Michels Data Artisans
 
Continuous Application with Structured Streaming 2.0
Continuous Application with Structured Streaming 2.0Continuous Application with Structured Streaming 2.0
Continuous Application with Structured Streaming 2.0
 
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
 
Kakfa summit london 2019 - the art of the event-streaming app
Kakfa summit london 2019 - the art of the event-streaming appKakfa summit london 2019 - the art of the event-streaming app
Kakfa summit london 2019 - the art of the event-streaming app
 
Kafka summit SF 2019 - the art of the event-streaming app
Kafka summit SF 2019 - the art of the event-streaming appKafka summit SF 2019 - the art of the event-streaming app
Kafka summit SF 2019 - the art of the event-streaming app
 
The art of the event streaming application: streams, stream processors and sc...
The art of the event streaming application: streams, stream processors and sc...The art of the event streaming application: streams, stream processors and sc...
The art of the event streaming application: streams, stream processors and sc...
 
Big Data-Driven Applications with Cassandra and Spark
Big Data-Driven Applications  with Cassandra and SparkBig Data-Driven Applications  with Cassandra and Spark
Big Data-Driven Applications with Cassandra and Spark
 
Yogesh kumar kushwah represent’s
Yogesh kumar kushwah represent’sYogesh kumar kushwah represent’s
Yogesh kumar kushwah represent’s
 

Dernier

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 

Dernier (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 

Apache Incubator Samza: Stream Processing at LinkedIn

Notes de l'éditeur

  1. - stream processing for us = anything asynchronous, but not batch computed.- 25% of code is async. 50% is rpc/online. 25% is batch.- stream processing is worst supported.
  2. - stream processing for us = anything asynchronous, but not batch computed.- 25% of code is async. 50% is rpc/online. 25% is batch.- stream processing is worst supported.
  3. - stream processing for us = anything asynchronous, but not batch computed.- 25% of code is async. 50% is rpc/online. 25% is batch.- stream processing is worst supported.
  4. - stream processing for us = anything asynchronous, but not batch computed.- 25% of code is async. 50% is rpc/online. 25% is batch.- stream processing is worst supported.
  5. - stream processing for us = anything asynchronous, but not batch computed.- 25% of code is async. 50% is rpc/online. 25% is batch.- stream processing is worst supported.
  6. - compute top shares, pull in, scrape, entity tag- language detection- send emails: friend was in the news- requirement: has to be fast, since news is trendy
  7. - relevance pipeline
  8. - we send relatively data rich emails- some emails are time sensitive (need to be sent soon)
  9. - time sensitive- data ingestion pattern- other systems that follow this pattern: realtimeolap system, and social graph system
  10. - ecosystem at LinkedIn (some unique traits)- hard unsolved problems in this space
  11. - oncewe had all this data in kafka, we wanted to do stuff with it.- persistent,reliable,distributed,message queue- Kafka = first among equals, but stream systems are pluggable. Just like Hadoop with HDSF vs. S3.
  12. - started with just simple web service that consumes and produces kafka messages.- realized that there are a lot of hard problems that needed to be solved.- reprocessing: what if my algorithm changes and I need to reprocess all events?- non-determinism: queries to external systems, time dependencies, ordering of messages.
  13. - open area of research- been around for 20 years
  14. partitioned
  15. re-playableorderedfault tolerantinfinitevery heavyweight definition of a stream (vs. s4, storm, etc)
  16. At least once messaging. Duplicates are possible.Future: exact semantics.Transparent to user. No ack’ing API.
  17. connected by stream name onlyfully buffered
  18. - group by, sum, count
  19. - stream to stream, stream to table, table to table
  20. - buffered sorting
  21. UDP is an over-optimization, since most processors try to remote join, which is very slow.
  22. Changelog/redologState machine model
  23. Can also consume these streams from other jobs.
  24. - can’t keep messages forever. - log compaction: delete over-written keys over time.
  25. - can’t keep messages forever. - log compaction: delete over-written keys over time.
  26. storeAPI is pluggable: Lucene, buffered sort, external sort, bitmap index, bloom filters and sketches