SlideShare une entreprise Scribd logo
1  sur  31
Stream Processing Advanced
with Flink and Pulsar
Till Rohrmann, Engineering Lead at Ververica, @stsffap
Addison Higham, Chief Architect at StreamNative, @addisonjh
Why is Stream Processing Important?
Spectrum of data streaming applications
offline real time
Data warehousing
OLAP / BI / reporting
Machine learning & model
training
Continuous
ETL
Unified offline /
real-time analytics
Continuous
monitoring
(position, risk, …)
Real-time behaviour
modeling (pricing,
recommenders, …)
Real-time alerts
(fraud, security, …)
Real-time ML
model
training/evaluation
Distributed OLTP
applications
Choice of data architecture defines which uses cases you can cover!
Modern Streaming Data Architecture
Stream
Processing
Stream
Processing
Stream
Processing
Stream Storage
Long Term Storage
Results / Views
Triggered
Applications
Event Producers
Reference Streaming Data Architecture: Flink + Pulsar
Stream Storage
Long Term Storage
Results / Views
Stateful
Functions
Event Producers
Apache Flink: Analytics and Applications on Streaming Data
Flink Runtime
Stateful Computations over Data Streams
Stateful Stream
Processing
Streams, State, Time
Streaming Analytics
SQL and Tables
Event-driven
Applications
Stateful Functions
Apache Flink in Numbers
● Contributors: 887
● Most active Apache ML
● Most active sources by visits and commits (#2)
● Github stars: 16.4k
● Commits: >27k
● Releases: 1.13 latest major release
● Maven downloads per month: ~ 170k
● LOC: > 1.8 million
● Latency: < 1s
● Throughput: 4 billion events/s, 7 TB/s
● Input size: 100 TB for batch job
Some Apache Flink Users
One of The Most Advanced Stream Processors
● First class support for state
○ Asynchronous barrier checkpointing algorithm to create globally consistent checkpoints
● Event-time support
○ Correctness under delayed events
● End-to-end exactly once processing guarantees
○ Correctness under faults
● Resource elastic
○ Flink applications’ resources can be adjusted to the actual need
● Unified batch-streaming APIs
○ A single query to process historical as well as live data
● Stream: Sequence of data which is made available over time
● All computation processes chunks of data over time producing results over
time → Stream processing
○ E.g. reading data from disks is done in streaming fashion
● Events can be of various forms
● Decisive difference: Is my stream bounded or not?
Everything Is a Stream
SQL / Table API: Unified Batch & Stream Processing
SQL Query
Batch query
execution
SELECT
room,
TUMBLE_END(rowtime, INTERVAL ‘1’ HOUR),
AVG(temperature)
FROM
sensors
GROUP BY
TUMBLE(rowtime, INTERVAL ‘1’ HOUR),
room
Stream-Table Duality
User lastLogin
Alice 2021-01-01
Bob 2021-01-02
User lastLogin
Alice 2021-01-14
Bob 2021-02-01
Eve 2021-01-21
Alice, 2021-01-01
Bob, 2021-01-02
Alice, 2021-01-14
Eve, 2021-01-21
Bob, 2021-02-01
SQL / Table API: Running The Same Query On Streams
SQL Query
Incremental
query execution
SELECT
room,
TUMBLE_END(rowtime, INTERVAL ‘1’ HOUR),
AVG(temperature)
FROM
sensors
GROUP BY
TUMBLE(rowtime, INTERVAL ‘1’ HOUR),
room
Interpret stream as
table
Data Has to Come From Somewhere
● Flink is a powerful compute engine for unified batch and streaming
processing that is pushing boundaries of data processing
● For Flink really to shine, a storage system that supports efficient stream and
batch ingestion is required
Pulsar offers exactly that!
● Apache Pulsar is evolving to meet the needs of a system like Flink, offering
both a low-latency, high bandwidth streaming API as well as a flexible
architecture to support batch access
Why is Pulsar a Great Fit For Streaming And Batch?
Streaming
● Like other streaming systems,
Pulsar can be used like a
distributed scale-out log, providing
consumer controlled offsets and
high throughput through parallel
partitions of topics
Batch
● In addition, Pulsar’s segment-
based, multi-layer architecture
allows for applications to access
historical data more directly via
the underlying storage
Producer Consumer
Broker 2
Broker 1 Broker 3
Topic1-Part1 Topic1-Part2 Topic1-Part3
Segment 2
Segment 1 Segment 3 Segment X
Bookie
1
Bookie
2
Bookie
3
Segment 1
Segment X
Segment 1
Segment X
Segment 1
Segment X
Offload
Segments
+
Flink + Pulsar
Adoption at Scale
While Flink and Pulsar are both widely used independently, together they offer a
strong technology for unified batch and stream storage and compute
Many large organizations are adopting Flink + Pulsar for complex workloads that
can take advantage of the strengths of both systems
State of The Pulsar-Flink Connector Today
● Pulsar-Flink connector supports Flink 1.11, 1.12 and soon 1.13
● Exactly once source and sink via producer deduplication
● Full support for Flink Schema with integration to Pulsar Schema Registry
● Full Flink SQL support for batch and streaming modes
○ Also support for an “upsert” mode which can reason about inserts, updates, and deletes
● Support for KeyShared subscriptions for higher source parallelism than
number of topic-partitions
● https://github.com/streamnative/pulsar-flink
+
Demo
The Road Ahead
Many in-progress features are being developed
● Pulsar-Flink connector being upstreamed into Flink
○ Targeted for Flink 1.14
● Pulsar Transaction Support
○ Pulsar transaction’s allow for stronger exactly-once processing guarantees
● Native Pulsar Watermarking
○ Pulsar is able to broker watermarks between producers and consumers
● Parallel Batch Source
○ Query multiple segments in parallel for higher throughput
● Unified Batch + Streaming Source
○ Using Pulsar’s batch mode for catch up and then switching over to streaming ingestion
Job 1
Flink Support For Pulsar Transactions
Pulsar transactions are GA in Pulsar 2.8.0
Support in Flink is nearing completion, which can allows for end-to-end exactly-
once processing guarantees!
Learn more about transactions at the talk Exactly-Once Made Easy: Transactional Messaging in
Apache Pulsar at 11:30 AM
Job 2
Job 3
tx
The Importance of Watermarks
High-quality watermarks are crucial for correct and
stable stream processing jobs.
In order for results to be correct, we need to take into
account “Event Time” rather than solely relying on
“Processing Time”
This is especially important when replaying or
processing older streaming data
“Watermarks” advance the event-time clock, if the clock
does not “tick” accurately, the results will not be correct
Ideal
Reality
Skew
Event Time
Native Pulsar Watermarking
Pulsar is adding support for watermarks
● Producers have a new API for injecting
watermarks into the stream, with
multiple producers potentially producing
● Watermarks are broadcast across all
partitions, which allows the broker to
dispatch more accurate watermarks to
consumers, in both realtime and
historical scenarios
● The API is designed to be simple to
integrate with Flink, removing the
complexity of accurate watermark
generation from the developer
// Pulsar
for (int = 0; i < NUM_MESSAGES; i++) {
producer.newMessage()
.value("hello with event time! " + i)
.eventTime(
System.currentTimeMillis()).sendAsync();
}
Producer
.newWatermark()
.eventTime(
System.currentTimeMillis()).sendAsync();
// Flink
PulsarSource<String> pulsarSource = new
FlinkPulsarSource(topic, ...);
pulsarSource.assignTimestampsAndWatermarks(
PulsarWatermarkStrategy.forPulsarWatermarks());
Shared Event-Time Domain
End-to-End Watermarks
Job 1
Job 2
Job 3
Producer
W|24
W|17 W|12
W|12
W|8
W|4
Pulsar + Flink and StreamNative + Ververica
Pulsar + Flink community collaboration
● Contributing Pulsar-Flink connector to Flink repository
● Evolve connector to support more advanced Pulsar and Flink features
● Build best in the class open source stream processing platform
StreamNative + Ververica Cloud partnership
● Help customers to unlock the full potential of stream processing
Thanks a lot for your attention!
Questions?

Contenu connexe

Tendances

Tag based policies using Apache Atlas and Ranger
Tag based policies using Apache Atlas and RangerTag based policies using Apache Atlas and Ranger
Tag based policies using Apache Atlas and RangerVimal Sharma
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database Systemconfluent
 
Splunk: Druid on Kubernetes with Druid-operator
Splunk: Druid on Kubernetes with Druid-operatorSplunk: Druid on Kubernetes with Druid-operator
Splunk: Druid on Kubernetes with Druid-operatorImply
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeFlink Forward
 
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018Seunghyun Lee
 
Securing your Pulsar Cluster with Vault_Chris Kellogg
Securing your Pulsar Cluster with Vault_Chris KelloggSecuring your Pulsar Cluster with Vault_Chris Kellogg
Securing your Pulsar Cluster with Vault_Chris KelloggStreamNative
 
Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)KafkaZone
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Databricks
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Flink Forward
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanVerverica
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!Flink Forward
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Getting The Best Performance With PySpark
Getting The Best Performance With PySparkGetting The Best Performance With PySpark
Getting The Best Performance With PySparkSpark Summit
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewenconfluent
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache FlinkUnifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache FlinkDataWorks Summit/Hadoop Summit
 

Tendances (20)

Tag based policies using Apache Atlas and Ranger
Tag based policies using Apache Atlas and RangerTag based policies using Apache Atlas and Ranger
Tag based policies using Apache Atlas and Ranger
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database System
 
Splunk: Druid on Kubernetes with Druid-operator
Splunk: Druid on Kubernetes with Druid-operatorSplunk: Druid on Kubernetes with Druid-operator
Splunk: Druid on Kubernetes with Druid-operator
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
 
Securing your Pulsar Cluster with Vault_Chris Kellogg
Securing your Pulsar Cluster with Vault_Chris KelloggSecuring your Pulsar Cluster with Vault_Chris Kellogg
Securing your Pulsar Cluster with Vault_Chris Kellogg
 
Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Getting The Best Performance With PySpark
Getting The Best Performance With PySparkGetting The Best Performance With PySpark
Getting The Best Performance With PySpark
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache FlinkUnifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
 

Similaire à Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote

Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureTimothy Spann
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasMonal Daxini
 
K. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward KeynoteK. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward KeynoteFlink Forward
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkFabian Hueske
 
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward
 
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck - Pravega: Storage Rei...
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck -  Pravega: Storage Rei...Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck -  Pravega: Storage Rei...
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck - Pravega: Storage Rei...Flink Forward
 
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017Thomas Weise
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017Apache Apex
 
Serverless London 2019 FaaS composition using Kafka and CloudEvents
Serverless London 2019   FaaS composition using Kafka and CloudEventsServerless London 2019   FaaS composition using Kafka and CloudEvents
Serverless London 2019 FaaS composition using Kafka and CloudEventsNeil Avery
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisHelena Edelson
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQLWSO2
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
The Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data ProblemsThe Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data ProblemsMonal Daxini
 
DBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesDBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesTimothy Spann
 
Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022StreamNative
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeTimothy Spann
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataDataWorks Summit/Hadoop Summit
 
Seattle spark-meetup-032317
Seattle spark-meetup-032317Seattle spark-meetup-032317
Seattle spark-meetup-032317Nan Zhu
 

Similaire à Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote (20)

Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azure
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paas
 
K. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward KeynoteK. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward Keynote
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
 
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
 
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck - Pravega: Storage Rei...
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck -  Pravega: Storage Rei...Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck -  Pravega: Storage Rei...
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck - Pravega: Storage Rei...
 
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017
 
Serverless London 2019 FaaS composition using Kafka and CloudEvents
Serverless London 2019   FaaS composition using Kafka and CloudEventsServerless London 2019   FaaS composition using Kafka and CloudEvents
Serverless London 2019 FaaS composition using Kafka and CloudEvents
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
FLiP Into Trino
FLiP Into TrinoFLiP Into Trino
FLiP Into Trino
 
The Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data ProblemsThe Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data Problems
 
DBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesDBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data Lakes
 
Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing data
 
Seattle spark-meetup-032317
Seattle spark-meetup-032317Seattle spark-meetup-032317
Seattle spark-meetup-032317
 

Plus de StreamNative

Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022StreamNative
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...StreamNative
 
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...StreamNative
 
Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...StreamNative
 
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022StreamNative
 
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022StreamNative
 
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...StreamNative
 
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...StreamNative
 
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022StreamNative
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...StreamNative
 
Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022StreamNative
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...StreamNative
 
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022StreamNative
 
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022StreamNative
 
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022StreamNative
 
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022StreamNative
 
Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022StreamNative
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...StreamNative
 
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...StreamNative
 
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021StreamNative
 

Plus de StreamNative (20)

Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
 
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
 
Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...
 
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
 
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
 
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
 
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
 
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
 
Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
 
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
 
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
 
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
 
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
 
Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
 
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
 
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021
 

Dernier

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 

Dernier (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 

Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote

  • 1. Stream Processing Advanced with Flink and Pulsar Till Rohrmann, Engineering Lead at Ververica, @stsffap Addison Higham, Chief Architect at StreamNative, @addisonjh
  • 2. Why is Stream Processing Important?
  • 3.
  • 4.
  • 5. Spectrum of data streaming applications offline real time Data warehousing OLAP / BI / reporting Machine learning & model training Continuous ETL Unified offline / real-time analytics Continuous monitoring (position, risk, …) Real-time behaviour modeling (pricing, recommenders, …) Real-time alerts (fraud, security, …) Real-time ML model training/evaluation Distributed OLTP applications Choice of data architecture defines which uses cases you can cover!
  • 6. Modern Streaming Data Architecture Stream Processing Stream Processing Stream Processing Stream Storage Long Term Storage Results / Views Triggered Applications Event Producers
  • 7. Reference Streaming Data Architecture: Flink + Pulsar Stream Storage Long Term Storage Results / Views Stateful Functions Event Producers
  • 8.
  • 9. Apache Flink: Analytics and Applications on Streaming Data Flink Runtime Stateful Computations over Data Streams Stateful Stream Processing Streams, State, Time Streaming Analytics SQL and Tables Event-driven Applications Stateful Functions
  • 10. Apache Flink in Numbers ● Contributors: 887 ● Most active Apache ML ● Most active sources by visits and commits (#2) ● Github stars: 16.4k ● Commits: >27k ● Releases: 1.13 latest major release ● Maven downloads per month: ~ 170k ● LOC: > 1.8 million ● Latency: < 1s ● Throughput: 4 billion events/s, 7 TB/s ● Input size: 100 TB for batch job
  • 12. One of The Most Advanced Stream Processors ● First class support for state ○ Asynchronous barrier checkpointing algorithm to create globally consistent checkpoints ● Event-time support ○ Correctness under delayed events ● End-to-end exactly once processing guarantees ○ Correctness under faults ● Resource elastic ○ Flink applications’ resources can be adjusted to the actual need ● Unified batch-streaming APIs ○ A single query to process historical as well as live data
  • 13. ● Stream: Sequence of data which is made available over time ● All computation processes chunks of data over time producing results over time → Stream processing ○ E.g. reading data from disks is done in streaming fashion ● Events can be of various forms ● Decisive difference: Is my stream bounded or not? Everything Is a Stream
  • 14. SQL / Table API: Unified Batch & Stream Processing SQL Query Batch query execution SELECT room, TUMBLE_END(rowtime, INTERVAL ‘1’ HOUR), AVG(temperature) FROM sensors GROUP BY TUMBLE(rowtime, INTERVAL ‘1’ HOUR), room
  • 15. Stream-Table Duality User lastLogin Alice 2021-01-01 Bob 2021-01-02 User lastLogin Alice 2021-01-14 Bob 2021-02-01 Eve 2021-01-21 Alice, 2021-01-01 Bob, 2021-01-02 Alice, 2021-01-14 Eve, 2021-01-21 Bob, 2021-02-01
  • 16. SQL / Table API: Running The Same Query On Streams SQL Query Incremental query execution SELECT room, TUMBLE_END(rowtime, INTERVAL ‘1’ HOUR), AVG(temperature) FROM sensors GROUP BY TUMBLE(rowtime, INTERVAL ‘1’ HOUR), room Interpret stream as table
  • 17.
  • 18. Data Has to Come From Somewhere ● Flink is a powerful compute engine for unified batch and streaming processing that is pushing boundaries of data processing ● For Flink really to shine, a storage system that supports efficient stream and batch ingestion is required Pulsar offers exactly that! ● Apache Pulsar is evolving to meet the needs of a system like Flink, offering both a low-latency, high bandwidth streaming API as well as a flexible architecture to support batch access
  • 19. Why is Pulsar a Great Fit For Streaming And Batch? Streaming ● Like other streaming systems, Pulsar can be used like a distributed scale-out log, providing consumer controlled offsets and high throughput through parallel partitions of topics Batch ● In addition, Pulsar’s segment- based, multi-layer architecture allows for applications to access historical data more directly via the underlying storage Producer Consumer Broker 2 Broker 1 Broker 3 Topic1-Part1 Topic1-Part2 Topic1-Part3 Segment 2 Segment 1 Segment 3 Segment X Bookie 1 Bookie 2 Bookie 3 Segment 1 Segment X Segment 1 Segment X Segment 1 Segment X Offload Segments
  • 20. +
  • 22. Adoption at Scale While Flink and Pulsar are both widely used independently, together they offer a strong technology for unified batch and stream storage and compute Many large organizations are adopting Flink + Pulsar for complex workloads that can take advantage of the strengths of both systems
  • 23. State of The Pulsar-Flink Connector Today ● Pulsar-Flink connector supports Flink 1.11, 1.12 and soon 1.13 ● Exactly once source and sink via producer deduplication ● Full support for Flink Schema with integration to Pulsar Schema Registry ● Full Flink SQL support for batch and streaming modes ○ Also support for an “upsert” mode which can reason about inserts, updates, and deletes ● Support for KeyShared subscriptions for higher source parallelism than number of topic-partitions ● https://github.com/streamnative/pulsar-flink
  • 25. The Road Ahead Many in-progress features are being developed ● Pulsar-Flink connector being upstreamed into Flink ○ Targeted for Flink 1.14 ● Pulsar Transaction Support ○ Pulsar transaction’s allow for stronger exactly-once processing guarantees ● Native Pulsar Watermarking ○ Pulsar is able to broker watermarks between producers and consumers ● Parallel Batch Source ○ Query multiple segments in parallel for higher throughput ● Unified Batch + Streaming Source ○ Using Pulsar’s batch mode for catch up and then switching over to streaming ingestion
  • 26. Job 1 Flink Support For Pulsar Transactions Pulsar transactions are GA in Pulsar 2.8.0 Support in Flink is nearing completion, which can allows for end-to-end exactly- once processing guarantees! Learn more about transactions at the talk Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar at 11:30 AM Job 2 Job 3 tx
  • 27. The Importance of Watermarks High-quality watermarks are crucial for correct and stable stream processing jobs. In order for results to be correct, we need to take into account “Event Time” rather than solely relying on “Processing Time” This is especially important when replaying or processing older streaming data “Watermarks” advance the event-time clock, if the clock does not “tick” accurately, the results will not be correct Ideal Reality Skew Event Time
  • 28. Native Pulsar Watermarking Pulsar is adding support for watermarks ● Producers have a new API for injecting watermarks into the stream, with multiple producers potentially producing ● Watermarks are broadcast across all partitions, which allows the broker to dispatch more accurate watermarks to consumers, in both realtime and historical scenarios ● The API is designed to be simple to integrate with Flink, removing the complexity of accurate watermark generation from the developer // Pulsar for (int = 0; i < NUM_MESSAGES; i++) { producer.newMessage() .value("hello with event time! " + i) .eventTime( System.currentTimeMillis()).sendAsync(); } Producer .newWatermark() .eventTime( System.currentTimeMillis()).sendAsync(); // Flink PulsarSource<String> pulsarSource = new FlinkPulsarSource(topic, ...); pulsarSource.assignTimestampsAndWatermarks( PulsarWatermarkStrategy.forPulsarWatermarks());
  • 29. Shared Event-Time Domain End-to-End Watermarks Job 1 Job 2 Job 3 Producer W|24 W|17 W|12 W|12 W|8 W|4
  • 30. Pulsar + Flink and StreamNative + Ververica Pulsar + Flink community collaboration ● Contributing Pulsar-Flink connector to Flink repository ● Evolve connector to support more advanced Pulsar and Flink features ● Build best in the class open source stream processing platform StreamNative + Ververica Cloud partnership ● Help customers to unlock the full potential of stream processing
  • 31. Thanks a lot for your attention! Questions?

Notes de l'éditeur

  1. Faster results → faster business decisions → competitive advantage → more money
  2. Generalization of batch processing Processing of historical data Processing of live data Processing both historical data + live data
  3. Brings together the worlds of data analytics and event-driven applications Building powerful applications which benefit from real time analytics (e.g. fraud detection, match-making, etc.)
  4. Real time search and recommendation models (e.g., Alibaba) Build a real-time session behavior profile of users (e.g., Netflix) Real time trade settlement dashboard (e.g., UBS) Real time revenue accounting (various AdTechs) Machine Learning-based anomaly/fraud detection (e.g., ING, Microsoft) Real-time data refinement and data pipelines (many) Realtime Analytics Platforms (e.g., Alibaba, Uber, Lyft, Yelp!, Tencent) Materializing Views (dashboards, data marts) ETL - batch and continuous Machine Learning Training (Alibaba, new ML library) Sub-second latency Peak throughput: 4B Event/s Throughput size: 7 TB/s Largest Batch Job Input size: 100TB Flink cluster size: 30K
  5. If everything is a stream, can I treat it with a single API? TableAPI, SQL and DataStream can handle bounded (batch) and unbounded streams Write the program once → Run it on live and historic data Lower development and maintenance costs Having a single engine for batch and stream processing Do batch processing when catching up with the stream One engine to run batch & streaming workloads Scheduler which can schedule bounded and unbounded tasks Exploiting bounded streams property for runtime optimizations More batching of results Out of order processing Stage wise execution possible
  6. In this section, I will give a brief overview of watermarks and discuss correctness and skew a bit
  7. Watermarks are assertions about the progression of event time, according to the data producer. Pulsar is brokering the assertions from producer to consumer. The quality of the watermarks (and hence the job output) is determined by the producer; system internals (partitioning, I/O performance, etc) are effectively factored out.