SlideShare une entreprise Scribd logo
1  sur  30
Sink Your Teeth
into Streaming
at Any Scale
David Kjerrumgaard, Developer Advocate at Stream Native
Tim Spann, Developer Advocate at Stream Native
David Kjerrumgaard
■ Apache Pulsar Committer | Author of Pulsar In Action
■ Former Principal Software Engineer on Splunk’s
messaging team that is responsible for Splunk’s internal
Pulsar-as-a-Service platform.
■ Former Director of Solution Architecture at Streamlio.
Your photo
goes here,
smile :)
Tim Spann
■ FLiP(N) Stack = Flink, Pulsar and NiFi Stack
■ Streaming Systems & Data Architecture Expert
■ DZone Iot and ML Top Expert
■ 15+ years of experience with streaming technologies including Pulsar,
Flink, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more.
■ Today, he helps to grow the Pulsar community sharing rich technical
knowledge and experience at both global conferences and through
individual conversations.
■ The Team to Stream
■ What is Apache Pulsar?
■ Join The Streams with Flink SQL
■ Fast Sink to ScyllaDB
■ Reference Architecture
■ Demo
Presentation Agenda
Stream Team
Apps
Building Real-Time Requires a Team
Apache Pulsar
Apache Pulsar is a Cloud-Native Messaging and
Event-Streaming Platform.
CREATED
Originally developed
inside Yahoo! as
Cloud Messaging
Service
GROWTH
10x Contributors
10MM+ Downloads
Ecosystem Expands
Kafka on Pulsar
AMQ on Pulsar
Functions
. . .
2012 2016 2018 TODAY
APACHE TLP
Pulsar becomes
Apache top level
project.
OPEN SOURCE
Pulsar committed
to open source.
Apache Pulsar Timeline
Pulsar Growth
Contributors
11X Growth
Github Stars
5X Growth
10,000,000+
Docker Images
Downloads
Streaming
Consumer
Consumer
Consumer
Subscription
Shared
Failover
Consumer
Consumer
Subscription
In case of failure in
Consumer B-0
Consumer
Consumer
Subscription
Exclusive
X
Consumer
Consumer
Key-Shared
Subscription
Pulsar
Topic/Partition
Messaging
Multi-Tenancy Model
Tenants
(Data Services)
Namespace
(Microservices)
Pulsar Cluster
Tenants
(Marketing)
Tenants
(Compliance)
Namespace
(ETL)
Namespace
(Campaigns)
Namespace
(ETL)
Namespace
(Risk Assessment)
Topic-1
(Cust Auth)
Topic-1
(Location Resolution)
Topic-2
(Demographics)
Topic-1
(Budgeted Spend)
Topic-1
(Acct History)
Topic-2
(Risk Detection)
Integrated Schema Registry
Schema Registry
schema-1 (value=Avro/Protobuf/JSON) schema-2
(value=Avro/Protobuf/JSON)
schema-3
(value=Avro/Protobuf/JSON)
Schema
Data
ID
Local Cache
for Schemas
+
Schema
Data
ID +
Local Cache
for Schemas
Send schema-1
(value=Avro/Protobuf/JSON) data
serialized per schema ID
Send (register)
schema (if not in
local cache)
Read schema-1
(value=Avro/Protobuf/JSON) data
deserialized per schema ID
Get schema by ID (if
not in local cache)
Producers Consumers
Automatic Load Balancing
Pulsar supports automatic load shedding whenever the system recognizes a
particular broker is overloaded, the system forces some traffic to be reassigned
to less-loaded brokers.
Key Features Apache Pulsar Apache Kafka
Message retention (time-based) ✔ ✔
Message replay ✔ ✔
Message retention (acknowledge-based) ✔ ✘
Built-in tiered storage ✔ ✘
Processing capabilities (fully managed) Pulsar Functions KStream
Queue semantics (round robin) ✔ ✘
Queue semantics (key based) ✔ ✘
Dead letter queue ✔ ✘
Key Features Apache Pulsar Apache Kafka
Scheduled and delayed delivery ✔ ✔
Rebalance-free scaling ✔ ✔
Elastically scalable ✔ ✘
Maximum number of topics Millions Up to 100k
Built-in multi-tenancy ✔ ✘
Built-in geo-replication ✔ ✘
Built-in schema management ✔ ✘
End-to-end encryption ✔ ✘
Flink SQL
Pulsar-Flink Connector
■ Pulsar provides a connector that enables Flink to read & write data directly from/to Pulsar
topics.
■ This makes Pulsar Topic data available for Flink SQL queries.
■ https://hub.streamnative.io/data-processing/pulsar-flink/1.15.1.1/
Flink SQL
Apache Flink
■ Unified computing engine
■ Batch processing is a special case
of stream processing
■ Stateful processing
■ Massive Scalability
■ Flink SQL for queries, inserts
against Pulsar Topics
■ Streaming Analytics
■ Continuous SQL
■ Continuous ETL
■ Complex Event Processing
■ Standard SQL Powered by Apache
Calcite
Apache Flink
Streaming Tables
■ A streaming table represents an unbounded sequence of structured data, i.e.
“facts”
■ Facts in a streaming table are immutable, which means new events can be
inserted into a table, but existing events can never be updated or deleted.
■ All the topics within a Pulsar namespace will automatically be mapped to
streaming tables in a catalog configured to use a pulsar connector.
■ Streaming tables can also be created or deleted via DDL queries, where the
underlying Pulsar topics will be created or deleted.
Sink to ScyllaDB
Data Offloaders
(Tiered Storage)
Client Libraries
StreamNative Pulsar ecosystem
hub.streamnative.io
Connectors
(Sources & Sinks)
Protocol Handlers
Pulsar Functions
(Lightweight Stream
Processing)
Processing Engines
… and more!
… and more!
ScyllaDB Compatible Sink Connector
pulsar.apache.org/docs/en/io-quickstart/
ScyllaDB Compatible Sink
David’s ScyllaDB Sink Update
Connector automatically interrogates the database to
determine the schema type definition at runtime.
Includes a framework that extracts the values from the generic
schema types (GenericRecord, and JSON String)
Performance improvements & Bug Fixes
PR: https://github.com/apache/pulsar/pull/16179
Building Real-Time Together
Easily integrate ScyllaDB’s fast distributed database system and the StreamNative streaming
platform with Pulsar sources and sinks via various protocols and libraries.
Create high-speed streaming pipelines for log analysis, IoT sensing, and more.
Reference Architecture
Reference Architecture
Microservices
ETL
Reference Architecture
ScyllaDB <-> Apache Pulsar -> Flink SQL -> Apache Pulsar -> Apache Pinot
Flink SQL for Continuous Analytics, Ingest, Real-time Joins, Real-Time Analytics
Apache Pulsar for routing, transformation, central messaging hub, data channels
ScyllaDB for instant applications and massive data feeds
Apache Pinot for low latency instant result queries
Thank You
Stay in Touch
Timothy Spann
tim@streamnative.io
@PaaSDev
https://github.com/tspannhw
https://www.linkedin.com/in/timothyspann/
David Kjerrumgaard
david@streamnative.io
@DavidKjerrumga1
https://github.com/david-streamlio
https://www.linkedin.com/in/davidkj/

Contenu connexe

Similaire à Sink Your Teeth into Streaming at Any Scale

[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lake[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lake
Timothy Spann
 
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Timothy Spann
 

Similaire à Sink Your Teeth into Streaming at Any Scale (20)

Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
 
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
 
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
 
The Next Generation of Streaming
The Next Generation of StreamingThe Next Generation of Streaming
The Next Generation of Streaming
 
[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lake[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lake
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
 
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Big mountain data and dev conference   apache pulsar with mqtt for edge compu...Big mountain data and dev conference   apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
 
Ingesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmedIngesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmed
 
bigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar Appsbigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar Apps
 
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
 
Timothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for MLTimothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for ML
 
Jug - ecosystem
Jug -  ecosystemJug -  ecosystem
Jug - ecosystem
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka Meetup
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Real time cloud native open source streaming of any data to apache solr
Real time cloud native open source streaming of any data to apache solrReal time cloud native open source streaming of any data to apache solr
Real time cloud native open source streaming of any data to apache solr
 
Chti jug - 2018-06-26
Chti jug - 2018-06-26Chti jug - 2018-06-26
Chti jug - 2018-06-26
 
Headaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous ApplicationsHeadaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous Applications
 
Data science online camp using the flipn stack for edge ai (flink, nifi, pu...
Data science online camp   using the flipn stack for edge ai (flink, nifi, pu...Data science online camp   using the flipn stack for edge ai (flink, nifi, pu...
Data science online camp using the flipn stack for edge ai (flink, nifi, pu...
 

Plus de ScyllaDB

Plus de ScyllaDB (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual Workshop
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & Tradeoffs
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Sink Your Teeth into Streaming at Any Scale

  • 1. Sink Your Teeth into Streaming at Any Scale David Kjerrumgaard, Developer Advocate at Stream Native Tim Spann, Developer Advocate at Stream Native
  • 2. David Kjerrumgaard ■ Apache Pulsar Committer | Author of Pulsar In Action ■ Former Principal Software Engineer on Splunk’s messaging team that is responsible for Splunk’s internal Pulsar-as-a-Service platform. ■ Former Director of Solution Architecture at Streamlio. Your photo goes here, smile :)
  • 3. Tim Spann ■ FLiP(N) Stack = Flink, Pulsar and NiFi Stack ■ Streaming Systems & Data Architecture Expert ■ DZone Iot and ML Top Expert ■ 15+ years of experience with streaming technologies including Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more. ■ Today, he helps to grow the Pulsar community sharing rich technical knowledge and experience at both global conferences and through individual conversations.
  • 4. ■ The Team to Stream ■ What is Apache Pulsar? ■ Join The Streams with Flink SQL ■ Fast Sink to ScyllaDB ■ Reference Architecture ■ Demo Presentation Agenda
  • 8. Apache Pulsar is a Cloud-Native Messaging and Event-Streaming Platform.
  • 9. CREATED Originally developed inside Yahoo! as Cloud Messaging Service GROWTH 10x Contributors 10MM+ Downloads Ecosystem Expands Kafka on Pulsar AMQ on Pulsar Functions . . . 2012 2016 2018 TODAY APACHE TLP Pulsar becomes Apache top level project. OPEN SOURCE Pulsar committed to open source. Apache Pulsar Timeline
  • 10. Pulsar Growth Contributors 11X Growth Github Stars 5X Growth 10,000,000+ Docker Images Downloads
  • 11. Streaming Consumer Consumer Consumer Subscription Shared Failover Consumer Consumer Subscription In case of failure in Consumer B-0 Consumer Consumer Subscription Exclusive X Consumer Consumer Key-Shared Subscription Pulsar Topic/Partition Messaging
  • 12. Multi-Tenancy Model Tenants (Data Services) Namespace (Microservices) Pulsar Cluster Tenants (Marketing) Tenants (Compliance) Namespace (ETL) Namespace (Campaigns) Namespace (ETL) Namespace (Risk Assessment) Topic-1 (Cust Auth) Topic-1 (Location Resolution) Topic-2 (Demographics) Topic-1 (Budgeted Spend) Topic-1 (Acct History) Topic-2 (Risk Detection)
  • 13. Integrated Schema Registry Schema Registry schema-1 (value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3 (value=Avro/Protobuf/JSON) Schema Data ID Local Cache for Schemas + Schema Data ID + Local Cache for Schemas Send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID Send (register) schema (if not in local cache) Read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID Get schema by ID (if not in local cache) Producers Consumers
  • 14. Automatic Load Balancing Pulsar supports automatic load shedding whenever the system recognizes a particular broker is overloaded, the system forces some traffic to be reassigned to less-loaded brokers.
  • 15. Key Features Apache Pulsar Apache Kafka Message retention (time-based) ✔ ✔ Message replay ✔ ✔ Message retention (acknowledge-based) ✔ ✘ Built-in tiered storage ✔ ✘ Processing capabilities (fully managed) Pulsar Functions KStream Queue semantics (round robin) ✔ ✘ Queue semantics (key based) ✔ ✘ Dead letter queue ✔ ✘
  • 16. Key Features Apache Pulsar Apache Kafka Scheduled and delayed delivery ✔ ✔ Rebalance-free scaling ✔ ✔ Elastically scalable ✔ ✘ Maximum number of topics Millions Up to 100k Built-in multi-tenancy ✔ ✘ Built-in geo-replication ✔ ✘ Built-in schema management ✔ ✘ End-to-end encryption ✔ ✘
  • 18. Pulsar-Flink Connector ■ Pulsar provides a connector that enables Flink to read & write data directly from/to Pulsar topics. ■ This makes Pulsar Topic data available for Flink SQL queries. ■ https://hub.streamnative.io/data-processing/pulsar-flink/1.15.1.1/
  • 20. Apache Flink ■ Unified computing engine ■ Batch processing is a special case of stream processing ■ Stateful processing ■ Massive Scalability ■ Flink SQL for queries, inserts against Pulsar Topics ■ Streaming Analytics ■ Continuous SQL ■ Continuous ETL ■ Complex Event Processing ■ Standard SQL Powered by Apache Calcite Apache Flink
  • 21. Streaming Tables ■ A streaming table represents an unbounded sequence of structured data, i.e. “facts” ■ Facts in a streaming table are immutable, which means new events can be inserted into a table, but existing events can never be updated or deleted. ■ All the topics within a Pulsar namespace will automatically be mapped to streaming tables in a catalog configured to use a pulsar connector. ■ Streaming tables can also be created or deleted via DDL queries, where the underlying Pulsar topics will be created or deleted.
  • 23. Data Offloaders (Tiered Storage) Client Libraries StreamNative Pulsar ecosystem hub.streamnative.io Connectors (Sources & Sinks) Protocol Handlers Pulsar Functions (Lightweight Stream Processing) Processing Engines … and more! … and more!
  • 24. ScyllaDB Compatible Sink Connector pulsar.apache.org/docs/en/io-quickstart/ ScyllaDB Compatible Sink
  • 25. David’s ScyllaDB Sink Update Connector automatically interrogates the database to determine the schema type definition at runtime. Includes a framework that extracts the values from the generic schema types (GenericRecord, and JSON String) Performance improvements & Bug Fixes PR: https://github.com/apache/pulsar/pull/16179
  • 26. Building Real-Time Together Easily integrate ScyllaDB’s fast distributed database system and the StreamNative streaming platform with Pulsar sources and sinks via various protocols and libraries. Create high-speed streaming pipelines for log analysis, IoT sensing, and more.
  • 29. Reference Architecture ScyllaDB <-> Apache Pulsar -> Flink SQL -> Apache Pulsar -> Apache Pinot Flink SQL for Continuous Analytics, Ingest, Real-time Joins, Real-Time Analytics Apache Pulsar for routing, transformation, central messaging hub, data channels ScyllaDB for instant applications and massive data feeds Apache Pinot for low latency instant result queries
  • 30. Thank You Stay in Touch Timothy Spann tim@streamnative.io @PaaSDev https://github.com/tspannhw https://www.linkedin.com/in/timothyspann/ David Kjerrumgaard david@streamnative.io @DavidKjerrumga1 https://github.com/david-streamlio https://www.linkedin.com/in/davidkj/