© 2020 SPLUNK INC.
Interactive Querying of Streams Using Apache Pulsar™
Jerry Peng
Pulsar Summit | June 2020
Principal Software Engineer | jerryp@splunk.com
Apache {Pulsar, Heron, Storm} committer and PMC member
Agenda
1) General use cases
2) Existing architectures
3) Apache Pulsar overview
4) Pulsar SQL
5) Concrete use case (zhaopin.com)
6) Demo!
7) Questions?
What are Streams?
Continuous flows of data…
Almost all data originates in this form
Interactive Querying of Streams?
Querying both latest and historical data
How is it useful?
● Speed (i.e. data-driven processing)
○ Act faster
● Accuracy
○ In many contexts, the wrong decision may be made without visibility into the most current data
○ For example, historical data can predict that a user is interested in buying a particular item, but if the analytics don't also know that the user purchased that item two minutes ago, they will make the wrong recommendation
● Simplification
○ Single place to go to access current and historical data
General use cases
Debugging
● Errors and exceptions
● Troubleshooting systems and networks
● Have we seen these errors before?
Monitoring (Audit logs)
● Answering the “What, When, Who, Why”
● Suspicious access patterns
● Example
○ Auditing CDC logs in financial institutions
Exploring
● Raw or enriched data
● Access is much simpler when all the data is in one location
Lots of use cases
● Data analytics
● Business Intelligence
● Real-time dashboards
● etc…
Stream processing patterns
[Diagram: Messaging, Compute, and Storage components covering data ingestion, data processing / querying, data storage, results storage, and data serving]
Existing Solutions
[Diagram: a data stream passes through messaging (cloud Pub/Sub) to real-time compute (Apache Storm, Apache Flink, Apache Heron, etc.) and to storage (HDFS, cloud storage), which is queried with Apache Hadoop MR, Apache Spark, Presto, etc.]
Problems with existing solutions
● Multiple Systems
● Duplication of data
○ Data consistency. Where is the source of truth?
● Latency between data ingestion and when data is queryable
THIS IS WHERE APACHE PULSAR AND PULSAR SQL COME IN…
Apache Pulsar™
Flexible Messaging + Streaming System backed by durable log storage
Apache Pulsar as an Event Store
Apache Pulsar Overview
Architecture
Multi-layer, scalable architecture
Independent layers for processing, serving and storage
Messaging and processing built on Apache Pulsar
Storage built on Apache BookKeeper
[Diagram: producers and consumers connect to the messaging layer (brokers); events are stored in the event storage layer (bookies); function processing runs on workers]
Segment-Centric Storage
● In addition to partitioning, messages are stored in segments (based on time and size)
● Segments are independent of each other and spread across all storage nodes
● What does this mean for Pulsar SQL?
○ Allows the SQL engine to read from multiple bookies and use the disk I/O and bandwidth of multiple machines, even if the data is in one partition
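To make the segment idea concrete, here is a minimal Python sketch of how the segments of a single partition could end up on different bookies, so that a reader can fan out across machines. The rotation used in `place_segments` is an assumption chosen for illustration only; it is not BookKeeper's actual placement policy, and the bookie names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def place_segments(num_segments: int, bookies: List[str], ensemble_size: int) -> Dict[int, List[str]]:
    """Assign each segment to `ensemble_size` bookies, rotating the starting bookie.

    Simplified stand-in for a placement policy (assumption, not BookKeeper's algorithm).
    """
    if not 0 < ensemble_size <= len(bookies):
        raise ValueError("ensemble_size must be between 1 and the number of bookies")
    placement = {}
    for segment in range(num_segments):
        # Each new segment starts where the previous ensemble ended.
        start = (segment * ensemble_size) % len(bookies)
        placement[segment] = [bookies[(start + i) % len(bookies)] for i in range(ensemble_size)]
    return placement


def segments_per_bookie(placement: Dict[int, List[str]]) -> Counter:
    """Count how many segments each bookie holds, i.e. how widely reads can fan out."""
    return Counter(bookie for ensemble in placement.values() for bookie in ensemble)


# Hypothetical cluster: one partition, five segments, five bookies, two copies per segment.
bookies = ["bookie-%d" % n for n in range(1, 6)]
placement = place_segments(5, bookies, ensemble_size=2)
load = segments_per_bookie(placement)
print(placement)
print(load)
```

Even though all five segments belong to one partition, every bookie in this toy cluster holds some of them, which is the property that lets a query engine read them in parallel.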
Writes
● Every segment/ledger has an ensemble (the set of bookies that store it)
● Each entry in a ledger has a
○ Write quorum
■ Nodes of the ensemble to which it is written (usually all)
○ Ack quorum
■ Nodes of the write quorum that must respond for that entry to be acknowledged (usually a majority)
● What does this mean for Pulsar SQL?
○ Allows users to configure the number of replicas the SQL engine can read from
○ Trade-off between read bandwidth and storage cost
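The relationship between ensemble, write quorum, and ack quorum can be sketched in a few lines of Python. This is a simplified model, assuming entries are striped round-robin over the ensemble; the function names and bookie names are hypothetical and are not part of any BookKeeper API.

```python
from typing import Iterable, List


def write_set(entry_id: int, ensemble: List[str], write_quorum: int) -> List[str]:
    """Return the bookies that receive a copy of the entry (assumed round-robin striping)."""
    if not 0 < write_quorum <= len(ensemble):
        raise ValueError("write_quorum must be between 1 and the ensemble size")
    start = entry_id % len(ensemble)
    return [ensemble[(start + i) % len(ensemble)] for i in range(write_quorum)]


def is_acknowledged(acks: Iterable[str], ack_quorum: int) -> bool:
    """An entry counts as written once `ack_quorum` distinct bookies have responded."""
    return len(set(acks)) >= ack_quorum


ensemble = ["bookie-1", "bookie-2", "bookie-3"]

# With a write quorum of 2, each entry lands on two of the three bookies.
for entry_id in range(3):
    print(entry_id, write_set(entry_id, ensemble, write_quorum=2))

# Every bookie in the write set holds a readable copy, so a larger write quorum
# gives readers more replicas to pull from, at the cost of storing more copies.
print(is_acknowledged(["bookie-1", "bookie-2"], ack_quorum=2))
```

Raising the write quorum from 2 to 3 in this model means each entry is stored three times instead of twice, which is the read bandwidth versus storage cost trade-off named on the slide.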
Apache BookKeeper™ Internals
● Separate I/O paths for reads and writes
● Optimized for writing, tailing reads, and catch-up reads
● What does this mean for Pulsar SQL?
○ Queries often involve scanning the data
○ The read-ahead cache in BookKeeper allows for fast sequential reads
Tiered Storage
Unlimited topic storage capacity
Achieves true “stream storage”: keep the raw data forever in stream form
Tiered Storage
● Leverage cloud storage services to offload cold data — Completely transparent to clients
● Extremely cost effective: S3 backend supported (GCS and HDFS coming)
● Example: Retain all data for 1 month — Offload all messages older than 1 day to S3
● What does this mean for Pulsar SQL?
○ Pulsar SQL can query not only data stored in bookies but also data offloaded to a cloud storage service
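The retention example above (keep everything for one month, offload anything older than one day) can be expressed as a small Python sketch. It assumes "1 month" means 30 days and decides purely by segment age; the constants and the `storage_tier` function are hypothetical and do not reflect how Pulsar's offload policies are actually configured or triggered.

```python
from datetime import datetime, timedelta

# Assumed policy values taken from the slide's example ("1 month" treated as 30 days).
RETENTION = timedelta(days=30)
OFFLOAD_AFTER = timedelta(days=1)


def storage_tier(segment_closed_at: datetime, now: datetime) -> str:
    """Return where a segment of the given age would live under the example policy."""
    age = now - segment_closed_at
    if age > RETENTION:
        return "expired"      # past retention, no longer queryable
    if age > OFFLOAD_AFTER:
        return "cloud"        # offloaded to cloud storage, still queryable
    return "bookkeeper"       # recent data, served from bookies


now = datetime(2020, 6, 15, 12, 0, 0)
for age in (timedelta(hours=2), timedelta(days=3), timedelta(days=45)):
    print(age, storage_tier(now - age, now))
```

A query that spans the last week would, under this model, read the most recent day from bookies and the remaining days from cloud storage, without the client needing to know which tier holds which segment.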
Schema Registry
● Store information on the data structure —
Stored in BookKeeper
● Enforce data types on topic
● Allow for compatible schema evolutions
● JSON, Avro, and Protobuf supported
● What does this mean for Pulsar SQL?
○ Allows data to be structured so that it becomes queryable with SQL
Pulsar SQL
Interactive SQL queries over data stored in Pulsar
Query old and real-time data
Pulsar SQL / 2
● Based on Presto by Facebook — https://prestodb.io/
● Presto is a distributed query execution engine
● Fetches the data from multiple sources (HDFS, S3, MySQL, …)
● Full SQL compatibility
Pulsar SQL / 3
● Pulsar connector for Presto
○ Read data directly from BookKeeper — bypass Pulsar Broker
■ Can also read data offloaded to Tiered Storage (S3, GCS, etc.)
○ Many-to-many data reads
■ Data is split even within a single partition: multiple workers can read data in parallel from a single Pulsar partition
■ Time-based indexing: use “publishTime” in predicates to reduce the data read from disk
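The idea of splitting a single partition so that several workers can read it in parallel can be sketched as follows. This is an illustrative Python model that divides an inclusive range of entry IDs into contiguous, non-overlapping splits; `make_splits` is a hypothetical helper, not the connector's actual split logic.

```python
from typing import List, Tuple


def make_splits(first_entry: int, last_entry: int, num_splits: int) -> List[Tuple[int, int]]:
    """Divide the inclusive range [first_entry, last_entry] into near-equal splits."""
    total = last_entry - first_entry + 1
    if total <= 0 or num_splits <= 0:
        return []
    # Never create more splits than there are entries.
    num_splits = min(num_splits, total)
    base, extra = divmod(total, num_splits)
    splits = []
    start = first_entry
    for i in range(num_splits):
        # The first `extra` splits take one additional entry each.
        size = base + (1 if i < extra else 0)
        splits.append((start, start + size - 1))
        start += size
    return splits


# Ten entries of one partition handed to three workers.
for worker, (lo, hi) in enumerate(make_splits(0, 9, 3)):
    print("worker", worker, "reads entries", lo, "to", hi)
```

Because each split is an independent entry range, the workers do not need to coordinate with each other, and the number of splits is not tied to the number of partitions.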
Pulsar SQL Architecture
Benefits
● No need to move data into another system for querying
● Read data in parallel
○ Performance is not limited by partitioning
○ Increase read throughput by increasing the write quorum
● Newly arrived data can be queried immediately
Compared to other message buses?
● Other messaging platforms have Presto integrations
● These typically use a consumer to read data from brokers
● Each topic/partition is served by a single broker (limiting disk I/O and network bandwidth)
User interaction
Connect with CLI client
$ ./bin/pulsar sql
List Pulsar cluster
presto> show catalogs;
Catalog
---------
pulsar
system
(2 rows)
List Pulsar namespaces
presto> show schemas in pulsar;
Schema
-----------------------
information_schema
public/default
public/functions
sample/standalone/ns1
List Pulsar topics
presto> show tables in pulsar."public/default";
Table
----------------
generator_test
(1 row)
User interaction
Query data in topic
presto> select * from pulsar."public/default".generator_test;
firstname | middlename | lastname | email | username | password | telephonenumber | age | companyemail |
-------------+-------------+-------------+----------------------------------+--------------+----------+-----------------+-----+-------------------------------------+
Genesis | Katherine | Wiley | genesis.wiley@gmail.com | genesisw | y9D2dtU3 | 959-197-1860 | 71 | genesis.wiley@interdemconsulting.eu |
Brayden | | Stanton | brayden.stanton@yahoo.com | braydens | ZnjmhXik | 220-027-867 | 81 | brayden.stanton@supermemo.eu |
Benjamin | Julian | Velasquez | benjamin.velasquez@yahoo.com | benjaminv | 8Bc7m3eb | 298-377-0062 | 21 | benjamin.velasquez@hostesltd.biz |
Michael | Thomas | Donovan | donovan@mail.com | michaeld | OqBm9MLs | 078-134-4685 | 55 | michael.donovan@memortech.eu |
Brooklyn | Avery | Roach | brooklynroach@yahoo.com | broach | IxtBLafO | 387-786-2998 | 68 | brooklyn.roach@warst.biz |
Skylar | | Bradshaw | skylarbradshaw@yahoo.com | skylarb | p6eC6cKy | 210-872-608 | 96 | skylar.bradshaw@flyhigh.eu |
.
.
.
Demo
Performance
Setup
• 3 Nodes
• 12 CPU cores
• 128 GB RAM
• 2 X 1.2 TB NVMe disks
Results
• JSON (Compressed)
• ~60 Million Rows / Second
• Avro (Compressed)
• ~50 Million Rows / Second
Improving query efficiency
● Query by partition
○ Scanning large amounts of data may be costly and time-consuming
○ If the data is keyed and hashed to a specific partition, you can simply query that specific partition
○ For example, if you have tweets keyed by author ingested in Pulsar:
SELECT tweet.author, tweet.content
FROM pulsar."public/default".tweets AS tweet
WHERE tweet.author = 'jerry' AND __partition__ = 1
● Query by publish time
○ Ledgers/segments are naturally sorted by publish time
○ Only data within the requested publish-time range is read
○ Select a range of publish times to minimize the data that needs to be read
SELECT tweet.author, tweet.content
FROM pulsar."public/default".tweets AS tweet
WHERE tweet.author = 'jerry' AND __partition__ = 1 AND __publish_time__ > timestamp '2020-06-15 09:00:00'
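Why a publish-time predicate reduces the data read can be shown with a short Python sketch. It assumes each segment records the publish time of its last message and that segments are kept in publish-time order, so a binary search finds the first segment that can contain matching rows. The `Segment` type, the ledger names, and `segments_after` are hypothetical; this models the pruning idea rather than the connector's real implementation.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, NamedTuple


class Segment(NamedTuple):
    name: str
    last_publish_time: datetime  # publish time of the newest message in the segment


def segments_after(segments: List[Segment], lower_bound: datetime) -> List[Segment]:
    """Return segments that may hold messages with publish_time > lower_bound.

    `segments` must be sorted by publish time, as ledgers naturally are.
    """
    last_times = [segment.last_publish_time for segment in segments]
    # Segments whose newest message is at or before the bound cannot match.
    first_relevant = bisect_right(last_times, lower_bound)
    return segments[first_relevant:]


day = (2020, 6, 15)
segments = [
    Segment("ledger-1", datetime(*day, 8, 0, 0)),
    Segment("ledger-2", datetime(*day, 9, 0, 0)),
    Segment("ledger-3", datetime(*day, 10, 0, 0)),
    Segment("ledger-4", datetime(*day, 11, 0, 0)),
]

# Mirrors the predicate __publish_time__ > timestamp '2020-06-15 09:00:00'.
to_read = segments_after(segments, datetime(*day, 9, 0, 0))
print([segment.name for segment in to_read])
```

In this toy example half of the segments are skipped without being opened, which is the saving the publish-time filter is meant to deliver.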
Case study: Job search analytics at zhaopin.com
Background
● About ZhaoPin
○ Chinese job search website (similar to LinkedIn, Indeed, etc.)
● Background
○ ZhaoPin is already a heavy user of Apache Pulsar
○ Using Pulsar to power their enterprise event bus
■ Data involving job position searches, job posts, and resume searches
Source: https://streamnative.io/blog/tech/2020-05-07-zhaopin-tech-blog/
Use cases
● Debugging search results
○ “When the search results do not meet expectations…”
● Analyzing and improving search results
○ “Analyze the search criteria associated with a position that a job seeker applied for, such as when the position was first exposed to that user, in order to improve the search service.”
● Analyzing search logs
○ “Analyze search logs from different perspectives and generate charts that summarize data in different ways, such as by city, vocation, or keyword ranking. In this way, the search service can be improved by making it more specific.”
Why Pulsar SQL
● ZhaoPin was already using Pulsar
● Pulsar SQL allows queries using SQL syntax
● Pulsar can store a large amount of data for Pulsar SQL to query, and is easy to scale up
How to get started?
Quick Start guide: https://pulsar.apache.org/docs/en/sql-getting-started/
Future work
● Performance tuning
● Store data in columnar format
○ Improve compression ratio
○ Materialize relevant columns
● Support different indices
Questions?
Email: jerryp@splunk.com
4
1

Contenu connexe

Tendances

Apache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! JapanApache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! JapanStreamNative
 
Five years of operating a large scale globally replicated Pulsar installation...
Five years of operating a large scale globally replicated Pulsar installation...Five years of operating a large scale globally replicated Pulsar installation...
Five years of operating a large scale globally replicated Pulsar installation...StreamNative
 
Serverless Event Streaming with Pulsar Functions
Serverless Event Streaming with Pulsar FunctionsServerless Event Streaming with Pulsar Functions
Serverless Event Streaming with Pulsar FunctionsStreamNative
 
Building a FaaS with pulsar
Building a FaaS with pulsarBuilding a FaaS with pulsar
Building a FaaS with pulsarStreamNative
 
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...StreamNative
 
Transaction preview of Apache Pulsar
Transaction preview of Apache PulsarTransaction preview of Apache Pulsar
Transaction preview of Apache PulsarStreamNative
 
Lessons from managing a Pulsar cluster (Nutanix)
Lessons from managing a Pulsar cluster (Nutanix)Lessons from managing a Pulsar cluster (Nutanix)
Lessons from managing a Pulsar cluster (Nutanix)StreamNative
 
Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Sum...
Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Sum...Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Sum...
Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Sum...StreamNative
 
How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...JinfengHuang3
 
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...StreamNative
 
Using the JMS 2.0 API with Apache Pulsar - Pulsar Virtual Summit Europe 2021
Using the JMS 2.0 API with Apache Pulsar - Pulsar Virtual Summit Europe 2021Using the JMS 2.0 API with Apache Pulsar - Pulsar Virtual Summit Europe 2021
Using the JMS 2.0 API with Apache Pulsar - Pulsar Virtual Summit Europe 2021StreamNative
 
How Splunk Mission Control leverages various Pulsar subscription types_Pranav...
How Splunk Mission Control leverages various Pulsar subscription types_Pranav...How Splunk Mission Control leverages various Pulsar subscription types_Pranav...
How Splunk Mission Control leverages various Pulsar subscription types_Pranav...StreamNative
 
Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...
Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...
Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...StreamNative
 
Architecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructureArchitecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructuremattlieber
 
Scaling Apache Pulsar to 10 PB/day
Scaling Apache Pulsar to 10 PB/dayScaling Apache Pulsar to 10 PB/day
Scaling Apache Pulsar to 10 PB/dayKarthik Ramasamy
 
Take Kafka-on-Pulsar to Production at Internet Scale: Improvements Made for P...
Take Kafka-on-Pulsar to Production at Internet Scale: Improvements Made for P...Take Kafka-on-Pulsar to Production at Internet Scale: Improvements Made for P...
Take Kafka-on-Pulsar to Production at Internet Scale: Improvements Made for P...StreamNative
 
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...StreamNative
 
Query Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache FlinkQuery Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache FlinkStreamNative
 
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...
Scenic City Summit (2021):  Real-Time Streaming in any and all clouds, hybrid...Scenic City Summit (2021):  Real-Time Streaming in any and all clouds, hybrid...
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...Timothy Spann
 
Apache BookKeeper State Store: A Durable Key-Value Store - Pulsar Summit NA 2021
Apache BookKeeper State Store: A Durable Key-Value Store - Pulsar Summit NA 2021Apache BookKeeper State Store: A Durable Key-Value Store - Pulsar Summit NA 2021
Apache BookKeeper State Store: A Durable Key-Value Store - Pulsar Summit NA 2021StreamNative
 

Tendances (20)

Apache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! JapanApache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! Japan
 
Five years of operating a large scale globally replicated Pulsar installation...
Five years of operating a large scale globally replicated Pulsar installation...Five years of operating a large scale globally replicated Pulsar installation...
Five years of operating a large scale globally replicated Pulsar installation...
 
Serverless Event Streaming with Pulsar Functions
Serverless Event Streaming with Pulsar FunctionsServerless Event Streaming with Pulsar Functions
Serverless Event Streaming with Pulsar Functions
 
Building a FaaS with pulsar
Building a FaaS with pulsarBuilding a FaaS with pulsar
Building a FaaS with pulsar
 
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
 
Transaction preview of Apache Pulsar
Transaction preview of Apache PulsarTransaction preview of Apache Pulsar
Transaction preview of Apache Pulsar
 
Lessons from managing a Pulsar cluster (Nutanix)
Lessons from managing a Pulsar cluster (Nutanix)Lessons from managing a Pulsar cluster (Nutanix)
Lessons from managing a Pulsar cluster (Nutanix)
 
Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Sum...
Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Sum...Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Sum...
Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Sum...
 
How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...
 
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...
 
Using the JMS 2.0 API with Apache Pulsar - Pulsar Virtual Summit Europe 2021
Using the JMS 2.0 API with Apache Pulsar - Pulsar Virtual Summit Europe 2021Using the JMS 2.0 API with Apache Pulsar - Pulsar Virtual Summit Europe 2021
Using the JMS 2.0 API with Apache Pulsar - Pulsar Virtual Summit Europe 2021
 
How Splunk Mission Control leverages various Pulsar subscription types_Pranav...
How Splunk Mission Control leverages various Pulsar subscription types_Pranav...How Splunk Mission Control leverages various Pulsar subscription types_Pranav...
How Splunk Mission Control leverages various Pulsar subscription types_Pranav...
 
Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...
Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...
Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...
 
Architecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructureArchitecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructure
 
Scaling Apache Pulsar to 10 PB/day
Scaling Apache Pulsar to 10 PB/dayScaling Apache Pulsar to 10 PB/day
Scaling Apache Pulsar to 10 PB/day
 
Take Kafka-on-Pulsar to Production at Internet Scale: Improvements Made for P...
Take Kafka-on-Pulsar to Production at Internet Scale: Improvements Made for P...Take Kafka-on-Pulsar to Production at Internet Scale: Improvements Made for P...
Take Kafka-on-Pulsar to Production at Internet Scale: Improvements Made for P...
 
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
 
Query Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache FlinkQuery Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache Flink
 
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...
Scenic City Summit (2021):  Real-Time Streaming in any and all clouds, hybrid...Scenic City Summit (2021):  Real-Time Streaming in any and all clouds, hybrid...
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...
 
Apache BookKeeper State Store: A Durable Key-Value Store - Pulsar Summit NA 2021
Apache BookKeeper State Store: A Durable Key-Value Store - Pulsar Summit NA 2021Apache BookKeeper State Store: A Durable Key-Value Store - Pulsar Summit NA 2021
Apache BookKeeper State Store: A Durable Key-Value Store - Pulsar Summit NA 2021
 

Similaire à Interactive querying of streams using Apache Pulsar_Jerry peng

PSUG 1 - 2024-01-22 - Onboarding Best Practices
PSUG 1 - 2024-01-22 - Onboarding Best PracticesPSUG 1 - 2024-01-22 - Onboarding Best Practices
PSUG 1 - 2024-01-22 - Onboarding Best PracticesTomas Moser
 
Improving ad hoc and production workflows at Stitch Fix
Improving ad hoc and production workflows at Stitch FixImproving ad hoc and production workflows at Stitch Fix
Improving ad hoc and production workflows at Stitch FixStitch Fix Algorithms
 
Apache Pulsar: The Next Generation Messaging and Queuing System
Apache Pulsar: The Next Generation Messaging and Queuing SystemApache Pulsar: The Next Generation Messaging and Queuing System
Apache Pulsar: The Next Generation Messaging and Queuing SystemDatabricks
 
Real world business workflow with SharePoint designer 2013
Real world business workflow with SharePoint designer 2013Real world business workflow with SharePoint designer 2013
Real world business workflow with SharePoint designer 2013Ivan Sanders
 
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...Marcin Bielak
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native PlatformSunil Govindan
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native PlatformSunil Govindan
 
A Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's RoadmapA Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's RoadmapItai Yaffe
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAlluxio, Inc.
 
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...Landon Robinson
 
Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the CloudInteractive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the CloudAlluxio, Inc.
 
Headaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous ApplicationsHeadaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous ApplicationsDatabricks
 
Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioAlluxio, Inc.
 
How Splunk Is Using Pulsar IO
How Splunk Is Using Pulsar IOHow Splunk Is Using Pulsar IO
How Splunk Is Using Pulsar IOStreamNative
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
 
Stsg17 speaker yousunjeong
Stsg17 speaker yousunjeongStsg17 speaker yousunjeong
Stsg17 speaker yousunjeongYousun Jeong
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...DataStax
 
PostgreSQL-as-a-Service with Crunchy PostgreSQL for PKS
PostgreSQL-as-a-Service with Crunchy PostgreSQL for PKSPostgreSQL-as-a-Service with Crunchy PostgreSQL for PKS
PostgreSQL-as-a-Service with Crunchy PostgreSQL for PKSVMware Tanzu
 
PostgreSQL-as-a-Service with Crunchy PostgreSQL for PKS
PostgreSQL-as-a-Service with Crunchy PostgreSQL for PKSPostgreSQL-as-a-Service with Crunchy PostgreSQL for PKS
PostgreSQL-as-a-Service with Crunchy PostgreSQL for PKSCarlos Andrés García
 

Similaire à Interactive querying of streams using Apache Pulsar_Jerry peng (20)

PSUG 1 - 2024-01-22 - Onboarding Best Practices
PSUG 1 - 2024-01-22 - Onboarding Best PracticesPSUG 1 - 2024-01-22 - Onboarding Best Practices
PSUG 1 - 2024-01-22 - Onboarding Best Practices
 
Improving ad hoc and production workflows at Stitch Fix
Improving ad hoc and production workflows at Stitch FixImproving ad hoc and production workflows at Stitch Fix
Improving ad hoc and production workflows at Stitch Fix
 
Apache Pulsar: The Next Generation Messaging and Queuing System
Apache Pulsar: The Next Generation Messaging and Queuing SystemApache Pulsar: The Next Generation Messaging and Queuing System
Apache Pulsar: The Next Generation Messaging and Queuing System
 
Real world business workflow with SharePoint designer 2013
Real world business workflow with SharePoint designer 2013Real world business workflow with SharePoint designer 2013
Real world business workflow with SharePoint designer 2013
 
Greenplum Architecture
Greenplum ArchitectureGreenplum Architecture
Greenplum Architecture
 
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
 
A Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's RoadmapA Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's Roadmap
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
 
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
 
Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the CloudInteractive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud
 
Headaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous ApplicationsHeadaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous Applications
 
Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with Alluxio
 
How Splunk Is Using Pulsar IO
How Splunk Is Using Pulsar IOHow Splunk Is Using Pulsar IO
How Splunk Is Using Pulsar IO
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
 
Stsg17 speaker yousunjeong
Stsg17 speaker yousunjeongStsg17 speaker yousunjeong
Stsg17 speaker yousunjeong
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
 
PostgreSQL-as-a-Service with Crunchy PostgreSQL for PKS
PostgreSQL-as-a-Service with Crunchy PostgreSQL for PKSPostgreSQL-as-a-Service with Crunchy PostgreSQL for PKS
PostgreSQL-as-a-Service with Crunchy PostgreSQL for PKS
 
PostgreSQL-as-a-Service with Crunchy PostgreSQL for PKS
PostgreSQL-as-a-Service with Crunchy PostgreSQL for PKSPostgreSQL-as-a-Service with Crunchy PostgreSQL for PKS
PostgreSQL-as-a-Service with Crunchy PostgreSQL for PKS
 


Interactive Querying of Streams Using Apache Pulsar (Jerry Peng)

  • 1. © 2020 SPLUNK INC. Interactive Querying of Streams Using Apache Pulsar™ Jerry Peng Pulsar Summit | June 2020 Principal Software Engineer | jerryp@splunk.com Apache {Pulsar, Heron, Storm} committer and PMC member
  • 2. © 2020 SPLUNK INC. Agenda 1) General use cases 2) Existing architectures 3) Apache Pulsar overview 4) Pulsar SQL 5) Concrete use case (zhaopin.com) 6) Demo! 7) Questions?
  • 3. © 2020 SPLUNK INC. What are Streams? Continuous flows of data… Almost all data originate in this form
  • 4. © 2020 SPLUNK INC. Interactive Querying of Streams? Querying both latest and historical data
  • 5. © 2020 SPLUNK INC. How is it useful? ● Speed (i.e. data-driven processing) ○ Act faster ● Accuracy ○ In many contexts the wrong decision may be made if you do not have visibility that includes the most current data ○ For example, historical data is useful for predicting that a user is interested in buying a particular item, but if my analytics don’t also know that the user purchased that item two minutes ago, they’re going to make the wrong recommendation ● Simplification ○ Single place to go to access current and historical data
  • 6. © 2020 SPLUNK INC. Debugging ● Errors and exceptions ● Troubleshooting systems and networks ● Have we seen these errors before? General use cases
  • 7. © 2020 SPLUNK INC. Monitoring (Audit logs) ● Answering the “What, When, Who, Why” ● Suspicious access patterns ● Example ○ Auditing CDC logs in financial institutions General use cases
  • 8. © 2020 SPLUNK INC. Exploring ● Raw or enriched data ● Really simplifies access if data is all in one location General use cases
  • 9. © 2020 SPLUNK INC. Lots of use cases ● Data analytics ● Business Intelligence ● Real-time dashboards ● etc… General use cases
  • 10. © 2020 SPLUNK INC. Stream processing patterns [Diagram labels: Messaging, Compute, Storage, Data Ingestion, Data Processing / Querying, Data Storage, Results Storage, Data Serving]
  • 11. © 2020 SPLUNK INC. Existing Solutions [Diagram labels: HDFS; Messaging; Real-time compute; Storage; Data Stream; Querying; Cloud Storage; Apache Hadoop MR, Apache Spark, Presto, etc.; Cloud Pub/Sub; Apache Storm, Apache Flink, Apache Heron, etc.]
  • 12. © 2020 SPLUNK INC. Problems with existing solutions ● Multiple Systems ● Duplication of data ○ Data consistency. Where is the source of truth? ● Latency between data ingestion and when data is queryable
  • 13. © 2020 SPLUNK INC. THIS IS WHERE APACHE PULSAR AND PULSAR SQL COME IN…
  • 14. © 2020 SPLUNK INC. Apache Pulsar™ Flexible Messaging + Streaming System backed by a durable log storage
  • 15. © 2020 SPLUNK INC. Apache Pulsar as an Event Store
  • 16. © 2020 SPLUNK INC. Apache Pulsar Overview
  • 17. © 2020 SPLUNK INC. Architecture: Multi-layer, scalable architecture. Independent layers for processing, serving and storage. Messaging and processing built on Apache Pulsar. Storage built on Apache BookKeeper. [Diagram labels: Producer, Consumer, Messaging (Broker), Event storage (Bookie), Function Processing (Worker)]
  • 18. © 2020 SPLUNK INC. Segment Centric Storage ● In addition to partitioning, messages are stored in segments (based on time and size) ● Segments are independent of each other and spread across all storage nodes ● What this means for Pulsar SQL? ○ Allows the SQL engine to read from multiple bookies and leverage the disk I/O and bandwidth of multiple machines even if the data is in one partition
  • 19. © 2020 SPLUNK INC. Writes ● Every segment/ledger has an ensemble ● Each entry in ledger has a ○ Write quorum ■ Nodes of the ensemble to which it is written (usually all) ○ Ack quorum ■ Nodes of the write quorum that must respond for that entry to be acknowledged (usually a majority) ● What this means for Pulsar SQL? ○ Allows users to configure the number of replicas SQL engine can read from ○ Trade off between read bandwidth and storage cost
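
The relationship between ensemble, write quorum and ack quorum on the Writes slide can be illustrated with a small sketch. It assumes a simplified round-robin striping of entries across the ensemble; the bookie names and the helper functions `write_set` and `is_acknowledged` are hypothetical and are not BookKeeper APIs.

```python
from typing import List


def write_set(entry_id: int, ensemble: List[str], write_quorum: int) -> List[str]:
    """Return the bookies that receive one entry under round-robin striping.

    Simplified model: entry i starts at position i mod len(ensemble) and is
    written to the next `write_quorum` bookies, wrapping around the ensemble.
    """
    if not 1 <= write_quorum <= len(ensemble):
        raise ValueError("write quorum must be between 1 and the ensemble size")
    start = entry_id % len(ensemble)
    return [ensemble[(start + i) % len(ensemble)] for i in range(write_quorum)]


def is_acknowledged(acks_received: int, ack_quorum: int) -> bool:
    """An entry counts as acknowledged once enough bookies have responded."""
    return acks_received >= ack_quorum


# Hypothetical three-bookie ensemble with a write quorum of 2.
ensemble = ["bookie-1", "bookie-2", "bookie-3"]
for entry_id in range(3):
    print(entry_id, write_set(entry_id, ensemble, write_quorum=2))
```

In this model a larger write quorum means each entry exists on more bookies, so a reader has more replicas to choose from, at the price of storing more copies.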
  • 20. © 2020 SPLUNK INC. Apache BookKeeper™ Internals ● Separate I/O paths for reads and writes ● Optimized for writing, tailing reads, catch-up reads ● What this means for Pulsar SQL? ○ Queries often involve scanning the data ○ Read-ahead cache in BookKeeper allows for fast sequential reads
  • 21. © 2020 SPLUNK INC. Tiered Storage Unlimited topic storage capacity Achieves the true “stream-storage”: keep the raw data forever in stream form
  • 22. © 2020 SPLUNK INC. Tiered Storage ● Leverage cloud storage services to offload cold data (completely transparent to clients) ● Extremely cost effective; backends: S3 (coming: GCS, HDFS) ● Example: retain all data for 1 month, offload all messages older than 1 day to S3 ● What this means for Pulsar SQL? ○ Pulsar SQL can query not only data stored in bookies but also data offloaded to a cloud storage service
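
The retention example on the Tiered Storage slide (keep one month of data, offload anything older than one day) can be sketched as a simple age check. This only models the policy; the function `storage_tier` and the tier names are hypothetical, and real offloading is configured in Pulsar rather than written by hand like this.

```python
from datetime import datetime, timedelta

# Policy from the slide: retain data for 1 month, offload after 1 day.
RETENTION = timedelta(days=30)
OFFLOAD_AFTER = timedelta(days=1)


def storage_tier(segment_closed_at: datetime, now: datetime) -> str:
    """Classify a segment by age under the example policy."""
    age = now - segment_closed_at
    if age > RETENTION:
        return "expired"      # past retention, eligible for deletion
    if age > OFFLOAD_AFTER:
        return "cloud"        # cold data, offloaded to cloud storage such as S3
    return "bookkeeper"       # recent data, still served by bookies


now = datetime(2020, 6, 15, 9, 0, 0)
for hours in (2, 5 * 24, 45 * 24):
    print(hours, storage_tier(now - timedelta(hours=hours), now))
```

From the query side the distinction is meant to be invisible: per the slide, Pulsar SQL can read both the "bookkeeper" and the "cloud" tiers.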
  • 23. © 2020 SPLUNK INC. Schema Registry ● Store information on the data structure — Stored in BookKeeper ● Enforce data types on topic ● Allow for compatible schema evolutions ● JSON, Avro, and Protobuf supported ● What this means for Pulsar SQL? ○ Allows data to be structured so that it becomes queryable by a SQL language
  • 24. © 2020 SPLUNK INC. Pulsar SQL: interactive SQL queries over data stored in Pulsar. Query old and real-time data.
  • 25. © 2020 SPLUNK INC. Pulsar SQL / 2 ● Based on Presto by Facebook (https://prestodb.io/) ● Presto is a distributed query execution engine ● Fetches the data from multiple sources (HDFS, S3, MySQL, …) ● Full SQL compatibility
  • 26. © 2020 SPLUNK INC. Pulsar SQL / 3 ● Pulsar connector for Presto ○ Read data directly from BookKeeper, bypassing the Pulsar broker ■ Can also read data offloaded to Tiered Storage (S3, GCS, etc.) ○ Many-to-many data reads ■ Data is split even on a single partition: multiple workers can read data in parallel from a single Pulsar partition ■ Time-based indexing: use “publishTime” in predicates to reduce the data being read from disk
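
The two ideas on this slide, skipping data by publish time and splitting a single partition across workers, can be sketched as follows. This is a toy model, not the connector's implementation: the `Segment` record, its fields, and the functions `prune_by_publish_time` and `make_splits` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    """Hypothetical summary of one segment (ledger) of a single partition."""
    segment_id: int
    first_publish_ms: int
    last_publish_ms: int
    entries: int


def prune_by_publish_time(segments: List[Segment], after_ms: int) -> List[Segment]:
    """Keep only segments that can contain messages published after `after_ms`."""
    return [s for s in segments if s.last_publish_ms > after_ms]


def make_splits(segments: List[Segment], split_size: int) -> List[Tuple[int, int, int]]:
    """Cut each segment into (segment_id, first_entry, end_entry_exclusive) ranges.

    Each range is an independent unit of work, so several workers can read
    from the same partition at the same time.
    """
    splits = []
    for seg in segments:
        for start in range(0, seg.entries, split_size):
            splits.append((seg.segment_id, start, min(start + split_size, seg.entries)))
    return splits


segments = [Segment(1, 0, 999, 100), Segment(2, 1000, 1999, 250)]
candidates = prune_by_publish_time(segments, after_ms=1500)
print(make_splits(candidates, split_size=100))
```

Because segments are ordered by publish time, a publish-time predicate can rule out whole segments before any entry is read, and the remaining segments can still be divided among workers.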
  • 27. © 2020 SPLUNK INC. Pulsar SQL Architecture
  • 28. © 2020 SPLUNK INC. Benefits ● Do not need to move data into another system for querying ● Read data in parallel ○ Performance not impacted by partitioning ○ Increase throughput by increasing write quorum ● Newly arrived data can be queried immediately
  • 29. © 2020 SPLUNK INC. Compared to other message buses? ● Other messaging platforms have Presto integrations ● These typically use a consumer to read data from brokers ● Each topic/partition is served by a single broker (limiting disk I/O and network bandwidth)
  • 30. © 2020 SPLUNK INC. User interaction (Pulsar SQL)
    Connect with CLI client:
        $ ./bin/pulsar sql
    List Pulsar cluster:
        presto> show catalogs;
         Catalog
        ---------
         pulsar
         system
        (2 rows)
    List Pulsar namespaces:
        presto> show schemas in pulsar;
                Schema
        -----------------------
         information_schema
         public/default
         public/functions
         sample/standalone/ns1
    List Pulsar topics:
        presto> show tables in pulsar."public/default";
             Table
        ----------------
         generator_test
        (1 row)
  • 31. © 2020 SPLUNK INC. User interaction (Pulsar SQL)
    Query data in topic:
        presto> select * from pulsar."public/default".generator_test;
         firstname | middlename | lastname  | email                        | username  | password | telephonenumber | age | companyemail
        -----------+------------+-----------+------------------------------+-----------+----------+-----------------+-----+-------------------------------------
         Genesis   | Katherine  | Wiley     | genesis.wiley@gmail.com      | genesisw  | y9D2dtU3 | 959-197-1860    | 71  | genesis.wiley@interdemconsulting.eu
         Brayden   |            | Stanton   | brayden.stanton@yahoo.com    | braydens  | ZnjmhXik | 220-027-867     | 81  | brayden.stanton@supermemo.eu
         Benjamin  | Julian     | Velasquez | benjamin.velasquez@yahoo.com | benjaminv | 8Bc7m3eb | 298-377-0062    | 21  | benjamin.velasquez@hostesltd.biz
         Michael   | Thomas     | Donovan   | donovan@mail.com             | michaeld  | OqBm9MLs | 078-134-4685    | 55  | michael.donovan@memortech.eu
         Brooklyn  | Avery      | Roach     | brooklynroach@yahoo.com      | broach    | IxtBLafO | 387-786-2998    | 68  | brooklyn.roach@warst.biz
         Skylar    |            | Bradshaw  | skylarbradshaw@yahoo.com     | skylarb   | p6eC6cKy | 210-872-608     | 96  | skylar.bradshaw@flyhigh.eu
         . . .
  • 32. © 2020 SPLUNK INC. Demo
  • 33. © 2020 SPLUNK INC. Performance ● Setup ○ 3 nodes ○ 12 CPU cores ○ 128 GB RAM ○ 2 × 1.2 TB NVMe disks ● Results ○ JSON (compressed): ~60 million rows / second ○ Avro (compressed): ~50 million rows / second
  • 34. © 2020 SPLUNK INC. Improving query efficiency
    ● Query by partition
      ○ Scanning large amounts of data may be costly and time-consuming
      ○ If the data is keyed and hashed to a specific partition, you can simply query that specific partition
      ○ For example, if you have tweets keyed by author ingested in Pulsar:
          SELECT tweet.author, tweet.content
          FROM pulsar."public/default".tweets
          WHERE tweet.author = 'jerry' AND __partition__ = 1
    ● Query by publish time
      ○ Ledgers/segments are naturally sorted by publish time
      ○ Only data within the requested publish-time range will be read
      ○ Select a range of publish times to minimize the data that needs to be read:
          SELECT tweet.author, tweet.content
          FROM pulsar."public/default".tweets
          WHERE tweet.author = 'jerry' AND __partition__ = 1
            AND __publish_time__ > timestamp '2020-06-15 09:00:00'
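
To use the query-by-partition tip, you need to know which partition a key lands in. The sketch below assumes the producer routes keyed messages with a Java-style string hash taken modulo the partition count; whether a given deployment does this depends on the producer's hashing configuration, and the three-partition topic is a hypothetical example. The function names are illustrative only.

```python
def java_string_hash(key: str) -> int:
    """Non-negative Java-style String.hashCode().

    Uses ord() per character, which matches Java for characters in the
    Basic Multilingual Plane; the result is masked to 31 bits.
    """
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # emulate 32-bit overflow
    return h & 0x7FFFFFFF


def partition_for_key(key: str, num_partitions: int) -> int:
    """Partition index for a key, assuming hash-modulo routing."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return java_string_hash(key) % num_partitions


# Hypothetical topic with 3 partitions, keyed by tweet author.
print(partition_for_key("jerry", num_partitions=3))
```

The printed index is the value you would place in the `__partition__` predicate, so the engine only scans the partition that can contain that author's tweets.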
  • 35. © 2020 SPLUNK INC. Case study: Job search analytics at zhaopin.com
  • 36. © 2020 SPLUNK INC. Background ● About ZhaoPin ○ Chinese job search website (comparable to LinkedIn, Indeed, etc.) ● Background ○ ZhaoPin is already a heavy user of Apache Pulsar ○ Using Pulsar to power their enterprise event bus ■ Data involving job position searches, job posts, and resume searches Case study: Job search analytics at zhaopin.com Source: https://streamnative.io/blog/tech/2020-05-07-zhaopin-tech-blog/
  • 37. © 2020 SPLUNK INC. Use cases ● Debugging search results ○ “When the search results do not meet expectations…” ● Analyzing and improving search results ○ “Analyze the search criteria associated with a position that a job seeker applied for, such as when the position was first exposed to that user, in order to improve the search service.” ● Analyzing search logs ○ “Analyze search logs from different perspectives and generate charts that summarize data in different ways, such as by city, vocation, or keyword ranking. In this way, the search service can be improved by making it more specific.” Case study: Job search analytics at zhaopin.com Source: https://streamnative.io/blog/tech/2020-05-07-zhaopin-tech-blog/
  • 38. © 2020 SPLUNK INC. Why Pulsar SQL ● ZhaoPin is already using Pulsar ● Pulsar SQL allows queries using SQL syntax ● Pulsar SQL can save a large amount of data and is easy to scale up Case study: Job search analytics at zhaopin.com Source: https://streamnative.io/blog/tech/2020-05-07-zhaopin-tech-blog/
  • 39. © 2020 SPLUNK INC. How to get started? Quick Start guide: https://pulsar.apache.org/docs/en/sql-getting-started/
  • 40. © 2020 SPLUNK INC. Future work ● Performance tuning ● Store data in columnar format ○ Improve compression ratio ○ Materialize relevant columns ● Support different indices
  • 41. © 2020 SPLUNK INC. Questions? Email: jerryp@splunk