FLiP Into Trino
FLiP into Trino. Flink Pulsar Trino
Pulsar SQL (Trino/Presto)
Remember the days when you could wait until your batch data load was done and then run some simple queries or build stale dashboards? Those days are over. Today you need instant analytics as the data streams in, in real time. You need universal analytics wherever that data is. I will show you how to do this using the latest cloud-native open source tools. In this talk we will use Trino, Apache Pulsar, Pulsar SQL and Apache Flink to instantly analyze data from IoT, sensors, transportation systems, logs, REST endpoints, XML, images, PDFs, documents, text, semi-structured data, unstructured data, structured data and a hundred data sources you could never dream of streaming before. I will teach you how to use Pulsar SQL to run analytics on live data.
Tim Spann
Developer Advocate
StreamNative
David Kjerrumgaard
Developer Advocate
StreamNative
https://www.starburst.io/info/trinosummit/
https://github.com/tspannhw/FLiP-Into-Trino/blob/main/README.md
https://github.com/tspannhw/StreamingAnalyticsUsingFlinkSQL/tree/main/src/main/java
select * from pulsar."public/default"."weather";
Apache Pulsar plus Trino = fast analytics at scale
1. FLiP Into Trino
Tim Spann | Developer Advocate
David Kjerrumgaard | Developer Advocate
2. streamnative.io
Founded by the original developers of Apache Pulsar and Apache BookKeeper, StreamNative builds a cloud-native event streaming platform that enables enterprises to easily access data as real-time event streams.
9. Apache Pulsar
● Pub-Sub
● Geo-Replication
● Pulsar Functions
● Horizontal Scalability
● Multi-tenancy
● Tiered Persistent Storage
● Pulsar Connectors
● REST API
● CLI
● Many clients available
● Four Different Subscription Types
● Multi-Protocol Support
○ MQTT
○ AMQP
○ JMS
○ Kafka
○ ...
10. Pulsar Cluster
● “Bookies” (store messages)
○ Store messages and cursors
○ Messages are grouped in segments/ledgers
○ A group of bookies forms an “ensemble” to store a ledger
● “Brokers”
○ Handle message routing and connections
○ Stateless, but with caches
○ Automatic load-balancing
○ Topics are composed of multiple segments
● Metadata & Service Discovery
○ Stores metadata for both Pulsar and BookKeeper
○ Service discovery
12. Pulsar API Design
[Figure: Pulsar API design. Labels: Pub/Sub API (Publisher, Subscriber); Admin API (Operators & Administrators); Reader and Batch API; Pulsar IO/Connectors (Prebuilt Connectors, Custom Connectors); Stream Processor Applications; Microservices or Event-Driven Architecture; Teams; Tenant.]
13. Subscription Modes
Different subscription modes have different semantics:
● Exclusive/Failover - guaranteed order, single active consumer
● Shared - multiple active consumers, no order
● Key_Shared - multiple active consumers, order for given key
[Figure: Producer 1 and Producer 2 publish to a Pulsar topic that is read through four subscriptions.
Subscription A (Exclusive): Consumer A.
Subscription B (Failover): Consumer B-1 and Consumer B-2; B-2 takes over in case of failure in Consumer B-1.
Subscription C (Shared): Consumer C-1 and Consumer C-2; one receives <K1,V10>, <K2,V21>, <K1,V12> and the other <K2,V20>, <K1,V11>, <K2,V22>.
Subscription D (Key_Shared): Consumer D-1 and Consumer D-2; all K1 messages (<K1,V10>, <K1,V11>, <K1,V12>) go to one consumer and all K2 messages (<K2,V20>, <K2,V21>, <K2,V22>) go to the other.]
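The ordering semantics of the subscription modes can be sketched with a small toy model. This is not Pulsar client code: the `dispatch` function, the consumer names, and the CRC32 hash are assumptions made for illustration. A real broker also tracks consumer permits and, for Key_Shared, uses its own key-hash ranges.

```python
import zlib
from collections import defaultdict


def dispatch(messages, consumers, mode):
    """Toy model of one subscription: return {consumer: [(key, value), ...]}.

    Hypothetical helper for illustration only; it ignores acknowledgements,
    consumer permits, and failures.
    """
    received = defaultdict(list)
    for index, (key, value) in enumerate(messages):
        if mode in ("exclusive", "failover"):
            # Single active consumer: everything arrives in publish order.
            target = consumers[0]
        elif mode == "shared":
            # Round-robin across consumers: no ordering guarantee per key.
            target = consumers[index % len(consumers)]
        elif mode == "key_shared":
            # Same key always maps to the same consumer, so per-key order holds.
            target = consumers[zlib.crc32(key.encode("utf-8")) % len(consumers)]
        else:
            raise ValueError(f"unknown subscription mode: {mode}")
        received[target].append((key, value))
    return dict(received)


MESSAGES = [("K1", 10), ("K1", 11), ("K2", 20), ("K1", 12), ("K2", 21), ("K2", 22)]

exclusive = dispatch(MESSAGES, ["A"], "exclusive")
shared = dispatch(MESSAGES, ["C-1", "C-2"], "shared")
key_shared = dispatch(MESSAGES, ["D-1", "D-2"], "key_shared")
```

In this model the Shared consumers each see a mix of keys, while under Key_Shared every message for a given key lands on one consumer in publish order.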
14. Unified Messaging Model
Streaming and Messaging
[Figure: Producer 1 and Producer 2 publish messages m0 to m4 to a Pulsar topic/partition, consumed through four subscriptions.
Subscription A (Exclusive): Consumer A-0 and Consumer A-1; messages m0 to m4.
Subscription B (Failover): Consumer B-0 and Consumer B-1; B-1 takes over in case of failure in Consumer B-0.
Subscription C (Shared): Consumer C-1, Consumer C-2 and Consumer C-3 share messages m0 to m4.
Subscription D (Key_Shared): Consumer D-1, Consumer D-2 and Consumer D-3; messages are grouped by key: <k1,v0>, <k1,v4>; <k2,v1>, <k2,v3>; <k3,v2>.]
21. Running a Pulsar SQL Query
● To run a query, you need to start Pulsar SQL with:
$ pulsar sql
● All queries must:
○ Be terminated with a ;
○ Use single quotes (') for strings
● If you run a query with many results, Pulsar SQL will show a paged list
○ Scroll through results with the up and down arrows or page up and page down keys
○ Exit out by typing q
● Queries can also be run using Presto's/Trino's REST API
○ Query results are returned as JSON
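As a rough sketch of the REST route, the Python below posts a statement and follows result pages until none remain. The host and port (`http://localhost:8081`), the `X-Presto-User` header, the `/v1/statement` path, and the `nextUri` and `data` response fields are assumptions based on the Presto-style REST protocol; the function names are hypothetical. Check them against your deployment, since newer Trino versions use `X-Trino-User`.

```python
import json
import urllib.request


def build_statement_request(sql, host="http://localhost:8081", user="test-user"):
    """Build the POST request that submits one SQL statement (assumed endpoint)."""
    return urllib.request.Request(
        url=f"{host}/v1/statement",
        data=sql.encode("utf-8"),
        headers={"X-Presto-User": user},
        method="POST",
    )


def collect_rows(first_page, fetch_page):
    """Gather rows from the first JSON page and every page reachable via nextUri."""
    rows = []
    page = first_page
    while True:
        rows.extend(page.get("data", []))
        next_uri = page.get("nextUri")
        if not next_uri:
            return rows
        page = fetch_page(next_uri)


def run_query(sql, **request_options):
    """Submit a query to a running SQL worker and return all result rows."""

    def fetch(request_or_url):
        with urllib.request.urlopen(request_or_url) as response:
            return json.load(response)

    first_page = fetch(build_statement_request(sql, **request_options))
    return collect_rows(first_page, fetch)
```

With a SQL worker running, `run_query('select * from pulsar."public/default"."weather"')` would return the rows as Python lists. The paging logic is kept separate from the network call so it can be exercised without a cluster.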
22. Viewing Topics with Pulsar SQL
● Show available namespaces
SHOW schemas IN pulsar;
● Show topics in a namespace
SHOW tables IN pulsar."public/default";
● Show schema in a topic
SHOW columns IN pulsar."public/default".mytopic;
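The naming pattern in these statements (catalog, then a quoted "tenant/namespace" schema, then the topic as a table) can be captured in a small helper. This is a minimal sketch; `quote_identifier` and `pulsar_table` are hypothetical names, not part of any Pulsar library.

```python
def quote_identifier(name):
    """Wrap a name in double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def pulsar_table(tenant, namespace, topic, catalog="pulsar"):
    """Build the fully qualified table name for a topic.

    The schema is "tenant/namespace"; it must be quoted because of the slash.
    """
    schema = f"{tenant}/{namespace}"
    return f"{catalog}.{quote_identifier(schema)}.{quote_identifier(topic)}"


table = pulsar_table("public", "default", "weather")
query = f"SELECT * FROM {table} LIMIT 10;"
```

Quoting the topic name as well is harmless for simple names and is needed for names that contain dashes or other special characters.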
23. Supported SQL Syntax
SELECT card, suit FROM cards;
SELECT * FROM cards WHERE suit = 'Spade';
SELECT * FROM cards WHERE card LIKE '1%';
SELECT * FROM cards WHERE suit = 'Spade' AND card = '1';
SELECT * FROM cards LIMIT 10;
SELECT * FROM cards WHERE suit = 'Spade' LIMIT 10;
SELECT suit, COUNT(card) FROM cards GROUP BY suit;
SELECT suit, card FROM cards ORDER BY suit, card;
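To see what queries of this shape return without a Pulsar cluster, the sketch below runs a few of them against an in-memory SQLite table. SQLite is only a stand-in here, not Pulsar SQL, and the `cards` rows (four suits, ranks 1 to 10 plus J, Q, K stored as text) are invented sample data.

```python
import sqlite3


def build_cards_db():
    """Create an in-memory SQLite table shaped like the cards topic (sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cards (card TEXT, suit TEXT)")
    suits = ["Spade", "Heart", "Diamond", "Club"]
    ranks = [str(n) for n in range(1, 11)] + ["J", "Q", "K"]
    conn.executemany(
        "INSERT INTO cards VALUES (?, ?)",
        [(rank, suit) for suit in suits for rank in ranks],
    )
    return conn


conn = build_cards_db()

# String literals use single quotes, as Pulsar SQL requires.
spades = conn.execute("SELECT * FROM cards WHERE suit = 'Spade'").fetchall()
ones = conn.execute("SELECT * FROM cards WHERE card LIKE '1%'").fetchall()
counts = conn.execute("SELECT suit, COUNT(card) FROM cards GROUP BY suit").fetchall()
limited = conn.execute("SELECT * FROM cards WHERE suit = 'Spade' LIMIT 10").fetchall()
```

Note that `LIKE '1%'` matches both `1` and `10` because the column holds text.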
24. Defining Schemas
To execute a query, Pulsar SQL needs to know the schema.
● Schemas are accessible from the Broker and stored in BookKeeper.
● Pulsar SQL needs to know:
○ Name of the column
○ Type of the column
○ Nullability of the column
● Pulsar SQL currently supports Avro and JSON for automatic schema
detection.
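The three pieces of column information listed above can be read off an Avro-style record schema. The sketch below is an illustration under assumptions: the `CARD_SCHEMA` record and the `describe_columns` helper are hypothetical, only primitive field types are handled, and a union containing "null" is taken to mean a nullable column.

```python
CARD_SCHEMA = {
    "type": "record",
    "name": "Card",
    "fields": [
        {"name": "card", "type": "string"},
        # A union with "null" is the usual Avro way to mark an optional field.
        {"name": "suit", "type": ["null", "string"], "default": None},
    ],
}


def describe_columns(avro_schema):
    """Return (name, type, nullable) for each field of a record schema."""
    columns = []
    for field in avro_schema["fields"]:
        field_type = field["type"]
        if isinstance(field_type, list):
            nullable = "null" in field_type
            non_null = [t for t in field_type if t != "null"]
            # Unions of several non-null types are reported as "union".
            type_name = non_null[0] if len(non_null) == 1 else "union"
        else:
            nullable = False
            type_name = field_type
        columns.append((field["name"], type_name, nullable))
    return columns


columns = describe_columns(CARD_SCHEMA)
```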
25. Use cases for Pulsar SQL
● Pulsar SQL is a useful tool for answering questions about data in your
streams, such as basic analytics or searching for specific data.
● Pulsar SQL is not intended for high throughput queries or for running
“continuous” queries that update as new records are added.
https://pulsar.apache.org/docs/en/sql-rest-api/
28. A cloud-native, real-time messaging and streaming platform to support multi-cloud and hybrid cloud strategies.
● Powered by Pulsar
● Built for Containers
● Flink SQL
● Cloud Native
31. USE CASE
IoT Ingestion: High-volume streaming sources, sensors, multiple message formats, diverse protocols and multi-vendor devices create data ingestion challenges.
Other Sources: Transit data, news, Twitter, status feeds, REST data, stock data and more.
41. Let’s Keep in Touch!
Tim Spann
Developer Advocate
@PaaSDev
https://www.linkedin.com/in/timothyspann
https://github.com/tspannhw
42. Connect with the Community & Stay Up-To-Date
● Join the Pulsar Slack channel - Apache-Pulsar.slack.com
● Follow @streamnativeio and @apache_pulsar on Twitter
● Subscribe to Monthly Pulsar Newsletter for major news, events, project updates, and resources in the Pulsar community