SlideShare a Scribd company logo
1 of 114
Download to read offline
© 2020 Ververica
1
© 2020 Ververica
2
● Caito Scherr
Introduction
© 2020 Ververica
3
● Caito Scherr
● Developer Advocate
Introduction
© 2020 Ververica
4
● Caito Scherr
● Developer Advocate
● Ververica, GmbH
Introduction
© 2020 Ververica
5
● Caito Scherr
● Developer Advocate
● Ververica, GmbH
Introduction
© 2020 Ververica
6
● Caito Scherr
● Developer Advocate
● Ververica, GmbH
● Portland, OR, USA
Introduction
© 2020 Ververica
7
Introduction
Introduction
© 2020 Ververica
8
● Credit: Marta Paes
Introduction
© 2020 Ververica
9
● Why Pulsar + Flink?
● Pulsar + Flink today
● Where SQL comes in
Agenda
© 2020 Ververica
10
● Why Pulsar + Flink?
● Pulsar + Flink today
● Where SQL comes in
Agenda
© 2020 Ververica
11
● Why Pulsar + Flink?
● Pulsar + Flink today
● Where SQL comes in
Agenda
© 2020 Ververica
12
Why Flink + Pulsar?
© 2020 Ververica
13
Flink + Pulsar Today
“Stream as a unified view on data”
“Batch as a special case of streaming”
© 2020 Ververica
14
Flink + Pulsar Today
Pulsar: Unified Storage
Unified Storage
(Segments / Pub/Sub)
“Stream as a unified view on data”
● Pub/Sub messaging layer (Streaming)
● Durable storage layer (Batch)
© 2020 Ververica
15
Why Flink + Pulsar?
Flink: Unified Processing Engine
Unified Processing Engine
(Batch / Streaming)
“Batch as a special case of streaming”
now
bounded query
unbounded query
past future
bounded query
start of the stream
unbounded query
● Reuse code and logic across batch and stream processing
● Ensure consistent semantics between processing modes
● Simplify operations
● Power applications mixing historic and real-time data
© 2020 Ververica
16
A Unified Data Stack
Unified Processing Engine
(Batch / Streaming)
Unified Storage
(Segments / Pub/Sub)
Why Flink + Pulsar?
© 2020 Ververica
17
Why Flink + Pulsar?
Flink is broad
Flink Runtime
Stateful Computations over Data Streams
Stateful
Stream Processing
Streams, State, Time
Event-Driven
Applications
Stateful Functions
Streaming
Analytics & ML
SQL, PyFlink, Tables
© 2020 Ververica
18
Why Flink + Pulsar?
Flink is broad
Flink Runtime
Stateful Computations over Data Streams
Stateful
Stream Processing
Streams, State, Time
Event-Driven
Applications
Stateful Functions
Streaming
Analytics & ML
SQL, PyFlink, Tables
© 2020 Ververica
19
Why Flink + Pulsar?
Use Cases
● Focus on business logic, not implementation
● Mixed workloads (batch and streaming)
● Maximize developer speed and autonomy
More high-level or domain-specific use cases can be modeled with SQL/Python and dynamic tables.
Examples
ML Feature Generation
Unified Online/Offline Model Training E2E Streaming Analytics Pipelines
© 2020 Ververica
20
Where SQL Fits
Flink SQL
SELECT user_id, COUNT(url) AS cnt
FROM clicks
GROUP BY user_id;
This is standard SQL (ANSI SQL)
“Everyone knows SQL, right?”
© 2020 Ververica
21
Where SQL Fits
Flink SQL
SELECT user_id, COUNT(url) AS cnt
FROM clicks
GROUP BY user_id;
This is standard SQL (ANSI SQL)
“Everyone knows SQL, right?”
also Flink SQL
© 2020 Ververica
22
Where SQL Fits
A Regular SQL Engine
user cnt
Mary 2
Bob 1
SELECT user_id,
COUNT(url) AS cnt
FROM clicks
GROUP BY user_id;
Take a snapshot when the
query starts
A final result is
produced
A row that was added after the query
was started is not considered
user cTime url
Mary 12:00:00 https://…
Bob 12:00:00 https://…
Mary 12:00:02 https://…
Liz 12:00:03 https://…
The query
terminates
© 2020 Ververica
23
Where SQL Fits
A Streaming SQL Engine
user cTime url
user cnt
SELECT user_id,
COUNT(url) AS cnt
FROM clicks
GROUP BY user_id;
Mary 12:00:00 https://…
Bob 12:00:00 https://…
Mary 12:00:02 https://…
Liz 12:00:03 https://…
Bob 1
Liz 1
Mary 1
Mary 2
Ingest all changes as
they happen
Continuously update the
result
The result is identical to the one-time query (at this point)
© 2020 Ververica
24
Where SQL Fits
Flink SQL in a Nutshell
● Standard SQL syntax and semantics (i.e. not a “SQL-flavor”)
● Unified APIs for batch and streaming
● Support for advanced time handling and operations (e.g. CDC, pattern matching)
UDF Support
Python
Java
Scala
Execution
TPC-DS Coverage
Batch
Streaming
+
Formats
Native Connectors
Apache Kafka
Elasticsearch
FileSystems
JDBC HBase
+
Kinesis
Metastore Postgres (JDBC)
Data Catalogs
Debezium
© 2020 Ververica
25
Flink + Pulsar Today
Pulsar Integration Over Time
Flink 1.6+
2018
Streaming Source/Sink Connectors
Table Sink Connector
© 2020 Ververica
26
Flink + Pulsar Today
Pulsar Integration Over Time
Flink 1.6+
2018
Streaming Source/Sink Connectors
Table Sink Connector
Pulsar Schema + Flink Catalog Integration
Table API/SQL as first-class citizens
Exactly-once Source
At-least once Sink
Flink 1.9+
© 2020 Ververica
27
Flink + Pulsar Today
Flink 1.12
Upserts (upsert-pulsar)
DDL Computed Columns, Watermarks and Metadata
End-to-end Exactly-once
Key-shared Subscription Model
Flink 1.6+
2018
Streaming Source/Sink Connectors
Table Sink Connector
Pulsar Integration Over Time
Pulsar Schema + Flink Catalog Integration
Table API/SQL as first-class citizens
Exactly-once Source
At-least once Sink
Flink 1.9+
© 2020 Ververica
28
Flink + Pulsar Today
Flink 1.12
Flink 1.6+
2018
Streaming Source/Sink Connectors
Table Sink Connector
Pulsar Integration Over Time
Pulsar Schema + Flink Catalog Integration
Table API/SQL as first-class citizens
Exactly-once Source
At-least once Sink
Flink 1.13 *
Apr/May’21
Contribution to the Apache Flink repository [FLINK-20726]
Port connector to the new, unified Source API (FLIP-27)
Flink 1.9+
Upserts (upsert-pulsar)
DDL Computed Columns, Watermarks and Metadata
End-to-end Exactly-once
Key-shared Subscription Model
© 2020 Ververica
29
Where SQL Fits
DEMO
1. Use the Twitter Firehose built-in connector to consume tweets about gardening 🌿 into a Pulsar topic (tweets).
© 2020 Ververica
30
Where SQL Fits
DEMO
DEMO
2. Start the Flink SQL Client and use a Pulsar catalog to access the topic directly as a table in Flink.
SQL Client
CREATE CATALOG pulsar WITH (
'type' = 'pulsar',
'service-url' = 'pulsar://pulsar:6650',
'admin-url' = 'http://pulsar:8080',
'format' = 'json'
);
Catalog DDL
© 2020 Ververica
31
Where SQL Fits
DEMO
DEMO
2.1. You can query the tweets topic off-the-bat using a simple SELECT statement — it’s treated as a Flink table!
SQL Client
© 2020 Ververica
32
Where SQL Fits
DEMO
DEMO
2.2. But then you find out that most Firehose events have a null createdTime. What now?
Not cool. 👹
SQL Client
© 2020 Ververica
33
DEMO
3. One way to get a relevant timestamp is to use Pulsar metadata to get the publishTime (i.e. ingestion time).
CREATE TABLE pulsar_tweets (
publishTime TIMESTAMP(3) METADATA,
WATERMARK FOR publishTime AS publishTime - INTERVAL '5' SECOND
) WITH (
'connector' = 'pulsar',
'topic' = 'persistent://public/default/tweets',
'value.format' = 'json',
'service-url' = 'pulsar://pulsar:6650',
'admin-url' = 'http://pulsar:8080',
'scan.startup.mode' = 'earliest-offset'
)
LIKE tweets;
Source Table DDL
Derive schema from the original topic
Define the source connector (Pulsar)
Read and use Pulsar message metadata
Where SQL Fits
© 2020 Ververica
34
Where SQL Fits
DEMO
4. Perform a simple windowed aggregation (count), and insert results into a new pulsar topic (tweets_agg).
CREATE TABLE pulsar_tweets_agg (
tmstmp TIMESTAMP(3),
tweet_cnt BIGINT
) WITH (
'connector'='pulsar',
'topic'='persistent://public/default/tweets_agg',
'value.format'='json',
'service-url'='pulsar://pulsar:6650',
'admin-url'='http://pulsar:8080'
);
Sink Table DDL
INSERT INTO pulsar_tweets_agg
SELECT TUMBLE_START(publishTime, INTERVAL '10' SECOND) AS
wStart,
COUNT(id) AS tweet_cnt
FROM pulsar_tweets
GROUP BY TUMBLE(publishTime, INTERVAL '10' SECOND);
Continuous SQL Query
© 2020 Ververica
35
Where SQL Fits
DEMO
5. We’ll get a count of the # of tweets in windows of 10 seconds (based on event time!).
© 2020 Ververica
36
Where SQL Fits
There’s a lot more to it!
Check out the Flink SQL Cookbook, where we share hands-on examples, patterns, and use cases for Flink SQL.
© 2020 Ververica
● YOW! Conference staff!!
● “Shared Language” photo models:
○ Zack Hobson, CTO
○ Cory Johannsen, Senior Software Engineer
37
Thank You!
@Caito_200_OK
Quoted Contributors & Reviewers:
● Mandy Riso - DevOps Engineer
● Eric Shamow - Engineering Manager
● Mike Hix - Senior Software Engineer
● Ben Ford - Product Manager
● Noçnica Fee - Developer Advocate
● Lucy Wyman - Senior Software Engineer
● Logan Ballard - Site Reliability Engineer
© 2020 Ververica
Resources
● Flink Ahead: What Comes After Batch & Streaming: https://youtu.be/h5OYmy9Yx7Y
● Apache Pulsar as one Storage System for Real Time & Historical Data Analysis:
https://medium.com/streamnative/apache-pulsar-as-one-storage-455222c59017
● Flink Table API & SQL:
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/queries.html#ope
rations
● When Flink & Pulsar Come Together:
https://flink.apache.org/2019/05/03/pulsar-flink.html
● How to Query Pulsar Streams in Flink:
https://flink.apache.org/news/2019/11/25/query-pulsar-streams-using-apache-flink.ht
ml
● What’s New in the Flink/Pulsar Connector:
● https://flink.apache.org/2021/01/07/pulsar-flink-connector-270.html
● Marta’s Demo: https://github.com/morsapaes/flink-sql-pulsar
●
38
@Caito_200_OK
© 2020 Ververica
39
Where SQL Fits
© 2020 Ververica
40
© 2020 Ververica
© 2020 Ververica
42
Before We Start...
© 2020 Ververica
43
© 2020 Ververica
44
The Challenge
© 2020 Ververica
45
The Challenge
© 2020 Ververica
Metrics-Driven Metrics
© 2020 Ververica
47
Metrics-Driven Metrics
“You can measure almost anything,
but you cannot pay attention to
everything”
© 2020 Ververica
48
Metrics-Driven Metrics
“Any situation where people
create their own dashboards
without structure, quickly starts to
look like the cockpit of a 747”
© 2020 Ververica
49
Metrics-Driven Metrics
© 2020 Ververica
50
Metrics-Driven Metrics
© 2020 Ververica
51
Metrics-Driven Metrics
The dashboard should be:
• Meaningful
• Iterative
• Accessible
© 2020 Ververica
52
Metrics-Driven Metrics
The dashboard should reflect
your:
• Roadmap
• Highest Risk
• Most uncertain metrics
© 2020 Ververica
53
Metrics-Driven Metrics
© 2020 Ververica
54
Metrics-Driven Metrics
© 2020 Ververica
55
Metrics-Driven Metrics
© 2020 Ververica
56
Metrics-Driven Metrics
© 2020 Ververica
57
Metrics-Driven Metrics
© 2020 Ververica
58
Metrics-Driven Metrics
Prometheus
DataDog
© 2020 Ververica
59
Metrics-Driven Metrics
© 2020 Ververica
60
Metrics-Driven Metrics
© 2020 Ververica
61
Metrics-Driven Metrics
Highest Risk
© 2020 Ververica
62
Metrics-Driven Metrics
© 2020 Ververica
63
Metrics-Driven Metrics
Most Uncertain
© 2020 Ververica
64
Metrics-Driven Metrics
© 2020 Ververica
65
Metrics-Driven Metrics
Current
Roadmap
Priority
© 2020 Ververica
66
Metrics-Driven Metrics
Priority #2
© 2020 Ververica
67
Metrics-Driven Metrics
Priority #2
Current
Roadmap
Priority
Most Uncertain
Highest Risk
© 2020 Ververica
68
© 2020 Ververica
69
Shared Language
“Even the best metrics driven
development fails easily, when it’s
only built for yourself”
© 2020 Ververica
70
Shared Language
Implementation:
• Identify your impacted groups
• Identify the most effective
tools for each group
• Enable automation
© 2020 Ververica
71
Shared Language
© 2020 Ververica
72
Shared Language
© 2020 Ververica
73
Shared Language
© 2020 Ververica
74
Shared Language
© 2020 Ververica
75
Shared Language
© 2020 Ververica
76
Shared Language
© 2020 Ververica
77
Shared Language
© 2020 Ververica
78
Shared Language
© 2020 Ververica
79
Shared Language
© 2020 Ververica
80
Shared Language
© 2020 Ververica
81
Shared Language
© 2020 Ververica
82
Shared Language
© 2020 Ververica
83
Shared Language
© 2020 Ververica
84
© 2020 Ververica
85
Conclusion
For systems with complex integration points:
• Metrics-Driven Metrics: streamline technical
complexity
• Metrics as a Shared Language: streamline
interpersonal complications
© 2020 Ververica
86
Twitter
Caito_200_OK
Content
https://medium.com/@caito
http://caito-200-ok.com/
Email
Caito@ververica.com
Introduction
© 2020 Ververica
Credits - Images
● Flink logos:
https://wints.github.io/flink-web//community.html
● Data-Driven/Aware/Informed Design:
https://uxdesign.cc/becoming-a-data-aware-designer-1d7614ebc3ed
● Stream processing diagram:
https://www.ververica.com/what-is-stream-processing
● 747 Airplane cockpit:
https://www.reddit.com/r/pics/comments/5vv8qt/the_pilots_seat_and_cockpit_of_a_boein
g_747/
● DataDog Flink dashboard:
https://www.datadoghq.com/blog/monitor-apache-flink-with-datadog/
● All other photos & Images:
Caito Scherr
87
@Caito_200_OK
© 2020 Ververica
Credits
● Basic Data-Driven Development principles
https://www.portable.com.au/reports/principles-of-data-driven-design
● Data-driven design:
https://www.springboard.com/blog/data-driven-design/
● Becoming A Data-Aware Designer, + definition image - Illustration from “Designing
with Data” by King, Churchill, & Tan
https://uxdesign.cc/becoming-a-data-aware-designer-1d7614ebc3ed
● https://digitalprinciples.org/wp-content/uploads/PDD_Principle-BeDataDriven_v2.p
df
● Harvard Business Review - Data Driven Culture:
https://hbr.org/2020/02/10-steps-to-creating-a-data-driven-culture
88
@Caito_200_OK
© 2020 Ververica
© 2020 Ververica
90
© 2020 Ververica
Basic Principles
© 2020 Ververica
92
Basic Principles
Business
Metrics
Data-Driven
Development
Data-Driven
Design
© 2020 Ververica
93
Business
Metrics
Data-Driven
Development
Data-Driven
Design
Basic Principles
© 2020 Ververica
94
Business
Metrics
Data-Driven
Development
Data-Driven
Design
Basic Principles
© 2020 Ververica
95
Business
Metrics
Data-Driven
Development
Data-Driven
Design
Basic Principles
© 2020 Ververica
CONCLUSION
96
Audience Tactic Tools
You/Your Team • Dashboard should represent your priorities & values
• Bring Development + Operations together
Analytics platforms
• Flink UI, Ververica Platform
• Data Dog
• New Relic
• Grafana/Prometheus...
Upstream + Downstream
Teams
• Understand each other’s “normal”
• Clear ownership for integration points
Analytics + chat integrations
• Slack + PagerDuty
• Slack + Jenkins...
Leadership +
Stakeholders
• Business metrics
• Pivot between summary & depth
• Use tools they’re familiar with
Automated, non-engineering spaces
• URL endpoints
• Internal wiki/blog
(with embedded, automated output)
Customers • More manual approach Manual approach
• Your company’s customer-facing teams
© 2020 Ververica
BASIC PRINCIPLES
97
Quantitative Qualitative
@Caito_200_OK
© 2020 Ververica
BASIC PRINCIPLES
98
Quantitative Qualitative
Data-Driven Design
@Caito_200_OK
© 2020 Ververica
BASIC PRINCIPLES
99
Quantitative Qualitative
Data-Driven Design Data-Informed Design
@Caito_200_OK
© 2020 Ververica
BASIC PRINCIPLES
100
Quantitative Qualitative
Data-Driven Design Data-Aware Design
Data-Informed Design
@Caito_200_OK
© 2020 Ververica
BASIC PRINCIPLES
101
@Caito_200_OK
© 2020 Ververica
THE SHARED LANGUAGE OF METRICS
© 2020 Ververica
A SHARED LANGUAGE
103
Even the best data-driven development
fails easily when it’s only built for yourself.
@Caito_200_OK
© 2020 Ververica
Metrics as a Shared Language
104
Engineering Team Most
Technical
Most
Abstract
Up + Downstream Teams
Leadership/Stakeholders
Customers
@Caito_200_OK
© 2020 Ververica
Metrics as a Shared Language
105
Engineering Team Most
Technical
Most
Abstract
Up + Downstream Teams
Leadership/Stakeholders
Customers
@Caito_200_OK
© 2020 Ververica
Metrics as a Shared Language
106
Engineering Team Most
Technical
Most
Abstract
Up + Downstream Teams
Leadership/Stakeholders
Customers
@Caito_200_OK
© 2020 Ververica
A SHARED LANGUAGE
107
@Caito_200_OK
© 2020 Ververica
A SHARED LANGUAGE
108
@Caito_200_OK
© 2020 Ververica
A SHARED LANGUAGE
109
@Caito_200_OK
© 2020 Ververica
Metrics as a Shared Language
110
Engineering Team Most
Technical
Most
Abstract
Up + Downstream Teams
Leadership/Stakeholders
Customers
@Caito_200_OK
© 2020 Ververica
Metrics as a Shared Language
111
Engineering Team Most
Technical
Most
Abstract
Up + Downstream Teams
Leadership/Stakeholders
Customers
@Caito_200_OK
© 2020 Ververica
Metrics as a Shared Language
112
Engineering Team Most
Technical
Most
Abstract
Up + Downstream Teams
Leadership/Stakeholders
Customers
@Caito_200_OK
© 2020 Ververica
Summary
113
@Caito_200_OK
Concept Principles Implementation Benefit
Metrics-Driven-
Metrics cycle
Meaningful,
iterative,
accessible
High risk,
most uncertain,
current roadmap
priority
Incident
response,
Metrics as a
shared
language
Use tools familiar
to your audience
Small modular
units
Automate people
& process
challenges
© 2020 Ververica
114
Before We Start...
Flink’s REST API integration
http://hostname:8081/jobmanager/metrics
/taskmanagers/<taskmanagerid>/metrics
/taskmanagers/metrics
/jobs/metrics?jobs=D,E,F
...

More Related Content

What's hot

Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021
Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021
Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021
StreamNative
 
Big data conference europe real-time streaming in any and all clouds, hybri...
Big data conference europe   real-time streaming in any and all clouds, hybri...Big data conference europe   real-time streaming in any and all clouds, hybri...
Big data conference europe real-time streaming in any and all clouds, hybri...
Timothy Spann
 
ApacheCon 2021 Apache Deep Learning 302
ApacheCon 2021   Apache Deep Learning 302ApacheCon 2021   Apache Deep Learning 302
ApacheCon 2021 Apache Deep Learning 302
Timothy Spann
 

What's hot (20)

Python web conference 2022 apache pulsar development 101 with python (f li-...
Python web conference 2022   apache pulsar development 101 with python (f li-...Python web conference 2022   apache pulsar development 101 with python (f li-...
Python web conference 2022 apache pulsar development 101 with python (f li-...
 
Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021
Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021
Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021
 
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
PortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and FriendsPortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
 
Big data conference europe real-time streaming in any and all clouds, hybri...
Big data conference europe   real-time streaming in any and all clouds, hybri...Big data conference europe   real-time streaming in any and all clouds, hybri...
Big data conference europe real-time streaming in any and all clouds, hybri...
 
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka...
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka...Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka...
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka...
 
Automation + dev ops summit hail hydrate! from stream to lake
Automation + dev ops summit   hail hydrate! from stream to lakeAutomation + dev ops summit   hail hydrate! from stream to lake
Automation + dev ops summit hail hydrate! from stream to lake
 
Apache Pulsar: Why Unified Messaging and Streaming Is the Future - Pulsar Sum...
Apache Pulsar: Why Unified Messaging and Streaming Is the Future - Pulsar Sum...Apache Pulsar: Why Unified Messaging and Streaming Is the Future - Pulsar Sum...
Apache Pulsar: Why Unified Messaging and Streaming Is the Future - Pulsar Sum...
 
Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azure
 
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
 
How Tencent Applies Apache Pulsar to Apache InLong —— A Streaming Data Integr...
How Tencent Applies Apache Pulsar to Apache InLong —— A Streaming Data Integr...How Tencent Applies Apache Pulsar to Apache InLong —— A Streaming Data Integr...
How Tencent Applies Apache Pulsar to Apache InLong —— A Streaming Data Integr...
 
Building event streaming pipelines using Apache Pulsar
Building event streaming pipelines using Apache PulsarBuilding event streaming pipelines using Apache Pulsar
Building event streaming pipelines using Apache Pulsar
 
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, Nutanix
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, NutanixGuaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, Nutanix
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, Nutanix
 
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
Pulsar summit asia 2021   apache pulsar with mqtt for edge computingPulsar summit asia 2021   apache pulsar with mqtt for edge computing
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
 
ApacheCon 2021 Apache Deep Learning 302
ApacheCon 2021   Apache Deep Learning 302ApacheCon 2021   Apache Deep Learning 302
ApacheCon 2021 Apache Deep Learning 302
 
Using the flipn stack for edge ai (flink, nifi, pulsar)
Using the flipn stack for edge ai (flink, nifi, pulsar)Using the flipn stack for edge ai (flink, nifi, pulsar)
Using the flipn stack for edge ai (flink, nifi, pulsar)
 
Axway amplify api management platform
Axway amplify api management platformAxway amplify api management platform
Axway amplify api management platform
 
Architecting for Scale
Architecting for ScaleArchitecting for Scale
Architecting for Scale
 
Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - P...
Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - P...Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - P...
Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - P...
 
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
 

Similar to Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021

Similar to Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021 (20)

Select Star: Unified Batch & Streaming with Flink SQL & Pulsar
Select Star: Unified Batch & Streaming with Flink SQL & PulsarSelect Star: Unified Batch & Streaming with Flink SQL & Pulsar
Select Star: Unified Batch & Streaming with Flink SQL & Pulsar
 
Don't Cross the Streams! (or do, we got you)
Don't Cross the Streams! (or do, we got you)Don't Cross the Streams! (or do, we got you)
Don't Cross the Streams! (or do, we got you)
 
ApacheCon 2020 - Flink SQL in 2020: Time to show off!
ApacheCon 2020 - Flink SQL in 2020: Time to show off!ApacheCon 2020 - Flink SQL in 2020: Time to show off!
ApacheCon 2020 - Flink SQL in 2020: Time to show off!
 
What's new for Apache Flink's Table & SQL APIs?
What's new for Apache Flink's Table & SQL APIs?What's new for Apache Flink's Table & SQL APIs?
What's new for Apache Flink's Table & SQL APIs?
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
 
PartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionPartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC Solution
 
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
 
Flink sql for continuous sql etl apps & Apache NiFi devops
Flink sql for continuous sql etl apps & Apache NiFi devopsFlink sql for continuous sql etl apps & Apache NiFi devops
Flink sql for continuous sql etl apps & Apache NiFi devops
 
Better, Faster, Stronger Streaming: Your First Dive into Flink SQL
Better, Faster, Stronger Streaming: Your First Dive into Flink SQLBetter, Faster, Stronger Streaming: Your First Dive into Flink SQL
Better, Faster, Stronger Streaming: Your First Dive into Flink SQL
 
Real time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseReal time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and Couchbase
 
Google Cloud Dataflow
Google Cloud DataflowGoogle Cloud Dataflow
Google Cloud Dataflow
 
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
 
AMIS Oracle OpenWorld en Code One Review 2018 - Pillar 2: Custom Application ...
AMIS Oracle OpenWorld en Code One Review 2018 - Pillar 2: Custom Application ...AMIS Oracle OpenWorld en Code One Review 2018 - Pillar 2: Custom Application ...
AMIS Oracle OpenWorld en Code One Review 2018 - Pillar 2: Custom Application ...
 
AMIS Oracle OpenWorld & CodeOne Review - Pillar 2 - Custom Application Develo...
AMIS Oracle OpenWorld & CodeOne Review - Pillar 2 - Custom Application Develo...AMIS Oracle OpenWorld & CodeOne Review - Pillar 2 - Custom Application Develo...
AMIS Oracle OpenWorld & CodeOne Review - Pillar 2 - Custom Application Develo...
 
Presto talk @ Global AI conference 2018 Boston
Presto talk @ Global AI conference 2018 BostonPresto talk @ Global AI conference 2018 Boston
Presto talk @ Global AI conference 2018 Boston
 
What's New in Oracle SQL Developer for 2018
What's New in Oracle SQL Developer for 2018What's New in Oracle SQL Developer for 2018
What's New in Oracle SQL Developer for 2018
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
 
Overview SQL Server 2019
Overview SQL Server 2019Overview SQL Server 2019
Overview SQL Server 2019
 
Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the CloudInteractive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud
 
Presto: Query Anything - Data Engineer’s perspective
Presto: Query Anything - Data Engineer’s perspectivePresto: Query Anything - Data Engineer’s perspective
Presto: Query Anything - Data Engineer’s perspective
 

More from StreamNative

Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
StreamNative
 

More from StreamNative (20)

Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
 
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
 
Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...
 
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
 
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
 
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
 
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
 
Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
 
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
 
Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022
 
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
 
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
 
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
 
Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022
 
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
 
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021
 
Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...
Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...
Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021

  • 2. © 2020 Ververica 2 ● Caito Scherr Introduction
  • 3. © 2020 Ververica 3 ● Caito Scherr ● Developer Advocate Introduction
  • 4. © 2020 Ververica 4 ● Caito Scherr ● Developer Advocate ● Ververica, GmbH Introduction
  • 5. © 2020 Ververica 5 ● Caito Scherr ● Developer Advocate ● Ververica, GmbH Introduction
  • 6. © 2020 Ververica 6 ● Caito Scherr ● Developer Advocate ● Ververica, GmbH ● Portland, OR, USA Introduction
  • 8. © 2020 Ververica 8 ● Credit: Marta Paes Introduction
  • 9. © 2020 Ververica 9 ● Why Pulsar + Flink? ● Pulsar + Flink today ● Where SQL comes in Agenda
  • 10. © 2020 Ververica 10 ● Why Pulsar + Flink? ● Pulsar + Flink today ● Where SQL comes in Agenda
  • 11. © 2020 Ververica 11 ● Why Pulsar + Flink? ● Pulsar + Flink today ● Where SQL comes in Agenda
  • 12. © 2020 Ververica 12 Why Flink + Pulsar?
  • 13. © 2020 Ververica 13 Flink + Pulsar Today “Stream as a unified view on data” “Batch as a special case of streaming”
  • 14. © 2020 Ververica 14 Flink + Pulsar Today Pulsar: Unified Storage Unified Storage (Segments / Pub/Sub) “Stream as a unified view on data” ● Pub/Sub messaging layer (Streaming) ● Durable storage layer (Batch)
  • 15. © 2020 Ververica 15 Why Flink + Pulsar? Flink: Unified Processing Engine Unified Processing Engine (Batch / Streaming) “Batch as a special case of streaming” now bounded query unbounded query past future bounded query start of the stream unbounded query ● Reuse code and logic across batch and stream processing ● Ensure consistent semantics between processing modes ● Simplify operations ● Power applications mixing historic and real-time data
  • 16. © 2020 Ververica 16 A Unified Data Stack Unified Processing Engine (Batch / Streaming) Unified Storage (Segments / Pub/Sub) Why Flink + Pulsar?
  • 17. © 2020 Ververica 17 Why Flink + Pulsar? Flink is broad Flink Runtime Stateful Computations over Data Streams Stateful Stream Processing Streams, State, Time Event-Driven Applications Stateful Functions Streaming Analytics & ML SQL, PyFlink, Tables
  • 18. © 2020 Ververica 18 Why Flink + Pulsar? Flink is broad Flink Runtime Stateful Computations over Data Streams Stateful Stream Processing Streams, State, Time Event-Driven Applications Stateful Functions Streaming Analytics & ML SQL, PyFlink, Tables
  • 19. © 2020 Ververica 19 Why Flink + Pulsar? Use Cases ● Focus on business logic, not implementation ● Mixed workloads (batch and streaming) ● Maximize developer speed and autonomy More high-level or domain-specific use cases can be modeled with SQL/Python and dynamic tables. Examples ML Feature Generation Unified Online/Offline Model Training E2E Streaming Analytics Pipelines
  • 20. © 2020 Ververica 20 Where SQL Fits Flink SQL SELECT user_id, COUNT(url) AS cnt FROM clicks GROUP BY user_id; This is standard SQL (ANSI SQL) “Everyone knows SQL, right?”
  • 21. © 2020 Ververica 21 Where SQL Fits Flink SQL SELECT user_id, COUNT(url) AS cnt FROM clicks GROUP BY user_id; This is standard SQL (ANSI SQL) “Everyone knows SQL, right?” also Flink SQL
  • 22. © 2020 Ververica 22 Where SQL Fits A Regular SQL Engine user cnt Mary 2 Bob 1 SELECT user_id, COUNT(url) AS cnt FROM clicks GROUP BY user_id; Take a snapshot when the query starts A final result is produced A row that was added after the query was started is not considered user cTime url Mary 12:00:00 https://… Bob 12:00:00 https://… Mary 12:00:02 https://… Liz 12:00:03 https://… The query terminates
  • 23. © 2020 Ververica 23 Where SQL Fits A Streaming SQL Engine user cTime url user cnt SELECT user_id, COUNT(url) AS cnt FROM clicks GROUP BY user_id; Mary 12:00:00 https://… Bob 12:00:00 https://… Mary 12:00:02 https://… Liz 12:00:03 https://… Bob 1 Liz 1 Mary 1 Mary 2 Ingest all changes as they happen Continuously update the result The result is identical to the one-time query (at this point)
  • 24. © 2020 Ververica 24 Where SQL Fits Flink SQL in a Nutshell ● Standard SQL syntax and semantics (i.e. not a “SQL-flavor”) ● Unified APIs for batch and streaming ● Support for advanced time handling and operations (e.g. CDC, pattern matching) UDF Support Python Java Scala Execution TPC-DS Coverage Batch Streaming + Formats Native Connectors Apache Kafka Elasticsearch FileSystems JDBC HBase + Kinesis Metastore Postgres (JDBC) Data Catalogs Debezium
  • 25. © 2020 Ververica 25 Flink + Pulsar Today Pulsar Integration Over Time Flink 1.6+ 2018 Streaming Source/Sink Connectors Table Sink Connector
  • 26. © 2020 Ververica 26 Flink + Pulsar Today Pulsar Integration Over Time Flink 1.6+ 2018 Streaming Source/Sink Connectors Table Sink Connector Pulsar Schema + Flink Catalog Integration Table API/SQL as first-class citizens Exactly-once Source At-least once Sink Flink 1.9+
  • 27. © 2020 Ververica 27 Flink + Pulsar Today Flink 1.12 Upserts (upsert-pulsar) DDL Computed Columns, Watermarks and Metadata End-to-end Exactly-once Key-shared Subscription Model Flink 1.6+ 2018 Streaming Source/Sink Connectors Table Sink Connector Pulsar Integration Over Time Pulsar Schema + Flink Catalog Integration Table API/SQL as first-class citizens Exactly-once Source At-least once Sink Flink 1.9+
  • 28. © 2020 Ververica 28 Flink + Pulsar Today Flink 1.12 Flink 1.6+ 2018 Streaming Source/Sink Connectors Table Sink Connector Pulsar Integration Over Time Pulsar Schema + Flink Catalog Integration Table API/SQL as first-class citizens Exactly-once Source At-least once Sink Flink 1.13 * Apr/May’21 Contribution to the Apache Flink repository [FLINK-20726] Port connector to the new, unified Source API (FLIP-27) Flink 1.9+ Upserts (upsert-pulsar) DDL Computed Columns, Watermarks and Metadata End-to-end Exactly-once Key-shared Subscription Model
  • 29. © 2020 Ververica 29 Where SQL Fits DEMO 1. Use the Twitter Firehose built-in connector to consume tweets about gardening 🌿 into a Pulsar topic (tweets).
  • 30. © 2020 Ververica 30 Where SQL Fits DEMO DEMO 2. Start the Flink SQL Client and use a Pulsar catalog to access the topic directly as a table in Flink. SQL Client CREATE CATALOG pulsar WITH ( 'type' = 'pulsar', 'service-url' = 'pulsar://pulsar:6650', 'admin-url' = 'http://pulsar:8080', 'format' = 'json' ); Catalog DDL
  • 31. © 2020 Ververica 31 Where SQL Fits DEMO DEMO 2.1. You can query the tweets topic off-the-bat using a simple SELECT statement — it’s treated as a Flink table! SQL Client
  • 32. © 2020 Ververica 32 Where SQL Fits DEMO DEMO 2.2. But then you find out that most Firehose events have a null createdTime. What now? Not cool. 👹 SQL Client
  • 33. © 2020 Ververica 33 DEMO 3. One way to get a relevant timestamp is to use Pulsar metadata to get the publishTime (i.e. ingestion time). CREATE TABLE pulsar_tweets ( publishTime TIMESTAMP(3) METADATA, WATERMARK FOR publishTime AS publishTime - INTERVAL '5' SECOND ) WITH ( 'connector' = 'pulsar', 'topic' = 'persistent://public/default/tweets', 'value.format' = 'json', 'service-url' = 'pulsar://pulsar:6650', 'admin-url' = 'http://pulsar:8080', 'scan.startup.mode' = 'earliest-offset' ) LIKE tweets; Source Table DDL Derive schema from the original topic Define the source connector (Pulsar) Read and use Pulsar message metadata Where SQL Fits
  • 34. © 2020 Ververica 34 Where SQL Fits DEMO 4. Perform a simple windowed aggregation (count), and insert results into a new pulsar topic (tweets_agg). CREATE TABLE pulsar_tweets_agg ( tmstmp TIMESTAMP(3), tweet_cnt BIGINT ) WITH ( 'connector'='pulsar', 'topic'='persistent://public/default/tweets_agg', 'value.format'='json', 'service-url'='pulsar://pulsar:6650', 'admin-url'='http://pulsar:8080' ); Sink Table DDL INSERT INTO pulsar_tweets_agg SELECT TUMBLE_START(publishTime, INTERVAL '10' SECOND) AS wStart, COUNT(id) AS tweet_cnt FROM pulsar_tweets GROUP BY TUMBLE(publishTime, INTERVAL '10' SECOND); Continuous SQL Query
  • 35. © 2020 Ververica 35 Where SQL Fits DEMO 5. We’ll get a count of the # of tweets in windows of 10 seconds (based on event time!).
  • 36. © 2020 Ververica 36 Where SQL Fits There’s a lot more to it! Check out the Flink SQL Cookbook, where we share hands-on examples, patterns, and use cases for Flink SQL.
  • 37. © 2020 Ververica ● YOW! Conference staff!! ● “Shared Language” photo models: ○ Zack Hobson, CTO ○ Cory Johannsen, Senior Software Engineer 37 Thank You! @Caito_200_OK Quoted Contributors & Reviewers: ● Mandy Riso - DevOps Engineer ● Eric Shamow - Engineering Manager ● Mike Hix - Senior Software Engineer ● Ben Ford - Product Manager ● Noçnica Fee - Developer Advocate ● Lucy Wyman - Senior Software Engineer ● Logan Ballard - Site Reliability Engineer
  • 38. © 2020 Ververica Resources ● Flink Ahead: What Comes After Batch & Streaming: https://youtu.be/h5OYmy9Yx7Y ● Apache Pulsar as one Storage System for Real Time & Historical Data Analysis: https://medium.com/streamnative/apache-pulsar-as-one-storage-455222c59017 ● Flink Table API & SQL: https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/queries.html#ope rations ● When Flink & Pulsar Come Together: https://flink.apache.org/2019/05/03/pulsar-flink.html ● How to Query Pulsar Streams in Flink: https://flink.apache.org/news/2019/11/25/query-pulsar-streams-using-apache-flink.ht ml ● What’s New in the Flink/Pulsar Connector: ● https://flink.apache.org/2021/01/07/pulsar-flink-connector-270.html ● Marta’s Demo: https://github.com/morsapaes/flink-sql-pulsar ● 38 @Caito_200_OK
  • 47. © 2020 Ververica 47 Metrics-Driven Metrics “You can measure almost anything, but you cannot pay attention to everything”
  • 48. © 2020 Ververica 48 Metrics-Driven Metrics “Any situation where people create their own dashboards without structure, quickly starts to look like the cockpit of a 747”
  • 51. © 2020 Ververica 51 Metrics-Driven Metrics The dashboard should be: • Meaningful • Iterative • Accessible
  • 52. © 2020 Ververica 52 Metrics-Driven Metrics The dashboard should reflect your: • Roadmap • Highest Risk • Most uncertain metrics
  • 58. © 2020 Ververica 58 Metrics-Driven Metrics Prometheus DataDog
  • 61. © 2020 Ververica 61 Metrics-Driven Metrics Highest Risk
  • 63. © 2020 Ververica 63 Metrics-Driven Metrics Most Uncertain
  • 65. © 2020 Ververica 65 Metrics-Driven Metrics Current Roadmap Priority
  • 66. © 2020 Ververica 66 Metrics-Driven Metrics Priority #2
  • 67. © 2020 Ververica 67 Metrics-Driven Metrics Priority #2 Current Roadmap Priority Most Uncertain Highest Risk
  • 69. © 2020 Ververica 69 Shared Language “Even the best metrics driven development fails easily, when it’s only built for yourself”
  • 70. © 2020 Ververica 70 Shared Language Implementation: • Identify your impacted groups • Identify the most effective tools for each group • Enable automation
  • 85. © 2020 Ververica 85 Conclusion For systems with complex integration points: • Metrics-Driven Metrics: streamline technical complexity • Metrics as a Shared Language: streamline interpersonal complications
  • 87. © 2020 Ververica Credits - Images ● Flink logos: https://wints.github.io/flink-web//community.html ● Data-Driven/Aware/Informed Design: https://uxdesign.cc/becoming-a-data-aware-designer-1d7614ebc3ed ● Stream processing diagram: https://www.ververica.com/what-is-stream-processing ● 747 Airplane cockpit: https://www.reddit.com/r/pics/comments/5vv8qt/the_pilots_seat_and_cockpit_of_a_boein g_747/ ● DataDog Flink dashboard: https://www.datadoghq.com/blog/monitor-apache-flink-with-datadog/ ● All other photos & Images: Caito Scherr 87 @Caito_200_OK
  • 88. © 2020 Ververica Credits ● Basic Data-Driven Development principles https://www.portable.com.au/reports/principles-of-data-driven-design ● Data-driven design: https://www.springboard.com/blog/data-driven-design/ ● Becoming A Data-Aware Designer, + definition image - Illustration from “Designing with Data” by King, Churchill, & Tan https://uxdesign.cc/becoming-a-data-aware-designer-1d7614ebc3ed ● https://digitalprinciples.org/wp-content/uploads/PDD_Principle-BeDataDriven_v2.p df ● Harvard Business Review - Data Driven Culture: https://hbr.org/2020/02/10-steps-to-creating-a-data-driven-culture 88 @Caito_200_OK
  • 92. © 2020 Ververica 92 Basic Principles Business Metrics Data-Driven Development Data-Driven Design
  • 96. © 2020 Ververica CONCLUSION 96 Audience Tactic Tools You/Your Team • Dashboard should represent your priorities & values • Bring Development + Operations together Analytics platforms • Flink UI, Ververica Platform • Data Dog • New Relic • Grafana/Prometheus... Upstream + Downstream Teams • Understand each other’s “normal” • Clear ownership for integration points Analytics + chat integrations • Slack + PagerDuty • Slack + Jenkins... Leadership + Stakeholders • Business metrics • Pivot between summary & depth • Use tools they’re familiar with Automated, non-engineering spaces • URL endpoints • Internal wiki/blog (with embedded, automated output) Customers • More manual approach Manual approach • Your company’s customer-facing teams
  • 97. © 2020 Ververica BASIC PRINCIPLES 97 Quantitative Qualitative @Caito_200_OK
  • 98. © 2020 Ververica BASIC PRINCIPLES 98 Quantitative Qualitative Data-Driven Design @Caito_200_OK
  • 99. © 2020 Ververica BASIC PRINCIPLES 99 Quantitative Qualitative Data-Driven Design Data-Informed Design @Caito_200_OK
  • 100. © 2020 Ververica BASIC PRINCIPLES 100 Quantitative Qualitative Data-Driven Design Data-Aware Design Data-Informed Design @Caito_200_OK
  • 101. © 2020 Ververica BASIC PRINCIPLES 101 @Caito_200_OK
  • 102. © 2020 Ververica THE SHARED LANGUAGE OF METRICS
  • 103. © 2020 Ververica A SHARED LANGUAGE 103 Even the best data-driven development fails easily when it’s only built for yourself. @Caito_200_OK
  • 104. © 2020 Ververica Metrics as a Shared Language 104 Engineering Team Most Technical Most Abstract Up + Downstream Teams Leadership/Stakeholders Customers @Caito_200_OK
  • 105. © 2020 Ververica Metrics as a Shared Language 105 Engineering Team Most Technical Most Abstract Up + Downstream Teams Leadership/Stakeholders Customers @Caito_200_OK
  • 106. © 2020 Ververica Metrics as a Shared Language 106 Engineering Team Most Technical Most Abstract Up + Downstream Teams Leadership/Stakeholders Customers @Caito_200_OK
  • 107. © 2020 Ververica A SHARED LANGUAGE 107 @Caito_200_OK
  • 108. © 2020 Ververica A SHARED LANGUAGE 108 @Caito_200_OK
  • 109. © 2020 Ververica A SHARED LANGUAGE 109 @Caito_200_OK
  • 110. © 2020 Ververica Metrics as a Shared Language 110 Engineering Team Most Technical Most Abstract Up + Downstream Teams Leadership/Stakeholders Customers @Caito_200_OK
  • 111. © 2020 Ververica Metrics as a Shared Language 111 Engineering Team Most Technical Most Abstract Up + Downstream Teams Leadership/Stakeholders Customers @Caito_200_OK
  • 112. © 2020 Ververica Metrics as a Shared Language 112 Engineering Team Most Technical Most Abstract Up + Downstream Teams Leadership/Stakeholders Customers @Caito_200_OK
  • 113. © 2020 Ververica Summary 113 @Caito_200_OK Concept Principles Implementation Benefit Metrics-Driven- Metrics cycle Meaningful, iterative, accessible High risk, most uncertain, current roadmap priority Incident response, Metrics as a shared language Use tools familiar to your audience Small modular units Automate people & process challenges
  • 114. © 2020 Ververica 114 Before We Start... Flink’s REST API integration http://hostname:8081/jobmanager/metrics /taskmanagers/<taskmanagerid>/metrics /taskmanagers/metrics /jobs/metrics?jobs=D,E,F ...