KSQL: Streaming SQL for Kafka
An Introduction
Neil Avery, @avery_neil,
September 2017
August 2017, Kafka Summit SF Announcement
A Developer Preview of
KSQL
A Streaming SQL Engine for Apache Kafka™
from Confluent
Agenda
● What is KSQL for?
● Why KSQL?
● KSQL concepts
● Demo: Working with KSQL to process and visualize data
● Core concepts: Stream and Table
● Understand the KSQL ecosystem
● Roadmap
What is it for?
● Streaming ETL
○ Kafka is popular for data pipelines.
○ KSQL enables easy transformations of
data within the pipe
● Anomaly Detection
○ Identifying patterns or anomalies in
real-time data, surfaced in milliseconds
● Monitoring
○ Log data monitoring, tracking
and alerting
○ Sensor / IoT data
CREATE STREAM vip_actions AS
SELECT userid, page, action
FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM auth_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number HAVING count(*) > 3;
CREATE TABLE error_counts AS
SELECT error_code, count(*)
FROM monitoring_strm
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE type = 'ERROR'
GROUP BY error_code;
Why KSQL?
Stream processing development is hard: it requires developer skills, which puts it out of
reach for data scientists, analysts, and other non-developers.
● SQL based; simple & intuitive
● SQL simplifies deployment - no jars, no artifacts or binaries; just run SQL
● Interact and access your data via the CLI: SELECT * from XXX where A,B,C
● Easily get data-in and out of Kafka (and process it)
● Use SQL to process your data by leveraging Kafka Streams
● Built on Kafka and its Streams API: distributed, scalable, reliable, and real-time.
KSQL Concepts
● STREAM and TABLE as first-class citizens
● Interpretations of Topic content
● STREAM - data in motion
● TABLE - collected state of a stream (aggregations)
○ One record per key (per window)
○ Current values (compacted topic) ← Not yet in KSQL
○ Changelog
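The STREAM/TABLE relationship above can be sketched in plain Python (an illustration of the concept only, not KSQL internals): replaying a stream of key/value facts and keeping only the latest value per key yields the table view.

```python
# Illustrative sketch of the stream/table duality (plain Python,
# not KSQL internals): a TABLE is what you get by replaying a
# STREAM and keeping the latest value per key.

def table_from_stream(stream):
    """Fold an ordered stream of (key, value) facts into a table."""
    table = {}
    for key, value in stream:
        table[key] = value  # a later fact for a key overwrites the earlier one
    return table

# A stream of account balances: Alice's balance changes over time...
stream = [("alice", 100), ("bob", 50), ("alice", 80)]

# ...but the table holds only the current state per key.
print(table_from_stream(stream))  # {'alice': 80, 'bob': 50}
```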
Let’s try it out...
> KSQL
We can build this…
Start our Docker environment and generate Data
Launch the clickstream Docker image and run Kafka.
export KAFKA_HEAP_OPTS="-Xmx256M -Xms256M"
$ docker run -p 33000:3000 -it confluentinc/ksql-clickstream-demo bash
root@bf73923012ab:/# confluent start
Starting zookeeper
zookeeper is [UP]
Starting kafka
<<snip>>
Run the Data generator to simulate web-traffic:
root@bf73923012ab:/# ksql-datagen -daemon quickstart=clickstream format=json topic=clickstream maxInterval=100 iterations=500000
Writing console output to /tmp/ksql-logs/ksql.out
root@bf73923012ab:/# tail -f /tmp/ksql-logs/ksql.out
111.203.236.146 --> ([ '111.203.236.146' | 36 | '-' | '07/Sep/2017:11:07:20 +0000' | 1504782440181 | 'GET
/site/login.html HTTP/1.1' | '407' | '4196' | '-' | 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36' ])
Start the KSQL CLI in client-server mode
Start the KSQL server.
$ ksql-server-start /etc/ksql/ksqlserver.properties > /tmp/ksql-logs/ksql-server.log 2>&1 &
Start the CLI on port 8080.
$ ksql-cli remote http://localhost:8080
[ASCII-art "KSQL" banner]
= Streaming SQL Engine for Kafka =
<<snip>>
Building a Stream
STREAM: A stream is an unbounded sequence of structured data (“facts”).
For example, a stream of financial transactions such as “Alice sent $100 to Bob, then
Charlie sent $50 to Bob”.
Facts in a stream are immutable: new facts can be inserted into a stream, but existing
facts can never be updated or deleted.
CREATE STREAM clickstream (_time bigint, time varchar, ip varchar, request
varchar, status int, userid varchar, bytes bigint, agent varchar)
WITH (kafka_topic = 'clickstream', value_format = 'json');
KSQL> Working with Streams
1. ksql> list TOPICS;
2. ksql> CREATE STREAM clickstream (_time bigint, time varchar, ip varchar,
request varchar, status int, userid varchar, bytes bigint, agent varchar)
with (kafka_topic = 'clickstream', value_format = 'json');
3. ksql> list STREAMS;
4. ksql> DESCRIBE CLICKSTREAM;
5. ksql> SELECT * from CLICKSTREAM limit 10;
6. ksql> SELECT * from CLICKSTREAM WHERE request like '%html%';
Create and Interact with a Table
TABLE: A table is a view of a STREAM and represents a collection of evolving facts.
We could have a table that contains the latest financial information such as:
“Bob’s current account balance is $150”.
Similar to a traditional database table but enriched by streaming semantics such as
windowing.
● Facts in a table are mutable: new facts can be inserted into the table, and
existing facts can be updated or deleted.
● Tables can be created from a Kafka topic or derived from streams and tables.
CREATE TABLE IP_SUM as SELECT ip, sum(bytes)/1024 as kbytes
FROM CLICKSTREAM WINDOW SESSION (300 second) GROUP BY ip;
KSQL CLI> Build a TABLE using SELECT
ksql> SELECT ip, sum(bytes)/1024 as kbytes FROM CLICKSTREAM WINDOW SESSION (300 second)
GROUP BY ip;
111.145.8.144 | 4
222.245.174.248 | 5
233.90.225.227 | 39
<<snip>>
ksql> CREATE TABLE IP_SUM as SELECT ip, sum(bytes)/1024 as kbytes FROM CLICKSTREAM
window SESSION (300 second) GROUP BY ip;
ksql> SELECT * from IP_SUM limit 10;
1504788602258 | 233.173.215.103 : Window{start=1504788556778 end=-} | 233.173.215.103 | 374
<<snip>>
KSQL CLI> Build a TABLE using SELECT
ksql> LIST TABLES;
Table Name | Kafka Topic | Format | Windowed
----------------------------------------------
IP_SUM | IP_SUM | JSON | true
ksql> DESCRIBE IP_SUM;
Field | Type
---------------------------
ROWTIME | BIGINT
ROWKEY | VARCHAR(STRING)
IP | VARCHAR(STRING)
KBYTES | BIGINT
ksql> SELECT * from IP_SUM where IP like '%33%' limit 10;
1505314606146 | 233.203.236.146 : Window{start=1505314602405 end=-} | 233.203.236.146 | 4
Visualize the Table in Grafana
1. Build a timestamped TABLE from a table. We need timestamped data for Elasticsearch
ksql> CREATE TABLE IP_SUM_TS as SELECT rowTime as event_ts, * FROM IP_SUM;
2. Start Elasticsearch
$ /etc/init.d/elasticsearch start
[....] Starting Elasticsearch Server
3. Start Grafana
$ /etc/init.d/grafana-server start
4. Connect the Stream IP_SUM_TS to Elastic and add the datasource to Grafana
# cd /usr/share/doc/ksql-clickstream-demo/
# ./ksql-connect-es-grafana.sh ip_sum_ts
Viewing the data in Grafana
Running the Clickstream demo
From: https://github.com/confluentinc/ksql/tree/0.1.x/ksql-clickstream-demo
1. # ksql-datagen quickstart=clickstream_users format=json topic=clickstream_users maxInterval=10 iterations=50
2. # ksql-datagen quickstart=clickstream_codes format=json topic=clickstream_codes maxInterval=20 iterations=100
3. ksql> run script '/usr/share/doc/ksql-clickstream-demo/clickstream-schema.sql';
4. # cd /usr/share/doc/ksql-clickstream-demo
# ./ksql-tables-to-grafana.sh
Loading Clickstream-Demo TABLES to Confluent-Connect => Elastic => Grafana datasource
Logging to: /tmp/ksql-connect.log
Charting CLICK_USER_SESSIONS_TS
<<snip>>
5. # ./clickstream-analysis-dashboard.sh
View the dashboard
Do you think that’s a table you are querying?
The Stream-Table duality
Recap: Stream-Table duality
● STREAM and TABLE as first-class citizens
● Interpretations of topic content
● STREAM - data in motion
● TABLE - collected state of a stream (aggregations)
○ One record per key (per window)
○ Current values (compacted topic) ← Not yet in KSQL
○ Changelog
Window Aggregations
Three types supported (same as Kafka Streams):
● TUMBLING: Fixed-size, non-overlapping, gap-less windows
○ SELECT ip, count(*) AS hits FROM clickstream
WINDOW TUMBLING (size 1 minute) GROUP BY ip;
● HOPPING: Fixed-size, overlapping windows
○ SELECT ip, SUM(bytes) AS bytes_per_ip_and_bucket FROM clickstream
WINDOW HOPPING ( size 20 second, advance by 5 second) GROUP BY ip;
● SESSION: Dynamically-sized, non-overlapping, data-driven window
○ SELECT ip, SUM(bytes) AS bytes_per_ip FROM clickstream
WINDOW SESSION (20 second) GROUP BY ip;
More: http://docs.confluent.io/current/streams/developer-guide.html#windowing
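The tumbling/hopping distinction can be sketched in plain Python (an illustration only, not KSQL code; session windows are data-driven, depending on gaps between records, so they are omitted here): a record's timestamp maps to one window under tumbling, and possibly several under hopping.

```python
# Illustrative sketch (plain Python, not KSQL internals) of how a
# record's timestamp maps to window boundaries (same time units as ts).

def tumbling_window(ts, size):
    """TUMBLING: each record falls into exactly one non-overlapping window."""
    start = (ts // size) * size
    return [(start, start + size)]

def hopping_windows(ts, size, advance):
    """HOPPING: windows start every `advance` units, so a record can fall
    into several overlapping windows whenever size > advance."""
    first = max(0, (ts - size) // advance + 1)  # earliest window containing ts
    last = ts // advance                        # latest window containing ts
    return [(k * advance, k * advance + size) for k in range(first, last + 1)]

# A record at t=7, window size 20:
print(tumbling_window(7, 20))     # one window: [(0, 20)]
print(hopping_windows(7, 20, 5))  # overlapping windows: [(0, 20), (5, 25)]
```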
Resources and Admin
● LIST TOPICS;
● LIST STREAMS;
● LIST TABLES;
● SHOW PROPERTIES;
● LIST QUERIES;
● If you need to stop one:
○ TERMINATE <query-id>;
Functions
● Scalar functions:
○ CONCAT, IFNULL, LCASE, LEN, SUBSTRING, TRIM, UCASE
○ ABS, CEIL, FLOOR, RANDOM, ROUND
○ StringToTimestamp, TimestampToString
○ EXTRACTJSONFIELD
○ CAST
● Aggregate functions:
○ SUM, COUNT, MIN, MAX
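The scalar/aggregate distinction matters in queries: scalar functions produce one output per row, while aggregate functions fold many rows into one value per group (and window). A plain-Python analogy (illustration only, not KSQL code):

```python
# Plain-Python analogy for scalar vs. aggregate functions
# (illustration only, not KSQL code).

rows = [
    {"ip": "1.2.3.4", "agent": "Chrome", "bytes": 4096},
    {"ip": "1.2.3.4", "agent": "Safari", "bytes": 2048},
    {"ip": "5.6.7.8", "agent": "chrome", "bytes": 1024},
]

# Scalar (like UCASE): applied row by row, one output per input row.
upper_agents = [row["agent"].upper() for row in rows]

# Aggregate (like SUM ... GROUP BY ip): folds rows into one value per key.
bytes_per_ip = {}
for row in rows:
    bytes_per_ip[row["ip"]] = bytes_per_ip.get(row["ip"], 0) + row["bytes"]

print(upper_agents)  # ['CHROME', 'SAFARI', 'CHROME']
print(bytes_per_ip)  # {'1.2.3.4': 6144, '5.6.7.8': 1024}
```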
Developing in KSQL
● Interactive development using the CLI
● Capture SQL commands in a stream-application.sql
● Automate setup into your CI
● <more tools coming>
set 'commit.interval.ms'='2000';
set 'cache.max.bytes.buffering'='10000000';
set 'auto.offset.reset'='earliest';
DROP STREAM clickstream;
CREATE STREAM clickstream (_time bigint, time varchar, ip varchar, request varchar,
status int, userid int, bytes bigint, agent varchar) with (kafka_topic = 'clickstream',
value_format = 'json');
DROP TABLE events_per_min;
create table events_per_min as select userid, count(*) as events from clickstream
window TUMBLING (size 10 second) group by
userid;
-- VIEW - Enrich with rowTime
DROP TABLE events_per_min_ts;
CREATE TABLE events_per_min_ts as select rowTime as event_ts, * from
events_per_min;
The KSQL Architecture & Ecosystem
[Diagram: a Kafka Topic [p1, p2, p3] underlies both a Stream [p1, p2, p3] and a Table [p1, p2, p3], with data flowing through insert() and select() operations]
Mode #1 Stand-alone aka ‘local mode’
● Starts a CLI, an Engine, and a REST server all in
the same JVM
● Ideal for laptop development etc.
○ Use with default settings:
> bin/ksql-cli local
○ Or with customized settings:
> bin/ksql-cli local --properties-file
foo/bar/ksql.properties
● Careful with service and command topic naming!
(more on this in a moment...)
Mode #2 Client-Server
● Start any number of Server nodes
○ > bin/ksql-server-start
○ > bin/ksql-server-start --properties-file
foo.properties
● Start any number of CLIs, specifying a server
address as ‘remote’ endpoint
○ > bin/ksql-cli remote http://server:8090
● All Engines share the work
○ Instances of the same KStreams Apps
○ Scale up/down without restarting
KSQL Session Variables
● Just as in MySQL, Oracle, etc., there are settings to control how your CLI behaves
● Defaults can be set in the ksql.properties file
● To see a list of currently set or default variable values:
○ ksql> show properties;
● Useful examples:
○ num.stream.threads=4
○ commit.interval.ms=1000
○ cache.max.bytes.buffering=2000000
● TIP! - Your new best friend for testing or building a demo is:
○ ksql> set 'auto.offset.reset' = 'earliest';
Roadmap, 2018
● GA of current feature set. Improved quality, stability, and operations
● Complete our view of what a SQL streaming platform should provide for
Streams and Tables
● Additional aggregate functions. We will continue to expand the set of analytics
functions
● Testing tools. Many data platforms suffer from an inherent inability to test. With
KSQL, testing capability is a primary focus, and we will provide frameworks to
support continuous integration and unit testing
[subject to change]
Kafka Summit is coming to London!
April 23-24, 2018
Subscribe for updates on CFP, sponsorships and more at
www.kafka-summit.org
Thank you
Neil Avery, neil@confluent.io @avery_neil

Contenu connexe

Tendances

Tendances (20)

Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Securing Kafka
Securing Kafka Securing Kafka
Securing Kafka
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka
 
ksqlDB로 시작하는 스트림 프로세싱
ksqlDB로 시작하는 스트림 프로세싱ksqlDB로 시작하는 스트림 프로세싱
ksqlDB로 시작하는 스트림 프로세싱
 
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connect
 
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
 
Introduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterIntroduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matter
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
 

Similaire à KSQL: Streaming SQL for Kafka

KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
Kai Wähner
 

Similaire à KSQL: Streaming SQL for Kafka (20)

KSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache Kafka
 
KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!
 
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQLBuilding a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
 
Event streaming webinar feb 2020
Event streaming webinar feb 2020Event streaming webinar feb 2020
Event streaming webinar feb 2020
 
Automatically scaling your Kubernetes workloads - SVC201-S - Chicago AWS Summit
Automatically scaling your Kubernetes workloads - SVC201-S - Chicago AWS SummitAutomatically scaling your Kubernetes workloads - SVC201-S - Chicago AWS Summit
Automatically scaling your Kubernetes workloads - SVC201-S - Chicago AWS Summit
 
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
 
APAC ksqlDB Workshop
APAC ksqlDB WorkshopAPAC ksqlDB Workshop
APAC ksqlDB Workshop
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka Meetup
 
ksqlDB Workshop
ksqlDB WorkshopksqlDB Workshop
ksqlDB Workshop
 
Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
 
SamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationSamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentation
 
Webinar: Unlock the Power of Streaming Data with Kinetica and Confluent
Webinar: Unlock the Power of Streaming Data with Kinetica and ConfluentWebinar: Unlock the Power of Streaming Data with Kinetica and Confluent
Webinar: Unlock the Power of Streaming Data with Kinetica and Confluent
 
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019
 
Real Time Stream Processing with KSQL and Kafka
Real Time Stream Processing with KSQL and KafkaReal Time Stream Processing with KSQL and Kafka
Real Time Stream Processing with KSQL and Kafka
 
Query Your Streaming Data on Kafka using SQL: Why, How, and What
Query Your Streaming Data on Kafka using SQL: Why, How, and WhatQuery Your Streaming Data on Kafka using SQL: Why, How, and What
Query Your Streaming Data on Kafka using SQL: Why, How, and What
 
Event Driven Microservices
Event Driven MicroservicesEvent Driven Microservices
Event Driven Microservices
 
KSQL and Kafka Streams – When to Use Which, and When to Use Both
KSQL and Kafka Streams – When to Use Which, and When to Use BothKSQL and Kafka Streams – When to Use Which, and When to Use Both
KSQL and Kafka Streams – When to Use Which, and When to Use Both
 
KSQL: The Streaming SQL Engine for Apache Kafka
KSQL: The Streaming SQL Engine for Apache KafkaKSQL: The Streaming SQL Engine for Apache Kafka
KSQL: The Streaming SQL Engine for Apache Kafka
 

Plus de confluent

Plus de confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Dernier

%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 

Dernier (20)

OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security Program
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - Keynote
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 

KSQL: Streaming SQL for Kafka

  • 1. 1Confidential KSQL: Streaming SQL for Kafka An Introduction Neil Avery, @avery_neil, September 2017
  • 2. 2Confidential August 2018, Kafka Summit SF Announcement A Developer Preview of KSQL A Streaming SQL Engine for Apache KafkaTM from Confluent
  • 3. 3Confidential Agenda ● What is KSQL for? ● Why KSQL? ● KSQL concepts ● Demo: Working with KSQL to process and visualize data ● Core concepts: Stream and Table ● Understand the KSQL ecosystem ● Roadmap
  • 4. What is it for? ● Streaming ETL ○ Kafka is popular for data pipelines. ○ KSQL enables easy transformations of data within the pipe ● Anomaly Detection ○ Identifying patterns or anomalies in real-time data, surfaced in milliseconds ● Monitoring ○ Log data monitoring, tracking and alerting ○ Sensor / IoT data CREATE STREAM vip_actions AS SELECT userid, page, action FROM clickstream c LEFT JOIN users u ON c.userid = u.user_id WHERE u.level = 'Platinum'; CREATE TABLE possible_fraud AS SELECT card_number, count(*) FROM auth_attempts WINDOW TUMBLING (SIZE 5 SECONDS) GROUP BY card_number HAVING count(*) > 3; CREATE TABLE error_counts AS SELECT error_code, count(*) FROM monitoring_strm WINDOW TUMBLING (SIZE 1 MINUTE) WHERE type = 'ERROR' GROUP BY error_code;
  • 5. Why KSQL? Stream processing development is hard - you need developer skills; that's a barrier if you are a data scientist, analyst, or other non-developer. ● SQL based; simple & intuitive ● SQL simplifies deployment - no jars, no artifacts or binaries; just run SQL ● Interact with and access your data via the CLI: SELECT * from XXX where A,B,C ● Easily get data in and out of Kafka (and process it) ● Use SQL to process your data by leveraging Kafka Streams ● Built on Kafka and its Streams API: distributed, scalable, reliable, and real-time.
  • 6. KSQL Concepts ● STREAM and TABLE as first-class citizens ● Interpretations of topic content ● STREAM - data in motion ● TABLE - collected state of a stream (aggregations) ○ One record per key (per window) ○ Current values (compacted topic) ← Not yet in KSQL ○ Changelog
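The duality above can be sketched as two statements over the same idea. This is an illustrative sketch only: the topic and column names (pageviews, userid, pageid) are hypothetical and not part of the demo that follows.

```sql
-- STREAM: each record is an immutable fact (data in motion).
CREATE STREAM pageviews_stream (userid varchar, pageid varchar)
  WITH (kafka_topic = 'pageviews', value_format = 'json');

-- TABLE: collected state of that stream - one current value per key.
CREATE TABLE pageviews_per_user AS
  SELECT userid, count(*) AS views
  FROM pageviews_stream
  GROUP BY userid;
```

Each new pageview record updates the views count for its userid, so the table behaves as a continuously maintained view (a changelog) over the stream.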
  • 9. Start our Docker environment and generate data Launch the clickstream Docker image & run Kafka. export KAFKA_HEAP_OPTS="-Xmx256M -Xms256M" $ docker run -p 33000:3000 -it confluentinc/ksql-clickstream-demo bash root@bf73923012ab:/# confluent start Starting zookeeper zookeeper is [UP] Starting kafka <<snip>> Run the data generator to simulate web traffic: root@bf73923012ab:/# ksql-datagen -daemon quickstart=clickstream format=json topic=clickstream maxInterval=100 iterations=500000 Writing console output to /tmp/ksql-logs/ksql.out root@bf73923012ab:/# tail -f /tmp/ksql-logs/ksql.out 111.203.236.146 --> ([ '111.203.236.146' | 36 | '-' | '07/Sep/2017:11:07:20 +0000' | 1504782440181 | 'GET /site/login.html HTTP/1.1' | '407' | '4196' | '-' | 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36' ])
  • 10. Start the KSQL CLI in client-server mode Start the KSQL server. $ ksql-server-start /etc/ksql/ksqlserver.properties > /tmp/ksql-logs/ksql-server.log 2>&1 & Start the CLI on port 8080. $ ksql-cli remote http://localhost:8080 <<KSQL ASCII-art banner: "Streaming SQL Engine for Kafka">> <<snip>>
  • 11. Building a Stream STREAM: A stream is an unbounded sequence of structured data (“facts”). For example, a stream of financial transactions such as “Alice sent $100 to Bob, then Charlie sent $50 to Bob”. Facts in a stream are immutable: new facts can be inserted into a stream, but existing facts can never be updated or deleted. CREATE STREAM clickstream (_time bigint, time varchar, ip varchar, request varchar, status int, userid varchar, bytes bigint, agent varchar) WITH (kafka_topic = 'clickstream', value_format = 'json');
  • 12. KSQL> Working with Streams 1. ksql> list TOPICS; 2. ksql> CREATE STREAM clickstream (_time bigint, time varchar, ip varchar, request varchar, status int, userid varchar, bytes bigint, agent varchar) with (kafka_topic = 'clickstream', value_format = 'json'); 3. ksql> list STREAMS; 4. ksql> DESCRIBE CLICKSTREAM; 5. ksql> SELECT * from CLICKSTREAM limit 10; 6. ksql> SELECT * from CLICKSTREAM WHERE request like '%html%';
  • 13. Create and Interact with a Table TABLE: A table is a view of a STREAM and represents a collection of evolving facts. We could have a table that contains the latest financial information such as: “Bob’s current account balance is $150”. Similar to a traditional database table, but enriched by streaming semantics such as windowing. ● Facts in a table are mutable: new facts can be inserted into the table, and existing facts can be updated or deleted. ● Tables can be created from a Kafka topic or derived from streams and tables. CREATE TABLE IP_SUM as SELECT ip, sum(bytes)/1024 as kbytes FROM CLICKSTREAM WINDOW SESSION (300 second) GROUP BY ip;
  • 14. KSQL CLI> Build a TABLE using SELECT ksql> SELECT ip, sum(bytes)/1024 as kbytes FROM CLICKSTREAM WINDOW SESSION (300 second) GROUP BY ip; 111.145.8.144 | 4 222.245.174.248 | 5 233.90.225.227 | 39 <<snip>> ksql> CREATE TABLE IP_SUM as SELECT ip, sum(bytes)/1024 as kbytes FROM CLICKSTREAM window SESSION (300 second) GROUP BY ip; ksql> SELECT * from IP_SUM limit 10; 1504788602258 | 233.173.215.103 : Window{start=1504788556778 end=-} | 233.173.215.103 | 374 <<snip>>
  • 15. KSQL CLI> Build a TABLE using SELECT ksql> LIST TABLES; Table Name | Kafka Topic | Format | Windowed ---------------------------------------------- IP_SUM | IP_SUM | JSON | true ksql> DESCRIBE IP_SUM; Field | Type --------------------------- ROWTIME | BIGINT ROWKEY | VARCHAR(STRING) IP | VARCHAR(STRING) KBYTES | BIGINT ksql> SELECT * from IP_SUM where IP like '%33%' limit 10; 1505314606146 | 233.203.236.146 : Window{start=1505314602405 end=-} | 233.203.236.146 | 4
  • 16. Visualize the Table in Grafana 1. Build a timestamped TABLE from a table. We need timestamped data for Elasticsearch ksql> CREATE TABLE IP_SUM_TS as SELECT rowTime as event_ts, * FROM IP_SUM; 2. Start Elasticsearch $ /etc/init.d/elasticsearch start [....] Starting Elasticsearch Server 3. Start Grafana $ /etc/init.d/grafana-server start 4. Connect the table IP_SUM_TS to Elastic and add the datasource to Grafana # cd /usr/share/doc/ksql-clickstream-demo/ # ./ksql-connect-es-grafana.sh ip_sum_ts
  • 18. Running the Clickstream demo From: https://github.com/confluentinc/ksql/tree/0.1.x/ksql-clickstream-demo 1. # ksql-datagen quickstart=clickstream_users format=json topic=clickstream_users maxInterval=10 iterations=50 2. # ksql-datagen quickstart=clickstream_codes format=json topic=clickstream_codes maxInterval=20 iterations=100 3. ksql> run script '/usr/share/doc/ksql-clickstream-demo/clickstream-schema.sql'; 4. # cd /usr/share/doc/ksql-clickstream-demo # ./ksql-tables-to-grafana.sh Loading Clickstream-Demo TABLES to Confluent-Connect => Elastic => Grafana datasource Logging to: /tmp/ksql-connect.log Charting CLICK_USER_SESSIONS_TS <<snip>> 5. # ./clickstream-analysis-dashboard.sh
  • 20. Do you think that’s a table you are querying?
  • 22. Recap: Stream-Table duality ● STREAM and TABLE as first-class citizens ● Interpretations of topic content ● STREAM - data in motion ● TABLE - collected state of a stream (aggregations) ○ One record per key (per window) ○ Current values (compacted topic) ← Not yet in KSQL ○ Changelog
  • 23. Window Aggregations Three types supported (same as KStreams): ● TUMBLING: Fixed-size, non-overlapping, gap-less windows ○ SELECT ip, count(*) AS hits FROM clickstream WINDOW TUMBLING (size 1 minute) GROUP BY ip; ● HOPPING: Fixed-size, overlapping windows ○ SELECT ip, SUM(bytes) AS bytes_per_ip_and_bucket FROM clickstream WINDOW HOPPING ( size 20 second, advance by 5 second) GROUP BY ip; ● SESSION: Dynamically-sized, non-overlapping, data-driven window ○ SELECT ip, SUM(bytes) AS bytes_per_ip FROM clickstream WINDOW SESSION (20 second) GROUP BY ip; More: http://docs.confluent.io/current/streams/developer-guide.html#windowing
  • 24. Resources and Admin ● LIST TOPICS; ● LIST STREAMS; ● LIST TABLES; ● SHOW PROPERTIES; ● LIST QUERIES; ● If you need to stop one: ○ TERMINATE <query-id>;
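A terminate workflow might look like the sketch below. The query id CTAS_IP_SUM_1 is hypothetical — use whichever id LIST QUERIES actually reports in your session:

```sql
ksql> LIST QUERIES;            -- note the id of the persistent query to stop
ksql> TERMINATE CTAS_IP_SUM_1; -- hypothetical id taken from LIST QUERIES output
ksql> DROP TABLE IP_SUM;       -- the backing table can then be removed
```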
  • 25. Functions ● Scalar functions: ○ CONCAT, IFNULL, LCASE, LEN, SUBSTRING, TRIM, UCASE ○ ABS, CEIL, FLOOR, RANDOM, ROUND ○ StringToTimestamp, TimestampToString ○ EXTRACTJSONFIELD ○ CAST ● Aggregate functions: ○ SUM, COUNT, MIN, MAX
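As a sketch only, several of these scalar functions can be combined in a single projection over the clickstream stream defined earlier; the output column names here are made up for illustration:

```sql
SELECT UCASE(agent)            AS agent_uc,     -- upper-case the user agent
       LEN(request)            AS request_len,  -- length of the request line
       CONCAT(userid, ip)      AS user_ip,      -- string concatenation
       CAST(status AS VARCHAR) AS status_str    -- numeric-to-string cast
FROM clickstream;
```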
  • 26. Developing in KSQL ● Interactive development using the CLI ● Capture SQL commands in a stream-application.sql ● Automate setup into your CI ● <more tools coming> set 'commit.interval.ms'='2000'; set 'cache.max.bytes.buffering'='10000000'; set 'auto.offset.reset'='earliest'; DROP STREAM clickstream; CREATE STREAM clickstream (_time bigint,time varchar, ip varchar, request varchar, status int, userid int, bytes bigint, agent varchar) with (kafka_topic = 'clickstream', value_format = 'json'); DROP TABLE events_per_min; create table events_per_min as select userid, count(*) as events from clickstream window TUMBLING (size 10 second) group by userid; -- VIEW - Enrich with rowTime DROP TABLE events_per_min_ts; CREATE TABLE events_per_min_ts as select rowTime as event_ts, * from events_per_min;
  • 27. The KSQL Architecture & Ecosystem <<diagram: STREAMs and TABLEs (partitions p1, p2, p3) layered over a Kafka Topic [p1, p2, p3], accessed via insert() and select()>>
  • 28. Mode #1 Stand-alone aka ‘local mode’ ● Starts a CLI, an Engine, and a REST server all in the same JVM ● Ideal for laptop development etc. ○ Use with default settings: > bin/ksql-cli local ○ Or with customized settings: > bin/ksql-cli local --properties-file foo/bar/ksql.properties ● Careful with service and command topic naming! (more on this in a moment...)
  • 29. Mode #2 Client-Server ● Start any number of Server nodes ○ > bin/ksql-server-start ○ > bin/ksql-server-start --properties-file foo.properties ● Start any number of CLIs, specifying a server address as ‘remote’ endpoint ○ > bin/ksql-cli remote http://server:8090 ● All Engines share the work ○ Instances of the same KStreams Apps ○ Scale up/down without restarting
  • 30. KSQL Session Variables ● Just as in MySQL, Oracle etc. there are settings to control how your CLI behaves ● Defaults can be set in the ksql.properties file ● To see a list of currently set or default variable values: ○ ksql> show properties; ● Useful examples: ○ num.stream.threads=4 ○ commit.interval.ms=1000 ○ cache.max.bytes.buffering=2000000 ● TIP! - Your new best friend for testing or building a demo is: ○ ksql> set 'auto.offset.reset' = 'earliest';
  • 31. Roadmap, 2018 ● GA of current feature set. Improved quality, stability, and operations ● Complete our view of what a SQL streaming platform should provide for Streams and Tables ● Additional aggregate functions. We will continue to expand the set of analytics functions ● Testing tools. Many data platforms suffer from an inherent inability to test. With KSQL, testing capability is a primary focus and we will provide frameworks to support continuous integration and unit testing [subject to change]
  • 32. Kafka Summit is coming to London! April 23-24, 2018 Subscribe for updates on CFP, sponsorships and more at www.kafka-summit.org
  • 33. Thank you Neil Avery, neil@confluent.io @avery_neil