KSQL
An Open Source Streaming SQL Engine for Apache Kafka
Kai Waehner
Technology Evangelist
kontakt@kai-waehner.de
LinkedIn
@KaiWaehner
www.confluent.io
www.kai-waehner.de
KSQL - Streaming SQL for Apache Kafka
Agenda
1) Apache Kafka Ecosystem
2) Motivation for KSQL
3) KSQL Concepts
4) Live Demo
5) KSQL Architecture
6) Use Case: Clickstream Analysis
7) Getting Started
1) Apache Kafka Ecosystem
Apache Kafka - A Distributed, Scalable Commit Log
Apache Kafka – The Rise of a Streaming Platform
[Figure: the log at the center, with connectors, producers, consumers and a streaming engine around it]
KSQL – A Streaming SQL Engine for Apache Kafka
2) Motivation for KSQL
Why KSQL?
[Figure: population vs. coding sophistication. Stream processing has so far been the realm of core developers; KSQL opens a new, expanded realm that also includes data engineers, BI analysts and core developers who don’t like Java.]
Trade-Offs
From flexibility (Consumer / Producer) to simplicity (KSQL):
• Consumer / Producer: subscribe(), poll(), send(), flush()
• Kafka Streams: mapValues(), filter(), punctuate()
• KSQL: Select…from…, Join…where…, Group by…
What is it for?
Streaming ETL
• Kafka is popular for data pipelines
• KSQL enables easy transformations of data within the pipe
CREATE STREAM vip_actions AS
SELECT userid, page, action FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
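The Streaming ETL query above enriches each click with the user's profile. A minimal Python sketch of those semantics, assuming in-memory lists stand in for the `clickstream` stream and the changelog of the `users` table (the function name `left_join_filter` and the sample data are hypothetical). For simplicity it materializes the whole table before processing the stream, whereas KSQL joins each click against the table state at the moment the click is processed.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def left_join_filter(
    clicks: Iterable[Dict[str, str]],
    user_changelog: Iterable[Tuple[str, Dict[str, str]]],
    level: str = "Platinum",
) -> List[Dict[str, str]]:
    """Simulate: SELECT userid, page, action FROM clickstream c
    LEFT JOIN users u ON c.userid = u.user_id WHERE u.level = <level>."""
    # Materialize the table: the latest record per user_id wins.
    users: Dict[str, Dict[str, str]] = {}
    for user_id, record in user_changelog:
        users[user_id] = record

    result = []
    for click in clicks:
        # LEFT JOIN: a click without a matching user keeps its own columns,
        # and the user-side columns become NULL (None here).
        user: Optional[Dict[str, str]] = users.get(click["userid"])
        u_level = user["level"] if user is not None else None
        # WHERE u.level = 'Platinum': NULL never matches, so unmatched clicks drop out.
        if u_level == level:
            result.append({k: click[k] for k in ("userid", "page", "action")})
    return result


users = [("u1", {"level": "Gold"}), ("u2", {"level": "Platinum"}), ("u1", {"level": "Platinum"})]
clicks = [
    {"userid": "u1", "page": "/home", "action": "view"},
    {"userid": "u3", "page": "/cart", "action": "click"},  # no such user
    {"userid": "u2", "page": "/buy", "action": "click"},
]
print(left_join_filter(clicks, users))
```

Note the interaction between LEFT JOIN and WHERE: an unmatched click gets NULL user columns, NULL never equals 'Platinum', so the filter drops it and the query effectively behaves like an inner join.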
What is it for?
Simple Derivations of Existing Topics
• One-liner to re-partition and / or re-key a topic for new uses
CREATE STREAM views_by_userid
WITH (PARTITIONS=6,
VALUE_FORMAT='JSON',
TIMESTAMP='view_time') AS
SELECT *
FROM clickstream
PARTITION BY user_id;
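Re-keying works because Kafka routes each record to a partition based on its key, so after `PARTITION BY user_id` all views of one user land in the same partition of the new topic. A rough Python sketch, with one loud assumption: it uses `zlib.crc32` as a stand-in hash, while Kafka's default partitioner hashes the serialized key with murmur2, so real partition numbers will differ. The helper names are hypothetical.

```python
import zlib
from typing import Any, Dict, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash (CRC32). Kafka's default partitioner uses murmur2,
    # so only the "same key -> same partition" property carries over.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def repartition(
    records: List[Dict[str, Any]], key_field: str, num_partitions: int
) -> List[List[Tuple[str, Dict[str, Any]]]]:
    """Re-key each record by key_field and route it to a partition."""
    partitions: List[List[Tuple[str, Dict[str, Any]]]] = [[] for _ in range(num_partitions)]
    for record in records:
        new_key = str(record[key_field])
        partitions[partition_for(new_key, num_partitions)].append((new_key, record))
    return partitions


views = [{"user_id": f"user_{i % 4}", "view_time": 1000 + i} for i in range(20)]
for p, recs in enumerate(repartition(views, "user_id", 6)):
    print(p, sorted({k for k, _ in recs}))
```

Because all records for one key share a partition, and Kafka preserves order within a partition, downstream per-user processing sees each user's views in the order they were written.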
What is it for?
Analytics, e.g. Anomaly Detection
• Identifying patterns or anomalies in real-time data, surfaced in milliseconds
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTES)
GROUP BY card_number
HAVING count(*) > 3;
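The fraud query groups authorization attempts per card into 5-minute tumbling windows and keeps the groups with more than three attempts. A batch Python sketch of that logic, assuming epoch-aligned windows and millisecond timestamps (function names and sample data are illustrative). KSQL evaluates this continuously and updates the `possible_fraud` table as attempts arrive; the sketch only computes the end state.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

MINUTE_MS = 60_000


def tumbling_window_start(ts_ms: int, size_ms: int) -> int:
    # Tumbling windows are fixed-size, non-overlapping and aligned to the
    # epoch: each timestamp belongs to exactly one [start, start + size).
    return ts_ms - (ts_ms % size_ms)


def possible_fraud(
    attempts: Iterable[Tuple[int, str]],
    size_ms: int = 5 * MINUTE_MS,
    threshold: int = 3,
) -> Dict[Tuple[int, str], int]:
    """attempts: (timestamp_ms, card_number) pairs.
    Returns {(window_start_ms, card_number): count} where count > threshold."""
    counts: Counter = Counter()
    for ts, card in attempts:
        # GROUP BY card_number within each window
        counts[(tumbling_window_start(ts, size_ms), card)] += 1
    # HAVING count(*) > 3
    return {key: n for key, n in counts.items() if n > threshold}


attempts = [
    (0, "A"), (1 * MINUTE_MS, "A"), (2 * MINUTE_MS, "A"), (4 * MINUTE_MS, "A"),
    (6 * MINUTE_MS, "A"),  # falls into the next window
    (1 * MINUTE_MS, "B"), (2 * MINUTE_MS, "B"),
]
print(possible_fraud(attempts))
```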
What is it for?
Real Time Monitoring
• Log data monitoring, tracking and alerting
• Sensor / IoT data
CREATE TABLE error_counts AS
SELECT error_code, count(*)
FROM monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE type = 'ERROR'
GROUP BY error_code;
Where is KSQL not such a great fit (at least today)?
Powerful ad-hoc queries
○ Limited span of time usually retained in Kafka
○ No indexes
BI reports (Tableau etc.)
○ No indexes
○ No JDBC (most BI tools are not good with continuous results!)
3) KSQL Concepts
KSQL Concepts
● No need for source code
• Zero, none at all, not even one line.
• No SerDes, no generics, no lambdas, ...
● All the Kafka Streams “magic” out-of-the-box
• Exactly Once Semantics
• Windowing
• Event-time aggregation
• Late-arriving data
• Distributed, fault-tolerant, scalable, ...
STREAM and TABLE as first-class citizens
CREATE STREAM AS syntax
CREATE STREAM `stream_name`
[WITH (`property = expression` [, …] ) ]
AS SELECT `select_expr` [, ...]
FROM `from_item` [, ...]
[ WHERE `condition` ]
[ PARTITION BY `column_name` ]
● where property can be any of the following:
KAFKA_TOPIC = name - what to call the sink topic
FORMAT = DELIMITED | JSON | AVRO - defaults to format of input stream
AVROSCHEMAFILE = path/to/file - if FORMAT=AVRO, where the output schema file will be written to
PARTITIONS = # - number of partitions in sink topic
TIMESTAMP = column - the name of the column to use as the timestamp; this can be used to define the event time.
CREATE TABLE AS syntax
CREATE TABLE `table_name`
[WITH ( `property_name = expression` [, ...] )]
AS SELECT `select_expr` [, ...]
FROM `from_item` [, ...]
[ WINDOW `window_expression` ]
[ WHERE `condition` ]
[ GROUP BY `grouping expression` ]
[ HAVING `having_expression` ]
● where the property values are the same as for CREATE STREAM AS SELECT
SELECT statement syntax
SELECT `select_expr` [, ...]
FROM `from_item` [, ...]
[ WINDOW `window_expression` ]
[ WHERE `condition` ]
[ GROUP BY `grouping expression` ]
[ HAVING `having_expression` ]
[ LIMIT n ]
where from_item is one of the following:
stream_or_table_name [ [ AS ] alias]
from_item LEFT JOIN from_item ON join_condition
WINDOWing
● Not ANSI SQL! → Continuous Queries
● Three types supported (same as Kafka Streams):
• TUMBLING (fixed-size, non-overlapping windows)
• SELECT appname, ip, COUNT(appname) AS problem_count FROM
logstream WINDOW TUMBLING (size 1 minute) WHERE loglevel='ERROR'
GROUP BY appname, ip;
• HOPPING
• SELECT itemid, SUM(arraycol[0]) FROM orders WINDOW HOPPING (
size 20 second, advance by 5 second) GROUP BY itemid;
• SESSION
• SELECT itemid, SUM(sales_price) FROM orders WINDOW SESSION (20
second) GROUP BY itemid;
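To make the difference between the window types concrete, here is a small Python sketch of window assignment, using seconds as in the examples above. Assumptions: hopping windows are aligned to multiples of `advance` starting at 0, and sessions are computed over a complete batch of timestamps, whereas Kafka Streams merges sessions incrementally as (possibly out-of-order) records arrive. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def hopping_windows(ts: int, size: int, advance: int) -> List[int]:
    """Start times of all hopping windows [start, start + size) that contain ts.
    Windows start at multiples of `advance`, so they overlap when advance < size."""
    starts = []
    start = ts - (ts % advance)  # latest window starting at or before ts
    while start >= 0 and start > ts - size:
        starts.append(start)
        start -= advance
    return sorted(starts)


def session_windows(timestamps: Iterable[int], gap: int) -> List[Tuple[int, int]]:
    """Group timestamps into (start, end) sessions: an event no more than `gap`
    after the previous one extends the session, otherwise a new one starts."""
    sessions: List[Tuple[int, int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= gap:
            sessions[-1] = (sessions[-1][0], ts)  # extend the current session
        else:
            sessions.append((ts, ts))  # inactivity gap exceeded: new session
    return sessions


# WINDOW HOPPING (size 20 second, advance by 5 second): an event at t=27s
# is counted in several overlapping windows.
print(hopping_windows(27, size=20, advance=5))
# WINDOW SESSION (20 second)
print(session_windows([0, 5, 30, 45, 100], gap=20))
```

Setting `advance` equal to `size` turns a hopping window into a tumbling one: `hopping_windows(27, size=20, advance=20)` yields a single window starting at 20.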
4) Live Demo
Create a STREAM and a TABLE from Kafka Topics
ksql> CREATE STREAM pageviews_original (viewtime bigint, userid varchar, pageid varchar) WITH (kafka_topic='pageviews',
value_format='DELIMITED');
ksql> CREATE TABLE users_original (registertime bigint, gender varchar, regionid varchar, userid varchar) WITH
(kafka_topic='users', value_format='JSON');
ksql> SELECT pageid FROM pageviews_original LIMIT 3;
ksql> CREATE STREAM pageviews_female AS SELECT users_original.userid AS userid, pageid, regionid, gender FROM
pageviews_original LEFT JOIN users_original ON pageviews_original.userid = users_original.userid WHERE gender = 'FEMALE';
Live Demo – KSQL Hello World
5) KSQL Architecture
KSQL - Components
KSQL has 3 main components:
1. The CLI, designed to be familiar to users of MySQL, Postgres etc.
2. The Engine which actually runs the Kafka Streams topologies
3. The REST server interface enables an Engine to receive instructions from the CLI
(Note that you also need a Kafka cluster; KSQL is deployed independently of it.)
#1 STAND-ALONE AKA ‘LOCAL MODE’
[Figure: the CLI (KSQL>), the REST server and the KSQL Engine in a single JVM, connected to the Kafka cluster]
Starts a CLI, an Engine, and a REST server all in the same JVM
Ideal for laptop development
• Start with default settings:
• > bin/ksql-cli local
Or with customized settings:
• > bin/ksql-cli local --properties-file foo/bar/ksql.properties
#2 CLIENT-SERVER
[Figure: several JVMs, each running a KSQL Engine with a REST server, all connected to the Kafka cluster; a CLI (KSQL>) talks to them over REST]
Start any number of server nodes
• > bin/ksql-server-start
Start any number of CLIs and specify the ‘remote’ server address
• > bin/ksql-cli remote http://myserver:8090
All running Engines share the processing load
• Technically, instances of the same Kafka Streams Applications
• Scale up / down without restart
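Because every Engine runs an instance of the same Kafka Streams application, the input topic's partitions are divided among the running instances, and adding or removing an instance triggers a rebalance. A deliberately simplified Python sketch of that division, assuming plain round-robin assignment (the real Kafka Streams assignor is more involved); the instance names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, instances: List[str]) -> Dict[str, List[int]]:
    """Round-robin partitions over engine instances (simplified model of a
    consumer-group rebalance, not the actual Kafka Streams assignor)."""
    if not instances:
        raise ValueError("at least one instance is required")
    assignment: Dict[str, List[int]] = {name: [] for name in instances}
    for partition in range(num_partitions):
        assignment[instances[partition % len(instances)]].append(partition)
    return assignment


# Two engines share a 6-partition topic; then a third one joins (scale up).
print(assign_partitions(6, ["ksql-1", "ksql-2"]))
print(assign_partitions(6, ["ksql-1", "ksql-2", "ksql-3"]))
```

Parallelism is bounded by the partition count: with more instances than partitions, the extra instances receive no partitions and sit idle.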
#3 AS PRE-DEFINED APP
[Figure: several JVMs, each running only a KSQL Engine, connected to the Kafka cluster]
Running the KSQL server with a pre-defined set of instructions/queries
• Version control your queries and transformations as code
Start any number of Engine instances
• Pass a file of KSQL statements to execute
• > bin/ksql-node query-file=foo/bar.sql
All running Engines share the processing load
• Technically, instances of the same Kafka Streams Applications
• Scale up/down without restart
Dedicating resources
How do you deploy applications?
Where to develop and operate your applications?
6) Use Case: Clickstream Analysis
Demo: Clickstream Analysis
[Figure: a Kafka producer writes a stream of log events into the Kafka cluster; KSQL processes them, and Kafka Connect moves the results to Elasticsearch for visualization in Grafana]
Demo: Clickstream Analysis
• https://github.com/confluentinc/ksql/tree/0.1.x/ksql-clickstream-demo#clickstream-analysis
• Leverages Apache Kafka, Kafka Connect, KSQL, Elasticsearch and Grafana
• 5-minute screencast: https://www.youtube.com/watch?v=A45uRzJiv7I
• Set up in 5 minutes (with or without Docker)
SELECT STREAM
CEIL(timestamp TO HOUR) AS timeWindow, productId,
COUNT(*) AS hourlyOrders, SUM(units) AS units
FROM Orders GROUP BY CEIL(timestamp TO HOUR),
productId;
timeWindow | productId | hourlyOrders | units
------------+-----------+--------------+-------
08:00:00 | 10 | 2 | 5
08:00:00 | 20 | 1 | 8
09:00:00 | 10 | 4 | 22
09:00:00 | 40 | 1 | 45
... | ... | ... | ...
Live Demo – KSQL Clickstream Analysis
7) Getting Started
KSQL Quick Start
github.com/confluentinc/ksql
Local runtime or Docker container
Remember: Developer Preview!
Caveats of Developer Preview
• No ORDER BY yet
• No Stream-stream joins yet
• Limited function library
• Avro support only via workaround
• Breaking API / Syntax changes still possible
BE EXCITED, BUT BE ADVISED
Resources and Next Steps
Get Involved
• Try the Quickstart on GitHub
• Check out the code
• Play with the examples
The point of a developer preview is to improve things—together!
https://github.com/confluentinc/ksql
http://confluent.io/ksql
https://slackpass.io/confluentcommunity #ksql
Questions? Feedback?
Please contact me…
Come to our booth…
Come to Kafka Summit London in April 2018…
Appendix
KSQL Concepts
● STREAM and TABLE as first-class citizens
● Interpretations of topic content
● STREAM - data in motion
● TABLE - collected state of a stream
• One record per key (per window)
• Current values (compacted topic): not yet supported
• Changelog
● STREAM – TABLE Joins
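The same topic content can be read either way. A minimal Python sketch of the two interpretations, assuming records are (key, value) pairs: as a STREAM every record is an independent event, while as a TABLE the records form a changelog that collapses to one current value per key. The function name and sample data are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def table_from_changelog(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Interpret a sequence of (key, value) records as a TABLE:
    each record updates the row for its key, so the latest value wins."""
    table: Dict[str, str] = {}
    for key, value in records:
        table[key] = value
    return table


# The same topic content, e.g. the region each user is currently in.
records = [("alice", "region_1"), ("bob", "region_7"), ("alice", "region_3")]

# As a STREAM: three events, all kept (alice moved from region_1 to region_3).
print(len(records))
# As a TABLE: one row per key holding the current value.
print(table_from_changelog(records))
```

For a windowed aggregation the table key becomes (key, window), which is what "one record per key (per window)" refers to.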
Schema & Format
● A Kafka broker knows how to move bytes
• Technically a key-value message (byte[], byte[])
● To enable declarative SQL-like queries and transformations, we have to define a richer structure
● Structural metadata maintained in an in-memory catalog
• DDL is recorded in a special topic
Schema & Format
Start with message (value) format
● JSON - the simplest choice
● DELIMITED - in this preview, the implicit delimiter is a comma and the escaping rules are built in; this will be expanded.
● AVRO - requires that you also supply a schema-file (.avsc)
Pseudo-columns are automatically provided
• ROWKEY, ROWTIME - for querying the message key and timestamp
• (PARTITION, OFFSET coming soon)
• CREATE STREAM pageview (viewtime bigint, userid varchar, pageid varchar) WITH
(value_format = 'delimited', kafka_topic='my_pageview_topic');
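Since the broker only stores bytes, KSQL has to deserialize each message value against the declared columns and add the pseudo-columns from the record metadata. A rough Python sketch for the DELIMITED `pageview` stream above, assuming UTF-8 values and using Python's `csv` module for splitting and quoting (KSQL's built-in escaping rules may differ); `parse_pageview` and `PAGEVIEW_SCHEMA` are hypothetical names.

```python
import csv
from typing import Any, Callable, Dict, List, Tuple

# Declared columns of: CREATE STREAM pageview (viewtime bigint, userid varchar, pageid varchar)
PAGEVIEW_SCHEMA: List[Tuple[str, Callable[[str], Any]]] = [
    ("viewtime", int),  # bigint
    ("userid", str),    # varchar
    ("pageid", str),    # varchar
]


def parse_pageview(key: bytes, value: bytes, timestamp_ms: int) -> Dict[str, Any]:
    """Turn a raw Kafka record into a row with typed columns plus pseudo-columns."""
    # Comma-delimited value; csv handles quoting (an assumption, see above).
    fields = next(csv.reader([value.decode("utf-8")]))
    if len(fields) != len(PAGEVIEW_SCHEMA):
        raise ValueError(f"expected {len(PAGEVIEW_SCHEMA)} fields, got {len(fields)}")
    row: Dict[str, Any] = {
        name: convert(raw) for (name, convert), raw in zip(PAGEVIEW_SCHEMA, fields)
    }
    # Pseudo-columns come from the Kafka record itself, not from the value.
    row["ROWKEY"] = key.decode("utf-8")
    row["ROWTIME"] = timestamp_ms
    return row


print(parse_pageview(b"User_1", b"1500000000000,User_1,Page_42", 1500000000123))
```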
Schema & Datatypes
● varchar / string
● boolean / bool
● integer / int
● bigint / long
● double
● array(of_type) - of_type must be primitive (no nested Array or Map yet)
● map(key_type, value_type) - key_type must be string, value_type must be primitive
Interactive Querying
● Great for iterative development
● LIST (or SHOW) STREAMS / TABLES
● DESCRIBE STREAM / TABLE
● SELECT
• Selects rows from a KSQL stream or table.
• The result of this statement will be printed out in the console.
• To stop the continuous query in the CLI press Ctrl+C.
CREATE STREAM AS SELECT
● Once your query is ready and you want to run your query non-interactively
• CREATE STREAM AS SELECT ...;
● Creates a new KSQL Stream along with the corresponding Kafka topic and
streams the result of the SELECT query into the topic
● To find which queries are already running:
• SHOW QUERIES;
● If you need to stop one:
• TERMINATE query_id;
CREATE TABLE AS SELECT
● Once your query is ready and you want to run it non-interactively:
• CREATE TABLE AS SELECT ...;
● Just like CREATE STREAM AS SELECT, but for aggregations
Functions
● Scalar Functions:
• CONCAT, IFNULL, LCASE, LEN, SUBSTRING, TRIM, UCASE
• ABS, CEIL, FLOOR, RANDOM, ROUND
• StringToTimestamp, TimestampToString
• GetStringFromJSON
• CAST
● Aggregate Functions:
• SUM, COUNT, MIN, MAX
● User-defined Functions:
• Java Interface
Session Variables
● As in MySQL, Oracle etc., there are settings to control how your CLI behaves
● Set any property the Kafka Streams consumers/producers will understand
● Defaults can be set in the ksql.properties file
● To see a list of currently set or default variable values:
• ksql> show properties;
● Useful examples:
• num.stream.threads=4
• commit.interval.ms=1000
• cache.max.bytes.buffering=2000000
● TIP! - Your new best friend for testing or building a demo is:
• ksql> set 'auto.offset.reset' = 'earliest';
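The tip works because of how a consumer picks its starting position. A Python sketch of that rule for a single partition, assuming the committed offset (if any) is still in range (Kafka also falls back to `auto.offset.reset` when it is not, which the sketch ignores). With the consumer default of `latest`, a freshly started query only sees records produced after it starts; `earliest` replays what is already retained in the topic, which is handy for testing and demos. `starting_offset` is a hypothetical helper.

```python
from typing import Optional


def starting_offset(
    committed: Optional[int],
    log_start: int,
    log_end: int,
    auto_offset_reset: str = "latest",
) -> int:
    """Offset at which a consumer starts reading one partition."""
    # A committed offset (e.g. from a query that is resuming) always wins.
    if committed is not None:
        return committed
    # auto.offset.reset only applies when nothing has been committed yet.
    if auto_offset_reset == "earliest":
        return log_start  # replay everything still retained in the topic
    if auto_offset_reset == "latest":
        return log_end  # only records produced from now on
    # Kafka's "none" setting raises an error instead of picking a position.
    raise ValueError(f"no committed offset and auto.offset.reset={auto_offset_reset!r}")


print(starting_offset(None, log_start=0, log_end=1000, auto_offset_reset="earliest"))
print(starting_offset(None, log_start=0, log_end=1000))
print(starting_offset(250, log_start=0, log_end=1000, auto_offset_reset="earliest"))
```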

Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

KSQL – An Open Source Streaming Engine for Apache Kafka

  • 1. KSQL – An Open Source Streaming SQL Engine for Apache Kafka. Kai Waehner, Technology Evangelist, kontakt@kai-waehner.de, LinkedIn, @KaiWaehner, www.confluent.io, www.kai-waehner.de (Confidential)
  • 2. KSQL – Streaming SQL for Apache Kafka. Agenda: 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 3. Agenda: 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 4. Apache Kafka – A Distributed, Scalable Commit Log
  • 5. Apache Kafka – The Rise of a Streaming Platform (diagram components: The Log, Connectors, Producer, Consumer, Streaming Engine)
  • 6. Apache Kafka – The Rise of a Streaming Platform
  • 7. KSQL – A Streaming SQL Engine for Apache Kafka
  • 8. Agenda: 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 9. Why KSQL? (chart with axes Population and Coding Sophistication; regions "Realm of Stream Processing" and "New, Expanded Realm"; user groups: BI Analysts, Core Developers, Data Engineers, Core Developers who don't like Java)
  • 10. Trade-Offs, from Flexibility to Simplicity: Consumer / Producer (subscribe(), poll(), send(), flush()) • Kafka Streams (mapValues(), filter(), punctuate()) • KSQL (SELECT … FROM …, JOIN … WHERE …, GROUP BY …)
  • 11. What is it for? Streaming ETL • Kafka is popular for data pipelines • KSQL enables easy transformations of data within the pipe: CREATE STREAM vip_actions AS SELECT userid, page, action FROM clickstream c LEFT JOIN users u ON c.userid = u.user_id WHERE u.level = 'Platinum';
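To make the enrichment in the VIP example concrete, here is a plain-Python sketch of what the query computes over a handful of records. It is not KSQL or Kafka code. The `users` dict stands in for the table, the list of dicts stands in for the clickstream, and all values are hypothetical. The sketch also shows a subtlety of the query: because the WHERE clause tests a column from the right-hand side, clicks with no matching user get a NULL level and are dropped, so this LEFT JOIN behaves like an inner join.

```python
from typing import Optional

# Hypothetical sample data: the users table keyed by user_id, plus a few click events.
users = {
    "u1": {"user_id": "u1", "level": "Platinum"},
    "u2": {"user_id": "u2", "level": "Silver"},
}
clickstream = [
    {"userid": "u1", "page": "/home", "action": "view"},
    {"userid": "u2", "page": "/cart", "action": "add"},
    {"userid": "u3", "page": "/home", "action": "view"},  # no matching user row
]


def vip_actions(clicks, users_table):
    """Mimic: SELECT userid, page, action FROM clickstream c
    LEFT JOIN users u ON c.userid = u.user_id WHERE u.level = 'Platinum'."""
    out = []
    for c in clicks:
        u: Optional[dict] = users_table.get(c["userid"])  # LEFT JOIN: None if no match
        level = u["level"] if u is not None else None      # right-side columns become NULL
        if level == "Platinum":                            # NULL never equals 'Platinum'
            out.append({"userid": c["userid"], "page": c["page"], "action": c["action"]})
    return out


print(vip_actions(clickstream, users))
```

If unmatched clicks should be kept, the filter would have to tolerate a NULL level. As written, the query only ever emits Platinum users.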
  • 12. What is it for? Simple Derivations of Existing Topics • One-liner to re-partition and/or re-key a topic for new uses: CREATE STREAM views_by_userid WITH (PARTITIONS=6, VALUE_FORMAT='JSON', TIMESTAMP='view_time') AS SELECT * FROM clickstream PARTITION BY user_id;
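Re-keying matters because Kafka places records with the same key in the same partition, so downstream consumers see all of one user's views in order. The following is a minimal sketch of that idea, assuming string keys and the slide's `PARTITIONS=6`. The hash is an explicit stand-in: it uses CRC32 from the standard library, whereas Kafka's default partitioner uses murmur2, so the concrete partition numbers will not match a real cluster.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 6  # matches PARTITIONS=6 in the slide's statement


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition. Stand-in hash: CRC32, NOT Kafka's murmur2."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def repartition(records, key_field):
    """Re-key each record by key_field and group the records by target partition."""
    partitions = defaultdict(list)
    for rec in records:
        new_key = str(rec[key_field])
        partitions[partition_for(new_key)].append((new_key, rec))
    return dict(partitions)


# Hypothetical page views; both "alice" records must land in the same partition.
views = [
    {"user_id": "alice", "page": "/a"},
    {"user_id": "bob", "page": "/b"},
    {"user_id": "alice", "page": "/c"},
]
by_partition = repartition(views, "user_id")
for p, recs in sorted(by_partition.items()):
    print(p, [k for k, _ in recs])
```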
  • 13. What is it for? Analytics, e.g. Anomaly Detection • Identifying patterns or anomalies in real-time data, surfaced in milliseconds: CREATE TABLE possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTES) GROUP BY card_number HAVING count(*) > 3;
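The fraud query counts attempts per card inside fixed 5-minute tumbling windows and keeps only the groups with more than 3 attempts. Below is a minimal batch sketch of that computation over hypothetical `(card_number, event_time_ms)` pairs. A streaming engine would instead update these counts continuously as events arrive. The error-count monitoring query follows the same pattern, with a 1-minute window and a WHERE filter applied before grouping.

```python
from collections import Counter

WINDOW_MS = 5 * 60 * 1000  # WINDOW TUMBLING (SIZE 5 MINUTES)
THRESHOLD = 3              # HAVING count(*) > 3


def tumbling_start(ts_ms: int, size_ms: int = WINDOW_MS) -> int:
    """Each timestamp belongs to exactly one window [start, start + size)."""
    return ts_ms - (ts_ms % size_ms)


def possible_fraud(attempts):
    """attempts: iterable of (card_number, event_time_ms).
    Returns {(card_number, window_start_ms): count} for counts above THRESHOLD."""
    counts = Counter((card, tumbling_start(ts)) for card, ts in attempts)
    return {k: n for k, n in counts.items() if n > THRESHOLD}


minute = 60 * 1000
# Hypothetical authorization attempts.
attempts = [("4111", 0), ("4111", 1 * minute), ("4111", 2 * minute), ("4111", 4 * minute),
            ("4111", 6 * minute),  # falls into the next 5-minute window
            ("5500", 0), ("5500", 1 * minute)]
print(possible_fraud(attempts))  # {('4111', 0): 4}
```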
  • 14. What is it for? Real-Time Monitoring • Log data monitoring, tracking and alerting • Sensor / IoT data: CREATE TABLE error_counts AS SELECT error_code, count(*) FROM monitoring_stream WINDOW TUMBLING (SIZE 1 MINUTE) WHERE type = 'ERROR' GROUP BY error_code;
  • 15. Where is KSQL not such a great fit (at least today)? • Powerful ad-hoc queries: ○ Only a limited span of time is usually retained in Kafka ○ No indexes • BI reports (Tableau etc.): ○ No indexes ○ No JDBC (most BI tools are not good with continuous results!)
  • 16. Agenda: 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 17. KSQL – A Streaming SQL Engine for Apache Kafka
  • 18. KSQL Concepts ● No need for source code • Zero, none at all, not even one line. • No SerDes, no generics, no lambdas, ... ● All the Kafka Streams "magic" out of the box • Exactly Once Semantics • Windowing • Event-time aggregation • Late-arriving data • Distributed, fault-tolerant, scalable, ...
  • 19. STREAM and TABLE as first-class citizens
  • 20. CREATE STREAM AS syntax: CREATE STREAM `stream_name` [WITH (`property = expression` [, …] ) ] AS SELECT `select_expr` [, ...] FROM `from_item` [, ...] [ WHERE `condition` ] [ PARTITION BY `column_name` ] ● where property can be any of the following: • KAFKA_TOPIC = name - what to call the sink topic • FORMAT = DELIMITED | JSON | AVRO - defaults to the format of the input stream • AVROSCHEMAFILE = path/to/file - if FORMAT=AVRO, where the output schema file will be written • PARTITIONS = # - number of partitions in the sink topic • TIMESTAMP = column - the name of the column to use as the timestamp; this can be used to define the event time
  • 21. CREATE TABLE AS syntax: CREATE TABLE `table_name` [WITH ( `property_name = expression` [, ...] )] AS SELECT `select_expr` [, ...] FROM `from_item` [, ...] [ WINDOW `window_expression` ] [ WHERE `condition` ] [ GROUP BY `grouping expression` ] [ HAVING `having_expression` ] ● where the property values are the same as for CREATE STREAM AS SELECT
  • 22. SELECT statement syntax: SELECT `select_expr` [, ...] FROM `from_item` [, ...] [ WINDOW `window_expression` ] [ WHERE `condition` ] [ GROUP BY `grouping expression` ] [ HAVING `having_expression` ] [ LIMIT n ] where from_item is one of the following: • stream_or_table_name [ [ AS ] alias] • from_item LEFT JOIN from_item ON join_condition
  • 23. WINDOWing ● Not ANSI SQL! → Continuous Queries ● Three types supported (same as Kafka Streams): • TUMBLING (fixed-size, non-overlapping windows, i.e. a HOPPING window whose advance equals its size) • SELECT appname, ip, COUNT(appname) AS problem_count FROM logstream WINDOW TUMBLING (size 1 minute) WHERE loglevel='ERROR' GROUP BY appname, ip; • HOPPING • SELECT itemid, SUM(arraycol[0]) FROM orders WINDOW HOPPING ( size 20 second, advance by 5 second) GROUP BY itemid; • SESSION • SELECT itemid, SUM(sales_price) FROM orders WINDOW SESSION (20 second) GROUP BY itemid;
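The three window types differ in how an event's timestamp maps to windows. This is a minimal plain-Python sketch, using abstract time units. With the slide's HOPPING parameters (size 20, advance 5), each event falls into size/advance = 4 overlapping windows once time is large enough, and with advance equal to size the same function degenerates to a tumbling window. The session function is a simplification that assumes all events are available up front and sorts them. A real engine merges sessions incrementally as out-of-order events arrive. Whether a gap exactly equal to the inactivity gap still joins a session is an assumption here (inclusive).

```python
from typing import List, Tuple


def hopping_windows(ts: int, size: int, advance: int) -> List[Tuple[int, int]]:
    """All [start, end) windows that contain ts; window starts are multiples of advance."""
    windows = []
    start = ts - (ts % advance)          # latest window start at or before ts
    while start > ts - size and start >= 0:
        windows.append((start, start + size))
        start -= advance                 # step back while the window still covers ts
    return sorted(windows)


def session_windows(timestamps: List[int], gap: int) -> List[Tuple[int, int]]:
    """Group event times into sessions; a new session starts when the time since
    the previous event exceeds `gap` (simplified: events are sorted first)."""
    sessions: List[Tuple[int, int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= gap:
            sessions[-1] = (sessions[-1][0], ts)  # extend the current session
        else:
            sessions.append((ts, ts))             # open a new session
    return sessions


print(hopping_windows(22, size=20, advance=5))  # four overlapping windows
print(hopping_windows(12, size=20, advance=20))  # tumbling: exactly one window
print(session_windows([0, 5, 30, 45, 100], gap=20))
```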
  • 24. Agenda: 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 25. Create a STREAM and a TABLE from Kafka Topics: ksql> CREATE STREAM pageviews_original (viewtime bigint, userid varchar, pageid varchar) WITH (kafka_topic='pageviews', value_format='DELIMITED'); ksql> CREATE TABLE users_original (registertime bigint, gender varchar, regionid varchar, userid varchar) WITH (kafka_topic='users', value_format='JSON'); ksql> SELECT pageid FROM pageviews_original LIMIT 3; ksql> CREATE STREAM pageviews_female AS SELECT users_original.userid AS userid, pageid, regionid, gender FROM pageviews_original LEFT JOIN users_original ON pageviews_original.userid = users_original.userid WHERE gender = 'FEMALE';
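In a stream-table join like `pageviews_female`, only stream records trigger output. Each page view is joined with the table's row for that user as it exists when the page view is processed, and later table updates do not rewrite earlier results. This plain-Python sketch replays a hypothetical, strictly time-ordered sequence of table updates and page views to show that behavior. Processing everything in one global order is a simplifying assumption, since a real deployment processes each partition separately.

```python
def pageviews_female(events):
    """events: time-ordered list of ("user", row) table updates and
    ("pageview", row) stream events. Returns the joined, filtered output rows."""
    users = {}   # TABLE state: latest row per userid (an update replaces the old row)
    output = []
    for kind, rec in events:
        if kind == "user":
            users[rec["userid"]] = rec
            continue
        u = users.get(rec["userid"])                # LEFT JOIN against the current state
        gender = u["gender"] if u else None
        regionid = u["regionid"] if u else None
        if gender == "FEMALE":                      # WHERE gender = 'FEMALE'
            output.append({"userid": u["userid"], "pageid": rec["pageid"],
                           "regionid": regionid, "gender": gender})
    return output


# Hypothetical events for a single user.
events = [
    ("pageview", {"userid": "User_1", "pageid": "Page_10"}),  # no user row yet: filtered out
    ("user", {"userid": "User_1", "gender": "FEMALE", "regionid": "Region_1"}),
    ("pageview", {"userid": "User_1", "pageid": "Page_11"}),
    ("user", {"userid": "User_1", "gender": "FEMALE", "regionid": "Region_9"}),
    ("pageview", {"userid": "User_1", "pageid": "Page_12"}),  # sees the updated region
]
for row in pageviews_female(events):
    print(row)
```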
  • 26. Live Demo – KSQL Hello World
  • 27. Agenda: 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 28. KSQL – Components: KSQL has 3 main components: 1. The CLI, designed to be familiar to users of MySQL, Postgres etc. 2. The Engine, which actually runs the Kafka Streams topologies 3. The REST server interface, which enables an Engine to receive instructions from the CLI (Note that you also need a Kafka cluster… KSQL is deployed independently.)
  • 29. #1 STAND-ALONE AKA 'LOCAL MODE' (diagram components: KSQL> CLI, REST server and KSQL Engine in one JVM; Kafka Cluster)
  • 30. #1 STAND-ALONE AKA 'LOCAL MODE': Starts a CLI, an Engine, and a REST server all in the same JVM. Ideal for laptop development • Start with default settings: > bin/ksql-cli local • Or with customized settings: > bin/ksql-cli local --properties-file foo/bar/ksql.properties
  • 31. #2 CLIENT-SERVER (diagram components: one KSQL> CLI; three JVMs, each running a KSQL Engine and a REST server; Kafka Cluster)
  • 32. #2 CLIENT-SERVER: Start any number of server nodes • > bin/ksql-server-start • Start any number of CLIs and specify the 'remote' server address • > bin/ksql-cli remote http://myserver:8090 • All running Engines share the processing load • Technically, they are instances of the same Kafka Streams application • Scale up / down without restart
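In client-server mode the CLI sends statements to an Engine over HTTP. This sketch only builds such a request with Python's standard library, without sending it, to show the shape of the exchange. The endpoint path `/ksql` and the JSON body fields `ksql` and `streamsProperties` are assumptions taken from the REST API documented for later KSQL/ksqlDB releases. The developer preview shown here may expose a different API. The host and port come from the slide's `ksql-cli remote` example.

```python
import json
import urllib.request

SERVER = "http://myserver:8090"  # address from the slide's `ksql-cli remote` example


def build_ksql_request(statement: str, streams_properties=None) -> urllib.request.Request:
    """Build (but do not send) an HTTP request carrying one KSQL statement.
    ASSUMPTION: POST /ksql with {"ksql": ..., "streamsProperties": {...}},
    as in later KSQL/ksqlDB REST APIs; the developer preview may differ."""
    body = {"ksql": statement, "streamsProperties": streams_properties or {}}
    return urllib.request.Request(
        url=SERVER + "/ksql",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_ksql_request("SHOW STREAMS;", {"auto.offset.reset": "earliest"})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending it requires a running server: urllib.request.urlopen(req)
```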
  • 33. #3 AS PRE-DEFINED APP (diagram components: three JVMs, each running a KSQL Engine; Kafka Cluster)
  • 34. #3 AS PRE-DEFINED APP: Running the KSQL server with a pre-defined set of instructions/queries • Version control your queries and transformations as code • Start any number of Engine instances • Pass a file of KSQL statements to execute • > bin/ksql-node query-file=foo/bar.sql • All running Engines share the processing load • Technically, they are instances of the same Kafka Streams application • Scale up / down without restart
  • 35. Dedicating resources
  • 36. How do you deploy applications?
  • 37. Where to develop and operate your applications?
  • 38. Agenda: 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 39. Demo: Clickstream Analysis (diagram components: Kafka Producer, Stream of Log Events, Kafka Cluster, KSQL, Kafka Connect, Elasticsearch, Grafana)
  • 40. Demo: Clickstream Analysis • https://github.com/confluentinc/ksql/tree/0.1.x/ksql-clickstream-demo#clickstream-analysis • Leverages Apache Kafka, Kafka Connect, KSQL, Elasticsearch and Grafana • 5-minute screencast: https://www.youtube.com/watch?v=A45uRzJiv7I • Setup in 5 minutes (with or without Docker) • SELECT STREAM CEIL(timestamp TO HOUR) AS timeWindow, productId, COUNT(*) AS hourlyOrders, SUM(units) AS units FROM Orders GROUP BY CEIL(timestamp TO HOUR), productId;
      timeWindow | productId | hourlyOrders | units
      -----------+-----------+--------------+------
      08:00:00   |        10 |            2 |     5
      08:00:00   |        20 |            1 |     8
      09:00:00   |        10 |            4 |    22
      09:00:00   |        40 |            1 |    45
      ...        |       ... |          ... |   ...
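The hourly query rounds each order's timestamp up to the next full hour and then counts orders and sums units per (hour, product) group. This batch sketch in plain Python reproduces that grouping. The order records are hypothetical and were chosen so that the output matches the four data rows of the table above. Timestamps are seconds since midnight. Note the CEIL semantics: an order at 07:40 lands in the 08:00:00 bucket.

```python
from collections import defaultdict

HOUR = 3600


def ceil_to_hour(ts_seconds: int) -> int:
    """Round up to the next full hour; a timestamp exactly on the hour stays put."""
    return -(-ts_seconds // HOUR) * HOUR


def hourly_orders(orders):
    """orders: iterable of (ts_seconds, product_id, units).
    Returns {(time_window, product_id): (order_count, total_units)}."""
    agg = defaultdict(lambda: [0, 0])
    for ts, product_id, units in orders:
        bucket = agg[(ceil_to_hour(ts), product_id)]
        bucket[0] += 1       # COUNT(*)
        bucket[1] += units   # SUM(units)
    return {k: tuple(v) for k, v in agg.items()}


def hhmm(h: int, m: int) -> int:
    return h * HOUR + m * 60


# Hypothetical orders: (time, productId, units).
orders = [(hhmm(7, 15), 10, 3), (hhmm(7, 40), 10, 2), (hhmm(7, 50), 20, 8),
          (hhmm(8, 5), 10, 5), (hhmm(8, 20), 10, 5), (hhmm(8, 30), 10, 6),
          (hhmm(8, 45), 10, 6), (hhmm(8, 59), 40, 45)]

for (window, product), (count, units) in sorted(hourly_orders(orders).items()):
    print(f"{window // HOUR:02d}:00:00 | {product} | {count} | {units}")
```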
  • 41. Live Demo – KSQL Clickstream Analysis
  • 42. Agenda: 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 43. KSQL Quick Start: github.com/confluentinc/ksql • Local runtime or Docker container
  • 44. Remember: Developer Preview! Caveats of the Developer Preview • No ORDER BY yet • No stream-stream joins yet • Limited function library • Avro support only via workaround • Breaking API / syntax changes still possible. BE EXCITED, BUT BE ADVISED
  • 45. Resources and Next Steps. Get Involved • Try the Quickstart on GitHub • Check out the code • Play with the examples. The point of a developer preview is to improve things, together! https://github.com/confluentinc/ksql • http://confluent.io/ksql • https://slackpass.io/confluentcommunity #ksql
  • 46. Kai Waehner, Technology Evangelist, kontakt@kai-waehner.de, @KaiWaehner, www.confluent.io, www.kai-waehner.de, LinkedIn. Questions? Feedback? Please contact me… Come to our booth… Come to Kafka Summit London in April 2018…
  • 47. Appendix
  • 48. KSQL Concepts ● STREAM and TABLE as first-class citizens ● Interpretations of topic content ● STREAM - data in motion ● TABLE - collected state of a stream • One record per key (per window) • Current values (compacted topic): not yet supported • Changelog ● STREAM – TABLE Joins
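The difference between the two interpretations can be shown by reading the same hypothetical topic contents both ways in plain Python. As a STREAM, every record is a separate event. As a TABLE, the records are replayed as a changelog, so only the latest value per key survives. This sketch leaves out windowing and deletes.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (key, value) as read from a topic


def as_stream(records: List[Record]) -> List[Record]:
    """STREAM interpretation: every record is an independent, immutable event."""
    return list(records)


def as_table(records: List[Record]) -> Dict[str, str]:
    """TABLE interpretation: replay the records as a changelog, keeping only
    the latest value per key (later records overwrite earlier ones)."""
    table: Dict[str, str] = {}
    for key, value in records:
        table[key] = value
    return table


# Hypothetical topic contents: alice moves from Region_1 to Region_5.
topic = [("alice", "Region_1"), ("bob", "Region_2"), ("alice", "Region_5")]
print(len(as_stream(topic)), "events in the stream")
print(as_table(topic))
```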
  • 49. Schema & Format ● A Kafka broker knows how to move bytes • Technically a key-value message (byte[], byte[]) ● To enable declarative SQL-like queries and transformations, we have to define a richer structure ● Structural metadata is maintained in an in-memory catalog • DDL is recorded in a special topic
  • 50. Schema & Format: Start with the message (value) format ● JSON - the simplest choice ● DELIMITED - in this preview, the implicit delimiter is a comma and the escaping rules are built in; this will be expanded ● AVRO - requires that you also supply a schema file (.avsc) ● Pseudo-columns are automatically provided • ROWKEY, ROWTIME - for querying the message key and timestamp • (PARTITION, OFFSET coming soon) • CREATE STREAM pageview (viewtime bigint, userid varchar, pageid varchar) WITH (value_format = 'delimited', kafka_topic='my_pageview_topic');
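With DELIMITED values, the declared column list is what gives the raw bytes a structure. The following is a minimal sketch of that mapping for the `pageview` stream above: it splits one comma-separated value, converts each field by its declared type (bigint to `int`, varchar to `str`), and attaches the ROWKEY and ROWTIME pseudo-columns from the message key and timestamp. Using Python's `csv` module for quoting is an assumption standing in for KSQL's built-in escaping rules, and the sample message is hypothetical.

```python
import csv
from typing import Dict, List, Tuple

# Column schema from the slide's CREATE STREAM pageview (...) statement.
SCHEMA: List[Tuple[str, type]] = [("viewtime", int), ("userid", str), ("pageid", str)]


def parse_delimited(value: str, key: str, timestamp_ms: int) -> Dict[str, object]:
    """Turn one comma-delimited message value into a typed row and add the
    ROWKEY/ROWTIME pseudo-columns. ASSUMPTION: csv quoting stands in for
    KSQL's built-in escaping rules."""
    fields = next(csv.reader([value]))
    if len(fields) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} fields, got {len(fields)}")
    row: Dict[str, object] = {name: cast(field) for (name, cast), field in zip(SCHEMA, fields)}
    row["ROWKEY"] = key              # message key
    row["ROWTIME"] = timestamp_ms    # message timestamp
    return row


print(parse_delimited("1500000000000,User_1,Page_42", key="User_1", timestamp_ms=1500000000123))
```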
  • 51. Schema & Datatypes ● varchar / string ● boolean / bool ● integer / int ● bigint / long ● double ● array(of_type) - of_type must be primitive (no nested Array or Map yet) ● map(key_type, value_type) - key_type must be string, value_type must be primitive
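The nesting rules above are easy to trip over when designing a schema, so here is a small checker that encodes them. It is a hypothetical helper, not part of KSQL, and it accepts the slide's `array(...)` / `map(...)` notation. The actual DDL syntax for collection types may differ.

```python
PRIMITIVES = {"varchar", "string", "boolean", "bool", "integer", "int", "bigint", "long", "double"}
STRING_TYPES = {"varchar", "string"}


def check_type(t: str) -> bool:
    """Return True if a type expression follows the preview's rules:
    primitives, arrays of primitives, maps from string to primitive, no nesting."""
    t = t.strip().lower()
    if t in PRIMITIVES:
        return True
    if t.startswith("array(") and t.endswith(")"):
        return t[len("array("):-1].strip() in PRIMITIVES
    if t.startswith("map(") and t.endswith(")"):
        parts = [p.strip() for p in t[len("map("):-1].split(",")]
        return len(parts) == 2 and parts[0] in STRING_TYPES and parts[1] in PRIMITIVES
    return False


for candidate in ["bigint", "array(int)", "array(array(int))", "map(string, double)", "map(int, double)"]:
    print(candidate, check_type(candidate))
```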
  • 52. Interactive Querying ● Great for iterative development ● LIST (or SHOW) STREAMS / TABLES ● DESCRIBE STREAM / TABLE ● SELECT • Selects rows from a KSQL stream or table. • The result of this statement is printed to the console. • To stop the continuous query in the CLI, press Ctrl+C.
  • 53. SELECT statement syntax: SELECT `select_expr` [, ...] FROM `from_item` [, ...] [ WINDOW `window_expression` ] [ WHERE `condition` ] [ GROUP BY `grouping expression` ] [ HAVING `having_expression` ] [ LIMIT n ] where from_item is one of the following: • stream_or_table_name [ [ AS ] alias] • from_item LEFT JOIN from_item ON join_condition
  • 54. WINDOWing ● Not ANSI SQL! → Continuous Queries :-) ● Three types supported (same as Kafka Streams): • TUMBLING (fixed-size, non-overlapping windows, i.e. a HOPPING window whose advance equals its size) • SELECT appname, ip, COUNT(appname) AS problem_count FROM logstream WINDOW TUMBLING (size 1 minute) WHERE loglevel='ERROR' GROUP BY appname, ip; • HOPPING • SELECT itemid, SUM(arraycol[0]) FROM orders WINDOW HOPPING ( size 20 second, advance by 5 second) GROUP BY itemid; • SESSION • SELECT itemid, SUM(sales_price) FROM orders WINDOW SESSION (20 second) GROUP BY itemid;
  • 55. CREATE STREAM AS SELECT ● Once your query is ready and you want to run it non-interactively • CREATE STREAM AS SELECT ...; ● Creates a new KSQL stream along with the corresponding Kafka topic and streams the result of the SELECT query into the topic ● To find which queries are already running: • SHOW QUERIES; ● If you need to stop one: • TERMINATE query_id;
  • 56. CREATE STREAM AS syntax: CREATE STREAM `stream_name` [WITH (`property = expression` [, …] ) ] AS SELECT `select_expr` [, ...] FROM `from_item` [, ...] [ WHERE `condition` ] [ PARTITION BY `column_name` ] ● where property can be any of the following: • KAFKA_TOPIC = name - what to call the sink topic • FORMAT = DELIMITED | JSON | AVRO - defaults to the format of the input stream • AVROSCHEMAFILE = path/to/file - if FORMAT=AVRO, where the output schema file will be written • PARTITIONS = # - number of partitions in the sink topic • TIMESTAMP = column - the name of the column to use as the timestamp; this can be used to define the event time
  • 57. CREATE TABLE AS SELECT ● Once your query is ready and you want to run it non-interactively ● CREATE TABLE AS SELECT ...; ● Just like CREATE STREAM AS SELECT, but for aggregations
  • 58. CREATE TABLE AS syntax: CREATE TABLE `table_name` [WITH ( `property_name = expression` [, ...] )] AS SELECT `select_expr` [, ...] FROM `from_item` [, ...] [ WINDOW `window_expression` ] [ WHERE `condition` ] [ GROUP BY `grouping expression` ] [ HAVING `having_expression` ] ● where the property values are the same as for CREATE STREAM AS SELECT
  • 59. Functions ● Scalar functions: • CONCAT, IFNULL, LCASE, LEN, SUBSTRING, TRIM, UCASE • ABS, CEIL, FLOOR, RANDOM, ROUND • StringToTimestamp, TimestampToString • GetStringFromJSON • CAST ● Aggregate functions: • SUM, COUNT, MIN, MAX ● User-defined functions: • Java interface
  • 60. Session Variables ● Just as in MySQL, Oracle etc., there are settings to control how your CLI behaves ● Set any property the Kafka Streams consumers/producers will understand ● Defaults can be set in the ksql.properties file ● To see a list of currently set or default variable values: • ksql> show properties; ● Useful examples: • num.stream.threads=4 • commit.interval.ms=1000 • cache.max.bytes.buffering=2000000 ● TIP! Your new best friend for testing or building a demo is: • ksql> set 'auto.offset.reset' = 'earliest';
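The tip works because of how Kafka consumers choose their first offset. A committed offset for the partition takes precedence. If there is none, or it is out of range, `auto.offset.reset` decides: `earliest` starts from the oldest retained record, and `latest` (the Kafka consumer default) starts from the end of the log. A new query typically starts without committed offsets, so with `latest` it only sees records produced after it starts, which makes demos on pre-loaded topics look empty. The function below is a hypothetical model of that decision for a single partition, not a Kafka API.

```python
from typing import Optional


def starting_offset(committed: Optional[int], log_start: int, log_end: int,
                    auto_offset_reset: str = "latest") -> int:
    """Offset at which a consumer begins reading one partition.
    A valid committed offset wins; otherwise auto.offset.reset decides."""
    if committed is not None and log_start <= committed <= log_end:
        return committed
    if auto_offset_reset == "earliest":
        return log_start   # replay everything still retained in the topic
    if auto_offset_reset == "latest":
        return log_end     # only records produced from now on
    raise ValueError(f"no valid committed offset and auto.offset.reset={auto_offset_reset!r}")


# A brand-new query over a partition holding offsets 0..999 (log end offset 1000):
print(starting_offset(None, 0, 1000))              # 1000: sees nothing old
print(starting_offset(None, 0, 1000, "earliest"))  # 0: replays the whole topic
```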