Join Tom Green, Solution Engineer at Confluent, for this Lunch and Learn talk covering KSQL. Confluent KSQL is the streaming SQL engine that enables real-time data processing against Apache Kafka®. It provides an easy-to-use yet powerful interactive SQL interface for stream processing on Kafka, without the need to write code in a programming language such as Java or Python. KSQL is scalable, elastic, and fault-tolerant, and it supports a wide range of streaming operations, including data filtering, transformations, aggregations, joins, windowing, and sessionization.
By attending one of these sessions, you will learn:
- How to query streams using SQL, without writing code
- How KSQL provides automated scalability and out-of-the-box high availability for streaming queries
- How KSQL can be used to join streams of data from different sources
- The differences between Streams and Tables in Apache Kafka
Introduction to KSQL: Streaming SQL for Apache Kafka®
1. CONFIDENTIAL
KSQL
The streaming SQL engine for Apache Kafka®
tom.green@confluent.io
2. Confluent Partner Briefing
Agenda & Housekeeping
● Introduction to KSQL Presentation
● KSQL Demonstration
● Q & A - please ask your questions via the chat box
Follow up materials
● The slides and recording will be emailed
3. Apache Kafka is a Distributed Streaming Platform
● Publish and subscribe to streams of data, similar to a message queue or enterprise messaging system.
● Store streams of data in a fault-tolerant way.
● Process streams of data in real time, as they occur.
4. KSQL
The streaming SQL engine for Apache Kafka® to write real-time applications in SQL
5. Lower the bar to enter the world of streaming
(Chart: user population vs. coding sophistication)
● Core developers who use Java/Scala
● Core developers who don’t use Java/Scala
● Data engineers, architects, DevOps/SRE
● BI analysts
6. Lower the bar to enter the world of streaming: KSQL vs. Kafka Streams
KSQL
CREATE STREAM fraudulent_payments AS
SELECT * FROM payments
WHERE fraudProbability > 0.8;
7. KSQL
● You write only SQL. No Java, Python, or other boilerplate to wrap around it!
● Create KSQL user-defined functions in Java when needed.
CREATE STREAM fraudulent_payments AS
SELECT * FROM payments
WHERE fraudProbability > 0.8;
8. New user experience: interactive stream processing
9. KSQL can be used interactively + programmatically
1 UI
2 CLI: ksql>
3 REST: POST /query
4 Headless
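As a rough illustration of the REST option, the Python sketch below builds (but does not send) a POST /query request. The server address `http://localhost:8088`, the helper name `build_query_request`, and the example statement are assumptions made for illustration; the JSON body with a `ksql` field and the `application/vnd.ksql.v1+json` content type follow the KSQL REST API as generally documented, so check them against the version you run.

```python
import json
import urllib.request

# Assumption: a KSQL server reachable locally on its default port.
KSQL_SERVER = "http://localhost:8088"


def build_query_request(statement: str, server: str = KSQL_SERVER) -> urllib.request.Request:
    """Build (but do not send) a POST /query request carrying one KSQL statement."""
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{server}/query",
        data=body,
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )


request = build_query_request("SELECT * FROM myStream;")

# To run it against a live server (rows are streamed back line by line):
# with urllib.request.urlopen(request) as response:
#     for line in response:
#         print(line.decode("utf-8").rstrip())
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing it at a real cluster.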
10. All you need is Kafka and KSQL
Without KSQL:
1. Build & package
2. Submit job
Separate processing and storage; storage required for fault-tolerance
With KSQL:
ksql> SELECT * FROM myStream
12. Stream/Table Duality
13. The Stream-Table Duality
Each event in the stream updates the table (rows in time order):
Stream (payments)    Table (balance)
Alice + €50          Alice €50
Bob + €18            Alice €50, Bob €18
Alice + €25          Alice €75, Bob €18
Alice – €60          Alice €15, Bob €18
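The duality on this slide can be sketched in a few lines of Python: replaying the payments stream in order and folding each event into a running balance yields the table state after every event. This is only an in-memory illustration of the idea, not how KSQL is implemented; the function name `table_states` is hypothetical, and the amounts are the ones shown on the slide.

```python
from typing import Dict, Iterable, List, Tuple

Payment = Tuple[str, int]  # (account, signed amount in euros)


def table_states(stream: Iterable[Payment]) -> List[Dict[str, int]]:
    """Fold a stream of payments into the sequence of balance-table snapshots."""
    balance: Dict[str, int] = {}
    states: List[Dict[str, int]] = []
    for account, amount in stream:
        # Each stream event is an update to one row of the table.
        balance[account] = balance.get(account, 0) + amount
        states.append(dict(balance))  # snapshot of the table after this event
    return states


payments = [("Alice", 50), ("Bob", 18), ("Alice", 25), ("Alice", -60)]
states = table_states(payments)
for event, state in zip(payments, states):
    print(event, "->", state)
```

Reading the snapshots back in order recovers the stream of changes, which is the other direction of the duality.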
15. KSQL example use cases
● Data exploration
● Data enrichment
● Streaming ETL
● Filter, cleanse, mask
● Real-time monitoring
● Anomaly detection
16. Example: Retail
● Stream of purchases from online and physical stores
● Stream of shipments that arrive
● KSQL joins the two streams in real-time
17. Example: IoT, Automotive, Connected Cars
● Cars send telemetry data via Kafka API
● Kafka Connect streams data in
● KSQL joins the two streams in real-time
● Kafka Streams application to notify customers
18. KSQL for Real-Time Monitoring
● Log data monitoring
● Tracking and alerting
● Syslog data
● Sensor / IoT data
● Application metrics
CREATE STREAM syslog_invalid_users AS
SELECT host, message
FROM syslog
WHERE message LIKE '%Invalid user%';
http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting
19. KSQL for Anomaly Detection
● Identify patterns or anomalies in real-time data, surfaced in milliseconds
CREATE TABLE possible_fraud AS
SELECT card_number, COUNT(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING COUNT(*) > 3;
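To make the windowing semantics of the query above concrete, here is a minimal Python sketch that assigns each authorization attempt to a 5-second tumbling window, counts attempts per card and window, and keeps only counts above 3. It processes a finished list in one pass, whereas KSQL updates its result continuously; the sample timestamps, card identifiers, and the function name `possible_fraud` are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

WINDOW_MS = 5_000  # WINDOW TUMBLING (SIZE 5 SECONDS)
THRESHOLD = 3      # HAVING COUNT(*) > 3


def possible_fraud(attempts: Iterable[Tuple[int, str]]) -> Dict[Tuple[int, str], int]:
    """Count attempts per (window start, card number) and keep counts above the threshold."""
    counts: Counter = Counter()
    for timestamp_ms, card_number in attempts:
        # Tumbling windows are fixed-size and non-overlapping, so each event
        # belongs to exactly one window, identified by its start time.
        window_start = timestamp_ms - (timestamp_ms % WINDOW_MS)
        counts[(window_start, card_number)] += 1
    return {key: n for key, n in counts.items() if n > THRESHOLD}


# Hypothetical sample data: (event time in ms, card number)
attempts = [
    (1_000, "card-A"), (1_500, "card-A"), (2_000, "card-A"), (4_900, "card-A"),
    (3_000, "card-B"),
    (5_100, "card-A"), (6_000, "card-A"),
]
flagged = possible_fraud(attempts)
print(flagged)
```

Only card-A in the window starting at 0 ms is flagged: it has four attempts there, while its two attempts in the next window and card-B's single attempt stay under the threshold.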
20. KSQL for Streaming ETL
● Joining, filtering, and aggregating streams of event data
CREATE STREAM vip_actions AS
SELECT c.user_id, page, action
FROM clickstream c
LEFT JOIN users u
ON c.user_id = u.user_id
WHERE u.level = 'Platinum';
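The join above enriches each click with the matching row from the users table and then filters on the user's level. A minimal in-memory Python sketch of that stream-table join follows; the sample records and the function name `vip_actions` are hypothetical, and a plain dict stands in for the continuously updated KSQL table.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical table of users, keyed by user_id (stands in for the KSQL table).
users = {
    "u1": {"level": "Platinum"},
    "u2": {"level": "Gold"},
}

# Hypothetical stream of click events.
clickstream = [
    {"user_id": "u1", "page": "/home", "action": "view"},
    {"user_id": "u2", "page": "/cart", "action": "add"},
    {"user_id": "u3", "page": "/home", "action": "view"},  # no matching user row
]


def vip_actions(clicks: Iterable[Dict[str, str]],
                users_table: Dict[str, Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Left-join each click to the users table, then keep Platinum users only."""
    for click in clicks:
        user = users_table.get(click["user_id"])   # LEFT JOIN: may find no row
        level = user["level"] if user else None    # unmatched clicks get a null level
        if level == "Platinum":                    # WHERE u.level = 'Platinum'
            yield {key: click[key] for key in ("user_id", "page", "action")}


result = list(vip_actions(clickstream, users))
print(result)
```

Because the filter is on a column from the table side, clicks without a matching user are dropped, so the left join behaves like an inner join in this particular query.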
21. KSQL for Data Transformation
● Easily make derivations of existing topics
CREATE STREAM pageviews_avro
WITH (PARTITIONS=6,
VALUE_FORMAT='AVRO') AS
SELECT * FROM pageviews_json
PARTITION BY user_id;
23. Example: CDC from DB via Kafka to Elastic
● Kafka Connect streams data in
● KSQL processes table changes in real-time
● Kafka Connect streams data out
(Diagram label: ratings)
24. Deployment Scalability and HA
26. How to run KSQL
#1 Interactive KSQL, for development & testing
● Interaction via ksql> (CLI) and POST /query (REST)
● Kafka Cluster (data)
● KSQL Cluster (processing)
● KSQL does not run on Kafka brokers!
...
27. How to run KSQL
#2 Headless KSQL, for production
● Kafka Cluster (data)
● KSQL Cluster (processing): servers started with same .sql file
● Interaction for UI, CLI, REST is disabled
...
37. How to run KSQL
KSQL Server (JVM process)
● DEB, RPM, ZIP, TAR downloads: http://confluent.io/ksql
● Docker images: confluentinc/cp-ksql-server, confluentinc/cp-ksql-cli
● …and many more…
38. Resources and Next Steps
confluentinc/ksql
http://confluent.io/ksql
http://cnfl.io/slack