Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.
Unlocking the World of
Stream Processing with KSQL
The Streaming SQL Engine for Apache Kafka
Michael G. Noll, Confluent
@m...
Founded by the creators
of Apache Kafka
Technology Developed
while at LinkedIn
Largest Contributor and
tester of Apache Ka...
Apache Kafka Databases
SQLStream Processing
Booked hotel, flight Ordered a taxi
Chatted with friends
Listened to musicPaid money
Played a video game
Read a newspaper ...
Billing Information
Purchases
Geolocation Updates
And more such data
STREAMS of
customer data
(continuously flowing)
TABLE...
KSQLis the
Streaming
SQL Engine
for
Apache Kafka
5+5
KSQL is the Easiest Way to Process with Kafka
Kafka
(data)
KSQL
(processing)
read,
write
network
All you need is Kafka – n...
KSQL is the Easiest Way to Process with Kafka
Runs
Everywhere
Elastic, Scalable,
Fault-Tolerant,
Distributed
Powerful Proc...
Stream processing with Kafka
Example: Using Kafka’s Streams API for writing
elastic, scalable, fault-tolerant Java and Sca...
Stream processing with Kafka
CREATE STREAM fraudulent_payments AS
SELECT * FROM payments
WHERE fraudProbability > 0.8;
Sam...
Easier, faster workflow
write code in
package app
run app
write (K)SQL
Java or Scala
ksql>
Kafka Streams API KSQL
…
(1 or ...
Interactive KSQL usage
ksql> POST /query
CLI REST API1 3UI2
KSQL
are some
what
use cases?
10+5
KSQL for Data Exploration
SELECT page, user_id, status, bytes
FROM clickstream
WHERE user_agent LIKE 'Mozilla/5.0%';
An ea...
KSQL for Data Enrichment
CREATE STREAM enriched_payments AS
SELECT payment_id, u.country, total
FROM payments_stream p
LEF...
KSQL for Streaming ETL
CREATE STREAM clicks_from_vip_users AS
SELECT user_id, u.country, page, action
FROM clickstream c
L...
KSQL for Anomaly Detection
CREATE TABLE possible_fraud AS
SELECT card_number, COUNT(*)
FROM authorization_attempts
WINDOW ...
KSQL for Real-Time Monitoring
CREATE TABLE failing_vehicles AS
SELECT vehicle, COUNT(*)
FROM vehicle_monitoring_stream
WIN...
KSQL for Data Transformation
CREATE STREAM clicks_by_user_id
WITH (PARTITIONS=6,
TIMESTAMP='view_time’
VALUE_FORMAT='JSON'...
Where is KSQL not such a great fit?
BI reports
• Because no indexes
• No JDBC (most BI tools are not good
with continuous ...
KSQL
does
How
work?
15+7
Shoulders of Streaming Giants
Consumer,
Producer
KSQL
Kafka Streams
powers
powers
Flexibility
Ease of Use
CREATE STREAM, C...
KSQL Architecture
KSQL
Engine
REST
API
Processing happens here,
powered by Kafka Streams
ksql>
Programmatic access from
Ja...
Runs Everywhere, Works with What You Have
Physical
…and many more…
KSQL Architecture
Kafka
(your data)
KSQL
read,
write
…
More KSQL
…
FraudTeam
…
MobileTeam
KSQLCluster
Servers form a
Kafka...
KSQL Interactive Usage
Start 1+ KSQL servers
$ ksql-server-start
Interact with
KSQL CLI, UI, etc.
$ ksql http://ksql-serve...
KSQL Headless, Non-Interactive Usage
$ ksql-server-start --queries-file application.sql
ksql>
Typically version
controlled...
Example Journey from Idea to Production
Interactive KSQL
for development and testing
Headless KSQL
for Production
Desired ...
Stream-Table
The
Duality
22+15
Stream-Table Duality
CREATE STREAM enriched_payments AS
SELECT payment_id, u.country, total
FROM payments_stream p
LEFT JO...
Do you think that’s a table you are querying ?
Stream Table
Stream-Table Duality
Alice 1
Alice 1
Charlie 5
Alice 3
Charlie 5
(Alice, 1)
(Charlie, 5)
(Alice, 3)
Alice 1
A...
Stream-Table Duality
aggregation
changelog
“materialized view”
of the stream
(like SUM, COUNT)
Stream Table
Apache Kafka Databases
Stream-Table Duality
Example: CDC from DB via Kafka to Elastic
customers
Kafka Connect
streams data in
Kafka Connect
streams data out
KSQL proc...
Example: Real-time Data Enrichment
Kafka Connect
streams data in
<wherever>
Kafka Connect
streams data out
Devices write
d...
Fault-Tolerance, powered by Kafka
Server A:
“I do stateful stream
processing, like tables,
joins, aggregations.”
“streamin...
Fault-Tolerance, powered by Kafka
Processing fails over automatically, without data loss or miscomputation.
1 Kafka consum...
Elasticity and Scalability, powered by Kafka
You can add, remove, restart servers in KSQL clusters during live operations....
Wrapping up
37
KSQLis the
Streaming
SQL Engine
for
Apache Kafka
KSQL is the Easiest Way to Process with Kafka
Runs
Everywhere
Elastic, Scalable,
Fault-Tolerant,
Distributed
Powerful Proc...
Where to go from here
http://confluent.io/ksql
https://slackpass.io/confluentcommunity #ksql
https://github.com/confluenti...
Unlocking the world of stream processing with KSQL, the streaming SQL engine for Apache Kafka
Unlocking the world of stream processing with KSQL, the streaming SQL engine for Apache Kafka
Prochain SlideShare
Chargement dans…5
×
Prochain SlideShare
What to Upload to SlideShare
Suivant
Télécharger pour lire hors ligne et voir en mode plein écran

6

Partager

Télécharger pour lire hors ligne

Unlocking the world of stream processing with KSQL, the streaming SQL engine for Apache Kafka

Télécharger pour lire hors ligne

Slides of my Strata London 2018 talk:
https://conferences.oreilly.com/strata/strata-eu/public/schedule/detail/65325

Abstract:

Modern businesses have data at their core, and this data is changing continuously. Stream processing is what allows you harness this torrent of information in real time, and thousands of companies use Apache Kafka as the core platform for streaming data to transform and reshape their industries. However, the world of stream processing still has a very high barrier to entry. Today’s most popular stream processing technologies require the user to write code in programming languages such as Java or Scala. This hard requirement on coding skills is preventing many companies to unlock the benefits of stream processing to their full effect.

However, imagine that instead of having to write a lot of code in a programming language like Java or Scala for your favorite stream processing technology, all you’d need to get started with stream processing is a simple SQL statement, such as: SELECT * FROM payments-kafka-stream WHERE fraudProbability > 0.8.

Michael Noll offers an overview of KSQL, the open source streaming SQL engine for Apache Kafka, which makes it easy to get started with a wide range of real-time use cases, such as monitoring application behavior and infrastructure, detecting anomalies and fraudulent activities in data feeds, and real-time ETL. With KSQL, there’s no need to write any code in a programming language. KSQL brings together the worlds of streams and databases by allowing you to work with your data in a stream and in a table format. Built on top of Kafka’s Streams API, KSQL supports many powerful operations, including filtering, transformations, aggregations, joins, windowing, sessionization, and much more. It is open source (Apache 2.0 licensed), distributed, scalable, fault tolerant, and real time. You’ll learn how KSQL makes it easy to get started with a wide range of stream processing use cases and how to get up and running as you explore how it all works under the hood.

Unlocking the world of stream processing with KSQL, the streaming SQL engine for Apache Kafka

  1. 1. Unlocking the World of Stream Processing with KSQL The Streaming SQL Engine for Apache Kafka Michael G. Noll, Confluent @miguno
  2. 2. Founded by the creators of Apache Kafka Technology Developed while at LinkedIn Largest Contributor and tester of Apache Kafka • Founded in 2014 • Raised $84M from Benchmark, Index, Sequoia • Transacting in 20 countries • Commercial entities in US, UK, Germany, Australia
  3. 3. Apache Kafka Databases SQLStream Processing
  4. 4. Booked hotel, flight Ordered a taxi Chatted with friends Listened to musicPaid money Played a video game Read a newspaper <add your example>
  5. 5. Billing Information Purchases Geolocation Updates And more such data STREAMS of customer data (continuously flowing) TABLE of customer profiles (continuously updated) Motivating example
  6. 6. KSQLis the Streaming SQL Engine for Apache Kafka 5+5
  7. 7. KSQL is the Easiest Way to Process with Kafka Kafka (data) KSQL (processing) read, write network All you need is Kafka – no complex deployments of bespoke systems for stream processing! CREATE STREAM CREATE TABLE SELECT …and more…
  8. 8. KSQL is the Easiest Way to Process with Kafka Runs Everywhere Elastic, Scalable, Fault-Tolerant, Distributed Powerful Processing incl. Filters, Transforms, Joins, Aggregations, Windowing Supports Streams and Tables Open Source (Apache v2) Kafka Security Integration Event-Time Processing Zero Programming in Java, Scala 0 Exactly-Once Processing
  9. 9. Stream processing with Kafka Example: Using Kafka’s Streams API for writing elastic, scalable, fault-tolerant Java and Scala applications Main Logic
  10. 10. Stream processing with Kafka CREATE STREAM fraudulent_payments AS SELECT * FROM payments WHERE fraudProbability > 0.8; Same example, now with KSQL. Not a single line of Java or Scala code needed.
  11. 11. Easier, faster workflow write code in package app run app write (K)SQL Java or Scala ksql> Kafka Streams API KSQL … (1 or many instances)
  12. 12. Interactive KSQL usage ksql> POST /query CLI REST API1 3UI2
  13. 13. KSQL are some what use cases? 10+5
  14. 14. KSQL for Data Exploration SELECT page, user_id, status, bytes FROM clickstream WHERE user_agent LIKE 'Mozilla/5.0%'; An easy way to inspect data in Kafka SHOW TOPICS; PRINT 'my-topic' FROM BEGINNING;
  15. 15. KSQL for Data Enrichment CREATE STREAM enriched_payments AS SELECT payment_id, u.country, total FROM payments_stream p LEFT JOIN users_table u ON p.user_id = u.user_id; Join data from a variety of sources to see the full picture 1 Stream-table join
  16. 16. KSQL for Streaming ETL CREATE STREAM clicks_from_vip_users AS SELECT user_id, u.country, page, action FROM clickstream c LEFT JOIN users u ON c.user_id = u.user_id WHERE u.level ='Platinum'; Filter, cleanse, process data while it is moving
  17. 17. KSQL for Anomaly Detection CREATE TABLE possible_fraud AS SELECT card_number, COUNT(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 30 SECONDS) GROUP BY card_number HAVING COUNT(*) > 3; Aggregate data to identify patterns or anomalies in real-time 2 … per 30sec windows 1 Aggregate data
  18. 18. KSQL for Real-Time Monitoring CREATE TABLE failing_vehicles AS SELECT vehicle, COUNT(*) FROM vehicle_monitoring_stream WINDOW TUMBLING (SIZE 1 MINUTE) WHERE event_type = 'ERROR’ GROUP BY vehicle HAVING COUNT(*) >= 3; Derive insights from events (IoT, sensors, etc.) and turn them into actions
  19. 19. KSQL for Data Transformation CREATE STREAM clicks_by_user_id WITH (PARTITIONS=6, TIMESTAMP='view_time’ VALUE_FORMAT='JSON') AS SELECT * FROM clickstream PARTITION BY user_id; Quickly make derivations of existing data in Kafka 1 Re-partition the data 2 Convert data to JSON
  20. 20. Where is KSQL not such a great fit? BI reports • Because no indexes • No JDBC (most BI tools are not good with continuous results!) Ad-hoc queries • Because no indexes to facilitate efficient random lookups on arbitrary record fields KSQL is a streaming SQL engine for Apache Kafka. For example, streaming queries run forever until explicitly terminated.
  21. 21. KSQL does How work? 15+7
  22. 22. Shoulders of Streaming Giants Consumer, Producer KSQL Kafka Streams powers powers Flexibility Ease of Use CREATE STREAM, CREATE TABLE, SELECT, JOIN, GROUP BY, SUM, … KStream, KTable, filter(), map(), flatMap(), join(), aggregate(), … subscribe(), poll(), send(), flush(), beginTransaction(), …
  23. 23. KSQL Architecture KSQL Engine REST API Processing happens here, powered by Kafka Streams ksql> Programmatic access from Java, Go, Python, .NET, … UI CLI KSQL Server (JVM process) Physical …
  24. 24. Runs Everywhere, Works with What You Have Physical …and many more…
  25. 25. KSQL Architecture Kafka (your data) KSQL read, write … More KSQL … FraudTeam … MobileTeam KSQLCluster Servers form a Kafka consumer group to process data collaboratively network
  26. 26. KSQL Interactive Usage Start 1+ KSQL servers $ ksql-server-start Interact with KSQL CLI, UI, etc. $ ksql http://ksql-server:8088 ksql> REST API
  27. 27. KSQL Headless, Non-Interactive Usage $ ksql-server-start --queries-file application.sql ksql> Typically version controlled for auditing, rollbacks, etc. REST API disabled Start 1+ KSQL servers with .sql file containing pre-defined queries.
  28. 28. Example Journey from Idea to Production Interactive KSQL for development and testing Headless KSQL for Production Desired KSQL queries have been identified REST “Hmm, let me try out this idea...”
  29. 29. Stream-Table The Duality 22+15
  30. 30. Stream-Table Duality CREATE STREAM enriched_payments AS SELECT payment_id, u.country, total FROM payments_stream p LEFT JOIN users_table u ON p.user_id = u.user_id; CREATE TABLE failing_vehicles AS SELECT vehicle, COUNT(*) FROM vehicle_monitoring_stream WINDOW TUMBLING (SIZE 1 MINUTE) WHERE event_type = 'ERROR’ GROUP BY vehicle HAVING COUNT(*) >= 3; Stream Table (from previous slides)
  31. 31. Do you think that’s a table you are querying ?
  32. 32. Stream Table Stream-Table Duality Alice 1 Alice 1 Charlie 5 Alice 3 Charlie 5 (Alice, 1) (Charlie, 5) (Alice, 3) Alice 1 Alice 1 Charlie 5 Alice 3 Charlie 5 Table
  33. 33. Stream-Table Duality aggregation changelog “materialized view” of the stream (like SUM, COUNT) Stream Table
  34. 34. Apache Kafka Databases Stream-Table Duality
  35. 35. Example: CDC from DB via Kafka to Elastic customers Kafka Connect streams data in Kafka Connect streams data out KSQL processes table changes in real-time
  36. 36. Example: Real-time Data Enrichment Kafka Connect streams data in <wherever> Kafka Connect streams data out Devices write directly via Kafka API KSQL joins the stream and table in real-time customers
  37. 37. Fault-Tolerance, powered by Kafka Server A: “I do stateful stream processing, like tables, joins, aggregations.” “streaming restore” of A’s local state to BChangelog Topic “streaming backup” of A’s local state KSQL Kafka A key challenge of distributed stream processing is fault-tolerant state. State is automatically migrated in case of server failure Server B: “I restore the state and continue processing where server A stopped.”
  38. 38. Fault-Tolerance, powered by Kafka Processing fails over automatically, without data loss or miscomputation. 1 Kafka consumer group rebalance is triggered 2 Processing and state of #3 is migrated via Kafka to remaining servers #1 + #2 #3 died so #1 and #2 take over 1 Kafka consumer group rebalance is triggered 2 Part of processing incl. state is migrated via Kafka from #1 + #2 to server #3 #3 is back so the work is split again
  39. 39. Elasticity and Scalability, powered by Kafka You can add, remove, restart servers in KSQL clusters during live operations. 1 Kafka consumer group rebalance is triggered 2 Part of processing incl. state is migrated via Kafka to additional server processes “We need more processing power!” Kafka consumer group rebalance is triggered 1 2 Processing incl. state of stopped servers is migrated via Kafka to remaining servers “Ok, we can scale down again.”
  40. 40. Wrapping up 37
  41. 41. KSQLis the Streaming SQL Engine for Apache Kafka
  42. 42. KSQL is the Easiest Way to Process with Kafka Runs Everywhere Elastic, Scalable, Fault-Tolerant, Distributed Powerful Processing incl. Filters, Transforms, Joins, Aggregations, Windowing Supports Streams and Tables Open Source (Apache v2) Kafka Security Integration Event-Time Processing Zero Programming in Java, Scala 0 Exactly-Once Processing
  43. 43. Where to go from here http://confluent.io/ksql https://slackpass.io/confluentcommunity #ksql https://github.com/confluentinc/ksql Find me & Confluent at booth #315
  • JongminBaek3

    Jul. 8, 2020
  • JaeChangSong

    Jun. 6, 2018
  • DaveAbrahams

    May. 28, 2018
  • thossmann

    May. 26, 2018
  • StreamingAnalytics

    May. 25, 2018
  • DavidMoors1

    May. 25, 2018

Slides of my Strata London 2018 talk: https://conferences.oreilly.com/strata/strata-eu/public/schedule/detail/65325 Abstract: Modern businesses have data at their core, and this data is changing continuously. Stream processing is what allows you harness this torrent of information in real time, and thousands of companies use Apache Kafka as the core platform for streaming data to transform and reshape their industries. However, the world of stream processing still has a very high barrier to entry. Today’s most popular stream processing technologies require the user to write code in programming languages such as Java or Scala. This hard requirement on coding skills is preventing many companies to unlock the benefits of stream processing to their full effect. However, imagine that instead of having to write a lot of code in a programming language like Java or Scala for your favorite stream processing technology, all you’d need to get started with stream processing is a simple SQL statement, such as: SELECT * FROM payments-kafka-stream WHERE fraudProbability > 0.8. Michael Noll offers an overview of KSQL, the open source streaming SQL engine for Apache Kafka, which makes it easy to get started with a wide range of real-time use cases, such as monitoring application behavior and infrastructure, detecting anomalies and fraudulent activities in data feeds, and real-time ETL. With KSQL, there’s no need to write any code in a programming language. KSQL brings together the worlds of streams and databases by allowing you to work with your data in a stream and in a table format. Built on top of Kafka’s Streams API, KSQL supports many powerful operations, including filtering, transformations, aggregations, joins, windowing, sessionization, and much more. It is open source (Apache 2.0 licensed), distributed, scalable, fault tolerant, and real time. You’ll learn how KSQL makes it easy to get started with a wide range of stream processing use cases and how to get up and running as you explore how it all works under the hood.

Vues

Nombre de vues

1 319

Sur Slideshare

0

À partir des intégrations

0

Nombre d'intégrations

56

Actions

Téléchargements

55

Partages

0

Commentaires

0

Mentions J'aime

6

×