Speakers: Jay Smith, Cloud Customer Engineer, Google Cloud + Gwen Shapira, Product Manager, Confluent
Curious about Apache Kafka®? Find out why you would want to use the de facto standard for real-time streaming, the easiest way to get started, and how to leverage the extensive Apache Kafka ecosystem. In this chat, we'll talk about three common use cases, review stream processing patterns and discuss integration with important GCP services such as BigQuery. We'll also demo how to implement real-time clickstream analytics on Confluent Cloud, fully managed Apache Kafka as a service.
First Steps with Apache Kafka on Google Cloud Platform
1. Cloud OnAir
CE TV: First Steps with Apache Kafka on Google Cloud Platform
Gwen Shapira, Principal Data Architect, Confluent
Jay Smith, Cloud Customer Engineer, Google Cloud
2. Overview
1. Setting the scene for stream processing via an example
2. Introducing the key concepts of the Kafka Broker, Connect and KStreams
3. Two introductory and one advanced use case
4. Demo
4. What exactly is Stream Processing?
A simple example: an authorization_attempts stream feeding a derived possible_fraud stream.
CREATE STREAM possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING count(*) > 3;
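What the KSQL statement above computes can be sketched in plain Python. This is an illustrative simulation, not the KSQL engine: it assumes each authorization attempt is a (card_number, unix_timestamp) pair and buckets events into fixed 5-minute tumbling windows.

```python
from collections import defaultdict

# Plain-Python sketch of the KSQL query: group authorization attempts
# by card number inside fixed (tumbling) 5-minute windows and flag
# any card with more than 3 attempts in a single window.
WINDOW_SECONDS = 5 * 60

def possible_fraud(attempts):
    """attempts: iterable of (card_number, unix_timestamp) pairs.
    Returns {(card_number, window_start): count} for counts > 3."""
    counts = defaultdict(int)
    for card, ts in attempts:
        window_start = ts - (ts % WINDOW_SECONDS)  # tumbling-window bucket
        counts[(card, window_start)] += 1
    return {k: c for k, c in counts.items() if c > 3}

# Four attempts on one card within the same 5-minute window -> flagged.
attempts = [("4111", 0), ("4111", 60), ("4111", 120), ("4111", 180),
            ("9999", 30)]
print(possible_fraud(attempts))  # {('4111', 0): 4}
```

Note that the real KSQL query runs continuously over an unbounded stream; this sketch only processes a finite batch.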
13. The log is a simple idea
[Diagram: an append-only log, oldest messages on the left, newest on the right]
All messages are stored for N days; new messages are appended at the end of the log.
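The two properties on this slide, append-only writes and time-based retention, can be sketched as a toy log class. This is a minimal illustration, not Kafka's actual storage implementation:

```python
import time

# Minimal sketch of the log abstraction: records are only ever
# appended at the end, and old records age out after N days.
class Log:
    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        self.entries = []  # list of (timestamp, message), append-only

    def append(self, message, now=None):
        now = time.time() if now is None else now
        self.entries.append((now, message))

    def expire(self, now=None):
        """Drop messages older than the retention period."""
        now = time.time() if now is None else now
        cutoff = now - self.retention
        self.entries = [(ts, m) for ts, m in self.entries if ts >= cutoff]

log = Log(retention_seconds=7 * 24 * 3600)  # keep messages for 7 days
log.append("m1", now=0)
log.append("m2", now=10 * 24 * 3600)
log.expire(now=10 * 24 * 3600)   # "m1" is now older than 7 days
print([m for _, m in log.entries])  # ['m2']
```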
14. Consumers have a position all of their own
[Diagram: Fred, Sally and George each scan the same log forward from their own position, old to new]
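The idea of per-consumer positions can be sketched as follows: the log is shared, but each consumer tracks its own offset and advances it independently. The consumer names and `poll` helper are illustrative, not Kafka's client API:

```python
# Sketch of per-consumer positions over one shared log: each consumer
# (Fred, Sally, George) keeps its own offset and scans forward
# independently, without affecting the others.
log = ["m0", "m1", "m2", "m3", "m4"]
offsets = {"fred": 0, "sally": 2, "george": 4}  # each consumer's position

def poll(consumer, n=1):
    """Return the next n messages for this consumer and advance its offset."""
    start = offsets[consumer]
    batch = log[start:start + n]
    offsets[consumer] += len(batch)
    return batch

print(poll("fred", 2))    # ['m0', 'm1'] -- Fred is behind
print(poll("george", 2))  # ['m4'] -- George is near the tail
print(offsets)            # {'fred': 2, 'sally': 2, 'george': 5}
```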
15. Shard data to get scalability
[Diagram: producers 1-3 send messages to a cluster of machines]
Messages are sent to different partitions; partitions live on different machines.
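Routing a keyed message to a partition can be sketched as a stable hash modulo the partition count. This mirrors the idea of key-based partitioning but is not Kafka's actual partitioner (which uses murmur2 on the serialized key):

```python
import hashlib

# Sketch of key-based sharding: a keyed message is hashed to one of
# the topic's partitions, and partitions are spread across machines.
NUM_PARTITIONS = 3

def partition_for(key):
    """Stable hash of the key, modulo the partition count (md5 here so
    the result is reproducible across runs, unlike Python's hash())."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# All messages for one key land in the same partition, so per-key
# ordering is preserved while load spreads over the cluster.
assert partition_for("card-4111") == partition_for("card-4111")
print({k: partition_for(k) for k in ["a", "b", "c", "d"]})
```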
16. Replicate to get fault tolerance
[Diagram: the leader on Machine A receives msg and replicates msg to Machine B]
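The replication idea on this slide can be sketched in a few lines: the leader appends each message locally, then copies it to a follower, so a full copy of the log survives the loss of one machine. A toy illustration, not Kafka's replication protocol:

```python
# Sketch of leader/follower replication: the leader appends each
# message, then replicates it to a follower on another machine; if
# the leader's machine fails, the follower holds the full log.
class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []

leader = Replica("machine-a")
follower = Replica("machine-b")

def produce(msg):
    leader.log.append(msg)    # write to the leader first
    follower.log.append(msg)  # then replicate to the follower

produce("msg1")
produce("msg2")
assert follower.log == leader.log == ["msg1", "msg2"]  # replicas in sync
print(f"failover-safe copy on {follower.name}: {follower.log}")
```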
24. Engine for Continuous Computation
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY card_number
HAVING count(*) > 3;
26. Clickstream Analysis and Enrichment
[Diagram: an app and a mobile app publish via the REST Proxy; a log file and an Oracle database feed in through connectors; KSQL enriches the streams; a connector writes the results to Google BigQuery]
27. Clickstream Analysis and Enrichment
[Same diagram, highlighting the BigQuery connector]
How do we get data from Kafka to BigQuery?
28. Let me count the ways...
1. Batch method: Secor to GCS, then GCS to BigQuery.
   Pros: multiple output formats, parallelized load, flexible partitioning.
   Cons: no support for Avro input; have to go via GCS.
2. Kafka Streaming: use Kafka Streams or KafkaConsumer to read from Kafka and the BigQuery Streaming API to write to BigQuery.
   Pros: complete control.
   Cons: with great power comes great responsibility.
3. Kafka Connect: use Kafka Connect to read from Kafka and write to BigQuery.
   Pros: handles Avro, auto-generates schemas, handles schema updates, error handling.
   Cons: need to decide on topology; no support for reprocessing or batch+streams.
4. Apache Beam: use Google Dataflow to read from Kafka and write to BigQuery.
   Pros: complete control, ability to run in batch and streaming modes.
   Cons: requires coding, fewer “batteries included”.
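For option 3, a sink connector is configured with a small JSON document submitted to the Kafka Connect REST API. The sketch below assumes the community BigQuery sink connector originally developed at WePay; the connector name, topic, project, dataset mapping and keyfile path are illustrative placeholders, and field names may vary by connector version, so check the connector's documentation.

```json
{
  "name": "bigquery-sink",
  "config": {
    "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
    "topics": "clickstream_enriched",
    "project": "my-gcp-project",
    "datasets": ".*=clickstream_dataset",
    "autoCreateTables": "true",
    "keyfile": "/path/to/service-account.json"
  }
}
```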
29. Real-time ETL for Data Warehouse
[Diagram: a legacy PHP web application, an operational MySQL database, a new Python web application (via the REST Proxy) and a legacy mainframe feed Kafka through Kafka Connect; KSQL and Google Dataflow process the streams, and Kafka Connect loads the results into Google BigQuery]
30. and even...
[Diagram: databases, apps and devices (via an MQTT proxy) stream into Kafka; KSQL prepares training data and model parameters for model building, and the production ML model consumes the model, features, params and data to produce output]
31. You want to use the ecosystem, not install, configure, tune, manage, troubleshoot, get paged...
34. Try Confluent Cloud for yourself
Request a quote for Confluent Cloud Enterprise: cnfl.io/cce
Get started in minutes with Confluent Cloud Professional: cnfl.io/ccp
35. That’s a wrap.
Gwen Shapira, Product Manager, Confluent
Jay Smith, Cloud Customer Engineer, Google Cloud