Speakers: Jay Smith, Cloud Customer Engineer, Google Cloud + Gwen Shapira, Product Manager, Confluent
Curious about Apache Kafka®? Find out why you would want to use the de facto standard for real-time streaming, the easiest way to get started, and how to leverage the extensive Apache Kafka ecosystem. In this chat, we'll talk about three common use cases, review stream processing patterns and discuss integration with important GCP services such as BigQuery. We'll also demo how to implement real-time clickstream analytics on Confluent Cloud, fully managed Apache Kafka as a service.
First Steps with Apache Kafka on Google Cloud Platform
1. Cloud OnAir
CE TV: First Steps with Apache Kafka on Google Cloud Platform
Gwen Shapira, Principal Data Architect, Confluent
Jay Smith, Cloud Customer Engineer, Google Cloud
2. Overview
1. Setting the scene for stream processing via an example
2. Introducing the key concepts of the Kafka Broker, Connect and KStreams
3. Two introductory and one advanced use case
4. Demo
4. What exactly is Stream Processing?
A simple example: an authorization_attempts stream feeding a derived possible_fraud stream.
CREATE STREAM possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING count(*) > 3;
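What the KSQL statement above computes can be sketched in plain Python. This is an illustrative simulation, not the KSQL engine: it assumes each authorization attempt is a (card_number, unix_timestamp) pair and buckets events into fixed 5-minute tumbling windows.

```python
from collections import defaultdict

# Plain-Python sketch of the KSQL query: group authorization attempts
# by card number inside fixed (tumbling) 5-minute windows and flag
# any card with more than 3 attempts in a single window.
WINDOW_SECONDS = 5 * 60

def possible_fraud(attempts):
    """attempts: iterable of (card_number, unix_timestamp) pairs.
    Returns {(card_number, window_start): count} for counts > 3."""
    counts = defaultdict(int)
    for card, ts in attempts:
        window_start = ts - (ts % WINDOW_SECONDS)  # tumbling-window bucket
        counts[(card, window_start)] += 1
    return {k: c for k, c in counts.items() if c > 3}

# Four attempts on one card within the same 5-minute window -> flagged.
attempts = [("4111", 0), ("4111", 60), ("4111", 120), ("4111", 180),
            ("9999", 30)]
print(possible_fraud(attempts))  # {('4111', 0): 4}
```

Note that the real KSQL query runs continuously over an unbounded stream; this sketch only processes a finite batch.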
13. The log is a simple idea
[Diagram: an append-only log, oldest messages on the left, newest on the right]
All messages are stored for N days; new messages are appended at the end of the log.
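The two properties on this slide, append-only writes and time-based retention, can be sketched as a toy log class. This is a minimal illustration, not Kafka's actual storage implementation:

```python
import time

# Minimal sketch of the log abstraction: records are only ever
# appended at the end, and old records age out after N days.
class Log:
    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        self.entries = []  # list of (timestamp, message), append-only

    def append(self, message, now=None):
        now = time.time() if now is None else now
        self.entries.append((now, message))

    def expire(self, now=None):
        """Drop messages older than the retention period."""
        now = time.time() if now is None else now
        cutoff = now - self.retention
        self.entries = [(ts, m) for ts, m in self.entries if ts >= cutoff]

log = Log(retention_seconds=7 * 24 * 3600)  # keep messages for 7 days
log.append("m1", now=0)
log.append("m2", now=10 * 24 * 3600)
log.expire(now=10 * 24 * 3600)   # "m1" is now older than 7 days
print([m for _, m in log.entries])  # ['m2']
```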
14. Consumers have a position all of their own
[Diagram: Fred, Sally and George each scan the same log forward from their own position, old to new]
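The idea of per-consumer positions can be sketched as follows: the log is shared, but each consumer tracks its own offset and advances it independently. The consumer names and `poll` helper are illustrative, not Kafka's client API:

```python
# Sketch of per-consumer positions over one shared log: each consumer
# (Fred, Sally, George) keeps its own offset and scans forward
# independently, without affecting the others.
log = ["m0", "m1", "m2", "m3", "m4"]
offsets = {"fred": 0, "sally": 2, "george": 4}  # each consumer's position

def poll(consumer, n=1):
    """Return the next n messages for this consumer and advance its offset."""
    start = offsets[consumer]
    batch = log[start:start + n]
    offsets[consumer] += len(batch)
    return batch

print(poll("fred", 2))    # ['m0', 'm1'] -- Fred is behind
print(poll("george", 2))  # ['m4'] -- George is near the tail
print(offsets)            # {'fred': 2, 'sally': 2, 'george': 5}
```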
15. Shard data to get scalability
[Diagram: producers 1-3 send messages to a cluster of machines]
Messages are sent to different partitions; partitions live on different machines.
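Routing a keyed message to a partition can be sketched as a stable hash modulo the partition count. This mirrors the idea of key-based partitioning but is not Kafka's actual partitioner (which uses murmur2 on the serialized key):

```python
import hashlib

# Sketch of key-based sharding: a keyed message is hashed to one of
# the topic's partitions, and partitions are spread across machines.
NUM_PARTITIONS = 3

def partition_for(key):
    """Stable hash of the key, modulo the partition count (md5 here so
    the result is reproducible across runs, unlike Python's hash())."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# All messages for one key land in the same partition, so per-key
# ordering is preserved while load spreads over the cluster.
assert partition_for("card-4111") == partition_for("card-4111")
print({k: partition_for(k) for k in ["a", "b", "c", "d"]})
```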
16. Replicate to get fault tolerance
[Diagram: the leader on Machine A receives msg and replicates msg to Machine B]
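The replication idea on this slide can be sketched in a few lines: the leader appends each message locally, then copies it to a follower, so a full copy of the log survives the loss of one machine. A toy illustration, not Kafka's replication protocol:

```python
# Sketch of leader/follower replication: the leader appends each
# message, then replicates it to a follower on another machine; if
# the leader's machine fails, the follower holds the full log.
class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []

leader = Replica("machine-a")
follower = Replica("machine-b")

def produce(msg):
    leader.log.append(msg)    # write to the leader first
    follower.log.append(msg)  # then replicate to the follower

produce("msg1")
produce("msg2")
assert follower.log == leader.log == ["msg1", "msg2"]  # replicas in sync
print(f"failover-safe copy on {follower.name}: {follower.log}")
```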
24. Engine for Continuous Computation
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY card_number
HAVING count(*) > 3;
26. Clickstream Analysis and Enrichment
[Diagram: an app and a mobile app publish via the REST Proxy; a log file and an Oracle database feed in through connectors; KSQL enriches the streams; a connector writes the results to Google BigQuery]
27. Clickstream Analysis and Enrichment
[Same diagram, highlighting the BigQuery connector]
How do we get data from Kafka to BigQuery?
28. Let me count the ways...
1. Batch method: Secor to GCS, then GCS to BigQuery.
   Pros: multiple output formats, parallelized load, flexible partitioning.
   Cons: no support for Avro input; have to go via GCS.
2. Kafka Streaming: use Kafka Streams or KafkaConsumer to read from Kafka and the BigQuery Streaming API to write to BigQuery.
   Pros: complete control.
   Cons: with great power comes great responsibility.
3. Kafka Connect: use Kafka Connect to read from Kafka and write to BigQuery.
   Pros: handles Avro, auto-generates schemas, handles schema updates, error handling.
   Cons: need to decide on topology; no support for reprocessing or batch+streams.
4. Apache Beam: use Google Dataflow to read from Kafka and write to BigQuery.
   Pros: complete control, ability to run in batch and streaming modes.
   Cons: requires coding, fewer “batteries included”.
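For option 3, a sink connector is configured with a small JSON document submitted to the Kafka Connect REST API. The sketch below assumes the community BigQuery sink connector originally developed at WePay; the connector name, topic, project, dataset mapping and keyfile path are illustrative placeholders, and field names may vary by connector version, so check the connector's documentation.

```json
{
  "name": "bigquery-sink",
  "config": {
    "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
    "topics": "clickstream_enriched",
    "project": "my-gcp-project",
    "datasets": ".*=clickstream_dataset",
    "autoCreateTables": "true",
    "keyfile": "/path/to/service-account.json"
  }
}
```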
29. Real-time ETL for Data Warehouse
[Diagram: a legacy PHP web application, an operational MySQL database, a new Python web application (via the REST Proxy) and a legacy mainframe feed Kafka through Kafka Connect; KSQL and Google Dataflow process the streams, and Kafka Connect loads the results into Google BigQuery]
30. and even...
[Diagram: databases, apps and devices (via an MQTT proxy) stream into Kafka; KSQL prepares training data and model parameters for model building, and the production ML model consumes the model, features, params and data to produce output]
31. You want to use the ecosystem, not install, configure, tune, manage, troubleshoot, get paged...
34. Try Confluent Cloud for yourself
Request a quote for Confluent Cloud Enterprise: cnfl.io/cce
Get started in minutes with Confluent Cloud Professional: cnfl.io/ccp
35. That’s a wrap.
Gwen Shapira, Product Manager, Confluent
Jay Smith, Cloud Customer Engineer, Google Cloud