Speaker: Pere Urbón-Bayes, Technical Account Manager, Confluent
The need to integrate a swarm of systems has always been present in the history of IT; however, with the advent of microservices, big data and IoT, this has simply exploded.
Through the exploration of a few use cases, this presentation will introduce stream processing, a powerful and scalable way to transform and connect applications around your business.
In this talk we will explain how Apache Kafka® and the Confluent Platform can be used to connect the diverse collection of applications that modern businesses face: components such as KSQL, with which non-developers can process streaming events at scale, and Kafka Streams, for building scalable applications that process event data.
Overview
1. Set the stage.
2. Introducing the key concepts (Kafka broker, Connect and Kafka Streams)
3. Using events for notifications and state transfer
4. Let’s build a small application
5. Conclusion
What exactly is Stream Processing?

CREATE STREAM possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING count(*) > 3;

(authorization_attempts → possible_fraud)
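To see what the engine is doing, the query above can be mimicked in plain Java: bucket each authorization attempt into a 5-minute tumbling window per card number, count, and keep the counts above the threshold. A minimal sketch (the class, record and method names here are ours, not Kafka's or KSQL's):

```java
import java.util.*;

public class FraudWindows {
    static final long WINDOW_MS = 5 * 60 * 1000L;  // SIZE 5 MINUTE
    static final int THRESHOLD = 3;                // HAVING count(*) > 3

    /** An authorization attempt: card number plus event timestamp in millis. */
    record Attempt(String card, long ts) {}

    /** Counts attempts per (card, tumbling window); keeps only counts above the threshold. */
    static Map<String, Long> possibleFraud(List<Attempt> attempts) {
        Map<String, Long> counts = new HashMap<>();
        for (Attempt a : attempts) {
            // Tumbling windows: fixed-size, non-overlapping buckets aligned to the epoch.
            long windowStart = (a.ts() / WINDOW_MS) * WINDOW_MS;
            counts.merge(a.card() + "@" + windowStart, 1L, Long::sum);
        }
        counts.values().removeIf(c -> c <= THRESHOLD);
        return counts;
    }
}
```

KSQL keeps this computation running continuously over the stream; the sketch only shows one batch of it.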
Looking more closely: What is a Streaming Platform?
(Diagram: The Log, Connectors, Producer, Consumer, Streaming Engine)
Looking more closely: Kafka’s Distributed Log
(Diagram: The Log, Connectors, Producer, Consumer, Streaming Engine)
Kafka’s Distributed Log: A durable messaging system
Kafka is similar to a traditional messaging system (ActiveMQ, RabbitMQ, …) but with:
• Better scalability
• Fault tolerance
• High availability
• Better storage
The log is a simple idea
Messages are always appended at the end (old → new).
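The idea fits in a few lines. A toy version, assuming nothing about Kafka's internals (names are ours): an append-only list where writes only ever go at the end, and each record keeps the offset it was written at forever.

```java
import java.util.*;

/** A toy append-only log: records are only ever added at the end (old → new). */
public class ToyLog {
    private final List<String> records = new ArrayList<>();

    /** Appends a record and returns its offset: its fixed position in the log. */
    public long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    /** Reads never remove anything; any offset can be re-read at will. */
    public String read(long offset) {
        return records.get((int) offset);
    }

    public long endOffset() {
        return records.size();
    }
}
```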
Consumers have a position all of their own
(Diagram: Sally, George and Fred each hold their own position in the log and scan forward independently, from old to new)
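Because the broker never deletes a record when it is read, each consumer can simply track its own offset. A toy sketch of that idea (not Kafka's consumer API):

```java
import java.util.*;

/** Each consumer tracks its own offset into a shared log; readers never disturb each other. */
public class OffsetDemo {
    static class Consumer {
        long position = 0;                        // "Sally is here"

        /** Reads the next record (or null at the end) and advances only this consumer. */
        String poll(List<String> log) {
            return position < log.size() ? log.get((int) position++) : null;
        }
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("m0", "m1", "m2");
        Consumer sally = new Consumer(), fred = new Consumer();
        sally.poll(log); sally.poll(log);         // Sally has read two records
        fred.poll(log);                           // Fred has read one; Sally is unaffected
        System.out.println(sally.position + " " + fred.position); // prints "2 1"
    }
}
```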
Shard data to get scalability
Messages are sent to different partitions; partitions live on different machines.
(Diagram: Producer (1), Producer (2) and Producer (3) write to partitions spread across a cluster of machines)
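Which partition a message lands on is typically derived from its key, so that the same key always goes to the same partition (and therefore stays ordered). Kafka's default partitioner hashes the key with murmur2; a plain `hashCode` is enough to show the idea (a sketch, not Kafka's actual code):

```java
/** Sketch of key-based sharding: the same key always maps to the same partition. */
public class Partitioner {
    static int partitionFor(String key, int numPartitions) {
        // Kafka's default partitioner uses a murmur2 hash of the serialized key;
        // hashCode() illustrates the same deterministic key -> partition mapping.
        return Math.abs(key.hashCode() % numPartitions);
    }
}
```

Deterministic routing is what lets consumers scale out: each machine owns some partitions and sees every message for the keys that hash to them.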
Replicate to get fault tolerance
(Diagram: a message is written to the leader on Machine A and replicated to a follower on Machine B)
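The shape of the mechanism, stripped of all the real-world detail (acknowledgements, ISR, leader election), is just "write to the leader, copy to the followers". A toy sketch with invented names:

```java
import java.util.*;

/** Toy leader/follower replication: the leader appends, then copies the record to each replica. */
public class ReplicatedLog {
    final List<String> leader = new ArrayList<>();           // the partition leader (Machine A)
    final List<List<String>> followers = new ArrayList<>();  // replicas (Machine B, ...)

    ReplicatedLog(int replicas) {
        for (int i = 0; i < replicas; i++) followers.add(new ArrayList<>());
    }

    void append(String msg) {
        leader.add(msg);
        for (List<String> f : followers) f.add(msg);         // replicate to every follower
    }
}
```

If the leader's machine dies, a follower already holds the full log and can take over.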
Linearly Scalable Architecture
Single topic:
- Many producer machines
- Many consumer machines
- Many broker machines
No bottleneck!
(Diagram: Producers → Kafka → Consumers)
Clusters can be connected to provide worldwide, localized views
(Diagram: clusters in NY, London and Tokyo connected by Replicator)
Ingest / egest into practically any data source
(Diagram: external systems ↔ Kafka Connect ↔ Kafka ↔ Kafka Connect ↔ external systems)
List of Kafka Connect sources and sinks (and more…)
Amazon S3, Elasticsearch, HDFS, JDBC, Couchbase, Cassandra, Oracle, SAP, Vertica, Blockchain, JMX, Kinesis, MongoDB, MQTT, NATS, Postgres, RabbitMQ, Redis, Twitter, DynamoDB, FTP, GitHub, BigQuery, Google Pub/Sub, RethinkDB, Salesforce, Solr, Splunk
The Kafka Streams API / KSQL
(Diagram: The Log, Connectors, Producer, Consumer, Streaming Engine)
Engine for Continuous Computation

SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY card_number
HAVING count(*) > 3;
But it’s just an API

public static void main(String[] args) {
    StreamsBuilder builder = new StreamsBuilder();
    builder.stream("caterpillars")
           .map((k, v) -> coolTransformation(k, v))
           .to("butterflies");
    new KafkaStreams(builder.build(), props()).start();
}
Windows / Retention – Handle Late Events
The asynchronous dilemma: who was first, the order or the payment?
(Diagram: Payments and Orders flow into Kafka, are buffered for 5 minutes, joined by key, and sent to the Emailer)
Windows / Retention – Handle Late Events

KStream orders = builder.stream("Orders");
KStream payments = builder.stream("Payments");
orders.join(payments, KeyValue::new, JoinWindows.of(1 * MIN))
      .peek((key, pair) -> emailer.sendMail(pair));
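Underneath, a windowed stream-stream join is conceptually simple: buffer records from both sides and pair up those that share a key and arrive within the window of each other. A toy sketch without Kafka (all names are ours), where each side is a map of key → arrival timestamp:

```java
import java.util.*;

/** Toy stream-stream join: emits keys whose order and payment
 *  arrived within `windowMs` of each other. */
public class WindowedJoin {
    static List<String> join(Map<String, Long> orders,
                             Map<String, Long> payments,
                             long windowMs) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, Long> order : orders.entrySet()) {
            Long paymentTs = payments.get(order.getKey());
            // Join only if both sides arrived within the window, in either arrival order.
            if (paymentTs != null && Math.abs(paymentTs - order.getValue()) <= windowMs) {
                joined.add(order.getKey());
            }
        }
        return joined;
    }
}
```

The window is what makes late events tractable: a payment arriving within 5 minutes of its order still joins, while anything later falls out of the buffer.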
A KTable is just a stream with infinite retention
(Diagram: Orders and Payments streams joined with a Customers table inside Kafka, feeding the Emailer)
KStream orders = builder.stream("Orders");
KStream payments = builder.stream("Payments");
KTable customers = builder.table("Customers");
orders.join(payments, EmailTuple::new, JoinWindows.of(1 * MIN))
      .join(customers, (tuple, cust) -> tuple.setCust(cust))
      .peek((key, tuple) -> emailer.sendMail(tuple));

Materialize a table in two lines of code!
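Materializing a KTable amounts to replaying a changelog stream into a map and keeping only the latest value per key. A sketch of that replay (names are ours), where a null value is a tombstone that deletes the key:

```java
import java.util.*;

/** A KTable-like view: replay a changelog and keep only the latest value per key. */
public class TableView {
    /** Each changelog entry is a {key, value} pair; value == null is a tombstone. */
    static Map<String, String> materialize(List<String[]> changelog) {
        Map<String, String> table = new LinkedHashMap<>();
        for (String[] kv : changelog) {
            if (kv[1] == null) table.remove(kv[0]);  // tombstone deletes the key
            else table.put(kv[0], kv[1]);            // later values overwrite earlier ones
        }
        return table;
    }
}
```

Because the log has infinite retention, this view can be rebuilt from scratch on any machine at any time; that is what makes the "two lines of code" on the slide possible.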
Buying an iPad (with REST)
• The Orders Service calls the Shipping Service to tell it to ship the item.
• The Shipping Service looks up the address to ship to (from the Customer Service).
(Diagram: Webserver → Submit Order → Orders Service → shipOrder() → Shipping Service → getCustomer() → Customer Service)
Buying an iPad with Events for Notification
(Diagram: the Webserver submits the order; the Orders Service publishes an Order Created event to the message broker (Kafka); the Shipping Service consumes the notification and calls getCustomer() on the Customer Service via REST)
- The Orders Service no longer knows about the Shipping Service (or any other service). Events are fire and forget.
Event streams are the key to scalable service ecosystems
The sender has no knowledge of who consumes the events it sends. This decouples the system.
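That decoupling is visible even in a toy event bus (a sketch with invented names, far simpler than Kafka): the publisher names a topic, never a receiver, and subscribers register independently without the publisher's knowledge.

```java
import java.util.*;
import java.util.function.Consumer;

/** Toy event bus: publishers name a topic, never a receiver. */
public class EventBus {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    /** Services subscribe on their own; the publisher never sees this list. */
    public void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    /** Fire and forget: an Orders Service would just publish("OrderCreated", ...). */
    public void publish(String topic, String event) {
        for (Consumer<String> handler : subscribers.getOrDefault(topic, List.of())) {
            handler.accept(event);
        }
    }
}
```

Adding a new downstream service is just one more subscribe call; the Orders Service's code does not change.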
Key benefits of a Streaming Platform
Streams help you improve microservices deployments in a number of ways:
• Decouple ecosystems so they are more pluggable and easier to change.
• Evolve away from legacy systems.
• Improve response time by building asynchronicity-first solutions.
• Bring progressive responsiveness into the core of your platform.
• Build an agnostic central nervous system for your data systems.
As well:
• Make data a first-class citizen, allowing true independence between teams (producers and consumers).
• Safely manage the evolution of data in the ecosystem as time passes.
• Embrace event sourcing with an immutable log that can be rewound and replayed.
• Build different (materialized) views based on each service’s requirements.