Kafka Streams is a client library for processing and transforming streams of data stored in Apache Kafka clusters. It allows embedding stream processing logic directly into applications using a simple Java DSL. Kafka Streams applications can perform stateless transformations such as filtering and mapping, and stateful ones such as aggregations and joins, on Kafka data. Processing state is backed by Kafka's own storage and replication, and Kafka transactions provide exactly-once semantics, including in cloud deployments.
Kafka Streams: From the grounds up to the Cloud
1. Kafka Streams: From the grounds up to the Cloud
Marius Bogoevici, Chief Architect, Red Hat
Spring One Platform, Dec 4, 2017
@mariusbogoevici
2. Marius Bogoevici
● Chief Architect, Data Streaming at Red Hat
● Spring ecosystem contributor since 2008
○ Spring Integration
● Spring team member between 2014 and 2017
○ Spring XD, Spring Integration Kafka
○ Spring Cloud Stream project lead
● Co-author of “Spring Integration in Action”, Manning, 2012
3. Kafka: from messaging system to streaming platform
(based on https://www.confluent.io/blog/apache-kafka-goes-1-0/)
● Distributed log
● Replication, fault tolerance
● Connect and Streams
● Transactions, exactly once
4. How about applications that are both producers and consumers and perform complex computations?
[Diagram: Kafka as a distributed messaging system]
5. Kafka Streams
● Client library for stream processing
○ Embed stream processing features into regular Java applications (microservice model)
○ Create sophisticated topologies of independent applications
● Functional transformations via DSL:
○ Mapping, filtering, flatMap
○ Aggregation, joins (multiple topics)
○ Windowing
● Kafka-to-Kafka semantics
● One-record-at-a-time processing (no microbatching)
● Stateful processing support
● Transactions/exactly once
[Diagram: Kafka Cluster; Application with Kafka Streams]
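The one-record-at-a-time, stateful style listed above can be modelled in plain Java. This is only a sketch under assumptions: it does not use the Kafka Streams library, the class `WordCountSketch` and its `process` method are hypothetical names, and a `HashMap` stands in for the state store. Each input record is split into words (flatMap), empty tokens are dropped (filter), and a running count per word is updated (aggregation), with every update emitted immediately rather than in batches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class WordCountSketch {
    // Hypothetical stand-in for a state store: word -> running count.
    private final Map<String, Long> counts = new HashMap<>();

    // Processes a single input record and returns the updates it produces,
    // one (word, count) entry per word, in the order the words appear.
    public List<Map.Entry<String, Long>> process(String line) {
        List<Map.Entry<String, Long>> updates = new ArrayList<>();
        for (String word : line.toLowerCase(Locale.ROOT).split("\\W+")) { // flatMap
            if (word.isEmpty()) {
                continue; // filter out empty tokens
            }
            long updated = counts.merge(word, 1L, Long::sum); // aggregate
            updates.add(Map.entry(word, updated));
        }
        return updates;
    }

    public static void main(String[] args) {
        WordCountSketch sketch = new WordCountSketch();
        // Each record is handled as soon as it arrives; there is no batching.
        for (String line : List.of("kafka streams", "Kafka in the cloud")) {
            System.out.println(line + " -> " + sketch.process(line));
        }
    }
}
```

In a real Kafka Streams application the input and output would be Kafka topics and the map would be a fault-tolerant state store; the sketch only shows the shape of the computation.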
6. Kafka Streams - important concepts
● KStream
○ Record stream abstraction
○ Read from/written to external topic or produced from other KStream via operators such as map/filter
● KTable/GlobalKTable
○ Changelog stream abstraction (key is meaningful)
○ Read from external topic as a sequence of updates
○ Produced from other tables or stream joins, aggregations, etc.
● State Store
○ Key-value store for intermediate aggregation data, KTable materialized views, arbitrary key-value data produced during
○ Replicated externally
● Time windowing
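The difference between a record stream and a changelog stream is easiest to see by reading the same records both ways. The following plain-Java sketch is an illustration under assumptions, not Kafka Streams code: `StreamVsTableSketch`, `sumAsStream` and `latestAsTable` are hypothetical names. Read as a record stream, every record is an independent fact and values for a key add up; read as a changelog, a record replaces the previous value for its key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StreamVsTableSketch {
    // Record-stream reading (KStream-like): records are independent, so values add up.
    static Map<String, Integer> sumAsStream(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map.Entry<String, Integer> record : records) {
            totals.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        return totals;
    }

    // Changelog reading (KTable-like): a record overwrites the previous value for its key.
    static Map<String, Integer> latestAsTable(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> table = new HashMap<>();
        for (Map.Entry<String, Integer> record : records) {
            table.put(record.getKey(), record.getValue());
        }
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> records = List.of(
                Map.entry("alice", 1),
                Map.entry("bob", 5),
                Map.entry("alice", 3));
        System.out.println("as stream: " + sumAsStream(records));
        System.out.println("as table:  " + latestAsTable(records));
    }
}
```

For the key `alice`, the stream reading yields 4 (1 + 3) while the table reading yields 3 (the latest update), which is why the key is meaningful for a KTable.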
8. Kafka Streams stateful processing (default stores)
[Diagram: Kafka Cluster with changelog; Application with Kafka Streams, in-memory state store, local disk]
● Pluggable state store model
● Key-value data store
● Default strategy:
○ In-memory (fast access)
○ Local disk (for fast recovery)
○ Replicated to Kafka (for resilience)
● Tightly integrated with Kafka: state updates are correlated with offset commits
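The "replicated to Kafka" part of the default strategy can be pictured as a write-through store: every local update is also appended to a changelog, and a fresh instance rebuilds its state by replaying that changelog. The sketch below is a plain-Java model under assumptions. It does not use Kafka; `ChangelogStoreSketch` is a hypothetical name, an in-memory `List` stands in for the changelog topic, and a `HashMap` stands in for the local store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ChangelogStoreSketch {
    // Hypothetical stand-in for the changelog topic: an append-only list of updates.
    private final List<Map.Entry<String, Long>> changelog;
    // Local key-value state, rebuilt from the changelog on construction.
    private final Map<String, Long> local = new HashMap<>();

    public ChangelogStoreSketch(List<Map.Entry<String, Long>> changelog) {
        this.changelog = changelog;
        // Restore: replay every recorded update in order; later updates win.
        for (Map.Entry<String, Long> update : changelog) {
            local.put(update.getKey(), update.getValue());
        }
    }

    // Write-through: update local state and append the change to the changelog.
    public void put(String key, long value) {
        local.put(key, value);
        changelog.add(Map.entry(key, value));
    }

    public Long get(String key) {
        return local.get(key);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> changelog = new ArrayList<>();
        ChangelogStoreSketch first = new ChangelogStoreSketch(changelog);
        first.put("kafka", 1L);
        first.put("kafka", 2L);

        // A replacement instance sees the same state after replaying the changelog.
        ChangelogStoreSketch restored = new ChangelogStoreSketch(changelog);
        System.out.println("restored count for kafka: " + restored.get("kafka"));
    }
}
```

The model leaves out the correlation between state updates and offset commits mentioned on the slide; it only shows why losing the local copy does not lose the state.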
9. Spring Cloud Stream
● Event-driven microservice framework
● Developer focus on writing business code
● Middleware-agnostic programming model
● Binders:
○ Kafka
○ RabbitMQ
○ AWS Kinesis
○ Google Pub/Sub
○ Apache Artemis (community)
● Easy to deploy with Spring Cloud Data Flow
10. Spring Cloud Stream KStream Processor (since 1.3)
[Diagram: application with input "words" and output "counts", layered as KStream API, Spring Cloud Stream, Spring Boot]
● KStream API: programming model (developer focus)
● Spring Cloud Stream: application model (configuration options, StreamsConfig based on Spring Boot properties, KStreamBuilder, KStream binder)
● Spring Boot: externalized configuration, uberjar construction, health monitoring endpoints
11. Kafka Streams in the Cloud
[Diagram: two Applications with Kafka Streams; Docker; uberjar; Spring Cloud Data Flow]
12. Kafka Streams stateful and stateless deployments
[Diagram: Kafka Cluster with changelog; Application with Kafka Streams, in-memory state store, local disk]
● Changes propagated to changelog topic
● Stored locally for recovery/restart
● Fully stateless deployments must replay the changelog topic on restart/failover
● State store recovery can be optimized by using stateful deployments that retain local state across restarts
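The recovery trade-off between stateless and stateful deployments can be sketched as a difference in how much of the changelog has to be replayed. This is a plain-Java model under assumptions: `RestoreCostSketch` and `restore` are hypothetical names, a `List` stands in for the changelog topic, and `fromOffset` stands in for the position up to which the surviving local state is already current. A stateless instance starts from an empty store at offset 0; a stateful one starts from its retained store and replays only the tail.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RestoreCostSketch {
    // Applies changelog records from fromOffset onward to the given state
    // and returns how many records had to be replayed.
    static int restore(Map<String, Long> state,
                       List<Map.Entry<String, Long>> changelog,
                       int fromOffset) {
        int replayed = 0;
        for (int i = fromOffset; i < changelog.size(); i++) {
            Map.Entry<String, Long> record = changelog.get(i);
            state.put(record.getKey(), record.getValue());
            replayed++;
        }
        return replayed;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> changelog = List.of(
                Map.entry("kafka", 1L),
                Map.entry("streams", 1L),
                Map.entry("kafka", 2L),
                Map.entry("cloud", 1L));

        // Stateless deployment: nothing survives, so the whole changelog is replayed.
        Map<String, Long> stateless = new HashMap<>();
        int fullReplay = restore(stateless, changelog, 0);

        // Stateful deployment: local state covering the first three records survived.
        Map<String, Long> stateful = new HashMap<>(Map.of("kafka", 2L, "streams", 1L));
        int tailReplay = restore(stateful, changelog, 3);

        System.out.println("stateless replayed " + fullReplay + ", stateful replayed " + tailReplay);
        System.out.println("same final state: " + stateless.equals(stateful));
    }
}
```

Both paths end in the same state; the stateful one simply gets there with less replay, which matters more as the changelog grows.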
13. Kafka Streams with Kubernetes StatefulSets
[Diagram: three Pods, each running an Application with Kafka Streams: word-count-0, word-count-1, word-count-2, with volumes volume-word-count-0, volume-word-count-1, volume-word-count-2]
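The pod and volume names in the diagram follow the stable naming that StatefulSets give their pods and volume claims: a pod is named after the StatefulSet plus an ordinal, and its claim is named after the claim template plus the pod name. The sketch below only reproduces that naming rule in plain Java. It assumes, from the labels on the slide, a StatefulSet called `word-count` and a claim template called `volume`; the class and method names are hypothetical.

```java
public class StatefulSetNamingSketch {
    // Pod name: <statefulset name>-<ordinal>.
    static String podName(String statefulSet, int ordinal) {
        return statefulSet + "-" + ordinal;
    }

    // Claim name: <claim template name>-<pod name>.
    static String claimName(String claimTemplate, String statefulSet, int ordinal) {
        return claimTemplate + "-" + podName(statefulSet, ordinal);
    }

    public static void main(String[] args) {
        // Assumed names, inferred from the slide labels.
        for (int ordinal = 0; ordinal < 3; ordinal++) {
            System.out.println(podName("word-count", ordinal)
                    + " uses " + claimName("volume", "word-count", ordinal));
        }
    }
}
```

Because the names depend only on the ordinal, a restarted pod with the same ordinal reattaches to the same volume, so its local state store is still there and only the tail of the changelog needs replaying.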