Presenter: Tim Berglund, Senior Director of Developer Experience, Confluent
It has become a truism in the past decade that building systems at scale, using non-relational databases, requires giving up on the transactional guarantees afforded by the relational databases of yore. ACID transactional semantics are fine, but we all know we can’t have them all in a distributed system. Or can we?
In this talk, I will argue that by designing our systems around a distributed log like Apache Kafka®, we can in fact achieve ACID semantics at scale. We can ensure that distributed write operations can be applied atomically, consistently, in isolation between services, and of course with durability. What seems to be a counterintuitive conclusion ends up being straightforwardly achievable using existing technologies, as an elusive set of properties becomes relatively easy to achieve with the right architectural paradigm underlying the application.
66. • Declarative data integration framework
• Extensive community library of connectors
• Horizontally scalable and fault-tolerant
• Pretty easy to extend
[Diagram: data source, Kafka Connect, four brokers, Kafka Connect, data sink]
74. • Java API
• Filter, join, aggregate, etc.
• Locates stream processing with your application
• Scales like a Consumer Group (but better!)
KTable<Long, Movie> movies =
    builder.table("movies",
        Materialized.<Long, Movie, KeyValueStore<Bytes, byte[]>>as("movies-store")
            .withValueSerde(movieSerde)
            .withKeySerde(Serdes.Long()));
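To make the table abstraction on this slide more concrete, here is a minimal sketch in plain Java, with no Kafka dependency, of the idea behind a table: replaying a changelog so that the latest value per key wins. The names `TableSketch`, `Change`, and `materialize` are hypothetical and exist only for this illustration, and treating a null value as a delete is an assumption of the sketch. It is not how Kafka Streams implements a KTable internally.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TableSketch {
    /** One changelog record (hypothetical type); a null value is treated as a delete. */
    static final class Change {
        final long key;
        final String value;

        Change(long key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Replays the changelog in order: later records for a key overwrite earlier ones. */
    static Map<Long, String> materialize(List<Change> changelog) {
        Map<Long, String> table = new LinkedHashMap<>();
        for (Change c : changelog) {
            if (c.value == null) {
                table.remove(c.key);       // delete marker removes the row
            } else {
                table.put(c.key, c.value); // insert or update the row
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<Change> changelog = List.of(
            new Change(1L, "Die Hard"),
            new Change(2L, "Tree of Life"),
            new Change(1L, "Die Hard (1988)")); // update for key 1
        System.out.println(materialize(changelog));
    }
}
```

The point of the sketch is only that a table is a view derived from a stream of changes, which is why the same topic can be read either way.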
76. CREATE TABLE movie_ratings AS
SELECT title,
SUM(rating)/COUNT(rating) AS avg_rating,
COUNT(rating) AS num_ratings
FROM ratings
LEFT OUTER JOIN movies
ON ratings.movie_id = movies.movie_id
GROUP BY title;
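As a rough illustration of what this query computes, the following plain-Java sketch performs the same join and aggregation over in-memory collections. The class and method names (`MovieRatingsSketch`, `Rating`, `Summary`, `movieRatings`) are hypothetical, ratings are assumed to be doubles, and ratings with no matching movie are simply skipped because they have no title to group under. That last choice is a simplification made for the sketch, not a statement about how KSQL treats unmatched rows.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MovieRatingsSketch {
    /** One row of the ratings stream (hypothetical shape). */
    static final class Rating {
        final long movieId;
        final double rating;

        Rating(long movieId, double rating) {
            this.movieId = movieId;
            this.rating = rating;
        }
    }

    /** Running aggregate for one title: SUM(rating) and COUNT(rating). */
    static final class Summary {
        double sum;
        long numRatings;

        double avgRating() {
            return sum / numRatings;
        }
    }

    /** Joins each rating to its movie title, then aggregates per title. */
    static Map<String, Summary> movieRatings(Map<Long, String> titlesById,
                                             List<Rating> ratings) {
        Map<String, Summary> byTitle = new TreeMap<>();
        for (Rating r : ratings) {
            String title = titlesById.get(r.movieId); // the join on movie_id
            if (title == null) {
                continue; // assumption of this sketch: unmatched ratings are skipped
            }
            Summary s = byTitle.computeIfAbsent(title, t -> new Summary());
            s.sum += r.rating;   // SUM(rating)
            s.numRatings++;      // COUNT(rating)
        }
        return byTitle;
    }

    public static void main(String[] args) {
        Map<Long, String> movies = Map.of(1L, "Die Hard", 2L, "Tree of Life");
        List<Rating> ratings = List.of(
            new Rating(1L, 8.0), new Rating(1L, 9.0), new Rating(2L, 7.0));
        movieRatings(movies, ratings).forEach((title, s) ->
            System.out.println(title + " avg=" + s.avgRating() + " n=" + s.numRatings));
    }
}
```

The difference from the query is that this runs once over a fixed list, whereas the KSQL table is updated continuously as new ratings arrive.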
78. • Declarative stream processing language
• Provides stream and table abstractions
• Filter, join, aggregate
• Runs on a horizontally scalable KSQL cluster