1. Apache Kafka® and the Data Mesh
James Gollan
Senior Solutions Engineer, Confluent
Gnanaguru (Guru) Sattanathan
Senior Solutions Engineer, Confluent
2. Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
Agenda
Opening & Introduction
Data Mesh - A brief recap
Apache Kafka & Data Mesh
How to get started?
Demo
5. Data Mesh: A First Look
Domains (Retail, Core Banking, Institutional, ...), each exposing its data as a Data Product.
6. The Principles of a Data Mesh
1. Domain-driven Decentralization: local autonomy per domain (organizational concerns)
2. Data as a First-class Product: product thinking, a "microservice for data"
3. Self-serve Data Platform: infrastructure as a platform, across domains
4. Federated Governance: interoperability across domains, network effects (organizational concerns)
7. Principle 1: Domain-driven Decentralization
Anti-pattern (centralized data ownership): responsibility for data becomes the domain of the DWH team.
Pattern (decentralized data ownership): ownership of a data asset is given to the "local" team that is most familiar with it.
Objective: Ensure data is owned by those who truly understand it.
8. Principle 2: Data as a First-Class Product
• Objective: Make shared data discoverable, addressable, trustworthy, and secure, so that other teams can make good use of it.
• Data is treated as a true product, not a by-product. This product thinking is important to prevent data chauvinism.
9. Principle 3: Self-serve Data Platform
Central infrastructure that provides real-time and historical data on demand.
Objective: Make domains autonomous in their execution through rapid data provisioning.
10. Principle 4: Federated Governance
• Objective: Independent data products can interoperate and create network effects.
• Establish global standards, such as governance rules, that apply to all data products in the mesh.
• Ideally, these global standards and rules are applied automatically by the platform.
For the domains on the self-serve data platform, decide what is determined locally by a domain and what is determined globally (implemented and enforced by the platform).
Must balance decentralization vs. centralization. No silver bullet!
12.
Paradigm for Data-at-Rest: Relational Databases
Databases: slow, daily batch processing; simple, static real-time queries.
13.
Spaghetti: Data architectures often lack rigour
14.
Kafka provides a solution: the implementation.
Centralize an immutable stream of facts. Decentralize the freedom to act, adapt, and change.
15. Messaging reimagined as a first-class data system
01 Publish & Subscribe to streams of events
02 Store your event streams
03 Process & Analyze your event streams
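These three capabilities can be sketched in ksqlDB (a hedged illustration: the `pageviews` stream and its columns are assumptions, not from the deck, and the statements require a running ksqlDB server):

```sql
-- 01 Publish: create a stream backed by a Kafka topic and write an event to it.
CREATE STREAM pageviews (user_id VARCHAR, url VARCHAR)
  WITH (KAFKA_TOPIC = 'pageviews', VALUE_FORMAT = 'JSON', PARTITIONS = 1);

INSERT INTO pageviews (user_id, url) VALUES ('alice', '/home');

-- 02 Store: the events are durably retained in the underlying Kafka topic.

-- 03 Process & Subscribe: a push query that continuously emits matching events.
SELECT user_id, url
FROM pageviews
WHERE url LIKE '/checkout%'
EMIT CHANGES;
```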
16. Why is Event Streaming a good fit for meshing?
(Diagram: data products connected by an append-only event stream with offsets 0..7.)
Streams are real-time, low latency ⇒ Propagate data immediately.
Streams are highly scalable ⇒ Handle today’s massive data volumes.
Streams are stored, replayable ⇒ Capture real-time & historical data.
Streams are immutable ⇒ Auditable source of record.
Streams are addressable, discoverable, … ⇒ Meet key criteria for mesh data.
Streams are popular for Microservices ⇒ Adapting to Data Mesh is often easy.
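The "stored, replayable" property above can be shown concretely: in ksqlDB, resetting the consumer offset to earliest makes a query replay the stream's full retained history before continuing in real time (the `orders` stream and its columns are illustrative assumptions):

```sql
-- Read from the beginning of the stream's retained history,
-- not only from newly arriving events.
SET 'auto.offset.reset' = 'earliest';

-- Push query: first replays historical events, then continues live.
SELECT order_id, amount
FROM orders
EMIT CHANGES;
```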
18. Instantly Connect Popular Data Sources & Sinks
90+ Confluent supported connectors (self-managed)
20+ partner supported connectors (self-managed)
30+ fully managed connectors in Cloud (growing list)
Examples: Amazon S3, Blob storage, Kinesis, Redshift, Event Hubs, Data Lake Gen 2, Cloud Dataproc, Data Diode
19. Event Streaming inside a data product
Use ksqlDB, Kafka Streams apps, etc. for processing data in motion.
1. Input data ports: stream data from other DPs or internal systems into ksqlDB.
2. ksqlDB to filter, process, join, aggregate, analyze.
3. Output data ports: stream data to internal systems or the outside; pull queries can drive a req/res API.
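The three steps above might look as follows in ksqlDB (a sketch under assumptions: the `payments` stream, `account_totals` table, and column names are illustrative, not from the deck):

```sql
-- 1) Input data port: a stream over events arriving from another data product.
CREATE STREAM payments (account_id VARCHAR, amount DOUBLE)
  WITH (KAFKA_TOPIC = 'payments', VALUE_FORMAT = 'JSON');

-- 2) Processing: filter and aggregate into a continuously materialized table.
CREATE TABLE account_totals AS
  SELECT account_id, SUM(amount) AS total
  FROM payments
  WHERE amount > 0
  GROUP BY account_id;

-- 3) Output data port: the table's changelog topic feeds downstream consumers,
--    and a pull query can serve a req/res API with the current state.
SELECT total FROM account_totals WHERE account_id = 'acct-42';
```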
20. Event Streaming inside a data product
Use Kafka connectors and CDC to "streamify" classic databases.
1. Input data ports: a sink connector streams data from other Data Products into your local DB.
2. The local database (e.g. MySQL): DB client apps work as usual.
3. Output data ports: a source connector streams data to the outside with CDC and e.g. the Outbox Pattern, ksqlDB, etc.
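As a hedged sketch, the CDC side of step 3 can be declared from ksqlDB itself (the connector name, hostnames, and credentials below are placeholders, and exact Debezium property names vary by connector version):

```sql
-- Stream change events from a MySQL table into Kafka via Debezium CDC.
CREATE SOURCE CONNECTOR customers_cdc WITH (
  'connector.class'      = 'io.debezium.connector.mysql.MySqlConnector',
  'database.hostname'    = 'mysql.internal',
  'database.port'        = '3306',
  'database.user'        = 'cdc_user',
  'database.password'    = '********',
  'database.server.name' = 'shop',
  'table.include.list'   = 'shop.customers'
);
```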
21.
Ease of replication across the mesh: Cluster Linking & other replication capabilities.
A Data Product embeds a stream processor (ksqlDB); both queries and events serve as interfaces to the mesh.
22.
developer.confluent.io