17. 17
Kafka Connect: Converters
Convert between the source and sink record objects
and the binary format used to persist them in Kafka.
JSON, Avro, and others
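To make the converter's job concrete, here is a toy Python model of a JSON converter. It is a sketch under stated assumptions, not Kafka's API. Real converters are Java classes implementing Kafka's `Converter` interface (`fromConnectData` / `toConnectData`), which the hypothetical `from_connect_data` and `to_connect_data` below only imitate. The schema-plus-payload envelope loosely follows what Kafka's `JsonConverter` writes when `schemas.enable=true`, and the schema dict is a simplified stand-in.

```python
import json
from typing import Any, Optional, Tuple


class ToyJsonConverter:
    """Hypothetical converter: record objects <-> bytes persisted in Kafka."""

    def __init__(self, schemas_enable: bool = True) -> None:
        # Imitates the idea of the real 'schemas.enable' setting.
        self.schemas_enable = schemas_enable

    def from_connect_data(self, topic: str, schema: Optional[dict], value: Any) -> bytes:
        # Source side: record object -> bytes written to the Kafka topic.
        doc = {"schema": schema, "payload": value} if self.schemas_enable else value
        return json.dumps(doc, sort_keys=True).encode("utf-8")

    def to_connect_data(self, topic: str, data: bytes) -> Tuple[Optional[dict], Any]:
        # Sink side: bytes read from Kafka -> (schema, value) record object.
        doc = json.loads(data.decode("utf-8"))
        if self.schemas_enable:
            return doc["schema"], doc["payload"]
        return None, doc


converter = ToyJsonConverter()
schema = {"type": "struct", "fields": [{"field": "id", "type": "int32"}]}
raw = converter.from_connect_data("orders", schema, {"id": 42})
print(raw)
print(converter.to_connect_data("orders", raw))
```

Because the same converter class is used on both sides, whatever a source connector writes can be turned back into an equivalent record object by a sink connector.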
18. 18
Kafka Connect: Single Message Transforms (SMTs)
Modify the structure of keys and values, as well as the topic
and partition, of source and sink record objects.
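A minimal sketch of how a chain of SMTs reshapes one record at a time. The two transforms are toy Python analogues of Kafka's built-in `RegexRouter` (rename the topic) and `ReplaceField` with renames (rename value fields); the `Record` type, function names, and topic/field names are hypothetical and only illustrate the idea.

```python
import re
from dataclasses import dataclass, replace
from typing import Any, Callable, Dict, List, Optional


@dataclass(frozen=True)
class Record:
    topic: str
    partition: Optional[int]
    key: Any
    value: Dict[str, Any]


Transform = Callable[[Record], Record]


def regex_router(regex: str, replacement: str) -> Transform:
    """Toy analogue of RegexRouter: rewrite the topic if the whole name matches."""
    pattern = re.compile(regex)

    def apply(record: Record) -> Record:
        m = pattern.fullmatch(record.topic)
        if m is None:
            return record  # non-matching topics pass through unchanged
        return replace(record, topic=m.expand(replacement))

    return apply


def rename_fields(renames: Dict[str, str]) -> Transform:
    """Toy analogue of ReplaceField renames: rename keys in the record value."""

    def apply(record: Record) -> Record:
        return replace(record, value={renames.get(k, k): v for k, v in record.value.items()})

    return apply


def apply_chain(record: Record, chain: List[Transform]) -> Record:
    # Transforms run one after another, in configured order, on each record.
    for transform in chain:
        record = transform(record)
    return record


chain = [regex_router(r"db1\.(.*)", r"warehouse-\1"), rename_fields({"cust_id": "customer_id"})]
print(apply_chain(Record("db1.orders", 0, "k1", {"cust_id": 7, "total": 9.5}), chain))
```

Python's `\1` group reference here stands in for the `$1` style used by Java regular expressions.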
21. 21
Kafka Connect: Classpath Isolation
A plugin is a directory containing the JAR files for
one or more connectors, transforms, and/or converters,
plus sample configs, etc.
22. 22
Kafka Connect: Classpath Isolation
my-plugins/ (include on the plugin.path)
  kafka-connect-foo-connector/
    JAR files, sample configs, etc.
The plugin.path worker configuration property
lists the directories that contain plugins
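A simplified Python model of how a worker might find plugins. It assumes only what the slide states: `plugin.path` is a comma-separated list of directories, and each subdirectory is one plugin holding its JARs. Real workers also accept uber JARs placed directly in a `plugin.path` directory, which this toy skips; `discover_plugins` and the JAR file names are hypothetical.

```python
import os
import tempfile
from typing import Dict, List


def discover_plugins(plugin_path: str) -> Dict[str, List[str]]:
    """Toy model: map each plugin directory under plugin.path to its JAR files."""
    plugins: Dict[str, List[str]] = {}
    # plugin.path is a comma-separated list of directories.
    for root in (p.strip() for p in plugin_path.split(",") if p.strip()):
        for entry in sorted(os.listdir(root)):
            plugin_dir = os.path.join(root, entry)
            if not os.path.isdir(plugin_dir):
                continue  # this simplified model ignores loose files
            jars: List[str] = []
            for dirpath, _dirs, files in os.walk(plugin_dir):
                jars.extend(os.path.join(dirpath, f) for f in files if f.endswith(".jar"))
            plugins[entry] = sorted(jars)
    return plugins


with tempfile.TemporaryDirectory() as my_plugins:
    for name in ("kafka-connect-foo-connector", "kafka-connect-bar-connector"):
        os.makedirs(os.path.join(my_plugins, name))
        open(os.path.join(my_plugins, name, name + ".jar"), "w").close()
    print(sorted(discover_plugins(my_plugins)))
```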
23. 23
Kafka Connect: Classpath Isolation
my-plugins/ (include on the plugin.path)
  kafka-connect-foo-connector/
  kafka-connect-bar-connector/
Workers isolate the JARs for each connector, transform,
and converter, so plugins that bundle different versions
of the same library do not conflict.
24. 24
Kafka Connect: Offsets
Kafka Connect automatically and periodically commits
the progress of connectors.
On restart, connectors resume from the last committed position.
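A toy Python simulation of periodic offset commits, assuming a source task that commits after every `commit_every` offsets (real workers commit on a time interval; the count-based rule and `run_source` are simplifications). A crash between commits means the restarted task re-sends whatever came after the last committed position.

```python
from typing import List, Tuple


def run_source(records: List[str], start: int, commit_every: int,
               crash_after: int = -1) -> Tuple[List[str], int]:
    """Toy source task: returns (records sent to Kafka, last committed offset)."""
    sent: List[str] = []
    committed = start
    for offset in range(start, len(records)):
        if offset == crash_after:
            return sent, committed  # crash: progress since the last commit is lost
        sent.append(records[offset])
        if (offset + 1) % commit_every == 0:
            committed = offset + 1  # next offset to read after a restart
    return sent, len(records)


rows = [f"row-{i}" for i in range(10)]
first_run, committed = run_source(rows, start=0, commit_every=4, crash_after=6)
second_run, _ = run_source(rows, start=committed, commit_every=4)
delivered = first_run + second_run
print(committed, len(delivered), len(set(delivered)))
```

Every row arrives at least once, but rows sent after the last commit and before the crash arrive twice.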
25. 25
Kafka Connect: Delivery Guarantees
Kafka Connect processes each record once
under normal operations.
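But things can go wrong (for example, a worker can fail
after writing records but before committing their offsets),
so it can only guarantee at-least-once delivery.
Sink connectors can achieve exactly-once delivery
when they store offsets in the sink system.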
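A minimal sketch of why storing offsets in the sink helps, assuming a hypothetical external store where a batch of rows and the next Kafka offset are written atomically (for example, in one transaction). On restart the task resumes from the offset kept in the sink, so no batch is lost or written twice. `ToySinkStore` and `run_sink` are illustrative names, not a connector API.

```python
from typing import List


class ToySinkStore:
    """Hypothetical external system that stores rows and the Kafka offset together."""

    def __init__(self) -> None:
        self.rows: List[str] = []
        self.next_offset = 0

    def write_batch(self, batch: List[str], next_offset: int) -> None:
        # Assumed atomic: the data and the offset become visible together.
        self.rows.extend(batch)
        self.next_offset = next_offset


def run_sink(topic: List[str], store: ToySinkStore, batch_size: int,
             crash_after_batches: int = -1) -> None:
    # On (re)start, resume from the offset stored in the sink system.
    offset = store.next_offset
    batches = 0
    while offset < len(topic):
        if batches == crash_after_batches:
            return  # simulated crash before the next batch
        batch = topic[offset:offset + batch_size]
        store.write_batch(batch, offset + len(batch))
        offset += len(batch)
        batches += 1


topic = [f"msg-{i}" for i in range(7)]
store = ToySinkStore()
run_sink(topic, store, batch_size=3, crash_after_batches=2)  # writes 6 rows, then "crashes"
run_sink(topic, store, batch_size=3)                          # restart resumes at offset 6
print(store.rows == topic, store.next_offset)
```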
26. 26
Agenda
1. Kafka Connect basics
2. Choosing and using connectors
3. Planning Kafka Connect deployments
28. 28
Source Connectors
The JDBC source connector works with many DBMSes.
Access data in tables, views, or custom queries.
Incremental mode requires creation and/or modification columns
(e.g., an incrementing ID and/or a last-modified timestamp).
Detects soft deletes only (rows flagged as deleted), not removed rows.
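A sketch of incremental polling using Python's built-in `sqlite3`. The query is loosely modeled on a timestamp-plus-incrementing-ID strategy: fetch rows whose (modified, id) position is past the last one seen. The `customers` table, its columns, and `poll` are hypothetical. It shows that a soft delete (an updated flag) is picked up while a hard `DELETE` leaves nothing to query.

```python
import sqlite3
from typing import List, Tuple


def poll(conn: sqlite3.Connection, last_ts: int, last_id: int) -> List[Tuple[int, str, int, int]]:
    # Rows whose (modified, id) position is after the last one already seen.
    return conn.execute(
        "SELECT id, name, deleted, modified FROM customers "
        "WHERE (modified = ? AND id > ?) OR modified > ? "
        "ORDER BY modified, id",
        (last_ts, last_id, last_ts),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, "
             "deleted INTEGER DEFAULT 0, modified INTEGER)")
conn.executemany("INSERT INTO customers (id, name, modified) VALUES (?, ?, ?)",
                 [(1, "ann", 100), (2, "bob", 100), (3, "cy", 101)])

batch = poll(conn, last_ts=0, last_id=0)  # initial load: all three rows
last_id, last_ts = batch[-1][0], batch[-1][3]

conn.execute("UPDATE customers SET deleted = 1, modified = 102 WHERE id = 2")  # soft delete
conn.execute("DELETE FROM customers WHERE id = 3")                             # hard delete
print(poll(conn, last_ts, last_id))
```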
29. 29
Source Connectors
Change Data Capture (CDC) connectors
monitor the source system for all changes, including deleted rows.
CDC connectors often require
non-standard and source-specific APIs.
Typically detect changes only in physical tables, not views.
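For contrast with polling, a toy Python model of consuming CDC events. The event shape (`op` of `c`/`u`/`d` with `before` and `after` row images) is an assumption loosely modeled on common CDC tools such as Debezium; `apply_changes` and the data are hypothetical. Because the log records deletes, a replica can remove rows that a polling query would never see.

```python
from typing import Any, Dict, List


def apply_changes(replica: Dict[int, Dict[str, Any]], events: List[Dict[str, Any]]) -> None:
    """Apply toy change events to a replica keyed by primary key."""
    for event in events:
        op = event["op"]
        if op in ("c", "u"):  # create / update: keep the new row image
            row = event["after"]
            replica[row["id"]] = row
        elif op == "d":       # delete: the removed row is visible in the log
            replica.pop(event["before"]["id"], None)
        else:
            raise ValueError(f"unknown op {op!r}")


replica: Dict[int, Dict[str, Any]] = {}
log = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "ann"}},
    {"op": "c", "before": None, "after": {"id": 3, "name": "cy"}},
    {"op": "u", "before": {"id": 1, "name": "ann"}, "after": {"id": 1, "name": "anne"}},
    {"op": "d", "before": {"id": 3, "name": "cy"}, "after": None},
]
apply_changes(replica, log)
print(replica)
```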
31. 31
Sink Connectors
How are topics and partitions mapped to the external system?
Some sink connectors are more flexible than others.
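Two illustrative mapping styles, sketched in Python. A table-per-topic mapping (the `${topic}` placeholder is modeled on the JDBC sink's `table.name.format` property) folds every partition into one table, while an object-store layout, loosely modeled on the S3/HDFS sinks' topic/partition directories, keeps topic, partition, and starting offset in each key. The sanitizing rule, zero-padding width, and function names are assumptions for illustration.

```python
def table_name(topic: str, table_name_format: str = "${topic}") -> str:
    """Table-per-topic mapping: every partition of a topic lands in one table."""
    name = table_name_format.replace("${topic}", topic)
    # Toy sanitizing rule (an assumption): keep letters, digits, and underscores.
    return "".join(ch if ch.isalnum() or ch == "_" else "_" for ch in name)


def object_key(topic: str, partition: int, start_offset: int, ext: str = "avro") -> str:
    """Object-store mapping: keeps the topic, partition, and offset in the key."""
    return f"topics/{topic}/partition={partition}/{topic}+{partition}+{start_offset:010d}.{ext}"


print(table_name("web.page-views", "kafka_${topic}"))
print(object_key("page-views", 3, 1200))
```

Deriving the object key from the partition and offset makes a rewritten file land at the same key, which is one way a sink can avoid duplicates after a retry.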
32. 32
Sink Connectors
Is the sink connector at least once or exactly once?
Confluent’s HDFS and S3 sink connectors are exactly once.
But many at-least-once sink connectors are idempotent.
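A minimal idempotent sink, sketched with Python's `sqlite3`: writes are keyed upserts (`INSERT OR REPLACE` on the primary key), so a redelivered batch overwrites the same rows instead of adding duplicates. The `users` table and `write` function are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple


def write(conn: sqlite3.Connection, records: Iterable[Tuple[int, str]]) -> None:
    # Keyed upsert: re-delivering a record replaces the row with the same id.
    conn.executemany("INSERT OR REPLACE INTO users (id, email) VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
batch = [(1, "a@example.com"), (2, "b@example.com")]
write(conn, batch)
write(conn, batch)  # at-least-once redelivery of the same batch
print(conn.execute("SELECT id, email FROM users ORDER BY id").fetchall())
```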
33. 33
Choosing and using connectors
Use a playground that is easy to clean up and restart.
Confluent CLI is perfect for this!
34. 34
Confluent Command Line Interface (CLI)
Utility to easily operate Kafka-related services
on your local machine.
$ confluent start
$ confluent current
$ confluent help
$ confluent logs
$ confluent top
$ confluent status
$ confluent status connectors
$ confluent config <connector_name>
$ confluent load <connector_name>
$ confluent status <connector_name>
$ confluent unload <connector_name>
$ confluent stop connect
$ confluent stop
$ confluent destroy
Uses configuration files in
`etc/kafka` and
`etc/schema-registry`
Stores data in a temporary directory:
$CONFLUENT_CURRENT or
`confluent current`
Services: ZooKeeper, Kafka, Schema Registry,
Kafka REST, and Connect Distributed
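Outside the CLI, a connector can also be created through the Kafka Connect REST API, which accepts `POST /connectors` with a JSON body holding `name` and `config`. The Python sketch below only builds the request. It assumes a local worker on the default REST port 8083; the connector name, file path, and topic are hypothetical, and `FileStreamSource` is the simple file connector that ships with Kafka.

```python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083"  # assumption: local worker, default REST port


def create_connector_request(name: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /connectors request."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        f"{CONNECT_URL}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = create_connector_request("file-source", {
    "connector.class": "FileStreamSource",
    "tasks.max": "1",
    "file": "/tmp/test.txt",
    "topic": "file-lines",
})
print(req.get_method(), req.full_url)
# To send it against a running worker: urllib.request.urlopen(req)
```

Once created, `GET /connectors/file-source/status` on the same worker reports the connector and task states.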
36. 36
Agenda
1. Kafka Connect basics
2. Choosing and using connectors
3. Planning Kafka Connect deployments
37. 37
Planning Kafka Connect deployments
Understand the schemas of your records.
Confluent Schema Registry and Avro Converters
make schema evolution possible.
Producers and consumers can
adapt to new schemas at different times.
Enforce forward and/or backward compatibility.
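A simplified Python check of backward and forward compatibility, following the Avro-style resolution rule: a reader field missing from the writer's data needs a default, and extra writer fields are ignored. Backward means consumers on the new schema can read old data; forward means consumers on the old schema can read new data. The toy schema (field name mapped to "has a default") ignores types and is an assumption, not Schema Registry's implementation.

```python
from typing import Dict

# Toy record schema (an assumption): field name -> whether the field has a default.
Schema = Dict[str, bool]


def can_read(reader: Schema, writer: Schema) -> bool:
    """Every reader field absent from the writer must have a default."""
    return all(name in writer or has_default for name, has_default in reader.items())


def backward_compatible(new: Schema, old: Schema) -> bool:
    return can_read(reader=new, writer=old)  # new consumers read old data


def forward_compatible(new: Schema, old: Schema) -> bool:
    return can_read(reader=old, writer=new)  # old consumers read new data


v1 = {"id": False, "name": False}
v2 = {"id": False, "name": False, "email": True}   # adds a field with a default
v3 = {"id": False, "name": False, "phone": False}  # adds a field without a default
print(backward_compatible(v2, v1), forward_compatible(v2, v1))
print(backward_compatible(v3, v1), forward_compatible(v3, v1))
```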
38. 38
Planning Kafka Connect deployments
Install connectors on all workers in the cluster,
not just one of the workers.
39. 39
Planning Kafka Connect deployments
Kafka Connect is a simple Java application,
so it can be deployed on many systems,
including Kubernetes, Mesos, EC2, etc.
40. 40
Planning Kafka Connect deployments
Don’t overload your workers.
If they are too busy, they may skip heartbeats
and drop out of the cluster, causing a rebalance.
(improvement: KAFKA-5741)
41. 41
Planning Kafka Connect deployments
Tune the workers and connectors:
• producer and consumer settings
• offset commit intervals
• poll intervals, batch sizes, and number of tasks
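A small Python helper that renders worker settings, to show where these knobs live. Worker-level client overrides take the `producer.` and `consumer.` prefixes, and `offset.flush.interval.ms` sets the offset commit interval; `tasks.max` belongs in each connector's config instead. The values below are placeholders, not tuning recommendations, and `worker_properties` is a hypothetical helper.

```python
from typing import Dict


def worker_properties(base: Dict[str, str], producer: Dict[str, str],
                      consumer: Dict[str, str]) -> str:
    """Render worker settings; client overrides use the producer./consumer. prefixes."""
    props = dict(base)
    props.update({f"producer.{k}": v for k, v in producer.items()})
    props.update({f"consumer.{k}": v for k, v in consumer.items()})
    return "\n".join(f"{k}={v}" for k, v in sorted(props.items()))


print(worker_properties(
    base={"offset.flush.interval.ms": "10000"},
    producer={"linger.ms": "20", "batch.size": "65536"},
    consumer={"max.poll.records": "1000"},
))
```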
42. 42
Planning Kafka Connect deployments
Minimize rebalances.
If you need more isolation and control,
use separate worker clusters.
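Separate worker clusters need their own identity in the distributed worker config: a distinct `group.id` plus distinct config, offset, and status storage topics. The Python sketch below generates per-cluster settings and flags any identity setting two clusters accidentally share. The naming convention, cluster names, and broker address are hypothetical.

```python
from typing import Dict, List

IDENTITY_KEYS = ["group.id", "config.storage.topic", "offset.storage.topic", "status.storage.topic"]


def cluster_config(cluster: str, bootstrap: str) -> Dict[str, str]:
    """Distributed-worker settings that must be unique per Connect cluster."""
    return {
        "bootstrap.servers": bootstrap,
        "group.id": f"connect-{cluster}",
        "config.storage.topic": f"connect-{cluster}-configs",
        "offset.storage.topic": f"connect-{cluster}-offsets",
        "status.storage.topic": f"connect-{cluster}-status",
    }


def shared_settings(configs: List[Dict[str, str]]) -> List[str]:
    # Report identity settings that two or more clusters share.
    clashes = []
    for key in IDENTITY_KEYS:
        values = [c[key] for c in configs]
        if len(values) != len(set(values)):
            clashes.append(key)
    return clashes


clusters = [cluster_config(name, "broker:9092") for name in ("cdc", "analytics")]
print(shared_settings(clusters))
```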