Complex, large-scale deployments of OSS systems, Kafka included, involve customizations and in-house tools and plugins. Transitioning from one system to another is a complicated process, and making it iterative increases the chance of success. In this talk we’ll take a look at the Kafka Connect Adaptor, which enables the use of Kafka Connect Sinks in the Pulsar ecosystem.
Simplifying Migration from Kafka to Pulsar - Pulsar Summit NA 2021
1. Simplifying migration from Kafka to Pulsar
Andrey Yegorov
Senior Software Engineer at DataStax
Committer at Apache BookKeeper
Contributor at Apache Pulsar
Pulsar Virtual Summit North America 2021
7. Prerequisite work
A lot of work to enable development of the KCA Sink (kudos to my colleague Enrico Olivelli):
● Implement GenericObject - Allow GenericRecord to wrap any Java Object https://github.com/apache/pulsar/pull/10057
● Pulsar IO: Allow to develop Sinks that support Schema but without setting it at build time (Sink<GenericObject>) https://github.com/apache/pulsar/pull/10034
● Add Schema.getNativeSchema https://github.com/apache/pulsar/pull/10076
● GenericObject - support KeyValue in Message#getValue() https://github.com/apache/pulsar/pull/10107
● GenericObject: handle KeyValue with SEPARATED encoding https://github.com/apache/pulsar/pull/10186
● Sink<GenericObject> unwrap internal AutoConsumeSchema and allow to handle topics with KeyValue schema https://github.com/apache/pulsar/pull/10211
● And others
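To illustrate the Sink<GenericObject> idea from the list above: a sink typed on a wrapper object can decide how to handle each value at runtime, instead of fixing the schema at build time. The sketch below is a minimal, dependency-free illustration and not Pulsar code. `SketchSchemaType`, `SketchGenericObject`, and `SketchGenericSink` are hypothetical stand-ins invented for this example; the two accessors are named after `getSchemaType()` and `getNativeObject()` on Pulsar's `GenericObject`, and everything else is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a schema type enum; only the values used in this sketch.
enum SketchSchemaType { STRING, INT64, KEY_VALUE }

// Hypothetical stand-in for a GenericObject-like wrapper:
// a schema type plus the wrapped native Java object.
final class SketchGenericObject {
    private final SketchSchemaType schemaType;
    private final Object nativeObject;

    SketchGenericObject(SketchSchemaType schemaType, Object nativeObject) {
        this.schemaType = schemaType;
        this.nativeObject = nativeObject;
    }

    SketchSchemaType getSchemaType() { return schemaType; }

    Object getNativeObject() { return nativeObject; }
}

// A sink typed on the wrapper rather than on a concrete value class.
// It inspects the schema type of each value at runtime.
final class SketchGenericSink {
    private final List<String> written = new ArrayList<>();

    void write(SketchGenericObject value) {
        Object raw = value.getNativeObject();
        switch (value.getSchemaType()) {
            case STRING: {
                written.add("string:" + raw);
                break;
            }
            case INT64: {
                written.add("long:" + raw);
                break;
            }
            case KEY_VALUE: {
                // In this sketch a key/value pair is modelled as a Map.Entry.
                Map.Entry<?, ?> kv = (Map.Entry<?, ?>) raw;
                written.add("kv:" + kv.getKey() + "=" + kv.getValue());
                break;
            }
        }
    }

    // Everything the sink has "written" so far, in arrival order.
    List<String> written() { return written; }
}
```

The point of the design is that one sink class can serve topics with different schemas, which is what a generic adaptor needs when it cannot know the topic schema in advance.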
8. Kafka Connect Adaptor Sink work
Done and work in progress, so far:
● Add getPartitionIndex() to the Record<> https://github.com/apache/pulsar/pull/9947
● Exposed SubscriptionType in the SinkContext https://github.com/apache/pulsar/pull/10446
● SinkContext: ability to seek/pause/resume consumer for a topic https://github.com/apache/pulsar/pull/10498
● Add ability to use Kafka's sinks as pulsar sinks https://github.com/apache/pulsar/pull/9927
● Kafka connect sink adaptor to support non-primitive schemas https://github.com/apache/pulsar/pull/10410
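The adaptor idea behind the PRs listed above can be sketched as a thin wrapper: it accepts Pulsar-style records, converts them to the shape a Kafka-style sink task expects (topic, partition, offset, value), and acknowledges the originals only after the task has flushed. This is a minimal sketch under stated assumptions, not the actual implementation. All types here (`KafkaStyleRecord`, `KafkaStyleSinkTask`, `PulsarStyleRecord`, `SinkAdaptorSketch`) are hypothetical stand-ins, and the fixed-size batching, the use of a sequence id as the offset, and the ack-after-flush ordering are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for a Kafka Connect sink record.
final class KafkaStyleRecord {
    final String topic;
    final int partition;
    final long offset;
    final Object value;

    KafkaStyleRecord(String topic, int partition, long offset, Object value) {
        this.topic = topic;
        this.partition = partition;
        this.offset = offset;
        this.value = value;
    }
}

// Hypothetical stand-in for a Kafka Connect sink task.
interface KafkaStyleSinkTask {
    void put(Collection<KafkaStyleRecord> records);
    void flush();
}

// Hypothetical stand-in for a Pulsar record that can be acknowledged.
final class PulsarStyleRecord {
    final String topic;
    final int partitionIndex;
    final long sequenceId;
    final Object value;
    private boolean acked;

    PulsarStyleRecord(String topic, int partitionIndex, long sequenceId, Object value) {
        this.topic = topic;
        this.partitionIndex = partitionIndex;
        this.sequenceId = sequenceId;
        this.value = value;
    }

    void ack() { acked = true; }

    boolean isAcked() { return acked; }
}

// The adaptor: buffers Pulsar-style records, hands them to the wrapped
// Kafka-style task in batches, and acks only after the task has flushed.
final class SinkAdaptorSketch {
    private final KafkaStyleSinkTask task;
    private final int batchSize;
    private final List<PulsarStyleRecord> pending = new ArrayList<>();

    SinkAdaptorSketch(KafkaStyleSinkTask task, int batchSize) {
        this.task = task;
        this.batchSize = batchSize;
    }

    void write(PulsarStyleRecord record) {
        pending.add(record);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<KafkaStyleRecord> batch = new ArrayList<>();
        for (PulsarStyleRecord r : pending) {
            // Assumption: the partition index and sequence id map to partition and offset.
            batch.add(new KafkaStyleRecord(r.topic, r.partitionIndex, r.sequenceId, r.value));
        }
        task.put(batch);
        task.flush();
        // Ack only after the downstream task has accepted and flushed the batch.
        for (PulsarStyleRecord r : pending) {
            r.ack();
        }
        pending.clear();
    }
}
```

Acknowledging after the flush means a failure in the wrapped task leaves the records unacknowledged, so they can be redelivered rather than lost.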
Goal
Prerequisite work
Kafka Connect Adaptor Sink work
Demo
Thank you
Pulsar community
Everyone who reviewed the code and contributed ideas
DataStax
and all the people whose memes I “borrowed”
Simplify the move from Kafka to Pulsar for power users of Kafka who rely on integrations of Kafka with other systems.
Postpone the rewrite of custom Kafka Connect Sinks as native Pulsar Sinks
Enable Pulsar integrations when the corresponding Pulsar Sink does not exist but the Kafka Connect Sink does
Enable Pulsar integrations when the existing Pulsar Sink’s behavior or functionality does not match what the integration relies on
Let’s use something more exciting than a simple FileStreamSinkConnector
AmazonKinesisSinkConnector it is
Let’s use mock kinesis for simplicity
And run it all locally
We took a Kafka Connect Sink
Packaged it for use with Pulsar
Configured it to send messages to Kinesis
Sent a message to Pulsar
And the message appeared in Kinesis!