Simplifying Migration from Kafka to Pulsar - Pulsar Summit NA 2021
Complex/large-scale implementations of OSS systems, Kafka included, involve customizations and in-house developed tools and plugins. Transition from one system to another is a complicated process and making it iterative increases the chance of success. In this talk we’ll take a look at the Kafka Adaptor that enables use of Kafka Connect Sinks in the Pulsar ecosystem.
  1. Simplifying migration from Kafka to Pulsar. Andrey Yegorov, Senior Software Engineer at DataStax, Committer at Apache BookKeeper, Contributor at Apache Pulsar. Pulsar Virtual Summit North America 2021.
  2. Agenda
  3. Thank you!
  4. Problem
  5. Goal
  6. Diagrams are important. Data flow: Incoming Data -> Pulsar -> Kafka Connect Adaptor Sink (wrapping an unmodified Kafka Connect Sink) -> Outgoing Data -> Third-party system.
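The adaptor pattern on this slide can be sketched in miniature. This is an illustrative model only, using hypothetical stand-in classes (`PulsarRecord`, `KafkaSinkRecord`, `KinesisLikeSink`, `KafkaConnectAdaptorSink`), not the real Pulsar IO or Kafka Connect APIs: the adaptor receives a Pulsar record, reshapes it into what a Kafka Connect sink's `put()` expects, and the third-party sink runs unchanged.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the real Pulsar and Kafka Connect record types.
@dataclass
class PulsarRecord:
    topic: str
    partition_index: int
    value: bytes

@dataclass
class KafkaSinkRecord:
    topic: str
    partition: int
    value: bytes

class KinesisLikeSink:
    """Plays the role of an unmodified Kafka Connect Sink (e.g. the Kinesis one)."""
    def __init__(self):
        self.delivered = []

    def put(self, records):
        # A real sink would forward these to the third-party system.
        self.delivered.extend(records)

class KafkaConnectAdaptorSink:
    """Adapts Pulsar records into the shape the wrapped Kafka sink expects."""
    def __init__(self, kafka_sink):
        self.kafka_sink = kafka_sink

    def write(self, record: PulsarRecord):
        adapted = KafkaSinkRecord(record.topic, record.partition_index, record.value)
        self.kafka_sink.put([adapted])

sink = KinesisLikeSink()
adaptor = KafkaConnectAdaptorSink(sink)
adaptor.write(PulsarRecord("my-topic", 0, b"Hello"))
assert sink.delivered[0].value == b"Hello"
```

The point of the diagram, and of the sketch, is that the Kafka Connect Sink on the right never learns it is running inside Pulsar; only the adaptor knows both sides.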
  7. Prerequisite work. A lot of work to enable development of the KCA Sink (kudos to my colleague Enrico Olivelli):
     ● Implement GenericObject, allowing GenericRecord to wrap any Java object: https://github.com/apache/pulsar/pull/10057
     ● Pulsar IO: allow developing Sinks that support Schema without setting it at build time (Sink&lt;GenericObject&gt;): https://github.com/apache/pulsar/pull/10034
     ● Add Schema.getNativeSchema: https://github.com/apache/pulsar/pull/10076
     ● GenericObject: support KeyValue in Message#getValue(): https://github.com/apache/pulsar/pull/10107
     ● GenericObject: handle KeyValue with SEPARATED encoding: https://github.com/apache/pulsar/pull/10186
     ● Sink&lt;GenericObject&gt;: unwrap the internal AutoConsumeSchema and allow handling topics with a KeyValue schema: https://github.com/apache/pulsar/pull/10211
     ● And others.
  8. Kafka Connect Adaptor Sink work. Done and in progress, so far:
     ● Add getPartitionIndex() to the Record&lt;&gt;: https://github.com/apache/pulsar/pull/9947
     ● Expose SubscriptionType in the SinkContext: https://github.com/apache/pulsar/pull/10446
     ● SinkContext: ability to seek/pause/resume the consumer for a topic: https://github.com/apache/pulsar/pull/10498
     ● Add the ability to use Kafka's sinks as Pulsar sinks: https://github.com/apache/pulsar/pull/9927
     ● Kafka Connect sink adaptor: support non-primitive schemas: https://github.com/apache/pulsar/pull/10410
  9. Demo
  10. Plan
  11. Set up mock Kinesis.
      $ brew install awscli
      $ aws configure
      Use "mock-kinesis-access-key" and "mock-kinesis-secret-key" for the access and secret keys, respectively, when asked.
      Follow modified steps from https://github.com/etspaceman/kinesis-mock:
      $ docker pull ghcr.io/etspaceman/kinesis-mock:0.0.4
      $ docker run -p 443:4567 -p 4568:4568 ghcr.io/etspaceman/kinesis-mock:0.0.4
      Note port 443 in the mapping. Docker will still show something like:
      k.m.KinesisMockService - Starting Kinesis Http2 Mock Service on port 4567
      k.m.KinesisMockService - Starting Kinesis Http1 Plain Mock Service on port 4568
      Create the Kinesis stream:
      $ aws kinesis create-stream --endpoint-url https://localhost/ --no-verify-ssl --stream-name test-kinesis --shard-count 1
  12. Build the AWS Kinesis-Kafka Connector.
      Get the code from https://github.com/awslabs/kinesis-kafka-connector
      Make it skip certificate verification (for the Kinesis mock):

      diff --git a/src/main/java/com/amazon/kinesis/kafka/AmazonKinesisSinkTask.java b/src/main/java/com/amazon/kinesis/kafka/AmazonKinesisSinkTask.java
      index f86f3fd..2920fb8 100644
      --- a/src/main/java/com/amazon/kinesis/kafka/AmazonKinesisSinkTask.java
      +++ b/src/main/java/com/amazon/kinesis/kafka/AmazonKinesisSinkTask.java
      @@ -359,6 +359,8 @@ public class AmazonKinesisSinkTask extends SinkTask {
               // The namespace to upload metrics under.
               config.setMetricsNamespace(metricsNameSpace);

      +        config.setVerifyCertificate(false);
      +
               return new KinesisProducer(config);
           }

      Build it and install it into the local Maven repo:
      $ mvn clean install -DskipTests
  13. Package. Build a NAR with the Kinesis connector included:

      diff --git a/pulsar-io/kafka-connect-adaptor-nar/pom.xml b/pulsar-io/kafka-connect-adaptor-nar/pom.xml
      index ea9bedbd056..c7fa9a1ebca 100644
      --- a/pulsar-io/kafka-connect-adaptor-nar/pom.xml
      +++ b/pulsar-io/kafka-connect-adaptor-nar/pom.xml
      @@ -36,6 +36,11 @@
             <artifactId>pulsar-io-kafka-connect-adaptor</artifactId>
             <version>${project.version}</version>
           </dependency>
      +    <dependency>
      +      <groupId>com.amazonaws</groupId>
      +      <artifactId>amazon-kinesis-kafka-connector</artifactId>
      +      <version>0.0.9-SNAPSHOT</version>
      +    </dependency>
         </dependencies>

      Build it:
      $ mvn -f pulsar-io/kafka-connect-adaptor-nar/pom.xml clean package -DskipTests
  14. Let's roll. Start Pulsar standalone:
      $ bin/pulsar standalone
      Run the sink:
      $ bin/pulsar-admin sinks localrun -a ./pulsar-io/kafka-connect-adaptor-nar/target/pulsar-io-kafka-connect-adaptor-nar-2.8.0-SNAPSHOT.nar --name kwrap --namespace public/default/ktest --parallelism 1 -i my-topic --sink-config-file ~/sink-kinesis.yaml
  15. Config. Properties passed to the Kafka Connect Sink:
      $ cat ~/sink-kinesis.yaml
      processingGuarantees: "EFFECTIVELY_ONCE"
      configs:
        "topic": "my-topic"
        "offsetStorageTopic": "kafka-connect-sink-offset-kinesis"
        "pulsarServiceUrl": "pulsar://localhost:6650/"
        "kafkaConnectorSinkClass": "com.amazon.kinesis.kafka.AmazonKinesisSinkConnector"
        "kafkaConnectorConfigProperties":
          "name": "test-kinesis-sink"
          "connector.class": "com.amazon.kinesis.kafka.AmazonKinesisSinkConnector"
          "tasks.max": "1"
          "topics": "my-topic"
          "kinesisEndpoint": "localhost"
          "region": "us-east-1"
          "streamName": "test-kinesis"
          "singleKinesisProducerPerPartition": "true"
          "pauseConsumption": "true"
          "maxConnections": "1"
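The config has two layers: top-level keys that the adaptor itself consumes, and a nested map of properties intended for the wrapped Kafka connector. As a rough sketch of that split (the key names come from the config above; the two-layer behavior is an assumption about how the adaptor forwards properties, not something stated on the slide):

```python
# Illustrative model of the sink config, mirroring ~/sink-kinesis.yaml above.
sink_config = {
    # Consumed by the Kafka Connect Adaptor Sink itself:
    "topic": "my-topic",
    "offsetStorageTopic": "kafka-connect-sink-offset-kinesis",
    "kafkaConnectorSinkClass": "com.amazon.kinesis.kafka.AmazonKinesisSinkConnector",
    # Presumably handed to the wrapped Kafka connector as its own config map:
    "kafkaConnectorConfigProperties": {
        "name": "test-kinesis-sink",
        "tasks.max": "1",
        "topics": "my-topic",
        "streamName": "test-kinesis",
    },
}

connector_props = sink_config["kafkaConnectorConfigProperties"]
# The Pulsar topic and the "topics" the Kafka connector sees must line up,
# since the adaptor presents Pulsar messages as records from that topic.
assert connector_props["topics"] == sink_config["topic"]
```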
  16. Action! Produce a message to the Pulsar topic:
      $ bin/pulsar-client produce my-topic --messages "Hello"
      Read the data from Kinesis:
      # Get a shard iterator for Kinesis and use it later:
      $ aws kinesis get-shard-iterator --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON --stream-name test-kinesis --endpoint-url https://localhost/ --no-verify-ssl
      $ aws kinesis get-records --endpoint-url https://localhost/ --no-verify-ssl --shard-iterator <SHARD_ITERATOR_HERE>
      {"SequenceNumber": "49618471738282782665106189312850320303184854662386810882", "ApproximateArrivalTimestamp": "2021-05-21T14:08:35-07:00", "Data": "SGVsbG8=", "PartitionKey": "0", "EncryptionType": "NONE"}
      https://www.base64decode.org/ tells us that "SGVsbG8=" is "Hello".
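Kinesis returns record payloads base64-encoded in the `Data` field, so instead of a web decoder the check can be done locally; a minimal Python equivalent:

```python
import base64

# The "Data" field from the get-records response, base64-encoded.
payload = base64.b64decode("SGVsbG8=")
print(payload.decode("utf-8"))  # Hello
```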
  17. Thank you!
  18. THE END