Simplifying migration from Kafka to Pulsar
Andrey Yegorov
Senior Software Engineer at DataStax
Committer at Apache BookKeeper
Contributor at Apache Pulsar
Pulsar Virtual Summit North America 2021
Agenda
2
Thank you!
3
Problem
4
Goal
5
Diagrams are important
6
(Diagram: Incoming Data → Pulsar → Kafka Connect Adaptor Sink wrapping a Kafka Connect Sink → Outgoing Data → Third-party system)
Prerequisite work
A lot of work to enable development of the KCA Sink
(kudos to my colleague Enrico Olivelli); a minimal sketch of what these changes enable follows the list:
● Implement GenericObject - allow GenericRecord to wrap any Java object
https://github.com/apache/pulsar/pull/10057
● Pulsar IO: allow developing Sinks that support Schema without setting it at
build time (Sink<GenericObject>) https://github.com/apache/pulsar/pull/10034
● Add Schema.getNativeSchema https://github.com/apache/pulsar/pull/10076
● GenericObject: support KeyValue in Message#getValue()
https://github.com/apache/pulsar/pull/10107
● GenericObject: handle KeyValue with SEPARATED encoding
https://github.com/apache/pulsar/pull/10186
● Sink<GenericObject>: unwrap the internal AutoConsumeSchema and allow handling
topics with a KeyValue schema https://github.com/apache/pulsar/pull/10211
● And others
7
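The sketch below is illustrative only (the class and package names are made up, it is not code from the talk); it shows what the changes above enable: a sink declared as Sink<GenericObject> can be attached to a topic with any schema and still reach the native payload and the native schema at runtime.

package com.example.sketch; // hypothetical package, not part of Pulsar

import java.util.Map;
import java.util.Optional;

import org.apache.pulsar.client.api.schema.GenericObject;
import org.apache.pulsar.functions.api.Record;
import org.apache.pulsar.io.core.Sink;
import org.apache.pulsar.io.core.SinkContext;

// A minimal, schema-agnostic sink: it is not built against a fixed schema,
// so it can consume topics with String, Avro, JSON or KeyValue schemas alike.
public class LoggingGenericSink implements Sink<GenericObject> {

    @Override
    public void open(Map<String, Object> config, SinkContext ctx) {
        // connector-specific initialization would go here
    }

    @Override
    public void write(Record<GenericObject> record) {
        GenericObject value = record.getValue();
        // The wrapped Java object (String, GenericRecord, KeyValue, ...)
        Object nativeValue = value == null ? null : value.getNativeObject();

        // The underlying schema object (e.g. an Avro schema), if one is available
        Optional<Object> nativeSchema = Optional.empty();
        if (record.getSchema() != null) {
            nativeSchema = record.getSchema().getNativeSchema();
        }

        System.out.printf("key=%s schemaType=%s value=%s nativeSchema=%s%n",
                record.getKey().orElse(null),
                value == null ? null : value.getSchemaType(),
                nativeValue,
                nativeSchema.orElse(null));

        record.ack(); // acknowledge so the message is not redelivered
    }

    @Override
    public void close() {
        // release resources here
    }
}

Roughly speaking, the KCA sink builds on the same calls to turn each incoming Pulsar Record into a Kafka Connect SinkRecord before handing it to the wrapped connector.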
Kafka Connect Adaptor Sink work
Done and work in progress, so far:
● Add getPartitionIndex() to the Record<>
https://github.com/apache/pulsar/pull/9947
● Expose SubscriptionType in the SinkContext
https://github.com/apache/pulsar/pull/10446
● SinkContext: ability to seek/pause/resume the consumer for a topic (see the sketch after this list)
https://github.com/apache/pulsar/pull/10498
● Add the ability to use Kafka's sinks as Pulsar sinks
https://github.com/apache/pulsar/pull/9927
● Kafka Connect sink adaptor to support non-primitive schemas
https://github.com/apache/pulsar/pull/10410
8
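A rough sketch of how a sink can use these hooks for back-pressure: pause a partition's consumer while the downstream system is overloaded, then resume (or seek) later. This is hypothetical code, not the actual adaptor; it assumes the pause/resume/seek variants that take a topic name and a partition index, and the exact signatures may differ between Pulsar versions.

package com.example.sketch; // hypothetical package, not part of Pulsar

import java.util.Map;

import org.apache.pulsar.client.api.schema.GenericObject;
import org.apache.pulsar.functions.api.Record;
import org.apache.pulsar.io.core.Sink;
import org.apache.pulsar.io.core.SinkContext;

public class BackPressureAwareSink implements Sink<GenericObject> {

    private SinkContext ctx;
    // Flipped elsewhere, e.g. by an error/overflow callback from the downstream client.
    private volatile boolean downstreamOverloaded;

    @Override
    public void open(Map<String, Object> config, SinkContext ctx) {
        this.ctx = ctx;
    }

    @Override
    public void write(Record<GenericObject> record) throws Exception {
        String topic = record.getTopicName().orElse("unknown");
        int partition = record.getPartitionIndex().orElse(0); // exposed by PR 9947

        if (downstreamOverloaded) {
            // Stop delivery from this partition until the downstream recovers;
            // later call ctx.resume(topic, partition), or rewind with
            // ctx.seek(topic, partition, messageId).
            ctx.pause(topic, partition);
            return; // not acked, so the record will be redelivered
        }

        // ... hand the record to the wrapped Kafka Connect sink here ...
        record.ack();
    }

    @Override
    public void close() {
        // nothing to release in this sketch
    }
}

These are the same semantics Kafka Connect sinks expect from the framework (pause(), resume(), and offset rewinds), which is why the adaptor needed them on the Pulsar side.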
Demo
9
Plan
10
Set up mock Kinesis
$ brew install awscli
$ aws configure
When asked, use "mock-kinesis-access-key" and "mock-kinesis-secret-key" as the access key and secret key, respectively.
Follow modified steps from https://github.com/etspaceman/kinesis-mock:
$ docker pull ghcr.io/etspaceman/kinesis-mock:0.0.4
$ docker run -p 443:4567 -p 4568:4568 ghcr.io/etspaceman/kinesis-mock:0.0.4
Note port 443 on the host side of the mapping; the container will still log something like:
k.m.KinesisMockService - Starting Kinesis Http2 Mock Service on port 4567
k.m.KinesisMockService - Starting Kinesis Http1 Plain Mock Service on port 4568
Create Kinesis stream:
$ aws kinesis create-stream --endpoint-url https://localhost/ --no-verify-ssl --stream-name test-kinesis --shard-count 1
11
Build AWS Kinesis-Kafka Connector
Get code from https://github.com/awslabs/kinesis-kafka-connector
Make it skip certificate verification (for Kinesis mock):
diff --git a/src/main/java/com/amazon/kinesis/kafka/AmazonKinesisSinkTask.java b/src/main/java/com/amazon/kinesis/kafka/AmazonKinesisSinkTask.java
index f86f3fd..2920fb8 100644
--- a/src/main/java/com/amazon/kinesis/kafka/AmazonKinesisSinkTask.java
+++ b/src/main/java/com/amazon/kinesis/kafka/AmazonKinesisSinkTask.java
@@ -359,6 +359,8 @@ public class AmazonKinesisSinkTask extends SinkTask {
        // The namespace to upload metrics under.
        config.setMetricsNamespace(metricsNameSpace);
+       config.setVerifyCertificate(false);
+       return new KinesisProducer(config);
    }
Build it and install it into the local Maven repo:
$ mvn clean install -DskipTests
12
Package
Build a NAR with the Kinesis connector included:
diff --git a/pulsar-io/kafka-connect-adaptor-nar/pom.xml b/pulsar-io/kafka-connect-adaptor-nar/pom.xml
index ea9bedbd056..c7fa9a1ebca 100644
--- a/pulsar-io/kafka-connect-adaptor-nar/pom.xml
+++ b/pulsar-io/kafka-connect-adaptor-nar/pom.xml
@@ -36,6 +36,11 @@
       <artifactId>pulsar-io-kafka-connect-adaptor</artifactId>
       <version>${project.version}</version>
     </dependency>
+    <dependency>
+      <groupId>com.amazonaws</groupId>
+      <artifactId>amazon-kinesis-kafka-connector</artifactId>
+      <version>0.0.9-SNAPSHOT</version>
+    </dependency>
   </dependencies>
Build it:
$ mvn -f pulsar-io/kafka-connect-adaptor-nar/pom.xml clean package -DskipTests
13
Let’s roll
Start Pulsar standalone:
$ bin/pulsar standalone
Run the sink:
$ bin/pulsar-admin sinks localrun -a ./pulsar-io/kafka-connect-adaptor-nar/target/pulsar-io-kafka-connect-adaptor-nar-2.8.0-SNAPSHOT.nar --name kwrap --namespace public/default/ktest --parallelism 1 -i my-topic --sink-config-file ~/sink-kinesis.yaml
14
Config
$ cat ~/sink-kinesis.yaml
processingGuarantees: "EFFECTIVELY_ONCE"
configs:
  "topic": "my-topic"
  "offsetStorageTopic": "kafka-connect-sink-offset-kinesis"
  "pulsarServiceUrl": "pulsar://localhost:6650/"
  "kafkaConnectorSinkClass": "com.amazon.kinesis.kafka.AmazonKinesisSinkConnector"
  "kafkaConnectorConfigProperties":
    "name": "test-kinesis-sink"
    "connector.class": "com.amazon.kinesis.kafka.AmazonKinesisSinkConnector"
    "tasks.max": "1"
    "topics": "my-topic"
    "kinesisEndpoint": "localhost"
    "region": "us-east-1"
    "streamName": "test-kinesis"
    "singleKinesisProducerPerPartition": "true"
    "pauseConsumption": "true"
    "maxConnections": "1"
15
The properties under kafkaConnectorConfigProperties are the ones passed to the Kafka Connect Sink.
Action!
Produce a message to the Pulsar topic:
$ bin/pulsar-client produce my-topic --messages "Hello"
Read data from Kinesis:
# Get a shard iterator for Kinesis and use it in the next command:
$ aws kinesis get-shard-iterator --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON --stream-name test-kinesis --endpoint-url https://localhost/ --no-verify-ssl
$ aws kinesis get-records --endpoint-url https://localhost/ --no-verify-ssl --shard-iterator <SHARD_ITERATOR_HERE>
{"SequenceNumber":
"49618471738282782665106189312850320303184854662386810882",
"ApproximateArrivalTimestamp": "2021-05-21T14:08:35-07:00",
"Data": "SGVsbG8=",
"PartitionKey": "0",
"EncryptionType": "NONE"}
https://www.base64decode.org/ tells us that “SGVsbG8=” is “Hello”.
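The same check can be done locally; a tiny illustrative Java snippet (not part of the demo) that decodes the Kinesis record payload:

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DecodeKinesisData {
    public static void main(String[] args) {
        // Kinesis returns the record payload base64-encoded.
        byte[] raw = Base64.getDecoder().decode("SGVsbG8=");
        System.out.println(new String(raw, StandardCharsets.UTF_8)); // prints: Hello
    }
}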
16
17
Thank you!
THE END
18
Speaker notes
1. Thank you; Goal; Prerequisite work; Kafka Connect Adaptor Sink work; Demo.
2. Pulsar community; everyone who reviewed the code and contributed ideas; DataStax and all the people whose memes I “borrowed”.
3. Complex/large-scale implementations of OSS systems, Kafka included, involve customizations and in-house developed tools and plugins. Transitioning from one system to another is a complicated process, and making it iterative increases the chance of success.
4. Simplify the move from Kafka to Pulsar for power users of Kafka who rely on integrations of Kafka with other systems: postpone the rewrite of custom Kafka Connect Sinks to native Pulsar Sinks; enable Pulsar integrations when the corresponding Pulsar Sink does not exist but a Kafka Connect Sink does; enable Pulsar integrations when an existing Pulsar Sink's behavior or functionality does not match what the integration relies on.
5. Let's use something more exciting than a simple FileStreamSinkConnector: AmazonKinesisSinkConnector it is. Let's use a mock Kinesis for simplicity, and run it all locally.
6. We took a Kafka Connect Sink, packaged it for use with Pulsar, configured it to send messages to Kinesis, sent a message to Pulsar, and the message appeared in Kinesis!