Pulsar Summit
San Francisco
Hotel Nikko
August 18 2022
Use Case
Blue-green deploys
with Pulsar & Envoy
in an event-driven
microservice
ecosystem
Kai Levy & Zach Walsh
Toast, Inc.
Kai and Zach both work on Toast’s
Scale team, building shared
infrastructure and solving problems of
messaging, routing and persistence at
scale.
Kai Levy
Senior Software Engineer
Toast
Zach Walsh
Senior Software Engineer
Toast
Agenda
Toast’s microservice ecosystem + Pulsar
Blue/green deployments at Toast
Driving Pulsar adoption
Our Envoy Proxy control plane
“The Pulsar Toggle”
We empower the restaurant
community to delight their guests,
do what they love, and thrive
Toast’s technology platform
Toast’s microservice ecosystem
How it started How it’s going
How it’s going (with Pulsar)
2018 Asynchronous messaging with RabbitMQ
● Order syncing between devices
● Change Data Capture (CDC)
A History of
Pulsar at
Toast
2019 Pulsar pilot
● Initial exploration & testing
● Cluster productionalization
● First features, such as migrating change data
capture
Persistence & Stability
Seamless Pulsar
failover
● RabbitMQ: potential stability issues + in-memory data-storage = lost messages
○ Manual maintenance was a big burden
● Pulsar’s data replication & automatic topic balancing eliminated these concerns
Horizontal Scalability
[Diagram: broker 0, broker 1, broker 2, broker 3, … broker n]
● Supports adding more topics without manual provisioning
● Throughput has grown more than 5x without any change in architecture
2020 Full-fledged adoption
● Teams across Toast rapidly built features on top of
Pulsar to help restaurants survive the pandemic
● Decorated streams built on Pulsar, which enabled
more scalable consumers
Full-fledged adoption
[Diagram: Domain service (Source of Truth), CDC, notify-topic, service1, service2, service3, … serviceN]
[Diagram: Domain service (Source of Truth), CDC, data decorator service, notify-topic, decorated-stream, service1, … serviceN]
Full-fledged adoption
Order status notifications
Delivery & curbside arrival notifications for consumers
- helping restaurants pivot to digital
Full-fledged adoption
Tip pool tracking
Tip pooling information is kept up-to-date with order information
Loyalty points accrual
Consumer-facing loyalty programs help Toast
restaurants thrive
Restaurant availability
Third party platforms are notified when a restaurant
goes offline
2022 Next-gen order processing
● Critical replatforming projects in development will
help Toast reach the next level of scale
● Event-driven architecture being widely used for new
features
Pulsar adoption has grown steadily
[Chart: user adoption (linear)]
Toast client libraries
Providing Toast-specific functionality for free
1. Out-of-box authentication
2. Dead-letter topic guidance (+ topic registries)
3. Metric instrumentation
4. Message parsing
5. Pulsar client configuration
+
Authentication & authorization
● Automatic service authentication provided by the client libraries
○ Easy to use with any of our supported application frameworks
● Contributed a patch into the public Java client library
Dead-Letter Topics
● Standards for undeliverable messages
○ Per-subscription DLQs, or automatic
acknowledgement after redelivery
○ Integrated with service configuration
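The two standards above (a per-subscription dead-letter queue, or automatic acknowledgement once redelivery is exhausted) can be sketched as a small decision function. This is only an illustration: `DeadLetterSketch`, `FailurePolicy`, `Action`, `decide`, and the `maxRedeliveries` limit are hypothetical names, not part of Toast's client libraries or of the Pulsar API.

```java
/** Hypothetical sketch: what to do with a message whose processing failed. */
final class DeadLetterSketch {
    /** The two standards named on the slide. */
    enum FailurePolicy { DEAD_LETTER_TOPIC, AUTO_ACK_AFTER_REDELIVERY }

    /** Possible outcomes for a failed message. */
    enum Action { REDELIVER, SEND_TO_DLQ, ACKNOWLEDGE }

    /**
     * @param redeliveryCount how many times the message has already been redelivered
     * @param maxRedeliveries assumed per-subscription retry limit (illustrative)
     */
    static Action decide(FailurePolicy policy, int redeliveryCount, int maxRedeliveries) {
        if (redeliveryCount < maxRedeliveries) {
            return Action.REDELIVER; // still within the retry budget
        }
        return policy == FailurePolicy.DEAD_LETTER_TOPIC
                ? Action.SEND_TO_DLQ   // park the message on the subscription's DLQ
                : Action.ACKNOWLEDGE;  // give up and acknowledge so the backlog keeps moving
    }
}
```

Keeping the decision in one place is one way a client library could apply the same rule to every consumer, with the policy coming from service configuration.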
Topic registries with Terraform
● Started with an in-house provider
○ Now migrating to the StreamNative provider
● Lets us manage namespace authorization
● Provide defaults for retention & persistence
● Central place for discovering events
Developers write infrastructure as code
Metrics
● Automatically report over 2 dozen
metrics
○ Consistent across services
● Critical for operations & monitoring
● Added our own custom metrics
● Adding APM integrations
ackLatency
ackTimeouts
auto-acknowledgements
Message Parsing
We parse Protobuf messages into friendly Kotlin data classes
● Our open-source, Kotlin-first
protocol buffer compiler
● One-line usage for engineers
building on our client
Configuration recommendations
Providing guidance around client settings
● Producer batching
● Acknowledgement timeout
● Receiver queue size
● Redelivery delay
● Unique consumer & producer names
Starting Pulsar consumer status recorder with config: {
"topicNames" : [ "persistent://…" ],
"topicsPattern" : null,
"subscriptionName" : "...",
"subscriptionType" : "Shared",
"subscriptionMode" : "Durable",
"receiverQueueSize" : 1000,
"acknowledgementsGroupTimeMicros" : 100000,
"negativeAckRedeliveryDelayMicros" : 500000,
"maxTotalReceiverQueueSizeAcrossPartitions" : 50000,
"consumerName" : null,
"ackTimeoutMillis" : 30000,
"tickDurationMillis" : 1000,
"priorityLevel" : 0,
"maxPendingChuckedMessage" : 10,
"autoAckOldestChunkedMessageOnQueueFull" : false,
"expireTimeOfIncompleteChunkedMessageMillis" : 60000,
"cryptoFailureAction" : "FAIL",
"properties" : { },
"readCompacted" : false,
"subscriptionInitialPosition" : "Latest",
"patternAutoDiscoveryPeriod" : 60,
"regexSubscriptionMode" : "PersistentOnly",
But something is still missing…
Agenda
Toast’s microservice ecosystem + Pulsar
Blue/green deployments at Toast (the problem)
Driving Pulsar adoption
Our Envoy Proxy control plane
“The Pulsar Toggle”
Deployment and elevation practices
[Diagram sequence: an HTTP ingress control plane in front of service v1 and service v2]
Deploying changes to Pulsar consumers is risky
[Diagram: service v1 and service v2 on a shared Pulsar subscription]
Mismatch in tooling
Our platform for request-driven service deploys was well ahead of our Pulsar
platform, causing developer frustration
User frustration
Principle of least surprise
“In interface design, always do
the least surprising thing.”
- Basics of the Unix Philosophy
Elevations & deploys should
be safe, easy, uneventful!
Agenda
Toast’s microservice ecosystem + Pulsar
Blue/green deployments at Toast (the solution)
Driving Pulsar adoption
Our Envoy Proxy control plane
“The Pulsar Toggle”
Pulsar operational tooling
Elevations & deploys weren’t easy on Pulsar
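● Can I validate my deploy before prod traffic? REST services ✅, Pulsar consumers ❌
● Can I validate with a small amount of prod traffic? REST services ✅, Pulsar consumers ❌
● Can I easily roll back? REST services ✅, Pulsar consumers ❌
● Can I easily roll forward? REST services ✅, Pulsar consumers ❌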
Contrast: REST services & Pulsar (in 2019)
Pulsar Consumer Elevation Requirements
1. Elevate traffic to new consumers as they are set to “active” in the control plane.
2. Avoid building a single point of failure.
3. Make this reusable for other background processes at Toast.
4. No performance hit or extra infrastructure.
Some options we considered
Message Router Pattern
[Diagram: incoming topic, Router, Control Plane, blue topic, green topic, Deploy N, Deploy N + 1]
Some options we considered
Message Router Pattern - Problems
● But, the router is a single point of failure
● More infrastructure to monitor
● Two hops per message
Some options we considered
Feature Flags
● Apps use a feature flag to know whether to connect
● But, not integrated with our control plane
● Requires more setup for each consumer
[Diagram: incoming topic, Deploy N, Deploy N + 1, FF Off, FF On]
Some options we considered
Pausing Inactive Consumers
● The Feature Flag approach is close
○ No extra infrastructure
○ No extra hops
● But, we’d need to integrate it into our control plane
● Is this possible with Pulsar?
[Diagram: incoming topic, Deploy N, Deploy N + 1, inactive, active]
Let’s see what the Pulsar source code has to say about pausing consumers.
What does Pulsar provide?
In Consumer.java:
Will pause() and resume() work?
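● Can I validate my deploy before prod traffic? Pulsar consumers ❌, Pulsar consumers with pause() ✅
● Can I validate with a small amount of prod traffic? Pulsar consumers ❌, Pulsar consumers with pause() ❌
● Can I easily roll back? Pulsar consumers ❌, Pulsar consumers with pause() ✅
● Can I easily roll forward? Pulsar consumers ❌, Pulsar consumers with pause() ✅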
What do operations look like if inactive consumers call pause()?
How do we get each consumer to call pause() or resume() at the right time?
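As a minimal sketch of the idea, assuming the only thing each process needs is a yes/no "is my deploy active?" answer: consumers on an inactive deploy call pause(), and consumers on the active deploy call resume(). `Pausable` is a hypothetical stand-in for the pause()/resume() pair found in Consumer.java, and `ElevationSketch.apply` is an illustrative name, not code from the talk.

```java
/** Hypothetical stand-in for the pause()/resume() pair on a Pulsar consumer. */
interface Pausable {
    void pause();
    void resume();
}

final class ElevationSketch {
    /** Pause the consumer on an inactive deploy; resume it on the active one. */
    static void apply(boolean deployIsActive, Pausable consumer) {
        if (deployIsActive) {
            consumer.resume(); // this deploy should take traffic
        } else {
            consumer.pause();  // this deploy stays connected but stops asking for messages
        }
    }
}
```

Under this model, rolling forward or back is a matter of flipping which deploy is reported as active; the open question is how each process learns that answer at the right time.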
How Would You Solve This?
● Pausing Pulsar consumers is easy. Knowing when to pause is hard.
● A central control plane component owns this data
● Let’s just poll that service
● What would that look like?
[Diagram: service Z polling the control plane]
What’s Wrong With This?
● Used to be the pattern for service discovery at Toast
● Subject to thundering herd
● Now, we leverage Envoy
[Diagram: control plane, service Z]
How We Leverage Envoy
Envoy at Toast
Envoy is a reverse proxy
Deployed as a sidecar, forwards requests to their destination
Envoy acts as a proxy, forwarding requests upstream.
[Diagram: my-service sends GET /menus/v2/menuItems through envoy, which forwards it to menus as GET /v2/menuItems]
Envoy is eventually consistent
Routing changes are pushed asynchronously
Envoy sidecars across the fleet receive updates within ~1-2 min of a change.
[Diagram: Control Plane pushing updates to sidecars]
Envoy knows service status
It gets a push each time any deploy goes active or inactive
We can leverage this to pause() or resume() consumers.
Envoy direct responses
Using an interesting Envoy feature to avoid single points of failure
It can intercept requests and reply with a direct response! This gets
the status info into the process where the Consumer is running.
[Diagram: my-service sends GET /sidecar/v1/elevation/active; envoy, using *magic config*, replies directly with { "active": true }]
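From the application's side, the status check could look roughly like the sketch below, which uses the JDK's built-in HTTP client (Java 11+). Only the path /sidecar/v1/elevation/active and the { "active": true } body come from the slides. The class and method names, the `sidecarBaseUrl` parameter, the two-second timeout, and the choice to treat any non-200 or unexpected body as "inactive" are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Pattern;

/** Hypothetical sketch of reading the elevation status from the local Envoy sidecar. */
final class SidecarStatusSketch {
    // Matches the JSON member "active": true, tolerating whitespace around the colon.
    private static final Pattern ACTIVE_TRUE = Pattern.compile("\"active\"\\s*:\\s*true");

    /** Returns true only when the body reports "active": true. */
    static boolean parseActive(String body) {
        return body != null && ACTIVE_TRUE.matcher(body).find();
    }

    /**
     * Asks the sidecar whether this deploy is active.
     * sidecarBaseUrl is an assumption (scheme, host and port of the local Envoy listener).
     */
    static boolean fetchActive(HttpClient client, String sidecarBaseUrl)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest
                .newBuilder(URI.create(sidecarBaseUrl + "/sidecar/v1/elevation/active"))
                .timeout(Duration.ofSeconds(2)) // illustrative timeout
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() == 200 && parseActive(response.body());
    }
}
```

Because the request never leaves the host, each service only polls its own sidecar rather than a shared control plane service.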
The Pulsar Toggle
“Pulsar Toggle” implementation
Leveraging our Envoy Control Plane to toggle Pulsar consumers
A thread polls the locally-running Envoy instance and
toggles the Pulsar consumer as needed
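A minimal sketch of such a polling thread is shown below, under stated assumptions. `ToggleTarget` is a hypothetical stand-in for the consumer's pause()/resume() methods, the `BooleanSupplier` stands in for the call to the local Envoy instance, and the class name, the polling period, and the "keep the current state on error" behavior are illustrative choices rather than details from the talk.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for the pause()/resume() pair on a Pulsar consumer. */
interface ToggleTarget {
    void pause();
    void resume();
}

/** Sketch of a toggle: poll a local status source and pause or resume on change. */
final class PulsarToggleSketch {
    private final BooleanSupplier isActive; // e.g. a request to the local sidecar
    private final ToggleTarget consumer;
    private boolean running = false;        // assumes the consumer was subscribed paused

    PulsarToggleSketch(BooleanSupplier isActive, ToggleTarget consumer) {
        this.isActive = isActive;
        this.consumer = consumer;
    }

    /** One polling step: act only when the reported status differs from the current state. */
    synchronized void tick() {
        boolean active;
        try {
            active = isActive.getAsBoolean();
        } catch (RuntimeException e) {
            return; // status unavailable: keep the current state (a design assumption)
        }
        if (active && !running) {
            consumer.resume();
            running = true;
        } else if (!active && running) {
            consumer.pause();
            running = false;
        }
    }

    /** Runs tick() on a single daemon thread with a fixed delay between polls. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "pulsar-toggle");
            t.setDaemon(true); // do not keep the JVM alive just for polling
            return t;
        });
        scheduler.scheduleWithFixedDelay(this::tick, 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Tracking the last applied state means pause() and resume() are called once per transition instead of on every poll.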
Some “gotchas”
Eventually consistent
Consumers don’t pause immediately; updates propagate with some latency
Start paused
There wasn’t a way to subscribe in a paused state, so we made a patch to the Java client
More advanced elevation patterns
Currently we can’t support percent elevations of Pulsar traffic onto new deploys
Receiver queue size
Critically important to tune this parameter of consumers
Results
● ~30 Toggle users in prod, across Pulsar consumers & background workers
● 0 outages: no added load on any critical systems
● 2 contributions to open source: the Java client & the Camel integration
Increased adoption
● 2x new topics: developers are adding topics at twice the rate since the Pulsar Toggle was released
[Chart: user adoption (linear)]
Users Love it!
● 65% increase in reported ease of use when deploying Pulsar consumer changes
● 46% decrease in reported risk associated with deploying Pulsar consumer changes
Positive feedback from satisfaction surveys with our users
Key Takeaways
● Integration: Strong integration with existing systems is critical for org-wide adoption.
● Ease of Use: As we make our Pulsar platform easier to use, we see more and more adoption.
● Stability: Pulsar’s stability through big growth has been a killer feature for us.
Kai Levy & Zach Walsh
Thank you!
klevy@toasttab.com
zachary.walsh@toasttab.com
We’re Hiring!
careers.toasttab.com

Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosystem - Pulsar Summit SF 2022

  • 1. Pulsar Summit San Francisco Hotel Nikko August 18 2022 Use Case Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosystem Kai Levy & Zach Walsh Toast, Inc.
  • 2. Kai and Zach both work on Toast’s Scale team, building shared infrastructure and solving problems of messaging, routing and persistence at scale. Kai Levy Senior Software Engineer Toast Zach Walsh Senior Software Engineer Toast
  • 3. Agenda Toast’s microservice ecosystem + Pulsar Blue/green deployments at Toast Driving Pulsar adoption Our Envoy Proxy control plane “The Pulsar Toggle”
  • 4. We empower the restaurant community to delight their guests, do what they love, and thrive
  • 6. Toast’s microservice ecosystem How it started How it’s going
  • 7. How it’s going (with Pulsar)
  • 8. 2018 Asynchronous messaging with RabbitMQ ● Order syncing between devices ● Change Data Capture (CDC) A History of Pulsar at Toast
  • 9. 2018 Asynchronous messaging with RabbitMQ ● Order syncing between devices ● Change Data Capture (CDC) A History of Pulsar at Toast 2019 Pulsar pilot ● Initial exploration & testing ● Cluster productionalization ● First features, such as migrating change data capture
  • 10. Persistence & Stability Seamless Pulsar failover ● RabbitMQ: potential stability issues + in-memory data-storage = lost messages ○ Manual maintenance was a big burden ● Pulsar’s data replication & automatic topic balancing eliminated these concerns
  • 11. Horizontal Scalability ● Supports adding more topics without manual provisioning ● Throughput has grown more than 5x without any change in architecture [Diagram: broker 0, broker 1, broker 2, broker 3, … broker n]
  • 12. 2018 Asynchronous messaging with RabbitMQ ● Order syncing between devices ● Change Data Capture (CDC) A History of Pulsar at Toast 2019 Pulsar pilot ● Initial exploration & testing ● Cluster productionalization ● First features, such as migrating change data capture 2020 Full-fledged adoption ● Teams across Toast rapidly built features on top of Pulsar to help restaurants survive the pandemic ● Decorated streams built on Pulsar, which enabled more scalable consumers
  • 13. Full-fledged adoption [Diagram: Domain service (Source of Truth), CDC, notify-topic, service1, service2, service3, … serviceN]
  • 14. Full-fledged adoption [Diagram: Domain service (Source of Truth), CDC, notify-topic, data decorator service, decorated-stream, service1, … serviceN]
  • 15. Full-fledged adoption ● Order status notifications: delivery & curbside arrival notifications for consumers, helping restaurants pivot to digital ● Tip pool tracking: tip pooling information is kept up-to-date with order information ● Loyalty points accrual: consumer-facing loyalty programs help Toast restaurants thrive ● Restaurant availability: third-party platforms are notified when a restaurant goes offline
  • 16. 2018 Asynchronous messaging with RabbitMQ ● Order syncing between devices ● Change data capture (CDC) A History of Pulsar at Toast 2019 Pulsar pilot ● Initial exploration & testing ● Cluster productionalization ● First features, such as migrating change data capture 2020 Full-fledged adoption ● Teams across Toast rapidly built features on top of Pulsar to help restaurants survive the pandemic ● Decorated streams built on Pulsar, which enabled more scalable consumers 2022 Next-gen order processing ● Critical replatforming projects in development will help Toast reach the next level of scale ● Event-driven architecture being widely used for new features
  • 17. Agenda Toast’s microservice ecosystem + Pulsar Blue/green deployments at Toast Driving Pulsar adoption Our Envoy Proxy control plane “The Pulsar Toggle”
  • 18. Pulsar adoption has grown steadily [Chart: user adoption (linear)]
  • 19. Toast client libraries: Providing Toast-specific functionality for free 1. Out-of-the-box authentication 2. Dead-letter topic guidance (+ topic registries) 3. Metric instrumentation 4. Message parsing 5. Pulsar client configuration
  • 20. Authentication & authorization ● Automatic service authentication provided by the client libraries ○ Easy to use with any of our supported application frameworks ● Contributed a patch into the public Java client library
  • 21. Dead-Letter Topics ● Standards for undeliverable messages ○ Per-subscription DLQs, or automatic acknowledgement after redelivery ○ Integrated with service configuration
  • 22. Topic registries with Terraform: Developers write infrastructure as code ● Started with in-house provider ○ Now migrating to StreamNative provider ● Lets us manage namespace authorization ● Provides defaults for retention & persistence ● Central place for discovering events
  • 23. Metrics ● Automatically report over 2 dozen metrics ○ Consistent across services ● Critical for operations & monitoring ● Added our own custom metrics ● Adding APM integrations [Metrics shown: ackLatency, ackTimeouts, auto-acknowledgements]
  • 24. Message Parsing We parse Protobuf messages into friendly Kotlin data classes ● Our open-source, Kotlin-first protocol buffer compiler ● One-line usage for engineers building on our client
  • 25. Configuration recommendations Providing guidance around client settings ● Producer batching ● Acknowledgement timeout ● Receiver queue size ● Redelivery delay ● Unique consumer & producer names Starting Pulsar consumer status recorder with config: { "topicNames" : [ "persistent://…" ], "topicsPattern" : null, "subscriptionName" : "...", "subscriptionType" : "Shared", "subscriptionMode" : "Durable", "receiverQueueSize" : 1000, "acknowledgementsGroupTimeMicros" : 100000, "negativeAckRedeliveryDelayMicros" : 500000, "maxTotalReceiverQueueSizeAcrossPartitions" : 50000, "consumerName" : null, "ackTimeoutMillis" : 30000, "tickDurationMillis" : 1000, "priorityLevel" : 0, "maxPendingChuckedMessage" : 10, "autoAckOldestChunkedMessageOnQueueFull" : false, "expireTimeOfIncompleteChunkedMessageMillis" : 60000, "cryptoFailureAction" : "FAIL", "properties" : { }, "readCompacted" : false, "subscriptionInitialPosition" : "Latest", "patternAutoDiscoveryPeriod" : 60, "regexSubscriptionMode" : "PersistentOnly",
  • 26. But something is still missing…
  • 27. Agenda Toast’s microservice ecosystem + Pulsar Blue/green deployments at Toast (the problem) Driving Pulsar adoption Our Envoy Proxy control plane “The Pulsar Toggle”
  • 28. Deployment and elevation practices [Diagram: HTTP ingress, control plane, service v1, service v2]
  • 36. Deploying changes to Pulsar consumers is risky [Diagram: shared pulsar subscription; service v1, service v2, service v1, service v2]
  • 37. Mismatch in tooling Our platform for request-driven service deploys was well ahead of our Pulsar platform, causing developer frustration
  • 39. Principle of least surprise “In interface design, always do the least surprising thing.” - Basics of the Unix Philosophy
  • 40. Elevations & deploys should be safe, easy, uneventful!
  • 41. Agenda Toast’s microservice ecosystem + Pulsar Blue/green deployments at Toast (the solution) Driving Pulsar adoption Our Envoy Proxy control plane “The Pulsar Toggle”
  • 42. Pulsar operational tooling: Elevations & deploys weren’t easy on Pulsar. Contrast: REST services & Pulsar (in 2019) ● Can I validate my deploy before prod traffic? REST services ✅, Pulsar consumers ❌ ● Can I validate with a small amount of prod traffic? REST services ✅, Pulsar consumers ❌ ● Can I easily roll back? REST services ✅, Pulsar consumers ❌ ● Can I easily roll forward? REST services ✅, Pulsar consumers ❌
  • 43. Pulsar Consumer Elevation Requirements 1. Elevate traffic to new consumers as they are set to “active” in the control plane. 2. Avoid building a single point of failure. 3. Make this reusable for other background processes at Toast. 4. No performance hit or extra infrastructure.
  • 44. Some options we considered: Message Router Pattern [Diagram: incoming topic, Router, Control Plane, blue topic, green topic, Deploy N, Deploy N + 1]
  • 45. Some options we considered: Message Router Pattern - Problems ● But, the router is a single point of failure ● More infrastructure to monitor ● Two hops per message [Diagram: incoming topic, Router, Control Plane, blue topic, green topic, Deploy N, Deploy N + 1]
  • 46. Some options we considered: Feature Flags ● Apps use a feature flag to know whether to connect ● But, not integrated with our control plane ● Requires more setup for each consumer [Diagram: incoming topic, Deploy N, Deploy N + 1, FF Off, FF On]
  • 47. Some options we considered: Pausing Inactive Consumers ● The Feature Flag approach is close ○ No extra infrastructure ○ No extra hops ● But, we’d need to integrate it into our control plane ● Is this possible with Pulsar? [Diagram: incoming topic, Deploy N, Deploy N + 1, inactive, active]
  • 48. Let’s see what the Pulsar source code has to say about pausing consumers. What does Pulsar provide? In Consumer.java:
  • 49. Will pause() and resume() work? What do operations look like if inactive consumers call pause()? ● Can I validate my deploy before prod traffic? Pulsar consumers ❌, Pulsar consumers with pause() ✅ ● Can I validate with a small amount of prod traffic? Pulsar consumers ❌, Pulsar consumers with pause() ❌ ● Can I easily roll back? Pulsar consumers ❌, Pulsar consumers with pause() ✅ ● Can I easily roll forward? Pulsar consumers ❌, Pulsar consumers with pause() ✅
  • 50. How Would You Solve This? How do we get each consumer to call pause() or resume() at the right time? ● Pausing Pulsar consumers is easy. Knowing when to pause is hard. ● Central control plane component owns this data ● Let’s just poll that service ● What would that look like? [Diagram: control plane, service Z]
  • 51. What’s Wrong With This? ● Used to be the pattern for service discovery at Toast ● Subject to thundering herd ● Now, we leverage Envoy [Diagram: control plane, service Z]
  • 52. Agenda Toast’s microservice ecosystem + Pulsar Blue/green deployments at Toast Driving Pulsar adoption Our Envoy Proxy control plane “The Pulsar Toggle”
  • 53. How We Leverage Envoy Envoy at Toast
  • 54. Envoy is a reverse proxy: Deployed as a sidecar, forwards requests to their destination. Envoy acts as a proxy, forwarding requests upstream. [Diagram: my-service, envoy, menus; GET /menus/v2/menuItems, GET /v2/menuItems]
  • 55. Envoy is eventually consistent: Routing changes are pushed asynchronously. Envoy sidecars across the fleet are pushed updates within ~1-2min of the change. [Diagram: Control Plane, …]
  • 56. Envoy knows service status: It gets a push each time any deploy goes active or inactive. We can leverage this to pause() or resume() consumers.
  • 57. Envoy direct responses: Using an interesting Envoy feature to avoid single points of failure. It can intercept requests and reply with a direct response! This gets the status info into the process where the Consumer is running. [Diagram: my-service, envoy (*magic config*); GET /sidecar/v1/elevation/active returns { "active": true }]
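
As a rough illustration of slide 57, the status check from inside the service process might look like the Java sketch below. The request path and the `{ "active": true }` body come from the slide. The class name `ElevationStatusClient`, the base URL passed to the constructor, the one-second timeouts, and the regex-based parsing are all assumptions made for this example, not Toast's implementation.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch (hypothetical class): ask the local Envoy sidecar whether this deploy is active. */
final class ElevationStatusClient {
    // Matches the "active" boolean in a body such as { "active": true }.
    private static final Pattern ACTIVE =
            Pattern.compile("\"active\"\\s*:\\s*(true|false)");

    private final HttpClient http = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(1))
            .build();
    private final URI statusUri;

    ElevationStatusClient(String envoyBaseUrl) {
        // The path is taken from the slide; the base URL is an assumption.
        this.statusUri = URI.create(envoyBaseUrl + "/sidecar/v1/elevation/active");
    }

    /** Extracts the "active" flag from the body; throws if the field is missing. */
    static boolean parseActive(String body) {
        Matcher m = ACTIVE.matcher(body);
        if (!m.find()) {
            throw new IllegalArgumentException("no \"active\" field in: " + body);
        }
        return Boolean.parseBoolean(m.group(1));
    }

    /** Sends the GET to the sidecar and returns the parsed flag. */
    boolean isActive() throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(statusUri)
                .timeout(Duration.ofSeconds(1))
                .GET()
                .build();
        HttpResponse<String> response =
                http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("unexpected status " + response.statusCode());
        }
        return parseActive(response.body());
    }
}
```

Because the sidecar answers locally, a call such as `new ElevationStatusClient("http://127.0.0.1:8080").isActive()` (address hypothetical) never leaves the host, which is what keeps the central control plane off the request path.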
  • 58. Agenda Toast’s microservice ecosystem + Pulsar Blue/green deployments at Toast Driving Pulsar adoption Our Envoy Proxy control plane “The Pulsar Toggle”
  • 60. “Pulsar Toggle” implementation: Leveraging our Envoy Control Plane to toggle Pulsar consumers. A thread polls the locally-running Envoy instance and toggles the Pulsar consumer as needed.
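
A minimal sketch of the polling thread described on slide 60, assuming nothing beyond what the slides state. `Pausable` is a hypothetical stand-in for the `pause()` and `resume()` methods of the Pulsar Java client's `Consumer`, and `ConsumerToggle`, its constructor arguments, and the choice to keep the last known state on a failed poll are illustrative, not the actual Toast library.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for the two consumer methods the toggle needs. */
interface Pausable {
    void pause();
    void resume();
}

/** Sketch: poll a status source and pause or resume a consumer when it changes. */
final class ConsumerToggle implements AutoCloseable {
    private final Pausable consumer;
    private final BooleanSupplier activeCheck;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "consumer-toggle");
                t.setDaemon(true); // do not keep the JVM alive just for polling
                return t;
            });
    private boolean paused;

    ConsumerToggle(Pausable consumer, BooleanSupplier activeCheck, boolean startPaused) {
        this.consumer = consumer;
        this.activeCheck = activeCheck;
        this.paused = startPaused; // must match the consumer's real initial state
    }

    /** One poll: call pause() or resume() only when the state actually changes. */
    synchronized void pollOnce() {
        boolean active;
        try {
            active = activeCheck.getAsBoolean();
        } catch (RuntimeException e) {
            return; // assumption: keep the last known state if the check fails
        }
        if (active && paused) {
            consumer.resume();
            paused = false;
        } else if (!active && !paused) {
            consumer.pause();
            paused = true;
        }
    }

    /** Starts polling in the background at the given interval. */
    void start(long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(this::pollOnce, 0, period, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

In a real service the `BooleanSupplier` would wrap the HTTP call to the local sidecar and the `Pausable` would delegate to the Pulsar consumer. Acting only on state changes keeps each poll cheap, since most polls see no change.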
  • 61. Some “gotchas” ● Eventually consistent: Consumers don’t pause immediately; updates propagate with some latency ● Start paused: There wasn’t a way to subscribe in a paused state, so we made a patch to the Java client ● More advanced elevation patterns: Currently we can’t support percent elevations of Pulsar traffic onto new deploys ● Receiver queue size: Critically important to tune this parameter of consumers
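
The slides do not say why the receiver queue size matters. One likely reason, stated here as an assumption, is that pausing only stops a consumer from requesting more messages, so anything already buffered locally can still be handed to the application after the deploy goes inactive. The toy model below is not the Pulsar client; `PrefetchingConsumerModel` and all of its methods are hypothetical and exist only to show how that tail scales with the queue size.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a prefetching consumer (hypothetical; not the Pulsar client). */
final class PrefetchingConsumerModel {
    private final Deque<Integer> receiverQueue = new ArrayDeque<>();
    private final int receiverQueueSize;
    private boolean paused;
    private int nextMessageId;

    PrefetchingConsumerModel(int receiverQueueSize) {
        this.receiverQueueSize = receiverQueueSize;
    }

    void pause() { paused = true; }

    void resume() { paused = false; }

    /** Models the broker pushing messages until the local queue is full. */
    void prefetch() {
        while (!paused && receiverQueue.size() < receiverQueueSize) {
            receiverQueue.add(nextMessageId++);
        }
    }

    /** Returns the next buffered message, or null when the buffer is empty. */
    Integer receive() {
        prefetch();
        return receiverQueue.poll();
    }

    /** Counts the messages still delivered after pause() is called on a full queue. */
    static int deliveredAfterPause(int receiverQueueSize) {
        PrefetchingConsumerModel consumer = new PrefetchingConsumerModel(receiverQueueSize);
        consumer.prefetch(); // buffer fills while the deploy is still active
        consumer.pause();    // deploy goes inactive
        int delivered = 0;
        while (consumer.receive() != null) {
            delivered++;
        }
        return delivered;
    }
}
```

In this model the number of messages handled after the pause equals the queue size, so a smaller receiver queue shortens the tail of traffic that an inactive deploy still processes.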
  • 62. Results ● ~30 Toggle users in Prod, across Pulsar consumers & background workers ● 0 Outages: no added load on any critical systems ● 2 Contributions to open source: the Java client & the Camel integration
  • 63. Increased adoption ● 2x New topics: Developers are adding topics at twice the rate since the Pulsar toggle was released [Chart: user adoption (linear)]
  • 64. Users Love it! Positive feedback from satisfaction surveys with our users ● 65% Increase in reported ease of use when deploying Pulsar consumer changes ● 46% Decrease in reported risk associated with deploying Pulsar consumer changes
  • 65. Key Takeaways ● Integration: Strong integration with existing systems is critical for org-wide adoption. ● Ease of Use: As we make our Pulsar platform easier to use, we see more and more adoption. ● Stability: Pulsar’s stability through big growth has been a killer feature for us.
  • 66. Thank you! Kai Levy & Zach Walsh, klevy@toasttab.com, zachary.walsh@toasttab.com. We’re Hiring! careers.toasttab.com