Lessons Learned from 2000
event-driven microservices
natansil.com twitter@NSilnitsky linkedin/natansilnitsky github.com/natansil
Natan Silnitsky Backend Infra TL, Wix
May 2023
Lessons Learned from 2000 Event-driven Microservices @NSilnitsky
~1B unique visitors use the Wix platform every month
~500B daily HTTP transactions
~70B Kafka messages a day
> 600 GAs every day
2500 microservices in production
* scale, resilience, issues
Challenges of event-driven architecture that we’ve bumped into
1 Producing message failures
2 Processing out-of-order & duplicates
3 Sending large payloads
4 Troubleshooting production
* success, tools, faster
How Event-driven Architecture Works
Service-to-Service Communication
[Diagram: Cart Service, User Service, Inventory Service, Catalog Service]
Request-Reply Communication
[Diagram: Cart Service, User Service, Inventory Service, and Catalog Service, connected by HTTP RPC calls]
* issue scale
Request-Reply: slow
[Diagram: Cart Service with three HTTP RPC calls]
* slow, bottleneck, cache
Request-Reply: unreliable
[Diagram: Cart Service with three HTTP RPC calls]
* unreliable, cascade, retr
Event-driven Communication
[Diagram: Catalog Service (producer) sends an event to the Product Updated topic on the broker]
Brokers shown: Kafka, Azure Service Bus, Azure Event Hubs, RabbitMQ
* improve, broker, scale
Event-driven: more robust
[Diagram: Catalog Service (producer) → Product Updated topic on the Kafka broker → Cart Service (consumer)]
* DB, decoupling, no impact
Event processing is guaranteed
[Diagram: Catalog Service (producer) → Product Updated topic on the Kafka broker → Cart Service (consumer)]
The following is based on a true story
*Dates and products were changed for clarity :)
* ecom simple linear
2016: Wix starts using event-driven
We can work event-driven!!
It all began when Ecom experienced data issues
Data does NOT reflect actual catalog
Risk: show wrong prices in cart
[Diagram: Cart DB]
After investigating
1. Update status
2. Produce “Product Updated” event
3. Update product price
4. Show updated prices in cart
[Diagram: Catalog Service → Broker → Cart Service → Cart DB]
Challenge #1
Producing message failure
Kafka
Make DB Update & Event Producing Atomic
[Diagram: Catalog Service → Broker → Cart Service]
Catch Unsent Events
[Diagram: Catalog Service's Resilient Producer → Broker; unsent events are produced to S3]
Fallback to S3 and Heal
[Diagram: Catalog Service's Resilient Producer → Broker; unsent events are produced to S3; Healer Service polls S3 and produces to Kafka]
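The fallback-and-heal flow on this slide can be sketched roughly as follows. This is a minimal in-memory illustration under stated assumptions, not Greyhound's actual implementation: `FlakyBroker`, `FallbackStore`, `ResilientProducer`, and `heal` are hypothetical names, and a plain list stands in for the S3 bucket.

```python
from dataclasses import dataclass, field


class BrokerUnavailable(Exception):
    """Raised by the stand-in broker when a produce call fails."""


@dataclass
class FlakyBroker:
    """Hypothetical stand-in for a Kafka broker that can be toggled offline."""
    available: bool = True
    log: list = field(default_factory=list)

    def produce(self, topic: str, event: dict) -> None:
        if not self.available:
            raise BrokerUnavailable(topic)
        self.log.append((topic, event))


@dataclass
class FallbackStore:
    """Hypothetical stand-in for the S3 bucket that holds unsent events."""
    pending: list = field(default_factory=list)


@dataclass
class ResilientProducer:
    broker: FlakyBroker
    fallback: FallbackStore

    def produce(self, topic: str, event: dict) -> bool:
        """Try the broker first; on failure, park the event in the fallback store."""
        try:
            self.broker.produce(topic, event)
            return True
        except BrokerUnavailable:
            self.fallback.pending.append((topic, event))
            return False


def heal(broker: FlakyBroker, fallback: FallbackStore) -> int:
    """One polling pass of a healer: re-produce parked events, keep those that still fail."""
    still_pending, healed = [], 0
    for topic, event in fallback.pending:
        try:
            broker.produce(topic, event)
            healed += 1
        except BrokerUnavailable:
            still_pending.append((topic, event))
    fallback.pending = still_pending
    return healed


broker, store = FlakyBroker(), FallbackStore()
producer = ResilientProducer(broker, store)

broker.available = False
producer.produce("product-updated", {"productId": "p1", "price": 10})  # parked in fallback
broker.available = True
producer.produce("product-updated", {"productId": "p1", "price": 12})  # sent directly
healed_count = heal(broker, store)  # the parked event reaches the broker late
```

In this run the healed event lands on the topic after a newer one, which is the out-of-order risk the talk turns to in Challenge #2.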
Wrap Kafka with Greyhound*
[Diagram: Service A → Greyhound Producer → Kafka Producer → Kafka Broker → Kafka Consumer → Greyhound Consumer → Service B]
* Open source: https://github.com/wix/greyhound
Wrap Kafka with Greyhound*
Developer Self-Service:
➔ Resilient Producer
➔ Parallel Consumption
➔ Batch Consumer
➔ Consumer Retry Strategies
➔ Context Propagation
➔ Metrics reporting
* Open source: https://github.com/wix/greyhound
2016: Wix starts using event-driven
2018: Greyhound resilient producer & consumer retries
Then ‘out-of-order’ happened
[Diagram: Catalog Service and Healer Service → Broker → Cart Service; events: Remove Discount, Introduce Discount]
Challenge #2
Out-of-order & duplicates processing
Kafka
Mitigating out-of-order with revision ID
[Diagram: Catalog Service and Healer Service → Broker → Cart Service; Remove Discount and Introduce Discount events carry revisions #11, #10, #9]
* item itself
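One way to read the revision ID idea is a consumer that remembers the highest revision it has applied per item and ignores anything older. The sketch below assumes that reading; `CartPriceView`, its field names, and the prices are hypothetical and only illustrate the check.

```python
from dataclasses import dataclass, field


@dataclass
class CartPriceView:
    """Hypothetical consumer-side view that keeps the latest revision seen per product."""
    revisions: dict = field(default_factory=dict)
    prices: dict = field(default_factory=dict)

    def apply(self, product_id: str, revision: int, price: float) -> bool:
        """Apply the event only if its revision is newer than the stored one."""
        if revision <= self.revisions.get(product_id, -1):
            return False  # stale or duplicate event: ignore it
        self.revisions[product_id] = revision
        self.prices[product_id] = price
        return True


view = CartPriceView()
view.apply("p1", 9, 100.0)                      # revision #9: original price
applied_remove = view.apply("p1", 11, 100.0)    # #11 "remove discount" arrives first
applied_introduce = view.apply("p1", 10, 80.0)  # #10 "introduce discount" arrives late
```

The late revision #10 is dropped, so the cart keeps the state from revision #11. The same comparison also rejects an exact duplicate of an already applied revision.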
Mitigating out-of-order with Debezium connector
Scan the binlog. For each entry produce a ‘status update’ event
[Diagram: Catalog Service → Broker → Cart Service]
More Ecom data issues
Data does NOT reflect actual inventory
Risk: lose potential customers
[Diagram: Inventory DB]
Investigation leads to duplicate processing
[Diagram: Payments Service → Broker → Inventory Service, with a retry; payment for Item 1 and Item 2]
Item 1: 9 → 7
Item 2: 5 → 3
* not idempotent
Mitigating duplicates with Transaction ID
[Diagram: Payments Service → Broker → Inventory Service; the payment for Item 1 and Item 2 carries txnId - a7g45]
Item 1: 9 → 8
Item 2: 5 → 4
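A minimal sketch of the transaction ID check, assuming the consumer records every txnId it has already processed. The `Inventory` class and its method names are hypothetical; the stock numbers and txnId are the ones on the slide.

```python
class Inventory:
    """Hypothetical inventory consumer that deduplicates payments by transaction ID."""

    def __init__(self, stock: dict):
        self.stock = dict(stock)
        self.seen_txn_ids = set()

    def on_payment(self, txn_id: str, items: list) -> bool:
        """Decrement stock once per transaction; a redelivered event is a no-op."""
        if txn_id in self.seen_txn_ids:
            return False  # already processed: skip the duplicate
        for item in items:
            self.stock[item] -= 1
        self.seen_txn_ids.add(txn_id)
        return True


inventory = Inventory({"Item 1": 9, "Item 2": 5})
first = inventory.on_payment("a7g45", ["Item 1", "Item 2"])
retry = inventory.on_payment("a7g45", ["Item 1", "Item 2"])  # redelivery after a retry
```

With the check in place the retry leaves stock at 8 and 4 instead of 7 and 3. In a real service the stock update and the record of seen IDs would presumably need to be persisted together, which this in-memory sketch does not show.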
2016: Wix starts using event-driven
2018: Greyhound resilient producer & consumer retries
2019: Revisions & Transaction IDs
“Dude, I can’t produce large payloads”
[Diagram: Product Catalog Service → Product Update event → Broker → Cart Service]
...
"description": "An apple mobile which is nothing like apple",
...
Challenge #3
Failure to send large payloads
* 1MB
Large Payloads Remedy I: Compression
→ Try several compression types (lz4, snappy, etc.)
→ Compression on Kafka level is usually better than application level, as payloads can be compressed in batches
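The batching point can be illustrated with a small size comparison. This sketch uses `zlib` from the Python standard library as a stand-in codec (lz4 and snappy need third-party packages), and the message shape is made up. On a real Kafka producer the codec is chosen with the `compression.type` setting.

```python
import json
import zlib

# 100 small, similar messages, as a producer batch might contain.
messages = [
    json.dumps({"productId": i, "description": "An apple mobile which is nothing like apple"}).encode()
    for i in range(100)
]

# Application-level style: every message is compressed on its own.
per_message_size = sum(len(zlib.compress(m)) for m in messages)

# Batch style: the whole batch is compressed as one unit.
batch = b"\n".join(messages)
batch_size = len(zlib.compress(batch))
```

Compressing the batch lets the codec reuse patterns that repeat across messages, so `batch_size` comes out far smaller than `per_message_size` for payloads like these.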
Large Payloads Remedy II: Chunking
1. Split to chunks & produce
2. Consume & reassemble
[Diagram: Product Catalog Service → Broker → Cart Service]
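The two chunking steps can be sketched as follows. The chunk envelope (`messageId`, `index`, `total`, `data`) and the `Reassembler` class are assumptions made for illustration, not a format taken from the talk.

```python
import uuid


def split_to_chunks(payload: bytes, chunk_size: int) -> list:
    """Step 1: split a payload into chunks that share one message ID."""
    message_id = str(uuid.uuid4())
    parts = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)] or [b""]
    return [
        {"messageId": message_id, "index": i, "total": len(parts), "data": part}
        for i, part in enumerate(parts)
    ]


class Reassembler:
    """Step 2: buffer chunks per message ID and emit the payload once all have arrived."""

    def __init__(self):
        self._buffers = {}

    def on_chunk(self, chunk: dict):
        parts = self._buffers.setdefault(chunk["messageId"], {})
        parts[chunk["index"]] = chunk["data"]
        if len(parts) < chunk["total"]:
            return None  # still waiting for more chunks
        del self._buffers[chunk["messageId"]]
        return b"".join(parts[i] for i in range(chunk["total"]))


payload = b"x" * 2500
chunks = split_to_chunks(payload, chunk_size=1000)
reassembler = Reassembler()
# Feed the chunks in reverse to show that arrival order does not matter here.
results = [reassembler.on_chunk(c) for c in reversed(chunks)]
```

Keying the buffer by chunk index means a redelivered chunk overwrites itself instead of corrupting the payload.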
Large Payloads Remedy III: Reference to Object Store
1. Upload to S3
2. Produce with S3 URL
3. Consume & download from S3
[Diagram: Product Catalog Service → Broker → Cart Service]
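A rough sketch of the three steps, with an in-memory dictionary standing in for S3. `ObjectStore`, the bucket name, the event fields, and the 1 MB threshold are all assumptions for illustration.

```python
import uuid

MAX_INLINE_BYTES = 1_000_000  # assumed size limit; tune to the broker's configuration


class ObjectStore:
    """Hypothetical in-memory stand-in for S3."""

    def __init__(self):
        self._objects = {}

    def upload(self, data: bytes) -> str:
        url = f"s3://hypothetical-bucket/{uuid.uuid4()}"
        self._objects[url] = data
        return url

    def download(self, url: str) -> bytes:
        return self._objects[url]


def to_event(payload: bytes, store: ObjectStore) -> dict:
    """Producer side: upload oversized payloads and send only the reference."""
    if len(payload) <= MAX_INLINE_BYTES:
        return {"inline": payload}
    return {"payloadUrl": store.upload(payload)}


def from_event(event: dict, store: ObjectStore) -> bytes:
    """Consumer side: follow the reference when the payload is not inline."""
    if "payloadUrl" in event:
        return store.download(event["payloadUrl"])
    return event["inline"]


store = ObjectStore()
large_payload = b"d" * (MAX_INLINE_BYTES + 1)
large_event = to_event(large_payload, store)
small_event = to_event(b"small", store)
```

The event on the broker stays small regardless of payload size, at the cost of an extra download on the consumer side.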
2016: Wix starts using event-driven
2018: Greyhound resilient producer & consumer retries
2019: We use IDs for out-of-order & duplicates
2020: Added compression by default
Challenge #4
It’s hard for developers to debug and maintain event-driven microservices at scale in production
Our team
* bottlenecks
Stream events with various filters
How do I investigate this lag?
Our team
Investigate consumer lag per partition
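Consumer lag for a partition is the distance between the partition's latest offset and the consumer group's committed offset. The sketch below computes it from two plain dictionaries; the offsets are made-up numbers and `lag_per_partition` is a hypothetical helper, not part of the tooling shown in the talk.

```python
def lag_per_partition(end_offsets: dict, committed_offsets: dict) -> dict:
    """Lag = latest offset in the partition minus the consumer group's committed offset."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in end_offsets.items()
    }


# Made-up offsets for a three-partition topic.
end_offsets = {0: 1500, 1: 1480, 2: 9000}
committed_offsets = {0: 1500, 1: 1475, 2: 1200}

lag = lag_per_partition(end_offsets, committed_offsets)
most_lagging = max(lag, key=lag.get)
```

A single partition whose lag keeps growing while the others stay near zero is the usual sign of a “stuck” event in that partition.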
View a “stuck” event in some partition
How come this side-effect didn’t happen?
Our team
Propagate the Context
Event Header: requestId, userId
1. Greyhound produce (Orders Service)
2. Greyhound consume (Payments Service)
3. produce (Payments Service)
4. Greyhound consume (Inventory Service)
[Diagram: Broker with Orders Topic, Payments Topic, and Inventory Topic]
* monitoring infra
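Context propagation across the three services can be sketched with plain dictionaries. The `produce` and `consume` helpers below are hypothetical and do not reflect Greyhound's API; they only show the header being copied from each consumed record onto the next produced one.

```python
CONTEXT_KEYS = ("requestId", "userId")


def produce(topic: str, payload: dict, context: dict, log: list) -> dict:
    """Attach the caller's context to the event headers before producing."""
    headers = {key: context[key] for key in CONTEXT_KEYS if key in context}
    record = {"topic": topic, "headers": headers, "payload": payload}
    log.append(record)
    return record


def consume(record: dict) -> dict:
    """Restore the context from the headers so later produces can carry it on."""
    return dict(record["headers"])


log = []

# 1. Orders Service produces with the original request context.
orders_ctx = {"requestId": "req-1", "userId": "user-42"}
order_record = produce("orders", {"orderId": 7}, orders_ctx, log)

# 2-3. Payments Service consumes, then produces with the same context.
payments_ctx = consume(order_record)
payment_record = produce("payments", {"orderId": 7, "paid": True}, payments_ctx, log)

# 4. Inventory Service consumes and sees the original context.
inventory_ctx = consume(payment_record)
```

Because every record in the chain carries the same requestId, a tool can filter events by that ID to reconstruct the route of a single request.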
So developers can track events’ route
View event details
2016: Wix starts using event-driven
2018: We open source Greyhound
2019: We use IDs for out-of-order & duplicates
2020: Added compression by default
2021-22: Tools in production
Wix developers have embraced event-driven architecture.
Meeting these challenges made our microservices more decoupled, resilient and scalable, while keeping complexity low and data consistent.
The Blog Post
https://medium.com/wix-engineering/event-driven-architecture-5-pitfalls-to-avoid-b3ebf885bdb1
The Next Step
How to migrate 2000 microservices to Multi Cluster Managed Kafka with 0 Downtime
https://www.youtube.com/watch?v=XKbG8a-9NRE
Greyhound
github.com/wix/greyhound
Thank You!
natansil.com twitter@NSilnitsky linkedin/natansilnitsky github.com/natansil
👉 slideshare.net/NatanSilnitsky
Any questions?

DevSum - Lessons Learned from 2000 microservices

  • 1. Lessons Learned from 2000 event-driven microservices natansil.com twitter@NSilnitsky linkedin/natansilnitsky github.com/natansil Natan Silnitsky Backend Infra TL, Wix May 2023
  • 2. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky
  • 3. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky Unique visitors use Wix platform every month ~1B
  • 4. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky Unique visitors use Wix platform every month ~1B Daily HTTP Transactions ~500B Kafka messages a day ~70B
  • 5. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky Unique visitors use Wix platform every month ~1B Daily HTTP Transactions ~500B Kafka messages a day ~70B GAs every day > 600 Microservices in production 2500 * scale, resilience. issues
  • 6. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky Challenges of event-driven architecture, that we’ve bumped into 1 Producing message failures Processing out-of-order & duplicates 2 4 Troubleshooting production 3 Sending large payloads * success, tools, faster
  • 7. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky How Event-driven Architecture Works
  • 8. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky Service-to-Service Communication Cart Service User Service Inventory Service Catalog Service
  • 9. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky Request-Reply Communication HTTP RPC HTTP RPC HTTP RPC Cart Service User Service Inventory Service Catalog Service * issue scale
  • 10. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky slow Cart Service * slow, bottleneck, cache HTTP RPC HTTP RPC HTTP RPC
  • 11. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky unreliable Cart Service * unreliable, cascade, retr HTTP RPC HTTP RPC HTTP RPC
  • 12. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky Event-driven Communication Producer Broker Product Updated Topic Event * improve, broker, scale Catalog Service Kafka Azure Service Bus Azure Event Hubs RabbitMQ
  • 13. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky Broker more robust * DB, decoupling, no impact Cart Service Producer Consumer Kafka Catalog Service Product Updated Topic
  • 14. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky Broker Event processing is guaranteed Producer Consumer Kafka Catalog Service Cart Service Product Updated Topic
  • 15. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky The following is based on a true story *Dates and products were changed for clarity :) * ecom simple linear
  • 16. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky 2016 Wix starts using event-driven We can work event-driven!!
  • 17. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky It all began when Ecom experienced data issues Data does NOT reflect actual catalog Risk: show wrong prices in cart Cart DB
  • 18. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky 2. Produce “Product Updated” Event Broker Cart Service 4. Show updated prices in cart 3. Update Product Price Catalog Service 1. Update status After investigating Cart DB
  • 19. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky Challenge #1 Producing message failure Kafka
  • 20. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky Broker Cart Service Catalog Service Make DB Update & Event Producing Atomic
  • 21. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky Produce event to S3 Broker Catalog Service Resilient Producer Catch Unsent Events
  • 22. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky Produce event to S3 Broker Produce to Kafka Healer Service Catalog Service Poll Resilient Producer Fallback to S3 and Heal
  • 23. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky Kafka Broker Service A Service B Greyhound Producer Kafka Producer Greyhound Consumer Kafka Consumer Wrap Kafka with Greyhound* * Open source: https://github.com/wix/greyhound
  • 24. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky ➔ Resilient Producer ➔ Parallel Consumption ➔ Batch Consumer ➔ Consumer Retry Strategies ➔ Context Propagation ➔ Metrics reporting Developer Self-Service: Wrap Kafka with Greyhound* * Open source: https://github.com/wix/greyhound
  • 25. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky 2016 Wix starts using event-driven 2018 Greyhound Resilient producer & Consumer retries
  • 26. Lessons Learned from 2000 Event-driven Microservices @NSilnitsky Produce event to S3 Broker Produce to Kafka Healer Service Catalog Service Poll Resilient Producer Fallback to S3 and Heal
  • 27. Then ‘out-of-order’ happened. Diagram labels: Catalog Service, Broker, Healer Service, Cart Service. Events: Introduce Discount, Remove Discount.
  • 28. Challenge #2: Out-of-order & duplicates processing. Kafka
  • 29. Mitigating out-of-order with revision ID. Diagram labels: Catalog Service, Broker, Healer Service, Cart Service. Event: Introduce Discount. Revisions: #9, #10.
  • 30. Mitigating out-of-order with revision ID. Diagram labels: Catalog Service, Broker, Healer Service, Cart Service. Events: Introduce Discount, Remove Discount. Revisions: #9, #10, #11. Note: item itself.
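A minimal sketch of the revision-ID idea, assuming the consumer keeps the highest revision it has applied per item and drops anything older. `RevisionGuard` and its fields are hypothetical names, not taken from the talk.

```python
from typing import Any, Dict


class RevisionGuard:
    """Applies an update only if its revision is newer than the last one applied."""

    def __init__(self) -> None:
        self._latest: Dict[str, int] = {}   # item id -> highest revision applied
        self.state: Dict[str, Any] = {}     # item id -> current value

    def apply(self, item_id: str, revision: int, value: Any) -> bool:
        """Returns True if the update was applied, False if it was stale."""
        if revision <= self._latest.get(item_id, -1):
            return False  # stale or duplicate revision: ignore it
        self._latest[item_id] = revision
        self.state[item_id] = value
        return True
```

With the slide's numbers, if "Remove Discount" (#11) is consumed before a late "Introduce Discount" (#10), the late event is rejected and the item keeps the state from revision #11.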
  • 31. Mitigating out-of-order with Debezium connector: scan the binlog and, for each entry, produce a ‘status update’ event. Diagram labels: Catalog Service, Broker, Cart Service.
  • 32. More Ecom data issues: data does NOT reflect actual inventory. Risk: lose potential customers. Diagram label: Inventory DB.
  • 33. Investigation leads to duplicate processing. Diagram labels: Payments Service, Broker, Inventory Service, Retry. Payment for: Item 1, Item 2. Inventory: Item 1 9 → 7, Item 2 5 → 3. Note: not idempotent.
  • 34. Mitigating duplicates with Transaction ID. Diagram labels: Payments Service, Broker, Inventory Service. Payment for: Item 1, Item 2, txnId a7g45. Inventory: Item 1 9 → 8, Item 2 5 → 4.
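The transaction-ID remedy can be sketched as an idempotent handler that remembers which transactions it has already processed. The `Inventory` class and `handle_payment` method are hypothetical; a real service would presumably persist the seen IDs in the same database transaction as the stock update.

```python
from typing import Dict, Iterable, Set


class Inventory:
    """Decrements stock once per transaction ID, however often the event is retried."""

    def __init__(self, stock: Dict[str, int]) -> None:
        self.stock = dict(stock)
        self._seen_txn_ids: Set[str] = set()

    def handle_payment(self, txn_id: str, items: Iterable[str]) -> bool:
        """Returns True if stock was updated, False if txn_id was already processed."""
        if txn_id in self._seen_txn_ids:
            return False  # duplicate delivery: skip the side effect
        for item in items:
            self.stock[item] -= 1
        self._seen_txn_ids.add(txn_id)
        return True
```

Replaying the payment event with `txnId` a7g45 leaves the stock at 9 → 8 and 5 → 4, as on slide 34, instead of the double decrement shown on slide 33.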
  • 35. Timeline. 2016: Wix starts using event-driven. 2018: Greyhound resilient producer & consumer retries. 2019: Revisions & Transaction IDs.
  • 36. “Dude, I can’t produce large payloads”. Diagram labels: Product Catalog Service, Broker, Cart Service, Product Update event. Payload excerpt: ... "description": "An apple mobile which is nothing like apple", ...
  • 37. Challenge #3: Failure to send large payloads. Note: 1MB. Diagram label: Broker. Payload excerpt: ... "description": "An apple mobile which is nothing like apple", ...
  • 38. Large Payloads Remedy I: Compression → Try several compression types (lz4, snappy, etc.) → Compression on Kafka level is usually better than application level, as payloads can be compressed in batches
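The batching argument can be illustrated with a small size comparison. This sketch uses Python's standard-library `zlib` as a stand-in for lz4 or snappy (an assumption made only to keep the example dependency-free), and the sample messages are invented. In Kafka itself, producer-level compression is selected with the `compression.type` producer setting.

```python
import json
import zlib
from typing import List


def per_message_size(messages: List[bytes]) -> int:
    """Total bytes when each message is compressed on its own (application level)."""
    return sum(len(zlib.compress(m)) for m in messages)


def batched_size(messages: List[bytes]) -> int:
    """Bytes when the whole batch is compressed together (producer/batch level)."""
    return len(zlib.compress(b"\n".join(messages)))


# Hypothetical product-update events that share most of their structure.
messages = [
    json.dumps({"productId": i,
                "description": "An apple mobile which is nothing like apple"}).encode()
    for i in range(100)
]
raw_size = sum(len(m) for m in messages)
```

Because similar events repeat field names and values, compressing them as one batch lets the compressor reuse that redundancy across messages, which per-message compression cannot do.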
  • 39. Large Payloads Remedy II: Chunking. 1. Split to chunks & produce. 2. Consume & reassemble. Diagram labels: Product Catalog Service, Broker, Cart Service.
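The two chunking steps might be sketched as follows. The chunk envelope (`message_id`, `index`, `total`, `data`) and the `Reassembler` class are assumptions for illustration, not a format described in the talk.

```python
import uuid
from typing import Dict, List, Optional


def split_to_chunks(payload: bytes, chunk_size: int) -> List[dict]:
    """Step 1: split a payload into chunks that each carry reassembly metadata."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    message_id = uuid.uuid4().hex
    parts = [payload[i:i + chunk_size]
             for i in range(0, len(payload), chunk_size)] or [b""]
    return [{"message_id": message_id, "index": i, "total": len(parts), "data": part}
            for i, part in enumerate(parts)]


class Reassembler:
    """Step 2: collect chunks per message and return the payload once complete."""

    def __init__(self) -> None:
        self._pending: Dict[str, Dict[int, bytes]] = {}

    def accept(self, chunk: dict) -> Optional[bytes]:
        parts = self._pending.setdefault(chunk["message_id"], {})
        parts[chunk["index"]] = chunk["data"]  # a duplicate chunk just overwrites
        if len(parts) < chunk["total"]:
            return None  # still waiting for more chunks
        del self._pending[chunk["message_id"]]
        return b"".join(parts[i] for i in range(chunk["total"]))
```

Keying chunks by index means the consumer tolerates chunks that arrive out of order or more than once, which ties back to Challenge #2.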
  • 40. Large Payloads Remedy III: Reference to Object Store. 1. Upload to S3. 2. Produce with S3 URL. 3. Consume & download from S3. Diagram labels: Product Catalog Service, Broker, Cart Service.
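A sketch of the three steps, with an in-memory dictionary standing in for S3. `InMemoryObjectStore`, the bucket name in the URL, and the event shape are hypothetical; a real implementation would use an S3 client instead.

```python
import uuid
from typing import Dict


class InMemoryObjectStore:
    """Assumption: a stand-in for S3 that maps a URL to the stored bytes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def upload(self, data: bytes) -> str:
        url = f"s3://hypothetical-bucket/{uuid.uuid4().hex}"
        self._objects[url] = data
        return url

    def download(self, url: str) -> bytes:
        return self._objects[url]


def produce_with_reference(store: InMemoryObjectStore, payload: bytes) -> dict:
    """Steps 1 and 2: upload the payload, then build a small event holding its URL."""
    return {"payload_url": store.upload(payload)}


def consume(store: InMemoryObjectStore, event: dict) -> bytes:
    """Step 3: resolve the URL carried by the event back into the payload."""
    return store.download(event["payload_url"])
```

The event that travels through the broker stays small regardless of payload size; the trade-off is an extra round trip to the object store on both sides.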
  • 41. Timeline. 2016: Wix starts using event-driven. 2018: Greyhound resilient producer & consumer retries. 2019: We use IDs for out-of-order & duplicates. 2020: Added compression by default.
  • 42. Challenge #4: It’s hard for developers to debug and maintain event-driven microservices at scale in production. Note: bottlenecks.
  • 44. Stream events with various filters
  • 45. “How do I investigate this lag?” (our team)
  • 46. Investigate consumer lag per partition
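Consumer lag per partition is the gap between the partition's latest offset and the offset the consumer group has committed. The helper below is a minimal sketch of that arithmetic; the function names are hypothetical and the offsets are made-up sample values, not data from the Wix tooling.

```python
from typing import Dict


def lag_per_partition(end_offsets: Dict[int, int],
                      committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Lag = end offset minus committed offset, per partition.

    Assumption: a partition with no committed offset is treated as
    committed at 0, so its lag equals its end offset.
    """
    return {partition: max(end - committed_offsets.get(partition, 0), 0)
            for partition, end in end_offsets.items()}


def most_lagging_partition(lags: Dict[int, int]) -> int:
    """Returns the partition with the largest lag, a natural place to start looking."""
    return max(lags, key=lambda partition: lags[partition])
```

A partition whose lag keeps growing while the others stay near zero is a likely candidate for the "stuck" event case.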
  • 47. View a “stuck” event in some partition
  • 48. “How come this side-effect didn’t happen?” (our team)
  • 49. Propagate the Context. 1. Greyhound produce. Diagram labels: Orders Service, Broker (Payments Topic, Orders Topic, Inventory Topic). Event header: requestId, userId. Note: monitoring infra.
  • 50. Propagate the Context. 2. Greyhound consume. 3. produce. Diagram labels: Payments Service, Broker (Payments Topic, Orders Topic, Inventory Topic).
  • 51. Propagate the Context. 4. Greyhound consume. Diagram labels: Inventory Service, Broker (Payments Topic, Orders Topic, Inventory Topic).
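The produce/consume steps on slides 49 to 51 can be sketched with plain dictionaries as event headers. This is not Greyhound's API: `produce`, `consume`, the list standing in for the broker, and the topic names used in the usage example are assumptions. Only the `requestId` and `userId` header names come from the slide.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, Dict[str, str], bytes]  # (topic, headers, payload)
CONTEXT_KEYS = ("requestId", "userId")


def produce(broker: List[Record], topic: str, payload: bytes,
            context: Dict[str, str]) -> None:
    """Copies the caller's context into event headers before producing."""
    headers = {key: context[key] for key in CONTEXT_KEYS if key in context}
    broker.append((topic, headers, payload))


def consume(record: Record) -> Dict[str, str]:
    """Rebuilds the context from event headers so the handler can pass it on."""
    _topic, headers, _payload = record
    return {key: headers[key] for key in CONTEXT_KEYS if key in headers}


# Usage: an orders handler produces, a payments handler consumes and re-produces.
broker: List[Record] = []
produce(broker, "orders", b"order created", {"requestId": "r-1", "userId": "u-7"})
payments_context = consume(broker[0])
produce(broker, "payments", b"payment accepted", payments_context)
inventory_context = consume(broker[1])
```

Because every hop re-attaches the same `requestId`, events produced by different services can be correlated into a single route when troubleshooting.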
  • 52. So developers can track events’ route
  • 53. View event details
  • 54. Timeline. 2016: Wix starts using event-driven. 2018: We open source Greyhound. 2019: We use IDs for out-of-order & duplicates. 2020: Added compression by default. 2021-22: Tools in Production.
  • 55. Wix developers have embraced event-driven architecture.
  • 56. Meeting these challenges made our microservices more decoupled, resilient and scalable, while keeping complexity low and data consistent.
  • 57. The Blog Post: https://medium.com/wix-engineering/event-driven-architecture-5-pitfalls-to-avoid-b3ebf885bdb1
  • 58. The Next Step: How to migrate 2000 microservices to Multi Cluster Managed Kafka with 0 Downtime. https://www.youtube.com/watch?v=XKbG8a-9NRE
  • 59. Greyhound: github.com/wix/greyhound
  • 60. Thank You! Any questions? natansil.com twitter@NSilnitsky linkedin/natansilnitsky github.com/natansil 👉 slideshare.net/NatanSilnitsky