SlideShare une entreprise Scribd logo
1  sur  44
Télécharger pour lire hors ligne
bertjan@openvalue.eu
Debugging distributed systems
Bert Jan Schrijver
@bjschrijver
bertjan@openvalue.eu
Debugging distributed systems
Bert Jan Schrijver
@bjschrijver
Networking 101
How the internet works
Why?
Bert Jan Schrijver
L e t ’ s m e e t
@bjschrijver
Why are distributed
systems difficult?
Networking 101
What?
Why?
✅
Demo
War stories
Conclusion
W h a t ‘ s n e x t ?
Outline
A structured approach
@bjschrijver
• Level 1: Non-concurrency
• Level 2: Concurrency
• Level 3: Distribution
The hierarchy of complexity for debugging
Source:	https://maximilianmichels.com/2020/debugging-distributed-systems
What is a distributed system?
A distributed system is a system whose
components are located on different
networked computers, which
communicate and coordinate their actions
by passing messages to one another.
• Concurrency of components
• Lack of a global clock
• Independent failure of components

• Distributed systems are harder to reason
about
Characteristics of distributed systems
Source:	http://www.nasa.gov/images/content/218652main_STOCC_FS_img_lg.jpg
Working with distributed systems is
fundamentally different from writing
software on a single computer - and the
main difference is that there are lots of new
and exciting ways for things to go wrong.
- Martin Kleppmann
“
”
Photo:	Dave	Lehl
Why do things go wrong?
“ ”
Photo:	Dave	Lehl
The fallacies of distributed computing
are a set of assertions made by L Peter
Deutsch and others at Sun Microsystems
describing false assumptions that
programmers new to distributed
applications invariably make.
1. The network is reliable;
2. Latency is zero;
3. Bandwidth is infinite;
4. The network is secure;
5. Topology doesn't change;
6. There is one administrator;
7. Transport cost is zero;
8. The network is homogeneous.
Fallacies of distributed computing
What could possibly go wrong?
“ ”
Photo:	Dave	Lehl
OSI & TCP/IP
Source:	https://www.guru99.com/difference-tcp-ip-vs-osi-model.html
.. in your browser’s address bar and press Enter
What happens when you type google.com…
Source:	https://github.com/alex/what-happens-when
17
In a distributed system, there may well be
some parts of the system that are broken in
some unpredictably way, even though other
parts of the system are working fine…
“
”
Source:	Martin	Kleppmann
“
”
Source:	Martin	Kleppmann
… but in a system with thousands of
nodes, it is reasonable to assume that
something is always broken.
- Martin Kleppmann
Are we done yet?
Source:	https://7216-presscdn-0-76-pagely.netdna-ssl.com/wp-content/uploads/2011/12/confused-man-single-good-men.jpg
Where do I start?
A structured approach
to debugging distributed systems
@bjschrijver
Check DNS & routing
Check connection
Debug client side
Create minimal reproducer
Debug server side
Observe & document
Wrap up & post mortem
Inspect traffic / messages
Step 1: Observe & document
• What do you know about the problem?
• Inspect logging, errors, metrics, tracing
• Draw the path from source to target - what’s
in between? Focus on details!
• Document what you know
• Can we reproduce in a test?
• By injecting errors, for example
Step 1: Observe & document
Step 2: Create minimal reproducer
• Goal: maximise the amount of debugging
cycles
• Focus on short development iterations /
feedback loops
• Get close to the action!
Step 3: Debug client side
• Focus on eliminating anything that could be
wrong on the client side
• Are we connecting to the right host?
• Do we send the right message?
• Do we receive a response?
• Not much different from local debugging
Step 4: Check DNS & routing
• DNS:
• Make sure you know what IP address the
hostname should resolve to
• Verify that this actually happens at the client
• Routing:
• Verify you can reach the target machine
Step 5: Check connection
• Can we connect to the port?
• If not, do we get a REJECT or a DROP?
• Does the connection open and stay open?
• Are we talking TLS?
• What is the connection speed between us?
Step 6: Inspect traffic / messages
• Do we send the right request?
• Do we receive the right response?
• How do we know?
• How do we handle TLS?
• Are there any load balancers or proxies in
between?
Step 7: Debug server side
• Inspect the remote host
• Can we attach a remote debugger?
• See https://youtube.com/OpenValue
• Profiling
• Strace
Step 8: Wrap up & post mortem
• Document the issue:
• Timeline
• What did we see?
• Why did it happen?
• What was the impact?
• How did we find out?
• What did we do to mitigate and fix?
• What should we do to prevent repetition?
If you really want a reliable system, you
have to understand what its failure modes
are. You have to actually have witnessed
it misbehaving.
- Jason Cahoon
“
”
Distributed systems war stories
The time where it worked half of the time…
The one where two services didn’t speak
the same language…
The one with expensive logging…
The one at a school…
The one where only one country was
affected…
The one where breaking news broke
something else…
Summary: a structured approach
to debugging distributed systems
@bjschrijver
Check DNS & routing
Check connection
Debug client side
Create minimal reproducer
Debug server side
Observe & document
Wrap up & post mortem
Inspect traffic / messages
Source:	https://cdn2.vox-cdn.com/thumbor/J9OqPYS7FgI9fjGhnF7AFh8foVY=/148x0:1768x1080/1280x854/cdn0.vox-cdn.com/uploads/chorus_image/image/46147742/cute-success-kid-1920x1080.0.0.jpg
THAT’S IT.
NOW GO KICK SOME ASS!
Questions?
@bjschrijver
Thanks for your time.
Got feedback? Tweet it!
All pictures belong
to their respective
authors
@bjschrijver

Contenu connexe

Tendances

Principles of Continuous Delivery and DevOps
Principles of Continuous Delivery and DevOpsPrinciples of Continuous Delivery and DevOps
Principles of Continuous Delivery and DevOpsBert Jan Schrijver
 
Software architecture in a DevOps world
Software architecture in a DevOps worldSoftware architecture in a DevOps world
Software architecture in a DevOps worldBert Jan Schrijver
 
Devoxx Belgium 2019 - Better software, faster: Principles of Continuous Deliv...
Devoxx Belgium 2019 - Better software, faster: Principles of Continuous Deliv...Devoxx Belgium 2019 - Better software, faster: Principles of Continuous Deliv...
Devoxx Belgium 2019 - Better software, faster: Principles of Continuous Deliv...Bert Jan Schrijver
 
OpenValue Vienna meetup september 2020 - Better software, faster: Principles ...
OpenValue Vienna meetup september 2020 - Better software, faster: Principles ...OpenValue Vienna meetup september 2020 - Better software, faster: Principles ...
OpenValue Vienna meetup september 2020 - Better software, faster: Principles ...Bert Jan Schrijver
 
DevoxxUK 2019 - Better software, faster.
DevoxxUK 2019 - Better software, faster.DevoxxUK 2019 - Better software, faster.
DevoxxUK 2019 - Better software, faster.Bert Jan Schrijver
 
CodeOne 2018 - Better software, faster: principles of Continuous Delivery and...
CodeOne 2018 - Better software, faster: principles of Continuous Delivery and...CodeOne 2018 - Better software, faster: principles of Continuous Delivery and...
CodeOne 2018 - Better software, faster: principles of Continuous Delivery and...Bert Jan Schrijver
 
Den Bosch Java User Group April 2020 - Better software, faster - Principles o...
Den Bosch Java User Group April 2020 - Better software, faster - Principles o...Den Bosch Java User Group April 2020 - Better software, faster - Principles o...
Den Bosch Java User Group April 2020 - Better software, faster - Principles o...Bert Jan Schrijver
 
OpenValue meetup June 2019 - Better, software faster: Principles of Continuou...
OpenValue meetup June 2019 - Better, software faster: Principles of Continuou...OpenValue meetup June 2019 - Better, software faster: Principles of Continuou...
OpenValue meetup June 2019 - Better, software faster: Principles of Continuou...Bert Jan Schrijver
 
JavaZone 2019 - Better software, faster: Principles of Continuous Delivery an...
JavaZone 2019 - Better software, faster: Principles of Continuous Delivery an...JavaZone 2019 - Better software, faster: Principles of Continuous Delivery an...
JavaZone 2019 - Better software, faster: Principles of Continuous Delivery an...Bert Jan Schrijver
 
AmsterdamJUG September 2019 - Better software, faster: Principles of Continuo...
AmsterdamJUG September 2019 - Better software, faster: Principles of Continuo...AmsterdamJUG September 2019 - Better software, faster: Principles of Continuo...
AmsterdamJUG September 2019 - Better software, faster: Principles of Continuo...Bert Jan Schrijver
 
Introduction to devops 2016
Introduction to devops 2016Introduction to devops 2016
Introduction to devops 2016gjdevos
 
Devops, the future is here, it's just not evenly distributed yet.
Devops, the future is here, it's just not evenly distributed yet.Devops, the future is here, it's just not evenly distributed yet.
Devops, the future is here, it's just not evenly distributed yet.Kris Buytaert
 
DevOps - Understanding Core Concepts
DevOps - Understanding Core ConceptsDevOps - Understanding Core Concepts
DevOps - Understanding Core ConceptsNitin Bhide
 
DevOps(1) : What's DevOps - (MOSG)
DevOps(1) : What's DevOps - (MOSG)DevOps(1) : What's DevOps - (MOSG)
DevOps(1) : What's DevOps - (MOSG)Soshi Nemoto
 
Continuous Delivery antipatterns from the wild - Matthew Skelton - Continuous...
Continuous Delivery antipatterns from the wild - Matthew Skelton - Continuous...Continuous Delivery antipatterns from the wild - Matthew Skelton - Continuous...
Continuous Delivery antipatterns from the wild - Matthew Skelton - Continuous...Skelton Thatcher Consulting Ltd
 
What is DevOps? | DevOps Introduction | DevOps Tools | DevOps Tutorial For Be...
What is DevOps? | DevOps Introduction | DevOps Tools | DevOps Tutorial For Be...What is DevOps? | DevOps Introduction | DevOps Tools | DevOps Tutorial For Be...
What is DevOps? | DevOps Introduction | DevOps Tools | DevOps Tutorial For Be...Simplilearn
 
Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012
Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012
Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012Patrick McDonnell
 
Walk This Way - An Introduction to DevOps
Walk This Way - An Introduction to DevOpsWalk This Way - An Introduction to DevOps
Walk This Way - An Introduction to DevOpsNathen Harvey
 

Tendances (20)

Principles of Continuous Delivery and DevOps
Principles of Continuous Delivery and DevOpsPrinciples of Continuous Delivery and DevOps
Principles of Continuous Delivery and DevOps
 
Software architecture in a DevOps world
Software architecture in a DevOps worldSoftware architecture in a DevOps world
Software architecture in a DevOps world
 
Devoxx Belgium 2019 - Better software, faster: Principles of Continuous Deliv...
Devoxx Belgium 2019 - Better software, faster: Principles of Continuous Deliv...Devoxx Belgium 2019 - Better software, faster: Principles of Continuous Deliv...
Devoxx Belgium 2019 - Better software, faster: Principles of Continuous Deliv...
 
OpenValue Vienna meetup september 2020 - Better software, faster: Principles ...
OpenValue Vienna meetup september 2020 - Better software, faster: Principles ...OpenValue Vienna meetup september 2020 - Better software, faster: Principles ...
OpenValue Vienna meetup september 2020 - Better software, faster: Principles ...
 
DevoxxUK 2019 - Better software, faster.
DevoxxUK 2019 - Better software, faster.DevoxxUK 2019 - Better software, faster.
DevoxxUK 2019 - Better software, faster.
 
CodeOne 2018 - Better software, faster: principles of Continuous Delivery and...
CodeOne 2018 - Better software, faster: principles of Continuous Delivery and...CodeOne 2018 - Better software, faster: principles of Continuous Delivery and...
CodeOne 2018 - Better software, faster: principles of Continuous Delivery and...
 
Den Bosch Java User Group April 2020 - Better software, faster - Principles o...
Den Bosch Java User Group April 2020 - Better software, faster - Principles o...Den Bosch Java User Group April 2020 - Better software, faster - Principles o...
Den Bosch Java User Group April 2020 - Better software, faster - Principles o...
 
OpenValue meetup June 2019 - Better, software faster: Principles of Continuou...
OpenValue meetup June 2019 - Better, software faster: Principles of Continuou...OpenValue meetup June 2019 - Better, software faster: Principles of Continuou...
OpenValue meetup June 2019 - Better, software faster: Principles of Continuou...
 
JavaZone 2019 - Better software, faster: Principles of Continuous Delivery an...
JavaZone 2019 - Better software, faster: Principles of Continuous Delivery an...JavaZone 2019 - Better software, faster: Principles of Continuous Delivery an...
JavaZone 2019 - Better software, faster: Principles of Continuous Delivery an...
 
AmsterdamJUG September 2019 - Better software, faster: Principles of Continuo...
AmsterdamJUG September 2019 - Better software, faster: Principles of Continuo...AmsterdamJUG September 2019 - Better software, faster: Principles of Continuo...
AmsterdamJUG September 2019 - Better software, faster: Principles of Continuo...
 
Introduction to devops 2016
Introduction to devops 2016Introduction to devops 2016
Introduction to devops 2016
 
Devops, the future is here, it's just not evenly distributed yet.
Devops, the future is here, it's just not evenly distributed yet.Devops, the future is here, it's just not evenly distributed yet.
Devops, the future is here, it's just not evenly distributed yet.
 
DevOps - Understanding Core Concepts
DevOps - Understanding Core ConceptsDevOps - Understanding Core Concepts
DevOps - Understanding Core Concepts
 
DevOps(1) : What's DevOps - (MOSG)
DevOps(1) : What's DevOps - (MOSG)DevOps(1) : What's DevOps - (MOSG)
DevOps(1) : What's DevOps - (MOSG)
 
Devops
DevopsDevops
Devops
 
Continuous Delivery antipatterns from the wild - Matthew Skelton - Continuous...
Continuous Delivery antipatterns from the wild - Matthew Skelton - Continuous...Continuous Delivery antipatterns from the wild - Matthew Skelton - Continuous...
Continuous Delivery antipatterns from the wild - Matthew Skelton - Continuous...
 
What is DevOps? | DevOps Introduction | DevOps Tools | DevOps Tutorial For Be...
What is DevOps? | DevOps Introduction | DevOps Tools | DevOps Tutorial For Be...What is DevOps? | DevOps Introduction | DevOps Tools | DevOps Tutorial For Be...
What is DevOps? | DevOps Introduction | DevOps Tools | DevOps Tutorial For Be...
 
Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012
Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012
Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012
 
Introduction to DevOps
Introduction to DevOpsIntroduction to DevOps
Introduction to DevOps
 
Walk This Way - An Introduction to DevOps
Walk This Way - An Introduction to DevOpsWalk This Way - An Introduction to DevOps
Walk This Way - An Introduction to DevOps
 

Similaire à JUG CH September 2021 - Debugging distributed systems

Devoxx Belgium 2022 - Debugging distributed systems
Devoxx Belgium 2022 - Debugging distributed systemsDevoxx Belgium 2022 - Debugging distributed systems
Devoxx Belgium 2022 - Debugging distributed systemsBert Jan Schrijver
 
Arnhem JUG March 2023 - Debugging distributed systems
Arnhem JUG March 2023 - Debugging distributed systemsArnhem JUG March 2023 - Debugging distributed systems
Arnhem JUG March 2023 - Debugging distributed systemsBert Jan Schrijver
 
GOTO night April 2022 - Debugging distributed systems
GOTO night April 2022 - Debugging distributed systemsGOTO night April 2022 - Debugging distributed systems
GOTO night April 2022 - Debugging distributed systemsBert Jan Schrijver
 
Mastering Microservices 2022 - Debugging distributed systems
Mastering Microservices 2022 - Debugging distributed systemsMastering Microservices 2022 - Debugging distributed systems
Mastering Microservices 2022 - Debugging distributed systemsBert Jan Schrijver
 
Monitoring microservices
Monitoring microservicesMonitoring microservices
Monitoring microservicesWilliam Brander
 
Incident Response Fails
Incident Response FailsIncident Response Fails
Incident Response FailsMichael Gough
 
Why All the Buzz About Database Integration Solutions?
Why All the Buzz About Database Integration Solutions? Why All the Buzz About Database Integration Solutions?
Why All the Buzz About Database Integration Solutions? apricotbyctk
 
Hyperledger Blockchain
Hyperledger BlockchainHyperledger Blockchain
Hyperledger BlockchainAfraz Khan
 
Scaling a Web Site - OSCON Tutorial
Scaling a Web Site - OSCON TutorialScaling a Web Site - OSCON Tutorial
Scaling a Web Site - OSCON Tutorialduleepa
 
The Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going DistributedThe Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going DistributedTyler Treat
 
Chaos Engineering Without Observability ... Is Just Chaos
Chaos Engineering Without Observability ... Is Just ChaosChaos Engineering Without Observability ... Is Just Chaos
Chaos Engineering Without Observability ... Is Just ChaosCharity Majors
 
2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation
2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation
2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial EmulationScott Sutherland
 
6 Scope & 7 Live Data Collection
6 Scope & 7 Live Data Collection6 Scope & 7 Live Data Collection
6 Scope & 7 Live Data CollectionSam Bowne
 
Enter The back|track Linux Dragon
Enter The back|track Linux DragonEnter The back|track Linux Dragon
Enter The back|track Linux DragonAndrew Kozma
 
IBWAS 2010: Web Security From an Auditor's Standpoint
IBWAS 2010: Web Security From an Auditor's StandpointIBWAS 2010: Web Security From an Auditor's Standpoint
IBWAS 2010: Web Security From an Auditor's StandpointLuis Grangeia
 
Jax Devops 2017 Succeeding in the Cloud – the guidebook of Fail
Jax Devops 2017  Succeeding in the Cloud – the guidebook of FailJax Devops 2017  Succeeding in the Cloud – the guidebook of Fail
Jax Devops 2017 Succeeding in the Cloud – the guidebook of FailSteve Poole
 
VMWare Tech Talk: "The Road from Rugged DevOps to Security Chaos Engineering"
VMWare Tech Talk: "The Road from Rugged DevOps to Security Chaos Engineering"VMWare Tech Talk: "The Road from Rugged DevOps to Security Chaos Engineering"
VMWare Tech Talk: "The Road from Rugged DevOps to Security Chaos Engineering"Aaron Rinehart
 
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012Nick Galbreath
 

Similaire à JUG CH September 2021 - Debugging distributed systems (20)

Devoxx Belgium 2022 - Debugging distributed systems
Devoxx Belgium 2022 - Debugging distributed systemsDevoxx Belgium 2022 - Debugging distributed systems
Devoxx Belgium 2022 - Debugging distributed systems
 
Arnhem JUG March 2023 - Debugging distributed systems
Arnhem JUG March 2023 - Debugging distributed systemsArnhem JUG March 2023 - Debugging distributed systems
Arnhem JUG March 2023 - Debugging distributed systems
 
GOTO night April 2022 - Debugging distributed systems
GOTO night April 2022 - Debugging distributed systemsGOTO night April 2022 - Debugging distributed systems
GOTO night April 2022 - Debugging distributed systems
 
Mastering Microservices 2022 - Debugging distributed systems
Mastering Microservices 2022 - Debugging distributed systemsMastering Microservices 2022 - Debugging distributed systems
Mastering Microservices 2022 - Debugging distributed systems
 
Monitoring microservices
Monitoring microservicesMonitoring microservices
Monitoring microservices
 
Heartbleed
HeartbleedHeartbleed
Heartbleed
 
Incident Response Fails
Incident Response FailsIncident Response Fails
Incident Response Fails
 
Why All the Buzz About Database Integration Solutions?
Why All the Buzz About Database Integration Solutions? Why All the Buzz About Database Integration Solutions?
Why All the Buzz About Database Integration Solutions?
 
Hyperledger Blockchain
Hyperledger BlockchainHyperledger Blockchain
Hyperledger Blockchain
 
Scaling a Web Site - OSCON Tutorial
Scaling a Web Site - OSCON TutorialScaling a Web Site - OSCON Tutorial
Scaling a Web Site - OSCON Tutorial
 
The Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going DistributedThe Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going Distributed
 
Chaos Engineering Without Observability ... Is Just Chaos
Chaos Engineering Without Observability ... Is Just ChaosChaos Engineering Without Observability ... Is Just Chaos
Chaos Engineering Without Observability ... Is Just Chaos
 
2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation
2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation
2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation
 
6 Scope & 7 Live Data Collection
6 Scope & 7 Live Data Collection6 Scope & 7 Live Data Collection
6 Scope & 7 Live Data Collection
 
Enter The back|track Linux Dragon
Enter The back|track Linux DragonEnter The back|track Linux Dragon
Enter The back|track Linux Dragon
 
IBWAS 2010: Web Security From an Auditor's Standpoint
IBWAS 2010: Web Security From an Auditor's StandpointIBWAS 2010: Web Security From an Auditor's Standpoint
IBWAS 2010: Web Security From an Auditor's Standpoint
 
Luis Grangeia IBWAS
Luis Grangeia IBWASLuis Grangeia IBWAS
Luis Grangeia IBWAS
 
Jax Devops 2017 Succeeding in the Cloud – the guidebook of Fail
Jax Devops 2017  Succeeding in the Cloud – the guidebook of FailJax Devops 2017  Succeeding in the Cloud – the guidebook of Fail
Jax Devops 2017 Succeeding in the Cloud – the guidebook of Fail
 
VMWare Tech Talk: "The Road from Rugged DevOps to Security Chaos Engineering"
VMWare Tech Talk: "The Road from Rugged DevOps to Security Chaos Engineering"VMWare Tech Talk: "The Road from Rugged DevOps to Security Chaos Engineering"
VMWare Tech Talk: "The Road from Rugged DevOps to Security Chaos Engineering"
 
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
 

Dernier

IP addressing and IPv6, presented by Paul Wilson at IETF 119
IP addressing and IPv6, presented by Paul Wilson at IETF 119IP addressing and IPv6, presented by Paul Wilson at IETF 119
IP addressing and IPv6, presented by Paul Wilson at IETF 119APNIC
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxDyna Gilbert
 
TRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptxTRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptxAndrieCagasanAkio
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书rnrncn29
 
Unidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptxUnidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptxmibuzondetrabajo
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书rnrncn29
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predieusebiomeyer
 
ETHICAL HACKING dddddddddddddddfnandni.pptx
ETHICAL HACKING dddddddddddddddfnandni.pptxETHICAL HACKING dddddddddddddddfnandni.pptx
ETHICAL HACKING dddddddddddddddfnandni.pptxNIMMANAGANTI RAMAKRISHNA
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书zdzoqco
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa494f574xmv
 
Company Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptxCompany Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptxMario
 

Dernier (11)

IP addressing and IPv6, presented by Paul Wilson at IETF 119
IP addressing and IPv6, presented by Paul Wilson at IETF 119IP addressing and IPv6, presented by Paul Wilson at IETF 119
IP addressing and IPv6, presented by Paul Wilson at IETF 119
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptx
 
TRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptxTRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptx
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
 
Unidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptxUnidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptx
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predi
 
ETHICAL HACKING dddddddddddddddfnandni.pptx
ETHICAL HACKING dddddddddddddddfnandni.pptxETHICAL HACKING dddddddddddddddfnandni.pptx
ETHICAL HACKING dddddddddddddddfnandni.pptx
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa
 
Company Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptxCompany Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptx
 

JUG CH September 2021 - Debugging distributed systems

  • 2. bertjan@openvalue.eu Debugging distributed systems Bert Jan Schrijver @bjschrijver Networking 101 How the internet works
  • 4. Bert Jan Schrijver L e t ’ s m e e t @bjschrijver
  • 5. Why are distributed systems difficult? Networking 101 What? Why? ✅ Demo War stories Conclusion W h a t ‘ s n e x t ? Outline A structured approach @bjschrijver
  • 6. • Level 1: Non-concurrency • Level 2: Concurrency • Level 3: Distribution The hierarchy of complexity for debugging Source: https://maximilianmichels.com/2020/debugging-distributed-systems
  • 7. What is a distributed system?
  • 8. A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another.
  • 9. • Concurrency of components • Lack of a global clock • Independent failure of components
 • Distributed systems are harder to reason about Characteristics of distributed systems Source: http://www.nasa.gov/images/content/218652main_STOCC_FS_img_lg.jpg
  • 10. Working with distributed systems is fundamentally different from writing software on a single computer - and the main difference is that there are lots of new and exciting ways for things to go wrong. - Martin Kleppmann “ ” Photo: Dave Lehl
  • 11. Why do things go wrong? “ ” Photo: Dave Lehl
  • 12. The fallacies of distributed computing are a set of assertions made by L Peter Deutsch and others at Sun Microsystems describing false assumptions that programmers new to distributed applications invariably make.
  • 13. 1. The network is reliable; 2. Latency is zero; 3. Bandwidth is infinite; 4. The network is secure; 5. Topology doesn't change; 6. There is one administrator; 7. Transport cost is zero; 8. The network is homogeneous. Fallacies of distributed computing
  • 14. What could possibly go wrong? “ ” Photo: Dave Lehl
  • 16. .. in your browser’s address bar and press Enter What happens when you type google.com… Source: https://github.com/alex/what-happens-when
  • 17. 17
  • 18. In a distributed system, there may well be some parts of the system that are broken in some unpredictably way, even though other parts of the system are working fine… “ ” Source: Martin Kleppmann
  • 19. “ ” Source: Martin Kleppmann … but in a system with thousands of nodes, it is reasonable to assume that something is always broken. - Martin Kleppmann
  • 20. Are we done yet?
  • 22. A structured approach to debugging distributed systems @bjschrijver Check DNS & routing Check connection Debug client side Create minimal reproducer Debug server side Observe & document Wrap up & post mortem Inspect traffic / messages
  • 23. Step 1: Observe & document • What do you know about the problem? • Inspect logging, errors, metrics, tracing • Draw the path from source to target - what’s in between? Focus on details! • Document what you know • Can we reproduce in a test? • By injecting errors, for example
  • 24. Step 1: Observe & document
  • 25. Step 2: Create minimal reproducer • Goal: maximise the amount of debugging cycles • Focus on short development iterations / feedback loops • Get close to the action!
  • 26. Step 3: Debug client side • Focus on eliminating anything that could be wrong on the client side • Are we connecting to the right host? • Do we send the right message? • Do we receive a response? • Not much different from local debugging
  • 27. Step 4: Check DNS & routing • DNS: • Make sure you know what IP address the hostname should resolve to • Verify that this actually happens at the client • Routing: • Verify you can reach the target machine
  • 28. Step 5: Check connection • Can we connect to the port? • If not, do we get a REJECT or a DROP? • Does the connection open and stay open? • Are we talking TLS? • What is the connection speed between us?
  • 29. Step 6: Inspect traffic / messages • Do we send the right request? • Do we receive the right response? • How do we know? • How do we handle TLS? • Are there any load balancers or proxies in between?
  • 30. Step 7: Debug server side • Inspect the remote host • Can we attach a remote debugger? • See https://youtube.com/OpenValue • Profiling • Strace
  • 31. Step 8: Wrap up & post mortem • Document the issue: • Timeline • What did we see? • Why did it happen? • What was the impact? • How did we find out? • What did we do to mitigate and fix? • What should we do to prevent repetition?
  • 32. If you really want a reliable system, you have to understand what its failure modes are. You have to actually have witnessed it misbehaving. - Jason Cahoon “ ”
  • 34. The time where it worked half of the time…
  • 35. The one where two services didn’t speak the same language…
  • 36. The one with expensive logging…
  • 37. The one at a school…
  • 38.
  • 39. The one where only one country was affected…
  • 40. The one where breaking news broke something else…
  • 41. Summary: a structured approach to debugging distributed systems @bjschrijver Check DNS & routing Check connection Debug client side Create minimal reproducer Debug server side Observe & document Wrap up & post mortem Inspect traffic / messages
  • 44. Thanks for your time. Got feedback? Tweet it! All pictures belong to their respective authors @bjschrijver