[Test Bash Manchester] Observability and Testing


This talk covers observability and how it relates to testing. Pre-production testing only covers known failure modes, so observability is needed to understand the unknown ones. Observability relies on metrics, structured logging, and distributed tracing to understand system behavior and troubleshoot problems. Combined with testing, it can reduce the need for exhaustive pre-production testing and lets issues be discovered and debugged after deployment.

@PierreVincent
Observability & Testing
Explore what’s really happening
under the hood
March 8th, 2018 – QCon London
@PierreVincent pvincent.io

Pierre Vincent
Infra. & Reliability Manager
@PierreVincent
pvincent.io

Reaching production is only the beginning

No system is immune to failure.
Be ready to recover.

When distributing a system, we’re also distributing the places where things might go wrong.

Pre-prod testing and monitoring only cover known failure modes.
What about everything else?

“Monitoring tells you whether the system works. Observability lets you ask why it's not working.”
– Baron Schwartz

Metrics, Logging, Tracing

Metrics
System metrics: CPU usage
Application metrics: Error rates
Business metrics: Customer conversions

Watch out for over-reliance on metrics
Not every metric deserves an alert: limit alerting to user-impacting symptoms.
Limitations at high cardinality: poor fine-grained debugging (e.g. CustomerId).
Real-time querying means some trade-offs on retention: not suitable for long-term trend analysis.
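
The high-cardinality limitation can be made concrete with a rough back-of-the-envelope sketch. It assumes a metrics store that keeps one time series per distinct combination of label values, which is how many (not all) metrics systems behave. The label names and counts are illustrative assumptions, not figures from the talk.

```python
from math import prod


def series_count(label_cardinalities: dict) -> int:
    """Upper bound on time series for one metric: the product of the
    number of distinct values each label can take."""
    return prod(label_cardinalities.values())


# Hypothetical label sets for a single request-count metric.
low_cardinality = {"service": 10, "status_code": 5, "region": 3}
# Adding a per-customer label multiplies every existing combination.
with_customer_id = {**low_cardinality, "customer_id": 50_000}

print(series_count(low_cardinality))   # 150
print(series_count(with_customer_id))  # 7500000
```

Under these assumptions, a single CustomerId label turns 150 series into 7.5 million, which is why per-customer debugging tends to fit logs and traces better than metrics.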

Logging
Making sense of (a lot of) logs
Centralised, Searchable, Correlated

Log Correlation
[Diagram: services A to J; the calls made for one request carry the trace ID a1b2c3]
ERROR [svc=H][trace=a1b2c3] Failed to save order
Cause: Cassandra timeout exception
ERROR [svc=F][trace=a1b2c3] Failed to complete order
Cause: Shipping service responded with 500
ERROR [svc=A][trace=a1b2c3] Failed to process order
Cause: Order process manager responded with 500
INFO [svc=G][trace=a1b2c3] Items verified in stock
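
One way such correlated log lines could be produced is sketched below with Python's standard `logging` module. The names `trace_id_var`, `TraceIdFilter`, `make_logger` and `handle_request` are hypothetical helpers invented for this illustration, and the talk does not say which language or library was used. In a real system the trace ID would arrive in a request header from the upstream service.

```python
import contextvars
import io
import logging
import uuid
from typing import Optional

# Holds the trace ID of the request currently being handled.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the service name and current trace ID onto every log record."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def filter(self, record: logging.LogRecord) -> bool:
        record.svc = self.service
        record.trace = trace_id_var.get()
        return True


def make_logger(service: str, stream=None) -> logging.Logger:
    """Build a logger whose lines look like the ones on the slide."""
    logger = logging.getLogger(f"svc.{service}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)  # stream=None means stderr
    handler.setFormatter(
        logging.Formatter("%(levelname)s [svc=%(svc)s][trace=%(trace)s] %(message)s")
    )
    handler.addFilter(TraceIdFilter(service))
    logger.handlers = [handler]
    return logger


def handle_request(incoming_trace_id: Optional[str] = None) -> str:
    """Reuse the caller's trace ID if present, otherwise start a new trace."""
    trace_id = incoming_trace_id or uuid.uuid4().hex[:6]
    trace_id_var.set(trace_id)
    return trace_id


# Two "services" handling the same request share one trace ID.
buffer = io.StringIO()
handle_request("a1b2c3")
make_logger("G", buffer).info("Items verified in stock")
make_logger("H", buffer).error("Failed to save order")
print(buffer.getvalue(), end="")
```

With every line carrying the same `trace` value, a centralised log search for `a1b2c3` returns the whole story of one request across services.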

Plain-text log line:
2018-02-20T16:38:23+00:00 ERROR Read timed out
Hmmm thanks… ?

The same event as a structured (JSON) log entry:
timestamp: 2018-02-20T16:38:23+00:00
requestID: ec667cb45
level: ERROR
team: events
service: registration-service
commit: 542a8b8e
build: 542a8b8e.7
node: node_e79f3e52
log: Read timed out
region: europe-west2
runtime: java-1.8.0_161
customerID: 55123
userID: 458
...

Questions these fields answer: When did it happen? Can I trace it? What is it? What is it running? Where is it? What is the message? Who caused it? Any other info?
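
A minimal sketch of emitting such an entry with Python's standard `logging` and `json` modules follows. `JsonFormatter` and `STATIC_CONTEXT` are hypothetical names made up for this example; the field names and values are taken from the slide, and a real service would read the static ones from its build metadata and environment.

```python
import io
import json
import logging
from datetime import datetime, timezone

# Assumed static context; normally injected at build or deploy time.
STATIC_CONTEXT = {
    "service": "registration-service",
    "build": "542a8b8e.7",
    "region": "europe-west2",
}


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(timespec="seconds"),
            "level": record.levelname,
            "log": record.getMessage(),
            **STATIC_CONTEXT,
            # Per-request fields passed through logging's `extra=` argument.
            **getattr(record, "context", {}),
        }
        return json.dumps(entry)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("structured-demo")
logger.handlers = [handler]
logger.propagate = False

logger.error(
    "Read timed out",
    extra={"context": {"requestID": "ec667cb45", "customerID": 55123, "userID": 458}},
)

entry = json.loads(stream.getvalue())
print(entry["level"], entry["requestID"], entry["log"])  # ERROR ec667cb45 Read timed out
```

Because each field is a separate key, the log store can filter or group on any of them instead of parsing free text.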

Structured logs unleash high-cardinality exploration
Error rate spike isolated by build version
Activity spike isolated for single customer
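
The two examples above amount to grouping structured entries by a field. The sketch below does this in memory on made-up data; a real setup would run the equivalent query in a log store. The entries, the second build number `542a8b8e.6`, and the helper `count_by` are all assumptions for illustration.

```python
from collections import Counter

# Made-up structured log entries; only the fields used below are shown.
entries = [
    {"level": "ERROR", "build": "542a8b8e.7", "customerID": 55123},
    {"level": "INFO", "build": "542a8b8e.6", "customerID": 101},
    {"level": "ERROR", "build": "542a8b8e.7", "customerID": 202},
    {"level": "INFO", "build": "542a8b8e.7", "customerID": 55123},
    {"level": "ERROR", "build": "542a8b8e.7", "customerID": 55123},
    {"level": "INFO", "build": "542a8b8e.6", "customerID": 55123},
]


def count_by(entries, field, level=None):
    """Count entries per value of `field`, optionally keeping one log level."""
    return Counter(
        e[field] for e in entries if level is None or e["level"] == level
    )


errors_by_build = count_by(entries, "build", level="ERROR")
activity_by_customer = count_by(entries, "customerID")

print(errors_by_build.most_common(1))       # [('542a8b8e.7', 3)]
print(activity_by_customer.most_common(1))  # [(55123, 4)]
```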

Tracing
[Figure: one trace made of spans across event-mgt-api, attendees-service and registration-service, on a timeline from 0ms to 172ms]
Spans: get_confirmed_attendees, get_attendees, get_confirm_status, cassandra/select, mysql/select
Span durations shown: 172ms, 73ms, 54ms, 78ms, 41ms
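
To illustrate how a trace is built from spans, here is a toy in-process tracer. It is a sketch only: real distributed tracing propagates the trace and parent span IDs across service boundaries, which this example does not do. The `Span` and `Tracer` classes are hypothetical, and the nesting of the span names is an assumption, since the slide text does not preserve the hierarchy.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    name: str
    trace_id: str
    parent: Optional[str]  # span_id of the enclosing span, None for the root
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    duration_ms: float = 0.0


class Tracer:
    """Toy tracer: records nested spans that all share one trace ID."""

    def __init__(self) -> None:
        self.trace_id = uuid.uuid4().hex[:6]
        self.finished: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str):
        parent = self._stack[-1].span_id if self._stack else None
        current = Span(name=name, trace_id=self.trace_id, parent=parent)
        self._stack.append(current)
        start = time.perf_counter()
        try:
            yield current
        finally:
            current.duration_ms = (time.perf_counter() - start) * 1000
            self._stack.pop()
            self.finished.append(current)


tracer = Tracer()
# Assumed nesting: the root call fans out to two services, each hitting a database.
with tracer.span("get_confirmed_attendees") as root:
    with tracer.span("get_attendees"):
        with tracer.span("cassandra/select"):
            time.sleep(0.01)
    with tracer.span("get_confirm_status"):
        with tracer.span("mysql/select"):
            time.sleep(0.01)

for s in tracer.finished:
    print(f"{s.name:<26} parent={s.parent} {s.duration_ms:.1f}ms")
```

Spans are appended when they finish, so inner spans are listed before the spans that enclose them and the root span comes last.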

Known Unknowns (Monitoring & Resiliency): Healthchecks, Metrics (Alerting)
Unknown Unknowns (Debugging & Exploration): Logs, Metrics (Queries), Tracing, Events

Observability is the key to unlock new ways of testing

Test (a little bit*) less
* Don’t spend all your time testing... keep some for instrumenting
Thank you!
@PierreVincent
pvincent.io
