Monitoring microservices

Microservices are a great way to design your system so that it can scale. But once those pieces are in production, how do you know whether they are all working properly? Are some metrics more important than others, and what story can each metric tell you? This talk shows some tools and techniques for monitoring distributed systems.

Techniques for monitoring Microservices
William Brander
@williambza
Particular Software

An average production system
• Is the web server up?
• Is the database up?
• Can the web server talk to the database?

What are you actually monitoring?
Monitoring Area
• Infrastructure: Are my servers running?
• Application: Is my application process running?
• Business Capability: Can users place an order?

Monitoring Concerns: Capacity, Performance, Health
Infrastructure
• Health: Is the server up?
• Performance: Is there high CPU?
• Capacity: Do I have enough disk space?
Application
• Health: Is my application generating exceptions?
• Performance: How quickly is my system processing messages?
• Capacity: Can I handle month-end batch jobs?
Business Capability
• Health: Can users access the checkout cart?
• Performance: Are we meeting our SLAs?
• Capacity: What is the impact of adding another customer?

Interaction Type
• Passive: The monitoring system can display metrics
• Reactive: The monitoring system alerts me when something happens
• Proactive: The monitoring system automatically takes actions to repair the system

A Monitoring Philosophy
Every monitor can be placed along three dimensions:
• Monitoring Area: Business Capability, Application, Infrastructure
• Monitoring Concern: Capacity, Performance, Health
• Interaction Type: Proactive, Reactive, Passive
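One way to put the three dimensions to work is as a coverage map: tag every check with its area, concern and interaction type, then list the combinations that nothing covers yet. The Python sketch below is an illustration, not a tool from the talk; `Monitor` and `uncovered` are hypothetical names, and the two sample monitors mirror the web server and queue length examples used in this deck.

```python
from dataclasses import dataclass
from enum import Enum


class Area(Enum):
    BUSINESS_CAPABILITY = "Business Capability"
    APPLICATION = "Application"
    INFRASTRUCTURE = "Infrastructure"


class Concern(Enum):
    CAPACITY = "Capacity"
    PERFORMANCE = "Performance"
    HEALTH = "Health"


class Interaction(Enum):
    PASSIVE = "Passive"      # displays metrics
    REACTIVE = "Reactive"    # alerts me when something happens
    PROACTIVE = "Proactive"  # takes action to repair the system


@dataclass(frozen=True)
class Monitor:
    """A single check, placed on all three axes (hypothetical type)."""
    question: str
    area: Area
    concern: Concern
    interaction: Interaction


def uncovered(monitors):
    """Return every (area, concern, interaction) cell that no monitor occupies."""
    covered = {(m.area, m.concern, m.interaction) for m in monitors}
    return [
        (a, c, i)
        for a in Area
        for c in Concern
        for i in Interaction
        if (a, c, i) not in covered
    ]


monitors = [
    Monitor("Is the web server up?",
            Area.INFRASTRUCTURE, Concern.HEALTH, Interaction.PASSIVE),
    Monitor("Is the queue length above 50?",
            Area.INFRASTRUCTURE, Concern.PERFORMANCE, Interaction.REACTIVE),
]
print(len(uncovered(monitors)))  # 25 of the 27 cells have no monitor yet
```

Not every cell needs a monitor; the list is a prompt for deciding which gaps matter.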

Recap: What are we monitoring?
• Is the web server up?
• Is the database up?
• Can the web server talk to the database?
Monitoring Area: Infrastructure; Monitoring Concern: Health; Interaction Type: Passive
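A passive infrastructure health check can be as small as a TCP reachability probe whose result is shown on a dashboard. This is a minimal sketch that assumes "the port accepts connections" is an acceptable proxy for "the server is up"; the hosts and ports are hypothetical. Running the database probe from the web server's host is one way to answer the third question.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures all land here.
        return False


if __name__ == "__main__":
    # Hypothetical hosts and ports; substitute your own.
    checks = {
        "web server": ("localhost", 8080),
        "database": ("localhost", 5432),
    }
    for name, (host, port) in checks.items():
        print(f"{name}: {'up' if is_reachable(host, port) else 'DOWN'}")
```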

Recap: What are we monitoring?
• Warn me when the queue length exceeds 50
Monitoring Area: Infrastructure; Monitoring Concern: Performance; Interaction Type: Reactive
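The reactive rule above ("warn me when the queue length exceeds 50") can be sketched as a small stateful check. This is an illustration under assumptions: the `notify` callback stands in for whatever alerting channel you use, the queue name is hypothetical, and alerting once per breach (rather than on every sample) is a design choice, not something the talk specifies.

```python
from typing import Callable


class QueueLengthAlert:
    """Reactive monitor: warn when a queue's length exceeds a threshold.

    Alerts once when the threshold is first exceeded and re-arms when the
    length falls back to or below it, so a sustained backlog produces one
    warning rather than one per sample.
    """

    def __init__(self, threshold: int, notify: Callable[[str], None]) -> None:
        self.threshold = threshold
        self.notify = notify
        self.breached = False

    def observe(self, queue: str, length: int) -> None:
        if length > self.threshold:
            if not self.breached:
                self.breached = True
                self.notify(f"{queue}: queue length {length} exceeds {self.threshold}")
        else:
            self.breached = False


alerts = []
monitor = QueueLengthAlert(threshold=50, notify=alerts.append)
for length in [12, 48, 51, 60, 75, 40, 55]:
    monitor.observe("orders", length)
print(alerts)
```

With these samples, two warnings are raised: one when the length first reaches 51, and another when it climbs back to 55 after dropping to 40.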

What happens when we distribute the systems?

Queue Length
• Queue length is an indicator of work still outstanding
• A high queue length doesn’t necessarily indicate a problem, though
Stable or decreasing is good; increasing is bad
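Because the slide judges queue length by its direction rather than its size, one way to automate the rule is to fit a least-squares slope over recent samples. This is a minimal sketch under assumptions: samples are evenly spaced, and the `tolerance` dead band (in messages per sample) is an arbitrary value chosen so that normal jitter counts as stable.

```python
from typing import Sequence


def slope(samples: Sequence[float]) -> float:
    """Least-squares slope of evenly spaced samples (change per sample)."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def assess_queue(samples: Sequence[float], tolerance: float = 0.5) -> str:
    """Stable or decreasing is good; increasing is bad."""
    s = slope(samples)
    if s > tolerance:
        return "increasing: bad"
    if s < -tolerance:
        return "decreasing: good"
    return "stable: good"


# A long but steady queue is fine; a short but growing one is not.
print(assess_queue([200, 202, 199, 201, 200]))  # stable: good
print(assess_queue([5, 9, 14, 20, 27]))         # increasing: bad
```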

Processing Time
• Processing Time is the time taken to successfully process a message
• Processing Time does not include error handling time
• It is independent of how long the message waited in the queue
Stable or decreasing could be good; increasing is bad
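The two exclusions in the definition (no queue wait, no error handling) determine where the timer starts and when a sample is kept. This is a minimal sketch, assuming you can wrap the handler invocation yourself; `ProcessingTimeRecorder` and `handle_place_order` are hypothetical names, not part of any messaging library.

```python
import time
from typing import Any, Callable


class ProcessingTimeRecorder:
    """Records how long a handler takes to successfully process a message.

    The clock starts when the handler is invoked, so time the message
    spent waiting in the queue is excluded. If the handler raises, no
    sample is recorded, so error handling does not skew the metric.
    """

    def __init__(self) -> None:
        self.samples: list[float] = []

    def run(self, handler: Callable[[Any], None], message: Any) -> None:
        start = time.perf_counter()
        handler(message)  # an exception propagates and skips the append
        self.samples.append(time.perf_counter() - start)


def handle_place_order(message: dict) -> None:  # hypothetical handler
    if message.get("invalid"):
        raise ValueError("rejected order")
    time.sleep(0.01)  # stand-in for real work


recorder = ProcessingTimeRecorder()
recorder.run(handle_place_order, {"id": 1})
try:
    recorder.run(handle_place_order, {"id": 2, "invalid": True})
except ValueError:
    pass  # retries or an error queue would take over here
print(len(recorder.samples))  # 1: only the successful attempt is counted
```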

Critical Time
Critical Time = the entire time taken to process a message successfully

• Critical Time is the total duration from when a message is created to when it is successfully processed
Critical Time = Time in Queue + Processing Time + Retry Time + Network Latency Time
Stable or decreasing could be good; increasing is bad
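In practice, critical time does not need to be assembled from its four parts. If the sender stamps each message with its creation time, the receiver can subtract that stamp from the moment processing succeeds, and every component of the formula is included automatically. This sketch assumes such a "time sent" stamp exists (the `Message` type is hypothetical) and that sender and receiver clocks are roughly in sync; the breakdown values are made up to show that the subtraction matches the sum.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Message:
    body: dict
    # Stamped by the sender when the message is created (hypothetical header).
    time_sent: datetime


def critical_time(message: Message, processing_ended: datetime) -> timedelta:
    """Time from message creation to successful processing.

    Network latency, time in queue, retries and the final processing run
    are all included implicitly. Clock skew between sender and receiver
    shows up directly in the result.
    """
    return processing_ended - message.time_sent


sent = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
msg = Message({"order_id": 42}, time_sent=sent)

# Illustrative breakdown of where the time went.
network_latency = timedelta(milliseconds=20)
time_in_queue = timedelta(seconds=3)
retry_time = timedelta(seconds=1, milliseconds=500)
processing_time = timedelta(milliseconds=250)

ended = sent + network_latency + time_in_queue + retry_time + processing_time
print(critical_time(msg, ended))  # 0:00:04.770000
```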

Putting these together
• Each of these metrics presents a piece of the puzzle
• Look at them from an endpoint’s perspective, not per message
• Looking at them together gives far more insight into your system than any one of them alone
[Figure: three sets of Critical Time, Processing Time and Queue Length charts]
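The sketch below illustrates both points: readings are first aggregated per endpoint (ignoring which message produced them), and then the three metrics are compared across two time windows. The interpretation rules in `diagnose` and the 10% `tolerance` are illustrative assumptions, not rules stated in the talk. The endpoint name and metric keys are also hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, NamedTuple


class Reading(NamedTuple):
    endpoint: str
    metric: str  # "critical_time", "processing_time" or "queue_length"
    value: float


def per_endpoint(readings: Iterable[Reading]) -> Dict[str, Dict[str, float]]:
    """Average each metric per endpoint, regardless of which message produced it."""
    grouped = defaultdict(lambda: defaultdict(list))
    for r in readings:
        grouped[r.endpoint][r.metric].append(r.value)
    return {ep: {m: mean(vs) for m, vs in metrics.items()}
            for ep, metrics in grouped.items()}


def diagnose(before: Dict[str, float], after: Dict[str, float],
             tolerance: float = 0.1) -> str:
    """Illustrative heuristics only; thresholds and wording are assumptions."""
    def rising(metric: str) -> bool:
        return after[metric] > before[metric] * (1 + tolerance)

    if rising("processing_time"):
        return "handlers are getting slower"
    if rising("critical_time") or rising("queue_length"):
        return "backlog is growing while handlers are steady"
    return "no sign of trouble"


before = per_endpoint([
    Reading("Sales", "processing_time", 0.20),
    Reading("Sales", "processing_time", 0.22),
    Reading("Sales", "critical_time", 1.0),
    Reading("Sales", "queue_length", 10),
])
after = per_endpoint([
    Reading("Sales", "processing_time", 0.21),
    Reading("Sales", "critical_time", 4.0),
    Reading("Sales", "queue_length", 60),
])
print(diagnose(before["Sales"], after["Sales"]))
```

Here processing time is flat, but critical time and queue length have jumped. This points at arrival rate, capacity or retries rather than at slower handler code.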

Detecting Connectivity
• Distributed systems are typically designed to keep working when other parts aren’t available
• So how do you know the endpoint you’re sending messages to is actually processing them?

Detecting Connectivity
Peer-to-peer connectivity tells us whether an endpoint is actually processing messages from another endpoint
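Peer-to-peer connectivity can be tracked per (sender, receiver) pair. This is a minimal sketch, assuming both sides report "sent" and "processed" events to a central tracker. `ConnectivityTracker`, `max_wait` and the endpoint names are hypothetical. A link is flagged only when the sender has sent something that the receiver has not processed within `max_wait`, so a link that is merely quiet is not reported as broken.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple

Link = Tuple[str, str]  # (sending endpoint, receiving endpoint)


class ConnectivityTracker:
    """Flags peer-to-peer links where messages are sent but not processed."""

    def __init__(self, max_wait: timedelta) -> None:
        self.max_wait = max_wait
        self.last_sent: Dict[Link, datetime] = {}
        self.last_processed: Dict[Link, datetime] = {}

    def message_sent(self, sender: str, receiver: str, at: datetime) -> None:
        self.last_sent[(sender, receiver)] = at

    def message_processed(self, sender: str, receiver: str, at: datetime) -> None:
        self.last_processed[(sender, receiver)] = at

    def broken_links(self, now: datetime) -> List[Link]:
        broken = []
        for link, sent_at in self.last_sent.items():
            processed_at: Optional[datetime] = self.last_processed.get(link)
            # Waiting: nothing processed since the most recent send.
            waiting = processed_at is None or processed_at < sent_at
            if waiting and now - sent_at > self.max_wait:
                broken.append(link)
        return broken


t0 = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
tracker = ConnectivityTracker(max_wait=timedelta(minutes=5))
tracker.message_sent("Sales", "Billing", t0)
tracker.message_processed("Sales", "Billing", t0 + timedelta(seconds=1))
tracker.message_sent("Sales", "Shipping", t0)
print(tracker.broken_links(t0 + timedelta(minutes=10)))  # [('Sales', 'Shipping')]
```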

How do we collect all this info?
Metrics to collect:
• Processing Time
• Critical Time
• Queue Length
• Connectivity
Each reading records:
• Reporting Metric (N bytes)
• Message Type (N bytes)
• Timestamp (8 bytes)
• Value (8 bytes)
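To see why this straightforward layout is expensive, here is a sketch that encodes one reading with Python's `struct` module. Both names are written in full on every record. The 2-byte length prefix on each name is an assumption added so that the record can be decoded; the slide only says the names take N bytes. The metric and message type names are hypothetical.

```python
import struct


def encode_reading(metric: str, message_type: str,
                   timestamp_ms: int, value: float) -> bytes:
    """Naive layout: every reading repeats both names in full."""
    m = metric.encode("utf-8")
    t = message_type.encode("utf-8")
    return (struct.pack("<H", len(m)) + m      # length-prefixed metric name
            + struct.pack("<H", len(t)) + t    # length-prefixed message type
            + struct.pack("<qd", timestamp_ms, value))  # 8-byte timestamp, 8-byte value


record = encode_reading("ProcessingTime", "Sales.Messages.PlaceOrder",
                        1_700_000_000_000, 0.25)
print(len(record))  # 59 bytes, 39 of them spent on the two names
```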

How do we collect all this info?
• Epoch time (8 bytes)
• Dictionary of Metric Types (n * (N + 4) bytes)
• Dictionary of Message Types (n * (N + 4) bytes)
• An array of:
    • Reporting Metric index (4 bytes)
    • Message Type index (4 bytes)
    • Epoch offset (4 bytes)
    • Value (8 bytes)
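The compact layout stores each name once in a dictionary and then refers to it by a 4-byte index. Timestamps become 4-byte offsets from a single 8-byte epoch, so every reading shrinks to 20 bytes. This is a sketch of that idea, not the talk's actual wire format. The entry counts, the array length and the 2-byte name prefixes are assumptions added so that the batch could be decoded, and millisecond offsets are assumed, which fit in 4 bytes only if a batch spans less than about 24 days.

```python
import struct
from typing import Dict, List, Tuple

Reading = Tuple[str, str, int, float]  # (metric, message type, timestamp in ms, value)


def _encode_names(names: Dict[str, int]) -> bytes:
    """Dictionary section: each name (length-prefixed) followed by its 4-byte index."""
    out = bytearray(struct.pack("<i", len(names)))  # entry count (assumed)
    for name, index in names.items():
        raw = name.encode("utf-8")
        out += struct.pack("<H", len(raw)) + raw + struct.pack("<i", index)
    return bytes(out)


def encode_batch(readings: List[Reading]) -> bytes:
    epoch = min(r[2] for r in readings)  # 8-byte base time for the batch
    metrics: Dict[str, int] = {}
    types: Dict[str, int] = {}
    entries = bytearray()
    for metric, message_type, ts, value in readings:
        m = metrics.setdefault(metric, len(metrics))
        t = types.setdefault(message_type, len(types))
        # 4-byte metric index, 4-byte type index, 4-byte ms offset, 8-byte value
        entries += struct.pack("<iiid", m, t, ts - epoch, value)
    return (struct.pack("<q", epoch)
            + _encode_names(metrics)
            + _encode_names(types)
            + struct.pack("<i", len(readings))  # array length (assumed)
            + bytes(entries))


batch = [("ProcessingTime", "Sales.Messages.PlaceOrder", 1_700_000_000_000 + i, 0.25)
         for i in range(1000)]
print(len(encode_batch(batch)))  # 20071
```

For 1,000 readings of the same metric and message type, the batch takes 20,071 bytes. The naive per-reading layout would need 59 bytes per reading, or 59,000 bytes.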

14. Going Distributed: Email, PDF, CRM

18. Let’s look at queue length

20. Infrastructure Performance

23. Application Performance

36. Getting all the data
