Monitoring means collecting logs and metrics and raising alerts to detect issues, while observability is the ability to infer a system's internal state from its outputs. The presenter's team struggled to determine the causes of performance drops. The talk starts with monitoring basics (logging, tracing, and metrics) and then explains how to move to domain-oriented observability, using techniques such as aspect-oriented programming, to better understand the system. The goal of observability is to be able to answer any question about the system's internal state using your tools.
2. About me
● Took part in developing a microservice architecture based on Event Sourcing and CQRS
● Became a Tech Lead. Implemented canary releases and feature toggles (aka feature flags), and migrated a microservice from REST to an event-driven architecture
● Gave a webinar on microservices testing
● I am trying to apply best engineering practices to the Safeguard Cyber project
● I want to build a process in a company (team) where it is pleasant to work and grow professionally, where ideas are heard, and where nobody has to sacrifice family or health for professional growth
3. Parts of the presentation
● Why you should start thinking about monitoring and our case
● Theoretical minimum about monitoring
● How to switch from monitoring to Domain-oriented Observability
4. Part 1
● Why you should start thinking about monitoring and our case
14. Problems we faced
● What is the cause of a performance drop?
● What can lead to poor system performance?
● How can certain changes influence the system?
● It is difficult to keep our product within its SLA
15. Part 2
● Theoretical minimum about monitoring
21. Tracing
“A trace is a representation of a series of causally related distributed events that encode the end-to-end request flow through a distributed system”
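The definition of a trace can be illustrated with a small sketch. It is not a real tracing library: `Span`, `Tracer`, and the span names are hypothetical, and everything runs in one process. In a real distributed system the trace id would have to be propagated between services, for example in request headers. The sketch only shows how a shared trace id and parent links make events causally related.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """One timed event in a trace (hypothetical structure)."""
    trace_id: str             # shared by every span of the same request
    span_id: str              # unique per span
    parent_id: Optional[str]  # causal link to the span that started this one
    name: str
    start: float
    end: Optional[float] = None


class Tracer:
    """In-process tracer: keeps a stack of open spans and a list of finished ones."""

    def __init__(self) -> None:
        self.finished: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str):
        parent = self._stack[-1] if self._stack else None
        current = Span(
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            span_id=uuid.uuid4().hex,
            parent_id=parent.span_id if parent else None,
            name=name,
            start=time.monotonic(),
        )
        self._stack.append(current)
        try:
            yield current
        finally:
            current.end = time.monotonic()
            self._stack.pop()
            self.finished.append(current)


tracer = Tracer()
with tracer.span("handle_request") as root:
    with tracer.span("load_user"):
        pass
    with tracer.span("save_order"):
        pass
```

After this runs, `tracer.finished` holds three spans with the same `trace_id`, and the two inner spans point to the root span through `parent_id`, which is enough to reconstruct the request flow.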
28. Three key metrics by RED methodology
● Rate - the number of requests per second your services are serving.
● Errors - the number of failed requests per second.
● Duration - the distribution of the amount of time each request takes.
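As a rough illustration, the three RED metrics can be computed from a list of request records collected over a time window. The `Request` record, the `red_summary` function, and the choice of the 50th and 95th percentiles to describe the duration distribution are assumptions made for this sketch. In practice a metrics system would normally compute these for you.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    """A single served request (hypothetical record)."""
    duration_s: float
    failed: bool


def percentile(sorted_values: List[float], p: float) -> float:
    """Nearest-rank percentile of an already sorted list; 0.0 if the list is empty."""
    if not sorted_values:
        return 0.0
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def red_summary(requests: List[Request], window_s: float) -> Dict[str, float]:
    """Compute Rate, Errors and Duration for requests seen within window_s seconds."""
    durations = sorted(r.duration_s for r in requests)
    failed = sum(1 for r in requests if r.failed)
    return {
        "rate_per_s": len(requests) / window_s,   # Rate
        "errors_per_s": failed / window_s,        # Errors
        "duration_p50_s": percentile(durations, 50),  # Duration (median)
        "duration_p95_s": percentile(durations, 95),  # Duration (tail)
    }


# Ten requests observed in a 5 second window, the two slowest of which failed.
requests = [Request(duration_s=i / 10, failed=(i > 8)) for i in range(1, 11)]
summary = red_summary(requests, window_s=5.0)
```

Duration is reported as percentiles rather than an average because an average hides the slow tail that users actually notice.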
31. Alerts
Automated alerts are essential to monitoring. They allow you to spot problems anywhere in your infrastructure, so that you can rapidly identify their causes and minimize service degradation and disruption. Alerts draw human attention to the particular systems that require observation, inspection, and intervention.
32. Alert rules
● Someone must react to every alert
● Every alert should have a priority
● It should be possible to disable notifications
● Every alert should provide further instructions
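One way to make these rules checkable is to encode them in the alert definition itself. The sketch below is hypothetical: `AlertRule`, `Priority`, `validate`, `should_notify`, and the example URL are invented for illustration and do not correspond to any particular alerting tool.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Priority(IntEnum):
    LOW = 1
    HIGH = 2
    CRITICAL = 3


@dataclass
class AlertRule:
    name: str
    priority: Priority   # every alert has a priority
    owner: str           # the person or team expected to react
    runbook: str         # further instructions for whoever reacts
    muted: bool = False  # notifications can be disabled


def validate(rule: AlertRule) -> List[str]:
    """Return the list of rules from the slide that this alert definition violates."""
    problems = []
    if not rule.owner.strip():
        problems.append("no owner: nobody is expected to react")
    if not rule.runbook.strip():
        problems.append("no runbook: the alert gives no further instructions")
    return problems


def should_notify(rule: AlertRule, threshold_breached: bool) -> bool:
    """Notify only when the condition fired and the alert is not muted."""
    return threshold_breached and not rule.muted


good = AlertRule(
    name="high_error_rate",
    priority=Priority.CRITICAL,
    owner="on-call backend engineer",
    runbook="https://example.com/runbooks/high-error-rate",  # placeholder URL
)
incomplete = AlertRule(name="disk_usage", priority=Priority.LOW, owner="", runbook="")
```

Here `validate(good)` returns an empty list, while `validate(incomplete)` reports both the missing owner and the missing runbook.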
34. Part 3
● How to switch from monitoring to Domain-oriented Observability
35. Observability
Definition:
“In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. The observability and controllability of a system are mathematical duals.”
- Wikipedia
In English:
Can you understand what’s happening inside your code and system, simply by asking questions using your tools? Can you answer any new question you think of, or only the ones you prepared for?
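A minimal sketch of how domain-oriented observability might look in code, assuming a decorator is used to keep instrumentation out of the business logic. The names `observed`, `EVENTS`, and `apply_discount` are hypothetical, and the in-memory list stands in for whatever logging or metrics backend is actually used.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Stand-in for a real telemetry backend (assumption for this sketch).
EVENTS: List[Dict[str, Any]] = []


def observed(event_name: str) -> Callable:
    """Record a domain-level event for every call of the decorated function."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            started = time.monotonic()
            outcome = "success"
            try:
                return func(*args, **kwargs)
            except Exception:
                outcome = "failure"
                raise  # instrumentation must not swallow domain errors
            finally:
                EVENTS.append({
                    "event": event_name,
                    "outcome": outcome,
                    "duration_s": time.monotonic() - started,
                })
        return wrapper
    return decorator


@observed("discount_applied")
def apply_discount(total: float, percent: float) -> float:
    """Pure business logic: contains no logging or metrics calls."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return total * (100 - percent) / 100


apply_discount(200.0, 10)
try:
    apply_discount(200.0, 150)
except ValueError:
    pass
```

The events are named in domain terms ("discount_applied") rather than technical ones, so questions about the system can be asked in the language of the business, and the domain function stays readable and easy to test.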
46. Resources:
● The RED Method
● Monitoring Distributed Systems
● Domain-Oriented Observability
● Distributed Systems Observability by Cindy Sridharan
● Testing in Production, the safe way
● Deploy != Release part1 and part2
● SRE: Observability: Metric Namespaces and Structures
● Observability: Metric, Logging, and Tracing
● Decorator
● Monitoring in the time of Cloud Native
● https://www.elastic.co/learn