We speak of service observability when services expose internal states and metrics in order to improve overall availability.
What about the observability of the infrastructures on which they are deployed, configured and maintained?
The various logs (centralized, aggregated) are a good starting point for analysis, but systems also need to be observed continuously so that every change is traced and correlated with monitoring. Today, these IT configuration steps should be handled by configuration management tools, which become the gateway to the observability of operations.
We will show the value of this approach for modern IT management, with feedback from our experience of the challenges of implementing it in Rudder, our open source solution for continuous audit and configuration management.
OSIS19_Cloud: What does observability bring to configuration management? by Nicolas Charles
1. OSIS 2019
THE OPEN SOURCE
INNOVATION SPRING 2019
@nico_charles
nicolas@rudder.io
What does observability bring to
configuration management?
2. How are the systems?
Does the absence of errors or changes in the logs mean success?
Aren’t we missing something?
3. Definition
“Configuration management is a systems engineering process for establishing and maintaining consistency of a product [...] throughout its life.”
Source: Configuration_management
4. Let's remember: What does configuration management do?
[Diagram labels: configuration, target state, feedback, configuration]
5. Let's remember: What does configuration management do?
[Diagram labels: configuration, target state, then feedback and configuration repeated three times]
6. Main challenges faced nowadays
DEV, QA, PRODUCTION, RECOVERY
DEV, SEC, OPS, MGMT, EXTERN
Multiple teams, diluted expertise, harder reporting
Heterogeneous systems, reduced visibility, ease of use and understanding
7. Getting and understanding the info is complex
Operators, Managers, Experts and APIs have different needs
Frustration when we need a third party to obtain relevant data
We mistrust what we don’t understand
8. Definition (again)
“Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.”
Source: Observability
9. Monitoring VS Observability: having a factual & deep insight
[Diagram: monitoring VS observability]
10. Why do we need Observability in Configuration Management?
Causality: trust and prove configuration states
Perspective: provide insights relevant to different needs
Agency: help teams find the best levers for their job
27. Causality and dependencies of events
Diagnosis on infrastructures is hard
● Many systems
● Dependencies across systems
● Many actors involved
An issue on one component can impact hundreds of systems
We need to separate the causes from the symptoms
28. Causality and dependencies of events
Monitoring can only correlate
Events happen on the whole infrastructure
Causes and precedences help root cause analysis
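To illustrate why recorded causes help where correlation alone does not, here is a minimal sketch. It assumes a hypothetical event log in which each event carries an optional `caused_by` reference; the event names and the `caused_by` field are illustrative and not taken from Rudder. Following these links groups symptoms under their root cause.

```python
from collections import defaultdict

# Hypothetical event log: each event may name the event that caused it.
EVENTS = {
    "cr-1": {"id": "cr-1", "what": "change request validated", "caused_by": None},
    "gen-1": {"id": "gen-1", "what": "node configurations generated", "caused_by": "cr-1"},
    "run-a": {"id": "run-a", "what": "run on node A restarted a service", "caused_by": "gen-1"},
    "run-b": {"id": "run-b", "what": "run on node B restarted a service", "caused_by": "gen-1"},
    "disk-c": {"id": "disk-c", "what": "disk full on node C", "caused_by": None},
}


def root_cause(events, event_id):
    """Follow 'caused_by' links back to the first event of the chain."""
    seen = set()
    current = events[event_id]
    while current["caused_by"] is not None:
        if current["id"] in seen:
            raise ValueError("cycle in causality links")
        seen.add(current["id"])
        current = events[current["caused_by"]]
    return current


def group_by_cause(events, symptom_ids):
    """Group symptom events under the id of their root cause."""
    groups = defaultdict(list)
    for symptom_id in symptom_ids:
        groups[root_cause(events, symptom_id)["id"]].append(symptom_id)
    return dict(groups)


# The two service restarts share one cause; the full disk is unrelated.
groups = group_by_cause(EVENTS, ["run-a", "run-b", "disk-c"])
```

With explicit precedence, the two restarts are attributed to the same change request instead of being treated as two independent alerts.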
29. Event sourcing & Tracing
Terminology (Dapper & OpenTracing)
Trace: Description of a “transaction” as it moves through systems
Span: Named and timed operation, piece of workflow (+ tags and logs)
Span context: Trace information that accompanies the transaction
30. Event sourcing & Tracing
What’s in a span?
Operation name
Start & end timestamps
Tags: Set of key:value
Logs: Set of key:value
SpanContext
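The fields listed above can be sketched as a small data structure. This is an illustrative model only, not the OpenTracing API: the class names, field names and the `start_trace` helper are assumptions chosen to mirror the slide.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    """Trace information that accompanies the transaction."""
    trace_id: str
    span_id: str
    parent_span_id: Optional[str] = None


@dataclass
class Span:
    """A named and timed operation, with tags and logs."""
    operation_name: str
    context: SpanContext
    start: float = field(default_factory=time.time)
    end: Optional[float] = None
    tags: dict = field(default_factory=dict)
    logs: list = field(default_factory=list)

    def log(self, **fields):
        # Each log entry is a set of key:value pairs, stamped with a time.
        self.logs.append({"timestamp": time.time(), **fields})

    def finish(self):
        self.end = time.time()

    def child(self, operation_name):
        # A child span stays in the same trace and points to its parent.
        ctx = SpanContext(self.context.trace_id, uuid.uuid4().hex,
                          self.context.span_id)
        return Span(operation_name, ctx)


def start_trace(operation_name):
    """Create the root span of a new trace (hypothetical helper)."""
    return Span(operation_name, SpanContext(uuid.uuid4().hex, uuid.uuid4().hex))


root = start_trace("node run")
check = root.child("configuration check")
check.tags["status"] = "ok"
check.log(event="file compared")
check.finish()
root.finish()
```

The span context is the only part that has to travel between systems; everything else can stay local until the span is reported.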
31. Event sourcing & Tracing
Temporal relationships between Spans in a single Trace
https://www.jaegertracing.io/docs/1.9/architecture/
32. Event sourcing & Tracing
Configuration Management: What would be the traces?
Defining the infrastructure state is a trace
Each change before validation is a span
Validating results in a change request closes the trace
Computing the node configurations is a trace
Computing targets, overrides and generating files are spans
The trace closes with the serialization of the node configurations in the database
Each run on a node is a trace
Each configuration check is a span
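As a sketch of the last mapping (a run on a node is a trace, each configuration check is a span), the following records checks with a context manager. The `RunTrace` class, the check names and the status values are hypothetical and do not describe how Rudder actually reports runs.

```python
import time
import uuid
from contextlib import contextmanager


class RunTrace:
    """One agent run on a node, recorded as a trace (hypothetical structure)."""

    def __init__(self, node_id, config_id):
        self.trace_id = uuid.uuid4().hex
        self.tags = {"node_id": node_id, "config_id": config_id}
        self.spans = []

    @contextmanager
    def span(self, check_name):
        """Record one configuration check as a timed span."""
        record = {
            "trace_id": self.trace_id,
            "operation": check_name,
            "start": time.time(),
            "tags": dict(self.tags),
        }
        try:
            yield record
        except Exception as exc:
            # A failing check is still recorded, then the error propagates.
            record["tags"]["status"] = "error"
            record["tags"]["error"] = str(exc)
            raise
        finally:
            record["end"] = time.time()
            self.spans.append(record)


run = RunTrace(node_id="node-1", config_id="cfg-42")
with run.span("ntp service is running") as s:
    s["tags"]["status"] = "compliant"
with run.span("sshd_config permissions") as s:
    s["tags"]["status"] = "repaired"
```

Because every span carries the node id and config id as tags, a run can later be matched against the configuration it was supposed to apply.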
33. Event sourcing & Tracing
[Diagram: traces and spans mapped onto the configuration workflow. Recoverable labels:]
Defining state (Trace + Spans): Event logs, Change request
● PARAM
● RULE (Id)
● DIRECTIVE (Id, Components)
● GROUP (Id)
● Environmental context
Node configuration (Trace): Id, Generated, Files, Commit Id
● Metadata: Integrity, CommitId, Signature
● Config: For Rule R, Directive D1, Component C
● Store expected reports
● Expected reports (node id, config id, timestamp)
RUN (Run: Trace; Each step: span): Get config, Send configuration reports
● Reports, ...
● METADATA: node id, config id, run timestamp, Signature
Run reports, Historisation, Compliance historised
Message bus (shown twice)
34. Event sourcing & Tracing
Store Traces & Events:
● Integrate with systems in place
● Many tools are compatible with OpenTracing
Correlate with non-observable systems
35. What to do with these billions of events?
Reactive approach
Query, search and analyze traces in case of problems
Proactive approach
Process mining: Machine Learning on these events
Detect unusual behaviours
Outliers
Inconsistencies across systems
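As a very small stand-in for the proactive approach, the sketch below flags outliers among per-node run durations with a robust statistic (the modified z-score based on the median absolute deviation). It is not process mining or machine learning proper, and the node names, durations and threshold are made-up assumptions.

```python
from statistics import median


def outliers(durations, threshold=3.5):
    """Return the keys whose value deviates strongly from the median.

    Uses the modified z-score: 0.6745 * |x - median| / MAD.
    """
    values = list(durations.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half of the values are identical: flag anything different.
        return [k for k, v in durations.items() if v != med]
    return [k for k, v in durations.items()
            if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical run durations in seconds, one per node.
run_durations = {
    "node-1": 12.0,
    "node-2": 11.5,
    "node-3": 12.4,
    "node-4": 11.9,
    "node-5": 48.0,
}
unusual = outliers(run_durations)
```

The median-based score is used here because a single very slow run would distort a mean and standard deviation computed over so few nodes.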
37. Thank you!
Any questions?
38. Security?
Events, trace and logs hold critical data
Within a simple system, security can be built-in
AuthN/AuthZ
For distributed systems, it’s much harder
Who can see what?
Who defines and enforces the authorizations?
Partial visibility of events/traces
Tags on events for authorizations
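One possible reading of "tags on events for authorizations" is sketched below: each event carries authorization tags, and a viewer only sees events whose tags are all granted to them. The `auth_tags` field, the tag names and the deny-by-default rule for untagged events are assumptions made for illustration.

```python
def visible_events(events, granted_tags):
    """Keep the events whose authorization tags are all granted to the viewer.

    Untagged events are hidden (deny by default).
    """
    granted = set(granted_tags)
    return [
        e for e in events
        if e["auth_tags"] and set(e["auth_tags"]) <= granted
    ]


# Hypothetical events from one trace, with different sensitivity levels.
trace_events = [
    {"id": 1, "what": "rule updated", "auth_tags": {"team:web"}},
    {"id": 2, "what": "secret rotated", "auth_tags": {"team:web", "level:secret"}},
    {"id": 3, "what": "database tuned", "auth_tags": {"team:db"}},
    {"id": 4, "what": "untagged event", "auth_tags": set()},
]

web_view = [e["id"] for e in visible_events(trace_events, {"team:web"})]
admin_view = [e["id"] for e in visible_events(
    trace_events, {"team:web", "team:db", "level:secret"})]
```

The result is a partial view of the same trace for each viewer, which leaves open the harder question raised above of who defines and enforces the granted tags.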