© 2020 SPLUNK INC.
The Whats and Whys
of Tracing
OpenTelemetry FTW
Dave McAllister
Day 115 Senior Technical Evangelist
Dave McAllister
“Data is the driving factor for Observability”
EXAMPLE MICROSERVICE ARCHITECTURE
DISTRIBUTED TRACE VISUALIZATION
DISTRIBUTED TRACE DETAILS
METRICS IN MICROSERVICE ARCHITECTURE
Code push
5x num requests
CPU at 100% for > 5 minutes
LOGS IN MICROSERVICE ARCHITECTURE
Increase in HTTP 503
“Observability is not the microscope. It’s the clarity of the
slide under the microscope.”
Baron Schwartz
Data Collection
Standards-based agents, cloud-integration
Automated code instrumentation
Support for developer frameworks
Any code, any time
No cardinality limits
What is OpenTelemetry?
OpenTracing + OpenCensus = OpenTelemetry
OpenTelemetry: the next major version
of both OpenTracing and OpenCensus
Cloud Native Telemetry
Telemetry “verticals”: Tracing, Metrics, Logs, etc.
Telemetry “layers”:
● Instrumentation APIs: foreach(language)
● Canonical implementations: foreach(language)
● Data infrastructure: collectors, sidecars, etc.
● Interop formats: W3C trace-context, wire formats for trace data, metrics, logs, etc.
Project Stats
CNCF DevStats
• General: 149 companies
• Contributors: 660+ unique contributors and 60K+ contributions
Community Stats
• Cloud Providers: Azure and GCP
• Users (and contributors): Mailchimp, Postmates, Shopify, Zillow
CNCF Project Collaboration
• Fluent Bit: Potential log agent for OpenTelemetry
• Jaeger: Plan to leverage client libraries and collector (collector already announced)
OpenTelemetry is the second most active
project in CNCF today!
(per CNCF DevStats)
Why do you
want to trace?
What problems are you trying to solve?
- Performance issues
- Mean time to detection and mean time to resolution are too high
- Metrics and logs are missing valuable context.
- More data types can provide better answers
A tiny subset of answers*
Gaps in current observability method
Difficult to correlate observed behavior with what is operating.
Difficult to collect data in different formats
No (observable) gaps and want to try something new
A tiny subset of answers*
Goals of tracing
Which teams/components would benefit the most?
How many resources are available for this effort?
What are the short-term and long-term goals?
If already in use, how is adoption? What are possible roadblocks to higher adoption?
Why tracing now?
Questions to ask yourself and your organization
Architecture
Components
1. Specifications
a. API
b. SDK
c. Data
2. Collector
a. Vendor-agnostic way to receive, process, and export data
b. Default way to collect data from instrumented apps
c. Can be deployed as an agent or service
3. Client Libraries
a. Vendor-agnostic app instrumentation
b. Support for traces and metrics
c. Automatic trace instrumentation
4. Incubating: Logging
Status = Beta for Traces + Metrics:
● Collector
● Erlang
● Go
● Java (including auto instrumentation)
● JavaScript (including web)
● Python (auto instrumentation planned)
Coming soon:
● .NET (auto instrumentation planned)
● Ruby (auto instrumentation planned)
Reference Architecture: OpenTelemetry
[Diagram: two hosts, each running an Application with an Otel Library and an Otel Collector (Agent); an Otel Collector (Service); Back-end 1 and Back-end 2. Arrow labels: Traces + Metrics, Metrics.]
Specifications
Tracing Basics
● Context: W3C trace-context, B3, etc.
● Tracer: get context
● Spans: “call” in a trace
○ Kind: client/server, producer/consumer, internal
○ Attributes: key/value pairs; tags; metadata
○ Events: named strings
○ Links: useful for batch operations
● Sampler: always, probabilistic, etc.
● Span processor: simple, batch, etc.
● Exporter: OTLP, Jaeger, Prometheus, etc.
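The context item in the list above can be made concrete with a small sketch of the W3C trace-context `traceparent` header, which carries the trace id, the calling span's id, and a sampled flag between services. This is a standard-library illustration, not the OpenTelemetry API: `SpanContext`, `new_root_context`, `child_of`, `inject`, and `extract` are hypothetical names, and the validation is deliberately minimal.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# traceparent layout: version "-" trace-id "-" parent-id "-" trace-flags
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical holder for the values that travel with a request."""
    trace_id: str   # 32 lowercase hex chars, shared by every span in a trace
    span_id: str    # 16 lowercase hex chars, unique to one span
    sampled: bool   # lowest bit of the trace-flags byte


def new_root_context(sampled: bool = True) -> SpanContext:
    """Start a new trace with random ids."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8), sampled)


def child_of(parent: SpanContext) -> SpanContext:
    """A child span keeps the trace id and sampling decision, with a new span id."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


def inject(ctx: SpanContext) -> dict:
    """Serialize the context into an outgoing HTTP header."""
    flags = "01" if ctx.sampled else "00"
    return {"traceparent": f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"}


def extract(headers: dict) -> Optional[SpanContext]:
    """Parse an incoming header; return None if it is missing or malformed."""
    match = _TRACEPARENT.match(headers.get("traceparent", ""))
    if match is None:
        return None
    _version, trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None  # all-zero ids are not valid
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))
```

A calling service would send `inject(ctx)` with its request, and the receiving service would call `child_of(extract(headers))` so both spans land in the same trace.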
Tracing and Semantic Conventions
In OpenTelemetry, spans can be created freely and it’s up to the implementor to
annotate them with attributes specific to the represented operation.
Some spans represent calls over well-known protocols, such as HTTP requests or
database calls. Using the same attribute names for these calls in every language
and library keeps the resulting data comparable.
● HTTP: http.method, http.status_code
● Database: db.type, db.instance, db.statement
● Messaging: messaging.system, messaging.destination
● FaaS: faas.trigger
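A minimal sketch of what unified attribution could look like in practice: helper functions build span attributes from the names listed above, and a small check flags keys that use a known namespace with a non-conventional name. The attribute names come from the list above; `CONVENTIONS`, `http_attributes`, `db_attributes`, and `unconventional_keys` are hypothetical helpers, not part of any OpenTelemetry library, and the current specification should be checked for the names in force today.

```python
# Attribute names copied from the list above; the helpers are illustrative only.
CONVENTIONS = {
    "http": {"http.method", "http.status_code"},
    "db": {"db.type", "db.instance", "db.statement"},
    "messaging": {"messaging.system", "messaging.destination"},
    "faas": {"faas.trigger"},
}


def http_attributes(method: str, status_code: int) -> dict:
    """Build the attributes for a span that represents an HTTP call."""
    return {"http.method": method.upper(), "http.status_code": status_code}


def db_attributes(db_type: str, instance: str, statement: str) -> dict:
    """Build the attributes for a span that represents a database call."""
    return {"db.type": db_type, "db.instance": instance, "db.statement": statement}


def unconventional_keys(attributes: dict) -> list:
    """Return keys that sit in a known namespace but are not in CONVENTIONS."""
    flagged = []
    for key in attributes:
        namespace = key.split(".", 1)[0]
        if namespace in CONVENTIONS and key not in CONVENTIONS[namespace]:
            flagged.append(key)
    return sorted(flagged)
```

Keys outside the listed namespaces (for example an application-specific `app.cart_size`) pass through untouched, which matches the idea that spans can still be annotated freely.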
Metrics Basics
● Context: span and correlation
● Meter: used to record a measurement
● Raw Measurement
○ Measure: name, description, unit of values
○ Measurement: single value of a measure
● Metric: a measurement
○ Kind: counter, measure, observer
○ Label: key/value pair; tag; metadata
● Aggregation
● Time
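To illustrate how measurements, labels, and aggregation fit together, here is a toy counter that sums recorded values per label set. It is a standard-library sketch under stated assumptions: the `Counter` class, its `add` and `collect` methods, and the metric name are hypothetical and do not mirror the OpenTelemetry metrics API.

```python
from collections import defaultdict


class Counter:
    """Toy monotonic counter: one running sum per distinct label set."""

    def __init__(self, name: str, description: str = "", unit: str = "1"):
        self.name = name
        self.description = description
        self.unit = unit
        self._sums = defaultdict(int)

    def add(self, value: int, labels: dict = None) -> None:
        """Record one measurement; counters only go up."""
        if value < 0:
            raise ValueError("a counter cannot decrease")
        # Sort the labels so {"a": 1, "b": 2} and {"b": 2, "a": 1} aggregate together.
        key = tuple(sorted((labels or {}).items()))
        self._sums[key] += value

    def collect(self) -> dict:
        """Return the aggregated sum for every label set seen so far."""
        return dict(self._sums)


requests = Counter("http.requests", "Handled HTTP requests")
requests.add(1, {"route": "/cart", "status": "200"})
requests.add(2, {"status": "200", "route": "/cart"})
requests.add(1, {"route": "/cart", "status": "503"})
```

Each distinct label set becomes its own time series, which is why label cardinality matters when choosing what to record.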
Resource SDK + Semantic Conventions
A Resource is an immutable representation of the entity producing telemetry.
Resource semantic conventions group attributes by what they describe:
• Environment: Attributes defining a running environment (e.g. cloud)
• Compute instance: Attributes defining a computing instance (e.g. host)
• Deployment service: Attributes defining a deployment service (e.g. k8s)
• Compute unit: Attributes defining a compute unit (e.g. container, process)
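The immutability of a Resource can be sketched with a frozen dataclass whose attributes are exposed through a read-only mapping, so combining two resources always yields a new object. The `Resource` class and its `merge` method below are hypothetical, the attribute values are made up, and letting the second resource win on conflicting keys is an assumption of this sketch rather than a statement of the specified merge rule.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class Resource:
    """Toy immutable description of the entity producing telemetry."""
    attributes: Mapping[str, str]

    def __post_init__(self):
        # Copy the caller's dict and expose it read-only.
        object.__setattr__(
            self, "attributes", MappingProxyType(dict(self.attributes))
        )

    def merge(self, other: "Resource") -> "Resource":
        """Return a new Resource; neither input is modified.

        Assumption: on a key conflict the value from `other` wins.
        """
        merged = dict(self.attributes)
        merged.update(other.attributes)
        return Resource(merged)


host = Resource({"cloud.provider": "gcp", "host.name": "web-1"})
unit = Resource({"container.name": "cart"})
combined = host.merge(unit)
```

Attaching `combined` to every span, metric, and log record from a process is what lets a back-end group telemetry by host or container.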
OpenTelemetry and Logs (Incubating!)
● The Log Data Model Specification: https://github.com/open-telemetry/oteps/blob/master/text/logs/0097-log-data-model.md#motivation
● Designed to map existing log formats and be semantically meaningful
● Mapping between log formats should be possible
● Three sorts of logs and events
○ System Formats
○ Third-party applications
○ First-party applications
OpenTelemetry and Logs
Field Name       Description
Timestamp        Time when the event occurred.
TraceId          Request trace id.
SpanId           Request span id.
TraceFlags       W3C trace flag.
SeverityText     The severity text (also known as log level).
SeverityNumber   Numerical value of the severity.
Name             Short event identifier.
Body             The body of the log record.
Resource         Describes the source of the log.
Attributes       Additional information about the event.
Two Field Kinds:
● Named top-level fields
● Fields stored in key/value pairs
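The field list above maps naturally onto a record type. The sketch below models the named top-level fields as dataclass attributes and the two key/value fields (Resource and Attributes) as dictionaries. `LogRecord` and `to_dict` are hypothetical names, the nanosecond timestamp unit is an assumption of this sketch, and the sample values are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Python attribute name -> field name from the table above, in table order.
_FIELD_NAMES = {
    "timestamp": "Timestamp",
    "trace_id": "TraceId",
    "span_id": "SpanId",
    "trace_flags": "TraceFlags",
    "severity_text": "SeverityText",
    "severity_number": "SeverityNumber",
    "name": "Name",
    "body": "Body",
    "resource": "Resource",
    "attributes": "Attributes",
}


@dataclass
class LogRecord:
    """Toy log record following the field list above."""
    timestamp: int                          # assumed: nanoseconds since the Unix epoch
    body: Any
    trace_id: Optional[str] = None          # links the record to a trace
    span_id: Optional[str] = None           # links the record to one span
    trace_flags: Optional[int] = None
    severity_text: Optional[str] = None
    severity_number: Optional[int] = None
    name: Optional[str] = None
    resource: Dict[str, Any] = field(default_factory=dict)
    attributes: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict:
        """Serialize with the table's field names, skipping unset fields."""
        out = {}
        for attr, public_name in _FIELD_NAMES.items():
            value = getattr(self, attr)
            if value is None or value == {}:
                continue
            out[public_name] = value
        return out


record = LogRecord(
    timestamp=1_600_000_000_000_000_000,
    body="payment failed",
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    span_id="00f067aa0ba902b7",
    severity_text="ERROR",
    severity_number=17,
    attributes={"http.status_code": 503},
)
```

Because TraceId and SpanId are top-level fields, a log line like this one can be joined to the exact span that was active when it was written.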
Debugging for complex systems is iterative
● Start with a high-level metric
● Drill down and detangle based on fine-grained data/observations
● Make the right deductions based on the evidence
Observability drives Evidence-based Debugging
Collector
Objectives
The OpenTelemetry Collector offers a vendor-agnostic implementation of how
to receive, process, and export telemetry data.
● Usable: Reasonable default configuration, supports popular protocols, runs
and collects out of the box.
● Performant: Highly performant under varying loads and configurations.
● Observable: An exemplar of an observable service.
● Extensible: Customizable without touching the core code.
● Unified: Single codebase, deployable as an agent or collector with support
for traces, metrics, and logs (future).
But Why?
● Offload responsibility from the application
○ Compression
○ Encryption
○ Retry
○ Tagging / Redaction
○ Vendor-specific exporting
● Time-to-value
○ Language-agnostic; makes changes easier
○ Set it and forget it; instrumentation that is ready for the Collector
○ Vendor-agnostic and easily extensible
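As a rough picture of the work being offloaded, the sketch below shows three of the listed responsibilities as plain functions: tagging, redaction, and retry with exponential backoff. None of this is Collector code. The function names, the set of sensitive keys, and the choice to retry only on `ConnectionError` are assumptions made for illustration.

```python
import time
from typing import Callable

# Hypothetical keys that should never leave the host unmasked.
SENSITIVE_KEYS = frozenset({"user.email", "card.number"})


def tag(attributes: dict, extra: dict) -> dict:
    """Add deployment-wide tags without touching the application's own values."""
    return {**extra, **attributes}


def redact(attributes: dict, sensitive=SENSITIVE_KEYS) -> dict:
    """Mask the values of sensitive keys; keep every key so shape is unchanged."""
    return {
        key: "[REDACTED]" if key in sensitive else value
        for key, value in attributes.items()
    }


def export_with_retry(
    send: Callable[[dict], object],
    payload: dict,
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
):
    """Call send(payload); on ConnectionError wait base_delay * 2**n and retry."""
    for attempt in range(attempts):
        try:
            return send(payload)
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            sleep(base_delay * 2 ** attempt)
```

With this logic living in a shared component, each application only needs to hand its telemetry to a local endpoint, whatever language it is written in.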
Architecture
Otel Collector
● Receivers: OTLP, Jaeger, Prometheus
● Processors: Batch, Queued Retry, ...
● Exporters: OTLP, Jaeger, Prometheus
● Extensions: health, pprof, zpages
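The receiver, processor, exporter flow can be modeled in a few lines. The sketch below is a toy pipeline, assuming a single batch processor and exporters that are plain callables. `BatchProcessor` and `Pipeline` are hypothetical classes written for this illustration and share nothing with the Collector's real implementation.

```python
from typing import Callable, List

Span = dict  # a span is just a dict in this sketch


class BatchProcessor:
    """Buffer incoming spans and release them in fixed-size batches."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self._buffer: List[Span] = []

    def process(self, spans: List[Span]) -> List[List[Span]]:
        """Add spans to the buffer; return every batch that is now full."""
        self._buffer.extend(spans)
        batches = []
        while len(self._buffer) >= self.batch_size:
            batches.append(self._buffer[: self.batch_size])
            self._buffer = self._buffer[self.batch_size:]
        return batches

    def flush(self) -> List[List[Span]]:
        """Release whatever is left, for example at shutdown."""
        batch, self._buffer = self._buffer, []
        return [batch] if batch else []


class Pipeline:
    """Receiver side -> processor -> fan-out to every exporter."""

    def __init__(self, processor: BatchProcessor,
                 exporters: List[Callable[[List[Span]], None]]):
        self.processor = processor
        self.exporters = exporters

    def receive(self, spans: List[Span]) -> None:
        for batch in self.processor.process(spans):
            self._export(batch)

    def shutdown(self) -> None:
        for batch in self.processor.flush():
            self._export(batch)

    def _export(self, batch: List[Span]) -> None:
        for exporter in self.exporters:
            exporter(batch)


backend_1: List[List[Span]] = []
backend_2: List[List[Span]] = []
pipeline = Pipeline(BatchProcessor(batch_size=2),
                    [backend_1.append, backend_2.append])
pipeline.receive([{"name": f"span-{i}"} for i in range(5)])
```

Sending every batch to both exporters mirrors the reference architecture, where one Collector can feed more than one back-end.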
Getting Started: Traces (Automatic)
java -javaagent:path/to/opentelemetry-auto-<version>.jar \
     -Dota.exporter.jar=path/to/opentelemetry-auto-exporters-otlp-<version>.jar \
     -Dota.exporter.otlp.endpoint=localhost:55680 \
     -Dota.exporter.otlp.service.name=shopping \
     -jar myapp.jar
● Instruments known libraries with no code (only runtime) changes
● Adheres to semantic conventions
● Configurable via environment and/or runtime variables
● Can co-exist with manual instrumentation
WARNING: Do not use two different auto-instrumentation solutions on the same service.
Roadmap
● Rest of client libraries to beta ASAP
● Move to GA later this year for traces and metrics
● Tracing auto instrumentation for all languages
● Add initial log support (goal of beta later this year)
● Improve documentation
● Increase adoption; get case studies
● Make getting started really easy
Problem solving
What questions do you ask yourself when you are being paged?
How do you filter out noise?
How do you determine what isn’t causing an issue?
How do you determine impact?
Imagine being paged
A span for everything or bare minimum
Remember why you want to trace
Rule of thumb: Start with service boundaries and 3rd party calls
Iterative process
There is information overload
Make it easy for teams to get tracing (for free or almost free)
It depends
Next Steps
● Join the conversation: https://gitter.im/open-telemetry/community
● Join a SIG: https://github.com/open-telemetry/community#special-interest-groups
● Submit a PR (consider good-first-issue and help-wanted labels)
Thank You