OpenTelemetry For Architects
Presented by Kevin Brockhoff
Apache 2.0 Licensed
Our Agenda
● Where are current observability patterns falling short?
● What is OpenTelemetry, and why should I care?
● What are some recommended OpenTelemetry deployment architectures?
● How can I use OpenTelemetry to incrementally improve telemetry collection in applications?
Level Setting
● Have you used the ELK stack or another log aggregator?
● Have you used an APM system?
● Have you used distributed tracing before?
● Have you used OpenCensus?
● Have you used OpenTracing?
Who am I?
● Kevin Brockhoff - Senior Consultant, Daugherty Business Solutions
○ Solving difficult cloud adoption challenges for Daugherty's Fortune 500 clients
○ OpenTelemetry committer since the early stages of the project
○ GitHub: https://github.com/kbrockhoff
○ LinkedIn: https://www.linkedin.com/in/kevin-brockhoff-a557877/
Observability 2.0
Why observability?
● Microservices create complex interactions.
● Failures don't exactly repeat.
● Debugging multi-tenancy is painful.
● Monitoring alone can no longer help us.
[Figure: Cynefin Framework, Complex domain]
Observability 1.0
Metrics Concepts
● Gauges
○ Instantaneous point-in-time value (e.g. CPU utilization)
● Cumulative counters
○ Cumulative sum of data since process start (e.g. request counts)
● Cumulative histograms
○ Grouped counters for a range of buckets (e.g. 0-10ms, 11-20ms)
● Rates
○ Typically the derivative of a counter (e.g. requests per second)
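To make the four metric types concrete, here is a minimal Python sketch. The class names and the counter-reset handling in `rate` are illustrative assumptions, not an OpenTelemetry or Prometheus API; real SDKs expose these instruments through their own metrics interfaces.

```python
from bisect import bisect_left


class Gauge:
    """Instantaneous point-in-time value: each set() replaces the previous one."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value


class CumulativeCounter:
    """Monotonic sum of increments since process start."""

    def __init__(self) -> None:
        self.total = 0.0

    def add(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("cumulative counters only go up")
        self.total += amount


class CumulativeHistogram:
    """Per-bucket counts; each bound is an inclusive upper limit."""

    def __init__(self, bounds: list[float]) -> None:
        self.bounds = sorted(bounds)
        # One extra overflow bucket for values above the last bound.
        self.counts = [0] * (len(self.bounds) + 1)

    def record(self, value: float) -> None:
        self.counts[bisect_left(self.bounds, value)] += 1


def rate(prev_total: float, prev_time_s: float,
         cur_total: float, cur_time_s: float) -> float:
    """Per-second rate between two samples of a cumulative counter."""
    elapsed = cur_time_s - prev_time_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    # A smaller current value means the process restarted and the counter
    # reset to zero, so the whole current value counts as the increase.
    increase = cur_total - prev_total if cur_total >= prev_total else cur_total
    return increase / elapsed


cpu = Gauge()
cpu.set(0.42)                                # e.g. CPU utilization right now
requests = CumulativeCounter()
latency_ms = CumulativeHistogram([10, 20])   # buckets: 0-10, 11-20, >20
for ms in (4, 12, 9, 35):
    requests.add()
    latency_ms.record(ms)
print(requests.total, latency_ms.counts)     # 4.0 [2, 1, 1]
print(rate(100, 0.0, 160, 30.0))             # 2.0 requests per second
```

The rate is computed at query time from two counter samples, which is why backends store the cumulative value rather than the rate itself.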
Basic Observability Metrics Methods
● USE - Utilization, Saturation, and Errors
○ Resource-scoped
● RED - Rate, Errors, and Duration
○ Request-scoped
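RED is computed per request stream. The sketch below derives all three signals from a list of request records for one time window; the `Request` type and the nearest-rank median are assumptions chosen for brevity, not part of the RED method itself.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    failed: bool


def red_summary(requests: list[Request], window_s: float) -> dict[str, float]:
    """Rate, Errors, and Duration for the requests seen in one time window."""
    if window_s <= 0:
        raise ValueError("window must be positive")
    if not requests:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p50_ms": 0.0}
    durations = sorted(r.duration_ms for r in requests)
    errors = sum(1 for r in requests if r.failed)
    # Nearest-rank median over raw samples; production systems usually
    # estimate percentiles from histogram buckets instead.
    p50 = durations[(len(durations) - 1) // 2]
    return {
        "rate_per_s": len(requests) / window_s,
        "error_ratio": errors / len(requests),
        "p50_ms": p50,
    }


window = [Request(120, False), Request(80, True), Request(200, False), Request(95, False)]
print(red_summary(window, window_s=2.0))
# {'rate_per_s': 2.0, 'error_ratio': 0.25, 'p50_ms': 95}
```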
Tracing Concepts
● Span
○ Represents a single unit of work in a system.
● Trace
○ Defined implicitly by its spans. A trace can be thought of as a directed acyclic graph of spans where the edges between spans are defined as parent/child relationships.
● Distributed Context
○ Contains the tracing identifiers, tags, and options that are propagated from parent to child spans.
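Because a trace is only implied by its spans, a backend reconstructs it by following each span's parent id. A minimal sketch, assuming every span has at most one parent (which makes the graph a tree, a special case of a DAG) and that all spans belong to the same trace; the IDs and span names are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str


def index_children(spans: list[Span]) -> tuple[list[Span], dict[str, list[Span]]]:
    """Return the root spans and a map from span_id to its child spans."""
    roots: list[Span] = []
    children: dict[str, list[Span]] = defaultdict(list)
    for span in spans:
        if span.parent_id is None:
            roots.append(span)
        else:
            children[span.parent_id].append(span)
    return roots, children


def render(span: Span, children: dict[str, list[Span]], depth: int = 0) -> list[str]:
    """Depth-first walk of the parent/child edges, indenting each level."""
    lines = ["  " * depth + span.name]
    for child in children.get(span.span_id, []):
        lines.extend(render(child, children, depth + 1))
    return lines


# Spans of one trace, arriving in arbitrary order (IDs are made up).
trace = [
    Span("c3", "a1", "charge card"),
    Span("a1", None, "GET /checkout"),
    Span("d4", "c3", "POST /payments"),
    Span("b2", "a1", "check auth"),
]
roots, children = index_children(trace)
print("\n".join(render(roots[0], children)))
```

Spans whose parent never arrives are not reachable from the root in this sketch; real backends typically show them as orphaned or partial traces.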
Observability 1.0 Limitations
● Data ends up in three different datastores (logs, metrics, and traces).
● The different types of data are not correlated with each other.
● Observability is not necessarily insight.
Operational Complexity Growth (2010 vs. 2020)
● Circuit Breaker
○ 2010: Homegrown w/ 3 configs
○ 2020: Resilience4J w/ 14 configs
● Retries
○ 2010: End user clicks submit again
○ 2020: Resilience4J w/ 7 configs
● Health Check
○ 2010: HTTP server and DB are live
○ 2020: Kubernetes liveness, readiness, and startup probes with 5 timing configs per probe
● Alerts
○ 2010: Unread count on "circuit breaker opened" email folder
○ 2020: ???
From Observability 1.0 to 2.0
Observability 2.0 - PoC
● Deep Linking Metrics and Traces with OpenTelemetry, OpenMetrics and M3 - Rob Skillington (Presentation @ KubeCon North America 2019)
○ Click a point in a metrics graph to get representative traces
○ Click a trace span to get system metrics from the server that produced the span
○ Click a trace span to get all application logs emitted during the span
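One way to implement those clicks is to stamp every signal with trace context: histogram buckets keep an example trace id (the idea behind OpenMetrics exemplars), and log records carry the active trace and span ids. The sketch below is a toy in-memory model of that idea under those assumptions, not how M3 or any particular backend stores data; the trace and span ids are hypothetical. The span-to-host-metrics link would work the same way, by joining on a resource attribute such as the host name.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    trace_id: str
    span_id: str
    message: str


class ExemplarHistogram:
    """Latency histogram whose buckets each remember one example trace."""

    def __init__(self, bounds: list[float]) -> None:
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)
        self.exemplars: list[Optional[tuple[str, float]]] = [None] * (len(self.bounds) + 1)

    def record(self, value: float, trace_id: str) -> None:
        i = bisect_left(self.bounds, value)
        self.counts[i] += 1
        self.exemplars[i] = (trace_id, value)  # keep the most recent example

    def exemplar_for(self, value: float) -> Optional[tuple[str, float]]:
        """'Click a point in the graph': find a trace behind that bucket."""
        return self.exemplars[bisect_left(self.bounds, value)]


def logs_for_span(logs: list[LogRecord], trace_id: str, span_id: str) -> list[str]:
    """'Click a span': return logs stamped with the same trace and span ids."""
    return [r.message for r in logs if r.trace_id == trace_id and r.span_id == span_id]


latency = ExemplarHistogram([100, 500])
latency.record(42, "trace-a")
latency.record(730, "trace-b")
latency.record(88, "trace-c")
print(latency.exemplar_for(900))   # ('trace-b', 730)

logs = [
    LogRecord("trace-b", "span-1", "cache miss"),
    LogRecord("trace-b", "span-2", "db query timed out"),
    LogRecord("trace-c", "span-9", "ok"),
]
print(logs_for_span(logs, "trace-b", "span-2"))  # ['db query timed out']
```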
OpenTelemetry Project
CNCF Sandbox Project
OpenCensus + OpenTracing = OpenTelemetry
● OpenCensus:
○ Provides APIs and instrumentation that allow you to collect application metrics and distributed traces.
○ Provides the oc-service and oc-agent middleware.
● OpenTracing:
○ Provides APIs for distributed tracing, with implementations provided by tracing backend vendors.
● OpenTelemetry:
○ An effort to combine distributed tracing, metrics, and logging into a single set of system components and language-specific libraries.
OpenTelemetry Project
● Specification
○ API (for application developers)
○ SDK Implementations
○ Transport Protocol (OTLP: Protobuf over gRPC)
● Collector (middleware)
● SDKs (various stages of maturity)
○ C++
○ C# (Auto-instrument/Manual)
○ Erlang
○ Go
○ JavaScript (Browser/Node)
○ Java (Auto-instrument/Manual)
■ Android compatibility
○ PHP
○ Python (Auto-instrument/Manual)
○ Ruby
○ Rust
○ Swift
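The API/SDK split listed in the specification is what lets libraries ship instrumentation without choosing a backend. The toy sketch below illustrates the pattern: library code calls only an abstract API, which is a no-op until the application installs an SDK. Every name here is hypothetical; the real Python API does the equivalent through a global tracer provider (`opentelemetry.trace.set_tracer_provider`).

```python
from abc import ABC, abstractmethod


class Tracer(ABC):
    """The 'API': the only tracing type instrumented code depends on."""

    @abstractmethod
    def record_span(self, name: str) -> None: ...


class NoOpTracer(Tracer):
    """Default when no SDK is installed, so instrumentation costs almost nothing."""

    def record_span(self, name: str) -> None:
        pass


class RecordingTracer(Tracer):
    """Stand-in for an 'SDK' that keeps (and in real life would export) spans."""

    def __init__(self) -> None:
        self.finished: list[str] = []

    def record_span(self, name: str) -> None:
        self.finished.append(name)


_tracer: Tracer = NoOpTracer()


def set_global_tracer(tracer: Tracer) -> None:
    """Called once by the application owner, never by libraries."""
    global _tracer
    _tracer = tracer


def get_global_tracer() -> Tracer:
    return _tracer


def handle_order(order_id: str) -> str:
    """Library code, instrumented against the API only."""
    get_global_tracer().record_span(f"handle_order {order_id}")
    return f"processed {order_id}"


handle_order("A1")                 # no SDK yet: the span is silently dropped
sdk = RecordingTracer()
set_global_tracer(sdk)
handle_order("B2")
print(sdk.finished)                # ['handle_order B2']
```

Architecturally, this is why the slides list the API separately from SDK implementations: swapping the SDK or its exporters never requires touching instrumented library code.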
Open Source Observability Platforms Supported
Contributors
W3C Distributed Tracing Working Group
● Trace Context – Level 1 - Recommendation
● Propagation format for distributed trace context: Baggage (rec-track)
● Trace Context: AMQP protocol (rec-track)
● Trace Context: MQTT protocol (rec-track)
● Trace Response Headers (rec-track)
● Trace Context Protocols Registry – Group Note
● Trace Context: binary protocol (rec-track)
● Trace Interchange Format (rec-track)
● Trace State Ids Registry (note)
Trace Context HTTP Headers
● traceparent: 00-0af7651916cd43dd8448eb211c80319c-00f067aa0ba902b7-01
○ version - trace-id (128 bit) - parent-id (64 bit) - trace-flags (8 bit)
● tracestate: rojo=00f067aa0ba902b7,congo=t61rcWkgMzE
○ vendor-specific key/value pairs
● Baggage: userId=sergey,serverNode=DF:28,isProduction=false
○ Draft Baggage header specification
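The header layout on this slide can be made concrete with a small parser. This is a sketch of the version `00` format only, assuming lowercase hex as the Trace Context specification requires; it does not implement forward compatibility for future versions, size limits on `tracestate`, or percent-decoding of baggage values. `child_context` is a hypothetical helper showing what propagation to a child looks like.

```python
import re
import secrets
from dataclasses import dataclass

# version "-" trace-id "-" parent-id "-" trace-flags, lowercase hex only.
TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass
class TraceParent:
    version: str
    trace_id: str   # 128 bit, shared by every span in the trace
    parent_id: str  # 64 bit, the id of the calling span
    flags: int      # 8 bit; bit 0 is the "sampled" flag

    @property
    def sampled(self) -> bool:
        return bool(self.flags & 0x01)

    def header(self) -> str:
        return f"{self.version}-{self.trace_id}-{self.parent_id}-{self.flags:02x}"


def parse_traceparent(value: str) -> TraceParent:
    match = TRACEPARENT.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {value!r}")
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero ids are explicitly invalid.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError(f"invalid traceparent: {value!r}")
    return TraceParent(version, trace_id, parent_id, int(flags, 16))


def child_context(incoming: TraceParent) -> TraceParent:
    """Context for an outgoing call: same trace, a new span id as parent-id."""
    return TraceParent("00", incoming.trace_id, secrets.token_hex(8), incoming.flags)


def parse_list_header(value: str) -> dict[str, str]:
    """Split tracestate or baggage into key/value pairs (no percent-decoding)."""
    pairs: dict[str, str] = {}
    for member in value.split(","):
        member = member.split(";", 1)[0].strip()  # drop baggage properties
        if member:
            key, _, val = member.partition("=")
            pairs[key.strip()] = val.strip()
    return pairs


tp = parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-00f067aa0ba902b7-01")
print(tp.trace_id, tp.parent_id, tp.sampled)
# 0af7651916cd43dd8448eb211c80319c 00f067aa0ba902b7 True
print(child_context(tp).header())  # same trace-id, random 16-hex-digit parent-id
print(parse_list_header("rojo=00f067aa0ba902b7,congo=t61rcWkgMzE"))
print(parse_list_header("userId=sergey,serverNode=DF:28,isProduction=false"))
```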
Architecture
Deployment Architectures
Kubernetes Deployment - Proof of Concept
Agent configuration:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resource, ...]
      exporters: [otlp]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, resource, ...]
      exporters: [otlp]
Collector configuration:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, queued_retry]
      exporters: [jaeger]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [prometheus]
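Each Collector pipeline is a chain: receivers hand data to the processors in the listed order, and the result fans out to every exporter. The toy model below follows the slides' ordering, with `memory_limiter` first so it can refuse data before anything is buffered. It is a simplification under stated assumptions: the real `memory_limiter` measures process memory rather than a queue length, the real `batch` processor also flushes on a timeout, and the `k8s.cluster.name` attribute is only an illustrative example.

```python
from typing import Callable

Item = dict
Processor = Callable[[list[Item]], list[Item]]
Exporter = Callable[[list[Item]], None]


def memory_limiter(limit: int, queued: Callable[[], int]) -> Processor:
    """Refuses new data while queued() reports `limit` or more items waiting."""
    def process(items: list[Item]) -> list[Item]:
        return [] if queued() >= limit else items
    return process


def resource(attributes: dict) -> Processor:
    """Stamps every item with shared attributes (upsert: these values win)."""
    def process(items: list[Item]) -> list[Item]:
        return [{**item, **attributes} for item in items]
    return process


class Batch:
    """Buffers items and releases them in multiples of `size` (no timeout here)."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.buffer: list[Item] = []

    def __call__(self, items: list[Item]) -> list[Item]:
        self.buffer.extend(items)
        ready = len(self.buffer) - len(self.buffer) % self.size
        out, self.buffer = self.buffer[:ready], self.buffer[ready:]
        return out


class Pipeline:
    """receivers -> processors (in order) -> every exporter."""

    def __init__(self, processors: list[Processor], exporters: list[Exporter]) -> None:
        self.processors = processors
        self.exporters = exporters

    def receive(self, items: list[Item]) -> None:
        for process in self.processors:
            items = process(items)
            if not items:
                return  # refused, or still buffered
        for export in self.exporters:
            export(items)


batch = Batch(size=2)
exported: list[Item] = []
traces = Pipeline(
    processors=[memory_limiter(4, lambda: len(batch.buffer)),
                resource({"k8s.cluster.name": "demo"}),
                batch],
    exporters=[exported.extend],
)
traces.receive([{"name": "GET /a"}])   # buffered, nothing exported yet
traces.receive([{"name": "GET /b"}])   # batch of 2 is released
print(exported)
```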
Kubernetes Deployment - External Backends
Agent configuration:
service:
  pipelines:
    traces:
      receivers: [otlp, zipkin]
      processors: [memory_limiter, resource, ...]
      exporters: [otlp]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, resource, ...]
      exporters: [otlp]
Collector configuration:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, queued_retry]
      exporters: [commercial...]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch, queued_retry]
      exporters: [commercial...]
Kubernetes Deployment - Service Mesh
Agent configuration:
service:
  pipelines:
    traces:
      receivers: [zipkin]
      processors: [memory_limiter, resource, ...]
      exporters: [otlp]
    metrics:
      receivers: [statsd, prometheus]
      processors: [memory_limiter, resource, ...]
      exporters: [otlp]
Collector configuration:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, queued_retry]
      exporters: [commercial...]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch, queued_retry]
      exporters: [commercial...]
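In this topology the agent receives Zipkin spans from the mesh proxies, so proxy spans only join application traces if both sides share trace context. Envoy-based meshes commonly propagate Zipkin B3 headers, and OpenTelemetry SDKs offer B3 propagators for that reason. The sketch below shows how a B3 multi-header context maps onto a W3C `traceparent`; it is an illustration under those assumptions, not the code path of any particular proxy or SDK.

```python
import re

HEX16 = re.compile(r"[0-9a-f]{16}")
HEX16_OR_32 = re.compile(r"[0-9a-f]{16}|[0-9a-f]{32}")


def b3_to_traceparent(headers: dict[str, str]) -> str:
    """Translate Zipkin B3 multi-header context into a W3C traceparent value."""
    h = {k.lower(): v.strip().lower() for k, v in headers.items()}
    trace_id = h.get("x-b3-traceid", "")
    span_id = h.get("x-b3-spanid", "")
    if not HEX16_OR_32.fullmatch(trace_id) or not HEX16.fullmatch(span_id):
        raise ValueError("missing or malformed B3 trace/span id")
    # B3 allows 64-bit trace ids; traceparent needs 128 bits, so left-pad.
    trace_id = trace_id.rjust(32, "0")
    # X-B3-Flags: 1 means debug, which implies sampled.
    sampled = h.get("x-b3-flags") == "1" or h.get("x-b3-sampled") in ("1", "true")
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


print(b3_to_traceparent({
    "X-B3-TraceId": "463ac35c9f6413ad",
    "X-B3-SpanId": "a2fb4a1d1a96d312",
    "X-B3-Sampled": "1",
}))
# 00-0000000000000000463ac35c9f6413ad-a2fb4a1d1a96d312-01
```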
Application Server on VM Deployment
Agent configuration:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resource, ...]
      exporters: [otlp]
    metrics:
      receivers: [statsd, otlp]
      processors: [memory_limiter, resource, ...]
      exporters: [otlp]
Collector configuration:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, queued_retry]
      exporters: [commercial...]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch, queued_retry]
      exporters: [commercial...]
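On a VM, existing applications often already emit StatsD, which is why the agent's metrics pipeline includes a `statsd` receiver. For orientation, here is a sketch of the plain StatsD line format (`name:value|type|@sample_rate`). It is a simplification under stated assumptions: it does not handle set metrics with non-numeric members or signed gauge deltas, and it skips any other trailing section (such as DogStatsD-style tags).

```python
from dataclasses import dataclass


@dataclass
class StatsdSample:
    name: str
    value: float
    kind: str              # "c" counter, "g" gauge, "ms" timer, "h" histogram
    sample_rate: float = 1.0


def parse_statsd_line(line: str) -> StatsdSample:
    """Parse `name:value|type[|@rate]`; other trailing sections are ignored."""
    name, sep, rest = line.strip().partition(":")
    fields = rest.split("|")
    if not name or not sep or len(fields) < 2:
        raise ValueError(f"malformed statsd line: {line!r}")
    value, kind = float(fields[0]), fields[1]
    rate = 1.0
    for extra in fields[2:]:
        if extra.startswith("@"):
            rate = float(extra[1:])
    return StatsdSample(name, value, kind, rate)


def estimated_count(sample: StatsdSample) -> float:
    """A counter sampled at @0.1 stands for roughly ten times as many events."""
    return sample.value / sample.sample_rate


s = parse_statsd_line("checkout.requests:3|c|@0.1")
print(s.name, s.kind, estimated_count(s))  # an estimate of about 30 events
print(parse_statsd_line("jvm.heap.used:5120|g"))
```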
Instrumentation Strategies
Greenfield Project Evolution
● Proof of Concept Demos
○ Sample App w/auto-instrumentation & direct exporters -> Jaeger & Prometheus
● Initial Development
○ Application libraries w/manual instrumentation -> In-memory and/or logging exporter
● Deployments during Development
○ Application w/SDK -> Collector (OTLP receiver) -> Cloud platform native monitoring
● Production
○ Applications w/SDK on hybrid cloud -> Collector (OTLP receiver) -> Latest and greatest enterprise-wide observability platform
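During initial development, the in-memory and logging exporters let you verify manual instrumentation without running any backend. The sketch below models both with hypothetical classes, a `span` context manager, and a hypothetical `price_with_tax` function; the real SDKs ship comparable exporters (for example, the Python SDK's `InMemorySpanExporter` and `ConsoleSpanExporter`), which you would use instead in practice.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class FinishedSpan:
    name: str
    attributes: dict
    duration_s: float


class InMemoryExporter:
    """Keeps finished spans in a list so unit tests can assert on them."""

    def __init__(self) -> None:
        self.spans: list[FinishedSpan] = []

    def export(self, spans: list[FinishedSpan]) -> None:
        self.spans.extend(spans)


class LoggingExporter:
    """Writes one line per finished span; `sink` defaults to print."""

    def __init__(self, sink: Callable[[str], None] = print) -> None:
        self.sink = sink

    def export(self, spans: list[FinishedSpan]) -> None:
        for s in spans:
            self.sink(f"span={s.name} attrs={s.attributes} took={s.duration_s * 1000:.1f}ms")


@contextmanager
def span(exporter, name: str, **attributes) -> Iterator[dict]:
    """Times the block and hands the finished span to the exporter."""
    start = time.perf_counter()
    try:
        yield attributes
    finally:
        exporter.export([FinishedSpan(name, attributes, time.perf_counter() - start)])


def price_with_tax(amount: float, exporter) -> float:
    with span(exporter, "price_with_tax", amount=amount) as attrs:
        total = round(amount * 1.08, 2)
        attrs["total"] = total
        return total


memory = InMemoryExporter()
price_with_tax(100.0, memory)
print(memory.spans[0].attributes)        # {'amount': 100.0, 'total': 108.0}
price_with_tax(20.0, LoggingExporter())  # prints one span=price_with_tax line
```

Because the exporter is the only piece that changes, the same instrumentation later moves unchanged to the SDK-plus-Collector stages listed above.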
Already Instrumented Applications
● OpenCensus
○ Application -> Collector (OpenCensus receiver) -> Backend
● OpenTracing
○ Application w/OpenTracing instrumentation + OpenTracing shim + SDK -> Collector (OTLP receiver) -> Backend
● Spring Boot
○ Application w/Micrometer -> Collector (Prometheus receiver) -> Backend
○ Application w/Spring Cloud Sleuth -> Collector (Zipkin receiver) -> Backend
● AWS
○ Application w/X-Ray SDK -> Collector (X-Ray receiver) -> Backend(s)
Non-instrumented Applications
● Java
○ Launch with the OpenTelemetry Java Agent (support for 61 widely-used frameworks and libraries)
● JavaScript/TypeScript
○ Add handlers/wrappers at key places or use Node auto-instrumentation
● Microservice in any language
○ Deploy an Envoy proxy as a sidecar
● Infrastructure
○ Move to public cloud: AWS, Azure, and GCP are all incorporating the OpenTelemetry Collector into their infrastructure
Thank you!

Editor's Notes

  1. Copyright 2020, The OpenTelemetry Authors Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.