OpenTelemetry For Operators
Presented by Kevin Brockhoff
Apache 2.0 Licensed
Our Agenda
● Why are current observability platforms falling short?
● What OpenTelemetry features address these issues?
● How do I run OpenTelemetry components in production?
● Who are the innovators in the observability space?
Level Setting
● Have you used the ELK stack or another log aggregator?
● Have you used an APM system?
● Have you used distributed tracing before?
Who am I?
● Kevin Brockhoff - Senior Consultant, Daugherty Business Solutions
○ Solving difficult cloud adoption challenges for Daugherty's Fortune 500 clients
○ OpenTelemetry committer since the early stages of the project
○ GitHub: https://github.com/kbrockhoff
○ LinkedIn: https://www.linkedin.com/in/kevin-brockhoff-a557877/
Observability Today
Enterprise Applications
● Instrumented only with logging during initial development.
○ Logging is oriented toward development, not operations
● Metrics and tracing are added later, if at all, as a separate project.
○ Each team creates its own system using familiar tools
○ Or the enterprise commits to a specific APM vendor
● Logs, metrics and traces are never connected.
First Generation Observability Platforms
● Search logs in ELK, lacking context
● Homegrown tracing per app, mainly accessible by developers
● Customer experience metrics
● Low-level metrics and alerts
OpenTelemetry Project
OpenCensus + OpenTracing = OpenTelemetry
● OpenTracing:
○ Provides APIs and instrumentation for distributed tracing
● OpenCensus:
○ Provides APIs and instrumentation that allow you to collect application metrics and distributed tracing.
● OpenTelemetry:
○ An effort to combine distributed tracing, metrics and logging into a single set of system components and language-specific libraries.
OpenTelemetry Project
● Specification
○ API (for application developers)
○ SDK Implementations
○ Transport Protocol (Protobuf)
● Collector (middleware)
● SDKs (various stages of maturity)
○ C++
○ C# (Auto-instrument/Manual)
○ Erlang
○ Go
○ JavaScript (Browser/Node)
○ Java (Auto-instrument/Manual)
■ Android compatibility
○ PHP
○ Python (Auto-instrument/Manual)
○ Ruby
○ Rust
○ Swift
From Observability 1.0 to 2.0
OpenTelemetry Collector
OpenTelemetry Collector
● Offers a vendor-agnostic implementation of how to receive, process and export telemetry data.
● Removes the need to run, operate and maintain multiple agents/collectors.
● Supports open-source telemetry data formats (e.g. OTLP, Jaeger, Prometheus, etc.) and sends to multiple open-source or commercial back-ends.
Collector Concepts
● Telemetry data processing pipelines
○ Per pipeline: Receiver(s) -> Processors -> Exporter(s)
○ Currently, each pipeline handles only a single telemetry type
● Extensions
○ Supporting functionality
○ Core collector extensions
■ health_check - HTTP endpoint for a load balancer or k8s controller
■ zpages - Internal processing metrics and traces accessible via HTTP
■ pprof - Enables the Go net/http/pprof performance profiler endpoint
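To make the pipeline model concrete, here is a toy Python model of a single traces pipeline. It is an illustration only, not the collector's (Go) implementation: the `Pipeline` class, the list-of-dicts batch representation and the `drop_health_checks` processor are all hypothetical. It shows the two properties listed above: a pipeline carries one telemetry type, and each processed batch fans out to every configured exporter.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical representation: a batch of telemetry is a list of dicts.
Batch = List[dict]
Processor = Callable[[Batch], Optional[Batch]]
Exporter = Callable[[Batch], None]


@dataclass
class Pipeline:
    """Toy single-signal pipeline: receiver(s) -> processors -> exporter(s)."""
    signal: str  # e.g. "traces" or "metrics"; one telemetry type per pipeline
    processors: List[Processor] = field(default_factory=list)
    exporters: Dict[str, Exporter] = field(default_factory=dict)

    def consume(self, batch: Batch) -> None:
        # Every receiver attached to this pipeline hands its data to consume().
        for process in self.processors:
            batch = process(batch)
            if not batch:  # a processor may drop everything (e.g. filtering)
                return
        # Fan the processed batch out to every exporter (each gets a copy).
        for export in self.exporters.values():
            export(list(batch))


def drop_health_checks(batch: Batch) -> Batch:
    """Hypothetical processor: filter out spans named 'GET /health'."""
    return [span for span in batch if span.get("name") != "GET /health"]


# Usage: two in-memory "exporters" standing in for jaeger and logging.
jaeger_out: Batch = []
debug_out: Batch = []
traces = Pipeline(
    signal="traces",
    processors=[drop_health_checks],
    exporters={"jaeger": jaeger_out.extend, "logging": debug_out.extend},
)
traces.consume([{"name": "GET /health"}, {"name": "GET /cart"}])
```

After the call, both in-memory exporters hold only the `GET /cart` span, which mirrors how one pipeline can feed several back-ends at once.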
Collector Bundled Receivers
Traces
● Jaeger
○ Compact Thrift, Binary Thrift, HTTP, gRPC
○ Sampling strategy configuration server
● Kafka
○ OTLP, Jaeger, Zipkin data structures
● OpenCensus
● OTLP (OpenTelemetry Protocol)
○ gRPC, HTTP
● Zipkin
○ v1, v1 Thrift, v2, v2 Protobuf
Metrics
● Host metrics scraper
○ cpu, disk, load, filesystem, memory, network, processes, swap, process
● Kafka
○ OTLP
● OpenCensus
● OTLP (OpenTelemetry Protocol)
○ gRPC, HTTP
● Prometheus
○ Full discovery and polling capabilities
Logs
● Fluent Forward
○ Spec-compliant, except that mTLS is not supported
Collector Contrib Receivers
Traces
● AWS X-Ray
● SignalFX APM v1
Metrics
● AWS ECS Container
● Carbon
● CollectD (JSON only)
● Docker Stats
● Kubernetes Cluster
● Kubernetes Kubelet
● Prometheus Exporters
● Redis INFO
● SignalFX
● Splunk HEC
● StatsD
● Wavefront
Logs
● SignalFX (Events)
● Stanza
Collector Bundled Processors
● Attributes
○ Modifies span attributes
● Batch
○ Groups data into batches
● Filter
○ Includes/excludes metrics by name
● Group by Trace
○ Holds all spans for a trace for a set time and then sends them to the next processor
● Memory Limiter
○ Prevents out-of-memory issues by triggering GC
○ Configuration must match the ballast setting the collector is launched with
● Queued Retry
○ Deprecated; each exporter now implements queuing and retry itself
● Resource
○ Applies changes to Resource attributes
● Probabilistic Sampling
○ Makes TraceID hash-based sampling decisions, adjusted by the sampling.priority attribute value
● Tail Sampling
○ Sampling decisions based on configured attribute values and rate limits
● Span
○ Modifies span name or attributes based on span name
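Hash-based head sampling is easiest to see in code. The sketch below assumes SHA-256 as the hash (the collector uses its own hash function and seed handling) and assumes that a `sampling.priority` value greater than zero forces sampling while zero forces a drop. Because the decision depends only on the TraceID, seed and percentage, every span of a trace gets the same verdict on every collector instance configured the same way.

```python
import hashlib
from typing import Optional


def should_sample(trace_id: bytes, sampling_percentage: float,
                  seed: int = 0, priority: Optional[int] = None) -> bool:
    """Illustrative TraceID hash-based sampling decision for one span."""
    # Assumed sampling.priority semantics: > 0 forces keep, 0 forces drop.
    if priority is not None:
        return priority > 0
    # Assumption: SHA-256 stands in for the collector's own hash function.
    digest = hashlib.sha256(seed.to_bytes(4, "big") + trace_id).digest()
    value = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    # Integer comparison avoids float rounding at 100%.
    threshold = int(sampling_percentage / 100.0 * 2**64)
    return value < threshold


# Usage: the same trace ID always yields the same verdict for a given rate.
trace_id = bytes.fromhex("4bf92f3577b34da6a3ce929d0e0e4736")
keep = should_sample(trace_id, sampling_percentage=30.0)
```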
Recommended Processor Configuration
Traces: memory_limiter -> any sampling processors -> batch -> any other processors
Metrics: memory_limiter -> any filtering processors -> batch -> any other processors
The memory limiter's ballast_size_mib must match the --mem-ballast-size-mib command-line parameter. Trigger GC with either limit_mib / spike_limit_mib or limit_percentage / spike_limit_percentage.
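The ordering rules and the ballast requirement above can be checked mechanically before deploying a config. This hypothetical linter works on processor ids as they appear in a pipeline's `processors` list (`memory_limiter`, `batch`, `probabilistic_sampler`, `tail_sampling`, `filter`); it is not a collector feature. The `soft_limit_mib` helper assumes the memory limiter's soft limit is `limit_mib - spike_limit_mib`.

```python
from typing import Dict, List

# Component types as used in collector configs; ids such as "batch/2"
# ("type/name") are reduced to their type before checking.
SAMPLERS = {"probabilistic_sampler", "tail_sampling"}
FILTERS = {"filter"}


def _type_of(component_id: str) -> str:
    return component_id.split("/", 1)[0]


def check_pipeline(signal: str, processors: List[str]) -> List[str]:
    """Warn when a pipeline deviates from the recommended processor order."""
    types = [_type_of(p) for p in processors]
    warnings: List[str] = []
    if not types or types[0] != "memory_limiter":
        warnings.append(f"{signal}: memory_limiter should be the first processor")
    # Sampling (traces) or filtering (metrics) should happen before batching.
    early = SAMPLERS if signal == "traces" else FILTERS
    if "batch" in types:
        for name in types[types.index("batch") + 1:]:
            if name in early:
                warnings.append(f"{signal}: {name} should run before batch")
    return warnings


def check_memory_limiter(cfg: Dict[str, int], cli_ballast_mib: int) -> List[str]:
    """Warn when memory_limiter settings disagree with the command line."""
    warnings: List[str] = []
    if cfg.get("ballast_size_mib") != cli_ballast_mib:
        warnings.append("ballast_size_mib does not match --mem-ballast-size-mib")
    if ("limit_mib" in cfg and "spike_limit_mib" in cfg
            and cfg["spike_limit_mib"] >= cfg["limit_mib"]):
        warnings.append("spike_limit_mib should be smaller than limit_mib")
    return warnings


def soft_limit_mib(cfg: Dict[str, int]) -> int:
    # Assumption: soft limit = limit_mib - spike_limit_mib (hard limit = limit_mib).
    return cfg["limit_mib"] - cfg["spike_limit_mib"]
```

With the values from the full configuration example in this deck (limit_mib 448, spike_limit_mib 64, ballast 192 in both places), both checks pass and the assumed soft limit is 384 MiB.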
Collector Contrib Processors
● Kubernetes
○ Adds metadata from pod
● Metrics Transform
○ Renames metrics and aggregates within individual metrics
● Resource Detection
○ OTEL_RESOURCE environment variable
○ GCE metadata server
○ EC2 instance metadata server
● Routing
○ Routes to a particular exporter based on an incoming header value
TODO
● Span data sharding by TraceID
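Header-based routing boils down to a table lookup with a fallback. In this sketch the `route` function, the `X-Tenant` header and the exporter names are hypothetical illustrations of the idea, not the Routing processor's configuration schema. Header names are matched case-insensitively, as HTTP requires.

```python
from typing import Dict, List, Mapping


def route(headers: Mapping[str, str], from_header: str,
          table: Dict[str, List[str]], default_exporters: List[str]) -> List[str]:
    """Pick exporters for incoming telemetry based on one header value."""
    # HTTP/gRPC metadata keys are case-insensitive, so normalize first.
    normalized = {key.lower(): value for key, value in headers.items()}
    value = normalized.get(from_header.lower())
    if value is None:
        return list(default_exporters)  # header missing: use the default route
    return list(table.get(value, default_exporters))


# Usage: per-tenant Jaeger exporters with a shared fallback (hypothetical names).
table = {"acme": ["jaeger/acme"], "globex": ["jaeger/globex"]}
targets = route({"X-Tenant": "acme"}, "X-Tenant", table, ["jaeger"])
```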
Collector Bundled Exporters
Traces
● File
○ JSON format
● Jaeger
○ v2 gRPC
● Kafka
○ OTLP, Jaeger, Zipkin
● Logging
○ Debugging
● OpenCensus
● OTLP (OpenTelemetry Protocol)
● Zipkin
○ v2 JSON or Protobuf
Metrics
● File
○ JSON format
● Logging
○ Debugging
● OpenCensus
● OTLP (OpenTelemetry Protocol)
● Prometheus
○ Metrics endpoint for Prometheus to scrape
● Prometheus Remote Write
○ Pushes metrics in Prometheus TimeSeries format (Cortex, etc.)
Collector Contrib Exporters
Traces
● AlibabaCloud LogService
● AWS X-Ray
● Azure Monitor
● Datadog
● Elastic
● Honeycomb
● Jaeger v1 Thrift
● AWS Kinesis (Jaeger proto)
● New Relic
● SignalFX APM
● Sentry
● Stackdriver
Metrics
● AlibabaCloud LogService
● AWS CloudWatch EMF
● Carbon
● Datadog
● Elastic
● New Relic
● SignalFX
● Splunk HEC
● Stackdriver
Vendor Hosted Exporters
Traces
● Dynatrace OneAgent
● Lightstep Launchers
Metrics
● Dynatrace OneAgent
● Lightstep Launchers
Full Configuration File Example
receivers:
  otlp:
    protocols:
      grpc:
        max_recv_msg_size_mib: 32
        max_concurrent_streams: 16
        read_buffer_size: 1024
        write_buffer_size: 1024
        keepalive:
          server_parameters:
            max_connection_idle: 10s
processors:
  memory_limiter:
    ballast_size_mib: 192
    check_interval: 5s
    limit_mib: 448
    spike_limit_mib: 64
  batch:
    send_batch_size: 64
    timeout: 15s
exporters:
  jaeger:
    endpoint: jaeger.monitoring.svc.storefront-development.local.:14250
    timeout: 10s
    sending_queue:
      enabled: true
      num_consumers: 2
      queue_size: 10
    retry_on_failure:
      enabled: true
      initial_interval: 10s
      max_interval: 60s
      max_elapsed_time: 10m
  prometheusremotewrite:
    namespace: "monitoring"
    sending_queue:
      enabled: true
      num_consumers: 2
      queue_size: 10
    retry_on_failure:
      enabled: true
      initial_interval: 10s
      max_interval: 60s
      max_elapsed_time: 10m
    endpoint: ":8888"
    ca_file: "/etc/pki/tls/certs/carbon-lb.pem"
    write_buffer_size: 524288
    headers:
      Prometheus-Remote-Write-Version: "0.1.0"
      X-Scope-OrgID: "234"
extensions:
  health_check:
    port: 13133
  zpages:
    endpoint: :55679
service:
  extensions: [zpages, health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [jaeger]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
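The `retry_on_failure` settings in this configuration determine how long an exporter keeps retrying a failed batch. This sketch lists the delays under capped exponential backoff. It assumes a fixed multiplier of 1.5 and no jitter, whereas the real exporters randomize each interval, so treat the output as an approximation.

```python
from typing import List


def retry_schedule(initial_s: float, max_interval_s: float,
                   max_elapsed_s: float, multiplier: float = 1.5) -> List[float]:
    """Delays (seconds) before each retry under capped exponential backoff.

    Assumptions: fixed multiplier, no random jitter, and retrying stops once
    the cumulative wait would exceed max_elapsed_s.
    """
    delays: List[float] = []
    delay, elapsed = initial_s, 0.0
    while elapsed + delay <= max_elapsed_s:
        delays.append(delay)
        elapsed += delay
        delay = min(delay * multiplier, max_interval_s)  # cap at max_interval
    return delays


# initial_interval: 10s, max_interval: 60s, max_elapsed_time: 10m
schedule = retry_schedule(10, 60, 600)
```

Under these assumptions, `schedule` holds 12 delays (10, 15, 22.5, 33.75 and 50.625 seconds, then 60 seconds seven times), roughly 9 minutes of waiting before the exporter gives up on the batch.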
Collector Command Line Example
/usr/local/bin/otelcol \
  --config=/usr/local/etc/otel-collector-config.yaml \
  --mem-ballast-size-mib=192 \
  --log-level=DEBUG
Collector Docker Images
● otel/opentelemetry-collector
○ Core receivers, processors, and exporters bundled in
● otel/opentelemetry-collector-contrib
○ All core and contrib receivers, processors, and exporters bundled in
● OpenTelemetry Collector builder
○ https://github.com/observatorium/opentelemetry-collector-builder
Other Collector Installs
● RPM
○ Produced by opentelemetry-collector build
● Debian
○ Produced by opentelemetry-collector build
Observing the Collector
● health_check
○ http://<hostname>:13133/ returns basic pipeline availability
● zpages
○ RPC metric aggregations at http://<hostname>:55679/debug/rpcz
○ Trace summaries at http://<hostname>:55679/debug/tracez
● prometheus
○ Pipeline metrics scrape endpoint at http://<hostname>:8888/metrics
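A deployment probe for the health_check extension needs only the standard library. This sketch assumes the default ports listed above and treats an HTTP 200 from the health endpoint as healthy; any error status, timeout or refused connection counts as unhealthy. `otel-collector` is a hypothetical hostname.

```python
import urllib.request
from typing import Dict


def collector_endpoints(host: str) -> Dict[str, str]:
    """Default observability URLs for one collector host (ports assumed default)."""
    return {
        "health_check": f"http://{host}:13133/",
        "rpcz": f"http://{host}:55679/debug/rpcz",
        "tracez": f"http://{host}:55679/debug/tracez",
        "metrics": f"http://{host}:8888/metrics",
    }


def is_healthy(url: str, timeout_s: float = 2.0) -> bool:
    """True only if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # URLError/HTTPError (e.g. 503), refused connections and timeouts
        # are all OSError subclasses.
        return False


# Usage against a hypothetical collector host:
# is_healthy(collector_endpoints("otel-collector")["health_check"])
```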
Current Gotchas
● Errors are propagated back through the pipelines and through every collector instance in the chain
○ Errors reported by SDK exporters in the applications may originate two hops downstream
● TraceID sharding does not work correctly yet
○ Tail-based sampling is only possible when running a single collector instance
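Tail-based sampling across several collectors needs every span of a trace to land on the same instance. One way to sketch this is a routing tier that picks a backend from the TraceID alone. The example uses rendezvous (highest-random-weight) hashing, a technique swapped in here for illustration; the backend addresses are hypothetical. Because the choice depends only on the TraceID, all spans of a trace reach the same instance, and removing one instance only remaps the traces that were assigned to it.

```python
import hashlib
from typing import Sequence


def pick_backend(trace_id: str, backends: Sequence[str]) -> str:
    """Map a trace ID to one collector using rendezvous hashing."""
    if not backends:
        raise ValueError("no collector backends configured")

    def weight(backend: str) -> int:
        # Score each (backend, trace) pair; the highest score wins.
        digest = hashlib.sha256(f"{backend}|{trace_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(backends, key=weight)


# Hypothetical tail-sampling collector instances.
backends = ["collector-0:4317", "collector-1:4317", "collector-2:4317"]
target = pick_backend("4bf92f3577b34da6a3ce929d0e0e4736", backends)
```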
Observability Platform Innovations
Latest Innovations
● Dynatrace automates manual quality validation processes using AI-assisted SLI/SLO-based quality gates.
● New Relic Incident Intelligence continuously analyzes alerts and incident data to find patterns in event sequences, and suggests correlation decisions that merge incidents to further reduce alert noise.
● Splunk SignalFX provides high-cardinality exploration of traces across different regions, hosts, versions or users.
● Lightstep provides rapid root cause analysis using unlimited cardinality and a high-fidelity dataset uncompromised by head or tail sampling.
Latest Innovations
● Datadog provides automated tagging and correlation of logs, so you can jump from any log entry to related metrics.
● Honeycomb lets you break down by every dimension in your data: both the obvious fields and the surprising ones.
● The Grafana Loki data source lets you switch from metrics to logs while preserving label filters.
● Elastic Observability brings your logs, metrics, and APM traces together at scale in a single stack.
Thank you!

Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 

OpenTelemetry For Operators

  • 1. OpenTelemetry For Operators
    Presented by Kevin Brockhoff
    Apache 2.0 Licensed
  • 2. Our Agenda
    ● Why are current observability platforms falling short?
    ● What OpenTelemetry features address these issues?
    ● How do I run OpenTelemetry components in production?
    ● Who are the innovators in the observability space?
  • 3. Level Setting
    ● Have you used the ELK stack or another log aggregator?
    ● Have you used an APM system?
    ● Have you used distributed tracing before?
  • 4. Who am I?
    ● Kevin Brockhoff, Senior Consultant, Daugherty Business Solutions
      ○ Solving difficult cloud adoption challenges for Daugherty's Fortune 500 clients
      ○ OpenTelemetry committer since the early stages of the project
      ○ GitHub: https://github.com/kbrockhoff
      ○ LinkedIn: https://www.linkedin.com/in/kevin-brockhoff-a557877/
  • 6. Enterprise Applications
    ● Instrumented only with logging during initial development.
      ○ Logging is oriented toward development, not operations
    ● Metrics and tracing are added later, if at all, as a separate project.
      ○ Each team builds its own system using familiar tools
      ○ Or the enterprise commits to a specific APM vendor
    ● Logs, metrics, and traces are never connected.
  • 7. First Generation Observability Platforms
    ● Logs searched in ELK lack context
    ● Homegrown tracing per app, mainly accessible by developers
    ● Customer experience metrics
    ● Low-level metrics and alerts
  • 9. OpenCensus + OpenTracing = OpenTelemetry
    ● OpenTracing:
      ○ Provides APIs and instrumentation for distributed tracing
    ● OpenCensus:
      ○ Provides APIs and instrumentation for collecting application metrics and distributed traces
    ● OpenTelemetry:
      ○ An effort to combine distributed tracing, metrics, and logging into a single set of system components and language-specific libraries
  • 10. OpenTelemetry Project
    ● Specification
      ○ API (for application developers)
      ○ SDK implementations
      ○ Transport protocol (Protobuf)
    ● Collector (middleware)
    ● SDKs (various stages of maturity)
      ○ C++
      ○ C# (auto-instrumentation/manual)
      ○ Erlang
      ○ Go
      ○ JavaScript (browser/Node)
      ○ Java (auto-instrumentation/manual)
        ■ Android compatibility
      ○ PHP
      ○ Python (auto-instrumentation/manual)
      ○ Ruby
      ○ Rust
      ○ Swift
  • 13. OpenTelemetry Collector
    ● Offers a vendor-agnostic implementation of how to receive, process, and export telemetry data.
    ● Removes the need to run, operate, and maintain multiple agents/collectors.
    ● Supports open-source telemetry data formats (e.g. OTLP, Jaeger, Prometheus) and sends to multiple open-source or commercial backends.
  • 14. Collector Concepts
    ● Telemetry data processing pipelines
      ○ Per pipeline: Receiver(s) -> Processors -> Exporter(s)
      ○ Currently, each pipeline supports only a single telemetry type
    ● Extensions
      ○ Supporting functionality
      ○ Core collector extensions
        ■ health_check: HTTP endpoint for a load balancer or Kubernetes controller
        ■ zpages: internal processing metrics and traces accessible via HTTP
        ■ pprof: performance profiler that enables the Go net/http/pprof endpoint
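The pipeline model on this slide can be sketched in plain Python. This is a minimal illustration under stated assumptions, not collector code: `Item`, `Pipeline`, and `drop_health_checks` are hypothetical names, receivers are reduced to "whatever calls `consume`", and each exporter is assumed to receive its own copy of the processed batch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Item:
    """Hypothetical telemetry record: one span, metric point, or log entry."""
    signal: str  # "traces", "metrics" or "logs"
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


Processor = Callable[[List[Item]], List[Item]]
Exporter = Callable[[List[Item]], None]


@dataclass
class Pipeline:
    """One pipeline handles exactly one telemetry type."""
    signal: str
    processors: List[Processor]
    exporters: List[Exporter]

    def consume(self, items: List[Item]) -> None:
        # Receivers call consume(); a different signal type is rejected.
        if any(item.signal != self.signal for item in items):
            raise ValueError(f"{self.signal} pipeline received a different signal type")
        # Processors run strictly in the declared order.
        for process in self.processors:
            items = process(items)
        # Fan-out: every exporter gets its own copy of the processed data.
        for export in self.exporters:
            export(list(items))


def drop_health_checks(items: List[Item]) -> List[Item]:
    """Hypothetical processor that filters out health-check spans."""
    return [item for item in items if item.name != "GET /health"]


exported: List[Item] = []
traces = Pipeline("traces", [drop_health_checks], [exported.extend])
# Two "receivers" (say, OTLP and Jaeger) feeding the same pipeline.
traces.consume([Item("traces", "GET /health"), Item("traces", "GET /cart")])
traces.consume([Item("traces", "POST /checkout")])
print([item.name for item in exported])
```

Rejecting mixed signal types in `consume` mirrors the single-telemetry-type restriction on the slide; a metrics pipeline would be a second `Pipeline` instance.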
  • 15. Collector Bundled Receivers
    Traces
    ● Jaeger
      ○ Compact Thrift, Binary Thrift, HTTP, gRPC
      ○ Sampling strategy configuration server
    ● Kafka
      ○ OTLP, Jaeger, Zipkin data structures
    ● OpenCensus
    ● OTLP (OpenTelemetry Protocol)
      ○ gRPC, HTTP
    ● Zipkin
      ○ v1, v1 Thrift, v2, v2 Protobuf
    Metrics
    ● Host metrics scraper
      ○ cpu, disk, load, filesystem, memory, network, processes, swap, process
    ● Kafka
      ○ OTLP
    ● OpenCensus
    ● OTLP (OpenTelemetry Protocol)
      ○ gRPC, HTTP
    ● Prometheus
      ○ Full discovery and polling capabilities
    Logs
    ● Fluent Forward
      ○ Spec-compliant except no mTLS
  • 16. Collector Contrib Receivers
    Traces
    ● AWS X-Ray
    ● SignalFx APM v1
    Metrics
    ● AWS ECS Container
    ● Carbon
    ● CollectD (JSON only)
    ● Docker Stats
    ● Kubernetes Cluster
    ● Kubernetes Kubelet
    ● Prometheus Exporters
    ● Redis INFO
    ● SignalFx
    ● Splunk HEC
    ● StatsD
    ● Wavefront
    Logs
    ● SignalFx (Events)
    ● Stanza
  • 17. Collector Bundled Processors
    ● Attributes
      ○ Modifies span attributes
    ● Batch
      ○ Groups data into batches
    ● Filter
      ○ Includes/excludes metrics by name
    ● Group by Trace
      ○ Holds all spans of a trace for a set time, then sends them to the next processor
    ● Memory Limiter
      ○ Prevents out-of-memory issues by triggering GC
      ○ Configuration must match the ballast setting the collector is launched with
    ● Queued Retry
      ○ Deprecated; each exporter now implements queuing and retry itself
    ● Resource
      ○ Applies changes to Resource attributes
    ● Probabilistic Sampling
      ○ TraceID hash-based sampling decisions, adjusted by the sampling.priority attribute value
    ● Tail Sampling
      ○ Sampling decisions based on configured attribute values and rate limits
    ● Span
      ○ Modifies span name or attributes based on span name
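The idea behind TraceID hash-based probabilistic sampling can be sketched as follows. This is an illustrative model, not the processor's source: the FNV-1a hash, the 16384-bucket scale, and the way `hash_seed` is mixed in are assumptions, as is the override rule that a `sampling.priority` of 0 forces a drop and a positive value forces a keep.

```python
import random
from typing import Optional

NUM_BUCKETS = 1 << 14  # illustrative bucket count, not taken from the collector


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def should_sample(trace_id: bytes, percentage: float, hash_seed: int = 0,
                  attributes: Optional[dict] = None) -> bool:
    # Assumed override semantics: priority 0 forces a drop, > 0 forces a keep.
    priority = (attributes or {}).get("sampling.priority")
    if priority is not None:
        return priority > 0
    threshold = int(percentage / 100.0 * NUM_BUCKETS)
    # Only the trace ID (plus seed) feeds the hash, so every span of a trace,
    # on every collector using the same seed, gets the same decision.
    bucket = fnv1a_32(hash_seed.to_bytes(4, "big") + trace_id) >> 18  # top 14 bits
    return bucket < threshold


rng = random.Random(42)
trace_ids = [rng.getrandbits(128).to_bytes(16, "big") for _ in range(20_000)]
kept = sum(should_sample(t, 25.0) for t in trace_ids)
print(f"kept {kept / len(trace_ids):.1%} of traces")
```

Because the decision is a pure function of the trace ID, spans of one trace are never split between "kept" and "dropped", which is what makes head sampling safe without coordination.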
  • 18. Recommended Processor Configuration
    Traces: memory_limiter -> any sampling processors -> batch -> any other processors
    Metrics: memory_limiter -> any filtering processors -> batch -> any other processors
    Memory limiter
    ● ballast_size_mib must match the --mem-ballast-size-mib command line parameter.
    ● Trigger GC with either limit_mib / spike_limit_mib or limit_percentage / spike_limit_percentage.
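How the two memory-limiter settings interact can be modeled with a small class. This is a simplified sketch, not the processor's implementation: it assumes the soft limit is `limit_mib - spike_limit_mib`, that data is refused at or above the soft limit, that GC is forced at or above `limit_mib`, and that the ballast is subtracted from heap usage. The example values are the memory_limiter settings from the full configuration example (limit 448, spike 64, ballast 192).

```python
from dataclasses import dataclass


@dataclass
class MemoryLimiter:
    limit_mib: int
    spike_limit_mib: int
    ballast_size_mib: int = 0  # must equal --mem-ballast-size-mib

    def __post_init__(self) -> None:
        if not 0 < self.spike_limit_mib < self.limit_mib:
            raise ValueError("spike_limit_mib must be > 0 and < limit_mib")

    @classmethod
    def from_percentages(cls, total_mib: int, limit_percentage: int,
                         spike_limit_percentage: int,
                         ballast_size_mib: int = 0) -> "MemoryLimiter":
        # Percentage variant: limits derived from the memory available to the process.
        return cls(total_mib * limit_percentage // 100,
                   total_mib * spike_limit_percentage // 100,
                   ballast_size_mib)

    @property
    def soft_limit_mib(self) -> int:
        return self.limit_mib - self.spike_limit_mib

    def check(self, heap_alloc_mib: int) -> dict:
        # Assumption: the ballast is subtracted so it never counts as real usage.
        usage = heap_alloc_mib - self.ballast_size_mib
        return {
            "refuse_data": usage >= self.soft_limit_mib,  # back-pressure to receivers
            "force_gc": usage >= self.limit_mib,
        }


limiter = MemoryLimiter(limit_mib=448, spike_limit_mib=64, ballast_size_mib=192)
for heap in (500, 600, 650):
    print(heap, limiter.check(heap))
```

The spike limit is headroom: data is refused early enough that a burst arriving between two checks should not push the process past the hard limit.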
  • 19. Collector Contrib Processors
    ● Kubernetes
      ○ Adds metadata from the pod
    ● Metrics Transform
      ○ Renames/aggregations within individual metrics
    ● Resource Detection
      ○ OTEL_RESOURCE environment variable
      ○ GCE metadata server
      ○ EC2 instance metadata server
    ● Routing
      ○ Routes to a particular exporter based on an incoming header value
    TODO
    ● Span data sharding by TraceID
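Header-based routing reduces to a table lookup with a fallback. The sketch below assumes a single routing header, a table from header value to exporter names, and a default list used when the header is missing or unknown; the function name, tenant values, and exporter names are all hypothetical.

```python
from typing import Dict, List, Mapping


def route(headers: Mapping[str, str], from_header: str,
          table: Dict[str, List[str]], default_exporters: List[str]) -> List[str]:
    """Pick exporter names for one request based on a single header value."""
    wanted = from_header.lower()
    for key, value in headers.items():
        # HTTP header names are case-insensitive; gRPC metadata keys are lowercase.
        if key.lower() == wanted:
            return table.get(value, default_exporters)
    return default_exporters


# Hypothetical tenant-to-exporter table.
TABLE = {"acme": ["jaeger/acme"], "globex": ["jaeger/globex", "logging"]}
DEFAULT = ["jaeger"]

print(route({"X-Tenant": "acme"}, "X-Tenant", TABLE, DEFAULT))
print(route({"x-tenant": "unknown"}, "X-Tenant", TABLE, DEFAULT))
print(route({}, "X-Tenant", TABLE, DEFAULT))
```

Falling back to a default list rather than dropping data means an unexpected or missing tenant header still reaches some backend.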
  • 20. Collector Bundled Exporters
    Traces
    ● File
      ○ JSON format
    ● Jaeger
      ○ v2 gRPC
    ● Kafka
      ○ OTLP, Jaeger, Zipkin
    ● Logging
      ○ Debugging
    ● OpenCensus
    ● OTLP (OpenTelemetry Protocol)
    ● Zipkin
      ○ v2 JSON or Protobuf
    Metrics
    ● File
      ○ JSON format
    ● Logging
      ○ Debugging
    ● OpenCensus
    ● OTLP (OpenTelemetry Protocol)
    ● Prometheus
      ○ Metrics endpoint for Prometheus to pull from
    ● Prometheus Remote Write
      ○ Pushes metrics in Prometheus TimeSeries format (Cortex, etc.)
  • 21. Collector Contrib Exporters
    Traces
    ● AlibabaCloud LogService
    ● AWS X-Ray
    ● Azure Monitor
    ● Datadog
    ● Elastic
    ● Honeycomb
    ● Jaeger v1 Thrift
    ● AWS Kinesis (Jaeger proto)
    ● New Relic
    ● SignalFx APM
    ● Sentry
    ● Stackdriver
    Metrics
    ● AlibabaCloud LogService
    ● AWS CloudWatch EMF
    ● Carbon
    ● Datadog
    ● Elastic
    ● New Relic
    ● SignalFx
    ● Splunk HEC
    ● Stackdriver
  • 22. Vendor Hosted Exporters
    Traces
    ● Dynatrace OneAgent
    ● Lightstep Launchers
    Metrics
    ● Dynatrace OneAgent
    ● Lightstep Launchers
  • 23. Full Configuration File Example

    receivers:
      otlp:
        protocols:
          grpc:
            max_recv_msg_size_mib: 32
            max_concurrent_streams: 16
            read_buffer_size: 1024
            write_buffer_size: 1024
            keepalive:
              server_parameters:
                max_connection_idle: 10s

    processors:
      memory_limiter:
        ballast_size_mib: 192
        check_interval: 5s
        limit_mib: 448
        spike_limit_mib: 64
      batch:
        send_batch_size: 64
        timeout: 15s

    exporters:
      jaeger:
        endpoint: jaeger.monitoring.svc.storefront-development.local.:14250
        timeout: 10s
        sending_queue:
          enabled: true
          num_consumers: 2
          queue_size: 10
        retry_on_failure:
          enabled: true
          initial_interval: 10s
          max_interval: 60s
          max_elapsed_time: 10m
      prometheusremotewrite:
        namespace: "monitoring"
        sending_queue:
          enabled: true
          num_consumers: 2
          queue_size: 10
        retry_on_failure:
          enabled: true
          initial_interval: 10s
          max_interval: 60s
          max_elapsed_time: 10m
        endpoint: ":8888"
        ca_file: "/etc/pki/tls/certs/carbon-lb.pem"
        write_buffer_size: 524288
        headers:
          Prometheus-Remote-Write-Version: "0.1.0"
          X-Scope-OrgID: "234"

    extensions:
      health_check:
        port: 13133
      zpages:
        endpoint: ":55679"

    service:
      extensions: [zpages, health_check]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [jaeger]
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [prometheusremotewrite]
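To see how long an exporter keeps retrying with the retry_on_failure values in the configuration above (initial_interval 10s, max_interval 60s, max_elapsed_time 10m), a rough schedule can be computed. This is a sketch under assumptions: the 1.5 growth multiplier, the absence of jitter, and ignoring the time spent in each send attempt are all simplifications; the real exporter may randomize its intervals.

```python
from typing import List


def retry_schedule(initial_interval_s: float, max_interval_s: float,
                   max_elapsed_time_s: float, multiplier: float = 1.5) -> List[float]:
    """Waits between retries until the next wait would exceed max_elapsed_time.

    Assumptions: fixed multiplier, no jitter, zero time spent per send attempt.
    """
    waits: List[float] = []
    elapsed, interval = 0.0, float(initial_interval_s)
    while elapsed + interval <= max_elapsed_time_s:
        waits.append(interval)
        elapsed += interval
        # Grow the interval geometrically, but never beyond max_interval.
        interval = min(interval * multiplier, float(max_interval_s))
    return waits


# retry_on_failure values: 10s initial, 60s max interval, 10m max elapsed.
schedule = retry_schedule(10, 60, 10 * 60)
print(len(schedule), schedule[:6], sum(schedule))
```

Under these assumptions the interval reaches the 60s cap after five retries, so most of the ten-minute budget is spent on fixed one-minute waits; a failing backend holds each queued batch for close to ten minutes, which is worth weighing against the small `queue_size` of 10.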
  • 24. Collector Command Line Example

    /usr/local/bin/otelcol --config=/usr/local/etc/otel-collector-config.yaml --mem-ballast-size-mib=192 --log-level=DEBUG
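Because ballast_size_mib in the config must match --mem-ballast-size-mib on the command line, and the processor order from the recommended configuration slide matters, a small pre-deployment check can catch both mistakes. This is a hypothetical helper, not a collector feature: the config is transcribed as a Python dict to avoid assuming a YAML library, and the processor type names in `BEFORE_BATCH` are assumptions.

```python
import argparse
from typing import Dict, List

# Hypothetical transcription of the relevant parts of the full configuration example.
CONFIG: Dict = {
    "processors": {
        "memory_limiter": {"ballast_size_mib": 192, "limit_mib": 448, "spike_limit_mib": 64},
        "batch": {"send_batch_size": 64, "timeout": "15s"},
    },
    "service": {"pipelines": {
        "traces": {"processors": ["memory_limiter", "batch"]},
        "metrics": {"processors": ["memory_limiter", "batch"]},
    }},
}
ARGV = ["--config=/usr/local/etc/otel-collector-config.yaml",
        "--mem-ballast-size-mib=192", "--log-level=DEBUG"]

# Processor types that should run before batch (names assumed).
BEFORE_BATCH = {"probabilistic_sampler", "tail_sampling", "filter"}


def check_setup(config: Dict, argv: List[str]) -> List[str]:
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--mem-ballast-size-mib", type=int, default=0)
    args, _unknown = parser.parse_known_args(argv)  # ignore --config, --log-level

    problems: List[str] = []
    limiter = config["processors"].get("memory_limiter", {})
    if limiter.get("ballast_size_mib", 0) != args.mem_ballast_size_mib:
        problems.append("ballast_size_mib does not match --mem-ballast-size-mib")

    for name, pipeline in config["service"]["pipelines"].items():
        # Strip "/name" suffixes such as "batch/large" to get the processor type.
        types = [p.split("/")[0] for p in pipeline.get("processors", [])]
        if types and types[0] != "memory_limiter":
            problems.append(f"{name}: memory_limiter should be the first processor")
        if "batch" in types:
            for late in types[types.index("batch") + 1:]:
                if late in BEFORE_BATCH:
                    problems.append(f"{name}: {late} should run before batch")
    return problems


print(check_setup(CONFIG, ARGV) or "OK")
```

Running a check like this in CI before rolling out a new collector config is cheaper than discovering a ballast mismatch through premature data refusal in production.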
  • 25. Collector Docker Images
    ● otel/opentelemetry-collector
      ○ Core receivers, processors, and exporters bundled in
    ● otel/opentelemetry-collector-contrib
      ○ All core and contrib receivers, processors, and exporters bundled in
    ● OpenTelemetry Collector builder
      ○ https://github.com/observatorium/opentelemetry-collector-builder
  • 26. Other Collector Installs
    ● RPM
      ○ Produced by the opentelemetry-collector build
    ● Debian
      ○ Produced by the opentelemetry-collector build
  • 27. Observing the Collector
    ● health_check
      ○ http://<hostname>:13133/ returns basic pipeline availability
    ● zpages
      ○ RPC metric aggregations at http://<hostname>:55679/debug/rpcz
      ○ Trace summaries at http://<hostname>:55679/debug/tracez
    ● prometheus
      ○ Pipeline metrics scrape endpoint at http://<hostname>:8888/metrics
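The three observation endpoints lend themselves to a tiny probe script. This sketch uses only the Python standard library and the default ports listed on this slide; the metric name in `SAMPLE` is made up, since the collector's own metric names vary between versions.

```python
import urllib.request
from typing import Dict


def collector_endpoints(host: str) -> Dict[str, str]:
    """Default ports from the slide: health_check 13133, zpages 55679, own metrics 8888."""
    return {
        "health": f"http://{host}:13133/",
        "rpcz": f"http://{host}:55679/debug/rpcz",
        "tracez": f"http://{host}:55679/debug/tracez",
        "metrics": f"http://{host}:8888/metrics",
    }


def is_healthy(url: str, timeout_s: float = 2.0) -> bool:
    """True only for HTTP 200; refused connections, timeouts and error codes give False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        return False


def sum_metric(exposition: str, name: str) -> float:
    """Sum every series of one metric in Prometheus text format, across label sets."""
    total = 0.0
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(name + "{"):
            fields = line[line.rindex("}") + 1:].split()
        elif line.startswith(name + " "):
            fields = line[len(name):].split()
        else:
            continue
        total += float(fields[0])  # the value; an optional trailing timestamp is ignored
    return total


# Hypothetical scrape output; real collector metric names differ by version.
SAMPLE = """\
# TYPE demo_sent_spans counter
demo_sent_spans{exporter="jaeger"} 120
demo_sent_spans{exporter="otlp"} 30 1600000000000
demo_sent_spans_total 99
"""
print(collector_endpoints("localhost")["health"])
print(sum_metric(SAMPLE, "demo_sent_spans"))
```

In practice `is_healthy(collector_endpoints(host)["health"])` is what a load balancer or Kubernetes probe does, while `sum_metric` over the `/metrics` output gives a quick per-exporter throughput total without a full Prometheus server.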
  • 28. Current Gotchas
    ● Errors are propagated back through pipelines and through every instance in the chain
      ○ Errors reported by SDK exporters in the applications may originate two hops downstream
    ● TraceID sharding does not work correctly yet
      ○ Tail-based sampling is only possible when running a single collector instance
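Tail-based sampling across several collector instances needs every span of a trace to land on the same instance. One common way to do that is a consistent-hash ring keyed by TraceID, sketched below. This is an illustration of the idea only, not the collector's implementation (the slide notes TraceID sharding is not working yet); the instance names and the number of virtual nodes are hypothetical.

```python
import bisect
import hashlib
from typing import List


class TraceIdRing:
    """Consistent-hash ring that maps a trace ID to one collector instance."""

    def __init__(self, instances: List[str], vnodes: int = 100) -> None:
        # Each instance gets several virtual points to even out the load.
        self._ring = sorted(
            (self._hash(f"{instance}#{i}".encode()), instance)
            for instance in instances
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def instance_for(self, trace_id: bytes) -> str:
        # First ring point clockwise from the hash, wrapping around at the end.
        idx = bisect.bisect(self._points, self._hash(trace_id)) % len(self._points)
        return self._ring[idx][1]


ring = TraceIdRing(["otelcol-0", "otelcol-1", "otelcol-2"])
trace_id = bytes.fromhex("4bf92f3577b34da6a3ce929d0e0e4736")
spans = [(trace_id, "GET /cart"), (trace_id, "SELECT cart"), (trace_id, "GET /price")]
print({ring.instance_for(tid) for tid, _ in spans})  # a single instance for the whole trace
```

Consistent hashing, rather than `hash % instance_count`, is chosen so that scaling from three to four instances only reassigns roughly a quarter of the traces, all of them to the new instance.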
  • 30. Latest Innovations
    ● Dynatrace automates manual quality validation processes using AI-assisted, SLI/SLO-based quality gates.
    ● New Relic Incident Intelligence continuously analyzes alerts and incident data to find patterns in event sequences, and suggests correlation decisions that merge incidents to further reduce alert noise.
    ● Splunk SignalFx provides high-cardinality exploration of traces across regions, hosts, versions, or users.
    ● Lightstep provides rapid root cause analysis using unlimited cardinality and a high-fidelity dataset uncompromised by head or tail sampling.
  • 31. Latest Innovations
    ● Datadog provides automated tagging and correlation of logs, so you can jump from any log entry to related metrics.
    ● Honeycomb lets you break down on every dimension in your data, both the obvious fields and the surprising ones.
    ● The Grafana Loki datasource supports switching from metrics to logs with label filters preserved.
    ● Elastic Observability brings your logs, metrics, and APM traces together at scale in a single stack.

Editor's Notes

  1. Copyright 2020, The OpenTelemetry Authors. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.