Exploring the power of OpenTelemetry on Kubernetes

Modern cloud-native applications are incredibly complex systems. Keeping these systems healthy and meeting the SLAs we promise our customers is crucial for long-term success. In this session, we will dive into the three pillars of observability (metrics, logs, and traces), the foundation of successful troubleshooting in distributed systems. You'll learn the gotchas and pitfalls of rolling out the OpenTelemetry stack on Kubernetes to effectively collect all your signals without worrying about vendor lock-in. Additionally, we will replace the parts of the Prometheus stack that scrape metrics with the OpenTelemetry collector and operator.

Shared by Red Hat Developers.

  1. Exploring the power of OpenTelemetry on Kubernetes
     Pavol Loffay, Principal Software Engineer
     Benedikt Bongartz, Senior Software Engineer
  2. Bio
     ● Pavol Loffay
     ● Principal Software Engineer
     ● OpenTelemetry and Jaeger maintainer
     ● Follow me @ploffay
  3. Bio
     ● Benedikt Bongartz
     ● Senior Software Engineer
     ● Working on OpenTelemetry
     ● Contact
       ○ @frzifus@fosstodon.org
       ○ frzifus:matrix.org
  4. Agenda
     ● History of observability
     ● Intro to OpenTelemetry (OTEL)
       ○ Instrumentation
       ○ Collector
     ● OpenTelemetry Kubernetes operator
       ○ Instrumentation CR
       ○ Collector CR
     ● Data collection use cases: traces, metrics, logs
     ● What is next?
  5. A peek into the history of OSS observability: technology advancement and less vendor lock-in
  6. OSS distributed tracing history (timeline figure)
     Milestones: X-Trace (2007), Dapper paper (2010), Zipkin (2012), OpenTracing (2015), Apache SkyWalking (2015), OpenCensus (2016), Jaeger (2017), OpenTelemetry, DataDog (2019), New Relic (2020), Hypertrace, Grafana Tempo (2020), SigNoz (2021).
     Other figure labels: data model, bespoke APIs, SDK, collector, auto-instrumentation; spec, API, instrumentation libraries; some instrumentation, reusable instrumentation libraries, agents/auto-instrumentation.
  7. OpenTelemetry a.k.a. OTEL: open-source data collection project
  8. OpenTelemetry
     ● Open source!
     ● Cloud Native Computing Foundation (CNCF)
     ● Vendor-neutral telemetry data collection
     ● Specification, API, SDK, data model and protocol (OTLP), auto-instrumentation, collector
     ● Helm chart, Kubernetes operator
  9. OpenTelemetry is only data collection. It is not a platform, storage, or query API.
  10. OpenTelemetry instrumentation: how telemetry data is created
  11. Data collection and platform (diagram)
     ● The services frontend, driver, and customer are connected over HTTP.
     ● Each one emits logs, metrics, and spans: logger.log(), meter.record(), tracer.span().start().
     ● The data is sent to the collector (data collection), which forwards it to the platform (Prometheus/Jaeger/Loki).
  12. Instrumentation
     ● Manual / explicit
       ○ Better control
       ○ Might have better performance
       ○ Requires code change, recompilation, and redeploy
     ● Direct integration in runtimes, e.g. Quarkus, WildFly
     ● Auto-instrumentation / agent
       ○ Easy to roll out
       ○ Wide framework coverage
     Examples shown on the slide:
       java -javaagent:/opt/javaagent.jar -jar target/demo-0.0.1-SNAPSHOT.jar
       tracing.enabled = true
       io.opentelemetry:opentelemetry-api:1.22.0
       io.opentelemetry.instrumentation:opentelemetry-grpc-1.6:1.19.1
  13. OpenTelemetry auto-instrumentation
     ● Languages: Java, C#, NodeJS, Python, PHP, Golang
     ● Agent on the command line: java -javaagent:/opt/javaagent.jar -jar target/demo-0.0.1-SNAPSHOT.jar
     ● Environment variables that enable the agents:
       ○ Java: JAVA_TOOL_OPTIONS = -javaagent:/otel-auto-instrumentation/javaagent.jar
       ○ .NET: CORECLR_ENABLE_PROFILING, CORECLR_PROFILER_PATH, DOTNET_ADDITIONAL_DEPS, DOTNET_SHARED_STORE
       ○ NodeJS: NODE_OPTIONS = --require /otel-auto-instrumentation/autoinstrumentation.js
       ○ Python: PYTHONPATH = /otel-auto-instrumentation/opentelemetry/instrumentation/auto_instrumentation:/otel-auto-instrumentation
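Slide 13 lists the environment variables that switch the agents on. Below is a minimal Python sketch that merges such a setting into a container's existing environment. The variable values come from the slide, with ':' as the PYTHONPATH separator. The `INJECTIONS` table, the `inject_env` helper, and the merge rules (append to existing options, prepend to PYTHONPATH, skip if already present) are assumptions for illustration, not the operator's Go implementation. .NET is omitted because the slide gives only its variable names.

```python
# Hypothetical sketch: add auto-instrumentation settings to a container env.
# Variable values are from slide 13; the merge rules are assumptions.
INJECTIONS = {
    # language: (variable, value, separator, prepend?)
    "java": ("JAVA_TOOL_OPTIONS",
             "-javaagent:/otel-auto-instrumentation/javaagent.jar", " ", False),
    "nodejs": ("NODE_OPTIONS",
               "--require /otel-auto-instrumentation/autoinstrumentation.js", " ", False),
    "python": ("PYTHONPATH",
               "/otel-auto-instrumentation/opentelemetry/instrumentation/auto_instrumentation"
               ":/otel-auto-instrumentation", ":", True),
}


def inject_env(env: dict[str, str], language: str) -> dict[str, str]:
    """Return a copy of `env` with the agent setting for `language` merged in."""
    name, value, sep, prepend = INJECTIONS[language]
    merged = dict(env)  # leave the caller's dict untouched
    existing = merged.get(name, "")
    if not existing:
        merged[name] = value
    elif value not in existing:  # avoid injecting the agent twice
        # PYTHONPATH: put the agent directories first so they are searched first.
        # Option strings: keep the user's flags and add the agent after them.
        merged[name] = value + sep + existing if prepend else existing + sep + value
    return merged


print(inject_env({"JAVA_TOOL_OPTIONS": "-Xmx512m"}, "java"))
# prints {'JAVA_TOOL_OPTIONS': '-Xmx512m -javaagent:/otel-auto-instrumentation/javaagent.jar'}
```

Appending rather than overwriting keeps flags that users already set, such as heap sizes, and the membership check makes the merge idempotent.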
  14. OpenTelemetry collector: how telemetry data is collected and exported to a platform
  15. OpenTelemetry collector
     ● Written in Go
     ● Docs: opentelemetry.io/docs/collector/
     ● Two distributions
       ○ Core: opentelemetry-collector-releases/distributions/otelcol
       ○ Contrib: opentelemetry-collector-releases/distributions/otelcol-contrib
       ○ Or build your own
     ● Many components: open-telemetry/opentelemetry-collector-contrib
     ● Distributed as a binary, a container image, and APK/DEB/RPM packages
  16. Instrumentation and collector (diagram)
     ● App/workload/pod: (auto-)instrumentation, API/SDK, RPC framework (e.g. servlet)
     ● Collector receivers: Jaeger, OpenTelemetry, Zipkin, OpenCensus
     ● Collector processors: batching, PII filter, re-labeling, custom
     ● Collector exporters: Splunk, Kafka, Jaeger
     ● Also shown: batching, data drop
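The receiver, processor, and exporter flow in the slide 16 diagram can be modeled in a few lines. This is a toy sketch, not the collector's component API. `pii_filter`, `drop_if`, and `run_pipeline` are hypothetical names, and spans are plain dicts. Batching here is by size only, while the real batch processor also flushes on a timeout.

```python
# Toy model of a collector pipeline: receivers -> processors -> exporters.
# All names are hypothetical; the real collector is written in Go.
from typing import Callable

Span = dict[str, str]
Batch = list[Span]
Processor = Callable[[Batch], Batch]
Exporter = Callable[[Batch], None]


def pii_filter(keys: set[str]) -> Processor:
    """Processor that strips the given attribute keys from every span."""
    def process(batch: Batch) -> Batch:
        return [{k: v for k, v in span.items() if k not in keys} for span in batch]
    return process


def drop_if(key: str, value: str) -> Processor:
    """Processor that drops spans whose attribute `key` equals `value`."""
    def process(batch: Batch) -> Batch:
        return [span for span in batch if span.get(key) != value]
    return process


def run_pipeline(received: Batch, processors: list[Processor],
                 exporters: list[Exporter], batch_size: int) -> None:
    """Split received spans into size-based batches, process them, then fan out."""
    for start in range(0, len(received), batch_size):
        batch = received[start:start + batch_size]
        for process in processors:
            batch = process(batch)
        if batch:  # skip exporting when every span was dropped
            for export in exporters:
                export(batch)
```

For example, `run_pipeline(spans, [pii_filter({"user.email"}), drop_if("name", "GET /health")], [exported.append], batch_size=2)` removes the e-mail attribute, drops health-check spans, and hands each remaining batch to every exporter.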
  17. OpenTelemetry Kubernetes Operator: manages the OpenTelemetry collector and instrumentation
  18. OpenTelemetry operator
     ● https://github.com/open-telemetry/opentelemetry-operator
     ● Custom Resource Definitions (CRDs) provided by the operator:
       ○ opentelemetrycollectors.opentelemetry.io (short name otelcol)
       ○ instrumentations.opentelemetry.io (short name otelinst)
     ● The operator can be installed via OLM, a Helm chart, or Kubernetes manifests
  19. Collector CRD
  20. OpenTelemetry Operator: Collector kind
     ● Deployment modes (.spec.mode): sidecar, deployment, daemonset, statefulset
     ● Autoscaling, or manual scaling with kubectl scale --replicas=5 otelcols/simplest
     ● Expose outside of the cluster: .spec.ingress
     ● Use a custom collector image: .spec.image
  21. OTEL-Collector configuration

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  mode: deployment
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:14250
    processors: # <optional>
      batch:
        send_batch_size: 10000
        timeout: 10s
      resourcedetection/openshift:
        detectors: [openshift]
        timeout: 2s
        override: false
    exporters:
      logging:
        loglevel: debug
    extensions: # <optional>
      headers_setter:
        headers:
          - key: X-Scope-OrgID
            from_context: tenant_id
          - key: User-ID
            value: user_id
    service:
      extensions: [headers_setter, ...]
      pipelines:
        traces: # data source: traces, metrics, logs
          receivers: [otlp]
          processors: [batch, ...]
          exporters: [logging]
```

     Pod annotation for sidecar injection: sidecar.opentelemetry.io/inject: "true"
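Every name listed under `service` in the slide 21 configuration must refer to a component defined in the matching top-level section. The collector refuses to start when a pipeline references an undefined component. The hypothetical `undefined_components` function below sketches that rule over the parsed configuration, which is modeled as plain dicts. It illustrates the check and is not the collector's validation code.

```python
# Hypothetical sketch of a config reference check: every component listed in
# service.extensions or in a pipeline must be defined in its top-level section.
SECTIONS = ("receivers", "processors", "exporters")


def undefined_components(config: dict) -> list[str]:
    """Return one message per referenced-but-undefined component."""
    errors = []
    service = config.get("service") or {}
    # A YAML key with no value (e.g. a bare `processors:`) parses as None.
    defined_extensions = config.get("extensions") or {}
    for ext in service.get("extensions") or []:
        if ext not in defined_extensions:
            errors.append(f"extension {ext!r} is not defined")
    for pipeline, spec in (service.get("pipelines") or {}).items():
        for section in SECTIONS:
            defined = config.get(section) or {}
            for name in spec.get(section) or []:
                if name not in defined:
                    # "receivers" -> "receiver", and so on
                    errors.append(f"{pipeline}: {section[:-1]} {name!r} is not defined")
    return errors
```

Component IDs such as `resourcedetection/openshift` are plain dictionary keys in this model, so they are matched as whole strings.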
  22. OpenTelemetry Operator: Collector kind

```shell
kubectl apply -f - <<EOF
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
      jaeger:
        protocols:
          grpc:
          thrift_compact:
    processors:
    exporters:
      jaeger:
        endpoint: jaeger-collector.headless:14250
    service:
      pipelines:
        traces:
          receivers: [otlp, jaeger]
          processors: []
          exporters: [jaeger]
EOF
```

     Pod annotation for sidecar injection: sidecar.opentelemetry.io/inject: "true"
  23. OpenTelemetry Operator: Collector kind, processors

```yaml
processors:
  k8sattributes: # add k8s attributes
  attributes: # delete the db.table attribute
    actions:
      - key: db.table
        action: delete
  resource: # add the k8s cluster name
    attributes:
      - key: k8s.cluster.name
        from_attribute: k8s-cluster
        action: insert
```

     Other use cases:
     ● Tail-based sampling
     ● Deriving RED (rate, errors, duration) metrics from traces
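The two actions on slide 23 behave differently on existing data. As we read the attributes and resource processor docs, `delete` removes a key, while `insert` adds a key only if it is absent. The inserted value comes from `value` or is copied from another attribute named by `from_attribute`. The `apply_actions` function below is a simplified, hypothetical model of just these two actions. Other actions such as update, upsert, and hash are not modeled.

```python
# Simplified model of the `delete` and `insert` attribute actions (slide 23).
def apply_actions(attrs: dict[str, str], actions: list[dict[str, str]]) -> dict[str, str]:
    out = dict(attrs)  # work on a copy
    for action in actions:
        key, kind = action["key"], action["action"]
        if kind == "delete":
            out.pop(key, None)
        elif kind == "insert":
            if key in out:
                continue  # insert never overwrites an existing attribute
            if "from_attribute" in action:
                source = action["from_attribute"]
                if source in out:  # nothing to copy if the source is missing
                    out[key] = out[source]
            else:
                out[key] = action["value"]
        else:
            raise ValueError(f"action {kind!r} is not modeled in this sketch")
    return out
```

Because `insert` never overwrites, a resource that already carries `k8s.cluster.name` keeps its value even if `k8s-cluster` says something else.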
  24. Instrumentation CRD
  25. OpenTelemetry Operator: Instrumentation kind

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
spec:
  exporter:
    endpoint: http://otel-collector:4317
  propagators:
    - tracecontext
    - baggage
    - b3
  sampler:
    type: parentbased_traceidratio
    argument: "0.25"
  resource:
    attributes:
      k8s.cluster.name: test-1-23
```

```yaml
# Pod annotations
instrumentation.opentelemetry.io/inject-java: "true"
instrumentation.opentelemetry.io/inject-python: "true"
instrumentation.opentelemetry.io/inject-nodejs: "true"
instrumentation.opentelemetry.io/inject-dotnet: "true"
instrumentation.opentelemetry.io/inject-sdk: "true"
instrumentation.opentelemetry.io/container-names: "app"
```
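With `parentbased_traceidratio` and argument "0.25", root spans are sampled at roughly 25% and child spans follow their parent's decision. The sketch below shows one way to make the ratio decision deterministic from the trace ID. It compares the lower 64 bits of the ID against `ratio * 2**64`, which is how we understand the OpenTelemetry Python SDK's `TraceIdRatioBased` sampler works. Other SDKs may differ in detail, and the function names are hypothetical.

```python
# Sketch of parentbased_traceidratio: root spans use a trace-ID ratio check,
# child spans inherit the parent's decision (the parent-based default).
from typing import Optional

TRACE_ID_LIMIT = (1 << 64) - 1  # mask for the lower 64 bits of a trace ID


def traceidratio_sampled(trace_id: int, ratio: float) -> bool:
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


def parentbased_traceidratio(trace_id: int, ratio: float,
                             parent_sampled: Optional[bool]) -> bool:
    """`parent_sampled` is None for a root span, else the parent's decision."""
    if parent_sampled is not None:
        return parent_sampled
    return traceidratio_sampled(trace_id, ratio)
```

The decision depends only on the trace ID, so every service configured with the same ratio makes the same choice for a given trace. Traces are therefore kept or dropped as a whole.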
  26. 26. Collecting traces on Kubernetes
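  27. OpenTelemetry architecture: tracing (diagram)
     ● In the pod, container 1 runs the application together with the OpenTelemetry API and instrumentation (the OTEL client).
     ● The client pushes telemetry to the platform (Jaeger/Loki/Splunk…).
     ● Injection is requested with the pod annotation instrumentation.opentelemetry.io/inject-java: "true" and configured by an Instrumentation resource:

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-java-instrumentation
spec:
  exporter:
    . . .
```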
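The value of an `inject-java` style annotation can take several forms, and the operator README lists four: "true", "false", an Instrumentation name in the pod's namespace, or "namespace/name". The hypothetical `resolve_instrumentation` helper below sketches how such a value might be resolved to a specific resource. Rejecting "true" when the namespace holds zero or several Instrumentations is an assumption of this sketch.

```python
# Hypothetical sketch: resolve an instrumentation.opentelemetry.io/inject-<lang>
# annotation value to the Instrumentation resource to use.
from typing import Optional


def resolve_instrumentation(value: str, pod_namespace: str,
                            available: dict[str, list[str]]) -> Optional[tuple[str, str]]:
    """Return (namespace, name) to inject, or None when injection is disabled.

    `available` maps each namespace to the Instrumentation names defined in it.
    """
    value = value.strip()
    if value.lower() == "false":
        return None
    if value.lower() == "true":
        names = available.get(pod_namespace, [])
        if len(names) != 1:  # assumption: ambiguous or missing is an error
            raise LookupError(
                f"expected one Instrumentation in {pod_namespace!r}, found {len(names)}")
        return pod_namespace, names[0]
    namespace, _, name = value.rpartition("/")
    namespace = namespace or pod_namespace  # bare name: look in the pod's namespace
    if name not in available.get(namespace, []):
        raise LookupError(f"Instrumentation {namespace}/{name} not found")
    return namespace, name
```

For example, a pod in namespace `app` annotated with `observability/shared` resolves to the `shared` Instrumentation in the `observability` namespace. All of these names are invented for the example.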
  32. Collecting metrics on Kubernetes
  33. Metrics architecture (diagram)
     ● On node 1, the OTEL collector pulls metrics from pod 1 and pod 2.
     ● It pushes them to the platform (Prometheus/Thanos/Splunk…).
  34. OTEL-Collector configuration

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  targetAllocator:
    enabled: true
    allocationStrategy: least-weighted
    replicas: 3
    serviceAccount: ta
    prometheusCR:
      enabled: true
      serviceMonitorSelector:
        team: backend-1
  config: |
    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: 'otel-collector'
              scrape_interval: 10s
              static_configs: ...
```
  35. OTEL-Collector configuration

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  mode: statefulset
  targetAllocator:
    enabled: true
    allocationStrategy: least-weighted
    replicas: 3
    serviceAccount: ta
    prometheusCR: {}
  config: |
    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: 'otel-collector'
              scrape_interval: 10s
              static_configs:
                - targets: ['0.0.0.0:8888']
        target_allocator:
          endpoint: http://my-ta-svc
          interval: 30s
          collector_id: collector-1
```
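Slides 34 and 35 set `allocationStrategy: least-weighted`. The idea behind that strategy is that each scrape target goes to the collector that currently holds the fewest targets. The sketch below is a simplified, static model of that idea. It is not the target allocator's Go code, which also reacts to targets and collectors appearing and disappearing. The collector names are hypothetical.

```python
# Simplified, static model of least-weighted target allocation.
def allocate_least_weighted(targets: list[str],
                            collectors: list[str]) -> dict[str, list[str]]:
    assignment: dict[str, list[str]] = {c: [] for c in collectors}
    for target in targets:
        # min() returns the first collector with the lowest load, so ties are
        # broken by list order and the result is deterministic.
        chosen = min(collectors, key=lambda c: len(assignment[c]))
        assignment[chosen].append(target)
    return assignment
```

With three collectors and seven targets, the loads end up 3/2/2. No collector ever holds more than one target above any other, which is what spreads the scrape work evenly.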
  36. OTEL-Collector configuration with spec.targetAllocator.enabled: "true"

```yaml
# …
receivers:
  prometheus:
    config:
      global:
        scrape_interval: 1m
        scrape_timeout: 10s
        evaluation_interval: 1m
      scrape_configs:
        - job_name: otel-collector
          honor_timestamps: true
          scrape_interval: 10s
          scrape_timeout: 10s
          metrics_path: /metrics
          scheme: http
          follow_redirects: true
          http_sd_configs:
            - follow_redirects: false
              url: http://metrics-ta-svc:80/jobs/otel-collector/targets?collector_id=$POD_NAME
```
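In the `http_sd_configs` URL on slide 36, each collector asks the target allocator only for its own targets by passing `collector_id`. The collector is expected to resolve `$POD_NAME` to its own pod name. The response is read in Prometheus HTTP service discovery format: a JSON list of groups, each with `targets` and optional `labels`. The helpers `targets_url` and `flatten_targets` are hypothetical, percent-encoding the job name is an assumption, and no network call is made.

```python
# Hypothetical helpers around the target allocator's per-collector targets URL.
import json
from urllib.parse import quote, urlencode


def targets_url(base: str, job: str, collector_id: str) -> str:
    """Build the targets URL for one collector (collector_id = its pod name)."""
    query = urlencode({"collector_id": collector_id})
    return f"{base}/jobs/{quote(job, safe='')}/targets?{query}"


def flatten_targets(body: str) -> list[tuple[str, dict[str, str]]]:
    """Expand HTTP SD groups into (target, labels) pairs."""
    return [(target, group.get("labels", {}))
            for group in json.loads(body)
            for target in group["targets"]]
```

For example, `targets_url("http://metrics-ta-svc:80", "otel-collector", "otel-collector-0")` rebuilds the slide's URL for a collector pod with that hypothetical name.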
  37. OTEL-Collector configuration
     Be aware of:
     ● Prometheus does not support UTF-8 labels [prometheus#11700] [grafana#42615]. Therefore, the OpenTelemetry Prometheus Remote Write exporter converts metrics to the Prometheus naming convention [otel#normalization].
     ● Another useful metrics receiver: [host-metrics]
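The first caveat on slide 37 means that OpenTelemetry names with dots or non-ASCII characters cannot be sent to Prometheus unchanged. The sketch below shows a simplified subset of Prometheus-style sanitization. Characters outside the allowed set become `_`, a leading digit gets a `_` prefix, and monotonic counters get a `_total` suffix. The prefix choice and the function names are assumptions of this sketch. The exporter's actual rules, including unit suffixes, are in the [otel#normalization] reference.

```python
# Simplified subset of Prometheus-style name sanitization.
import re

_INVALID_LABEL_CHARS = re.compile(r"[^a-zA-Z0-9_]")    # labels: [a-zA-Z_][a-zA-Z0-9_]*
_INVALID_METRIC_CHARS = re.compile(r"[^a-zA-Z0-9_:]")  # metric names may also use ":"


def _fix_leading_digit(name: str) -> str:
    # Assumed prefix; the real exporter may use a different one.
    return "_" + name if name[:1].isdigit() else name


def sanitize_label(name: str) -> str:
    return _fix_leading_digit(_INVALID_LABEL_CHARS.sub("_", name))


def sanitize_metric(name: str, monotonic_counter: bool = False) -> str:
    name = _fix_leading_digit(_INVALID_METRIC_CHARS.sub("_", name))
    if monotonic_counter and not name.endswith("_total"):
        name += "_total"
    return name
```

For example, the resource attribute `k8s.pod.name` becomes the label `k8s_pod_name`. Each non-ASCII character in a name such as `größe` is replaced by `_`.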
  38. Collecting logs on Kubernetes
  39. OpenTelemetry logs
     ● For metrics and traces, OpenTelemetry takes the approach of a clean-sheet design, specifies a new API, and provides full implementations of this API in multiple languages.
     ● Our approach with logs is somewhat different. For OpenTelemetry to be successful in the logging space, we need to support the existing legacy of logs and logging libraries, while offering improvements and better integration with the rest of the observability world where possible.
  40. OpenTelemetry Operator: collecting logs
     ● File log receiver
       ○ Available in the "contrib" Docker image
       ○ Example config: opentelemetry-collector-contrib/otel-collector-config.yml
       ○ Includes: /var/log/pods/*/*/*.log
       ○ Set of operators to parse data: json_parser, regex_parser, move…
       ○ Collects resource attributes: namespace, pod name/UID/restarts, container name
     ● Fluentforwardreceiver - receive data from
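The resource attributes on slide 40 can be recovered from the pod log path alone. That path has the form /var/log/pods/<namespace>_<pod>_<uid>/<container>/<restart>.log, the layout matched by the include pattern. The sketch below parses such a path with a regex modeled loosely on the contrib example config. It is a simplified illustration, and `resource_from_path` is a hypothetical name. The attribute keys match those in the logging exporter output on slide 43.

```python
# Sketch: derive k8s resource attributes from a pod log file path.
import re

POD_LOG_PATH = re.compile(
    r"^/var/log/pods/(?P<namespace>[^_/]+)_(?P<pod>[^_/]+)_(?P<uid>[a-f0-9-]{36})"
    r"/(?P<container>[^/]+)/(?P<restart_count>\d+)\.log$"
)


def resource_from_path(path: str) -> dict[str, str]:
    m = POD_LOG_PATH.match(path)
    if m is None:
        raise ValueError(f"not a pod log path: {path}")
    return {
        "k8s.namespace.name": m["namespace"],
        "k8s.pod.name": m["pod"],
        "k8s.pod.uid": m["uid"],
        "k8s.container.name": m["container"],
        # Kept as a string; slide 43 shows Str(0) for this attribute.
        "k8s.container.restart_count": m["restart_count"],
    }
```

Parsing works because Kubernetes namespace and pod names cannot contain `_`, so the underscores in the directory name unambiguously separate namespace, pod, and UID.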
  41. Log collection architecture (diagram)
     ● The Kubernetes cluster has three nodes, and each node runs a DaemonSet collector.
     ● Each collector gathers the logs of the pods on its node (pods 1 to 3).
     ● The collectors send the logs to the platform (Loki/OpenSearch/Splunk…).
  42. OpenTelemetry Operator: collecting logs, CR

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-logs
spec:
  mode: daemonset
  image: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.69.0
  volumes:
    - name: varlog
      hostPath:
        path: /var/log
    - name: varlibdockercontainers
      hostPath:
        path: /var/lib/docker/containers
  volumeMounts:
    - mountPath: /var/log
      name: varlog
      readOnly: true
    - mountPath: /var/lib/docker/containers
      name: varlibdockercontainers
      readOnly: true
```
  43. OpenTelemetry Operator: collecting logs, logging exporter output (1 of 2)

```text
2023-01-19T16:28:48.675Z info ResourceLog #0
Resource SchemaURL:
Resource attributes:
     -> k8s.namespace.name: Str(cert-manager)
     -> k8s.pod.name: Str(cert-manager-cainjector-857ff8f7cb-xmr6p)
     -> k8s.container.restart_count: Str(0)
     -> k8s.pod.uid: Str(53574bad-1d91-4880-ab87-8f5a88bfffee)
     -> k8s.container.name: Str(cert-manager)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope
LogRecord #0
ObservedTimestamp: 2023-01-19 16:28:48.674124625 +0000 UTC
Timestamp: 2023-01-18 12:58:21.446092114 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str({"log":"I0118 12:58:21.445823 1 controller.go:178] cert-manager/certificate/customresourcedefinition/controller/controller-for-certificate-customresourcedefinition \"msg\"=\"Starting EventSource\" \"source\"=\"\u0026{{%!s(*v1.Certificate=\u0026{{ } { 0 {{0 0 \u003cnil\u003e}} \u003cnil\u003e \u003cnil\u003e map[] map[] [] [] []} {\u003cnil\u003e \u003cnil\u003e \u003cnil\u003e [] [] [] [] \u003cnil\u003e \u003cnil\u003e { } false [] \u003cnil\u003e \u003cnil\u003e \u003cnil\u003e []} {[] \u003cnil\u003e \u003cnil\u003e \u003cnil\u003e \u003cnil\u003e \u003cnil\u003e \u003cnil\u003e \u003cnil\u003e}}) %!s(*cache.informerCache=\u0026{0xc000438380}) %!s(chan error=\u003cnil\u003e) %!s(func()=\u003cnil\u003e)}}\"\n","stream":"stderr","time":"2023-01-18T12:58:21.446092114Z"})
```
  44. OpenTelemetry Operator: collecting logs, logging exporter output (2 of 2)

```text
Attributes:
     -> log.file.path: Str(/var/log/pods/cert-manager_cert-manager-cainjector-857ff8f7cb-xmr6p_53574bad-1d91-4880-ab87-8f5a88bfffee/cert-manager/0.log)
     -> log: Str(I0118 12:58:21.445823 1 controller.go:178] cert-manager/certificate/customresourcedefinition/controller/controller-for-certificate-customresourcedefinition "msg"="Starting EventSource" "source"="&{{%!s(*v1.Certificate=&{{ } { 0 {{0 0 <nil>}} <nil> <nil> map[] map[] [] [] []} {<nil> <nil> <nil> [] [] [] [] <nil> <nil> { } false [] <nil> <nil> <nil> []} {[] <nil> <nil> <nil> <nil> <nil> <nil> <nil>}}) %!s(*cache.informerCache=&{0xc000438380}) %!s(chan error=<nil>) %!s(func()=<nil>)}}"
)
     -> time: Str(2023-01-18T12:58:21.446092114Z)
     -> log.iostream: Str(stderr)
Trace ID:
Span ID:
Flags: 0
```
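Slides 43 and 44 show the same record before and after parsing. The Body holds the raw Docker json-file line, and the attributes hold its decoded `log` and `time` fields plus `stream` renamed to `log.iostream`. In the collector this is done by filelog operators such as `json_parser` and `move`. The hypothetical `parse_docker_line` function below only models the resulting attributes.

```python
# Hypothetical stand-in for the json_parser + move steps behind slides 43/44.
import json


def parse_docker_line(line: str, path: str) -> dict[str, str]:
    record = json.loads(line)
    return {
        "log.file.path": path,
        "log": record["log"],              # Docker keeps the trailing newline here
        "time": record["time"],
        "log.iostream": record["stream"],  # the `move` step: stream -> log.iostream
    }
```

This explains the stray `)` on its own line on slide 44: the decoded `log` value still ends with the newline that Docker stored.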
  45. What is next?
  46. What is next in OpenTelemetry?
     ● OpenTelemetry Operator
       ○ Auto-instrumentation for web servers
     ● Profiling vision: see 0212-profiling-vision.md
     ● OpAMP (Open Agent Management Protocol): see open-telemetry/opamp-spec
  47. Thank you
     Pavol Loffay (@ploffay), Benedikt Bongartz (@frzifus)