SlideShare a Scribd company logo
1 of 36
Download to read offline
Monitoring Kubernetes with
Prometheus
Tom Wilkie, Oct 2018
❤
Tom Wilkie VP Product, Grafana Labs
Previously: Kausal, Weaveworks, Google, Acunu, Xensource
Twitter: @tom_wilkie Email: tom@grafana.com
Prometheus
Kubernetes
Monitoring & Alerting
Getting Started
Prometheus
• A monitoring & alerting system.

• Inspired by Google’s BorgMon

• Originally built by SoundCloud in
2012

• Open Source, now part of the CNCF

• Simple text-based metrics format

• Multidimensional datamodel

• Rich, concise query language
Prometheus’ data model is very simple:
<identifier> → [ (t0, v0), (t1, v1), ... ]
Timestamps are millisecond int64, values are float64
https://www.slideshare.net/Docker/monitoring-the-prometheus-way-julius-voltz-prometheus
Prometheus identifiers
http_requests_total{job=“nginx”, instances=“1.2.3.4:80”, path=“/home”, status=“200”}
http_requests_total{job=“nginx”, instances=“1.2.3.4:80”, path=“/home”, status=“500”}
http_requests_total{job=“nginx”, instances=“1.2.3.4:80”, path=“/settings”, status=“200”}
http_requests_total{job=“nginx”, instances=“1.2.3.4:80”, path=“/settings”, status=“502”}
Prometheus series selector
http_requests_total{job=“nginx”, status=~“5..”}
Building queries usually starts with a selector
PromQL: http_requests_total{job=“nginx”, status=~“5..”}
{job=“nginx”, instances=“1.2.3.4:80”, path=“/home”, status=“500”} 34
{job=“nginx”, instances=“1.2.3.4:80”, path=“/settings”, status=“502”} 56
{job=“nginx”, instances=“2.3.4.5:80”, path=“/home”, status=“500”} 76
{job=“nginx”, instances=“2.3.4.5:80”, path=“/settings”, status=“502”} 96
...
Can select vectors of values…
PromQL: http_requests_total{job=“nginx”, status=~“5..”}[1m]
{job=“nginx”, instances=“1.2.3.4:80”, path=“/home”, status=“500”} [30, 31, 32, 34]
{job=“nginx”, instances=“1.2.3.4:80”, path=“/settings”, status=“502”} [4, 24, 56, 56]
{job=“nginx”, instances=“2.3.4.5:80”, path=“/home”, status=“500”} [76, 76, 76, 76]
{job=“nginx”, instances=“2.3.4.5:80”, path=“/settings”, status=“502”} [56, 106, 5, 96]
...
And apply functions…
PromQL: rate(http_requests_total{job=“nginx”, status=~“5..”}[1m])
{job=“nginx”, instances=“1.2.3.4:80”, path=“/home”, status=“500”} 0.0666
{job=“nginx”, instances=“1.2.3.4:80”, path=“/settings”, status=“502”} 0.866
{job=“nginx”, instances=“2.3.4.5:80”, path=“/home”, status=“500”} 0.0
{job=“nginx”, instances=“2.3.4.5:80”, path=“/settings”, status=“502”} 2.43
...
And aggregate by a dimension…
PromQL: sum by (path) (rate(http_requests_total{job=“nginx”, status=~“5..”}[1m]))
{path=“/home”} 0.0666
{path=“/settings”} 3.3
...
Do binary operations…
PromQL: sum by (path) (rate(http_requests_total{job=“nginx”, status=~“5..”}[1m]))
/
sum by (path) (rate(http_requests_total{job=“nginx”}[1m]))
{path=“/home”} 0.001
{path=“/settings”} 1.0
...
Kubernetes
• Platform for managing containerized
workloads and services

• “operating system for you datacenter”

• Inspired by Google’s Borg

• Also part of the CNCF

• Distributed, fault tolerant architecture

• Rich object model for you
applications
https://thenewstack.io/myth-cloud-native-portability/
kube-state-metrics
cAdvisor
Monitoring & Alerting
What should I monitor?
USE Method
● Utilisation, Saturation, Errors…

RED Method
● Requests, Errors, Duration…

??? Method
● Expected system state…
USE Method
• cluster and node level
metrics 

• node_exporter run as a
daemonset
USE Method
CPU Utilisation:
1 - avg(rate(node_cpu{mode=“idle"}[1m]))
CPU Saturation:
sum(node_load1)/ sum(node:node_num_cpu:sum)
USE Method
• Can also look at
container level metrics
from cAdvisor…

• …and combine them
with metadata from
kube-state-metrics.
USE Method
Container CPU usage by “app” label
sum by (namespace, label_name) (
sum by (pod_name, namespace (
rate(container_cpu_usage_seconds_total[5m])
)
* on (pod_name) group_left(label_name)
label_join(kube_pod_labels, "pod_name", ",", "pod")
)
RED Method
• Metrics exposed by
components for RED-
style monitoring
RED Method
Most useful alert I’ve found:

100 * sum by(instance, job) (
rate(rest_client_requests_total{code!~”2..”}[5m])
)
/
sum by(instance, job) (
rate(rest_client_requests_total[5m])
)
??? Method
Alert expressions are invariants that describe a healthy
system
kube_deployment_spec_replicas !=
kube_deployment_status_replicas_available
rate(kube_pod_container_status_restarts_total
[15m]) > 0
??? Method
Alert expressions are invariants that describe a healthy system
(kube_pod_status_phase{phase!~”Running|Succeeded”}) > 0
sum(kube_pod_container_resource_requests_cpu_cores)
/ sum(node:node_num_cpu:sum)
>
(count(node:node_num_cpu:sum) - 1)
/ count(node:node_num_cpu:sum)
Cortex
• Horizontally scalable, HA
Prometheus

• Now part of the CNCF Sandbox

• Distributed, fault tolerant architecture

• Long term storage

• Multitenant

https://github.com/cortexproject/cortex
Getting Started
Getting setup
• github.com/coreos/prometheus-operator - Job to look after running
Prometheus on Kubernetes

• github.com/coreos/kube-prometheus - Set of configs for running all there
other things you need.

• github.com/grafana/jsonnet-libs/tree/master/prometheus-ksonnet - My
configs for running Prometheus, Alertmanager, Grafana etc

• github.com/kubernetes-monitoring/kubernetes-mixin - Joint project to
unify and improve common alerts for Kubernetes.
More reading…
https://landing.google.com/sre/book.html
https://www.youtube.com/watch?v=1oJXMdVi0mM
http://www.brendangregg.com/usemethod.html
Questions?

More Related Content

What's hot

Infrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusInfrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusMarco Pas
 
Prometheus - basics
Prometheus - basicsPrometheus - basics
Prometheus - basicsJuraj Hantak
 
Monitoring with prometheus
Monitoring with prometheusMonitoring with prometheus
Monitoring with prometheusKasper Nissen
 
Elastic Observability
Elastic Observability Elastic Observability
Elastic Observability FaithWestdorp
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusGrafana Labs
 
OpenTelemetry For Developers
OpenTelemetry For DevelopersOpenTelemetry For Developers
OpenTelemetry For DevelopersKevin Brockhoff
 
OpenTelemetry For Architects
OpenTelemetry For ArchitectsOpenTelemetry For Architects
OpenTelemetry For ArchitectsKevin Brockhoff
 
Hands-On Introduction to Kubernetes at LISA17
Hands-On Introduction to Kubernetes at LISA17Hands-On Introduction to Kubernetes at LISA17
Hands-On Introduction to Kubernetes at LISA17Ryan Jarvinen
 
An introduction to terraform
An introduction to terraformAn introduction to terraform
An introduction to terraformJulien Pivotto
 
How to monitor your micro-service with Prometheus?
How to monitor your micro-service with Prometheus?How to monitor your micro-service with Prometheus?
How to monitor your micro-service with Prometheus?Wojciech Barczyński
 
Kubernetes - A Comprehensive Overview
Kubernetes - A Comprehensive OverviewKubernetes - A Comprehensive Overview
Kubernetes - A Comprehensive OverviewBob Killen
 
Adopting OpenTelemetry
Adopting OpenTelemetryAdopting OpenTelemetry
Adopting OpenTelemetryVincent Behar
 
Prometheus design and philosophy
Prometheus design and philosophy   Prometheus design and philosophy
Prometheus design and philosophy Docker, Inc.
 
Kubernetes Architecture | Understanding Kubernetes Components | Kubernetes Tu...
Kubernetes Architecture | Understanding Kubernetes Components | Kubernetes Tu...Kubernetes Architecture | Understanding Kubernetes Components | Kubernetes Tu...
Kubernetes Architecture | Understanding Kubernetes Components | Kubernetes Tu...Edureka!
 
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.ioTHE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.ioDevOpsDays Tel Aviv
 

What's hot (20)

Infrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusInfrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using Prometheus
 
Prometheus - basics
Prometheus - basicsPrometheus - basics
Prometheus - basics
 
Monitoring with prometheus
Monitoring with prometheusMonitoring with prometheus
Monitoring with prometheus
 
Elastic Observability
Elastic Observability Elastic Observability
Elastic Observability
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
OpenTelemetry For Developers
OpenTelemetry For DevelopersOpenTelemetry For Developers
OpenTelemetry For Developers
 
Prometheus 101
Prometheus 101Prometheus 101
Prometheus 101
 
OpenTelemetry For Architects
OpenTelemetry For ArchitectsOpenTelemetry For Architects
OpenTelemetry For Architects
 
Hands-On Introduction to Kubernetes at LISA17
Hands-On Introduction to Kubernetes at LISA17Hands-On Introduction to Kubernetes at LISA17
Hands-On Introduction to Kubernetes at LISA17
 
Terraform
TerraformTerraform
Terraform
 
Terraform
TerraformTerraform
Terraform
 
An introduction to terraform
An introduction to terraformAn introduction to terraform
An introduction to terraform
 
How to monitor your micro-service with Prometheus?
How to monitor your micro-service with Prometheus?How to monitor your micro-service with Prometheus?
How to monitor your micro-service with Prometheus?
 
Kubernetes - A Comprehensive Overview
Kubernetes - A Comprehensive OverviewKubernetes - A Comprehensive Overview
Kubernetes - A Comprehensive Overview
 
Adopting OpenTelemetry
Adopting OpenTelemetryAdopting OpenTelemetry
Adopting OpenTelemetry
 
Prometheus design and philosophy
Prometheus design and philosophy   Prometheus design and philosophy
Prometheus design and philosophy
 
What Is Helm
 What Is Helm What Is Helm
What Is Helm
 
Kubernetes Architecture | Understanding Kubernetes Components | Kubernetes Tu...
Kubernetes Architecture | Understanding Kubernetes Components | Kubernetes Tu...Kubernetes Architecture | Understanding Kubernetes Components | Kubernetes Tu...
Kubernetes Architecture | Understanding Kubernetes Components | Kubernetes Tu...
 
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.ioTHE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
 
Hashicorp Vault ppt
Hashicorp Vault pptHashicorp Vault ppt
Hashicorp Vault ppt
 

Similar to Monitoring Kubernetes with Prometheus

What's New in Docker - February 2017
What's New in Docker - February 2017What's New in Docker - February 2017
What's New in Docker - February 2017Patrick Chanezon
 
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with PrometheusOpenStack Korea Community
 
Build cloud native solution using open source
Build cloud native solution using open source Build cloud native solution using open source
Build cloud native solution using open source Nitesh Jadhav
 
Monitoring the Hashistack with Prometheus
Monitoring the Hashistack with PrometheusMonitoring the Hashistack with Prometheus
Monitoring the Hashistack with PrometheusGrafana Labs
 
Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018Anthony Dahanne
 
(APP309) Running and Monitoring Docker Containers at Scale | AWS re:Invent 2014
(APP309) Running and Monitoring Docker Containers at Scale | AWS re:Invent 2014(APP309) Running and Monitoring Docker Containers at Scale | AWS re:Invent 2014
(APP309) Running and Monitoring Docker Containers at Scale | AWS re:Invent 2014Amazon Web Services
 
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Docker, Inc.
 
Monitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECSMonitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECSAmazon Web Services
 
Devoxx 2016 - Docker Nuts and Bolts
Devoxx 2016 - Docker Nuts and BoltsDevoxx 2016 - Docker Nuts and Bolts
Devoxx 2016 - Docker Nuts and BoltsPatrick Chanezon
 
Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudDatadog
 
Apache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseApache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseHao Chen
 
Monitoring a Kubernetes-backed microservice architecture with Prometheus
Monitoring a Kubernetes-backed microservice architecture with PrometheusMonitoring a Kubernetes-backed microservice architecture with Prometheus
Monitoring a Kubernetes-backed microservice architecture with PrometheusFabian Reinartz
 
THE RED METHOD: HOW TO INSTRUMENT YOUR SERVICES
THE RED METHOD: HOW TO INSTRUMENT YOUR SERVICESTHE RED METHOD: HOW TO INSTRUMENT YOUR SERVICES
THE RED METHOD: HOW TO INSTRUMENT YOUR SERVICESInfluxData
 
Oscon London 2016 - Docker from Development to Production
Oscon London 2016 - Docker from Development to ProductionOscon London 2016 - Docker from Development to Production
Oscon London 2016 - Docker from Development to ProductionPatrick Chanezon
 
Monitoring Weave Cloud with Prometheus
Monitoring Weave Cloud with PrometheusMonitoring Weave Cloud with Prometheus
Monitoring Weave Cloud with PrometheusWeaveworks
 
MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)Lucas Jellema
 
Running & Monitoring Docker at Scale
Running & Monitoring Docker at ScaleRunning & Monitoring Docker at Scale
Running & Monitoring Docker at ScaleDatadog
 
Containerised ASP.NET Core apps with Kubernetes
Containerised ASP.NET Core apps with KubernetesContainerised ASP.NET Core apps with Kubernetes
Containerised ASP.NET Core apps with KubernetesCodemotion Tel Aviv
 

Similar to Monitoring Kubernetes with Prometheus (20)

What's New in Docker - February 2017
What's New in Docker - February 2017What's New in Docker - February 2017
What's New in Docker - February 2017
 
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
 
Build cloud native solution using open source
Build cloud native solution using open source Build cloud native solution using open source
Build cloud native solution using open source
 
Monitoring the Hashistack with Prometheus
Monitoring the Hashistack with PrometheusMonitoring the Hashistack with Prometheus
Monitoring the Hashistack with Prometheus
 
Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018
 
(APP309) Running and Monitoring Docker Containers at Scale | AWS re:Invent 2014
(APP309) Running and Monitoring Docker Containers at Scale | AWS re:Invent 2014(APP309) Running and Monitoring Docker Containers at Scale | AWS re:Invent 2014
(APP309) Running and Monitoring Docker Containers at Scale | AWS re:Invent 2014
 
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
 
Monitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECSMonitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECS
 
Prometheus course
Prometheus coursePrometheus course
Prometheus course
 
Devoxx 2016 - Docker Nuts and Bolts
Devoxx 2016 - Docker Nuts and BoltsDevoxx 2016 - Docker Nuts and Bolts
Devoxx 2016 - Docker Nuts and Bolts
 
Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloud
 
Apache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real TimeApache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real Time
 
Apache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseApache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San Jose
 
Monitoring a Kubernetes-backed microservice architecture with Prometheus
Monitoring a Kubernetes-backed microservice architecture with PrometheusMonitoring a Kubernetes-backed microservice architecture with Prometheus
Monitoring a Kubernetes-backed microservice architecture with Prometheus
 
THE RED METHOD: HOW TO INSTRUMENT YOUR SERVICES
THE RED METHOD: HOW TO INSTRUMENT YOUR SERVICESTHE RED METHOD: HOW TO INSTRUMENT YOUR SERVICES
THE RED METHOD: HOW TO INSTRUMENT YOUR SERVICES
 
Oscon London 2016 - Docker from Development to Production
Oscon London 2016 - Docker from Development to ProductionOscon London 2016 - Docker from Development to Production
Oscon London 2016 - Docker from Development to Production
 
Monitoring Weave Cloud with Prometheus
Monitoring Weave Cloud with PrometheusMonitoring Weave Cloud with Prometheus
Monitoring Weave Cloud with Prometheus
 
MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)
 
Running & Monitoring Docker at Scale
Running & Monitoring Docker at ScaleRunning & Monitoring Docker at Scale
Running & Monitoring Docker at Scale
 
Containerised ASP.NET Core apps with Kubernetes
Containerised ASP.NET Core apps with KubernetesContainerised ASP.NET Core apps with Kubernetes
Containerised ASP.NET Core apps with Kubernetes
 

Recently uploaded

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 

Recently uploaded (20)

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 

Monitoring Kubernetes with Prometheus

  • 2. Tom Wilkie VP Product, Grafana Labs Previously: Kausal, Weaveworks, Google, Acunu, Xensource Twitter: @tom_wilkie Email: tom@grafana.com
  • 4. Prometheus • A monitoring & alerting system. • Inspired by Google’s BorgMon • Originally built by SoundCloud in 2012 • Open Source, now part of the CNCF • Simple text-based metrics format • Multidimensional datamodel • Rich, concise query language
  • 5.
  • 6.
  • 7. Prometheus’ data model is very simple: <identifier> → [ (t0, v0), (t1, v1), ... ] Timestamps are millisecond int64, values are float64 https://www.slideshare.net/Docker/monitoring-the-prometheus-way-julius-voltz-prometheus
  • 8. Prometheus identifiers http_requests_total{job=“nginx”, instances=“1.2.3.4:80”, path=“/home”, status=“200”} http_requests_total{job=“nginx”, instances=“1.2.3.4:80”, path=“/home”, status=“500”} http_requests_total{job=“nginx”, instances=“1.2.3.4:80”, path=“/settings”, status=“200”} http_requests_total{job=“nginx”, instances=“1.2.3.4:80”, path=“/settings”, status=“502”} Prometheus series selector http_requests_total{job=“nginx”, status=~“5..”}
  • 9. Building queries usually starts with a selector PromQL: http_requests_total{job=“nginx”, status=~“5..”} {job=“nginx”, instances=“1.2.3.4:80”, path=“/home”, status=“500”} 34 {job=“nginx”, instances=“1.2.3.4:80”, path=“/settings”, status=“502”} 56 {job=“nginx”, instances=“2.3.4.5:80”, path=“/home”, status=“500”} 76 {job=“nginx”, instances=“2.3.4.5:80”, path=“/settings”, status=“502”} 96 ...
  • 10. Can select vectors of values… PromQL: http_requests_total{job=“nginx”, status=~“5..”}[1m] {job=“nginx”, instances=“1.2.3.4:80”, path=“/home”, status=“500”} [30, 31, 32, 34] {job=“nginx”, instances=“1.2.3.4:80”, path=“/settings”, status=“502”} [4, 24, 56, 56] {job=“nginx”, instances=“2.3.4.5:80”, path=“/home”, status=“500”} [76, 76, 76, 76] {job=“nginx”, instances=“2.3.4.5:80”, path=“/settings”, status=“502”} [56, 106, 5, 96] ...
  • 11. And apply functions… PromQL: rate(http_requests_total{job=“nginx”, status=~“5..”}[1m]) {job=“nginx”, instances=“1.2.3.4:80”, path=“/home”, status=“500”} 0.0666 {job=“nginx”, instances=“1.2.3.4:80”, path=“/settings”, status=“502”} 0.866 {job=“nginx”, instances=“2.3.4.5:80”, path=“/home”, status=“500”} 0.0 {job=“nginx”, instances=“2.3.4.5:80”, path=“/settings”, status=“502”} 2.43 ...
  • 12. And aggregate by a dimension… PromQL: sum by (path) (rate(http_requests_total{job=“nginx”, status=~“5..”}[1m])) {path=“/home”} 0.0666 {path=“/settings”} 3.3 ...
  • 13. Do binary operations… PromQL: sum by (path) (rate(http_requests_total{job=“nginx”, status=~“5..”}[1m])) / sum by (path) (rate(http_requests_total{job=“nginx”}[1m])) {path=“/home”} 0.001 {path=“/settings”} 1.0 ...
  • 14. Kubernetes • Platform for managing containerized workloads and services • “operating system for you datacenter” • Inspired by Google’s Borg • Also part of the CNCF • Distributed, fault tolerant architecture • Rich object model for you applications
  • 15.
  • 20. What should I monitor? USE Method ● Utilisation, Saturation, Errors… RED Method ● Requests, Errors, Duration… ??? Method ● Expected system state…
  • 21. USE Method • cluster and node level metrics • node_exporter run as a daemonset
  • 22. USE Method CPU Utilisation: 1 - avg(rate(node_cpu{mode=“idle"}[1m])) CPU Saturation: sum(node_load1)/ sum(node:node_num_cpu:sum)
  • 23. USE Method • Can also look at container level metrics from cAdvisor… • …and combine them with metadata from kube-state-metrics.
  • 24. USE Method Container CPU usage by “app” label sum by (namespace, label_name) ( sum by (pod_name, namespace ( rate(container_cpu_usage_seconds_total[5m]) ) * on (pod_name) group_left(label_name) label_join(kube_pod_labels, "pod_name", ",", "pod") )
  • 25. RED Method • Metrics exposed by components for RED- style monitoring
  • 26. RED Method Most useful alert I’ve found: 100 * sum by(instance, job) ( rate(rest_client_requests_total{code!~”2..”}[5m]) ) / sum by(instance, job) ( rate(rest_client_requests_total[5m]) )
  • 27. ??? Method Alert expressions are invariants that describe a healthy system kube_deployment_spec_replicas != kube_deployment_status_replicas_available rate(kube_pod_container_status_restarts_total [15m]) > 0
  • 28. ??? Method Alert expressions are invariants that describe a healthy system (kube_pod_status_phase{phase!~”Running|Succeeded”}) > 0 sum(kube_pod_container_resource_requests_cpu_cores) / sum(node:node_num_cpu:sum) > (count(node:node_num_cpu:sum) - 1) / count(node:node_num_cpu:sum)
  • 29. Cortex • Horizontally scalable, HA Prometheus • Now part of the CNCF Sandbox • Distributed, fault tolerant architecture • Long term storage • Multitenant https://github.com/cortexproject/cortex
  • 31. Getting setup • github.com/coreos/prometheus-operator - Job to look after running Prometheus on Kubernetes • github.com/coreos/kube-prometheus - Set of configs for running all there other things you need. • github.com/grafana/jsonnet-libs/tree/master/prometheus-ksonnet - My configs for running Prometheus, Alertmanager, Grafana etc • github.com/kubernetes-monitoring/kubernetes-mixin - Joint project to unify and improve common alerts for Kubernetes.