SlideShare une entreprise Scribd logo
1  sur  35
Télécharger pour lire hors ligne
Monitoring Kubernetes with
Prometheus
Tom Wilkie, July 2017
❤
Tom Wilkie VP Product, Grafana Labs
Previously: Kausal, Weaveworks, Google, Acunu, Xensource
Twitter: @tom_wilkie Email: tom@grafana.com
Prometheus
Kubernetes
Monitoring & Alerting
Getting Started
Prometheus
● A monitoring & alerting system.

● Inspired by Google’s BorgMon

● Originally built by SoundCloud in 2012

● Open Source, now part of the CNCF

● Simple text-based metrics format

● Multidimensional datamodel

● Rich, concise query language
Prometheus’ data model is very simple:

<identifier> → [ (t0, v0), (t1, v1), ... ]
Timestamps are millisecond int64, values are float64
https://www.slideshare.net/Docker/monitoring-the-prometheus-way-julius-voltz-prometheus
Prometheus identifiers

http_requests_total{job=“nginx”, instances=“1.2.3.4:80”, path=“/home”, status=“200”}
http_requests_total{job=“nginx”, instances=“1.2.3.4:80”, path=“/home”, status=“500”}
http_requests_total{job=“nginx”, instances=“1.2.3.4:80”, path=“/settings”, status=“200”}
http_requests_total{job=“nginx”, instances=“1.2.3.4:80”, path=“/settings”, status=“502”}
Prometheus series selector

http_requests_total{job=“nginx”, status=~“5..”}
Building queries usually starts with a selector

PromQL: http_requests_total{job=“nginx”, status=~“5..”}
{job=“nginx”, instances=“1.2.3.4:80”, path=“/home”, status=“500”} 34
{job=“nginx”, instances=“1.2.3.4:80”, path=“/settings”, status=“502”} 56
{job=“nginx”, instances=“2.3.4.5:80”, path=“/home”, status=“500”} 76
{job=“nginx”, instances=“2.3.4.5:80”, path=“/setting”, status=“502”} 96
...
Can select vectors of values…

PromQL: http_requests_total{job=“nginx”, status=~“502”}[1m]
{job=“nginx”, instances=“1.2.3.4:80”, path=“/home”, status=“500”} [30, 31, 32, 34]
{job=“nginx”, instances=“1.2.3.4:80”, path=“/settings”, status=“500”} [4, 24, 56, 56]
{job=“nginx”, instances=“2.3.4.5:80”, path=“/home”, status=“500”} [76, 76, 76, 76]
{job=“nginx”, instances=“2.3.4.5:80”, path=“/setting”, status=“500”} [56, 106, 5, 96]
...
And apply functions…

PromQL: rate(http_requests_total{job=“nginx”, status=~“502”}[1m])
{job=“nginx”, instances=“1.2.3.4:80”, path=“/home”, status=“500”} 0.0666
{job=“nginx”, instances=“1.2.3.4:80”, path=“/settings”, status=“500”} 0.866
{job=“nginx”, instances=“2.3.4.5:80”, path=“/home”, status=“500”} 0.0
{job=“nginx”, instances=“2.3.4.5:80”, path=“/settings”, status=“500”} 2.43
...
And aggregate by a dimension…

PromQL: sum by (path) (rate(http_requests_total{job=“nginx”, status=~“502”}[1m]))
{path=“/home”} 0.0666
{path=“/settings”} 3.3
...
Do binary operations…

PromQL: sum by (path) (rate(http_requests_total{job=“nginx”, status=~“502”}[1m]))
/
sum by (path) (rate(http_requests_total{job=“nginx”}[1m]))
{path=“/home”} 0.001
{path=“/settings”} 1.0
...
Kubernetes
● Platform for managing containerized
workloads and services

● “operating system for you datacenter”

● Inspired by Google’s Borg

● Also part of the CNCF

● Distributed, fault tolerant architecture

● Rich object model for you applications
https://thenewstack.io/myth-cloud-native-portability/
kube-state-metrics
cAdvisor
Monitoring & Alerting
What should I monitor?
USE Method
● Utilisation, Saturation, Errors…

RED Method
● Requests, Errors, Duration…

??? Method
● Expected system state…
USE Method
• cluster and node
level metrics 

• node_exporter run
as a daemonset
CPU Utilisation:
1 - avg(rate(node_cpu{mode=“idle"}[1m]))
CPU Saturation:
sum(node_load1)/ sum(node:node_num_cpu:sum)
USE Method
USE Method
● Can also look at container
level metrics from
cAdvisor…

● …and combine them with
metadata from kube-
state-metrics.
Container CPU usage by “app” label
sum by (namespace, label_name) (
sum(rate(container_cpu_usage_seconds_total[5m])) by (pod
* on (pod_name) group_left(label_name)
label_join(kube_pod_labels, "pod_name", ",", "pod")
)
USE Method
RED Method
● Metrics exposed by
components for RED-
style monitoring
RED Method
Most useful alert I’ve found:

100 * sum by(instance, job) (
rate(rest_client_requests_total{code!~”2..”}[5m])
)
/
sum by(instance, job) (
rate(rest_client_requests_total[5m])
)
??? Method
Alert expressions are invariants that describe a healthy system
kube_deployment_spec_replicas !=
kube_deployment_status_replicas_available
rate(kube_pod_container_status_restarts_total[15m]) > 0
??? Method
Alert expressions are invariants that describe a healthy system
(kube_pod_status_phase{phase!~”Running|Succeeded”}) > 0
sum(kube_pod_container_resource_requests_cpu_cores)
/ sum(node:node_num_cpu:sum)
>
(count(node:node_num_cpu:sum) - 1)
/ count(node:node_num_cpu:sum)
Getting Started
Getting setup
● github.com/coreos/prometheus-operator - Job to look after running
Prometheus on Kubernetes

● github.com/coreos/kube-prometheus - Set of configs for running all there
other things you need.

● github.com/kausalco/public/tree/master/prometheus-ksonnet - My configs
for running Prometheus, Alertmanager, Grafana etc

● github.com/kubernetes-monitoring/kubernetes-mixin - Joint project to unify
and improve common alerts for Kubernetes.
More reading…
https://landing.google.com/sre/book.html
https://www.youtube.com/watch?v=1oJXMdVi0mM
http://www.brendangregg.com/usemethod.html
Questions?

Contenu connexe

Tendances

Monitoring_with_Prometheus_Grafana_Tutorial
Monitoring_with_Prometheus_Grafana_TutorialMonitoring_with_Prometheus_Grafana_Tutorial
Monitoring_with_Prometheus_Grafana_Tutorial
Tim Vaillancourt
 

Tendances (20)

Monitoring with prometheus
Monitoring with prometheusMonitoring with prometheus
Monitoring with prometheus
 
Infrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusInfrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using Prometheus
 
Grafana introduction
Grafana introductionGrafana introduction
Grafana introduction
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With Prometheus
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)
 
Monitoring_with_Prometheus_Grafana_Tutorial
Monitoring_with_Prometheus_Grafana_TutorialMonitoring_with_Prometheus_Grafana_Tutorial
Monitoring_with_Prometheus_Grafana_Tutorial
 
Prometheus 101
Prometheus 101Prometheus 101
Prometheus 101
 
Using HashiCorp’s Terraform to build your infrastructure on AWS - Pop-up Loft...
Using HashiCorp’s Terraform to build your infrastructure on AWS - Pop-up Loft...Using HashiCorp’s Terraform to build your infrastructure on AWS - Pop-up Loft...
Using HashiCorp’s Terraform to build your infrastructure on AWS - Pop-up Loft...
 
Terraform
TerraformTerraform
Terraform
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
Cloud Monitoring tool Grafana
Cloud Monitoring  tool Grafana Cloud Monitoring  tool Grafana
Cloud Monitoring tool Grafana
 
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaPrometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
 
Prometheus - basics
Prometheus - basicsPrometheus - basics
Prometheus - basics
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With Prometheus
 
Prometheus + Grafana = Awesome Monitoring
Prometheus + Grafana = Awesome MonitoringPrometheus + Grafana = Awesome Monitoring
Prometheus + Grafana = Awesome Monitoring
 
Terraform introduction
Terraform introductionTerraform introduction
Terraform introduction
 
Prometheus Storage
Prometheus StoragePrometheus Storage
Prometheus Storage
 
Getting Started with Kubernetes
Getting Started with Kubernetes Getting Started with Kubernetes
Getting Started with Kubernetes
 
Adopting OpenTelemetry
Adopting OpenTelemetryAdopting OpenTelemetry
Adopting OpenTelemetry
 

Similaire à Monitoring Kubernetes with Prometheus

Kubernetes One-Click Deployment: Hands-on Workshop (Mainz)
Kubernetes One-Click Deployment: Hands-on Workshop (Mainz)Kubernetes One-Click Deployment: Hands-on Workshop (Mainz)
Kubernetes One-Click Deployment: Hands-on Workshop (Mainz)
QAware GmbH
 

Similaire à Monitoring Kubernetes with Prometheus (20)

[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
 
What's New in Docker - February 2017
What's New in Docker - February 2017What's New in Docker - February 2017
What's New in Docker - February 2017
 
Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018
 
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
 
Monitoring the Hashistack with Prometheus
Monitoring the Hashistack with PrometheusMonitoring the Hashistack with Prometheus
Monitoring the Hashistack with Prometheus
 
Monitoring on Kubernetes using prometheus
Monitoring on Kubernetes using prometheusMonitoring on Kubernetes using prometheus
Monitoring on Kubernetes using prometheus
 
Monitoring on Kubernetes using Prometheus - Chandresh
Monitoring on Kubernetes using Prometheus - Chandresh Monitoring on Kubernetes using Prometheus - Chandresh
Monitoring on Kubernetes using Prometheus - Chandresh
 
Build cloud native solution using open source
Build cloud native solution using open source Build cloud native solution using open source
Build cloud native solution using open source
 
Prometheus kubernetes tech talk
Prometheus kubernetes tech talkPrometheus kubernetes tech talk
Prometheus kubernetes tech talk
 
Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloud
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
 
Monitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECSMonitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECS
 
Get you Java application ready for Kubernetes !
Get you Java application ready for Kubernetes !Get you Java application ready for Kubernetes !
Get you Java application ready for Kubernetes !
 
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
 
Kubernetes One-Click Deployment: Hands-on Workshop (Mainz)
Kubernetes One-Click Deployment: Hands-on Workshop (Mainz)Kubernetes One-Click Deployment: Hands-on Workshop (Mainz)
Kubernetes One-Click Deployment: Hands-on Workshop (Mainz)
 
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...
 
Monitoring a Kubernetes-backed microservice architecture with Prometheus
Monitoring a Kubernetes-backed microservice architecture with PrometheusMonitoring a Kubernetes-backed microservice architecture with Prometheus
Monitoring a Kubernetes-backed microservice architecture with Prometheus
 
Devoxx 2016 - Docker Nuts and Bolts
Devoxx 2016 - Docker Nuts and BoltsDevoxx 2016 - Docker Nuts and Bolts
Devoxx 2016 - Docker Nuts and Bolts
 
How Ansible Tower and Prometheus can help automate continuous deployments
How Ansible Tower and Prometheus can help automate continuous deployments How Ansible Tower and Prometheus can help automate continuous deployments
How Ansible Tower and Prometheus can help automate continuous deployments
 
Monitoring Weave Cloud with Prometheus
Monitoring Weave Cloud with PrometheusMonitoring Weave Cloud with Prometheus
Monitoring Weave Cloud with Prometheus
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Monitoring Kubernetes with Prometheus

  • 2. Tom Wilkie VP Product, Grafana Labs Previously: Kausal, Weaveworks, Google, Acunu, Xensource Twitter: @tom_wilkie Email: tom@grafana.com
  • 4. Prometheus ● A monitoring & alerting system. ● Inspired by Google’s BorgMon ● Originally built by SoundCloud in 2012 ● Open Source, now part of the CNCF ● Simple text-based metrics format ● Multidimensional datamodel ● Rich, concise query language
  • 5.
  • 6.
  • 7. Prometheus’ data model is very simple: <identifier> → [ (t0, v0), (t1, v1), ... ] Timestamps are millisecond int64, values are float64 https://www.slideshare.net/Docker/monitoring-the-prometheus-way-julius-voltz-prometheus
  • 8. Prometheus identifiers http_requests_total{job=“nginx”, instances=“1.2.3.4:80”, path=“/home”, status=“200”} http_requests_total{job=“nginx”, instances=“1.2.3.4:80”, path=“/home”, status=“500”} http_requests_total{job=“nginx”, instances=“1.2.3.4:80”, path=“/settings”, status=“200”} http_requests_total{job=“nginx”, instances=“1.2.3.4:80”, path=“/settings”, status=“502”} Prometheus series selector http_requests_total{job=“nginx”, status=~“5..”}
  • 9. Building queries usually starts with a selector PromQL: http_requests_total{job=“nginx”, status=~“5..”} {job=“nginx”, instances=“1.2.3.4:80”, path=“/home”, status=“500”} 34 {job=“nginx”, instances=“1.2.3.4:80”, path=“/settings”, status=“502”} 56 {job=“nginx”, instances=“2.3.4.5:80”, path=“/home”, status=“500”} 76 {job=“nginx”, instances=“2.3.4.5:80”, path=“/setting”, status=“502”} 96 ...
  • 10. Can select vectors of values… PromQL: http_requests_total{job=“nginx”, status=~“502”}[1m] {job=“nginx”, instances=“1.2.3.4:80”, path=“/home”, status=“500”} [30, 31, 32, 34] {job=“nginx”, instances=“1.2.3.4:80”, path=“/settings”, status=“500”} [4, 24, 56, 56] {job=“nginx”, instances=“2.3.4.5:80”, path=“/home”, status=“500”} [76, 76, 76, 76] {job=“nginx”, instances=“2.3.4.5:80”, path=“/setting”, status=“500”} [56, 106, 5, 96] ...
  • 11. And apply functions… PromQL: rate(http_requests_total{job=“nginx”, status=~“502”}[1m]) {job=“nginx”, instances=“1.2.3.4:80”, path=“/home”, status=“500”} 0.0666 {job=“nginx”, instances=“1.2.3.4:80”, path=“/settings”, status=“500”} 0.866 {job=“nginx”, instances=“2.3.4.5:80”, path=“/home”, status=“500”} 0.0 {job=“nginx”, instances=“2.3.4.5:80”, path=“/settings”, status=“500”} 2.43 ...
  • 12. And aggregate by a dimension… PromQL: sum by (path) (rate(http_requests_total{job=“nginx”, status=~“502”}[1m])) {path=“/home”} 0.0666 {path=“/settings”} 3.3 ...
  • 13. Do binary operations… PromQL: sum by (path) (rate(http_requests_total{job=“nginx”, status=~“502”}[1m])) / sum by (path) (rate(http_requests_total{job=“nginx”}[1m])) {path=“/home”} 0.001 {path=“/settings”} 1.0 ...
  • 14. Kubernetes ● Platform for managing containerized workloads and services ● “operating system for you datacenter” ● Inspired by Google’s Borg ● Also part of the CNCF ● Distributed, fault tolerant architecture ● Rich object model for you applications
  • 15.
  • 20. What should I monitor? USE Method ● Utilisation, Saturation, Errors… RED Method ● Requests, Errors, Duration… ??? Method ● Expected system state…
  • 21. USE Method • cluster and node level metrics • node_exporter run as a daemonset
  • 22. CPU Utilisation: 1 - avg(rate(node_cpu{mode=“idle"}[1m])) CPU Saturation: sum(node_load1)/ sum(node:node_num_cpu:sum) USE Method
  • 23. USE Method ● Can also look at container level metrics from cAdvisor… ● …and combine them with metadata from kube- state-metrics.
  • 24. Container CPU usage by “app” label sum by (namespace, label_name) ( sum(rate(container_cpu_usage_seconds_total[5m])) by (pod * on (pod_name) group_left(label_name) label_join(kube_pod_labels, "pod_name", ",", "pod") ) USE Method
  • 25. RED Method ● Metrics exposed by components for RED- style monitoring
  • 26. RED Method Most useful alert I’ve found: 100 * sum by(instance, job) ( rate(rest_client_requests_total{code!~”2..”}[5m]) ) / sum by(instance, job) ( rate(rest_client_requests_total[5m]) )
  • 27. ??? Method Alert expressions are invariants that describe a healthy system kube_deployment_spec_replicas != kube_deployment_status_replicas_available rate(kube_pod_container_status_restarts_total[15m]) > 0
  • 28. ??? Method Alert expressions are invariants that describe a healthy system (kube_pod_status_phase{phase!~”Running|Succeeded”}) > 0 sum(kube_pod_container_resource_requests_cpu_cores) / sum(node:node_num_cpu:sum) > (count(node:node_num_cpu:sum) - 1) / count(node:node_num_cpu:sum)
  • 30. Getting setup ● github.com/coreos/prometheus-operator - Job to look after running Prometheus on Kubernetes ● github.com/coreos/kube-prometheus - Set of configs for running all there other things you need. ● github.com/kausalco/public/tree/master/prometheus-ksonnet - My configs for running Prometheus, Alertmanager, Grafana etc ● github.com/kubernetes-monitoring/kubernetes-mixin - Joint project to unify and improve common alerts for Kubernetes.