The hitchhiker’s guide to Prometheus
Remco Overdijk
"A Metric, The Hitchhiker's Guide to Prometheus says, is
about the most massively useful thing someone doing
Monitoring can have. It has great practical value. You can
wave your Metric in emergencies as a distress signal, and
produce pretty Graphs at the same time."
1. The Landscape
What are we running and why?
2. Core Concepts
How does Prometheus work?
3. Demo Time!
It’s a Tools in Action talk after all, right?
4. Tips & Tricks
Getting the most out of your Prometheus Experience
5. Questions?
I’m probably going to answer “42” to most of them..
So many things to tell, so little time..
The Hitchhiker’s Guide to Prometheus
• Started out in TES, doing Metrics, Monitoring & Logging.
(Graphite, Statsd, Grafana, Nagios, Logstash, ElasticSearch, Kibana, etc.)
• Currently in DPI, doing CI/CD and bringing Gitlab/Spinnaker to the Cloud.
That requires a lot of monitoring…
• Member of the Cloud9 MML Circle, doing Prometheus
• Core Contributor to the R2D2 module that manages Prometheus and Monitoring/Alerting resources
within Cloud9
• Worked on implementing Prometheus and Grafana, while also using these stacks for monitoring
production systems.
• NightOwl for SRT Platform; I know how pagers work.
Who are you, and why are you telling us this?
Introduction
The Landscape
What are we running?
Data Center VS Cloud
VMs and Servers VS containers in Kubernetes
                Cloud (containers in Kubernetes)      Data Center (VMs and servers)
Monitoring      Prometheus                            Nagios + Thruk + Lookingglass
Metrics         Prometheus (+ InfluxDB/Thanos)        Graphite + Statsd
Alerting        AlertManager, Iris, OnCall, Grafana   SMS modems in physical servers
Visualization   Grafana                               Grafana
Logging         StackDriver, ElasticSearch + Kibana   ElasticSearch + Kibana
• Applications in Kubernetes are much more dynamic than we’re used to.
  • No static IP addresses.
  • No static number of servers (well, pods actually).
  • Kubernetes can reschedule / relocate pods at will.
  • Prometheus uses Service Discovery to find targets.
• Both Nagios and Graphite have scaling issues and are too rigid.
  • Prometheus is pull-based instead of push-based and doesn’t require an execution for every single check.
  • It combines Metrics & Monitoring into a single stack, but focuses on Monitoring.
• Being inspired by Google's Borgmon, it works out of the box with a lot of Kubernetes /
Cloud native components and the services supporting them.
• StackDriver is not a full-fledged alternative due to features, retention and cost.
Why didn’t you come up with something else?
So, why Prometheus?
• Out of the box, Prometheus also doesn’t scale endlessly without compromises
(but Thanos will).
• Scalability is solved through retention, manual sharding and vertical scaling,
which all have clear drawbacks.
• HA is solved through duplication (polling twice from independent instances
with individual TSDBs).
• Prometheus development is very focused, which shows in certain aspects.
Well.. No.
Is this the answer to everything then?
All the pods & services
Infrastructure Overview
[Diagram: Kubernetes {DEV, STG, PRO} Clusters and Datacenters. Components shown: Prometheus (2 instances), AlertManager (3 instances), Grafana, PushGateway, IRIS, OnCall, SMS / Call Provider, HipChat, Operator, Remote Storage Adapter, InfluxDB, YOUR App!, Kubernetes, Exporters]
Core Concepts
How does it work and what makes it tick?
- Counters
- A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only
increase or be reset to zero on restart. (1, 2, 5, 9, 0, 2, 7)
- Gauges
- A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.
(1, 4, 2, 5, 8)
- Histograms
- A histogram samples observations (usually things like request durations or response sizes) and counts them in
configurable buckets. It also provides a sum of all observed values.
- Summaries
- Similar to a histogram, a summary samples observations (usually things like request durations and response
sizes). While it also provides a total count of observations and a sum of all observed values, it calculates
configurable quantiles over a sliding time window.
- Quantiles are convenient when (for example) expressing the median (0.5-quantile) and the 95th percentile (0.95-quantile).
Supported Types
Making Metrics
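To make the histogram description above more concrete, here is a minimal sketch in plain Python. It is not the Prometheus client library; the class name `Histogram`, the bucket bounds and the observed durations are all made up for illustration. It only shows the idea of counting observations in cumulative buckets while keeping a running sum.

```python
import bisect
import math


class Histogram:
    """Toy histogram with cumulative buckets and a sum (illustration only)."""

    def __init__(self, upper_bounds):
        # Bucket upper bounds, sorted; +Inf catches everything larger.
        self.bounds = sorted(upper_bounds) + [math.inf]
        self.counts = [0] * len(self.bounds)
        self.total = 0.0

    def observe(self, value):
        # Find the first bucket whose upper bound is >= value.
        index = bisect.bisect_left(self.bounds, value)
        self.counts[index] += 1
        self.total += value

    def cumulative(self):
        # Buckets are exposed cumulatively: each one includes all smaller ones.
        running, result = 0, []
        for bound, count in zip(self.bounds, self.counts):
            running += count
            result.append((bound, running))
        return result


# Hypothetical request durations in seconds.
request_seconds = Histogram([0.1, 0.5, 1.0])
for duration in (0.05, 0.3, 0.3, 0.7, 2.0):
    request_seconds.observe(duration)
buckets = request_seconds.cumulative()
```

The last bucket always equals the total number of observations, which is why a histogram also gives you a count for free.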
- Instead of creating separate checks for every metric that should be monitored for your
application, you expose a single (or multiple..) HTTP Endpoint containing all metrics.
- It’s your responsibility to make this endpoint Available, Fast and Reliable.
- Multiple Frameworks and Libraries can help you provision and maintain such an
endpoint.
- Axle comes with built-in support for Micrometer, which does everything for you.
- Backspin support is coming soon™.
- Example: http://localhost:30000/metrics
The concept of Scraping HTTP Metric Endpoints
Exposing Metrics: Push VS Pull
# HELP prometheus_tsdb_head_min_time Minimum time bound of the head block.
# TYPE prometheus_tsdb_head_min_time gauge
prometheus_tsdb_head_min_time 1.5282792e+12
# HELP prometheus_tsdb_head_samples_appended_total Total number of appended samples.
# TYPE prometheus_tsdb_head_samples_appended_total counter
prometheus_tsdb_head_samples_appended_total 2.9485092e+07
# HELP prometheus_tsdb_head_series Total number of series in the head block.
# TYPE prometheus_tsdb_head_series gauge
prometheus_tsdb_head_series 19956
# HELP prometheus_tsdb_head_series_created_total Total number of series created in the head
# TYPE prometheus_tsdb_head_series_created_total gauge
prometheus_tsdb_head_series_created_total 56888
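Output like the sample above can be produced with very little code. The following is a rough sketch using only the Python standard library, assuming two hypothetical metrics (`demo_requests_total` and `demo_queue_length`) and an arbitrary port. In practice a client library or framework would normally do this for you.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process metrics: name -> (type, help text, value).
METRICS = {
    "demo_requests_total": ("counter", "Total number of handled requests.", 3),
    "demo_queue_length": ("gauge", "Current number of queued items.", 7),
}


def render_metrics(metrics):
    """Render metrics as HELP/TYPE/value lines, like the sample above."""
    lines = []
    for name, (kind, help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {kind}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the /metrics path is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    # Blocks forever; a scraper would fetch http://localhost:<port>/metrics.
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; `render_metrics(METRICS)` can be called on its own to inspect the text that would be scraped.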
- An actual Query Language that looks a lot more like SQL than Graphite.
- You’ll need to learn a new language, but it’s only a single language for creating Graphs and Alerts; for
monitoring and long term metrics.
- Allows for a lot of flexibility, but can be a bit harder to grasp when starting out.
- Supports functions, operators, regex, arithmetic and expressions.
- Four expression types are supported:
- Instant Vectors (like http_requests_total{environment=~"staging|testing|development", method!="GET"})
- Instant vector selectors allow the selection of a set of time series and a single sample value for each at a given timestamp
(instant): in the simplest form, only a metric name is specified. This results in an instant vector containing elements for all time
series that have this metric name.
- Range Vectors (like http_requests_total{job="prometheus"}[5m] )
- Range vector literals work like instant vector literals, except that they select a range of samples back from the current instant.
Syntactically, a range duration is appended in square brackets ([]) at the end of a vector selector to specify how far back in time
values should be fetched for each resulting range vector element.
- Scalars
- Strings
PromQL
Querying Metrics
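The instant vector selector from the slide above can be mimicked in a few lines of Python. This is only a sketch of the matching logic over a hypothetical in-memory list of series; it is not how Prometheus itself is implemented. It covers the four matcher operators (`=`, `!=`, `=~`, `!~`), with regex matchers anchored to the whole label value.

```python
import re

# Hypothetical sample data: one (labels, value) pair per time series.
SERIES = [
    ({"__name__": "http_requests_total", "environment": "staging", "method": "POST"}, 12.0),
    ({"__name__": "http_requests_total", "environment": "staging", "method": "GET"}, 90.0),
    ({"__name__": "http_requests_total", "environment": "production", "method": "POST"}, 40.0),
]

# The four label matcher operators; regex matchers must match the full value.
OPERATORS = {
    "=": lambda actual, wanted: actual == wanted,
    "!=": lambda actual, wanted: actual != wanted,
    "=~": lambda actual, wanted: re.fullmatch(wanted, actual) is not None,
    "!~": lambda actual, wanted: re.fullmatch(wanted, actual) is None,
}


def select(series, name, matchers):
    """Return the series with the given metric name that satisfy all matchers."""
    selected = []
    for labels, value in series:
        if labels.get("__name__") != name:
            continue
        # A missing label is treated like an empty string.
        if all(OPERATORS[op](labels.get(label, ""), wanted) for label, op, wanted in matchers):
            selected.append((labels, value))
    return selected


# http_requests_total{environment=~"staging|testing|development", method!="GET"}
result = select(
    SERIES,
    "http_requests_total",
    [("environment", "=~", "staging|testing|development"), ("method", "!=", "GET")],
)
```

With the sample data, only the staging POST series is selected.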
- Custom Resource Type provided by Prometheus-operator
- Abstraction of Prometheus “job” and Service Discovery
- Allows for easy ingestion of new endpoints through their k8s service
- Example:
ServiceMonitors
Getting your endpoint monitored
[Diagram: YOUR App!, K8s Service, ServiceMonitor, Prometheus Operator, Prometheus]
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: k8s-app
  selector:
    matchLabels:
      k8s-app: node-exporter
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: node-exporter
  name: node-exporter
spec:
  ports:
  - name: https
    port: 9100
    protocol: TCP
    targetPort: https
  selector:
    app: node-exporter
  type: ClusterIP
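The link between the two manifests above is the label selector: the ServiceMonitor's `matchLabels` are compared with the labels in the Service's metadata (not with the Service's own pod selector). A small sketch of that matching rule, with a made-up second Service for contrast:

```python
def matches(match_labels, service_labels):
    """True when every key/value in match_labels is present on the Service."""
    return all(service_labels.get(key) == value for key, value in match_labels.items())


# Selector and labels copied from the two manifests above.
service_monitor_selector = {"k8s-app": "node-exporter"}
services = {
    "node-exporter": {"k8s-app": "node-exporter"},
    "some-other-service": {"k8s-app": "kube-dns"},  # hypothetical second Service
}

selected = [name for name, labels in services.items() if matches(service_monitor_selector, labels)]
```

Only `node-exporter` ends up in `selected`, so only its `https` port would be scraped.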
- The same tool you were probably already using.
- The central interface for cloud insights
- Contains a specialized query editor for Prometheus data sources.
- Prometheus currently doesn’t store metrics older than one month for performance reasons.
- Multiple solutions for long term metrics exist, but it’s a work in progress.
Dashboarding with Grafana
Creating Insights
[Diagram: Prometheus (2 instances), Grafana, HipChat, Remote Storage Adapter, InfluxDB]
Trouble in Paradise
Creating Alerts, choosing your weapon
WARNINGS – Notifications during work hours
- No direct intervention is required
- Usually picked up by members of the team
developing / maintaining a system.
- Alert delivery is NOT guaranteed.
Use Grafana with HipChat or Email alerts
CRITICALS – 24x7 Text Messages with Escalation
- Actionable events that require immediate attention
by an Engineer on Duty, who does not necessarily
have intimate knowledge of your system.
- Response is required to silence/end the alert.
- Provisioned through RuleList (R2D2 / Operator)
Use AlertManager / Iris / OnCall
Yes, It’s PromQL as well!
Alert Basics
%YAML 1.1
---
kind: PrometheusAlertRule
Data:
  test.rules: |
    groups:
    - name: Load
      interval: 30s
      rules:
      - alert: HighLoad
        expr: rate(web_http_responses_total[1m]) > 1
        for: 1m
        labels:
          Severity: attention
        annotations:
          description: The rate of HTTP requests is too high.
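The `for: 1m` clause in the rule above means the expression has to stay true across evaluations for a full minute before the alert fires; until then it is only pending. A simplified sketch of that behaviour, with made-up evaluation results and ignoring everything AlertManager does afterwards:

```python
def alert_states(samples, threshold, for_seconds):
    """Walk (timestamp, value) evaluations and return the alert state at each one."""
    states = []
    active_since = None  # time at which the condition first became true
    for timestamp, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = timestamp
            held = timestamp - active_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            # Condition no longer holds: the alert resets completely.
            active_since = None
            states.append("inactive")
    return states


# Hypothetical request rates, evaluated every 30 seconds as in the rule group above.
evaluations = [(0, 0.5), (30, 1.4), (60, 1.6), (90, 2.0), (120, 0.8), (150, 1.2)]
states = alert_states(evaluations, threshold=1, for_seconds=60)
```

A short spike that drops below the threshold before the minute is over never reaches the firing state, which helps keep pagers quiet.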
- Alerts should be actionable: Somebody has to do something, now.
- They should be simple: Someone without intimate knowledge of the system should ideally be
able to solve the alert.
- They should be urgent and require human intervention: No point in waking someone up if they
shouldn’t have to do something, or when tomorrow afternoon would be soon enough.
- Provide accurate descriptions and a playbook where possible.
- Basic system monitoring should be based on SLI/SLO’s rather than infra metrics.
- Prefer AlertManager/Iris/OnCall if you’re serious about your alert.
Creating the perfect alert
Alert Perfection
[Diagram: Prometheus, AlertManager (3 instances), Grafana, IRIS, OnCall, SMS / Call Provider, HipChat]
• A long list of exporters is available at https://prometheus.io/docs/instrumenting/exporters/
• A number of these come preconfigured with our Kubernetes clusters and provide additional metrics
When artisanal endpoints don’t cut the cake
Exporters - Additional sources of metrics
Databases
Aerospike exporter
ClickHouse exporter
Consul exporter (official)
CouchDB exporter
ElasticSearch exporter
Memcached exporter (official)
MongoDB exporter
MSSQL server exporter
MySQL server exporter (official)
OpenTSDB Exporter
Oracle DB Exporter
PgBouncer exporter
PostgreSQL exporter
ProxySQL exporter
RavenDB exporter
Redis exporter
RethinkDB exporter
SQL exporter
Tarantool metric library
Hardware related
apcupsd exporter
Collins exporter
IoT Edison exporter
IPMI exporter
knxd exporter
Node/system metrics exporter (official)
Ubiquiti UniFi exporter
Messaging systems
Beanstalkd exporter
Gearman exporter
Kafka exporter
NATS exporter
NSQ exporter
Mirth Connect exporter
MQTT blackbox exporter
RabbitMQ exporter
RabbitMQ Management Plugin exporter
Storage
Ceph exporter
Ceph RADOSGW exporter
Gluster exporter
Hadoop HDFS FSImage exporter
Lustre exporter
ScaleIO exporter
HTTP
Apache exporter
HAProxy exporter (official)
Nginx metric library
Nginx VTS exporter
Passenger exporter
Tinyproxy exporter
Varnish exporter
WebDriver exporter
APIs
AWS ECS exporter
AWS Health exporter
AWS SQS exporter
Cloudflare exporter
DigitalOcean exporter
Docker Cloud exporter
Docker Hub exporter
GitHub exporter
InstaClustr exporter
Mozilla Observatory exporter
OpenWeatherMap exporter
Pagespeed exporter
Rancher exporter
Speedtest exporter
Logging
Fluentd exporter
Google's mtail log data extractor
Grok exporter
Other monitoring systems
Akamai Cloudmonitor exporter
AWS CloudWatch exporter (official)
Cloud Foundry Firehose exporter
Collectd exporter (official)
Google Stackdriver exporter
Graphite exporter (official)
Heka dashboard exporter
Heka exporter
InfluxDB exporter (official)
JavaMelody exporter
JMX exporter (official)
Munin exporter
Nagios / Naemon exporter
New Relic exporter
NRPE exporter
Osquery exporter
Pingdom exporter
scollector exporter
Sensu exporter
SNMP exporter (official)
StatsD exporter (official)
Miscellaneous
Bamboo exporter
BIG-IP exporter
BIND exporter
Bitbucket exporter
Blackbox exporter (official)
BOSH exporter
cAdvisor
Confluence exporter
Dovecot exporter
eBPF exporter
Jenkins exporter
JIRA exporter
Kannel exporter
Kemp LoadBalancer exporter
Meteor JS web framework exporter
Minecraft exporter module
PHP-FPM exporter
PowerDNS exporter
Process exporter
rTorrent exporter
SABnzbd exporter
Script exporter
Shield exporter
SMTP/Maildir MDA blackbox prober
SoftEther exporter
Transmission exporter
Unbound exporter
Xen exporter
• StackDriver Exporter – Get your GCP Project’s native metrics into Prometheus.
• Blackbox Exporter – Monitor Golden Signals on any system, without knowledge about the inner workings
• Nginx exporter – used in Ingresses
• SNMP Exporter – Bring your own MIBs.
• Statsd Exporter – Push your statsd metrics to a sidecar container
• Node Exporter – Provides system metrics for VM and Physical systems (like Kubernetes nodes)
• cAdvisor – Get generic container metrics
• Etcd
• Kubernetes
• Minio (Gitlab Runner Caching)
The most commonly used
Exporters - Highlights
[Diagram: Exporter, K8s Service, ServiceMonitor, Prometheus Operator, Prometheus]
• For situations where you are unable to serve an HTTP metrics page for a reliable period of time.
• Ideal for short running tasks like Kubernetes CronJobs, Hadoop Jobs, Scripts, etc.
• Allows you to Push (through an HTTP call) Metrics to a buffering service, which in turn exposes them to
Prometheus.
• Metrics will live forever on the Gateway, so be careful of what you push and how you name them.
• Avoid this route if possible, since it scales very badly and is NOT redundant. Bring your own endpoint if
and when possible.
• PRO-Tip: If you have an ephemeral job, also push the timestamp of last successful job completion.
The Push Gateway
Metrics for ephemeral jobs
[Diagram: YOUR App!, Push Gateway, Prometheus (2 instances)]
echo "ultimate_answer 42.0" | curl --data-binary @- http://gateway:9091/metrics/job/magrathea/instance/zaphod-001/group/vogon/opex/DPI
ultimate_answer{group="vogon",instance="zaphod-001",job="magrathea",opex="DPI"} 42.0
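The curl call above can also be made from a script. Below is a rough Python sketch using only the standard library: it builds the same URL from a job name and grouping labels, and adds a last-success timestamp as suggested in the PRO-tip. The metric name `job_last_success_timestamp_seconds` and the helper names are made up for illustration, and the actual push is left commented out because it needs a reachable gateway.

```python
import time
import urllib.request
from urllib.parse import quote


def pushgateway_url(base, job, grouping):
    """Build the push URL: /metrics/job/<job>/<label>/<value>/..."""
    # Note: label values containing "/" need special handling that is not shown here.
    parts = ["metrics", "job", quote(job, safe="")]
    for label, value in grouping.items():
        parts += [quote(label, safe=""), quote(value, safe="")]
    return base.rstrip("/") + "/" + "/".join(parts)


def push(url, metrics):
    """POST metrics as 'name value' lines; needs a reachable Push Gateway."""
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    request = urllib.request.Request(url, data=body.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status


url = pushgateway_url(
    "http://gateway:9091",
    "magrathea",
    {"instance": "zaphod-001", "group": "vogon", "opex": "DPI"},
)
payload = {
    "ultimate_answer": 42.0,
    # Hypothetical metric name for the "last successful completion" tip.
    "job_last_success_timestamp_seconds": time.time(),
}
# push(url, payload)  # uncomment when a gateway is reachable
```

Alerting on the age of the timestamp metric tells you when the job silently stopped running, which the pushed value alone cannot.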
Demo Time!
• Kubernetes Running on Docker for macOS.
• Out of the box Prometheus on Kubernetes from
https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus
• Services are running without an Ingress, so we’re accessing them directly, using NodePorts.
• We’re going to add our own Full Featured Axle Service by creating a Deployment and a Service to match
it, adding a ServiceMonitor, watching Service Discovery do its thing, graphing one of the metrics and
creating an alert for it.
• Prometheus: http://localhost:30000/graph
• AlertManager: http://localhost:31000/#/alerts
• Grafana: http://localhost:32000/d/9dP_FHImz/pods
Getting started in 5 minutes
Today’s Quick Demo
Tips & Tricks
Getting the most out of your Prometheus Experience
• Metrics in Prometheus are multi-dimensional; they consist of names and labels.
• Names are generic identifiers to tell WHAT you are measuring, in what format.
• Metric Names SHOULD have a single (base!) unit, added as a suffix describing that unit (bytes, seconds,
meters).
• Labels describe characteristics, and are usually used to identify WHERE those metrics are coming from,
and can be multi-faceted.
• Prometheus saves a separate Time Series for each name/labels combination, so you have to ensure
label cardinality does not get too high, or you will kill Prometheus in the end. (Bad examples: usernames,
internet IP addresses, hashes.)
• Read https://prometheus.io/docs/practices/naming/ before you start making your own!
Keep things running smoothly by not making a mess.
Metric Naming
api_http_requests_total { type="create|update|delete", method="GET|POST|DELETE" }
api_request_duration_seconds { stage="extract|transform|load" }
api_errors_total { endpoint="listProducts|updatePricing", code="500|404|418 I'm a teapot" }
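The cardinality warning is easy to quantify: in the worst case, every combination of label values becomes its own time series. A small sketch, using the label values from the first example above plus a hypothetical `username` label with 10,000 distinct values:

```python
from math import prod


def series_count(label_values):
    """Worst-case number of time series for one metric name."""
    # Every combination of label values becomes its own time series.
    return prod(len(values) for values in label_values.values())


# Label values taken from the api_http_requests_total example above.
bounded = series_count({
    "type": ["create", "update", "delete"],
    "method": ["GET", "POST", "DELETE"],
})

# Hypothetical bad idea: adding a username label with 10,000 distinct users.
unbounded = series_count({
    "type": ["create", "update", "delete"],
    "method": ["GET", "POST", "DELETE"],
    "username": [f"user{i}" for i in range(10_000)],
})
```

Here `bounded` is 9 series while `unbounded` is 90,000 series for a single metric name, which is why unbounded label values are best kept out of metrics.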
•An SLI is a service level indicator—a carefully defined quantitative measure of some aspect of
the level of service that is provided.
•An SLO is a service level objective: a target value or range of values for a service level that is
measured by an SLI. A natural structure for SLOs is thus
[SLI ≤ target], or [lower bound ≤ SLI ≤ upper bound].
•Symptoms vs Causes: Monitor things that users will notice when using your system.
•Latency - The time it takes to service a request.
•Traffic - A measure of how much demand is being placed on your system, measured in a
high-level system-specific metric. For a web service, this measurement is usually HTTP
requests per second.
•Errors - The rate of requests that fail (like HTTP 500s).
•Saturation - How "full" your service is. A measure of your system fraction, emphasizing the
resources that are most constrained.
What should you be monitoring?
The Golden Signals
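As a worked example of the [SLI ≤ target] structure, the sketch below uses the Errors signal as the SLI. The request counts and targets are invented numbers, and the function names are only for illustration.

```python
def error_ratio(total_requests, failed_requests):
    """SLI: fraction of requests that failed in the measurement window."""
    if total_requests == 0:
        return 0.0  # no traffic, so nothing failed
    return failed_requests / total_requests


def meets_slo(sli, target):
    """SLO in the form [SLI <= target]."""
    return sli <= target


# Hypothetical numbers: 20,000 requests in the window, 30 of them HTTP 500s.
sli = error_ratio(20_000, 30)
within_tight_slo = meets_slo(sli, target=0.001)  # at most 0.1% errors
within_loose_slo = meets_slo(sli, target=0.002)  # at most 0.2% errors
```

With an error ratio of 0.15%, the service violates the tighter objective but meets the looser one, which is exactly the kind of user-visible symptom worth alerting on.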
•BlackBox Exporter for periodic requests and their Metrics (Success, Latency, Errors)
•Nginx Ingress Metrics for a man-in-the-middle view of your application (Flow, Latency, Errors)
•Your own application’s Metrics for insights, details and an under-the-hood view.
Combining Metric Sources for an unbiased view
Bringing it all together
[Diagram: Blackbox Exporter (Poll Metrics), Ingress (Ingress Metrics), Your App (App Metrics)]
- job_name: 'blackbox'
  metrics_path: /probe
  params:
    module: [http_2xx]  # Look for a HTTP 200 response.
  static_configs:
  - targets:
    - http://myapp.behindingress.io  # Target to probe with http
Prometheus scrape
•Introducing the GenericServiceMonitor and DCServiceMonitor
•These types allow you to define endpoints outside of Kubernetes, and allow
you to monitor on-premise services.
•DCServiceMonitor works based on bol_applications and as such is bol.com
specific:
•GenericServiceMonitor works on static endpoints
My stuff runs in the DC and I want to keep it there.
So what about non-Cloud resources?
kind: Prometheus/DCServiceMonitor
name: tst-sdd-app
spec:
  port: 8080
  path: /internal/metrics

kind: Prometheus/GenericServiceMonitor
name: dev-atscale-app
spec:
  hosts:
  - ip: 1.2.3.4
    hostname: some.host.name
  port: 8080
  path: /internal/metrics
  opex: srt-bificsps
•Always initialize your metrics at zero when possible, or you won’t know the significance of the
first value.
•How do you know if your application is OK when the metrics stopped working? The up metric
might also disappear when Service Discovery no longer detects your service. Always use
absent() to check for existence of up!
•(i)rate()/increase() then sum(), not sum() then (i)rate()/increase(), since (i)rate() and
increase() are the only functions that handle counter resets, and summing first hides the resets from them.
•The rate function takes a time series over a time range and, based on the first and last data
points within that range, calculates the per-second average rate of increase
(http://localhost:32000/d/h3RZO2Iik/rate-vs-irate?orgId=1).
•By contrast, irate is an instant rate. It only looks at the last two points within the
range passed to it and calculates a per-second rate.
•To complement the saturation signal, Prometheus has predict_linear() for Gauges.
•All the metrics? http://localhost:30000/federate?match[]={__name__%3D~%22[a-z].*%22}
Things you’ll encounter once you start making queries
Other tips
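The rate/irate tips above can be illustrated with a simplified model. The sketch below is not Prometheus's actual implementation (which, among other things, extrapolates to the edges of the range); the sample values for the two pods are invented. It shows reset-aware `increase`, `rate` and `irate`, and why taking the rate per series before summing matters.

```python
def increase(samples):
    """Total increase of a counter over (timestamp, value) samples, reset-aware."""
    total = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so count the new value in full.
        total += current if current < previous else current - previous
    return total


def rate(samples):
    """Average per-second increase between the first and last sample."""
    elapsed = samples[-1][0] - samples[0][0]
    return increase(samples) / elapsed


def irate(samples):
    """Per-second increase between the last two samples only."""
    return rate(samples[-2:])


# Hypothetical counters from two pods, scraped every 15 seconds; pod_b restarts at t=45.
pod_a = [(0, 100), (15, 130), (30, 160), (45, 190), (60, 250)]
pod_b = [(0, 500), (15, 530), (30, 560), (45, 10), (60, 40)]

rate_then_sum = rate(pod_a) + rate(pod_b)

# Summing first makes pod_b's restart look like a reset of the whole sum.
summed = [(t, a + b) for (t, a), (_, b) in zip(pod_a, pod_b)]
sum_then_rate = rate(summed)
```

In this example `rate_then_sum` is about 4.17 requests per second, while `sum_then_rate` is inflated to about 6.83 because the drop in the summed value is misread as a full counter reset. `irate(pod_a)` gives 4.0, reflecting only the last scrape interval.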
Questions?
Don’t bother to ask me the Ultimate Question of Life, the
Universe and Everything, because you already know the answer.
(and yes, I know where my towel is.)
Remco Overdijk
roverdijk@bol.com
So Long!
And thanks for all the fish.

Contenu connexe

Tendances

엘라스틱서치, 로그스태시, 키바나
엘라스틱서치, 로그스태시, 키바나엘라스틱서치, 로그스태시, 키바나
엘라스틱서치, 로그스태시, 키바나종민 김
 
코드로 인프라 관리하기 - 자동화 툴 소개
코드로 인프라 관리하기 - 자동화 툴 소개코드로 인프라 관리하기 - 자동화 툴 소개
코드로 인프라 관리하기 - 자동화 툴 소개태준 문
 
PHP-FPM の子プロセス制御方法と設定をおさらいしよう
PHP-FPM の子プロセス制御方法と設定をおさらいしようPHP-FPM の子プロセス制御方法と設定をおさらいしよう
PHP-FPM の子プロセス制御方法と設定をおさらいしようShohei Okada
 
devops 2년차 이직 성공기.pptx
devops 2년차 이직 성공기.pptxdevops 2년차 이직 성공기.pptx
devops 2년차 이직 성공기.pptxByungho Lee
 
ドメイン駆動設計に15年取り組んでわかったこと
ドメイン駆動設計に15年取り組んでわかったことドメイン駆動設計に15年取り組んでわかったこと
ドメイン駆動設計に15年取り組んでわかったこと増田 亨
 
ドメイン駆動設計 ( DDD ) をやってみよう
ドメイン駆動設計 ( DDD ) をやってみようドメイン駆動設計 ( DDD ) をやってみよう
ドメイン駆動設計 ( DDD ) をやってみよう増田 亨
 
ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方Yoshiyasu SAEKI
 
ドメイン駆動設計という仕事の流儀
ドメイン駆動設計という仕事の流儀ドメイン駆動設計という仕事の流儀
ドメイン駆動設計という仕事の流儀増田 亨
 
ドメイン駆動設計のプラクティスでカバーできること、できないこと[DDD]
ドメイン駆動設計のプラクティスでカバーできること、できないこと[DDD]ドメイン駆動設計のプラクティスでカバーできること、できないこと[DDD]
ドメイン駆動設計のプラクティスでカバーできること、できないこと[DDD]Koichiro Matsuoka
 
[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화NAVER D2
 
악성코드와 분석 방법
악성코드와 분석 방법악성코드와 분석 방법
악성코드와 분석 방법Youngjun Chang
 
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and IstioAdvanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and IstioAnimesh Singh
 
ちゃんとした C# プログラムを書けるようになる実践的な方法~ Visual Studio を使った 高品質・低コスト・保守性の高い開発
ちゃんとした C# プログラムを書けるようになる実践的な方法~ Visual Studio を使った 高品質・低コスト・保守性の高い開発ちゃんとした C# プログラムを書けるようになる実践的な方法~ Visual Studio を使った 高品質・低コスト・保守性の高い開発
ちゃんとした C# プログラムを書けるようになる実践的な方法~ Visual Studio を使った 高品質・低コスト・保守性の高い開発慎一 古賀
 
ソフトウェアにおける 複雑さとは何なのか?
ソフトウェアにおける 複雑さとは何なのか?ソフトウェアにおける 複雑さとは何なのか?
ソフトウェアにおける 複雑さとは何なのか?Yoshitaka Kawashima
 
내가써본 nGrinder-SpringCamp 2015
내가써본 nGrinder-SpringCamp 2015내가써본 nGrinder-SpringCamp 2015
내가써본 nGrinder-SpringCamp 2015Lim SungHyun
 
ドメイン駆動設計の捉え方 20150718
ドメイン駆動設計の捉え方 20150718ドメイン駆動設計の捉え方 20150718
ドメイン駆動設計の捉え方 20150718Mao Ohnishi
 
3週連続DDDその2 深いモデルの探求(ドメイン駆動設計 第3部)
3週連続DDDその2  深いモデルの探求(ドメイン駆動設計 第3部)3週連続DDDその2  深いモデルの探求(ドメイン駆動設計 第3部)
3週連続DDDその2 深いモデルの探求(ドメイン駆動設計 第3部)増田 亨
 
악성코드 개념 및 대응 기술 (사이버 게놈 기술)
악성코드 개념 및 대응 기술 (사이버 게놈 기술)악성코드 개념 및 대응 기술 (사이버 게놈 기술)
악성코드 개념 및 대응 기술 (사이버 게놈 기술)seungdols
 
正しいものを正しく作る塾-設計コース
正しいものを正しく作る塾-設計コース正しいものを正しく作る塾-設計コース
正しいものを正しく作る塾-設計コース増田 亨
 

Tendances (20)

엘라스틱서치, 로그스태시, 키바나
엘라스틱서치, 로그스태시, 키바나엘라스틱서치, 로그스태시, 키바나
엘라스틱서치, 로그스태시, 키바나
 
코드로 인프라 관리하기 - 자동화 툴 소개
코드로 인프라 관리하기 - 자동화 툴 소개코드로 인프라 관리하기 - 자동화 툴 소개
코드로 인프라 관리하기 - 자동화 툴 소개
 
PHP-FPM の子プロセス制御方法と設定をおさらいしよう
PHP-FPM の子プロセス制御方法と設定をおさらいしようPHP-FPM の子プロセス制御方法と設定をおさらいしよう
PHP-FPM の子プロセス制御方法と設定をおさらいしよう
 
Jenkins と groovy
Jenkins と groovyJenkins と groovy
Jenkins と groovy
 
devops 2년차 이직 성공기.pptx
devops 2년차 이직 성공기.pptxdevops 2년차 이직 성공기.pptx
devops 2년차 이직 성공기.pptx
 
ドメイン駆動設計に15年取り組んでわかったこと
ドメイン駆動設計に15年取り組んでわかったことドメイン駆動設計に15年取り組んでわかったこと
ドメイン駆動設計に15年取り組んでわかったこと
 
ドメイン駆動設計 ( DDD ) をやってみよう
ドメイン駆動設計 ( DDD ) をやってみようドメイン駆動設計 ( DDD ) をやってみよう
ドメイン駆動設計 ( DDD ) をやってみよう
 
ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方
 
ドメイン駆動設計という仕事の流儀
ドメイン駆動設計という仕事の流儀ドメイン駆動設計という仕事の流儀
ドメイン駆動設計という仕事の流儀
 
ドメイン駆動設計のプラクティスでカバーできること、できないこと[DDD]
ドメイン駆動設計のプラクティスでカバーできること、できないこと[DDD]ドメイン駆動設計のプラクティスでカバーできること、できないこと[DDD]
ドメイン駆動設計のプラクティスでカバーできること、できないこと[DDD]
 
[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화
 
악성코드와 분석 방법
악성코드와 분석 방법악성코드와 분석 방법
악성코드와 분석 방법
 
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and IstioAdvanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
 
ちゃんとした C# プログラムを書けるようになる実践的な方法~ Visual Studio を使った 高品質・低コスト・保守性の高い開発
ちゃんとした C# プログラムを書けるようになる実践的な方法~ Visual Studio を使った 高品質・低コスト・保守性の高い開発ちゃんとした C# プログラムを書けるようになる実践的な方法~ Visual Studio を使った 高品質・低コスト・保守性の高い開発
ちゃんとした C# プログラムを書けるようになる実践的な方法~ Visual Studio を使った 高品質・低コスト・保守性の高い開発
 
ソフトウェアにおける 複雑さとは何なのか?
ソフトウェアにおける 複雑さとは何なのか?ソフトウェアにおける 複雑さとは何なのか?
ソフトウェアにおける 複雑さとは何なのか?
 
내가써본 nGrinder-SpringCamp 2015
내가써본 nGrinder-SpringCamp 2015내가써본 nGrinder-SpringCamp 2015
내가써본 nGrinder-SpringCamp 2015
 
ドメイン駆動設計の捉え方 20150718
ドメイン駆動設計の捉え方 20150718ドメイン駆動設計の捉え方 20150718
ドメイン駆動設計の捉え方 20150718
 
3週連続DDDその2 深いモデルの探求(ドメイン駆動設計 第3部)
3週連続DDDその2  深いモデルの探求(ドメイン駆動設計 第3部)3週連続DDDその2  深いモデルの探求(ドメイン駆動設計 第3部)
3週連続DDDその2 深いモデルの探求(ドメイン駆動設計 第3部)
 
악성코드 개념 및 대응 기술 (사이버 게놈 기술)
악성코드 개념 및 대응 기술 (사이버 게놈 기술)악성코드 개념 및 대응 기술 (사이버 게놈 기술)
악성코드 개념 및 대응 기술 (사이버 게놈 기술)
 
正しいものを正しく作る塾-設計コース
正しいものを正しく作る塾-設計コース正しいものを正しく作る塾-設計コース
正しいものを正しく作る塾-設計コース
 

Similaire à The hitchhiker’s guide to Prometheus

Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Brian Brazil
 
Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Brian Brazil
 
Microservices and Prometheus (Microservices NYC 2016)
Microservices and Prometheus (Microservices NYC 2016)Microservices and Prometheus (Microservices NYC 2016)
Microservices and Prometheus (Microservices NYC 2016)Brian Brazil
 
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Brian Brazil
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)Brian Brazil
 
Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Brian Brazil
 
Prometheus (Microsoft, 2016)
Prometheus (Microsoft, 2016)Prometheus (Microsoft, 2016)
Prometheus (Microsoft, 2016)Brian Brazil
 
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic SystemTimely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic SystemAccumulo Summit
 
Monitoring with prometheus at scale
Monitoring with prometheus at scaleMonitoring with prometheus at scale
Monitoring with prometheus at scaleJuraj Hantak
 
Monitoring with prometheus at scale
Monitoring with prometheus at scaleMonitoring with prometheus at scale
Monitoring with prometheus at scaleAdam Hamsik
 
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)Brian Brazil
 
Distributed tracing 101
Distributed tracing 101Distributed tracing 101
Distributed tracing 101Itiel Shwartz
 
Go Observability (in practice)
Go Observability (in practice)Go Observability (in practice)
Go Observability (in practice)Eran Levy
 
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaPrometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaSridhar Kumar N
 
Slack in the Age of Prometheus
Slack in the Age of PrometheusSlack in the Age of Prometheus
Slack in the Age of PrometheusGeorge Luong
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaArvind Kumar G.S
 
Prometheus Overview
Prometheus OverviewPrometheus Overview
Prometheus OverviewBrian Brazil
 

Similaire à The hitchhiker’s guide to Prometheus (20)

Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
 
Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)
 
Microservices and Prometheus (Microservices NYC 2016)
Microservices and Prometheus (Microservices NYC 2016)Microservices and Prometheus (Microservices NYC 2016)
Microservices and Prometheus (Microservices NYC 2016)
 
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)
 
Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)
 
Prometheus (Microsoft, 2016)
Prometheus (Microsoft, 2016)Prometheus (Microsoft, 2016)
Prometheus (Microsoft, 2016)
 
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic SystemTimely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
 
Monitoring with prometheus at scale
Monitoring with prometheus at scaleMonitoring with prometheus at scale
Monitoring with prometheus at scale
 
Monitoring with prometheus at scale
Monitoring with prometheus at scaleMonitoring with prometheus at scale
Monitoring with prometheus at scale
 
Distributed Tracing
Distributed TracingDistributed Tracing
Distributed Tracing
 
RxJava@Android
RxJava@AndroidRxJava@Android
RxJava@Android
 
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
 
Mini training - Reactive Extensions (Rx)
Mini training - Reactive Extensions (Rx)Mini training - Reactive Extensions (Rx)
Mini training - Reactive Extensions (Rx)
 
Distributed tracing 101
Distributed tracing 101Distributed tracing 101
Distributed tracing 101
 
Go Observability (in practice)
Go Observability (in practice)Go Observability (in practice)
Go Observability (in practice)
 
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaPrometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
 
Slack in the Age of Prometheus
Slack in the Age of PrometheusSlack in the Age of Prometheus
Slack in the Age of Prometheus
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and Grafana
 
Prometheus Overview
Prometheus OverviewPrometheus Overview
Prometheus Overview
 

The hitchhiker’s guide to Prometheus

  • 1. The Hitchhiker's Guide to Prometheus, by Remco Overdijk. "A Metric, The Hitchhiker's Guide to Prometheus says, is about the most massively useful thing someone doing Monitoring can have. It has great practical value. You can wave your Metric in emergencies as a distress signal, and produce pretty Graphs at the same time."
  • 2. The Hitchhiker's Guide to Prometheus: So many things to tell, so little time..
      1. The Landscape: What are we running and why?
      2. Core Concepts: How does Prometheus work?
      3. Demo Time! It's a Tools in Action talk after all, right?
      4. Tips & Tricks: Getting the most out of your Prometheus Experience
      5. Questions? I'm probably going to answer "42" to most of them..
  • 3. Introduction: Who are you, and why are you telling us this?
      • Started out in TES, doing Metrics, Monitoring & Logging (Graphite, Statsd, Grafana, Nagios, Logstash, ElasticSearch, Kibana, etc.).
      • Currently in DPI, doing CI/CD and bringing Gitlab/Spinnaker to the Cloud. That requires a lot of monitoring…
      • Member of the Cloud9 MML Circle, doing Prometheus.
      • Core Contributor to the R2D2 module that manages Prometheus and Monitoring/Alerting resources within Cloud9.
      • Worked on implementing Prometheus and Grafana, while also using these stacks for monitoring production systems.
      • NightOwl for SRT Platform; I know how pagers work.
  • 5. Data Center vs. Cloud: VMs and servers vs. containers in Kubernetes
      Function      | Data Center                    | Cloud
      Monitoring    | Nagios + Thruk + Lookingglass  | Prometheus
      Metrics       | Graphite + Statsd              | Prometheus (+ InfluxDB/Thanos)
      Alerting      | SMS modems in physical servers | AlertManager, Iris, OnCall, Grafana
      Visualization | Grafana                        | Grafana
      Logging       | ElasticSearch + Kibana         | StackDriver, ElasticSearch + Kibana
  • 6. So, why Prometheus? Why didn't you come up with something else?
      • Applications in Kubernetes are much more dynamic than we're used to.
          • No static IP addresses.
          • No static number of servers (well, pods actually..).
          • Kubernetes can reschedule / relocate pods at will.
          • Prometheus uses Service Discovery to find targets.
      • Both Nagios and Graphite have scaling issues and are too rigid.
          • Prometheus is pull-based instead of push-based and does not require a separate execution for every single check.
          • Combines Metrics & Monitoring into a single stack, but focuses on Monitoring.
      • Being inspired by Google's Borgmon, it works out of the box with a lot of Kubernetes / Cloud native components and the services supporting them.
      • StackDriver is not a full-fledged alternative due to features, retention and cost.
  • 7. Is this the answer to everything then? Well.. No.
      • Out of the box, Prometheus also doesn't scale endlessly without compromises (but Thanos will).
      • Scalability is solved through retention, manual sharding and vertical scaling, which all have clear drawbacks.
      • HA is solved through duplication (polling twice from independent instances with individual TSDBs).
      • Prometheus development is very focused, which shows in certain aspects.
  • 8. Infrastructure Overview: All the pods & services. [Diagram: Kubernetes {DEV, STG, PRO} clusters and datacenters containing YOUR App!, Kubernetes, Exporters and PushGateway; scraped by two Prometheus instances managed by the Operator; feeding three AlertManager instances, Grafana, and a Remote Storage Adapter with InfluxDB; alerts routed via IRIS / OnCall to an SMS / Call Provider and HipChat.]
  • 9. Core Concepts How does it work and what makes it tick?
  • 10. Making Metrics: Supported Types
      - Counters: A counter is a cumulative metric that represents a single monotonically increasing value; it can only increase or be reset to zero on restart. (1, 2, 5, 9, 0, 2, 7)
      - Gauges: A gauge is a metric that represents a single numerical value that can arbitrarily go up and down. (1, 4, 2, 5, 8)
      - Histograms: A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values.
      - Summaries: Similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window.
      - Quantiles are convenient for expressing, for example, the median (0.5-quantile) and the 95th percentile (0.95-quantile).
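To make the histogram description above concrete, here is a minimal sketch in plain Python. It is not the Prometheus client library; the `Histogram` class, the bucket bounds and the observed durations are all assumptions made up for illustration. It shows the three things a histogram exposes: cumulative bucket counts, a total count and a sum.

```python
import bisect


class Histogram:
    """Toy histogram: cumulative buckets plus a running sum and count."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One counter per finite bucket, plus one for the implicit +Inf bucket.
        self.counts = [0] * (len(self.upper_bounds) + 1)
        self.sum = 0.0

    def observe(self, value):
        # Find the first bucket whose upper bound is >= value ("le" semantics).
        index = bisect.bisect_left(self.upper_bounds, value)
        self.counts[index] += 1
        self.sum += value

    def cumulative(self):
        """Return (upper_bound, cumulative_count) pairs, ending with +Inf."""
        total = 0
        result = []
        bounds = self.upper_bounds + [float("inf")]
        for bound, count in zip(bounds, self.counts):
            total += count
            result.append((bound, total))
        return result


# Hypothetical request durations in seconds.
request_seconds = Histogram([0.1, 0.5, 1.0])
for duration in (0.05, 0.3, 0.3, 0.7, 2.5):
    request_seconds.observe(duration)

for bound, count in request_seconds.cumulative():
    print(f"le={bound}: {count}")
```

The last (+Inf) bucket always equals the total number of observations, which is why a histogram can provide a count without storing it separately.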
  • 11. Exposing Metrics: Push VS Pull. The concept of Scraping HTTP Metric Endpoints
      - Instead of creating separate checks for every metric that should be monitored for your application, you expose a single HTTP endpoint (or several..) containing all metrics.
      - It's your responsibility to make this endpoint Available, Fast and Reliable.
      - Multiple frameworks and libraries can help you provision and maintain such an endpoint.
      - Axle comes with built-in support for Micrometer, which does everything for you.
      - Backspin support is coming soon™.
      - Example: http://localhost:30000/metrics
          # HELP prometheus_tsdb_head_min_time Minimum time bound of the head block.
          # TYPE prometheus_tsdb_head_min_time gauge
          prometheus_tsdb_head_min_time 1.5282792e+12
          # HELP prometheus_tsdb_head_samples_appended_total Total number of appended samples.
          # TYPE prometheus_tsdb_head_samples_appended_total counter
          prometheus_tsdb_head_samples_appended_total 2.9485092e+07
          # HELP prometheus_tsdb_head_series Total number of series in the head block.
          # TYPE prometheus_tsdb_head_series gauge
          prometheus_tsdb_head_series 19956
          # HELP prometheus_tsdb_head_series_created_total Total number of series created in the head
          # TYPE prometheus_tsdb_head_series_created_total gauge
          prometheus_tsdb_head_series_created_total 56888
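The text format shown in the example above is simple enough to produce by hand. The following is a minimal sketch, assuming a hypothetical `render_metric` helper; it does no escaping of label values or help text, which a real client library would handle for you.

```python
def render_metric(name, metric_type, help_text, samples):
    """Render one metric family in the text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Reproduces one of the gauges from the slide.
page = render_metric(
    "prometheus_tsdb_head_series",
    "gauge",
    "Total number of series in the head block.",
    [({}, 19956)],
)

# A labelled counter; the metric name and labels are made up for illustration.
labelled = render_metric(
    "api_http_requests_total",
    "counter",
    "Total number of API requests.",
    [({"method": "GET", "type": "create"}, 3)],
)
print(page + labelled, end="")
```

Serving this string from an HTTP handler on `/metrics` is all a scrape target needs to do; in practice a library such as Micrometer generates it.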
  • 12. Querying Metrics: PromQL
      - An actual query language that looks a lot more like SQL than Graphite does.
      - You'll need to learn a new language, but it's only a single language for creating Graphs and Alerts; for monitoring and long term metrics.
      - Allows for a lot of flexibility, but can be a bit harder to grasp when starting out.
      - Supports functions, operators, regex, arithmetic and expressions.
      - Four expression types are supported:
          - Instant Vectors (like http_requests_total{environment=~"staging|testing|development", method!="GET"}): Instant vector selectors allow the selection of a set of time series and a single sample value for each at a given timestamp (instant). In the simplest form, only a metric name is specified. This results in an instant vector containing elements for all time series that have this metric name.
          - Range Vectors (like http_requests_total{job="prometheus"}[5m]): Range vector literals work like instant vector literals, except that they select a range of samples back from the current instant. Syntactically, a range duration is appended in square brackets ([]) at the end of a vector selector to specify how far back in time values should be fetched for each resulting range vector element.
          - Scalars
          - Strings
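How the label matchers in the instant vector example pick out series can be sketched in a few lines of Python. This is only an illustration of the matching rules (`=`, `!=`, `=~`, `!~`, with regular expressions anchored to the whole value), not how Prometheus implements them; Prometheus uses RE2 syntax, while this sketch uses Python's `re` module, and the sample series are invented.

```python
import re


def matches(labels, matchers):
    """Return True if a series' labels satisfy every (label, op, value) matcher."""
    for label, op, value in matchers:
        actual = labels.get(label, "")  # a missing label behaves like an empty value
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown matcher operator: {op}")
        if not ok:
            return False
    return True


# Hypothetical series, each identified by its label set.
series = [
    {"__name__": "http_requests_total", "environment": "staging", "method": "POST"},
    {"__name__": "http_requests_total", "environment": "staging", "method": "GET"},
    {"__name__": "http_requests_total", "environment": "production", "method": "POST"},
]

# http_requests_total{environment=~"staging|testing|development", method!="GET"}
selector = [
    ("__name__", "=", "http_requests_total"),
    ("environment", "=~", "staging|testing|development"),
    ("method", "!=", "GET"),
]
selected = [s for s in series if matches(s, selector)]
print(selected)
```

Treating the metric name as just another label (`__name__`) is what makes queries such as the `/federate` match on `__name__` possible.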
  • 13. Getting your endpoint monitored: ServiceMonitors
      - Custom Resource Type provided by Prometheus-operator
      - Abstraction of Prometheus "job" and Service Discovery
      - Allows for easy ingestion of new endpoints through their k8s service
      - [Diagram: YOUR App!, K8s Service, ServiceMonitor, Prometheus Operator, Prometheus]
      - Example:
          apiVersion: monitoring.coreos.com/v1
          kind: ServiceMonitor
          spec:
            endpoints:
            - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
              interval: 30s
              port: https
              scheme: https
              tlsConfig:
                insecureSkipVerify: true
            jobLabel: k8s-app
            selector:
              matchLabels:
                k8s-app: node-exporter

          apiVersion: v1
          kind: Service
          metadata:
            labels:
              k8s-app: node-exporter
            name: node-exporter
          spec:
            ports:
            - name: https
              port: 9100
              protocol: TCP
              targetPort: https
            selector:
              app: node-exporter
            type: ClusterIP
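The link between the two manifests above is the label selector: the ServiceMonitor's `matchLabels` must be a subset of the Service's labels. A rough sketch of that selection rule, using plain dictionaries instead of the Kubernetes API (the `select_services` helper and the `my-app` service are hypothetical):

```python
def select_services(services, match_labels):
    """Return names of services whose labels contain every key/value in match_labels."""
    return [
        svc["name"]
        for svc in services
        if all(svc["labels"].get(key) == value for key, value in match_labels.items())
    ]


services = [
    {"name": "node-exporter", "labels": {"k8s-app": "node-exporter"}},
    {"name": "my-app", "labels": {"k8s-app": "my-app"}},
]

# Mirrors `selector.matchLabels` from the ServiceMonitor example.
selected = select_services(services, {"k8s-app": "node-exporter"})
print(selected)
```

If a new Service does not show up as a target, a mismatch between these two label sets is a likely first thing to check.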
  • 14. Creating Insights: Dashboarding with Grafana
      - The same tool you were probably already using.
      - The central interface for cloud insights.
      - Contains a specialized query editor for Prometheus data sources.
      - Prometheus currently doesn't store metrics older than one month for performance reasons.
      - Multiple solutions for long term metrics exist, but it's a work in progress.
      - [Diagram: two Prometheus instances feeding Grafana (with HipChat notifications) and a Remote Storage Adapter writing to InfluxDB]
  • 15. Trouble in Paradise: Creating Alerts, choosing your weapon
      WARNINGS: Notifications during working hours
          - No direct intervention is required.
          - Usually picked up by members of the team developing / maintaining a system.
          - Alert delivery is NOT guaranteed.
          - Use Grafana with HipChat or Email alerts.
      CRITICALS: 24x7 Text Messages with Escalation
          - Actionable events that require immediate attention by an Engineer on Duty, who does not necessarily have intimate knowledge of your system.
          - Response is required to silence/end the alert.
          - Provisioned through RuleList (R2D2 / Operator).
          - Use AlertManager / Iris / OnCall.
  • 16. Alert Basics: Yes, it's PromQL as well!
          %YAML 1.1
          ---
          kind: PrometheusAlertRule
          data:
            test.rules: |
              groups:
              - name: Load
                interval: 30s
                rules:
                - alert: HighLoad
                  expr: rate(web_http_responses_total[1m]) > 1
                  for: 1m
                  labels:
                    severity: attention
                  annotations:
                    description: The rate of HTTP requests is too high.
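The `for: 1m` clause in the rule above means the expression must stay true for a full minute before the alert fires; until then it is only pending. A simplified sketch of that state machine, assuming one evaluation every 30 seconds and invented evaluation results:

```python
def alert_states(condition_by_time, for_seconds):
    """Yield (timestamp, state) for each evaluation of an alert rule.

    condition_by_time: list of (timestamp_seconds, bool) evaluation results.
    """
    active_since = None
    for timestamp, condition in condition_by_time:
        if not condition:
            # The condition cleared, so the pending timer starts over.
            active_since = None
            yield timestamp, "inactive"
            continue
        if active_since is None:
            active_since = timestamp
        held = timestamp - active_since
        yield timestamp, "firing" if held >= for_seconds else "pending"


# Hypothetical results of evaluating the expression every 30 seconds.
evaluations = [(0, False), (30, True), (60, True), (90, True), (120, False), (150, True)]
states = list(alert_states(evaluations, for_seconds=60))
for timestamp, state in states:
    print(timestamp, state)
```

A short dip below the threshold resets the timer, which is why a `for` duration filters out brief spikes at the cost of a slower first notification.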
  • 17. Alert Perfection: Creating the perfect alert
      - Alerts should be actionable: Somebody has to do something, now.
      - They should be simple: Someone without intimate knowledge of the system should ideally be able to solve the alert.
      - They should be urgent and require human intervention: No point in waking someone up if they don't have to do anything, or when tomorrow afternoon would be soon enough.
      - Provide accurate descriptions and a playbook where possible.
      - Basic system monitoring should be based on SLIs/SLOs rather than infra metrics.
      - Prefer AM/Iris/OnCall if you're serious about your alert.
      - [Diagram: Prometheus feeding three AlertManager instances and Grafana; alerts routed via IRIS / OnCall to an SMS / Call Provider and HipChat]
  • 18. Exporters - Additional sources of metrics: When artisanal endpoints don't cut the cake
      • A long list of exporters is available at https://prometheus.io/docs/instrumenting/exporters/
      • A number of these come preconfigured with our Kubernetes clusters and provide additional metrics.
      Databases: Aerospike exporter, ClickHouse exporter, Consul exporter (official), CouchDB exporter, ElasticSearch exporter, Memcached exporter (official), MongoDB exporter, MSSQL server exporter, MySQL server exporter (official), OpenTSDB Exporter, Oracle DB Exporter, PgBouncer exporter, PostgreSQL exporter, ProxySQL exporter, RavenDB exporter, Redis exporter, RethinkDB exporter, SQL exporter, Tarantool metric library
      Hardware related: apcupsd exporter, Collins exporter, IoT Edison exporter, IPMI exporter, knxd exporter, Node/system metrics exporter (official), Ubiquiti UniFi exporter
      Messaging systems: Beanstalkd exporter, Gearman exporter, Kafka exporter, NATS exporter, NSQ exporter, Mirth Connect exporter, MQTT blackbox exporter, RabbitMQ exporter, RabbitMQ Management Plugin exporter
      Storage: Ceph exporter, Ceph RADOSGW exporter, Gluster exporter, Hadoop HDFS FSImage exporter, Lustre exporter, ScaleIO exporter
      HTTP: Apache exporter, HAProxy exporter (official), Nginx metric library, Nginx VTS exporter, Passenger exporter, Tinyproxy exporter, Varnish exporter, WebDriver exporter
      APIs: AWS ECS exporter, AWS Health exporter, AWS SQS exporter, Cloudflare exporter, DigitalOcean exporter, Docker Cloud exporter, Docker Hub exporter, GitHub exporter, InstaClustr exporter, Mozilla Observatory exporter, OpenWeatherMap exporter, Pagespeed exporter, Rancher exporter, Speedtest exporter
      Logging: Fluentd exporter, Google's mtail log data extractor, Grok exporter
      Other monitoring systems: Akamai Cloudmonitor exporter, AWS CloudWatch exporter (official), Cloud Foundry Firehose exporter, Collectd exporter (official), Google Stackdriver exporter, Graphite exporter (official), Heka dashboard exporter, Heka exporter, InfluxDB exporter (official), JavaMelody exporter, JMX exporter (official), Munin exporter, Nagios / Naemon exporter, New Relic exporter, NRPE exporter, Osquery exporter, Pingdom exporter, scollector exporter, Sensu exporter, SNMP exporter (official), StatsD exporter (official)
      Miscellaneous: Bamboo exporter, BIG-IP exporter, BIND exporter, Bitbucket exporter, Blackbox exporter (official), BOSH exporter, cAdvisor, Confluence exporter, Dovecot exporter, eBPF exporter, Jenkins exporter, JIRA exporter, Kannel exporter, Kemp LoadBalancer exporter, Meteor JS web framework exporter, Minecraft exporter module, PHP-FPM exporter, PowerDNS exporter, Process exporter, rTorrent exporter, SABnzbd exporter, Script exporter, Shield exporter, SMTP/Maildir MDA blackbox prober, SoftEther exporter, Transmission exporter, Unbound exporter, Xen exporter
  • 19. Exporters - Highlights: The most commonly used
      • StackDriver Exporter: Get your GCP Project's native metrics into Prometheus.
      • Blackbox Exporter: Monitor Golden Signals on any system, without knowledge of its inner workings.
      • Nginx exporter: Used in Ingresses.
      • SNMP Exporter: Bring your own MIBs.
      • Statsd Exporter: Push your statsd metrics to a sidecar container.
      • Node Exporter: Provides system metrics for VMs and physical systems (like Kubernetes nodes).
      • cAdvisor: Get generic container metrics.
      • Etcd
      • Kubernetes
      • Minio (Gitlab Runner Caching)
      • [Diagram: Exporter, K8s Service, ServiceMonitor, Prometheus Operator, Prometheus]
  • 20. Metrics for ephemeral jobs: The Push Gateway
      • For situations where you are unable to serve an HTTP metrics page for a reliable period of time.
      • Ideal for short running tasks like Kubernetes CronJobs, Hadoop Jobs, Scripts, etc.
      • Allows you to push Metrics (through an HTTP call) to a buffering service, which in turn exposes them to Prometheus.
      • Metrics will live forever on the Gateway, so be careful about what you push and how you name it.
      • Avoid this route if possible, since it scales very badly and is NOT redundant. Bring your own endpoint if and when possible.
      • PRO-Tip: If you have an ephemeral job, also push the timestamp of the last successful job completion.
      • [Diagram: YOUR App! pushing to the Push Gateway, which is scraped by two Prometheus instances]
      • Example push:
          echo "ultimate_answer 42.0" | curl --data-binary @- http://gateway:9091/metrics/job/magrathea/instance/zaphod-001/group/vogon/opex/DPI
        Resulting metric:
          ultimate_answer{group="vogon",instance="zaphod-001",job="magrathea",opex="DPI"} 42.0
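The curl call above encodes the grouping labels in the URL path as alternating label/value segments after `/metrics/job/<job>`. The same request can be built from a script with only the standard library; the sketch below assumes a hypothetical `build_push_request` helper and only constructs the request, since sending it needs a reachable gateway.

```python
from urllib.parse import quote
from urllib.request import Request


def build_push_request(gateway, job, grouping, samples):
    """Build a POST request that pushes samples under a job and grouping key."""
    path = f"/metrics/job/{quote(job, safe='')}"
    for label, value in grouping.items():
        path += f"/{quote(label, safe='')}/{quote(value, safe='')}"
    # One "name value" line per sample, each terminated by a newline.
    body = "".join(f"{name} {value}\n" for name, value in samples.items())
    return Request(gateway + path, data=body.encode("utf-8"), method="POST")


request = build_push_request(
    "http://gateway:9091",
    "magrathea",
    {"instance": "zaphod-001", "group": "vogon", "opex": "DPI"},
    {"ultimate_answer": 42.0},
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it to the gateway.
```

Following the PRO-tip, a job could add a second sample holding the completion time, for example a made-up metric name such as `job_last_success_timestamp_seconds` set to `time.time()`.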
  • 22. Today's Quick Demo: Getting started in 5 minutes
      • Kubernetes running on Docker for macOS.
      • Out of the box Prometheus on Kubernetes from https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus
      • Services are running without an Ingress, so we're accessing them directly, using NodePorts.
      • We're going to add our own Full Featured Axle Service by creating a Deployment and a Service to match it, adding a ServiceMonitor, watching Service Discovery do its thing, graphing one of the metrics and creating an alert for it.
      • Prometheus: http://localhost:30000/graph
      • AlertManager: http://localhost:31000/#/alerts
      • Grafana: http://localhost:32000/d/9dP_FHImz/pods
  • 23. Tips & Tricks Getting the most out of your Prometheus Experience
  • 24. Metric Naming: Keep things running smoothly by not making a mess.
      • Metrics in Prometheus are multidimensional; they consist of names and labels.
      • Names are generic identifiers to tell WHAT you are measuring, in what format.
      • Metric names SHOULD have a single (base!) unit, added as a suffix describing that unit (bytes, seconds, meters).
      • Labels describe characteristics, are usually used to identify WHERE those metrics are coming from, and can be multifaceted.
      • Prometheus saves a separate Time Series for each name/labels combination, so you have to ensure label cardinality does not get too high, or you will kill Prometheus in the end. (Bad examples: usernames, internet IP addresses, hashes.)
      • Read https://prometheus.io/docs/practices/naming/ before you start making your own!
      • Examples:
          api_http_requests_total { type="create|update|delete", method="GET|POST|DELETE" }
          api_request_duration_seconds { stage="extract|transform|load" }
          api_errors_total { endpoint="listProducts|updatePricing", code="500|404|418 I'm a teapot" }
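The cardinality warning above is easy to quantify: in the worst case, one metric name produces as many time series as the product of the number of distinct values per label. A small sketch, with the `username` label and its 10,000 values invented as the bad example:

```python
from math import prod


def series_count(label_values):
    """Worst-case number of time series for one metric name."""
    return prod(len(values) for values in label_values.values())


# Bounded labels, as in the api_http_requests_total example.
bounded = series_count({
    "type": ["create", "update", "delete"],
    "method": ["GET", "POST", "DELETE"],
})

# The same metric with a hypothetical per-user label added.
unbounded = series_count({
    "method": ["GET", "POST", "DELETE"],
    "username": [f"user{i}" for i in range(10_000)],
})
print(bounded, unbounded)
```

Every extra label multiplies the total, so a single unbounded label is enough to dominate the series count of an entire application.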
  • 25. The Golden Signals: What should you be monitoring?
      • An SLI is a service level indicator: a carefully defined quantitative measure of some aspect of the level of service that is provided.
      • An SLO is a service level objective: a target value or range of values for a service level that is measured by an SLI. A natural structure for SLOs is thus [SLI ≤ target], or [lower bound ≤ SLI ≤ upper bound].
      • Symptoms vs Causes: Monitor things that users will notice when using your system.
      • Latency: The time it takes to service a request.
      • Traffic: A measure of how much demand is being placed on your system, measured in a high-level system-specific metric. For a web service, this measurement is usually HTTP requests per second.
      • Errors: The rate of requests that fail (like HTTP 500s).
      • Saturation: How "full" your service is. A measure of your system fraction, emphasizing the resources that are most constrained.
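The [lower bound ≤ SLI] structure above translates directly into a check. The sketch below uses an availability SLI (fraction of requests that did not fail); the function names, the request counts and the 99.9% objective are assumptions for illustration only.

```python
def availability_sli(total_requests, failed_requests):
    """Fraction of requests served successfully."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return (total_requests - failed_requests) / total_requests


def meets_slo(sli, lower_bound):
    """SLO of the form [lower bound <= SLI]."""
    return sli >= lower_bound


# Hypothetical numbers for one measurement window.
sli = availability_sli(total_requests=10_000, failed_requests=12)
ok = meets_slo(sli, lower_bound=0.999)
print(f"SLI={sli:.4f} meets SLO: {ok}")
```

Here 12 failures in 10,000 requests give an SLI of 0.9988, just under the objective, which is the kind of symptom-based condition worth alerting on.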
  • 26. Bringing it all together: Combining Metric Sources for an unbiased view
      • Blackbox Exporter for periodic requests and their Metrics (Success, Latency, Errors).
      • Nginx Ingress Metrics for a man-in-the-middle view of your application (Flow, Latency, Errors).
      • Your own application's Metrics for insights, details and an under-the-hood view.
      • Diagram labels: Your App, Blackbox Exporter, Ingress; Poll Metrics, Ingress Metrics, App Metrics; Prometheus scrape.
      • Scrape configuration for the Blackbox Exporter probe:
          - job_name: 'blackbox'
            metrics_path: /probe
            params:
              module: [http_2xx]  # Look for an HTTP 200 response.
            static_configs:
              - targets:
                  - http://myapp.behindingress.io  # Target to probe with HTTP
  • 27. So what about non-Cloud resources? My stuff runs in the DC and I want to keep it there.
      • Introducing the GenericServiceMonitor and DCServiceMonitor.
      • These types let you define endpoints outside of Kubernetes, so you can monitor on-premise services.
      • DCServiceMonitor works based on bol_applications and as such is bol.com specific. Example (fields in slide order): kind: Prometheus/DCServiceMonitor name: tst-sdd-app spec: port: 8080 path: /internal/metrics
      • GenericServiceMonitor works on static endpoints. Example (fields in slide order): kind: Prometheus/GenericServiceMonitor name: dev-atscale-app spec: hosts: - ip: 1.2.3.4 hostname: some.host.name port: 8080 path: /internal/metrics opex: srt-bificsps
  • 28. Other tips: Things you'll encounter once you start making queries
      • Always initialize your metrics at zero when possible, or you won't know the significance of the first value.
      • How do you know if your application is OK when the metrics stopped working? The up metric might also disappear when Service Discovery no longer detects your service. Always use absent() to check for the existence of up!
      • Apply (i)rate()/increase() first and then sum(), not sum() first and then (i)rate()/increase(): rate(), irate() and increase() are the only functions that deal with counter resets safely, and summing first hides the resets from them.
      • rate() takes a time series over a time range and, based on the first and last data points within that range, calculates the per-second average rate of increase (see http://localhost:32000/d/h3RZO2Iik/rate-vs-irate?orgId=1).
      • By contrast, irate() is an instant rate: it only looks at the last two points within the range passed to it and calculates a per-second rate from them.
      • To complement the saturation signal, Prometheus has predict_linear() for Gauges.
      • All the metrics? http://localhost:30000/federate?match[]={__name__%3D~%22[a-z].*%22}
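The advice to apply rate() or increase() before sum() is easier to trust after seeing what a counter reset does to a summed series. The sketch below is a simplified model, not Prometheus code: its `increase`, `rate` and `irate` helpers only mimic the reset handling and ignore the extrapolation to window boundaries that Prometheus performs. The pod names, sample values and 15-second scrape interval are assumptions for illustration.

```python
def increase(samples: list[float]) -> float:
    """Total increase of a counter; any drop is treated as a reset to zero."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total


def rate(samples: list[float], timestamps: list[float]) -> float:
    """Average per-second increase over the whole range (simplified)."""
    return increase(samples) / (timestamps[-1] - timestamps[0])


def irate(samples: list[float], timestamps: list[float]) -> float:
    """Per-second increase based on the last two samples only (simplified)."""
    return increase(samples[-2:]) / (timestamps[-1] - timestamps[-2])


timestamps = [0.0, 15.0, 30.0, 45.0]   # assumed 15s scrape interval
pod_a = [100.0, 110.0, 120.0, 130.0]   # steady counter
pod_b = [500.0, 510.0, 5.0, 15.0]      # pod restarted: counter reset

# Correct order: handle resets per series, then aggregate.
increase_then_sum = increase(pod_a) + increase(pod_b)

# Wrong order: the sum drops from 620 to 125, which looks like one big
# reset, so the full 125 is counted as new increase.
summed = [a + b for a, b in zip(pod_a, pod_b)]
sum_then_increase = increase(summed)

print(increase_then_sum)   # 30 + 25 = 55.0
print(sum_then_increase)   # 20 + 125 + 20 = 165.0

# rate() smooths a late spike over the range, irate() reports it in full.
spiky = [0.0, 0.0, 0.0, 30.0]
print(rate(spiky, timestamps), irate(spiky, timestamps))
```

In this model the wrong order triples the reported increase, and the same spike shows up as roughly 0.67/s with `rate` but 2.0/s with `irate`.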
  • 29. Questions? Don’t bother to ask me the Ultimate Question of Life, the Universe and Everything, because you already know the answer. (and yes, I know where my towel is.)