Monitoring your Python with Prometheus
Python Ireland, April 2015
Brian Brazil
Senior Software Engineer
Boxever
What is monitoring?
• Host-based checks
• High frequency information about a few key metrics
• High frequency high granularity profiling
• Tailing logs
Céin Fath?
Why do we want monitoring?
Why: Alerting
We want to know when things go wrong
We want to know when things aren’t quite right
We want to know in advance of problems
Why: Debugging
When something is up, you need to debug.
You want to go from high-level problem, and drill down to
what’s causing it. Need to be able to reason about things.
Sometimes want to go from code back to metrics.
Why: Trending
How the various bits of a system are being used.
For example, how many static requests per dynamic
request? How many sessions active at once? How many hit
a certain corner case?
For some stats, also want to know how they change over
time for capacity planning and design discussions.
A different approach
What if we instrumented everything?
• RPCs
• Interfaces between subsystems
• Business logic
• Every time you’d log something
A different approach
What if we monitored systems and subsystems to know
how everything is generally doing?
What if each developer didn’t have to add instrumentation
themselves - what if every library came with it built-in?
Could focus on developing, and still get good metrics!
A different approach
Some things to monitor:
● Client and server qps/errors/latency
● Every log message should be a metric
● Every failure should be a metric
● Threadpool/queue size, in progress, latency
● Business logic inputs and outputs
● Data sizes in/out
● Process cpu/ram/language internals (e.g. GC)
● Blackbox and end-to-end monitoring/heartbeats
● Batch job: last success time, duration, records processed
That’s a lot of metrics
That could be tens of thousands of codepoints across an
entire system.
You’d need some way to make it easy to instrument all
code, not just the externally facing parts of applications.
You’d need something able to handle a million time series.
Presenting Prometheus
An open-source service monitoring system and time series
database.
Started in 2012, primarily developed at SoundCloud, with
committers also at Boxever and Docker.
Publicly announced January 2015, many contributions and
users since then.
Architecture
Presenting Prometheus
• Client libraries that make instrumentation easy
• Support for many languages: Python, Java, Go, Ruby…
• Standalone server
• Can handle over a million time series in one instance
• No network dependencies
• Written in Go, easy to run
• Integrations
• Machine, HAProxy, CloudWatch, Statsd, Collectd, JMX, Mesos,
Consul, MySQL, cadvisor, etcd, django, elasticsearch...
Presenting Prometheus
• Dashboards
• Promdash: Ruby on Rails web app
• Console templates: More power for those who like checking things in
• Expression browser: Ad-hoc queries
• JSON interface: Roll your own
• Alerts
• Supports Pagerduty, Email, Pushover
Dashboards
Let’s Talk Python
First version of client hacked together in October 2014 in
an hour, mostly spent playing with meta-programming.
First official version 0.0.1 released February 2015.
Version 0.0.8 released April 2015.
Where’s the code?
https://github.com/prometheus/client_python
https://pypi.python.org/pypi/prometheus_client
pip install prometheus_client
The Basics
Two fundamental data types.
Counter: It only goes up (and resets), counts something
Gauge: It goes up and down, snapshot of state
Flow with your code
Instrumentation should be an integral part of your code,
similar to logging.
Don’t segregate out instrumentation to a separate class, file
or module - have it everywhere.
Instrumentation that makes this easy helps.
Counting exceptions in a method
from instrumentation import *

EX = 0
metrics.register(Counter.create(
    'method_ex', lambda: EX))

def my_method():
    global EX
    try:
        pass  # Your code here
    except:
        EX += 1
        raise
Counting exceptions: Prometheus
from prometheus_client import Counter
EX = Counter(
    'mymethod_exceptions_total', 'Exceptions in mymethod')

@EX.count_exceptions()
def my_method():
    pass
Brian’s Pet Peeve #1
Wrapping instrumentation libraries to make them “simpler”
Tend to confuse abstractions, encourage bad practices and
make it difficult to write correct and usable instrumentation
e.g. Prometheus values are doubles, if you only allow ints
then end user has to do math to convert back to seconds
Speaking of Correct Instrumentation
It’s better to have math done in the server, not the client
Many instrumentation systems are exponentially decaying
Do you really want to do calculus during an outage?
Prometheus has monotonic counters
Races and missed scrapes don’t lose data
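To illustrate why a monotonic counter tolerates missed scrapes, here is a minimal sketch of how the increase could be worked out on the server side from whatever samples were collected. The function name counter_increase is hypothetical, and the reset handling is a simplification of what Prometheus does, not its actual code.

```python
def counter_increase(samples):
    """Total increase across a list of counter samples, oldest first.

    A drop in value is treated as a counter reset (e.g. a process restart),
    so the new value is counted as growth from zero.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # Counter restarted from zero; everything it now shows is new.
            total += current
    return total


# Every scrape succeeded.
all_scrapes = [0, 10, 25, 40, 60]
# Two of the scrapes were missed: the total is unchanged.
missed_scrapes = [0, 25, 60]
# The process restarted after reaching 40, then counted 5 more.
with_reset = [0, 10, 40, 5]

print(counter_increase(all_scrapes))     # 60.0
print(counter_increase(missed_scrapes))  # 60.0
print(counter_increase(with_reset))      # 45.0
```

A missed scrape only costs resolution, not data, because the next sample still carries the running total.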
Counting exceptions: Context Manager
from prometheus_client import Counter
EX = Counter(
    'method_exceptions', 'Exceptions in my method')

def my_method():
    with EX.count_exceptions():
        pass
Decorator and Context Manager
In Python 3 have contextlib.ContextDecorator.
contextdecorator on PyPI for Python 2 - but couldn’t get it to
work.
Ended up hand coding it, an object that supports
__enter__, __exit__ and __call__.
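A minimal sketch of what such a hand-coded object might look like, assuming a plain in-memory count rather than a real metric. The class name ExceptionCounter is hypothetical and this is not the client library's code, only the general shape of an object that works both ways on Python 2 and 3.

```python
import functools


class ExceptionCounter(object):
    """Counts exceptions; usable as a decorator or a context manager."""

    def __init__(self):
        self.count = 0

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is not None:
            self.count += 1
        # Returning False lets the exception propagate to the caller.
        return False

    def __call__(self, func):
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            # Reuse the context manager so both forms share one code path.
            with self:
                return func(*args, **kwargs)
        return wrapped


errors = ExceptionCounter()


@errors
def divide(a, b):
    return a / b


try:
    divide(1, 0)
except ZeroDivisionError:
    pass

try:
    with errors:
        raise ValueError("boom")
except ValueError:
    pass

print(errors.count)  # 2
```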
Counter Basics
requests = Counter(
    'requests_total',
    'Total number of requests')
requests.inc()
requests.inc(42)
Brian’s Pet Peeve #2
Instrumentation that you need to read the code to
understand
e.g. “Total number of requests” - what type of request?
Make the names such that a random person not intimately
familiar with the system would have a good chance at
guessing what it means. Specify your units.
Gauge Basics
INPROGRESS = Gauge(
    'http_requests_inprogress',
    'Total number of HTTP requests ongoing')

def my_method():
    INPROGRESS.inc()
    try:
        pass  # Your code here
    finally:
        INPROGRESS.dec()
Gauge Basics: Convenience all the way
INPROGRESS = Gauge(
    'inprogress_requests',
    'Total number of requests ongoing')

@INPROGRESS.track_inprogress()
def my_method():
    pass  # Your code here
More Gauges
Many other ways to use a Gauge:
MYGAUGE.set(42)
MYGAUGE.set_to_current_time()
MYGAUGE.set_function(lambda: len(some_dict))
What about time?
Useful to measure how long things take.
Two options in Prometheus: Summary and Histogram.
Summary is cheap and simple.
Histogram can be expensive and is more granular.
Time a method
LATENCY = Summary('request_latency_seconds',
    'Request latency in seconds')

@LATENCY.time()
def process_request():
    pass
Histogram is the same. There’s also a context manager.
How to get the data out: Summary
Summary is two counters, one for the number of requests
and the other for the amount of time spent.
Calculate rate(), aggregate and divide to get latency.
Not limited to time, can track e.g. bytes sent or objects
processed using observe() method.
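As a rough illustration of the arithmetic, here is a sketch that takes two scrapes of a Summary from each server and derives the average latency in between. The helper name average_latency and the sample numbers are made up for the example. Dividing the two deltas over the same window gives the same answer as dividing the two rate()s.

```python
def average_latency(windows):
    """Average seconds per request across servers.

    windows holds one (first, last) pair per server; each scrape is a
    (count, total_seconds) pair, i.e. the two counters of a Summary.
    """
    count_delta = sum(last[0] - first[0] for first, last in windows)
    sum_delta = sum(last[1] - first[1] for first, last in windows)
    if count_delta == 0:
        return None  # Nothing was observed in this window.
    return sum_delta / float(count_delta)


# (count, sum of seconds) scraped from two servers, five minutes apart.
server_a = ((100, 20.0), (400, 95.0))
server_b = ((50, 30.0), (150, 130.0))

print(average_latency([server_a]))            # 0.25
print(average_latency([server_a, server_b]))  # 0.4375
```

Aggregating before dividing matters: averaging the two per-server averages would weight the quiet server as heavily as the busy one.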
How to get the data out: Histogram
Histogram is counter per bucket (plus Summary counters).
Get rate()s of buckets, aggregate and
histogram_quantile() will estimate the quantile.
Timeseries per bucket can add up fast.
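To show roughly how a quantile can be estimated from bucket counters, here is a simplified sketch using linear interpolation inside the matching bucket. The function name estimate_quantile and the bucket boundaries are assumptions for illustration. The server-side histogram_quantile() works on rate()s of the buckets and handles more edge cases.

```python
def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, ending with a float('inf') bucket that counts everything.
    """
    total = buckets[-1][1]
    if total == 0:
        return None
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank and count > lower_count:
            if upper_bound == float('inf'):
                # Cannot interpolate into an unbounded bucket.
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / float(count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count


# Latency buckets in seconds: 50 requests took <= 0.25s, 90 took <= 0.5s, ...
buckets = [(0.25, 50), (0.5, 90), (1.0, 99), (float('inf'), 100)]

print(estimate_quantile(0.5, buckets))   # 0.25
print(estimate_quantile(0.75, buckets))  # 0.40625
print(estimate_quantile(1.0, buckets))   # 1.0
```

The estimate is only as fine as the bucket boundaries, which is why more granular histograms mean more time series.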
Python 3 support
Wanted to add Python 3 support.
Can the same code work in both?
Python 3 support
Simple stuff:
try:
    from BaseHTTPServer import BaseHTTPRequestHandler
except ImportError:
    from http.server import BaseHTTPRequestHandler
items() vs. iteritems()
% vs. format
Python 3 support: Unicode
from __future__ import unicode_literals
Use b'' for raw byte literals
unicode_literals breaks __all__ on Python 2.x,
munge with encode('ascii')
unicode = str for Python 3
Data Model
Tired of aggregating and alerting off metrics like
http.responses.500.myserver.mydc.production?
Time series have structured key-value pairs, e.g.
http_responses_total{
    response_code="500",instance="myserver",
    dc="mydc",env="production"}
Brian’s Pet Peeve #3
Munging structured data in a way that loses the structure
Is it so much to ask for some escaping, or at least sanitizing
any separators in the data?
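For contrast, a small sketch of rendering labels without losing structure. The escaping of backslash, double quote and newline follows the Prometheus text exposition format. The helper names escape_label_value and format_series are hypothetical, and the instance value is invented to show a separator surviving intact.

```python
def escape_label_value(value):
    """Escape backslash, double quote and newline in a label value."""
    return (value.replace('\\', '\\\\')
                 .replace('"', '\\"')
                 .replace('\n', '\\n'))


def format_series(name, labels):
    """Render a metric name and its labels as name{key="value",...}."""
    pairs = ','.join(
        '{0}="{1}"'.format(key, escape_label_value(value))
        for key, value in sorted(labels.items()))
    return '{0}{{{1}}}'.format(name, pairs)


# A dotted hostname would be ambiguous in a dot-separated metric name,
# but is harmless as a quoted label value.
print(format_series('http_responses_total',
                    {'response_code': '500', 'instance': 'my.server:80'}))
# http_responses_total{instance="my.server:80",response_code="500"}
```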
Labels
For any metric:
LATENCY = Summary('request_bytes_sent',
    'Request bytes sent', labelnames=['method'])
LATENCY.labels("GET").observe(42)
Don’t go overboard!
Getting The Data Out
from prometheus_client import start_http_server
start_http_server(8000)
Easy to produce output for e.g. Django.
Can also use write_to_textfile() with Node Exporter
Textfile Collector for machine-level cronjobs!
Query Language
Aggregation based on the key-value labels
Arbitrarily complex math
And all of this can be used in pre-computed rules and alerts
Query Language: Example
Column families with the 10 highest read rates per second
topk(10,
sum by(job, keyspace, columnfamily) (
rate(cassandra_columnfamily_readlatency[5m])
)
)
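To make the aggregation step concrete, here is a plain-Python sketch of what sum by (...) followed by topk does to a set of labelled values. The helpers sum_by and topk and the sample data are assumptions for illustration. In Prometheus this runs in the server, on the output of rate().

```python
from collections import defaultdict


def sum_by(series, keys):
    """Sum the values of all series sharing the same values for keys."""
    totals = defaultdict(float)
    for labels, value in series:
        group = tuple(labels[key] for key in keys)
        totals[group] += value
    return dict(totals)


def topk(k, totals):
    """The k groups with the largest values, largest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:k]


# Per-instance read rates; the instance label is dropped by the aggregation.
series = [
    ({'job': 'cassandra', 'keyspace': 'users', 'columnfamily': 'profiles',
      'instance': 'a'}, 120.0),
    ({'job': 'cassandra', 'keyspace': 'users', 'columnfamily': 'profiles',
      'instance': 'b'}, 80.0),
    ({'job': 'cassandra', 'keyspace': 'users', 'columnfamily': 'sessions',
      'instance': 'a'}, 150.0),
    ({'job': 'cassandra', 'keyspace': 'logs', 'columnfamily': 'events',
      'instance': 'a'}, 30.0),
]

totals = sum_by(series, ['job', 'keyspace', 'columnfamily'])
for group, rate in topk(2, totals):
    print(group, rate)
```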
The Live Demo
please work please work please work please work please work please work please work please work please work please work please work
Client Libraries: In and Out
Client libraries don’t tie you to Prometheus instrumentation
Custom collectors allow pulling data from other
instrumentation systems into Prometheus client library
Similarly, can pull data out of client library and expose as
you wish
More Information
http://prometheus.io
http://www.boxever.com/tag/monitoring
SREcon15 Europe, May 14-15th

Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
 
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl ServiceRussian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
 
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
 
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
 

Monitoring your Python with Prometheus (Python Ireland April 2015)

  • 1. Monitoring your Python with Prometheus. Python Ireland, April 2015. Brian Brazil, Senior Software Engineer, Boxever.
  • 3. What is monitoring?
      • Host-based checks
      • High frequency information about a few key metrics
      • High frequency high granularity profiling
      • Tailing logs
  • 4. Céin Fath? Why do we want monitoring?
  • 5. Why: Alerting. We want to know when things go wrong. We want to know when things aren’t quite right. We want to know in advance of problems.
  • 6. Why: Debugging. When something is up, you need to debug. You want to go from a high-level problem and drill down to what’s causing it. You need to be able to reason about things. Sometimes you want to go from code back to metrics.
  • 7. Why: Trending. How the various bits of a system are being used. For example, how many static requests per dynamic request? How many sessions active at once? How many hit a certain corner case? For some stats, you also want to know how they change over time for capacity planning and design discussions.
  • 8. A different approach. What if we instrumented everything?
      • RPCs
      • Interfaces between subsystems
      • Business logic
      • Every time you’d log something
  • 9. A different approach. What if we monitored systems and subsystems to know how everything is generally doing? What if each developer didn’t have to add instrumentation themselves - what if every library came with it built-in? Could focus on developing, and still get good metrics!
  • 10. A different approach. Some things to monitor:
      ● Client and server qps/errors/latency
      ● Every log message should be a metric
      ● Every failure should be a metric
      ● Threadpool/queue size, in progress, latency
      ● Business logic inputs and outputs
      ● Data sizes in/out
      ● Process cpu/ram/language internals (e.g. GC)
      ● Blackbox and end-to-end monitoring/heartbeats
      ● Batch job: last success time, duration, records processed
  • 11. That’s a lot of metrics. That could be tens of thousands of codepoints across an entire system. You’d need some way to make it easy to instrument all code, not just the externally facing parts of applications. You’d need something able to handle a million time series.
  • 12. Presenting Prometheus. An open-source service monitoring system and time series database. Started in 2012, primarily developed at SoundCloud with committers also at Boxever and Docker. Publicly announced January 2015, with many contributions and users since then.
  • 14. Presenting Prometheus
      • Client libraries that make instrumentation easy
          • Support for many languages: Python, Java, Go, Ruby…
      • Standalone server
          • Can handle over a million time series in one instance
          • No network dependencies
          • Written in Go, easy to run
      • Integrations
          • Machine, HAProxy, CloudWatch, Statsd, Collectd, JMX, Mesos, Consul, MySQL, cadvisor, etcd, django, elasticsearch...
  • 15. Presenting Prometheus
      • Dashboards
          • Promdash: Ruby on Rails web app
          • Console templates: More power for those who like checking things in
          • Expression browser: Ad-hoc queries
          • JSON interface: Roll your own
      • Alerts
          • Supports Pagerduty, Email, Pushover
  • 17. Let’s Talk Python. First version of client hacked together in October 2014 in an hour, mostly spent playing with meta-programming. First official version 0.0.1 released February 2015. Version 0.0.8 released April 2015.
  • 19. The Basics. Two fundamental data types:
      • Counter: it only goes up (and resets); counts something
      • Gauge: it goes up and down; a snapshot of state
  • 20. Flow with your code. Instrumentation should be an integral part of your code, similar to logging. Don’t segregate out instrumentation to a separate class, file or module - have it everywhere. Instrumentation that makes this easy helps.
  • 21. Counting exceptions in a method
      from instrumentation import *
      EX = 0
      metrics.register(Counter.create(
          'method_ex', lambda: EX))
      def my_method():
          try:
              pass  # Your code here
          except:
              global EX
              EX += 1
              raise
  • 22. Counting exceptions: Prometheus
      from prometheus_client import Counter
      EX = Counter(
          'mymethod_exceptions_total',
          'Exceptions in mymethod')
      @EX.count_exceptions()
      def my_method():
          pass
  • 23. Brian’s Pet Peeve #1: wrapping instrumentation libraries to make them “simpler”. Wrappers tend to confuse abstractions, encourage bad practices, and make it difficult to write correct and usable instrumentation. For example, Prometheus values are doubles; if a wrapper only allows ints, the end user has to do math to convert back to seconds.
  • 24. Speaking of Correct Instrumentation. It’s better to have math done in the server, not the client. Many instrumentation systems expose exponentially decaying values; do you really want to do calculus during an outage? Prometheus has monotonic counters, so races and missed scrapes don’t lose data.
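The point about monotonic counters can be illustrated with a small sketch. This is plain Python rather than Prometheus code: the sample values and the `average_rate` helper are made up for illustration, and counter resets are ignored. Because every scrape reads the running total, a missed scrape lowers the resolution but does not lose any increments.

```python
def average_rate(samples):
    """Per-second increase between the first and last (timestamp, total) samples."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / float(t1 - t0)


# Hypothetical scrapes of one counter every 10 seconds: (timestamp, running total).
scrapes = [(0, 100), (10, 130), (20, 150), (30, 210)]

# The same series with the scrape at t=10 missed.
with_gap = [sample for sample in scrapes if sample[0] != 10]

full = average_rate(scrapes)
gappy = average_rate(with_gap)
print(full, gappy)  # both cover the same 110 increments over 30 seconds
```

A client that only reported "events since the last read" would have dropped the 30 increments seen by the missed scrape.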
  • 25. Counting exceptions: Context Manager
      from prometheus_client import Counter
      EX = Counter(
          'method_exceptions',
          'Exceptions in my method')
      def my_method():
          with EX.count_exceptions():
              pass
  • 26. Decorator and Context Manager. Python 3 has contextlib.ContextDecorator. There is a contextdecorator package on PyPI for Python 2, but I couldn’t get it to work. Ended up hand coding it: an object that supports __enter__, __exit__ and __call__.
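A minimal sketch of that hand-coded pattern, using only the standard library. The `ExceptionCounter` class is hypothetical and is not the client library's actual implementation; it only shows how one object can serve as both a context manager and a decorator.

```python
import functools


class ExceptionCounter(object):
    """Counts exceptions; usable as `with counter:` or as `@counter`."""

    def __init__(self):
        self.count = 0

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is not None:
            self.count += 1
        return False  # never swallow the exception

    def __call__(self, func):
        # Decorator form: run the function inside this context manager.
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            with self:
                return func(*args, **kwargs)
        return wrapped


errors = ExceptionCounter()


@errors
def fails():
    raise ValueError('boom')


def also_fails():
    with errors:
        raise KeyError('missing')
```

Calling `fails()` and `also_fails()` each raises as usual and bumps `errors.count` by one.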
  • 27. Counter Basics
      from prometheus_client import Counter
      requests = Counter(
          'requests_total',
          'Total number of requests')
      requests.inc()
      requests.inc(42)
  • 28. Brian’s Pet Peeve #2: instrumentation that you need to read the code to understand, e.g. “Total number of requests” - what type of request? Make the names such that a random person not intimately familiar with the system would have a good chance at guessing what it means. Specify your units.
  • 29. Gauge Basics
      from prometheus_client import Gauge
      INPROGRESS = Gauge(
          'http_requests_inprogress',
          'Total number of HTTP requests ongoing')
      def my_method():
          INPROGRESS.inc()
          try:
              pass  # Your code here
          finally:
              INPROGRESS.dec()
  • 30. Gauge Basics: Convenience all the way
      from prometheus_client import Gauge
      INPROGRESS = Gauge(
          'inprogress_requests',
          'Total number of requests ongoing')
      @INPROGRESS.track_inprogress()
      def my_method():
          pass  # Your code here
  • 31. More Gauges. Many other ways to use a Gauge:
      MYGAUGE.set(42)
      MYGAUGE.set_to_current_time()
      MYGAUGE.set_function(lambda: len(some_dict))
  • 32. What about time? Useful to measure how long things take. Two options in Prometheus: Summary and Histogram. Summary is cheap and simple. Histogram can be expensive and is more granular.
  • 33. Time a method
      from prometheus_client import Summary
      LATENCY = Summary('request_latency_seconds',
          'Request latency in seconds')
      @LATENCY.time()
      def process_request():
          pass
    Histogram is the same. There’s also a context manager.
  • 34. How to get the data out: Summary. A Summary is two counters, one for the number of requests and the other for the amount of time spent. Take the rate() of each, aggregate, and divide to get the average latency. Not limited to time: can track e.g. bytes sent or objects processed using the observe() method.
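The arithmetic behind "rate, aggregate and divide" can be worked through in plain Python. The numbers and the `rate` helper below are made up for illustration; in practice the server does this in the query language, not in client code.

```python
def rate(earlier, later, seconds):
    """Per-second increase of a counter between two scrapes."""
    return (later - earlier) / float(seconds)


# Hypothetical scrapes of one Summary, 60 seconds apart:
# 'count' is the number of observations, 'sum' is the total seconds observed.
before = {'count': 1000, 'sum': 250.0}
after = {'count': 1600, 'sum': 430.0}

requests_per_second = rate(before['count'], after['count'], 60)
seconds_per_second = rate(before['sum'], after['sum'], 60)

# Dividing the two rates gives the average latency over the window.
average_latency = seconds_per_second / requests_per_second
print(requests_per_second, average_latency)
```

With several instances, the two rates would each be summed across instances before dividing, so that busy instances carry more weight.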
  • 35. How to get the data out: Histogram. A Histogram is a counter per bucket (plus the Summary counters). Take the rate()s of the buckets, aggregate, and histogram_quantile() will estimate the quantile. Time series per bucket can add up fast.
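To show why a quantile can only be estimated from bucket counters, here is a simplified sketch in plain Python. It assumes cumulative buckets with upper bounds, the last being +Inf, and interpolates linearly inside the bucket that contains the requested rank. The `estimate_quantile` function and the sample data are hypothetical; this is an approximation of the idea, not the server's implementation.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    buckets must be sorted by upper_bound and end with a +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return float('nan')
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank and count > lower_count:
            if math.isinf(upper_bound):
                # Nothing to interpolate towards in the open-ended bucket.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical request latencies in seconds, already aggregated per bucket.
latency_buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (float('inf'), 100)]

median = estimate_quantile(0.5, latency_buckets)
p95 = estimate_quantile(0.95, latency_buckets)
print(median, p95)
```

The estimate is only as fine-grained as the bucket boundaries, which is the trade-off against the number of time series each extra bucket adds.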
  • 36. Python 3 support. Wanted to add Python 3 support. Can the same code work in both?
  • 37. Python 3 support. Simple stuff:
      try:
          from BaseHTTPServer import BaseHTTPRequestHandler
      except ImportError:
          from http.server import BaseHTTPRequestHandler
    iter vs. iteritems
    % vs. format
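A small sketch of code written to run unchanged on both Python 2 and Python 3. It reads the slide's two short notes as being about dict iteration and string formatting, which is an assumption. The `render_labels` helper is hypothetical.

```python
try:
    from BaseHTTPServer import BaseHTTPRequestHandler  # Python 2 location
except ImportError:
    from http.server import BaseHTTPRequestHandler  # Python 3 location


def render_labels(labels):
    """Render a dict as key="value" pairs, sorted by key."""
    # dict.items() exists on both versions; dict.iteritems() is Python 2 only.
    # str.format() behaves the same on Python 2.6+ and Python 3.
    return ','.join(
        '{0}="{1}"'.format(key, value)
        for key, value in sorted(labels.items()))


print(render_labels({'method': 'GET', 'code': '200'}))
```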
  • 38. Python 3 support: Unicode
      • from __future__ import unicode_literals
      • Use b'' for raw byte literals
      • unicode_literals breaks __all__ on Python 2.x; munge with encode('ascii')
      • unicode = str for Python 3
  • 39. Data Model. Tired of aggregating and alerting off metrics like http.responses.500.myserver.mydc.production? Time series have structured key-value pairs, e.g. http_responses_total{response_code="500",instance="myserver",dc="mydc",env="production"}
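A rough illustration of why key-value labels are easier to aggregate than dotted names: with labels, grouping is a lookup by key rather than string splitting by position. The samples and the `sum_by` helper are hypothetical and written in plain Python; they only mimic the idea of aggregating by label.

```python
from collections import defaultdict

# Hypothetical samples of http_responses_total: (labels, value).
samples = [
    ({'response_code': '500', 'instance': 'server1', 'dc': 'dc1', 'env': 'production'}, 3),
    ({'response_code': '500', 'instance': 'server2', 'dc': 'dc1', 'env': 'production'}, 5),
    ({'response_code': '200', 'instance': 'server1', 'dc': 'dc1', 'env': 'production'}, 900),
    ({'response_code': '500', 'instance': 'server3', 'dc': 'dc2', 'env': 'production'}, 1),
]


def sum_by(samples, *label_names):
    """Sum values, keeping only the named labels as the grouping key."""
    totals = defaultdict(int)
    for labels, value in samples:
        key = tuple(labels[name] for name in label_names)
        totals[key] += value
    return dict(totals)


server_errors = [s for s in samples if s[0]['response_code'] == '500']
errors_by_dc = sum_by(server_errors, 'dc')
print(errors_by_dc)
```

Adding a new label later does not break this grouping, whereas inserting a new component into a dotted name shifts every position after it.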
  • 40. Brian’s Pet Peeve #3: munging structured data in a way that loses the structure. Is it so much to ask for some escaping, or at least sanitizing any separators in the data?
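As a sketch of the kind of escaping being asked for: the Prometheus text format escapes backslash, double quote and newline inside label values, so a value containing separators survives intact. The two helper functions below are hypothetical and written by hand for illustration; they are not the client library's own code.

```python
def escape_label_value(value):
    """Escape backslash, newline and double quote in a label value."""
    return (value.replace('\\', '\\\\')
                 .replace('\n', '\\n')
                 .replace('"', '\\"'))


def format_sample(name, labels, value):
    """Render one sample line with its labels sorted by name."""
    pairs = ','.join(
        '{0}="{1}"'.format(key, escape_label_value(val))
        for key, val in sorted(labels.items()))
    return '{0}{{{1}}} {2}'.format(name, pairs, value)


# A path containing dots and a quote keeps its structure.
line = format_sample('http_responses_total', {'path': '/a.b"c'}, 3)
print(line)
```

In a dotted naming scheme, the same path would silently add an extra level to the metric name.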
  • 41. Labels. For any metric:
      from prometheus_client import Summary
      BYTES_SENT = Summary('request_bytes_sent',
          'Request bytes sent', labelnames=['method'])
      BYTES_SENT.labels("GET").observe(42)
    Don’t go overboard!
  • 42. Getting The Data Out
      from prometheus_client import start_http_server
      start_http_server(8000)
    Easy to produce output for e.g. Django. Can also use write_to_textfile() with the Node Exporter Textfile Collector for machine-level cronjobs!
  • 43. Query Language
      • Aggregation based on the key-value labels
      • Arbitrarily complex math
      • And all of this can be used in pre-computed rules and alerts
  • 44. Query Language: Example. Column families with the 10 highest read rates per second:
      topk(10,
        sum by(job, keyspace, columnfamily) (
          rate(cassandra_columnfamily_readlatency[5m])
        )
      )
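For readers new to the query language, the query above can be read inside out: take a per-second rate for each series, sum the rates across everything except the named labels, then keep the largest results. The plain-Python sketch below mimics the last two steps on made-up, already-computed rates. The data and the `sum_by` and `topk` helpers are hypothetical stand-ins, not the server's implementation.

```python
from collections import defaultdict

# Hypothetical per-series read rates (per second), one series per instance.
rates = [
    ({'job': 'cassandra', 'keyspace': 'users', 'columnfamily': 'profiles', 'instance': 'a'}, 40.0),
    ({'job': 'cassandra', 'keyspace': 'users', 'columnfamily': 'profiles', 'instance': 'b'}, 35.0),
    ({'job': 'cassandra', 'keyspace': 'users', 'columnfamily': 'sessions', 'instance': 'a'}, 90.0),
    ({'job': 'cassandra', 'keyspace': 'logs', 'columnfamily': 'events', 'instance': 'a'}, 10.0),
]


def sum_by(series, *label_names):
    """Sum values per combination of the named labels, dropping the rest."""
    totals = defaultdict(float)
    for labels, value in series:
        totals[tuple(labels[name] for name in label_names)] += value
    return dict(totals)


def topk(k, totals):
    """Return the k (key, value) pairs with the highest values."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:k]


per_column_family = sum_by(rates, 'job', 'keyspace', 'columnfamily')
busiest = topk(2, per_column_family)
print(busiest)
```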
  • 45. The Live Demo. “please work” (repeated across the whole slide)
  • 46. Client Libraries: In and Out. Client libraries don’t tie you to Prometheus instrumentation. Custom collectors allow pulling data from other instrumentation systems into the Prometheus client library. Similarly, you can pull data out of the client library and expose it as you wish.