SlideShare une entreprise Scribd logo
1  sur  43
Télécharger pour lire hors ligne
Brian Brazil
Founder
Your data is in Prometheus,
now what?
Who am I?
Engineer passionate about running software reliably in production.
● TCD CS Degree
● Google SRE for 7 years, working on high-scale reliable systems such as
Adwords, Adsense, Ad Exchange, Billing, Database
● Boxever TL Systems&Infrastructure, applied processes and technology to let
allow company to scale and reduce operational load
● Contributor to many open source projects, including Prometheus, Ansible,
Python, Aurora and Zookeeper.
● Founder of Robust Perception, making scalability and efficiency available to
everyone
Why monitor?
● Know when things go wrong
○ To call in a human to prevent a business-level issue, or prevent an issue in advance
● Be able to debug and gain insight
● Trending to see changes over time, and drive technical/business decisions
● To feed into other systems/processes (e.g. QA, security, automation)
What is a time series?
The value of something tracked over time.
For example temperature once a day, or requests to your API once a minute.
For example
my_api_requests: 5@1:00PM 2@1:01PM 18@1:02PM
What is a time series database?
Simply put, a database for storing multiple time series.
It can optimise for the fact that many consecutive points in a time series will be
requested at once, and patterns in the data also aid compression.
May also have some in-built aggregation and downsampling functions.
Also often used to refer to any system that can store data with a time-element,
such as event logs (e.g. ELK stack). Not really a time series database in this form.
Monitoring?
Lots of information about your system could be thought of in terms of time series,
for example:
How many users are currently online?
How much disk is machine X using?
What's the average request latency?
How many requests are failing?
Prometheus
Inspired by Google’s Borgmon monitoring system.
Started in 2012 by ex-Googlers working in Soundcloud as an open source project,
mainly written in Go. Publically launched in early 2015, and continues to be
independent of any one company.
Over 100 companies have started relying on it since then.
Lots of Monitoring Systems Look Like This
Services have Internals
Monitor the Internals
Monitor as a Service, not as Machines
Put another way...
Prometheus is about pulling in lots of information about lots of instances of a
service.
This is all tracked and stored over time, allowing for analysis.
Prometheus at it's core is a time series database.
Distinguishing Timeseries
Prometheus doesn’t use dotted.strings like metric.fintech.dublin.
Multi-dimensional labels instead like metric{industry=”fintech”,city=”dublin”}
Can aggregate, cut, and slice along them.
Can come from instrumentation, or be added based on the service you are
monitoring.
Example: Labels from Node Exporter
Instrumentation made easy
Prometheus clients don’t just marshall data.
They take care of the nitty gritty details like concurrency and state tracking.
We take advantage of the strengths of each language.
In Python for example that means context managers and decorators.
Python Instrumentation Example
pip install prometheus_client
from prometheus_client import Summary, start_http_server
REQUEST_DURATION = Summary('request_duration_seconds',
'Request duration in seconds')
@REQUEST_DURATION.time()
def my_handler(request):
pass # Your code here
start_http_server(8000)
Exceptional Circumstances In Progress
from prometheus_client import Counter, Gauge
EXCEPTIONS = Counter('exceptions_total', 'Total exceptions')
IN_PROGRESS = Gauge('inprogress_requests', 'In progress')
@EXCEPTIONS.count_exceptions()
@IN_PROGRESS.track_inprogress()
def my_handler(request):
pass // Your code here
Adding Dimensions (No Evil Twins Please)
from prometheus_client import Counter
REQUESTS = Counter('requests_total',
'Total requests', ['method'])
def my_handler(request):
REQUESTS.labels(request.method).inc()
pass // Your code here
Inclusive Monitoring
Don’t monitor just at the edges:
● Instrument client libraries
● Instrument server libraries (e.g. HTTP/RPC)
● Instrument business logic
Library authors get information about usage.
Application developers get monitoring of common components for free.
Dashboards and alerting can be provided out of the box, customised for your
organisation!
Open ecosystem
Prometheus client libraries don’t tie you into Prometheus.
For example the Python and Java clients can output to Graphite, with no need to
run any Prometheus components.
This means that you as a library author can instrument with Prometheus, and your
users with just a few lines of code can output to whatever monitoring system they
want. No need for users to worry about a new standard.
This can be done incrementally.
It goes the other way too
It’s unlikely that everyone is going to switch to Prometheus all at once, so we’ve
integrations that can take in data from other monitoring systems and make it
useful.
Graphite, Collectd, Statsd, SNMP, JMX, Dropwizard, AWS Cloudwatch, New
Relic, Rsyslog and Scollector (Bosun) are some examples.
Prometheus and its client libraries can act a clearinghouse to convert between
monitoring systems.
For example Zalando’s Zmon uses the Python client to parse Prometheus metrics
from directly instrumented binaries such as the Node exporter.
Counters
One type of time series Prometheus has is a Counter
It only goes up
If we miss a scrape, it's still possible to see what the increase was during that
period - lose granularity, not data. Other systems push diffs, not as reliable.
The rate() function also handles resets
Powerful Query Language
Can multiply, add, aggregate, join, predict, take quantiles across many metrics in
the same query. Can evaluate right now, and graph back in time.
Answer questions like:
● What’s the 95th percentile latency in the European datacenter?
● How full will the disks be in 4 hours?
● Which services are the top 5 users of CPU?
Can alert based on any query.
Example: Top 5 Docker images by CPU
topk(5,
sum by (image)(
rate(container_cpu_usage_seconds_total{
id=~"/system.slice/docker.*"}[5m]
)
)
)
Manageable and Reliable
Core Prometheus server is a single binary.
Doesn’t depend on Zookeeper, Consul, Cassandra, Hadoop or the Internet.
Only requires local disk (SSD recommended). No potential for cascading failure.
Pull based, so easy to on run a workstation for testing and rogue servers can’t
push bad metrics.
State not event based, so network blips only lose resolution - not data.
Advanced service discovery finds what to monitor.
Scalable
Prometheus is easy to run, can give one to each team in each datacenter.
Federation allows pulling key metrics from other Prometheus servers.
When one job is too big for a single Prometheus server, can use
sharding+federation to scale out. Needed with thousands of machines.
Easy to integrate with
Many existing integrations: Java, JMX, Python, Go, Ruby, .Net, Machine,
Cloudwatch, EC2, MySQL, PostgreSQL, Haskell, Bash, Node.js, SNMP, Consul,
HAProxy, Mesos, Bind, CouchDB, Django, Mtail, Heka, Memcached, RabbitMQ,
Redis, RethinkDB, Rsyslog, Meteor.js, Minecraft...
Graphite, Statsd, Collectd, Scollector, Munin, Nagios integrations aid transition.
It’s so easy, most of the above were written without the core team even knowing
about them!
Dashboards
What to do with time series?
With potentially millions of time series across your system, can be difficult to know
what is and isn't useful.
What approaches help manage this complexity?
How do you avoid getting caught out?
#1: Choose your key statistics
Users don't care that one of your machines is short of CPU.
Users care if the service is slow or throwing errors.
For your primary dashboards focus on high-level metrics that directly impact
users.
#2: Use aggregations
Think about services, not machines.
Once you have more than a handful of machines, you should treat them as an
amorphous blob.
Looking at the key statistics is easier for 10 services, than 10 services each of
which is on 10 machines
Once you have isolated a problem to one service, then can see if one machine is
the problem
#3: Avoid the Wall of Graphs
Dashboards tend to grow without bound. Worst I've seen was 600 graphs.
It might look impressive, but humans can't deal with that much data at once.
(and they take forever to load)
Your services will have a rough tree structure, have a dashboard per service and
talk the tree from the top when you have a problem. Similarly for each service,
have dashboards per subsystem.
Rule of Thumb: Limit of 5 graphs per dashboard, and 5 lines per graph.
#4: Client-side quantiles aren't aggregatable
Many instrumentation systems calculate quantiles/percentiles inside each process,
and export it to the TSDB.
It is not statistically possible to aggregate these.
If you want meaningful quantiles, you should track histogram buckets in each
process, aggregate those in your monitoring system and then calculate the
quantile.
This is done using histogram_quantile() and rate() in Prometheus.
#5: Averages are easy to reason about
Q: Say you have a service with two backends. If 95th percentile latency goes up
due to one of the backends, what will you see in 95th percentile latency for that
backend?
A: ?
#5: Averages are easy to reason about
Q: Say you have a service with two backends. If 95th percentile latency goes up
due to one of the backends, what will you see in 95th percentile latency for that
backend?
A: It depends, could be no change. If the latencies are strongly correlated for each
request across the backends, you'll see the same latency bump.
This is tricky to reason about, especially in an emergency.
Averages don't have this problem, as they include all requests.
#6: Costs and Benefits
1s resolution monitoring of all metrics would be handy for debugging.
But is it ten time more valuable than 10s monitoring? And sixty times more
valuable than 60s monitoring?
Monitoring isn't free. It costs resources to run, and resources in the services being
monitored too. Quantiles and histograms can get expensive fast.
60s resolution is generally a good balance. Reserve 1s granularity or a literal
handful of key metrics.
#7: Nyquist-Shannon Sampling Theorem
To reconstruct a signal you need a resolution that's at least double it's frequency.
If you've got a 10s resolution time series, you can't reconstruct patterns that are
less than 20s long.
Higher frequency patterns can cause effects like aliasing, and mislead you.
If you suspect that there's something more to the data, try a higher resolution
temporarily or start profiling.
#8: Correlation is not Causation
Let's play a game called 2 4 6
I have a rule, and if you give me a sequence of numbers I'll tell you if it conforms
to the rule. Can you figure out the rule?
(If you've played this before, don't spoil it)
#8: Correlation is not Causation - Confirmation Bias
Humans are great at spotting patterns. Not all of them are actually there.
Always try to look for evidence that'd falsify your hypothesis.
If two metrics seem to correlate on a graph that doesn't mean that they're related.
They could be independent tasks running on the same schedule.
Or if you zoom out there plenty of times when one spikes but not the other.
Or one could be causing a slight increase in resource contention, pushing the
other over the edge.
#9 Have a way to deal with non-critical alerts
Most alerts don't justify waking up someone at night, but someone needs to look
at them sometime.
Often they're sent to a mailing list, where everyone promptly filters them away.
Better to have some form of ticketing system that'll assign a single owner for each
alert.
A daily email with all firing alerts that the oncall has to process can also work.
Final Word
It's easy to get in over your head when starting to appreciate to power of time
series monitoring.
The one thing I'd say is to Keep it Simple.
Just like anything else, more complex doesn't automatically mean better.
What do we do?
Robust Perception provides consulting and training to give you confidence in your
production service's ability to run efficiently and scale with your business.
We can help you:
● Decide if Prometheus is for you
● Manage your transition to Prometheus and resolve issues that arise
● With capacity planning, debugging, infrastructure etc.
We are proud to be among the core contributors to the Prometheus project.
Resources
Official Project Website: prometheus.io
Official Mailing List: prometheus-developers@googlegroups.com
Demo: demo.robustperception.io
Robust Perception Website: www.robustperception.io
Queries: prometheus@robustperception.io

Contenu connexe

Tendances

Infrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusInfrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusMarco Pas
 
Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Brian Brazil
 
Monitoring What Matters: The Prometheus Approach to Whitebox Monitoring (Berl...
Monitoring What Matters: The Prometheus Approach to Whitebox Monitoring (Berl...Monitoring What Matters: The Prometheus Approach to Whitebox Monitoring (Berl...
Monitoring What Matters: The Prometheus Approach to Whitebox Monitoring (Berl...Brian Brazil
 
Cloud Monitoring with Prometheus
Cloud Monitoring with PrometheusCloud Monitoring with Prometheus
Cloud Monitoring with PrometheusQAware GmbH
 
Better Monitoring for Python: Inclusive Monitoring with Prometheus (Pycon Ire...
Better Monitoring for Python: Inclusive Monitoring with Prometheus (Pycon Ire...Better Monitoring for Python: Inclusive Monitoring with Prometheus (Pycon Ire...
Better Monitoring for Python: Inclusive Monitoring with Prometheus (Pycon Ire...Brian Brazil
 
Monitoring_with_Prometheus_Grafana_Tutorial
Monitoring_with_Prometheus_Grafana_TutorialMonitoring_with_Prometheus_Grafana_Tutorial
Monitoring_with_Prometheus_Grafana_TutorialTim Vaillancourt
 
Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Brian Brazil
 
Monitoring Kafka w/ Prometheus
Monitoring Kafka w/ PrometheusMonitoring Kafka w/ Prometheus
Monitoring Kafka w/ Prometheuskawamuray
 
Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018Grafana Labs
 
So You Want to Write an Exporter
So You Want to Write an ExporterSo You Want to Write an Exporter
So You Want to Write an ExporterBrian Brazil
 
Prometheus design and philosophy
Prometheus design and philosophy   Prometheus design and philosophy
Prometheus design and philosophy Docker, Inc.
 
What is your application doing right now? An introduction to Prometheus
What is your application doing right now? An introduction to PrometheusWhat is your application doing right now? An introduction to Prometheus
What is your application doing right now? An introduction to PrometheusMatthias Grüter
 
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)Brian Brazil
 
Breaking Prometheus (Promcon Berlin '16)
Breaking Prometheus (Promcon Berlin '16)Breaking Prometheus (Promcon Berlin '16)
Breaking Prometheus (Promcon Berlin '16)Matthew Campbell
 
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)Brian Brazil
 
Prometheus (Monitorama 2016)
Prometheus (Monitorama 2016)Prometheus (Monitorama 2016)
Prometheus (Monitorama 2016)Brian Brazil
 
Systems Monitoring with Prometheus (Devops Ireland April 2015)
Systems Monitoring with Prometheus (Devops Ireland April 2015)Systems Monitoring with Prometheus (Devops Ireland April 2015)
Systems Monitoring with Prometheus (Devops Ireland April 2015)Brian Brazil
 
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)Brian Brazil
 
Getting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaGetting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaSyah Dwi Prihatmoko
 
Monitoring a Kubernetes-backed microservice architecture with Prometheus
Monitoring a Kubernetes-backed microservice architecture with PrometheusMonitoring a Kubernetes-backed microservice architecture with Prometheus
Monitoring a Kubernetes-backed microservice architecture with PrometheusFabian Reinartz
 

Tendances (20)

Infrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusInfrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using Prometheus
 
Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)
 
Monitoring What Matters: The Prometheus Approach to Whitebox Monitoring (Berl...
Monitoring What Matters: The Prometheus Approach to Whitebox Monitoring (Berl...Monitoring What Matters: The Prometheus Approach to Whitebox Monitoring (Berl...
Monitoring What Matters: The Prometheus Approach to Whitebox Monitoring (Berl...
 
Cloud Monitoring with Prometheus
Cloud Monitoring with PrometheusCloud Monitoring with Prometheus
Cloud Monitoring with Prometheus
 
Better Monitoring for Python: Inclusive Monitoring with Prometheus (Pycon Ire...
Better Monitoring for Python: Inclusive Monitoring with Prometheus (Pycon Ire...Better Monitoring for Python: Inclusive Monitoring with Prometheus (Pycon Ire...
Better Monitoring for Python: Inclusive Monitoring with Prometheus (Pycon Ire...
 
Monitoring_with_Prometheus_Grafana_Tutorial
Monitoring_with_Prometheus_Grafana_TutorialMonitoring_with_Prometheus_Grafana_Tutorial
Monitoring_with_Prometheus_Grafana_Tutorial
 
Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)
 
Monitoring Kafka w/ Prometheus
Monitoring Kafka w/ PrometheusMonitoring Kafka w/ Prometheus
Monitoring Kafka w/ Prometheus
 
Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018
 
So You Want to Write an Exporter
So You Want to Write an ExporterSo You Want to Write an Exporter
So You Want to Write an Exporter
 
Prometheus design and philosophy
Prometheus design and philosophy   Prometheus design and philosophy
Prometheus design and philosophy
 
What is your application doing right now? An introduction to Prometheus
What is your application doing right now? An introduction to PrometheusWhat is your application doing right now? An introduction to Prometheus
What is your application doing right now? An introduction to Prometheus
 
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
 
Breaking Prometheus (Promcon Berlin '16)
Breaking Prometheus (Promcon Berlin '16)Breaking Prometheus (Promcon Berlin '16)
Breaking Prometheus (Promcon Berlin '16)
 
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
 
Prometheus (Monitorama 2016)
Prometheus (Monitorama 2016)Prometheus (Monitorama 2016)
Prometheus (Monitorama 2016)
 
Systems Monitoring with Prometheus (Devops Ireland April 2015)
Systems Monitoring with Prometheus (Devops Ireland April 2015)Systems Monitoring with Prometheus (Devops Ireland April 2015)
Systems Monitoring with Prometheus (Devops Ireland April 2015)
 
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
 
Getting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaGetting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and Grafana
 
Monitoring a Kubernetes-backed microservice architecture with Prometheus
Monitoring a Kubernetes-backed microservice architecture with PrometheusMonitoring a Kubernetes-backed microservice architecture with Prometheus
Monitoring a Kubernetes-backed microservice architecture with Prometheus
 

En vedette

Creating a personal narrative
Creating a personal narrativeCreating a personal narrative
Creating a personal narrativeEmily Kissner
 
AWS re:Invent 2016: Deploying Scalable SAP Hybris Clusters using Docker (CON312)
AWS re:Invent 2016: Deploying Scalable SAP Hybris Clusters using Docker (CON312)AWS re:Invent 2016: Deploying Scalable SAP Hybris Clusters using Docker (CON312)
AWS re:Invent 2016: Deploying Scalable SAP Hybris Clusters using Docker (CON312)Amazon Web Services
 
Rez gateway (RezOS) innovate the future
Rez gateway  (RezOS) innovate the futureRez gateway  (RezOS) innovate the future
Rez gateway (RezOS) innovate the futureindikaMaligaspe
 
DOXLON November 2016: Facebook Engineering on cgroupv2
DOXLON November 2016: Facebook Engineering on cgroupv2DOXLON November 2016: Facebook Engineering on cgroupv2
DOXLON November 2016: Facebook Engineering on cgroupv2Outlyer
 
Answers in environmental education @kaye
Answers in environmental education @kayeAnswers in environmental education @kaye
Answers in environmental education @kayeCee Saliendrez
 
Metrics, Logs, Transaction Traces, Anomaly Detection at Scale
Metrics, Logs, Transaction Traces, Anomaly Detection at ScaleMetrics, Logs, Transaction Traces, Anomaly Detection at Scale
Metrics, Logs, Transaction Traces, Anomaly Detection at ScaleSematext Group, Inc.
 
IBM Bluemix OpenWhisk: IBM Seminar 2016, Tokyo, Japan: The Future of Cloud Pr...
IBM Bluemix OpenWhisk: IBM Seminar 2016, Tokyo, Japan: The Future of Cloud Pr...IBM Bluemix OpenWhisk: IBM Seminar 2016, Tokyo, Japan: The Future of Cloud Pr...
IBM Bluemix OpenWhisk: IBM Seminar 2016, Tokyo, Japan: The Future of Cloud Pr...OpenWhisk
 
Performance Pack
Performance PackPerformance Pack
Performance Packday
 
Migrate Oracle WebLogic Applications onto a Containerized Cloud Data Center
Migrate Oracle WebLogic Applications onto a Containerized Cloud Data CenterMigrate Oracle WebLogic Applications onto a Containerized Cloud Data Center
Migrate Oracle WebLogic Applications onto a Containerized Cloud Data CenterJingnan Zhou
 
Lost in Translation - Blackhat Brazil 2014
Lost in Translation - Blackhat Brazil 2014Lost in Translation - Blackhat Brazil 2014
Lost in Translation - Blackhat Brazil 2014Rodrigo Montoro
 
WTF is Sensu and Monitoring
WTF is Sensu and MonitoringWTF is Sensu and Monitoring
WTF is Sensu and MonitoringToby Jackson
 
Powerupcloud - Customer Case Studies
Powerupcloud - Customer Case StudiesPowerupcloud - Customer Case Studies
Powerupcloud - Customer Case StudiesJonathyan Maas ☁
 
Incident Command: The far side of the edge
Incident Command: The far side of the edgeIncident Command: The far side of the edge
Incident Command: The far side of the edgeFastly
 

En vedette (20)

Creating a personal narrative
Creating a personal narrativeCreating a personal narrative
Creating a personal narrative
 
AWS re:Invent 2016: Deploying Scalable SAP Hybris Clusters using Docker (CON312)
AWS re:Invent 2016: Deploying Scalable SAP Hybris Clusters using Docker (CON312)AWS re:Invent 2016: Deploying Scalable SAP Hybris Clusters using Docker (CON312)
AWS re:Invent 2016: Deploying Scalable SAP Hybris Clusters using Docker (CON312)
 
Dialogue Assessment
Dialogue AssessmentDialogue Assessment
Dialogue Assessment
 
Rez gateway (RezOS) innovate the future
Rez gateway  (RezOS) innovate the futureRez gateway  (RezOS) innovate the future
Rez gateway (RezOS) innovate the future
 
DOXLON November 2016: Facebook Engineering on cgroupv2
DOXLON November 2016: Facebook Engineering on cgroupv2DOXLON November 2016: Facebook Engineering on cgroupv2
DOXLON November 2016: Facebook Engineering on cgroupv2
 
Answers in environmental education @kaye
Answers in environmental education @kayeAnswers in environmental education @kaye
Answers in environmental education @kaye
 
OS17 Brochure
OS17 BrochureOS17 Brochure
OS17 Brochure
 
Build Stuff 2015 program
Build Stuff 2015 programBuild Stuff 2015 program
Build Stuff 2015 program
 
POS Malware: Is your Debit/Credit Transcations Secure?
POS Malware: Is your Debit/Credit Transcations Secure?POS Malware: Is your Debit/Credit Transcations Secure?
POS Malware: Is your Debit/Credit Transcations Secure?
 
Metrics, Logs, Transaction Traces, Anomaly Detection at Scale
Metrics, Logs, Transaction Traces, Anomaly Detection at ScaleMetrics, Logs, Transaction Traces, Anomaly Detection at Scale
Metrics, Logs, Transaction Traces, Anomaly Detection at Scale
 
An Introduction to event sourcing and CQRS
An Introduction to event sourcing and CQRSAn Introduction to event sourcing and CQRS
An Introduction to event sourcing and CQRS
 
IBM Bluemix OpenWhisk: IBM Seminar 2016, Tokyo, Japan: The Future of Cloud Pr...
IBM Bluemix OpenWhisk: IBM Seminar 2016, Tokyo, Japan: The Future of Cloud Pr...IBM Bluemix OpenWhisk: IBM Seminar 2016, Tokyo, Japan: The Future of Cloud Pr...
IBM Bluemix OpenWhisk: IBM Seminar 2016, Tokyo, Japan: The Future of Cloud Pr...
 
Performance Pack
Performance PackPerformance Pack
Performance Pack
 
Migrate Oracle WebLogic Applications onto a Containerized Cloud Data Center
Migrate Oracle WebLogic Applications onto a Containerized Cloud Data CenterMigrate Oracle WebLogic Applications onto a Containerized Cloud Data Center
Migrate Oracle WebLogic Applications onto a Containerized Cloud Data Center
 
Lost in Translation - Blackhat Brazil 2014
Lost in Translation - Blackhat Brazil 2014Lost in Translation - Blackhat Brazil 2014
Lost in Translation - Blackhat Brazil 2014
 
Tic’s y enfermería
Tic’s y enfermeríaTic’s y enfermería
Tic’s y enfermería
 
WTF is Sensu and Monitoring
WTF is Sensu and MonitoringWTF is Sensu and Monitoring
WTF is Sensu and Monitoring
 
Mohamed Ahmed Abdelkhalek
Mohamed Ahmed AbdelkhalekMohamed Ahmed Abdelkhalek
Mohamed Ahmed Abdelkhalek
 
Powerupcloud - Customer Case Studies
Powerupcloud - Customer Case StudiesPowerupcloud - Customer Case Studies
Powerupcloud - Customer Case Studies
 
Incident Command: The far side of the edge
Incident Command: The far side of the edgeIncident Command: The far side of the edge
Incident Command: The far side of the edge
 

Similaire à Brian Brazil on Monitoring with Prometheus

Prometheus (Microsoft, 2016)
Prometheus (Microsoft, 2016)Prometheus (Microsoft, 2016)
Prometheus (Microsoft, 2016)Brian Brazil
 
Prometheus Overview
Prometheus OverviewPrometheus Overview
Prometheus OverviewBrian Brazil
 
Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Brian Brazil
 
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Brian Brazil
 
Prometheus-Grafana-RahulSoni1584KnolX.pptx.pdf
Prometheus-Grafana-RahulSoni1584KnolX.pptx.pdfPrometheus-Grafana-RahulSoni1584KnolX.pptx.pdf
Prometheus-Grafana-RahulSoni1584KnolX.pptx.pdfKnoldus Inc.
 
Go Observability (in practice)
Go Observability (in practice)Go Observability (in practice)
Go Observability (in practice)Eran Levy
 
Evolution of Monitoring and Prometheus (Dublin 2018)
Evolution of Monitoring and Prometheus (Dublin 2018)Evolution of Monitoring and Prometheus (Dublin 2018)
Evolution of Monitoring and Prometheus (Dublin 2018)Brian Brazil
 
Observability for Application Developers (1)-1.pptx
Observability for Application Developers (1)-1.pptxObservability for Application Developers (1)-1.pptx
Observability for Application Developers (1)-1.pptxOpsTree solutions
 
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...IRJET Journal
 
Which watcher watches CloudWatch
Which watcher watches CloudWatch Which watcher watches CloudWatch
Which watcher watches CloudWatch David Lutz
 
The hitchhiker’s guide to Prometheus
The hitchhiker’s guide to PrometheusThe hitchhiker’s guide to Prometheus
The hitchhiker’s guide to PrometheusBol.com Techlab
 
The hitchhiker’s guide to Prometheus
The hitchhiker’s guide to PrometheusThe hitchhiker’s guide to Prometheus
The hitchhiker’s guide to PrometheusBol.com Techlab
 
I pushed in production :). Have a nice weekend
I pushed in production :). Have a nice weekendI pushed in production :). Have a nice weekend
I pushed in production :). Have a nice weekendNicolas Carlier
 
Monitoring as an entry point for collaboration
Monitoring as an entry point for collaborationMonitoring as an entry point for collaboration
Monitoring as an entry point for collaborationJulien Pivotto
 
Prometheus: From technical metrics to business observability
Prometheus: From technical metrics to business observabilityPrometheus: From technical metrics to business observability
Prometheus: From technical metrics to business observabilityJulien Pivotto
 
Test automation: Are Enterprises ready to bite the bullet?
Test automation: Are Enterprises ready to bite the bullet?Test automation: Are Enterprises ready to bite the bullet?
Test automation: Are Enterprises ready to bite the bullet?Aspire Systems
 
From Duke of DevOps to Queen of Chaos - Api days 2018
From Duke of DevOps to Queen of Chaos - Api days 2018From Duke of DevOps to Queen of Chaos - Api days 2018
From Duke of DevOps to Queen of Chaos - Api days 2018Christophe Rochefolle
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready AppsVMware Tanzu
 

Similaire à Brian Brazil on Monitoring with Prometheus (20)

Prometheus (Microsoft, 2016)
Prometheus (Microsoft, 2016)Prometheus (Microsoft, 2016)
Prometheus (Microsoft, 2016)
 
Prometheus Overview
Prometheus OverviewPrometheus Overview
Prometheus Overview
 
Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)
 
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
 
Prometheus-Grafana-RahulSoni1584KnolX.pptx.pdf
Prometheus-Grafana-RahulSoni1584KnolX.pptx.pdfPrometheus-Grafana-RahulSoni1584KnolX.pptx.pdf
Prometheus-Grafana-RahulSoni1584KnolX.pptx.pdf
 
Go Observability (in practice)
Go Observability (in practice)Go Observability (in practice)
Go Observability (in practice)
 
Evolution of Monitoring and Prometheus (Dublin 2018)
Evolution of Monitoring and Prometheus (Dublin 2018)Evolution of Monitoring and Prometheus (Dublin 2018)
Evolution of Monitoring and Prometheus (Dublin 2018)
 
Observability for Application Developers (1)-1.pptx
Observability for Application Developers (1)-1.pptxObservability for Application Developers (1)-1.pptx
Observability for Application Developers (1)-1.pptx
 
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
 
Which watcher watches CloudWatch
Which watcher watches CloudWatch Which watcher watches CloudWatch
Which watcher watches CloudWatch
 
The hitchhiker’s guide to Prometheus
The hitchhiker’s guide to PrometheusThe hitchhiker’s guide to Prometheus
The hitchhiker’s guide to Prometheus
 
The hitchhiker’s guide to Prometheus
The hitchhiker’s guide to PrometheusThe hitchhiker’s guide to Prometheus
The hitchhiker’s guide to Prometheus
 
Prometheus monitoring
Prometheus monitoringPrometheus monitoring
Prometheus monitoring
 
I pushed in production :). Have a nice weekend
I pushed in production :). Have a nice weekendI pushed in production :). Have a nice weekend
I pushed in production :). Have a nice weekend
 
Monitoring as an entry point for collaboration
Monitoring as an entry point for collaborationMonitoring as an entry point for collaboration
Monitoring as an entry point for collaboration
 
Prometheus: From technical metrics to business observability
Prometheus: From technical metrics to business observabilityPrometheus: From technical metrics to business observability
Prometheus: From technical metrics to business observability
 
Test automation: Are Enterprises ready to bite the bullet?
Test automation: Are Enterprises ready to bite the bullet?Test automation: Are Enterprises ready to bite the bullet?
Test automation: Are Enterprises ready to bite the bullet?
 
From Duke of DevOps to Queen of Chaos - Api days 2018
From Duke of DevOps to Queen of Chaos - Api days 2018From Duke of DevOps to Queen of Chaos - Api days 2018
From Duke of DevOps to Queen of Chaos - Api days 2018
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready Apps
 
Prometheus and Grafana
Prometheus and GrafanaPrometheus and Grafana
Prometheus and Grafana
 

Plus de Brian Brazil

OpenMetrics: What Does It Mean for You (PromCon 2019, Munich)
OpenMetrics: What Does It Mean for You (PromCon 2019, Munich)OpenMetrics: What Does It Mean for You (PromCon 2019, Munich)
OpenMetrics: What Does It Mean for You (PromCon 2019, Munich)Brian Brazil
 
Evaluating Prometheus Knowledge in Interviews (PromCon 2018)
Evaluating Prometheus Knowledge in Interviews (PromCon 2018)Evaluating Prometheus Knowledge in Interviews (PromCon 2018)
Evaluating Prometheus Knowledge in Interviews (PromCon 2018)Brian Brazil
 
Anatomy of a Prometheus Client Library (PromCon 2018)
Anatomy of a Prometheus Client Library (PromCon 2018)Anatomy of a Prometheus Client Library (PromCon 2018)
Anatomy of a Prometheus Client Library (PromCon 2018)Brian Brazil
 
Evolving Prometheus for the Cloud Native World (FOSDEM 2018)
Evolving Prometheus for the Cloud Native World (FOSDEM 2018)Evolving Prometheus for the Cloud Native World (FOSDEM 2018)
Evolving Prometheus for the Cloud Native World (FOSDEM 2018)Brian Brazil
 
Evolution of the Prometheus TSDB (Percona Live Europe 2017)
Evolution of the Prometheus TSDB  (Percona Live Europe 2017)Evolution of the Prometheus TSDB  (Percona Live Europe 2017)
Evolution of the Prometheus TSDB (Percona Live Europe 2017)Brian Brazil
 
Staleness and Isolation in Prometheus 2.0 (PromCon 2017)
Staleness and Isolation in Prometheus 2.0 (PromCon 2017)Staleness and Isolation in Prometheus 2.0 (PromCon 2017)
Staleness and Isolation in Prometheus 2.0 (PromCon 2017)Brian Brazil
 
Rule 110 for Prometheus (PromCon 2017)
Rule 110 for Prometheus (PromCon 2017)Rule 110 for Prometheus (PromCon 2017)
Rule 110 for Prometheus (PromCon 2017)Brian Brazil
 
Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017)
Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017)Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017)
Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017)Brian Brazil
 
Prometheus: From Berlin to Bonanza (Keynote CloudNativeCon+Kubecon Europe 2017)
Prometheus:  From Berlin to Bonanza (Keynote CloudNativeCon+Kubecon Europe 2017)Prometheus:  From Berlin to Bonanza (Keynote CloudNativeCon+Kubecon Europe 2017)
Prometheus: From Berlin to Bonanza (Keynote CloudNativeCon+Kubecon Europe 2017)Brian Brazil
 
What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)Brian Brazil
 
Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)Brian Brazil
 
An Exploration of the Formal Properties of PromQL
An Exploration of the Formal Properties of PromQLAn Exploration of the Formal Properties of PromQL
An Exploration of the Formal Properties of PromQLBrian Brazil
 
Life of a Label (PromCon2016, Berlin)
Life of a Label (PromCon2016, Berlin)Life of a Label (PromCon2016, Berlin)
Life of a Label (PromCon2016, Berlin)Brian Brazil
 
Ansible at FOSDEM (Ansible Dublin, 2016)
Ansible at FOSDEM (Ansible Dublin, 2016)Ansible at FOSDEM (Ansible Dublin, 2016)
Ansible at FOSDEM (Ansible Dublin, 2016)Brian Brazil
 

Plus de Brian Brazil (14)

OpenMetrics: What Does It Mean for You (PromCon 2019, Munich)
OpenMetrics: What Does It Mean for You (PromCon 2019, Munich)OpenMetrics: What Does It Mean for You (PromCon 2019, Munich)
OpenMetrics: What Does It Mean for You (PromCon 2019, Munich)
 
Evaluating Prometheus Knowledge in Interviews (PromCon 2018)
Evaluating Prometheus Knowledge in Interviews (PromCon 2018)Evaluating Prometheus Knowledge in Interviews (PromCon 2018)
Evaluating Prometheus Knowledge in Interviews (PromCon 2018)
 
Anatomy of a Prometheus Client Library (PromCon 2018)
Anatomy of a Prometheus Client Library (PromCon 2018)Anatomy of a Prometheus Client Library (PromCon 2018)
Anatomy of a Prometheus Client Library (PromCon 2018)
 
Evolving Prometheus for the Cloud Native World (FOSDEM 2018)
Evolving Prometheus for the Cloud Native World (FOSDEM 2018)Evolving Prometheus for the Cloud Native World (FOSDEM 2018)
Evolving Prometheus for the Cloud Native World (FOSDEM 2018)
 
Evolution of the Prometheus TSDB (Percona Live Europe 2017)
Evolution of the Prometheus TSDB  (Percona Live Europe 2017)Evolution of the Prometheus TSDB  (Percona Live Europe 2017)
Evolution of the Prometheus TSDB (Percona Live Europe 2017)
 
Staleness and Isolation in Prometheus 2.0 (PromCon 2017)
Staleness and Isolation in Prometheus 2.0 (PromCon 2017)Staleness and Isolation in Prometheus 2.0 (PromCon 2017)
Staleness and Isolation in Prometheus 2.0 (PromCon 2017)
 
Rule 110 for Prometheus (PromCon 2017)
Rule 110 for Prometheus (PromCon 2017)Rule 110 for Prometheus (PromCon 2017)
Rule 110 for Prometheus (PromCon 2017)
 
Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017)
Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017)Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017)
Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017)
 
Prometheus: From Berlin to Bonanza (Keynote CloudNativeCon+Kubecon Europe 2017)
Prometheus:  From Berlin to Bonanza (Keynote CloudNativeCon+Kubecon Europe 2017)Prometheus:  From Berlin to Bonanza (Keynote CloudNativeCon+Kubecon Europe 2017)
Prometheus: From Berlin to Bonanza (Keynote CloudNativeCon+Kubecon Europe 2017)
 
What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)
 
Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)
 
An Exploration of the Formal Properties of PromQL
An Exploration of the Formal Properties of PromQLAn Exploration of the Formal Properties of PromQL
An Exploration of the Formal Properties of PromQL
 
Life of a Label (PromCon2016, Berlin)
Life of a Label (PromCon2016, Berlin)Life of a Label (PromCon2016, Berlin)
Life of a Label (PromCon2016, Berlin)
 
Ansible at FOSDEM (Ansible Dublin, 2016)
Ansible at FOSDEM (Ansible Dublin, 2016)Ansible at FOSDEM (Ansible Dublin, 2016)
Ansible at FOSDEM (Ansible Dublin, 2016)
 

Dernier

horny (9316020077 ) Goa Call Girls Service by VIP Call Girls in Goa
horny (9316020077 ) Goa  Call Girls Service by VIP Call Girls in Goahorny (9316020077 ) Goa  Call Girls Service by VIP Call Girls in Goa
horny (9316020077 ) Goa Call Girls Service by VIP Call Girls in Goasexy call girls service in goa
 
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...Neha Pandey
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebJames Anderson
 
Radiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girlsRadiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girlsstephieert
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Sheetaleventcompany
 
AlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsAlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsThierry TROUIN ☁
 
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girlsstephieert
 
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Call Girls in Nagpur High Profile
 
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceDelhi Call girls
 
Russian Call girls in Dubai +971563133746 Dubai Call girls
Russian  Call girls in Dubai +971563133746 Dubai  Call girlsRussian  Call girls in Dubai +971563133746 Dubai  Call girls
Russian Call girls in Dubai +971563133746 Dubai Call girlsstephieert
 
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Delhi Call girls
 
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call GirlVIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girladitipandeya
 
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$kojalkojal131
 
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...Diya Sharma
 
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLimonikaupta
 
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607dollysharma2066
 

Dernier (20)

horny (9316020077 ) Goa Call Girls Service by VIP Call Girls in Goa
horny (9316020077 ) Goa  Call Girls Service by VIP Call Girls in Goahorny (9316020077 ) Goa  Call Girls Service by VIP Call Girls in Goa
horny (9316020077 ) Goa Call Girls Service by VIP Call Girls in Goa
 
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
 
Radiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girlsRadiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girls
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
 
AlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsAlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with Flows
 
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls
 
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
 
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
 
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
 
Russian Call girls in Dubai +971563133746 Dubai Call girls
Russian  Call girls in Dubai +971563133746 Dubai  Call girlsRussian  Call girls in Dubai +971563133746 Dubai  Call girls
Russian Call girls in Dubai +971563133746 Dubai Call girls
 
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
 
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call GirlVIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
 
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
 
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
 
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
 
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
 
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
 
Call Girls In South Ex 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
Call Girls In South Ex 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICECall Girls In South Ex 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
Call Girls In South Ex 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
 

Brian Brazil on Monitoring with Prometheus

  • 1. Brian Brazil Founder Your data is in Prometheus, now what?
  • 2. Who am I? Engineer passionate about running software reliably in production. ● TCD CS Degree ● Google SRE for 7 years, working on high-scale reliable systems such as Adwords, Adsense, Ad Exchange, Billing, Database ● Boxever TL Systems&Infrastructure, applied processes and technology to let allow company to scale and reduce operational load ● Contributor to many open source projects, including Prometheus, Ansible, Python, Aurora and Zookeeper. ● Founder of Robust Perception, making scalability and efficiency available to everyone
  • 3. Why monitor? ● Know when things go wrong ○ To call in a human to prevent a business-level issue, or prevent an issue in advance ● Be able to debug and gain insight ● Trending to see changes over time, and drive technical/business decisions ● To feed into other systems/processes (e.g. QA, security, automation)
  • 4. What is a time series? The value of something tracked over time. For example temperature once a day, or requests to your API once a minute. For example my_api_requests: 5@1:00PM 2@1:01PM 18@1:02PM
  • 5. What is a time series database? Simply put, a database for storing multiple time series. It can optimise for the fact that many consecutive points in a time series will be requested at once, and patterns in the data also aid compression. May also have some in-built aggregation and downsampling functions. Also often used to refer to any system that can store data with a time-element, such as event logs (e.g. ELK stack). Not really a time series database in this form.
  • 6. Monitoring? Lots of information about your system could be thought of in terms of time series, for example: How many users are currently online? How much disk is machine X using? What's the average request latency? How many requests are failing?
  • 7. Prometheus Inspired by Google’s Borgmon monitoring system. Started in 2012 by ex-Googlers working in Soundcloud as an open source project, mainly written in Go. Publically launched in early 2015, and continues to be independent of any one company. Over 100 companies have started relying on it since then.
  • 8. Lots of Monitoring Systems Look Like This
  • 11. Monitor as a Service, not as Machines
  • 12. Put another way... Prometheus is about pulling in lots of information about lots of instances of a service. This is all tracked and stored over time, allowing for analysis. Prometheus at it's core is a time series database.
  • 13. Distinguishing Timeseries Prometheus doesn’t use dotted.strings like metric.fintech.dublin. Multi-dimensional labels instead like metric{industry=”fintech”,city=”dublin”} Can aggregate, cut, and slice along them. Can come from instrumentation, or be added based on the service you are monitoring.
  • 14. Example: Labels from Node Exporter
  • 15. Instrumentation made easy Prometheus clients don’t just marshall data. They take care of the nitty gritty details like concurrency and state tracking. We take advantage of the strengths of each language. In Python for example that means context managers and decorators.
  • 16. Python Instrumentation Example pip install prometheus_client from prometheus_client import Summary, start_http_server REQUEST_DURATION = Summary('request_duration_seconds', 'Request duration in seconds') @REQUEST_DURATION.time() def my_handler(request): pass # Your code here start_http_server(8000)
  • 17. Exceptional Circumstances In Progress from prometheus_client import Counter, Gauge EXCEPTIONS = Counter('exceptions_total', 'Total exceptions') IN_PROGRESS = Gauge('inprogress_requests', 'In progress') @EXCEPTIONS.count_exceptions() @IN_PROGRESS.track_inprogress() def my_handler(request): pass // Your code here
  • 18. Adding Dimensions (No Evil Twins Please) from prometheus_client import Counter REQUESTS = Counter('requests_total', 'Total requests', ['method']) def my_handler(request): REQUESTS.labels(request.method).inc() pass // Your code here
  • 19. Inclusive Monitoring Don’t monitor just at the edges: ● Instrument client libraries ● Instrument server libraries (e.g. HTTP/RPC) ● Instrument business logic Library authors get information about usage. Application developers get monitoring of common components for free. Dashboards and alerting can be provided out of the box, customised for your organisation!
  • 20. Open ecosystem Prometheus client libraries don’t tie you into Prometheus. For example the Python and Java clients can output to Graphite, with no need to run any Prometheus components. This means that you as a library author can instrument with Prometheus, and your users with just a few lines of code can output to whatever monitoring system they want. No need for users to worry about a new standard. This can be done incrementally.
  • 21. It goes the other way too It’s unlikely that everyone is going to switch to Prometheus all at once, so we’ve integrations that can take in data from other monitoring systems and make it useful. Graphite, Collectd, Statsd, SNMP, JMX, Dropwizard, AWS Cloudwatch, New Relic, Rsyslog and Scollector (Bosun) are some examples. Prometheus and its client libraries can act a clearinghouse to convert between monitoring systems. For example Zalando’s Zmon uses the Python client to parse Prometheus metrics from directly instrumented binaries such as the Node exporter.
  • 22. Counters One type of time series Prometheus has is a Counter It only goes up If we miss a scrape, it's still possible to see what the increase was during that period - lose granularity, not data. Other systems push diffs, not as reliable. The rate() function also handles resets
  • 23. Powerful Query Language Can multiply, add, aggregate, join, predict, take quantiles across many metrics in the same query. Can evaluate right now, and graph back in time. Answer questions like: ● What’s the 95th percentile latency in the European datacenter? ● How full will the disks be in 4 hours? ● Which services are the top 5 users of CPU? Can alert based on any query.
  • 24. Example: Top 5 Docker images by CPU topk(5, sum by (image)( rate(container_cpu_usage_seconds_total{ id=~"/system.slice/docker.*"}[5m] ) ) )
  • 25. Manageable and Reliable Core Prometheus server is a single binary. Doesn’t depend on Zookeeper, Consul, Cassandra, Hadoop or the Internet. Only requires local disk (SSD recommended). No potential for cascading failure. Pull based, so easy to on run a workstation for testing and rogue servers can’t push bad metrics. State not event based, so network blips only lose resolution - not data. Advanced service discovery finds what to monitor.
  • 26. Scalable Prometheus is easy to run, can give one to each team in each datacenter. Federation allows pulling key metrics from other Prometheus servers. When one job is too big for a single Prometheus server, can use sharding+federation to scale out. Needed with thousands of machines.
  • 27. Easy to integrate with Many existing integrations: Java, JMX, Python, Go, Ruby, .Net, Machine, Cloudwatch, EC2, MySQL, PostgreSQL, Haskell, Bash, Node.js, SNMP, Consul, HAProxy, Mesos, Bind, CouchDB, Django, Mtail, Heka, Memcached, RabbitMQ, Redis, RethinkDB, Rsyslog, Meteor.js, Minecraft... Graphite, Statsd, Collectd, Scollector, Munin, Nagios integrations aid transition. It’s so easy, most of the above were written without the core team even knowing about them!
  • 29. What to do with time series? With potentially millions of time series across your system, can be difficult to know what is and isn't useful. What approaches help manage this complexity? How do you avoid getting caught out?
  • 30. #1: Choose your key statistics Users don't care that one of your machines is short of CPU. Users care if the service is slow or throwing errors. For your primary dashboards focus on high-level metrics that directly impact users.
  • 31. #2: Use aggregations Think about services, not machines. Once you have more than a handful of machines, you should treat them as an amorphous blob. Looking at the key statistics is easier for 10 services, than 10 services each of which is on 10 machines Once you have isolated a problem to one service, then can see if one machine is the problem
  • 32. #3: Avoid the Wall of Graphs Dashboards tend to grow without bound. Worst I've seen was 600 graphs. It might look impressive, but humans can't deal with that much data at once. (and they take forever to load) Your services will have a rough tree structure, have a dashboard per service and talk the tree from the top when you have a problem. Similarly for each service, have dashboards per subsystem. Rule of Thumb: Limit of 5 graphs per dashboard, and 5 lines per graph.
  • 33. #4: Client-side quantiles aren't aggregatable Many instrumentation systems calculate quantiles/percentiles inside each process, and export it to the TSDB. It is not statistically possible to aggregate these. If you want meaningful quantiles, you should track histogram buckets in each process, aggregate those in your monitoring system and then calculate the quantile. This is done using histogram_quantile() and rate() in Prometheus.
  • 34. #5: Averages are easy to reason about Q: Say you have a service with two backends. If 95th percentile latency goes up due to one of the backends, what will you see in 95th percentile latency for that backend? A: ?
  • 35. #5: Averages are easy to reason about Q: Say you have a service with two backends. If 95th percentile latency goes up due to one of the backends, what will you see in 95th percentile latency for that backend? A: It depends, could be no change. If the latencies are strongly correlated for each request across the backends, you'll see the same latency bump. This is tricky to reason about, especially in an emergency. Averages don't have this problem, as they include all requests.
  • 36. #6: Costs and Benefits 1s resolution monitoring of all metrics would be handy for debugging. But is it ten time more valuable than 10s monitoring? And sixty times more valuable than 60s monitoring? Monitoring isn't free. It costs resources to run, and resources in the services being monitored too. Quantiles and histograms can get expensive fast. 60s resolution is generally a good balance. Reserve 1s granularity or a literal handful of key metrics.
  • 37. #7: Nyquist-Shannon Sampling Theorem To reconstruct a signal you need a resolution that's at least double it's frequency. If you've got a 10s resolution time series, you can't reconstruct patterns that are less than 20s long. Higher frequency patterns can cause effects like aliasing, and mislead you. If you suspect that there's something more to the data, try a higher resolution temporarily or start profiling.
  • 38. #8: Correlation is not Causation Let's play a game called 2 4 6 I have a rule, and if you give me a sequence of numbers I'll tell you if it conforms to the rule. Can you figure out the rule? (If you've played this before, don't spoil it)
  • 39. #8: Correlation is not Causation - Confirmation Bias Humans are great at spotting patterns. Not all of them are actually there. Always try to look for evidence that'd falsify your hypothesis. If two metrics seem to correlate on a graph that doesn't mean that they're related. They could be independent tasks running on the same schedule. Or if you zoom out there plenty of times when one spikes but not the other. Or one could be causing a slight increase in resource contention, pushing the other over the edge.
  • 40. #9 Have a way to deal with non-critical alerts Most alerts don't justify waking up someone at night, but someone needs to look at them sometime. Often they're sent to a mailing list, where everyone promptly filters them away. Better to have some form of ticketing system that'll assign a single owner for each alert. A daily email with all firing alerts that the oncall has to process can also work.
  • 41. Final Word It's easy to get in over your head when starting to appreciate to power of time series monitoring. The one thing I'd say is to Keep it Simple. Just like anything else, more complex doesn't automatically mean better.
  • 42. What do we do? Robust Perception provides consulting and training to give you confidence in your production service's ability to run efficiently and scale with your business. We can help you: ● Decide if Prometheus is for you ● Manage your transition to Prometheus and resolve issues that arise ● With capacity planning, debugging, infrastructure etc. We are proud to be among the core contributors to the Prometheus project.
  • 43. Resources Official Project Website: prometheus.io Official Mailing List: prometheus-developers@googlegroups.com Demo: demo.robustperception.io Robust Perception Website: www.robustperception.io Queries: prometheus@robustperception.io