SlideShare une entreprise Scribd logo
1  sur  28
Télécharger pour lire hors ligne
Breaking Prometheus
Scaling prometheus
to a million machines
About Me
• Technical Lead Digital Ocean

• Microservices in GO Book
• Lives in Bangkok

Dark Days
• Graphite
• InfluxDB
• OpenTSDB (*sigh*)
Manual Prometheus
Manual Issues
• Lots of Prometheus servers
• Mismatched versions
• New machines = update config
• Missing matches
Consul + Prometheus = Peanut butter + Jelly
Datacenter wide
10,000s of nodes
Stage 2:
Prometheus Per Region
Prometheus
Grafana
Prometheus Prometheus Prometheus
Prometheus
Prometheus
I/O Problems
• Modify retention windows
• Drop metrics from node_exporter
• Larger and larger machines
Tuning options
storage.local.retention
storage.local.memory-chunks
storage.local.max-chunks-to-persist
storage.local.checkpoint-interval
storage.local.checkpoint-dirty-series-limit
Sharding
rate(node_cpu{instance=“server12345.digitalocean.com:9100”}[2m])
Shard on red
Prometheus Proxy
Alert Manager
Shard Problems
• Shard redistribution
• Over provisioning
• Data loss
• Limited data windows
Kafka Bus
Million VMs
Digital Ocean
Agent
• Installable Metrics Agent
• Authenticated Push Gateway
• “Reverse node exporter”
Query Api
• Customer facing API
• GRPC / Json
• Authenticated per customer
• Prometheus queries
https://github.com/digitalocean/vulcan
Introducing Vulcan
Vulcan
• Prometheus Api
• Cassandra storage
• Kafka incoming
• Standard Scrapers
Cassandra Store
Downsampling
• In memory shared promethues
• Driven from data in kafka
• Reusable for Alerting
Downsampling / Alerting
Future
• New Scrape Sources (Kafka)
• Per series expiry TTLs
• Plugin Storage Model (In Memory)
• Alerting High Availability
Questions?
We’re Hiring!
Matthew Campbell
hyper@hyperworks.nu
@kanwisher
github.com/mattkanwisher

Contenu connexe

Tendances

Tendances (20)

Kafka monitoring and metrics
Kafka monitoring and metricsKafka monitoring and metrics
Kafka monitoring and metrics
 
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
 
Infrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusInfrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using Prometheus
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and Grafana
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
Prometheus (Microsoft, 2016)
Prometheus (Microsoft, 2016)Prometheus (Microsoft, 2016)
Prometheus (Microsoft, 2016)
 
Service Discovery in Prometheus
Service Discovery in PrometheusService Discovery in Prometheus
Service Discovery in Prometheus
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)
 
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
 
What is your application doing right now? An introduction to Prometheus
What is your application doing right now? An introduction to PrometheusWhat is your application doing right now? An introduction to Prometheus
What is your application doing right now? An introduction to Prometheus
 
Server monitoring using grafana and prometheus
Server monitoring using grafana and prometheusServer monitoring using grafana and prometheus
Server monitoring using grafana and prometheus
 
PostgreSQL Terminology
PostgreSQL TerminologyPostgreSQL Terminology
PostgreSQL Terminology
 
Getting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaGetting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and Grafana
 
Prometheus with Grafana - AddWeb Solution
Prometheus with Grafana - AddWeb SolutionPrometheus with Grafana - AddWeb Solution
Prometheus with Grafana - AddWeb Solution
 
Kafka on ZFS: Better Living Through Filesystems
Kafka on ZFS: Better Living Through Filesystems Kafka on ZFS: Better Living Through Filesystems
Kafka on ZFS: Better Living Through Filesystems
 
Streaming huge databases using logical decoding
Streaming huge databases using logical decodingStreaming huge databases using logical decoding
Streaming huge databases using logical decoding
 
Chronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for PrometheusChronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for Prometheus
 
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaPrometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
 
Using eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster HealthUsing eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster Health
 
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
 

Similaire à Breaking Prometheus (Promcon Berlin '16)

Hacking apache cloud stack
Hacking apache cloud stackHacking apache cloud stack
Hacking apache cloud stack
Nitin Mehta
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
xlight
 
Tupperware: Containerized Deployment at FB
Tupperware: Containerized Deployment at FBTupperware: Containerized Deployment at FB
Tupperware: Containerized Deployment at FB
Docker, Inc.
 

Similaire à Breaking Prometheus (Promcon Berlin '16) (20)

John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
 
Hacking apache cloud stack
Hacking apache cloud stackHacking apache cloud stack
Hacking apache cloud stack
 
Square Peg Round Hole: Serverless Solutions For Non-Serverless Problems
Square Peg Round Hole: Serverless Solutions For Non-Serverless ProblemsSquare Peg Round Hole: Serverless Solutions For Non-Serverless Problems
Square Peg Round Hole: Serverless Solutions For Non-Serverless Problems
 
Running Legacy Applications with Containers
Running Legacy Applications with ContainersRunning Legacy Applications with Containers
Running Legacy Applications with Containers
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Server 2016 sneak peek
Server 2016 sneak peekServer 2016 sneak peek
Server 2016 sneak peek
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the Cloud
 
Scalability strategies for cloud based system architecture
Scalability strategies for cloud based system architectureScalability strategies for cloud based system architecture
Scalability strategies for cloud based system architecture
 
Cortex: Prometheus as a Service, One Year On
Cortex: Prometheus as a Service, One Year OnCortex: Prometheus as a Service, One Year On
Cortex: Prometheus as a Service, One Year On
 
MariaDB on Docker
MariaDB on DockerMariaDB on Docker
MariaDB on Docker
 
Robotics technical Presentation
Robotics technical PresentationRobotics technical Presentation
Robotics technical Presentation
 
FreeSWITCH as a Microservice
FreeSWITCH as a MicroserviceFreeSWITCH as a Microservice
FreeSWITCH as a Microservice
 
6. DISZ - Webalkalmazások skálázhatósága a Google Cloud Platformon
6. DISZ - Webalkalmazások skálázhatósága  a Google Cloud Platformon6. DISZ - Webalkalmazások skálázhatósága  a Google Cloud Platformon
6. DISZ - Webalkalmazások skálázhatósága a Google Cloud Platformon
 
.NET Conf 2022 - Networking in .NET 7
.NET Conf 2022 - Networking in .NET 7.NET Conf 2022 - Networking in .NET 7
.NET Conf 2022 - Networking in .NET 7
 
Tupperware: Containerized Deployment at FB
Tupperware: Containerized Deployment at FBTupperware: Containerized Deployment at FB
Tupperware: Containerized Deployment at FB
 

Plus de Matthew Campbell

Plus de Matthew Campbell (9)

Practical Plasma: Gaming. Upbit Developers conference 2018
Practical Plasma: Gaming. Upbit Developers conference 2018Practical Plasma: Gaming. Upbit Developers conference 2018
Practical Plasma: Gaming. Upbit Developers conference 2018
 
Microservices Python bangkok
Microservices Python bangkokMicroservices Python bangkok
Microservices Python bangkok
 
Intro to microservices GopherDay Taipei '17
Intro to microservices  GopherDay Taipei '17Intro to microservices  GopherDay Taipei '17
Intro to microservices GopherDay Taipei '17
 
Distributed Timeseries Database In Go (gophercon India 17)
Distributed Timeseries Database In Go (gophercon India 17)Distributed Timeseries Database In Go (gophercon India 17)
Distributed Timeseries Database In Go (gophercon India 17)
 
DigitalOcean Microservices Talk Rocket Internet Conf '16
DigitalOcean Microservices Talk Rocket Internet Conf '16DigitalOcean Microservices Talk Rocket Internet Conf '16
DigitalOcean Microservices Talk Rocket Internet Conf '16
 
Cloud in your Cloud
Cloud in your CloudCloud in your Cloud
Cloud in your Cloud
 
presentation-chaos-monkey
presentation-chaos-monkeypresentation-chaos-monkey
presentation-chaos-monkey
 
Making Wallstreet talk with GO (GO India Conference 2015)
Making Wallstreet talk with GO (GO India Conference 2015)Making Wallstreet talk with GO (GO India Conference 2015)
Making Wallstreet talk with GO (GO India Conference 2015)
 
Intro to GO (Bangkok Launchpad 2014)
Intro to GO (Bangkok Launchpad 2014)Intro to GO (Bangkok Launchpad 2014)
Intro to GO (Bangkok Launchpad 2014)
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 

Breaking Prometheus (Promcon Berlin '16)