SlideShare une entreprise Scribd logo
1  sur  21
Télécharger pour lire hors ligne
Project Frankenstein
A multi-tenant, horizontally scalable Prometheus as a Service
Tom Wilkie (& Julius Volz)
Weaveworks, August 2016
“the best way to visualise, manage & monitor your cloud
native application”
Design
why not just run my own Prometheus?
• the as-a-service bit provides authentication and
access control
• virtually infinite retention; all the state is managed
for you, by us
• provide a different story around durability, HA and
scalability
• (eventually) better query performance, especially
for long queries
requirements:
1. API compatible with Prometheus
2. easy to operate and manage
3. tens of thousands of users, tens of
millions samples/s
4. cost effective to run
5. reuse as much of Prometheus as possible
… so we can sell it
Aim: build proof of concept as
quickly as possible
16/06 started design doc
22/06 circulated on list
22/06 initial commit
26/07 launch jobs
25/08 give talk!
http://goo.gl/prdUYV
Retriever
scraping
your jobs
Your DC
Weave Cloud
Frontend,
Authenticator
Distributor
Ingester
Distributor…
IngesterIngester
DynamoDB S3
Retriever
/bin/prometheus -retrieval-only -storage.remote.generic-url=...
Does scraping and relabelling.
Is a vanilla Prometheus plus:
• Brian Brazil’s generic write PR (#1487)
• Some modification to prevent local storage + indexing
• Uses consistent hashing to assign
timeseries to Ingesters
• Input to hash is (user ID, metric
name)
• Tokens stored in Consul
• Also currently handles queries
Distributor
http://goo.gl/U9u1U2
• Heavily modified MemorySeriesStorage
• Use same chunk format as Prometheus
• Keeps everything in memory (for up to an hour)
• Also stores in memory inverted index for queries
• Flushes chunks to S3 and indexes them in DynamoDB
Ingester
External inverted index maintained in DynamoDB, chunks stored in S3
Item in DynamoDB looks like:
{
hash key: “{user ID}:{metric name}:{hour}”,
range key: “{label name}:{label value}:{chunk ID}”,
metric: ...,
from, through: ...,
ID: ...,
}
DynamoDB S3
Evaluation
The Good
• It works! And in ~2
months.
• Seems pretty scalable,
handling two clusters right
now
• Query performance better
than expected
The Ugly: the code…
The Bad
• Hashing scheme means
can’t do queries that don’t
involve metric names.
• Possible to hotspot an
ingester
Demo
Lots left to do…
Features:
• Recording rules
• Alerting & Alertmanager
Reliability:
• Replication between
ingesters, commit log etc
• Ingestor lifecycle
• Separate query service?
Performance:
• Query parallelisation
• Background chunk
coalescing
Code:
• Code cleanup
• Upstream appropriate
changes
Questions?
Try it out!
Email help@weave.works for
instructions and to get on white list
https://github.com/tomwilkie/prometheus

Contenu connexe

Tendances

Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
Building zero data loss pipelines with apache kafka
Building zero data loss pipelines with apache kafkaBuilding zero data loss pipelines with apache kafka
Building zero data loss pipelines with apache kafkaAvinash Ramineni
 
Postgresql Database Administration- Day3
Postgresql Database Administration- Day3Postgresql Database Administration- Day3
Postgresql Database Administration- Day3PoguttuezhiniVP
 
Grafana Loki: like Prometheus, but for Logs
Grafana Loki: like Prometheus, but for LogsGrafana Loki: like Prometheus, but for Logs
Grafana Loki: like Prometheus, but for LogsMarco Pracucci
 
Learn O11y from Grafana ecosystem.
Learn O11y from Grafana ecosystem.Learn O11y from Grafana ecosystem.
Learn O11y from Grafana ecosystem.HungWei Chiu
 
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017Amazon Web Services
 
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...confluent
 
PostgreSQL Security. How Do We Think?
PostgreSQL Security. How Do We Think?PostgreSQL Security. How Do We Think?
PostgreSQL Security. How Do We Think?Ohyama Masanori
 
Stream Computing & Analytics at Uber
Stream Computing & Analytics at UberStream Computing & Analytics at Uber
Stream Computing & Analytics at UberSudhir Tonse
 
Install and Configure NGINX Unit, the Universal Application, Web, and Proxy S...
Install and Configure NGINX Unit, the Universal Application, Web, and Proxy S...Install and Configure NGINX Unit, the Universal Application, Web, and Proxy S...
Install and Configure NGINX Unit, the Universal Application, Web, and Proxy S...NGINX, Inc.
 
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...Mydbops
 
Securing Data in Hadoop at Uber
Securing Data in Hadoop at UberSecuring Data in Hadoop at Uber
Securing Data in Hadoop at UberDataWorks Summit
 
Why Splunk Chose Pulsar_Karthik Ramasamy
Why Splunk Chose Pulsar_Karthik RamasamyWhy Splunk Chose Pulsar_Karthik Ramasamy
Why Splunk Chose Pulsar_Karthik RamasamyStreamNative
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compactionMIJIN AN
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....Databricks
 
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)Jean-François Gagné
 
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)confluent
 

Tendances (20)

PostgreSQL replication
PostgreSQL replicationPostgreSQL replication
PostgreSQL replication
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Building zero data loss pipelines with apache kafka
Building zero data loss pipelines with apache kafkaBuilding zero data loss pipelines with apache kafka
Building zero data loss pipelines with apache kafka
 
Postgresql Database Administration- Day3
Postgresql Database Administration- Day3Postgresql Database Administration- Day3
Postgresql Database Administration- Day3
 
Grafana Loki: like Prometheus, but for Logs
Grafana Loki: like Prometheus, but for LogsGrafana Loki: like Prometheus, but for Logs
Grafana Loki: like Prometheus, but for Logs
 
Learn O11y from Grafana ecosystem.
Learn O11y from Grafana ecosystem.Learn O11y from Grafana ecosystem.
Learn O11y from Grafana ecosystem.
 
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
 
PostgreSQL
PostgreSQLPostgreSQL
PostgreSQL
 
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
 
PostgreSQL Security. How Do We Think?
PostgreSQL Security. How Do We Think?PostgreSQL Security. How Do We Think?
PostgreSQL Security. How Do We Think?
 
Stream Computing & Analytics at Uber
Stream Computing & Analytics at UberStream Computing & Analytics at Uber
Stream Computing & Analytics at Uber
 
Install and Configure NGINX Unit, the Universal Application, Web, and Proxy S...
Install and Configure NGINX Unit, the Universal Application, Web, and Proxy S...Install and Configure NGINX Unit, the Universal Application, Web, and Proxy S...
Install and Configure NGINX Unit, the Universal Application, Web, and Proxy S...
 
MySQL Cheat Sheet
MySQL Cheat SheetMySQL Cheat Sheet
MySQL Cheat Sheet
 
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...
 
Securing Data in Hadoop at Uber
Securing Data in Hadoop at UberSecuring Data in Hadoop at Uber
Securing Data in Hadoop at Uber
 
Why Splunk Chose Pulsar_Karthik Ramasamy
Why Splunk Chose Pulsar_Karthik RamasamyWhy Splunk Chose Pulsar_Karthik Ramasamy
Why Splunk Chose Pulsar_Karthik Ramasamy
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
 
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
 
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018)
 

Similaire à Project Frankenstein: A multitenant, horizontally scalable Prometheus as a service

Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...Weaveworks
 
Weave Cortex: Multi-tenant, horizontally scalable Prometheus as a Service
Weave Cortex: Multi-tenant, horizontally scalable Prometheus as a ServiceWeave Cortex: Multi-tenant, horizontally scalable Prometheus as a Service
Weave Cortex: Multi-tenant, horizontally scalable Prometheus as a ServiceWeaveworks
 
Scaling Prometheus on Kubernetes with Thanos
Scaling Prometheus on Kubernetes with ThanosScaling Prometheus on Kubernetes with Thanos
Scaling Prometheus on Kubernetes with ThanosThomas Riley
 
Surrogate dependencies (in node js) v1.0
Surrogate dependencies  (in node js)  v1.0Surrogate dependencies  (in node js)  v1.0
Surrogate dependencies (in node js) v1.0Dinis Cruz
 
Tech io spa_angularjs_20130814_v0.9.5
Tech io spa_angularjs_20130814_v0.9.5Tech io spa_angularjs_20130814_v0.9.5
Tech io spa_angularjs_20130814_v0.9.5Ganesh Kondal
 
Advanced web application architecture - Talk
Advanced web application architecture - TalkAdvanced web application architecture - Talk
Advanced web application architecture - TalkMatthias Noback
 
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceCOMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceAntonio García-Domínguez
 
Prometheus - basics
Prometheus - basicsPrometheus - basics
Prometheus - basicsJuraj Hantak
 
Cortex: Horizontally Scalable, Highly Available Prometheus
Cortex: Horizontally Scalable, Highly Available PrometheusCortex: Horizontally Scalable, Highly Available Prometheus
Cortex: Horizontally Scalable, Highly Available PrometheusGrafana Labs
 
Zenko @Cloud Native Foundation London Meetup March 6th 2018
Zenko @Cloud Native Foundation London Meetup March 6th 2018Zenko @Cloud Native Foundation London Meetup March 6th 2018
Zenko @Cloud Native Foundation London Meetup March 6th 2018Laure Vergeron
 
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Docker, Inc.
 
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling StoryPHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Storyvanphp
 
Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Derek Jacoby
 
Monitoring your Python with Prometheus (Python Ireland April 2015)
Monitoring your Python with Prometheus (Python Ireland April 2015)Monitoring your Python with Prometheus (Python Ireland April 2015)
Monitoring your Python with Prometheus (Python Ireland April 2015)Brian Brazil
 
Webinar: Enabling Microservices with Containers, Orchestration, and MongoDB
Webinar: Enabling Microservices with Containers, Orchestration, and MongoDBWebinar: Enabling Microservices with Containers, Orchestration, and MongoDB
Webinar: Enabling Microservices with Containers, Orchestration, and MongoDBMongoDB
 
Adopting AnswerModules ModuleSuite
Adopting AnswerModules ModuleSuiteAdopting AnswerModules ModuleSuite
Adopting AnswerModules ModuleSuiteAnswerModules
 
Mongo DB at Community Engine
Mongo DB at Community EngineMongo DB at Community Engine
Mongo DB at Community EngineCommunity Engine
 
MongoDB at community engine
MongoDB at community engineMongoDB at community engine
MongoDB at community enginemathraq
 
how to mesure web performance metrics
how to mesure web performance metricshow to mesure web performance metrics
how to mesure web performance metricsMarc Cortinas Val
 
Microservices and Serverless for Mega Startups - DevOps IL Meetup
Microservices and Serverless for Mega Startups - DevOps IL MeetupMicroservices and Serverless for Mega Startups - DevOps IL Meetup
Microservices and Serverless for Mega Startups - DevOps IL MeetupBoaz Ziniman
 

Similaire à Project Frankenstein: A multitenant, horizontally scalable Prometheus as a service (20)

Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...
 
Weave Cortex: Multi-tenant, horizontally scalable Prometheus as a Service
Weave Cortex: Multi-tenant, horizontally scalable Prometheus as a ServiceWeave Cortex: Multi-tenant, horizontally scalable Prometheus as a Service
Weave Cortex: Multi-tenant, horizontally scalable Prometheus as a Service
 
Scaling Prometheus on Kubernetes with Thanos
Scaling Prometheus on Kubernetes with ThanosScaling Prometheus on Kubernetes with Thanos
Scaling Prometheus on Kubernetes with Thanos
 
Surrogate dependencies (in node js) v1.0
Surrogate dependencies  (in node js)  v1.0Surrogate dependencies  (in node js)  v1.0
Surrogate dependencies (in node js) v1.0
 
Tech io spa_angularjs_20130814_v0.9.5
Tech io spa_angularjs_20130814_v0.9.5Tech io spa_angularjs_20130814_v0.9.5
Tech io spa_angularjs_20130814_v0.9.5
 
Advanced web application architecture - Talk
Advanced web application architecture - TalkAdvanced web application architecture - Talk
Advanced web application architecture - Talk
 
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceCOMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
 
Prometheus - basics
Prometheus - basicsPrometheus - basics
Prometheus - basics
 
Cortex: Horizontally Scalable, Highly Available Prometheus
Cortex: Horizontally Scalable, Highly Available PrometheusCortex: Horizontally Scalable, Highly Available Prometheus
Cortex: Horizontally Scalable, Highly Available Prometheus
 
Zenko @Cloud Native Foundation London Meetup March 6th 2018
Zenko @Cloud Native Foundation London Meetup March 6th 2018Zenko @Cloud Native Foundation London Meetup March 6th 2018
Zenko @Cloud Native Foundation London Meetup March 6th 2018
 
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
 
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling StoryPHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
 
Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Untangling - fall2017 - week 9
Untangling - fall2017 - week 9
 
Monitoring your Python with Prometheus (Python Ireland April 2015)
Monitoring your Python with Prometheus (Python Ireland April 2015)Monitoring your Python with Prometheus (Python Ireland April 2015)
Monitoring your Python with Prometheus (Python Ireland April 2015)
 
Webinar: Enabling Microservices with Containers, Orchestration, and MongoDB
Webinar: Enabling Microservices with Containers, Orchestration, and MongoDBWebinar: Enabling Microservices with Containers, Orchestration, and MongoDB
Webinar: Enabling Microservices with Containers, Orchestration, and MongoDB
 
Adopting AnswerModules ModuleSuite
Adopting AnswerModules ModuleSuiteAdopting AnswerModules ModuleSuite
Adopting AnswerModules ModuleSuite
 
Mongo DB at Community Engine
Mongo DB at Community EngineMongo DB at Community Engine
Mongo DB at Community Engine
 
MongoDB at community engine
MongoDB at community engineMongoDB at community engine
MongoDB at community engine
 
how to mesure web performance metrics
how to mesure web performance metricshow to mesure web performance metrics
how to mesure web performance metrics
 
Microservices and Serverless for Mega Startups - DevOps IL Meetup
Microservices and Serverless for Mega Startups - DevOps IL MeetupMicroservices and Serverless for Mega Startups - DevOps IL Meetup
Microservices and Serverless for Mega Startups - DevOps IL Meetup
 

Plus de Weaveworks

Weave AI Controllers (Weave GitOps Office Hours)
Weave AI Controllers (Weave GitOps Office Hours)Weave AI Controllers (Weave GitOps Office Hours)
Weave AI Controllers (Weave GitOps Office Hours)Weaveworks
 
Flamingo: Expand ArgoCD with Flux (Office Hours)
Flamingo: Expand ArgoCD with Flux (Office Hours)Flamingo: Expand ArgoCD with Flux (Office Hours)
Flamingo: Expand ArgoCD with Flux (Office Hours)Weaveworks
 
Webinar: Capabilities, Confidence and Community – What Flux GA Means for You
Webinar: Capabilities, Confidence and Community – What Flux GA Means for YouWebinar: Capabilities, Confidence and Community – What Flux GA Means for You
Webinar: Capabilities, Confidence and Community – What Flux GA Means for YouWeaveworks
 
Six Signs You Need Platform Engineering
Six Signs You Need Platform EngineeringSix Signs You Need Platform Engineering
Six Signs You Need Platform EngineeringWeaveworks
 
SRE and GitOps for Building Robust Kubernetes Platforms.pdf
SRE and GitOps for Building Robust Kubernetes Platforms.pdfSRE and GitOps for Building Robust Kubernetes Platforms.pdf
SRE and GitOps for Building Robust Kubernetes Platforms.pdfWeaveworks
 
Webinar: End to End Security & Operations with Chainguard and Weave GitOps
Webinar: End to End Security & Operations with Chainguard and Weave GitOpsWebinar: End to End Security & Operations with Chainguard and Weave GitOps
Webinar: End to End Security & Operations with Chainguard and Weave GitOpsWeaveworks
 
Flux Beyond Git Harnessing the Power of OCI
Flux Beyond Git Harnessing the Power of OCIFlux Beyond Git Harnessing the Power of OCI
Flux Beyond Git Harnessing the Power of OCIWeaveworks
 
Automated Provisioning, Management & Cost Control for Kubernetes Clusters
Automated Provisioning, Management & Cost Control for Kubernetes ClustersAutomated Provisioning, Management & Cost Control for Kubernetes Clusters
Automated Provisioning, Management & Cost Control for Kubernetes ClustersWeaveworks
 
How to Avoid Kubernetes Multi-tenancy Catastrophes
How to Avoid Kubernetes Multi-tenancy CatastrophesHow to Avoid Kubernetes Multi-tenancy Catastrophes
How to Avoid Kubernetes Multi-tenancy CatastrophesWeaveworks
 
Building internal developer platform with EKS and GitOps
Building internal developer platform with EKS and GitOpsBuilding internal developer platform with EKS and GitOps
Building internal developer platform with EKS and GitOpsWeaveworks
 
GitOps Testing in Kubernetes with Flux and Testkube.pdf
GitOps Testing in Kubernetes with Flux and Testkube.pdfGitOps Testing in Kubernetes with Flux and Testkube.pdf
GitOps Testing in Kubernetes with Flux and Testkube.pdfWeaveworks
 
Intro to GitOps with Weave GitOps, Flagger and Linkerd
Intro to GitOps with Weave GitOps, Flagger and LinkerdIntro to GitOps with Weave GitOps, Flagger and Linkerd
Intro to GitOps with Weave GitOps, Flagger and LinkerdWeaveworks
 
Implementing Flux for Scale with Soft Multi-tenancy
Implementing Flux for Scale with Soft Multi-tenancyImplementing Flux for Scale with Soft Multi-tenancy
Implementing Flux for Scale with Soft Multi-tenancyWeaveworks
 
Accelerating Hybrid Multistage Delivery with Weave GitOps on EKS
Accelerating Hybrid Multistage Delivery with Weave GitOps on EKSAccelerating Hybrid Multistage Delivery with Weave GitOps on EKS
Accelerating Hybrid Multistage Delivery with Weave GitOps on EKSWeaveworks
 
The Story of Flux Reaching Graduation in the CNCF
The Story of Flux Reaching Graduation in the CNCFThe Story of Flux Reaching Graduation in the CNCF
The Story of Flux Reaching Graduation in the CNCFWeaveworks
 
Shift Deployment Security Left with Weave GitOps & Upbound’s Universal Crossp...
Shift Deployment Security Left with Weave GitOps & Upbound’s Universal Crossp...Shift Deployment Security Left with Weave GitOps & Upbound’s Universal Crossp...
Shift Deployment Security Left with Weave GitOps & Upbound’s Universal Crossp...Weaveworks
 
Securing Your App Deployments with Tunnels, OIDC, RBAC, and Progressive Deliv...
Securing Your App Deployments with Tunnels, OIDC, RBAC, and Progressive Deliv...Securing Your App Deployments with Tunnels, OIDC, RBAC, and Progressive Deliv...
Securing Your App Deployments with Tunnels, OIDC, RBAC, and Progressive Deliv...Weaveworks
 
Flux’s Security & Scalability with OCI & Helm Slides.pdf
Flux’s Security & Scalability with OCI & Helm Slides.pdfFlux’s Security & Scalability with OCI & Helm Slides.pdf
Flux’s Security & Scalability with OCI & Helm Slides.pdfWeaveworks
 
Flux Security & Scalability using VS Code GitOps Extension
Flux Security & Scalability using VS Code GitOps Extension Flux Security & Scalability using VS Code GitOps Extension
Flux Security & Scalability using VS Code GitOps Extension Weaveworks
 
Deploying Stateful Applications Securely & Confidently with Ondat & Weave GitOps
Deploying Stateful Applications Securely & Confidently with Ondat & Weave GitOpsDeploying Stateful Applications Securely & Confidently with Ondat & Weave GitOps
Deploying Stateful Applications Securely & Confidently with Ondat & Weave GitOpsWeaveworks
 

Plus de Weaveworks (20)

Weave AI Controllers (Weave GitOps Office Hours)
Weave AI Controllers (Weave GitOps Office Hours)Weave AI Controllers (Weave GitOps Office Hours)
Weave AI Controllers (Weave GitOps Office Hours)
 
Flamingo: Expand ArgoCD with Flux (Office Hours)
Flamingo: Expand ArgoCD with Flux (Office Hours)Flamingo: Expand ArgoCD with Flux (Office Hours)
Flamingo: Expand ArgoCD with Flux (Office Hours)
 
Webinar: Capabilities, Confidence and Community – What Flux GA Means for You
Webinar: Capabilities, Confidence and Community – What Flux GA Means for YouWebinar: Capabilities, Confidence and Community – What Flux GA Means for You
Webinar: Capabilities, Confidence and Community – What Flux GA Means for You
 
Six Signs You Need Platform Engineering
Six Signs You Need Platform EngineeringSix Signs You Need Platform Engineering
Six Signs You Need Platform Engineering
 
SRE and GitOps for Building Robust Kubernetes Platforms.pdf
SRE and GitOps for Building Robust Kubernetes Platforms.pdfSRE and GitOps for Building Robust Kubernetes Platforms.pdf
SRE and GitOps for Building Robust Kubernetes Platforms.pdf
 
Webinar: End to End Security & Operations with Chainguard and Weave GitOps
Webinar: End to End Security & Operations with Chainguard and Weave GitOpsWebinar: End to End Security & Operations with Chainguard and Weave GitOps
Webinar: End to End Security & Operations with Chainguard and Weave GitOps
 
Flux Beyond Git Harnessing the Power of OCI
Flux Beyond Git Harnessing the Power of OCIFlux Beyond Git Harnessing the Power of OCI
Flux Beyond Git Harnessing the Power of OCI
 
Automated Provisioning, Management & Cost Control for Kubernetes Clusters
Automated Provisioning, Management & Cost Control for Kubernetes ClustersAutomated Provisioning, Management & Cost Control for Kubernetes Clusters
Automated Provisioning, Management & Cost Control for Kubernetes Clusters
 
How to Avoid Kubernetes Multi-tenancy Catastrophes
How to Avoid Kubernetes Multi-tenancy CatastrophesHow to Avoid Kubernetes Multi-tenancy Catastrophes
How to Avoid Kubernetes Multi-tenancy Catastrophes
 
Building internal developer platform with EKS and GitOps
Building internal developer platform with EKS and GitOpsBuilding internal developer platform with EKS and GitOps
Building internal developer platform with EKS and GitOps
 
GitOps Testing in Kubernetes with Flux and Testkube.pdf
GitOps Testing in Kubernetes with Flux and Testkube.pdfGitOps Testing in Kubernetes with Flux and Testkube.pdf
GitOps Testing in Kubernetes with Flux and Testkube.pdf
 
Intro to GitOps with Weave GitOps, Flagger and Linkerd
Intro to GitOps with Weave GitOps, Flagger and LinkerdIntro to GitOps with Weave GitOps, Flagger and Linkerd
Intro to GitOps with Weave GitOps, Flagger and Linkerd
 
Implementing Flux for Scale with Soft Multi-tenancy
Implementing Flux for Scale with Soft Multi-tenancyImplementing Flux for Scale with Soft Multi-tenancy
Implementing Flux for Scale with Soft Multi-tenancy
 
Accelerating Hybrid Multistage Delivery with Weave GitOps on EKS
Accelerating Hybrid Multistage Delivery with Weave GitOps on EKSAccelerating Hybrid Multistage Delivery with Weave GitOps on EKS
Accelerating Hybrid Multistage Delivery with Weave GitOps on EKS
 
The Story of Flux Reaching Graduation in the CNCF
The Story of Flux Reaching Graduation in the CNCFThe Story of Flux Reaching Graduation in the CNCF
The Story of Flux Reaching Graduation in the CNCF
 
Shift Deployment Security Left with Weave GitOps & Upbound’s Universal Crossp...
Shift Deployment Security Left with Weave GitOps & Upbound’s Universal Crossp...Shift Deployment Security Left with Weave GitOps & Upbound’s Universal Crossp...
Shift Deployment Security Left with Weave GitOps & Upbound’s Universal Crossp...
 
Securing Your App Deployments with Tunnels, OIDC, RBAC, and Progressive Deliv...
Securing Your App Deployments with Tunnels, OIDC, RBAC, and Progressive Deliv...Securing Your App Deployments with Tunnels, OIDC, RBAC, and Progressive Deliv...
Securing Your App Deployments with Tunnels, OIDC, RBAC, and Progressive Deliv...
 
Flux’s Security & Scalability with OCI & Helm Slides.pdf
Flux’s Security & Scalability with OCI & Helm Slides.pdfFlux’s Security & Scalability with OCI & Helm Slides.pdf
Flux’s Security & Scalability with OCI & Helm Slides.pdf
 
Flux Security & Scalability using VS Code GitOps Extension
Flux Security & Scalability using VS Code GitOps Extension Flux Security & Scalability using VS Code GitOps Extension
Flux Security & Scalability using VS Code GitOps Extension
 
Deploying Stateful Applications Securely & Confidently with Ondat & Weave GitOps
Deploying Stateful Applications Securely & Confidently with Ondat & Weave GitOpsDeploying Stateful Applications Securely & Confidently with Ondat & Weave GitOps
Deploying Stateful Applications Securely & Confidently with Ondat & Weave GitOps
 

Dernier

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 

Dernier (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 

Project Frankenstein: A multitenant, horizontally scalable Prometheus as a service

  • 1. Project Frankenstein A multi-tenant, horizontally scalable Prometheus as a Service Tom Wilkie (& Julius Volz) Weaveworks, August 2016
  • 2.
  • 3.
  • 4. “the best way to visualise, manage & monitor your cloud native application”
  • 5.
  • 6.
  • 8. why not just run my own Prometheus? • the as-a-service bit provides authentication and access control • virtually infinite retention; all the state is managed for you, by us • provide a different story around durability, HA and scalability • (eventually) better query performance, especially for long queries
  • 9. requirements: 1. API compatible with Prometheus 2. easy to operate and manage 3. tens of thousands of users, tens of millions samples/s 4. cost effective to run 5. reuse as much of Prometheus as possible … so we can sell it
  • 10. Aim: build proof of concept as quickly as possible 16/06 started design doc 22/06 circulated on list 22/06 initial commit 26/07 launch jobs 25/08 give talk! http://goo.gl/prdUYV
  • 11. Retriever scraping your jobs Your DC Weave Cloud Frontend, Authenticator Distributor Ingester Distributor… IngesterIngester DynamoDB S3
  • 12. Retriever /bin/prometheus -retrieval-only -storage.remote.generic-url=... Does scraping and relabelling. Is a vanilla Prometheus plus: • Brian Brazil’s generic write PR (#1487) • Some modification to prevent local storage + indexing
  • 13. • Uses consistent hashing to assign timeseries to Ingesters • Input to hash is (user ID, metric name) • Tokens stored in Consul • Also currently handles queries Distributor http://goo.gl/U9u1U2
  • 14. • Heavily modified MemorySeriesStorage • Use same chunk format as Prometheus • Keeps everything in memory (for up to an hour) • Also stores in memory inverted index for queries • Flushes chunks to S3 and indexes them in DynamoDB Ingester
  • 15. External inverted index maintained in DynamoDB, chunks stored in S3 Item in DynamoDB looks like: { hash key: “{user ID}:{metric name}:{hour}”, range key: “{label name}:{label value}:{chunk ID}”, metric: ..., from, through: ..., ID: ..., } DynamoDB S3
  • 17. The Good • It works! And in ~2 months. • Seems pretty scalable, handling two clusters right now • Query performance better than expected The Ugly: the code… The Bad • Hashing scheme means can’t do queries that don’t involve metric names. • Possible to hotspot an ingester
  • 18.
  • 19. Demo
  • 20. Lots left to do… Features: • Recording rules • Alerting & Alertmanager Reliability: • Replication between ingesters, commit log etc • Ingestor lifecycle • Separate query service? Performance: • Query parallelisation • Background chunk coalescing Code: • Code cleanup • Upstream appropriate changes
  • 21. Questions? Try it out! Email help@weave.works for instructions and to get on white list https://github.com/tomwilkie/prometheus