© 2022 VictoriaMetrics
VictoriaMetrics:
scaling to 100 million metrics per second
Aliaksandr Valialkin, CTO @ VictoriaMetrics
Let’s meet
● I’m Aliaksandr Valialkin - core developer @ VictoriaMetrics
● I like writing programs in Go
● I like simple and clear code, e.g. doSimpleThing1() followed by doSimpleThing2()
● I hate over-engineered code, useless abstractions and bloated dependencies, e.g. abstractSingletonFabricProducerVisitorOperatorPrototype
● I like performance optimizations (fasthttp, fastjson, quicktemplate, fastcache)
● https://github.com/valyala/
What is VictoriaMetrics?
● Open source monitoring solution and time series database
● Supports popular data ingestion protocols - Prometheus, InfluxDB, Graphite,
DataDog, OpenTSDB, CSV, JSON
● Can discover and scrape Prometheus targets (Kubernetes too)
● Easy to set up and operate
● Low resource usage
● High performance
VictoriaMetrics kinds
● Single-node - scales vertically
● Cluster - scales horizontally
● Single-node and cluster share the same core code
VictoriaMetrics single-node: scaling data ingestion
● Read incoming data in blocks
● Process blocks in parallel on multiple CPU cores
[Diagram: the client sends data to VictoriaMetrics, which reads it in blocks and processes the blocks on CPU_1 … CPU_N]
VictoriaMetrics single-node: scaling data ingestion
● Put the parsed data into independent buffers
● Periodically store buffers to disk as independent LSM parts
[Diagram: CPU_1 … CPU_N parse blocks and buffer the parsed data in in-memory buffers Buffer_1 … Buffer_M; the buffers are compressed and stored as LSM parts Part_1 … Part_P]
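The ingestion steps above (parse blocks in parallel, put rows into independent buffers, store buffers as independent parts) can be sketched in Go. This is a minimal illustration under stated assumptions, not the VictoriaMetrics implementation: the `sample` and `shard` types, the "name value" line format, and the `ingest`, `flush` and `countRows` helpers are all hypothetical, and flushing is done once by hand instead of periodically.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// sample is a hypothetical parsed data point.
type sample struct {
	name  string
	value float64
}

// shard is one independent in-memory buffer with its own lock.
type shard struct {
	mu   sync.Mutex
	rows []sample
}

// parseBlock parses lines in the assumed form "name value".
// Malformed lines are skipped.
func parseBlock(block string) []sample {
	var out []sample
	for _, line := range strings.Split(block, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			continue
		}
		out = append(out, sample{name: fields[0], value: v})
	}
	return out
}

// ingest parses blocks on the given number of goroutines.
// Worker i appends to shards[i%len(shards)], so workers rarely contend.
func ingest(blocks []string, workers int, shards []*shard) {
	ch := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			s := shards[id%len(shards)]
			for b := range ch {
				rows := parseBlock(b)
				s.mu.Lock()
				s.rows = append(s.rows, rows...)
				s.mu.Unlock()
			}
		}(i)
	}
	for _, b := range blocks {
		ch <- b
	}
	close(ch)
	wg.Wait()
}

// flush drains every buffer and returns one "part" per non-empty buffer.
// A real system would compress each part and write it to disk.
func flush(shards []*shard) [][]sample {
	var parts [][]sample
	for _, s := range shards {
		s.mu.Lock()
		if len(s.rows) > 0 {
			parts = append(parts, s.rows)
			s.rows = nil
		}
		s.mu.Unlock()
	}
	return parts
}

// countRows returns the total number of rows across all parts.
func countRows(parts [][]sample) int {
	n := 0
	for _, p := range parts {
		n += len(p)
	}
	return n
}

func main() {
	blocks := []string{"cpu 0.5\nmem 42", "disk 7\nbad line here", "net 1.5"}
	shards := []*shard{{}, {}}
	ingest(blocks, 4, shards)
	parts := flush(shards)
	fmt.Println("parts:", len(parts), "rows:", countRows(parts))
}
```

Keeping the buffers independent means a slow flush of one buffer does not block writers of the others.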
VictoriaMetrics single-node: scaling querying path
● VictoriaMetrics stores data in compressed blocks
● Selected blocks are unpacked in parallel on available CPUs
● Selected time series are processed in parallel on available CPUs
[Diagram: each of series_1 … series_M consists of compressed blocks block_1 … block_N; the blocks are unpacked on CPU_1 … CPU_P, then the series are processed on CPU_1 … CPU_P]
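Unpacking selected blocks in parallel can be sketched with a small worker pool. This is an illustration only: gzip from the Go standard library stands in for the real block encoding, and `pack`, `unpack` and `unpackAll` are hypothetical helper names.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"runtime"
	"sync"
)

// pack compresses data with gzip. It stands in for the real block encoding.
func pack(data []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write(data)
	zw.Close()
	return buf.Bytes()
}

// unpack decompresses a single block.
func unpack(block []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(block))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// unpackAll unpacks blocks on the given number of goroutines.
// Each goroutine writes only to its own index, so the output keeps input order.
func unpackAll(blocks [][]byte, workers int) ([][]byte, error) {
	out := make([][]byte, len(blocks))
	errs := make([]error, len(blocks))
	idx := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range idx {
				out[i], errs[i] = unpack(blocks[i])
			}
		}()
	}
	for i := range blocks {
		idx <- i
	}
	close(idx)
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return out, nil
}

func main() {
	blocks := [][]byte{
		pack([]byte("series_1 block_1")),
		pack([]byte("series_2 block_1")),
	}
	out, err := unpackAll(blocks, runtime.NumCPU())
	if err != nil {
		panic(err)
	}
	for _, b := range out {
		fmt.Println(string(b))
	}
}
```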
VictoriaMetrics single-node: scalability limits
● The performance is limited by a single host (CPU, RAM, disk)
● Benchmark numbers:
○ Data ingestion: 300k samples/sec per CPU
○ Active time series: 1 million per GB of RAM
○ Query path: 50 million samples/sec per CPU
● Production numbers:
○ Data ingestion: 2 million samples/sec
○ Active time series: 100 million
○ Query path: 1 billion samples/sec
○ Total samples: 15 trillion
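As a rough worked example, the benchmark numbers above can be turned into a back-of-the-envelope sizing helper. The `estimate` function is hypothetical, and it assumes the per-CPU and per-GB figures scale linearly and ignores query load, so treat the result as a lower bound rather than a sizing guide.

```go
package main

import (
	"fmt"
	"math"
)

const (
	ingestPerCPU = 300_000.0   // samples/sec per CPU (benchmark number from the slide)
	seriesPerGB  = 1_000_000.0 // active time series per GB of RAM (benchmark number from the slide)
)

// estimate returns a rough lower bound for CPU cores and RAM (in GB)
// needed for the given ingestion rate and number of active series.
func estimate(samplesPerSec, activeSeries float64) (cpus, ramGB int) {
	cpus = int(math.Ceil(samplesPerSec / ingestPerCPU))
	ramGB = int(math.Ceil(activeSeries / seriesPerGB))
	return cpus, ramGB
}

func main() {
	// The production numbers from the slide: 2M samples/sec, 100M active series.
	cpus, ramGB := estimate(2_000_000, 100_000_000)
	fmt.Printf("at least %d CPU cores and %d GB of RAM\n", cpus, ramGB)
}
```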
Scaling VictoriaMetrics cluster
● VictoriaMetrics cluster consists of three components:
○ vminsert - accepts incoming data
○ vmselect - processes incoming queries
○ vmstorage - stores the data
● Each component can run on the most suitable hardware
● Each component can scale independently to any number of instances
[Diagram: incoming data -> HTTP load balancer -> vminsert_1 … vminsert_M -> vmstorage_1 … vmstorage_N; incoming queries -> HTTP load balancer -> vmselect_1 … vmselect_P -> vmstorage_1 … vmstorage_N]
VictoriaMetrics cluster: scaling data ingestion
● An HTTP load balancer spreads incoming data among vminsert nodes
● Data ingestion performance scales with the number of vminsert nodes
[Diagram: incoming data -> HTTP load balancer -> vminsert_1 … vminsert_N]
VictoriaMetrics cluster: scaling data ingestion
● vminsert automatically shards incoming data among available vmstorage nodes
via consistent hashing
● Each vmstorage node has its own subset of time series (ideally)
● Data ingestion performance scales with the number of vmstorage nodes
[Diagram: vminsert shards data among vmstorage_1 … vmstorage_M]
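One way to shard series among storage nodes so that each series consistently lands on the same node is rendezvous (highest-random-weight) hashing. The sketch below uses it for brevity; it is not claimed to be the algorithm vminsert uses, and `score` and `pickNode` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// score hashes the (node, series) pair into a 64-bit weight.
func score(node, series string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(node))
	h.Write([]byte{0}) // separator, so "ab"+"c" differs from "a"+"bc"
	h.Write([]byte(series))
	return h.Sum64()
}

// pickNode returns the node with the highest score for the given series.
// It returns "" when nodes is empty. Node names are assumed to be non-empty.
func pickNode(nodes []string, series string) string {
	best, bestScore := "", uint64(0)
	for _, n := range nodes {
		if s := score(n, series); best == "" || s > bestScore {
			best, bestScore = n, s
		}
	}
	return best
}

func main() {
	nodes := []string{"vmstorage_1", "vmstorage_2", "vmstorage_3"}
	for _, series := range []string{`cpu{host="a"}`, `cpu{host="b"}`, `mem{host="a"}`} {
		fmt.Println(series, "->", pickNode(nodes, series))
	}
}
```

With this scheme, removing a node moves only the series that node owned; every other series keeps its owner.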
VictoriaMetrics cluster: scaling querying path
● An HTTP load balancer spreads incoming queries among vmselect nodes
● QPS scales with the number of vmselect nodes
[Diagram: incoming queries -> HTTP load balancer -> vmselect_1 … vmselect_P]
VictoriaMetrics cluster: scaling querying path
● vmselect fetches the needed data from every vmstorage node in parallel
● Querying performance scales with the number of vmstorage nodes
● vmselect unpacks the fetched data in parallel on available CPUs
● Querying performance scales with the number of vCPUs at a single vmselect node
[Diagram: vmselect fetches compressed data from vmstorage_1 … vmstorage_N]
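The fan-out described above (query every storage node in parallel, then combine the results) can be sketched as follows. The `storageNode` type is a hypothetical in-memory stand-in for a vmstorage node, and the real network protocol and decompression step are left out.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// storageNode is a hypothetical stand-in for a storage node:
// it maps a series name to its samples.
type storageNode map[string][]float64

// fetch returns the samples of every series accepted by match.
func (n storageNode) fetch(match func(string) bool) map[string][]float64 {
	out := make(map[string][]float64)
	for name, samples := range n {
		if match(name) {
			out[name] = samples
		}
	}
	return out
}

// queryAll queries all nodes concurrently and merges the per-node results.
// Results are merged in node order, so the output is deterministic.
func queryAll(nodes []storageNode, match func(string) bool) map[string][]float64 {
	results := make([]map[string][]float64, len(nodes))
	var wg sync.WaitGroup
	for i, n := range nodes {
		wg.Add(1)
		go func(i int, n storageNode) {
			defer wg.Done()
			results[i] = n.fetch(match)
		}(i, n)
	}
	wg.Wait()
	merged := make(map[string][]float64)
	for _, r := range results {
		for name, samples := range r {
			merged[name] = append(merged[name], samples...)
		}
	}
	return merged
}

func main() {
	nodes := []storageNode{
		{"cpu": {1, 2}},
		{"mem": {3}},
		{"cpu": {4}},
	}
	res := queryAll(nodes, func(name string) bool { return name == "cpu" })
	names := make([]string, 0, len(res))
	for name := range res {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Println(name, res[name])
	}
}
```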
VictoriaMetrics cluster: scalability limits
● CPU? No - data ingestion and querying performance scales with CPUs
● RAM? No - cluster capacity scales with RAM
● Disk? No - cluster capacity scales with disk space and IO
● Network? Yes!
100M benchmark
● Can a VictoriaMetrics cluster accept 100 million samples per second in production?
● Can a VictoriaMetrics cluster handle a billion active time series?
● How many resources does it need?
Benchmarketing?
● Artificial data?
● Limited amounts of data?
● Limited benchmark duration?
● Special configs?
● Optimized hardware?
No!
Prometheus-benchmark
● Helm chart for testing Prometheus-like systems
● Uses production-like workload for data ingestion and querying
● Pushes the real node-exporter metrics to the tested systems
● Allows using the real alerting rules for node-exporter metrics
● https://github.com/VictoriaMetrics/prometheus-benchmark
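[Diagram: in the load generator, vmagent scrapes node_exporter and sends the data via remote_write to the Prometheus-like system; vmalert evaluates alerting rules by sending read queries to the Prometheus-like system]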
100M benchmark: requirements
● Stable ingestion rate: 100,000,000 samples/sec
● Active time series: 1,000,000,000 (1 billion)
● Duration: 24 hours
● Total samples: 100M samples/sec * 86,400 sec (24 hours) = 8,640,000,000,000 (8.64 trillion)
100M benchmark: prometheus-benchmark configs
● 16 load generator pods (8 vCPU, 25GB RAM each)
● Scrape targets (node_exporter v1.4.0): 16*51,250=820,000
● Each scrape target exposes around 1220 metrics
● Total number of metrics (aka active series): 820K*1220≈1 billion
● Scrape interval: 10 seconds
● Scrape rate: 1 billion / 10 seconds = 100M samples/sec
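The arithmetic behind these settings can be checked with a few lines of Go. The helper names are hypothetical, and the exact results come out slightly above the rounded figures on the slide because 820,000 * 1220 is a little more than one billion.

```go
package main

import "fmt"

// activeSeries returns the number of series exposed by all scrape targets.
func activeSeries(pods, targetsPerPod, metricsPerTarget int64) int64 {
	return pods * targetsPerPod * metricsPerTarget
}

// ingestionRate returns samples/sec when every series is scraped once per interval.
func ingestionRate(series, scrapeIntervalSec int64) int64 {
	return series / scrapeIntervalSec
}

// totalSamples returns the number of samples ingested over the given duration.
func totalSamples(ratePerSec, durationSec int64) int64 {
	return ratePerSec * durationSec
}

func main() {
	series := activeSeries(16, 51_250, 1220)
	rate := ingestionRate(series, 10)
	total := totalSamples(rate, 24*3600)
	fmt.Println("active series:", series)
	fmt.Println("samples/sec:  ", rate)
	fmt.Println("samples/24h:  ", total)
}
```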
100M benchmark: VictoriaMetrics cluster configs
● Runs in Google Kubernetes Engine via the official VictoriaMetrics Helm charts
● vmstorage: 46 x (16 vCPU, 55GB RAM, 2200GB HDD-based disk)
● vminsert: 18 x (16 vCPU, 55GB RAM)
● vmselect: none (wait for the next talk)
100M benchmark: allocated resources
● Prometheus-benchmark resources:
○ vCPU cores: 16*8=128
○ RAM: 16*25GB=400GB
● VictoriaMetrics cluster resources:
○ vCPU cores: (46 vmstorage + 18 vminsert) * 16 = 1024
○ RAM: (46 vmstorage + 18 vminsert) * 55GB = 3520GB
○ Disk: 46 x 2200GB = 101.2 TB
● Kubernetes cluster:
○ 36x e2-standard-32 nodes (32 vCPU, 128GB RAM each)
○ Total: 1152 vCPU, 4608GB RAM
100M benchmark: used resources
● vminsert: 206vCPU, 26GB RAM
● vmstorage: 510vCPU, 600GB RAM, 7.5TB disk (out of 101.2TB allocated)
● Total: 716vCPU (70%), 626GB RAM (18%), 7.5TB disk (7.5%)
● Network: 140Gbit/s (can be reduced to 20Gbit/s at the cost of 10% CPU)
● Disk IO: 3GB/s write, 450MB/s read
100M benchmark: results
● Stable data ingestion at 100M samples/sec for 24 hours
● Active time series: 1 billion
● Total samples ingested: 8.77 trillion
● Average sample size: 0.85 bytes
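As a quick cross-check, the total number of ingested samples multiplied by the average sample size should land close to the disk usage reported for the benchmark. The `diskBytes` helper below is hypothetical and uses decimal terabytes (1 TB = 10^12 bytes), which is an assumption about the units on the slides.

```go
package main

import "fmt"

// diskBytes estimates the on-disk size for the given number of samples.
func diskBytes(samples, bytesPerSample float64) float64 {
	return samples * bytesPerSample
}

func main() {
	const (
		samples        = 8.77e12 // total samples ingested (from the slide)
		bytesPerSample = 0.85    // average sample size in bytes (from the slide)
	)
	tb := diskBytes(samples, bytesPerSample) / 1e12
	fmt.Printf("estimated disk usage: %.2f TB\n", tb)
}
```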
100M benchmark: key takeaways
● VictoriaMetrics cluster performance and capacity scale linearly to 100 nodes and more
● A single VictoriaMetrics cluster can collect metrics from a million hosts
[Diagram: vmagent scrapes host_1 … host_1,000,000 with scrape_interval=10s and sends the data via remote_write to a VictoriaMetrics cluster]
● Cluster stability improves with the number of nodes
● HDD-based disks are enough - there is no need for SSD-based disks (HDD: $40/TB/month vs SSD: $170/TB/month)
● VictoriaMetrics handles large workloads with default configs
Reproduce the 100M benchmark yourself!
● https://github.com/VictoriaMetrics/prometheus-benchmark/tree/bm-100
● Benchmark configs
● VictoriaMetrics cluster configs
What’s next?
● Benchmark querying performance (50M samples/sec per vCPU processing
speed)?
● A billion samples/sec benchmark?
● 10 billion active time series?
● Kubernetes-like time series churn rate?
● A month-long benchmark (needs $$$)?
● Share your results!
Questions?

Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Pharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodologyPharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodology
 
ManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide Deck
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 

OSMC 2022 | VictoriaMetrics: scaling to 100 million metrics per second by Aliaksandr Valialkin

  • 1. © 2022 VictoriaMetrics VictoriaMetrics: scaling to 100 million metrics per second Aliaksandr Valialkin, CTO @ VictoriaMetrics
  • 2-7. Let’s meet
    ● I’m Aliaksandr Valialkin - core developer @ VictoriaMetrics
    ● I like writing programs in Go
    ● I like simple and clear code, e.g. doSimpleThing1(), doSimpleThing2()
    ● I hate over-engineered code, useless abstractions and bloated dependencies, e.g. abstractSingletonFabricProducerVisitorOperatorPrototype
    ● I like performance optimizations (fasthttp, fastjson, quicktemplate, fastcache)
    ● https://github.com/valyala/
  • 8-13. What is VictoriaMetrics?
    ● Open source monitoring solution and time series database
    ● Supports popular data ingestion protocols - Prometheus, InfluxDB, Graphite, DataDog, OpenTSDB, CSV, JSON
    ● Can discover and scrape Prometheus targets (Kubernetes too)
    ● Easy to set up and operate
    ● Low resource usage
    ● High performance
  • 15-16. VictoriaMetrics kinds
    ● Single-node - scales vertically
    ● Cluster - scales horizontally
    ● Single-node and cluster share the same core code
  • 17-18. VictoriaMetrics single-node: scaling data ingestion
    ● Read incoming data in blocks
    ● Process blocks in parallel on multiple CPU cores
    [Diagram: client data -> read data blocks -> blocks processed on CPU_1, CPU_2, … CPU_N inside VictoriaMetrics]
  • 19-20. VictoriaMetrics single-node: scaling data ingestion (tech details)
    ● Put the parsed data into independent buffers
    ● Periodically store buffers to disk as independent LSM parts
    [Diagram: CPU_1 … CPU_N parse blocks -> parsed data is buffered in in-memory Buffer_1 … Buffer_M -> data is compressed and stored as LSM parts Part_1 … Part_P]
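The ingestion steps on these slides can be sketched roughly as follows. This is a minimal Python illustration of the structure only, not VictoriaMetrics code (which is written in Go): the text line format, the function names `parse_block`, `ingest` and `flush`, and the buffer count are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def parse_block(block):
    """Parse one block of 'name value timestamp' lines (hypothetical format)."""
    samples = []
    for line in block.splitlines():
        name, value, ts = line.split()
        samples.append((name, float(value), int(ts)))
    return samples

def ingest(blocks, num_buffers=4, workers=4):
    """Parse blocks in parallel, then spread the samples over independent buffers."""
    buffers = [[] for _ in range(num_buffers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves block order, so block i lands in buffer i % num_buffers.
        for i, samples in enumerate(pool.map(parse_block, blocks)):
            buffers[i % num_buffers].extend(samples)
    return buffers

def flush(buffers):
    """Turn every non-empty buffer into an independent sorted 'part'."""
    return [sorted(buf) for buf in buffers if buf]

blocks = ["cpu 0.5 1\nmem 10 1", "cpu 0.6 2\nmem 11 2", "cpu 0.7 3"]
parts = flush(ingest(blocks, num_buffers=2))
```

In CPython the global interpreter lock prevents pure-Python parsing from using several cores, so the thread pool here shows the shape of the pipeline rather than a real speedup.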
  • 21-23. VictoriaMetrics single-node: scaling querying path
    ● VictoriaMetrics stores data in compressed blocks
    ● Selected blocks are unpacked in parallel on available CPUs
    ● Selected time series are processed in parallel on available CPUs
    [Diagram: series_1 … series_M, each stored as a sequence of blocks (block_1 … block_N); blocks are unpacked on CPU_1 … CPU_P, then series are processed on CPU_1 … CPU_P]
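A two-stage sketch of this query path, assuming zlib as a stand-in for the real block codec and a per-series sum as the stand-in "processing" step. The names `pack`, `unpack` and `query_sum` are hypothetical and do not come from VictoriaMetrics.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def pack(values):
    """Compress a block of integer samples (zlib is an assumption, not the real codec)."""
    return zlib.compress(",".join(map(str, values)).encode())

def unpack(block):
    """Decompress one block back into a list of integers."""
    return [int(v) for v in zlib.decompress(block).decode().split(",")]

def query_sum(series_blocks, workers=4):
    """Unpack all selected blocks in parallel, then process all series in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Stage 1: unpack every selected block, regardless of which series owns it.
        flat = [(name, b) for name, blocks in series_blocks.items() for b in blocks]
        unpacked = pool.map(unpack, [b for _, b in flat])
        series = {}
        for (name, _), values in zip(flat, unpacked):
            series.setdefault(name, []).extend(values)
        # Stage 2: process every series (here: a plain sum) in parallel.
        names = list(series)
        totals = pool.map(sum, [series[n] for n in names])
        return dict(zip(names, totals))

data = {"series_1": [pack([1, 2]), pack([3])], "series_2": [pack([10, 20, 30])]}
result = query_sum(data)  # {'series_1': 6, 'series_2': 60}
```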
  • 24-26. VictoriaMetrics single-node: scalability limits
    ● The performance is limited by a single host (CPU, RAM, disk)
    ● Benchmark numbers:
      ○ Data ingestion: 300k samples/sec per CPU
      ○ Active time series: 1 million per GB of RAM
      ○ Query path: 50 million samples/sec per CPU
    ● Production numbers:
      ○ Data ingestion: 2 million samples/sec
      ○ Active time series: 100 million
      ○ Query path: 1 billion samples/sec
      ○ Total samples: 15 trillion
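As a rough worked example, the per-CPU and per-GB benchmark numbers can be turned into a back-of-the-envelope sizing for the production numbers listed above. The linear extrapolation and the `estimate` helper are assumptions for illustration; real sizing depends on the workload.

```python
INGEST_PER_CPU = 300_000      # samples/sec per CPU (benchmark number from the slide)
SERIES_PER_GB = 1_000_000     # active time series per GB of RAM
QUERY_PER_CPU = 50_000_000    # samples/sec read per CPU on the query path

def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def estimate(ingest_rate, active_series, query_rate):
    """Linear extrapolation of the benchmark numbers (an assumption)."""
    return {
        "ingest_cpus": ceil_div(ingest_rate, INGEST_PER_CPU),
        "ram_gb": ceil_div(active_series, SERIES_PER_GB),
        "query_cpus": ceil_div(query_rate, QUERY_PER_CPU),
    }

# Production numbers from the slide: 2M samples/sec, 100M series, 1B samples/sec queried.
needs = estimate(2_000_000, 100_000_000, 1_000_000_000)
# {'ingest_cpus': 7, 'ram_gb': 100, 'query_cpus': 20}
```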
  • 27-31. Scaling VictoriaMetrics cluster
    ● VictoriaMetrics cluster consists of three components:
      ○ vminsert - accepts incoming data
      ○ vmselect - processes incoming queries
      ○ vmstorage - stores the data
    ● Each component can run on the most suitable hardware
    ● Each component can scale independently to any number of instances
    [Diagram: incoming data -> HTTP load balancer -> vminsert_1 … vminsert_M -> vmstorage_1 … vmstorage_N; incoming queries -> HTTP load balancer -> vmselect_1 … vmselect_P -> vmstorage_1 … vmstorage_N]
  • 32. VictoriaMetrics cluster: scaling data ingestion
    ● An HTTP load balancer spreads incoming data among vminsert nodes
    ● Data ingestion performance scales with the number of vminsert nodes
    [Diagram: incoming data -> HTTP load balancer -> vminsert_1, vminsert_2, … vminsert_N]
  • 33. VictoriaMetrics cluster: scaling data ingestion
    ● vminsert automatically shards incoming data among available vmstorage nodes via consistent hashing
    ● Each vmstorage node has its own subset of time series (ideally)
    ● Data ingestion performance scales with the number of vmstorage nodes
    [Diagram: vminsert -> sharding -> vmstorage_1, vmstorage_2, … vmstorage_M]
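The slides do not show how vminsert implements consistent hashing, so the following is only a generic hash-ring sketch of the idea: each series name maps to one storage node, and removing a node moves only the series that lived on it. `HashRing`, the virtual-node count and the SHA-256 based hash are assumptions for the example.

```python
import bisect
import hashlib

def _hash(key):
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, series):
        """Return the node owning the first ring point after the series hash."""
        idx = bisect.bisect_right(self._points, _hash(series)) % len(self._ring)
        return self._ring[idx][1]

nodes = ["vmstorage_1", "vmstorage_2", "vmstorage_3"]
ring = HashRing(nodes)
series = [f'node_cpu_seconds_total{{instance="host_{i}"}}' for i in range(1000)]
placement = {s: ring.node_for(s) for s in series}
```

Building a second ring without `vmstorage_3` and comparing placements shows the key property: series that were on the two remaining nodes stay where they were.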
  • 34. VictoriaMetrics cluster: scaling querying path
    ● An HTTP load balancer spreads incoming queries among vmselect nodes
    ● QPS scales with the number of vmselect nodes
    [Diagram: incoming queries -> HTTP load balancer -> vmselect_1, vmselect_2, … vmselect_P]
  • 35-36. VictoriaMetrics cluster: scaling querying path
    ● vmselect fetches the needed data from every vmstorage node in parallel
    ● Querying performance scales with the number of vmstorage nodes
    ● vmselect unpacks the fetched data in parallel on available CPUs
    ● Querying performance scales with the number of vCPUs at a single vmselect node
    [Diagram: vmstorage_1, vmstorage_2, … vmstorage_N -> compressed data -> vmselect]
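The fan-out described on these slides is a scatter-gather pattern. A minimal sketch, assuming each storage node is modelled as an in-memory dict and `fetch` stands in for the network call (both are hypothetical, not the vmselect API):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(node, metric):
    """Stand-in for a network call: return this node's (timestamp, value) samples."""
    return node.get(metric, [])

def select(nodes, metric):
    """Query all storage nodes in parallel and merge the samples by timestamp."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        parts = list(pool.map(lambda node: fetch(node, metric), nodes))
    return sorted(sample for part in parts for sample in part)

nodes = [
    {"cpu": [(1, 0.5), (3, 0.7)]},
    {"cpu": [(2, 0.6)], "mem": [(1, 10.0)]},
    {},  # a node that holds no data for the queried metric
]
merged = select(nodes, "cpu")  # [(1, 0.5), (2, 0.6), (3, 0.7)]
```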
  • 38-44. VictoriaMetrics cluster: scalability limits
    ● CPU? No - data ingestion and querying performance scales with CPUs
    ● RAM? No - cluster capacity scales with RAM
    ● Disk? No - cluster capacity scales with disk space and IO
    ● Network? Yes!
  • 45. 100M benchmark
    ● Can a VictoriaMetrics cluster accept 100 million samples per second in production?
    ● Can a VictoriaMetrics cluster handle a billion active time series?
    ● How many resources does it need?
  • 47-50. Benchmarketing?
    ● Artificial data?
    ● Limited amounts of data?
    ● Limited benchmark duration?
    ● Special configs?
    ● Optimized hardware?
  • 51. No!
  • 52-56. Prometheus-benchmark
    ● Helm chart for testing Prometheus-like systems
    ● Uses production-like workload for data ingestion and querying
    ● Pushes the real node-exporter metrics to the tested systems
    ● Allows using the real alerting rules for node-exporter metrics
    ● https://github.com/VictoriaMetrics/prometheus-benchmark
    [Diagram: load generator: vmagent scrapes node_exporter and sends the data via remote_write to the Prometheus-like system; vmalert evaluates alerting rules by sending read queries to the Prometheus-like system]
  • 57-60. 100M benchmark: requirements
    ● Stable ingestion rate: 100,000,000 samples/sec
    ● Active time series: 1,000,000,000 (1 billion)
    ● Duration: 24 hours
    ● Total samples: 100M samples/sec * 3,600 sec/hour * 24 hours = 8,640,000,000,000 (8.64 trillion)
  • 61-66. 100M benchmark: prometheus-benchmark configs
    ● 16 load generator pods (8 vCPU, 25GB RAM each)
    ● Scrape targets (node_exporter v1.4.0): 16 * 51,250 = 820,000
    ● Each scrape target exposes around 1,220 metrics
    ● Total number of metrics (aka active series): 820K * 1,220 ≈ 1 billion
    ● Scrape interval: 10 seconds
    ● Scrape rate: 1 billion / 10 seconds = 100M samples/sec
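The workload arithmetic on these slides can be checked in a few lines. The inputs are the slide numbers; since each target exposes "around" 1,220 metrics, the results are approximate, and the variable names are chosen for the example.

```python
pods = 16
targets_per_pod = 51_250
series_per_target = 1_220      # approximate, per the slide
scrape_interval_s = 10

targets = pods * targets_per_pod                       # 820,000 scrape targets
active_series = targets * series_per_target            # 1,000,400,000 active series
samples_per_sec = active_series // scrape_interval_s   # 100,040,000 samples/sec
samples_per_day = samples_per_sec * 24 * 3600          # 8,643,456,000,000 samples in 24 hours
```

The 24-hour total lands slightly above the 8.64 trillion requirement because the series count is a little over one billion.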
  • 67-70. 100M benchmark: VictoriaMetrics cluster configs
    ● Runs in Google Kubernetes Engine via the official VictoriaMetrics helm charts
    ● vmstorage: 46 x (16 vCPU, 55GB RAM, 2200GB HDD-based disk)
    ● vminsert: 18 x (16 vCPU, 55GB RAM)
    ● vmselect: none (wait for the next talk)
  • 71-73. 100M benchmark: allocated resources
    ● Prometheus-benchmark resources:
      ○ vCPU cores: 16 * 8 = 128
      ○ RAM: 16 * 25GB = 400GB
    ● VictoriaMetrics cluster resources:
      ○ vCPU cores: (46 vmstorage + 18 vminsert) * 16 = 1024
      ○ RAM: (46 vmstorage + 18 vminsert) * 55GB = 3520GB
      ○ Disk: 46 x 2200GB = 101.2TB
    ● Kubernetes cluster:
      ○ 36 x e2-standard-32 nodes (32 vCPU, 128GB RAM each)
      ○ Total: 1152 vCPU, 4608GB RAM
  • 74-78. 100M benchmark: used resources
    ● vminsert: 206 vCPU, 26GB RAM
    ● vmstorage: 510 vCPU, 600GB RAM, 7.5TB of the 101.2TB disk
    ● Total: 716 vCPU (70%), 626GB RAM (18%), 7.5TB disk (7.5%)
    ● Network: 140Gbit/s (can be reduced to 20Gbit/s at the cost of 10% CPU)
    ● Disk IO: 3GB/s write, 450MB/s read
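The utilization percentages follow from the allocated and used totals. A short check, assuming decimal units (1TB = 1000GB) and using only numbers from the slides; the helper `pct` is a name chosen for the example.

```python
vmstorage_pods, vminsert_pods = 46, 18

# Allocated to the VictoriaMetrics cluster.
alloc_vcpu = (vmstorage_pods + vminsert_pods) * 16    # 1024 vCPU
alloc_ram_gb = (vmstorage_pods + vminsert_pods) * 55  # 3520 GB
alloc_disk_tb = vmstorage_pods * 2200 / 1000          # 101.2 TB

# Used during the benchmark (vminsert + vmstorage).
used_vcpu = 206 + 510     # 716 vCPU
used_ram_gb = 26 + 600    # 626 GB
used_disk_tb = 7.5

def pct(used, total):
    """Utilization in percent, rounded to one decimal place."""
    return round(100 * used / total, 1)

utilization = {
    "vcpu": pct(used_vcpu, alloc_vcpu),        # 69.9
    "ram": pct(used_ram_gb, alloc_ram_gb),     # 17.8
    "disk": pct(used_disk_tb, alloc_disk_tb),  # 7.4
}
```

These match the rounded figures on the slide (70%, 18% and roughly 7.5%).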
  • 79-83. 100M benchmark: results
    ● Stable data ingestion at 100M samples/sec for 24 hours
    ● Active time series: 1 billion
    ● Total samples ingested: 8.77 trillion
    ● Average sample size: 0.85 bytes
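The average sample size ties the result numbers back to the disk usage reported earlier. A quick consistency check, assuming 1TB = 10^12 bytes:

```python
total_samples = 8.77e12   # samples ingested over 24 hours
bytes_per_sample = 0.85   # average on-disk size per sample

disk_bytes = total_samples * bytes_per_sample
disk_tb = disk_bytes / 1e12   # about 7.45 TB
```

About 7.45TB agrees with the 7.5TB of disk listed under used resources.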
  • 85-89. 100M benchmark: key takeaways
    ● VictoriaMetrics cluster performance and capacity scale linearly to 100 nodes and more
    ● A single VictoriaMetrics cluster can collect metrics from a million hosts
    ● Cluster stability improves with the number of nodes
    ● HDD-based disks are enough - there is no need for SSD-based disks (HDD $40/TB/month vs SSD $170/TB/month)
    ● VictoriaMetrics handles large workloads with default configs
    [Diagram: vmagent scrapes host_1, host_2, … host_1,000,000 with scrape_interval=10s and sends the data via remote_write to the VictoriaMetrics cluster]
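To put the HDD vs SSD takeaway in numbers, the per-TB prices from the slide can be applied to the 101.2TB of disk allocated in this benchmark. Applying the prices to the allocated (rather than used) capacity is an assumption of this sketch.

```python
ALLOCATED_DISK_TB = 101.2
HDD_PRICE = 40    # $/TB/month, from the slide
SSD_PRICE = 170   # $/TB/month, from the slide

hdd_cost = ALLOCATED_DISK_TB * HDD_PRICE   # about $4,048 per month
ssd_cost = ALLOCATED_DISK_TB * SSD_PRICE   # about $17,204 per month
savings = ssd_cost - hdd_cost              # about $13,156 per month
ratio = SSD_PRICE / HDD_PRICE              # SSD is 4.25x the HDD price
```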
  • 90-92. Reproduce the 100M benchmark yourself!
    ● https://github.com/VictoriaMetrics/prometheus-benchmark/tree/bm-100
    ● Benchmark configs
    ● VictoriaMetrics cluster configs
  • 93-98. What’s next?
    ● Benchmark querying performance (50M samples/sec per vCPU processing speed)?
    ● A billion samples/sec benchmark?
    ● 10 billion active time series?
    ● Kubernetes-like time series churn rate?
    ● A month-long benchmark (needs $$$)?
    ● Share your results!