This document summarizes the author's experience optimizing Gnocchi, an open source time-series database, to store metrics for hundreds of thousands of resources over many months. The author describes improving performance by adding Ceph storage nodes, tuning Ceph configurations, minimizing I/O operations, and improving the storage format. Benchmark results show the new version achieves 50% higher write throughput, 40-60% faster computation times, 30-60% better overall performance, and 30-40% fewer operations. Usage hints are also provided to help optimize for different use cases.
3. built to address storage performance issues encountered in Ceilometer
4. designed to be used to store time series and their associated resource metadata
[Architecture diagram]
LoadBalancer in front of three API instances
Metric storage (Ceph): stores aggregated measurement data
Indexer (SQL): stores metadata
MetricD computation workers: background workers which aggregate data to minimise query computations
6. collect usage information for hundreds of thousands of metrics* over many months for use in capacity planning recommendations and scheduling
* data is received in batches every x minutes, not streamed
10. POST ~1000 generic resources with 20 metrics each (20K metrics)
60 measures per metric.
policy rolls up to minute, hour, and day.
8 different aggregations each*.
* min, max, sum, average, median, 95th percentile, count, stdev
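The size of this benchmark workload can be worked out from the figures on the slide. The sketch below is illustrative only: the constant and function names are hypothetical, the resource count is treated as exactly 1000 although the slide says "~1000", and the nearest-rank percentile and sample standard deviation are assumptions rather than Gnocchi's confirmed methods.

```python
from statistics import mean, median, stdev

# Figures taken from the slide; 1000 is an approximation of "~1000".
RESOURCES = 1000
METRICS_PER_RESOURCE = 20
MEASURES_PER_METRIC = 60
GRANULARITIES = ("minute", "hour", "day")


def percentile_95(values):
    """Nearest-rank 95th percentile (assumed method, integer arithmetic only)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


# The eight aggregations listed on the slide, as plain Python callables.
AGGREGATIONS = {
    "min": min,
    "max": max,
    "sum": sum,
    "mean": mean,
    "median": median,
    "95pct": percentile_95,
    "count": len,
    "std": stdev,  # sample standard deviation (assumption)
}


def workload_size():
    """Return the number of metrics, measures, and aggregated series per push."""
    metrics = RESOURCES * METRICS_PER_RESOURCE
    series_per_metric = len(GRANULARITIES) * len(AGGREGATIONS)
    return {
        "metrics": metrics,
        "measures": metrics * MEASURES_PER_METRIC,
        "series_per_metric": series_per_metric,
        "total_series": metrics * series_per_metric,
    }


def aggregate(values):
    """Apply all eight aggregations to one bucket of raw values."""
    return {name: func(values) for name, func in AGGREGATIONS.items()}
```

Under these assumptions, each push carries 1.2 million measures, and every metric fans out into 24 aggregated series (3 granularities × 8 aggregations), which is why background aggregation dominates the processing cost.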
11. METRIC PROCESSING RATE
• rate drops significantly after initial push
• high variance in processing rate
41. ADDITIONAL FUNCTIONALITY
▪ aggregate of aggregates
  ▪ get max of means, stdev of maxes, etc.
▪ dynamic resources
  ▪ create and modify resource definitions
▪ aggregate on demand
  ▪ avoid/minimise background aggregation tasks and defer until request
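An "aggregate of aggregates" such as the max of means can be sketched as a two-step computation: roll each metric up into per-bucket means, then take the maximum of those means across metrics. The sketch below is a minimal in-memory illustration computed at request time. The function names and the data layout (lists of `(timestamp_seconds, value)` pairs) are hypothetical and do not represent Gnocchi's API or storage format.

```python
from collections import defaultdict
from statistics import mean


def rollup_mean(measures, granularity):
    """Average (timestamp_seconds, value) pairs within fixed-width buckets."""
    buckets = defaultdict(list)
    for timestamp, value in measures:
        bucket_start = timestamp - timestamp % granularity
        buckets[bucket_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


def max_of_means(measures_by_metric, granularity):
    """For each bucket, take the max across metrics of the per-metric means."""
    per_bucket = defaultdict(list)
    for measures in measures_by_metric.values():
        for start, average in rollup_mean(measures, granularity).items():
            per_bucket[start].append(average)
    return {start: max(averages) for start, averages in sorted(per_bucket.items())}


# Usage: two hypothetical metrics rolled up to one-minute buckets.
example = {
    "cpu_a": [(0, 1.0), (30, 3.0), (60, 5.0)],
    "cpu_b": [(0, 4.0), (30, 4.0), (60, 1.0)],
}
print(max_of_means(example, granularity=60))
```

Swapping `mean` or `max` for another function gives the other combinations mentioned on the slide, such as the stdev of maxes.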