Thanos: Global, durable Prometheus monitoring

Prometheus’s simple and reliable operational model is one of its major selling points. Beyond a certain scale, however, we have identified a few shortcomings it imposes. We are proud to present Thanos, an open source project by Improbable that bundles a set of components that seamlessly transform existing Prometheus deployments into a unified, global-scale monitoring system.

Authors: Fabian Reinartz, Bartlomiej Plotka

Slides from the January 2018 London Prometheus Meetup.
Thanos: https://github.com/improbable-eng/thanos

Published in: Technology


  1. Bartek Plotka (bwplotka, Bplotka), Fabian Reinartz (fabxc): Global, durable Prometheus monitoring
  2. Prometheus 2.0 ● Reliable operational model ● Powerful query language ● Scraping capabilities beyond the casual usage ● Local metric storage
  3. Prometheus at Scale: A dream
  4. Improbable case [diagram: clusters 1, 2, ..., n, n+1] ● Multiple isolated Kubernetes clusters
  5. Improbable case [diagram: clusters 1, 2, ..., n, n+1; Prometheus] ● Multiple isolated Kubernetes clusters ● Single Prometheus server per cluster
  6. Improbable case [diagram: clusters 1, 2, ..., n, n+1; Prometheus; Grafana; Alertmanager] ● Multiple isolated Kubernetes clusters ● Single Prometheus server per cluster ● Dashboards & Alertmanager in separate cluster
  7. Improbable case [diagram: clusters 1, 2, ..., n, n+1; Prometheus; Grafana; Alertmanager] ● Multiple isolated Kubernetes clusters ● Single Prometheus server per cluster ● Dashboards & Alertmanager in separate cluster. What is missing?
  8. Global View: See everything from a single place!
  9. Global View [diagram: clusters 1, 2, ..., n, n+1; Prometheus; Grafana; Alertmanager] “Alert when 66% of clusters in a region are down”
  10. Global View [diagram: clusters 1, 2, ..., n, n+1; Prometheus; Grafana; Alertmanager] ● How to aggregate data from different clusters?
  11. Global View [diagram: clusters 1, 2, ..., n, n+1; Prometheus; Grafana; Alertmanager; additional Prometheus] ● How to aggregate data from different clusters? ○ Use hierarchical federation?
  12. Global View [diagram: clusters 1, 2, ..., n, n+1; Prometheus; Grafana; Alertmanager; additional Prometheus] ● How to aggregate data from different clusters? ○ Use hierarchical federation? Concerns: Single point of failure; Maintenance; What data are federated?
  13. Availability: Where is my sample?
  14. Availability [diagram: clusters 1, 2, ..., n, n+1; Prometheus; Grafana; Alertmanager] Operator error; Hardware failure; Rollout
  15. Availability [diagram: clusters 1, 2, ..., n, n+1; Prometheus; Grafana; Alertmanager] ● Can we ensure no loss of metric data?
  16. Availability [diagram: clusters 1, 2, ..., n, n+1; Prometheus; Grafana; Alertmanager; two additional Prometheus] ● Can we ensure no loss of metric data? ○ Add HA replicas?
  17. Availability [diagram: clusters 1, 2, ..., n, n+1; Prometheus; Grafana; Alertmanager; two additional Prometheus] ● Can we ensure no loss of metric data? ○ Add HA replicas? Concerns: Which replica should we query? Cost? Where to put rules and alerts?
  18. Historical Metrics: What exactly happened X weeks ago?
  19. Metric retention [timeline: T - 3 years, T - 2 years, T - 12 months, T; marked ?] “Go back to what happened 6 months ago...”
  20. Metric retention [timeline: T - 3 years, T - 2 years, T - 12 months, T; marked ? and X] “Good progress! Memory allocs look better than 1.5 years ago!”
  21. Metric retention [timeline: T - 3 years, T - 2 years, T - 12 months, T; marked ? ? ? ? and X] “Let’s see user traffic across years!”
  22. Metric retention [timeline: T - 3 years, T - 2 years, T - 12 months, T; 9 days] Infrastructure retention: 9 days
  23. Metric retention ● Can we have longer retention?
  24. Metric retention ● Can we have longer retention? ○ Upgrade to Prometheus 2.0
  25. Metric retention ● Can we have longer retention? ○ Upgrade to Prometheus 2.0 ○ Scale SSD vertically? [diagram: Prometheus, SSD]
  26. Metric retention ● Can we have longer retention? ○ Upgrade to Prometheus 2.0 ○ Scale SSD vertically? [diagram: Prometheus, SSD]
  27. Metric retention ● Can we have longer retention? ○ Upgrade to Prometheus 2.0 ○ Scale SSD vertically? [diagram: Prometheus, SSD]
  28. Metric retention ● Can we have longer retention? ○ Upgrade to Prometheus 2.0 ○ Scale SSD vertically? [diagram: Prometheus with additional SSDs]
  29. Metric retention ● Can we have longer retention? ○ Upgrade to Prometheus 2.0 ○ Scale SSD vertically? [diagram: Prometheus, SSD] Concerns: Backup? Maintenance; Cost
  30. Recap [diagram: clusters 1, 2, ..., n, n+1; Prometheus; Grafana; Alertmanager; additional Prometheus] It is just hard to… ● Have a global view ● Have HA in place ● Increase retention
  31. Thanos: It is just hard to… ● Have a global view ● Have HA in place ● Increase retention
  32. Additional Goals ● Seamless integration with Prometheus ● Easy deployment model ● Minimal number of dependencies ● Minimal baseline cost
  33. Global View: See everything from a single place!
  34. [diagram: Prometheus, SSD, Targets]
  35. Sidecar [diagram: Prometheus, Sidecar, SSD, Targets]
  36. Sidecar [diagram: Prometheus, Sidecar, SSD, Targets; gRPC (Store API)]
  37. Store API: service Store { rpc Series(SeriesRequest) returns (stream SeriesResponse); rpc LabelNames(LabelNamesRequest) returns (LabelNamesResponse); rpc LabelValues(LabelValuesRequest) returns (LabelValuesResponse); } message SeriesRequest { int64 min_time = 1; int64 max_time = 2; repeated LabelMatcher matchers = 3; } [diagram: Sidecar, Prometheus, remote read, Store API]
  38. Querier [diagram: Prometheus, Sidecar, SSD, Targets; Querier; Store API; HTTP Query API]
  39. Global View [diagram: two Prometheus and Sidecar pairs, each with SSD and Targets; Querier; Store API] Merge
  40. Global View + Availability [diagram: Prometheus and Sidecar pairs, each with SSD and Targets; "replica":"1", "replica":"2"; Querier; Store API] Merge; Deduplicate
  41. Thanos: It is just hard to… ● Have a global view ● Have HA in place ● Increase retention
  42. Historical Metrics: What exactly happened X weeks ago?
  43. TSDB Layout [timeline: T-300, T-200, T-100, T-50, T; Block 1, Block 2, Block 3, Block 4]
  44. TSDB Layout [timeline: T-300, T-200, T-100, T-50, T; Block 1, Block 3, Block 4; block contents: chunks, index]
  45. Data saving [diagram: Prometheus, Sidecar, SSD, Targets; Object Storage; Blocks]
  46. Store [diagram: Object Storage, Blocks; Store with Cache; Querier; Store API]
  47. Store [diagram: Object Storage, Blocks; Store with Cache; Querier; Block; Store API]
  48. Store ● A series is made up of one or more “chunks” ● Each chunk contains ~120 samples ● Chunks can be retrieved through HTTP byte range queries
  49. Store ● A series is made up of one or more “chunks” ● Each chunk contains ~120 samples ● Chunks can be retrieved through HTTP byte range queries. Example: ● 1000 series @ 30s scrape interval
  50. Store ● A series is made up of one or more “chunks” ● Each chunk contains ~120 samples ● Chunks can be retrieved through HTTP byte range queries. Example: ● 1000 series @ 30s scrape interval ● Query 1 year: 8.7 million chunks/range queries
  51. Store: Leverage Prometheus’ TSDB file layout
  52. Store: Leverage Prometheus’ TSDB file layout ● Chunks of the same series are aligned sequentially
  53. Store: Leverage Prometheus’ TSDB file layout ● Chunks of the same series are aligned ● Similar series are aligned, e.g. due to same metric name
  54. Store: Leverage Prometheus’ TSDB file layout ● Chunks of the same series are aligned ● Similar series are aligned, e.g. due to same metric name. Consolidating ranges in close proximity reduces request count by 4-6 orders of magnitude. 8.7 million requests turned into O(20) requests.
  55. Store: Leverage Prometheus’ TSDB file layout ● Chunks of the same series are aligned ● Similar series are aligned, e.g. due to same metric name. Index lookups profit from a similar approach.
  56. Compaction: Density matters
  57. Compaction [diagram: Object Storage, Blocks; Compactor with Disk]
  58. Compaction [diagram: Object Storage, Blocks; Compactor with Disk, Blocks]
  59. Compaction [diagram: Object Storage, Blocks; Compactor with Disk, Blocks, Block]
  60. Compaction [diagram: Object Storage, Blocks; Compactor with Disk, Block]
  61. Thanos: It is just hard to… ● Have a global view ● Have HA in place ● Increase retention
  62. Downsampling: Let’s just step back a little
  63. Downsampling: Raw: 16 bytes/sample. Compressed: 1.07 bytes/sample
  64. Downsampling: BUT…
  65. Downsampling: Decompressing one sample takes 10-40 nanoseconds ● Times 1000 series @ 30s scrape interval ● Times 1 year
  66. Downsampling: Decompressing one sample takes 10-40 nanoseconds ● Times 1000 series @ 30s scrape interval ● Times 1 year ● Over 1 billion samples, i.e. 10-40s for decoding alone ● Plus your actual computation over all those samples, e.g. rate()
  67. Downsampling [diagram: Block RAW, 10x, Block @ 5m, 12x, Block @ 1h]
  68. Downsampling [diagram: raw chunk; count, sum, min, max, counter; raw chunk...]
  69. Downsampling [aggregates: count, sum, min, max, counter] ...
  70. Downsampling [aggregates: count, sum, min, max, counter] count_over_time(requests_total[1h])
  71. Downsampling [aggregates: count, sum, min, max, counter] sum_over_time(requests_total[1h])
  72. Downsampling [aggregates: count, sum, min, max, counter] min(requests_total); min_over_time(requests_total[1h])
  73. Downsampling [aggregates: count, sum, min, max, counter] max(requests_total); max_over_time(requests_total[1h])
  74. Downsampling [aggregates: count, sum, min, max, counter] rate(requests_total[1h]); increase(requests_total[1h])
  75. Downsampling [aggregates: count, sum, min, max, counter] requests_total; avg(requests_total); ... * avg
  76. Full Architecture [diagram: Prometheus and Sidecar pairs with SSDs; Queriers; Compactor; Store; Bucket]
  77. Full Architecture: $ thanos sidecar … $ thanos query … $ thanos store … $ thanos compact …
  78. Deployment Models [diagram: Cluster A, Cluster B, Cluster C, each with Queriers, S P pairs, Store, Bucket]
  79. Deployment Models [diagram: Cluster A, Cluster B, Cluster C, each with Queriers, S P pairs, Store, Bucket] Federation (through Store API)
  80. Deployment Models [diagram: Queriers; Cluster A, Cluster B, Cluster C, each with S P pairs, Store, Bucket] Global Scale Thanos Cluster
  81. Cost ● Store + Query node ~ Savings on Prometheus side (+/- 0) ● Less SSD space on Prometheus side (savings) ● Basically: just your data stored in S3/GCS/HDFS + requests
  82. Cost. Example: ● 20 Prometheus servers each ingesting 100k samples/sec, 500GB of local disk ● 20 x 250GB of new data per month + ~20% overhead for downsampling ● $1440/month for storage after 1 year (72TB of queryable data) ● $100/month for sustained 100 queries/sec against object storage. Thanos Cost: $1540
  83. Cost. Example: ● 20 Prometheus servers each ingesting 100k samples/sec, 500GB of local disk ● 20 x 250GB of new data per month + ~20% overhead for downsampling ● $1440/month for storage after 1 year (72TB of queryable data) ● $100/month for sustained 100 queries/sec against object storage ● $1530/month savings in local SSDs. Thanos Cost: $1540. Prometheus Savings: $1530
  84. Demo - retention
  85. Demo - deduplication
  86. Demo - deduplication
  87. Any questions? github.com/improbable-eng/thanos Fabian Reinartz (fabxc), Bartek Plotka (bwplotka, Bplotka)
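
Slide 40 has the Querier merge series from several Sidecars and deduplicate HA pairs that differ only in a "replica" label. The sketch below illustrates that idea only: it assumes replicas report identical timestamps and lets the first replica to report a timestamp win, which is a simplification and not necessarily how Thanos deduplicates. The function name `deduplicate` and the data shapes are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[int, float]              # (timestamp in seconds, value)
Series = Tuple[Dict[str, str], List[Sample]]


def deduplicate(series_list: List[Series], replica_label: str = "replica"):
    """Merge series that are identical once the replica label is dropped.

    Assumption for illustration: replicas share timestamps, and the first
    replica seen for a given timestamp wins.
    """
    merged: Dict[tuple, Dict[int, float]] = {}
    for labels, samples in series_list:
        # Identity of the series without the replica label.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        points = merged.setdefault(key, {})
        for ts, value in samples:
            points.setdefault(ts, value)
    return {key: sorted(points.items()) for key, points in merged.items()}


# Replica 1 misses the sample at t=60 (e.g. during a rollout); replica 2 fills the gap.
replica_1 = ({"job": "api", "replica": "1"}, [(0, 1.0), (30, 2.0)])
replica_2 = ({"job": "api", "replica": "2"}, [(30, 2.0), (60, 3.0)])
result = deduplicate([replica_1, replica_2])
print(result)
```

The merged result contains a single `job="api"` series covering all three timestamps, which is the behavior the "Merge, Deduplicate" step is meant to convey.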
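
The "8.7 million chunks" figure on slide 50 can be checked from the numbers on slides 48 to 50. This is a back-of-the-envelope calculation, assuming a 365-day year and exactly 120 samples per chunk (the slides say "~120").

```python
SECONDS_PER_YEAR = 365 * 24 * 3600      # assumption: 365-day year

series = 1000                           # from slide 49
scrape_interval_s = 30                  # from slide 49
samples_per_chunk = 120                 # slide 48 says "~120"

samples_per_series = SECONDS_PER_YEAR // scrape_interval_s
total_samples = series * samples_per_series
total_chunks = total_samples // samples_per_chunk

print(f"samples per series: {samples_per_series:,}")
print(f"total samples:      {total_samples:,}")
print(f"total chunks:       {total_chunks:,}")
```

With these inputs the chunk count comes out at 8.76 million, consistent with the slide: one byte range request per chunk would mean millions of requests for a single one-year query.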
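
Slide 54 says that consolidating byte ranges in close proximity cuts the request count dramatically. A minimal sketch of that consolidation step follows, assuming ranges are half-open `(start, end)` byte offsets and that two ranges are fetched together when the gap between them is at most `max_gap` bytes. The function name and the threshold are illustrative, not taken from the Thanos code base.

```python
from typing import List, Tuple

ByteRange = Tuple[int, int]   # half-open [start, end) byte offsets


def consolidate(ranges: List[ByteRange], max_gap: int) -> List[ByteRange]:
    """Merge ranges whose gap is at most max_gap bytes into one request."""
    if not ranges:
        return []
    ordered = sorted(ranges)
    merged = [ordered[0]]
    for start, end in ordered[1:]:
        last_start, last_end = merged[-1]
        if start - last_end <= max_gap:
            # Close enough: extend the previous request instead of issuing a new one.
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged


# Three chunk ranges; the first two sit 10 bytes apart, the third is far away.
requests = consolidate([(0, 100), (110, 200), (5000, 5100)], max_gap=50)
print(requests)
```

The trade-off is fetching some bytes that are not needed (the gaps) in exchange for far fewer round trips, which pays off because chunks of the same and of similar series are laid out next to each other.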
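
The "10-40s for decoding alone" estimate on slide 66 follows directly from the per-sample cost on slide 65. A quick check, again assuming a 365-day year:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600      # assumption: 365-day year

series = 1000
scrape_interval_s = 30
decode_ns_low, decode_ns_high = 10, 40  # per-sample decode cost from slide 65

total_samples = series * (SECONDS_PER_YEAR // scrape_interval_s)
decode_s_low = total_samples * decode_ns_low / 1e9
decode_s_high = total_samples * decode_ns_high / 1e9

print(f"{total_samples:,} samples -> {decode_s_low:.1f}s to {decode_s_high:.1f}s to decode")
```

That is roughly 10.5 to 42 seconds spent on decompression before any query function such as rate() has run, which is the motivation for downsampling.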
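
Slides 67 to 75 describe downsampled blocks that store aggregates (count, sum, min, max, counter) per window instead of raw samples. The sketch below shows the idea for the first four aggregates, assuming fixed, aligned windows; the counter aggregate is left out because its reset handling is not spelled out on the slides. `Aggregate` and `downsample` are hypothetical names, and deriving avg as sum / count is an assumption about how the "avg" on slide 75 is obtained.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Aggregate:
    window_start: int
    count: int
    sum: float
    min: float
    max: float

    @property
    def avg(self) -> float:
        # Assumption: avg is derived from the stored sum and count.
        return self.sum / self.count


def downsample(samples: List[Tuple[int, float]], window_s: int) -> List[Aggregate]:
    """Reduce (timestamp, value) samples to one Aggregate per aligned window."""
    windows: Dict[int, Aggregate] = {}
    for ts, value in samples:
        start = ts - ts % window_s
        agg = windows.get(start)
        if agg is None:
            windows[start] = Aggregate(start, 1, value, value, value)
        else:
            agg.count += 1
            agg.sum += value
            agg.min = min(agg.min, value)
            agg.max = max(agg.max, value)
    return [windows[start] for start in sorted(windows)]


# Ten minutes of samples at a 30s scrape interval, downsampled to 5m windows.
raw = [(i * 30, float(i)) for i in range(20)]
five_minute = downsample(raw, window_s=300)
for agg in five_minute:
    print(agg, "avg =", agg.avg)
```

Each 5m window absorbs ten 30s samples, matching the "10x" reduction on slide 67, and functions such as count_over_time, sum_over_time, min_over_time, and max_over_time can be answered from the matching aggregate without decoding the raw chunk.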
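
The cost example on slides 82 and 83 can be reproduced with a few lines of arithmetic. The storage price of 2 cents per GB per month is not stated on the slides; it is an assumption chosen because it is the rate at which 72TB costs the quoted $1440/month. Amounts are kept as integers to avoid floating-point rounding.

```python
servers = 20
new_data_gb_per_server_month = 250
downsampling_overhead_pct = 20
months = 12

price_cents_per_gb_month = 2            # assumption: implied by $1440 for 72TB
query_cost_usd_month = 100              # from slide 82
ssd_savings_usd_month = 1530            # from slide 83

monthly_growth_gb = servers * new_data_gb_per_server_month * (100 + downsampling_overhead_pct) // 100
stored_gb_after_year = monthly_growth_gb * months
storage_usd_month = stored_gb_after_year * price_cents_per_gb_month // 100
thanos_usd_month = storage_usd_month + query_cost_usd_month
net_usd_month = thanos_usd_month - ssd_savings_usd_month

print(f"stored after 1 year: {stored_gb_after_year / 1000:.0f} TB")
print(f"storage: ${storage_usd_month}/month, total: ${thanos_usd_month}/month")
print(f"net change vs. local SSDs: ${net_usd_month}/month")
```

Under these assumptions the object storage bill and the local SSD savings nearly cancel out, which is the "+/- 0" point made on slide 81.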
