Stabilizing the Jenga tower:
Scaling out Ceilometer
Gordon Chung & Pradeep Kilambi
Engineers @ Red Hat, Inc.
Our Mission
“To reliably collect measurements of the utilization of physical & virtual resources comprising deployed clouds, persist this data for subsequent retrieval & analysis, and trigger actions when defined criteria are met.”
Overview
● Collect physical and virtual resource data
● Transform data to something measurable
● Publish data to various targets
● Persist data to storage
● Retrieve data via API for further analysis, billing, triggering actions, etc.
Collect → Transform → Publish → Persist → Retrieve
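As a rough illustration of the five stages listed above, here is a minimal in-memory sketch. The `Sample` fields, the function names and the `Store` class are all hypothetical stand-ins chosen for this example; they are not Ceilometer's actual classes or API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Sample:
    """One measurement; the field names are illustrative only."""
    meter: str
    resource_id: str
    volume: float
    unit: str


def collect() -> List[Sample]:
    # Stand-in for polling or notification handling: cumulative CPU time in ns.
    return [Sample("cpu", "inst-3", 22_450_000_000, "ns")]


def transform(samples: Iterable[Sample]) -> List[Sample]:
    # Example transformation: convert nanoseconds to seconds.
    return [Sample(s.meter, s.resource_id, s.volume / 1e9, "s") for s in samples]


def publish(samples: Iterable[Sample],
            targets: List[Callable[[Sample], None]]) -> None:
    # Fan each sample out to every configured target.
    for sample in samples:
        for target in targets:
            target(sample)


class Store:
    """In-memory stand-in for the storage backend and the retrieval API."""

    def __init__(self) -> None:
        self._rows: List[Sample] = []

    def persist(self, sample: Sample) -> None:
        self._rows.append(sample)

    def retrieve(self, meter: str) -> List[Sample]:
        return [row for row in self._rows if row.meter == meter]


store = Store()
publish(transform(collect()), [store.persist])
result = store.retrieve("cpu")
```

Each stage only depends on the output of the previous one, which is what lets the later slides swap targets and storage without touching collection.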
Architecture (Icehouse)
[Diagram: OpenStack Services; Notification Bus; Notification Agents (Agent1, Agent2, AgentN) with Pipeline; Polling Agents (Agent1, Agent2, AgentN) with Pipeline; Collectors (Collector1, Collector2, CollectorN); Database (Events, Meters, Alarms); AlarmEvaluator; AlarmNotifier; API; External Systems. Legend: Partial HA Support; Active-Active HA support.]
Ceilometer as it’s perceived
Ceilometer
Cloud Admin
“API response too slow”
“When Ceilometer dies, Glance dies.”
“Ceilometer is leaking memory”
“Ceilometer doesn’t scale”
“HAProxy is messing with MongoDB replica-sets”
“Ceilometer is not Production Ready”
Evolution of Ceilometer
Architecture (Juno)
[Diagram: OpenStack Services; Notification Bus; Notification Agents (Agent1, Agent2, AgentN) with Pipeline; Polling Agents (Agent1, Agent2, AgentN) with Pipeline; Collectors (Collector1, Collector2, CollectorN); Databases (Alarms, Events, Meters); AlarmEvaluator; AlarmNotifier; API; External Systems. Legend: Partial HA Support; Active-Active HA support.]
Active/Active Workload Partitioning
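The slides do not spell out how work is divided between active agents. One simple way to picture it, assuming each resource is assigned to exactly one agent by hashing its ID, is sketched below. The function names are hypothetical and this is not presented as Ceilometer's implementation.

```python
import hashlib
from typing import Dict, List


def owner(resource_id: str, agents: List[str]) -> str:
    """Pick the agent responsible for a resource via a stable hash."""
    digest = hashlib.md5(resource_id.encode("utf-8")).hexdigest()
    # Sort so every agent computes the same answer regardless of list order.
    return sorted(agents)[int(digest, 16) % len(agents)]


def partition(resources: List[str], agents: List[str]) -> Dict[str, List[str]]:
    """Split resources so that each one is polled by exactly one agent."""
    assignment: Dict[str, List[str]] = {agent: [] for agent in agents}
    for resource_id in resources:
        assignment[owner(resource_id, agents)].append(resource_id)
    return assignment


agents = ["agent1", "agent2", "agent3"]
resources = [f"res-{i}" for i in range(20)]
assignment = partition(resources, agents)
```

Plain modulo hashing reshuffles most resources whenever an agent joins or leaves; a consistent hash ring is the usual way to limit that churn.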
Architecture (Kilo)
[Diagram: OpenStack Services; Notification Bus; Notification Agents (Agent1, Agent2, AgentN) with Pipeline; Polling Agents (Agent1, Agent2, AgentN) with Pipeline; Collectors (Collector1, Collector2, CollectorN); Databases (Alarms, Meters, Events); AlarmEvaluator; AlarmNotifier; API; External Systems. Legend: Active-Active HA support.]
Best Practices
Best Practices (Data Collection)
● Modify your pipeline to match requirements
○ Collect only meters you need by tuning pipeline.yaml
○ Tweak polling interval as needed
● Enable polling jitter (Kilo+)
● Scale out - add agents as load increases (Juno+)
● Use the notifier publisher instead of the rpc publisher (Juno+)
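To make two of the points above concrete, the sketch below shows meter selection by name pattern and a random per-agent delay before polling. It only mirrors the idea; the pattern matching is a simplification, the helper names are hypothetical, and the real settings live in pipeline.yaml and the agent configuration.

```python
import fnmatch
import random
from typing import List


def wanted(meter: str, patterns: List[str]) -> bool:
    """Return True if the meter name matches any include pattern."""
    return any(fnmatch.fnmatchcase(meter, pattern) for pattern in patterns)


def start_offsets(agent_count: int, max_jitter_s: float, seed: int = 0) -> List[float]:
    """Random delay (seconds) each agent waits before a polling cycle."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, max_jitter_s) for _ in range(agent_count)]


meters = ["cpu", "cpu_util", "disk.read.bytes", "disk.write.bytes", "image.size"]
# Keep only the meters that are actually needed downstream.
selected = [m for m in meters if wanted(m, ["cpu", "disk.*"])]
# Spread three agents over a 10 second window so they do not poll in lockstep.
offsets = start_offsets(agent_count=3, max_jitter_s=10.0)
```

Fewer meters means fewer samples to publish and store, and spreading the start times avoids every agent hitting the service APIs at the same instant.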
Best Practices (Data Storage)
● Avoid open-ended queries, query on a time range
● Install API behind mod_wsgi
● Tune mod_wsgi WSGIDaemonProcess settings such as threads and processes
● Set a TTL, expire data to minimise database size
● Run mongodb on a separate node
○ Use sharding and replica-sets
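As an illustration of the first point, the sketch below builds a time-bounded query string instead of an open-ended one. The `q.field` / `q.op` / `q.value` parameter names follow the query style of the Ceilometer v2 API as I understand it; treat them as an assumption and check them against your deployment. The helper name is hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def time_range_query(start: datetime, end: datetime) -> str:
    """Encode 'start <= timestamp < end' as repeated q.* parameters."""
    params = [
        ("q.field", "timestamp"), ("q.op", "ge"), ("q.value", start.isoformat()),
        ("q.field", "timestamp"), ("q.op", "lt"), ("q.value", end.isoformat()),
    ]
    return urlencode(params)


# One day of data rather than the whole history of the meter.
query = time_range_query(
    datetime(2015, 3, 23, tzinfo=timezone.utc),
    datetime(2015, 3, 24, tzinfo=timezone.utc),
)
```

Bounding both ends of the range keeps the amount of data the database has to scan proportional to the window asked for, not to the total retention.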
Different Strokes for Different Folks
Deployment Scenarios (Lambda Design)
[Diagram: Polling / Notification Agents; Queue1; Queue2; several Collectors (short-term); one Collector (long-term); Short-Term Database; Archive Database.]
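A minimal sketch of how this scenario might behave, assuming every sample is copied to both queues, Queue1 feeds the short-term database (which expires old data) and Queue2 feeds the archive. The wiring and all names are assumptions made for illustration; the diagram itself only shows the components.

```python
from collections import deque
from typing import Deque, List, Tuple

Sample = Tuple[float, str, float]  # (timestamp in seconds, meter name, value)


def publish(sample: Sample, queues: List[Deque[Sample]]) -> None:
    """Copy the sample onto every queue so each path sees all data."""
    for queue in queues:
        queue.append(sample)


def drain(queue: Deque[Sample], database: List[Sample]) -> None:
    """Stand-in for a collector: move everything from a queue into a store."""
    while queue:
        database.append(queue.popleft())


def expire(database: List[Sample], now: float, ttl_s: float) -> None:
    """Drop samples older than the TTL, in place."""
    database[:] = [s for s in database if now - s[0] <= ttl_s]


queue1: Deque[Sample] = deque()
queue2: Deque[Sample] = deque()
short_term_db: List[Sample] = []
archive_db: List[Sample] = []

for sample in [(0.0, "cpu", 1.0), (100.0, "cpu", 2.0)]:
    publish(sample, [queue1, queue2])

drain(queue1, short_term_db)                   # short-term collectors
expire(short_term_db, now=120.0, ttl_s=60.0)   # keep only recent data
drain(queue2, archive_db)                      # long-term collector
```

The short-term store stays small and fast to query, while the archive keeps the full history for slower, offline analysis.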
Deployment Scenarios (Data Segregation)
[Diagram: Polling / Notification Agents; Queue1; Queue2; Collectors labelled (short-term), (public) and (audit); Database; Audit Database.]
Deployment Scenarios (JSON Files)
[Diagram: Polling / Notification Agents; Queue1; Collectors; JSON files; Apache Spark.]
Deployment Scenarios (Fraud Detection)
[Diagram: Polling / Notification Agents; Queue; Collectors; HTTP; Proprietary Alerting System.]
Deployment Scenarios (Custom consumers)
[Diagram: Polling / Notification Agents; Kafka; Apache Storm.]
Deployment Scenarios (Debugging)
[Diagram: Polling / Notification Agents; Event Queue; Collectors; ElasticSearch; Kibana.]
Deployment Scenarios (Noisy Services)
[Diagram: OpenStack Services; two Notification Buses; Notification Agents (Agent1, Agent2, AgentN) with Pipeline; Collectors (Collector1, Collector2, CollectorN); Databases (Alarms, Meters, Events).]
Continual Evolution
Liberty
● Gnocchi Integration
● Building up events
● Declarative data collection
● Minimise the bloat
Gnocchi: Resource Metering as a Service
● Lightweight time-series metadata
● Separate storage and data models for resources and time-series data
● Indexer for metrics and resources
● Eagerly pre-aggregates metric data
● Supports restricted cross-metric aggregation
● Per time-series configurable retention policy
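To illustrate what eager pre-aggregation with a retention policy can look like, here is a small sketch that folds each incoming point into a fixed-width time bucket as it arrives and keeps only the newest buckets. The class, its parameters and the choice of mean as the aggregate are assumptions for this example, not Gnocchi's actual code or storage format.

```python
from typing import Dict, List, Tuple


class AggregatedSeries:
    """Mean value per fixed-width time bucket, keeping the newest buckets only."""

    def __init__(self, granularity_s: int, max_points: int) -> None:
        if granularity_s <= 0 or max_points <= 0:
            raise ValueError("granularity_s and max_points must be positive")
        self.granularity_s = granularity_s
        self.max_points = max_points
        self._buckets: Dict[int, Tuple[float, int]] = {}  # start -> (sum, count)

    def add(self, timestamp_s: int, value: float) -> None:
        # Aggregate at write time: update the running sum and count.
        start = timestamp_s - timestamp_s % self.granularity_s
        total, count = self._buckets.get(start, (0.0, 0))
        self._buckets[start] = (total + value, count + 1)
        # Retention: drop the oldest buckets beyond max_points.
        for old in sorted(self._buckets)[:-self.max_points]:
            del self._buckets[old]

    def means(self) -> List[Tuple[int, float]]:
        return [(start, total / count)
                for start, (total, count) in sorted(self._buckets.items())]


series = AggregatedSeries(granularity_s=60, max_points=2)
for ts, value in [(0, 1.0), (30, 3.0), (60, 5.0), (120, 7.0)]:
    series.add(ts, value)
points = series.means()
```

Reads then return a handful of precomputed points instead of scanning raw samples, and storage per series is bounded by the retention setting.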
Size matters
ceilometer datapoint (mongodb)
{
    "_id": ObjectId("55103dd3bf4d2c7a7de6e319"),
    "counter_name": "cpu",
    "user_id": "72bd0799d496476f9eed16d49e0b86e9",
    "resource_id": "d7f94857-a0d8-4864-8ab1-124055950973",
    "timestamp": ISODate("2015-03-23T16:22:43Z"),
    "message_signature": "539736605d14c0aa8c85058e6e9e67a078146f2e80a218d8dc6711c8d6875ae5",
    "message_id": "d559f244-d178-11e4-9fa9-28b2bd01ed52",
    "source": "openstack",
    "counter_unit": "ns",
    "counter_volume": NumberLong("22450000000"),
    "recorded_at": ISODate("2015-03-23T16:22:43.412Z"),
    "project_id": "99fb96cb63624163975dcbf95d7d2d6f",
    "resource_metadata": {
        "status": "active",
        "cpu_number": 1,
        "ephemeral_gb": 0,
        "display_name": "inst-3",
        "name": "instance-00000003",
        "disk_gb": 0,
        "kernel_id": "4e303a91-ae5b-43c7-b823-fd6f2cceab4e",
        "image": {
            "id": "490af6b0-2402-45d8-bcb1-c81376326e8d",
            "links": [
                {
                    "href": "http://10.162.32.175:8774/837660dc95324be594a0607d80a22c53/images/490af6b0-2402-45d8-bcb1-c81376326e8d",
                    "rel": "bookmark"
                }
            ],
            "name": "cirros-0.3.2-x86_64-uec"
        },
        "ramdisk_id": "7112ea15-3ece-4805-9f23-f6141a6f27b0",
        "vcpus": 1,
        "memory_mb": 64,
        "instance_type": "42",
        …..
}
vs
gnocchi datapoint
{
    "2015-03-23T16:22:43Z" : 1
}
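A rough way to put a number on the difference between the two datapoints above is to serialize both and compare byte counts. The sketch uses only a subset of the fields shown on the slide (the slide itself elides the rest) and measures plain JSON text, so it is an approximation and not the on-disk size in MongoDB or Gnocchi.

```python
import json

# Subset of the Ceilometer sample shown on the slide; the remaining fields are omitted.
ceilometer_sample = {
    "counter_name": "cpu",
    "user_id": "72bd0799d496476f9eed16d49e0b86e9",
    "resource_id": "d7f94857-a0d8-4864-8ab1-124055950973",
    "timestamp": "2015-03-23T16:22:43Z",
    "message_id": "d559f244-d178-11e4-9fa9-28b2bd01ed52",
    "source": "openstack",
    "counter_unit": "ns",
    "counter_volume": 22450000000,
    "project_id": "99fb96cb63624163975dcbf95d7d2d6f",
    "resource_metadata": {
        "status": "active",
        "cpu_number": 1,
        "display_name": "inst-3",
        "vcpus": 1,
        "memory_mb": 64,
    },
}

# The Gnocchi datapoint from the slide: a timestamp mapped to a value.
gnocchi_point = {"2015-03-23T16:22:43Z": 1}


def json_size(document: dict) -> int:
    """Size in bytes of the UTF-8 JSON encoding of a document."""
    return len(json.dumps(document).encode("utf-8"))


ceilometer_bytes = json_size(ceilometer_sample)
gnocchi_bytes = json_size(gnocchi_point)
ratio = ceilometer_bytes / gnocchi_bytes
print(f"ceilometer: {ceilometer_bytes} B, gnocchi: {gnocchi_bytes} B, ratio: {ratio:.1f}x")
```

Even with most of the metadata left out, the per-sample document is many times larger, because identifiers and resource metadata are repeated on every measurement.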
Gnocchi Benchmarks
Gnocchi
Architecture (Gnocchi)
[Diagram: OpenStack Services; Notification Bus; Notification Agents (Agent1, Agent2, AgentN) with Pipeline; Polling Agents (Agent1, Agent2, AgentN) with Pipeline; Collectors (Collector1, Collector2, CollectorN); Databases (Alarms, Events); Alarm Evaluator; AlarmNotifier; API; External Systems; a second API box with Metric and Resources. Legend: Active-Active HA support.]
Discussions
● operators session - May 19, 2015 (12:05pm) Rm 306
● design track - May 20, 2015 (9:00am - 3:30pm)
○ event alarms; ceilometer componentisation
● design track - May 21, 2015 (9:00am - 12:30pm)
● speaker session:
○ The Anatomy of an Action - May 21, 2015 (1:30pm)
● irc: #openstack-ceilometer
● mailing-list: openstack-dev@lists.openstack.org
Resources
● https://wiki.openstack.org/wiki/ReleaseNotes/Juno
● https://wiki.openstack.org/wiki/ReleaseNotes/Kilo
● http://nejc.saje.info/ceilometer-central-agent.html
● https://julien.danjou.info/blog/2015/openstack-gnocchi-first-release
● https://blog.sileht.net/writing-a-gnocchi-storage-driver-for-ceph.html
Thank You

Boost Fertility New Invention Ups Success Rates.pdf
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 

Stabilizing the Jenga tower

  • 1. Stabilizing the Jenga tower: Scaling out Ceilometer Gordon Chung & Pradeep Kilambi Engineers @ Red Hat, Inc.
  • 2. Our Mission “To reliably collect measurements of the utilization of physical & virtual resources comprising deployed clouds, persist this data for subsequent retrieval & analysis, and trigger actions when defined criteria are met.”
  • 3. Overview
      ● Collect physical and virtual resource data
      ● Transform data to something measurable
      ● Publish data to various targets
      ● Persist data to storage
      ● Retrieve data via API for further analysis, billing, triggering actions, etc.
      Flow: Collect → Transform → Publish → Persist → Retrieve
  • 4. Architecture (Icehouse) [diagram: OpenStack Services; Notification Bus; Notification Agents (Agent1, Agent2, AgentN, with pipeline); Polling Agents (Agent1, Agent2, AgentN, with pipeline); Collectors (Collector1, Collector2, CollectorN); Database (Events, Meters, Alarms); API; External Systems; AlarmEvaluator; AlarmNotifier. Legend: Partial HA support; Active-Active HA support]
  • 5. Ceilometer as it’s perceived [figure labels: Ceilometer; Cloud Admin]
  • 11. “HAProxy is messing with MongoDB replica-sets”
  • 14. Architecture (Juno) [diagram: OpenStack Services; Notification Bus; Notification Agents (Agent1, Agent2, AgentN, with pipeline); Polling Agents (Agent1, Agent2, AgentN, with pipeline); Collectors (Collector1, Collector2, CollectorN); Databases (Alarms, Events, Meters); API; External Systems; AlarmEvaluator; AlarmNotifier. Legend: Partial HA support; Active-Active HA support]
  • 16. Architecture (Kilo) [diagram: OpenStack Services; Notification Bus; Notification Agents (Agent1, Agent2, AgentN, with pipeline); Polling Agents (Agent1, Agent2, AgentN, with pipeline); Collectors (Collector1, Collector2, CollectorN); Databases (Alarms, Meters, Events); API; External Systems; AlarmEvaluator; AlarmNotifier. Legend: Active-Active HA support]
  • 18. Best Practices (Data Collection)
      ● Modify your pipeline to match requirements
          ○ Collect only the meters you need by tuning pipeline.yaml
          ○ Tweak the polling interval as needed
      ● Add jitter to polling (Kilo+)
      ● Scale out: add agents as load increases (Juno+)
      ● Use the notifier publisher rather than the rpc publisher (Juno+)
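To illustrate the jitter advice on slide 18, here is a minimal Python sketch of the idea only. It is not Ceilometer code: the function names, the 30-second jitter window and the 600-second interval are all assumptions chosen for the example. Each agent waits a random delay before its first poll, so agents sharing an interval do not all hit the services at the same instant.

```python
import random


def first_poll_offsets(agent_count, max_jitter_s, seed=None):
    """Pick a random start delay in [0, max_jitter_s] for each polling agent.

    Hypothetical helper: it only illustrates the jitter idea.
    """
    rng = random.Random(seed)
    return [rng.uniform(0.0, max_jitter_s) for _ in range(agent_count)]


def poll_times(offset_s, interval_s, cycles):
    """Times (seconds from start) at which one agent polls."""
    return [offset_s + n * interval_s for n in range(cycles)]


if __name__ == "__main__":
    # Three agents, same 600 s interval, each shifted by its own random offset.
    offsets = first_poll_offsets(agent_count=3, max_jitter_s=30, seed=42)
    for agent, offset in enumerate(offsets, start=1):
        print(f"agent{agent}:", [round(t, 1) for t in poll_times(offset, 600, 3)])
```

The interval between polls is unchanged; only the phase of each agent differs, which spreads the load.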
  • 19. Best Practices (Data Storage)
      ● Avoid open-ended queries; query on a time range
      ● Install the API behind mod_wsgi
      ● Tune the mod_wsgi WSGIDaemonProcess settings, such as threads and processes
      ● Set a TTL so that data expires and the database stays small
      ● Run MongoDB on a separate node
          ○ Use sharding and replica sets
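The first and fourth points on slide 19 can be sketched in Python as follows. This is a hedged illustration, not project code: the `q.field` / `q.op` / `q.value` parameter names are assumed to follow the Ceilometer v2 API query convention, and `time_bounded_query` and `is_expired` are hypothetical helpers written for this example.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode


def time_bounded_query(start, end):
    """Build a query string that restricts results to [start, end).

    Assumption: the API accepts repeated q.field/q.op/q.value triples.
    """
    if start >= end:
        raise ValueError("start must be earlier than end")
    return urlencode([
        ("q.field", "timestamp"), ("q.op", "ge"), ("q.value", start.isoformat()),
        ("q.field", "timestamp"), ("q.op", "lt"), ("q.value", end.isoformat()),
    ])


def is_expired(recorded_at, now, ttl_seconds):
    """True if a stored sample is older than the configured time-to-live."""
    return now - recorded_at > timedelta(seconds=ttl_seconds)


if __name__ == "__main__":
    end = datetime(2015, 3, 23, 17, 0, 0)
    print(time_bounded_query(end - timedelta(hours=1), end))
```

Bounding every query on both sides keeps the scanned range proportional to the window asked for, not to the total history held in the database.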
  • 20. Different Strokes for Different Folks
  • 21. Deployment Scenarios (Lambda Design) [diagram: Polling / Notification Agents; Queue1, Queue2; Collector (short-term), Collector (long-term); Short-Term Database; Archive Database]
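The slide shows only the boxes, so the wiring in the following Python sketch is an assumption: every sample is published to both queues, one collector fills a short-term store that expires old data, and the other fills an archive that keeps everything. The names (`fan_out`, `drain`, `expire`) and the in-memory queues are hypothetical stand-ins for real message queues and databases.

```python
from collections import deque


def fan_out(samples, queues):
    """Publish every sample to every queue (assumed wiring, see lead-in)."""
    for sample in samples:
        for queue in queues:
            queue.append(sample)


def drain(queue, store):
    """Collector step: move everything waiting on a queue into a store."""
    while queue:
        store.append(queue.popleft())


def expire(store, now, ttl):
    """Keep only samples no older than ttl seconds."""
    return [s for s in store if now - s["timestamp"] <= ttl]


if __name__ == "__main__":
    queue1, queue2 = deque(), deque()
    short_term, archive = [], []
    fan_out([{"timestamp": 0, "volume": 1}, {"timestamp": 900, "volume": 2}],
            [queue1, queue2])
    drain(queue1, short_term)   # short-term collector
    drain(queue2, archive)      # long-term collector
    short_term = expire(short_term, now=1000, ttl=600)
    print(len(short_term), len(archive))
```

The point of the split is that the short-term store stays small and fast to query while the archive can grow without affecting it.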
  • 22. Deployment Scenarios (Data Segregation) [diagram: Polling / Notification Agents; Queue1, Queue2; Collector (short-term); Collector (public); Collector (audit); Database; Audit Database]
  • 23. Deployment Scenarios (JSON Files) [diagram: Polling / Notification Agents; Queue1; Collector (short-term); Collector; JSON files; Apache Spark]
  • 24. Deployment Scenarios (Fraud Detection) [diagram: Polling / Notification Agents; Queue; Collector (short-term); Collector; HTTP; Proprietary Alerting System]
  • 25. Deployment Scenarios (Custom consumers) [diagram: Polling / Notification Agents; Kafka; Apache Storm]
  • 26. Deployment Scenarios (Debugging) [diagram: Polling / Notification Agents; Event Queue; Collectors; ElasticSearch; Kibana]
  • 27. Deployment Scenarios (Noisy Services) [diagram: OpenStack Services; Notification Bus (two instances); Notification Agents (Agent1, Agent2, AgentN, with pipeline); Collectors (Collector1, Collector2, CollectorN); Databases (Alarms, Meters, Events)]
  • 29. Liberty
      ● Gnocchi integration
      ● Building up events
      ● Declarative data collection
      ● Minimise the bloat
  • 30. Gnocchi: Resource Metering as a Service
      ● Lightweight time-series metadata
      ● Separate storage and data models for resources and time-series data
      ● Indexer for metrics and resources
      ● Eagerly pre-aggregates metric data
      ● Supports restricted cross-metric aggregation
      ● Per time-series configurable retention policy
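A minimal Python sketch of what "eagerly pre-aggregates" and "per time-series retention policy" can mean in practice. This is not Gnocchi's implementation: the class name, the mean as the only aggregate, and the "keep the newest N buckets" rule are assumptions made for illustration. Values are folded into fixed-width time buckets as they arrive, so a read never has to scan raw samples.

```python
class PreAggregatedSeries:
    """Hypothetical time series that aggregates on write and caps its size."""

    def __init__(self, granularity_s, max_points):
        self.granularity_s = granularity_s  # bucket width in seconds
        self.max_points = max_points        # retention: buckets to keep
        self._buckets = {}                  # bucket start -> [sum, count]

    def add(self, timestamp_s, value):
        # Fold the value into its bucket immediately (eager aggregation).
        start = timestamp_s - (timestamp_s % self.granularity_s)
        bucket = self._buckets.setdefault(start, [0.0, 0])
        bucket[0] += value
        bucket[1] += 1
        # Retention: drop the oldest buckets beyond the configured limit.
        while len(self._buckets) > self.max_points:
            del self._buckets[min(self._buckets)]

    def means(self):
        """Mean value per retained bucket, ordered by bucket start."""
        return {start: total / count
                for start, (total, count) in sorted(self._buckets.items())}


if __name__ == "__main__":
    series = PreAggregatedSeries(granularity_s=60, max_points=2)
    for ts, value in [(0, 1.0), (30, 3.0), (60, 10.0), (125, 4.0)]:
        series.add(ts, value)
    print(series.means())
```

Because each series carries its own granularity and point limit, storage per series is bounded regardless of how long it has been collecting.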
  • 31. Size matters
      Ceilometer datapoint (MongoDB):
        {
          "_id": ObjectId("55103dd3bf4d2c7a7de6e319"),
          "counter_name": "cpu",
          "user_id": "72bd0799d496476f9eed16d49e0b86e9",
          "resource_id": "d7f94857-a0d8-4864-8ab1-124055950973",
          "timestamp": ISODate("2015-03-23T16:22:43Z"),
          "message_signature": "539736605d14c0aa8c85058e6e9e67a078146f2e80a218d8dc6711c8d6875ae5",
          "message_id": "d559f244-d178-11e4-9fa9-28b2bd01ed52",
          "source": "openstack",
          "counter_unit": "ns",
          "counter_volume": NumberLong("22450000000"),
          "recorded_at": ISODate("2015-03-23T16:22:43.412Z"),
          "project_id": "99fb96cb63624163975dcbf95d7d2d6f",
          "resource_metadata": {
            "status": "active",
            "cpu_number": 1,
            "ephemeral_gb": 0,
            "display_name": "inst-3",
            "name": "instance-00000003",
            "disk_gb": 0,
            "kernel_id": "4e303a91-ae5b-43c7-b823-fd6f2cceab4e",
            "image": {
              "id": "490af6b0-2402-45d8-bcb1-c81376326e8d",
              "links": [
                { "href": "http://10.162.32.175:8774/837660dc95324be594a0607d80a22c53/images/490af6b0-2402-45d8-bcb1-c81376326e8d", "rel": "bookmark" }
              ],
              "name": "cirros-0.3.2-x86_64-uec"
            },
            "ramdisk_id": "7112ea15-3ece-4805-9f23-f6141a6f27b0",
            "vcpus": 1,
            "memory_mb": 64,
            "instance_type": "42",
            …..
        }
      vs.
      Gnocchi datapoint:
        { "2015-03-23T16:22:43Z" : 1 }
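The size gap on slide 31 can be made concrete with a small Python sketch. The sample below is an abridged copy of the slide's Ceilometer document, with MongoDB-specific types such as `ObjectId` and `ISODate` replaced by plain values so that it is valid JSON. The `encoded_size` helper is hypothetical, and compact JSON length is only a rough proxy for what either backend actually stores on disk.

```python
import json

# Abridged from the slide; the real document carries many more fields.
ceilometer_sample = {
    "counter_name": "cpu",
    "counter_unit": "ns",
    "counter_volume": 22450000000,
    "timestamp": "2015-03-23T16:22:43Z",
    "resource_id": "d7f94857-a0d8-4864-8ab1-124055950973",
    "project_id": "99fb96cb63624163975dcbf95d7d2d6f",
    "resource_metadata": {
        "status": "active",
        "display_name": "inst-3",
        "vcpus": 1,
        "memory_mb": 64,
    },
}

# The Gnocchi datapoint exactly as shown on the slide.
gnocchi_point = {"2015-03-23T16:22:43Z": 1}


def encoded_size(doc):
    """Bytes needed for the compact JSON encoding of a document."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


if __name__ == "__main__":
    print("ceilometer:", encoded_size(ceilometer_sample), "bytes")
    print("gnocchi:   ", encoded_size(gnocchi_point), "bytes")
```

Even this abridged sample is several times larger than the Gnocchi point, because resource metadata is repeated with every measurement instead of being stored once per resource.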
  • 34. Architecture (Gnocchi) [diagram: OpenStack Services; Notification Bus; Notification Agents (Agent1, Agent2, AgentN, with pipeline); Polling Agents (Agent1, Agent2, AgentN, with pipeline); Collectors (Collector1, Collector2, CollectorN); Databases (Alarms, Events); API; External Systems; AlarmEvaluator; AlarmNotifier; Gnocchi (API, Metric, Resources). Legend: Active-Active HA support]
  • 35. Discussions
      ● operators session - May 19, 2015 (12:05pm), Room 306
      ● design track - May 20, 2015 (9:00am - 3:30pm)
          ○ event alarms; ceilometer componentisation
      ● design track - May 21, 2015 (9:00am - 12:30pm)
      ● speaker session:
          ○ The Anatomy of an Action - May 21, 2015 (1:30pm)
      ● irc: #openstack-ceilometer
      ● mailing-list: openstack-dev@lists.openstack.org
  • 36. Resources
      ● https://wiki.openstack.org/wiki/ReleaseNotes/Juno
      ● https://wiki.openstack.org/wiki/ReleaseNotes/Kilo
      ● http://nejc.saje.info/ceilometer-central-agent.html
      ● https://julien.danjou.info/blog/2015/openstack-gnocchi-first-release
      ● https://blog.sileht.net/writing-a-gnocchi-storage-driver-for-ceph.html