1
2
Daniel Krook
Senior Certified IT Specialist, IBM
The IBM dashboard for operational metrics
3
We run Cloud Foundry on dozens of OpenStack VMs
Two intranet clusters:
• Classic: 38 huge VMs deployed with Chef (1,302 users, 1,710 apps)
• NG: 41 medium VMs deployed with BOSH (123 users, 247 apps)
Not counting Dev deployments
All on 50+ Nova Compute nodes
In the past year, we’ve learned how to:
• Keep Cloud Foundry running smoothly
• Discover and prevent impending problems
• Resolve unexpected issues quickly
4
Goals of this lightning talk
1. Show the key data points we track
2. Show how our metrics dashboard helps us monitor that data
3. Share ideas on how to find better data in NG and beyond
4. Spark discussion on improved visibility for CF admins and customers
We are looking to get better at this, and help the community get better as well.
5
1. The key data
6
What are the important metrics?
• Data that can be tracked over time to see trends and behaviors
• Data that can help us predict problems before they happen
At the PaaS layer, that means:
DEA and app health
▪ Memory reserved as a proportion of the memory available
General health of all components
▪ Health of the virtual machines
▪ Status of the processes running on them
Database nodes and services
▪ Number of provisioned services against the capacity available
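The DEA memory metric is a simple ratio. A minimal sketch of computing it and flagging DEAs that are nearing capacity follows; the `DeaSample` fields and the 0.8 default threshold are hypothetical choices for illustration, not values taken from a DEA's varz output.

```python
from dataclasses import dataclass


@dataclass
class DeaSample:
    """One snapshot of a DEA's memory figures; field names are hypothetical."""
    name: str
    reserved_memory_mb: int   # memory already reserved for app instances
    available_memory_mb: int  # total memory the DEA can offer to apps

    @property
    def reserved_ratio(self) -> float:
        """Memory reserved as a proportion of the memory available."""
        if self.available_memory_mb <= 0:
            raise ValueError(f"{self.name}: available memory must be positive")
        return self.reserved_memory_mb / self.available_memory_mb


def nearing_capacity(deas: list[DeaSample], threshold: float = 0.8) -> list[str]:
    """Names of DEAs at or above the threshold ratio, most heavily reserved first."""
    flagged = [d for d in deas if d.reserved_ratio >= threshold]
    flagged.sort(key=lambda d: d.reserved_ratio, reverse=True)
    return [d.name for d in flagged]


deas = [
    DeaSample("dea-1", 12288, 16384),
    DeaSample("dea-2", 15360, 16384),
    DeaSample("dea-3", 4096, 16384),
]
print(nearing_capacity(deas))                 # ['dea-2']
print(nearing_capacity(deas, threshold=0.7))  # ['dea-2', 'dea-1']
```

The same ratio-plus-threshold pattern carries over to database nodes: divide the provisioned service instances by the node's capacity.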
7
Why do we need metrics?
▪ Deliver continuous availability in the cloud
▪ Proactively solve problems rather than react to them
▪ Understand the behavior of the system to automate it
8
Where can we find them?
▪ NATS message bus
  • Discover the components to interrogate
  • Best for dynamically changing data
▪ Cloud Controller database (CCDB)
  • Longer-lived data that isn’t in the varz endpoints
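Discovery over NATS and collection over HTTP fit together roughly as follows. This sketch assumes that a component's reply to a NATS discovery request is JSON carrying a `host` field (`ip:port`) and a two-element `credentials` list used for HTTP basic auth on `/varz`; verify those field names against your own components. The NATS subscription itself is left out so the sketch needs only the standard library, and the address in the example is a placeholder.

```python
import base64
import json
from urllib.request import Request


def varz_request(discover_reply: str) -> Request:
    """Build an authenticated GET request for a component's /varz endpoint.

    Assumes a discovery reply shaped like
    {"type": "DEA", "host": "10.0.0.5:34501", "credentials": ["user", "pass"]}.
    """
    info = json.loads(discover_reply)
    user, password = info["credentials"]
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = Request(f"http://{info['host']}/varz")
    req.add_header("Authorization", f"Basic {token}")
    return req


reply = '{"type": "DEA", "host": "10.0.0.5:34501", "credentials": ["u", "p"]}'
req = varz_request(reply)
print(req.full_url)                     # http://10.0.0.5:34501/varz
print(req.get_header("Authorization"))  # Basic dTpw
# To fetch the metrics: json.load(urllib.request.urlopen(req, timeout=5))
```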
9
2. Monitoring that data
10
Our metrics dashboard provides…
1. Views of component health
2. Resource usage details
3. Ongoing growth trends
4. Access to logs and raw varz
5. Email notifications
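For the email notifications, here is a minimal sketch that turns a set of flagged components into a single digest message with the standard library `email` package. The subject format and the example.com addresses are placeholders; sending the message (for example with `smtplib.SMTP(...).send_message(msg)`) is left to the caller.

```python
from email.message import EmailMessage
from typing import Optional


def build_alert(flagged: dict[str, float], sender: str,
                recipients: list[str]) -> Optional[EmailMessage]:
    """Compose one digest listing components and their usage ratio; None if nothing is flagged."""
    if not flagged:
        return None
    # Worst offenders first, each ratio shown as a whole percentage.
    lines = [f"{name}: {ratio:.0%} of capacity in use"
             for name, ratio in sorted(flagged.items(), key=lambda kv: kv[1], reverse=True)]
    msg = EmailMessage()
    msg["Subject"] = f"[CF metrics] {len(flagged)} component(s) nearing capacity"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("\n".join(lines))
    return msg


msg = build_alert({"dea-2": 0.9375, "mysql-node-1": 0.85},
                  "cf-metrics@example.com", ["cf-admins@example.com"])
print(msg["Subject"])  # [CF metrics] 2 component(s) nearing capacity
print(msg.get_content())
```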
11
It helps us find (and fix) problems
▪ Components nearing capacity or failure
▪ Already failed components
▪ Out-of-control apps and noisy users
It helps us see patterns
▪ Active/inactive users and apps
▪ Growth trends and runtime/service adoption
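Spotting out-of-control apps and noisy users can start as plain counting. The sketch assumes you already collect one `(app, user)` pair per crash or restart event over some window; the event source, the `noisy` helper, and the `limit` value are assumptions for illustration, not part of the dashboard described here.

```python
from collections import Counter


def noisy(events: list[tuple[str, str]], limit: int) -> tuple[list[str], list[str]]:
    """Return (apps, users) whose event counts exceed `limit`, each sorted by name.

    `events` holds one (app, user) pair per crash or restart in the chosen window.
    """
    by_app = Counter(app for app, _ in events)
    by_user = Counter(user for _, user in events)
    apps = sorted(a for a, n in by_app.items() if n > limit)
    users = sorted(u for u, n in by_user.items() if n > limit)
    return apps, users


events = [("app-1", "alice")] * 5 + [("app-2", "alice")] * 2 + [("app-3", "bob")]
print(noisy(events, limit=4))  # (['app-1'], ['alice'])
```

Counting per user as well as per app matters because a user can be noisy through many apps that each stay under the limit.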
12
User and app trends
There is also one unauthenticated page for high-level stats
13
DEA list
14
DEA details
15
Service node list
16
Service node details
17
User list
18
User details
19
App list
20
App details
21
Log list
22
Log details
23
Email notifications
24
3. Finding and acting on better data
25
Let’s resolve gaps in data captured from NG
▪ NG provides granular user/org/space views…
  • This opens up better BSS possibilities, such as QoS and departmental billing
▪ …but we lost the user and app data linkages from the health manager
  • Can’t see which DEA my app resides on (not currently enabled in our NG version)
  • Can’t see how many apps a user has (replaced by orgs and spaces, but still valuable to trace)
  • See https://github.com/cloudfoundry/cloud_controller_ng/issues/81
▪ We’d like to restore that data by surfacing it either
  • in varz endpoints (dynamic data, preferred), or
  • in the CCDB (static data, could be a security concern)
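If DEA-to-app data were surfaced again, rebuilding the lost linkages is mostly a matter of inverting and joining maps. In this sketch the `{dea: [app_guid, ...]}` shape and the `{app_guid: user}` mapping are hypothetical inputs, standing in for whatever the varz endpoints or the CCDB would actually expose.

```python
from collections import Counter, defaultdict


def apps_by_dea(instances_per_dea: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert {dea: [app_guid, ...]} into {app_guid: {dea, ...}}.

    An app with several instances may appear on more than one DEA.
    """
    index = defaultdict(set)
    for dea, app_guids in instances_per_dea.items():
        for guid in app_guids:
            index[guid].add(dea)
    return dict(index)


def apps_per_user(app_owner: dict[str, str]) -> Counter:
    """Count apps per user from a hypothetical {app_guid: user} mapping."""
    return Counter(app_owner.values())


where = apps_by_dea({"dea-1": ["g1", "g2"], "dea-2": ["g2"]})
print(sorted(where["g2"]))  # ['dea-1', 'dea-2']
owners = {"g1": "alice", "g2": "alice", "g3": "bob"}
print(apps_per_user(owners)["alice"])  # 2
```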
26
Let’s begin to link metrics to automation
▪ Detect errors in applications that are traceable to users/orgs
  • Preemptively reach out to them to see if they need help
  • Think customer service and proactive support!
  • Can we hook into BOSH or Jenkins for automation?
▪ Automate (and expand links to the IaaS and SaaS stacks)
  • Self-healing systems (out of disk, move apps)
  • Self-scaling systems (detect when nearing thresholds)
  • Evolving topologies (replace unused service nodes with popular ones)
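Self-healing by moving apps raises a placement question: where should the evacuated app instances go? One simple answer, offered only as an illustration and not as how BOSH or the DEAs actually place apps, is worst-fit decreasing: place the largest instance first, on whichever target DEA has the most free space. The `plan_evacuation` helper and its inputs are hypothetical.

```python
import heapq


def plan_evacuation(app_sizes_mb: dict[str, int], free_mb: dict[str, int]) -> dict[str, str]:
    """Worst-fit decreasing: largest app first, onto the target DEA with the most free space.

    Raises ValueError when an app fits on no target DEA.
    """
    if app_sizes_mb and not free_mb:
        raise ValueError("no target DEAs to move apps to")
    # heapq is a min-heap, so store negated free space to pop the roomiest DEA first.
    heap = [(-free, dea) for dea, free in free_mb.items()]
    heapq.heapify(heap)
    plan: dict[str, str] = {}
    for app, size in sorted(app_sizes_mb.items(), key=lambda kv: kv[1], reverse=True):
        neg_free, dea = heapq.heappop(heap)
        if -neg_free < size:
            # The roomiest DEA is too small, so no DEA can take this app.
            raise ValueError(f"no DEA has room for {app} ({size} MB)")
        plan[app] = dea
        heapq.heappush(heap, (neg_free + size, dea))  # that DEA now has `size` MB less free
    return plan


plan = plan_evacuation({"a": 512, "b": 1024, "c": 256}, {"dea-2": 2048, "dea-3": 1024})
print(plan)  # {'b': 'dea-2', 'a': 'dea-2', 'c': 'dea-3'}
```

Worst fit spreads load across targets, which suits evacuation; a production scheduler would also weigh memory, disk, and stack compatibility together.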
27
Let’s expand the broadcast of metrics to more users
▪ Admins are the primary beneficiaries right now
  • But the data is almost completely read-only
  • Should we provide UAA-based tiers of access to admins?
▪ Others can and should benefit
  • Customers
  • End users
  • Developers
  • Management
  • Executives, line-of-business owners
  • Finance
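One possible shape for UAA-based tiers of access: read the scopes from a user's token and take the union of the dashboard views each scope grants. The `metrics.*` scope names and the view names are hypothetical placeholders (UAA can be configured with custom scopes, but these are not built in), and token validation is assumed to have happened already.

```python
# Hypothetical scope -> view mapping; these scope names are placeholders, not built-in UAA scopes.
VIEWS_BY_SCOPE: dict[str, frozenset] = {
    "metrics.public": frozenset({"trends"}),
    "metrics.read": frozenset({"trends", "deas", "service_nodes", "apps"}),
    "metrics.admin": frozenset(
        {"trends", "deas", "service_nodes", "apps", "users", "logs", "raw_varz"}
    ),
}


def allowed_views(token_scopes: list[str]) -> set[str]:
    """Union of views granted by every recognised scope; unknown scopes grant nothing."""
    views: set[str] = set()
    for scope in token_scopes:
        views |= VIEWS_BY_SCOPE.get(scope, frozenset())
    return views


print(sorted(allowed_views(["metrics.read", "openid"])))
# ['apps', 'deas', 'service_nodes', 'trends']
```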
28
Thanks!
29
The metrics dashboard innovators
Chris Peters, Russell Boykin, Doug Davis, Wei Feng
30
We’re hiring!
Search Jobs at IBM by:
SmartCloud Application Services
31

 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 

Dernier (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

The IBM dashboard for operational metrics

  • 2. The IBM dashboard for operational metrics. Daniel Krook, Senior Certified IT Specialist, IBM
  • 3. We run Cloud Foundry on dozens of OpenStack VMs
      • Two intranet clusters, not counting Dev deployments, all on 50+ Nova Compute nodes:
          • Classic: 38 huge VMs deployed with Chef (1,302 users, 1,710 apps)
          • NG: 41 medium VMs deployed with BOSH (123 users, 247 apps)
      • In the past year, we’ve learned how to:
          • Keep Cloud Foundry running smoothly
          • Discover and prevent impending problems
          • Resolve unexpected issues quickly
  • 4. Goals of this lightning talk
      1. Show the key data points we track.
      2. Show how our metrics dashboard helps us monitor that data.
      3. Share ideas on how to find better data in NG and beyond.
      4. Spark discussion on improved visibility for CF admins and customers.
      We are looking to get better at this, and to help the community get better as well.
  • 6. What are the important metrics?
      • Data that can be tracked over time to see trends and behaviors
      • Data that can help us predict problems before they happen
      • At the PaaS layer, that means:
          • DEA and app health: memory reserved as a proportion of the memory available
          • General health of all components: health of the virtual machines and status of the processes running on them
          • Database nodes and services: number of provisioned services against available capacity
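The DEA health metric above, memory reserved as a proportion of the memory available, can be sketched in a few lines of Python. This is a minimal illustration, not the dashboard's code: the `DeaSnapshot` type, its field names, and the 0.85 warning threshold are hypothetical stand-ins for whatever figures your DEAs actually report.

```python
from dataclasses import dataclass

@dataclass
class DeaSnapshot:
    # Hypothetical per-DEA figures, e.g. parsed from that DEA's varz output.
    name: str
    reserved_mb: int   # memory reserved by app instances placed on this DEA
    capacity_mb: int   # memory the DEA offers for apps

def reserved_ratio(dea: DeaSnapshot) -> float:
    """Memory reserved as a proportion of the memory available."""
    if dea.capacity_mb <= 0:
        raise ValueError(f"{dea.name}: capacity must be positive")
    return dea.reserved_mb / dea.capacity_mb

def nearing_capacity(deas, threshold=0.85):
    """(name, ratio) for every DEA at or above `threshold`, fullest first."""
    flagged = [(d.name, reserved_ratio(d)) for d in deas]
    flagged = [item for item in flagged if item[1] >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

def pool_ratio(deas) -> float:
    """Pool-wide ratio, a single number that can be charted over time."""
    capacity = sum(d.capacity_mb for d in deas)
    return sum(d.reserved_mb for d in deas) / capacity if capacity else 0.0
```

For example, with three 16,384 MB DEAs reserving 14,336, 4,096 and 16,384 MB, `nearing_capacity` returns `[("dea_2", 1.0), ("dea_0", 0.875)]`. Tracking `pool_ratio` alongside the per-DEA list gives both the trend line and the specific hot spots.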
  • 7. Why do we need metrics?
      • To deliver continuous availability in the cloud
      • To solve problems proactively rather than react to them
      • To understand the behavior of the system so we can automate it
  • 8. Where can we find them?
      • NATS message bus
          • Discover the components to interrogate
          • Best for dynamically changing data
      • Cloud Controller database (CCDB)
          • Longer-lived data that isn’t in the varz endpoints
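The NATS-then-varz pattern above can be sketched with the Python standard library. In classic Cloud Foundry, components reply to a discovery request on NATS (commonly the `vcap.component.discover` subject) with a JSON announcement; this sketch assumes that announcement carries `host` ("ip:port") and `credentials` ([user, password]) fields, which you should verify against your own release. It covers only the HTTP half: turning one announcement into an authenticated `/varz` request. Subscribing on NATS is left to whichever client library you use, and `varz_request`/`fetch_varz` are hypothetical helper names.

```python
import base64
import json
import urllib.request

def varz_request(announcement: bytes) -> urllib.request.Request:
    """Build an authenticated GET for one component's /varz endpoint.

    Assumes the announcement JSON has "host" ("ip:port") and
    "credentials" ([user, password]); verify against your CF release.
    """
    info = json.loads(announcement)
    user, password = info["credentials"]
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    request = urllib.request.Request(f"http://{info['host']}/varz")
    request.add_header("Authorization", f"Basic {token}")
    return request

def fetch_varz(request: urllib.request.Request, timeout: float = 5.0) -> dict:
    """Fetch and decode the varz JSON; raises URLError/HTTPError on failure."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A component that fails to answer within the timeout is itself a useful data point: it is a candidate for the "already failed components" view.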
  • 10. Our metrics dashboard provides…
      1. Views of component health
      2. Resource usage details
      3. Ongoing growth trends
      4. Access to logs and raw varz
      5. Email notifications
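Email notifications can be assembled with the standard library's `email.message.EmailMessage`. The sketch below is an assumption about one reasonable shape, not the dashboard's implementation: `build_alert`, the subject prefix, and the `(component, detail)` finding tuples are all hypothetical.

```python
from email.message import EmailMessage

def build_alert(findings, sender, recipients):
    """Build one digest email from (component, detail) findings.

    Returns None when there is nothing to report, so callers can skip sending.
    """
    if not findings:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"[CF metrics] {len(findings)} component(s) need attention"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    lines = [f"- {component}: {detail}" for component, detail in findings]
    msg.set_content("\n".join(lines) + "\n")
    return msg
```

Sending is then a matter of `smtplib.SMTP(host).send_message(msg)` against your relay. Batching findings into one digest per run keeps a bad hour from flooding the admins' inboxes.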
  • 11. It helps us find (and fix) problems:
      • Components nearing capacity or failure
      • Already failed components
      • Out-of-control apps and noisy users
      It helps us see patterns:
      • Active/inactive users and apps
      • Growth trends and runtime/service adoption
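One plausible signal for "out-of-control apps and noisy users" is crash frequency within a recent window. This sketch assumes a hypothetical stream of `(timestamp, app, owner)` crash records and an arbitrary limit of 10 per hour; the talk does not say which signal or limits the dashboard actually uses.

```python
from collections import Counter
from datetime import datetime, timedelta

def find_noisy(events, now, window=timedelta(hours=1), limit=10):
    """Flag apps and owners with more than `limit` crashes in the last `window`.

    events: iterable of (timestamp, app, owner) crash records (hypothetical shape).
    Returns (apps, owners) as {name: crash_count} dicts.
    """
    since = now - window
    recent = [(app, owner) for ts, app, owner in events if ts >= since]
    by_app = Counter(app for app, _ in recent)
    by_owner = Counter(owner for _, owner in recent)
    hot_apps = {app: n for app, n in by_app.items() if n > limit}
    hot_owners = {owner: n for owner, n in by_owner.items() if n > limit}
    return hot_apps, hot_owners
```

Counting per owner as well as per app catches a user whose many apps each crash only occasionally.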
  • 12. User and app trends. There is also one unauthenticated page for high-level stats.
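User and app growth trends reduce to cumulative counts per period. A minimal sketch, assuming creation dates have already been read from the CCDB (the query is omitted because it depends on your schema); `monthly_growth` is a hypothetical helper, and months with no new users or apps are simply omitted.

```python
from collections import Counter
from datetime import date

def monthly_growth(created_dates):
    """Cumulative totals per month from creation dates, oldest month first.

    created_dates: iterable of datetime.date, e.g. app or user creation dates.
    Returns a list of ("YYYY-MM", running_total) pairs.
    """
    per_month = Counter((d.year, d.month) for d in created_dates)
    running, totals = 0, []
    for year, month in sorted(per_month):
        running += per_month[(year, month)]
        totals.append((f"{year}-{month:02d}", running))
    return totals
```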
  • 24. Part 3: Finding and acting on better data
  • 25. Let’s resolve gaps in data captured from NG
      • NG provides granular user/org/space views…
          • This enables better BSS potential in terms of QoS and departmental billing
      • …but we lost user and app data linkages from the health manager
          • Can’t see which DEA my app resides on (not currently enabled in our NG version)
          • Can’t see how many apps a user has (replaced by orgs and spaces, but still valuable to trace)
          • See https://github.com/cloudfoundry/cloud_controller_ng/issues/81
      • We’d like to restore that data by surfacing it either:
          • in varz endpoints (dynamic data, preferred), or
          • in the CCDB (static data, could be a security concern)
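Once both halves of the data are surfaced, restoring the user-to-app-to-DEA linkage described above is a join. This sketch assumes two hypothetical inputs: an app-to-owner map (in NG it would have to be derived from orgs, spaces and roles) and an app-to-DEA placement map (for example, from DEA varz). Neither shape comes from Cloud Foundry itself.

```python
from collections import defaultdict

def link_users_to_deas(app_owners, app_placements):
    """Rebuild a user -> app -> DEA view from two separate sources.

    app_owners: {app_guid: user} (hypothetical)
    app_placements: {app_guid: [dea_name, ...]} (hypothetical)
    Returns {user: {app_guid: sorted DEA names}}; unplaced apps map to [].
    """
    view = defaultdict(dict)
    for app, user in app_owners.items():
        view[user][app] = sorted(set(app_placements.get(app, [])))
    return dict(view)

def apps_per_user(view):
    """How many apps each user has, the count NG no longer exposes directly."""
    return {user: len(apps) for user, apps in view.items()}
```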
  • 26. Let’s begin to link metrics to automation
      • Detect errors in applications that are traceable to users/orgs
          • Preemptively reach out to them to see if they need help
          • Think customer service and proactive support!
          • Can we hook into BOSH or Jenkins for automation?
      • Automate (and expand links to the IaaS and SaaS stacks)
          • Self-healing systems (out of disk, move apps)
          • Self-scaling systems (detect when nearing thresholds)
          • Evolving topologies (replace unused service nodes with popular ones)
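Self-scaling starts with detecting when a resource is nearing a threshold before it gets there. A minimal sketch, assuming daily samples of a single metric (such as the pool-wide memory-reserved ratio) and a straight-line trend fitted by least squares; a production trigger would want a more robust model and would hand its result to whatever automation (BOSH, Jenkins) is chosen. `days_until_threshold` is a hypothetical name.

```python
def days_until_threshold(samples, threshold):
    """Estimate days until a metric reaches `threshold` via a least-squares line.

    samples: list of (day, value) pairs.
    Returns None with fewer than two distinct days or a flat/falling trend,
    and 0.0 if the fitted line already meets the threshold at the latest day.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        return None
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    last_day = max(x for x, _ in samples)
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - last_day)
```

With samples rising 0.05 per day from 0.50 on day 0 to 0.65 on day 3, the fitted line reaches 0.85 on day 7, so the estimate is about 4 days of headroom.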
  • 27. Let’s expand the broadcast of metrics to more users
      • Admins are the primary beneficiaries right now
          • But the data is almost completely read-only
          • Should we provide UAA-based tiers of access to admins?
      • Others can and should benefit:
          • Customers
          • End users
          • Developers
          • Management
          • Executives, line-of-business owners
          • Finance
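UAA-based tiers could map the scopes in a user's access token to dashboard views. UAA access tokens carry granted scopes in a `scope` claim, and `cloud_controller.admin` is a standard Cloud Foundry scope; the `metrics.*` scopes and the view names below are hypothetical. The sketch takes claims from a token whose signature has already been verified, since trusting an unverified token would defeat the purpose.

```python
# Hypothetical dashboard views, keyed by the UAA scope that unlocks them.
TIERS = {
    "cloud_controller.admin": {"component_health", "raw_varz", "logs", "usage", "trends"},
    "metrics.developer": {"usage", "trends"},  # hypothetical custom scope
    "metrics.finance": {"trends"},             # hypothetical custom scope
}
# Views anyone may see, e.g. an unauthenticated high-level stats page.
PUBLIC_VIEWS = {"summary"}

def allowed_views(claims):
    """Union of the views granted by a token's scopes.

    claims: decoded payload of an access token that has ALREADY been
    signature-verified; scopes the dashboard doesn't know are ignored.
    """
    views = set(PUBLIC_VIEWS)
    for scope in claims.get("scope", []):
        views |= TIERS.get(scope, set())
    return views
```

Keeping the tiers in one table makes it easy to add audiences such as management or line-of-business owners without touching the access check.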
  • 29. The metrics dashboard innovators: Chris Peters, Russell Boykin, Doug Davis, Wei Feng
  • 30. We’re hiring! Search Jobs at IBM by: SmartCloud Application Services