Monitoring Node.js Microservices on CloudFoundry with
Open Source Tools and a Shoestring Budget
Tony Erwin, aerwin@us.ibm.com
Agenda
• Introduction to Bluemix UI & Architecture
• Importance of Monitoring w/ Microservices
• Overview of Monitoring Architecture
• Using Monitoring Data
• Building Your Own Monitoring System
• Synthetic Measurements
Bluemix UI
• Front-end to IBM’s open cloud Bluemix offering
• Lets users view and manage CF resources, containers, virtual servers, user accounts, billing/usage, etc.
• Runs on top of Bluemix PaaS Layer (Cloud Foundry)
[Screenshots: Dashboard, Catalog, Resource Details, and more]
Bluemix UI Architecture
• Migrated from a monolithic to a microservice architecture over the last couple of years
• Composed of 25+ Node.js apps deployed to Cloud Foundry
• See talk from earlier this week for more details
– To Kill a Monolith: Slaying the Demons of a Monolith with Node.js Microservices on CloudFoundry
[Architecture diagram labels: Bluemix UI (Client); Proxy; Home, Catalog, …, Dashboard, Pricing, Orgs/Spaces; Common Monitoring Framework; Session Store; NoSQL DB; Bluemix PaaS; Cloud Foundry; Backend APIs (CF, Containers, VMs, BSS, MCCP, etc.)]
Importance of Monitoring
• Root cause analysis when a problem occurs
– Bluemix UI is the most visible part of the platform and acts as a “canary in the mine shaft” for the whole platform
– When a critical event or outage occurs, it often starts with reports like:
• “Can’t login to console”
• “Console doesn’t work…”
• “Console is slow…”
– When this happens in the middle of the night, my team is regularly the first to get paged via PagerDuty
• Being able to quickly find root cause is a matter of self-preservation
– Console behavior is often (but not always!) a symptom of something going on elsewhere (like CF is having problems, networking is down, etc.)
• Auto-detection of problems
– Ideally, we want to find and fix problems before a user hits them
– Example: Send a PagerDuty alert when error rates for a given API go above a threshold
• Tracking against performance and quality targets
– Can’t meet goals for something you can’t measure over time
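The auto-detection example (alert when an API's error rate goes above a threshold) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Bluemix UI team's code: the `ErrorRateWindow` class, its option names, and the rule that HTTP 5xx counts as an error are all hypothetical, and the alert is represented by a boolean rather than a real PagerDuty call.

```javascript
// Hypothetical sliding-window error-rate check (all names are assumptions).
class ErrorRateWindow {
  constructor({ windowMs, threshold, minRequests }) {
    this.windowMs = windowMs;       // how far back to look, in milliseconds
    this.threshold = threshold;     // error rate (0..1) above which to alert
    this.minRequests = minRequests; // avoid alerting on very small samples
    this.samples = [];              // entries of the form { time, isError }
  }

  // Record one request outcome; treating 5xx as an error is an assumption.
  record(statusCode, time = Date.now()) {
    this.samples.push({ time, isError: statusCode >= 500 });
  }

  // Drop samples older than the window, then compute the current error rate.
  errorRate(now = Date.now()) {
    this.samples = this.samples.filter((s) => now - s.time <= this.windowMs);
    if (this.samples.length === 0) return 0;
    const errors = this.samples.filter((s) => s.isError).length;
    return errors / this.samples.length;
  }

  // True when enough requests were seen and the rate exceeds the threshold.
  shouldAlert(now = Date.now()) {
    const rate = this.errorRate(now);
    return this.samples.length >= this.minRequests && rate > this.threshold;
  }
}
```

A caller would feed each observed response code into `record()` and check `shouldAlert()` on a timer; what happens on `true` (PagerDuty, Slack, etc.) is left out of the sketch.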
What to Monitor?
• Metrics we were especially interested in:
– Data for every inbound/outbound request for every microservice
• Response time
• HTTP response code
• Etc.
– Memory usage, CPU usage, uptime, and crashes for every instance of every microservice
– General health of ourselves and dependencies
Monitoring Architecture
[Monitoring architecture diagram labels: Bluemix UI (Client); Proxy; App 1 … App N; MQTT; Monitor Storage; InfluxDB; Monitor Alerts; PagerDuty, Slack, etc.; Space Scanner; Cloud Foundry; Backend APIs (CF, Containers, VMs, BSS, MCCP, etc.)]
Monitoring Components
• Each microservice is bound to an MQTT service (which happens to be provided by the IBM Internet of Things service)
• Each microservice adds middleware (private npm module) that publishes inbound / outbound request data to MQTT in a “fire and forget” manner
– Also supports a general “publish” function to send arbitrary metrics to MQTT (e.g., overall system health, number of times we retrieve JSON from Redis cache instead of API, etc.)
• Storage microservice:
– Subscribes to the same queue, does some massaging of the data (such as tagging with URL “category”), and writes to InfluxDB
• Alerts microservice:
– Subscribes to the same queue, aggregates the inputs over the last X minutes, and sends alerts (like Slack, PagerDuty, etc.)
• Scanner microservice:
– Calls CF APIs every 60 seconds to get data for each app instance on memory usage, CPU usage, uptime, and crashes
– Publishes the data to MQTT
• Grafana dashboards display data from data series in InfluxDB
• A separate details app pulls data from InfluxDB to complement Grafana:
– Shows details of all of the requests in tabular format
– Provides capabilities to make special queries against the InfluxDB data
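The “fire and forget” request middleware described above might look something like the sketch below. The real middleware is a private npm module, so everything here is an assumption: the function name `createRequestMonitor`, the `requests` topic, the metric field names, and the injected `publish(topic, payload)` function, which stands in for whatever MQTT client call is actually used.

```javascript
// Hypothetical Express/Connect-style middleware. `publish` is injected so the
// sketch does not depend on any particular MQTT client library.
function createRequestMonitor(publish, serviceName) {
  return function requestMonitor(req, res, next) {
    const start = process.hrtime.bigint();

    // 'finish' is emitted by Node's http.ServerResponse once the response
    // has been handed off for sending.
    res.once('finish', () => {
      const durationMs = Number(process.hrtime.bigint() - start) / 1e6;
      const metric = {
        service: serviceName,
        direction: 'inbound',
        method: req.method,
        url: req.url,
        statusCode: res.statusCode,
        durationMs,
        timestamp: Date.now(),
      };
      try {
        // Fire and forget: nothing waits on the result, and a failed publish
        // (sync throw or rejected promise) never affects the request.
        Promise.resolve(publish('requests', JSON.stringify(metric))).catch(() => {});
      } catch (err) {
        // Deliberately ignored; monitoring must not break the service.
      }
    });

    next();
  };
}
```

Publishing only after the response finishes keeps the monitoring work off the request's critical path, which is one way to read the “fire and forget” requirement.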
Using Monitoring Data
Grafana Dashboards
• Grafana dashboards used to visualize data over time for any microservice
• Data includes:
– Total requests
– Response time (mean, median, 90th percentile)
– Error rate
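To make the dashboard statistics concrete, here is a small sketch of how the same numbers could be computed from raw request records. It is not how Grafana or InfluxDB compute them; the record shape (`durationMs`, `statusCode`), the nearest-rank percentile method, and the 5xx-as-error rule are assumptions chosen for illustration.

```javascript
// Nearest-rank percentile over an ascending-sorted array (p in 1..100).
function percentile(sorted, p) {
  if (sorted.length === 0) return NaN;
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Summarize request records of the assumed form { durationMs, statusCode }.
function summarize(requests) {
  const times = requests.map((r) => r.durationMs).sort((a, b) => a - b);
  const total = times.length;
  const mean = total === 0 ? NaN : times.reduce((sum, t) => sum + t, 0) / total;
  const errors = requests.filter((r) => r.statusCode >= 500).length;
  return {
    total,
    mean,
    median: percentile(times, 50), // lower middle value for even counts
    p90: percentile(times, 90),
    errorRate: total === 0 ? 0 : errors / total,
  };
}
```

The 90th percentile is often more telling than the mean here, because a handful of very slow requests can hide behind an acceptable average.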
Identifying a Problem in Grafana
• Like a cardiologist reading an echocardiogram, we’ve gotten good at identifying anomalies in these charts
• Data to left shows a recent “outage” where error rates and response times spiked for a period of time
Root Cause Analysis
• We can dive into more detailed data to do root cause analysis
• In the chart below, response time is broken down by “category” (e.g., CF, UAA, Containers, etc.)
• We can see timeouts in a large number of components, indicating a broader systemic issue
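A per-category breakdown like this depends on each request being tagged with a category first. The sketch below shows one way that tagging and grouping could work; the URL patterns, the `Other` fallback, and the function names are hypothetical and would need to match the real backend APIs.

```javascript
// Hypothetical URL-to-category rules; real patterns depend on the APIs in use.
const CATEGORY_RULES = [
  { category: 'UAA', pattern: /\/oauth\/|\/uaa\// },
  { category: 'CF', pattern: /\/v2\/(apps|spaces|organizations)/ },
  { category: 'Containers', pattern: /\/containers\// },
];

// Return the first matching category, or 'Other' when no rule matches.
function categorize(url) {
  const rule = CATEGORY_RULES.find((r) => r.pattern.test(url));
  return rule ? rule.category : 'Other';
}

// Group records of the assumed form { url, durationMs } by category and
// return the mean response time for each category.
function meanResponseTimeByCategory(requests) {
  const buckets = new Map();
  for (const { url, durationMs } of requests) {
    const key = categorize(url);
    const bucket = buckets.get(key) || { sum: 0, count: 0 };
    bucket.sum += durationMs;
    bucket.count += 1;
    buckets.set(key, bucket);
  }
  const result = {};
  for (const [key, { sum, count }] of buckets) result[key] = sum / count;
  return result;
}
```

If response times rise in every category at once, that points away from any single backend and toward something shared, which matches the "broader systemic issue" reading above.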
Details View
• Can drill down and get tabular view with aggregated details about the requests making up a chart
• Can drill down again to see list of individual requests (with timestamps) as well as get more
detailed statistics on individual URLs
Wall of Shame
• Building on the details view from the previous page, we can build walls of “shame” to help drive improvements
– Show the 10 slowest API calls made to/from a specific microservice that have been called at least 1000 times during the last 24 hours
– Show the top 10 requests with the most error responses that are invoked at least X times over an arbitrary time period
– Etc.
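The first “wall of shame” query could be approximated in plain JavaScript as below. In the talk this data comes from InfluxDB; the sketch instead assumes an in-memory array of records of the form `{ url, durationMs }` that has already been limited to the time period of interest, and the function and option names are hypothetical.

```javascript
// Return the slowest URLs by mean response time, ignoring rarely called ones.
// Assumes `requests` is already restricted to the desired time range.
function slowestCalls(requests, { minCalls = 1000, limit = 10 } = {}) {
  const byUrl = new Map();
  for (const { url, durationMs } of requests) {
    const entry = byUrl.get(url) || { url, calls: 0, totalMs: 0 };
    entry.calls += 1;
    entry.totalMs += durationMs;
    byUrl.set(url, entry);
  }
  return [...byUrl.values()]
    .filter((e) => e.calls >= minCalls)
    .map((e) => ({ url: e.url, calls: e.calls, meanMs: e.totalMs / e.calls }))
    .sort((a, b) => b.meanMs - a.meanMs) // slowest first
    .slice(0, limit);
}
```

The minimum-call filter matters: without it, a single slow one-off request would crowd out endpoints that are slow on every call.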
Memory, CPU Usage, Crashes
• Another important set of data includes memory, CPU usage, and crashes for all instances of all microservices
• Chart below shows a major CPU usage issue we found in a dev system, which we were able to fix before it reached production
Building Your Own Monitoring System
Node Application Metrics (appmetrics)
• Had planned on publishing some of my monitoring code, but in prep for CF Summit learned of the appmetrics project being driven by some fellow IBMers
• Shares much in common with the middleware I mentioned earlier that publishes metrics to MQTT, but goes even deeper to provide additional performance insights
• Fully open source
– https://github.com/RuntimeTools/appmetrics
• Proves yet again that IBM is a big place :)
Default Capabilities and MQTT
• Sends data to MQTT, meaning you can subscribe to updates
• Provides an Event API which allows:
– custom triggers based on the monitoring data
– publication of custom events
• This would be enough to support other pieces of the Bluemix UI monitoring system (like the storage service or the alerts service)
App Metrics – Default Capabilities
Data Storage
• Can be configured to store data in:
– Elasticsearch
• https://github.com/RuntimeTools/appmetrics-elk
– StatsD
• https://github.com/RuntimeTools/appmetrics-statsd
• No support for InfluxDB yet, but I’ve suggested to the team they should add it
Collecting Synthetic Data
• Monitoring discussed so far only paints a picture of the server side
• It’s also important to get a perspective from the client
• Continuously run scripts that leverage Sitespeed.io (https://www.sitespeed.io/) to load the major pages of the product
• Collects data such as perf score, first visual change, speed index, etc. and stores it in Graphite
– Grafana dashboards built to allow us to visualize the data
– Scripts can run from multiple geo locations
The End
Questions?
Tony Erwin
Email: aerwin@us.ibm.com
Twitter: @tonyerwin
See also presentation from earlier this week:
To Kill a Monolith: Slaying the Demons of a Monolith
with Node.js Microservices on CloudFoundry
(http://sched.co/AJmh)

Monitoring Node.js Microservices on CloudFoundry with Open Source Tools and a Shoestring Budget

  • 1. Monitoring Node.js Microservices on CloudFoundry with Open Source Tools and a Shoestring Budget Tony Erwin, aerwin@us.ibm.com
  • 2. Agenda • Introduction to Bluemix UI & Architecture • Importance of Monitoring w/ Microservices • Overview of Monitoring Architecture • Using Monitoring Data • Building Your Own Monitoring System • Synthetic Measurements
  • 3. Bluemix UI • Front-end to IBM’s open cloud Bluemix offering • Lets users view and manage CF resources, containers, virtual servers, user accounts, billing/usage, etc • Runs on top of Bluemix PaaS Layer (Cloud Foundry) Dashboard Catalog Resource Details And More!
  • 4. Bluemix UI Architecture • Migrated from a monolithic to a microservice architecture over the last couple of years • Composed of 25+ Node.js apps deployed to Cloud Foundry • See talk from earlier this week for more details – To Kill a Monolith: Slaying the Demons of a Monolith with Node.js Microservices on CloudFoundry Home Catalog … DashboardPricing Orgs/ Spaces Backend APIs (CF, Containers, VMs, BSS, MCCP, etc.) Bluemix UI (Client) Bluemix PaaS Proxy Common Monitoring Framework Session Store NoSQL DB Cloud Foundry
  • 6. Importance of Monitoring • Root cause analysis when a problem occurs – Bluemix UI is most visible part of the platform and acts as a “canary in the mine shaft” for the whole platform – When a critical event or outage occurs, it often starts with reports like: • “Can’t login to console” • “Console doesn’t work…” • “Console is slow…” – When this happens in the middle of the night, my team is regularly the first to get a PagerDuty • Being able to quickly find root cause is a matter of self-preservation – Console behavior is often (but not always!) a symptom of something going on elsewhere (like CF is having problems, networking is down, etc.) • Auto-detection of problems – Ideally, we want to find and fix problems before a user hits them – Example: Send a PagerDuty when error rates for a given API go above a threshold • Tracking against performance and quality targets – Can’t meet goals for something you can’t measure over time
  • 7. What to Monitor?
      • Metrics we were especially interested in:
          – Data for every inbound/outbound request for every microservice
              • Response time
              • HTTP response code
              • Etc.
          – Memory usage, CPU usage, uptime, and crashes for every instance of every microservice
          – General health of ourselves and dependencies
  • 9. Monitoring Architecture
      • [Architecture diagram labels: Bluemix UI (Client); Proxy; App 1 … App N; MQTT; Monitor Storage; Monitor Alerts; Space Scanner; InfluxDB; PagerDuty, Slack, etc.; Cloud Foundry; Backend APIs (CF, Containers, VMs, BSS, MCCP, etc.)]
  • 10. Monitoring Components
      • Each microservice is bound to an MQTT service (which happens to be provided by the IBM Internet of Things service)
      • Each microservice adds middleware (a private npm module) that publishes inbound/outbound request data to MQTT in a “fire and forget” manner
          – Also supports a general “publish” function to send arbitrary metrics to MQTT (e.g., overall system health, number of times we retrieve JSON from Redis cache instead of API, etc.)
      • Storage microservice:
          – Subscribes to the same queue, does some massaging of the data (such as tagging with URL “category”), and writes to InfluxDB
      • Alerts microservice:
          – Subscribes to the same queue, aggregates the inputs over the last X minutes, and sends alerts (via Slack, PagerDuty, etc.)
      • Scanner microservice:
          – Calls CF APIs every 60 seconds to get data for each app instance on memory usage, CPU usage, uptime, and crashes
          – Publishes the data to MQTT
      • Grafana dashboards display data from data series in InfluxDB
      • A Details app is deployed that pulls data from InfluxDB to complement Grafana:
          – Shows details of all of the requests in tabular format
          – Provides capabilities to make special queries against the InfluxDB data
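The request-publishing middleware described above is a private module, so the following is only a sketch of what an Express-style version might look like. The function name `requestMetrics`, the topic `metrics/requests`, and the metric field names are hypothetical; `publish` is an injected function standing in for an MQTT client's publish call, which keeps the example runnable without a broker.

```javascript
// Hypothetical Express-style middleware (names are illustrative only).
// `publish(topic, payload)` stands in for an MQTT client's publish call.
function requestMetrics(serviceName, publish) {
  return function (req, res, next) {
    const start = process.hrtime();
    res.on('finish', () => {
      const [sec, nano] = process.hrtime(start);
      const metric = {
        service: serviceName,
        direction: 'inbound',
        method: req.method,
        url: req.url,
        statusCode: res.statusCode,
        responseTimeMs: sec * 1000 + nano / 1e6,
      };
      // Fire and forget: never wait on the publish, and never let a
      // monitoring failure affect the request being served.
      try {
        Promise.resolve(publish('metrics/requests', JSON.stringify(metric)))
          .catch(() => {});
      } catch (err) {
        // Swallow synchronous publish errors as well.
      }
    });
    next();
  };
}

// Usage with fake request/response objects instead of a live server.
const published = [];
const middleware = requestMetrics('catalog', (topic, payload) => {
  published.push({ topic, payload: JSON.parse(payload) });
});
const listeners = {};
const fakeReq = { method: 'GET', url: '/api/v1/apps' };
const fakeRes = {
  statusCode: 200,
  on: (event, fn) => { listeners[event] = fn; },
};
let nextCalled = false;
middleware(fakeReq, fakeRes, () => { nextCalled = true; });
listeners.finish(); // simulate the response having been sent
```

Measuring on the response's `finish` event means the recorded time covers the whole handler chain, and calling `next()` immediately keeps the middleware out of the request's critical path.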
  • 12. Grafana Dashboards
      • Grafana dashboards used to visualize data over time for any microservice
      • Data includes:
          – Total requests
          – Response time (mean, median, 90th percentile)
          – Error rate
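In the system described here these statistics come from queries against InfluxDB, but the arithmetic behind them can be sketched in a few lines. The function names, the record fields (`responseTimeMs`, `statusCode`), the nearest-rank percentile method, and the 5xx-only definition of an error are assumptions for illustration.

```javascript
// Nearest-rank percentile over an ascending-sorted array of numbers.
function percentile(sorted, p) {
  if (sorted.length === 0) return NaN;
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Summarize a batch of request records (hypothetical record shape).
function summarize(requests) {
  const times = requests.map((r) => r.responseTimeMs).sort((a, b) => a - b);
  const total = requests.length;
  const errors = requests.filter((r) => r.statusCode >= 500).length;
  const sum = times.reduce((acc, t) => acc + t, 0);
  return {
    total,
    mean: total ? sum / total : NaN,
    median: percentile(times, 50),
    p90: percentile(times, 90),
    errorRate: total ? errors / total : NaN,
  };
}

// Ten requests taking 10, 20, ..., 100 ms; the slowest one fails.
const sample = Array.from({ length: 10 }, (_, i) => ({
  responseTimeMs: (i + 1) * 10,
  statusCode: i === 9 ? 500 : 200,
}));
const summary = summarize(sample);
```

With the nearest-rank method the median of an even-sized batch is the lower of the two middle values; other percentile definitions interpolate and would give slightly different numbers.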
  • 13. Identifying a Problem in Grafana
      • Like a cardiologist reading an echocardiogram, we’ve gotten good at identifying anomalies in these charts
      • The chart to the left shows a recent “outage” where error rates and response times spiked for a period of time
  • 14. Root Cause Analysis
      • We can dive into more detailed data to do root cause analysis
      • In the chart below, response time is broken down by “category” (e.g., CF, UAA, Containers, etc.)
      • We can see timeouts in a large number of components, indicating a broader systemic issue
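The per-category breakdown depends on each request being tagged with a category before it is stored. The talk does not give the actual mapping, so the rules below are invented for illustration, as are the example hostnames and the function names; the sketch shows one plausible way to tag URLs and then average response time per category.

```javascript
// Hypothetical URL-to-category rules; the real mapping is not given in the talk.
const CATEGORY_RULES = [
  { category: 'UAA', pattern: /^https?:\/\/(uaa|login)\./ },
  { category: 'CF', pattern: /^https?:\/\/api\.[^/]+\/v2\// },
  { category: 'Containers', pattern: /^https?:\/\/containers-api\./ },
];

// Return the first matching category, or 'Other' if no rule matches.
function categorize(url) {
  const rule = CATEGORY_RULES.find((r) => r.pattern.test(url));
  return rule ? rule.category : 'Other';
}

// Group request records by category and compute the mean response time.
function meanResponseTimeByCategory(requests) {
  const buckets = new Map();
  for (const { url, responseTimeMs } of requests) {
    const category = categorize(url);
    const bucket = buckets.get(category) || { count: 0, totalMs: 0 };
    bucket.count += 1;
    bucket.totalMs += responseTimeMs;
    buckets.set(category, bucket);
  }
  const result = {};
  for (const [category, { count, totalMs }] of buckets) {
    result[category] = totalMs / count;
  }
  return result;
}

// Usage with made-up outbound requests.
const breakdown = meanResponseTimeByCategory([
  { url: 'https://api.example.com/v2/apps', responseTimeMs: 100 },
  { url: 'https://api.example.com/v2/spaces', responseTimeMs: 300 },
  { url: 'https://uaa.example.com/oauth/token', responseTimeMs: 50 },
  { url: 'https://billing.example.com/usage', responseTimeMs: 20 },
]);
```

If every category slows down at once, as in the outage described above, the cause is more likely shared infrastructure than any single backend.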
  • 15. Details View
      • Can drill down and get a tabular view with aggregated details about the requests making up a chart
      • Can drill down again to see a list of individual requests (with timestamps) as well as get more detailed statistics on individual URLs
  • 16. Wall of Shame
      • Building on the details view from the previous page, we can build walls of “shame” to help drive improvements
          – Show the 10 slowest API calls made to/from a specific microservice that have been called at least 1000 times during the last 24 hours
          – Show the top 10 requests with the most error responses that are invoked at least X times over an arbitrary time period
          – Etc.
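The first “wall of shame” query can be sketched over in-memory request records. In the real system this would be a query against InfluxDB restricted to the last 24 hours; the function name, the record fields, and the use of mean response time as the ranking metric are assumptions for illustration.

```javascript
// Rank URLs by mean response time, keeping only those called often enough.
// Assumes `requests` has already been limited to the time range of interest.
function slowestCalls(requests, { minCalls, limit }) {
  const byUrl = new Map();
  for (const { url, responseTimeMs } of requests) {
    const entry = byUrl.get(url) || { url, calls: 0, totalMs: 0 };
    entry.calls += 1;
    entry.totalMs += responseTimeMs;
    byUrl.set(url, entry);
  }
  return [...byUrl.values()]
    .filter((e) => e.calls >= minCalls)
    .map((e) => ({ url: e.url, calls: e.calls, meanMs: e.totalMs / e.calls }))
    .sort((a, b) => b.meanMs - a.meanMs)
    .slice(0, limit);
}

// Usage with small numbers; the slide's example would use
// { minCalls: 1000, limit: 10 }.
const wall = slowestCalls(
  [
    { url: '/a', responseTimeMs: 100 },
    { url: '/a', responseTimeMs: 300 },
    { url: '/b', responseTimeMs: 50 },
    { url: '/b', responseTimeMs: 70 },
    { url: '/c', responseTimeMs: 900 }, // slow, but called only once
    { url: '/d', responseTimeMs: 10 },
    { url: '/d', responseTimeMs: 10 },
  ],
  { minCalls: 2, limit: 2 }
);
```

The minimum-call filter is what keeps a rarely used but slow endpoint such as `/c` from dominating the list.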
  • 17. Memory, CPU Usage, Crashes
      • Another important set of data includes memory, CPU usage, and crashes for all instances of all microservices
      • The chart below shows a major CPU usage issue we found in a dev system, so we were able to fix it before it found its way to production
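A scanner that polls per-instance stats needs some way to turn a snapshot into findings. The sketch below assumes a snapshot keyed by instance index, with a `state` field and a `stats.usage.cpu` fraction for running instances; that shape, the function name, and the threshold are assumptions for illustration rather than a verified Cloud Foundry API contract.

```javascript
// Hypothetical per-instance check. The snapshot shape (state, stats.usage.cpu)
// is an assumption for illustration, not a verified CF API contract.
function scanInstances(appName, snapshot, cpuThreshold) {
  const hot = [];  // running instances above the CPU threshold
  const down = []; // instances in any state other than RUNNING
  for (const [index, instance] of Object.entries(snapshot)) {
    if (instance.state !== 'RUNNING') {
      down.push({ app: appName, index: Number(index), state: instance.state });
      continue;
    }
    const cpu = instance.stats.usage.cpu;
    if (cpu > cpuThreshold) {
      hot.push({ app: appName, index: Number(index), cpu });
    }
  }
  return { hot, down };
}

// Usage with a made-up snapshot for a three-instance app.
const report = scanInstances(
  'dashboard',
  {
    0: { state: 'RUNNING', stats: { usage: { cpu: 0.12 } } },
    1: { state: 'RUNNING', stats: { usage: { cpu: 0.97 } } },
    2: { state: 'CRASHED' },
  },
  0.9
);
```

In a scanner like the one described earlier, a function of this kind would run on a timer (every 60 seconds in the talk) and its results would be published to MQTT alongside the request metrics.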
  • 18. Building Your Own Monitoring System
  • 19. Node Application Metrics (appmetrics)
      • Had planned on publishing some of my monitoring code, but while preparing for CF Summit I learned of the appmetrics project being driven by some fellow IBMers
      • Shares much in common with the middleware I mentioned earlier that publishes metrics to MQTT, but goes even deeper to provide additional performance insights
      • Fully open source
          – https://github.com/RuntimeTools/appmetrics
      • Proves yet again that IBM is a big place :)
  • 20. Default Capabilities and MQTT
      • Sends data to MQTT, meaning you can subscribe to updates
      • Provides an Event API which allows:
          – custom triggers based on the monitoring data
          – publication of custom events
      • This would be enough to support other pieces of the Bluemix UI monitoring system (like the storage service or the alerts service)
  • 21. App Metrics – Default Capabilities
  • 22. Data Storage
      • Can be configured to store data in:
          – Elasticsearch
              • https://github.com/RuntimeTools/appmetrics-elk
          – StatsD
              • https://github.com/RuntimeTools/appmetrics-statsd
      • No support for InfluxDB yet, but I’ve suggested to the team that they add it
  • 24. Collecting Synthetic Data
      • Monitoring discussed so far only paints a picture of the server side
      • It’s also important to get a perspective from the client
      • Continuously run scripts that leverage Sitespeed.io (https://www.sitespeed.io/) to load the major pages of the product
      • Collects data such as perf score, first visual change, speed index, etc. and stores it in Graphite
          – Grafana dashboards built to allow us to visualize the data
          – Scripts can be run from multiple geographic locations
  • 25. The End
      • Questions?
      • Tony Erwin
          – Email: aerwin@us.ibm.com
          – Twitter: @tonyerwin
      • See also presentation from earlier this week: To Kill a Monolith: Slaying the Demons of a Monolith with Node.js Microservices on CloudFoundry (http://sched.co/AJmh)