Public and private cloud infrastructures promise to make fully dynamic infrastructure a reality – compute instances can be provisioned and terminated at a moment's notice, all in response to customer demand. Though “auto-scaling” was once held up as the pinnacle of infrastructure automation, it is now considered table stakes. And while this has relieved certain operational burdens (developers can now have access to “on-demand” compute!), it has also created new challenges.
3. About Sensu
● Open core monitoring framework, released in 2011
● Enterprise offering launched in 2015
● Sensu Inc formed in January 2017
● 20 employees & growing!
4. What is Sensu?
● An open source, cloud native monitoring framework
● The monitoring router
● Infrastructure, service, and application monitoring
● Designed for automation
● Cross-platform (Linux, Windows, BSD, AIX, Solaris, macOS, etc.)
● Learn more: https://sensuapp.org
11. Problem Statement
● Cloud platforms and automation systems cause changes in infrastructure that increase the complexity of monitoring
● New systems/endpoints must be discovered and monitored automatically
● Monitoring must now distinguish the subtle differences between "down" and "decommissioned"
12. Expectations
Our infrastructure is becoming increasingly automated and ephemeral.
Shouldn't we expect similar capabilities from our monitoring?
21. Cloud Native Monitoring Requirements
New systems should be monitored automatically.
22. Automated Monitoring
Cloud concepts
● Almost all infrastructures are distributed systems
● Disparate systems fulfill unique roles (e.g. db, web service)
● Simple architectures = one or more roles per system
● Complex architectures = one role per system
23. Automated Monitoring
Cloud monitoring anti-patterns
● Monitoring configuration mapped to individual systems
● Monitoring via remote access (e.g. SSH, WinRM, NRPE)
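The alternative to these anti-patterns can be sketched as role-based configuration: checks are attached to roles, and each new system announces its own roles when it comes up. The following is a minimal illustration only, not Sensu's API; `CHECKS_BY_ROLE`, `Registry`, and the check names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical check definitions, keyed by role rather than by hostname.
CHECKS_BY_ROLE = {
    "web": ["check_http", "check_tls_expiry"],
    "db": ["check_replication_lag", "check_disk"],
}


@dataclass
class Registry:
    """Toy in-memory registry of systems and the roles they declared."""

    systems: dict = field(default_factory=dict)

    def register(self, name, roles):
        # Called by the new system itself at boot (push model),
        # so no per-host config or remote access is needed.
        self.systems[name] = set(roles)

    def checks_for(self, name):
        # Resolve checks from the system's roles, de-duplicated and sorted.
        roles = self.systems[name]
        return sorted({c for r in roles for c in CHECKS_BY_ROLE.get(r, [])})


registry = Registry()
registry.register("i-0abc", ["web"])
registry.register("i-0def", ["web", "db"])
print(registry.checks_for("i-0abc"))
```

Because configuration hangs off the role, adding a hundred more "web" instances requires no change to the monitoring configuration.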
28. Cloud Native Monitoring Requirements
Terminated systems should be automatically removed from monitoring.
29. Automated Decommissioning
Cloud concepts
● Utility computing incentivizes cost savings
● Decommission systems when not in use, or during reduced load
● Intentional actions look very similar to failure scenarios
30. Automated Decommissioning
Cloud monitoring anti-patterns
● Making assumptions about the lack of monitoring data
● Making assumptions about the loss of network connectivity
● Using a monitoring system as a source of absolute truth
31. Automated Decommissioning
Cloud-native monitoring requirements
● Should be invoked by the terminated system (i.e. stop signal)
● May be triggered by the provisioning system (i.e. via APIs)
● Optionally verified via external source(s) of truth (as needed)
● Must be the most reliable function of the monitoring system
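One way to picture the "down" vs "decommissioned" distinction is a monitor that tracks explicit deregistrations separately from missing keepalives. This is a rough sketch under stated assumptions, not Sensu's implementation; the `Monitor` class, its method names, and the `exists_in_provider` callback (standing in for an external source of truth such as a cloud provider API) are all hypothetical.

```python
class Monitor:
    """Toy monitor that separates silence from intentional removal."""

    def __init__(self, timeout_s=60.0, exists_in_provider=None):
        self.timeout_s = timeout_s
        # Optional callable(name) -> bool; hypothetical external source of truth.
        self.exists_in_provider = exists_in_provider
        self.last_seen = {}
        self.decommissioned = set()

    def keepalive(self, name, now):
        # A system that reports in is, by definition, not decommissioned.
        self.last_seen[name] = now
        self.decommissioned.discard(name)

    def deregister(self, name):
        # Invoked by the system's own stop handler or by the provisioning system.
        self.last_seen.pop(name, None)
        self.decommissioned.add(name)

    def status(self, name, now):
        if name in self.decommissioned:
            return "decommissioned"
        if name not in self.last_seen:
            return "unknown"
        if now - self.last_seen[name] <= self.timeout_s:
            return "ok"
        # Silence alone is ambiguous, so verify before alerting.
        if self.exists_in_provider and not self.exists_in_provider(name):
            self.deregister(name)
            return "decommissioned"
        return "down"


monitor = Monitor(timeout_s=60, exists_in_provider=lambda n: n != "web-3")
for host in ("web-1", "web-2", "web-3"):
    monitor.keepalive(host, now=0)
monitor.deregister("web-2")  # clean shutdown announced itself
print({h: monitor.status(h, now=120) for h in ("web-1", "web-2", "web-3")})
```

Only `web-1` would page someone here: `web-2` deregistered itself on shutdown, and `web-3` went silent but the external source of truth confirms it no longer exists.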
32. When you can no longer trust your monitoring alerts.
34. Audience participation time!
Public/Private Cloud (IaaS)
Who knows what "the cloud" is?
Who understands basic cloud computing concepts like ASGs and ELBs?
Who is currently using an IaaS provider like AWS, GCP, Azure, or OpenStack?
Kubernetes
Who knows what Kubernetes is?
Who has Kubernetes on their roadmap?
Who is currently using Kubernetes?
39. Conclusion
● Cloud computing introduces challenges that demand cloud-native monitoring solutions.
● Monitoring solutions must automatically discover new systems.
● Monitoring configuration should be applied automatically.
● Monitoring should comprehend "down" vs "decommissioned".