OSMC 2019 | Automating the Configuration of Monitoring on Large Infrastructures by João Cavalheiro

Automating the Configuration of
Monitoring on Large Infrastructures
How monitoring of dynamic infrastructures at scale can be made easier with
Uyuni, Prometheus and Grafana
João Cavalheiro, Engineering Manager – jcavalheiro@suse.com
Johannes Renner, Engineering Manager – jrenner@suse.com

Managing IT Infrastructures is hard
● In most companies, the IT landscape is diverse and complex
● ...and nearly impossible to manage beyond a certain scale without
automation
● Modern application stacks are multi-modal: VMs and containers
spread across private and public clouds
● Different operating systems have different requirements
● Many companies require reporting and compliance
● Security is a concern

Enter Uyuni
Uyuni is an open-source solution for managing Linux infrastructure
● Can save you time and headaches when you have to manage and
update tens, hundreds or even thousands of machines
● Mass-deploy patches and packages based on software channels
● Consistent and repeatable provisioning and configuration of bare
metal, VMs and containers
● Automates configuration of monitoring with Prometheus and
Grafana

Origins: Spacewalk
● Open-source systems management solution
● Upstream for Red Hat Satellite 5, around since 2008
● Supported managing Fedora, CentOS and Debian
● Adopted by SUSE as upstream for SUSE Manager
● Satellite 6 was built on different technologies:
∙ Spacewalk entered maintenance mode
∙ Only bugfixes, no plans for the future
∙ Many patches pending to implement modernizations!

Uyuni
/uju:ˈni/
“Salar de Uyuni” is the world's largest salt flat*
Image: https://www.flickr.com/photos/madeleine_h/9468953452/
Attribution-ShareAlike 2.0 Generic (CC BY-SA 2.0)
* https://en.wikipedia.org/wiki/Salar_de_Uyuni

What is Salt?
● Open-source software for remote task execution and (descriptive)
configuration management
● Works on almost any platform - only Python is needed
● Typically requires an agent (minion) that connects to a master
● ZeroMQ used as default transport
● Event-driven architecture supporting automation
● Scalable, extensible and customizable

Uyuni: An Opinionated Fork of Spacewalk
● New backend based on Salt
● Modernized codebase (React.js, Python 3, JDK11)
● Content lifecycle management
● Container image building and Kubernetes integration
● Improved virtualization management
● Monitoring automation based on Prometheus & Grafana

Getting started with metrics
Main data source for alerting and visualization:
● Starting point for troubleshooting
∙ "Something looks wrong on this dashboard"
∙ Used as Service Level Indicators
● How available are we to the outside world?
∙ What are our customers experiencing?
Good metrics help to eliminate hypotheses before you investigate them.

About Prometheus
● Originally built at SoundCloud
● Has its own time-series database
● Data collection via pull model over HTTP
● Targets are set via static configuration or service discovery
● Metrics have a name, a set of labels, a timestamp and a value
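The data model in the last bullet can be sketched in a few lines of Python. This is an illustration only: the `Sample` class and the metric and label names are hypothetical and not part of any Prometheus library. It shows that a time series is identified by its metric name plus its label set, while the timestamp and value change from sample to sample.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Sample:
    """One observation of a metric (hypothetical illustration type)."""
    name: str
    labels: Dict[str, str]
    timestamp_ms: int
    value: float

    def series_id(self) -> Tuple[str, Tuple[Tuple[str, str], ...]]:
        # A time series is identified by the metric name plus its sorted label pairs.
        return (self.name, tuple(sorted(self.labels.items())))

    def to_text(self) -> str:
        # Render one line in the style of the Prometheus text exposition format.
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(self.labels.items()))
        return f"{self.name}{{{label_str}}} {self.value} {self.timestamp_ms}"


# Same metric name, different label values: two distinct time series.
a = Sample("http_requests_total", {"status": "500", "job": "api"}, 1573000000000, 3.0)
b = Sample("http_requests_total", {"status": "200", "job": "api"}, 1573000000000, 120.0)

print(a.to_text())
print(a.series_id() == b.series_id())
```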

Exposing Metrics
● Each application/system we want to monitor must expose metrics
● Instrumentation vs. exporters
When the metrics endpoint is embedded in an existing application it is
referred to as instrumentation.
● Extensive list of Prometheus exporters
∙ https://prometheus.io/docs/instrumenting/exporters/
∙ Node exporter is one of the most widely used
● Easy to build your own exporters
∙ You can monitor almost anything
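To give a feel for how small a custom exporter can be, here is a minimal sketch using only the Python standard library. It is not how any official exporter is implemented; the metric name `demo_uptime_seconds`, the port 8000 and the helper names are assumptions made for this example. A real exporter would more likely use an official Prometheus client library.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START_TIME = time.time()  # reference point for the demo uptime metric


def render_metrics() -> str:
    """Build the response body in the Prometheus text exposition format."""
    uptime = time.time() - START_TIME
    lines = [
        "# HELP demo_uptime_seconds Seconds since this exporter started (hypothetical metric).",
        "# TYPE demo_uptime_seconds gauge",
        f"demo_uptime_seconds {uptime}",
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus scrapes targets over plain HTTP GET, by default on /metrics.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve /metrics until interrupted; port 8000 is an arbitrary choice."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; pointing a Prometheus scrape target at that host and port would then collect the metric on every scrape.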

Querying Metrics
● Prometheus has its own query language - PromQL
∙ PromQL is a functional expression language
∙ Makes it easy to filter multidimensional time series
● Example: HTTP internal server errors per second, as of an hour ago
∙ rate(api_http_requests_total{status="500"}[5m] offset 1h)
● Regex matching
∙ up{instance=~"web-server-.*"} == 0
● Used in all interactions with Prometheus (visualization, alerts)
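PromQL expressions like the ones above can also be sent from scripts through the Prometheus HTTP API. The sketch below only builds the URL for an instant query against the `/api/v1/query` endpoint; the server address `http://localhost:9090` and the helper name `instant_query_url` are assumptions for this example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def instant_query_url(base_url: str, promql: str) -> str:
    """Return the URL of an instant query against the Prometheus HTTP API."""
    query_string = urlencode({"query": promql})  # percent-encodes braces, quotes, spaces
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


# Which web servers are currently down? (expression taken from the slide above)
promql = 'up{instance=~"web-server-.*"} == 0'
url = instant_query_url("http://localhost:9090", promql)
print(url)
```

Fetching this URL (for example with `urllib.request.urlopen`) returns a JSON document whose `status` and `data` fields carry the query result.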

Alerts
● Prometheus has its own alerting system – Alertmanager
∙ Takes care of deduplication, grouping, and routing
● Alerting rules are written in PromQL
● Supports HA setups
● Integration with email, PagerDuty and OpsGenie
● HTTP API and CLI tool: amtool
∙ Can be “plugged” into your existing scripts
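The ideas of deduplication and grouping can be illustrated with a toy example. This is a conceptual sketch only and not Alertmanager's actual implementation: alerts are reduced to plain label dictionaries, and the function names and sample alerts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Alert = Dict[str, str]  # an alert is represented here only by its label set


def deduplicate(alerts: Iterable[Alert]) -> List[Alert]:
    """Keep one alert per distinct label set, preserving first-seen order."""
    seen = set()
    unique = []
    for alert in alerts:
        key = tuple(sorted(alert.items()))
        if key not in seen:
            seen.add(key)
            unique.append(alert)
    return unique


def group_by(alerts: Iterable[Alert], labels: Tuple[str, ...]) -> Dict[Tuple[str, ...], List[Alert]]:
    """Bucket alerts by the values of the chosen grouping labels."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[tuple(alert.get(name, "") for name in labels)].append(alert)
    return dict(groups)


incoming = [
    {"alertname": "InstanceDown", "instance": "web-server-1", "severity": "critical"},
    {"alertname": "InstanceDown", "instance": "web-server-1", "severity": "critical"},
    {"alertname": "InstanceDown", "instance": "web-server-2", "severity": "critical"},
    {"alertname": "DiskFull", "instance": "db-1", "severity": "warning"},
]

# One notification per alert name instead of one per firing alert.
groups = group_by(deduplicate(incoming), ("alertname",))
for key, members in groups.items():
    print(key, [a["instance"] for a in members])
```

Grouping by `alertname` collapses the two distinct InstanceDown alerts into a single bucket, which is the effect that keeps notification volume manageable when many machines fail at once.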

Grafana
● Used to query and visualize metrics
● Works with Prometheus, but not exclusively
∙ Grafana supports multiple backends
∙ It is possible to combine data from different sources in the same
dashboard
● Fully customizable
∙ Each panel has a wide variety of styling and formatting options
∙ Supports templates
∙ Collection of add-ons and pre-built dashboards

How to Get Started?
● Which components do I need to install?
● How to configure Prometheus and Grafana?
● How to configure my systems to expose their metrics?
● How do I get started with building dashboards?

Monitoring at Scale
Common data centers go beyond thousands of machines
● Different system types (physical, VMs, containers)
● Different operating systems
● A lot of different metrics from different sources
● What can be automated?
It’s not practical to manually maintain configuration files for all this
diversity!

Putting the Pieces Together

Uyuni Meets Monitoring
Automate Prometheus Monitoring with Uyuni

Uyuni Meets Monitoring
Single Pane of Glass for Monitoring Configuration
● Provisioning and configuration of Prometheus and Grafana
● Pre-built Grafana dashboards
● Enable exporters on managed clients using Salt Formulas
● Group systems to create common configurations
● Prometheus service discovery
● Reproducible setups
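To illustrate what generating scrape targets from an inventory can look like in general, the sketch below turns a grouped host list into the JSON structure read by Prometheus file-based service discovery (`file_sd_configs`). This is a generic illustration and not Uyuni's own service discovery integration; the inventory, the `group` label and the helper name are hypothetical, and 9100 is used because it is the node exporter's default port.

```python
import json
from typing import Dict, List


def build_file_sd(inventory: Dict[str, List[str]], port: int = 9100) -> List[dict]:
    """Turn {group: [hostnames]} into Prometheus file-based SD target groups."""
    return [
        {
            "targets": [f"{host}:{port}" for host in sorted(hosts)],
            "labels": {"group": group},  # label attached to every target in the group
        }
        for group, hosts in sorted(inventory.items())
    ]


# Hypothetical inventory, e.g. exported from a systems management tool.
inventory = {
    "webservers": ["web-server-2", "web-server-1"],
    "databases": ["db-1"],
}

target_groups = build_file_sd(inventory)
print(json.dumps(target_groups, indent=2))
```

Writing this output to a file referenced by a `file_sd_configs` entry lets Prometheus pick up added or removed systems without manual edits to its main configuration.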

Coming next
● Support for Prometheus federations
● Improve the existing automation (e.g. more exporters), including:
∙ cAdvisor for Docker containers
∙ libvirt exporter for KVM hypervisors
∙ Kubernetes
∙ blackbox exporter
● Alerting templates
● Authentication and TLS encryption
● Automated firewall configuration

Questions?
https://www.uyuni-project.org/
github.com/uyuni-project
@UyuniProject
uyuni-announce+subscribe@opensuse.org
#uyuni @ irc.freenode.org
