Sharing Sensu with Multiple Teams
Deployment & Configuration using Ansible
David Schroeder
August 23, 2018
Short story shorter
2
Overview
› Environment segregation
– Access limits
– Contacts
› Different deployment strategies
› Different thresholds
– Both keepalive and other checks
› Different checks, different platforms (even Windows)
› API calls
– Creating silence
– Gather check results
"Can Sensu do #{this_thing}?"
3
Team Requirements
› Sensu Enterprise RBAC!
› Contact routing!
› Check parameter tokenization!
› API tokens!
› Custom configuration anywhere and everywhere!
"Sensu can do #{this_thing}!"
4
Team Requirements
5
sensu-client
› Installs & configures
› Satisfies dependencies
› Creates client.json
– Maintenance mode
sensu-server
› Configures checks
– Pub/sub
– Aggregate
– API endpoint
– Ping
› Installs handlers & standalone check scripts
› Configures handlers
› Configures contacts
sensu-enterprise
› Installs Sensu Enterprise
› Configures API
› Configures dashboard
– RBAC through LDAP
rabbitmq-server
› Installs and configures RabbitMQ cluster
› Installs and configures Redis Sentinel
› Fetches certificates
sensu-winclient
› Generates configuration
› Bundles installer & dependencies
sensu-standalone
› Subrepo of community sensu-ansible role
redis-server
› Installs and configures Redis
› Installs and configures Graphite
6
Ansible Roles
sensu-standalone
› Shared role, "galaxy" style
› Included as 'subrepo'
7
Ansible Roles
› sensu/
– group_vars/
▪ framework_pdx_dev/
▪ framework_pdx_stage/
▪ framework_pdx_prod/
▪ sensu_one/
▪ sensu_two/
– roles/
▪ sensu_client/
▪ sensu_winclient/
▪ sensu_server/
▪ sensu_enterprise/
Drilling Down
8
Ansible Structure
› sensu/
– group_vars/
▪ framework_pdx_dev/ (team environment)
▪ framework_pdx_stage/ (team environment)
▪ framework_pdx_prod/ (team environment)
▪ sensu_one/ (Sensu cluster)
▪ sensu_two/ (Sensu cluster)
– roles/
▪ sensu_client/
▪ sensu_winclient/
▪ sensu_server/
▪ sensu_enterprise/
Drilling Down
9
Ansible Structure
› sensu/
– group_vars/
▪ infrastructure_pdx_dev/
– main.yml
– vault.yml
Per Environment
10
Ansible Structure
---
### Environment Definitions ###########################################
host_subscriptions:
  - "basic"
  - "framework"
  - "framework_pdx_dev"
host_environment: "framework_pdx_dev"
host_contact: "framework"
# Keepalive thresholds: number of seconds before warning or alerting
keepalive_warn: 150
keepalive_crit: 210
# Set re-notification time (in seconds) for keepalive alarms. Default is 300.
keepalive_refresh: 3600
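As a rough sketch of what the sensu-client role might render from these variables, the Python below maps them onto a Sensu 1.x style client definition. The mapping of each variable to a client.json key (in particular `contact` and `keepalive.refresh`), the function name, and the host name and address are assumptions for illustration, not taken from the role itself.

```python
import json


def build_client_config(name, address, env_vars):
    """Map the per-environment variables onto a client definition (assumed layout)."""
    return {
        "client": {
            "name": name,
            "address": address,
            "subscriptions": list(env_vars["host_subscriptions"]),
            "environment": env_vars["host_environment"],
            "contact": env_vars["host_contact"],
            "keepalive": {
                "thresholds": {
                    "warning": env_vars["keepalive_warn"],
                    "critical": env_vars["keepalive_crit"],
                },
                # Assumed placement of the re-notification interval.
                "refresh": env_vars["keepalive_refresh"],
            },
        }
    }


env_vars = {
    "host_subscriptions": ["basic", "framework", "framework_pdx_dev"],
    "host_environment": "framework_pdx_dev",
    "host_contact": "framework",
    "keepalive_warn": 150,
    "keepalive_crit": 210,
    "keepalive_refresh": 3600,
}

# "app01" and the address are hypothetical values.
config = build_client_config("app01", "10.0.0.10", env_vars)
print(json.dumps(config, indent=2))
```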
# To add a subscription based on server role as included in the hostname,
# include the subscription name as the key, and hostname pattern as the
# value. Be sure to escape out backslashes.
role_patterns:
  framework_zeromq: "-mq00d"
  framework_utility: "^utly"
# Enable Sensu client socket commands
enable_client_socket: true
# Custom client-side configuration
custom_client_configs:
  checks:
    check_ram:
      warning: 101
      critical: 100
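The role_patterns idea can be sketched in Python as follows: each pattern is tested against the hostname and, on a match, its key is appended to the host's subscriptions. Whether the role uses search or full-match semantics is not stated on the slide, so `re.search` is an assumption here, as are the function name and the sample hostnames.

```python
import re


def subscriptions_for(hostname, base_subscriptions, role_patterns):
    """Return the base subscriptions plus any whose pattern matches the hostname."""
    subscriptions = list(base_subscriptions)
    for subscription, pattern in role_patterns.items():
        # Assumption: a pattern matches if it is found anywhere in the hostname.
        if re.search(pattern, hostname) and subscription not in subscriptions:
            subscriptions.append(subscription)
    return subscriptions


role_patterns = {
    "framework_zeromq": "-mq00d",
    "framework_utility": "^utly",
}
base = ["basic", "framework", "framework_pdx_dev"]

# Hypothetical hostnames, chosen only to exercise each pattern.
for host in ("utly01", "pdx-mq00d1", "web01"):
    print(host, subscriptions_for(host, base, role_patterns))
```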
### Communicating with Sensu ##########################################
# Hostname or IP address of the graphite API server for graph rendering
graphite_server: "172.16.20.100"
rabbitmq_params:
  port: 5671
  user: "sensu"
  pass: "{{ vault_rabbitmq['password'] }}"
  host1: "172.16.20.101"
  host1_cert: "{{ vault_rabbitmq['host1_cert'] }}"
  host1_key: "{{ vault_rabbitmq['host1_key'] }}"
  host2: "172.16.20.102"
  host2_cert: "{{ vault_rabbitmq['host2_cert'] }}"
  host2_key: "{{ vault_rabbitmq['host2_key'] }}"
  host3: "172.16.20.103"
  host3_cert: "{{ vault_rabbitmq['host3_cert'] }}"
  host3_key: "{{ vault_rabbitmq['host3_key'] }}"
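One plausible way to turn rabbitmq_params into a Sensu 1.x transport definition is a list with one connection per broker, sketched below in Python. The function name, the certificate file paths under `/etc/sensu/ssl`, and the placeholder password are assumptions; the slide does not show how the role writes the vaulted certificates to disk.

```python
def rabbitmq_connections(params, ssl_dir="/etc/sensu/ssl"):
    """Build one connection entry per hostN key found in params."""
    connections = []
    index = 1
    while f"host{index}" in params:
        connections.append({
            "host": params[f"host{index}"],
            "port": params["port"],
            "user": params["user"],
            "password": params["pass"],
            "ssl": {
                # Hypothetical paths where the vaulted cert and key would be written.
                "cert_chain_file": f"{ssl_dir}/host{index}_cert.pem",
                "private_key_file": f"{ssl_dir}/host{index}_key.pem",
            },
        })
        index += 1
    return {"rabbitmq": connections}


params = {
    "port": 5671,
    "user": "sensu",
    "pass": "CHANGE_ME",  # placeholder for the vaulted password
    "host1": "172.16.20.101",
    "host2": "172.16.20.102",
    "host3": "172.16.20.103",
}

transport = rabbitmq_connections(params)
print([connection["host"] for connection in transport["rabbitmq"]])
```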
› sensu/
– group_vars/
▪ sensu_one/
– main.yml
– vault.yml
– aggregatechecks.yml
– endpoints.yml
– handlers.yml
– pingchecks.yml
– site_checks.yml
Sensu Clusters
13
Ansible Structure
ldap:
  server: "auth.somewhere.out.there"
  port: 636
  roles:
    framework_team:
      name: "framework_team"
      readonly: "false"
      members:
        - "framework"
      datacenters: []
      subscriptions:
        - "framework"
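To illustrate what a subscription-scoped role means in practice, here is a small Python model of the filtering: a role sees only clients that share at least one subscription with it. This is a toy model of the idea, not the dashboard's code; treating an empty subscriptions list as "no restriction" and the sample clients are assumptions.

```python
def visible_clients(role, clients):
    """Return the clients a role may see, based on shared subscriptions."""
    allowed = set(role["subscriptions"])
    if not allowed:
        # Assumption: an empty list means the role is not restricted.
        return list(clients)
    return [client for client in clients if allowed & set(client["subscriptions"])]


framework_team = {
    "name": "framework_team",
    "members": ["framework"],
    "subscriptions": ["framework"],
}

# Hypothetical clients for illustration.
clients = [
    {"name": "app01", "subscriptions": ["basic", "framework", "framework_pdx_dev"]},
    {"name": "db01", "subscriptions": ["basic", "database"]},
]

print([client["name"] for client in visible_clients(framework_team, clients)])
```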
ldap:
  roles:
    jenkins_api:
      name: "jenkins_api"
      readonly: "false"
      token: "{{ vault_ldap.jenkins_api.token }}"
      members: []
      datacenters: []
      subscriptions: []
      methods:
        get:
          - aggregates
          - clients
          - silenced
        post:
          - silenced
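A token-scoped role like this could be used from a job to create a silence entry. The Python sketch below only builds the request and does not send it. The base URL, the `Authorization: token ...` header format, and the token value are assumptions; the payload fields follow the Sensu 1.x `/silenced` API, and a request routed through the dashboard may need additional fields.

```python
import json
import urllib.request


def build_silence_request(base_url, token, subscription, check, expire, reason):
    """Build (but do not send) a POST /silenced request."""
    payload = {
        "subscription": subscription,
        "check": check,
        "expire": expire,  # seconds until the silence entry expires
        "reason": reason,
    }
    return urllib.request.Request(
        url=f"{base_url}/silenced",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",  # assumed header format
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical endpoint and token; "client:app01" targets a single client.
request = build_silence_request(
    "https://sensu.example.com", "REDACTED", "client:app01", "check_ram", 3600, "deploy"
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```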
handler_contacts:
  - contacts.json:
      contacts:
        framework:
          hipchatter:
            api_token: ChahL8XeiphohBi2eiceiseehaele5eu1aesahyuu
            room: 1234
          mailer:
            mail_to: frameworkteam.dl@wherever.com
        sensu_admin:
          hipchatter:
            api_token: Aivoubah0iexi6eyioQu0eeThee2Aenu6kohw4qui
            room: 2345
          mailer:
            mail_to: sensuteam.dl@wherever.com
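Contact routing can be pictured as a per-contact override of a handler's default settings. The Python sketch below models that merge. The function name, the default mailer settings, and the shallow-merge behavior are assumptions for illustration, not the handlers' actual implementation.

```python
import copy


def handler_settings_for(contact, handler, defaults, contacts):
    """Overlay a contact's handler settings on top of the handler defaults."""
    settings = copy.deepcopy(defaults.get(handler, {}))
    overrides = contacts.get(contact, {}).get(handler, {})
    settings.update(overrides)  # assumed shallow merge
    return settings


# Hypothetical defaults; only the contact entries mirror the slide.
defaults = {
    "mailer": {"mail_from": "sensu@example.com", "mail_to": "default@example.com"},
}
contacts = {
    "framework": {"mailer": {"mail_to": "frameworkteam.dl@wherever.com"}},
    "sensu_admin": {"mailer": {"mail_to": "sensuteam.dl@wherever.com"}},
}

print(handler_settings_for("framework", "mailer", defaults, contacts))
print(handler_settings_for("unknown_team", "mailer", defaults, contacts))
```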
› sensu/
– roles/sensu-server/
▪ vars/
– main.yml
– checks.yml
– filters.yml
– mutators.yml
Sensu Server Role
16
Ansible Structure
pubsub_checks:
  # Basic Checks
  - check_ram.json:
      checks:
        check_ram:
          command: "check-memory-percent.rb -w :::custom.checks.check_ram.warning|95::: -c :::custom.checks.check_ram.critical|98:::"
          interval: "{{ default_interval }}"
          subscribers:
            - basic
          handlers: "{{ default_handlers }}"
          occurrences: 5
          refresh: "{{ default_renotify }}"
          runbook: "{{ runbook_base_url }}/check_ram"
          graph: "http://{{ graphite_server }}/render?from={{ graph_time }}&until=now&{{ graph_size }}&target=:::environment:::.:::graphname:::.memory.usedWOBuffersCaches&title=Memory+Used+Without+Buffers+and+Caches&uchiwa_force_image=.jpg"
17
Pull Request → Code Review → Client Deployment → Server Deployment → Win!
18
Sensu Change Workflow
Problems? Let's be honest: yes.
19
Ongoing Challenges
01 API calls: Limited availability in RBAC
02 Dashboard: Missing hosts in Events list
03 Cleanup: Old checks, forgotten hosts
04 Bottlenecks
20
Ongoing Challenges
01 API calls: Limited availability in LDAP RBAC
21
› Works through RBAC, but without subscription limitations:
– /clients
– /clients/:client/history (deprecated)
– /events (returns all events)
– /silenced (POST ignores 'begin' field)
› Does not work at all through RBAC layer:
– /results
– /events/:client/
– /silenced/subscriptions/:subscription
– /silenced/checks/:check
– ?filter
› Good news: support in Sensu 2.0!
Ongoing Challenges
02 Dashboard: Missing hosts in Events list
22
› If a host matches a subscription in RBAC, but the alerting check does not, it is not visible on the Events page
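The gap described above can be modeled in a few lines of Python. This is a toy model of the behavior as stated on the slide, not the dashboard's actual logic: it assumes event visibility is decided by the check's subscribers alone, and the event structure is simplified for illustration.

```python
def event_visible(role_subscriptions, event):
    """Toy model: an event is shown only if the check's subscribers overlap the role's."""
    allowed = set(role_subscriptions)
    check_subscribers = set(event["check"].get("subscribers", []))
    return bool(allowed & check_subscribers)


# The host is in "framework", but check_ram is published to "basic" only.
event = {
    "client": {"name": "app01", "subscriptions": ["basic", "framework"]},
    "check": {"name": "check_ram", "subscribers": ["basic"]},
}

hidden_from_framework_role = not event_visible(["framework"], event)
print(hidden_from_framework_role)
```

Under this model, a role scoped to "framework" can see the host but not its check_ram event, because the check is published to "basic".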
Ongoing Challenges
03 Cleanup: Old checks, forgotten hosts
23
Ongoing Challenges
04 Bottlenecks
24
This guy!
Thank you
#monitoringlove

  • 7. sensu-client sensu-server sensu-enterprise rabbitmq-server › Installs & configures › Satisfies dependencies › Creates client.json – Maintenance mode › Configures checks – Pub/sub – Aggregate – API endpoint – Ping › Installs handlers & stand- alone check scripts › Configures handlers › Configures contacts › Installs Sensu Enterprise › Configures API › Configures dashboard – RBAC through LDAP › Installs and configures RabbitMQ cluster › Installs and configures Redis Sentinel › Fetches certificates 7 Ansible Roles sensu-winclient › Generates configuration › Bundles installer & dependencies sensu-standalone › Subrepo of community sensu-ansible role redis-server › Installs and configures Redis › Installs and configures Graphite › Shared role, "galaxy" style › Included as 'subrepo'
  • 8. › sensu/ – group_vars/ ▪ framework_pdx_dev/ ▪ framework_pdx_stage/ ▪ framework_pdx_prod/ ▪ sensu_one/ ▪ sensu_two/ – roles/ ▪ sensu_client/ ▪ sensu_winclient/ ▪ sensu_server/ ▪ sensu_enterprise/ Drilling Down 8 Ansible Structure › Team Environments
  • 9. › sensu/ – group_vars/ ▪ framework_pdx_dev/ ▪ framework_pdx_stage/ ▪ framework_pdx_prod/ ▪ sensu_one/ ▪ sensu_two/ – roles/ ▪ sensu_client/ ▪ sensu_winclient/ ▪ sensu_server/ ▪ sensu_enterprise/ Drilling Down 9 Ansible Structure › Sensu Clusters
  • 10. › sensu/ – group_vars/ ▪ infrastructure_pdx_dev/ – main.yml – vault.yml Per Environment 10 Ansible Structure --- ### Environment Definitions ########################################### host_subscriptions: - "basic" - "framework" - "framework_pdx_dev" host_environment: "framework_pdx_dev" host_contact: "framework" # Keepalive thresholds: number of seconds before warning or alerting keepalive_warn: 150 keepalive_crit: 210 # Set re-notification time (in seconds) for keepalive alarms. Default is 300. keepalive_refresh: 3600
  • 11. › sensu/ – group_vars/ ▪ infrastructure_pdx_dev/ – main.yml – vault.yml Per Environment 11 Ansible Structure # To add a subscription based on server role as included in the hostname, # include the subscription name as the key, and hostname pattern as the # value. Be sure to escape out backslashes. role_patterns: framework_zeromq: "-mq00d" framework_utility: "^utly" # Enable Sensu client socket commands enable_client_socket: true # Custom client-side configuration custom_client_configs: checks: check_ram: warning: 101 critical: 100
  • 12. › sensu/ – group_vars/ ▪ infrastructure_pdx_dev/ – main.yml – vault.yml Per Environment 12 Ansible Structure ### Communicating with Sensu ########################################## # Hostname or IP address of the graphite API server for graph rendering graphite_server: "172.16.20.100" rabbitmq_params: port: 5671 user: "sensu" pass: "{{ vault_rabbitmq['password'] }}" host1: "172.16.20.101" host1_cert: "{{ vault_rabbitmq['host1_cert'] }}" host1_key: "{{ vault_rabbitmq['host1_key'] }}" host2: "172.16.20.102" host2_cert: "{{ vault_rabbitmq['host2_cert'] }}" host2_key: "{{ vault_rabbitmq['host2_key'] }}" host3: "172.16.20.103" host3_cert: "{{ vault_rabbitmq['host3_cert'] }}" host3_key: "{{ vault_rabbitmq['host3_key'] }}"
  • 13. › sensu/ – group_vars/ ▪ sensu_one/ – main.yml – vault.yml – aggregatechecks.yml – endpoints.yml – handlers.yml – pingchecks.yml – site_checks.yml Sensu Clusters 13 Ansible Structure ldap: server: "auth.somewhere.out.there" port: 636 roles: framework_team: name: "framework_team" readonly: "false" members: - "framework" datacenters: [] subscriptions: - "framework"
  • 14. › sensu/ – group_vars/ ▪ sensu_one/ – main.yml – vault.yml – aggregatechecks.yml – endpoints.yml – handlers.yml – pingchecks.yml – site_checks.yml Sensu Clusters 14 Ansible Structure ldap: roles: jenkins_api: name: "jenkins_api" readonly: "false" token: "{{ vault_ldap.jenkins_api.token }}" members: [] datacenters: [] subscriptions: [] methods: get: - aggregates - clients - silenced post: - silenced
  • 15. › sensu/ – group_vars/ ▪ sensu_one/ – main.yml – vault.yml – aggregatechecks.yml – endpoints.yml – handlers.yml – pingchecks.yml – site_checks.yml Sensu Clusters 15 Ansible Structure handler_contacts: - contacts.json: contacts: framework: hipchatter: api_token: ChahL8XeiphohBi2eiceiseehaele5eu1aesahyuu room: 1234 mailer: mail_to: frameworkteam.dl@wherever.com sensu_admin: hipchatter: api_token: Aivoubah0iexi6eyioQu0eeThee2Aenu6kohw4qui room: 2345 mailer: mail_to: sensuteam.dl@wherever.com
  • 16. › sensu/ – roles/sensu-server/ ▪ vars/ – main.yml – checks.yml – filters.yml – mutators.yml Sensu Server Role 16 Ansible Structure pubsub_checks: # Basic Checks - check_ram.json: checks: check_ram: command: "check-memory-percent.rb -w :::custom.checks.check_ram.warning|95::: -c :::custom.checks.check_ram.critical|98:::" interval: "{{ default_interval }}" subscribers: - basic handlers: "{{ default_handlers }}" occurrences: 5 refresh: "{{ default_renotify }}" runbook: "{{ runbook_base_url }}/check_ram" graph: "http://{{ graphite_server }}/render?from={{ graph_time }}&until=now&{{ graph_size }}&target=:::environment:::.:::graphname:::.memory.usedWOBuffersCaches&title=Memory+Used+Without+Buffers+and+Caches&uchiwa_force_image=.jpg"
  • 17. 17
  • 19. Problems? Let's be honest: yes.
  • 20. Ongoing Challenges: 01 API calls (limited availability in RBAC); 02 Dashboard (missing hosts in Events list); 03 Cleanup (old checks, forgotten hosts); 04 Bottlenecks
  • 21. Ongoing Challenges 01: API calls, limited availability in LDAP RBAC › Works through RBAC, but without subscription limitations: – /clients – /clients/:client/history (deprecated) – /events (returns all events) – /silenced (POST ignores 'begin' field) › Does not work at all through RBAC layer: – /results – /events/:client/ – /silenced/subscriptions/:subscription – /silenced/checks/:check – ?filter › Good news: support in Sensu 2.0!
  • 22. Ongoing Challenges 02: Dashboard, missing hosts in Events list › If a host matches a subscription in RBAC, but the alerting check does not, it is not visible on the Events page
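
The memory check on slide 16 relies on check tokens of the form `:::attribute.path|default:::`, which are filled in from each client's own configuration (the per-environment overrides on slide 11). The following is a minimal Python sketch of that substitution idea, not Sensu's actual implementation. The function names `lookup` and `substitute` are hypothetical, and the client dictionary assumes the overrides from `custom_client_configs` end up under a top-level `custom` key, which the slides imply but do not show.

```python
import re

# Matches :::dotted.attribute.path::: with an optional |default part.
TOKEN = re.compile(r":::([^:|]+?)(?:\|([^:]*))?:::")


def lookup(attrs, dotted):
    """Walk a nested dict along a dotted path; return None if any key is missing."""
    node = attrs
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def substitute(command, client):
    """Replace every token in `command` with the client's value or the token default."""
    def repl(match):
        value = lookup(client, match.group(1))
        if value is None:
            value = match.group(2)  # fall back to the default after "|", if any
        if value is None:
            # Assumption for this sketch: a token with no value and no default is an error.
            raise KeyError("unresolved token: " + match.group(1))
        return str(value)

    return TOKEN.sub(repl, command)


command = ("check-memory-percent.rb "
           "-w :::custom.checks.check_ram.warning|95::: "
           "-c :::custom.checks.check_ram.critical|98:::")

# Hypothetical client attributes mirroring the slide 11 overrides.
dev_client = {"custom": {"checks": {"check_ram": {"warning": 101, "critical": 100}}}}

print(substitute(command, dev_client))
print(substitute(command, {}))
```

With the dev overrides the command runs with `-w 101 -c 100`; a client with no overrides falls back to the defaults embedded in the tokens, `-w 95 -c 98`. This is why one shared check definition can serve every team that carries the `basic` subscription.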

Editor's notes

  1. Hi, David Schroeder, Viasat. Last year, I talked about migrating my team from a Nagios-based monitoring solution over to Sensu. And in that talk I touched on an unexpected side-effect of the migration...
  2. Popularity. Other teams saw what we were doing, found it satisfactory, and wanted in on the action. My Ansible playbook for Sensu was only geared toward the one team, so I needed to expand it, make sure it worked for everybody, and there were many lessons learned along the way.
  3. The different teams would of course need to have their own environments, reasonably separated from the other teams, with their own logins and contact profiles. They may have different ways of deploying playbooks on their servers. That's OK, gotta work with that. They may want keepalive alarms to re-notify every hour, or every 10 minutes. They may want the standard memory check on their systems to never warn and go critical at 99%. Others may want different thresholds. They're of course going to have their own unique checks, and they may have different platforms to support, multiple Linux distros, even Windows. Yeah, Windows. And, finally, they're gonna want to be able to access server states and create silence automatically, using API calls, so we need to be able to support that, too.
  4. Fortunately, the answer to all these requests is "yes." Yes, Sensu Enterprise has built-in RBAC to limit access to certain environments per team. Yes, we can use contact routing to support different alert recipients per team. Yes, we can tokenize parameters to checks, and we can have them differ per environment, if that's what they need. We can use API tokens to expose some functionality to other teams. And Sensu's flexible approach to configuration parameters means I can inject configs anywhere, and pick them up where needed. This is a huge boon to customization, and one of the things that makes this platform so versatile.
  5. But I'm not here to talk about these features and capabilities, I covered a lot of that in my talk last year. I want to dig into actual implementation in Ansible. Here's how it works.
  6. There are seven separate roles. The ones that get the most action are sensu-client and sensu-server. sensu-client is what another team would run to add new clients to a Sensu monitoring cluster, or make changes to existing clients. Sometimes, that's all they need. There are a bunch of default checks and thresholds that cover all the usual monitoring basics. But other times, they might need to propose changes to this role, perhaps a task to lay down a configuration file used by a check, or to install a new check script. I'll talk about that process later. sensu-server's job is to properly configure the server cluster. If there's a new check to be defined, here's where it's done. The Sensu servers also handle standalone checks like alerting on aggregates, or performing ping checks. Mostly, though, it's pub/sub. If there are contact changes, new team, different e-mail address, new chatroom, this role handles those, too. Most of the handlers we use are custom, so they're stored and applied here, too. Some of the other roles: sensu-enterprise is all about installing and configuring Sensu Enterprise. Outside of building a new cluster, this role is used for applying RBAC-related changes to the dashboard config and adding new API tokens. rabbitmq-server and redis-server are only used when building a new cluster... The RabbitMQ role has the extra step of downloading the auto-generated SSL certs, which are vaulted and given to the clients. sensu-standalone is a way to build a single server running the community edition of Sensu, it's actually the community sensu-ansible role included as a submodule, and the standalone playbook is helpful for development and testing, since the sensu-client and sensu-server roles can work over the top of it... though, I have to say, sensu-ansible has matured significantly since I last tested that out, two years ago, and I'm not sure if it still works here. sensu-winclient is a separate beast. It's barely used, and has a sort of backwards way of deploying Windows clients.
  7. sensu-client has actually been broken out as a Galaxy-style role, included among the rest as a sort of submodule. Some teams only use Galaxy-style roles for deployment, so they needed this available separately.
  8. We follow Ansible's best practices in terms of directory structure, with subdirectories for group_vars/ and roles/. When a team wants to come on board, or wants to add a group of servers, which we call "environments", each logical division gets its own subdirectory in group_vars. This example up here on the screen shows a fictional framework team with a Portland datacenter, and separate environments for dev, staging, and production. If it makes sense to have different check parameters or thresholds or contact profiles, they get their own environment, and therefore their own subdirectory under group_vars. There are a lot of these. Dozens. Typically, if a team adds an environment, or another team comes on board, they just copy an existing directory to use as a template.
  9. The Sensu clusters themselves get their own directories inside of group_vars/. These behave very much like the other environments, but of course have some extra stuff to configure Sensu itself.
  10. I wanted to give you some examples of how each environment may be constructed. Inside each subdirectory under group_vars, there are two files, main and vault. main has all the parameters, vault stores the sensitive ones, like passwords and RabbitMQ certificates. Looking at main.yml here, first, if there are any subscriptions that should be applied to all servers in this environment, those are added here. The environment name is specified, this is used for several things, like building aggregates or subdividing the metric scheme for graphite graphs. The contact name is important, basically the name of the team which is responsible for these servers. Separate keepalive thresholds can be set, warning and critical, and it's nice to vary the re-notify time as well. This example here is a dev environment, you don't need these servers to re-notify quite as often as, say, a production environment.
  11. Digging deeper into main.yml, I've found that assigning different subscriptions based off of the hostname has been helpful, using pattern matching to add clients in this environment to one subscription or another. Like if your hostname has 'db' in the name, you may want to add a subscription to certain database checks. If they need socket commands enabled, by default this is not open, hardly anybody I support uses them, but that option is here. Also in this file, you can set custom check parameters. So here in this example, warning is effectively disabled, and it'll only go critical when the memory percent is maxed out. There are defaults for the memory check, and this overrides them.
  12. The last part of the config in main.yml defines how it talks to Sensu. There are multiple clusters available, so telling it to use the right Graphite server and RabbitMQ servers, along with the vaulted certificates, is essential.
  13. The Sensu cluster environments have a main.yml file which includes all the stuff that the other client environments have, RabbitMQ certs and hostname-pattern-matched subscriptions and the like, but we also have the RBAC configuration here, so I wanted to show you what that looks like. This example shows a typical team declaration, the fictional "framework" team, giving them full UI access to all of their servers, which as we saw above, include the "framework" subscription. One subscription name per team is nice to have, so you don't have to list out, you know, dev and staging and production as separate subscriptions here.
  14. Here is an API token example, providing a token to perform certain Sensu API calls, GETs and POSTs. It's nice to be able to lock this stuff down.
  15. There's a file which contains the contact routing information, different parameters for different teams, with the names matching the contact name you saw in the client config.
  16. I wanted to show an example of a check definition, too. This one is the standard memory check, which leverages the optional parameters which can be specified per environment. This 'basic' subscription, everybody gets that, so it's nice to be able to tweak the threshold depending on what a certain team or environment needs. I also make sure each check includes a runbook link, and if possible, a graph. Not shown here, because I couldn't fit it in this mess, is the metric definition that populates this graph. But being able to embed graphs into the check results page in the UI is a great little feature.
  17. Had enough of that? Let's get into the workflow, the general workflow, how these different teams go from wanting a change in Sensu, to getting one.
  18. Usually, it starts with a pull request. All the roles are hosted in git, so the different teams write their checks or make their changes in a branch and then submit a pull request. I take a look at it, a quick code review process, and once merged, it's back to the team. If they have a new check, a new script, they re-run the sensu-client playbook on their end, at least one of the tasks in that playbook, to install the change and satisfy dependencies. Sometimes they've already done this before submitting the pull request. I work with great people who are definitely on top of things. They give me the go-ahead, I run the sensu-server playbook, and make the new check or whatever live. It's the final step, make sure the dependencies on the clients are good first. Put the horse before the cart.
  19. Are there problems with this whole multi-team Sensu thing? You bet! I'd be lying if I said there weren't. But let's call them challenges.
  20. The first pertains to API calls. Teams would like access to data that is simply not available through the RBAC layer. The second one involves host visibility in one part of the web UI, the Events page. The third one is more of an internal thing, many teams aren't interested in cleaning up their old hosts and checks, but that needs to be solved. And finally, there are bottlenecks in the whole workflow that I'd like to point out.
  21. Digging into the first one, all the lovely and versatile API calls that are available in Sensu, if you want them to go through an RBAC layer, LDAP, specifically, most don't really work. Either they return everything, all clients for all teams, regardless of any RBAC restrictions in place, or they don't work at all because the functionality's not exposed. Fortunately, RBAC is a first-class feature in Sensu 2.0, so I'm looking forward to working with that.
  22. Second, if you're locked down to certain subscriptions using RBAC, you're also limited in what you can see on the Events page. A host with a matching subscription won't show up unless the check also matches that subscription. A global check like the load average, shown here, runs everywhere, it's part of the 'basic' subscription I showed you earlier. But you can't see them in the Events page. I get why that is, it's a strict subscription match on the Events page, but it's simply a usability issue, it keeps the teams from seeing all the alerts on their hosts in one place.
  23. Let's say you've got a check that no longer applies to a host. The result will stick around in the UI. Or let's say you've decom'ed a host, shut it down, silenced it in Sensu. I am pretty anal-retentive about cleaning up that sort of thing, keeping the UI concise and orderly, but I can't expect everyone to be the same way. Before long, you look at the UI and see it cluttered with obsolete check results and silenced hosts with keepalive alarms. Fortunately, this one has a solution. I've got an auto-cleanup task, runs like a check on the Sensu API servers once a day, and cleans up everything that's more than two weeks stale: checks that haven't been run in two weeks, and hosts which have been in keepalive failure for two weeks, which you might be able to make out in this example. Everything except for old silenced entries, when something has long since recovered but is still silenced. I'm afraid to clean that up automatically, though, since it kind of presents an unpredictable user experience. This cleanup check is something I would like to open up to the community, though, I think others might find it helpful.
  24. And finally, the biggest bottleneck is me. I have to approve, merge, and deploy any changes that touch the server side. Usually, this isn't a problem, the turn-around time is maybe an hour, depending on what else is going on. And technically, there are a number of people who can do this part, there are people with the right access as well as clearly documented procedures. But if you're not in it day to day, you're not going to be comfortable with the process. If I take a vacation, they'll wait until I'm back. Making this all CI/CD-capable, that'd be nice, at some point, but in the meantime, I'm the bottleneck, the human gate check. I thought that was worth pointing out.
  25. That's all I've got. Hopefully, whether or not you use Ansible, this gave you some ideas on how to work with multiple teams in Sensu. Thank you for giving me the time, and thank you, Sensu team, for a great product and a great event. Questions?
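
The auto-cleanup task described in note 23 is not shown in the deck. As a rough illustration of the selection step only, here is a small Python sketch that picks out check results and clients that have been stale for more than two weeks. It is not the author's script. The function name `find_stale` is hypothetical, and the input shapes are assumptions: result entries carrying `client` and `check.name` / `check.executed`, and client entries carrying `name` and a last-keepalive `timestamp`, all in epoch seconds. Fetching the data from the Sensu API and deleting the stale entries are deliberately left out, as are silenced entries, which the talk says are not cleaned up automatically.

```python
import time

STALE_AFTER = 14 * 24 * 60 * 60  # two weeks, in seconds


def find_stale(results, clients, now=None, max_age=STALE_AFTER):
    """Return (stale_results, stale_clients) older than `max_age` seconds.

    results: list of dicts like {"client": str, "check": {"name": str, "executed": int}}
    clients: list of dicts like {"name": str, "timestamp": int}  (last keepalive)
    Both shapes are assumptions made for this sketch.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age

    # Check results that have not been executed since the cutoff.
    stale_results = [
        (r["client"], r["check"]["name"])
        for r in results
        if r["check"]["executed"] < cutoff
    ]
    # Clients whose last keepalive is older than the cutoff.
    stale_clients = [c["name"] for c in clients if c["timestamp"] < cutoff]
    return stale_results, stale_clients


# Made-up sample data; timestamps are arbitrary epoch seconds.
sample_results = [
    {"client": "web01", "check": {"name": "check_old", "executed": 700_000}},
    {"client": "web01", "check": {"name": "check_ram", "executed": 1_900_000}},
]
sample_clients = [
    {"name": "decom01", "timestamp": 100_000},
    {"name": "web01", "timestamp": 1_999_000},
]

stale_results, stale_clients = find_stale(sample_results, sample_clients, now=2_000_000)
print(stale_results, stale_clients)
```

Keeping the selection logic as a pure function, separate from any API calls, makes the two-week rule easy to test before anything is actually deleted.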