SlideShare une entreprise Scribd logo
1  sur  28
Télécharger pour lire hors ligne
Optimizing your Monitoring and Trending tools for the Cloud
                    Nagios World Conference 2012


                                                           Nicolas Brousse
                                                   Lead Operations Engineer
                                                       September 28th 2012
•    About TubeMogul
•    What are some of our challenges?
•    Our environment
•    Amazon Cloud Environment
•    Automated Monitoring
•    Efficient on-call rotation
•    Efficient monitoring
•    What’s next?
•    Q&A
About TubeMogul

 •  Founded in 2006
 •  Formerly a video distribution and analytics
    platform
 •  TubeMogul is a Brand-Focused Video Marketing
    Company
    –  Build for Branding
    –  Integrate real-time media buying, ad serving,
       targeting, optimization and brand measurement

    TubeMogul simplifies the delivery of video ads
  and maximizes the impact of every dollar spent by
                 brand marketers
             http://www.tubemogul.com
What are some of our challenges?

 •  Monitoring between 700 to 1000 servers
 •  Servers spread across 6 different locations
    –  4 Amazon EC2 Regions (our public cloud provider)
    –  1 Hosted (Liquidweb) & 1 VPS (Linode)
 •  Little monitoring resources
    –  Collecting over 115,000 metrics
    –  Monitoring over 20,000 services with Nagios
 •  Multiple billions of HTTP requests a day
    –  Most of it must be served in less than 100ms
    –  Lost of traffic could mean lost of business opportunity
    –  Or worst, over-spending…
Our environment

 •  Over 80 different server profiles
 •  Our stack:
    –  Java (Embedded Jetty, Tomcat)
    –  PHP, RoR
    –  Hadoop: HDFS, M/R, Hbase, Hive
    –  Couchbase
    –  MySQL
 •  Monitoring: Nagios, NSCA
 •  Graphing: Ganglia, sFlow, Graphite
 •  Configuration Management: Puppet
Amazon Cloud Environment
Amazon Cloud Environment


 •  We use EC2, SDB, SQS, EMR, S3, etc.
 •  We don’t use ELB
 •  We heavily use EC2 Tags

 ec2-describe-instances -F tag:hostname=dev-build01
Automated Monitoring
Automated Monitoring

 Configuring Ganglia using Puppet templates
   globals {
    …
    override_hostname = <%= scope.lookupvar('hostname') %>
   …
   }
   udp_send_channel {
     host = <%= scope.lookupvar('ec2_tag_nagios_host') %>
     port = 8649
     ttl = 1
   }
   sflow {
     udp_port = 6343
     accept_jvm_metrics = yes
     multiple_jvm_instances = yes
   }
Automated Monitoring

 Or configuring Host sFlow using Puppet
 templates

            sflow{
              DNSSD = off
              polling = 20
              sampling = 512
              collector{
               ip = <%= ec2_tag_nagios_host %>
               udpport = 6343
              }
            }
Automated Monitoring

 •  Puppet configure our monitoring instances
    –  We use Nagios regex : use_regexp_matching=1
    –  But we don’t use true regex : use_true_regexp_matching=0
    –  We use NSCA with Upstart




    –  We don’t use the perfdata
    –  We use pre-cached objects
    –  We includes our configurations from 3 directories
        •  objects => templates, contacts, commands, event_handlers
        •  servers => contain a configuration file for each server
        •  clusters => contain a configuration file for each cluster
Automated Monitoring
 Process of event when starting a new host and add it to our monitoring:

 1.    We start a new instance using Cerveza and Cloud-init

 2.    Puppet configure Gmond or Host sFlow on the instance

 3.    Our monitoring server running Gmond and Gmetad get data from
       the new instance

 4.    A Nagios check run every minute and check for new hosts
         •    Look for new hosts using EC2 API
         •    Look for EC2 tag “hostname” to confirm it’s a legit host, not a zombie / fail start
         •    Look for EC2 tag “nagios_host” to see if the host belong to this monitoring instance

 5.    If a new host is found:
         •    We build a config for the host based on a template file and doing some string replace
         •    Once all config have been generated, we rebuild pre-cache objects and reload
              Nagios

 6.    If we find “Zombie” host, we generate a Warning alert

 7.    If the config is corrupt, we send a Critical alert
Automated Monitoring
Automated Monitoring
Efficient on-call rotation

  •  Follow the sun
     –  Some of our team is in Ukraine, no more Tier 1 night
        on-call for us
  •  Nagios timeperiod and escalation are a pain to
     maintain
     –  Nagios notification plugged to Google Calendar
        •  Using our own notification script for email and paging
        •  Google Clendar make it easy for each team to manage their
           own on-call calendar
        •  Support for multiple Tier and complex schedules
        •  Caching Google Calendar info locally every hour
     –  Simpler definitions and rules in Nagios contacts
     –  Notify only people on-call, unless they asked for “off
        call” emails
Efficient on-call rotation

  Using Google Calendar…
Efficient on-call rotation

  •  Simple contact definitions
  •  Google Calendar info
  •  Tier Filter (Regex)
  •  Tier Interval (time to wait before escalating
     alert since last tier)
  •  Off call email
Efficient on-call rotation
Efficient on-call rotation
  On-call contact fetched from Google Calendar at the bottom of the alert
  makes our life easier!
Efficient monitoring

  We disable most notification and only care of a
  cluster status
Efficient monitoring

  Most of our checks are based on Ganglia RRD
  files
Efficient monitoring

  It become really easy to monitor any metrics returned
  by Ganglia
Efficient monitoring


  We can check cluster status by hosts/services but also
  per returned messages !
Efficient monitoring
Efficient monitoring

  Using Graphite Federated Storage
     –  One place to see all our metrics from all the world
     –  No delay due to rsync of RRD files
     –  Graph close to real time, delay only due to
        rrdcached flushing interval
What’s next?

 Some hot topics…
 •  Do trending alert with Nagios based on
    Graphite/Ganglia data
 •  Better automation for non-cloud servers
 •  Ensure we can scale our monitoring when
    using hybrid cloud (Eucalyptus) or multiple
    public cloud provider
 •  Get a better centralized view of our different
    Nagios
Ops @ TubeMogul

  All this wouldn’t be possible without a strong
             System Operation team

              Andrey Shestakov
            Eamon Bisson Donahue
              Justin Francisconi
               Marylene Tanfin
               Nicolas Brousse
                Stan Rudenko
Thank you…



              TubeMogul is Hiring !
        http://www.tubemogul.com/jobs
              jobs@tubemogul.com

               Follow us on Twitter

             @TubeMogul               @orieg

Contenu connexe

Tendances

How Yelp does Service Discovery
How Yelp does Service DiscoveryHow Yelp does Service Discovery
How Yelp does Service DiscoveryJohn Billings
 
Automating Application over OpenStack using Workflows
Automating Application over OpenStack using WorkflowsAutomating Application over OpenStack using Workflows
Automating Application over OpenStack using WorkflowsYaron Parasol
 
Nagios Conference 2014 - Janice Singh - Real World Uses for Nagios APIs
Nagios Conference 2014 - Janice Singh - Real World Uses for Nagios APIsNagios Conference 2014 - Janice Singh - Real World Uses for Nagios APIs
Nagios Conference 2014 - Janice Singh - Real World Uses for Nagios APIsNagios
 
Mistral Atlanta design session
Mistral Atlanta design sessionMistral Atlanta design session
Mistral Atlanta design sessionRenat Akhmerov
 
Using SaltStack to DevOps the enterprise
Using SaltStack to DevOps the enterpriseUsing SaltStack to DevOps the enterprise
Using SaltStack to DevOps the enterpriseChristian McHugh
 
Mistral and StackStorm
Mistral and StackStormMistral and StackStorm
Mistral and StackStormDmitri Zimine
 
NEW LAUNCH IPv6 in the Cloud: Protocol and AWS Service Overview
NEW LAUNCH IPv6 in the Cloud: Protocol and AWS Service OverviewNEW LAUNCH IPv6 in the Cloud: Protocol and AWS Service Overview
NEW LAUNCH IPv6 in the Cloud: Protocol and AWS Service OverviewAmazon Web Services
 
Just enough web ops for web developers
Just enough web ops for web developersJust enough web ops for web developers
Just enough web ops for web developersDatadog
 
Deep Learning을 위한 AWS 기반 인공 지능(AI) 서비스 (윤석찬)
Deep Learning을 위한  AWS 기반 인공 지능(AI) 서비스 (윤석찬)Deep Learning을 위한  AWS 기반 인공 지능(AI) 서비스 (윤석찬)
Deep Learning을 위한 AWS 기반 인공 지능(AI) 서비스 (윤석찬)Amazon Web Services Korea
 
Mistral Hong Kong Unconference track
Mistral Hong Kong Unconference trackMistral Hong Kong Unconference track
Mistral Hong Kong Unconference trackRenat Akhmerov
 
Benchmarking at Parse
Benchmarking at ParseBenchmarking at Parse
Benchmarking at ParseTravis Redman
 
A Year in Google - Percona Live Europe 2018
A Year in Google - Percona Live Europe 2018A Year in Google - Percona Live Europe 2018
A Year in Google - Percona Live Europe 2018Carmen Mason
 
Livy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache SparkLivy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache SparkJen Aman
 
Hacklu2011 tricaud
Hacklu2011 tricaudHacklu2011 tricaud
Hacklu2011 tricaudstricaud
 
AWS vs Azure vs Google Cloud Storage Deep Dive
AWS vs Azure vs Google Cloud Storage Deep DiveAWS vs Azure vs Google Cloud Storage Deep Dive
AWS vs Azure vs Google Cloud Storage Deep DiveRightScale
 
PHP Backends for Real-Time User Interaction using Apache Storm.
PHP Backends for Real-Time User Interaction using Apache Storm.PHP Backends for Real-Time User Interaction using Apache Storm.
PHP Backends for Real-Time User Interaction using Apache Storm.DECK36
 
Google Cloud Platform monitoring with Zabbix
Google Cloud Platform monitoring with ZabbixGoogle Cloud Platform monitoring with Zabbix
Google Cloud Platform monitoring with ZabbixMax Kuzkin
 
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)Tibo Beijen
 
Kraken
KrakenKraken
KrakenPayPal
 
(GAM403) From 0 to 60 Million Player Hours in 400B Star Systems
(GAM403) From 0 to 60 Million Player Hours in 400B Star Systems(GAM403) From 0 to 60 Million Player Hours in 400B Star Systems
(GAM403) From 0 to 60 Million Player Hours in 400B Star SystemsAmazon Web Services
 

Tendances (20)

How Yelp does Service Discovery
How Yelp does Service DiscoveryHow Yelp does Service Discovery
How Yelp does Service Discovery
 
Automating Application over OpenStack using Workflows
Automating Application over OpenStack using WorkflowsAutomating Application over OpenStack using Workflows
Automating Application over OpenStack using Workflows
 
Nagios Conference 2014 - Janice Singh - Real World Uses for Nagios APIs
Nagios Conference 2014 - Janice Singh - Real World Uses for Nagios APIsNagios Conference 2014 - Janice Singh - Real World Uses for Nagios APIs
Nagios Conference 2014 - Janice Singh - Real World Uses for Nagios APIs
 
Mistral Atlanta design session
Mistral Atlanta design sessionMistral Atlanta design session
Mistral Atlanta design session
 
Using SaltStack to DevOps the enterprise
Using SaltStack to DevOps the enterpriseUsing SaltStack to DevOps the enterprise
Using SaltStack to DevOps the enterprise
 
Mistral and StackStorm
Mistral and StackStormMistral and StackStorm
Mistral and StackStorm
 
NEW LAUNCH IPv6 in the Cloud: Protocol and AWS Service Overview
NEW LAUNCH IPv6 in the Cloud: Protocol and AWS Service OverviewNEW LAUNCH IPv6 in the Cloud: Protocol and AWS Service Overview
NEW LAUNCH IPv6 in the Cloud: Protocol and AWS Service Overview
 
Just enough web ops for web developers
Just enough web ops for web developersJust enough web ops for web developers
Just enough web ops for web developers
 
Deep Learning을 위한 AWS 기반 인공 지능(AI) 서비스 (윤석찬)
Deep Learning을 위한  AWS 기반 인공 지능(AI) 서비스 (윤석찬)Deep Learning을 위한  AWS 기반 인공 지능(AI) 서비스 (윤석찬)
Deep Learning을 위한 AWS 기반 인공 지능(AI) 서비스 (윤석찬)
 
Mistral Hong Kong Unconference track
Mistral Hong Kong Unconference trackMistral Hong Kong Unconference track
Mistral Hong Kong Unconference track
 
Benchmarking at Parse
Benchmarking at ParseBenchmarking at Parse
Benchmarking at Parse
 
A Year in Google - Percona Live Europe 2018
A Year in Google - Percona Live Europe 2018A Year in Google - Percona Live Europe 2018
A Year in Google - Percona Live Europe 2018
 
Livy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache SparkLivy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache Spark
 
Hacklu2011 tricaud
Hacklu2011 tricaudHacklu2011 tricaud
Hacklu2011 tricaud
 
AWS vs Azure vs Google Cloud Storage Deep Dive
AWS vs Azure vs Google Cloud Storage Deep DiveAWS vs Azure vs Google Cloud Storage Deep Dive
AWS vs Azure vs Google Cloud Storage Deep Dive
 
PHP Backends for Real-Time User Interaction using Apache Storm.
PHP Backends for Real-Time User Interaction using Apache Storm.PHP Backends for Real-Time User Interaction using Apache Storm.
PHP Backends for Real-Time User Interaction using Apache Storm.
 
Google Cloud Platform monitoring with Zabbix
Google Cloud Platform monitoring with ZabbixGoogle Cloud Platform monitoring with Zabbix
Google Cloud Platform monitoring with Zabbix
 
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
 
Kraken
KrakenKraken
Kraken
 
(GAM403) From 0 to 60 Million Player Hours in 400B Star Systems
(GAM403) From 0 to 60 Million Player Hours in 400B Star Systems(GAM403) From 0 to 60 Million Player Hours in 400B Star Systems
(GAM403) From 0 to 60 Million Player Hours in 400B Star Systems
 

En vedette

Nagios Conference 2012 - Yancy Ribbens - Windows Advanced Monitoring with WMI...
Nagios Conference 2012 - Yancy Ribbens - Windows Advanced Monitoring with WMI...Nagios Conference 2012 - Yancy Ribbens - Windows Advanced Monitoring with WMI...
Nagios Conference 2012 - Yancy Ribbens - Windows Advanced Monitoring with WMI...Nagios
 
Nagios Conference 2012 - John Sellens - Non-Obvious Nagios
Nagios Conference 2012 - John Sellens - Non-Obvious NagiosNagios Conference 2012 - John Sellens - Non-Obvious Nagios
Nagios Conference 2012 - John Sellens - Non-Obvious NagiosNagios
 
Nagios Conference 2012 - Todd Groten - Monitoring Call of Duty: Elite
Nagios Conference 2012 - Todd Groten - Monitoring Call of Duty: EliteNagios Conference 2012 - Todd Groten - Monitoring Call of Duty: Elite
Nagios Conference 2012 - Todd Groten - Monitoring Call of Duty: EliteNagios
 
Nagios Conference 2013 - BOF Nagios Plugins New Threshold Specification Syntax
Nagios Conference 2013 - BOF Nagios Plugins New Threshold Specification SyntaxNagios Conference 2013 - BOF Nagios Plugins New Threshold Specification Syntax
Nagios Conference 2013 - BOF Nagios Plugins New Threshold Specification SyntaxNagios
 
Nagios Conference 2012 - Nate Broderick - Bringing Nagios XI Into Your Business
Nagios Conference 2012 - Nate Broderick - Bringing Nagios XI Into Your BusinessNagios Conference 2012 - Nate Broderick - Bringing Nagios XI Into Your Business
Nagios Conference 2012 - Nate Broderick - Bringing Nagios XI Into Your BusinessNagios
 
Nagios Conference 2012 - Nathan Vonnahme - Monitoring the User Experience
Nagios Conference 2012 - Nathan Vonnahme - Monitoring the User ExperienceNagios Conference 2012 - Nathan Vonnahme - Monitoring the User Experience
Nagios Conference 2012 - Nathan Vonnahme - Monitoring the User ExperienceNagios
 
Nagios Conference 2012 - Ethan Galstad - Keynote
Nagios Conference 2012 - Ethan Galstad - KeynoteNagios Conference 2012 - Ethan Galstad - Keynote
Nagios Conference 2012 - Ethan Galstad - KeynoteNagios
 
Nagios Conference 2012 - Sam Lansing - Automating Windows Application Testing...
Nagios Conference 2012 - Sam Lansing - Automating Windows Application Testing...Nagios Conference 2012 - Sam Lansing - Automating Windows Application Testing...
Nagios Conference 2012 - Sam Lansing - Automating Windows Application Testing...Nagios
 

En vedette (8)

Nagios Conference 2012 - Yancy Ribbens - Windows Advanced Monitoring with WMI...
Nagios Conference 2012 - Yancy Ribbens - Windows Advanced Monitoring with WMI...Nagios Conference 2012 - Yancy Ribbens - Windows Advanced Monitoring with WMI...
Nagios Conference 2012 - Yancy Ribbens - Windows Advanced Monitoring with WMI...
 
Nagios Conference 2012 - John Sellens - Non-Obvious Nagios
Nagios Conference 2012 - John Sellens - Non-Obvious NagiosNagios Conference 2012 - John Sellens - Non-Obvious Nagios
Nagios Conference 2012 - John Sellens - Non-Obvious Nagios
 
Nagios Conference 2012 - Todd Groten - Monitoring Call of Duty: Elite
Nagios Conference 2012 - Todd Groten - Monitoring Call of Duty: EliteNagios Conference 2012 - Todd Groten - Monitoring Call of Duty: Elite
Nagios Conference 2012 - Todd Groten - Monitoring Call of Duty: Elite
 
Nagios Conference 2013 - BOF Nagios Plugins New Threshold Specification Syntax
Nagios Conference 2013 - BOF Nagios Plugins New Threshold Specification SyntaxNagios Conference 2013 - BOF Nagios Plugins New Threshold Specification Syntax
Nagios Conference 2013 - BOF Nagios Plugins New Threshold Specification Syntax
 
Nagios Conference 2012 - Nate Broderick - Bringing Nagios XI Into Your Business
Nagios Conference 2012 - Nate Broderick - Bringing Nagios XI Into Your BusinessNagios Conference 2012 - Nate Broderick - Bringing Nagios XI Into Your Business
Nagios Conference 2012 - Nate Broderick - Bringing Nagios XI Into Your Business
 
Nagios Conference 2012 - Nathan Vonnahme - Monitoring the User Experience
Nagios Conference 2012 - Nathan Vonnahme - Monitoring the User ExperienceNagios Conference 2012 - Nathan Vonnahme - Monitoring the User Experience
Nagios Conference 2012 - Nathan Vonnahme - Monitoring the User Experience
 
Nagios Conference 2012 - Ethan Galstad - Keynote
Nagios Conference 2012 - Ethan Galstad - KeynoteNagios Conference 2012 - Ethan Galstad - Keynote
Nagios Conference 2012 - Ethan Galstad - Keynote
 
Nagios Conference 2012 - Sam Lansing - Automating Windows Application Testing...
Nagios Conference 2012 - Sam Lansing - Automating Windows Application Testing...Nagios Conference 2012 - Sam Lansing - Automating Windows Application Testing...
Nagios Conference 2012 - Sam Lansing - Automating Windows Application Testing...
 

Similaire à Nagios Conference 2012 - Nicolas Brousse - Optimizing your Monitoring and Trending tools for the Cloud

Optimizing your Monitoring and Trending tools for the Cloud (Nagios World Con...
Optimizing your Monitoring and Trending tools for the Cloud (Nagios World Con...Optimizing your Monitoring and Trending tools for the Cloud (Nagios World Con...
Optimizing your Monitoring and Trending tools for the Cloud (Nagios World Con...Nicolas Brousse
 
Itsummit2015 blizzard
Itsummit2015 blizzardItsummit2015 blizzard
Itsummit2015 blizzardkevin_donovan
 
PyCon India 2012: Celery Talk
PyCon India 2012: Celery TalkPyCon India 2012: Celery Talk
PyCon India 2012: Celery TalkPiyush Kumar
 
Kubernetes Clusters At Scale: Managing Hundreds Apache Pinot Kubernetes Clust...
Kubernetes Clusters At Scale: Managing Hundreds Apache Pinot Kubernetes Clust...Kubernetes Clusters At Scale: Managing Hundreds Apache Pinot Kubernetes Clust...
Kubernetes Clusters At Scale: Managing Hundreds Apache Pinot Kubernetes Clust...Xiaoman DONG
 
Measuring IPv6 using ad-based measurement
Measuring IPv6 using ad-based measurementMeasuring IPv6 using ad-based measurement
Measuring IPv6 using ad-based measurementAPNIC
 
Nagios Conference 2013 - Nicolas Brousse - Bringing Business Awareness
Nagios Conference 2013 - Nicolas Brousse - Bringing Business AwarenessNagios Conference 2013 - Nicolas Brousse - Bringing Business Awareness
Nagios Conference 2013 - Nicolas Brousse - Bringing Business AwarenessNagios
 
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and DaemonsQConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemonsaspyker
 
(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...
(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...
(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...Amazon Web Services
 
Bringing Business Awareness to Your Operation Team (Nagios World Conference 2...
Bringing Business Awareness to Your Operation Team (Nagios World Conference 2...Bringing Business Awareness to Your Operation Team (Nagios World Conference 2...
Bringing Business Awareness to Your Operation Team (Nagios World Conference 2...Nicolas Brousse
 
Nagios Conference 2014 - Frank Pantaleo - Nagios Monitoring of Netezza Databases
Nagios Conference 2014 - Frank Pantaleo - Nagios Monitoring of Netezza DatabasesNagios Conference 2014 - Frank Pantaleo - Nagios Monitoring of Netezza Databases
Nagios Conference 2014 - Frank Pantaleo - Nagios Monitoring of Netezza DatabasesNagios
 
Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudDatadog
 
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...Brian Grant
 
WSO2Con USA 2015: Keynote - Kubernetes – A Platform for Automating Deployment...
WSO2Con USA 2015: Keynote - Kubernetes – A Platform for Automating Deployment...WSO2Con USA 2015: Keynote - Kubernetes – A Platform for Automating Deployment...
WSO2Con USA 2015: Keynote - Kubernetes – A Platform for Automating Deployment...WSO2
 
Deploying and managing Solr at scale
Deploying and managing Solr at scaleDeploying and managing Solr at scale
Deploying and managing Solr at scaleAnshum Gupta
 
[AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵
 [AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵 [AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵
[AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵Amazon Web Services Korea
 
Nagios Conference 2014 - Konstantin Benz - Monitoring Openstack The Relations...
Nagios Conference 2014 - Konstantin Benz - Monitoring Openstack The Relations...Nagios Conference 2014 - Konstantin Benz - Monitoring Openstack The Relations...
Nagios Conference 2014 - Konstantin Benz - Monitoring Openstack The Relations...Nagios
 
Getting Started with Serverless Architectures
Getting Started with Serverless ArchitecturesGetting Started with Serverless Architectures
Getting Started with Serverless ArchitecturesAmazon Web Services
 
Rendering Takes Flight
Rendering Takes FlightRendering Takes Flight
Rendering Takes FlightAvere Systems
 
Rendering Takes Flight
Rendering Takes FlightRendering Takes Flight
Rendering Takes FlightMichelle Ray
 
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...Amazon Web Services
 

Similaire à Nagios Conference 2012 - Nicolas Brousse - Optimizing your Monitoring and Trending tools for the Cloud (20)

Optimizing your Monitoring and Trending tools for the Cloud (Nagios World Con...
Optimizing your Monitoring and Trending tools for the Cloud (Nagios World Con...Optimizing your Monitoring and Trending tools for the Cloud (Nagios World Con...
Optimizing your Monitoring and Trending tools for the Cloud (Nagios World Con...
 
Itsummit2015 blizzard
Itsummit2015 blizzardItsummit2015 blizzard
Itsummit2015 blizzard
 
PyCon India 2012: Celery Talk
PyCon India 2012: Celery TalkPyCon India 2012: Celery Talk
PyCon India 2012: Celery Talk
 
Kubernetes Clusters At Scale: Managing Hundreds Apache Pinot Kubernetes Clust...
Kubernetes Clusters At Scale: Managing Hundreds Apache Pinot Kubernetes Clust...Kubernetes Clusters At Scale: Managing Hundreds Apache Pinot Kubernetes Clust...
Kubernetes Clusters At Scale: Managing Hundreds Apache Pinot Kubernetes Clust...
 
Measuring IPv6 using ad-based measurement
Measuring IPv6 using ad-based measurementMeasuring IPv6 using ad-based measurement
Measuring IPv6 using ad-based measurement
 
Nagios Conference 2013 - Nicolas Brousse - Bringing Business Awareness
Nagios Conference 2013 - Nicolas Brousse - Bringing Business AwarenessNagios Conference 2013 - Nicolas Brousse - Bringing Business Awareness
Nagios Conference 2013 - Nicolas Brousse - Bringing Business Awareness
 
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and DaemonsQConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons
 
(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...
(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...
(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...
 
Bringing Business Awareness to Your Operation Team (Nagios World Conference 2...
Bringing Business Awareness to Your Operation Team (Nagios World Conference 2...Bringing Business Awareness to Your Operation Team (Nagios World Conference 2...
Bringing Business Awareness to Your Operation Team (Nagios World Conference 2...
 
Nagios Conference 2014 - Frank Pantaleo - Nagios Monitoring of Netezza Databases
Nagios Conference 2014 - Frank Pantaleo - Nagios Monitoring of Netezza DatabasesNagios Conference 2014 - Frank Pantaleo - Nagios Monitoring of Netezza Databases
Nagios Conference 2014 - Frank Pantaleo - Nagios Monitoring of Netezza Databases
 
Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloud
 
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...
 
WSO2Con USA 2015: Keynote - Kubernetes – A Platform for Automating Deployment...
WSO2Con USA 2015: Keynote - Kubernetes – A Platform for Automating Deployment...WSO2Con USA 2015: Keynote - Kubernetes – A Platform for Automating Deployment...
WSO2Con USA 2015: Keynote - Kubernetes – A Platform for Automating Deployment...
 
Deploying and managing Solr at scale
Deploying and managing Solr at scaleDeploying and managing Solr at scale
Deploying and managing Solr at scale
 
[AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵
 [AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵 [AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵
[AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵
 
Nagios Conference 2014 - Konstantin Benz - Monitoring Openstack The Relations...
Nagios Conference 2014 - Konstantin Benz - Monitoring Openstack The Relations...Nagios Conference 2014 - Konstantin Benz - Monitoring Openstack The Relations...
Nagios Conference 2014 - Konstantin Benz - Monitoring Openstack The Relations...
 
Getting Started with Serverless Architectures
Getting Started with Serverless ArchitecturesGetting Started with Serverless Architectures
Getting Started with Serverless Architectures
 
Rendering Takes Flight
Rendering Takes FlightRendering Takes Flight
Rendering Takes Flight
 
Rendering Takes Flight
Rendering Takes FlightRendering Takes Flight
Rendering Takes Flight
 
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
 

Plus de Nagios

Nagios XI Best Practices
Nagios XI Best PracticesNagios XI Best Practices
Nagios XI Best PracticesNagios
 
Jesse Olson - Nagios Log Server Architecture Overview
Jesse Olson - Nagios Log Server Architecture OverviewJesse Olson - Nagios Log Server Architecture Overview
Jesse Olson - Nagios Log Server Architecture OverviewNagios
 
Trevor McDonald - Nagios XI Under The Hood
Trevor McDonald  - Nagios XI Under The HoodTrevor McDonald  - Nagios XI Under The Hood
Trevor McDonald - Nagios XI Under The HoodNagios
 
Sean Falzon - Nagios - Resilient Notifications
Sean Falzon - Nagios - Resilient NotificationsSean Falzon - Nagios - Resilient Notifications
Sean Falzon - Nagios - Resilient NotificationsNagios
 
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise Edition
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise EditionMarcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise Edition
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise EditionNagios
 
Janice Singh - Writing Custom Nagios Plugins
Janice Singh - Writing Custom Nagios PluginsJanice Singh - Writing Custom Nagios Plugins
Janice Singh - Writing Custom Nagios PluginsNagios
 
Dave Williams - Nagios Log Server - Practical Experience
Dave Williams - Nagios Log Server - Practical ExperienceDave Williams - Nagios Log Server - Practical Experience
Dave Williams - Nagios Log Server - Practical ExperienceNagios
 
Mike Weber - Nagios and Group Deployment of Service Checks
Mike Weber - Nagios and Group Deployment of Service ChecksMike Weber - Nagios and Group Deployment of Service Checks
Mike Weber - Nagios and Group Deployment of Service ChecksNagios
 
Mike Guthrie - Revamping Your 10 Year Old Nagios Installation
Mike Guthrie - Revamping Your 10 Year Old Nagios InstallationMike Guthrie - Revamping Your 10 Year Old Nagios Installation
Mike Guthrie - Revamping Your 10 Year Old Nagios InstallationNagios
 
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...Nagios
 
Matt Bruzek - Monitoring Your Public Cloud With Nagios
Matt Bruzek - Monitoring Your Public Cloud With NagiosMatt Bruzek - Monitoring Your Public Cloud With Nagios
Matt Bruzek - Monitoring Your Public Cloud With NagiosNagios
 
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.Nagios
 
Eric Loyd - Fractal Nagios
Eric Loyd - Fractal NagiosEric Loyd - Fractal Nagios
Eric Loyd - Fractal NagiosNagios
 
Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...
Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...
Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...Nagios
 
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...Nagios
 
Nagios World Conference 2015 - Scott Wilkerson Opening
Nagios World Conference 2015 - Scott Wilkerson OpeningNagios World Conference 2015 - Scott Wilkerson Opening
Nagios World Conference 2015 - Scott Wilkerson OpeningNagios
 
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios CoreNrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios CoreNagios
 
Nagios Log Server - Features
Nagios Log Server - FeaturesNagios Log Server - Features
Nagios Log Server - FeaturesNagios
 
Nagios Network Analyzer - Features
Nagios Network Analyzer - FeaturesNagios Network Analyzer - Features
Nagios Network Analyzer - FeaturesNagios
 
Nagios Conference 2014 - Dorance Martinez Cortes - Customizing Nagios
Nagios Conference 2014 - Dorance Martinez Cortes - Customizing NagiosNagios Conference 2014 - Dorance Martinez Cortes - Customizing Nagios
Nagios Conference 2014 - Dorance Martinez Cortes - Customizing NagiosNagios
 

Plus de Nagios (20)

Nagios XI Best Practices
Nagios XI Best PracticesNagios XI Best Practices
Nagios XI Best Practices
 
Jesse Olson - Nagios Log Server Architecture Overview
Jesse Olson - Nagios Log Server Architecture OverviewJesse Olson - Nagios Log Server Architecture Overview
Jesse Olson - Nagios Log Server Architecture Overview
 
Trevor McDonald - Nagios XI Under The Hood
Trevor McDonald  - Nagios XI Under The HoodTrevor McDonald  - Nagios XI Under The Hood
Trevor McDonald - Nagios XI Under The Hood
 
Sean Falzon - Nagios - Resilient Notifications
Sean Falzon - Nagios - Resilient NotificationsSean Falzon - Nagios - Resilient Notifications
Sean Falzon - Nagios - Resilient Notifications
 
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise Edition
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise EditionMarcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise Edition
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise Edition
 
Janice Singh - Writing Custom Nagios Plugins
Janice Singh - Writing Custom Nagios PluginsJanice Singh - Writing Custom Nagios Plugins
Janice Singh - Writing Custom Nagios Plugins
 
Dave Williams - Nagios Log Server - Practical Experience
Dave Williams - Nagios Log Server - Practical ExperienceDave Williams - Nagios Log Server - Practical Experience
Dave Williams - Nagios Log Server - Practical Experience
 
Mike Weber - Nagios and Group Deployment of Service Checks
Mike Weber - Nagios and Group Deployment of Service ChecksMike Weber - Nagios and Group Deployment of Service Checks
Mike Weber - Nagios and Group Deployment of Service Checks
 
Mike Guthrie - Revamping Your 10 Year Old Nagios Installation
Mike Guthrie - Revamping Your 10 Year Old Nagios InstallationMike Guthrie - Revamping Your 10 Year Old Nagios Installation
Mike Guthrie - Revamping Your 10 Year Old Nagios Installation
 
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...
 
Matt Bruzek - Monitoring Your Public Cloud With Nagios
Matt Bruzek - Monitoring Your Public Cloud With NagiosMatt Bruzek - Monitoring Your Public Cloud With Nagios
Matt Bruzek - Monitoring Your Public Cloud With Nagios
 
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
 
Eric Loyd - Fractal Nagios
Eric Loyd - Fractal NagiosEric Loyd - Fractal Nagios
Eric Loyd - Fractal Nagios
 
Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...
Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...
Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...
 
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
 
Nagios World Conference 2015 - Scott Wilkerson Opening
Nagios World Conference 2015 - Scott Wilkerson OpeningNagios World Conference 2015 - Scott Wilkerson Opening
Nagios World Conference 2015 - Scott Wilkerson Opening
 
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios CoreNrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
 
Nagios Log Server - Features
Nagios Log Server - FeaturesNagios Log Server - Features
Nagios Log Server - Features
 
Nagios Network Analyzer - Features
Nagios Network Analyzer - FeaturesNagios Network Analyzer - Features
Nagios Network Analyzer - Features
 
Nagios Conference 2014 - Dorance Martinez Cortes - Customizing Nagios
Nagios Conference 2014 - Dorance Martinez Cortes - Customizing NagiosNagios Conference 2014 - Dorance Martinez Cortes - Customizing Nagios
Nagios Conference 2014 - Dorance Martinez Cortes - Customizing Nagios
 

Dernier

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 

Dernier (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 

Nagios Conference 2012 - Nicolas Brousse - Optimizing your Monitoring and Trending tools for the Cloud

  • 1. Optimizing your Monitoring and Trending tools for the Cloud Nagios World Conference 2012 Nicolas Brousse Lead Operations Engineer September 28th 2012
  • 2. •  About TubeMogul •  What are some of our challenges? •  Our environment •  Amazon Cloud Environment •  Automated Monitoring •  Efficient on-call rotation •  Efficient monitoring •  What’s next? •  Q&A
  • 3. About TubeMogul •  Founded in 2006 •  Formerly a video distribution and analytics platform •  TubeMogul is a Brand-Focused Video Marketing Company –  Build for Branding –  Integrate real-time media buying, ad serving, targeting, optimization and brand measurement TubeMogul simplifies the delivery of video ads and maximizes the impact of every dollar spent by brand marketers http://www.tubemogul.com
  • 4. What are some of our challenges? •  Monitoring between 700 to 1000 servers •  Servers spread across 6 different locations –  4 Amazon EC2 Regions (our public cloud provider) –  1 Hosted (Liquidweb) & 1 VPS (Linode) •  Little monitoring resources –  Collecting over 115,000 metrics –  Monitoring over 20,000 services with Nagios •  Multiple billions of HTTP requests a day –  Most of it must be served in less than 100ms –  Lost of traffic could mean lost of business opportunity –  Or worst, over-spending…
  • 5. Our environment •  Over 80 different server profiles •  Our stack: –  Java (Embedded Jetty, Tomcat) –  PHP, RoR –  Hadoop: HDFS, M/R, Hbase, Hive –  Couchbase –  MySQL •  Monitoring: Nagios, NSCA •  Graphing: Ganglia, sFlow, Graphite •  Configuration Management: Puppet
  • 7. Amazon Cloud Environment •  We use EC2, SDB, SQS, EMR, S3, etc. •  We don’t use ELB •  We heavily use EC2 Tags ec2-describe-instances -F tag:hostname=dev-build01
  • 9. Automated Monitoring Configuring Ganglia using Puppet templates globals { … override_hostname = <%= scope.lookupvar('hostname') %> … } udp_send_channel { host = <%= scope.lookupvar('ec2_tag_nagios_host') %> port = 8649 ttl = 1 } sflow { udp_port = 6343 accept_jvm_metrics = yes multiple_jvm_instances = yes }
  • 10. Automated Monitoring Or configuring Host sFlow using Puppet templates sflow{ DNSSD = off polling = 20 sampling = 512 collector{ ip = <%= ec2_tag_nagios_host %> udpport = 6343 } }
  • 11. Automated Monitoring •  Puppet configure our monitoring instances –  We use Nagios regex : use_regexp_matching=1 –  But we don’t use true regex : use_true_regexp_matching=0 –  We use NSCA with Upstart –  We don’t use the perfdata –  We use pre-cached objects –  We includes our configurations from 3 directories •  objects => templates, contacts, commands, event_handlers •  servers => contain a configuration file for each server •  clusters => contain a configuration file for each cluster
  • 12. Automated Monitoring Process of event when starting a new host and add it to our monitoring: 1.  We start a new instance using Cerveza and Cloud-init 2.  Puppet configure Gmond or Host sFlow on the instance 3.  Our monitoring server running Gmond and Gmetad get data from the new instance 4.  A Nagios check run every minute and check for new hosts •  Look for new hosts using EC2 API •  Look for EC2 tag “hostname” to confirm it’s a legit host, not a zombie / fail start •  Look for EC2 tag “nagios_host” to see if the host belong to this monitoring instance 5.  If a new host is found: •  We build a config for the host based on a template file and doing some string replace •  Once all config have been generated, we rebuild pre-cache objects and reload Nagios 6.  If we find “Zombie” host, we generate a Warning alert 7.  If the config is corrupt, we send a Critical alert
  • 15. Efficient on-call rotation •  Follow the sun –  Some of our team is in Ukraine, no more Tier 1 night on-call for us •  Nagios timeperiod and escalation are a pain to maintain –  Nagios notification plugged to Google Calendar •  Using our own notification script for email and paging •  Google Clendar make it easy for each team to manage their own on-call calendar •  Support for multiple Tier and complex schedules •  Caching Google Calendar info locally every hour –  Simpler definitions and rules in Nagios contacts –  Notify only people on-call, unless they asked for “off call” emails
  • 16. Efficient on-call rotation Using Google Calendar…
  • 17. Efficient on-call rotation •  Simple contact definitions •  Google Calendar info •  Tier Filter (Regex) •  Tier Interval (time to wait before escalating alert since last tier) •  Off call email
  • 19. Efficient on-call rotation On-call contact fetched from Google Calendar at the bottom of the alert makes our life easier!
  • 20. Efficient monitoring We disable most notification and only care of a cluster status
  • 21. Efficient monitoring Most of our checks are based on Ganglia RRD files
  • 22. Efficient monitoring It become really easy to monitor any metrics returned by Ganglia
  • 23. Efficient monitoring We can check cluster status by hosts/services but also per returned messages !
  • 25. Efficient monitoring Using Graphite Federated Storage –  One place to see all our metrics from all the world –  No delay due to rsync of RRD files –  Graph close to real time, delay only due to rrdcached flushing interval
  • 26. What’s next? Some hot topics… •  Do trending alert with Nagios based on Graphite/Ganglia data •  Better automation for non-cloud servers •  Ensure we can scale our monitoring when using hybrid cloud (Eucalyptus) or multiple public cloud provider •  Get a better centralized view of our different Nagios
  • 27. Ops @ TubeMogul All this wouldn’t be possible without a strong System Operation team Andrey Shestakov Eamon Bisson Donahue Justin Francisconi Marylene Tanfin Nicolas Brousse Stan Rudenko
  • 28. Thank you… TubeMogul is Hiring ! http://www.tubemogul.com/jobs jobs@tubemogul.com Follow us on Twitter @TubeMogul @orieg