Monitoring is easy;
                     why do we suck at it?


                            /   monitoring it all
Tuesday, November 8, 2011
Who is this guy?                                                        @postwait

                Author of “Scalable Internet Architectures”
                Pearson, ISBN: 067232699X

                Contributor to “Web Operations”
                O’Reilly, ISBN: 978-1-4493-7744-1



                Founder of OmniTI, Message Systems, Fontdeck, & Circonus
                I like to tackle problems that are “always on” and “always growing.”




                I am an Engineer
                A practitioner of academic computing.
                IEEE member and Senior ACM member.
                On the Editorial Board of ACM’s Queue magazine.



Monitoring: let’s start with a definition.




                       •    analytics

                       •    trending

                       •    fault-detection / alerting

                       •    capacity planning


                       •    it is the collection and use of telemetry data




What monitoring is not




                       •    controls


                       •    via monitoring, you observe;
                            you do not influence




So why do we suck at it?



                            tl;dr
                            because we think about

                             •   networks,

                             •   systems, and

                             •   applications
                            instead of what matters: business.




Your purpose




                       •    Your purpose is to make
                            your company’s web business
                            operate.

                            (hence: “web operations”)




Your purpose




                       •    ensure business success




Understanding your purpose




                       •    who defines business success?

                            •   shareholders, ultimately

                            •   the board of directors, in their stead

                            •   the CEO on an operational, day-to-day basis




Understanding your purpose




                       •    Assuming your CEO is doing a good job

                            •   the executive team understands the metrics behind business success


                       •    Assuming the executive team is competent

                            •   their reports understand these metrics
                                (at least the pertinent ones)




Pertinent == Problematic




                       •    You enable all aspects of the business

                       •    All these metrics are pertinent




But why?




                       •    You could simply track stuff that is in your purview.

                       •    Why not?




Technology



                       •    As a technology operations group,
                            you have the technology.




                                           We can rebuild him.
                                           We have the technology.
                                           We can make him better than he was.
                                           Better...stronger...faster.
                                                                    - Oscar Goldman

Why is our technology better?




                       •    Simply put: MTTD (mean time to detect)




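MTTD is usually expanded as mean time to detect: the average delay between the moment a problem starts and the moment someone (or something) notices it. A minimal sketch of the arithmetic, assuming hypothetical incident records that carry a `started` and a `detected` timestamp; the `Incident` type and its field names are illustrative, not taken from any particular tool:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record: when the fault began and when it was noticed."""
    started: datetime
    detected: datetime


def mean_time_to_detect(incidents: list[Incident]) -> timedelta:
    """Average of (detected - started) across all incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((i.detected - i.started for i in incidents), timedelta())
    return total / len(incidents)


if __name__ == "__main__":
    t0 = datetime(2011, 11, 8, 12, 0, 0)
    incidents = [
        Incident(started=t0, detected=t0 + timedelta(minutes=2)),
        Incident(started=t0, detected=t0 + timedelta(minutes=10)),
    ]
    print(mean_time_to_detect(incidents))  # 0:06:00
```

Faster detection shrinks every `detected - started` term, and with it the mean.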
Now, what about your purview?




                       •    Obviously, monitoring the business is useful.

                       •    However, you cannot directly affect the business.

                       •    You indirectly affect it by operating the web portion.




What can you change?



                       •    You can control:

                            •   releases,

                            •   performance,

                            •   stability,

                            •   computing resources,

                            •   networking,

                            •   and availability.



Visualize!




                       •    All this information must be presented visually.




Text.




                       •    Text is incredibly useful.

                       •    Consider: deployment.




Code Deployment




                            r82394 (by corey)    1h 7m 9s    ago
                              previous deploy    1h 42m 18s ago
                                                11 deploys today




Code Deployment




                            r82394            15:03:14 2011/06/15
                              previous deploy      1h 42m 18s ago
                                                  11 deploys today
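These two slides contrast a relative rendering ("1h 7m 9s ago") with an absolute one ("15:03:14 2011/06/15") for the same revision; the relative form answers "how long ago?" without the reader doing the subtraction. A minimal sketch of the relative rendering, assuming deploy times are available as Unix timestamps; `format_ago` and `deploy_board` are hypothetical helpers, and only the output format is taken from the slide:

```python
def format_ago(elapsed_seconds: int) -> str:
    """Render an elapsed duration like '1h 7m 9s ago', dropping leading zero units."""
    if elapsed_seconds < 0:
        raise ValueError("elapsed time cannot be negative")
    hours, remainder = divmod(int(elapsed_seconds), 3600)
    minutes, seconds = divmod(remainder, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if hours or minutes:
        parts.append(f"{minutes}m")
    parts.append(f"{seconds}s")
    return " ".join(parts) + " ago"


def deploy_board(revision: str, author: str, deploy_times: list[int],
                 now: int, day_start: int) -> list[str]:
    """Build the lines of a deployment status board.

    deploy_times are Unix timestamps of deploys; day_start is midnight today.
    """
    times = sorted(deploy_times)
    if not times:
        return ["no deploys recorded"]
    lines = [f"{revision} (by {author})  {format_ago(now - times[-1])}"]
    if len(times) > 1:
        lines.append(f"previous deploy  {format_ago(now - times[-2])}")
    todays = sum(1 for t in times if t >= day_start)
    lines.append(f"{todays} deploys today")
    return lines
```

Keeping the absolute timestamp available (for example, on hover) while showing the relative one by default is a reasonable compromise.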




Text.




                       •    Numbers are trickier.

                       •    So many representations from which to choose.




Beware




Gauges require understanding




                       •    Gauges imply a deep understanding of

                            •   bounds, and

                            •   tolerances
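A gauge is only honest when its bounds are fixed and its tolerance bands are known up front. A minimal sketch, assuming a hypothetical power-per-rack gauge with a fixed 5,000 W capacity and warning/critical bands at 80% and 90% of that range; the capacity and thresholds are illustrative, not recommendations:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gauge:
    """A dial with fixed bounds and tolerance bands (fractions of the range)."""
    minimum: float
    maximum: float
    warn_at: float = 0.8  # hypothetical warning threshold
    crit_at: float = 0.9  # hypothetical critical threshold

    def __post_init__(self) -> None:
        if self.maximum <= self.minimum:
            raise ValueError("a gauge needs a fixed, non-empty range")
        if not 0.0 <= self.warn_at <= self.crit_at <= 1.0:
            raise ValueError("tolerances must satisfy 0 <= warn_at <= crit_at <= 1")

    def fraction(self, value: float) -> float:
        """Where value sits within [minimum, maximum], clamped to [0, 1]."""
        span = self.maximum - self.minimum
        return min(1.0, max(0.0, (value - self.minimum) / span))

    def status(self, value: float) -> str:
        """Classify a reading against the tolerance bands."""
        position = self.fraction(value)
        if position >= self.crit_at:
            return "critical"
        if position >= self.warn_at:
            return "warning"
        return "ok"


rack_power = Gauge(minimum=0.0, maximum=5000.0)  # watts, hypothetical rack
```

For metrics such as IOPS or requests per second there is no fixed `maximum` to pass in, so a dial drawn for them misrepresents the value as soon as the range shifts.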




Gauges require understanding




                       •    General advice

                            •   If the range will ever change, don’t use gauges




Gauges require understanding




                       •    Great for:

                            •   percentages,

                            •   temperature,

                            •   power per rack,

                            •   bandwidth per uplink




Gauges require understanding




                       •    Bad for:

                            •   IOPS,

                            •   current visitor counts,

                            •   requests per second,

                            •   bandwidth overall




Graphs are often better




Even little ones
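The image for this slide did not survive extraction. A common "little" graph is the sparkline: a word-sized row of bars that shows the recent shape of a series next to its current value. A minimal text rendering, assuming Unicode block characters and per-series min/max scaling (a choice made here for illustration; a fixed shared scale makes several series comparable to one another):

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight bar heights, lowest to highest


def sparkline(values: list[float]) -> str:
    """Map each value onto one of eight block heights, scaled to the series' own range."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series has no shape to show; draw the lowest bar throughout.
        return BLOCKS[0] * len(values)
    steps = len(BLOCKS) - 1
    return "".join(BLOCKS[round((v - lo) / span * steps)] for v in values)
```

A line such as `f"req/s {current:>6}  {sparkline(last_hour)}"` (with hypothetical variable names) puts the number and its recent history in the same glance.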




Think relatively
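The figure for this slide did not survive extraction. One concrete reading of "think relatively" is to report a value against a baseline instead of on its own. A minimal sketch, assuming the baseline is the same metric at the same time one week earlier (an illustrative choice, not one the slide specifies); `relative_change` and `describe` are hypothetical helpers:

```python
def relative_change(current: float, baseline: float) -> float:
    """Fractional change of current versus baseline (0.25 means 25% higher)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline


def describe(current: float, baseline: float, label: str = "last week") -> str:
    """Human-readable comparison, e.g. '25% above last week'."""
    change = relative_change(current, baseline)
    direction = "above" if change >= 0 else "below"
    return f"{abs(change):.0%} {direction} {label}"
```

"1,250 requests/s" says little on its own; "25% above last week" tells the reader whether to care.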




Users live all around the world




                       •    Users live just about everywhere

                       •    “Where?” is a useful question




Geolocation




Geolocation is interesting




                       •    to marketing

                       •    to legal

                       •    (okay, to everyone)


                       •    but not so useful to operations




Geolocation is interesting




                       •    perhaps more interesting




Geolocation




                       •    Internet location != geo-political location




ASN location


                       •    The closest thing to geo-political boundaries is peering



       -bash-4.0$ /usr/sbin/bgpctl show rib 66.78.236.243
       flags: * = Valid, > = Selected, I = via IBGP, A = Announced
       origin: i = IGP, e = EGP, ? = Incomplete

       flags destination                  gateway         lpref   med aspath origin
             66.78.236.0/22               64.202.119.7      100     0 23352 4436 2914 3356 32778 i

       ### ASN 32778 is “Smart City Networks, L.P.”




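In a BGP AS path the rightmost AS number is the origin AS, the network that announced the prefix; in the route above that is 32778, and the looked-up address 66.78.236.243 falls inside the announced 66.78.236.0/22. A minimal sketch that extracts the origin AS from a route line, assuming the column layout of the sample above (optional flags, prefix, gateway, lpref, med, AS path, origin code). It is built from this one sample rather than from bgpctl documentation, and it does not handle AS sets (`{...}`):

```python
import ipaddress


def parse_rib_route(line: str):
    """Return (network, as_path) from one route line of `bgpctl show rib` output."""
    tokens = line.split()
    # Skip any leading flag characters (e.g. "*>") up to the CIDR prefix.
    prefix_idx = next((i for i, t in enumerate(tokens) if "/" in t), None)
    if prefix_idx is None:
        raise ValueError("no prefix found")
    network = ipaddress.ip_network(tokens[prefix_idx])
    # After the prefix come gateway, lpref and med; the last token is the
    # origin code (i, e or ?). Everything in between is the AS path.
    as_path = [int(t) for t in tokens[prefix_idx + 4 : -1]]
    if not as_path:
        raise ValueError("empty AS path")
    return network, as_path


def origin_asn(line: str) -> int:
    """The rightmost ASN in the path is the AS that originated the prefix."""
    return parse_rib_route(line)[1][-1]
```

Grouping visitors by origin AS instead of by country shows which networks, and therefore which peering paths, your traffic actually crosses.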
What about the business?




                            Authorizations : Hard Failed : Soft Failed : Releases
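The legend above puts authorizations, hard and soft failures, and releases on a single chart. Overlaying releases on business outcomes is what lets you ask whether a given deploy moved the failure rate. A minimal sketch of that question, assuming hypothetical per-minute counters in which `authorizations` counts successes, and a fixed 15-minute window on each side of a release (both are assumptions, not details from the slide):

```python
from dataclasses import dataclass


@dataclass
class MinuteCounts:
    """Hypothetical per-minute counters for one business flow."""
    minute: int
    authorizations: int  # assumed to count successful authorizations
    hard_failed: int
    soft_failed: int


def failure_rate(samples: list[MinuteCounts]) -> float:
    """Share of attempts that failed (hard or soft) across the samples."""
    attempts = sum(s.authorizations + s.hard_failed + s.soft_failed for s in samples)
    failures = sum(s.hard_failed + s.soft_failed for s in samples)
    return failures / attempts if attempts else 0.0


def release_impact(samples: list[MinuteCounts], release_minute: int,
                   window: int = 15) -> tuple[float, float]:
    """Failure rate in the `window` minutes before and after a release."""
    before = [s for s in samples
              if release_minute - window <= s.minute < release_minute]
    after = [s for s in samples
             if release_minute <= s.minute < release_minute + window]
    return failure_rate(before), failure_rate(after)
```

A jump from the "before" rate to the "after" rate right at a release marker is the signal the combined chart makes visible; the code only turns that visual comparison into numbers.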


Is that all?




                       •    Hells no.




It’s all about real-time




                       •    Everything so far is old hat (maybe)

                       •    Every business unit has visualizations like this


                       •    You need to combine the data

                       •    You need to make it real-time
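Making these views real-time largely means computing them continuously over a trailing window rather than from end-of-day batches. A minimal sketch of a trailing-window event rate, assuming events are recorded in timestamp order and timestamps are in seconds; `SlidingWindowRate` is a hypothetical name, not a library class:

```python
from collections import deque


class SlidingWindowRate:
    """Events per second over the trailing `window` seconds."""

    def __init__(self, window: float = 60.0) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self._events: deque = deque()  # timestamps, oldest first

    def record(self, timestamp: float) -> None:
        # Assumes timestamps arrive in non-decreasing order.
        self._events.append(timestamp)

    def rate(self, now: float) -> float:
        # Evict events that are at least `window` seconds old.
        cutoff = now - self.window
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events) / self.window
```

One instance per stream (authorizations, failures, releases) feeding the same dashboard is one simple way to combine the data on a live view.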




Thanks




                       •    web demo ensues...




Tuesday, November 8, 2011

More Related Content

Viewers also liked

Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observabilityTheo Schlossnagle
 
Applying operations culture to everything
Applying operations culture to everythingApplying operations culture to everything
Applying operations culture to everythingTheo Schlossnagle
 
Velocity 2010: Scalable Internet Architectures
Velocity 2010: Scalable Internet ArchitecturesVelocity 2010: Scalable Internet Architectures
Velocity 2010: Scalable Internet ArchitecturesTheo Schlossnagle
 
The math behind big systems analysis.
The math behind big systems analysis.The math behind big systems analysis.
The math behind big systems analysis.Theo Schlossnagle
 
Big Bad PostgreSQL @ Percona
Big Bad PostgreSQL @ PerconaBig Bad PostgreSQL @ Percona
Big Bad PostgreSQL @ PerconaTheo Schlossnagle
 
Scalable Internet Architecture
Scalable Internet ArchitectureScalable Internet Architecture
Scalable Internet ArchitectureTheo Schlossnagle
 
A Coherent Discussion About Performance
A Coherent Discussion About PerformanceA Coherent Discussion About Performance
A Coherent Discussion About PerformanceTheo Schlossnagle
 
Wireless telemetry systems
Wireless telemetry systemsWireless telemetry systems
Wireless telemetry systemsSneha Suluru
 
Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observabilityTheo Schlossnagle
 
Telemetry types, frequency,position and multiplexing in telemetry
Telemetry types, frequency,position and multiplexing in telemetryTelemetry types, frequency,position and multiplexing in telemetry
Telemetry types, frequency,position and multiplexing in telemetrysagheer ahmed
 

Viewers also liked (20)

Craftsmanship
CraftsmanshipCraftsmanship
Craftsmanship
 
It's all about telemetry
It's all about telemetryIt's all about telemetry
It's all about telemetry
 
Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observability
 
Applying operations culture to everything
Applying operations culture to everythingApplying operations culture to everything
Applying operations culture to everything
 
PostgreSQL on Solaris
PostgreSQL on SolarisPostgreSQL on Solaris
PostgreSQL on Solaris
 
Velocity 2010: Scalable Internet Architectures
Velocity 2010: Scalable Internet ArchitecturesVelocity 2010: Scalable Internet Architectures
Velocity 2010: Scalable Internet Architectures
 
What's in a number?
What's in a number?What's in a number?
What's in a number?
 
The math behind big systems analysis.
The math behind big systems analysis.The math behind big systems analysis.
The math behind big systems analysis.
 
Atldevops
AtldevopsAtldevops
Atldevops
 
Understanding Slowness
Understanding SlownessUnderstanding Slowness
Understanding Slowness
 
Xtreme Deployment
Xtreme DeploymentXtreme Deployment
Xtreme Deployment
 
Big Bad PostgreSQL @ Percona
Big Bad PostgreSQL @ PerconaBig Bad PostgreSQL @ Percona
Big Bad PostgreSQL @ Percona
 
SRECon Coherent Performance
SRECon Coherent PerformanceSRECon Coherent Performance
SRECon Coherent Performance
 
Adaptive availability
Adaptive availabilityAdaptive availability
Adaptive availability
 
Scalable Internet Architecture
Scalable Internet ArchitectureScalable Internet Architecture
Scalable Internet Architecture
 
A Coherent Discussion About Performance
A Coherent Discussion About PerformanceA Coherent Discussion About Performance
A Coherent Discussion About Performance
 
Telrmetry1
Telrmetry1Telrmetry1
Telrmetry1
 
Wireless telemetry systems
Wireless telemetry systemsWireless telemetry systems
Wireless telemetry systems
 
Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observability
 
Telemetry types, frequency,position and multiplexing in telemetry
Telemetry types, frequency,position and multiplexing in telemetryTelemetry types, frequency,position and multiplexing in telemetry
Telemetry types, frequency,position and multiplexing in telemetry
 

Similar to Monitoring is easy, why are we so bad at it presentation

Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011) Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011) Leonardo Borges
 
Practical Cloud Security
Practical Cloud SecurityPractical Cloud Security
Practical Cloud SecurityJason Chan
 
Atlassian RoadTrip 2011 Slide Deck
Atlassian RoadTrip 2011 Slide DeckAtlassian RoadTrip 2011 Slide Deck
Atlassian RoadTrip 2011 Slide DeckAtlassian
 
JavaSE - The road forward
JavaSE - The road forwardJavaSE - The road forward
JavaSE - The road forwardeug3n_cojocaru
 
LISA 2011 Keynote: The DevOps Transformation
LISA 2011 Keynote: The DevOps TransformationLISA 2011 Keynote: The DevOps Transformation
LISA 2011 Keynote: The DevOps Transformationbenrockwood
 
SplunkLive New York 2011: DealerTrack
SplunkLive New York 2011: DealerTrackSplunkLive New York 2011: DealerTrack
SplunkLive New York 2011: DealerTrackSplunk
 
Puppet camp europe 2011 hackability
Puppet camp europe 2011   hackabilityPuppet camp europe 2011   hackability
Puppet camp europe 2011 hackabilityPuppet
 
Software on the High Seas
Software on the High SeasSoftware on the High Seas
Software on the High SeasSoren Harner
 
Migration from Fast ESP to Lucene Solr - Michael McIntosh
Migration from Fast ESP to Lucene Solr - Michael McIntoshMigration from Fast ESP to Lucene Solr - Michael McIntosh
Migration from Fast ESP to Lucene Solr - Michael McIntoshlucenerevolution
 
Devopsdays Goteborg 2011 - State of the Union
Devopsdays Goteborg 2011 - State of the UnionDevopsdays Goteborg 2011 - State of the Union
Devopsdays Goteborg 2011 - State of the UnionJohn Willis
 
A Look at the Future of HTML5
A Look at the Future of HTML5A Look at the Future of HTML5
A Look at the Future of HTML5Tim Wright
 
20110903 candycane
20110903 candycane20110903 candycane
20110903 candycaneYusuke Ando
 
Devops workshop unit2
Devops workshop unit2Devops workshop unit2
Devops workshop unit2John Willis
 
Community Code: Xero
Community Code: XeroCommunity Code: Xero
Community Code: XeroSencha
 
Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011
Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011
Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011Michael McIntosh
 
Esp2solr eurocon-2011-presentation-111021215049-phpapp02
Esp2solr eurocon-2011-presentation-111021215049-phpapp02Esp2solr eurocon-2011-presentation-111021215049-phpapp02
Esp2solr eurocon-2011-presentation-111021215049-phpapp02TNR Global
 
GT Logiciel Libre - Convention Systematic 2011
GT Logiciel Libre - Convention Systematic 2011GT Logiciel Libre - Convention Systematic 2011
GT Logiciel Libre - Convention Systematic 2011Stefane Fermigier
 
Performance Optimization for Ext GWT 3.0
Performance Optimization for Ext GWT 3.0Performance Optimization for Ext GWT 3.0
Performance Optimization for Ext GWT 3.0Sencha
 
PyCon 2011 Scaling Disqus
PyCon 2011 Scaling DisqusPyCon 2011 Scaling Disqus
PyCon 2011 Scaling Disquszeeg
 
Infusion for the birds
Infusion for the birdsInfusion for the birds
Infusion for the birdscolinbdclark
 

Similar to Monitoring is easy, why are we so bad at it presentation (20)

Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011) Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011)
 
Practical Cloud Security
Practical Cloud SecurityPractical Cloud Security
Practical Cloud Security
 
Atlassian RoadTrip 2011 Slide Deck
Atlassian RoadTrip 2011 Slide DeckAtlassian RoadTrip 2011 Slide Deck
Atlassian RoadTrip 2011 Slide Deck
 
JavaSE - The road forward
JavaSE - The road forwardJavaSE - The road forward
JavaSE - The road forward
 
LISA 2011 Keynote: The DevOps Transformation
LISA 2011 Keynote: The DevOps TransformationLISA 2011 Keynote: The DevOps Transformation
LISA 2011 Keynote: The DevOps Transformation
 
SplunkLive New York 2011: DealerTrack
SplunkLive New York 2011: DealerTrackSplunkLive New York 2011: DealerTrack
SplunkLive New York 2011: DealerTrack
 
Puppet camp europe 2011 hackability
Puppet camp europe 2011   hackabilityPuppet camp europe 2011   hackability
Puppet camp europe 2011 hackability
 
Software on the High Seas
Software on the High SeasSoftware on the High Seas
Software on the High Seas
 
Migration from Fast ESP to Lucene Solr - Michael McIntosh
Migration from Fast ESP to Lucene Solr - Michael McIntoshMigration from Fast ESP to Lucene Solr - Michael McIntosh
Migration from Fast ESP to Lucene Solr - Michael McIntosh
 
Devopsdays Goteborg 2011 - State of the Union
Devopsdays Goteborg 2011 - State of the UnionDevopsdays Goteborg 2011 - State of the Union
Devopsdays Goteborg 2011 - State of the Union
 
A Look at the Future of HTML5
A Look at the Future of HTML5A Look at the Future of HTML5
A Look at the Future of HTML5
 
20110903 candycane
20110903 candycane20110903 candycane
20110903 candycane
 
Devops workshop unit2
Devops workshop unit2Devops workshop unit2
Devops workshop unit2
 
Community Code: Xero
Community Code: XeroCommunity Code: Xero
Community Code: Xero
 
Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011
Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011
Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011
 
Esp2solr eurocon-2011-presentation-111021215049-phpapp02
Esp2solr eurocon-2011-presentation-111021215049-phpapp02Esp2solr eurocon-2011-presentation-111021215049-phpapp02
Esp2solr eurocon-2011-presentation-111021215049-phpapp02
 
GT Logiciel Libre - Convention Systematic 2011
GT Logiciel Libre - Convention Systematic 2011GT Logiciel Libre - Convention Systematic 2011
GT Logiciel Libre - Convention Systematic 2011
 
Performance Optimization for Ext GWT 3.0
Performance Optimization for Ext GWT 3.0Performance Optimization for Ext GWT 3.0
Performance Optimization for Ext GWT 3.0
 
PyCon 2011 Scaling Disqus
PyCon 2011 Scaling DisqusPyCon 2011 Scaling Disqus
PyCon 2011 Scaling Disqus
 
Infusion for the birds
Infusion for the birdsInfusion for the birds
Infusion for the birds
 

More from Theo Schlossnagle

Adding Simplicity to Complexity
Adding Simplicity to ComplexityAdding Simplicity to Complexity
Adding Simplicity to ComplexityTheo Schlossnagle
 
Put Some SRE in Your Shipped Software
Put Some SRE in Your Shipped SoftwarePut Some SRE in Your Shipped Software
Put Some SRE in Your Shipped SoftwareTheo Schlossnagle
 
Distributed Systems - Like It Or Not
Distributed Systems - Like It Or NotDistributed Systems - Like It Or Not
Distributed Systems - Like It Or NotTheo Schlossnagle
 
Applying SRE techniques to micro service design
Applying SRE techniques to micro service designApplying SRE techniques to micro service design
Applying SRE techniques to micro service designTheo Schlossnagle
 
Social improvements in monitoring
Social improvements in monitoringSocial improvements in monitoring
Social improvements in monitoringTheo Schlossnagle
 
Building Scalable Systems: an asynchronous approach
Building Scalable Systems: an asynchronous approachBuilding Scalable Systems: an asynchronous approach
Building Scalable Systems: an asynchronous approachTheo Schlossnagle
 

More from Theo Schlossnagle (12)

Adding Simplicity to Complexity
Adding Simplicity to ComplexityAdding Simplicity to Complexity
Adding Simplicity to Complexity
 
Put Some SRE in Your Shipped Software
Put Some SRE in Your Shipped SoftwarePut Some SRE in Your Shipped Software
Put Some SRE in Your Shipped Software
 
Monitoring 101
Monitoring 101Monitoring 101
Monitoring 101
 
Distributed Systems - Like It Or Not
Distributed Systems - Like It Or NotDistributed Systems - Like It Or Not
Distributed Systems - Like It Or Not
 
Applying SRE techniques to micro service design
Applying SRE techniques to micro service designApplying SRE techniques to micro service design
Applying SRE techniques to micro service design
 
Commandments of scale
Commandments of scaleCommandments of scale
Commandments of scale
 
Monitoring the #DevOps way
Monitoring the #DevOps wayMonitoring the #DevOps way
Monitoring the #DevOps way
 
Operational Software Design
Operational Software DesignOperational Software Design
Operational Software Design
 
Is this normal?
Is this normal?Is this normal?
Is this normal?
 
Social improvements in monitoring
Social improvements in monitoringSocial improvements in monitoring
Social improvements in monitoring
 
Building Scalable Systems: an asynchronous approach
Building Scalable Systems: an asynchronous approachBuilding Scalable Systems: an asynchronous approach
Building Scalable Systems: an asynchronous approach
 
Http front-ends
Http front-endsHttp front-ends
Http front-ends
 

Recently uploaded

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Recently uploaded (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Monitoring is easy, why are we so bad at it presentation

  • 1. Monitoring is easy; why do we suck at it? / monitoring it all Tuesday, November 8, 2011
  • 2. Who is this guy? (@postwait) • Author of “Scalable Internet Architectures”, Pearson, ISBN: 067232699X • Contributor to “Web Operations”, O’Reilly, ISBN: 978-1-4493-7744-1 • Founder of OmniTI, Message Systems, Fontdeck, & Circonus • I like to tackle problems that are “always on” and “always growing.” • I am an Engineer: a practitioner of academic computing. • IEEE member and Senior ACM member. • On the Editorial Board of ACM’s Queue magazine.
  • 3. Monitoring: let’s start with a definition. • analytics • trending • fault-detection / alerting • capacity planning • it is the collection and use of telemetry data
  • 4. What monitoring is not • controls • via monitoring, you observe; you do not influence
  • 5. So why do we suck at it? tl;dr because we think about • networks, • systems, and • applications instead of what matters: business.
  • 6. Your purpose • Your purpose is to make your company’s web business operate. (hence: “web operations”)
  • 8. Your purpose • ensure business success
  • 9. Understanding your purpose • who defines business success? • shareholders, ultimately • the board of directors, in their stead • the CEO on an operational, day-to-day basis
  • 10. Understanding your purpose • Assuming your CEO is doing a good job • the executive team understands these metrics • Assuming the executive team is competent • their reports understand these metrics (at least the pertinent ones)
  • 11. Pertinent == Problematic • You enable all aspects of the business • All these metrics are pertinent
  • 12. But why? • You could simply track stuff that is in your purview. • Why not?
  • 13. Technology • As a technology operations group, you have the technology. • “We can rebuild him. We have the technology. We can make him better than he was. Better...stronger...faster.” - Oscar Goldman
  • 14. Why is our technology better? • Simply put: MTTD (mean time to detect)
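MTTD is the average time between a fault beginning and its detection. A minimal Python sketch of the arithmetic, assuming each incident is recorded as a hypothetical `(started_at, detected_at)` pair of datetimes (the record shape and the sample data are illustrative, not from the talk):

```python
from datetime import datetime, timedelta

def mean_time_to_detect(incidents):
    """Average of (detected_at - started_at) over all incidents.

    `incidents` is a list of (started_at, detected_at) datetime pairs;
    this record shape is an illustrative assumption, not a standard.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((detected - started for started, detected in incidents), timedelta())
    return total / len(incidents)

# Two hypothetical incidents: detected after 4 minutes and after 2 minutes.
incidents = [
    (datetime(2011, 6, 15, 15, 0), datetime(2011, 6, 15, 15, 4)),
    (datetime(2011, 6, 15, 16, 0), datetime(2011, 6, 15, 16, 2)),
]
print(mean_time_to_detect(incidents))  # 0:03:00
```

Each term shrinks when telemetry surfaces the fault sooner, so the average is a direct measure of how quickly the monitoring notices trouble.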
  • 15. Now, what about your purview? • Obviously monitoring the business is useful. • However, you cannot directly affect business. • You indirectly affect it by operating the web portion.
  • 16. What can you change? • You can control: • releases, • performance, • stability, • computing resources, • networking, • and availability.
  • 17. Visualize! • All this information must be presented visually.
  • 18. Text. • Text is incredibly useful. • Consider: deployment.
  • 19. Code Deployment r82394 (by corey) 1h 7m 9s ago previous deploy 1h 42m 18s ago 11 deploys today
  • 20. Code Deployment r82394 15:03:14 2011/06/15 previous deploy 1h 42m 18s ago 11 deploys today
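The deployment slides contrast a relative rendering ("1h 7m 9s ago") with an absolute timestamp ("15:03:14 2011/06/15"); the relative form lets a reader see at a glance how long ago something changed. A small Python sketch of that formatting, where `humanize_elapsed` is a hypothetical helper and not part of any tool named in the talk:

```python
def humanize_elapsed(seconds: int) -> str:
    """Render an elapsed duration in the style '1h 7m 9s ago'."""
    if seconds < 0:
        raise ValueError("elapsed time cannot be negative")
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if hours or minutes:  # keep "0m" once hours are shown, e.g. "1h 0m 5s"
        parts.append(f"{minutes}m")
    parts.append(f"{secs}s")
    return " ".join(parts) + " ago"

print(humanize_elapsed(4029))  # 1h 7m 9s ago
print(humanize_elapsed(6138))  # 1h 42m 18s ago
```

Leading zero units are dropped ("45s ago" rather than "0h 0m 45s ago"), which keeps the most recent deploys the shortest strings on the screen.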
  • 25. Text. • Numbers are trickier. • So many representations from which to choose.
  • 30. Gauges require understanding • Gauges imply a deep understanding of • bounds, and • tolerances
  • 31. Gauges require understanding • General advice • If the range will ever change, don’t use gauges
  • 32. Gauges require understanding • Great for: • percentages, • temperature, • power per rack, • bandwidth per uplink
  • 33. Gauges require understanding • Bad for: • IOPS, • current visitor counts, • requests per second, • bandwidth overall
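A gauge is only honest when its bounds are fixed in advance, which is why percentages and per-rack power fit and open-ended rates do not. A minimal Python model, assuming a hypothetical `Gauge` with fixed `low`/`high` bounds and warning/critical tolerance thresholds, shows what goes wrong when a reading exceeds the range: the needle pins at the top and the dial stops conveying anything.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gauge:
    """Hypothetical dial model: fixed bounds plus tolerance thresholds."""
    low: float
    high: float
    warn_at: float  # readings at or above this are in the warning zone
    crit_at: float  # readings at or above this are out of tolerance

    def __post_init__(self):
        if not (self.low < self.warn_at <= self.crit_at <= self.high):
            raise ValueError("need low < warn_at <= crit_at <= high")

    def position(self, value: float) -> float:
        """Needle position as a fraction of the dial, clamped to [0, 1]."""
        fraction = (value - self.low) / (self.high - self.low)
        return min(1.0, max(0.0, fraction))

    def zone(self, value: float) -> str:
        if value >= self.crit_at:
            return "critical"
        if value >= self.warn_at:
            return "warning"
        return "ok"

disk_used_pct = Gauge(low=0, high=100, warn_at=80, crit_at=95)
print(disk_used_pct.position(50), disk_used_pct.zone(85))  # 0.5 warning

# Requests per second have no natural ceiling: pick high=1000 today and
# a 2500 req/s spike simply pins the needle at 1.0.
rps = Gauge(low=0, high=1000, warn_at=800, crit_at=950)
print(rps.position(2500))  # 1.0
```

A graph of the same request rate would keep rescaling its axis and show the spike; the gauge, with its fixed range, cannot.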
  • 34. Graphs are often better
  • 35. Even little ones
  • 37. Think relatively
  • 38. Users live all around the world • Users live just about everywhere • “Where?” is a useful question
  • 40. Geolocation is interesting • to marketing • to legal • (okay, to everyone) • but not so useful to operations
  • 41. Geolocation is interesting • perhaps more interesting
  • 43. Geolocation • Internet location != geo-political location
  • 44. ASN location • The closest thing to geo-political boundaries is peering
      -bash-4.0$ /usr/sbin/bgpctl show rib 66.78.236.243
      flags: * = Valid, > = Selected, I = via IBGP, A = Announced
      origin: i = IGP, e = EGP, ? = Incomplete
      flags destination      gateway        lpref  med  aspath                      origin
            66.78.236.0/22   64.202.119.7   100    0    23352 4436 2914 3356 32778  i
      ### ASN 32778 is “Smart City Networks, L.P.”
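The bgpctl output carries what is needed to turn an address into a network: find the most specific prefix covering the address, then read the origin AS, which is the rightmost AS in the path. A Python sketch using the standard `ipaddress` module; the `rib` dictionary is a hypothetical stand-in for a routing-table dump, and the parser assumes plain space-separated paths (it does not handle AS_SET entries written in braces, which real paths can contain).

```python
import ipaddress

def origin_asn(ip, rib):
    """Return the origin AS of the most specific route covering `ip`.

    `rib` maps prefix strings to AS-path strings (a hypothetical format).
    The origin is the rightmost AS in the path; AS_SETs are not handled.
    """
    addr = ipaddress.ip_address(ip)
    covering = []
    for prefix, path in rib.items():
        network = ipaddress.ip_network(prefix)
        if addr in network:
            covering.append((network, path))
    if not covering:
        return None
    # Longest-prefix match: the most specific covering route wins.
    _, path = max(covering, key=lambda route: route[0].prefixlen)
    return int(path.split()[-1])

rib = {"66.78.236.0/22": "23352 4436 2914 3356 32778"}
print(origin_asn("66.78.236.243", rib))  # 32778
```

Grouping visitors by origin AS rather than by country answers the operational version of "Where?": which networks, and therefore which peering paths, your users arrive through.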
  • 46. What about the business?
  • 47. What about the business? Authorizations : Hard Failed : Soft Failed : Releases
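Putting authorizations next to hard failures, soft failures, and release markers ties a business signal back to something operations controls. A hedged Python sketch, assuming hypothetical per-interval buckets of `(timestamp, authorized, hard_failed, soft_failed)` counts and a sorted list of release timestamps; it flags the most recent release before any interval whose combined failure ratio exceeds a threshold. Treating hard and soft failures equally is a simplification chosen for the sketch.

```python
from bisect import bisect_right

def failure_ratio(authorized, hard_failed, soft_failed):
    """Fraction of attempts in an interval that failed, hard or soft."""
    attempts = authorized + hard_failed + soft_failed
    return (hard_failed + soft_failed) / attempts if attempts else 0.0

def release_before(ts, releases):
    """Most recent release at or before `ts`; `releases` must be sorted."""
    i = bisect_right(releases, ts)
    return releases[i - 1] if i else None

def suspect_releases(buckets, releases, threshold):
    """Releases that precede at least one interval above `threshold`."""
    suspects = []
    for ts, authorized, hard, soft in buckets:
        if failure_ratio(authorized, hard, soft) > threshold:
            release = release_before(ts, releases)
            if release is not None and release not in suspects:
                suspects.append(release)
    return suspects

# Hypothetical data: timestamps in seconds, releases at t=100 and t=200.
buckets = [(150, 95, 3, 2), (250, 70, 20, 10), (260, 80, 15, 5)]
print(suspect_releases(buckets, [100, 200], threshold=0.1))  # [200]
```

Preceding a spike does not prove a release caused it; the output is a list of places to look first, not a verdict.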
  • 48. Is that all? • Hells no.
  • 49. It’s all about real-time • Everything so far is old hat (maybe) • Every business unit has visualizations like this • You need to combine the data • You need to make it real-time
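Combining the data can start as simply as interleaving the operations and business event streams on a shared clock. A Python sketch using the standard library's `heapq.merge`, assuming each source already yields hypothetical `(timestamp, source, event)` tuples in time order. Because `merge` is lazy it also works on live generators, but it must see the next item from every stream before emitting, so one quiet stream stalls the combined feed.

```python
import heapq

def combined_feed(*streams):
    """Interleave time-ordered (timestamp, source, event) streams lazily."""
    return heapq.merge(*streams, key=lambda item: item[0])

# Hypothetical events; timestamps are seconds since some epoch.
ops = [(1, "ops", "deploy r82394"), (5, "ops", "disk 80% full")]
biz = [(2, "biz", "order placed"), (4, "biz", "authorization hard failed")]

for ts, source, event in combined_feed(ops, biz):
    print(ts, source, event)
```

A production feed would add heartbeats or a time-bounded wait so a silent source cannot hold up the rest; that is beyond this sketch.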
  • 50. Thanks • web demo ensues....