SlideShare une entreprise Scribd logo
1  sur  61
Télécharger pour lire hors ligne
Winning the metrics battle (finally)
Winning the metrics battle
         (finally)
       Simon Hildrew           Nick Satterly
  Infrastructure Developer   Monitoring Engineer
        The Guardian           The Guardian
The metrics battlefield
Total metrics


                                180,000




                       50,000

1,400   2,800
http://www.flickr.com/photos/ghostsigns/6676069121



                                              5 minutes


                                                every 15
                                                 seconds

                                                           http://www.flickr.com/photos/millynet/134071210
developer dashboards
Physical screens   Screensaver hacks
20


15


10


 5


 0
dev


hack
business dashboards
metrics + dashboards = culture change
http://www.flickr.com/photos/chrisjames_taylor/5454315456
our approach
         Side project    ➡   Prioritise
Incremental upgrade      ➡   Understand the real problem
Use off the shelf tool   ➡   Question the tools
  Pragmatic solution     ➡   Be ambitious
      Done in a year     ➡   Keep learning
Prioritise
drowning in work




http://www.flickr.com/photos/iampeas/246738971
a dedicated monitoring and
     metrics engineer
Understand the
 real problem
Urgent issue -
current tool end of life
The story so far...
metrics were not helping us
 solve production outages
ballooning number of
     applications
but... difficult to instrument applications
T.T. Detect
                      +
T.T. Fix   =   T.T. Diagnose
                      +
                T.T. Resolve
inaccessible tools




             http://www.flickr.com/photos/kdashy/2678539087
inconsistent data



http://www.flickr.com/photos/sybrenstuvel/2468506922
hypothesising & arguing
 easier than measuring


               http://www.flickr.com/photos/nouqraz/200049988
The ‘right’ thing
• measure everything
• measure frequently
• measure each data point once
• input and output must be open
Question the tools
Brute force?




http://www.flickr.com/photos/epublicist/3546059144
The safe option?




http://www.flickr.com/photos/alicebartlett/2361209195
Unintuitive?




http://www.flickr.com/photos/merlijnhoek/2841785343
Imposing a flawed model?
http://www.flickr.com/photos/evansville/8953838/
Too difficult / no progress?
http://www.flickr.com/photos/ginja_andy/4165849136/
Nagios


•   the “IBM” of monitoring tools

•   compromise over quantity and frequency of checks

•   < insert your criticism of nagios here >
Zabbix


•   metric collection tightly coupled to monitoring tool

•   confusing UI with poor visualisation

•   needed brute force to make limited API work
The ‘right’ thing
• measure everything
• measure frequently
• measure each data point once
• input and output must be open
don’t compromise
Be ambitious
http://www.flickr.com/photos/mugley/2961131550




                                 Throw work away
Draw your dream
http://www.flickr.com/photos/sk8geek/7358702704




                             Get as far as you can
screens           users
                                            db?             alerting?


 Etsy dashboard
                                                         message queue




              graphite                                   SNMP?           syslog?



 FITB                     ganglia                 api?



network      hosts           applications
Develop missing pieces




              http://www.flickr.com/photos/kalexanderson/5969012589
screens           users
                                            mongodb                   alerta       elastic
                                                                                   search


 Etsy dashboard
                                                                 message queue



                                                                          syslog     SNMP
              graphite                          ganglia alerts
                                                                          alerts     alerts




 FITB                     ganglia                ganglia-api




network      hosts           applications
Guardian Management
https://github.com/guardian/guardian-management
Ganglia API
https://github.com/guardian/ganglia-api
rescale image???




                       Alerta
https://github.com/guardian/alerta
Current stack
• Ganglia             • Guardian management
                        https://github.com/guardian/guardian-management


• FITB                • Guardian ganglia-api
                        https://github.com/guardian/ganglia-api
• Graphite
                      • Guardian alerta
• Etsy dashboards       https://github.com/guardian/alerta
Keep learning
we are not there yet
Watch the cultural changes
detecting
diagnosis
diagnosis
performance testing
confirmation
#monitoringsucks
➡ Prioritise
➡ Understand the real problem
➡ Question the tools
➡ Be ambitious
➡ Keep learning
tools can change culture
Thank you
               http://github.com/guardian
                 http://gu.com/p/3ap5f
       Simon Hildrew                    Nick Satterly
            @sihil                      @nicksatterly
simon.hildrew@guardian.co.uk    nick.satterly@guardian.co.uk

Contenu connexe

Similaire à Winning the metrics battle

Funnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidFunnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidDatabricks
 
Crossing the Production Barrier: Development at Scale
Crossing the Production Barrier: Development at ScaleCrossing the Production Barrier: Development at Scale
Crossing the Production Barrier: Development at Scalejgoulah
 
Google Cloud: Next'19 Extended Hanoi
Google Cloud: Next'19 Extended HanoiGoogle Cloud: Next'19 Extended Hanoi
Google Cloud: Next'19 Extended HanoiGCPUserGroupVietnam
 
Move out from AppEngine, and Python PaaS alternatives
Move out from AppEngine, and Python PaaS alternativesMove out from AppEngine, and Python PaaS alternatives
Move out from AppEngine, and Python PaaS alternativestzang ms
 
Google Wave: Ripple or Tsunami for Research
Google Wave: Ripple or Tsunami for ResearchGoogle Wave: Ripple or Tsunami for Research
Google Wave: Ripple or Tsunami for ResearchCameron Neylon
 
Honeypots for Active Defense
Honeypots for Active DefenseHoneypots for Active Defense
Honeypots for Active DefenseGreg Foss
 
Hyperleger Fabric Workshop - Denver Blockchain Week
Hyperleger Fabric Workshop - Denver Blockchain WeekHyperleger Fabric Workshop - Denver Blockchain Week
Hyperleger Fabric Workshop - Denver Blockchain WeekHorea Porutiu
 
Performance - a challenging craft
Performance  - a challenging craftPerformance  - a challenging craft
Performance - a challenging craftFabian Lange
 
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019 Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019 Chun-Yu Tseng
 
CONFidence 2017: Hackers vs SOC - 12 hours to break in, 250 days to detect (G...
CONFidence 2017: Hackers vs SOC - 12 hours to break in, 250 days to detect (G...CONFidence 2017: Hackers vs SOC - 12 hours to break in, 250 days to detect (G...
CONFidence 2017: Hackers vs SOC - 12 hours to break in, 250 days to detect (G...PROIDEA
 
Blue team reboot - HackFest
Blue team reboot - HackFest Blue team reboot - HackFest
Blue team reboot - HackFest Haydn Johnson
 
How to fully automate a store.pptx
How to fully automate a store.pptxHow to fully automate a store.pptx
How to fully automate a store.pptxIgor Moiseev
 
Introduzione alle metodologie di sviluppo agile
Introduzione alle metodologie di sviluppo agileIntroduzione alle metodologie di sviluppo agile
Introduzione alle metodologie di sviluppo agileStefano Valle
 
Accessibility and web innovation. (no notes)
Accessibility and web innovation. (no notes)Accessibility and web innovation. (no notes)
Accessibility and web innovation. (no notes)Christian Heilmann
 
Using Blockchain to Increase Supply Chain Transparency
Using Blockchain to Increase Supply Chain TransparencyUsing Blockchain to Increase Supply Chain Transparency
Using Blockchain to Increase Supply Chain TransparencyHorea Porutiu
 
Adoption of AI: The Great Opportunities for Everyone
Adoption of AI: The Great Opportunities for EveryoneAdoption of AI: The Great Opportunities for Everyone
Adoption of AI: The Great Opportunities for EveryoneKan Ouivirach, Ph.D.
 
AB Testing, Ads and other 3rd party tags - London WebPerf - March 2018
AB Testing, Ads and other 3rd party tags - London WebPerf - March 2018AB Testing, Ads and other 3rd party tags - London WebPerf - March 2018
AB Testing, Ads and other 3rd party tags - London WebPerf - March 2018Andy Davies
 

Similaire à Winning the metrics battle (20)

Monitoring the #DevOps way
Monitoring the #DevOps wayMonitoring the #DevOps way
Monitoring the #DevOps way
 
Funnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidFunnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and Druid
 
Crossing the Production Barrier: Development at Scale
Crossing the Production Barrier: Development at ScaleCrossing the Production Barrier: Development at Scale
Crossing the Production Barrier: Development at Scale
 
Google Cloud: Next'19 Extended Hanoi
Google Cloud: Next'19 Extended HanoiGoogle Cloud: Next'19 Extended Hanoi
Google Cloud: Next'19 Extended Hanoi
 
Move out from AppEngine, and Python PaaS alternatives
Move out from AppEngine, and Python PaaS alternativesMove out from AppEngine, and Python PaaS alternatives
Move out from AppEngine, and Python PaaS alternatives
 
Google Wave: Ripple or Tsunami for Research
Google Wave: Ripple or Tsunami for ResearchGoogle Wave: Ripple or Tsunami for Research
Google Wave: Ripple or Tsunami for Research
 
Honeypots for Active Defense
Honeypots for Active DefenseHoneypots for Active Defense
Honeypots for Active Defense
 
Hyperleger Fabric Workshop - Denver Blockchain Week
Hyperleger Fabric Workshop - Denver Blockchain WeekHyperleger Fabric Workshop - Denver Blockchain Week
Hyperleger Fabric Workshop - Denver Blockchain Week
 
Performance - a challenging craft
Performance  - a challenging craftPerformance  - a challenging craft
Performance - a challenging craft
 
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019 Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
 
Ultimate Git Workflow - Seoul 2015
Ultimate Git Workflow - Seoul 2015Ultimate Git Workflow - Seoul 2015
Ultimate Git Workflow - Seoul 2015
 
CONFidence 2017: Hackers vs SOC - 12 hours to break in, 250 days to detect (G...
CONFidence 2017: Hackers vs SOC - 12 hours to break in, 250 days to detect (G...CONFidence 2017: Hackers vs SOC - 12 hours to break in, 250 days to detect (G...
CONFidence 2017: Hackers vs SOC - 12 hours to break in, 250 days to detect (G...
 
Blue team reboot - HackFest
Blue team reboot - HackFest Blue team reboot - HackFest
Blue team reboot - HackFest
 
How to fully automate a store.pptx
How to fully automate a store.pptxHow to fully automate a store.pptx
How to fully automate a store.pptx
 
Introduzione alle metodologie di sviluppo agile
Introduzione alle metodologie di sviluppo agileIntroduzione alle metodologie di sviluppo agile
Introduzione alle metodologie di sviluppo agile
 
Accessibility and web innovation. (no notes)
Accessibility and web innovation. (no notes)Accessibility and web innovation. (no notes)
Accessibility and web innovation. (no notes)
 
Using Blockchain to Increase Supply Chain Transparency
Using Blockchain to Increase Supply Chain TransparencyUsing Blockchain to Increase Supply Chain Transparency
Using Blockchain to Increase Supply Chain Transparency
 
Adoption of AI: The Great Opportunities for Everyone
Adoption of AI: The Great Opportunities for EveryoneAdoption of AI: The Great Opportunities for Everyone
Adoption of AI: The Great Opportunities for Everyone
 
AB Testing, Ads and other 3rd party tags - London WebPerf - March 2018
AB Testing, Ads and other 3rd party tags - London WebPerf - March 2018AB Testing, Ads and other 3rd party tags - London WebPerf - March 2018
AB Testing, Ads and other 3rd party tags - London WebPerf - March 2018
 
Micro Frontends
Micro FrontendsMicro Frontends
Micro Frontends
 

Dernier

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 

Dernier (20)

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 

Winning the metrics battle