Sunday, 19 August 12
Triage
Dealing with errors in production
PyCon Australia 2012

Luke Cawood / @lwcd
Lars Yencken / @larsyencken




99designs



Architecture diagram (top to bottom): Balancer; Cache x2; App x4; Memcache, DB x2, Queue; Worker


Architecture diagram (top to bottom): Balancer; Cache; App x4; Memcache, DB, Queue; Worker


Errors



Hmmm....




Triage
                       • Improve signal-to-noise ratio by aggregating
                         similar errors
                       • Allow for claiming, resolving and ranking
                         errors in terms of importance
                       • Integration with GitHub and build tools
                       • Play with new tools and technology
                       • Provide an open source alternative to
                         commercial products in this space




Round 1 (Fight!)

                       • Errors continue to log directly to MongoDB
                       • Aggregation via incremental MapReduce
                       • Deliver a prototype in one day
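
As a rough illustration of what incremental map/reduce aggregation means here, the sketch below groups raw error records by a key and folds each new batch into the earlier totals. It is plain Python rather than MongoDB's own MapReduce, and the record fields (`type`, `message`) and function names are assumptions made for the example.

```python
from collections import Counter


def map_error(error):
    """Map step: emit a (key, count) pair for one raw error record."""
    # Hypothetical grouping key: error type plus message.
    return (error["type"], error["message"]), 1


def reduce_batch(errors):
    """Reduce step: sum the emitted counts per key for one batch."""
    totals = Counter()
    for error in errors:
        key, count = map_error(error)
        totals[key] += count
    return totals


def merge_incremental(previous, batch_totals):
    """Incremental step: fold a new batch into the earlier results
    instead of re-reducing every error ever logged."""
    merged = Counter(previous)
    merged.update(batch_totals)  # Counter.update adds counts together
    return merged


batch_1 = [
    {"type": "KeyError", "message": "'user_id'"},
    {"type": "KeyError", "message": "'user_id'"},
    {"type": "Timeout", "message": "db timed out"},
]
batch_2 = [{"type": "KeyError", "message": "'user_id'"}]

totals = merge_incremental(Counter(), reduce_batch(batch_1))
totals = merge_incremental(totals, reduce_batch(batch_2))
```

Only the new batch is reduced on each run; the earlier totals are reused as-is.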


Scalability Fatality!

                       • Worked fine during development
                       • Production load caused the MapReduce to
                         asplode!
                       • (Not that we have a lot of errors, right?!)


Round 2




(sub)zeroMQ

                       • Async error API using ZeroMQ pub/sub sockets
                       • MessagePack as error format (fast, binary)
                       • Aggregation in Python




Aggregation Method

                       • Generate a hash in Python based on the error
                         document
                       • Query MongoDB for the error hash
                       • Create or update the error document based
                         on the outcome of the query, incrementing
                         counters etc. where appropriate
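
The three steps above can be sketched as follows. This is a minimal illustration, not the project's code: a plain dict stands in for the MongoDB collection, and the choice of SHA-1 and of the identifying fields (`type`, `message`, `file`, `line`) is an assumption.

```python
import hashlib
import json


def error_hash(error):
    """Fingerprint an error from the fields that define 'the same error'."""
    # Assumed identifying fields; the timestamp is left out so that
    # repeated occurrences hash to the same value.
    identity = {k: error.get(k) for k in ("type", "message", "file", "line")}
    payload = json.dumps(identity, sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def aggregate(store, error):
    """Create or update the aggregated document for one incoming error."""
    key = error_hash(error)
    doc = store.get(key)  # stands in for the query by hash
    if doc is None:
        # First occurrence: create the aggregated document.
        doc = {"hash": key, "type": error["type"],
               "message": error["message"],
               "count": 0, "first_seen": error["time"]}
        store[key] = doc
    # Every occurrence: bump the counter and record the latest sighting.
    doc["count"] += 1
    doc["last_seen"] = error["time"]
    return doc


store = {}
first = {"type": "KeyError", "message": "'user_id'",
         "file": "views.py", "line": 42, "time": 1}
again = dict(first, time=2)
other = {"type": "Timeout", "message": "db timed out",
         "file": "db.py", "line": 7, "time": 3}
for err in (first, again, other):
    aggregate(store, err)
```

Against a real database, the lookup and the write are two separate round trips per error.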



Scalability Fatality 2

                       • Multithreaded experiments
                       • Mongo optimisations
                        • There is no schema
                        • The cake is a lie
                       • Mongo ‘upsert’ rocks!

Updating like a boss
                       collection.update(criteria, document, upsert=False)
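
With `upsert=True`, the query-then-create-or-update sequence can collapse into one call. The sketch below only builds the `criteria` and `document` arguments, so it runs without a database. The field names are hypothetical and the operator choice (`$inc`, `$set`, `$setOnInsert`) is one plausible arrangement, not necessarily the one used in Triage.

```python
def build_upsert(error_hash, error):
    """Return (criteria, document) for a single create-or-update call."""
    # Match the aggregated document by its hash (assumed field name).
    criteria = {"hash": error_hash}
    document = {
        # Applied on every occurrence, whether inserting or updating.
        "$inc": {"count": 1},
        "$set": {"last_seen": error["time"]},
        # Applied only when the upsert creates a new document.
        "$setOnInsert": {
            "type": error["type"],
            "message": error["message"],
            "first_seen": error["time"],
        },
    }
    return criteria, document


criteria, document = build_upsert(
    "abc123",
    {"type": "KeyError", "message": "'user_id'", "time": 1},
)

# With a live connection this would be sent as, for example:
#   collection.update(criteria, document, upsert=True)
# Newer PyMongo releases spell the same call as update_one(...).
```

The database then decides between insert and update in a single operation.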




Outcomes & future



Outcomes

                       • Getting the ‘right’ level of grouping is hard
                       • What to do with errors that just won't go
                         away?
                       • Error occurrence count - what does this
                         tell us?
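
To make the grouping problem concrete, here is a small sketch comparing two grouping keys of different granularity. Both key functions and the sample messages are invented for illustration; neither is claimed to be what Triage does.

```python
import re


def fine_key(error):
    """Group only errors whose messages match exactly."""
    return (error["type"], error["message"])


def coarse_key(error):
    """Mask digits first, so messages differing only by an id collapse."""
    return (error["type"], re.sub(r"\d+", "N", error["message"]))


errors = [
    {"type": "LookupError", "message": "design 1041 not found"},
    {"type": "LookupError", "message": "design 2977 not found"},
    {"type": "LookupError", "message": "user 15 not found"},
]

fine_groups = {fine_key(e) for e in errors}
coarse_groups = {coarse_key(e) for e in errors}
```

The fine key yields three groups and the coarse key yields two. Too fine and the same bug is scattered across many entries; too coarse and unrelated bugs end up merged.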



Future

                       • Easier installation, package on PyPI
                       • Better language support (plz halp)
                       • Drop-in replacement for Airbrake etc.
                       • Client-side logging (JavaScript)
                       • Email-style filters & actions - ifttt.com

Thanks
                       • 99designs for research and development time
                       • Contributors:
                        • Luke Cawood - Project lead
                        • Josh Benham - Developer
                        • Jamison Lu - Developer
                       • Additional assistance:
                        • Lars Yencken - Operations
                        • 99designs UX team




Thanks for listening!
                          https://github.com/lwc/triage



Triage: real-world error logging for web applications
