
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalability

This is the presentation given for the Docker Meetup in Cordoba, Argentina. Recording should soon be up on http://www.meetup.com/Docker-Cordoba-ARG/events/226995018/

Key Takeaways: Pick your Metrics! Automate It! Fail Bad Builds Faster! Deliver Faster with Better Quality!

My main point for the Docker audience: just adding Docker doesn't give your app free performance and scalability. I walk through many examples of failing apps, show which metrics highlight each problem, and explain how to automatically detect bad builds by checking these metrics along your pipeline.


Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalability

  1. @Dynatrace – Metrics-Driven DevOps: Application Quality Metrics for your Pipeline (and why Docker is not the solution to all of your problems). Andreas (Andi) Grabner, @grabnerandi
  2. 700 deployments per year. 10+ deployments per day. 50–60 deployments per day. A deployment every 11.6 seconds.
  3. Example #1: Online Casino. 9.68 MB page size, 282! objects on that page, 8.8 s page load time. Most objects are images delivered from the main domain; very long connect time (1.8 s) to the CDN.
  4. Example #2: Lawyer website based on SharePoint. 11 s! to load the landing page, 879! SQL queries, 8! missing CSS & JS files, 340! calls to GetItemById.
  5. Waterfall to Agile in 3 years; 220 apps, 1 deployment per month. “EVERYONE can do Continuous Delivery.” “Every manual tester does AUTOMATION.” “WE DON'T LOG BUGS – WE FIX THEM!” Measures built in, visible to everyone. Promote your wins, educate your peers.
  6. Challenges
  7. Fail Faster!?
  8. It's not about blindly automating the push of more bad code through a shiny pipeline.
  9. Metrics-Based Decisions
  10. Availability dropped to 0%
  11. Bad deployment detected based on resource consumption
  12. With increasing load: which LAYER doesn't SCALE?
  13. App with regular load supported by 10 containers. Twice the load but 48 (= 4.8×!) containers: the app doesn't scale!
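The container math on that slide can be turned into an automated check. Here is a minimal sketch (class name, threshold, and API are my own, not from the talk): compare the container growth factor against the load growth factor and flag the build when scaling is clearly non-linear.

```java
// Sketch: flag a build whose container count grows faster than its load.
// Numbers from the slide: 10 containers at baseline load, 48 at 2x load.
public class ScalingGate {

    /** True if container growth stays within `tolerance` of load growth. */
    public static boolean scalesLinearly(int baseContainers, int newContainers,
                                         double loadFactor, double tolerance) {
        double containerFactor = (double) newContainers / baseContainers;
        return containerFactor <= loadFactor * tolerance;
    }

    public static void main(String[] args) {
        // 2x load but 4.8x containers -> the app doesn't scale
        System.out.println(scalesLinearly(10, 48, 2.0, 1.2)); // false
        // 2x load with 2.2x containers -> within tolerance
        System.out.println(scalesLinearly(10, 22, 2.0, 1.2)); // true
    }
}
```

The tolerance factor (1.2 here) leaves room for normal overhead; tune it per application.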
  14. Technical Debt!
  15. 80% $60B
  16. Insufficient Focus on Quality
  17. The “War Room” – Facebook, December 2012
  18. 20% 80%
  19. Learning from others
  20. 4 use cases: WHY did it happen? HOW to avoid it! METRICS to guide you.
  21. #1: Not every Architect makes good decisions
  22. Project: Online Room Reservation System. Symptoms: HTML takes between 60 and 120 s to render; high GC time. Developer assumptions: bad GC tuning; probably bad database performance, as the rendering was simple. Result: 2 years of finger-pointing between Dev and DBA.
  23. Developers built their own monitoring:

      void roomreservationReport(int officeId) {
          long startTime = System.currentTimeMillis();
          Object data = loadDataForOffice(officeId);
          long dataLoadTime = System.currentTimeMillis() - startTime;
          generateReport(data, officeId);
      }

      Result: avg. data load time: 45 s! The DB tool says: avg. SQL query: < 1 ms!
  24. #1: Loading too much data. 24889! calls to the database API; high CPU and high memory usage to keep all the data in memory.
  25. #2: On individual connections. 12444! individual connections; the classical N+1 query problem; each individual SQL really takes < 1 ms.
  26. #3: Putting all data in a temporary Hashtable. Lots of time spent in Hashtable.get, called from their entity objects.
  27. Lessons Learned – don't assume that … you know what the code you inherited is doing, or that you are not making mistakes like this. Explore the right tools: built-in database analysis tools; “logging” options of frameworks such as Hibernate; JMX, perf counters, … of your application servers; performance tracing tools: Dynatrace, Ruxit, New Relic, AppDynamics, your profiler of choice.
  28. Key Metrics: # of SQL calls, # of same SQL execs (1+N), # of connections, rows/data transferred.
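The “# of same SQL execs (1+N)” metric is easiest to see with a counter. Below is a self-contained sketch: the “database” is a hypothetical in-memory stand-in that only counts executed statements (no real JDBC involved), showing why thousands of individually fast queries still dominate the 45 s load time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the classic 1+N pattern from the slides: each individual query
// is fast (< 1 ms), but 12444 of them add up. Names are illustrative.
public class NPlusOneDemo {
    static int executedQueries = 0;

    static List<Integer> loadRoomIds(int officeId) {
        executedQueries++;            // SELECT id FROM rooms WHERE office = ?
        return Arrays.asList(1, 2, 3, 4, 5);
    }

    static String loadRoom(int roomId) {
        executedQueries++;            // SELECT * FROM rooms WHERE id = ?
        return "room-" + roomId;
    }

    static List<String> loadRoomsBatched(List<Integer> ids) {
        executedQueries++;            // SELECT * FROM rooms WHERE id IN (...)
        List<String> rooms = new ArrayList<>();
        for (int id : ids) rooms.add("room-" + id);
        return rooms;
    }

    public static void main(String[] args) {
        // 1+N: one query for the ids, then one per room -> 6 statements for 5 rooms
        for (int id : loadRoomIds(42)) loadRoom(id);
        System.out.println("1+N statements: " + executedQueries);     // 6

        executedQueries = 0;
        // Batched: two statements total for the same data
        loadRoomsBatched(loadRoomIds(42));
        System.out.println("batched statements: " + executedQueries); // 2
    }
}
```

Tracking exactly this counter per test, as the later slides suggest, catches the regression long before the 12444-connection version reaches production.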
  30. #2: There is no easy “Migration” to Micro(Services)
  31. 26.7 s execution time. 33! calls to the same web service. 171! SQL queries through LINQ by this web service, requesting similar data for each call. Architecture violation: direct access to the DB from frontend logic.
  32. Key Metrics: # of service calls, # of containers, # of threads (sync and wait), # of SQL executions, # of SAME SQLs, payload (kB) of service calls.
  34. #3: Don't ASSUME you know the environment
  35. Distance calculation issues: 480 km of biking in 1 hour! Solution: a unit test in the live app reports geo-calculation problems. Finding: it only happens on certain Android versions.
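The “unit test in the live app” idea can be sketched as a plausibility check: compute the distance between two GPS fixes and flag impossible speeds. The haversine formula and the 60 km/h limit are my illustrative choices, not the app's actual implementation.

```java
// Sketch: a sanity check on distance calculations, inspired by the
// "480 km biking in 1 hour" bug. The implied speed between two GPS
// fixes must stay below what a cyclist can plausibly ride.
public class GeoSanityCheck {
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance between two lat/lon points (haversine). */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** True if the implied speed is plausible for a cyclist (< 60 km/h, say). */
    static boolean plausibleBikeSpeed(double distanceKm, double hours) {
        return distanceKm / hours < 60.0;
    }

    public static void main(String[] args) {
        // Vienna to Linz: roughly 155 km as the crow flies
        double d = haversineKm(48.2082, 16.3738, 48.3069, 14.2858);
        System.out.println(plausibleBikeSpeed(d, 1.0));    // false -> report it
        System.out.println(plausibleBikeSpeed(25.0, 1.0)); // true
    }
}
```

Reporting the failed check from the live app (instead of crashing or silently storing bad data) is what surfaced the Android-version dependency.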
  36. 3rd-party issues: the impact of bad 3rd-party calls.
  37. Key Metrics: # of functional errors, # and status of 3rd-party calls, payload of calls.
  38. #4: Thinking Big? Then Start Small!
  39. A load spike resulted in unavailability (Adonair).
  40. Alternative: “GoDaddy goes DevOps” – 1 h before the Super Bowl kickoff vs. 1 h after the game ended.
  41. Key Metrics: # of domains, total size of content.
  42. What have we learned so far?
  43. Metric-Based Decisions Are Cool: 1. # of resources; 2. size of resources; 3. page size; 4. # of functional errors; 5. 3rd-party calls; 6. # of SQL executions; 7. # of SAME SQLs.
  44. We want to get from here …
  45. … to here!
  46. Use these application metrics as additional quality gates.
  47. Quality metrics in your pipeline. What you currently measure: # of test failures, overall duration. What you should measure: execution time per test, # of calls to APIs, # of executed SQL statements, # of web service calls, # of JMS messages, # of objects allocated, # of exceptions, # of log messages, # of HTTP 4xx/5xx responses, request/response size, page load/rendering time, …
  48. Extend your Continuous Integration: combine test results with architectural data (# of SQL statements, # of exceptions, CPU time) per test case, build by build.

      Build 17: testPurchase OK (12 SQL, 0 exceptions, 120 ms); testSearch OK (3 SQL, 1 exception, 68 ms)
      Build 18: testPurchase FAILED (12 SQL, 5 exceptions, 60 ms); testSearch OK (3 SQL, 1 exception, 68 ms) – we identified a regression; the exceptions are probably the reason for the failed test
      Build 19: testPurchase OK (75 SQL, 0 exceptions, 230 ms); testSearch OK (3 SQL, 1 exception, 68 ms) – problem fixed, but now we have an architectural regression
      Build 20: testPurchase OK (12 SQL, 0 exceptions, 120 ms); testSearch OK (3 SQL, 1 exception, 68 ms) – now we have both functional and architectural confidence

      Let's look behind the scenes.
  49. #1: Analyze every unit & integration test. #2: Capture metrics for each test. #3: Detect regressions based on those measures. Unit/integration tests are auto-baselined; regressions are auto-detected!
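A minimal sketch of such a baseline comparison (the metric set and the numbers mirror the build table on slide 48; the classes and API are invented for illustration): the last known-good build provides per-test baselines, and any growth in SQL count or exception count flags a regression even when the test itself stays green.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of the "auto-baselined" check: per-test architectural metrics
// from a known-good build act as the baseline for later builds.
public class ArchitecturalGate {
    record TestMetrics(int sqlCount, int exceptions) {}

    /** Names of tests whose metrics regressed versus the baseline build. */
    static List<String> findRegressions(Map<String, TestMetrics> baseline,
                                        Map<String, TestMetrics> current) {
        List<String> regressions = new ArrayList<>();
        for (var e : current.entrySet()) {
            TestMetrics base = baseline.get(e.getKey());
            if (base == null) continue;   // new test, no baseline yet
            if (e.getValue().sqlCount() > base.sqlCount()
                    || e.getValue().exceptions() > base.exceptions()) {
                regressions.add(e.getKey());
            }
        }
        return regressions;
    }

    public static void main(String[] args) {
        // Build 17 (baseline): testPurchase runs 12 SQL statements, 0 exceptions
        Map<String, TestMetrics> build17 = Map.of("testPurchase", new TestMetrics(12, 0));
        // Build 18: 5 exceptions show up
        Map<String, TestMetrics> build18 = Map.of("testPurchase", new TestMetrics(12, 5));
        // Build 19: exceptions fixed, but SQL count jumps to 75 -> N+1 regression
        Map<String, TestMetrics> build19 = Map.of("testPurchase", new TestMetrics(75, 0));

        System.out.println(findRegressions(build17, build18)); // [testPurchase]
        System.out.println(findRegressions(build17, build19)); // [testPurchase]
    }
}
```

A real setup would refresh the baseline automatically from the moving average of recent good builds rather than pin a single build.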
  50. Build-by-build quality view: a build quality overview in Dynatrace and your CI server (e.g. Jenkins).
  51. Production data: real-user & application monitoring.
  52. Recap!
  53. #1: Pick your app metrics: # of service calls, bytes sent & received, # of worker threads, # of SQL calls, # of same SQLs, # of DB connections.
  54. #2: Figure out how to monitor them: http://bit.ly/dtpersonal
  55. #3: Automate it into your pipeline.
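One way the automation step could look inside a CI stage, sketched below. Metric names and thresholds are illustrative; in a real pipeline they would come from your monitoring tool's API and a checked-in threshold file, and a non-empty violation list would fail the build.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: a minimal quality gate a CI stage could run after the tests.
public class PipelineQualityGate {

    /** Metrics that exceed their configured maximum, formatted for a log. */
    public static List<String> violations(Map<String, Double> metrics,
                                          Map<String, Double> maxAllowed) {
        List<String> broken = new ArrayList<>();
        for (var limit : maxAllowed.entrySet()) {
            Double value = metrics.get(limit.getKey());
            if (value != null && value > limit.getValue()) {
                broken.add(limit.getKey() + "=" + value
                        + " (max " + limit.getValue() + ")");
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        // Hypothetical thresholds, inspired by the metrics on the earlier slides
        Map<String, Double> maxAllowed = Map.of(
                "sqlCalls", 50.0, "sameSqlCalls", 5.0, "serviceCallPayloadKb", 100.0);
        // Metrics measured for the current build
        Map<String, Double> build = Map.of(
                "sqlCalls", 171.0, "sameSqlCalls", 33.0, "serviceCallPayloadKb", 80.0);

        List<String> broken = violations(build, maxAllowed);
        broken.forEach(System.out::println);
        // A non-empty list would fail the stage (e.g. System.exit(1))
    }
}
```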
  56. #4: Also do it in production.
  57. Draw better unicorns :)
  58. Questions and/or Demo. Slides: slideshare.net/grabnerandi. Get the tools: bit.ly/dtpersonal. YouTube tutorials: bit.ly/dttutorials. Contact me: agrabner@dynatrace.com. Follow me: @grabnerandi. Read more: blog.dynatrace.com
  59. Andreas Grabner, Dynatrace Developer Advocate, @grabnerandi, http://blog.dynatrace.com
