This is about leveling-up and REVOLUTIONIZING Testing as part of your Agile/DevOps Transformation.
You can contribute more than testing functionality. You need to Level-Up your skill set by understanding the apps you are testing. # Images, # JS Files, # SQL Statements, Connection Pool Utilization and Garbage Collection Activity have to be added to your portfolio.
Check these metrics when you do your functional testing and report regressions to your engineers even if the functionality is still good – you may just have uncovered an architectural regression that will lead to a scalability and performance problem.
Finding these problems early will eliminate a lot of wasted and unplanned time later in the lifecycle. That is your contribution to delivering software faster with better quality.
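As a minimal illustration of such metrics, here is a hedged sketch (the class and regexes are hypothetical, not from the talk) that derives two of them – # Images and # JS Files – from a page's HTML:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: derive simple "architectural" metrics from a page's HTML.
public class PageMetrics {

    static int count(String html, String regex) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(html);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    // # Images: every <img ...> tag is one image resource the browser must fetch
    static int countImages(String html) { return count(html, "<img\\b"); }

    // # JS Files: every <script src=...> is one external JavaScript download
    static int countJsFiles(String html) { return count(html, "<script[^>]*\\bsrc="); }

    public static void main(String[] args) {
        String html = "<html><body>"
                + "<img src='a.png'><img src='b.png'>"
                + "<script src='app.js'></script>"
                + "<script>var inline = true;</script>"
                + "</body></html>";
        System.out.println("# Images: " + countImages(html));    // 2
        System.out.println("# JS Files: " + countJsFiles(html)); // 1
    }
}
```

Tracking numbers like these per build is enough to spot a regression (e.g. a page suddenly loading 20 more images) without any load test.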
13. 700 Deployments / Year
50-60 Deployments / Day
10+ Deployments / Day
Every 11.6 seconds
14. Inside the Amazon Numbers!
75% fewer outages since 2006
90% fewer outage minutes
~0.001% of deployments cause a problem
Instantaneous automatic rollback
Deploying every 11.6s
48. • Developers not using the browser built-in diagnostics tools
• Testers not doing sanity checks with the same tools
• Some tools for you (Quick Demo)
• Built-in Inspectors via Ctrl-Shift-I in Chrome and Firefox
• YSlow, PageSpeed, SpeedTracer
• Dynatrace
• Level-Up: Automate Testing and Diagnostics Check
Lessons Learned – NO Excuse for …
51. • Symptoms
• HTML takes between 60 and 120s to render
• High GC Time
• Developer Assumptions
• Bad GC Tuning
• Probably bad Database Performance as rendering was simple
• Result: 2 years of finger-pointing between Dev and DBA
Project: Online Room Reservation System
52. Developers built own monitoring
void roomreservationReport(int officeId)
{
    long startTime = System.currentTimeMillis();
    Object data = loadDataForOffice(officeId);
    long dataLoadTime = System.currentTimeMillis() - startTime;
    System.out.println("Data Load Time: " + dataLoadTime + "ms"); // without this, the measured value is never reported
    generateReport(data, officeId);
}
Result:
Avg. Data Load Time: 45s!
DB Tool says:
Avg. SQL Query: <1ms!
53. #1: Loading too much data
24,889! calls to the Database API!
High CPU and high memory usage to keep all data in memory
54. #2: On individual connections
12,444! individual connections
Classical N+1 Query Problem
Individual SQL really <1ms
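The N+1 pattern above can be sketched with a toy round-trip counter (all names are hypothetical): one query loads the list, then one extra query per item, while a joined query needs a single round trip – each individual query stays fast, but the call count explodes:

```java
import java.util.List;

// Minimal sketch of the classical N+1 query problem. A fake "database" just
// counts round trips; each individual query is fast (<1ms), but the number
// of calls is the real problem.
public class NPlusOneDemo {

    static int queryCount = 0;

    // one query returning the room ids for an office
    static List<Integer> loadRoomIds(int officeId) {
        queryCount++;
        return List.of(1, 2, 3, 4, 5);
    }

    // one query per room - this is the "+N" part
    static String loadRoomDetails(int roomId) {
        queryCount++;
        return "room-" + roomId;
    }

    // one joined query fetching everything at once - the fix
    static List<String> loadAllRoomsJoined(int officeId) {
        queryCount++;
        return List.of("room-1", "room-2", "room-3", "room-4", "room-5");
    }

    public static void main(String[] args) {
        queryCount = 0;
        for (int id : loadRoomIds(42)) loadRoomDetails(id);
        System.out.println("N+1 pattern: " + queryCount + " queries"); // 6 (1 + 5)

        queryCount = 0;
        loadAllRoomsJoined(42);
        System.out.println("Joined query: " + queryCount + " query");  // 1
    }
}
```

With 12,444 rooms instead of 5, the same pattern turns into 12,445 round trips – exactly what the connection count on the slide shows.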
55. #3: Putting all data in temp Hashtable
Lots of time spent in Hashtable.get
Called from their Entity Objects
57. • … you know what code is doing
• Challenge the developers
• Explore Tools that “might seem” out of your league!
• Built-In Database Analysis Tools
• “Logging” options of Frameworks such as Hibernate, …
• JMX, Perf Counters, … of your Application Servers
• Performance Tracing Tools: Dynatrace, NewRelic, AppDynamics, …
Lessons Learned – Don’t Assume …
61. Test Environment vs. Production Environment
• Test Environment: Hibernate, Classloading, XML – the key hotspots. I/O for web requests doesn't even show up!
• Production Environment: Hibernate, Classloading, XML – the key hotspots. That's normal: having I/O for web requests as the main contributor.
63. Top Methods related to XML Processing
Classloading is triggered through CustomMonkey and the Xalan Parser
65. • Plan enough time for proper testing
• Anticipate changed user behavior during peak load
• Only test what really ends up in Production
Lessons Learned
71. • Share your Performance Expertise with Developers
• Implement & Test these “Feature Switch” Scenarios
• Demand Live Production Data for Future Projects
• Read Up & Educate yourself on more stories like this
Lessons Learned
74. •# Images
•# Redirects
•Size of Resources
•# SQL Executions
•# of SAME SQLs
•# Items per Page
•# AJAX per Page
Remember: New Metrics When Testing Apps
•Time Spent in API
•# Calls into API
•# Functional Errors
•3rd Party calls
•# of Domains
•Total Size
•…
82. Putting it into Test Automation
Build #    Test Case      Status   # SQL   # Excep   CPU
Build 17   testPurchase   OK       12      0         120ms
           testSearch     OK       3       1         68ms
Build 18   testPurchase   FAILED   12      5         60ms
           testSearch     OK       3       1         68ms
Build 19   testPurchase   OK       75      0         230ms
           testSearch     OK       3       1         68ms
Build 20   testPurchase   OK       12      0         120ms
           testSearch     OK       3       1         68ms
Test Framework Results + Architectural Data:
Build 18: the exceptions are probably the reason for the failed test – we identified a regression
Build 19: problem fixed, but now we have an architectural regression (75 SQL executions)
Build 20: problem solved – now we have both functional and architectural confidence
Let's look behind the scenes
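A regression check over metrics like the ones shown above could be sketched as follows (a hypothetical gate, not the actual tooling): compare each test's architectural metrics against the last known-good build and fail if a metric jumps beyond a threshold:

```java
import java.util.Map;

// Hypothetical sketch: let architectural metrics captured per test act as a
// quality gate by comparing them against the last known-good build.
public class ArchitecturalGate {

    // returns true (fail the build) if any metric grew beyond the allowed factor
    static boolean regressed(Map<String, Integer> baseline,
                             Map<String, Integer> current,
                             double allowedFactor) {
        for (Map.Entry<String, Integer> e : baseline.entrySet()) {
            int now = current.getOrDefault(e.getKey(), 0);
            if (now > e.getValue() * allowedFactor) {
                System.out.println("Regression in " + e.getKey()
                        + ": " + e.getValue() + " -> " + now);
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Build 17 baseline vs. Build 19: same status (OK), but 75 SQLs instead of 12
        Map<String, Integer> build17 = Map.of("# SQL", 12, "# Excep", 0);
        Map<String, Integer> build19 = Map.of("# SQL", 75, "# Excep", 0);
        System.out.println("Fail build: " + regressed(build17, build19, 1.5));
    }
}
```

This is the point of the table: Build 19 passes every functional test, yet a check like this would still stop it.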
83. #1: Analyzing each Test
#2: Metrics for each Test
#3: Detecting Regressions based on Measures
84. #1: Test Status Overview based on our new Metrics
#2: Lets the build fail
87. Andreas Grabner
Your place in DevTOps is not about finding more bugs, but about finding solutions to problems
Slides: slideshare.net/grabnerandi
Get Tools: bit.ly/dttrial
YouTube Tutorials: bit.ly/dttutorials
Contact Me: agrabner@dynatrace.com
Follow Me: @grabnerandi
Read More: blog.dynatrace.com
91. Online Banking: Slow Balance Check
1.69m (= 101s!) to check balance!
87% spent in IIS
600! SQL Executions
92. #1 Time really spent in IIS?
Tip: Elapsed Time tells us WHEN a Method was executed!
Finding: Thread 32 in IIS waited 87s to pass control to Thread 30 in ASP.NET
Tip: Thread # gives us insight on Thread Queues / Switches
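The hand-over delay described here can be reproduced in miniature (a hypothetical sketch, using a single-threaded executor rather than IIS): a long-running request on the only worker thread delays everything queued behind it, and measuring enqueue-to-start time exposes the wait:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: measure how long a task waits in a queue before a
// worker thread picks it up - the same effect as the 87s hand-over above.
public class QueueWaitDemo {

    static long measureQueueWait(long blockMs) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        worker.submit(() -> sleep(blockMs));        // blocks the only worker thread
        long enqueued = System.currentTimeMillis(); // second task is enqueued now
        final long[] waited = new long[1];
        worker.submit(() -> waited[0] = System.currentTimeMillis() - enqueued);
        worker.shutdown();
        worker.awaitTermination(5, TimeUnit.SECONDS);
        return waited[0];                           // time spent waiting in the queue
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Waited in queue: ~" + measureQueueWait(200) + "ms");
    }
}
```

The measured wait tracks the blocking task's duration, not the waiting task's own work – which is why elapsed time and thread numbers, not just CPU time, are the metrics to look at.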
93. #2 What about these SQL Executions?
Finding: EVERY SQL statement is executed on ITS OWN Connection!
Tip: Look at "GetConnection"
94. #2 SQL Executions! continued …
#1: Same SQL is executed 67! times
#2: NO PREPARATION, because everything is executed on a new Connection
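Why a new connection per execution defeats statement preparation can be sketched with a toy model (all names hypothetical; a real driver's prepared-statement cache works similarly): the cache lives on the connection, so a fresh connection per query pays the full prepare cost every time:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a prepared-statement cache lives on the connection,
// so a new connection per execution means every query is prepared again.
public class PreparedStatementDemo {

    // simulates one database connection with its own statement cache
    static class Connection {
        Map<String, Object> statementCache = new HashMap<>();
        int prepareCount = 0;

        void execute(String sql) {
            // only prepare if this connection has not seen the SQL before
            if (!statementCache.containsKey(sql)) {
                prepareCount++;
                statementCache.put(sql, new Object());
            }
        }
    }

    static int runOnNewConnections(String sql, int times) {
        int prepares = 0;
        for (int i = 0; i < times; i++) {
            Connection c = new Connection(); // new connection per execution
            c.execute(sql);
            prepares += c.prepareCount;
        }
        return prepares;
    }

    static int runOnSharedConnection(String sql, int times) {
        Connection c = new Connection();     // one connection, cache survives
        for (int i = 0; i < times; i++) c.execute(sql);
        return c.prepareCount;
    }

    public static void main(String[] args) {
        String sql = "SELECT balance FROM account WHERE id = ?";
        System.out.println("New connection each time: "
                + runOnNewConnections(sql, 67) + " prepares");   // 67
        System.out.println("Shared connection: "
                + runOnSharedConnection(sql, 67) + " prepares"); // 1
    }
}
```

Reusing a pooled connection (or letting the pool cache prepared statements) turns 67 prepares of the same SQL into one.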
It's not a hoax or hype – DevOps is a real thing, especially driven by the release of books like "The Phoenix Project", the Velocity conference with its shift in focus to DevOps, and many companies that jumped on the bandwagon to build tools and offer services around DevOps
So – some call it DevOps - …
Some say it is simply the natural evolution of software engineering, where we push the principles of Lean, Agile and Continuous Integration further toward production
And here is my definition of DevOps
The other trend is the DevOps movements that tries to help us here. Heavily promoted and pushed by several people, organizations, books and conferences
If you haven't read The Phoenix Project, please do so. Also make sure you are getting up to speed with concepts such as Continuous Delivery and doing this in an efficient way – as this is what we need to do in order to comply with rapidly changing requirements
Cycle time is the most relevant metric in the software delivery process.
“How long would it take your organization to deploy a change that involves just one single line of code?” Mary Poppendieck
The key goal that people want to achieve is to Reduce Lead Time. An automated build pipeline plays a huge role in it as we get rid of a lot of manual tasks that otherwise hold up the process
When pushing out features faster it is important to also close the feedback loop to constantly improve the process and quality of the developed software
Several companies changed their way they develop and deploy software over the years. Here are some examples (numbers from 2011 – 2014)
Cars: from 2 deployments to 700
Flickr: 10+ per Day
Etsy: lets every new employee, on their first day of employment, make a code change and push it through the pipeline into production: THAT'S the right approach towards the required culture change
Amazon: every 11.6s
Remember: these are very small changes – which is also a key goal of continuous delivery. The smaller the change, the easier it is to deploy, the less risk it has, the easier it is to test and the easier it is to take it out in case it has a problem.
So – our goal is to deploy new features faster to get them in front of our paying end users or employees
For many companies that tried this it also meant that they fail faster
It's also very important to keep the focus right – building and fixing those things that matter.
Your app that you are responsible for crashes …
The FIFA World Cup App, one week before the World Cup. It crashed for the majority of Android users when refreshing the news section of the app, caused by a memory leak introduced by an outdated library they used
The classical war room – this is where engineers spend hours or even days trying to figure out what is wrong instead of building new stuff
Bad press! Something you don’t like either if you want to sell your software or service
Bad comments on the Google Play store for the Fifa App. If you invest a lot of time & money this is not going to help you harvest the investment as many people will probably just download an app from the competition
But Why does this really happen?
A lot of the time, developers and testers see the world from a DIFFERENT POINT of view and have different priorities:
Developers want to build features, not fix bugs that keep them from building new features.
Testers need to find bugs because that's what they are measured on – so this is a natural conflict in common goals.
Also, the "It works on my machine" attitude comes up a lot. Devs, e.g., don't trust the Testing Tools
Same is true for Operations and Developers, Operations and Testers, Business and Operations, …. – there are always conflicts in interests and goals and there is always a trust issue with the tools used and the data captured
As a tester it often feels like you are the lone ranger who is not scared of these bugs, while the rest of the organization is, as they jeopardize the release schedule
The closer we get to release, the more it seems like developers are really chased by testers. But they try to escape to finish the last user stories they committed to – which is also more fun than fixing bugs
At release date – with not all bugs fixed – it just happens that people simply deploy and let end users do the testing in production
This attitude doesn’t help either
Based on a recent study:
80% of Dev Team time overall is spent on bug fixing instead of building cool new features
$60B annual cost of bad software, instead of investing it in cool new features to spearhead the competition
We need to leave that status quo. And there are two numbers that tell us that it is not as hard to do as it may seem
Based on my experience
80% of the problems are caused by only 20% of the problem patterns. And focusing on the 20% of potential problems that cause 80% of the pain is a very good starting point
Sounds super nice on paper – so – how do we get there?
Now we are talking about several real life use cases from applications that crashed when deployed into production.
For each use case we are looking into
Why the app failed -> technical root cause
which metrics to look at to identify the actual architectural, performance or scalability problem
How you can level up to identify these problems and also automate that along the delivery pipeline
So – we have seen a lot of metrics. The goal now is that you start with one metric. Pick a single metric and take it back to your engineering team (Dev, Test, Ops and Business). Sit down and agree on what this metric means for everyone, how to measure it and also how to report it
Also remember that for most of these use cases discussed and metrics derived from it we only need a single user test. Even though we can identify performance, scalability and architectural issues – in most cases we don’t need a load test. Single user tests or unit tests are good enough
Level-Up Skills by talking and exchanging ideas with your peers: developers, ops, business
It is important that both sides start understanding the challenges of the other side. It is important that they speak the same language, e.g: what does this metric mean to you? How do you measure it? Sit down and level-up skills for everybody and agree on a common set of tools and metrics
But what types of metrics?
Metrics around Architecture, e.g: how many web service calls does it take to implement this new feature? How many AJAX calls do we make when people logon to our site? Is that smart?
How fast is that piece of code? Is it efficient in its usage of CPU, Memory, Disk and Network?
Is the application also going to scale? Which components perform better or worse with increasing load? Where is the breaking point? Where is the API that is the issue? What's the architectural decision behind the app not scaling?
If we add these 4 quality areas we convert DevOps into DevTOps
We change testers from just reporting bugs to helping find solutions as a team!
Once we figured out how to get these measures it is time to automate the capturing but also automate quality alerting in case these metrics are showing us that we ran into one of these well known use cases.
Here is how we do this. In addition to looking at functional and unit test results, which only tell us whether functionality is good, we also look into these backend metrics for every test. With that we can immediately identify whether code changes result in any performance, scalability or architectural regressions. Knowing this allows us to stop a bad build early
This is how this can look like in a real life example. Analyzing Key Performance, Scalability and Architectural Metrics for every single test
This is how this could look like in Jenkins
Now that we know which metrics to look at, and how to automate their capture and detect regressions from build to build, we simply add them to the continuous delivery pipeline by letting these metrics act as quality gates. We do not let a build move forward if we already know that it has a well-known problem.
Here are all the benefits
Only good code reaches production
We eliminate time spent in later stages if we already identify problems earlier
We all level up our skills and become a better team
We produce better software faster -> we don’t crash the car
The problem is that there is not enough focus on code quality, which leads to lots of bugs ending up in production.
We also know that it is more expensive and time consuming to fix a bug in production rather than earlier in the lifecycle
This is exactly what leads to the costs and time spent in fixing issues
So – if we focus on these well-known problems early – and even automate this process – we get:
Less wasted time testing bad code, because it is now eliminated earlier
Better overall quality in production
Fewer bugs overall that have side-effects, and a better general understanding of quality, which leads to lower costs for fixing bugs
We start with CAMS – which is the base foundation of the initial Definition of DevOps
But we add new areas around Quality Focus