Flight Checks:
Quality Assurance for Releases that Prevent
Disasters from Escaping into the Wild
Brie Hoblin
@bhoblin
QA Engineer
Sage Logik, LLC
Imagine…
You work at a start-up, and a client is breathing down
your neck…(or your boss’s neck)
Saying “this next release is absolutely the most important
release ever”
“…And by the way can you just add these 3 little things by
Thursday?” (It is Tuesday.)
…And then your project manager says “SURE!”
…And then all the developers’ heads explode.
And as the QA Engineer, you say
Maybe that’s a
familiar situation…
Imagine…
Or maybe you work at a big company…
And there are some really big new features that have been in
testing for close to 2 weeks…
And on the last day of testing you find there’s a really big
bug that impacts crucial functionality…
Now what?
Today we’re going to talk about 2
things:
Making Deployment
Decisions…
1. How to make (or at least recommend to the people who
do make) the decision to release (with bugs) or delay, and
whether to roll back or hotfix in production when bugs
do make it out into the wild
Release / delay?
Rollback / hotfix?
Disaster
Avoidance/Prevention
2. How to reduce the likelihood that those high pressure
moments will happen or, in other words,
How to prevent disasters
But first let’s back up for a minute
How do we know a disaster when
we see one?
What exactly constitutes a
‘disaster’?
First let’s look at some well-known
disasters recognized by history:
Mars Climate Orbiter
• Designed to gather data on Mars’ climate & atmosphere
• & to be the communications relay for Mars Polar Lander
But…
It approached the planet on the
wrong trajectory, far too low…
…And then it was lost, most likely destroyed in the atmosphere.
Why?
Because a piece of ground-based computer
software produced output in units of
pound-force seconds instead of the SI units
of newton-seconds specified in the contract
between NASA and Lockheed Martin.
Wrong units!!!
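A minimal Python sketch of how a unit mix-up like this can be made to fail loudly. The `Impulse` type and `apply_thruster_impulse` function are hypothetical names for illustration and are not taken from any NASA or Lockheed software; the only fixed fact is the conversion factor, 1 lbf = 4.4482216152605 N. The idea is that a value carries its unit with it, so the consumer converts explicitly instead of trusting a bare number.

```python
from dataclasses import dataclass

LBF_TO_NEWTON = 4.4482216152605  # newtons per pound-force


@dataclass(frozen=True)
class Impulse:
    """An impulse value that carries its unit with it (hypothetical type)."""
    value: float
    unit: str  # "N*s" or "lbf*s"

    def to_newton_seconds(self) -> float:
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_TO_NEWTON
        raise ValueError(f"unknown impulse unit: {self.unit!r}")


def apply_thruster_impulse(impulse: Impulse) -> float:
    """Consumer that expects SI units and converts explicitly."""
    return impulse.to_newton_seconds()


# The producer reports 10 lbf*s. Read as a bare number it would be taken
# for 10 N*s; converted properly it is roughly 44.48 N*s.
reported = Impulse(10.0, "lbf*s")
print(round(apply_thruster_impulse(reported), 2))  # 44.48
```

An integration test that feeds one component's real output into the next component would catch the same class of bug without any special types.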
Or what about the Prius software
bug in 2014?
• Various warning lights would light up
• Car would enter ‘safe mode’
• Some cars stopped suddenly while being driven
Why?
Because a few software settings caused
higher thermal stress in certain
transistors, causing them to become
damaged or deformed.
Spaghetti Code!
Lack of integration
testing!
“But we work on websites not cars
or spacecraft!”
Amazon 1p price glitch:
Why?
• Caused by a glitch in 3rd party software provided by
RepricerExpress
• Repriced thousands of products, from mattresses to
PlayStation 4 consoles, to just 1p
• Small retailers lost tens of thousands of dollars overnight
and faced bankruptcy
• Neither Amazon nor RepricerExpress offered any
compensation to sellers
3rd party software fail!
Whenever we have a client paying
us to make them software, there’s
important stuff on the line.
So what made those bugs
disasters?
A disaster is (for the purposes of this presentation) a
software bug that harms:
o The client’s bottom line
o The users’ bottom line
o The client’s faith in you
o The users’ trust in the website / app
o Users’ ability to use the website/app
OR
o Your reputation
To an untenable degree.
Caveats
…Or causes maimings or deaths
(Therac-25; Toyota Camry, Barbara Schwarz)
…Otherwise it probably has to affect
a significant number of users (if 2% of
users lose faith in your website, that may not be a ‘disaster’)
Additionally…
• How will the bug impact internal
resources as you make reparations / fixes?
• Can the user recover by taking a
reasonable action?
• How captive is your audience?
• Have you exposed sensitive data?
Can you recognize a
disaster?
Scenario 1: You work on a website that sells outdoor gear.
A few hours after your latest release, it is discovered that
a JavaScript error decreases the quantity of the last item in
the cart when the user scrolls down the page with the mouse
to click “Purchase.” Orders end up with incorrect totals or
missing items. The user may not notice on the confirmation
page that follows, even though the incorrect total is
displayed there.
Disaster?
Maybe
Impacts user’s trust in the website to an untenable
degree…(“I didn’t get my medium blue softshell jacket
before I left for vacation!”)
Unless the company makes quick reparations.
So NOT a disaster unless the company handles it poorly,
OR unless the reparations cost so much in internal resources
that they damage the company’s bottom line. Was the bug
released during a time of high traffic? Are there a lot of
orders to fix?
Scenario 2: You work on a website similar to jumponit and
as you navigate from the homepage to a specific deal and
then back to the homepage, the website loses track of your
geolocation and changes your ‘current location’ to a
random city. There is no way for the user to reselect their
location. This bug is not discovered until after being
released and is reported by a user in a large metropolitan
area almost 24 hours after the release.
Disaster
• If verified, this impacts a large group of users, maybe all
users
• There is no way for users to recover
• Website is completely unusable to anyone who doesn’t
happen to be randomly traveling to wherever the
website decided they were located
• Impacts user’s trust in the website, BUT users will
probably come back because there are good deals
• More important is the impact on the bottom line from
sales lost while the bug was live
Scenario 3: The client calls and screams at your project
manager because they can’t log in as a full admin and view
important financial reports. They’ve been trying to reach
your company for hours because there is a big finance
review meeting in 10 minutes…
Disaster
(or at least pretty close to it)
At that point you’ve probably
significantly dinged your client’s
trust in you.
When things look like
disasters but aren’t…
Maybe not ideal, but…
• Client reports terrible bug while smoke testing a new
release…turns out they haven’t cleared their cache
• Company admin reports terrible bug while
impersonating user…that does not affect actual users
• Homepage of website initially loads fine, but then goes
blank and reports no results found for your area after
about 20 seconds…but it’s ok because most users click
on something and go to another page in the first 5
seconds
When things don’t look
like disasters but are…
Looks fine, really isn’t!
• Lack of integration testing so everyone knows it’s fine
within their part of the app/website but no one has
tested the whole thing
• Third party software changes that go unnoticed
• Subtle calculation errors when checking out, especially
in fees, taxes, or percentages paid out to vendors
• Works in your neck of the woods but nowhere else
(geolocation issues specific to other geolocations, time
zone issues specific to other time zones)
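A small Python sketch of the “works in your neck of the woods” problem for time zones. The deal expiry time and the two locations are made-up values for illustration, and fixed UTC offsets are used only to keep the sketch dependency-free. It shows that the same instant falls on different calendar dates depending on where the tester and the user are.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets keep the sketch dependency-free; real code would more
# likely use named zones (for example via the standard zoneinfo module).
UTC = timezone.utc
PACIFIC_STANDARD = timezone(timedelta(hours=-8))  # assumed tester location
CENTRAL_EUROPEAN = timezone(timedelta(hours=1))   # assumed user location


def local_date(instant: datetime, tz: timezone) -> str:
    """Calendar date of the same instant as seen in a given zone."""
    return instant.astimezone(tz).date().isoformat()


# A hypothetical deal that expires at 02:30 UTC on 1 March 2024.
expires_at = datetime(2024, 3, 1, 2, 30, tzinfo=UTC)

print(local_date(expires_at, PACIFIC_STANDARD))  # 2024-02-29
print(local_date(expires_at, CENTRAL_EUROPEAN))  # 2024-03-01
```

A tester who only ever checks from one location sees one of these dates and never the other, which is why it can help to run the same check with the clock or zone set to where the users actually are.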
Looks fine, really isn’t!
• Everyone tests in Chrome, but a significant % of users have
an issue in another browser
• Everyone tests web and completely forgets mobile
• Subtle calculation errors leading to falsely positive
results in business reports
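One common source of subtle calculation errors is rounding money with binary floating point. The sketch below is an illustration only, with hypothetical helper names; it is not a claim about how any particular checkout or report is implemented. It compares naive float rounding with exact decimal rounding on the same amount.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def round_money_float(amount: float) -> float:
    """Naive rounding on a binary float."""
    return round(amount, 2)


def round_money_decimal(amount: str) -> Decimal:
    """Round half up on an exact decimal value."""
    return Decimal(amount).quantize(CENT, rounding=ROUND_HALF_UP)


# 2.675 cannot be represented exactly as a binary float; the stored value
# is slightly below 2.675, so the naive version rounds down.
print(round_money_float(2.675))      # 2.67
print(round_money_decimal("2.675"))  # 2.68
```

A one-cent difference per transaction looks fine on any single order, which is what makes this kind of bug easy to miss until it shows up in totals.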
Let’s carry this all one step
further…
Deploy or Delay?
• When possible do a little of both. Three hours of testing is
better than none.
• Start-ups are a culture of higher risk, so be prepared to take
more risks and test less than you would in a more established
company. Lean towards deploying
• Really focus whatever time for testing you have on core
functionality and on the devices used the most to access your
website/app
• Devote some sliver of time to testing admin functionality
• If core functionality is compromised, DELAY
• Consider the quality of what is currently out there: if the
latest release is an improvement, release it even with the bugs
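The bullets above can be sketched as a tiny decision helper. This is a deliberate simplification with hypothetical field names, and the order of the checks (broken core functionality outranks everything else) is an assumption rather than something the slides spell out. It is meant as a conversation aid for making a recommendation, not as a replacement for judgment.

```python
from dataclasses import dataclass


@dataclass
class ReleaseCandidate:
    """Facts QA can report about a build (hypothetical fields)."""
    core_functionality_broken: bool  # e.g. sign-up or checkout fails
    improves_on_production: bool     # better than what is live right now


def recommend(rc: ReleaseCandidate) -> str:
    """Return a recommendation based on the bullets above."""
    if rc.core_functionality_broken:
        return "delay"
    if rc.improves_on_production:
        return "deploy"
    return "test core paths, then decide"


# A slow build that replaces a version users cannot use at all.
print(recommend(ReleaseCandidate(False, True)))  # deploy
# A build where new users cannot sign up.
print(recommend(ReleaseCandidate(True, False)))  # delay
```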
Deploy or Delay?
• Scenario 1: You’re testing an app that is map-based and
it doesn’t load smoothly. You can’t zoom in or out
effectively, and it takes forever for parts of the map to
load. You feel frustrated trying to use the app, so the
users will too. It’s slow and stutters on both iPhones and
Androids. You can find the information you’re looking
for, it just takes a long time and a lot of patience. The
current version of the app crashes every 30 seconds and
you cannot accomplish any basic tasks.
Deploy or Delay?
Scenario 2: The website you’re testing allows users to
barter for services with each other, review each other
publicly, and post items for sale. It is currently very stable
and usable but is only available in your local city. This next
release is to expand the website into 3 additional cities. You
discover in the eleventh hour of testing that new users
cannot successfully sign up—something broke in the most
recent round of fixes. Additionally, existing users can no
longer review each other.
Rollback?
• Have a rollback plan in place. Decide ahead of time what will
trigger a rollback.
• Is core functionality being impacted?
• Will a fix (either coding it or testing it) take a significant
amount of time?
• How easy is it to roll back? Was everything snapshotted at the
same time? Will info be lost in the meantime?
• How recent is your database backup?
• Are there database migrations that are difficult or slow
to roll back?
• How long will a rollback take compared with a hotfix?
• The risk of a rollback failing is equal to the risk of a
deployment failing
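“Decide ahead of time what will trigger a rollback” can be made concrete by writing the trigger down before the release. The sketch below is one possible shape for that, with hypothetical field names and an arbitrary example threshold; treating possible data loss as a reason to stop and escalate is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """What is known shortly after a bad release (hypothetical fields)."""
    core_functionality_impacted: bool
    estimated_fix_hours: float
    estimated_rollback_hours: float
    rollback_loses_data: bool


# Hypothetical threshold agreed before the release, not a recommended value.
MAX_FIX_HOURS_BEFORE_ROLLBACK = 4.0


def should_roll_back(incident: Incident) -> bool:
    """Apply the pre-agreed rollback trigger to an incident."""
    if not incident.core_functionality_impacted:
        return False
    if incident.rollback_loses_data:
        return False  # assumption: escalate to a human decision instead
    if incident.estimated_rollback_hours >= incident.estimated_fix_hours:
        return False  # the hotfix would be at least as fast
    return incident.estimated_fix_hours > MAX_FIX_HOURS_BEFORE_ROLLBACK


# Purchases are broken, the fix needs a full day, snapshots are consistent.
print(should_roll_back(Incident(True, 8.0, 1.0, False)))  # True
```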
Scenario 1: After the latest release you discover during
smoke testing that you cannot successfully purchase
anything on the website. The lead dev determines it will
take them a full day to fix things. You know the team has
been able to snapshot everything for a stable save point
across different components of the website.
Scenario 2: After the latest software release, you discover
that 3 important variables that are used to generate
financial reports concerning overall revenue are being
calculated at inflated values. All 3 are consistently inflated
by 12.5%. The developers need a few days to fix things. Rolling
back would be difficult because it turns out there is a
problem with your database backup.
Hotfix?
• Is it a small change?
• Does it involve any dependencies?
• Is it impossible to roll back?
• Is core functionality being compromised?
Scenario 1: You’re smoke testing a release and notice that
the company name is spelled wrong on the homepage, in
the header.
Scenario 2: Your company just did a major release that
adds a new feature to your website where users can upload
photos and automatically create a slideshow for their
listings in your directory. Unfortunately, none of the existing
photos on the website are loading, and users cannot upload
photos either. And even though everything worked in your
test environment, it turns out that users cannot create a
new listing in the directory.
Preventing Disasters
Recipe for Disaster:
• Culture of ego, devs who assume they can pull it off
rather than asking thorough questions
• Lack of communication about requirements
• Project manager who always says yes to the client
• Culture that does not adequately weigh the risks of
moving too fast against the value of testing and doing
things right
Recipe for Disaster:
• Testing has been outsourced overseas
• Team has no diversity
• Lots of third party dependencies
• Lack of time for testing
• When devs are expected to do most of the testing
• When devs are not aware of widely accepted design
standards
• Party culture that is cavalier about business needs of
client
Preventing Disasters:
• When in doubt, ask questions!
• Involve QA in the design process
• Foster interdependence vs competition
• Be thorough about requirements gathering and then firm
with client about not changing them for that release
• Take the potential for disaster seriously. You are not
immune.
• If testing is going to be outsourced, step up communication.
Preferably, keep testing in-house.
Preventing Disasters:
• Value diversity within your team. If your team lacks
diversity seek other individuals to offer quick feedback.
• Be diligent about integration testing around third party
software. (And be clear about how they will handle bugs
in their software if it leads to loss of revenue.)
• QA time should be roughly equal to dev time. Really.
• Even the best devs mostly test that things work. QA will
actually try to break things. Don’t leave all the testing to
devs.
• Encourage devs to be aware of existing design
standards.
• Work hard, then play hard. Client’s needs are main
focus.
Thank you!
Brie Hoblin
@bhoblin
brie@sagelogik.com
Sage Logik, LLC
Resources
http://www.computerworld.com/article/2515483/enterprise-applications/epic-failures--11-infamous-software-bugs.html
http://www.reuters.com/article/us-toyota-recall-idUSBREA1B1B920140212
http://www.safetyresearch.net/blog/articles/toyota-unintended-acceleration-and-big-bowl-%E2%80%9Cspaghetti%E2%80%9D-code
http://www.computerworlduk.com/it-vendors/amazon-1p-price-glitch-who-should-pay-up-3591160/
Resources Cont’d
http://www.cityam.com/205643/amazon-glitch-sparks-outrage
https://blackpixel.com/writing/2015/06/why-good-qa-matters-to-businesses.html
http://www.inc.com/adam-vaccaro/diversity-and-performance.html

Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

  • 1. Flight Checks: Quality Assurance for Releases that Prevent Disasters from Escaping into the Wild Brie Hoblin @bhoblin QA Engineer Sage Logik, LLC
  • 2. Imagine… You work at a start-up, and a client is breathing down your neck…(or your boss’s neck)
  • 3. Saying “this next release is absolutely the most important release ever”
  • 4. “…And by the way can you just add these 3 little things by Thursday?” (It is Tuesday.)
  • 5. …And then your project manager says “SURE!”
  • 6. …And then all the developer’s heads explode.
  • 7. And as the QA Engineer, you say
  • 9. Imagine… Or maybe you work at a big company…
  • 10. And there’s some really big new features that have been in testing for close to 2 weeks…
  • 11. And on the last day of testing you find there’s a really big bug that impacts crucial functionality…
  • 13. Today we’re going to talk about 2 things:
  • 14. Making Deployment Decisions… 1. How to make (or at least communicate recommendations to the people who do make) decisions to release (with bugs), or delay, and whether or not to rollback vs. hotfixing in production when bugs do make it out into the wild Release / delay? Rollback / hotfix?
  • 15. Disaster Avoidance/Prevention 2. How to reduce the likelihood that those high pressure moments will happen or, in other words, How to prevent disasters
  • 16. But first let’s back up for a minute
  • 17. How do we know a disaster when we see one?
  • 18. What exactly constitutes a ‘disaster’?
  • 19. First let’s look at some well-known disasters recognized by history:
  • 21. • Designed to gather data on Mars’ climate & atmosphere • & to be the communications relay for Mars Polar Lander
  • 22. But… It approached the planet at the wrong angle…
  • 23. …And then it exploded.
  • 24. Why?
  • 25. Because one of the scientists that worked on the ground-based computer software created output in units of pounds/seconds instead of the SI units of newton/seconds specified in the contract between NASA and Lockheed.
  • 27. Or what about the Prius software bug in 2014?
  • 28. • Various warning lights would light up • Car would enter ‘safe mode’ • Some cars stopped suddenly while being driven
  • 29. Why?
  • 30. Because a few software settings caused higher thermal stress in certain transistors, causing them to become damaged or deformed.
  • 31. Spaghetti Code! Lack of integration testing!
  • 32. “But we work on websites not cars or spacecraft!”
  • 33. Amazon 1p price glitch:
  • 34. Why?
  • 35. • Caused by a glitch in 3rd party software provided by Repricer Express • Repriced thousands of products from mattresses to Playstation 4’s to just 1p • Small retailers lost tens of thousands of dollars overnight and faced bankruptcy • Both Amazon & RepricerExpress did not offer any compensation to sellers
  • 37. Whenever we have a client paying us to make them software, there’s important stuff on the line.
  • 38. So what made those bugs disasters?
  • 39. A disaster is (for the purposes of this presentation) a software bug that harms: o The client’s bottom line o The users’ bottom line o The client’s faith in you o The users’ trust in the website / app o Users’ ability to use the website/app OR o Your reputation To an untenable degree.
  • 40. Caveats …Or causes maimings or deaths (Therac 25, Toyota Camry--Barbara Schwarz) ….Otherwise probably has to affect a significant number of users (if 2% of users lose faith in your website, may not be a ‘disaster’)
  • 41. Additionally… • How will the bug impact internal resources as you make reparations / fixes? • Can the user recover by taking a reasonable action? • How captive is your audience? • Have you exposed sensitive data?
  • 42. Can you recognize a disaster?
  • 43. Scenario 1: You work on a website that sells outdoor gear and it is discovered a few hours after your latest release that there is a javascript error that causes the quantity of the last item in the cart to decrease when the user uses the mouse to scroll down the page to click “Purchase,” leading to incorrect totals/ missing items in the order that the user may not notice on the following confirmation page—but the incorrect total is displayed.
  • 45. Impacts user’s trust in the website to an untenable degree…(“I didn’t get my medium blue softshell jacket before I left for vacation!”) Unless the company makes quick reparations. So NOT a disaster unless handled poorly by the company, OR if the cost is too great in terms of internal resources needed to make reparations, resulting in damage to the company’s bottom line—was the bug released during a time of high traffic? Are there a lot of orders to fix?
  • 46. Scenario 2: You work on a website similar to jumponit and as you navigate from the homepage to a specific deal and then back to the homepage, the website loses track of your geolocation and changes your ‘current location’ to a random city. There is no way for the user to reselect their location. This bug is not discovered until after being released and is reported by a user in a large metropolitan area almost 24 hours after the release.
  • 48. • If verified, this impacts a large group of users, maybe all users • There is no way for users to recover • Website is completely unusable to anyone who doesn’t happen to be randomly traveling to wherever the website decided they were located • Impacts user’s trust in the website, BUT users will probably come back because there’s good deals • More importantly is impact on bottom line from lost sales for period of time bug was out there
  • 49. Scenario 3: The client calls and screams at your project manager because they can’t log in as a full admin and view important financial reports. They’ve been trying to reach your company for hours because there is a big finance review meeting in 10 minutes…
  • 50. Disaster (or at least pretty close to it)
  • 51. At that point you’ve probably significantly dinged your client’s trust in you.
  • 52. When things look like disasters but aren’t…
  • 53. Maybe not ideal, but… • Client reports terrible bug while smoke testing a new release…turns out they haven’t cleared their cache • Company admin reports terrible bug while impersonating user…that does not affect actual users • Homepage of website initially loads fine, but then goes blank and reports no results found for your area after about 20 seconds…but it’s ok because most users click on something and go to another page in the first 5 seconds
  • 54. When things don’t look like disasters but are…
  • 55. Looks fine, really isn’t! • Lack of integration testing so everyone knows it’s fine within their part of the app/website but no one has tested the whole thing • Third party software changes that go unnoticed • Subtle calculation errors when checking out, especially in fees, taxes, or percentages paid out to vendors • Works in your neck of the woods but nowhere else (geolocation issues specific to other geolocations, time zone issues specific to other time zones)
  • 56. Looks fine, really isn’t! • Everyone tests in Chrome but a significant percentage of users have issues in another browser • Everyone tests web and completely forgets mobile • Subtle calculation errors leading to falsely positive results in business reports
  • 57. Let’s carry this all one step further…
  • 58. Deploy or Delay? • When possible do a little of both. Three hours of testing is better than none. • Start-ups have a higher-risk culture, so be prepared to take more risks and test less than you would in a more established company; lean towards deploying • Really focus whatever time for testing you have on core functionality and on the devices used the most to access your website/app • Devote some sliver of time to testing admin functionality • If core functionality is compromised, DELAY • Consider the quality of what is currently out there: if the latest release is an improvement, release it even with the bugs
  • 59. Deploy or Delay? • Scenario 1: You’re testing an app that is map-based and it doesn’t load smoothly. You can’t zoom in or out effectively, and it takes forever for parts of the map to load. You feel frustrated trying to use the app, so the users will too. It’s slow and stutters on both iPhones and Androids. You can find the information you’re looking for, it just takes a long time and a lot of patience. The current version of the app crashes every 30 seconds and you cannot accomplish any basic tasks.
  • 60. Deploy or Delay? Scenario 2: The website you’re testing allows users to barter for services with each other, review each other publicly, and post items for sale. It is currently very stable and usable but is only available in your local city. This next release is to expand the website into 3 additional cities. You discover in the eleventh hour of testing that new users cannot successfully sign up—something broke in the most recent round of fixes. Additionally, existing users can no longer review each other.
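The two firmest rules from the Deploy or Delay slide (delay when core functionality is compromised, deploy when the release improves on what is live) can be sketched as a small check. This is only an illustration, not the speaker's method: the `ReleaseCandidate` class, its field names, the rule ordering, and the way the two scenarios are classified are all assumptions made here.

```python
from dataclasses import dataclass


@dataclass
class ReleaseCandidate:
    # Hypothetical fields summarizing QA findings; not taken from the talk.
    core_functionality_compromised: bool
    improves_on_production: bool


def deploy_or_delay(candidate: ReleaseCandidate) -> str:
    """Return 'delay', 'deploy', or 'judgment call'."""
    # Assumption: the core-functionality rule is checked first.
    if candidate.core_functionality_compromised:
        return "delay"
    # Release it, bugs and all, if it beats what users have today.
    if candidate.improves_on_production:
        return "deploy"
    # Neither rule applies, so the remaining heuristics need a human.
    return "judgment call"


# Scenario 1 (one reading): the map app is slow but usable, and the live
# version crashes every 30 seconds.
map_app = ReleaseCandidate(core_functionality_compromised=False,
                           improves_on_production=True)

# Scenario 2 (one reading): sign-up and reviews are broken, while the live
# site is stable.
barter_site = ReleaseCandidate(core_functionality_compromised=True,
                               improves_on_production=False)

print(deploy_or_delay(map_app))
print(deploy_or_delay(barter_site))
```

The remaining heuristics on the slide (start-up risk tolerance, how much testing time is left) are judgment calls and are deliberately left out of the sketch.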
  • 61. Rollback? • Have a rollback plan in place. Decide ahead of time what will trigger a rollback. • Is core functionality being impacted? • Will a fix (either coding it or testing it) take a significant amount of time? • How easy is it to roll back? Was everything snapshotted at the same time? Will info be lost in the meantime? • How recent is your database backup? • Are there database migrations that are difficult or time-consuming to roll back? • How long will it take to roll back vs. hotfix? • The risk of a rollback failing is equal to the risk of a deployment failing
  • 62. Scenario 1: After the latest release you discover during smoke testing that you cannot successfully purchase anything on the website. The lead dev determines it will take them a full day to fix things. You know the team has been able to snapshot everything for a stable save point across different components of the website.
  • 63. Scenario 2: After the latest software release, you discover that 3 important variables that are used to generate financial reports concerning overall revenue are being calculated at inflated values. All 3 are consistently inflated by 12.5%. The developers need a few days to fix things. Rolling back would be difficult because it turns out there is a problem with your database backup.
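One way to turn the rollback questions into a pre-agreed trigger is sketched below. It is a rough illustration under stated assumptions: the function name, its parameters, the hour estimates, and the choice to treat an unsafe rollback as ruling the rollback out are all hypothetical and are not given in the slides.

```python
def rollback_or_fix_forward(core_impacted: bool,
                            fix_hours: float,
                            rollback_hours: float,
                            rollback_is_safe: bool) -> str:
    """Return 'rollback' or 'fix forward'.

    rollback_is_safe stands in for the checklist items about consistent
    snapshots, a recent database backup, and reversible migrations.
    """
    # Assumption: an unsafe rollback is off the table.
    if not rollback_is_safe:
        return "fix forward"
    # Roll back when core functionality is down and rolling back is faster.
    if core_impacted and rollback_hours < fix_hours:
        return "rollback"
    return "fix forward"


# Scenario 1: purchases fail, the fix takes a full day (assumed 8 hours),
# snapshots are consistent, and the rollback is assumed to take 1 hour.
print(rollback_or_fix_forward(True, 8, 1, True))

# Scenario 2: the fix takes a few days (assumed 24 working hours), but the
# database backup has a problem, so the rollback is not considered safe.
print(rollback_or_fix_forward(True, 24, 1, False))
```

Writing the trigger down before the release, in whatever form, is the point of the first bullet on the Rollback slide; the thresholds themselves are for the team to agree on.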
  • 64. Hotfix? • Is it a small change? • Does it involve any dependencies? • Is it impossible to roll back? • Is core functionality being compromised?
  • 65. Scenario 1: You’re smoke testing a release and notice that the company name is spelled wrong on the homepage, in the header.
  • 66. Scenario 2: Your company just did a major release that adds a new feature to your website where users can upload photos and automatically create a slideshow for their listings in your directory. Unfortunately, all the existing photos on the website are now not loading, and users cannot upload photos either. And even though everything worked in your test environment, it also turns out that users cannot create a new listing in the directory.
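The four hotfix questions can be combined into a simple triage sketch. This is one possible reading, not a rule from the talk: the function name, the parameter names, the order of the checks, and the outcome labels are all assumptions.

```python
def hotfix_triage(small_change: bool,
                  has_dependencies: bool,
                  can_roll_back: bool,
                  core_compromised: bool) -> str:
    """Suggest 'hotfix', 'roll back', or 'fix in next release'."""
    # Assumption: if a rollback is impossible, a hotfix is the only option.
    if not can_roll_back:
        return "hotfix"
    # A small, isolated change is cheap to fix and retest in place.
    if small_change and not has_dependencies:
        return "hotfix"
    # A large or entangled fix while core functionality is down: roll back.
    if core_compromised:
        return "roll back"
    return "fix in next release"


# Scenario 1: misspelled company name in the homepage header.
print(hotfix_triage(small_change=True, has_dependencies=False,
                    can_roll_back=True, core_compromised=False))

# Scenario 2: photos and new listings are broken after a major release
# (assuming here that a rollback is still possible).
print(hotfix_triage(small_change=False, has_dependencies=True,
                    can_roll_back=True, core_compromised=True))
```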
  • 68. Recipe for Disaster: • Culture of ego, devs who assume they can pull it off rather than asking thorough questions • Lack of communication about requirements • Project manager who always says yes to the client • Culture that does not adequately weigh the risks of moving too fast against the value of testing and doing things right
  • 69. Recipe for Disaster: • Testing has been outsourced overseas • Team has no diversity • Lots of third party dependencies • Lack of time for testing • When devs are expected to do most of the testing • When devs are not aware of widely accepted design standards • Party culture that is cavalier about business needs of client
  • 70. Preventing Disasters: • When in doubt, ask questions! • Involve QA in the design process • Foster interdependence vs. competition • Be thorough about requirements gathering and then firm with the client about not changing them for that release • Take the potential for disaster seriously. You are not immune. • If testing is going to be outsourced, increase communication. Preferably, keep testing in-house.
  • 71. Preventing Disasters: • Value diversity within your team. If your team lacks diversity, seek out other individuals to offer quick feedback. • Be diligent about integration testing around third-party software. (And be clear about how the vendor will handle bugs in their software if they lead to loss of revenue.) • QA time should be roughly equal to dev time. Really. • Even the best devs mostly test that things work. QA will actually try to break things. Don’t leave all the testing to devs. • Encourage devs to be aware of existing design standards. • Work hard, then play hard. The client’s needs are the main focus.