My first deck showing all the common mistakes and screwups that plague AB testers. Learn how to avoid biased tests, broken recording and baffling results – take control of your testing accuracy and deliver great lifts!
2. Top Fuckups for 2013
1. Testing in the wrong place
2. Your hypothesis inputs are crap
3. No analytics integration
4. Your test will finish after you die
5. Not testing for long enough
6. No QA for your split test
7. Opportunities are not prioritised
8. Testing cycles are too slow
9. Your test fails
10. The result is ‘about the same’
11. Test flips or moves around
12. Nobody ‘feels’ the test
13. You forgot you were responsive
@OptimiseOrDie
3. @OptimiseOrDie
• UX and Analytics (1999)
• User Centred Design (2001)
• Agile, Startups, No budget (2003)
• Funnel optimisation (2004)
• Multivariate & A/B (2005)
• Conversion Optimisation (2005)
• Persuasive Copywriting (2006)
• Joined Twitter (2007)
• Lean UX (2008)
• Holistic Optimisation (2009)
Was : Group eBusiness Manager, Belron
Now : Consulting
6. #1 : You’re doing it in the wrong place
@OptimiseOrDie
7. #1 : You’re doing it in the wrong place
There are 4 areas a CRO expert always looks at:
1. Inbound attrition (medium, source, landing page, keyword,
intent and many more…)
2. Key conversion points (product, basket, registration)
3. Processes and steps (forms, logins, registration, checkout)
4. Layers of engagement (search, category, product, add)
1. Use visitor flow reports for attrition – very useful.
2. For key conversion points, look at loss rates & interactions.
3. Processes and steps – look at funnels or make your own.
4. Layers and engagement – make a model.
Let’s look at an example I’ve used recently
@OptimiseOrDie
12. 6.3 – Within a layer
[Diagram: pages within a layer – visitors exit, move to a deeper layer, or hit micro conversions such as wishlist, contact, email and like.]
@OptimiseOrDie
Micro Conversions
13. #1 : You’re doing it in the wrong place
• Get to know the flow and loss (leaks) inbound, inside and
through key processes or conversion points.
• Once you know the key steps you’re losing people at and
how much traffic you have – make a money model.
• Let’s say 1,000 people see the page a month. Of those, 20% (200) convert to checkout.
• Estimate the influence your test can bring. How much money or KPI improvement would a 10% lift in the checkouts deliver? (A quick sketch of this follows below.)
• Congratulations – you’ve now built the world’s first IT plan with a return on investment estimate attached!
• I’ll talk more about prioritising later – but a good real world analogy for you to use:
@OptimiseOrDie
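Here’s a minimal sketch of that money model, with assumed numbers – the order value and lift are placeholders, so swap in your own analytics figures:

# Minimal money model sketch – assumed figures, swap in your own analytics numbers.
monthly_page_views = 1_000        # people who see the page each month
baseline_checkout_rate = 0.20     # 20% currently convert to checkout
average_order_value = 50.00       # assumed value per checkout (placeholder)
expected_lift = 0.10              # the 10% relative lift you hope the test delivers

baseline_checkouts = monthly_page_views * baseline_checkout_rate      # 200
extra_checkouts = baseline_checkouts * expected_lift                  # 20
extra_revenue_per_month = extra_checkouts * average_order_value       # 1,000

print(f"Extra checkouts per month at +10%: {extra_checkouts:.0f}")
print(f"Extra revenue per month: {extra_revenue_per_month:,.2f}")
print(f"Extra revenue per year:  {extra_revenue_per_month * 12:,.2f}")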
14. Think like a
store owner!
If you can’t refurbish
the entire store,
which floors or
departments will you
invest in optimising?
Wherever there is:
• Footfall
• Low return
• Opportunity
@OptimiseOrDie
15. #2 : Your hypothesis inputs are all wrong
Insight – Inputs (#FAIL):
• Opinion
• Cherished notions
• Marketing whims
• Cosmic rays
• Not ‘on brand’ enough
• Ego
• IT inflexibility
• Panic
• Internal company needs
• Competitor change
• An article the CEO read
• Some dumbass consultant
• Competitor copying
• Dice rolling
• Guessing
• Knee jerk reactions
• Shiny feature blindness
@OptimiseOrDie
16. #2 : These are the inputs you need…
Insight – Inputs:
• Usability testing
• Forms analytics
• Search analytics
• Voice of Customer
• Market research
• Eye tracking
• Customer contact
• A/B and MVT testing
• Big & unstructured data
• Social analytics
• Session replay
• Web analytics
• Segmentation
• Sales and Call Centre
• Surveys
• Customer services
• Competitor evals
@OptimiseOrDie
17. #2 : Solutions
• You need multiple tool inputs
– Tool decks are here : www.slideshare.net/sullivac
• Usability testing and User facing teams
– If you’re not using these properly, you’re hosed
• Session replay tools provide vital input
– Get vital additional customer evidence
• Simple page Analytics don’t cut it
– Invest in your analytics, especially event tracking
• Ego, Opinion, Cherished notions – fill gaps
– Fill these vacuums with insights and data
• Champion the user
– Give them a chair at every meeting
@OptimiseOrDie
18. #3 : No analytics integration
• Investigating problems with tests
• Segmentation of results
• Tests that fail, flip or move around
• Tests that don’t make sense
• Broken test setups
• What drives the averages you see?
@OptimiseOrDie
19. #4 : The test will finish after you die
[Cartoon captions: “These Danish porn sites are so hardcore!” – “We still keep watching our old AB tests in retirement.”]
• Use a test length calculator like this one (a rough sketch of the arithmetic follows):
• visualwebsiteoptimizer.com/ab-split-test-duration/
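For the curious, here’s a rough sketch of the arithmetic behind calculators like that one – not the exact formula the tool uses, and all the inputs (baseline rate, lift, traffic) are assumptions you should replace:

# Rough sketch of the arithmetic behind a test length calculator.
# Assumed inputs; two-sided alpha = 0.05, power = 0.80 (z values hard-coded below).
from math import sqrt, ceil

def visitors_per_variant(baseline_rate, relative_lift):
    """Approximate sample size per variant for a two-proportion z-test."""
    z_alpha, z_beta = 1.96, 0.84
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    top = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(top / (p2 - p1) ** 2)

n = visitors_per_variant(baseline_rate=0.05, relative_lift=0.20)   # 5% -> 6%
daily_visitors_in_test = 800     # visitors actually entering the test each day (assumed)
variants = 2                     # control + one challenger
print(f"{n} visitors per variant, roughly {ceil(n * variants / daily_visitors_in_test)} days")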
20. #5 : You don’t test for long enough
• The minimum length
– 2 business cycles (comparison)
– Always test ‘whole’ not partial cycles
– Don’t self stop!
– Usually a week, 2 weeks, a month
– Be aware of multiple cycles
• How long after that
– 95% confidence or higher is my aim – and I often hit higher than this
– I aim for a minimum of 250 outcomes, ideally 350+ for each ‘creative’
– If you test 4 recipes, that’s 1,400 outcomes needed
– You should have worked out how long each batch of 350 needs before you start!
– If you segment, you’ll need more data
– It may need a bigger sample if the response rates are similar*
– Use a test length calculator but be aware of minimums
– Important insider tip – watch the error bars! The +/- stuff – let’s explain
* Stats geeks know I’m glossing over something here. Test time depends on how the two experiments separate in terms of relative performance as well as how volatile the test response is. I’ll talk about this when I record this one! This is why testing similar stuff sux.
21. #5 : The tennis court
– Let’s say we want to estimate, on average, what height Roger Federer
and Nadal hit the ball over the net at. So, let’s start the match:
@OptimiseOrDie
22. First Set Federer 6-4
– We start to collect values
63.5cm +/- 2cm
62cm +/- 2cm
@OptimiseOrDie
23. Second Set – Nadal 7-6
– Nadal starts sending them low over the net
62.5cm +/- 1cm
62cm +/- 1cm
@OptimiseOrDie
24. Final Set Nadal 7-6
– We start to collect values
62cm +/- 0.3cm
61.8cm +/- 0.3cm
25. Let’s look at this a different way
9.1% ± 0.3%
62.5cm +/- 1cm
@OptimiseOrDie
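The same idea applies to a conversion rate: the number your testing tool reports sits inside a fuzzy region that narrows as outcomes accumulate. Here’s a minimal sketch with assumed counts, using a simple normal-approximation 95% interval for illustration (your tool may compute it differently):

# Sketch: the +/- on a measured conversion rate shrinks as outcomes accumulate.
# Assumed counts; a simple normal-approximation 95% interval for illustration.
from math import sqrt

def rate_with_error(conversions, visitors, z=1.96):
    rate = conversions / visitors
    margin = z * sqrt(rate * (1 - rate) / visitors)
    return rate, margin

for conversions, visitors in [(45, 500), (180, 2_000), (900, 10_000)]:
    rate, margin = rate_with_error(conversions, visitors)
    print(f"{visitors:>6} visitors: {rate:.1%} +/- {margin:.1%}")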
27. #5 : Summary
• The minimum length:
– 2 business cycles minimum, regardless of outcomes
– 250+, prefer 350+ outcomes in each
– 95%+ confidence
– Error bar separation between creatives
• Pay attention to:
– Time it will take for the number of ‘recipes’ in the test
– The actual footfall to the test – not sitewide numbers
– Test results that don’t separate – makes the test longer
– This is why you need brave tests – to drive difference
– The error bars – the numbers in your AB testing tool are not precise – they’re fuzzy regions that depend on response and sample size
– Sudden changes in test performance or response
– Monitor early tests like a chef!
@OptimiseOrDie
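And here’s what “error bar separation between creatives” means in practice – a rough sketch with assumed counts, using a pooled two-proportion z-test as a stand-in for whatever maths your testing tool actually runs:

# Sketch: has the challenger actually separated from the control?
# Assumed counts; a pooled two-proportion z-test as a stand-in for your tool's maths.
from math import sqrt, erf

def separation(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    confidence = erf(abs(z) / sqrt(2))   # two-sided confidence = 1 - p-value
    return p_a, p_b, confidence

p_a, p_b, conf = separation(conv_a=300, n_a=3_500, conv_b=360, n_b=3_500)
print(f"Control {p_a:.1%} vs challenger {p_b:.1%} -> {conf:.1%} confidence")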
29. #6 : What QA testing should I do?
• Cross Browser Testing
• Testing from several locations (office, home, elsewhere)
• Testing the IP filtering is set up
• Test tags are firing correctly (analytics and the test tool)
• Test as a repeat visitor and check session timeouts
• Cross check figures from 2+ sources
• Monitor closely from launch, recheck, watch
@OptimiseOrDie
30. #7 : Opportunities are not prioritised
Once you have a list of potential test areas, rank them by opportunity vs. effort. The common ranking metrics I use include:
• Opportunity (profit, revenue)
• Dev resource
• Time to market
• Risk / Complexity
Make yourself a quadrant diagram and plot them
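A minimal sketch of that ranking, with hypothetical test areas and made-up 1–10 scores – a crude stand-in for the quadrant diagram:

# Sketch: rank candidate test areas by opportunity vs. effort.
# Hypothetical areas and 1-10 scores; effort rolls up dev resource, time and risk.
candidates = [
    ("Checkout address form", 9, 3),
    ("Homepage hero banner",  4, 2),
    ("Product page gallery",  7, 8),
    ("Registration step",     8, 5),
]

# High opportunity, low effort first – a crude stand-in for the quadrant diagram.
for name, opportunity, effort in sorted(candidates, key=lambda c: c[1] / c[2], reverse=True):
    print(f"{name:<25} opportunity={opportunity} effort={effort} ratio={opportunity / effort:.1f}")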
31. #8 : Your cycles are too slow
[Chart: conversion plotted over 0–18 months.]
@OptimiseOrDie
32. #8 : Solutions
• Give Priority Boarding for opportunities
– The best seats reserved for metric shifters
• Release more often to close the gap
– More testing resource helps, analytics ‘hawk eye’
• Kaizen – continuous improvement
– Others call it JFDI (just f***ing do it)
• Make changes AS WELL as tests, basically!
– These small things add up
• RUSH Hair booking – Over 100 changes
– No functional changes at all – 37% improvement
• In between product lifecycles?
– The added lift for 10 days’ work, worth 360k
@OptimiseOrDie
35. #9 : Your test fails
• Learn from the failure! If you can’t learn from the failure, you’ve
designed a crap test.
• Next time you design, imagine all your stuff failing. What would
you do? If you don’t know or you’re not sure, get it changed so
that a negative becomes insightful.
• So : failure itself at a creative or variable level should tell you
something.
• On a failed test, always analyse the segmentation and analytics (see the sketch at the end of this slide)
• One or more segments will be over- and under-performing
• Check for varied performance
• Now add the failure info to your Knowledge Base:
• Look at it carefully – what does the failure tell you? Which
element do you think drove the failure?
• If you know what failed (e.g. making the price bigger) then you
have very useful information
• You turned the handle the wrong way
• Now brainstorm a new test
@OptimiseOrDie
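A minimal sketch of that segment check, with assumed per-segment counts – the point is simply to see which segments were over and which were under before you bin the test:

# Sketch: segment a failed test before binning it. Assumed counts per segment:
# (control conversions, control visitors, variant conversions, variant visitors)
segments = {
    "desktop": (420, 4_000, 380, 4_000),
    "mobile":  (150, 3_000, 210, 3_000),
    "tablet":  ( 60, 1_000,  55, 1_000),
}

for name, (c_conv, c_n, v_conv, v_n) in segments.items():
    c_rate, v_rate = c_conv / c_n, v_conv / v_n
    lift = (v_rate - c_rate) / c_rate
    print(f"{name:<8} control {c_rate:.1%}  variant {v_rate:.1%}  lift {lift:+.1%}")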
36. #10 : The test is ‘about the same’
• Analyse the segmentation
• Check the analytics and instrumentation
• One or more segments may be over and under
• They may be cancelling out – the average is a lie
• The segment level performance will help you (beware of small sample sizes)
• If you genuinely have a test which failed to move any
segments, it’s a crap test – be bolder
• This usually happens when it isn’t bold or brave enough in
shifting away from the original design, particularly on
lower traffic sites
• Get testing again!
@OptimiseOrDie
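Here’s a tiny illustration, with assumed counts, of how two segments can cancel each other out so the blended result looks ‘about the same’:

# Sketch: two segments cancelling each other out, so the blend looks 'about the same'.
# Assumed counts: (control conversions, control visitors, variant conversions, variant visitors)
segments = {
    "new visitors":       (200, 4_000, 260, 4_000),   # variant well up
    "returning visitors": (360, 4_000, 300, 4_000),   # variant well down
}

c_total = v_total = n_total = 0
for name, (c_conv, c_n, v_conv, v_n) in segments.items():
    print(f"{name:<20} control {c_conv / c_n:.1%}  variant {v_conv / v_n:.1%}")
    c_total, v_total, n_total = c_total + c_conv, v_total + v_conv, n_total + c_n

print(f"{'blended':<20} control {c_total / n_total:.1%}  variant {v_total / n_total:.1%}")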
37. #11 : The test keeps moving around
• There are three reasons it is moving around
– Your sample size (outcomes) is still too small
– The external traffic mix, customers or reaction has
suddenly changed or
– Your inbound marketing driven traffic mix is
completely volatile (very rare)
• Check the sample size
• Check all your marketing activity
• Check the instrumentation
• If no reason, check segmentation
@OptimiseOrDie
38. #11 : The test has flipped on me
• Something like this can happen:
• Check your sample size. If it’s still small, then expect this until the test settles.
• If the test does genuinely flip – and quite severely – then something has changed with the traffic mix, the customer base or your advertising. Maybe the PPC budget ran out? Seriously!
• To analyse a flipped test, you’ll need to check your segmented data. This is why you have a split testing package AND an analytics system.
• The segmented data will help you to identify the source of the shift in response to your test. I rarely get a flipped one and it’s always something changing on me, without being told. The heartless bastards.
39. #12 : Nobody feels the test
• You promised a 25% rise in checkouts - you only see 2%
• Traffic, Advertising, Marketing may have changed
• Check they’re using the same precise metrics
• Run a calibration exercise
• I often leave a 5 or 10% stub running in a test
• This tracks old creative once new one goes live
• If conversion is also down for that one, BINGO!
• Remember – the AB test is an estimate – it doesn’t precisely record future performance
• This is why infrequent testing is bad
• Always be trying a new test instead of basking in the
glory of one you ran 6 months ago. You’re only as good
as your next test.
@OptimiseOrDie
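A rough sketch of that 5–10% stub idea, with a hypothetical assign_creative helper – the point is a deterministic split so a small slice of traffic keeps seeing the old creative as a calibration check:

# Sketch: keep a small stub of traffic on the old creative as a calibration check.
# Hypothetical helper; 90% see the new creative, a 10% stub keeps seeing the old one.
import hashlib

def assign_creative(visitor_id, stub_share=0.10):
    """Deterministic per-visitor bucketing, so repeat visits see the same creative."""
    bucket = int(hashlib.md5(visitor_id.encode()).hexdigest(), 16) % 100
    return "old_stub" if bucket < stub_share * 100 else "new_creative"

counts = {"old_stub": 0, "new_creative": 0}
for i in range(10_000):
    counts[assign_creative(f"visitor-{i}")] += 1
print(counts)   # roughly 1,000 vs 9,000 – compare conversion between the two buckets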
40. #13 : You forgot you were responsive
• If you’re AB testing a responsive site, pay attention
• Content will break differently on many screens
• Know thy users and their devices
• Use bango or google analytics to define a test list
• Make sure you test mobile devices & viewports
• What looks good on your desk may not be for the user
• Harder to design cross device tests
• You’ll need to segment mobile, tablet & desktop response in the analytics or AB testing package
• Your personal phone is not a device mix
@OptimiseOrDie
41. Top Fuckups for 2013
1. Testing in the wrong place
2. Your hypothesis inputs are crap
3. No analytics integration
4. Your test will finish after you die
5. Not testing for long enough
6. No QA for your split test
7. Opportunities are not prioritised
8. Testing cycles are too slow
9. Your test fails
10. The result is ‘about the same’
11. Test flips or moves around
12. Nobody ‘feels’ the test
13. You forgot you were responsive
@OptimiseOrDie
42. BONUS : What is a good conversion rate?
Higher than the one you had last month!
43. Is there a way to fix this then?
Conversion Heroes!
@OptimiseOrDie
And here’s a boring slide about me – and where I’ve been driving over 400M of additional revenue in the last few years. In two months this year alone, I’ve found an additional ¾ M pounds annual profit for clients. For the sharp-eyed amongst you, you’ll see that Lean UX hasn’t been around since 2008. Many startups and teams were doing this stuff before it got a new name, even if the approach was slightly different. For the last 4 years, I’ve been optimising sites using the combination of techniques I’ll show you today.
Tomorrow - Go forth and kick their flabby low converting asses