In today’s digital economy, consumers expect to buy everything online, even a purchase as significant as a car. Cox Automotive, parent company to brands including Autotrader and Kelley Blue Book, has more than 40,000 auto dealer clients across five continents, and is enabling them to keep up with consumer behavior and insights by implementing an enterprise-wide experimentation program. Cox is bridging the gap between consumers, manufacturers, dealers, and lenders at every stage of the automotive experience.
In this session, you’ll hear how this multi-brand, international, matrixed organization has built an enterprise experimentation program that is flexible enough to adapt to various business models and modern-day demands for speed while maintaining testing best practices.
Testing Across the Enterprise: How Cox Automotive Scales Experimentation to Multiple Brands
1. Congratulations to our Outperform Award Winners!

- Most Dramatic Business Impact
- Most Customer-Obsessed Company Culture
- Most Transformative Innovation
- Most Inspiring Social Impact
2. Sessions today will be recorded and will be available after Opticon. Join the conversation on Twitter at #Opticon19. Like what you’ve seen today? Give feedback and rate sessions on the mobile app.
7. HISTORY: Cox Automotive

Transforming the way the world buys, owns, sells and uses cars

- 1898, Cox Enterprises: The foundation of Cox Enterprises starts with the purchase of the Dayton Daily News by Governor James M. Cox
- 1968, Manheim Auto Auction: The first step in the evolution of Cox Automotive began with the acquisition of Manheim Auto Auction, which now has more than 100 locations worldwide
- 1999, Autotrader.com: Autotrader.com was launched as an online vehicle classifieds site
- 2010, Kelley Blue Book: Autotrader.com acquired Kelley Blue Book along with vAuto and HomeNet
- 2011, Autotrader Group: Autotrader.com acquires VinSolutions and the Autotrader Group forms
- 2012, Manheim Expands: Manheim acquires DealShield, NextGear Capital, and Ready Auto Transport
- 2014, Cox Automotive: Cox Automotive forms as a division of Cox Enterprises, Inc.
12. THE CHRONOLOGY OF A TEST…

Start Date (e.g. Sept 1) → End Date (e.g. Sept 30)

Ideate → Design → Build → Setup & QA → Test → Analyze → Deploy

(Discussion fills the gaps between phases; testing happens in the middle, deployment at the end.)
14. THE CHRONOLOGY OF A TEST…

Start Date (e.g. Sept 1) → End Date (e.g. Sept 30)

Ideate → Design → Build → Setup & QA → Test → Analyze → Deploy

Inefficiency accumulates in Setup & QA and in the gaps between phases.
15. THE CHRONOLOGY OF A TEST…

Start Date (e.g. Sept 1) → End Date (e.g. Sept 30)

Ideate → Design → Build → Setup & QA → Test → Analyze → Deploy

Each group draws the boundary differently: engineers, analysts, and leadership each define “time to test” as a different slice of this timeline.
16. THE CHRONOLOGY OF A TEST…

How can you go from this (Start Date e.g. Sept 1 → End Date e.g. Sept 30) to something like this (Start Date e.g. Sept 1 → NEW End Date e.g. Sept 10)?
17. Ways To Shrink The Phases And The Gaps Between Them…

1. Ideate: Scoring, Democratization of Ideas, Central Repo
2. Design: Test Plan, Learning Plan
3. Build: Client-side Testing, Single Testing Tool
4. Setup & QA: RACI, Workflow, Previewable Test Experiences
5. Test: Accelerated Learning/Impact
6. Analyze: Stats Engine, Test Results Dashboard, Templates
7. Deploy: Agile Development, Full-Stack Testing w/ Feature Flags
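The “Full-Stack Testing w/ Feature Flags” tactic can be sketched generically. This is a minimal illustration of deterministic flag bucketing, not Optimizely’s implementation; the function, flag, and user names are hypothetical:

```python
import hashlib

def in_variant(user_id: str, flag: str, rollout_pct: float) -> bool:
    """Deterministically bucket a user into a flagged variant.

    Hashing (flag, user_id) means the same user always gets the same
    answer, with no per-user state to store anywhere.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # 0..9999, i.e. 0.00%..99.99%
    return bucket / 100.0 < rollout_pct

# Ship the new experience dark, then ramp the percentage to "deploy"
# the winner without a separate code release.
variant = "new_cta" if in_variant("user-123", "cta_copy", 50.0) else "control"
```

Because assignment is a pure function of the user and flag, the same split can be evaluated consistently on any server or client, which is what makes full-stack testing and flag-based deploys interchangeable.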
19. “If I had an hour to solve a problem, I’d spend 55 minutes thinking about the problem and 5 minutes thinking about solutions.”
20. TEST DESIGN vs. TEST STRATEGY

[Diagram: a test design pits the original experience against Tests 1–3 in isolation; a test strategy maps the original experience to a planned sequence of Tests 1–8.]
22. SHRINKING THE “TESTING” AND “ANALYZING” PHASES

With the help of Optimizely’s Stats Engine, we get answers ASAP.

The test results indicated that changing the CTA would lift clicks on that button (per relevant visitor) from ~1.15% to ~1.82% (confidence interval between ~1.59% and ~2.53%).
24. POST-LAUNCH VALIDATION

The test results indicated that changing the CTA would lift clicks on that button (per relevant visitor) from ~1.15% to ~1.82% (confidence interval between ~1.59% and ~2.53%).

The winning variant was deployed 100% to production on March 1. A quick post-launch validation of this deployment showed an avg. CTR of ~1.89%, within the predicted interval.
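For intuition on where an interval like the one above comes from, here is a classical fixed-horizon Wald interval for the difference in two conversion rates. Stats Engine actually uses sequential statistics, which behave differently; this sketch only illustrates the general idea, and the visitor counts are hypothetical numbers chosen to land near the slide’s ~1.15% vs ~1.82%:

```python
import math

def lift_interval(conv_a, n_a, conv_b, n_b, z=1.96):
    """95% Wald interval for the absolute difference in conversion rate
    between a control (a) and a challenger (b)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Hypothetical counts: 115 and 182 clicks out of 10,000 visitors each.
low, high = lift_interval(conv_a=115, n_a=10_000, conv_b=182, n_b=10_000)
print(f"lift between {low:+.2%} and {high:+.2%}")
```

An interval that excludes zero, as this one does, is what lets a team call the challenger a winner rather than noise.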
25. SHRINKING THE “TESTING” PHASE

With the help of Optimizely’s Accelerated Learnings, our average test duration has been cut in half.

[Chart: test duration in days (0–35) across ~95 tests, with the average test duration trending down.]
26. SHORTER TESTS = MORE TESTS IN THE SAME AMOUNT OF TIME

On pace to run 70% more tests this year.

[Chart: Autotrader tests per month (0–12), January through December, for 2017, 2018, and 2019.]
27. TEMPLATIZING APPROACH TO SHORTEN ANALYSIS

Shorter time to analyze = more learning in the same amount of time

[Template view: key paths of navigation.]
29. TEMPLATIZING APPROACH TO SHORTEN ANALYSIS

Shorter time to analyze = more learning in the same amount of time

[Template view: UX changes mapped to total revenue impact, broken out by subpages and subpage revenue.]
31. KNOWING WHEN TO TEST

When the clarity of a pre/post just won’t be enough

[Charts: page conversion over days 1–14. Left: a pre/post comparison (Week 1 “pre” vs. Week 2 “post”). Right: an experiment with concurrent control and test groups.]
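The advantage of concurrent groups over a pre/post read can be made concrete with a toy calculation. All rates below are hypothetical, chosen only to show how background drift confounds one estimate but cancels out of the other:

```python
# Hypothetical rates: conversion drifts up in week 2 for seasonal
# reasons, independent of the change being measured.
week1_base = 0.010   # "pre" period baseline
week2_base = 0.014   # "post" period baseline (seasonal drift)
true_lift  = 0.002   # real effect of the change

# Pre/post read: week 2 (new experience) minus week 1 (old experience)
# conflates the seasonal drift with the change itself.
pre_post_estimate = (week2_base + true_lift) - week1_base

# Concurrent read: control and test run in the same weeks, so the
# drift hits both groups and cancels out of the difference.
concurrent_estimate = (week2_base + true_lift) - week2_base

print(f"pre/post estimate:   {pre_post_estimate:+.4f}")
print(f"concurrent estimate: {concurrent_estimate:+.4f} (true {true_lift:+.4f})")
```

The pre/post estimate triples the true effect here; the concurrent split recovers it exactly, which is why an experiment with a concurrent control beats a pre/post whenever the baseline is not flat.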
32. KNOWING WHEN TO TEST: Types of Tests

Discovery and Light-Weight Prototyping
• Used to very quickly answer whether an idea has legs, or to help size up the opportunity
• Common types of tests include: fake door, A/B, usability, focus group

Optimization
• Used to compare multiple design variants to determine which has optimal performance towards a given goal
• Common types of tests include: A/B or A/B/n tests

De-risking or Validation
• Used to determine how functionality that has already been developed will perform, generally as a means of ensuring that the new product or design performs as expected
• Also helpful for retrospectives (looking back over a given time period to assess what has had the most impact)
• Common types of tests include: A/B or multivariate tests

Blue Sky
• Used to answer heavy business questions; often, the thing being tested would never be deployed
• Intended to help shape up hypotheticals and limit speculation in strategy
36. KNOWING WHEN TO TEST

When the (un)certainty of a pre/post just won’t be enough

[2×2 diagram: level of effort (low to high) vs. certainty (uncertain to certain). Plotted from certain to uncertain: “just do” launch with no pre/post analysis; “just do” launch with pre/post analysis; de-risking and validation testing; optimization A/B tests; discovery and light-weight prototyping; blue sky testing with an iterative learning plan.]
38. PERSONAS

• Product: focused on ideation, speed to market, performance improvements
• Testing Analytics: focused on tagging, metrics, statistically valid insights, speed to insights
• Leadership: focused on speed to market, business impact, scaling across the enterprise
• Engineering: focused on build stages, technical issues, latency, site performance
50. De-risking to Blue Sky

HP Redesign: A/B Test #2 (Control vs. Challenger A) … A/B Test #n (Control vs. Challenger A)

Clicks to main KPI increased +7%
52. Example of a Blue Sky

[Page mock showing the ad placements under test: sponsored search links, sponsorships, sponsored hero image and search links, and a display advertisement.]

53. Example of a Blue Sky: Testing in progress…

54. Example of a Blue Sky: Removing ads from the site created a 15.1% lift in both value events and VDPs (vehicle detail pages).
55. ENGINEERING: “Man, all this extra work that we’ll never launch…” PRODUCT & ANALYTICS: “Let’s bring Engineering along…”
57. Creating an Enterprise Testing Program: High-Level Learnings

What HAS worked?
• Speed to market:
  • developing test and learning plans
  • decrease of analysis time through Optimizely’s Stats Engine
  • decrease of testing time through Optimizely’s Accelerated Learnings
• Leveraging larger, more experienced testing programs and personnel
• Funding
• Workflows and RACI
• Pro-bono support
• Quarterly Summit

What HAS NOT worked?
• Skipping testing in favor of a “just do” mentality
• Jumping straight to cross-brand testing
• Federating access to individual teams who don’t embrace basic best practices
• Over-reliance on Analytics to support all aspects of the test