A/B Testing @ Internet Scale
Ya Xu
8/12/2014 @ Coursera
A/B Testing in One Slide
[Figure: traffic split 20% / 80% across two variants (Control, Treatment) of a "Join now" page]
Collect results to determine which one is better
Outline
§ Culture Challenge
–  Why A/B testing
–  What to A/B test
§ Building a scalable experimentation system
§ Best practices
Why A/B Testing
Amazon Shopping Cart Recommendation
•  At Amazon, Greg Linden had this idea of showing
recommendations based on cart items
•  Trade-offs
•  Pro: cross-sell more items (increase average basket size)
•  Con: distract people from checking out (reduce conversion)
•  HiPPO (Highest Paid Person’s Opinion): stop the project
From Greg Linden’s Blog: http://glinden.blogspot.com/2006/04/early-amazon-shopping-cart.html
http://www.exp-platform.com/Documents/2012-08%20Puzzling%20Outcomes%20KDD.pptx
MSN Real Estate
§ “Find a house” widget variations
§ Revenue to MSN generated every time a user
clicks search/find button
[Figure: widget variations A and B]
http://www.exp-platform.com/Documents/2012-08%20Puzzling%20Outcomes%20KDD.pptx
Take-away
Experiments are the only way to prove causality.
Use A/B testing to:
§ Guide product development
§ Measure impact (assess ROI)
§ Gain “real” customer feedback
What to A/B Test
Ads CTR Drop
[Figure: CTR of profile top ads, with a sudden drop on 11/11/2013]
Root-Cause
5 pixels!!
[Figure: screenshot labeling the navigation bar and the profile top ads]
What to A/B Test
§ Evaluating new ideas:
–  Visual changes
–  Complete redesign of web page
–  Relevance algorithms
–  …
§ Platform changes
§ Code refactoring
§ Bug fixes
Test Everything!
Startups vs. Big Websites
§ Do startups have enough users to A/B test?
–  Startups typically look for larger effects
–  5% vs. 0.5% difference → 100 times more users!
§ Startups should establish A/B testing culture
early
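The "100 times more users" figure follows from the standard sample-size approximation for comparing two proportions, where the required users scale with one over the square of the difference. A minimal sketch, assuming a 10% baseline conversion rate, a two-sided test at alpha = 0.05 and 80% power (none of these values come from the talk, and `users_per_variant` is a hypothetical helper):

```python
import math
from statistics import NormalDist


def users_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed in each variant to detect a relative lift
    in a conversion rate with a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Assumed baseline of 10%; compare a 5% lift with a 0.5% lift.
large_effect = users_per_variant(0.10, 0.05)
small_effect = users_per_variant(0.10, 0.005)
print(large_effect, small_effect, small_effect / large_effect)
```

Because the difference enters squared, a 10 times smaller effect needs roughly 100 times the traffic, whatever the baseline.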
A Scalable Experimentation System
A/B Testing 3 Steps
Design
•  What/Whom to experiment on
Deploy
•  Code deployment
Analyze
•  Impact on metrics
A/B Testing Platform Architecture
1.  Experiment Management
2.  Online Infrastructure
3.  Offline Analysis
Example: Bing A/B
1. Experiment Management
§ Define experiments
–  Whom to target?
–  How to split traffic?
§ Start/stop an experiment
§ Important addition:
–  Define success criteria
–  Power analysis
2. Online Infrastructure
1)  Hash & partition: random & consistent
2)  Deploy: server-side, as a change to
–  The default configuration (Bing)
–  The default code path (LinkedIn)
3)  Data logging
[Figure: Hash(ID) maps each ID onto the 0% to 100% traffic range; Treatment1 takes 20%, Treatment2 takes 20%, and the rest is Control]
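"Random & consistent" can be sketched with a stable hash: the same ID always lands in the same bucket, while IDs as a whole spread evenly across buckets. This is an illustrative sketch only, not the Bing or LinkedIn implementation; SHA-256, the 1000-bucket granularity, the 20/20/60 split and the names `bucket_of` and `assign_variant` are all assumptions.

```python
import hashlib

NUM_BUCKETS = 1000  # assumed granularity: each bucket is 0.1% of traffic


def bucket_of(member_id: str) -> int:
    """Map an ID to a stable bucket in [0, NUM_BUCKETS)."""
    digest = hashlib.sha256(member_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def assign_variant(member_id: str, splits) -> str:
    """splits is a list of (variant name, bucket count) covering all buckets."""
    position = bucket_of(member_id)
    upper = 0
    for name, width in splits:
        upper += width
        if position < upper:
            return name
    raise ValueError("splits must cover all buckets")


# Assumed split: 20% Treatment1, 20% Treatment2, 60% Control.
SPLITS = [("treatment1", 200), ("treatment2", 200), ("control", 600)]

counts = {}
for i in range(10000):
    variant = assign_variant(f"member-{i}", SPLITS)
    counts[variant] = counts.get(variant, 0) + 1
print(counts)
```

A cryptographic hash is used here instead of Python's built-in `hash()`, which is salted per process for strings and would not give the same bucket across servers.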
Hash & Partition @ Scale (I)
§ Pure bucket system (Google/Bing before 200X)
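[Figure: the 0% to 100% traffic range is cut into disjoint buckets: Exp. 1 (20%), Exp. 2 (20%) and Exp. 3 (60%), the last split into red, green and yellow variants (15%, 15%, 30%)]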
•  Does not scale
•  Traffic management
Hash & Partition @ Scale (II)
§ Fully overlapping system
[Figure: Exp. 1 (A1, B1, control) and Exp. 2 (A2, B2, control) each span the full 0% to 100% traffic range]
•  Each experiment gets 100% traffic
•  A user is in “all” experiments simultaneously
•  Randomization between experiments is independent
(unique hashID)
•  Cannot avoid interaction
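One way to get independent randomization between overlapping experiments is to mix a per-experiment hashID into the hash input, so a member's assignment in one experiment says nothing about their assignment in another. A minimal sketch under that assumption; the hashID strings, the 50/50 split and the function names are hypothetical:

```python
import hashlib
from collections import Counter


def in_treatment(member_id: str, hash_id: str, treatment_share: float = 0.5) -> bool:
    """Assign a member within one experiment, salted by that experiment's hashID."""
    key = f"{hash_id}:{member_id}".encode("utf-8")
    value = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return value / 2**64 < treatment_share


def joint_shares(num_members: int = 20000) -> dict:
    """Share of members in each (exp. 1, exp. 2) assignment combination."""
    counts = Counter(
        (in_treatment(f"member-{i}", "exp1-hash"), in_treatment(f"member-{i}", "exp2-hash"))
        for i in range(num_members)
    )
    return {cell: n / num_members for cell, n in counts.items()}


# With independent 50/50 splits, each of the four cells should hold about 25%.
print(joint_shares())
```

Every member is in both experiments at once, which is why interactions between the two treatments cannot be ruled out by design.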
Hash & Partition @ Scale (III)
§ Hybrid: Layer + Domain
•  Centralized management (Bing)
•  Central exp. team creates/manages layers/domains
•  De-centralized management (LinkedIn)
•  Each experiment is one “layer” by default
•  Experimenter controls hashID to create a “domain”
Data Logging
§  Trigger
§  Trigger-based logging
–  Log whether a request is actually affected by the
experiment
–  Log for both factual & counter-factual
[Figure: of all LinkedIn members (300MM+), only members visiting the contacts page are triggered]
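Trigger-based logging can be sketched as follows, under one reading of the slide: an event is written only when a request reaches the experimented surface, and it is written for treatment members (factual) as well as for control members who would have seen the change (counter-factual). The event fields, the experiment name and `log_trigger` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerEvent:
    member_id: str
    experiment: str
    variant: str
    counterfactual: bool  # True for control members who would have been affected


def log_trigger(log: list, member_id: str, page: str, variant: str,
                experiment: str = "contacts-page-test") -> bool:
    """Append a trigger event only if the request touches the experiment."""
    if page != "contacts":
        return False  # request is unaffected by the experiment: nothing to log
    log.append(TriggerEvent(member_id, experiment, variant,
                            counterfactual=(variant == "control")))
    return True


events = []
log_trigger(events, "m1", "contacts", "treatment")  # factual
log_trigger(events, "m2", "contacts", "control")    # counter-factual
log_trigger(events, "m3", "home", "treatment")      # never reached the page
triggered_members = {event.member_id for event in events}
print(triggered_members)
```

Restricting the analysis to triggered members in both arms keeps the comparison fair while removing the noise from members who never reached the page.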
3. Automated Offline Analysis
§  Large-scale data processing, e.g. daily @LinkedIn
–  200+ experiments
–  700+ metrics
–  Billions of experiment trigger events
§  Statistical analysis
–  Metrics design
–  Statistical significance test (p-value, confidence interval)
–  Deep-dive: slicing & dicing capability
§  Monitoring & alerting
–  Data quality
–  Early termination
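The statistical significance test above can be illustrated for the simplest case, a conversion-rate metric compared with a two-proportion z-test. This is a sketch under that assumption, not the analysis pipeline described in the talk; the counts in the example are made up and `two_proportion_test` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def two_proportion_test(conv_c, n_c, conv_t, n_t, confidence=0.95):
    """Return (difference, two-sided p-value, confidence interval) for
    treatment minus control conversion rates."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    diff = p_t - p_c
    # Pooled standard error under the null hypothesis of no difference.
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se_null)))
    # Unpooled standard error for the interval around the observed difference.
    se_diff = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    margin = NormalDist().inv_cdf(0.5 + confidence / 2) * se_diff
    return diff, p_value, (diff - margin, diff + margin)


# Made-up counts: 1,000 of 10,000 convert in control, 1,200 of 10,000 in treatment.
diff, p_value, interval = two_proportion_test(1000, 10000, 1200, 10000)
print(diff, p_value, interval)
```

A confidence interval that excludes zero and a p-value below the chosen threshold tell the same story; the interval also shows how large the effect plausibly is.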
Best Practices
Example: Unified Search
What to Experiment?
Measure one change at a time.
[Figure: En-US traffic split 50% / 50% between Unified Search (Experiments 1+2+…N) and Pre-unified search]
What to Measure?
§ Success metrics: summarize whether
treatment is better
§ Puzzling example:
–  Key metrics for Bing: number of searches &
revenue
–  Ranking bug in experiment resulted in poor search
results
–  Number of searches up +10% and revenue up
+30%
Success metrics should reflect long-term impact
Scientific Experiment Design
§ How long to run the experiment?
§ How much traffic to allocate to treatment?
Story:
§  Site speed matters
–  Bing: +100msec = -0.6% revenue
–  Amazon: +100msec = -1.0% revenue
–  Google: +100msec = -0.2% queries
§  But not for Etsy.com?
“Faster results better? … meh”
Power
§ Power: the chance of detecting a
difference when there really is one.
§ Two reasons your feature doesn’t move
metrics
1.  No “real” impact
2.  Not enough power
Properly power up your experiment!
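To make "not enough power" concrete, the sketch below estimates the power of a two-sided two-proportion z-test for a given number of users per variant. The 10% baseline, the 5% relative lift and the sample sizes are assumptions for illustration, and the negligible chance of a significant result in the wrong direction is ignored.

```python
import math
from statistics import NormalDist


def power(baseline_rate, relative_lift, users_per_variant, alpha=0.05):
    """Approximate chance of detecting a true relative lift in a conversion rate."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    se = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / users_per_variant)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability that the observed z-score clears the critical value.
    return 1 - NormalDist().cdf(z_alpha - abs(p2 - p1) / se)


# The same real 5% lift, measured with different amounts of traffic.
for n in (1_000, 10_000, 100_000):
    print(n, round(power(0.10, 0.05, n), 3))
```

With too little traffic a real effect is missed most of the time, so a flat result from an underpowered experiment does not show that the feature has no impact.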
Statistical Significance
§ Which experiment has a bigger impact?
            Experiment 1    Experiment 2
Pageviews   1.5%            12.9%
Revenue     0.8%            2.4%
Statistical Significance
§ Must consider statistical significance
–  A 12.9% delta can still be noise!
–  Identify signal from noise; focus on the “real” movers
–  Ensure results are reproducible
            Experiment 1                Experiment 2
Pageviews   1.5%                        12.9%
Revenue     0.8% (stat. significant)    2.4%
Multiple Testing
§ Famous xkcd comic on Jelly Beans
Multiple Testing Concerns
§ Multiple ramps
–  Pre-decide a ramp to base decision on (e.g. 50/50)
§ Multiple “peeks”
–  Rely on “full”-week results
§ Multiple variants
–  Choose the best, then rerun to see if the result replicates
§ Multiple metrics
An irrelevant metric is statistically
significant. What to do?
§  Which metric?
§  How “significant”? (p-value)
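[Figure: tiers of metrics, with a stricter threshold the less directly a metric is tied to the experiment]
–  1st order metrics (directly impacted by exp.): p-value < 0.05
–  2nd order metrics (maybe impacted by exp.): p-value < 0.01
–  All metrics: p-value < 0.001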
Watch out for multiple testing
With 100 metrics, how many would you see stat. significant
even if your experiment does NOTHING? About 5, at the 0.05 significance level.
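The count of 5 is the expected number of false positives: 100 metrics times a 0.05 significance level. A small simulation makes this tangible, assuming the metrics are independent (real metrics are usually correlated) and using the fact that p-values are uniformly distributed when the experiment has no effect. The function name and seeds are hypothetical.

```python
import random

ALPHA = 0.05
NUM_METRICS = 100


def false_positives(seed: int) -> int:
    """Count metrics that look significant in one do-nothing experiment."""
    rng = random.Random(seed)
    # Under the null hypothesis each p-value is uniform on [0, 1).
    return sum(rng.random() < ALPHA for _ in range(NUM_METRICS))


expected = NUM_METRICS * ALPHA
runs = [false_positives(seed) for seed in range(2000)]
average = sum(runs) / len(runs)
prob_at_least_one = 1 - (1 - ALPHA) ** NUM_METRICS
print(expected, average, prob_at_least_one)
```

Under these assumptions it is almost certain that at least one metric will look significant by chance, which is the motivation for stricter thresholds on metrics the experiment was not expected to move.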
References
§  Tang, Diane, et al. Overlapping Experiment Infrastructure: More, Better,
Faster Experimentation. Proceedings 16th Conference on Knowledge
Discovery and Data Mining. 2010.
§  Kohavi, Ron, et al. Online Controlled Experiments at Large Scale. KDD
2013: Proceedings of the 19th ACM SIGKDD international conference on
Knowledge discovery and data mining. 2013.
§  LinkedIn blog post:
http://engineering.linkedin.com/ab-testing/xlnt-platform-driving-ab-testing-linkedin
Additional Resources: RecSys’14 A/B testing workshop

Talks@Coursera - A/B Testing @ Internet Scale

  • 1. A/B Testing @ Internet Scale Ya Xu 8/12/2014 @ Coursera
  • 2. A/B Testing in One Slide 20%80% Collect results to determine which one is better Join now Control Treatment
  • 3. Outline § Culture Challenge –  Why A/B testing –  What to A/B test § Building a scalable experimentation system § Best practices 3
  • 5. Amazon Shopping Cart Recommendation 5 •  At Amazon, Greg Linden had this idea of showing recommendations based on cart items •  Trade-offs •  Pro: cross-sell more items (increase average basket size) •  Con: distract people from checking out (reduce conversion) •  HiPPO (Highest Paid Person’s Opinion) : stop the project From Greg Linden’s Blog: http://glinden.blogspot.com/2006/04/early-amazon-shopping-cart.html http://www.exp-platform.com/Documents/2012-08%20Puzzling%20Outcomes%20KDD.pptx
  • 6. MSN Real Estate § “Find a house” widget variations § Revenue to MSN generated every time a user clicks search/find button 6 A B http://www.exp-platform.com/Documents/2012-08%20Puzzling%20Outcomes%20KDD.pptx
  • 7. Take-away Experiments are the only way to prove causality. 7 Use A/B testing to: § Guide product development § Measure impact (assess ROI) § Gain “real” customer feedback
  • 8. What to A/B Test 8
  • 9. Ads CTR Drop 9 Sudden drop on 11/11/2013 Profile top ads
  • 11. What to A/B Test § Evaluating new ideas: –  Visual changes –  Complete redesign of web page –  Relevance algorithms –  … § Platform changes § Code refactoring § Bug fixes 11 Test Everything!
  • 12. Startups vs. Big Websites § Do startups have enough users to A/B test? –  Startups typically look for larger effects –  5% vs. 0.5% difference è 100 times more users! § Startups should establish A/B testing culture early 12
  • 14. A/B Testing 3 Steps 14 Design •  What/Whom to experiment on Deploy •  Code deployment Analyze •  Impact on metrics
  • 15. A/B Testing Platform Architecture 1.  Experiment Management 2.  Online Infrastructure 3.  Offline Analysis 15 Example: Bing A/B
  • 16. 1. Experiment Management § Define experiments –  Whom to target? –  How to split traffic? § Start/stop an experiment § Important addition: –  Define success criteria –  Power analysis 16
  • 17. 2. Online Infrastructure 1)  Hash & partition: random & consistent 2)  Deploy: server-side, as a change to –  The default configuration (Bing) –  The default code path (LinkedIn) 3)  Data logging 17 0% 100% Treatment1 D20% D20% Hash (ID) Treatment2 Control
  • 18. Hash & Partition @ Scale (I) § Pure bucket system (Google/Bing before 200X) 18 0% 100% Exp. 1 D20% D20% Exp. 2 Exp. 3 60% red green yellow 15% 15%30% •  Does not scale •  Traffic management
  • 19. Hash & Partition @ Scale (II): Fully overlapping system. Each experiment gets 100% of traffic. A user is in “all” experiments simultaneously. Randomization between experiments is independent (unique hashID). Cannot avoid interaction. [Figure: Exp. 1 and Exp. 2 each span the full 0%-100% range, split into A1/B1/control and A2/B2/control]
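The independence claim can be checked empirically: if each experiment hashes the ID with its own hashID, knowing a user's variant in one experiment should say nothing about their variant in another. A minimal sketch, assuming SHA-256 and two made-up hash IDs (`exp1-hash`, `exp2-hash`); with independent 50/50 splits, each of the four combinations should hold roughly a quarter of users.

```python
import hashlib
from collections import Counter


def variant(member_id: str, hash_id: str) -> str:
    """50/50 split keyed on a per-experiment hash ID (hypothetical scheme)."""
    digest = hashlib.sha256(f"{hash_id}:{member_id}".encode()).digest()
    return "treatment" if int.from_bytes(digest[:8], "big") % 2 == 0 else "control"


members = [f"member-{i}" for i in range(20000)]

# Cross-tabulate each member's variant in experiment 1 against experiment 2.
cells = Counter(
    (variant(m, "exp1-hash"), variant(m, "exp2-hash")) for m in members
)
shares = {cell: count / len(members) for cell, count in cells.items()}
for cell, share in sorted(shares.items()):
    print(cell, f"{share:.3f}")
```

Reusing the same hash ID for both experiments would instead put every user in the same variant of each, which is the correlation a unique hashID avoids.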
  • 20. Hash & Partition @ Scale (III): Hybrid: Layer + Domain. Centralized management (Bing): a central experimentation team creates/manages layers/domains. De-centralized management (LinkedIn): each experiment is one “layer” by default; the experimenter controls hashID to create a “domain”.
  • 21. Data Logging: Trigger; trigger-based logging: log whether a request is actually affected by the experiment; log for both factual & counter-factual. [Figure: all LinkedIn members (300MM+) vs. triggered: members visiting the contacts page]
  • 22. 3. Automated Offline Analysis: Large-scale data processing, e.g. daily @LinkedIn: 200+ experiments, 700+ metrics, billions of experiment trigger events. Statistical analysis: metrics design; statistical significance test (p-value, confidence interval); deep-dive: slicing & dicing capability. Monitoring & alerting: data quality; early termination.
  • 25. What to Experiment? Measure one change at a time. [Figure: Unified Search: 50% En-US on experiments 1+2+…N vs. 50% En-US on pre-unified search]
  • 26. What to Measure? Success metrics: summarize whether treatment is better. Puzzling example: key metrics for Bing are number of searches & revenue; a ranking bug in an experiment resulted in poor search results; number of searches went up 10% and revenue went up 30%. Success metrics should reflect long-term impact.
  • 27. Scientific Experiment Design: How long to run the experiment? How much traffic to allocate to treatment? Story: site speed matters. Bing: +100 ms = -0.6% revenue; Amazon: +100 ms = -1.0% revenue; Google: +100 ms = -0.2% queries. But not for Etsy.com? “Faster results better? … meh”
  • 28. Power: the chance of detecting a difference when there really is one. Two reasons your feature doesn’t move metrics: 1. No “real” impact; 2. Not enough power. Properly power up your experiment!
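To make the "not enough power" case concrete, power can be estimated before launch from the expected effect and the planned sample size. The sketch below uses the normal approximation for a two-sided two-proportion z-test; the 10% vs. 10.5% conversion rates and the sample sizes are hypothetical, and `power_two_proportions` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treatment, n_per_variant, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    z = NormalDist()
    std_error = sqrt(
        p_control * (1 - p_control) / n_per_variant
        + p_treatment * (1 - p_treatment) / n_per_variant
    )
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(p_treatment - p_control) / std_error
    # Probability that the test statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Same real effect, very different chances of detecting it.
underpowered = power_two_proportions(0.10, 0.105, n_per_variant=1_000)
well_powered = power_two_proportions(0.10, 0.105, n_per_variant=100_000)
print(f"n=1,000:   power = {underpowered:.2f}")
print(f"n=100,000: power = {well_powered:.2f}")
```

With no real difference between the arms, the function returns `alpha`, which is the false-positive rate the test was designed for.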
  • 29. Statistical Significance: Which experiment has a bigger impact? Experiment 1: Pageviews 1.5%, Revenue 0.8%. Experiment 2: Pageviews 12.9%, Revenue 2.4%.
  • 30. Statistical Significance: Which experiment has a bigger impact? Experiment 1: Pageviews 1.5%, Revenue 0.8%. Experiment 2: Pageviews 12.9%, Revenue 2.4%. (Table legend: “Stat. significant”.)
  • 31. Statistical Significance: Must consider statistical significance. A 12.9% delta can still be noise! Identify signal from noise; focus on the “real” movers. Ensure results are reproducible. Experiment 1: Pageviews 1.5%, Revenue 0.8%. Experiment 2: Pageviews 12.9%, Revenue 2.4%. (Table legend: “Stat. significant”.)
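Whether a delta is signal or noise depends on its standard error, not on its size alone. The sketch below computes a two-sided p-value from a delta and its standard error using a z-test. The deltas (12.9% and 0.8%) come from the slide; the standard errors (9.0 and 0.25 percentage points) are invented purely for illustration and are not from the talk.

```python
from statistics import NormalDist


def two_sided_p_value(delta_pct: float, std_error_pct: float) -> float:
    """P-value for the hypothesis that the true delta is zero (z-test)."""
    z_score = delta_pct / std_error_pct
    return 2 * (1 - NormalDist().cdf(abs(z_score)))


# Standard errors below are hypothetical; only the deltas appear on the slide.
big_but_noisy = two_sided_p_value(12.9, std_error_pct=9.0)
small_but_tight = two_sided_p_value(0.8, std_error_pct=0.25)

print(f"12.9% delta, noisy metric:  p = {big_but_noisy:.3f}")
print(f"0.8% delta, precise metric: p = {small_but_tight:.4f}")
```

Under these assumed standard errors, the larger delta fails the 0.05 threshold while the smaller one passes it, which is the pattern the slide warns about.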
  • 32. Multiple Testing: Famous xkcd comic on jelly beans
  • 33. Multiple Testing Concerns: Multiple ramps: pre-decide a ramp to base the decision on (e.g. 50/50). Multiple “peeks”: rely on “full”-week results. Multiple variants: choose the best, then rerun to see if the result replicates. Multiple metrics.
  • 34. An irrelevant metric is statistically significant. What to do? Which metric? How “significant”? (p-value) 1st order metrics (directly impacted by exp.): p-value < 0.05; 2nd order metrics (maybe impacted by exp.): p-value < 0.01; all metrics: p-value < 0.001. Watch out for multiple testing: with 100 metrics, how many would you see stat. significant even if your experiment does NOTHING? 5.
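The answer of 5 is the expected count at a 0.05 threshold: 100 metrics x 0.05 = 5 false positives on average. A small simulation makes this visible, under the simplifying assumption (not stated on the slide) that the metrics are independent, so each null p-value is an independent uniform draw. The function name and the seed are arbitrary.

```python
import random


def count_false_positives(num_metrics=100, alpha=0.05, rng=None):
    """Count metrics that look significant when the experiment does nothing.

    Under the null hypothesis a p-value is uniform on [0, 1], so each metric
    crosses the threshold with probability alpha (independence is assumed).
    """
    rng = rng or random.Random()
    return sum(rng.random() < alpha for _ in range(num_metrics))


rng = random.Random(2014)   # arbitrary seed so the run is repeatable
runs = [count_false_positives(rng=rng) for _ in range(2000)]
average = sum(runs) / len(runs)

# Chance that at least one of 100 independent metrics is a false positive.
chance_of_at_least_one = 1 - (1 - 0.05) ** 100

print(f"average false positives per experiment: {average:.2f}")
print(f"chance of at least one: {chance_of_at_least_one:.3f}")
```

This is the motivation for the stricter thresholds on metrics that are less directly tied to the experiment: the more metrics examined, the more chance findings appear.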
  • 35. References: (1) Tang, Diane, et al. Overlapping Experiment Infrastructure: More, Better, Faster Experimentation. KDD 2010: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2010. (2) Kohavi, Ron, et al. Online Controlled Experiments at Large Scale. KDD 2013: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2013. (3) LinkedIn blog post: http://engineering.linkedin.com/ab-testing/xlnt-platform-driving-ab-testing-linkedin Additional resources: RecSys’14 A/B testing workshop.