Why Didn’t My Feature Improve the Metric?
Ya Xu
Based on two papers (KDD’2012 and WSDM’2013) with
Ronny Kohavi, Alex Deng, Toby Walker,
Brian Frasca and Roger Longbotham


Experimentation Panel 3/20/2013
What Metric?
• Overall Evaluation Criterion (OEC): metric(s) used
  to decide whether A or B is better.
• Long term goal for Bing: query share & revenue
• Puzzling outcome:
   – Ranking bug in an experiment resulted in very poor
     search results
   – Yet queries rose 10% and revenue rose 30%, because users
     had to search more and clicked more ads
   – What should a search engine use as OEC?
• We use Sessions-Per-User.
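As a rough illustration of the Sessions-Per-User metric, the sketch below counts sessions from a log of (user, timestamp) events. The 30-minute inactivity cutoff and the function name `sessions_per_user` are assumptions made for this example; the slides do not say how sessions are delimited.

```python
from collections import defaultdict

# Assumed inactivity cutoff that ends a session; not taken from the slides.
SESSION_GAP_SECONDS = 30 * 60


def sessions_per_user(events):
    """Return the mean number of sessions per user.

    events: iterable of (user_id, unix_timestamp_seconds) pairs.
    """
    by_user = defaultdict(list)
    for user_id, ts in events:
        by_user[user_id].append(ts)
    if not by_user:
        return 0.0

    total_sessions = 0
    for timestamps in by_user.values():
        timestamps.sort()
        sessions = 1  # the first event always opens a session
        for prev, cur in zip(timestamps, timestamps[1:]):
            if cur - prev > SESSION_GAP_SECONDS:
                sessions += 1  # long gap: a new session starts
        total_sessions += sessions
    return total_sessions / len(by_user)
```

A user who returns after a long gap adds a session, while a user who merely issues more queries in one sitting does not, which is why this metric is not inflated by the ranking bug described above.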
REASON #1


The feature just wasn’t as good as you thought…

We are poor at assessing the value of ideas.
Jim Manzi: “Google ran approximately 12,000 randomized
experiments in 2009, with [only] about 10% of these
leading to business changes.”
REASON #2: CARRYOVER EFFECT
Background
• Puzzling outcome:
  – Several experiments showed surprising results
  – Reran and effects disappeared
  – Why?
• Bucket system (Bing/Google/Yahoo)
  – Assign users into buckets, then assign buckets to
    experiments.
  – Buckets are reused from one experiment to next.
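The bucket scheme above can be sketched as follows. This is only an illustration: the bucket count, the hash choice, and the experiment names (`exp_a`, `exp_b`) are hypothetical and not taken from any of the systems named on the slide.

```python
import hashlib

NUM_BUCKETS = 1000  # hypothetical bucket count


def bucket_of(user_id: str) -> int:
    """Map a user to a fixed bucket; the mapping never changes."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS


# Hypothetical allocation: exp_b runs after exp_a and reuses its buckets.
EXPERIMENTS = {
    "exp_a": {"treatment": range(0, 50), "control": range(50, 100)},
    "exp_b": {"treatment": range(0, 50), "control": range(50, 100)},
}


def variant_of(user_id: str, experiment: str):
    """Return the variant a user sees, or None if not in the experiment."""
    bucket = bucket_of(user_id)
    for variant, buckets in EXPERIMENTS[experiment].items():
        if bucket in buckets:
            return variant
    return None
```

Because the user-to-bucket mapping is fixed, every user who was in the treatment of `exp_a` lands in the treatment of `exp_b` as well, so any lingering effect of the first experiment is inherited by one arm of the second.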
Carryover Effect
• Explanation:
  – bucket system recycles users; prior experiment
    had carryover effects
  – Effects last for months
• Solution:
  – Run A/A test [figure: experiment timeline with start and end markers]
  – Local Re-randomization
REASON #3: STATISTICAL SENSITIVITY
Background
• Performance matters
  – Bing: +100msec = -0.6% revenue
  – Amazon: +100msec = -1% revenue
  – Google: +100msec = -0.2% query
• But not for Etsy.com?
          “faster results better? Meh”

Insensitive experimentation can lead to the wrong
conclusion that a feature has no impact.
How to Achieve Better Sensitivity?
1. Get more users
2. Run longer experiments:
  – We recruit users continuously.
  – Longer experiment = more users = more power?
  – Wrong! This doesn’t always get us more power
  – Confidence interval for Sessions-per-User doesn’t
    shrink over a month!
3. CUPED: Controlled Experiments Using Pre-Experiment Data
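A back-of-the-envelope formula helps explain why point 2 can fail. The expression below is a standard approximation, not something shown on the slide, and it assumes equal group sizes, a common per-user mean and standard deviation, and a normal approximation.

```latex
% Assumptions (not from the slides): n users per group, per-user metric with
% mean \mu and standard deviation \sigma, normal approximation.
\[
\text{CI half-width for the relative change}
\;\approx\; z_{1-\alpha/2}\,\frac{\sigma}{\mu}\,\sqrt{\frac{2}{n}}
\;=\; z_{1-\alpha/2}\,\mathrm{CV}\,\sqrt{\frac{2}{n}},
\qquad \mathrm{CV} = \frac{\sigma}{\mu}.
\]
```

Running longer raises n, but n can grow slower than linearly when many users are repeat visitors, and the coefficient of variation of a cumulative metric such as Sessions-per-User can grow over the same period. If CV rises about as fast as the square root of n, the interval does not shrink.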
CUPED
• Currently live in Bing’s experiment system
• Allows for running experiments with
   – Half the users, or
   – Half the duration
• Leveraging pre-exp data to improve sensitivity
• Intuition: mixture model
  total variance
      = between-group variance + within-group variance
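A minimal sketch of the control-variate form of CUPED follows, assuming the pre-experiment covariate is the same metric measured for the same users before the experiment. The helper names and the synthetic data are illustrative only and do not reflect the production system.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def cuped_adjust(y, x):
    """Return CUPED-adjusted metric values.

    y: in-experiment metric per user.
    x: pre-experiment covariate for the same users, in the same order.
    """
    theta = covariance(x, y) / variance(x)  # variance-minimizing coefficient
    x_bar = mean(x)
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


# Synthetic demo data (made up for illustration only).
rng = random.Random(0)
pre = [rng.gauss(10, 3) for _ in range(5000)]
post = [0.8 * p + rng.gauss(0, 1.5) for p in pre]
adjusted = cuped_adjust(post, pre)
```

The adjustment leaves the mean unchanged and scales the variance by roughly 1 - ρ², where ρ is the correlation between the metric and the covariate. A correlation near 0.7 would therefore correspond to about half the variance, which matches the "half the users" claim in spirit. In a real experiment, theta would be estimated once on the pooled treatment and control data and applied to both arms.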
• One top reason not discussed:
  Instrumentation bugs
• For more insights, check out our papers
  (KDD’2012 and WSDM’2013) or find me at the
  networking session

Bing_Controlled Experimentation_Panel_The Hive
