Bing Controlled Experimentation Panel - The Hive
1. Why Didn’t My Feature Improve the Metric?
Ya Xu
Based on two papers (KDD’2012 and WSDM’2013) with
Ronny Kohavi, Alex Deng, Toby Walker,
Brian Frasca and Roger Longbotham
Experimentation Panel 3/20/2013
2. What Metric?
• Overall Evaluation Criterion (OEC): metric(s) used
to decide whether A or B is better.
• Long-term goal for Bing: query share & revenue
• Puzzling outcome:
– A ranking bug in an experiment resulted in very poor
search results
– Queries went up +10% and revenue went up +30%
– What should a search engine use as its OEC?
• We use Sessions-Per-User.
3. REASON #1
The feature just wasn’t as good as you thought…
We are poor at assessing the value of ideas.
Jim Manzi: “Google ran approximately 12,000 randomized
experiments in 2009, with [only] about 10% of these
leading to business changes.”
5. Background
• Puzzling outcome:
– Several experiments showed surprising results
– Reran and effects disappeared
– Why?
• Bucket system (Bing/Google/Yahoo)
– Assign users into buckets, then assign buckets to
experiments (see the hashing sketch below).
– Buckets are reused from one experiment to the next.
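A minimal sketch of how such a bucket system can work, assuming a single hash layer; the bucket count, hash function, and experiment ranges below are illustrative, not Bing’s actual implementation:

```python
import hashlib

NUM_BUCKETS = 1000  # illustrative; real systems choose their own granularity

def user_bucket(user_id: str) -> int:
    """Deterministically hash a user id into one of NUM_BUCKETS buckets."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

# Experiments claim disjoint bucket ranges. When an experiment ends, its
# buckets are handed to the next one -- the recycling that makes
# carryover effects possible.
EXPERIMENT_RANGES = {
    "exp1_treatment": range(0, 100),
    "exp1_control": range(100, 200),
}

def assignment(user_id: str) -> str | None:
    bucket = user_bucket(user_id)
    for name, buckets in EXPERIMENT_RANGES.items():
        if bucket in buckets:
            return name
    return None  # user is not in any experiment

print(assignment("user-42"))
```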
6. Carryover Effect
• Explanation:
– The bucket system recycles users; the prior experiment
had carryover effects
– Effects can last for months
• Solution:
– Run an A/A test before starting the experiment; a failed
test flags leftover carryover (see the sketch below)
– Local re-randomization
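A hedged sketch of the A/A check: both groups receive the identical experience, so a statistically significant difference in the metric suggests the buckets carry bias from a previous experiment and should be re-randomized. Data and thresholds are illustrative:

```python
import numpy as np
from scipy import stats

def aa_test_flags_bias(group_a: np.ndarray, group_a2: np.ndarray,
                       alpha: float = 0.05) -> bool:
    """Two-sample t-test between two identically treated groups.

    In an A/A test there is no real treatment difference, so a small
    p-value points at non-comparable buckets (e.g., carryover), not
    at a feature effect.
    """
    _, p_value = stats.ttest_ind(group_a, group_a2)
    return p_value < alpha  # True -> re-randomize before the real A/B test

rng = np.random.default_rng(0)
clean = rng.poisson(3.0, 20_000)     # sessions-per-user, unbiased bucket
tainted = rng.poisson(3.05, 20_000)  # bucket with lingering carryover lift
print("re-randomize:", aa_test_flags_bias(clean, tainted))
```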
8. Background
• Performance matters
– Bing: +100msec = -0.6% revenue
– Amazon: +100msec = -1% revenue
– Google: +100msec = -0.2% query
• But not for Etsy.com?
“faster results better? Meh”
Insensitive experimentation can lead to the wrong
conclusion that a feature has no impact.
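To make “insensitive experimentation” concrete, here is a back-of-the-envelope power calculation. The coefficient of variation for revenue-per-user is an assumption (revenue metrics are typically heavy-tailed), not a number from the talk:

```python
from statsmodels.stats.power import TTestIndPower

relative_delta = 0.006  # the 0.6% revenue effect we want to detect
cv = 3.0                # assumed std/mean of revenue-per-user (illustrative)

# Standardized effect size (Cohen's d) = delta * mean / std
effect_size = relative_delta / cv

n_per_group = TTestIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.8, alternative="two-sided")
print(f"users needed per group: {n_per_group:,.0f}")  # roughly 4 million
```

Under these assumptions a site without millions of users per arm simply cannot detect a 0.6% revenue move, and will report “no impact” for a feature that does matter.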
9. How to Achieve Better Sensitivity?
1. Get more users
2. Run longer experiments:
– We recruit users continuously.
– Longer experiment = more users = more power?
– Wrong! This doesn’t always get us more power
3. CUPED
Controlled Experiments Using Pre-Experiment Data
(Chart: the confidence interval for Sessions-per-User doesn’t
shrink over a month; see the simulation below.)
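A hedged simulation of why the chart looks flat: users enter the experiment continuously, so running longer adds users but also lets early users accumulate more sessions, inflating the per-user variance about as fast as the sample grows. The arrival and session rates are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
DAILY_NEW_USERS = 5_000  # illustrative arrival rate
SESSION_RATE = 0.4       # illustrative sessions per user per day

def ci_half_width(days: int) -> float:
    """95% CI half-width of mean Sessions-per-User after `days` days."""
    arrival_day = rng.integers(0, days, size=DAILY_NEW_USERS * days)
    exposure = days - arrival_day              # days each user was exposed
    sessions = rng.poisson(SESSION_RATE * exposure)
    return 1.96 * sessions.std(ddof=1) / np.sqrt(sessions.size)

for days in (7, 14, 28):
    print(f"{days:2d} days: CI half-width = {ci_half_width(days):.4f}")
```

Under these assumptions the half-width actually grows slightly with duration, matching the chart: more users, but no more power.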
10. CUPED
• Currently live in Bing’s experimentation system
• Allows for running experiments with
– Half the users, or
– Half the duration
• Leverages pre-experiment data to improve sensitivity (see the sketch below)
• Intuition: mixture model
– total variance = between-group variance + within-group variance
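A minimal sketch of the CUPED adjustment in its simplest form, using the pre-experiment value of the same metric as the covariate; the data below are simulated for illustration:

```python
import numpy as np

def cuped_adjust(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Remove the part of metric y that pre-experiment data x predicts.

    The adjusted metric keeps the same mean as y, but its variance
    shrinks by a factor of (1 - corr(x, y)**2).
    """
    theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

rng = np.random.default_rng(2)
propensity = rng.gamma(2.0, 1.5, size=50_000)  # per-user session propensity
x = rng.poisson(propensity)          # sessions in the pre-experiment period
y = rng.poisson(propensity * 1.02)   # sessions during the experiment (+2%)

print("variance before:", round(float(np.var(y, ddof=1)), 3))
print("variance after: ", round(float(np.var(cuped_adjust(y, x), ddof=1)), 3))
```

Because the variance drops while the treatment delta is untouched, the same power is reached with fewer users; at corr(x, y) ≈ 0.7 the variance is roughly halved, which is where the “half the users, or half the duration” claim comes from.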
11. • One top reason not discussed:
Instrumentation bugs
• For more insights, check out our papers
(KDD’2012 and WSDM’2013) or find me at the
networking session