25. Change → Effect on productivity
Brighter light → Up
Dimmer light → Up
Warmer → Up
Cooler → Up
Shorter breaks → Up
Longer breaks → Up
26. Change → Effect on productivity
Brighter light → Up (temporarily)
Dimmer light → Up (temporarily)
Warmer → Up (temporarily)
Cooler → Up (temporarily)
Shorter breaks → Up (temporarily)
Longer breaks → Up (temporarily)
91. Job applications: Up
Job clicks: Down
Recommended Jobs traffic: Up
Job views: Sideways
New resumes: Up
Return visits: Down
Logins: Up
Revenue: Down
(and it goes on…)
129. Clicks received: 1,260
Out of budget time: 20:00
% of day w/o budget: 0.1667
Potential clicks: 1260 / (1 - 0.1667) = 1512
Missed clicks: 1512 * 0.1667 ≈ 252
Missed Clicks Report
Dear Customer,
You got 1,260 clicks yesterday.
Your daily budget ran out at 8:00pm.
If you funded your budget through the whole day, you’d get another 252 clicks - a +20% improvement!
Get More Clicks
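For reference, a minimal sketch of the arithmetic behind this report (the function name and numbers are illustrative, and it assumes clicks arrive roughly uniformly across the day):

```python
# Estimate the clicks missed because the daily budget ran out early.
# Simplifying assumption: click volume is roughly uniform across the day.

def missed_clicks(clicks_received: int, out_of_budget_hour: float) -> dict:
    fraction_without_budget = (24 - out_of_budget_hour) / 24
    potential_clicks = clicks_received / (1 - fraction_without_budget)
    missed = potential_clicks - clicks_received
    return {
        "fraction_without_budget": round(fraction_without_budget, 4),
        "potential_clicks": round(potential_clicks),
        "missed_clicks": round(missed),
        "lift_pct": round(100 * missed / clicks_received, 1),
    }

print(missed_clicks(1260, 20.0))
# {'fraction_without_budget': 0.1667, 'potential_clicks': 1512, 'missed_clicks': 252, 'lift_pct': 20.0}
```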
192. Hawthorne Revisited
“… the variance in productivity could be fully accounted for by the fact that the lighting changes were made on Sundays and therefore followed by Mondays when workers’ productivity was refreshed by a day off.”
https://en.wikipedia.org/wiki/Hawthorne_effect
220. Lesson 01: Be patient
Lesson 02: Sampling is hard
Lesson 03: Focus on a few, carefully chosen metrics
Lesson 04: Be rigorous with your analysis
Lesson 05: Watch out for side effects
Lesson 06: Use metrics and stories
Lesson 07: Plan for fallibility
Good evening, thanks for coming to our @IndeedEng Tech Talk tonight.
This is “Data-driven off a cliff, anti-patterns in evidence-based decision making”. I’m Tom Wilbur, and I’m a product manager at Indeed, and...
I help people get jobs.
Indeed is the #1 job site worldwide. We serve over 200M monthly unique users, across more than 60 countries and in 29 languages.
The primary place that jobseekers start on Indeed is here - the search experience. It’s simple -- you type in some keywords and a location and you get a ranked list of jobs that are relevant to you.
Indeed is headquartered here in Austin, Texas, the capital of the Lone Star State. Austin is also the location of our largest engineering office, and we have engineering offices around the world in Tokyo, Seattle, San Francisco and Hyderabad. So we have tons of smart engineers and product teams working around the clock to make a better Indeed.
https://en.wikipedia.org/wiki/Flag_of_Texas#/media/File:Flag_of_Texas.svg
We have tons of ideas.... BUT
We have tons of bad ideas, too.
Now occasionally we do have good ideas, but
It’s hard to tell the difference. What we really want to know, is --
What helps people get jobs? We believe...
The only reliable way to know is to just try stuff and see what works. (NEXT TO JOKE)
(pause) So at Indeed,
We set up experiments. We run A/B tests on our site where users are randomly assigned to different experiences.
We collect results. We observe the users’ behavior. Our LogRepo system adds about 6TB of new data every day.
And we use that data to decide what to do. To see which features and capabilities do help people get jobs, and which don’t.
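(As an aside, for anyone unfamiliar with how that random assignment typically works, here is a generic sketch of deterministic bucketing -- purely illustrative, not Indeed’s actual implementation.)

```python
# Generic sketch of deterministic A/B bucketing (not Indeed's actual system):
# hashing a stable user id together with the test name gives every user a
# consistent variant, and the assignment can be reproduced later from logs.
import hashlib

def assign_variant(user_id: str, test_name: str,
                   variants=("control", "treatment")) -> str:
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

print(assign_variant("user-12345", "hearts_vs_stars"))  # always the same answer for this user
```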
We’ve used data to make good decisions,
But having a ton of data is not a silver bullet.
We’ve also used data to make bad decisions. Because the truth is,
Science is hard. (NEXT TO JOKE)
(pause) For example, one serious problem is that the very act of just
Running an experiment, can ruin the experiment itself. Let me tell you a quick story.
There was a famous experiment conducted in the late 1920s at an electrical factory outside of Chicago, Illinois. Called the Hawthorne Works. The factory managers wanted to improve worker productivity, so they decided to try some changes to the worker environment.
They changed the lighting conditions, sometimes brighter, sometimes dimmer. They changed the temperature in the factory, and length of breaks. Initially they were excited, as their early experiments resulted in improvements in worker productivity.
Brighter lights? Productivity goes up! Dimmer lights? Productivity goes up! Warmer? Up! Cooler? Up. Shorter breaks, longer breaks, it seemed that everything they tried improved worker productivity. And on top of that, none of these improvements stuck.
It all quickly faded. Ultimately the conclusion of the researchers was that the very fact of changing the conditions, of running the test, of observing the results, affected the workers’ behavior. This effect is now known as -- the Hawthorne Effect. Those of us that run experiments to optimize websites all over the world know this well. When we see a change in user behavior, we often ask the question, “but will it last? Is that change real, or is it just the Hawthorne Effect?” So science is hard. And if that wasn’t enough,
Statistics are hard. There are plenty of ways an analysis can produce surprising, if not contradictory, results.
For example, consider “Anscombe’s quartet”. In 1973, statistician Francis Anscombe described four very different sets of 11 points that all have the same basic statistical properties -- mean, variance, correlation, and as the blue line shows, regression. This demonstrates that looking at a statistical calculation isn’t at all sufficient to understand your data, especially when there are outliers.
https://en.wikipedia.org/wiki/Anscombe%27s_quartet
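A quick way to see this for yourself, using seaborn’s bundled copy of the quartet (seaborn downloads this sample dataset on first use; the code is a sketch, not anything from the talk):

```python
# The four Anscombe datasets share nearly identical summary statistics,
# even though they look completely different when plotted.
import seaborn as sns

df = sns.load_dataset("anscombe")  # columns: dataset, x, y
for name, group in df.groupby("dataset"):
    print(name,
          f"mean_x={group.x.mean():.2f}",
          f"mean_y={group.y.mean():.2f}",
          f"var_x={group.x.var():.2f}",
          f"corr={group.x.corr(group.y):.3f}")
# All four rows print essentially the same numbers; only a plot reveals the differences.
```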
Another example is Simpson’s Paradox. This is where a statistician goes back in time with his toaster and starts accidentally changing the future and the more he tries to fix it, the worse it gets. There are no donuts, and people have lizard-tongues, and that’s just no way to make data-driven decisions. Wait no, that’s Homer Simpson’s Paradox from Treehouse of Horror V. Sorry.
Edward Simpson’s Paradox is something else. This result describes the situation where individual groups of data tell a different story than when the data are combined. On this chart for example, the four blue dots and four red dots each show a positive trend, but when combined, you get the black dotted line that shows a negative trend overall. Imagine if you saw that revenue per user on mobile was increasing, and revenue per user on desktop was increasing, but overall revenue per user appeared to be decreasing. Now what do you do? Usually this situation means you don’t understand underlying causal relationships in your data. Because statistics are hard.
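A toy numeric version of that mobile/desktop scenario (the figures are made up for illustration, not Indeed data):

```python
# Simpson's paradox with made-up numbers: revenue per user rises within each
# platform, but the user mix shifts toward the lower-revenue platform, so the
# blended revenue per user falls.
quarters = ["Q1", "Q2", "Q3"]
desktop = {"users": [1000, 600, 300], "rev_per_user": [10.0, 11.0, 12.0]}  # rising
mobile  = {"users": [200, 800, 1500], "rev_per_user": [2.0, 2.5, 3.0]}     # rising

for i, q in enumerate(quarters):
    total_rev = (desktop["users"][i] * desktop["rev_per_user"][i]
                 + mobile["users"][i] * mobile["rev_per_user"][i])
    total_users = desktop["users"][i] + mobile["users"][i]
    print(q, f"blended rev/user = {total_rev / total_users:.2f}")
# Q1 = 8.67, Q2 = 6.14, Q3 = 4.50 -- falling overall, even though each
# platform's revenue per user is rising.
```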
But using data correctly is more than just statistics. If you apply good math to a bad idea...
Just because it’s mathematically correct, doesn’t mean you won’t seriously regret the outcome of that test.
http://www.glamour.com/images/health-fitness/2011/06/0606-tequila_at.jpg
So bad practices can undermine good math.
You don’t need me to teach you how to be bad at math.
But tonight, I’ll teach you to be bad at everything else. On top of the inherent challenges of science, statistics and bad ideas, we’ll share with you our powerful techniques of how to make data-driven decisions… the wrong way.
So, we’ll start with Anti-Lesson number 1. Be Impatient. One of the best ways to be bad at evidence-based decision making is to be impatient.
A p-value is the standard measure of statistical significance. It represents the probability of seeing a result at least as extreme as the one you observed if the null hypothesis were true -- or, informally, the chance that what you’ve measured is just random noise. For a successful A/B test, we want to see positive results with a p-value below some threshold, typically 5% or .05.
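As a concrete example, here is roughly how a p-value for a simple A/B conversion test might be computed (the counts are invented, and the two-proportion z-test is just one common choice):

```python
# Two-proportion z-test on made-up conversion counts for control vs. variant.
from statsmodels.stats.proportion import proportions_ztest

conversions = [530, 580]     # control, variant
visitors = [10000, 10000]

z_stat, p_value = proportions_ztest(conversions, visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")
# p < 0.05 would usually be called "statistically significant" -- but only if
# the sample size was fixed in advance rather than checked repeatedly.
```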
But a p-value is calculated per measurement, not for the whole experiment. It only tells you how confident to be in your results given the circumstances of the test thus far.
If you check results on Tuesday, that’s another measurement. Now your boss is asking if it’s significant yet. So you keep checking and checking,
And your data scientist is muttering, saying you should just wait to get to the necessary sample size she estimated. It’s really frustrating. (pause) There’s a better way.
Got the result you want? On that test you knew was a good idea? Are the results already positive after only two days? And when you checked the p-value on your phone while in line at Starbucks, was it less than 0.05?
Declare victory! Turn off the test and roll it 100%. Don’t waste your valuable time with that statistical wah wah wah about regression to the mean and probability of null hypothesis something.
http://www.qubit.com/sites/default/files/pdf/mostwinningabtestresultsareillusory_0.pdf (Martin Goodson, Research Lead at Qubit, a UK web consultancy)
In fact, Martin Goodson shows that if you were to do a check for significance every day, and stop positive tests as soon as they show significance, 80% of those “winning” A/B tests are likely false-positives. Are bogus results. And that’s why being impatient is a great way to make bad decisions.
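A rough simulation makes the mechanism clear. Assuming an A/A test (no real difference between variants), steady daily traffic, and a significance check after every day of data -- all simplifications on my part -- repeated peeking inflates the false-positive rate well beyond the nominal 5%:

```python
# Rough simulation of "peeking": run a two-proportion test daily and stop the
# first time p < 0.05. With no true difference between the variants, the
# false-positive rate climbs far above 5%.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(0)
DAYS, DAILY_VISITORS, BASE_RATE, RUNS = 30, 1000, 0.05, 2000

false_positives = 0
for _ in range(RUNS):
    a_conv = b_conv = a_n = b_n = 0
    for _ in range(DAYS):
        a_conv += rng.binomial(DAILY_VISITORS, BASE_RATE)
        b_conv += rng.binomial(DAILY_VISITORS, BASE_RATE)
        a_n += DAILY_VISITORS
        b_n += DAILY_VISITORS
        _, p = proportions_ztest([a_conv, b_conv], [a_n, b_n])
        if p < 0.05:          # "declare victory" the moment it looks significant
            false_positives += 1
            break

print(f"false-positive rate with daily peeking: {false_positives / RUNS:.1%}")
# Far higher than the 5% you'd expect from a single, pre-planned check.
```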
Another great way to do data-driven product development wrong is to believe that sampling is easy. I mean, it’s hard and time-consuming to make sure that you’ve got representative users in your A/B tests.
Let me illustrate this with a story I call, “Beware the IEdes of March.” And you’ll see how well this anti-pattern worked for me.
At a previous company where I worked, we were building Used Car search experiences for major media brands, and we were doing A/B tests to try to increase the probability that we successfully connect a car shopper to a dealer with matching inventory.
One of the things we had observed when we analyzed successful user behavior, was that shoppers specifying price, mileage or year in their search do better. They’re more successful at finding cars they are interested in. So we had a hypothesis --
Could we encourage shoppers to specify price, mileage or year, and improve conversion?
We tried a couple ideas, including moving the price, mileage and year facets up in the search UI to make it easier to find, and we also tried a tooltip nudge, directly encouraging users to add these terms to their search.
Of all the variants, the tooltip nudge wins, we saw a 3% lift in unique conversion (with a p-value of .04). So we decided to roll it out.
It turns out, we’d taken a shortcut in our test assignment code. This was the summer of 2009, when IE had 60%+ of the US browser market, and my company, like many others, was sick and tired of supporting IE6 (the browser that PC World called “the least secure software on the planet”). So to work around a problem in our code that assigned users to test variants, we just didn’t handle IE6.
So the users on the oldest browsers got ignored. This turned out to be 20%+ of users. And even worse, we learned those 20% didn’t behave the same as the remaining 80%. From later analysis and user research, we came to believe that users on the oldest browsers also shopped differently, for different cars. They were on average more price sensitive and benefitted more from that nudge.
We’d depended on a distorted sample of the population. We went through all the effort to run a test, and a technical shortcut we took meant that we didn’t measure the results accurately. And we made an ill-informed decision. Because we thought sampling was easy.
Which brings us to the third way I’ll teach you how to do data-driven decision making wrong. Look only at one metric. If there’s one thing we know in life, it’s that
If a little bit is good, a lot is great. Anything worth doing is worth overdoing. I mean, there’s never a downside to that, is there?
The first story I want to share about looking only at one metric is called “Indeed has a heart,” and it’s about a test we ran in our mobile app.
As jobseekers explore available jobs, they have the option to Save a job so they can easily come back to it later. We decided to test changing the icon associated with a Save from a star to a heart. We did this on the job details page,
And on the search results page.
So, were hearts better than stars?
They were! We observed a 16% increase in Saves on the search results page.
Now, everyone loves hearts! We rolled our test out 100%. But why stop there? The obvious thing to do is
To have hearts everywhere!
Stars on your Amazon reviews?
Nope! Hearts now.
We sent our test results to Google, and in the next version of Gmail the Starred folder will be replaced with Hearted!
And we’ve got a bill in front of the new state legislature. We’re all gonna live and work in the Lone Heart State!
[sigh] Not so fast. Changing the stars to hearts improved the one metric we were looking at - usage of the “Save this job” feature, but
Did Hearts help people get jobs?
Sadly, no. There was no discernible impact on job seeker success. When we analyzed longer-term behavior of jobseekers, there was no evidence of an improvement in the primary metrics -- clicks, applies, hires. Which is unfortunate, because that’s our goal, not
To help people heart jobs. What we had done was to focus only on one metric.
If you really want to do evidence-based decision making wrong, you should make sure you look only at one metric in situations beyond your A/B tests. This anti-lesson can do damage all across your company.
For example, at Indeed, we have a talented client services team that works with our customers to keep them engaged and highlight the value they’re receiving. Growing revenue from existing customers is clearly important, and we had a hypothesis that if we had a team focused only on that, we could be more successful.
So, we formed a dedicated “upsell team” and measured their results on a dashboard.
What we looked for was an upsell contact with a customer followed by an increase in that customer’s spend; when that happened, we credited the rep for the increase on the dashboard. This was also tied to a bonus program. So we started off, and
the dashboard told us it was working! Reported upsells on the dashboard showed lots of wins, tens of thousands of dollars.
But when we stepped back, revenue for the total pool of accounts wasn’t increasing.
As it turned out, not every contact between a rep and a customer results in an increase in spend. Our naive dashboard looked only at one metric - the positive outcomes.
But in reality, some are neutral and some are negative. And so it didn’t measure the right result. In fact, when you’re showing people a metric about their performance,
What you measure is what you motivate. In talking to the reps, because our dashboard only looked at the positive outcomes, they were less interested in contacting customers who were planning to lower their spend. The incentives were only about getting to an increase, nothing else mattered. So we made a change.
We redefined success to include all the outcomes, updated the dashboard and continued the experiment of the upsell team. After that one change, we saw more diverse interactions, and better results!
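To make the difference concrete, here is a toy sketch of the two metric definitions (the spend changes are made-up numbers, not real account data):

```python
# Made-up spend changes observed after upsell contacts with six customers.
spend_change_after_contact = [+5000, -3000, 0, +8000, -6000, +2000]

# Naive dashboard: credit only the positive outcomes.
naive_dashboard = sum(x for x in spend_change_after_contact if x > 0)

# Corrected metric: count every outcome, good, neutral, or bad.
actual_impact = sum(spend_change_after_contact)

print(f"dashboard says: ${naive_dashboard:,}")          # $15,000 in "wins"
print(f"pool actually changed by: ${actual_impact:,}")  # only $6,000
```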
The Upsell Team’s revenue increased by 200%, and we decided to continue the experiment and grow the team.
So we saw two examples there about how looking only at one metric, especially when it’s an easily-computed feature metric or maybe the first metric you thought of, is a great way to do evidence-based decision making wrong. Now, that anti-lesson has a flip-side, too --
Caveats: not an A/B test, lots of confounding factors, small sample size, team got better at their job, grain of salt, etc. But we can also directly observe the actors in this story, so we focus on how the metric affected behavior.
Because another secret to making bad data-driven decisions is to look at all the metrics. For this anti-lesson, we’ll return to Indeed’s mobile app.
We were comparing our mobile app to other companies’ apps and noticed a growing adoption of a particular way to indicate a menu. They were using what’s now popularly known as the “hamburger menu”. One of our product managers stole the idea...
(pause) And we decided to test a hamburger menu to improve Indeed’s mobile app.
It’s better for them, is it better for us? Let’s look at the results.
[read through list, growing more confused]
<click> at Logins
(pause) What we realized was...
We didn’t really know what we wanted. We didn’t start our test with a goal in mind for what the hamburger menu was supposed to do. So when the metrics came back with conflicting answers, we couldn’t know if the change was any good.
There was too much noise from too many metrics. We ended up leaving this test running for a looong time hoping the right decision would become clear. It didn’t. We had lots of discussions and email threads and meetings where “seriously we need to make a decision about the hamburger test.” In the end, we turned it off, so there’s no hamburger in the Indeed mobile app.
In this case, by not starting with a clear goal, and by looking at all the metrics, we spent a lot of time and energy and failed at making a good evidence-based decision.
Tom: Now I’d like to introduce my colleague Ketan who will teach us about even more exciting ways to make bad decisions. Ketan?
Who’s got time for rigorous analysis? Just give me an Excel spreadsheet.