
When in doubt, go live


Find out how to validate hypotheses quickly using feedback that comes from a (large enough) number of actual users interacting with your product. In this talk, we will show you the technical foundations, research techniques and organisational setup that we have used successfully on large-scale products. These will save you development time, enable you to go live with confidence, make decisions based on real behaviour instead of best guesses, and solve the actual problems your users are facing.



  1. When in doubt, go live. Techniques for decision making based on real user behavior. © 2020 ThoughtWorks. Irene Torres, Klaus Fleerkötter
  2. You save time and make better decisions by establishing shorter feedback loops from feature idea to feature usage. © 2020 ThoughtWorks
  3. Who's talking? Irene Torres: Developer @ TW, PhD Neuroscience (science perspective). Klaus Fleerkötter: Developer @ TW, Information Systems (techie perspective). © 2020 ThoughtWorks
  4. What is this talk about? Specific use cases that worked for us: tech & research. And what it is not: extensive coverage of user research, or software testing. © 2020 ThoughtWorks
  5. Examples: one of Germany's biggest online retailers; among the top 5 highest-traffic e-commerce sites in Germany; orders: up to 10 per second; qualified visits: Ø 1.6 million/day. © 2020 ThoughtWorks
  6. Establishing Feedback Loops (diagram: PO, team, stakeholders, users)
  7. Establishing Feedback Loops (diagram: PO, team, stakeholders, users)
  8. Establishing Feedback Loops (overview diagram): Delivery Pipeline, Feature Toggle, Shadow Traffic, Lab Test, Focus Group Survey, Visual Report, A/B Test
  9. Prerequisites. © 2020 ThoughtWorks
  10. An iterative and incremental development process. © 2020 ThoughtWorks
  11. Services that can be built independently by cross-functional teams that are structured around business domains (Dev, PO, QA, Ops, UX, DA). © 2020 ThoughtWorks
  12. The Delivery Pipeline (roadmap: Delivery Pipeline; built on iterative and incremental development, independent teams). © 2020 ThoughtWorks
  13. The Delivery Pipeline: Build -> Test -> Deploy. © 2020 ThoughtWorks
  14. Gain situational awareness: knowing that you went live and nothing's on fire. © 2020 ThoughtWorks
  15. Feature Toggles (roadmap: Delivery Pipeline, Feature Toggle; iterative and incremental development, independent teams). © 2020 ThoughtWorks
  16. Feature Toggles: decouple go-live from deployment. if (toggleIsOn) { executeNewBehavior() } else { executeOldBehavior() } (© CC BY 2.0 "Switch", Jon_Callow_Images) © 2020 ThoughtWorks
  17. Feature Toggles: flip for experimentation. Without recompile? Without restart? Per request? (© CC BY-ND 2.0 "Off?", Nicholas Liby) © 2020 ThoughtWorks
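A minimal sketch of how a toggle can satisfy all three properties, assuming a Python service and invented names (ToggleStore, handle_request are not from the talk); real setups usually back the store with a config service or database that is polled at runtime:

    class ToggleStore:
        """In-memory toggle state. Mutable at runtime, so flipping a toggle
        needs no recompile and no restart."""
        def __init__(self):
            self._toggles = {"new_recommendations": False}

        def is_on(self, name: str) -> bool:
            return self._toggles.get(name, False)

        def flip(self, name: str, on: bool) -> None:
            self._toggles[name] = on

    toggles = ToggleStore()

    def execute_old_behavior(request):
        return {"variant": "old"}

    def execute_new_behavior(request):
        return {"variant": "new"}

    def handle_request(request):
        # Evaluated on every request, so a flip takes effect immediately.
        if toggles.is_on("new_recommendations"):
            return execute_new_behavior(request)
        return execute_old_behavior(request)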
  18. While developing, go live. © 2020 ThoughtWorks
  19. Shadow Traffic (roadmap: Delivery Pipeline, Feature Toggle, Shadow Traffic; iterative and incremental development, independent teams). © 2020 ThoughtWorks
  20. Shadow Traffic: not just for testing. Run both the old and the new behavior; the user sees no difference (only the old result is served), while the team inspects the output of the new one. © 2020 ThoughtWorks
  21. Shadow Traffic: get early feedback. Example business rules under evaluation (60% / 40% split shown): min 3 items? Mostly fashion? Not sold out? Max 1 of each kind? Maximize! © 2020 ThoughtWorks
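A sketch of the shadow pattern under the same assumptions (Python; the sorting functions are invented stand-ins for old and new behavior). The new code path is logged and compared, never user-facing:

    import logging

    log = logging.getLogger("shadow")

    def old_sorting(items):   # current production behavior
        return sorted(items, key=lambda i: i["popularity"], reverse=True)

    def new_sorting(items):   # candidate behavior under evaluation
        return sorted(items, key=lambda i: i["margin"], reverse=True)

    def handle(items):
        old_result = old_sorting(items)
        try:
            new_result = new_sorting(items)
            if new_result != old_result:
                log.info("shadow mismatch: old=%s new=%s",
                         [i["id"] for i in old_result],
                         [i["id"] for i in new_result])
        except Exception:
            # The shadow path must never break the user-facing response.
            log.exception("shadow path failed")
        return old_result  # the user always gets the old behavior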
  22. Visual Report (roadmap: Delivery Pipeline, Feature Toggle, Shadow Traffic, Visual Report; iterative and incremental development, independent teams). © 2020 ThoughtWorks
  23. Visual Report: quality of a feature (screenshot). © 2020 ThoughtWorks
  24. Visual Report: quality of a feature (screenshot). © 2020 ThoughtWorks
  25. Visual Report: quality of a feature (screenshot). © 2020 ThoughtWorks
  26. Visual Report: quality of a feature. Assess that the MVP has the correct business rules via a visual report (e.g. an HTML page) showing manual vs. auto selections per category (beach pants, leather bags, jackets). © 2020 ThoughtWorks
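A minimal sketch of such a report (Python; the data shape and file names are assumptions, not the talk's implementation): render the manual and the automatic picks side by side so the team can eyeball whether the business rules hold:

    rows = [
        {"category": "Beach pants",  "manual": "m1.jpg", "auto": "a1.jpg"},
        {"category": "Leather bags", "manual": "m2.jpg", "auto": "a2.jpg"},
        {"category": "Jackets",      "manual": "m3.jpg", "auto": "a3.jpg"},
    ]

    html = ["<table><tr><th>Category</th><th>Manual</th><th>Auto</th></tr>"]
    for r in rows:
        html.append(
            f"<tr><td>{r['category']}</td>"
            f"<td><img src='{r['manual']}'></td>"
            f"<td><img src='{r['auto']}'></td></tr>"
        )
    html.append("</table>")

    # Write a static page the whole team can open in a browser.
    with open("visual_report.html", "w") as f:
        f.write("\n".join(html))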
  27. Go live without flying blind. © 2020 ThoughtWorks
  28. A/B Testing (roadmap: Delivery Pipeline, Feature Toggle, Shadow Traffic, A/B Test, Visual Report; iterative and incremental development, independent teams). © 2020 ThoughtWorks
  29. A/B testing. "You want your data to inform, to guide, to improve your business model, to help you decide on a course of action." (Lean Analytics) © 2020 ThoughtWorks
  30. A/B testing. STATS: focus on understanding the underlying statistics that drive the calculation of a sample size. © 2020 ThoughtWorks
  31. A/B testing. A/B testing ≡ a set of statistical tests that evaluate two independent groups, a control and a test group (groups = variants). "Independent groups" -> a between-subjects design. © 2020 ThoughtWorks
  32. A/B testing: Control [A] vs. Test [B]. © 2020 ThoughtWorks
  33. A/B testing mostly uses statistical hypothesis testing to calculate the likelihood that a change in your website is meaningful. Null hypothesis (H0): the current state of the world; there is no effect, no difference when you apply changes. H0: our <KPIs> remained the "same" in the control group and in the test group. Alternative hypothesis (H1): the changes in the test group had a real effect. H1: our users are actively engaged in clicking the button and therefore our A2B is relatively increased by 5%. © 2020 ThoughtWorks
  34. A/B testing. Alternative hypothesis (H1): the changes in the test group had a real effect. H1: our users are actively engaged in clicking the button and therefore our A2B is relatively increased by 5%. © 2020 ThoughtWorks
  35. A/B testing: sample-size calculator (source: https://abtestguide.com/abtestsize/). © 2020 ThoughtWorks
  36. A/B testing: calculator inputs, part 1: metrics we know. © 2020 ThoughtWorks
  37. A/B testing: calculator inputs, part 2: what we decide from previous data or knowledge about this variable [effect size]. © 2020 ThoughtWorks
  38. A/B testing: calculator inputs, part 3: dependent on the variable and what we are looking for [normally two-sided]. © 2020 ThoughtWorks
  39. A/B testing: calculator inputs, part 4: values we can play with, but mostly set by convention and dependent on traffic [accuracy]. © 2020 ThoughtWorks
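The same calculation can be scripted instead of using the web calculator; a sketch with statsmodels, using illustrative values that match the slides (2% baseline conversion, 15% relative uplift, 95% confidence, 80% power):

    from statsmodels.stats.proportion import proportion_effectsize
    from statsmodels.stats.power import NormalIndPower

    p_control = 0.02    # baseline conversion rate (2%)
    p_test = 0.023      # 15% relative improvement -> 2.3%

    # Cohen's h effect size for two proportions.
    effect_size = proportion_effectsize(p_test, p_control)

    n_per_variant = NormalIndPower().solve_power(
        effect_size=effect_size,
        alpha=0.05,              # significance level (95% confidence)
        power=0.8,               # 80% chance to detect a true effect
        alternative="two-sided",
    )
    print(f"~{n_per_variant:.0f} users needed per variant")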
  40. A/B testing: effect size. The magnitude of the effect; how important the difference is. (Source: https://abtestguide.com/abtestsize/) © 2020 ThoughtWorks
  41. A/B testing: effect size. An improvement that is meaningful for your business. Relative improvement (%) = 100 × (test conversion rate − control conversion rate) / control conversion rate. Example: with a 2% control conversion rate and a 15% relative improvement, test conversion rate = 2% + (15/100 × 2%) = 2.3% (± 0.3%). (Source: https://abtestguide.com/abtestsize/) © 2020 ThoughtWorks
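The arithmetic on that slide, spelled out (Python; the numbers are the slide's illustrative values):

    control_cr = 2.0              # control conversion rate, %
    relative_improvement = 15.0   # uplift meaningful for the business, %

    test_cr = control_cr * (1 + relative_improvement / 100)
    print(test_cr)                # 2.3, matching the slide (± 0.3%)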
  42. A/B testing: one-sided or two-sided? H0: μt = μc (μt: mean of the test group, μc: mean of the control group). Is the difference in means significant enough to reject the null hypothesis? (Source: https://abtestguide.com/abtestsize/) © 2020 ThoughtWorks
  43. A/B testing: one-sided or two-sided? H1: μt > μc (one-sided, directional). H1: μt ≠ μc (two-sided). Two-sided tends to be the best option. © 2020 ThoughtWorks
  44. A/B testing: power, significance level & confidence level. © 2020 ThoughtWorks
  45. A/B testing: power of a test: the probability of finding an effect when it is really there; the complement of the Type II error rate (false negatives). Typical value is 80% (a convention). Higher power, i.e. a lower chance of missing a true effect, requires a larger sample size. (Source: https://towardsdatascience.com/a-guide-for-selecting-an-appropriate-metric-for-your-a-b-test-9068cccb7fb) © 2020 ThoughtWorks
  46. A/B testing: Type II error: the probability of missing an effect that is really there (the odds of not detecting it). Real world vs. our study: effect present and detected -> correctly reject H0; effect present but not detected -> Type II error (miss); effect absent but "detected" -> Type I error (false alarm); effect absent and not detected -> correctly retain H0. (Source: https://www.youtube.com/watch?v=CSBCKVQLf8c) © 2020 ThoughtWorks
  47. A/B testing: Type II error (miss) -> keep its probability (β risk) below 20%. Power = 1 − β -> 80%: the probability of correctly rejecting H0 when the effect is present. (Source: https://www.youtube.com/watch?v=CSBCKVQLf8c) © 2020 ThoughtWorks
  48. A/B testing: significance level (α): the probability of detecting an effect that is really not there. Typical value is 5%, i.e. a 95% confidence level (a convention). (Source: https://towardsdatascience.com/a-guide-for-selecting-an-appropriate-metric-for-your-a-b-test-9068cccb7fb) © 2020 ThoughtWorks
  49. A/B testing: Type I error (false alarm) -> keep its probability (α risk) below 5%. Confidence level = 1 − α: 95%. The significance level α relates to the p-value: the result is significant when p-value < α. (Source: https://www.youtube.com/watch?v=CSBCKVQLf8c) © 2020 ThoughtWorks
  50. A/B testing: confidence level: the complement of the significance level; the probability that the value of a parameter falls within a specified range of values. The significance level α tells you the probability that the effect you found was just chance; with α ≈ 0.05 (5%), require p-value < 0.05. A stricter significance level (higher confidence) requires a larger sample size. Typical value is 95% (a convention). (Source: https://towardsdatascience.com/a-guide-for-selecting-an-appropriate-metric-for-your-a-b-test-9068cccb7fb) © 2020 ThoughtWorks
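Once the experiment has run its planned length, the p-value < α decision rule above can be applied with a two-proportion z-test; a sketch with statsmodels, using made-up counts:

    from statsmodels.stats.proportion import proportions_ztest

    conversions = [360, 310]    # converted users: test, control
    visitors = [15000, 15000]   # users per variant

    # Two-sided z-test for equality of the two conversion rates (H0).
    stat, p_value = proportions_ztest(conversions, visitors)

    alpha = 0.05
    if p_value < alpha:
        print(f"p = {p_value:.4f} < {alpha}: reject H0, the effect looks real")
    else:
        print(f"p = {p_value:.4f} >= {alpha}: cannot reject H0")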
  51. A/B testing: power and confidence level influence your sample size and the probability of finding a true effect that is meaningful for your business. (Source: https://abtestguide.com/abtestsize/) © 2020 ThoughtWorks
  52. A/B testing, important points. High traffic: choose KPIs wisely; even a low effect size is detectable. Low traffic: choose KPIs with a high expected increase (large effect size). © 2020 ThoughtWorks
  53. A/B testing, important points. High traffic: choose KPIs wisely, low effect size (e.g. +0.5%). Low traffic: choose KPIs with a high increase, large effect size (e.g. +5%). © 2020 ThoughtWorks
  54. A/B testing, important points. High traffic: choose KPIs wisely, low effect size; accuracy, minimise risk. Low traffic: choose KPIs with high increase (large effect size). © 2020 ThoughtWorks
  55. A/B testing, important points. High traffic: choose KPIs wisely, low effect size; preferably A/B but also MVT. Low traffic: choose KPIs with high increase (large effect size); run A/B plus qualitative tests. Never stop an experiment before its planned length, even if you "find" significant results (danger: the false-positive rate rises!). (Sources: https://www.evanmiller.org/how-not-to-run-an-ab-test.html, https://vwo.com/blog/ab-split-testing-low-traffic-sites/) © 2020 ThoughtWorks
  56. Before development. © 2020 ThoughtWorks
  57. Focus Group Survey (roadmap: Delivery Pipeline, Feature Toggle, Shadow Traffic, Focus Group Survey, Visual Report, A/B Test; iterative and incremental development, independent teams). © 2020 ThoughtWorks
  58. Focus Group Survey. What is it: a study using inferential statistics to verify a hypothesis. When: as part of the discovery of a feature, or during development. Why: short feedback loops, data-driven decisions. Caution: you need experience designing and analysing statistical tests. © 2020 ThoughtWorks
  59. Focus Group Survey: the shopteaser survey. © 2020 ThoughtWorks
  60. Focus Group Survey: the shopteaser survey. Responses on a Likert scale (strongly disagree, disagree, neutral, agree, strongly agree): a categorical variable. Your research question will drive the design of the experiment and also the analysis of your data (several trials per participant). © 2020 ThoughtWorks
  61. Focus Group Survey: the shopteaser survey. The Likert scale is a categorical variable that can be transformed to a continuous one (scale 1-5). Things that could go wrong: familiarity bias. Methodology examples: gave 5 s per trial so the answers would be spontaneous; the first trials were discarded. © 2020 ThoughtWorks
  62. Focus Group Survey: the shopteaser survey. During the design phase we also took into account: collect demographic data (there is no such thing as enough data); collect feedback at the end of the survey (did they understand the task, did something go wrong?); make the instructions clear (if you are not there, participants cannot ask and will "assume"). © 2020 ThoughtWorks
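Once the Likert answers are mapped to the 1-5 scale, two teaser variants can be compared; a sketch using a Mann-Whitney U test from scipy (which makes no normality assumption, a reasonable fit for ordinal Likert data); the responses below are invented for illustration:

    from scipy.stats import mannwhitneyu

    variant_a = [4, 5, 3, 4, 4, 5, 2, 4, 5, 3]   # e.g. the "selected" teaser
    variant_b = [3, 2, 3, 4, 2, 3, 3, 2, 4, 3]   # e.g. the "manual" teaser

    # Two-sided test: do the two variants' ratings differ?
    stat, p_value = mannwhitneyu(variant_a, variant_b, alternative="two-sided")
    print(f"U = {stat}, p = {p_value:.4f}")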
  63. Focus Group Survey: insights from a focus group, the shopteaser survey (chart comparing the "selected" and "manual" variants). © 2020 ThoughtWorks
  64. Lab Test (roadmap: Delivery Pipeline, Feature Toggle, Shadow Traffic, Lab Test, Focus Group Survey, Visual Report, A/B Test; iterative and incremental development, independent teams). © 2020 ThoughtWorks
  65. UX Lab Tests. UX designers test the design and usability of a feature on a test group: a small group of people in person (~5-10 people), or remote web-based testing; qualitative questions (e.g. did you like it? was it easy to find?). © 2020 ThoughtWorks
  66. Wrapping up. © 2020 ThoughtWorks
  67. Techniques for faster and better decisions (roadmap: Delivery Pipeline, Feature Toggle, Shadow Traffic, Lab Test, Focus Group Survey, Visual Report, A/B Test; iterative and incremental development, independent teams).
  68. When is your next release? Could it be earlier? Do you have a solid hypothesis and measurable KPIs for it? Which measurements could you be using instead of assuming the user's preference? Which of your meetings in the next 2 weeks could be replaced by a lean experiment? © 2020 ThoughtWorks
  69. Thank you. Irene Torres, Klaus Fleerkötter. © 2020 ThoughtWorks
  70. Questions? #talk5-when-in-doubt-go-live. Irene Torres, Klaus Fleerkötter. © 2020 ThoughtWorks
