
# How to use Correlations to find Insights

Correlation is one of the most widely used statistical techniques in market research and insights and it is something which every researcher should be comfortable with.
In this 30-minute ‘How To’ Webinar, author and NewMR founder Ray Poynter will cover:

- What is correlation?
- What is r-squared?
- When and why should we use correlation?
- How should we use correlation?
- Potential problems with correlation
- Alternatives to correlation

The webinar recording can be accessed via the NewMR Play Again page: https://newmr.org/play-again

Published in: Education

### How to use Correlations to find Insights

1. How To Use Correlations To Find Stories In The Data. May 2019, Ray Poynter, NewMR
2. Today’s Plot
   1. What is correlation?
   2. The causes of correlation
   3. Finding stories with correlation
   4. Beyond correlation
   5. Question and Answer
3. Correlation
   - Measures the linear association between 2 metric scales
     - There are correlation measures for nonmetric scales
   - It produces numbers between -1 and +1
     - Where: +1 is perfect positive correlation, 0 is no correlation, and -1 is perfect negative correlation
   - We use the letter r to refer to the most common form of the correlation coefficient
   - We often square the correlation coefficient to get r-squared, which we often express as a percentage
4. Perfect Correlation: When we drive a car, the amount of fuel in the tank is negatively correlated with how far we have driven, until we add more fuel.
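
As a rough illustration of the definition above, the sketch below computes the Pearson coefficient r and r-squared by hand. The distance and fuel figures are invented for the example (they are not from the webinar) and assume a car that burns fuel at a perfectly constant rate.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: distance driven (km) and fuel left in the tank (litres).
distance = [0, 50, 100, 150, 200]
fuel = [40, 36, 32, 28, 24]

r = pearson_r(distance, fuel)
r_squared = r ** 2
print(f"r = {r:.2f}, r-squared = {r_squared:.0%}")
```

Because fuel falls by the same amount for every kilometre in this made-up data, r comes out at -1, a perfect negative correlation.
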
5. Strong Correlation
   - Correlation coefficients are not intuitive
   - We often use r-squared (r²), a measure of shared variance
   - r-squared is the proportion of the total variance that is shared
   - 0.9 * 0.9 is 0.81 (often written 81%)
   - 81% of the variance is shared, and 19% is not
6. A typical strong correlation
   - In the real world, 0.7 (or -0.7) is a strong relationship
   - r² for 0.7 (and for -0.7) is 49%
   - 49% of the total variation is shared (and 51% is not)
7. Notable correlations
   - In the real world, 0.5 (or -0.5) might be interesting
   - r² for 0.5 (and for -0.5) is 25%
   - 25% of the total variation is shared (and 75% is not)
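
The arithmetic behind these slides is simply r multiplied by itself. A minimal sketch, where `shared_variance` is a helper name made up for this example:

```python
def shared_variance(r):
    """Proportion of variance shared by two variables with correlation r."""
    return r ** 2


for r in (0.9, 0.7, -0.7, 0.5, -0.5):
    shared = shared_variance(r)
    print(f"r = {r:+.1f}  shared = {shared:.0%}  not shared = {1 - shared:.0%}")
```

The sign of r disappears when it is squared, so 0.7 and -0.7 both share 49% of the variance.
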
8. No (linear) relationship
   - In the real world, 0 is pretty rare
   - The process of measurement often creates some correlation
   - Selecting questions for a study often implies association
10. What ‘causes’ correlation: “Correlation does not imply causality”

    If X is correlated with Y then we can usually say:

    - X causes Y, or
    - Y causes X, or
    - X and Y are both caused by a common agent Z, or
    - X and Y both cause each other in a feedback system, or
    - It is just chance

    But if X is correlated with Y, we should investigate why.
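
The "common agent Z" case can be sketched with a small simulation. Everything here is an assumption made for illustration: Z is random noise, and X and Y are each Z plus their own independent noise, so neither causes the other.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Z is the hidden common cause; X and Y never influence each other.
z = [rng.gauss(0, 1) for _ in range(2000)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson_r(x, y)
print(f"Correlation between X and Y: {r_xy:.2f}")
```

With equal amounts of shared and independent noise, the expected correlation is about 0.5 even though X and Y have no direct link.
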
11. Smoking and Lung Cancer
    - 1950, UK: Richard Doll and Austin Bradford Hill conducted statistical research for the Medical Research Council
    - Discovered a correlation between the amount of tobacco smoked and lung cancer
    - Published findings and warnings in the British Medical Journal in 1950
    - A 1954 British Doctors Study confirmed the correlation, leading to UK Government advice that smoking and cancer were related
    - Proof had to wait until the science could show the causative mechanism
13. Lots of spurious correlations: http://www.tylervigen.com/spurious-correlations
14. Spurious Correlations: r = 0.79, r² = 63%
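
One way to see why spurious correlations turn up so easily is to correlate many unrelated short series with each other. The sketch below is an assumption-laden illustration, not the method behind the linked site: it generates 40 series of pure random noise, 10 points each, and looks for the strongest pairwise correlation.

```python
import random
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_series, n_points = 40, 10

# Every series is independent noise, so any correlation is pure chance.
series = [[rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)]

pairs = list(combinations(range(n_series), 2))
strongest = max(abs(pearson_r(series[i], series[j])) for i, j in pairs)
print(f"{len(pairs)} pairs tested, strongest |r| = {strongest:.2f}")
```

With hundreds of pairs and only ten observations per series, at least one pair will almost always look strongly related.
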
16. Knowing where to look: Correlation can help make the map (the data) more readable
18. Satisfaction with features and the link to NPS
19. Correlation between NPS and feature satisfaction

    Unsorted:

    | Feature | NPS Correlation |
    |---|---|
    | Bed | 0.80 |
    | Restaurant | 0.58 |
    | Price | 0.39 |
    | TV | 0.26 |
    | Check in | 0.43 |
    | Check out | 0.40 |
    | Location | 0.62 |
    | Internet | 0.27 |
    | Bathroom | 0.31 |
    | Mini-bar | 0.20 |

    Sorted:

    | Feature | NPS Correlation |
    |---|---|
    | Bed | 0.80 |
    | Location | 0.62 |
    | Restaurant | 0.58 |
    | Check in | 0.43 |
    | Check out | 0.40 |
    | Price | 0.39 |
    | Bathroom | 0.31 |
    | Internet | 0.27 |
    | TV | 0.26 |
    | Mini-bar | 0.20 |

    With story finding, sorting is key.
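
The sorting step is easy to automate. A minimal sketch using the correlations from the slide; sorting by absolute value is a choice made here so that strong negative correlations would also rise to the top, although this data has none.

```python
# Correlations with NPS, in the order they appear on the slide.
nps_correlation = {
    "Bed": 0.80, "Restaurant": 0.58, "Price": 0.39, "TV": 0.26,
    "Check in": 0.43, "Check out": 0.40, "Location": 0.62,
    "Internet": 0.27, "Bathroom": 0.31, "Mini-bar": 0.20,
}

# Strongest relationships first, regardless of sign.
ranked = sorted(nps_correlation.items(), key=lambda item: abs(item[1]), reverse=True)

for feature, r in ranked:
    print(f"{feature:<11}{r:.2f}")
```
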
20. The ratios between the features: r-squared

    | Feature | NPS R-squared |
    |---|---|
    | Bed | 64% |
    | Location | 39% |
    | Restaurant | 34% |
    | Check in | 19% |
    | Check out | 16% |
    | Price | 15% |
    | Bathroom | 10% |
    | Internet | 7% |
    | TV | 7% |
    | Mini-bar | 4% |

21. ‘Drivers’ of Choice: Derived Importance. Quadrant chart with axes running from low to high satisfaction and from low to high importance, plotting Bed, Location, Restaurant, Check in, Check out, Price, Bathroom, Internet, TV and Mini-bar.
22. The ratios between the features: r-squared (same table as slide 20)
    - Add the r-squared values together = 214%
    - Why? Correlations between the scores of the features
    - Multicollinearity
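
The "more than 100%" point can be checked directly from the correlations. This sketch squares the two-decimal correlations from slide 19; the total it prints is about 213%, slightly different from the 214% on the slide, presumably because the slide was built from unrounded values (that explanation is an assumption).

```python
# Correlations with NPS as shown on the slide (rounded to two decimals).
nps_correlation = {
    "Bed": 0.80, "Location": 0.62, "Restaurant": 0.58, "Check in": 0.43,
    "Check out": 0.40, "Price": 0.39, "Bathroom": 0.31, "Internet": 0.27,
    "TV": 0.26, "Mini-bar": 0.20,
}

# Shared variance between each feature and NPS.
r_squared = {feature: r ** 2 for feature, r in nps_correlation.items()}
total = sum(r_squared.values())

for feature, value in r_squared.items():
    print(f"{feature:<11}{value:.0%}")
print(f"Total      {total:.0%}")
```

A total well above 100% is only possible because the features overlap with each other: the same slice of NPS variance is being counted under several features.
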
23. Multicollinearity

    | | NPS | Bed | Restaurant | Price | TV | Check in | Check out | Location | Internet | Bathroom | Mini-bar |
    |---|---|---|---|---|---|---|---|---|---|---|---|
    | NPS | 1.00 | 0.80 | 0.58 | 0.39 | 0.26 | 0.43 | 0.40 | 0.62 | 0.27 | 0.31 | 0.20 |
    | Bed | 0.80 | 1.00 | 0.56 | 0.41 | 0.21 | 0.43 | 0.41 | 0.41 | 0.32 | 0.45 | 0.19 |
    | Restaurant | 0.58 | 0.56 | 1.00 | 0.02 | 0.05 | 0.52 | 0.45 | 0.49 | -0.01 | 0.30 | 0.22 |
    | Price | 0.39 | 0.41 | 0.02 | 1.00 | 0.41 | -0.08 | 0.03 | 0.08 | 0.15 | 0.00 | 0.03 |
    | TV | 0.26 | 0.21 | 0.05 | 0.41 | 1.00 | -0.09 | 0.01 | -0.05 | 0.53 | -0.07 | 0.33 |
    | Check in | 0.43 | 0.43 | 0.52 | -0.08 | -0.09 | 1.00 | 0.71 | 0.30 | 0.14 | 0.16 | 0.07 |
    | Check out | 0.40 | 0.41 | 0.45 | 0.03 | 0.01 | 0.71 | 1.00 | 0.55 | 0.11 | 0.23 | 0.16 |
    | Location | 0.62 | 0.41 | 0.49 | 0.08 | -0.05 | 0.30 | 0.55 | 1.00 | 0.05 | -0.02 | 0.13 |
    | Internet | 0.27 | 0.32 | -0.01 | 0.15 | 0.53 | 0.14 | 0.11 | 0.05 | 1.00 | -0.19 | 0.10 |
    | Bathroom | 0.31 | 0.45 | 0.30 | 0.00 | -0.07 | 0.16 | 0.23 | -0.02 | -0.19 | 1.00 | -0.02 |
    | Mini-bar | 0.20 | 0.19 | 0.22 | 0.03 | 0.33 | 0.07 | 0.16 | 0.13 | 0.10 | -0.02 | 1.00 |
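
A quick way to spot multicollinearity in a matrix like this is to list the feature pairs that correlate strongly with each other. The sketch below uses the matrix from the slide; the 0.5 cut-off is an arbitrary threshold chosen for illustration, not a rule from the webinar.

```python
names = ["NPS", "Bed", "Restaurant", "Price", "TV", "Check in",
         "Check out", "Location", "Internet", "Bathroom", "Mini-bar"]

# Correlation matrix from the slide, rows and columns in the order of `names`.
matrix = [
    [1.00, 0.80, 0.58, 0.39, 0.26, 0.43, 0.40, 0.62, 0.27, 0.31, 0.20],
    [0.80, 1.00, 0.56, 0.41, 0.21, 0.43, 0.41, 0.41, 0.32, 0.45, 0.19],
    [0.58, 0.56, 1.00, 0.02, 0.05, 0.52, 0.45, 0.49, -0.01, 0.30, 0.22],
    [0.39, 0.41, 0.02, 1.00, 0.41, -0.08, 0.03, 0.08, 0.15, 0.00, 0.03],
    [0.26, 0.21, 0.05, 0.41, 1.00, -0.09, 0.01, -0.05, 0.53, -0.07, 0.33],
    [0.43, 0.43, 0.52, -0.08, -0.09, 1.00, 0.71, 0.30, 0.14, 0.16, 0.07],
    [0.40, 0.41, 0.45, 0.03, 0.01, 0.71, 1.00, 0.55, 0.11, 0.23, 0.16],
    [0.62, 0.41, 0.49, 0.08, -0.05, 0.30, 0.55, 1.00, 0.05, -0.02, 0.13],
    [0.27, 0.32, -0.01, 0.15, 0.53, 0.14, 0.11, 0.05, 1.00, -0.19, 0.10],
    [0.31, 0.45, 0.30, 0.00, -0.07, 0.16, 0.23, -0.02, -0.19, 1.00, -0.02],
    [0.20, 0.19, 0.22, 0.03, 0.33, 0.07, 0.16, 0.13, 0.10, -0.02, 1.00],
]

THRESHOLD = 0.5  # arbitrary cut-off, chosen only for this illustration

overlaps = []
for i in range(1, len(names)):          # start at 1 to skip the NPS row
    for j in range(i + 1, len(names)):  # upper triangle: each pair once
        r = matrix[i][j]
        if abs(r) >= THRESHOLD:
            overlaps.append((names[i], names[j], r))

overlaps.sort(key=lambda pair: abs(pair[2]), reverse=True)
for first, second, r in overlaps:
    print(f"{first} / {second}: {r:.2f}")
```

Pairs such as Check in and Check out move together, so their individual correlations with NPS partly describe the same underlying effect.
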
25. Factor Analysis: Factor Solution

    | | Factor 1 | Factor 2 | Factor 3 | Factor 4 |
    |---|---|---|---|---|
    | Check In | 0.87 | 0.07 | 0.05 | 0.08 |
    | TV | 0.83 | -0.01 | -0.06 | 0.02 |
    | Check Out | 0.71 | 0.08 | -0.01 | -0.09 |
    | Bed | 0.70 | 0.21 | 0.04 | 0.32 |
    | Bathroom | 0.09 | 0.98 | 0.11 | -0.04 |
    | Mini bar | 0.16 | 0.97 | 0.05 | 0.04 |
    | Price | -0.05 | 0.28 | 0.84 | -0.14 |
    | Restaurant | -0.04 | -0.05 | 0.76 | 0.21 |
    | Location | 0.19 | -0.03 | 0.62 | -0.53 |
    | Internet | 0.16 | -0.02 | 0.04 | 0.87 |
    | Variance | 29% | 20% | 15% | 11% |
    | Cumulative Variance | 29% | 49% | 64% | 75% |

    NewMR Webinar, Introduction to Factor Analysis: https://newmr.org/blog/introduction-to-factor-analysis/
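
A common first step in reading a factor solution is to assign each item to the factor on which it loads most heavily. The sketch below does only that, using the loadings from the slide; it does not perform the factor analysis itself, and taking the largest absolute loading is a simplifying assumption.

```python
# Factor loadings from the slide: one row per feature, one column per factor.
loadings = {
    "Check In":   [0.87, 0.07, 0.05, 0.08],
    "TV":         [0.83, -0.01, -0.06, 0.02],
    "Check Out":  [0.71, 0.08, -0.01, -0.09],
    "Bed":        [0.70, 0.21, 0.04, 0.32],
    "Bathroom":   [0.09, 0.98, 0.11, -0.04],
    "Mini bar":   [0.16, 0.97, 0.05, 0.04],
    "Price":      [-0.05, 0.28, 0.84, -0.14],
    "Restaurant": [-0.04, -0.05, 0.76, 0.21],
    "Location":   [0.19, -0.03, 0.62, -0.53],
    "Internet":   [0.16, -0.02, 0.04, 0.87],
}

groups = {}
for feature, row in loadings.items():
    # Index of the loading with the largest absolute value.
    strongest = max(range(len(row)), key=lambda k: abs(row[k]))
    groups.setdefault(f"Factor {strongest + 1}", []).append(feature)

for factor, features in groups.items():
    print(f"{factor}: {', '.join(features)}")
```

Location is worth a second look: it is grouped under Factor 3 (0.62) but also has a sizeable negative loading on Factor 4 (-0.53).
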
26. Drivers of choice, importance, etc.
    - Correlation
      - Easy, stable, but less rigorous: a gateway drug for Advanced Analytics
    - Regression
    - Shapley Values
    - Latent Class
    - Conjoint Analysis
    - Path Analysis
27. Correlation versus Regression. Two scatter plots share the fitted line y = 2x + 1; one has R² = 0.67028 and the other has R² = 1.
    - Correlation shows the goodness of fit
    - Regression shows the scale of the relationship
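
The contrast on this slide can be reproduced with a small least-squares sketch. The data are invented for illustration: the first set lies exactly on y = 2x + 1, and the second adds noise constructed so that the fitted line stays the same while the fit gets looser. It does not reproduce the slide's exact data points.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


xs = list(range(10))
exact = [2 * x + 1 for x in xs]

# Hypothetical noise that sums to zero and is uncorrelated with x,
# so it changes the scatter around the line but not the line itself.
noise = [4.5, -4.5, -4.5, 4.5, 0, 0, 4.5, -4.5, -4.5, 4.5]
noisy = [y + e for y, e in zip(exact, noise)]

slope_exact, intercept_exact, r2_exact = fit_line(xs, exact)
slope_noisy, intercept_noisy, r2_noisy = fit_line(xs, noisy)

print(f"exact: y = {slope_exact:.0f}x + {intercept_exact:.0f}, R² = {r2_exact:.2f}")
print(f"noisy: y = {slope_noisy:.0f}x + {intercept_noisy:.0f}, R² = {r2_noisy:.2f}")
```

Both data sets give the same regression line, so the scale of the relationship is identical; only R², the goodness of fit, tells them apart.
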
28. Uses of correlation
    - To show patterns in the data, associations between items and between attributes
    - As a measure of importance: attributes that correlate with an outcome (e.g. satisfaction) might be more important
    - To suggest areas of investigation: the link between smoking and lung cancer started as a correlation, leading to the discovery of the causal link
    - In the form of r-squared, to assess the quality of a model, usually in terms of consistency rather than validity