Lies, Damned Lies & Dataviz
Bad visualization, and how to avoid it
Dr. Andrew Clegg
Director, Learner Analytics & Data Science
Pearson
@andrew_clegg
Part I — Why Visualize?
What are the benefits — when it’s done right?
Part II — Bad Dataviz
How to spot the failures — and how to avoid them yourself
Warning: Contains Opinion!
Introduction
Part I — Why Visualize?
What is the goal?
● Summarizing and communicating numbers
● Drawing attention to trends and patterns
● Exploring data interactively
● Capturing attention
● Telling stories
How does visualization help?
Playing to your neural hardware’s strengths
Your visual system excels at pattern detection & parallel processing.
Representing data graphically means you can leverage this “for free”.
Challenge: estimate x when y = 0
x y x y x y
27.38 24.05 32.31 31.61 75.67 14.83
62.64 7.31 51.84 28.61 34.23 31.65
50.76 16.30 59.04 18.29 51.21 7.69
42.94 26.78 74.63 1.15 47.26 22.90
8.72 42.35 56.15 11.37 66.60 3.21
30.62 30.87 47.23 19.49 17.46 40.31
62.63 9.14 59.36 8.82 65.70 12.79
63.21 18.66 44.58 19.12 52.24 12.92
40.49 23.29 47.85 20.55 62.56 14.17
22.07 41.46 68.21 11.99 40.43 19.77
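The challenge can also be checked numerically. The following is a minimal sketch, assuming a straight-line trend is a fair model for these points (the slide does not say how the data were generated). It fits an ordinary least-squares line to the table above and solves for the x-intercept. The names `points`, `fit_line` and `x_at_y_zero` are illustrative only.

```python
# The 30 (x, y) pairs from the table above, read row by row.
points = [
    (27.38, 24.05), (32.31, 31.61), (75.67, 14.83),
    (62.64, 7.31), (51.84, 28.61), (34.23, 31.65),
    (50.76, 16.30), (59.04, 18.29), (51.21, 7.69),
    (42.94, 26.78), (74.63, 1.15), (47.26, 22.90),
    (8.72, 42.35), (56.15, 11.37), (66.60, 3.21),
    (30.62, 30.87), (47.23, 19.49), (17.46, 40.31),
    (62.63, 9.14), (59.36, 8.82), (65.70, 12.79),
    (63.21, 18.66), (44.58, 19.12), (52.24, 12.92),
    (40.49, 23.29), (47.85, 20.55), (62.56, 14.17),
    (22.07, 41.46), (68.21, 11.99), (40.43, 19.77),
]

def fit_line(pts):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(points)
x_at_y_zero = -intercept / slope  # solve 0 = slope * x + intercept (assumes linearity)
print(f"y = {slope:.3f} * x + {intercept:.2f}; y = 0 near x = {x_at_y_zero:.1f}")
```

Getting this far from the raw numbers takes a model choice and some code; on a scatter plot, most readers can extend the trend by eye in a second or two.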
Challenge: find most similar data point
x y x y x y
54.88 71.52 97.86 79.92 35.95 43.70
60.28 54.49 46.15 78.05 69.76 6.02
42.37 64.59 11.83 63.99 66.68 67.06
43.76 89.18 14.34 94.47 21.04 12.89
96.37 38.34 52.18 41.47 31.54 36.37
79.17 52.89 26.46 77.42 57.02 43.86
56.80 92.56 45.62 56.84 98.84 10.20
7.10 8.71 1.88 61.76 20.89 16.13
2.02 83.26 61.21 61.69 65.31 25.33
77.82 87.00 94.37 68.18 46.63 24.44
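To see how much work the eye saves here, consider what the brute-force answer looks like. This sketch assumes "most similar" means the two points with the smallest Euclidean distance between them, which is one plausible reading of the challenge rather than something the slide states. `closest_pair` is a hypothetical helper name.

```python
from itertools import combinations
from math import dist

# The 30 (x, y) pairs from the table above, read row by row.
points = [
    (54.88, 71.52), (97.86, 79.92), (35.95, 43.70),
    (60.28, 54.49), (46.15, 78.05), (69.76, 6.02),
    (42.37, 64.59), (11.83, 63.99), (66.68, 67.06),
    (43.76, 89.18), (14.34, 94.47), (21.04, 12.89),
    (96.37, 38.34), (52.18, 41.47), (31.54, 36.37),
    (79.17, 52.89), (26.46, 77.42), (57.02, 43.86),
    (56.80, 92.56), (45.62, 56.84), (98.84, 10.20),
    (7.10, 8.71), (1.88, 61.76), (20.89, 16.13),
    (2.02, 83.26), (61.21, 61.69), (65.31, 25.33),
    (77.82, 87.00), (94.37, 68.18), (46.63, 24.44),
]

def closest_pair(pts):
    """Return (distance, p, q) for the two points with the smallest Euclidean distance."""
    return min((dist(p, q), p, q) for p, q in combinations(pts, 2))

d, p, q = closest_pair(points)
print(f"closest pair: {p} and {q}, distance {d:.2f}")
```

The code has to compare all 435 pairs; on a scatter plot the nearest neighbours tend to stand out without any conscious search.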
Challenge: find the outlier
x y x y x y
54.88 71.52 97.86 79.92 35.95 43.70
60.28 54.49 46.15 78.05 69.76 6.02
42.37 64.59 11.83 63.99 66.68 67.06
43.76 89.18 14.34 94.47 21.04 12.89
96.37 38.34 52.18 41.47 31.54 36.37
79.17 52.89 26.46 77.42 57.02 43.86
56.80 92.56 45.62 56.84 98.84 10.20
7.10 8.71 1.88 61.76 20.89 16.13
2.02 83.26 61.21 61.69 65.31 25.33
77.82 87.00 94.37 68.18 46.63 24.44
How does visualization help?
Avoiding limitations of statistics
Showing patterns in large data sets with minimal information loss.
Revealing structure of “tricky” data sets where typical summary
statistics do a poor job.
Showing patterns in large data sets
https://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919
Describing statistically tricky data
http://www.stanford.edu/~mwaskom/software/seaborn/examples/anscombes_quartet.html
All four have almost
exactly the same:
mean(x)
variance(x)
mean(y)
variance(y)
correlation coefficient
regression coefficients
Anscombe’s Quartet
(Francis Anscombe, 1973)
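The claim is easy to verify. The sketch below uses the standard published values of the quartet and computes the summary statistics listed above with the Python standard library. The helper `pearson_r` and the dictionary layout are illustrative choices, not part of the original talk.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet: data sets I to III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# mean(x), variance(x), mean(y), variance(y), correlation for each data set
summary = {
    name: (mean(xs), variance(xs), mean(ys), variance(ys), pearson_r(xs, ys))
    for name, (xs, ys) in quartet.items()
}
for name, stats in summary.items():
    print(name, " ".join(f"{s:.2f}" for s in stats))
```

The four printed rows agree to about two decimal places, yet the plots look nothing alike: a noisy line, a smooth curve, a line with one outlier, and a vertical stack with one extreme point.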
Describing statistically tricky data
Much web data,
especially involving
human preferences or
choices, looks like this.
There is no meaningful
“central tendency”, so
typical descriptive statistics
such as the mean mislead.
Zipfian distribution,
an example of a
power law.
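A small numeric illustration of why summary statistics struggle here. This is a sketch with made-up numbers: an idealized Zipfian popularity curve in which the item at rank k receives 1000 / k views. Real web data will be noisier, but the shape of the result is similar.

```python
from statistics import mean, median

# Idealized Zipfian popularity: the item at rank k gets 1000 / k "views".
# Hypothetical numbers, chosen only to show the shape of the distribution.
n_items = 1000
views = [1000 / rank for rank in range(1, n_items + 1)]

avg = mean(views)
mid = median(views)
top_share = sum(views[:10]) / sum(views)  # share of all views held by the top 1% of items

print(f"mean {avg:.2f}, median {mid:.2f}, top 1% share {top_share:.0%}")
```

The mean sits several times above the median, and a handful of items account for a large share of the total, so neither number describes a “typical” item. A rank-frequency plot on log axes shows the whole picture at once.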
How does visualization help?
Illustrating a story
Visualizations are often used simply to clarify or reinforce the main
points of a story, narrative or message.
This process fails when the conclusions suggested by the graphic are
irrelevant to the narrative, or even contradict it.
It can also fail when the graphic has no clear message or multiple
conflicting interpretations, or is largely incomprehensible.
Many of the following examples illustrate these mistakes.
Part II — Bad Dataviz
1. Axes of evil
Bad dataviz
http://fluffware.tumblr.com/post/4580822773/axes
http://www.google.co.uk/trends/explore#q=%22data science%22
Unlabelled axes
Firearms (skjutvapen) seizures report: http://bit.ly/1dHnFzC (PDF) via Junk Charts
Axis scale manipulation
https://twitter.com/jk_keller/status/410498080765919232/photo/1
Axis scale manipulation (totally shameless version)
Version published by Reuters
Version “fixed” by @jk_keller
Example from Stephen Few (PDF)
Dual axes: caution
Natural interpretation:
Units sold “dipped below”
revenue (A) and is now
“catching up” (B).
But these impressions are
meaningless.
They are just artefacts of the
chosen axis scales.
A
B
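The point about axis scales can be made concrete. In this sketch the two series and all axis ranges are hypothetical; `plot_height` is simply the fraction of the plot height at which a value is drawn. The same data produce three different “stories” depending only on the range chosen for the right-hand axis.

```python
def plot_height(value, axis_min, axis_max):
    """Fraction of the plot height at which a value is drawn on a given axis."""
    return (value - axis_min) / (axis_max - axis_min)

revenue = [100, 110, 120, 130]  # left axis, hypothetical values
units = [10, 11, 12, 13]        # right axis, hypothetical values
left_axis = (0, 200)

def units_drawn_above_revenue(right_axis):
    """For each period, is the units line drawn higher than the revenue line?"""
    return [
        plot_height(u, *right_axis) > plot_height(r, *left_axis)
        for u, r in zip(units, revenue)
    ]

print(units_drawn_above_revenue((0, 15)))  # units always "ahead" of revenue
print(units_drawn_above_revenue((0, 40)))  # units always "behind" revenue
print(units_drawn_above_revenue((8, 14)))  # units "catch up" and "overtake"
```

Nothing in the data changed between the three calls, which is why crossings and gaps between lines on different axes carry no meaning.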
Proportionality errors
From an Australian document found at The Guardian
1 row of people = roughly 43,000 nurses.
10 rows = roughly 48,000 nurses.
?!?
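One way to put a number on this kind of distortion is Edward Tufte's “lie factor”: the relative change drawn in the graphic divided by the relative change in the data. The talk does not use this measure for this chart; it is brought in here as an illustration, using the approximate figures quoted above.

```python
def lie_factor(shown_before, shown_after, data_before, data_after):
    """Tufte's lie factor: relative change drawn / relative change in the data."""
    shown_change = (shown_after - shown_before) / shown_before
    data_change = (data_after - data_before) / data_before
    return shown_change / data_change

# 1 row of figures for roughly 43,000 nurses, 10 rows for roughly 48,000 nurses.
factor = lie_factor(shown_before=1, shown_after=10,
                    data_before=43_000, data_after=48_000)
print(f"lie factor: {factor:.0f}")
```

A faithful graphic scores close to 1. Here the picture grows by 900% while the data grow by under 12%, so the graphic overstates the change by a factor of roughly 77.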
Cheating outright?
All found via The Guardian
Quick quiz: what happened in 2005?
Axis inversion: when “down” means “up”?!?
From Thomson Reuters via Business Insider
Version published by Reuters
Version “fixed” by @PFedewa
2. Distance vs. area vs. volume
http://muhammadfamizwanabdullah.blogspot.co.uk/2010/11/10-introduction-of-teaching-volume-of.html
Pie charts: avoid
Bad: Colours are used for separating slices, so can’t
easily be put to another use.
No way to show time dimension statically.
Comparing relative sizes of slices is hard.
Worse: Doing it in 3D is harder. Perspective inflates
nearer slices, and the similar volume of the
objects is a red herring.
Worst: Doing it with deep, discontinuous 3D objects
is even harder.
Perhaps justifiable (in 2D) if numbers are sufficiently different.
Otherwise, use a much simpler design and avoid all those problems.
Pie charts: avoid
Pie chart horrors
http://junkcharts.typepad.com/junk_charts/2014/03/two-charts-that-fail-self-sufficiency.html
Pie charts are
supposed to show
proportions of a
whole.
People expect the %s
to add up to 100%.
This one shows
proportions of
separate quantities.
Pie chart horrors
From a World Bank report (PDF) found at The Guardian
These ones show 96%
and 40% as full circles.
This one is falling apart.
This one thinks 76% is
less than three quarters.
Even worse uses of 3D
https://www.tableausoftware.com/public/blog/2011/01/viz-wiz-1-11
and http://www.simplexnumerica.com/Gallery/gallery_pyramid.html
Cones, pyramids, spheres etc…
Are we comparing width, height,
area or volume? Nobody knows!
26.76% = tiny peak
23.32% = massive slab
?!?
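The ambiguity between width, area and volume is simple arithmetic. This sketch shows how a shape whose linear size is scaled by the data ratio grows in area and volume, and what width scale would be needed if area is meant to carry the value. The function names are illustrative.

```python
from math import sqrt

def perceived_ratios(value_ratio):
    """If a shape's width is scaled by value_ratio, how much do area and volume grow?"""
    return {"width": value_ratio, "area": value_ratio ** 2, "volume": value_ratio ** 3}

def width_scale_for_area(value_ratio):
    """Width scale that makes the drawn area proportional to the value."""
    return sqrt(value_ratio)

print(perceived_ratios(2))      # {'width': 2, 'area': 4, 'volume': 8}
print(width_scale_for_area(4))  # 2.0
```

A value that doubles can therefore look twice, four times or eight times as large, depending on which property the reader happens to judge. The same arithmetic applies to bubble charts whose radius, rather than area, is scaled by the value.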
Stacked charts: caution
Stacked charts show how
a data series breaks
down by another
attribute of the data.
But people often misread
these as two distinct data
series, reading off a
separate y-axis value for
each one.
Bubble charts: avoid
http://commons.wikimedia.org/wiki/File:Bubble_Chart_Chicago_Deposit_Market.jpg
Bubble charts: avoid
http://bit.ly/1okS3nE and http://bit.ly/1hdZQtO
3. Bad maps
http://xkcd.com/1138/
Non-normalized quantities are useless
http://personal.frostburg.edu/jibandy0/starbucks%20map.jpg
Don’t use absolute
values without a very
good reason.
Normalize appropriately:
per capita, per adult, per
student, per household,
per square km, per
journey, per voter …
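A short sketch of what normalization changes. The region names and figures below are invented for illustration; the point is only that a ranking by raw count and a ranking per 100,000 residents can be opposites.

```python
# Hypothetical regions: (number of coffee shops, population).
regions = {
    "Metroville": (900, 9_000_000),
    "Midtown": (300, 1_500_000),
    "Smallport": (60, 200_000),
}

def per_100k(count, population):
    """Rate per 100,000 residents."""
    return count / population * 100_000

by_count = sorted(regions, key=lambda r: regions[r][0], reverse=True)
by_rate = sorted(regions, key=lambda r: per_100k(*regions[r]), reverse=True)

print("by raw count:", by_count)
print("per 100k:    ", by_rate)
```

A map shaded by the raw counts would mostly reproduce a population map, which is the failure the xkcd strip linked above makes fun of.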
Remember: geopolitical boundaries are artificial
This map shows all the
countries I’ve visited.
The relative size of USA
makes me seem much
more widely travelled
than I really am.
Is “country” the right
level of aggregation?
Remember: map projections lie
http://en.wikipedia.org/wiki/File:Hobo%E2%80%93Dyer_projection_SW.jpg
http://en.wikipedia.org/wiki/File:Choropleth-density.png
Consider using fixed-size bins
http://bit.ly/O9EPta
Drawbacks of maps
● Can’t easily show time dimension, without animation
● Hard to show multiple attributes of data at once
● Physical proximity can obscure demographic/cultural differences,
and vice versa
Just because you can map the data, doesn’t mean you should.
Save maps for when geographical trends are the key focus.
4. Colour choice
Good colour palettes from RColorBrewer
Sequential data
http://mapsdeguap.blogspot.co.uk/2012/04/choropleth-map.html
Use a smooth transition
from min to max.
Don’t “cycle” more than
once.
This map goes purple-
green twice.
A better choice would be:
Diverging data
http://www-03.ibm.com/press/us/en/pressrelease/35359.wss
Here the yellow section indicates the median.
Red/green = above/below median.
However, the red and green ranges are not scaled
well. 75 (close to median) is almost the same
colour as 108 (max).
Sequential data, but with a
well-defined midpoint.
Two directions from this
midpoint -- two poles:
above/below average,
positive/negative, female/male,
Democrat/Republican etc.
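One way to avoid the scaling problem described above is to pin the midpoint to the centre of the colour scale and stretch each side independently. The sketch below is an assumption-laden illustration: the values 75 and 108 come from the slide, but the minimum (40) and median (70) are invented, and `diverging_position` is a hypothetical helper that returns a position between 0 and 1 to look up in a diverging palette.

```python
def diverging_position(value, vmin, midpoint, vmax):
    """Map value to [0, 1], with the midpoint pinned at 0.5.

    Values below the midpoint are spread linearly over [0, 0.5],
    values above it over [0.5, 1].
    """
    if not vmin < midpoint < vmax:
        raise ValueError("need vmin < midpoint < vmax")
    value = min(max(value, vmin), vmax)  # clamp out-of-range values
    if value <= midpoint:
        return 0.5 * (value - vmin) / (midpoint - vmin)
    return 0.5 + 0.5 * (value - midpoint) / (vmax - midpoint)

# Assumed range: min 40, median 70 (both hypothetical), max 108 (from the slide).
for v in (40, 70, 75, 108):
    print(v, round(diverging_position(v, 40, 70, 108), 3))
```

With this mapping, 75 lands close to the neutral centre of the palette while 108 reaches the far end, so the two are clearly distinguishable. Matplotlib's `TwoSlopeNorm` offers a similar piecewise mapping for use with its colormaps.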
Categorical data
Also known as nominal or qualitative.
Colours should not form a pattern, as this
can imply a false relationship.
The ethnicity colours here are reasonable,
although quite close in colour space.
The location colours are badly chosen.
They suggest a linear progression, which
is meaningless.
http://www.visualizing.org/full-screen/10886
Consistency
Don’t do this.
http://www.raterush.com/pages/digg-reddit
Other considerations
● Colour blindness: affects roughly 1 in 12 men (about 8%), but is rare in women
● Print and photocopy friendliness
● Characteristics of different screens, esp. projectors
ColorBrewer is a great help:
See also…
● brewer2mpl (Python)
● RColorBrewer (R)
● ColorBrewer (Matlab)
http://colorbrewer2.org/
5. Correlation vs. causation
http://xkcd.com/552/
Beware of bogus correlations
http://gizmodo.com/5977989/internet-explorer-vs-murder-rate-will-be-your-favorite-chart-today/
and http://pubs.acs.org/doi/abs/10.1021/ci700332k
Correlation does not prove causation, even with a good R² score.
Beware of bogus correlations
Even respectable journals
sometimes get carried away.
Ask yourself:
Are these both effects of a
common cause?
Or just sheer chance?
(Multiple comparisons)
http://www.nejm.org/doi/full/10.1056/NEJMon1211064
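The multiple-comparisons trap is easy to reproduce. This is a simulation sketch with arbitrary settings (10 data points per series, 1000 candidate series, a fixed seed): every series is pure random noise, yet searching through enough of them turns up one that correlates strongly with the target.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n_years, n_candidates = 10, 1000

# A random "target" series and many unrelated random candidate series.
target = [rng.random() for _ in range(n_years)]
best = max(
    abs(pearson_r(target, [rng.random() for _ in range(n_years)]))
    for _ in range(n_candidates)
)
print(f"strongest |r| among {n_candidates} unrelated series: {best:.2f}")
```

If only the best match is charted, it looks like a discovery. The number of comparisons that were tried and discarded is part of the evidence, and a chart alone rarely shows it.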
6. Trying to say too much
Each visualization needs a clear purpose. But some designers and
analysts try to include every possible piece of information.
This is not a good idea.
Unnecessary detail and ostentatiously “clever” presentation can
obscure the real message.
http://bit.ly/1gVzeUe
Don’t do this.
7. Tips for developing a critical eye
Here are some techniques you can use for critical analysis.
They are often subjective, debatable, context-dependent and partly
based on aesthetics… So don’t expect absolute rules.
Usability
Does the chart need detailed instructions in order for it to be
comprehensible and usable?
● Acceptable if this is a standard visualization method used in a
particular domain
● Less acceptable if this is a one-off for general consumption
First impressions test
What is the first thing you infer from looking at the visualization?
(Don’t stop to read every detail -- see what you get from a glance.)
Does this impression prove to be accurate,
on closer inspection?
If not, then there may be a problem.
Many people will only glance and never
perform the close inspection.
Return on effort (Kaiser Fung)
http://bit.ly/1dKewGo
Self-sufficiency test (Kaiser Fung)
Would the chart make sense without the numbers printed on each
data point?
If not, the chart has failed
the self-sufficiency test.
http://junkcharts.typepad.com/junk_charts/2013/03/blowing-the-whistle-at-bubble-charts.html
Trifecta checkup (Kaiser Fung)
Ask the following:
● What practical question does the graphic
attempt to address?
● What answer does the data imply?
● What answer does the graphic imply?
Can you answer these clearly?
Do the three answers align?
If not, there is something wrong.
http://junkcharts.typepad.com/junk_charts/2014/02/pets-may-need-shelter-from-this-terrible-chart.html
Data-ink score (Edward Tufte)
Main principle: Remove redundant or uninformative elements from
the design, to reduce distraction. High data-ink ratio = clarity.
http://www.infovis-wiki.net/index.php/Data-Ink_Ratio
And finally…
Ask yourself how much you trust the data.
Professional presentation does not imply reliable numbers.
Is there enough data to be sure of statistical significance?
What are the margins of error?
Is there a plausible mechanism of action?
What about sources of bias (accidental or intentional), confounding
factors, missing data, or measurement error (noise)?
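For survey-style percentages, a rough margin of error is quick to compute. This sketch uses the usual normal approximation for a sample proportion at about 95% confidence. It assumes simple random sampling, which real data often violate, so treat the result as a lower bound on the uncertainty.

```python
from math import sqrt

def margin_of_error(proportion, sample_size, z=1.96):
    """Approximate 95% margin of error for a sample proportion (normal approximation)."""
    return z * sqrt(proportion * (1 - proportion) / sample_size)

for n in (100, 1_000, 10_000):
    print(n, f"±{margin_of_error(0.5, n):.1%}")
```

With 100 respondents, a reported 50% could plausibly be anywhere from about 40% to 60%, so a chart showing a two-point difference between groups of that size is showing noise.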
Thank you!
http://www.makefive.com/categories/entertainment/other/pie-charts-that-explain-simple-material/percentage-of-chart-which-resembles-pac-man

Contenu connexe

Similaire à Lies, damned lies & dataviz

Skin melanoma stage detection - CNN.pptx
Skin melanoma stage detection - CNN.pptxSkin melanoma stage detection - CNN.pptx
Skin melanoma stage detection - CNN.pptxVishalLabde
 
A practical Introduction to Machine(s) Learning
A practical Introduction to Machine(s) LearningA practical Introduction to Machine(s) Learning
A practical Introduction to Machine(s) LearningBruno Gonçalves
 
08 Inference for Networks – DYAD Model Overview (2017)
08 Inference for Networks – DYAD Model Overview (2017)08 Inference for Networks – DYAD Model Overview (2017)
08 Inference for Networks – DYAD Model Overview (2017)Duke Network Analysis Center
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273Abutest
 
Automatic Differentiation and SciML in Reality: What can go wrong, and what t...
Automatic Differentiation and SciML in Reality: What can go wrong, and what t...Automatic Differentiation and SciML in Reality: What can go wrong, and what t...
Automatic Differentiation and SciML in Reality: What can go wrong, and what t...Chris Rackauckas
 
Applying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksApplying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksDatabricks
 
Computer Vision Course includes deep learning
Computer Vision Course includes deep learningComputer Vision Course includes deep learning
Computer Vision Course includes deep learninggigap29589
 
13 ch ken black solution
13 ch ken black solution13 ch ken black solution
13 ch ken black solutionKrunal Shah
 
Side Channel Analysis: Practice and a Bit of Theory
Side Channel Analysis: Practice and a Bit of TheorySide Channel Analysis: Practice and a Bit of Theory
Side Channel Analysis: Practice and a Bit of TheoryPositive Hack Days
 
"Deep Learning" Chap.6 Convolutional Neural Net
"Deep Learning" Chap.6 Convolutional Neural Net"Deep Learning" Chap.6 Convolutional Neural Net
"Deep Learning" Chap.6 Convolutional Neural NetKen'ichi Matsui
 
An Artificial Immune Network for Multimodal Function Optimization on Dynamic ...
An Artificial Immune Network for Multimodal Function Optimization on Dynamic ...An Artificial Immune Network for Multimodal Function Optimization on Dynamic ...
An Artificial Immune Network for Multimodal Function Optimization on Dynamic ...Fabricio de França
 
4 Solutions To Exercises 4.1 About These Solutions 4.2 Using The Table Of Ran...
4 Solutions To Exercises 4.1 About These Solutions 4.2 Using The Table Of Ran...4 Solutions To Exercises 4.1 About These Solutions 4.2 Using The Table Of Ran...
4 Solutions To Exercises 4.1 About These Solutions 4.2 Using The Table Of Ran...Wendy Berg
 
Deep learning: Cutting through the Myths and Hype
Deep learning: Cutting through the Myths and HypeDeep learning: Cutting through the Myths and Hype
Deep learning: Cutting through the Myths and HypeSiby Jose Plathottam
 
07 dimensionality reduction
07 dimensionality reduction07 dimensionality reduction
07 dimensionality reductionMarco Quartulli
 
The Secret Service Methods for Finding and Fixing Unexplainable Distributed S...
The Secret Service Methods for Finding and Fixing Unexplainable Distributed S...The Secret Service Methods for Finding and Fixing Unexplainable Distributed S...
The Secret Service Methods for Finding and Fixing Unexplainable Distributed S...Jörg Bächtiger
 
Information Visualization for Medical Informatics
Information Visualization for Medical Informatics Information Visualization for Medical Informatics
Information Visualization for Medical Informatics University of Maryland
 
Steam presentation deux 3 d prints from photographs
Steam presentation deux  3 d prints from photographsSteam presentation deux  3 d prints from photographs
Steam presentation deux 3 d prints from photographsScott Eastellerson
 

Similaire à Lies, damned lies & dataviz (20)

Skin melanoma stage detection - CNN.pptx
Skin melanoma stage detection - CNN.pptxSkin melanoma stage detection - CNN.pptx
Skin melanoma stage detection - CNN.pptx
 
A practical Introduction to Machine(s) Learning
A practical Introduction to Machine(s) LearningA practical Introduction to Machine(s) Learning
A practical Introduction to Machine(s) Learning
 
08 Inference for Networks – DYAD Model Overview (2017)
08 Inference for Networks – DYAD Model Overview (2017)08 Inference for Networks – DYAD Model Overview (2017)
08 Inference for Networks – DYAD Model Overview (2017)
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
Automatic Differentiation and SciML in Reality: What can go wrong, and what t...
Automatic Differentiation and SciML in Reality: What can go wrong, and what t...Automatic Differentiation and SciML in Reality: What can go wrong, and what t...
Automatic Differentiation and SciML in Reality: What can go wrong, and what t...
 
Applying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksApplying your Convolutional Neural Networks
Applying your Convolutional Neural Networks
 
Computer Vision Course includes deep learning
Computer Vision Course includes deep learningComputer Vision Course includes deep learning
Computer Vision Course includes deep learning
 
13 ch ken black solution
13 ch ken black solution13 ch ken black solution
13 ch ken black solution
 
Side Channel Analysis: Practice and a Bit of Theory
Side Channel Analysis: Practice and a Bit of TheorySide Channel Analysis: Practice and a Bit of Theory
Side Channel Analysis: Practice and a Bit of Theory
 
Image generative modeling for design inspiration and image editing by Camille...
Image generative modeling for design inspiration and image editing by Camille...Image generative modeling for design inspiration and image editing by Camille...
Image generative modeling for design inspiration and image editing by Camille...
 
"Deep Learning" Chap.6 Convolutional Neural Net
"Deep Learning" Chap.6 Convolutional Neural Net"Deep Learning" Chap.6 Convolutional Neural Net
"Deep Learning" Chap.6 Convolutional Neural Net
 
An Artificial Immune Network for Multimodal Function Optimization on Dynamic ...
An Artificial Immune Network for Multimodal Function Optimization on Dynamic ...An Artificial Immune Network for Multimodal Function Optimization on Dynamic ...
An Artificial Immune Network for Multimodal Function Optimization on Dynamic ...
 
4 Solutions To Exercises 4.1 About These Solutions 4.2 Using The Table Of Ran...
4 Solutions To Exercises 4.1 About These Solutions 4.2 Using The Table Of Ran...4 Solutions To Exercises 4.1 About These Solutions 4.2 Using The Table Of Ran...
4 Solutions To Exercises 4.1 About These Solutions 4.2 Using The Table Of Ran...
 
SOLUTION.PDF
SOLUTION.PDFSOLUTION.PDF
SOLUTION.PDF
 
Deep learning: Cutting through the Myths and Hype
Deep learning: Cutting through the Myths and HypeDeep learning: Cutting through the Myths and Hype
Deep learning: Cutting through the Myths and Hype
 
07 dimensionality reduction
07 dimensionality reduction07 dimensionality reduction
07 dimensionality reduction
 
Lausanne 2019 #4
Lausanne 2019 #4Lausanne 2019 #4
Lausanne 2019 #4
 
The Secret Service Methods for Finding and Fixing Unexplainable Distributed S...
The Secret Service Methods for Finding and Fixing Unexplainable Distributed S...The Secret Service Methods for Finding and Fixing Unexplainable Distributed S...
The Secret Service Methods for Finding and Fixing Unexplainable Distributed S...
 
Information Visualization for Medical Informatics
Information Visualization for Medical Informatics Information Visualization for Medical Informatics
Information Visualization for Medical Informatics
 
Steam presentation deux 3 d prints from photographs
Steam presentation deux  3 d prints from photographsSteam presentation deux  3 d prints from photographs
Steam presentation deux 3 d prints from photographs
 

Dernier

Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 

Dernier (20)

Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 

Lies, damned lies & dataviz

  • 1.
  • 2. Lies, Damned Lies & Dataviz Bad visualization, and how to avoid it Dr. Andrew Clegg Director, Learner Analytics & Data Science Pearson @andrew_clegg
  • 3. Part I — Why Visualize? What are the benefits — when it’s done right? Part II — Bad Dataviz How to spot the failures — and how to avoid them yourself Warning: Contains Opinion! Introduction
  • 4. Part I — Why Visualize?
  • 5. ● Summarizing and communicating numbers ● Drawing attention to trends and patterns ● Exploring data interactively ● Capturing attention ● Telling stories What is the goal?
  • 6. Playing to your neural hardware’s strengths Your visual system excels at pattern detection & parallel processing. Representing data graphically means you can leverage this “for free”. How does visualization help?
  • 7. Challenge: estimate x when y = 0 x y x y x y 27.38 24.05 32.31 31.61 75.67 14.83 62.64 7.31 51.84 28.61 34.23 31.65 50.76 16.30 59.04 18.29 51.21 7.69 42.94 26.78 74.63 1.15 47.26 22.90 8.72 42.35 56.15 11.37 66.60 3.21 30.62 30.87 47.23 19.49 17.46 40.31 62.63 9.14 59.36 8.82 65.70 12.79 63.21 18.66 44.58 19.12 52.24 12.92 40.49 23.29 47.85 20.55 62.56 14.17 22.07 41.46 68.21 11.99 40.43 19.77
  • 10. Challenge: find most similar data point x y x y x y 54.88 71.52 97.86 79.92 35.95 43.70 60.28 54.49 46.15 78.05 69.76 6.02 42.37 64.59 11.83 63.99 66.68 67.06 43.76 89.18 14.34 94.47 21.04 12.89 96.37 38.34 52.18 41.47 31.54 36.37 79.17 52.89 26.46 77.42 57.02 43.86 56.80 92.56 45.62 56.84 98.84 10.20 7.10 8.71 1.88 61.76 20.89 16.13 2.02 83.26 61.21 61.69 65.31 25.33 77.82 87.00 94.37 68.18 46.63 24.44
  • 11. Challenge: find most similar data point
  • 12. Challenge: find the outlier
          x      y      x      y      x      y
      54.88  71.52  97.86  79.92  35.95  43.70
      60.28  54.49  46.15  78.05  69.76   6.02
      42.37  64.59  11.83  63.99  66.68  67.06
      43.76  89.18  14.34  94.47  21.04  12.89
      96.37  38.34  52.18  41.47  31.54  36.37
      79.17  52.89  26.46  77.42  57.02  43.86
      56.80  92.56  45.62  56.84  98.84  10.20
       7.10   8.71   1.88  61.76  20.89  16.13
       2.02  83.26  61.21  61.69  65.31  25.33
      77.82  87.00  94.37  68.18  46.63  24.44
  • 14. How does visualization help? Avoiding limitations of statistics: showing patterns in large data sets with minimal information loss, and revealing the structure of “tricky” data sets where typical summary statistics do a poor job.
  • 15. Showing patterns in large data sets https://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919
  • 16. Describing statistically tricky data: Anscombe’s Quartet (Francis Anscombe, 1973) http://www.stanford.edu/~mwaskom/software/seaborn/examples/anscombes_quartet.html All four data sets have (almost exactly) the same mean(x), variance(x), mean(y), variance(y), correlation coefficient and regression coefficients.
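The claim about the quartet can be checked directly. The sketch below uses the published Anscombe values and computes the listed statistics with plain Python; the `summary` helper and its dictionary keys are illustrative names, not part of the slides.

```python
from math import sqrt

# Anscombe's quartet (1973). Sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(xs, ys):
    """Means, sample variances, Pearson r and least-squares line for one set."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


SUMMARIES = {name: summary(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four printed rows agree to about two decimal places, yet the four scatter plots look nothing alike.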
  • 17. Describing statistically tricky data: Zipfian distribution, an example of a power law. Much web data, especially involving human preferences or choices, looks like this. There is no “central tendency”, so typical descriptive statistics are useless.
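To see why the usual summaries mislead here, consider an idealised Zipfian popularity curve where the item at rank r gets a count proportional to 1/r. The exponent of 1, the 1000 items and the scale factor are all assumptions chosen for illustration, not figures from the slides.

```python
from statistics import mean, median

N_ITEMS = 1000  # hypothetical catalogue size

# Idealised Zipf curve: count at rank r is proportional to 1 / r.
counts = [1000.0 / rank for rank in range(1, N_ITEMS + 1)]

average = mean(counts)
middle = median(counts)
top10_share = sum(counts[:10]) / sum(counts)

print(f"mean={average:.2f} median={middle:.2f} top-10 share={top10_share:.0%}")
```

The mean sits several times above the median, and the top 1% of items account for well over a third of the total, so neither number describes a "typical" item.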
  • 18. How does visualization help? Illustrating a story Visualizations are often used simply to clarify or reinforce the main points of a story, narrative or message. This process fails when the conclusions suggested by the graphic are irrelevant to the narrative, or even contradict it. It can also fail when the graphic has no clear message or multiple conflicting interpretations, or is largely incomprehensible. Many of the following examples illustrate these mistakes.
  • 19. Part II — Bad Dataviz
  • 20. Bad dataviz 1. Axes of evil http://fluffware.tumblr.com/post/4580822773/axes
  • 22. Firearms (skjutvapen) seizures report: http://bit.ly/1dHnFzC (PDF) via Junk Charts Axis scale manipulation
  • 23. https://twitter.com/jk_keller/status/410498080765919232/photo/1 Axis scale manipulation (totally shameless version) Version published by Reuters Version “fixed” by @jk_keller
  • 24. Dual axes: caution. Example from Stephen Few (PDF). Natural interpretation: units sold “dipped below” revenue (A) and is now “catching up” (B). But these impressions are meaningless. They are just artefacts of the chosen axis scales.
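A small sketch of why the crossings are artefacts: where each line is drawn depends only on its own axis range, so changing one range changes which line appears "on top". The series values and axis ranges below are made up for illustration and are not Stephen Few's data.

```python
def height(value, axis_min, axis_max):
    """Vertical position of a value as a fraction of the plot height."""
    return (value - axis_min) / (axis_max - axis_min)


def units_drawn_above_revenue(units, revenue, units_axis, revenue_axis):
    """For each time step, is the units line drawn above the revenue line?"""
    return [
        height(u, *units_axis) > height(r, *revenue_axis)
        for u, r in zip(units, revenue)
    ]


# Hypothetical data: units sold (left axis) and revenue (right axis).
UNITS = [100, 120, 90, 110]
REVENUE = [48_000, 52_000, 51_000, 53_000]

# Same data, two different choices of left-axis range.
view_a = units_drawn_above_revenue(UNITS, REVENUE, (0, 200), (0, 100_000))
view_b = units_drawn_above_revenue(UNITS, REVENUE, (0, 400), (0, 100_000))

print(view_a)  # units line dips below revenue at the third point
print(view_b)  # units line is below revenue throughout
```

Nothing in the data changed between the two views, only the left-axis range.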
  • 25. Proportionality errors From an Australian document found at The Guardian 1 row of people = roughly 43,000 nurses. 10 rows = roughly 48,000 nurses. ?!?
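The size of this distortion can be put into a number. One option, which is not used in the slides, is Edward Tufte's "lie factor": the relative change shown by the graphic divided by the relative change in the data. The sketch below applies it to the figures quoted on the slide; the function name is illustrative.

```python
def lie_factor(shown_before, shown_after, data_before, data_after):
    """Tufte's lie factor: relative change drawn / relative change in data."""
    shown_effect = (shown_after - shown_before) / shown_before
    data_effect = (data_after - data_before) / data_before
    return shown_effect / data_effect


# 1 row of people for ~43,000 nurses, 10 rows for ~48,000 nurses.
factor = lie_factor(1, 10, 43_000, 48_000)
print(round(factor, 1))  # 77.4
```

A faithful graphic has a lie factor close to 1. Here a rise of about 12% is drawn as a rise of 900%.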
  • 26. Cheating outright? All found via The Guardian
  • 27. Quick quiz: what happened in 2005?
  • 28. Axis inversion: when “down” means “up”?!? From Thomson Reuters via Business Insider Version published by Reuters Version “fixed” by @PFedewa
  • 29. Bad dataviz 2. Distance vs. area vs. volume http://muhammadfamizwanabdullah.blogspot.co.uk/2010/11/10-introduction-of-teaching-volume-of.html
  • 30. Pie charts: avoid. Colours are used for separating slices, so can’t easily be put to another use. No way to show time dimension statically. Comparing relative sizes of slices is hard. Doing it in 3D is harder: perspective inflates nearer slices, and the similar volume of the objects is a red herring. Doing it with deep, discontinuous 3D objects is even harder. [Figure labels: Bad, Worse, Worst]
  • 31. Pie charts: avoid. Perhaps justifiable (in 2D) if numbers are sufficiently different. Otherwise, use a much simpler design and avoid all those problems.
  • 32. Pie chart horrors http://junkcharts.typepad.com/junk_charts/2014/03/two-charts-that-fail-self-sufficiency.html Pie charts are supposed to show proportions of a whole. People expect the %s to add up to 100%. This one shows proportions of separate quantities.
  • 33. Pie chart horrors From a World Bank report (PDF) found at The Guardian These ones show 96% and 40% as full circles. This one is falling apart. This one thinks 76% is less than three quarters.
  • 34. Even worse uses of 3D https://www.tableausoftware.com/public/blog/2011/01/viz-wiz-1-11 and http://www.simplexnumerica.com/Gallery/gallery_pyramid.html Cones, pyramids, spheres etc… Are we comparing width, height, area or volume? Nobody knows! 26.76% = tiny peak 23.32% = massive slab ?!?
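The ambiguity matters because the three readings disagree sharply. If a designer scales every linear dimension of a shape by the ratio between two values, a viewer who compares areas or volumes sees a much larger ratio. A minimal sketch, with illustrative function names:

```python
def apparent_ratios(value_ratio):
    """Ratios a viewer may perceive when all linear dimensions are scaled
    by value_ratio, depending on whether they compare length, area or volume."""
    return {
        "length": value_ratio,
        "area": value_ratio ** 2,
        "volume": value_ratio ** 3,
    }


def linear_scale_for(value_ratio, dimensions):
    """Linear scale factor that makes the area (dimensions=2) or volume
    (dimensions=3) of the shape proportional to the value instead."""
    return value_ratio ** (1 / dimensions)


print(apparent_ratios(2))      # {'length': 2, 'area': 4, 'volume': 8}
print(linear_scale_for(4, 2))  # 2.0
```

A value that merely doubles can therefore look four or eight times bigger, depending on what the reader's eye latches onto.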
  • 35. Stacked charts: caution Stacked charts show how a data series breaks down by another attribute of the data. But people often misread these as two distinct data series, reading off a separate y-axis value for each one.
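The misreading is easy to reproduce with numbers. In the made-up example below, the upper band of a stacked chart is constant, but its upper edge rises because the band underneath it grows.

```python
# Hypothetical stacked chart with two bands.
BOTTOM = [10, 12, 15, 19]  # lower band, growing
TOP = [5, 5, 5, 5]         # upper band, flat

# What is actually drawn for the upper band: its upper edge.
upper_edge = [b + t for b, t in zip(BOTTOM, TOP)]

# Misreading: treating the upper edge as the upper band's own values.
misread_top = upper_edge

# Correct reading: the thickness of the band.
correct_top = [edge - b for edge, b in zip(upper_edge, BOTTOM)]

print(misread_top)  # [15, 17, 20, 24]
print(correct_top)  # [5, 5, 5, 5]
```

Read off the y-axis, the flat series appears to grow by 60%.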
  • 37. Bubble charts: avoid http://bit.ly/1okS3nE and http://bit.ly/1hdZQtO
  • 38. Bad dataviz 3. Bad maps http://xkcd.com/1138/
  • 39. Non-normalized quantities are useless http://personal.frostburg.edu/jibandy0/starbucks%20map.jpg Don’t use absolute values without a very good reason. Normalize appropriately: per capita, per adult, per student, per household, per square km, per journey, per voter …
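A short sketch of what normalizing changes. The regions, counts and populations are invented for illustration and have nothing to do with the Starbucks map on the slide.

```python
# Hypothetical regions: (number of outlets, population).
REGIONS = {
    "Region A": (900, 30_000_000),
    "Region B": (250, 5_000_000),
    "Region C": (40, 500_000),
}


def per_100k(count, population):
    """Outlets per 100,000 inhabitants."""
    return count / population * 100_000


top_by_count = max(REGIONS, key=lambda name: REGIONS[name][0])
top_by_rate = max(REGIONS, key=lambda name: per_100k(*REGIONS[name]))

for name, (count, population) in REGIONS.items():
    print(name, count, round(per_100k(count, population), 1))
print(top_by_count, top_by_rate)
```

The map of raw counts mostly shows where people live. The per-capita figures reverse the ranking.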
  • 40. Remember: geopolitical boundaries are artificial This map shows all the countries I’ve visited. The relative size of USA makes me seem much more widely travelled than I really am. Is “country” the right level of aggregation?
  • 41. Remember: map projections lie http://en.wikipedia.org/wiki/File:Hobo%E2%80%93Dyer_projection_SW.jpg
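One concrete case, which is an illustration and not taken from the slides: on the familiar Mercator projection, lengths at latitude φ are stretched by 1/cos(φ) in both directions, so areas are inflated by 1/cos²(φ) relative to the equator. Equal-area projections such as the Hobo-Dyer linked above avoid this at the cost of distorting shapes.

```python
from math import cos, radians


def mercator_area_inflation(latitude_deg):
    """Factor by which Mercator inflates areas at a given latitude,
    relative to the equator."""
    return 1 / cos(radians(latitude_deg)) ** 2


for lat in (0, 30, 60, 75):
    print(lat, round(mercator_area_inflation(lat), 2))
```

At 60 degrees north, the latitude of Oslo or Helsinki, regions are drawn about four times larger than equatorial regions of the same true area.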
  • 43. Consider using fixed-size bins http://bit.ly/O9EPta
  • 44. Drawbacks of maps ● Can’t easily show time dimension, without animation ● Hard to show multiple attributes of data at once ● Physical proximity can obscure demographic/cultural differences, and vice versa Just because you can map the data, doesn’t mean you should. Save maps for when geographical trends are the key focus.
  • 45. Bad dataviz 4. Colour choice Good colour palettes from RColorBrewer
  • 46. Sequential data http://mapsdeguap.blogspot.co.uk/2012/04/choropleth-map.html Use a smooth transition from min to max. Don’t “cycle” more than once. This map goes purple-green twice. A better choice would be: [palette image not reproduced]
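One rough way to catch a "cycling" palette automatically is to check that lightness changes in a single direction from the first colour to the last. The sketch below uses the HLS lightness from Python's standard `colorsys` module as a crude stand-in for perceived lightness. Both example palettes are illustrative hex values chosen for this sketch, not colours sampled from the map on the slide.

```python
import colorsys


def lightness(hex_colour):
    """HLS lightness (0 = black, 1 = white) of a '#rrggbb' colour."""
    digits = hex_colour.lstrip("#")
    r, g, b = (int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return colorsys.rgb_to_hls(r, g, b)[1]


def is_monotonic(palette):
    """True if lightness strictly rises or strictly falls along the palette."""
    values = [lightness(colour) for colour in palette]
    steps = [after - before for before, after in zip(values, values[1:])]
    return all(s > 0 for s in steps) or all(s < 0 for s in steps)


# Illustrative palettes: a light-to-dark blue ramp, and one that
# alternates between purples and greens.
SMOOTH = ["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"]
CYCLING = ["#762a83", "#a6dba0", "#9970ab", "#1b7837"]

print(is_monotonic(SMOOTH), is_monotonic(CYCLING))
```

A perceptual colour space would be a better basis for a real check; this only flags obvious back-and-forth palettes.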
  • 47. Diverging data: sequential data, but with a well-defined midpoint. Two directions from this midpoint, i.e. two poles: above/below average, positive/negative, female/male, Democrat/Republican etc. Example: http://www-03.ibm.com/press/us/en/pressrelease/35359.wss Here the yellow section indicates the median. Red/green = above/below median. However, the red and green ranges are not scaled well: 75 (close to median) is almost the same colour as 108 (max).
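One way to avoid the scaling problem is to map each side of the midpoint onto its own half of the colour scale, so a value just past the midpoint stays close to the neutral colour. A minimal sketch follows. The maximum of 108 and the value 75 come from the slide, while the minimum of 40 and the median of 70 are assumptions, since the slide does not state them.

```python
def diverging_position(value, vmin, vcenter, vmax):
    """Map a value to [-1, 1]: -1 at vmin, 0 at the midpoint, +1 at vmax.
    Each side of the midpoint is scaled independently."""
    if value >= vcenter:
        return (value - vcenter) / (vmax - vcenter)
    return (value - vcenter) / (vcenter - vmin)


VMIN, MEDIAN, VMAX = 40, 70, 108  # VMIN and MEDIAN are hypothetical

for value in (40, 70, 75, 108):
    print(value, round(diverging_position(value, VMIN, MEDIAN, VMAX), 2))
```

With this mapping, 75 lands at about 0.13 and 108 at 1.0, so the two would get clearly different colours.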
  • 48. Categorical data Also known as nominal or qualitative. Colours should not form a pattern, as this can imply a false relationship. The ethnicity colours here are reasonable, although quite close in colour space. The location colours are badly chosen. They suggest a linear progression, which is meaningless. http://www.visualizing.org/full-screen/10886
  • 50. Other considerations ● Colour blindness (nearly 10% of men, rare in women) ● Print and photocopy friendliness ● Characteristics of different screens, esp. projectors. ColorBrewer is a great help: http://colorbrewer2.org/ See also… ● brewer2mpl (Python) ● RColorBrewer (R) ● ColorBrewer (Matlab)
  • 51. Bad dataviz 5. Correlation vs. causation http://xkcd.com/552/
  • 52. Beware of bogus correlations http://gizmodo.com/5977989/internet-explorer-vs-murder-rate-will-be-your-favorite-chart-today/ and http://pubs.acs.org/doi/abs/10.1021/ci700332k Correlation does not prove causation, even with a good R² score.
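Any two quantities that both drift steadily over the same period will produce a high R², whether or not they have anything to do with each other. The two series below are invented for illustration (they are not the data behind the linked charts) and simply both decline year on year.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical yearly figures for two unrelated, steadily falling quantities.
browser_share = [70, 65, 61, 55, 52, 46, 41, 37]
crime_rate = [17.0, 16.6, 16.1, 15.8, 15.1, 14.9, 14.2, 13.9]

r_squared = pearson_r(browser_share, crime_rate) ** 2
print(round(r_squared, 3))
```

The shared variable here is simply time. The high score says nothing about one series driving the other.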
  • 53. Beware of bogus correlations Even respectable journals sometimes get carried away. Ask yourself: Are these both effects of a common cause? Or just sheer chance? (Multiple comparisons) http://www.nejm.org/doi/full/10.1056/NEJMon1211064
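The multiple-comparisons effect can be simulated: compare one random series against enough other random series and a strong-looking correlation turns up by chance. The series length, the number of candidates and the seed below are arbitrary choices for this sketch.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
N_POINTS, N_CANDIDATES = 10, 1000

# One "outcome" and many unrelated "explanations", all pure noise.
target = [rng.gauss(0, 1) for _ in range(N_POINTS)]
candidates = [
    [rng.gauss(0, 1) for _ in range(N_POINTS)] for _ in range(N_CANDIDATES)
]

best = max(abs(pearson_r(target, series)) for series in candidates)
print(round(best, 2))
```

With ten points per series, |r| above about 0.63 would pass a conventional 5% significance test, and the best of a thousand noise series clears that bar comfortably.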
  • 54. Bad dataviz 6. Trying to say too much Each visualization needs a clear purpose. But some designers and analysts try to include every possible piece of information. This is not a good idea. Unnecessary detail and ostentatiously “clever” presentation can obscure the real message.
  • 56. Bad dataviz 7. Tips for developing a critical eye Here are some techniques you can use for critical analysis. They are often subjective, debatable, context-dependent and partly based on aesthetics… So don’t expect absolute rules.
  • 57. Usability Does the chart need detailed instructions in order for it to be comprehensible and usable? ● Acceptable if this is a standard visualization method used in a particular domain ● Less acceptable if this is a one-off for general consumption
  • 58. First impressions test What is the first thing you infer from looking at the visualization? (Don’t stop to read every detail -- see what you get from a glance.) Does this impression prove to be accurate, on closer inspection? If not, then there may be a problem. Many people will only glance and never perform the close inspection.
  • 59. Return on effort (Kaiser Fung) http://bit.ly/1dKewGo
  • 60. Self-sufficiency test (Kaiser Fung) Would the chart make sense without the numbers printed on each data point? If not, the chart has failed the self-sufficiency test. http://junkcharts.typepad.com/junk_charts/2013/03/blowing-the-whistle-at-bubble-charts.html
  • 61. Trifecta checkup (Kaiser Fung) Ask the following: ● What practical question does the graphic attempt to address? ● What answer does the data imply? ● What answer does the graphic imply? Can you answer these clearly? Do the three answers align? If not, there is something wrong. http://junkcharts.typepad.com/junk_charts/2014/02/pets-may-need-shelter-from-this-terrible-chart.html
  • 62. Data-ink score (Edward Tufte) Main principle: Remove redundant or uninformative elements from the design, to reduce distraction. High data-ink ratio = clarity. http://www.infovis-wiki.net/index.php/Data-Ink_Ratio
  • 63. And finally… Ask yourself how much you trust the data. Professional presentation does not imply reliable numbers. Is there enough data to be sure of statistical significance? What are the margins of error? Is there a plausible mechanism of action? What about sources of bias (accidental or intentional), confounding factors, missing data, or measurement error (noise)?
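For the margin-of-error question, a common back-of-the-envelope check for a reported percentage is the normal-approximation interval for a proportion. The sketch assumes a simple random sample and a 95% confidence level (z = 1.96); real surveys with weighting or clustering will have wider margins.

```python
from math import sqrt


def margin_of_error(proportion, sample_size, z=1.96):
    """Approximate 95% margin of error for a sample proportion
    (normal approximation, simple random sample assumed)."""
    return z * sqrt(proportion * (1 - proportion) / sample_size)


for n in (100, 1000, 10_000):
    print(n, f"±{margin_of_error(0.5, n):.1%}")
```

A chart that highlights a two-point difference between groups of 100 respondents each is showing noise: the margin on each figure is close to ten points.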