2. About me
● Web technology manager (EEA)
● M.Sc. in Computer Science
(Lund University, SWE)
● Surveyor (ITA)
● 15 years in IT and web development
(programming and project management)
● Junior Researcher: Machine vision for
surveillance cameras at Axis
● E-commerce websites for telecom industry
● Product Owner of DaViz and many powerful Plone Add-ons
● Technical manager for the EEA main portal and CMS
● Data Visualisation, Data Science, Open Data, Statistics, Semantic Web,
Linked Data, Usability and User Experience, Artificial Intelligence,
Agile/Lean management…
demarinis@eea.europa.eu
7. Remove legend when not needed
There is no need to have a legend when there
is only one data category shown. What is
measured can be added to the title or axis.
8. Avoid pie charts and donuts
The human mind thinks linearly: we
can easily compare lengths/heights of
line segments but when it comes to
angles and areas most of us can't
judge them well.
11. Correlation does not imply causation
● see also "Superimposing time series is the biggest source of silly
theories"
Per capita consumption of cheese correlates with number of people who died by becoming tangled in
their bedsheets
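A minimal sketch of why superimposed time series mislead: two series that merely share an upward trend correlate almost perfectly. The numbers are made up for illustration (they are not the cheese or bedsheet data), and comparing year-to-year changes instead of levels is one common sanity check, not a method prescribed by this deck.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def first_differences(xs):
    """Change from one period to the next, which removes a shared trend."""
    return [b - a for a, b in zip(xs, xs[1:])]


# Hypothetical, unrelated quantities that both happen to grow over time.
series_a = [10.0, 11.2, 11.9, 13.1, 14.0, 15.2, 15.8, 17.1]
series_b = [200.0, 210.0, 222.0, 231.0, 242.0, 254.0, 264.0, 275.0]

r_levels = pearson(series_a, series_b)
r_changes = pearson(first_differences(series_a), first_differences(series_b))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

The levels correlate strongly only because both series rise; once the trend is removed, the relationship largely disappears.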
13. The map on the right is trying to
show too much information at once.
Moreover, the data would be much
easier to compare with a basic bar
chart (below).
14. It is difficult to compare bar charts placed on a map, since they are not aligned. A plain bar chart
would make comparing countries much easier and more precise.
15. Countries with a relatively small area are hidden, while countries with a large area are
made more prominent (intentional?). Is a country’s area really relevant
here? Is the geographic distribution important? How can the values be compared properly?
16. Colors
● Different colors should be used for
different categories (e.g.,
male/female, types of fruit), not
different values in a range (e.g., age,
temperature).
● Do not use rainbows for range values
● If you want color to show a numerical
value, use a range that goes from
white to a highly saturated color in
one of the universal color categories.
no rainbows
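As a sketch of the last bullet, a numerical value can be mapped onto a ramp that runs from white to a single saturated color. The function name, the default blue and the linear interpolation in RGB are all assumptions chosen for illustration; charting libraries ship their own sequential palettes.

```python
def sequential_color(value, vmin, vmax, saturated_rgb=(0, 90, 200)):
    """Map a number to a hex color on a ramp from white to saturated_rgb.

    saturated_rgb is a hypothetical default (a saturated blue).
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    # Position of the value on the ramp, clamped to the range [0, 1].
    t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))
    # Interpolate each channel between white (255) and the saturated color.
    channels = [round(255 + (c - 255) * t) for c in saturated_rgb]
    return "#{:02x}{:02x}{:02x}".format(*channels)


# Lowest value is white, highest value is the fully saturated color.
print(sequential_color(0, 0, 100))
print(sequential_color(100, 0, 100))
```

Because only lightness and saturation change along the ramp, larger values read as "more" without the reader needing a legend to decode an arbitrary hue order.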
18. Don’t forget 7%-10% of
your male audience
(color deficiency)
Figure: the original chart next to what color-deficient people see
Use Vischeck to test your images. If the chart is
also readable in black and white, then it is even better!
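The black-and-white check can be roughed out in code. This sketch converts each palette color to a gray level with the common Rec. 601 luma weights and reports the smallest gap between any two colors; a tiny gap means two categories would be hard to tell apart in grayscale. It does not simulate color deficiency the way Vischeck does, and the function names and example palettes are hypothetical.

```python
def gray_level(rgb):
    """Approximate gray level (0-255) of an RGB color, Rec. 601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def smallest_gray_gap(palette):
    """Smallest difference in gray level between any two palette colors."""
    levels = sorted(gray_level(color) for color in palette)
    return min(b - a for a, b in zip(levels, levels[1:]))


# Hypothetical palettes: a red/green pair and a light/dark pair.
red_green = [(255, 0, 0), (0, 130, 0)]
light_dark = [(230, 230, 230), (40, 40, 120)]

print(smallest_gray_gap(red_green))   # very small: nearly identical in gray
print(smallest_gray_gap(light_dark))  # large: clearly distinct in gray
```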
19. Choose your chart type wisely
Online tools like the Data Visualization
Catalogue or a decision diagram [2006,
A. Abela] help you find the right chart for
your data.
20. Data provenance, trust, legitimacy
● Adding data source information lends credibility
to your chart and builds trust in it
● When adding source info to your chart, distinguish
data source info from figure source info
● Disclose who financed the data visualisation work and
data collection
● Disclose your data and methodology -> reproducible
and verifiable
22. Show the level of confidence, build trust
Ask these questions before publishing your chart, and be
prepared for critique:
1. What was the source of your data?
2. How well do the sample data represent the population?
3. Does your data distribution include outliers? How did they
affect the results?
4. What assumptions are behind your analysis? Might certain
conditions render your assumptions and your model invalid?
5. Why did you decide on that particular analytical approach?
What alternatives did you consider?
6. How likely is it that the independent variables are actually
causing the changes in the dependent variable? Might other
analyses establish causality more clearly?
24. Typical statistical error - EU trends
See online example
It is not statistically correct to make a trend analysis of data across time
when the data in question (or sample) is not representative of the whole.
E.g. EU12 is not representative of EU25 or EU28, so EU12 data cannot
be used to state a trend for the entire EU as it is in 2014: the EU has changed!
very important info!
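One way to avoid this error, sketched under assumptions: compare like with like by restricting the trend to the members that report a value in every year. The placeholder members "A", "B", "C" and all figures are invented, and this like-for-like restriction is an illustration rather than the method used in the online example.

```python
def naive_totals(values_by_year):
    """Sum whatever members are present in each year (group changes over time)."""
    return {year: sum(values_by_year[year].values())
            for year in sorted(values_by_year)}


def like_for_like_totals(values_by_year):
    """Sum only the members that have a value in every year."""
    if not values_by_year:
        raise ValueError("values_by_year must not be empty")
    common = set.intersection(*(set(v) for v in values_by_year.values()))
    totals = {year: sum(values_by_year[year][member] for member in common)
              for year in sorted(values_by_year)}
    return totals, common


# Hypothetical data: member "C" only joins the group in the later year.
example = {
    2000: {"A": 10, "B": 20},
    2014: {"A": 12, "B": 18, "C": 30},
}

print(naive_totals(example))          # total doubles, but only because "C" joined
print(like_for_like_totals(example))  # constant group shows no change
```

The like-for-like series describes only the constant group; it still says nothing about the members left out, so the chart title should name the group it covers.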
25. Typical statistical error - including no data
See online example
We cannot say “20.9% of our colleagues are male”. We can only say “20.9% of the sample
we met are male”, which does not say much about the entire population (the entire
staff).
26. Typical statistical error - including no data
See online example
If we have used a proper sampling technique, e.g. randomly selecting the staff, we have a
sample (580 people) that is representative of the whole (1000 people) with a 95%
confidence level and a margin of error of 2.64%.
We can now say that 39.7% ± 2.64% of the people at our work are male, with a confidence level
of 95%, and that is a big difference from what we said on the previous slide (20.9%)!
https://www.checkmarket.com/market-research-resources/sample-size-calculator/
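The 2.64% figure can be reproduced with the standard margin-of-error formula for a proportion, including the finite population correction. This is a sketch under stated assumptions: a z-value of 1.96 for 95% confidence and the worst-case proportion of 0.5. Whether the linked calculator uses exactly this formula is not confirmed here; the function name is hypothetical.

```python
from math import sqrt


def margin_of_error(sample_size, population_size, z=1.96, proportion=0.5):
    """Margin of error for a proportion from a simple random sample.

    Assumes z=1.96 (95% confidence) and the worst-case proportion 0.5.
    """
    standard_error = sqrt(proportion * (1 - proportion) / sample_size)
    # Finite population correction: the sample is a large share of the staff.
    correction = sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * correction


moe = margin_of_error(sample_size=580, population_size=1000)
print(f"{moe * 100:.2f}%")  # about 2.64%
```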
27. Show the level of confidence
Tell your audience how confident you are in your assertions.
Include error bars any time you use data to make an argument.
source: The importance of uncertainty, Berkeley Science
Review. http://sciencereview.berkeley.edu/importance-uncertainty/
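To make the error-bar advice above concrete, here is a minimal sketch that turns a set of measurements into a mean and an error-bar half-width. It assumes a normal approximation with z=1.96 for a 95% interval, which is rough for small samples; the measurements are invented and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_with_error_bar(values, z=1.96):
    """Return (mean, half-width of an approximate 95% confidence interval)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate spread")
    center = mean(values)
    # Standard error of the mean, using the sample standard deviation.
    half_width = z * stdev(values) / sqrt(n)
    return center, half_width


# Hypothetical measurements.
center, half_width = mean_with_error_bar([2, 4, 4, 4, 5, 5, 7, 9])
print(f"{center} ± {half_width:.2f}")
```

The two numbers map directly onto a bar height and the length of the whisker drawn above and below it.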
28. Get it professionally reviewed
Have a statistician review
your analysis and your
representation. You will
be surprised by how
many corrections and
improvements you can
achieve.
29. Welcome to data science!
source: http://sciencereview.berkeley.edu/article/first-rule-data-science/
30. I shall not use visualization
to intentionally hide or
confuse the truth which it is
intended to portray. I will
respect the great power
visualization has in garnering
wisdom and misleading the
uninformed. I accept this
responsibility willfully and
without reservation, and
promise to defend this oath
against all enemies, both
domestic and foreign.
hippocratic oath for
data scientists
VisWeek2011, Jason Moore, A code for ethics for data visualisations
professionals