This document contains summaries and links from an online data journalism course. It discusses techniques for cleaning data, visualizing data in charts and maps, using tools like Google Refine and Tableau, and mashing up multiple data sources. The document provides advice on key things to know about each topic and links to online resources for practicing and learning more about data journalism skills.
8. “With the help of just Benford’s law and data sets to compare he’s able to demonstrate how the police are systematically hiding over a thousand murders a year in a single state, and that’s just in one small part of the article”
- Pete Warden
Monday, 5 March 2012
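Benford’s law predicts that, in many data sets spanning several orders of magnitude, a value’s leading digit d turns up with probability log10(1 + 1/d). That means about 30% of values start with 1 and under 5% start with 9. The following is a minimal sketch of the first-digit comparison, with function names that are illustrative only. It is not the analysis from the article Warden describes.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading non-zero digit of a non-zero number, read from scientific notation."""
    return int(f"{abs(x):.14e}"[0])


def first_digit_shares(values):
    """Observed share of each leading digit 1-9; zeros are skipped."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


def compare(values):
    """Print observed vs expected first-digit shares side by side."""
    expected = benford_expected()
    observed = first_digit_shares(values)
    for d in range(1, 10):
        print(f"{d}: observed {observed[d]:6.1%}  expected {expected[d]:6.1%}")
```

For example, you would call `compare(counts)`, where `counts` is a hypothetical list of figures taken from a spreadsheet column. A poor fit gives you a reason to ask questions, but it does not prove anything by itself.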
10. 5 things you need to know about cleaning data
1. Data always needs cleaning up
2. Treat the ‘source’ like a source
3. Use the right ‘average’ and percentage
4. Watch for changing context: inflation, boundaries, classification
5. Always work on copies of raw data
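Points 3 and 4 come down to arithmetic, and the sketch below illustrates both. First, a mean versus a median on skewed data. Second, a change in percentage points versus a percentage change. Third, restating an old amount in today’s money with a price index. All salaries and index values are hypothetical.

```python
from statistics import mean, median


def percent_change(old, new):
    """Relative change, as a percentage of the old value."""
    return (new - old) / old * 100


def point_change(old_pct, new_pct):
    """Absolute change between two percentages, in percentage points."""
    return new_pct - old_pct


def to_real(nominal, index_then, index_now):
    """Restate a past amount in today's money using a price index (e.g. CPI)."""
    return nominal * index_now / index_then


# Hypothetical salaries: a single very large value drags the mean upwards
salaries = [21_000, 23_000, 24_000, 26_000, 250_000]
print("mean:", mean(salaries), "median:", median(salaries))

# A rate rising from 4% to 6% is a 2-point rise but a 50% increase
print(point_change(4, 6), "points;", percent_change(4, 6), "% increase")

# Hypothetical index values: 80 in the earlier year, 100 now
print(to_real(100, 80, 100))
```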
12. “What the Independent have done is confuse the UK’s deficit with our debt [making] the debt problem look around eight times worse than it is. And it used the whole of its front page to do so.”
- James Ball
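A deficit is a flow: the amount by which spending exceeds income in a single year. Debt is a stock: the accumulated borrowing still owed. The sketch below makes a simplifying assumption that debt grows by exactly each year’s deficit, and it uses made-up round numbers rather than real UK figures. Under those assumptions, it shows how far apart the two measures sit.

```python
from itertools import accumulate


def debt_path(opening_debt, deficits):
    """Simplified model: each year's deficit (new borrowing) is added to the stock of debt."""
    return list(accumulate(deficits, initial=opening_debt))[1:]


# Hypothetical round numbers in £bn, not real UK figures
deficits = [150, 140, 120]
path = debt_path(800, deficits)
print("deficit each year:", deficits)
print("debt at end of each year:", path)
```

With these invented numbers, the debt ends up about ten times the latest deficit. Reporting one figure as if it were the other therefore distorts the story by roughly an order of magnitude.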
14. Question?
A town has two hospitals. Hospital A is bigger than Hospital B. In one of them, 60% of the babies born are boys. Which one is it more likely to be?
15. Question?
The smaller hospital is more likely to record 60% boys: larger samples are more stable, so the bigger hospital’s share of boys stays closer to the usual figure of about 50%.
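The answer can be checked with a binomial model. The sketch assumes each birth is independently a boy with probability 0.5. The slides give no hospital sizes, so the daily birth counts of 45 and 15 used below are assumptions.

```python
from math import comb


def prob_at_least(n_births, share_percent=60, p_boy=0.5):
    """Chance that at least share_percent% of n_births are boys, assuming each birth
    is independently a boy with probability p_boy (a binomial model)."""
    k_min = (share_percent * n_births + 99) // 100  # integer ceiling of the threshold
    return sum(
        comb(n_births, k) * p_boy**k * (1 - p_boy) ** (n_births - k)
        for k in range(k_min, n_births + 1)
    )


# Hypothetical daily births: 45 at the bigger hospital A, 15 at the smaller hospital B
for name, births in [("A (bigger)", 45), ("B (smaller)", 15)]:
    print(name, round(prob_at_least(births), 3))
```

On a given day, the smaller hospital comes out at around 0.30. The bigger one comes out well under half that.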
17. What is the data worth?
Measurement doesn't answer anything if there's only one variable
Statistical significance
Sample size and selection
Controls and the placebo effect
Regression to the mean
Read up.
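Regression to the mean is easy to see in a simulation. Suppose you measure the same things twice with random noise and pick out the top scorers from the first round. On average, those top scorers will do worse in the second round even though nothing has changed. This is one reason controls matter. The sketch below is a minimal illustration; the distribution parameters (true values around 50, noise of 10) are arbitrary assumptions.

```python
import random
from statistics import mean


def regression_to_mean_demo(n=10_000, seed=1):
    """Measure the same hypothetical people twice with independent noise and
    compare the top 10% on the first measurement with their second result."""
    rng = random.Random(seed)
    true_values = [rng.gauss(50, 10) for _ in range(n)]
    first = [t + rng.gauss(0, 10) for t in true_values]
    second = [t + rng.gauss(0, 10) for t in true_values]
    cutoff = sorted(first)[int(0.9 * n)]  # threshold for the top 10% of first scores
    top = [i for i in range(n) if first[i] >= cutoff]
    return mean(first[i] for i in top), mean(second[i] for i in top)


first_avg, second_avg = regression_to_mean_demo()
print(f"top group, first measurement:  {first_avg:.1f}")
print(f"top group, second measurement: {second_avg:.1f}")
```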
18. Getting data ready to answer questions
Data > Text to Columns (Excel) or =SPLIT (Google Sheets)
Find & replace
=IF(condition, if met, if not)
=TRIM, =CONCATENATE
=RIGHT, =LEFT, =MID
=REPLACE, =SUBSTITUTE
=LEN
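For anyone moving from spreadsheets to scripts, the sketch below gives rough Python counterparts of the functions listed above. The helper names are hypothetical, and they ignore some spreadsheet options; for example, Google Sheets’ SPLIT can also drop empty pieces. Note that spreadsheet positions count from 1, whereas Python indexes from 0.

```python
def split(text, delimiter):
    """Rough counterpart of =SPLIT / Text to Columns: one cell becomes several."""
    return text.split(delimiter)


def trim(text):
    """Like =TRIM: strip outer spaces and collapse inner runs of whitespace to one space."""
    return " ".join(text.split())


def left(text, n):
    """Like =LEFT: the first n characters."""
    return text[:n]


def right(text, n):
    """Like =RIGHT: the last n characters."""
    return text[-n:] if n > 0 else ""


def mid(text, start, n):
    """Like =MID: n characters from position start (counting from 1)."""
    return text[start - 1:start - 1 + n]


def substitute(text, old, new):
    """Like =SUBSTITUTE: replace by matching content."""
    return text.replace(old, new)


def replace(text, start, n, new):
    """Like =REPLACE: replace n characters by position, counting from 1."""
    return text[:start - 1] + new + text[start - 1 + n:]


raw = "  Smith,   John  "
last, first = split(trim(raw), ", ")
full_name = first + " " + last  # =CONCATENATE(B1, " ", A1)
label = full_name if len(full_name) > 0 else "MISSING"  # =IF(LEN(C1)>0, C1, "MISSING")
print(label)
```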
19. Walkthrough: cleaning data in Google Refine
Edit cells > Common transforms
Edit cells > Split multi-valued cells
Facet > Text facet
Export...
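The sketch below reproduces the same three Refine steps on a single hypothetical column. It is a simplification: Refine splits multi-valued cells into new rows while keeping the other columns, whereas this code flattens just one list. The text facet is what exposes spelling variants so they can be merged.

```python
from collections import Counter


def clean_column(cells, separator=";"):
    """Mimic three Google Refine steps on a single hypothetical column of text."""
    # Edit cells > Common transforms > Trim leading and trailing whitespace
    trimmed = [c.strip() for c in cells]
    # Edit cells > Split multi-valued cells: one value per row
    values = [part.strip() for c in trimmed for part in c.split(separator)]
    # Facet > Text facet: count each distinct value to spot variants
    return values, Counter(values)


cells = [" Leeds ", "Leeds", "leeds;Bradford", "Bradford ", "York"]
values, facet = clean_column(cells)
print(facet)  # "Leeds" and "leeds" show up as separate entries

# Common transforms > To titlecase merges the variants the facet exposed
print(Counter(v.title() for v in values))
```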
22. 5 things you need to know about visualising data
1. Choose the chart for the purpose
2. For answers or for story?
3. Good design is when there’s nothing more to take away
4. It should be self-contained & have refs
5. Be careful with scales and classes
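Point 5 matters most on shaded maps, where each value is sorted into a class. The sketch below applies two common ways of choosing class breaks (equal intervals and a simple quantile method) to the same hypothetical rates. The two methods can put the same area at opposite ends of the scale.

```python
from bisect import bisect_right


def equal_interval_breaks(values, k):
    """Split the range min..max into k classes of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def quantile_breaks(values, k):
    """Simple quantile breaks: roughly the same number of values in each class."""
    s = sorted(values)
    return [s[len(s) * i // k] for i in range(1, k)]


def classify(value, breaks):
    """0-based class index; a value equal to a break goes into the upper class."""
    return bisect_right(breaks, value)


rates = [1, 2, 2, 3, 3, 4, 5, 6, 20, 40]  # hypothetical rates per area
eq = equal_interval_breaks(rates, 3)
q = quantile_breaks(rates, 3)
print("equal interval:", eq, "-> a rate of 6 is in class", classify(6, eq))
print("quantile:      ", q, "-> a rate of 6 is in class", classify(6, q))
```

Here a rate of 6 lands in the lowest class under equal intervals and the highest class under quantiles. This is why a map should state how its classes were chosen.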
29. Visualisation tools
ManyEyes, Tableau, Number Picture
Wordle, Tagxedo
BatchGeo, FusionTables
Gephi
Delicious.com/paulb/vis+tools
30. Distribution: getting social
Publish embed code & link to data
Have or join a Flickr group for visualisations; comment on others’ work
Tumblr blog
Digg, Reddit, StumbleUpon
BuzzData
32. 5 things you need to know about mashing data
1. It is what a journalist does best
2. Look for a point of connection: place? Person? Company? Date? Code?
3. Mashups can be live, updated or static
4. What an API can do
5. What APIs there are
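A mashup depends on the point of connection from point 2. Two data sets can only be joined once their shared key (here, a place name) is written the same way in both. The sketch below is a minimal inner join with all figures hypothetical. It normalises the key before matching and reports the rows that found no partner, since those are often where the story or the errors are.

```python
def normalise(key):
    """Make join keys comparable: lower case, single spaces, no outer whitespace."""
    return " ".join(str(key).lower().split())


def mash(left_rows, right_rows, key):
    """Inner-join two lists of dicts on a shared field; also report unmatched rows.
    If right_rows repeats a key, the last one wins."""
    index = {normalise(r[key]): r for r in right_rows}
    joined, unmatched = [], []
    for row in left_rows:
        match = index.get(normalise(row[key]))
        if match is None:
            unmatched.append(row[key])
        else:
            joined.append({**match, **row})  # keep the left-hand spelling of the key
    return joined, unmatched


# Hypothetical figures: crimes and population by area
crimes = [
    {"area": "Leeds", "crimes": 500},
    {"area": "bradford ", "crimes": 450},
    {"area": "Wakefield", "crimes": 100},
]
population = [{"area": "leeds", "population": 10_000}, {"area": "Bradford", "population": 6_000}]

joined, unmatched = mash(crimes, population, "area")
for row in joined:
    print(row["area"].strip(), row["crimes"] * 1000 / row["population"], "per 1,000")
print("no match for:", unmatched)
```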
34. Mashup tools
Yahoo! Pipes, xFruits
OpenHeatMap
Mapalist, Maptube, FusionTables
ScraperWiki
Google Refine
35. Walkthrough: grabbing geo data with Google Refine
Edit column > Add column by fetching URLs
Use GREL (Google Refine Expression Language)
Search web for help & examples
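“Add column by fetching URLs” builds one URL per cell, fetches it, and stores the response. In Refine, GREL functions such as `escape(value, "url")` build the URL and `parseJson()` reads the result. The Python sketch below follows the same three steps. The endpoint, its parameters, and the `{"lat": ..., "lng": ...}` response shape are all hypothetical, so swap in the real geocoding service you use and respect its rate limits.

```python
import json
import urllib.request
from urllib.parse import quote

# Hypothetical geocoding endpoint: swap in the real service and its parameters
BASE_URL = "https://geocoder.example.com/search?format=json&q="


def geocode_url(place):
    """Build the URL to fetch for one cell, URL-encoding the place name."""
    return BASE_URL + quote(place)


def fetch(url):
    """Fetch the response body as text (what Refine stores in the new column)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def parse_latlng(response_text):
    """Read lat/lng from JSON, assuming a hypothetical {"lat": ..., "lng": ...} shape."""
    data = json.loads(response_text)
    return float(data["lat"]), float(data["lng"])


print(geocode_url("Leeds, UK"))
# With a real endpoint: parse_latlng(fetch(geocode_url("Leeds, UK")))
print(parse_latlng('{"lat": "53.8", "lng": "-1.55"}'))  # canned example response
```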
38. Lab
Before the lab: play with these techniques yourself, have problems, find solutions, raise questions. Install Google Refine and Tableau on your laptop.
- Visualise, interrogate or mash data
39. Books
Kaiser Fung - Numbers Rule Your World
Ben Goldacre - Bad Science
Dona M. Wong - The Wall Street Journal Guide to Information Graphics
Brian Suda - A Practical Guide to Designing with Data