From a presentation I gave to the inaugural meeting of the Hacks & Hackers Ottawa chapter. It's a general survey of data journalism (née computer-assisted reporting).
6. useful data (now) • tweets • geo-tagged images • foursquare updates • facebook status updates • anything online
10. why use data? • no more “according to…”: state as fact, don’t attribute • uncovers stories even the subjects don’t know • confirms the obvious; reveals the unexpected
33. other ideas • overpass inspection records • mayoralty campaign donors • ambulance response times • health/safety reports in city-run apartments • complaints against taxi drivers.
34. other ideas • single men/women by census tract • day with the most marriage licenses • pets by postal code
35. data sources • inspection reports • complaints • incident reports • discipline records • registrations and licenses • check reporting requirements
36. data journalism: inverted pyramid of aggravation • obtaining data (hard) • formatting data • analyzing data • reporting • writing (easy)
37. where to get data • ask for it • download it • scrape it * • build from documents • FOI or ATIP
38. finding the story • think vertically • look at columns • cross-tab columns • chart over time • look for patterns • dig down from data
40. hacks are good at • discerning news from info • interviewing subjects • providing context • writing • offering a big platform
41. hackers are good at • obtaining data • processing it • analyzing it • building better platforms to present it
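The "cross-tab columns" step from the finding-the-story slide can be sketched in a few lines of Python. This is only an illustration: the street names come from the talk, but the ticket rows, the field names and the `crosstab` helper are all invented for the example.

```python
from collections import Counter

# Hypothetical ticket records; the rows and field names are made up.
tickets = [
    {"street": "Lisgar", "day": "Mon"},
    {"street": "Lisgar", "day": "Tue"},
    {"street": "Lynda Lane", "day": "Mon"},
    {"street": "Lisgar", "day": "Mon"},
]

def crosstab(rows, row_key, col_key):
    """Count how often each (row value, column value) pair occurs."""
    counts = Counter((r[row_key], r[col_key]) for r in rows)
    row_vals = sorted({r[row_key] for r in rows})
    col_vals = sorted({r[col_key] for r in rows})
    # Counter returns 0 for pairs that never occur, so empty cells show as 0.
    return {rv: {cv: counts[(rv, cv)] for cv in col_vals} for rv in row_vals}

table = crosstab(tickets, "street", "day")
for street, by_day in table.items():
    print(street, by_day)
```

Swapping in a date column for `day` gives the "chart over time" view from the same slide; a spreadsheet pivot table does the same job.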
Editor's notes
Predates use of the internet in newsrooms. Everyone who does a Google search is computer assisted. We are using electronic data to do reporting. Conventional sources were…. First-hand description, interview-based, document-based.
CAR is a misnomer. Predates use of the internet in newsrooms. Everyone who does a Google search is computer assisted. We are using electronic data to do reporting.
- Just about anything large organizations (and governments) do gets put into a database somewhere. Email is a form of database. Maps are graphic representations of data. Census stuff.
Each line is a separate record that represents a single gun somewhere in Canada. Record level data. Not summary data. Summary data would be “21 per cent of handguns in Canada are registered in Quebec.”
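A rough sketch of the difference between record-level and summary data, in Python. The rows below are invented (one dict per gun, with made-up field names), and `share_by_province` is a hypothetical helper; the point is only that a summary figure can always be computed from records, never the other way around.

```python
from collections import Counter

# Hypothetical record-level rows: one dict per registered firearm.
# Field names and values are invented for illustration only.
records = [
    {"type": "handgun", "province": "QC"},
    {"type": "handgun", "province": "ON"},
    {"type": "rifle", "province": "QC"},
    {"type": "handgun", "province": "QC"},
    {"type": "handgun", "province": "BC"},
]

def share_by_province(rows, gun_type):
    """Collapse records into a summary: percent of one gun type per province."""
    matching = [r["province"] for r in rows if r["type"] == gun_type]
    counts = Counter(matching)
    total = len(matching)
    return {prov: round(100 * n / total, 1) for prov, n in counts.items()}

summary = share_by_province(records, "handgun")
print(summary)
```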
Shows the file number of a complaint. Which are most interesting? Most serious?
Govt. contracts by department and date. Party column was added after the fact.
“Ottawa Police solve fewer murders of women than police forces in eight other major Canadian cities, a Citizen analysis shows.” Parking meter on Lisgar most ticketed in the city. Lots of tickets in Byward Market, but Lynda Lane is close.
Low income people twice as likely to live near lotto dealers. Neighbourhood with most outlets per capita. How to do this story?
Black boys twice as likely to be suspended as white boys. More interesting lede?
Data works great on web. Lets readers drill down themselves so you don’t have to.
- Leads with a compelling story; the data hit comes later.
Reporter’s process on this? Complaint from a person, idea, data, analysis, back to a person with the same problem. Sources?
Everyone fills up with gas. Water-cooler value. The numbers behind the mundane things we do every day. Lots of papers have done this story. Nice simple lede. Sources?
More negative comments than any other story. (I’ve written about gun control and abortion)
Texas Assessment of Knowledge and Skills. We publish results of standardized testing every year. Imagine if we could show which schools cheat the most? Multiple-choice data is easy to capture by computer.
Call this guy, asking him about Ontario standardized tests. Privacy concerns easy to get around.
Huge amount of donations from two addresses. Takes abstract idea like campaign finance and reduces it down to bricks-and-mortar.
Quantifying something most people have had happen to them. Obvious: thieves hit parking lots. Less obvious: South Iowa Street is the hot zone.
Google Mash-up. Can do with Platial or code it yourself. Country club vs public course.
Guy who issued more tickets than anyone else. Crunch the numbers, then go find him and talk to him.
List of parking officers' names. Nearly got it wrong. Panic two days before publication because Raine ranked second. Two guys named Charbonneau. Figured it out with badge numbers.
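One way a mix-up like this can happen, sketched in Python: counting tickets by surname merges two different officers, while counting by a unique ID keeps them apart. The badge numbers and ticket counts here are invented, and this is an assumption about the kind of error involved, not a reconstruction of the actual analysis.

```python
from collections import Counter

# Hypothetical ticket log. Badge numbers and counts are invented; the only
# detail taken from the talk is that two officers shared a surname.
tickets = [
    {"officer": "Charbonneau", "badge": 101},
    {"officer": "Charbonneau", "badge": 101},
    {"officer": "Charbonneau", "badge": 202},
    {"officer": "Charbonneau", "badge": 202},
    {"officer": "Raine", "badge": 303},
    {"officer": "Raine", "badge": 303},
    {"officer": "Raine", "badge": 303},
]

def rank(rows, key):
    """Rank the values of `key` by number of tickets, highest first."""
    return Counter(r[key] for r in rows).most_common()

by_name = rank(tickets, "officer")   # surname merges two people
by_badge = rank(tickets, "badge")    # unique ID keeps them separate

print("Top by name: ", by_name[0])
print("Top by badge:", by_badge[0])
```

Grouping on a unique identifier rather than a name is the general habit worth keeping.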
Whenever possible, put your data on the map. Confirms obvious: parking in the Market, Lynda Lane.
Look for data stories behind the A1 or C1 news hit.
Doesn’t have to be death and despair.
Government logs everything. WHEREVER THERE IS A FORM!!!! Professional bodies… lawyers, nurses, dentists, morticians. Story about veterinarian who gave too much laughing gas to a Norwegian Blue parrot. MOVE BEYOND OPEN DATA SITES
Hard stuff at the beginning. Get many irons in the fire.