I want to share with you some of the tools we’ve been using at the Reporter for computer-assisted projects. All of them are accessible, easy to use and free, and for us they’re making an impact on how we work with information. For Where’s the Money, Corey Pein decided he wanted to do basically a Forbes-style piece: who are the richest people in Santa Fe? He actually talked to someone at Forbes to find out how they do it, and it’s basically a guesstimate based on a lot of indicators, like land and home ownership, charitable donations and political contributions. By the time he was done, he had amassed a lot of spreadsheets, because you really can’t do a story like this without working in spreadsheets. You have to be able to keep track of the information, and in some cases you’re going to need to do calculations. When he was done, we wanted to present not just the results but also some of that reporting to our readers. So we used a site we use quite often, called Socrata. Go to Socrata at http://www.socrata.com.
I know. I don’t actually love working with spreadsheets either, but there are a lot of things you will never do very well if you can’t handle working with them. Tracking campaign finance spending, or doing any sort of enterprise reporting involving lists or money or anything you want to be able to sort: you have to suck it up sometimes and work with them. We go back and forth between Excel and Google spreadsheets, and lately more Google spreadsheets, because we’ve been playing around with Google Gadgets, which I’ll talk about in a minute. But first, a few resources.
One of the really tedious things about spreadsheets shows up when you download one, let’s say of political contributions. We ran into this a lot this year: on our Secretary of State’s website you can download contributions in Excel, but the fields are a nightmare, just a total mess, and if you want to work with the data, you spend as much time cleaning up the spreadsheet as you do actually looking at it. This is where Magic/Replace comes in. I don’t know of another site like it, though there may be others, but this one has been consistently helpful. I’m just going to show you how it works.
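Magic/Replace works through a web page, but if you find yourself doing the same cleanup every week, it can also be scripted. The Python sketch below is not how Magic/Replace works internally; it assumes a hypothetical export where contributor names have stray spaces and inconsistent capitalization and amounts carry dollar signs and commas. The sample rows are made up for illustration.

```python
import re

def clean_amount(raw):
    """Turn a string like ' $1,250.00 ' into a float; return None if nothing numeric is left."""
    digits = re.sub(r"[^0-9.\-]", "", raw)  # drop $, commas, spaces and letters
    try:
        return float(digits)
    except ValueError:
        return None  # flag for manual review instead of guessing

def clean_name(raw):
    """Collapse runs of whitespace and normalize capitalization of 'LAST, FIRST' names."""
    collapsed = re.sub(r"\s+", " ", raw).strip()
    return collapsed.title()

# Hypothetical messy rows, as they might come out of a downloaded spreadsheet.
rows = [
    ("  SMITH,   JOHN ", " $1,250.00 "),
    ("doe, jane", "$75"),
]

cleaned = [(clean_name(name), clean_amount(amount)) for name, amount in rows]
print(cleaned)
```

One caveat: `str.title()` turns a name like "McDONALD" into "Mcdonald", so spot-check names before you publish anything.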
Google, evil, evil Google. Despite being evil, Google does have a lot of useful tools that we’re still playing around with. Once you’ve put together a Google spreadsheet, it’s really easy to use some of their gadgets to see whether the different visualizations they offer help you interrogate your information a little, from regular charts to motion charts to pivot tables. And that’s just the tip of the iceberg. Alexa did a heat chart this week, using a Google gadget, that tracked the changes in election spending.
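A pivot table groups rows by one field and totals another. As a rough illustration of what the pivot-table gadget computes, here is a minimal Python sketch. The candidate names, industry labels and amounts are hypothetical placeholders, not data from our reporting.

```python
from collections import defaultdict

# Hypothetical contribution rows; in practice these would come from your spreadsheet.
contributions = [
    {"candidate": "Candidate A", "industry": "Oil and gas", "amount": 5000.0},
    {"candidate": "Candidate A", "industry": "Real estate", "amount": 1500.0},
    {"candidate": "Candidate B", "industry": "Real estate", "amount": 4000.0},
    {"candidate": "Candidate A", "industry": "Oil and gas", "amount": 2500.0},
]

def pivot_sum(rows, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) pair, like a spreadsheet pivot table."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += row[value_key]
    # Convert the nested defaultdicts to plain dicts for printing and comparison.
    return {key: dict(cols) for key, cols in table.items()}

for candidate, by_industry in pivot_sum(contributions, "candidate", "industry", "amount").items():
    print(candidate, by_industry)
```

Swapping the `row_key` and `col_key` arguments flips the table, which is the same thing you do by dragging fields around in the gadget.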
There are so many mapping tools out there right now that I just chose a few that are super easy and that I think can add value to the kinds of stories our papers often do, and the kinds of stories we should do. There’s going to be a lot of new census information this year, and many of these mapping tools can be used with census data to really mine that information for stories and presentations. But the census is its own conversation, so let’s talk about parks instead.
State of Play is a story Alexa did in the fall. The idea came out of the AAN convention in Toronto: go out and rate all your city’s playgrounds, and see where the money is being spent and which ones are in disarray. Alexa did that and put together the information showing how the money gets spent in the old, wealthy neighborhoods where there aren’t really any kids, and less on the other side of town.
Oil vs. Real Estate was a story we did over the summer, looking at where the money in our gubernatorial race was coming from. We put some of that information into Socrata spreadsheets, but we also wanted to map it and see where it was coming from, so we used BatchGeo, which will take your spreadsheet and make an interactive Google map for you. If you’re already comfortable batch-geocoding data through the Google API, you don’t need this, but if you’re not, this is idiot-proof.
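BatchGeo does the geocoding for you, but it can still help to tidy the spreadsheet first, for example by rolling individual contributions up to one total per ZIP code so the map shows one marker per area. The sketch below assumes hypothetical rows with `zip` and `amount` fields and writes a plain CSV; it is only a preparation step, so check BatchGeo’s own instructions for the column layout it expects.

```python
import csv
import io
from collections import Counter

# Hypothetical contribution rows; ZIP codes are kept as strings so leading zeros survive.
contributions = [
    {"zip": "87501", "amount": 250.0},
    {"zip": "87505", "amount": 1000.0},
    {"zip": "87501", "amount": 500.0},
]

def totals_by_zip(rows):
    """Add up contribution amounts for each ZIP code."""
    totals = Counter()
    for row in rows:
        totals[row["zip"]] += row["amount"]
    return totals

def to_csv(totals):
    """Write a header plus one 'ZIP,Total' line per ZIP code, sorted by ZIP."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["ZIP", "Total"])
    for zip_code, total in sorted(totals.items()):
        writer.writerow([zip_code, f"{total:.2f}"])
    return buffer.getvalue()

print(to_csv(totals_by_zip(contributions)))
```

Keeping ZIP codes as text rather than numbers matters: a spreadsheet that treats them as numbers will silently drop leading zeros in other states’ codes.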
DocumentCloud is a project funded by the Knight News Challenge, and it was dreamed up by journalists from ProPublica and The New York Times. It’s built for journalists: a primary-source repository where you can put your documents, annotate them, edit them and publish them. On the back end it’s pretty cool, because it builds a real compendium of your documents.
So, what is data scraping? It’s basically writing code to pull information off of websites. It’s the reason I get 1,000 emails a day trying to sell me Viagra, which is super annoying. But for journalists, it’s an emerging field, because although there is a lot of information on the internet, sometimes it isn’t available to download in a form you can use. More advanced journalists with coding skills are spending a fair amount of time writing code to scrape jail websites and automating those scrapes to collect their information. I’ve included some material that will give you an introduction to the concept, but if this is brand new to you, I suggest you start with OutWit. OutWit is a Firefox extension that can make your work on the internet a lot easier. ScraperWiki is a site where you can build your own scrapers, and you can also put out requests for particular ones.
[Demo: use OutWit on the Secretary of State’s site to pull lobbyists and their organizations.]
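OutWit does this without any code, but for anyone curious what a scraper looks like underneath, here is a minimal sketch using only Python’s standard-library `html.parser`. It assumes a hypothetical page where the lobbyist list sits in a plain HTML table; the sample markup and names are invented, and a real state site will need its own adjustments. In practice you would fetch the page with `urllib.request.urlopen(url).read().decode("utf-8")` (assuming the site serves UTF-8) instead of using the inline string.

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # ignore whitespace and text outside cells
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Hypothetical page markup standing in for a downloaded lobbyist listing.
page = """
<table>
  <tr><th>Lobbyist</th><th>Organization</th></tr>
  <tr><td>Jane Doe</td><td>Example Assn.</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(page)
print(scraper.rows)
```

Each row comes back as a list, so the output drops straight into a CSV file or a spreadsheet for the kind of cleanup and sorting described earlier.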
Many Eyes application, link on Visualize This
Links to Many Eyes presentation on http://www.sfreporter.com