IRE "Better Watchdog" workshop presentation "Data: Now I've got it, what do I do with it?" Albuquerque, NM Feb 12-13, 2011
1. DATA: Now I’ve got it; what do I do with it? Tom Johnson, Managing Director, Inst. for Analytic Journalism, Santa Fe, New Mexico, USA. tom@jtjohnson.com
35. Thank you, everyone.
Editor's notes
Data In
OK, it’s downloaded. Where ya gonna save it?
Dropbox, SugarSync, Syncplicity ($$), Jungle Disk ($3 per month), ZumoDrive (2 GB = $3 per month), AeroFS, SpiderOak, MiMedia, Wuala, Quanp
Avoid MS Windows Live, SkyDrive and Mesh: more trouble than they are worth
Bookmarks: search on Tucows; Xmarks, Diigo
Goals for bookmarks: save on PC, in cloud, sync, export, share
Get data as fine-grained as possible; get it into the lowest common denominator
It is possible to build a good-looking site, integrating Flash technology if desired, while still making the underlying structured data directly available to users. A good example is our election day results files (http://elections.nytimes.com/2010/results/house). If you view the source markup for this page, which includes very sharp-looking Flash elements, you'll find an embedded URL, http://elections.nytimes.com/2010/results/house.tsv, which links to a tab-delimited file containing the data underlying the map.
--Griff Palmer
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
From: James Jennings <[email_address]>
Date: Monday, February 14, 2011
Subject: Ease of scraping this site?
To: Chris Feola <[email_address]>
1-The entire site is in Flash. I might be able to pipe some of the search data to a CSV, but not everything is searchable. This is the best job of making public data as inaccessible as possible that I have ever seen. It is a masterwork. I would call and just ask them to send it all in a spreadsheet and see what happens.
jj
Griff Palmer: “We can't talk about motive for doing the kind of stuff they're trying to do here. But we can talk about demonstrable effect, and we can talk about practical alternatives.”
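A tab-delimited file of the kind Palmer points to can be read with a few lines of code, which is what makes it a practical alternative to scraping. The sketch below is only an illustration: it parses an in-memory sample rather than downloading anything, and the column names (district, candidate, votes) and every value are invented assumptions, not the layout of the actual house.tsv file.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded .tsv file. The column
# names and values are invented for illustration only.
SAMPLE_TSV = (
    "district\tcandidate\tvotes\n"
    "District 1\tCandidate A\t1200\n"
    "District 1\tCandidate B\t950\n"
    "District 2\tCandidate C\t800\n"
)


def read_tsv(text):
    """Parse tab-delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def total_votes_by_district(rows):
    """Sum the 'votes' column for each value of the 'district' column."""
    totals = {}
    for row in rows:
        district = row["district"]
        totals[district] = totals.get(district, 0) + int(row["votes"])
    return totals


rows = read_tsv(SAMPLE_TSV)
totals = total_votes_by_district(rows)
print(totals)
```

With a real file saved to disk, the same reader would work on an open file handle in place of the in-memory string, once the actual column names are known.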
Calculating rates: e.g., 2,000 murders / 7,300,000 population ≈ 0.00027397. Multiply by 100,000: 0.00027397 × 100,000 ≈ 27.40. Murder rate per 100,000 ≈ 27.4
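The same arithmetic can be written as a small reusable function. This is a minimal sketch; the function name rate_per and its per parameter are made up for illustration, and the figures are the ones from the example above.

```python
def rate_per(events, population, per=100_000):
    """Return the number of events per `per` people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population * per


# Figures from the example: 2,000 murders in a population of 7,300,000.
murder_rate = rate_per(2_000, 7_300_000)
print(f"{murder_rate:.2f}")  # prints 27.40
```

Dividing first and multiplying last, without rounding the intermediate value, avoids the small truncation error that comes from cutting the quotient off at .0002739.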
Google Refine is a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase.
Freebase is an open, Creative Commons licensed repository of structured data of almost 20 million entities. An entity is a single person, place, or thing. Freebase connects entities together as a graph.
Ways to use Freebase:
Use Freebase's IDs to uniquely identify entities anywhere on the web
Query Freebase's data using MQL
Build applications using our API or Acre, our hosted development platform
Freebase is also a community of thousands of data-lovers, working together to improve Freebase's data. Learn how to contribute, join our mailing list, or find out more on our community page.
Fusion Tables is a service for managing large collections of tabular data in the cloud. You can upload tables of up to 100MB and share them with collaborators, or make them public. You can apply filters and aggregation to your data, visualize it on maps and other charts, merge data from multiple tables, and export it to the Web or CSV files. You can also conduct discussions about the data at several levels of granularity, such as rows, columns and individual cells.
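The three table operations named above (filter, aggregate, merge) can be illustrated in plain Python. This is not Fusion Tables code and uses no Fusion Tables API; it is only a sketch of what each operation does, and both tables and all their values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical tables, invented for illustration only.
incidents = [
    {"county": "A", "year": 2009, "count": 40},
    {"county": "A", "year": 2010, "count": 50},
    {"county": "B", "year": 2010, "count": 10},
]
populations = [
    {"county": "A", "population": 100_000},
    {"county": "B", "population": 50_000},
]

# Filter: keep only the rows for 2010.
recent = [row for row in incidents if row["year"] == 2010]

# Aggregate: total incident count per county across all years.
totals = defaultdict(int)
for row in incidents:
    totals[row["county"]] += row["count"]

# Merge: join the aggregated counts to the population table on "county".
population_by_county = {row["county"]: row["population"] for row in populations}
merged = [
    {"county": county, "count": count, "population": population_by_county[county]}
    for county, count in sorted(totals.items())
]

for row in merged:
    print(row)
```

Once counts and populations sit in one merged table, each row has what is needed to compute a per-100,000 rate.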