2. Locating the data
Obtaining the data
Evaluating the data
Working with the data
Visualizing the data
3. “Database state of mind”
Data has to exist. Where?
Online
Offline
4. Government websites
Data.gov
U.S. Census Bureau
FDIC
Missouri Data Portal
Missouri Accountability Portal
5. U.S. agency FOIA pages
Drug Enforcement Administration
NGO sites
Right-to-Know Network
OpenMissouri.org
NICAR database library
ALA state agency databases wiki
6. Commercial services
Socrata
Infochimps
Geocommons
Foreclosure Radar
Oil Price Information Service
Search Systems
Junar
7. Academic data catalogs
ICPSR
Forms
Forms.gov
Web forms
▪ Columbia parade permits
8. Records retention schedules
Reports
State auditor
U.S. Government Accountability Office
U.S. Inspectors General
9. Google advanced search
Look for data files
Look for key words
Look only on government sites
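The three search tips above can be combined into one query. The sketch below is only an illustration: it assumes Google's `filetype:` and `site:` operators, and the helper name `build_query` and the example key words are hypothetical.

```python
from urllib.parse import urlencode


def build_query(keywords, filetype=None, site=None):
    """Join key words with the optional filetype: and site: operators."""
    parts = [keywords]
    if filetype:
        parts.append(f"filetype:{filetype}")  # look for data files
    if site:
        parts.append(f"site:{site}")          # look only on certain sites
    return " ".join(parts)


# Hypothetical search: spreadsheets about bridge inspections on .gov sites.
query = build_query("bridge inspections", filetype="xls", site=".gov")
url = "https://www.google.com/search?" + urlencode({"q": query})

print(query)
print(url)
```

Swapping `xls` for `csv` or `mdb` changes which kind of data file the search looks for.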
10.
11. Data entry
In the field
At the office
Printouts/reports
Inspection forms
12. Download it
Write or request a scraper with ScraperWiki
Convert a PDF with:
CometDocs
Zamzar
Just ask for it
13. U.S. Freedom of Information Act
Passed in 1966
Amended in 1996 to include electronic records
State open-records statutes
Missouri Sunshine Law
14. Get the roadmap!
Record layout
File layout
Data dictionary
Code sheet
Metadata
Data about the data
15. Look at it immediately when you get it
Is it what you asked for/expected?
How many rows/records of data?
Is the file format OK?
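A first look like this can be scripted. The sketch below assumes the data arrived as a CSV file. The sample rows, column names and the function `first_look` are made up for illustration.

```python
import csv
import io

# Hypothetical sample standing in for a file you just received.
SAMPLE = """permit_id,applicant,date
1,Homecoming Committee,2011-10-15
2,Holiday Parade Assoc.,2011-12-03
3,,2012-03-17
"""


def first_look(handle, expected_columns):
    """Count the records and compare the header with what was requested."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    return {
        "row_count": len(rows),
        "columns_match": reader.fieldnames == expected_columns,
        # Rows in which at least one field is empty or missing.
        "rows_with_blanks": sum(
            1 for row in rows if any(not value for value in row.values())
        ),
    }


report = first_look(io.StringIO(SAMPLE), ["permit_id", "applicant", "date"])
print(report)
```

With a real file, replace `io.StringIO(SAMPLE)` with an open file handle and compare `row_count` with the number of records the agency said it sent.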
16. Does it look too good to be true?
Beware of missing information
Who collected the information?
How? What are their methods?
Why?
What is their agenda?
Who supports them financially or otherwise?
21. Always keep original file
Never overwrite data columns
Tools
Spreadsheets
Database managers
Google Refine
Programming languages
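In a programming language, the two rules on this slide might look like the sketch below: copy the file before touching it, and put cleaned values in a new column beside the original one. The file names, column names and sample rows are hypothetical, and a temporary directory is used so the example is self-contained.

```python
import csv
import shutil
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
original = workdir / "original.csv"  # hypothetical file name
working = workdir / "working.csv"    # hypothetical file name

# Made-up data standing in for the file as it was received.
original.write_text(
    "name,city\nSmith,  columbia \nJones,ST. LOUIS\n", encoding="utf-8"
)

# Rule 1: keep the original; do all work on a copy.
shutil.copy2(original, working)

with working.open(newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# Rule 2: never overwrite a data column; add a cleaned column next to it.
for row in rows:
    row["city_clean"] = row["city"].strip().title()

with working.open("w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "city", "city_clean"])
    writer.writeheader()
    writer.writerows(rows)

print(rows)
```

Keeping `city` next to `city_clean` makes it possible to check every cleaning step against what the agency actually supplied.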
22. Raw numbers, without context, rarely are interesting.
Ask: Compared to what?
23. Raw (amount) change
New - Original
Percent change
(Change / Original) x 100
Per capita rates
Per person
Per x people
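The formulas on this slide can be worked through with a few lines of code. All figures below are invented for illustration, and the function names are hypothetical.

```python
def raw_change(original, new):
    """Raw (amount) change: new minus original."""
    return new - original


def percent_change(original, new):
    """Change divided by the original value, expressed as a percent."""
    return (new - original) / original * 100


def rate_per(count, population, per=1):
    """Per capita rate: per=1 is per person, per=1000 is per 1,000 people."""
    return count * per / population


# Made-up example: burglaries rose from 400 to 500 in a city of 100,000.
change = raw_change(400, 500)
pct = percent_change(400, 500)
per_thousand = rate_per(500, 100_000, per=1000)

print(change, pct, per_thousand)
```

Note that the percent change divides by the original value, not the new one. Dividing by the new value is a common mistake.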
24. Percent of total
Individual/Total
Ratio
Apples/oranges
Averages
Mean
Median
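A small worked example can show why the choice between mean and median matters. The salary and school figures below are made up. `statistics` is Python's standard library module.

```python
from statistics import mean, median

# Made-up salaries; the last one is an outlier.
salaries = [30000, 32000, 35000, 38000, 250000]

# Percent of total: individual / total.
share_of_top = salaries[-1] / sum(salaries) * 100

# Ratio: one quantity divided by a different kind of quantity.
students, teachers = 600, 40  # made-up counts
students_per_teacher = students / teachers

average = mean(salaries)   # pulled upward by the outlier
middle = median(salaries)  # the middle value, unaffected by the outlier

print(round(share_of_top, 1), students_per_teacher, average, middle)
```

Here the mean is more than twice the median because of a single large salary. When the two differ this much, the median usually describes the typical case better.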
25. Be curious!
Cut out small slices
Spreadsheets for simple math and comparisons
Spreadsheets for pivot tables
Database managers for more robust analysis
Always ask: Is this correct?
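What a pivot table does can also be sketched in code: group the rows by one or two fields and total a number for each group. The payment records below are invented, and this is an illustration of the idea rather than a replacement for a spreadsheet or database manager.

```python
from collections import defaultdict

# Made-up payment records: (agency, year, amount in dollars).
payments = [
    ("Transportation", "2010", 1200),
    ("Transportation", "2011", 800),
    ("Health", "2010", 500),
    ("Health", "2010", 250),
    ("Health", "2011", 900),
]

# Rows are agencies, columns are years, cells are summed amounts.
pivot = defaultdict(lambda: defaultdict(int))
for agency, year, amount in payments:
    pivot[agency][year] += amount

for agency in sorted(pivot):
    print(agency, dict(pivot[agency]))

# Always ask: is this correct? The cells should add up to the grand total.
grand_total = sum(amount for _, _, amount in payments)
cell_total = sum(sum(years.values()) for years in pivot.values())
```

Comparing `cell_total` with `grand_total` is one simple way to check that no rows were dropped or counted twice.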