Data visualization in the
newsroom
{
"presented by": "carl v. lewis",
"for": "the florida times-union",
"slides": "bit.ly/NIXkOD",
"email": "carl@carlvlewis.net"
}
What is data visualization?
•Data itself is the story; standalone narrative.
•Interactive, communicative, visual.
•Ranges from simple (charts) to complex
(database-driven applications).
•Both a technique and a format.
•Both entertaining and factual.
• See: “The Many Words for Visualization”
The history of data journalism
•Grew out of the CAR
(computer-assisted reporting)
tradition
•John Snow’s 1854 cholera
map
•Has coincided with the era
of “Big Data”
On the emergence of the field of
data journalism:
•"When information was scarce, most of our efforts
were devoted to hunting and gathering. Now that
information is abundant, processing is more
important." –Philip Meyer, UNC-Chapel Hill
On the growing importance of
data-driven journalism:
•“Journalists need to be data-savvy . . . Data-driven
journalism is the future.” –Sir Tim Berners-Lee.
•“The explosion of Web-based tools and ways of
sifting through and sharing data has created
something approaching a revolution, and the
potential benefits for journalism are only just
beginning to reveal themselves.” –Matthew Ingram
What data journalism is not:
• Simply incorporating public data into your
textual narrative
• Infographics
• Illustration
• Resource-intensive
• Just about numbers and programming
• Just about making data flashy
What data journalism is:
• Visual
• Often evergreen
• Transparent – direct access to primary
source
• Credible
• Engaging
• A good business model
Hans Rosling
http://www.youtube.com/watch?v=jbkSRLYSojo
Democratization of data
journalism
• Free and open-source
tools (Google Drive,
JavaScript libraries, etc.).
• Open Data laws.
• “Anyone can do it. Data
journalism is the new punk.”
–Simon Rogers, The
Guardian
The job of the data journalist
• Part statistician, part journalist, part
programmer.
• “We're statisticians. We don't program.”
• “We’re programmers. We don’t report.”
• “We’re journalists. We don’t code.”
Notable examples of data visualization
• “Mapping America: Every City, Every Block,”
NYTimes.com.
• “Where Does My Money Go?”, Open Knowledge
Foundation.
• “Illinois school report cards,” Chicago Tribune
• “We Feel Fine,” Jonathan Harris
• “Top Secret America,” The Washington Post
News organizations to follow for
innovative data projects
What are your favorite
visualizations?
When to use data visualization:
• Showing change over time
• Comparing discrete values
• Showing connections and flows
• Showing hierarchy
• Browsing large databases
When not to use data
visualization:
• When text or multimedia tells the story better
• When you have very few data points
• When there is no statistical significance
• When a map is not a map
• When a table would do
Process of data journalism
1. Research – Think of a topic and research
the factors that influence it.
2. Find the data – Locate and retrieve relevant
public data.
3. Analysis and evaluation – Crunch numbers,
look for trends or inconsistencies.
4. Visualize – Display the data in an appropriate
form.
I. Mining public data
Research and retrieval
Research
1. Think of a topic – what factors influence it?
2. What public data might shed light on those
factors?
3. Seek out the data
Locating public data
• Thousands of public “data dumps” by
government bodies and nonprofits.
• Most commonly in delimited spreadsheet
format (look for .csv, .xls), sometimes in
XML and JSON.
• For geographic data, look for .kml or .shp
• Can be found directly at the source or by
searching by keyword
Search tips for data retrieval
• If you don’t know where to look for
your data, an initial Web search
might help.
• After your keywords, type
“filetype:XLS”, “filetype:CSV”, or
whatever the extension is of the
data you’re seeking, and you’ll see
only files of that type from across
the Web.
• If you get no results, try broadening
your search term to locate sources
that cover the general discipline (e.g.,
instead of “malaria deaths,” try
“public health data”).
Locating public data
• Federal sources: Data.gov,
Census.gov, OpenSecrets.org,
FollowTheMoney.org, USA.gov,
USGovXML.com (full federal list by
topic/agency here).
• Data catalogs such as
thedatahub.org, datamarket.com,
infochimps.org, datacatalogs.org are
good places to find non-
• Florida’s “Sunshine” law requires all state agencies
to provide open access to public records, including
data.
• Chapter 119 of the Florida Statutes mandates
that “any records made or received by any public
agency in the course of its official business are
available for inspection, unless specifically exempted by
the Florida Legislature.”
Florida public data sources
• Dozens of useful open data sources
maintained by Florida government
agencies, including
TransparencyFlorida.gov,
FloridaHasARightToKnow.com and
MyFlorida.gov
• Full list of state-maintained databases
by topic here.
• A few state-maintained databases
worth mentioning: the Division of
Elections’ campaign finance data, the
DOE’s test score reports and the
Department of Law Enforcement’s
arrest and officer reports.
Florida public data sources
• A number of advocacy groups also maintain useful,
downloadable statewide databases:
• FloridaOpenGov.org, which focuses on public employee
payroll data.
• FloridaRedistricting.org, which provides demographic
data (.csv) and geographic polygons (.shp) for new
district boundaries.
• Florida Housing Data Clearinghouse, which provides
regularly updated property values, housing data (.xls).
(for even more, see my semi-exhaustive list with descriptions here).
Georgia public data sources
• Although Georgia has no law
requiring all government agencies
to make public data accessible
online, many do anyway.
• In 2008, the Transparency in
Government Act expanded the
public data site,
Open.Georgia.gov, to include all
three branches of government,
regional education service
agencies, local boards of
education, and transactions made
by the General Assembly.
Georgia public data sources
• A comprehensive list of downloadable databases from
state agencies in Georgia can be found here.
• The State Ethics Commission has made all campaign
finance reports, lobbyist reports and campaign
contributions available in downloadable spreadsheets.
• OASIS provides a set of web-based tools to browse the
Georgia Department of Public Health’s Data Warehouse,
and download the data yourself if you wish.
Locating geographic data
• Most geographic data available
as TIGER/Line Shapefile
packages (archives
containing .shp, .dbf, .prj, .xml,
.shx) from U.S. Census Bureau.
• Google also hosts a directory
of .kml files for most geographic
boundaries here.
• Alternatively, Florida and
Georgia GIS data can be found
at FGDL.org, Geoplan and
Data.GeorgiaSpatial.org.
What to look for
• Most numeric spreadsheet data comes either as a comma-separated value
(.csv) or Microsoft Excel (.xls) file. Example of .csv structure:
"Name","Date","Address","Zip","State","Country"
• XML (eXtensible Markup Language) stores data hierarchically for the
Web, and is good for building news applications because of its broad
interoperability.
<menu id="file" value="File">
<popup>
<menuitem value="New" onclick="CreateNewDoc()" />
<menuitem value="Open" onclick="OpenDoc()" />
<menuitem value="Close" onclick="CloseDoc()" />
</popup>
</menu>
• JSON (JavaScript Object Notation) – Similar to XML in structure, but with
“lighter” punctuation based on JavaScript conventions. May eventually
replace XML as the standard.
{"menu": {
"id": "file",
"value": "File",
"popup": {
"menuitem": [
{"value": "New", "onclick": "CreateNewDoc()"},
{"value": "Open", "onclick": "OpenDoc()"},
{"value": "Close", "onclick": "CloseDoc()"}
] } }}
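To make the comparison concrete, here is a minimal Python sketch (standard library only) that reads the XML and JSON menu examples above into the same list of values, and reads a small CSV built on the header fields shown above. The CSV row is made up for illustration, and the function names are hypothetical helpers rather than part of any library.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The XML example from the slide, as a string.
XML_TEXT = """<menu id="file" value="File">
  <popup>
    <menuitem value="New" onclick="CreateNewDoc()" />
    <menuitem value="Open" onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>"""

# The JSON example from the slide, as a string.
JSON_TEXT = """{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}"""

# A hypothetical one-row CSV using the header fields from the slide.
CSV_TEXT = 'Name,Date,Address,Zip,State,Country\n"Jane Doe",2012-08-01,"123 Main St",32202,FL,USA\n'


def menu_values_from_xml(text):
    """Return the value attribute of every <menuitem> element."""
    root = ET.fromstring(text)
    return [item.get("value") for item in root.iter("menuitem")]


def menu_values_from_json(text):
    """Return the value field of every object in the menuitem list."""
    data = json.loads(text)
    return [item["value"] for item in data["menu"]["popup"]["menuitem"]]


def rows_from_csv(text):
    """Return each CSV row as a dict keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


print(menu_values_from_xml(XML_TEXT))        # ['New', 'Open', 'Close']
print(menu_values_from_json(JSON_TEXT))      # ['New', 'Open', 'Close']
print(rows_from_csv(CSV_TEXT)[0]["State"])   # FL
```

Both hierarchical formats end up as the same Python list; the CSV is flat, so each row simply becomes a dictionary keyed by the header.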
Scraping other sources
• Scrape data from an HTML table with a
simple Google spreadsheet formula (the last
argument is the table's position on the page,
counting from 1):
=ImportHtml("http://the-url-goes-here", "table", 1)
• For a database of HTML tables, try
Haystax.
• For PDFs, try CometDocs.
• Scrape webpages by running or writing a
Python script at ScraperWiki.
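Outside a spreadsheet, the same job can be done in a short Python script. The sketch below uses only the standard library's `html.parser` and assumes simple, well-formed tables (explicit closing tags, no nested tables); it is an illustration, not ScraperWiki's own tooling, and the sample table and its vote count are invented.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row and <table>."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def scrape_table(html, index=0):
    """Return the rows of the index-th table on the page (0-based Python index)."""
    parser = TableScraper()
    parser.feed(html)
    return parser.tables[index]


def fetch_html(url):
    """Download a page as text (requires network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


# Hypothetical page snippet; the vote count is invented for illustration.
SAMPLE = ("<table><tr><th>County</th><th>Votes</th></tr>"
          "<tr><td>Duval</td><td>1,234</td></tr></table>")
print(scrape_table(SAMPLE))  # [['County', 'Votes'], ['Duval', '1,234']]
```

On a live page you would call `scrape_table(fetch_html(url), index)`. Note that numbers still come back as text ("1,234") and need cleaning before analysis.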
APIs for data retrieval
• APIs (application programming interfaces) are how many
websites and services share content with one another.
• Allows a computer system to fetch, interpret and use data
created on another system, even if it used a different
programming language or structure.
• Examples: Twitter Search API, Google Maps API, NYTimes
Campaign Finance API.
• Usually returns data as XML, JSON or plain text (.txt).
• Often requires use of an API key.
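Most API calls follow the same pattern: build a URL from your query parameters and key, request it, and parse the reply. Here is a minimal Python sketch of that pattern. The endpoint, the key and the `api_key` parameter name are all hypothetical placeholders; every real API documents its own URL and parameter names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint and key; substitute the values from the API's documentation.
BASE_URL = "https://api.example.com/v1/contributions"
API_KEY = "YOUR-API-KEY"


def build_url(base_url, api_key, **params):
    """Return the request URL with the query parameters and API key appended."""
    query = dict(params, api_key=api_key)  # "api_key" is an assumed parameter name
    return base_url + "?" + urlencode(query)


def parse_json(body):
    """Decode a JSON response body (bytes) into Python dicts and lists."""
    return json.loads(body.decode("utf-8"))


def fetch(url):
    """Send the GET request and parse the JSON reply (requires network access)."""
    with urlopen(url) as response:
        return parse_json(response.read())


print(build_url(BASE_URL, API_KEY, state="FL", cycle=2012))
# https://api.example.com/v1/contributions?state=FL&cycle=2012&api_key=YOUR-API-KEY
```

Keeping URL building and parsing separate from the network call makes each piece easy to test, and makes it easy to swap `parse_json` for an XML parser when an API returns XML instead.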
II. Analyzing and
refining public data
Manipulating datasets
• Data is rarely ready for analysis and visualization out of the
box (hence “raw data”).
• Spreadsheet applications (Excel, Google Spreadsheets) are the most
common and easiest way to work with data.
• They allow for complex calculations, formulas and sorting.
• They are compatible with a variety of file formats
(.xls, .ods, .csv, .txt, .tsv).
• Scripts may also be written to automate bulk manipulation
(Python).
• R Project (r-project.org) for statistical analysis
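As an example of automating bulk manipulation, here is a minimal Python sketch that cleans every cell of a CSV in one pass: it collapses stray whitespace and turns whole-number strings such as "52,000" into integers. The cleaning rules and the sample payroll rows are assumptions for illustration, and the sketch assumes every row has the same number of fields as the header.

```python
import csv
import io
import re

# Matches whole numbers such as "48000", "-12" or "1,234,567".
NUMBER = re.compile(r"-?(\d{1,3}(,\d{3})+|\d+)")


def clean_value(value):
    """Collapse runs of whitespace and convert whole-number strings to int."""
    value = " ".join(value.split())
    if NUMBER.fullmatch(value):
        return int(value.replace(",", ""))
    return value


def clean_csv(text):
    """Clean every cell of a CSV (header row used as keys) and return a list of dicts."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key.strip(): clean_value(val) for key, val in row.items()} for row in reader]


# Hypothetical payroll extract with stray spaces and thousands separators.
SAMPLE = 'Name,Salary\n  Jane   Doe ,"52,000"\nJohn Smith,48000\n'
print(clean_csv(SAMPLE))
# [{'Name': 'Jane Doe', 'Salary': 52000}, {'Name': 'John Smith', 'Salary': 48000}]
```

The same script runs unchanged on a file with thousands of rows, which is where it starts to beat fixing cells by hand.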
Data analysis
• To figure out what your data
says, you’ll need to crunch the
numbers.
• Statistical significance is the litmus
test.
• Skewed or normal distribution?
Why?
• Outliers? If so, error or
unexplained factor?
Benchmarks for analysis
• Mean (μ) is the simplest to calculate, but
it is sensitive to outliers.
• Median is usually a better measure of
the typical value, especially
with a skewed distribution.
• If the mean, median and mode roughly
coincide, the distribution is roughly
symmetric (little skew).
• Standard deviation (σ) measures
how spread out the values are around
the mean.
• Z-score = how many standard
deviations a value is away from the
mean and, thus, how likely it is to
be an outlier.
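Mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Standard deviation: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$
Z-score: $z = \frac{x - \mu}{\sigma}$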
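To see how an outlier pulls the mean but barely moves the median, here is a small Python sketch using the standard `statistics` module. The dataset is hypothetical, and the |z| > 2 cutoff for flagging outliers is a common rule of thumb rather than a fixed standard.

```python
import statistics

# Hypothetical dataset (e.g., response times in minutes); 95 is a deliberate outlier.
values = [12, 15, 11, 14, 13, 12, 95]

mean = statistics.mean(values)      # pulled upward by the outlier
median = statistics.median(values)  # barely affected by the outlier
sigma = statistics.pstdev(values)   # population standard deviation (divides by N)
# statistics.stdev divides by N - 1 instead, as Excel's STDEV does.


def z_score(x, mu, sd):
    """Number of standard deviations between x and the mean."""
    return (x - mu) / sd


z_scores = [z_score(x, mean, sigma) for x in values]
# Flag values more than 2 standard deviations from the mean.
outliers = [x for x, z in zip(values, z_scores) if abs(z) > 2]
print(median, round(mean, 1), outliers)  # 13 24.6 [95]
```

A single extreme value nearly doubles the mean while the median stays at 13. That is why the median is usually the safer summary for skewed data.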
Calculating values in Excel
• Mean: =AVERAGE(A1:A27)
• Median: =MEDIAN(A1:A27)
• Standard deviation: =STDEV(A1:A27)
• Z-score of a given value: Subtract the mean of the dataset from the
value, then divide the result by the standard deviation, e.g.
=STANDARDIZE(A1, AVERAGE($A$1:$A$27), STDEV($A$1:$A$27))
Other commonly used Excel
formulas
• CONCATENATE to merge multiple columns.
• MID to extract part of a cell's text, e.g. to split columns.
• Percent change to display relative change over time
=(new_value-original_value)/ABS(original_value)
• See this guide of helpful Excel tricks for data
journalists, compiled by Mary-Jo Webster of the St. Paul
Pioneer Press: https://docs.google.com/file/d/0ByLyArAQRhaBNDc3NjJjYTUtY2U0Yi00NmIwLThkNTgtYzNlYThmNGE1ZTEz/edit
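The percent-change formula above translates directly into code. Here is a minimal Python sketch, with string joining and splitting added as rough equivalents of the spreadsheet merge/split tricks. The function name and sample numbers are illustrative, and the zero check is an added assumption, since the formula is undefined when the original value is 0.

```python
def percent_change(original, new):
    """Relative change from original to new, in percent.

    Same arithmetic as (new_value - original_value) / ABS(original_value),
    multiplied by 100. Dividing by the absolute value keeps the sign
    meaningful when the original value is negative.
    """
    if original == 0:
        raise ValueError("percent change is undefined when the original value is 0")
    return (new - original) / abs(original) * 100


print(percent_change(200, 250))  # 25.0
print(percent_change(80, 60))    # -25.0
print(percent_change(-50, -25))  # 50.0

# Merging and splitting text columns, the string equivalents of the spreadsheet tricks.
full_name = " ".join(["Jane", "Doe"])  # 'Jane Doe'
first, last = full_name.split(" ", 1)  # 'Jane', 'Doe'
```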
Refining and cleaning data
• Sometimes Excel and Google
Spreadsheets aren’t enough, especially
when working with large datasets.
• Google Refine – free tool that lets you
explore, sort and transform data.
• Useful for finding and fixing errors
and inconsistencies; a “power tool for
working with messy data.”
• Facets to filter and explore data
• Clustering to merge inconsistent spellings
• Shan Carter’s Mr. Data Converter to
convert spreadsheets to more web-friendly
formats.
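Google Refine's clustering groups values that probably refer to the same thing but are spelled differently. The sketch below is a simplified Python take on its "fingerprint" key-collision idea: fold accents to ASCII, lowercase, strip punctuation, then sort the unique words. It is an approximation written for illustration, not Refine's actual code, and the agency names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Normalize a value to a key: ASCII, lowercase, no punctuation, sorted unique words."""
    ascii_text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    no_punct = re.sub(r"[^\w\s]", "", ascii_text.strip().lower())
    return " ".join(sorted(set(no_punct.split())))


def find_clusters(values):
    """Group distinct values that share a fingerprint; keep groups of 2 or more."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical agency names as they might appear in a messy spreadsheet.
names = [
    "Jacksonville Sheriff's Office",
    "jacksonville sheriffs office",
    "Office, Jacksonville Sheriff's",
    "Duval County",
]
print(find_clusters(names))
# [["Jacksonville Sheriff's Office", "Office, Jacksonville Sheriff's", 'jacksonville sheriffs office']]
```

Each cluster is only a suggestion. As in Refine, a person should still decide which spelling to keep before merging.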
Other data analysis tips and tricks
• Put field names in first row.
• Put geographic data in first columns
• When you have two different datasets, a good tool to
merge them is Google Fusion Tables (make sure they
share a common attribute).
• Never round until the end of calculations. Round to
two decimal places for visualization purposes.
• Copy calculated results and paste them into a new
column as values only.
• Know the principal data types (integer, real, string,
boolean), and make sure numeric data is classified as
either integer (whole numbers only) or real (numbers
that may have a fractional part).
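When two datasets share a common attribute, merging them amounts to a join on that column. Here is a minimal Python sketch of the idea. It performs an inner join (rows without a match are dropped) and assumes the key is unique in the second dataset; the counties and figures are invented. It also rounds only at the final step, as the tips above recommend.

```python
def merge_on(left, right, key):
    """Inner join: combine rows from two lists of dicts that share the same key value.

    Assumes each key value appears at most once in `right`.
    """
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


# Hypothetical datasets sharing the "county" attribute.
registered = [
    {"county": "County A", "voters": 1000},
    {"county": "County B", "voters": 500},
    {"county": "County C", "voters": 200},  # no match below, so it is dropped
]
ballots = [
    {"county": "County A", "ballots": 600},
    {"county": "County B", "ballots": 250},
]

merged = merge_on(registered, ballots, "county")
for row in merged:
    # Compute at full precision; round only at the end, for display.
    row["turnout_pct"] = round(row["ballots"] * 100 / row["voters"], 2)
print(merged)
```

Dropped rows such as County C are worth checking by hand. A missing match often points to a spelling difference in the key column rather than missing data.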
III. Visualizing your
data
Planning your visualization
• Identify your key message
• Choose the best data series to illustrate your point
• Consider the number of points in the data
• Think about complementary/supporting datasets you can
incorporate, e.g. sanitation with poverty.
• Plan for user interaction, e.g. visual feedback.
• Transform raw data where it sharpens your point,
e.g. absolute values vs. percent change
• Brainstorm potential technologies
• Consult experts on topic to back up your interpretation
of data
Choosing the right type of
visualization
• Change of single variable over time: line chart.
• Comparison of single variable among multiple classes: bar chart.
• Two variables: scatter plot, bubble chart.
• Hierarchical data: treemap, bubbletree.
• Area charts for area only
• Makeup of whole: pie chart.
• Distribution: histograms, box-and-whisker plots.
• Geographic data: point, polygon, choropleth and symbol maps.
• Records: searchable database.
• Chronological data: timeline, sparklines.
• Other possibilities: matrices, heatmaps, games, slopegraphs, stepper graphics.
Visualization design principles
• Typography: clear, consistent, not
distracting.
• Use bold, mix of serif/sans-serif to
provide emphasis.
• Don’t set type at an angle
• Color: Let color correspond to
variable, design for accessibility, choose
from same side of color wheel,
consider cultural associations but avoid
thematic palettes. Use Adobe Kuler or
0to255.com
• Visual overload, emotional design,
skeuomorphism.
No white type on
black background
No angled type
• Some guidelines for graphical integrity,
according to Edward Tufte in The Visual
Display of Quantitative Information:
1. Representation of numbers should
be directly proportional to the
numerical quantities represented.
2. Clear, detailed labeling throughout.
3. Show data variation, not design
variation.
4. Avoid excessive and unnecessary
use of graphical effects.
What Edward Tufte calls “the worst
visualization ever published.”
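Tufte turns the first principle into a number in the same book with his "Lie Factor": the size of the effect shown in the graphic divided by the size of the effect in the data, where size of effect = |second value - first value| / first value. A value near 1 means the graphic is faithful. Here is a small Python sketch with made-up numbers; the function names are illustrative.

```python
def effect_size(first, second):
    """Tufte's size of effect: |second - first| / first."""
    return abs(second - first) / first


def lie_factor(data_first, data_second, shown_first, shown_second):
    """Size of effect shown in the graphic divided by size of effect in the data."""
    return effect_size(shown_first, shown_second) / effect_size(data_first, data_second)


# Made-up example: the data rise 50% (100 -> 150), but the bar doubles (1.0 in -> 2.0 in).
print(lie_factor(100, 150, 1.0, 2.0))  # 2.0: the graphic exaggerates the change twofold
print(lie_factor(100, 150, 2.0, 3.0))  # 1.0: faithful
```

The ratio is undefined when the data do not change at all, so the sketch applies only to comparisons where the underlying values differ.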
Visualization design principles
• Design for the eye
• User should be able to
discern key message
visually.
• Design for interaction
• Highlighting and details on
demand (example)
• User-driven content
selection (example)
Visualization design principles
Awful
Bad, but
better
Visualization design principles
Awful, but better
Not bad
Awful
Visualization design principles
What’s wrong with this infographic?
Visualization design principles
“Four Ways to Slice
Obama’s Budget Proposal”
• From NYTimes.com: http://www.nytimes.com/interactive/2012/02/13/us/politics/2013-budget-proposal-graphic.html
• What makes this visualization
effective? How does it approach
color, complexity, interactivity
and typography? How does it
avoid visual overload?
Wireframing/
prototyping
• Follow a structured grid system
(e.g., a 12-column, 960px grid;
see 960.gs and Subtraction).
• Very selectively, you can
break the grid to emphasize
a certain visual element.
• Sketch out/prototype your
wireframe on paper first (print
templates such as this)
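The arithmetic behind a fixed grid is simple enough to sketch. Assuming the 960.gs 12-column defaults (60px columns separated by 20px gutters, with 10px margins at each edge of the 960px container), an element spanning n columns is n × 60 + (n - 1) × 20 pixels wide. The function name below is illustrative.

```python
def span_width(columns, column_width=60, gutter=20):
    """Pixel width of an element spanning `columns` adjacent grid columns."""
    if columns < 1:
        raise ValueError("an element must span at least one column")
    return columns * column_width + (columns - 1) * gutter


# Widths for every span on the assumed 12-column grid.
for n in range(1, 13):
    print(n, span_width(n))
# A full-width element is 940px; with 10px margins on each side it fills 960px.
```

Knowing these widths in advance makes it easier to size chart and map containers so they snap to the grid on your paper wireframe.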
Selecting tools/technologies
• A wealth of free, open-source
data visualization tools and
libraries exist to shorten
development times
• Examples: Google
Visualization API, Google
Fusion Tables,
Highcharts.js, CartoDB,
d3.js, Tableau Public.
• For everything else, HTML5 +
CSS + JavaScript
IV. Building a Web
app
Web app anatomy
Three components of a Web app:
1. HTML (structure)
2. CSS (styles)
3. JavaScript (interactivity)
Parts of an HTML file
An HTML file is made up of:
1. Doctype declaration
2. Head <head>
3. CSS/JavaScript references
4. Title <title>
5. Body <body>
6. A Div container
7. Divs (IDs and classes)
Parts of a CSS file
A CSS file is made up of:
1. Container ID
2. Default paragraph (p) style
3. Default H1, H2, etc. styles
4. Default body style
5. Styles for all divs
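To tie the two lists together, here is a minimal Python sketch that writes a skeleton page containing each listed part: doctype, head with title and CSS/JavaScript references, body, container div, and inner divs, plus a stylesheet with matching rules. The file names (index.html, style.css, app.js), the IDs and classes, and the style values are hypothetical placeholders.

```python
import html
from pathlib import Path

# Doctype, head (title plus CSS/JavaScript references), body, container div, inner divs.
HTML_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>{title}</title>
  <link rel="stylesheet" href="style.css">
  <script src="app.js" defer></script>
</head>
<body>
  <div id="container">
    <div id="chart" class="graphic"></div>
    <div id="source" class="note"></div>
  </div>
</body>
</html>
"""

# Container ID, default p, default h1/h2, body, and styles for the divs.
CSS = """#container { width: 960px; margin: 0 auto; }
p { font-size: 14px; line-height: 1.4; }
h1, h2 { font-family: Helvetica, Arial, sans-serif; }
body { font-family: Georgia, serif; color: #333; }
.graphic { margin-bottom: 20px; }
.note { font-size: 12px; }
"""


def build_page(title):
    """Fill in the page title, escaping characters that are special in HTML."""
    return HTML_TEMPLATE.format(title=html.escape(title))


def write_skeleton(directory, title):
    """Write index.html and style.css into `directory` (created if missing)."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "index.html").write_text(build_page(title), encoding="utf-8")
    (folder / "style.css").write_text(CSS, encoding="utf-8")
```

Calling `write_skeleton("my-app", "Turnout & Results")` would give you a starting point. Your charting JavaScript would then go in app.js and draw into the `#chart` div.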
V. Maps
Maps 101
• Interactive maps combine
geocoded data – points or
polygons – along with metadata
and/or numeric data.
• KML (Keyhole Markup Language) is
quickly becoming a popular file
format, but the Shapefile (shp.zip) is
still the most widely available.
• Geographic data can either be
geocoded, downloaded from the
Web, or custom-drawn.
• A good purveyor of news maps:
The Texas Tribune.
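To show what a KML file holds, here is a minimal Python sketch that turns a list of named points into a KML document using the standard library. The polling-place name and coordinates are hypothetical. Note that KML writes coordinates in longitude,latitude order, which is an easy thing to get backwards.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def points_to_kml(points):
    """Build a KML document with one Placemark per (name, lat, lon) tuple."""
    ET.register_namespace("", KML_NS)  # write the KML namespace as the default xmlns
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in points:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML orders coordinates longitude first, then latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


print(points_to_kml([("Polling place (hypothetical)", 30.33, -81.66)]))
```

Polygons follow the same structure, with a Polygon element and a longer coordinate list in place of the Point.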
Mapping services and libraries
• Google Fusion Tables – Quick, versatile
and classic maps that integrate seamlessly
with the Google Maps JavaScript API.
• CartoDB – A newer open-source tool
much like Fusion Tables, but with a
better-looking out-of-the-box experience.
• Leaflet – An open-source, client-side
mapping library with an API that allows
you to achieve a number of advanced
features. Plays nicely with Fusion Tables
and CartoDB-hosted maps. Part of
CloudMade suite.
Handy desktop mapping
software
• QGIS – Free program that supports
almost every conceivable map file
type, and allows you to add or
manipulate vector data, which can
then be exported as a KML
or Shapefile package.
• TileMill – A map creation and
styling software; ideal for those
with little programming
experience. UTF-grid enabled
tilesets only.
Primary map types
• Choropleth – Colors
for each geometry
correspond to numeric
values of a given
variable.
• Point – Locations on a
map displayed by
geocoded markers.
• Less frequently:
proportional maps and
geo maps.
Choropleth map of Georgia voter turnout
Point map of Jacksonville polling locations
Tips and tricks
• If you have street address data, you
can use BatchGeocode to convert
them to lat-long coordinates.
• For choropleth maps,
• Include no more than five fill
colors or “buckets”
• Don’t define an equidistant color
ramp; use ColorBrewer instead.
• Use MarkerClusterer when there
are too many points for certain
zoom levels.
Using ColorBrewer to define an accurate, accessible color ramp.
Using MarkerClusterer to cluster points at further zoom levels.
Tips and tricks
• To convert Shapefiles so they can
be imported into Fusion Tables,
either use Shape to Fusion, or
export them as KML from CartoDB.
• Before using the embed tool in
Fusion Tables or CartoDB, make
sure the map is centered where
you want it.
• Ensure your map is set to
“Public.”
Export a Shapefile as KML in CartoDB.
Making your map public in Fusion Tables
VI. Charts
Charts
• Basic building block of visualization
• Simple, but also easy to mess up.
• Should always be interactive.
• Should always include data source.
• Should always include a legend.
• Unless necessary, only show labels
on mouseover.
Interactive charting tools
• Out-of-the-box: Google Drive
charts, infogr.am.
• More advanced: Google Code
Playground.
• Most agile: Highcharts.js.
• Most extensible: Tableau Public
A combo chart made using Highcharts.js
Charting best practices
• Color: Pick palette of no more
than 3-4 colors from same side of
color wheel.
• Increments: Use natural
increments like (0,2,4,6...) instead
of, say, (0,3,6,9...)
• Scale: Don’t plot two unrelated
series with one scale on left and
one on right.
• Style: Flat and simple. No 3D
effects, shadows, narrow bars or
distracting shading.
Don’t plot two different variables on same scale.
Bars too narrow Distracting shading
Misleading 3D effects Pointless shadows
Source: The Wall Street Journal Guide
to Information Graphics, Dona M. Wong.
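Natural increments can be chosen automatically. This Python sketch adapts Paul Heckbert's well-known "nice numbers" approach, which picks tick steps of 1, 2 or 5 times a power of ten. Following this deck's zero-baseline advice, it always starts the axis at zero. The function names and the default target of about five ticks are assumptions, and the result is approximate: you may get one tick more or fewer than requested.

```python
import math


def nice_number(x, round_result):
    """Round x to 1, 2, 5 or 10 times a power of ten."""
    exponent = math.floor(math.log10(x))
    fraction = x / 10 ** exponent
    if round_result:
        nice = 1 if fraction < 1.5 else 2 if fraction < 3 else 5 if fraction < 7 else 10
    else:
        nice = 1 if fraction <= 1 else 2 if fraction <= 2 else 5 if fraction <= 5 else 10
    return nice * 10 ** exponent


def nice_ticks(max_value, tick_count=5):
    """Axis ticks from a zero baseline up to at least max_value, in natural steps."""
    if max_value <= 0 or tick_count < 2:
        raise ValueError("need a positive max_value and at least two ticks")
    span = nice_number(max_value, False)
    step = nice_number(span / (tick_count - 1), True)
    top = math.ceil(max_value / step) * step
    count = round(top / step)
    # Rounding hides floating-point noise such as 0.6000000000000001.
    return [round(i * step, 10) for i in range(count + 1)]


print(nice_ticks(87))  # [0, 20, 40, 60, 80, 100]
print(nice_ticks(9))   # [0, 2, 4, 6, 8, 10]
```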
Charting best practices
• Always set the baseline to
zero.
• Always order starting with
greatest value
• Use broken bars sparingly
• No more than five slices on
pie charts; no “donut” pie
charts.
• No more than 3-4 lines on
line chart
Wrong order Right order
Wrong baseline Right baseline
No donut-pies
Source: The Wall Street Journal Guide
to Information Graphics, Dona M. Wong.
VII. Programming and
beyond
Utilizing JavaScript/HTML5 libraries
• Together, JavaScript, HTML5
and jQuery have expanded
boundaries of data
visualization
• An abundance of open-source
libraries and packages means
less programming is required to
produce unique, interactive
visualizations.
• Examples: Timeline.js,
Bubbletree.js, Raphael.js,
ProPublica tools
The HTML5 revolution
• Adobe Edge for HTML5
development; end of Flash’s
reign
• Platform-agnostic, mobile-
first movement
• Forking resources and
packages off GitHub
Pushing the limits
• RaphaelJS for easier
manipulation of SVG (scalable
vector graphics)
• Other boundary-pushing data
visualization projects:
Processing, Gephi, d3.js,
IBM’s Many Eyes.
A network map produced using D3.js
Helpful resources and communities
• Blogs/Tutorials:
FlowingData.com, Vis4.net, Driven-by-data.net,
Chryswu.com, datavisualization.ch
• Books: The Data Journalism
Handbook, O’Reilly Media. Visualize
This: The FlowingData Guide to Design,
Visualization, and Statistics, Nathan
Yau. The Wall Street Journal Guide to
Information Graphics, Dona M. Wong.
• Communities: visual.ly, Hacks/Hackers,
NICAR.
Free data journalism handbook
from O’Reilly Media
For slides and list of links,
http://bit.ly/NIXkOD
@carlvlewis

Data-driven enterprise off your beat - Todd Wallack - New England NewsTrain -...
 
Economics introduction and data access
Economics introduction and data accessEconomics introduction and data access
Economics introduction and data access
 
Big data and development
Big data and developmentBig data and development
Big data and development
 
Analytic Journalism: Digital Evolution in the Datasphere
Analytic Journalism: Digital Evolution in the DatasphereAnalytic Journalism: Digital Evolution in the Datasphere
Analytic Journalism: Digital Evolution in the Datasphere
 

Plus de Carl V. Lewis

"Civic Innovation: It's Not One Size Fits All"
"Civic Innovation: It's Not One Size Fits All""Civic Innovation: It's Not One Size Fits All"
"Civic Innovation: It's Not One Size Fits All"Carl V. Lewis
 
Open Savannah: A Manifesto
Open Savannah: A ManifestoOpen Savannah: A Manifesto
Open Savannah: A ManifestoCarl V. Lewis
 
Refresh Savannah Lightning Talk: Open Savannah
Refresh Savannah Lightning Talk: Open SavannahRefresh Savannah Lightning Talk: Open Savannah
Refresh Savannah Lightning Talk: Open SavannahCarl V. Lewis
 
Data Visualization for Non-Programmers
Data Visualization for Non-ProgrammersData Visualization for Non-Programmers
Data Visualization for Non-ProgrammersCarl V. Lewis
 
newsonomics-mashable (1)
newsonomics-mashable (1)newsonomics-mashable (1)
newsonomics-mashable (1)Carl V. Lewis
 
The Next-Gen Student News Organization
The Next-Gen Student News OrganizationThe Next-Gen Student News Organization
The Next-Gen Student News OrganizationCarl V. Lewis
 
What are the skills of a post-platform journalist?
What are the skills of a post-platform journalist?What are the skills of a post-platform journalist?
What are the skills of a post-platform journalist?Carl V. Lewis
 
Introduction to Data Journalism
Introduction to Data JournalismIntroduction to Data Journalism
Introduction to Data JournalismCarl V. Lewis
 
Tackling the local classified ad market
Tackling the local classified ad marketTackling the local classified ad market
Tackling the local classified ad marketCarl V. Lewis
 

Plus de Carl V. Lewis (12)

"Civic Innovation: It's Not One Size Fits All"
"Civic Innovation: It's Not One Size Fits All""Civic Innovation: It's Not One Size Fits All"
"Civic Innovation: It's Not One Size Fits All"
 
Civicdesign
CivicdesignCivicdesign
Civicdesign
 
Open Savannah: A Manifesto
Open Savannah: A ManifestoOpen Savannah: A Manifesto
Open Savannah: A Manifesto
 
Refresh Savannah Lightning Talk: Open Savannah
Refresh Savannah Lightning Talk: Open SavannahRefresh Savannah Lightning Talk: Open Savannah
Refresh Savannah Lightning Talk: Open Savannah
 
WTF is AMP?
WTF is AMP?WTF is AMP?
WTF is AMP?
 
carlvlewis_resume
carlvlewis_resumecarlvlewis_resume
carlvlewis_resume
 
Data Visualization for Non-Programmers
Data Visualization for Non-ProgrammersData Visualization for Non-Programmers
Data Visualization for Non-Programmers
 
newsonomics-mashable (1)
newsonomics-mashable (1)newsonomics-mashable (1)
newsonomics-mashable (1)
 
The Next-Gen Student News Organization
The Next-Gen Student News OrganizationThe Next-Gen Student News Organization
The Next-Gen Student News Organization
 
What are the skills of a post-platform journalist?
What are the skills of a post-platform journalist?What are the skills of a post-platform journalist?
What are the skills of a post-platform journalist?
 
Introduction to Data Journalism
Introduction to Data JournalismIntroduction to Data Journalism
Introduction to Data Journalism
 
Tackling the local classified ad market
Tackling the local classified ad marketTackling the local classified ad market
Tackling the local classified ad market
 

Dernier

Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...Pooja Nehwal
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxShobhayan Kirtania
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 

Dernier (20)

Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptx
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 

Data Visualization in the Newsroom

  • 1. Data visualization in the newsroom { “presented by”: “carl v. lewis”, “for”: “the florida times-union”, “slides”: “bit.ly/NIXkOD”, “email”:“carl@carlvlewis.net” }
  • 2. What is data visualization? • Data itself is the story; standalone narrative. • Interactive, communicative, visual. • Ranges from simple (charts) to complex (database-driven applications). • Both a technique and a format. • Both entertaining and factual. • See: “The Many Words for Visualization”
  • 3. The history of data journalism • Grew out of the CAR (computer-assisted reporting) tradition • John Snow’s 1854 cholera map • Has coincided with the era of “Big Data”
  • 4. On the emergence of the field of data journalism: • "When information was scarce, most of our efforts were devoted to hunting and gathering. Now that information is abundant, processing is more important." –Philip Meyer, UNC Chapel Hill
  • 5. On the growing importance of data-driven journalism: • “Journalists need to be data-savvy . . . Data-driven journalism is the future.” –Sir Tim Berners-Lee • “The explosion of Web-based tools and ways of sifting through and sharing data has created something approaching a revolution, and the potential benefits for journalism are only just beginning to reveal themselves.” –Matthew Ingram
  • 6. What data journalism is not: • Simply incorporating public data into your textual narrative • Infographics • Illustration • Resource-intensive • Just about numbers and programming • Just about making data flashy
  • 7. What data journalism is: • Visual • Often evergreen • Transparent – direct access to primary source • Credible • Engaging • A good business model
  • 9. Democratization of data journalism • Free and open-source tools (Google Drive, JavaScript libraries, etc.). • Open Data laws. • “Anyone can do it. Data journalism is the new punk.” -Simon Rogers, The Guardian
  • 10. The job of the data journalist • Part statistician, part journalist, part programmer. • “We're statisticians. We don't program.” • “We’re programmers. We don’t report.” • “We’re journalists. We don’t code.”
  • 11. Notable examples of data visualization • “Mapping America: Every City, Every Block,” NYTimes.com. • “Where Does My Money Go?”, Open Knowledge Foundation. • “Illinois school report cards,” Chicago Tribune • “We Feel Fine,” Jonathan Harris • “Top Secret America,” The Washington Post
  • 12. News organizations to follow for innovative data projects
  • 13. What are your favorite visualizations?
  • 14. When to use data visualization: • Showing change over time • Comparing discrete values • Showing connections and flows • Showing hierarchy • Browsing large databases
  • 15. When not to use data visualization: • When text or multimedia tells the story better • When you have very few data points • When there is no statistical significance • When a map is not a map • When a table would do
  • 16. Process of data journalism 1. Research – Think of a topic and research the factors. 2. Find the data – Locate and retrieve relevant public data. 3. Analysis and evaluation – Crunch numbers, look for trends or inconsistencies. 4. Visualize – Display the data in an appropriate manner.
  • 17. II. Mining public data Research and retrieval
  • 18. Research 1. Think of a topic – what factors influence it? 2. What public data might shed light on those factors? 3. Seek out the data
  • 19. Locating public data • Thousands of public “data dumps” by government bodies and nonprofits. • Most commonly in delimited spreadsheet format (look for .csv, .xls), sometimes in XML and JSON. • For geographic data, look for .kml or .shp • Can be found directly at source or by search engine keyword
  • 20. Search tips for data retrieval • If you don’t know which source to look to for your data, an initial Web search might help. • After your keywords, type “filetype:XLS”, “filetype:CSV” or whatever the extension is of the data you’re seeking, and you’ll see only files of that type from across the Web. • If you get no results, try broadening your search terms to locate sources that cover the general discipline (e.g. instead of “malaria deaths,” try “public health data”).
  • 21. Locating public data • Federal sources: Data.gov, Census.gov, OpenSecrets.org, FollowTheMoney.org, USA.gov, USGovXML.com (full federal list by topic/agency here). • Data catalogs such as thedatahub.org, datamarket.com, infochimps.org, datacatalogs.org are good places to find non-
  • 22. Florida public data sources • Florida’s “Sunshine” law requires all state agencies to provide open access to public records, including data. • Chapter 119 of the Florida Statutes mandates that “any records made or received by any public agency in the course of its official business are available for inspection, unless specifically exempted by the Florida Legislature.”
  • 23. Florida public data sources • Dozens of useful open data sources are maintained by Florida government agencies, including TransparencyFlorida.gov, FloridaHasARightToKnow.com and MyFlorida.gov. • Full list of state-maintained databases by topic here. • A few state-maintained databases worth mentioning: the Division of Elections’ campaign finance data, the DOE’s test score reports and the Department of Law Enforcement’s arrest and officer reports.
  • 24. Florida public data sources • A number of advocacy groups also maintain useful, downloadable statewide databases: • FloridaOpenGov.org, which focuses on public employee payroll data. • FloridaRedistricting.org, which provides demographic data (.csv) and geographic polygons (.shp) for new district boundaries. • Florida Housing Data Clearinghouse, which provides regularly updated property values and housing data (.xls). (For even more, see my semi-exhaustive list with descriptions here.)
  • 25. Georgia public data sources • Although Georgia has no law requiring all government agencies to make public data accessible online, many do anyway. • In 2008, the Transparency in Government Act expanded the public data site, Open.Georgia.gov, to include all three branches of government, regional education service agencies, local boards of education, and transactions made by the General Assembly.
  • 26. Georgia public data sources • A comprehensive list of downloadable databases from state agencies in Georgia can be found here. • The State Ethics Committee has made all campaign finance reports, lobbyist reports and campaign contributions available in downloadable spreadsheets. • OASIS provides a set of web-based tools to browse the Georgia Department of Public Health’s Data Warehouse, and download the data yourself if you wish.
  • 27. Locating geographic data • Most geographic data is available as TIGER/Line Shapefile packages (archives containing .shp, .dbf, .prj, .xml and .shx files) from the U.S. Census Bureau. • Google also hosts a directory of .kml files for most geographic boundaries here. • Alternatively, Florida and Georgia GIS data can be found at FGDL.org, Geoplan and Data.GeorgiaSpatial.org.
  • 28. What to look for • Most numeric spreadsheet data comes either as a comma-separated values (.csv) or Microsoft Excel (.xls) file. Example of .csv structure (a header row): "Name","Date","Address","Zip","State","Country" • XML (eXtensible Markup Language) stores data hierarchically for the Web, and is good for building news applications because of its broad interoperability. <menu id="file" value="File"> <popup> <menuitem value="New" onclick="CreateNewDoc()" /> <menuitem value="Open" onclick="OpenDoc()" /> <menuitem value="Close" onclick="CloseDoc()" /> </popup> </menu> • JSON (JavaScript Object Notation) – Similar to XML in structure, but with “lighter” punctuation based on JavaScript conventions. May eventually replace XML as the standard. {"menu": { "id": "file", "value": "File", "popup": { "menuitem": [ {"value": "New", "onclick": "CreateNewDoc()"}, {"value": "Open", "onclick": "OpenDoc()"}, {"value": "Close", "onclick": "CloseDoc()"} ] } }}
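All three formats in slide 28 can be read with nothing more than Python's standard library. The minimal sketch below assumes Python 3. The CSV row is a hypothetical placeholder, while the XML and JSON strings are the slide's own "menu" examples. The same three menu items come out of both the XML and the JSON version.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical CSV snippet that reuses the header row from the slide.
CSV_TEXT = '''"Name","Date","Address","Zip","State","Country"
"Jane Doe","2012-08-01","1 Main St","32202","FL","USA"
'''

# The slide's XML and JSON "menu" examples.
XML_TEXT = '''<menu id="file" value="File">
  <popup>
    <menuitem value="New" onclick="CreateNewDoc()" />
    <menuitem value="Open" onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>'''

JSON_TEXT = '''{"menu": {"id": "file", "value": "File", "popup": {"menuitem": [
  {"value": "New", "onclick": "CreateNewDoc()"},
  {"value": "Open", "onclick": "OpenDoc()"},
  {"value": "Close", "onclick": "CloseDoc()"}]}}}'''


def read_csv(text):
    """Return one dict per data row, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def menu_values_from_xml(text):
    """Collect the value attribute of every <menuitem> element."""
    root = ET.fromstring(text)
    return [item.get("value") for item in root.iter("menuitem")]


def menu_values_from_json(text):
    """Walk the same hierarchy in the parsed JSON object."""
    data = json.loads(text)
    return [item["value"] for item in data["menu"]["popup"]["menuitem"]]


if __name__ == "__main__":
    print(read_csv(CSV_TEXT)[0]["Zip"])
    print(menu_values_from_xml(XML_TEXT))
    print(menu_values_from_json(JSON_TEXT))
```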
  • 29. Scraping other sources • Scrape data from an HTML table with a simple Google Spreadsheet formula: =IMPORTHTML("http://the-url-goes-here", "table", 1) (the last argument is the table’s position on the page, counting from 1). • For a database of HTML tables, try Haystax. • For PDFs, try CometDocs. • Scrape webpages by running or creating a Python script at ScraperWiki.
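Outside a spreadsheet, the same table scrape can be done in a script. The minimal sketch below uses Python's built-in html.parser and assumes the HTML has already been downloaded. The sample table is hypothetical. Fetching a live page (for example with urllib.request) is left out so the sketch runs offline. It is deliberately simple: it flattens every table on the page into one list of rows, and it does not handle colspan, nested tables or unclosed cell tags.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell; ignore whitespace between tags.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return all table rows found in an HTML string."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample page content.
SAMPLE = """
<table>
  <tr><th>Precinct</th><th>Voters</th></tr>
  <tr><td>101</td><td>1,200</td></tr>
  <tr><td>102</td><td>950</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in scrape_table(SAMPLE):
        print(row)
```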
  • 30. APIs for data retrieval • APIs (application programming interfaces) are how many websites and services share content with one another. • Allows a computer system to fetch, interpret and use data created on another system, even if it used a different programming language or structure. • Examples: Twitter Search API, Google Maps API, NYTimes Campaign Finance API. • Usually returns data as XML, JSON or .txt. • Often requires an API key.
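Most of these APIs follow the same request/response pattern:

1. Build a URL with query parameters, including the API key.
2. Fetch it.
3. Parse the JSON that comes back.

The minimal Python sketch below uses hypothetical placeholders throughout: the endpoint, the q and api-key parameter names, and the results/amount response fields. Every real API documents its own. Only fetch_json touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://api.example.com/v1/contributions.json"


def build_url(query, api_key, **extra):
    """Combine the base URL, search term, API key and any extra parameters."""
    params = {"q": query, "api-key": api_key, **extra}
    return BASE_URL + "?" + urlencode(params)


def fetch_json(url):
    """Download and decode a JSON response (requires network access)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def total_amount(payload):
    """Sum the 'amount' field across the hypothetical 'results' list."""
    return sum(item["amount"] for item in payload.get("results", []))


if __name__ == "__main__":
    print(build_url("city council", "YOUR_KEY", page=2))
    # A canned response, standing in for fetch_json(...) so this runs offline.
    sample = {"results": [{"amount": 100}, {"amount": 250}]}
    print(total_amount(sample))
```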
  • 32. Manipulating datasets • Data is rarely ready for analysis and visualization out of the box (hence “raw data”). • Spreadsheet applications (Excel, Google Spreadsheets) are the most common and easiest way to work with data. • They allow for complex calculations, formulas and sorting. • Compatible with a variety of file formats (.xls, .ods, .csv, .txt, .tsv). • Scripts (e.g. in Python) may also be written to automate bulk manipulation. • R Project (r-project.org)
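The following hypothetical Python helper illustrates the kind of bulk-manipulation script slide 32 mentions. It trims stray whitespace from every cell and converts numeric strings such as "52,000" into real numbers, so a column can be summed or charted. The column names in the usage example are made up. Because Python's float() accepts them, strings such as "nan" or "inf" would also be converted to numbers.

```python
import csv
import io


def to_number(text):
    """Convert strings such as ' 52,000 ' or '48500.50' to numbers; leave other text trimmed."""
    cleaned = text.strip().replace(",", "")
    try:
        return int(cleaned)
    except ValueError:
        try:
            return float(cleaned)
        except ValueError:
            return text.strip()


def clean_rows(csv_text):
    """Parse CSV text, strip header names and convert every cell with to_number."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned = []
    for row in reader:
        # Short rows give None for missing cells; treat those as empty strings.
        cleaned.append({name.strip(): to_number(row[name] or "") for name in reader.fieldnames})
    return cleaned


if __name__ == "__main__":
    raw = 'Name, Salary\n Jane Doe ,"52,000"\nJohn Roe,48500.50\n'
    for record in clean_rows(raw):
        print(record)
```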
  • 33. Data analysis • To figure out what your data says, you’ll need to crunch the numbers. • Statistical significance is the litmus test. • Is the distribution skewed or normal? Why? • Are there outliers? If so, are they errors or an unexplained factor?
  • 34. Benchmarks for analysis • Mean (μ) is the simplest to calculate, but is easily distorted by outliers. • Median is usually a better metric for drawing conclusions, especially with a skewed distribution. • If mean = median = mode, the distribution is roughly symmetric (no skew). • Standard deviation (σ) measures how spread out the values are around the mean. • Z-score = how many standard deviations a value is away from the mean and, thus, how likely it is to be an outlier.
  • 35. Calculating values in Excel • Mean: =AVERAGE(A1:A27) • Median: =MEDIAN(A1:A27) • Standard deviation: =STDEV(A1:A27) • Z-score of a given value: subtract the mean of the dataset from the value, then divide the result by the standard deviation (or use =STANDARDIZE(value, mean, standard_dev)).
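The same benchmarks can be checked with Python's statistics module. The minimal sketch below uses hypothetical values. statistics.stdev is the sample standard deviation, which is also what Excel's STDEV returns. The |z| > 2 cutoff for flagging possible outliers is a common rule of thumb, not a fixed standard.

```python
import statistics


def summarize(values):
    """Mean, median and sample standard deviation (like AVERAGE, MEDIAN, STDEV)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def z_score(value, values):
    """(value - mean) / standard deviation, the same as Excel's STANDARDIZE."""
    return (value - statistics.mean(values)) / statistics.stdev(values)


def possible_outliers(values, threshold=2.0):
    """Values whose |z| exceeds the threshold (2.0 is a rule of thumb)."""
    return [v for v in values if abs(z_score(v, values)) > threshold]


if __name__ == "__main__":
    sample = [10, 12, 11, 13, 12, 11, 95]  # hypothetical data with one extreme value
    print(summarize(sample))
    print(possible_outliers(sample))
```

With this sample, the mean (about 23.4) sits far above the median (12) because of the single value 95, which is the only value flagged. In a sample of n values, no z-score computed this way can exceed (n - 1) / sqrt(n). Z-score screening therefore says little about very small datasets.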
  • 36. Other commonly used Excel formulas • CONCATENATE to merge multiple columns. • MID to split columns. • Percent change to display relative change over time: =(new_value-original_value)/ABS(original_value) • See this guide of helpful Excel tricks for data journalists, compiled by Mary-Jo Webster of the St. Paul Pioneer Press: https://docs.google.com/file/d/0ByLyArAQRhaBNDc3NjJjYTUtY2U0Yi00NmIwLThkNTgtYzNlYThmNGE1ZTEz/edit
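For readers working in a script instead of a spreadsheet, here are Python equivalents of the slide 36 formulas. They are hypothetical helper functions whose names simply mirror Excel's. The percent-change function keeps the slide's ABS(original_value) denominator, so a rise from -50 to -25 still reads as a positive change.

```python
def percent_change(original_value, new_value):
    """The slide's formula: (new_value - original_value) / ABS(original_value)."""
    if original_value == 0:
        raise ValueError("percent change is undefined when the original value is 0")
    return (new_value - original_value) / abs(original_value)


def concatenate(*parts, sep=" "):
    """Merge several column values into one string, like CONCATENATE."""
    return sep.join(str(part) for part in parts)


def mid(text, start, length):
    """Excel-style MID: start is 1-based, length is the number of characters."""
    return text[start - 1:start - 1 + length]


if __name__ == "__main__":
    print(percent_change(120, 150))                 # 0.25, i.e. +25%
    print(concatenate("Jacksonville", "FL", sep=", "))
    print(mid("32202-1234", 1, 5))                  # the five-digit ZIP
```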
  • 37. Refining and cleaning data • Sometimes Excel and Google Spreadsheets aren’t enough, especially when working with large datasets. • Google Refine – a free tool that lets you explore, sort and process data. • Useful for finding and fixing errors and inconsistencies; a “power tool for working with messy data.” • Facets to sort data • Cleaning with clusters • Shan Carter’s Mr. Data Converter converts spreadsheets to more web-friendly formats.
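Google Refine (since renamed OpenRefine) clusters near-duplicate spellings using several methods. One of them, key collision on a "fingerprint," is easy to approximate. The simplified Python sketch below assumes this fingerprint recipe:

1. Trim and lowercase the value.
2. Strip accents and punctuation.
3. Sort the unique words.

OpenRefine's own implementation differs in the details, so treat this as an illustration of the idea rather than a replica.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified fingerprint key: lowercase, no accents or punctuation, sorted unique words."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))  # drop accents
    text = re.sub(r"[^\w\s]", "", text)                                  # drop punctuation
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values sharing a fingerprint; keep groups with 2+ distinct spellings."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    names = ["Duval County", "duval county ", "County, Duval", "Clay County"]
    print(cluster(names))
```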
  • 38. Other data analysis tips and tricks • Put field names in the first row. • Put geographic data in the first columns. • When you have two different datasets, a good tool to merge them is Google Fusion Tables (make sure they share a common attribute). • Never round until the end of your calculations; round to two decimal places for visualization purposes. • Cut and paste calculations into a new column as values only. • Know the principal data types (integer, real, string, boolean), and make sure numeric data is classified as either integer (whole numbers only) or real (any value).
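Merging two datasets on a shared attribute, which is what Fusion Tables does, is a join. The minimal Python sketch below performs a left join on hypothetical field names. Rows without a match are kept rather than silently dropped. If the right-hand data repeats a key, the last occurrence wins. The usage lines also follow slide 38's advice to round only at the very end.

```python
def merge_on(left_rows, right_rows, key):
    """Left join: copy right-hand fields into each left row whose key matches.

    Unmatched left rows keep only their own fields; on a field-name clash
    the left row's value wins.
    """
    lookup = {row[key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        match = lookup.get(row[key], {})
        merged.append({**match, **row})
    return merged


if __name__ == "__main__":
    votes = [{"county": "Alpha", "votes": 300}, {"county": "Beta", "votes": 150}]
    population = [{"county": "Alpha", "population": 1200}]
    merged = merge_on(votes, population, "county")
    print(merged)
    rate = merged[0]["votes"] / merged[0]["population"] * 100  # full precision
    print(round(rate, 2))                                       # round only for display
```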
  • 40. Planning your visualization • Identify your key message • Choose the best data series to illustrate your point • Consider the number of points in the data • Think about complementary/supporting datasets you can incorporate, e.g. sanitation with poverty. • Plan for user interaction, e.g. visual feedback. • Transform the raw data where it sharpens your point, e.g. absolute values vs. percent change • Brainstorm potential technologies • Consult experts on the topic to back up your interpretation of the data
  • 42. Choosing the right type of visualization • Change of a single variable over time: line chart. • Comparison of a single variable among multiple classes: bar chart. • Two variables: scatter plot, bubble chart. • Hierarchical data: treemap, bubble tree. • Area charts for area only • Makeup of a whole: pie chart. • Distribution: histograms, box-and-whisker plots. • Geographic data: point, polygon, choropleth and symbol maps. • Records: searchable database. • Chronological data: timeline, sparklines. • Other possibilities: matrices, heatmaps, games, slopegraphs, stepper graphics,
  • 43. Visualization design principles • Typography: clear, consistent, not distracting. • Use bold and a mix of serif/sans-serif to provide emphasis. • Don’t set type at an angle • Color: Let color correspond to a variable, design for accessibility, choose from the same side of the color wheel, consider cultural associations but avoid thematic palettes. Use Adobe Kuler or 0to255.com • Visual overload, emotional design, skeuomorphism. (Examples pictured: “No white type on black background,” “No angled type.”)
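The WCAG 2 contrast ratio between text and background colors is one concrete way to check "design for accessibility." The minimal Python sketch below implements the WCAG 2.x relative-luminance and contrast-ratio formulas. The 4.5:1 threshold is WCAG level AA for normal-size text. Colors are assumed to be six-digit #rrggbb strings.

```python
def _linear(channel_8bit):
    """Linearize one 0-255 sRGB channel, per the WCAG 2 relative-luminance definition."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance (0 = black, 1 = white) of a '#rrggbb' color."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground, background):
    """(lighter + 0.05) / (darker + 0.05); ranges from 1:1 up to 21:1."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(foreground, background):
    """WCAG 2 level AA asks for at least 4.5:1 for normal-size text."""
    return contrast_ratio(foreground, background) >= 4.5


if __name__ == "__main__":
    for gray in ("#767676", "#777777"):
        print(gray, round(contrast_ratio(gray, "#ffffff"), 2), passes_aa_normal_text(gray, "#ffffff"))
```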
  • 44. Visualization design principles • Some guidelines for graphical integrity, according to Edward Tufte in The Visual Display of Quantitative Information: 1. The representation of numbers should be directly proportional to the numerical quantities represented. 2. Clear, detailed labeling throughout. 3. Show data variation, not design variation. 4. Avoid excessive and unnecessary use of graphical effects. (Pictured: what Edward Tufte calls “the worst visualization ever published.”)
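Tufte quantifies his first principle with the "Lie Factor." It is the size of the effect shown in the graphic divided by the size of the effect in the data. The size of an effect is the relative change, (second value - first value) / first value. The minimal Python sketch below uses hypothetical numbers in its examples.

```python
def effect_size(first, second):
    """Relative change between two values: (second - first) / first."""
    return (second - first) / first


def lie_factor(data_first, data_second, shown_first, shown_second):
    """Effect shown in the graphic divided by the effect in the data.

    shown_* are the graphic's physical measurements (bar lengths, areas, ...).
    1.0 means the graphic is proportional to the numbers.
    """
    return effect_size(shown_first, shown_second) / effect_size(data_first, data_second)


if __name__ == "__main__":
    # Data rises 50% (100 -> 150).
    print(lie_factor(100, 150, 2, 3))     # bar grows 50% as well: 1.0
    print(lie_factor(100, 150, 1, 2))     # bar doubles: 2.0
    print(lie_factor(100, 150, 1, 2.25))  # icon scaled 1.5x in both directions: area x2.25
```

The last example shows a common trap. Scaling both the width and the height of an icon by 1.5 to show a 50% increase makes its area 2.25 times larger, which gives a Lie Factor of 2.5.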
  • 45. Visualization design principles • Design for the eye • The user should be able to discern the key message visually. • Design for interaction • Highlighting and details on demand (example) • User-driven content selection (example)
  • 48. Visualization design principles (pictured examples labeled “Awful,” “Awful, but better” and “Not bad”)
  • 49. Visualization design principles: What’s wrong with this infographic?
  • 50. “Four Ways to Slice Obama’s Budget Proposal” • From NYTimes.com: http://www.nytimes.com/interactive/2012/02/13/us/politics/2013-budget-proposal-graphic.html • What makes this visualization effective? How does it approach color, complexity, interactivity and typography? How does it avoid visual overload?
  • 51. Wireframing/prototyping • Follow a structured grid system (e.g., a 12-column, 960px grid – see 960.gs and Subtraction). • Very selectively, you can break the grid to emphasize a certain visual element. • Sketch out/prototype your wireframe on paper first (print templates such as this)
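The arithmetic behind a 960px, 12-column grid is simple enough to script. This sketch assumes 960.gs-style defaults: each of the 12 column units is 80px wide, including a 10px margin on each side (a 20px gutter). Other grid systems use different numbers, so treat the parameters as adjustable assumptions.

```python
def span_width(columns, total_width=960, column_count=12, gutter=20):
    """Content width in px of an element spanning `columns` grid columns.

    Assumes each column unit (total_width / column_count) includes half a
    gutter on each side, so the content box is unit * columns - gutter.
    """
    if not 1 <= columns <= column_count:
        raise ValueError("columns must be between 1 and column_count")
    unit = total_width / column_count
    return unit * columns - gutter


if __name__ == "__main__":
    for n in (1, 4, 6, 12):
        print(n, "columns:", span_width(n), "px")
```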
  • 52. Selecting tools/technologies • A wealth of free, open-source data visualization tools and libraries exist to shorten development times • Examples: Google Visualization API, Google Fusion Tables, Highcharts.js, CartoDB, d3.js, Tableau Public. • For everything else, HTML5 + CSS + JavaScript
  • 53. IV. Building a Web app
  • 54. Web app anatomy Three components of a Web app: 1. HTML (structure) 2. CSS (styles) 3. JavaScript (interactivity)
  • 55. Parts of an HTML file An HTML file is made up of: 1. Doctype declaration 2. Head <head> 3. CSS/JavaScript references 4. Title <title> 5. Body <body> 6. A Div container 7. Divs (IDs and classes)
  • 56. Parts of a CSS file A CSS file is made up of: 1. Container ID 2. Default paragraph (p) style 3. Default H1, H2, etc. styles 4. Default .body style 5. Styles for all divs
  • 58. Maps 101 • Interactive maps combine geocoded data – points or polygons – with metadata and/or numeric data. • KML (Keyhole Markup Language) is quickly becoming a popular file format, but the Shapefile (shp.zip) is still the most widely available • Geographic data can either be geocoded, downloaded from the Web, or custom-drawn. • A good purveyor of news maps: The Texas Tribune.
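The following minimal Python sketch shows what a KML file actually contains. It writes geocoded points as KML Placemarks using only the standard library. It assumes the KML 2.2 namespace, and the place name and coordinates are illustrative only. Note that KML lists coordinates as longitude first, then latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def points_to_kml(points):
    """Build a minimal KML document from (name, latitude, longitude) tuples."""
    ET.register_namespace("", KML_NS)  # serialize with a default xmlns, no prefix
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in points:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML order is longitude,latitude[,altitude].
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    # Illustrative point near downtown Jacksonville.
    print(points_to_kml([("Polling place A", 30.3322, -81.6557)]))
```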
  • 59. Mapping services and libraries • Google Fusion Tables – Quick, versatile and classic maps that integrate seamlessly with the Google Maps JavaScript API. • CartoDB – A newer open-source tool much like Fusion Tables, but with a better looking out-of-the-box experience. • Leaflet – An open-source, client-side mapping library with an API that allows you to achieve a number of advanced features. Plays nicely with Fusion Tables and CartoDB-hosted maps. Part of CloudMade suite.
  • 60. Handy desktop mapping software • QGIS – Free program that supports almost every conceivable map file type, and allows you to add or manipulate vector data, which can then be exported as a KML or Shapefile package. • TileMill – Map creation and styling software; ideal for those with little programming experience. UTF-grid enabled tilesets only.
  • 61. Primary map types • Choropleth – Colors for each geometry correspond to numeric values of a given variable. • Point – Locations on a map displayed by geocoded markers. • Less frequently: proportional maps and geo maps. (Pictured: a choropleth map of Georgia voter turnout; a point map of Jacksonville polling locations.)
  • 62. Tips and tricks • If you have street address data, you can use BatchGeocode to convert it to lat-long coordinates. • For choropleth maps: • Include no more than five fill colors or “buckets” • Don’t define an equidistant color ramp; use ColorBrewer instead. • Use MarkerClusterer when there are too many points for certain zoom levels. (Pictured: using ColorBrewer to define an accurate, accessible color ramp; using MarkerClusterer to cluster points at further zoom levels.)
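Bucketing is the step that decides which of the five colors each geometry gets. The minimal Python sketch below compares two ways of doing it:

- An equal-interval split, where each bucket covers the same range of values.
- A quantile split, one common alternative, where each bucket holds roughly the same number of values.

The data is hypothetical, and the quantile version needs Python 3.8+ for statistics.quantiles. The returned bucket index would then pick a color from a ColorBrewer palette.

```python
import bisect
import statistics


def equal_interval_buckets(values, buckets=5):
    """Split the min-max range into equal-width classes (one outlier can crowd the rest)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / buckets
    return [min(int((v - lo) / width), buckets - 1) for v in values]


def quantile_buckets(values, buckets=5):
    """Assign each value a class 0..buckets-1 so classes hold roughly equal counts."""
    breaks = statistics.quantiles(values, n=buckets)  # buckets - 1 cut points
    return [bisect.bisect_right(breaks, v) for v in values]


if __name__ == "__main__":
    skewed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical county values
    print(equal_interval_buckets(skewed))  # almost everything lands in one color
    print(quantile_buckets(skewed))        # colors spread across all five buckets
```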
  • 63. Tips and tricks • To convert Shapefiles so they can be imported into Fusion Tables, either use Shape to Fusion, or export it as KML from CartoDB. • Before using the embed tool in Fusion Tables or CartoDB, make sure the map is centered where you want it. • Ensure your map is set to “Public.” Export a Shapefile as KML in CartoDB. Making your map public in Fusion Tables
  • 65. Charts • Basic building block of visualization • Simple, but also easy to mess up. • Should always be interactive. • Should always include data source. • Should always include a legend. • Unless necessary, only show labels on mouseover.
  • 66. Interactive charting tools • Out-of-the-box: Google Drive charts, infogr.am. • More advanced: Google Code Playground. • Most agile: Highcharts.js. • Most extensible: Tableau Public (Pictured: a combo chart made using Highcharts.js.)
  • 67. Charting best practices • Color: Pick a palette of no more than 3-4 colors from the same side of the color wheel. • Increments: Use natural increments like (0, 2, 4, 6...) instead of, say, (0, 3, 6, 9...) • Scale: Don’t plot two unrelated series with one scale on the left and one on the right. • Style: Flat and simple. No 3D effects, shadows, narrow bars or distracting shading. (Pictured examples: “Don’t plot two different variables on same scale,” “Bars too narrow,” “Distracting shading,” “Misleading 3D effects,” “Pointless shadows.”) Source: The Wall Street Journal Guide to Information Graphics, Dona M. Wong.
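A common way to get "natural" increments is the "nice numbers" heuristic. It picks the smallest step of the form 1, 2 or 5 × 10^k that keeps the axis to a handful of ticks. The minimal Python sketch below implements that heuristic, which is not a method taken from the Wong guide. When the data's low value is 0, the ticks also start at a zero baseline.

```python
import math


def nice_step(span, max_ticks=5):
    """Smallest step of the form 1, 2 or 5 x 10^k giving at most max_ticks intervals."""
    if span <= 0:
        raise ValueError("span must be positive")
    raw = span / max_ticks
    magnitude = 10 ** math.floor(math.log10(raw))
    for multiple in (1, 2, 5, 10):  # 10 x magnitude always exceeds raw
        if multiple * magnitude >= raw:
            return multiple * magnitude


def axis_ticks(low, high, max_ticks=5):
    """Tick positions covering [low, high] in nice increments."""
    step = nice_step(high - low, max_ticks)
    first = math.floor(low / step)
    last = math.ceil(high / step)
    return [i * step for i in range(first, last + 1)]


if __name__ == "__main__":
    print(axis_ticks(0, 10))   # steps of 2
    print(axis_ticks(0, 450))  # steps of 100
```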
  • 68. Charting best practices • Always set the baseline to zero. • Always order starting with the greatest value • Use broken bars sparingly • No more than five slices on pie charts; no “donut” pie charts. • No more than 3-4 lines on a line chart (Pictured examples: wrong order vs. right order; wrong baseline vs. right baseline; no donut pies.) Source: The Wall Street Journal Guide to Information Graphics, Dona M. Wong.
  • 70. Utilizing JavaScript/HTML5 libraries • Together, JavaScript, HTML5 and jQuery have expanded the boundaries of data visualization • An abundance of open-source libraries and packages means less programming is required to produce unique, interactive visualizations. • Examples: Timeline.js, Bubbletree.js, Raphael.js, ProPublica tools
  • 71. The HTML5 revolution • Adobe Edge for HTML5 development; the end of Flash’s reign • Platform-agnostic, mobile-first movement • Forking resources and packages off GitHub
  • 72. Pushing the limits • RaphaelJS for easier manipulation of scalable vector graphics (SVG) • Other boundary-pushing data visualization projects: Processing, Gephi, d3.js, IBM’s Many Eyes. (Pictured: a network map produced using D3.js.)
  • 73. Helpful resources and communities • Blogs/Tutorials: FlowingData.com, Vis4.net, Driven-by-data.net, Chryswu.com, datavisualization.ch • Books: The Data Journalism Handbook, O’Reilly Media. Visualize This: The FlowingData Guide to Design, Visualization, and Statistics, Nathan Yau. The Wall Street Journal Guide to Information Graphics, Dona M. Wong. • Communities: visual.ly, Hacks/Hackers, NICAR. (Pictured: the free data journalism handbook from O’Reilly Media.)
  • 74. For slides and list of links, http://bit.ly/NIXkOD @carlvlewis