1. Seminar 11
Maps, Timelines, Big Data, and
Visualization
Introduction to the Digital Liberal Arts
MDST 3703 / 7703
Fall 2010
2. Business
• Quiz 2 available in Collab after this class
• Same protocol as Quiz 1 and Midterm
• Due by start of class Thursday
3. Review
• http://www.lkozma.net/wpv/index.html
• The Blogosphere, Wikiverse and other “regions”
of the web have produced massive, aggregated
sources of information – Big Data
• An unintended consequence of this is that these
sources are now being mined for patterns
– Freebase, dbPedia, Facebook, etc.
• As a result, a new level of information is emerging
on the web – the datasphere
4. Overview
• The Datasphere raises two big questions
– What can we do with it?
– What will it do with us?
• Today, we look at both questions
6. Different Approaches
• Traditional approaches
– Geographical data (Elliott and Gillies)
– Historical data (Robertson)
• Radical approaches
– Distant Reading (Moretti)
– Cultural Analytics (Manovich)
7. Geographical Data (Places)
• Geographical data are low-hanging fruit
– Names can be extracted from a variety of sources
and then “meshed” with gazetteers
– e.g. GeoNames http://www.geonames.org/
• Maps can help visualize that data
• Maps can also serve as an interface to the
data
• Elliott and Gillies exemplify this approach in
Classics
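A minimal sketch of the "meshing" step described above, assuming a tiny hand-made gazetteer in place of a real service such as GeoNames. The names GAZETTEER and find_places are hypothetical, and the coordinates are approximate.

```python
import re

# Hypothetical mini-gazetteer: place name -> (latitude, longitude).
# Coordinates are approximate and stand in for a real gazetteer lookup.
GAZETTEER = {
    "Athens": (37.98, 23.73),
    "Rome": (41.90, 12.50),
    "Alexandria": (31.20, 29.92),
}

def find_places(text, gazetteer=GAZETTEER):
    """Return (name, lat, lon) for each gazetteer name in text, in order of first appearance."""
    found = []
    seen = set()
    # Treat every capitalized word as a candidate place name.
    for word in re.findall(r"\b[A-Z][a-z]+\b", text):
        if word in gazetteer and word not in seen:
            seen.add(word)
            lat, lon = gazetteer[word]
            found.append((word, lat, lon))
    return found

sample = "The envoy sailed from Alexandria to Rome, then returned to Alexandria."
places = find_places(sample)
print(places)
```

The resulting points could be plotted on a map, or used the other way around as an index from the map back into the text.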
9. Historical Data (Events)
• HEML (Historical Event Markup Language)
provides a model for defining events
– Written in RDF
• Can be used to extract events from texts or
convert from other formats
– CIDOC-CRM
– Semantic MediaWiki
• These can be aggregated and visualized
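A minimal sketch of the aggregation step, written in plain Python rather than HEML or RDF. The Event fields (label, year, place) and the function merge_timeline are assumptions made for illustration, not HEML's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    label: str
    year: int   # negative values are BCE
    place: str

def merge_timeline(*sources):
    """Combine event lists from several sources, drop duplicates, and sort by year."""
    unique = set()
    for source in sources:
        unique.update(source)
    return sorted(unique, key=lambda e: (e.year, e.label))

# Two hypothetical sources that overlap on one event.
source_a = [
    Event("Battle of Marathon", -490, "Marathon"),
    Event("Battle of Actium", -31, "Actium"),
]
source_b = [
    Event("Battle of Salamis", -480, "Salamis"),
    Event("Battle of Marathon", -490, "Marathon"),
]

timeline = merge_timeline(source_a, source_b)
for e in timeline:
    era = "BCE" if e.year < 0 else "CE"
    print(f"{abs(e.year)} {era}: {e.label} ({e.place})")
```

Because every event has the same shape, events from different sources can be merged and sorted without knowing where they came from.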
14. Cultural Analytics
• Lev Manovich
• Applies interactive visualization to Big Data
• http://lab.softwarestudies.com/2008/09/cultural-analytics.html
15. Distant Reading
• Franco Moretti
• Part of a long tradition of “statistical criticism”
• Influenced by the French historian Fernand
Braudel
16. One of Moretti’s graphs shows the emergence of the market for
novels in Britain, Japan, Italy, Spain, and Nigeria between about
1700 and 2000. In each case, the number of new novels
produced per year grows -- not at the smooth, gradual pace one
might expect, but with the wild upward surge one might expect of
a lab rat’s increasing interest in a liquid cocaine drip.
“Five countries, three continents, over two centuries apart,” writes
Moretti, “and it’s the same pattern ... in twenty years or so, the
graph leaps from five [to] ten new titles per year, which means
one new novel every month or so, to one new novel per week.
And at that point, the horizon of novel-reading changes. As long
as only a handful of new titles are published each year, I mean,
novels remain unreliable products, that disappear for long
stretches of time, and cannot really command the loyalty of the
reading public; they are commodities, yes, but commodities still
waiting for a fully developed market.”
But as that market emerges and consolidates itself -- with at least
one new title per week becoming available -- the novel becomes
“the great capitalist oxymoron of the regular novelty: the
unexpected that is produced with such efficiency and punctuality
that readers become unable to do without it.”
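The arithmetic in the quotation can be checked with a short sketch: it converts a publication rate into the average gap between new titles. The function name days_between_titles is hypothetical.

```python
def days_between_titles(titles_per_year, days_in_year=365):
    """Average number of days between new titles at a given yearly rate."""
    return days_in_year / titles_per_year

# Five and ten titles a year (the low end in the quotation), then one a week.
for rate in (5, 10, 52):
    print(rate, "titles/year ->", round(days_between_titles(rate), 1), "days apart")
```

Ten titles a year works out to roughly one every five weeks, which is the "one new novel every month or so" of the quotation, and one title a week corresponds to 52 a year.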
18. What are some similarities and
differences between the traditional
and radical approaches?
20. Digital Traditionalists and Radicals
• Similarities
– Visualization
– Pattern recognition
– Desire to express data in terms of RDF, etc.
• Allows programs to aggregate, mash up, and analyze
data
• Differences
– Traditionalists favor metadata and ontologies,
whereas the radicals believe the data will “speak for
themselves”
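A minimal sketch of why a shared statement format such as RDF lets programs aggregate and mash up data: statements from two hypothetical sources merge with a set union and can then be queried together. It uses plain Python tuples rather than an RDF library, and the ex: identifiers are invented for illustration.

```python
# Each statement is a (subject, predicate, object) triple.
source_a = {
    ("ex:Moretti", "ex:associatedWith", "ex:DistantReading"),
    ("ex:Manovich", "ex:associatedWith", "ex:CulturalAnalytics"),
}
source_b = {
    ("ex:DistantReading", "ex:approach", "ex:Radical"),
    ("ex:CulturalAnalytics", "ex:approach", "ex:Radical"),
    ("ex:Moretti", "ex:associatedWith", "ex:DistantReading"),  # duplicate of a statement in source_a
}

# Aggregation is a set union; the duplicate statement collapses.
graph = source_a | source_b

def objects(graph, subject, predicate):
    """All objects of statements matching the given subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)

# A query that needs both sources: who is associated with a radical approach?
radicals = sorted(
    s for s, p, o in graph
    if p == "ex:associatedWith" and "ex:Radical" in objects(graph, o, "ex:approach")
)
print(radicals)
```

Neither source alone can answer the final query; the answer only appears once the statements are combined.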