1. How and why study big
visual cultural data
Dr. Lev Manovich
Professor, CUNY Graduate Center
manovich.lev@gmail.com
softwarestudies.com
Fall 2012 version
softwarestudies.com 1
3. Software Studies Initiative - 2007
NEH Office for Digital Humanities - 2008
NEH Humanities High Performance Computing - 2008
NEH/NSF Digging Into Data competition - 2009
Computational Social Science - 2009
Culturomics and Google Ngram Viewer - 2010
New York Times: “The next big idea in language,
history and the arts? Data.”- 2010
4. How can we take advantage of unprecedented
amounts of cultural data available on the web
and digitized cultural heritage to begin analyzing
cultural processes in new ways?
How can computational analysis of massive
cultural datasets and real-time flows help us
develop theories and methods in the humanities
adequate for the scale and speed of 21st-century
global networked digital culture?
5. NEH/NSF Digging into Data competition (2009):
“How does the notion of scale affect
humanities and social science research?
Now that scholars have access to huge
repositories of digitized data—far more than
they could read in a lifetime—what does that
mean for research?”
7. 1 study societies through social media
traces (social computing)
2 more inclusive understanding of cultural
history and present (using much larger
samples)
3 detect large scale cultural patterns
8. 4 generate multiple maps of the same cultural
data sets (multiple “landscapes”)
5 the best way to follow global professionally
produced digital culture; understand newly
developed cultural fields (“X” design)
6 map cultural variability and diversity
10. Example - graph from Ted Underwood, “The Differentiation of Literary
and nonliterary diction, 1700-1900.” Data: 3,724 18th century volumes,
using 10,000 most frequent words (excluding proper nouns).
11. modern (19th-20th centuries) social and
cultural theory: describe what is similar
(classes, structures, types) / statistics
(reduction)
computational humanities and social science
should focus on describing what is different /
variability / diversity
“from data to knowledge” is wrong. In the
study of culture, we need to go from our
(incomplete, biased) knowledge to actual
cultural data
12. “We are no longer interested in the conformity
of an individual to an ideal type; we are now
interested in the relation of an individual to the
other individuals with which it interacts...
Relations will be more important than
categories; functions, which are variable, will
be more important than purposes; transitions
will be more important than boundaries;
sequences will be more important than
hierarchies.”
Louis Menand on Darwin, 2001.
14. Manuel De Landa:
“The ontological status of assemblages, large
and small, is always that of unique, singular
individuals.”
“Unlike taxonomic essentialism in which
genus, species and individuals are separate
ontological categories, the ontology of
assemblages is flat since it contains nothing
but differently scaled individual singularities.”
source: A New Philosophy of Society.
15. Bruno Latour:
“The ‘whole’ is now nothing more than a
provisional visualization which can be
modified and reversed at will, by moving back
to the individual components, and then
looking for yet other tools to regroup the same
elements into alternative assemblages.”
source: “Tarde’s idea of quantification.” In
The Social After Gabriel Tarde: Debates and
Assessments.
16. How to study big cultural
visual data in practice?
How to explore massive visual collections
(exploratory media analysis)?
Which data analysis and visualization
techniques are appropriate for non-technical
users? How to democratize data analysis?
21. our media visualization software on newer
display wall with thin bezels
(data: 4535 Time magazine covers)
22. mediavis - related research:
M. Worring, G.P. Nguyen. Interactive access to large
image collections using similarity-based visualization.
Journal of Visual Languages and Computing 19 (2008)
(submitted 2005).
Gerald Schaefer. Interactive Browsing of Image
Repositories. ICVG 2012.
Jing et al., Google Inc. Google Image Swirl: A Large-Scale
Content-Based Image Visualization System. WWW 2012.
23. mediavis vs. normal
computer science approach:
borrow techniques from media art, digital art,
information visualization / for non-technical users
explore the possibilities of simplest techniques by
using them with media collections from every area
of humanities
use mediavis to challenge existing concepts and
assumptions of humanities
25. Basic media visualization
techniques:
1 montage: sort images using metadata
2 slice: sample images and arrange using
metadata
3 image plot: automatically measure image
properties (features) and organize in 2D using
these measurements and metadata
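The first technique above, montage, can be sketched roughly as follows. This is only an illustration, not the project's software: the `Item` record, its `year` field, the file names, and the `montage_layout` function are all hypothetical, and the sketch only computes grid positions rather than rendering images.

```python
from dataclasses import dataclass


@dataclass
class Item:
    filename: str  # hypothetical file name
    year: int      # metadata field used for sorting (assumed)


def montage_layout(items, columns, key=lambda it: it.year):
    """Sort items by metadata, then fill a grid left to right, top to bottom.

    Returns a list of (item, row, col) triples.
    """
    ordered = sorted(items, key=key)
    return [(it, i // columns, i % columns) for i, it in enumerate(ordered)]


# Made-up example data: four covers placed in a three-column grid.
covers = [
    Item("c.jpg", 1990),
    Item("a.jpg", 1923),
    Item("b.jpg", 1950),
    Item("d.jpg", 2009),
]
layout = montage_layout(covers, columns=3)
placed = [(it.filename, row, col) for it, row, col in layout]
```

Here `placed` lists the covers in date order with their grid cells; an image library would then paste each thumbnail at its cell.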
27. 1 montage close up: Time magazine covers, 1920s
28. 1 montage close up: Time magazine covers, 1990s-2000s
29. 2 slice: sample images and arrange using metadata
4535 Time covers, 1923-2009. Each line is a vertical slice through the center of an image.
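A minimal sketch of the slice technique, assuming each image is a plain list of pixel rows and each image comes paired with a date. The function names and the toy data are hypothetical, and real images would be loaded with an image library rather than typed in.

```python
def center_slice(image):
    """Return the one-pixel-wide vertical column through the center of an image.

    `image` is a list of rows; each row is a list of pixel values.
    """
    mid = len(image[0]) // 2
    return [row[mid] for row in image]


def slice_visualization(dated_images):
    """Build an output image whose i-th column is the center slice of the
    i-th image, with images ordered by date.

    `dated_images` is a list of (date, image) pairs.
    """
    ordered = sorted(dated_images, key=lambda pair: pair[0])
    columns = [center_slice(img) for _, img in ordered]
    height = min(len(col) for col in columns)  # crop to the shortest slice
    return [[col[y] for col in columns] for y in range(height)]


# Toy 2x3 "images" with made-up pixel values.
img_a = [[1, 2, 3], [4, 5, 6]]
img_b = [[7, 8, 9], [10, 11, 12]]
result = slice_visualization([(1950, img_b), (1923, img_a)])
```

Each input image contributes exactly one column, so the width of the result equals the number of images.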
31. 3 image plot: organize images using features and
(optionally) metadata
Image plots of 4535 Time covers, 1923-2009. X-axis = date; Y-axis = saturation mean.
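As a rough illustration of how such a plot could be computed, the sketch below measures mean saturation with Python's standard `colorsys` module and pairs it with a date to give each image an (x, y) position. The function names and sample pixels are hypothetical; the slides do not say how the actual software computes saturation.

```python
import colorsys


def saturation_mean(pixels):
    """Mean HSV saturation (0..1) of (r, g, b) pixels with 0..255 channels."""
    sats = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] for r, g, b in pixels]
    return sum(sats) / len(sats)


def image_plot_coords(items):
    """Map each (name, year, pixels) record to (name, x, y),
    where x is the date and y is the saturation mean."""
    return [(name, year, saturation_mean(pixels)) for name, year, pixels in items]


# Made-up sample: one pure red pixel and one gray pixel.
red = (255, 0, 0)
gray = (128, 128, 128)
coords = image_plot_coords([
    ("cover_1923", 1923, [gray, gray]),
    ("cover_2009", 2009, [red, gray]),
])
```

Any other measured feature (brightness mean, median hue) could replace `saturation_mean` on either axis.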
33. Comparing a number of image sets with image plots
Selected paintings by six impressionist artists. X-axis = mean saturation. Y-axis =
median hue. Megan O’Rourke, 2012.
35. visualizing video
collections:
use media visualization with a set of
keyframes
automatic selection of key frames
(for example, using free shot detection
software)
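For illustration only, a very simple stand-in for shot detection software: mark a new shot wherever the mean absolute difference between consecutive frames exceeds a threshold, and keep the first frame of each shot as its key frame. This is not the tool used for the slides. Frames are assumed to be flat lists of brightness values, and the threshold and function names are hypothetical.

```python
def mean_abs_diff(a, b):
    """Mean absolute per-pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def shot_start_indices(frames, threshold):
    """Indices of frames that begin a new shot: frame 0, plus every frame
    that differs from its predecessor by more than `threshold`."""
    starts = [0] if frames else []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            starts.append(i)
    return starts


# Made-up two-pixel frames forming three shots.
frames = [[10, 10], [12, 11], [200, 190], [198, 191], [20, 30]]
key_frames = shot_start_indices(frames, threshold=50)
```

A fixed threshold misses gradual transitions such as fades and dissolves, which is one reason to prefer dedicated shot detection software.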
36. Kingdom Hearts video game
62.5 hr. of game play, 29 sessions over 20 days.
montage: 1 frame per 3 sec (22500 frames in total)
39. 11th Year (Dziga Vertov, 1928): first frame of every shot
40. 11th Year (Dziga Vertov, 1928): comparing first
and last frame in every shot (close-ups from
the larger visualization)
41. Why use numbers?
Using numbers to describe
cultural artifacts allows us to
replace discrete
categories (words) with
continuous descriptions
(curves)
42. 1 from timelines to graphs
2 better represent analog attributes
of cultural artifacts
3 map cultural landscapes (fuzzy /
overlapping / hard clusters?)
4 visualize cultural variability
5 discover new groupings
43. 1 from timelines to curves Mark Rothko, 393 paintings (1927-1970).
X - year. Y - brightness mean. Hao Wang and Mayra Vasquez.
44. 2 better represent analog attributes of cultural artifacts
Next slide:
close-up of a visualization showing average amount of
visual change (bar graph) in every shot in Vertov’s
11th Year. Images above the bar: first frame of every
shot.
To measure visual change per shot:
1) calculate brightness mean of the difference image
between each pair of consecutive frames in the shot
2) add all means
3) divide by number of frames in the shot
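The three steps can be sketched as below. This is one reading of the recipe, with assumptions stated: frames are flat lists of grayscale brightness values, the "difference image" is the absolute per-pixel difference, and differences are taken between consecutive frames. The function names are hypothetical.

```python
def brightness_mean(image):
    """Mean brightness of a frame given as a flat list of pixel values."""
    return sum(image) / len(image)


def difference_image(a, b):
    """Absolute per-pixel difference between two frames (an assumption;
    the slide does not say whether the difference is signed)."""
    return [abs(x - y) for x, y in zip(a, b)]


def visual_change(shot):
    """Average visual change in a shot, following the three steps:
    1) brightness mean of each consecutive-frame difference image,
    2) sum of those means,
    3) division by the number of frames in the shot."""
    means = [brightness_mean(difference_image(a, b)) for a, b in zip(shot, shot[1:])]
    return sum(means) / len(shot)


# Made-up shot of four two-pixel frames.
shot = [[0, 0], [10, 20], [10, 20], [40, 20]]
change = visual_change(shot)
```

Dividing by the frame count, as step 3 says, rather than by the number of frame pairs makes the value slightly smaller for short shots; the sketch follows the slide's wording.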
46. 3 the maps of cultural landscapes reveal fuzzy and
overlapping clusters - rather than discrete categories
with hard boundaries
47. 4 visualize the space of variations
600 variations of Google Logo, 1998-2009
54. 776 Vincent van Gogh paintings. X - year/month. Y - brightness mean.
55. Current / recent projects
at softwarestudies.com:
6000+ paintings of French Impressionists
7000-year-old stone arrowheads
(with UCSD anthropologist)
56. samples from 4.7 million newspaper pages
collection from Library of Congress (UCSD
undergraduate students)
virtual world / game analytics (funded by NSF
EAGER, with UCSD Experimental Games Lab)
comparing Art Now & Graphic design Flickr
groups (340,000 images)
(with CS collaborator from Lawrence Berkeley
National Laboratory)
57. Big project supported by Mellon Foundation
Grant, 2012-2015
- tools and workflows for working with image
and video collections using SEASR / MEANDRE
digital humanities workflow platform
- applications:
1) 1+ million images + millions of metadata
records from deviantArt (the largest social
network for user-created art - 20 M users, 240 M
artworks).
2) 1+ million manga pages.
3) thousands of hours of TV political news and
online video
59. “The capacity to collect and analyze massive amounts
of data has transformed such fields as biology and
physics. But the emergence of a data-driven
'computational social science' has been much slower.
Leading journals in economics, sociology, and political
science show little evidence of this field. But
computational social science is occurring in Internet
companies such as Google and Yahoo, and in
government agencies such as the U.S. National
Security Agency.”
“Computational Social Science.” Science, vol. 323,
6 February 2009.
60. Massive amounts of cultural content and online
conversations, opinions, and cultural activities
(general and specialized social media networks;
personal and professional web sites).
This data offers us unprecedented opportunities to
understand cultural processes and their dynamics
and develop new concepts and models which can be
also used to better understand the past.
Currently only analyzed by Google, Facebook,
YouTube, Bluefin Labs, The Echo Nest, and other
companies, and computer scientists working in
“social computing”- not yet by humanists.
62. Our free open source software tools for
analyzing and visualizing large image and
video collections, publications and
projects:
softwarestudies.com
The tools run on Mac, PC, Unix.
All media visualizations in this presentation
were created by members of Software