Information Literacy Data Visualization Holly Willis Institute for Multimedia Literacy USC 10 Questions Georgia International Conference on Information Literacy
What is data visualization? #1
“… situations when quantified data which by itself is not visual… is transformed into a visual representation…” Lev Manovich, The Future of Learning Institutions in a Digital Age “… the process whereby abstract data is rendered visual in a manner that helps elucidate relationships and supports forms of analysis. . . ” data visualization
pragmatic versus aesthetic
* Edward Tufte, The Visual Display of Quantitative Information → clarity → precision → efficiency
What is its history? #2
Historical precedents: William Playfair | Jacques Bertin, Semiology of Graphics | John Tukey, Exploratory Data Analysis | Edward Tufte, The Visual Display of Quantitative Information
Balance of Trade; Prices, Wages and Reigns William Playfair “inventor of graphical forms”
Charles Minard Napoleon’s March to Moscow “best statistical graphic”
John Snow Cholera deaths in London “pioneer in disease mapping”
Étienne-Jules Marey motion studies “pioneer of dynamic graphics”
Henry Beck London Underground 1933 simplicity
Charles & Ray Eames Powers of Ten 1977 Henry Beck
Icaro Doria Angola China Colombia
NameVoyager 2007 Laura & Martin Wattenberg
We Feel Fine Jonathan Harris Sep Kamvar
Why is it so prevalent? #3
data deluge capacity databases social media
What’s the process? #4
acquire, parse, filter, mine, represent, refine, interact | Ben Fry, Visualizing Data
organize, simplify, compare, cause and effect, contrast, multiple views
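The stages named on this slide can be read as a pipeline, each step handing its output to the next. The following is a minimal Python sketch, not Fry's own code (his book works in Processing): a hard-coded CSV string stands in for acquiring data, and the function names, field names and values are all hypothetical. The refine and interact stages, which depend on a real drawing environment, are left out.

```python
import csv
import io

# Hypothetical raw data standing in for the "acquire" stage.
RAW = """name,year,count
alpha,2008,300
beta,2008,100
gamma,2007,200
,2008,
"""

def parse(text):
    """Parse: turn raw CSV text into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))

def filter_rows(rows, year):
    """Filter: keep only complete rows for a single year."""
    return [r for r in rows if r["name"] and r["count"] and r["year"] == str(year)]

def mine(rows):
    """Mine: compute each row's share of the total count."""
    total = sum(int(r["count"]) for r in rows)
    return [(r["name"], int(r["count"]) / total) for r in rows]

def represent(shares, width=40):
    """Represent: render the shares as a text bar chart, largest first."""
    lines = []
    for name, share in sorted(shares, key=lambda s: s[1], reverse=True):
        lines.append(f"{name:<10} {'#' * round(share * width)}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(represent(mine(filter_rows(parse(RAW), 2008))))
```

Keeping each stage a separate function mirrors the point of Fry's list: any stage can be revisited (say, a different filter) without redoing the others.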
What are the various types? #5
Data visualization genres: comparative analysis | time series | part-to-whole relationships | deviation analysis | multivariate analysis
What’s storytelling with data? #6
question + visual data + context = story | Matthias Shapiro, “Once Upon a Stacked Time Series: The Importance of Storytelling in Information Visualization”
“Stories have a marvelous way of focusing our attention and helping us discern why the data presented is important or relevant to some part of our lives.” Matthias Shapiro
Al Gore, An Inconvenient Truth
What are some tools? #7
Google Refine DataWrangler http://code.google.com/p/google-refine/ http://vis.stanford.edu/wrangler/ data cleaning
Google Fusion https://sites.google.com/site/fusiontablestalks/
Many Eyes http://www-958.ibm.com
Tableau Public http://www.tableausoftware.com/public
Who cares? #8
The significance of data visualization: proliferation of data | deeply rhetorical | lower barriers | competencies | 21st century citizenship
“What kinds of participation are opened up when cycles of everyday action and the representation of collective sovereignty are bound so much more closely within planetary information networks, now responsive on a molecular level?” Natalie Jeremijenko and Benjamin Bratton
“spectacularization of information” Natalie Jeremijenko and Benjamin Bratton
IBM | Motion Theory | Data Anthem | 2010
IBM | Motion Theory | Data Energy | 2010
IBM | Motion Theory | Data City | 2010
IBM | Motion Theory | Data Baby | 2010
IBM | Internet of Things | 2010
“visualization criticism” Robert Kosara, “Visualization Criticism – The Missing Link Between Information Visualization and Art”
Enhanced State of the Union address | 2010
Enhanced State of the Union address | 2010 No visualization
Enhanced State of the Union address | 2010 Meaningless illustration
Enhanced State of the Union address | 2010 Rhetorical juxtaposition
Enhanced State of the Union address | 2010 Text visualization
Enhanced State of the Union address | 2010 Rhetorical Illustration
Enhanced State of the Union address | 2010 Data visualization
Enhanced State of the Union address | 2010 Rhetorical visualization
Enhanced State of the Union address | 2010 Mendacious visualization
State of the Union Visualizer | Brad Borevitz | 2010
George W. Bush SOTU | 2001 Harry S. Truman SOTU | 1953
George W. Bush SOTU | 2001 George W. Bush SOTU | 2003
Barack Obama | State of the Union | 2010
State of the Union | Lenka Clayton | 2003 Data Sonification
Connection to the digital humanities? #9
What’s the future? #10
One of the challenges as we embark on a discussion of data visualization is the lack of a single definition. Because data visualization occurs across multiple disciplines, from computer science to graphic design, and from media literacy courses to fine arts programs, the terminology varies tremendously. Generally, however, data visualization attempts to mobilize our visual acuity to detect patterns and trends, and when done well, to reveal things that might not be known otherwise.
Lev Manovich notes that visualizations are “situations when quantified data which by itself is not visual… is transformed into a visual representation” (26). Here the emphasis falls on the transformation of material that is not visual into a form that is. Data visualization, then, is the process whereby abstract data is rendered visual in a manner that helps elucidate relationships and supports forms of analysis. Stephen Few distinguishes among data visualization, information visualization and scientific visualization. Data visualization includes those forms that “support the exploration, examination, and communication of data”; information and scientific visualization are, then, subsets of data visualization. There are many other subsets as well. At MIT, Judith Donath oversaw the Sociable Media Group at the Media Lab and has done great work with “social visualization,” which is defined as “Visualization of social information for social purposes.” It is about “visualizing data that concerns people or is somehow people-centered.” Karrie Karahalios at the Social Spaces Group at Illinois has also done this kind of work (smg.media.mit.edu; social.cs.uiuc.edu).
Robert Kosara, in his essay “Visualization Criticism – The Missing Link Between Information Visualization and Art,” makes a distinction between what he terms “pragmatic visualization” and “aesthetic visualization,” which represent two ends of a broad spectrum. Pragmatic visualization is “the technical application of visualization techniques to analyze data.” He notes that “the goal of pragmatic visualization is to explore, analyze, or present information in a way that allows the user to thoroughly understand the data.” He contrasts that with aesthetic visualization, in which the goal “is to communicate a concern, rather than to show data.” In this case, the visual support, and its link to data, grounds the concern in “the real.”
In any case, to function effectively, a visualization has to move from the abstract or invisible to the visible; it must be visual, and communicate ideas clearly through the visual register; and it must be understandable. According to Edward Tufte, “excellence in statistical graphics consists of complex ideas communicated with clarity, precision and efficiency” (58). Obviously there are many different kinds of visualizations, with many differing purposes.
The history of information visualization is often traced back to the 2nd century and the use of a simple arrangement of data in columns and rows. This was followed by René Descartes’ attempt to render mathematics visually in the 17th century, and then by the graphs of the scientist William Playfair in the 18th century. Playfair (1759-1823), often described as a “pioneering thinker,” understood the power of vision in making sense of numbers: “Prior to Playfair it was difficult to comprehend quantitative patterns, trends and exceptions” (Few, 6). In 1967 Jacques Bertin published Semiology of Graphics, which describes a basic vocabulary of vision grounded in an understanding of visual perception. The book “systematically classified the use of visual elements to display data and relationships. Bertin's system consists of seven visual variables: position, form, orientation, color, texture, value, and size, combined with a visual semantics for linking data attributes to visual elements” (Michael Friendly, website). John Tukey of Princeton gave us Exploratory Data Analysis, a book that offered a new approach to statistics. Edward Tufte followed with The Visual Display of Quantitative Information in 1983.
Balance of Trade; Prices, Wages and Reigns; Chart of the National Debt of England. William Playfair (1759-1823) is generally viewed as the inventor of most of the common graphical forms used to display data: the line graph, the bar chart and the pie chart. His The Commercial and Political Atlas, published in 1786, contained a number of interesting time-series charts such as these. In the first, the area between two time-series curves was emphasized to show the difference between them, representing the balance of trade. The second graph plots three parallel time series: prices, wages, and the reigns of British kings and queens. Among the benefits of graphical display, Playfair said, “On inspecting any one of these Charts attentively, a sufficiently distinct impression will be made, to remain unimpaired for a considerable time, and the idea which does remain will be simple and complete, at once including the duration and the amount.”
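Bertin's core move, linking data attributes to visual variables, is what every charting tool still does under the hood. The minimal Python sketch below maps hypothetical records onto three of his seven variables: two quantitative attributes onto position, a third onto size, and a nominal category onto color. The field names, output ranges and palette are assumptions for illustration, and the linear scaling is a common modern convention rather than something Bertin prescribes.

```python
from dataclasses import dataclass

@dataclass
class Mark:
    """One graphic mark described by three of Bertin's visual variables."""
    x: float      # position (horizontal)
    y: float      # position (vertical)
    size: float   # size
    color: str    # color (hue)

def linear_scale(domain, out_range):
    """Return a function mapping the interval `domain` linearly onto `out_range`."""
    (d0, d1), (r0, r1) = domain, out_range
    def scale(v):
        if d1 == d0:  # degenerate domain: place everything mid-range
            return (r0 + r1) / 2
        return r0 + (v - d0) * (r1 - r0) / (d1 - d0)
    return scale

def encode(records, palette):
    """Link data attributes to visual variables (field names are hypothetical)."""
    def span(key):
        values = [r[key] for r in records]
        return min(values), max(values)
    sx = linear_scale(span("year"), (0, 100))    # quantitative -> horizontal position
    sy = linear_scale(span("value"), (0, 100))   # quantitative -> vertical position
    ss = linear_scale(span("weight"), (2, 20))   # quantitative -> size
    # nominal "group" -> color via a lookup table
    return [Mark(sx(r["year"]), sy(r["value"]), ss(r["weight"]), palette[r["group"]])
            for r in records]
```

Using a lookup table for the category rather than a numeric scale reflects Bertin's observation that hue does not imply order, whereas position and size do.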
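Playfair's shaded band between the export and import curves is, numerically, a running difference between two series. The small Python sketch below computes the yearly balance and a trapezoidal estimate of the signed area between the curves. The years and amounts are hypothetical placeholders, not Playfair's figures.

```python
def balance_of_trade(years, exports, imports):
    """Yearly balance: positive when exports exceed imports ("in favour")."""
    return [(year, e - i) for year, e, i in zip(years, exports, imports)]

def area_between(years, exports, imports):
    """Signed area between the two curves, estimated with the trapezoidal rule."""
    diffs = [e - i for e, i in zip(exports, imports)]
    area = 0.0
    for k in range(1, len(years)):
        # width of the interval times the average difference at its two ends
        area += (years[k] - years[k - 1]) * (diffs[k] + diffs[k - 1]) / 2
    return area
```

A positive area means the balance was in England's favour over the period overall; where the curves cross, the trapezoid is only an approximation.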
Minard's map of Napoleon's march to Moscow and back. The French engineer, Charles Minard (1781-1870), illustrated the disastrous result of Napoleon's failed Russian campaign of 1812. The graph shows the size of the army by the width of the band across the map of the campaign on its outward and return legs, with temperature on the retreat shown on the line graph at the bottom. Many consider Minard's original the best statistical graphic ever drawn.
Another favorite is Dr. John Snow's map of deaths from the 1854 cholera outbreak in London's Soho district, plotted in relation to the locations of public water pumps. It is the most famous early example of mapping epidemiological data, and Snow is viewed by many as a pioneer in disease mapping. The original spawned many imitators, including this simplified version by Gilbert in 1958. Tufte (1983, p. 24) says, “Snow observed that cholera occurred almost entirely among those who lived near (and drank from) the Broad Street water pump. He had the handle of the contaminated pump removed, ending the neighborhood epidemic which had taken more than 500 lives.”
Étienne-Jules Marey (1830-1906) was among the pioneers of dynamic graphics and the graphical representation of movement and dynamic phenomena. Marey used and developed many devices to record and visualize motion and other dynamic phenomena: the walking, running, jumping and falling of humans, horses and cats, as well as heart rate, pulse rate, breathing and more.
Rather than relying on geography or topography, as previous maps of subway systems had, Henry Beck drew on his background in illustrating electrical circuits to create this iconic “tube map.” It is not geographically accurate, but its form as an abstraction is more useful.
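The reasoning behind Snow's map, that deaths cluster around the pump people drank from, can be sketched as a nearest-neighbor assignment: attach each death to its closest pump and count. The Python sketch below is an illustration, not a reconstruction of Snow's data. The coordinates and the second pump ("Pump B") are hypothetical, and straight-line distance is a simplification, since walking distance along streets would be more faithful to how residents reached water.

```python
import math
from collections import Counter

def nearest_pump(death, pumps):
    """Name of the pump closest (straight-line) to a death's (x, y) location."""
    return min(pumps, key=lambda name: math.dist(death, pumps[name]))

def deaths_per_pump(deaths, pumps):
    """Count deaths by their nearest pump."""
    return Counter(nearest_pump(d, pumps) for d in deaths)
```

Usage: `deaths_per_pump([(1.0, 1.0), (9.0, 9.0)], {"Broad Street": (0.0, 0.0), "Pump B": (10.0, 10.0)})` attributes one death to each pump.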
When it comes to visualizing the scale of the universe and our place in it, few works convey it as clearly, succinctly and beautifully as “Powers of Ten,” the classic nine-minute film made for IBM by the legendary design team Charles and Ray Eames in 1977. “Powers of Ten,” the most acclaimed and influential Eames film and one included in the National Film Registry at the Library of Congress, takes viewers on a visual journey of scale and magnitude, from the edge of space to a carbon atom in a man's hand. Every 10 seconds, the viewer moves out (and later in), 10 times farther than the point before. “The film conveys a holistic vision of the universe and really helps people grasp the concept of 'scale' and its importance in understanding our world,” said Eames Demetrios, the designers' grandson and director of the Eames Office, which preserves and extends the legacy and work of Charles and Ray Eames.
We can attribute the tremendous increase in data visualization to the concomitant increase in data. More and more institutions are participating in the creation and dissemination of data, and the rise of user-generated content adds further to the amount of data around us. “According to one estimate, in 2010 alone we generated 1,200 exabytes—60 million times the content of the Library of Congress” (A Tour Through the Visualization Zoo, 1). We also have increasing storage capacity, a growing number of open databases and a proliferation of online social networks that encourage the communication of ideas (Manuel Lima, founder of the site VisualComplexity.com). Further, as computation becomes pervasive or ubiquitous, and as “things” begin to gather and communicate data, the proliferation only accelerates. We can now monitor data in far more granular and expansive ways, from the micro to the mega. One example of this expanding access is Data.gov, launched in 2009 by then Federal Chief Information Officer Vivek Kundra. “The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government.” However, users complain that the site is difficult to navigate and that using the available data sets requires a certain level of expertise.
We analyze data in order to reach understanding and to make decisions. “The connections between data and decisions are built one good question at a time until understanding bridges the gap between them” (Few, 7). We begin with descriptive questions:
• What is happening?
• What is causing this to happen?
Then we can ask predictive questions:
• What do we want to happen?
• What actions would likely lead to this desired outcome? (Few, 7)
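The estimate quoted above also implies a size for the Library of Congress that is easy to check. Assuming decimal units (1 EB = 10^18 bytes, 1 TB = 10^12 bytes), which is an assumption since the quoted estimate does not say, dividing 1,200 exabytes by 60 million gives roughly 20 terabytes per “Library of Congress”:

```latex
\frac{1{,}200\ \text{EB}}{60 \times 10^{6}}
  = \frac{1.2 \times 10^{21}\ \text{bytes}}{6 \times 10^{7}}
  = 2 \times 10^{13}\ \text{bytes}
  = 20\ \text{TB}
```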
Presentations are becoming a veritable cultural form; consider the impact of something like Al Gore’s An Inconvenient Truth, which began its life as a PowerPoint presentation, and attests to the widespread power a visual presentation can have.
Google Refine: cleans up messy data in your spreadsheets. DataWrangler: This Web-based service from Stanford University's Visualization Group is designed for cleaning and rearranging data so it's in a form that other tools, such as a spreadsheet app, can use.
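To make “cleaning” concrete: a typical job is collapsing variant spellings of the same value. The Python sketch below is loosely modeled on the “fingerprint” key-collision idea described in Refine's clustering documentation (normalize accents, case and punctuation, then sort the unique tokens). It is not Refine's own code or API; the function names and sample values are hypothetical.

```python
import re
import unicodedata
from collections import Counter, defaultdict

def fingerprint(value):
    """Clustering key: strip accents, case and punctuation, then sort unique tokens."""
    ascii_text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^\w\s]", "", ascii_text.strip().lower())
    return " ".join(sorted(set(cleaned.split())))

def clean(values):
    """Replace each value with the most common spelling among values sharing its key."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    canonical = {key: Counter(vs).most_common(1)[0][0] for key, vs in groups.items()}
    return [canonical[fingerprint(v)] for v in values]
```

Usage: `clean(["New York", "new york ", "York, New", "Boston"])` maps the first three entries to “New York” and leaves “Boston” alone. In practice a person should review each proposed merge, since token sorting can occasionally join values that only look alike.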
Wordle is a Java applet developed several years ago by Jonathan Feinberg as a tool for creating something akin to tag clouds to represent the recurrence of words within a text; the tool allows you to choose from selected colors, layouts and fonts, and its objective is more creative and playful than information-based.
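Underneath a word cloud is a simple count: how often each word recurs, mapped to type size. The Python sketch below covers only that counting and sizing step, not Wordle's layout or its actual code. The tiny stopword list, the point-size range and the top-N cutoff are hypothetical choices.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "and", "of", "to", "is"}  # hypothetical, deliberately tiny

def word_sizes(text, top_n=5, min_pt=10, max_pt=48):
    """Map the top_n most frequent words to font sizes scaled linearly by count."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    hi, lo = counts[0][1], counts[-1][1]
    sizes = {}
    for word, n in counts:
        if hi == lo:  # every word equally frequent: use the largest size
            sizes[word] = max_pt
        else:
            sizes[word] = min_pt + (n - lo) * (max_pt - min_pt) / (hi - lo)
    return sizes
```

The stopword list matters more than it looks: without it, “the” and “and” dominate almost any text, which is one reason the same source can yield very different clouds.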
Google Fusion Tables allows users to turn data into a chart or map. Users upload a file in several different formats and then choose how to display it: table, map, heatmap, line chart, bar graph, pie chart, scatter plot, timeline, storyline or motion. It is customizable, allowing you to change map icons and style info windows. Fusion Tables offers relatively quick charting and mapping, including geographic information system (GIS) functions to analyze data by geography. The service also automatically geocodes addresses, which is useful when trying to place numerous points on a map. This is an excellent tool for beginners and advanced beginners who want to get comfortable analyzing data; it's also a good fit for people who don't program.
Visualization can't get much easier, and the results look considerably more sophisticated than you'd expect based on the minimal amount of effort needed to create them. Plus, the list of possible visualization types includes explanations of the types of data each one is best suited for. This example uses data from the US Social Security Administration to visualize the top five female names for births in 2008.
Tableau Public’s motto is “Data in. Brilliance out.” Tableau gives users several ways to display interactive data. You can combine multiple connected visualizations onto a single dashboard, where one search filter can act on numerous charts, graphs and maps; underlying data tables can also be joined. And once you get the hang of how the software works, its drag-and-drop interface is considerably quicker than manually coding, which in turn lets users experiment more. Image: shows which countries consume the most oil and whether or not they're net importers or exporters.
This user-friendly website generates color-coded maps; the colors change depending on underlying info such as population change or average income. It can also place markers on a map, varying the size of the markers based on a data table. Image: Boulder Real Estate
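Color-coded maps of this kind (choropleths) depend on a classification step that sorts each region's value into one of a few color classes. The Python sketch below uses equal-width intervals, one common scheme among several (quantiles and natural breaks are others), and the choice can visibly change the map. Region names, values and the palette are hypothetical; this is not the site's own method.

```python
def equal_interval_classes(values, palette):
    """Assign each region a color class using equal-width value ranges."""
    lo, hi = min(values.values()), max(values.values())
    k = len(palette)
    width = (hi - lo) / k or 1  # avoid dividing by zero when all values match
    # the maximum value would land in class k, so clamp it into the top class
    return {region: palette[min(int((v - lo) / width), k - 1)]
            for region, v in values.items()}
```

With values spread unevenly, equal intervals can leave most regions in one color, which is exactly the kind of design decision a critical reader of such maps should ask about.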
This desktop software is for analyzing data points that involve a time component. TimeFlow can generate visual timelines from text files, with entries color- and size-coded for easy pattern spotting. It also allows the information to be sorted and filtered, and it gives some statistical summaries of the data. It offers five views:
• Timeline View: plots events over time on a scrollable, horizontal timeline
• Calendar View: plots events by day, month, and year in calendar format
• Bar Chart View: a flexible, aggregate view of data points. It allows users to aggregate data by any header in the data set.
• Table View: a straightforward table view of all data points
• List View: a simple list of events shown on the timeline, complete with description and metadata about each data point
Gephi: Billed as a Photoshop for data, this open-source beta project is designed for visualizing statistical information, including relationships within networks of up to 50,000 nodes and half a million edges (connections or relationships), as well as network analyses of factors such as “betweenness,” closeness and clustering coefficient. Its uses include:
• Exploratory Data Analysis: intuition-oriented analysis through network manipulations in real time.
• Link Analysis: revealing the underlying structures of associations between objects, in particular in scale-free networks.
• Social Network Analysis: easy creation of social data connectors to map community organizations and small-world networks.
• Biological Network Analysis: representing patterns of biological data.
• Poster creation: scientific work promotion with high-quality printable maps.
These tools use a pre-Facebook/Twitter definition of “social network analysis” (SNA), referring to the discipline of finding connections between people based on various data sets. Investigative journalists have used such tools to, for example, find links between people who are involved in development projects or who are members of various boards of directors.
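The Bar Chart View's “aggregate by any header” is a group-by operation. The generic Python sketch below counts rows per distinct value of a chosen column, or sums a numeric column instead. It says nothing about TimeFlow's internals, and the column names in the usage are hypothetical.

```python
from collections import defaultdict

def aggregate(rows, header, value=None):
    """Total rows per distinct value of `header`: counts by default, or sums of `value`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[header]] += 1 if value is None else float(row[value])
    return dict(totals)
```

Usage: with event rows such as `{"date": "2010-01-05", "category": "vote", "amount": "2"}`, `aggregate(rows, "category")` counts events per category, while `aggregate(rows, "category", "amount")` sums their amounts.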
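Two of the network measures Gephi reports can be written out in a few lines, which helps demystify them. The pure-Python sketch below (not Gephi's code) computes a node's local clustering coefficient (the share of its neighbors' possible pairings that are actually linked) and its closeness centrality (how few steps it needs, on average, to reach everyone else). The example graph is hypothetical. Betweenness is omitted for brevity; it is usually computed with Brandes' algorithm.

```python
from collections import deque
from itertools import combinations

def build_graph(edges):
    """Undirected adjacency sets from (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))

def closeness(adj, v):
    """(reachable nodes - 1) / sum of shortest-path distances, via breadth-first search."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    # only nodes reachable from v are counted (one common convention)
    return (len(dist) - 1) / total if total else 0.0
```

Usage: in a triangle A-B-C with a pendant node D attached to C, node C has the highest closeness (1.0, one step to everyone), while its clustering coefficient is only 1/3 because D is not linked to A or B.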
The first reason we should care is that the proliferation of data invites us to seek ways to understand and interpret it. Second, in teaching data visualization to students we need to remember that visualizations are never “pure,” scientific or neutral. Indeed, they are more often used in contexts that function rhetorically, to make an argument or to convince us of something, and therefore our students need to be able to name, contextualize and interpret visualizations. Third, the proliferation of data in various open networks means that our students can participate in the creation of visualizations, which in turn can extend and enhance their understanding of concepts, problems and processes. For these reasons, comprehending data visualization becomes part of a larger set of competencies that constitute good citizenship.
In a recent essay, Natalie Jeremijenko and Benjamin Bratton ask, “What kinds of participation are opened up when cycles of everyday action and the representation of collective sovereignty are bound so much more closely within planetary information networks, now responsive on a molecular level?” (ST3, 7). They argue that in place of expanding knowledge, we find instead junk, and they begin to question information visualization “as a format of the ‘political image’” (8). They discuss the “spectacularization of information,” which distances us from our ability to understand relationships and to act. We need to ask how data is generated, who collected it, and under what conditions. As Jeremijenko notes, air pollution data, for example, is gathered in order to comply with federal regulations, and that purpose produces a certain kind of data.
So far, this critical attitude has received only scant attention. Kosara claims that “the first use of criticism in a visualization class (to our knowledge) was in 2002 by David Laidlaw and Fritz Drury in the course Virtual Reality Design for Science at Brown University and the Rhode Island School of Design (RISD)” (4). He advocates for “visualization criticism,” and indeed, while we have a lot of information, we don’t always know how to interpret it or how to use it well. We have amazing tools for collecting, storing and accessing data, but attention to interpreting both the tools and the information itself lags behind.
Description: Any fool can make a pie chart, but it takes great genius at Fox News to design one like this that won the award from FlowingData for the Best Pie Chart ever!
Simultaneous webcast archived and available for download. Slides appeared online as he was speaking.
In this case, there is no particular or overt visualization of data.
Here, we have what is essentially a meaningless illustration, in the guise of information.
Here we see a form of rhetorical juxtaposition.
I would argue that this is a kind of visualization, with the text and its design affecting the message.
This is what we might call rhetorical illustration, using these figures to suggest the figure noted above, namely “1 in 4.” But we’re not given any opportunity to gauge the information and its validity.
This is data visualization.
This is rhetorical illustration, again using design to influence meaning and dictate opinion.
And this is an especially questionable visualization that we would want to study and question. The impulse for many of us, though, is to treat these images as neutral.
Brad Borevitz: State of the Union (SOTU) provides access to the corpus of all the State of the Union addresses from 1790 to 2011. The tool allows you to explore how specific words gain and lose prominence over time, and links to information on the historical context of their use. SOTU focuses on the relationship between individual addresses and the entire collection, highlighting what is distinctive about the selected document. You are invited to try to understand from this information the connection between politics and language, between the state we are in and the language that names it and calls it into being. The horizontal axis shows the average position of a word in the document. The vertical axis displays the word’s relative frequency, determined by comparing how frequently the word occurs in the document to how frequently it appears throughout the entire body of SOTU addresses. The data underneath the map of significant words shows trends in the language of the State of the Union addresses. On the graph, white bars indicate the length of each address in words. The red dots indicate readability as measured by the address’s Flesch-Kincaid score, which is meant to suggest the grade level in an American school for which the text is comprehensible. The actual scores are displayed in the bottom right corner of the interface.
Lenka Clayton took the 2003 State of the Union address, chopped it up and put its words in alphabetical order.
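The operation behind Clayton's piece is simple to state in code, and seeing how little it takes highlights how much meaning depends on word order. The minimal Python sketch below extracts every word of a text and sorts the words case-insensitively. The tokenizing pattern is an assumption, and the piece itself was made without any such program as far as this talk indicates.

```python
import re

def alphabetize(text):
    """Return every word of `text`, sorted alphabetically (case-insensitive)."""
    return sorted(re.findall(r"[A-Za-z']+", text), key=str.lower)
```

Usage: `alphabetize("We will win the war")` returns `["the", "war", "We", "will", "win"]`; repeated words end up side by side, so their frequency becomes audible when read aloud.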
One of the primary tensions that arise in considering data is the tension between quantitative and qualitative analysis. Indeed, in thinking about what’s known as the “digital humanities,” we’ve seen a progression from practices that harnessed the power of computation to improve search and retrieval, or to analyze texts more quantitatively. However, as Todd Presner and Kate Hayles point out, the digital humanities has moved toward a more qualitative approach, one that “harnesses digital toolkits in the service of Humanities’ core methodological strengths: attention to complexity, medium specificity, historical context, analytical depth, critique and interpretation” (2009). Early explorations of the intersection of computing and the humanities focused on using computation; however, many argue that what is needed now is a fundamental understanding of computation. Franco Moretti, for instance, proposes “distant” rather than “close” readings of texts. “Additionally, for people in everyday life who need the skills that enable them to negotiate an increasingly computational field – one need only think of the amount of data in regard to managing personal money, music, film, text, news, email, pensions, etc. – there will be calls for new skills of financial and technical literacy, or, more generally, a computational literacy or computational pedagogy that the digital humanities could contribute to.”