This document introduces text analysis and provides an overview of how to perform text analysis. It defines text analysis as using computational tools to quickly search large texts and conduct complex searches. The document explains that text analysis can help test intuitions during research by encouraging reflection on questions asked of texts. Several examples of text analysis tools and projects are provided, such as Voyant Tools, Google Ngram Viewer, and projects analyzing literary texts. A variety of open-source text analysis tools and programming languages are introduced.
1. Introduction to Text Analysis
MLA Annual Convention
Getting Started in the Digital Humanities
January 9, 2014
Lauren F. Klein
Georgia Institute of Technology
lauren.klein@lmc.gatech.edu
@laurenfklein
6. What is Text Analysis?
According to Geoffrey Rockwell:
• “Text analysis systems can search large texts quickly. They do this by preparing
electronic indexes to the text so that the computer does not have to read through
the entire text. When finding words can be done so quickly that it is "interactive",
it changes how you can work with the text - you can serendipitously explore
without being frustrated by the slowness of the search process.
• “Text analysis systems can conduct complex searches. Text analysis systems will
often allow you to search for lists of words or for complex patterns of words. For
example you can search for the co-occurrence of two words.
• “Text analysis systems can present the results in ways that suit the study of
texts. Text analysis systems can display the results in a number of ways; for
example, a Keyword In Context display shows you all the occurrences of the found
word with one line of context.”
http://tada.mcmaster.ca/Main/WhatTA
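The three capabilities Rockwell describes (an index, a co-occurrence search, and a Keyword In Context display) can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not how any particular tool works: the function names, the sample sentence, and the context and window sizes are all invented for this example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_index(tokens):
    """Map each word to the list of token positions where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(tokens):
        index[word].append(position)
    return index

def kwic(tokens, index, keyword, width=3):
    """Keyword In Context: each hit with `width` tokens on either side."""
    lines = []
    for pos in index.get(keyword, []):
        left = " ".join(tokens[max(0, pos - width):pos])
        right = " ".join(tokens[pos + 1:pos + 1 + width])
        lines.append(f"{left} [{tokens[pos]}] {right}".strip())
    return lines

def cooccurrences(index, word_a, word_b, window=5):
    """Position pairs where the two words fall within `window` tokens."""
    return [(a, b)
            for a in index.get(word_a, [])
            for b in index.get(word_b, [])
            if a != b and abs(a - b) <= window]

# Hypothetical sample text, made up for this sketch.
sample = ("The whale surfaced near the ship. The captain watched the whale, "
          "and the ship turned toward the open sea.")
tokens = tokenize(sample)
index = build_index(tokens)

# Lookups use the index, so the text is not re-read for each search.
for line in kwic(tokens, index, "whale"):
    print(line)
print(cooccurrences(index, "whale", "ship"))
```

Because the index is built once, each later lookup touches only the positions stored for the search word, which is the property that makes "interactive" searching possible on large texts.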
13. Why Use Text Analysis?
Geoff Rockwell, again:
• “Text analysis tools aid the interpreter asking questions of electronic texts.”
• “Text analysis practices encourage reflection on the questions asked and
formalization of queries.”
• “Text analysis is a way of targeting rereading that tests intuitions.”
Ted Underwood:
• “Proving a literary thesis with statistical analysis is often like cracking a nut with a
jackhammer. You can do it: but the results are not necessarily better than you
would get by hand.”
What I think (in the spirit of Movable Type):
• Text analysis as “a way to tell a new story.”
26. Tools for Text Analysis
• Wordle
• Google Ngram Viewer
• IBM Many Eyes
• Voyant
• MONK (requires institutional access)
• MALLET
• Stanford’s Natural Language Processing Toolkit
• R
34. More Lists of Tools
• http://toolingup.stanford.edu/?page_id=367
• http://guides.library.upenn.edu/dhtextanalysis
• http://dirt.projectbamboo.org/categories/text-mining