Microscope, macroscope and zoom lens:
close, distant and scalable reading in the
Humanities
Digital Humanities Summer School,
University of Oxford,
7th July 2023
Martin Wynne
Senior Researcher in Corpus Linguistics
Faculty of Linguistics, Philology and Phonetics
https://orcid.org/0000-0002-4155-0530
martin.wynne@ling-phil.ox.ac.uk
Martin Wynne Text Analysis 2
Summary
● What is text analysis?
● Close reading, distant reading, textual interpretation
● Corpus linguistics: the vanguard of digital humanities
Types of text analysis
● the study of rhetoric ('how language is used to persuade')
● close reading ('focus on the words')
● stylistics ('study of the language of literature')
● stylometry ('quantifying aspects of the language of texts, especially for authorship attribution and investigating genres')
● corpus linguistics ('developing and analysing large electronic datasets representative of particular language varieties')
● distant reading ('studying and doing things with more text than you can read')
● macroanalysis ('plotting features in large corpora over time')
● discourse analysis ('categorizing and analysing structural elements of discourse')
● critical corpus discourse analysis ('using corpus linguistic methods to reveal hidden agendas and motivations in texts')
● deconstruction ('what the words are not saying or failing to say')
● forensic linguistics ('gathering legal evidence about attribution and meaning')
● qualitative social science ('annotation and analysis of interviews, survey results, etc.')
● ...and more...
My meaning of text analysis
A diverse, open and fluid set of methods, datasets and tools, to be
used in support of a variety of research processes, with the aim of
interpreting texts
My sort of text analysis
...is not just for linguists, and not just for literary scholars, historians, political scientists, sociologists, journalists, activists, forensic scientists…
...and it can be useful for all or any of them
“Text analysis tools aid the interpreter asking questions of texts”
Geoffrey Rockwell
https://web.archive.org/web/20150410205354/http://tada.mcmaster.ca/Main/WhatTA
Methods and Techniques
Search
- search large texts quickly
- search a collection of texts or corpus
- search enhanced by linguistic annotation
- complex searches
Analyze
- patterns of words
- collocations
- expanded co-text around words
- wordlists, keywords
- clusters, ngrams
Compare
- compare texts
- compare sections of a text with each other
- compare a text with a reference corpus
Visualize
- concordances
- distribution of features in a text or corpus
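As an illustration of the "Search" and "Visualize" items above, here is a minimal sketch of a keyword-in-context (KWIC) concordance. The tokenizer, the function names and the sample sentences are assumptions made for this example and do not come from any of the tools named in these slides.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def kwic(tokens, node, window=4):
    """Return (left context, node, right context) for every occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

# Invented sample text, used only to show the shape of the output.
sample = ("In the aftermath of the war, prices rose. "
          "The aftermath of the election was calm.")
tokens = tokenize(sample)
concordance = kwic(tokens, "aftermath", window=3)

# Right-align the left context so the node word lines up in one column.
for left, node, right in concordance:
    print(f"{left:>25}  {node}  {right}")
```

Aligning the node word in a single column is what makes recurring patterns to its left and right visible at a glance. Real concordancers add sorting, sampling and annotation-aware search on top of this basic idea.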
Close Reading
Elaine Showalter describes close reading
as:
“...slow reading, a deliberate attempt to
detach ourselves from the magical power of
story-telling and pay attention to language,
imagery, allusion, intertextuality, syntax
and form.”
It is, in her words, ‘a form of
defamiliarisation we use in order to break
through our habitual and casual reading
practices’ (Teaching Literature, p.98).
Further introductory reading:
● https://www.york.ac.uk/english/writing-at-york/writing-resources/close-reading/
● https://writingcenter.fas.harvard.edu/pages/how-do-close-reading
● http://theliterarylink.com/closereading.html
Close Reading
● Traditional criticism (biographical, social, historical, psychological)
...the new paradigm...
● Practical criticism, New Criticism (concentrate on the ‘words on the page’)
● Hermeneutics (theory and practice of interpretation)
● Interpretation (always provisional, never final)
● Inductive reasoning (not deductive and mathematical, based on experience and probabilities)
...the next new paradigm...
● New Historicism (literature must be understood in its historical context)
...
https://www.english.cam.ac.uk/classroom/pracrit.htm
https://www.oxfordbibliographies.com/view/document/obo-9780190221911/obo-9780190221911-0015.xml
Close reading as the paradigm for
text-based humanities scholarship
But what do you do with a million books?
There are only about 30,000 days in a human life -- at a book a
day, it would take 30 lifetimes to read a million books and our
research libraries contain more than ten times that number. Only
machines can read through the 400,000 books already publicly
available for free download from the Open Content Alliance.
 Gregory Crane, “What do you do with a million books?”
D-Lib Magazine, March 2006
And 5 million books?
We constructed a corpus of digitized texts containing about 4% of all books ever
printed. Analysis of this corpus enables us to investigate cultural trends
quantitatively. We survey the vast terrain of “culturomics” focusing on linguistic
and cultural phenomena that were reflected in the English language between
1800 and 2000. We show how this approach can provide insights about fields as
diverse as lexicography, the evolution of grammar, collective memory, the
adoption of technology, the pursuit of fame, censorship, and historical
epidemiology. “Culturomics” extends the boundaries of rigorous quantitative
inquiry to a wide array of new phenomena spanning the social sciences and the
humanities.
www.sciencexpress.org / 16 December 2010
Distant reading: where distance, let me
repeat it, is a condition of knowledge: it
allows you to focus on units that are much
smaller or much larger than the text:
devices, themes, tropes—or genres and
systems. And if, between the very small
and the very large, the text itself
disappears, well, it is one of those cases
when one can justifiably say, less is more.
If we want to understand the system in its
entirety, we must accept losing something.
We always pay a price for theoretical
knowledge: reality is infinitely rich;
concepts are abstract, are poor. But it’s
precisely this ‘poverty’ that makes it
possible to handle them, and therefore to
know. This is why less is actually more.
Franco Moretti, “Conjectures on World
Literature” Distant Reading, 2013.
Distant Reading
A canon of 200 novels, for instance,
sounds very large for 19th-century
Britain (and is much larger than the
current one), but is still less than 1% of
the novels that were actually published
[…] and close reading won’t help here, a
novel a day every day of the year would
take a century or so … And it’s not even
a matter of time, but of method: a field
this large cannot be understood by
stitching together separate bits of
knowledge about individual cases,
because it isn’t a sum of individual
cases: it’s a collective system, that
should be grasped as such, as a whole.
Franco Moretti, Graphs, Maps, Trees:
Abstract Models for Literary History, 2005
What are we ultimately aiming for
when it comes to digital scholarship in the Humanities?
Ways to combine close reading with
big data approaches.
From “distant” (not) reading to close reading and back again...
Digital Humanities as a locus for “scalable” reading practices
DATA: digitally assisted text analysis
Martin Mueller, Northwestern
What do you need to know in order to move to
interpretation?
1. You need to know what’s in your dataset.
2. You need to know how to find what you are looking for.
3. You need to know how to make sense of what you find.
Software tools
● AntConc
● Sketch Engine
● CQPweb
● #LancsBox
● English-corpora.org
● KonText
● Voyant Tools
● CliC
● Hansard at Huddersfield
● ...and more
Finding resources
● CLARIN Virtual Language Observatory (https://vlo.clarin.eu/)
● CLARIN Resource Families (https://www.clarin.eu/resource-families/)
Corpus Query Tools:
a CLARIN Resource Family
https://www.clarin.eu/resource-families/corpus-query-tools
The 'aftermath' of the seminar
Subject: Les Francais des Corpus – Aftermath
Dear colleagues,
First, many thanks for presenting at/attending the Francais des Corpus Workshop and for making it such a success.
I promised I would keep you in touch with one another and hope that the full list of your e-mail addresses above makes that possible.
…
KWIC concordance from Written BNC2014 generated in #LancsBox X
(a representative corpus of British English released in 2021).
'aftermath'
Collocates:
War
Gulf
coup
World
disaster
Tiananmen
death
revolution
defeat
Chernobyl
affair
riots
battle
massacre
wars
election
Crisis
events
explosion
invasion
trial
fire
June
Square
victory
accident
attempt
Significant collocates in the British National Corpus
(a representative corpus of British English released in 1994).
BNCWeb parameters:
There are 1486 different types in your collocation database
for the query "[word="aftermath"%c] [word="of"%c]".
(Your query "aftermath of" returned 544 hits in 337 different texts)
The selected range was 1 to 4.
Corpus basis for calculation: the whole BNC.
Type of calculation: Log-likelihood
Tag restriction: any noun
Collocates occur at least 5 times in the whole BNC.
Words collocate at least 5 times.
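The "Log-likelihood" setting above can be sketched as follows. This is the standard log-likelihood ratio (G2) over a 2x2 contingency table and is an assumption about the general technique. BNCweb's own calculation may differ in detail, for example in how it accounts for the size of the collocation window. The function name and all counts in the usage line except the 544 hits reported above are invented for illustration.

```python
import math

def log_likelihood(o11, node_freq, coll_freq, corpus_size):
    """G2 score for a node/collocate pair from a 2x2 contingency table.

    o11         -- times the collocate occurs near the node
    node_freq   -- total occurrences of the node
    coll_freq   -- total occurrences of the collocate
    corpus_size -- total tokens in the corpus
    """
    o12 = node_freq - o11                          # node without the collocate
    o21 = coll_freq - o11                          # collocate without the node
    o22 = corpus_size - node_freq - coll_freq + o11  # neither
    observed = [o11, o12, o21, o22]
    rows = [node_freq, node_freq,
            corpus_size - node_freq, corpus_size - node_freq]
    cols = [coll_freq, corpus_size - coll_freq,
            coll_freq, corpus_size - coll_freq]
    g2 = 0.0
    for o, r, c in zip(observed, rows, cols):
        expected = r * c / corpus_size  # count expected if the words were independent
        if o > 0:                       # a zero cell contributes nothing to the sum
            g2 += o * math.log(o / expected)
    return 2 * g2

# Hypothetical counts: only 544 (hits for "aftermath of") comes from the slide.
hypothetical_score = log_likelihood(o11=30, node_freq=544,
                                    coll_freq=20_000, corpus_size=100_000_000)
```

A score near zero means the pair co-occurs about as often as chance would predict. Larger scores indicate stronger evidence of association, which is why collocate lists are usually ranked by this value.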
J. R. Firth (1890-1960)
“The complete meaning of a word is
always contextual, and no study of
meaning apart from context can be taken
seriously.”
J. R. Firth (1935). "The Technique of Semantics." Transactions of the Philological Society, 36-72; p. 37 (reprinted in Firth 1957).
“You shall know a word by the company
it keeps.”
J. R. Firth (1957). "Papers in Linguistics, 1934-1951". Oxford: Oxford University Press.
What is a corpus?
“…a collection of pieces of language, selected and
ordered according to explicit linguistic criteria in
order to be used as a sample of the language.”
(Sinclair 1996)
What is Corpus Linguistics?
(1) Focus on linguistic performance, rather than competence
(2) Focus on linguistic description, rather than linguistic universals
(3) Focus on quantitative, as well as qualitative models of language
(4) Focus on a more empiricist, rather than rationalist view of
scientific inquiry.
(Leech 1992)
AntConc: explore your own texts and corpora
● Download for free from https://www.laurenceanthony.net/software/antconc/
● Use with any 'plain' text
● Multilingual capabilities
● Does not interpret mark-up or metadata
#LancsBox
Download for free from https://lancsbox.lancs.ac.uk/
● Works with your own data or existing corpora
● Visualizes language data
● Analyses data in any language
● Automatically annotates data for part-of-speech (for some languages)
● Wizard tool produces a prose report
● Works with major operating systems (Windows, Mac, Linux)
● Latest version #LancsBox X launched 2023
CQPweb:
Online interface for indexed corpora
http://cqpweb.lancs.ac.uk
...but now also with a new feature
to upload data, in limited ways...
Sketch Engine: an online interface for your corpus
https://www.sketchengine.eu/
Access to Sketch Engine is by paid subscription. Individual licences are available from €6.56
per month, with free trials available.
A new opportunity
"It is not easy to justify assertions about the alleged frequency or infrequency of
some particular belief or attitude in the past. How many examples does one need to
cite in order to prove the point? Lacking any satisfactory method of quantifying
these matters, all I can do is to record my impressions after long immersion in the
period."
Keith Thomas, The Ends of Life, Oxford University Press, 2009.
“But the sad truth is that much of what it has taken me a lifetime to build up by
painful accumulation can now be achieved by a moderately diligent student in the
course of a morning.”
Keith Thomas, Diary, London Review of Books, 10 June 2010.
Some (more or less) testable assertions
Tudor
● “The idea of a "Tudor era" in history is a misleading invention, claims an Oxford University historian. Cliff Davies says his research shows the term "Tudor" was barely ever used during the time of Tudor monarchs.” (http://www.bbc.co.uk/news/education-18240901, May 2012)
Holocaust
● “I will argue that “The Holocaust” is an ideological representation of the Nazi holocaust...Until recently, however, the Nazi holocaust barely figured in American life. Between the end of World War II and the late 60s, only a handful of books and films touched on the subject”. (Norman Finkelstein, The Holocaust Industry. Verso, 2000.)
State
● “...no political writer before the middle of the sixteenth century used the word 'state' in anything like its modern political sense” [referring to the machinery of government and social control] (Quentin Skinner, The Foundations of Modern Political Thought, Cambridge University Press, 1978).
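One way such assertions might be tested against a dated corpus is to count a term's relative frequency per period. The sketch below is an assumption about how that could be done, not a description of any particular tool. The function name and the three toy documents are invented, and the crude tokenizer ignores spelling variation.

```python
import re
from collections import defaultdict

def per_million_by_decade(documents, term):
    """documents: iterable of (year, text). Returns {decade: hits per million tokens}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in documents:
        decade = (year // 10) * 10
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[decade] += len(tokens)
        hits[decade] += tokens.count(term.lower())
    # Normalize by the size of each decade's sub-corpus so periods are comparable.
    return {d: 1_000_000 * hits[d] / totals[d] for d in sorted(totals) if totals[d]}

# Toy, made-up documents purely to show the shape of the input.
toy_corpus = [
    (1541, "the king and his council"),
    (1547, "the state of the realm"),
    (1603, "the state and the crown of the state"),
]
freqs = per_million_by_decade(toy_corpus, "state")
print(freqs)
```

Counts like these say nothing about which sense of a word is being used. A claim such as Skinner's, which concerns the modern political sense of 'state', still needs concordance lines to be read closely.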
Annotation
Annotation of texts should include structural markup, metadata, and linguistic
annotation, including:
- Standardized metadata for basic categories such as language, relevant dates,
author, title and text type;
- Part-of-speech tagging;
- Lemmatization; and
- Modernized (or otherwise normalized) forms
...and these can be the basis for further levels of annotation, such as:
- semantic tags
- named entity recognition
- etc.
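The annotation layers listed above could be represented along the following lines. This is a minimal sketch under stated assumptions: the class and field names, the tag labels and the sample tokens are all hypothetical, and real corpora would normally use an established encoding such as TEI XML or a tool-specific format.

```python
from dataclasses import dataclass, field

@dataclass
class Token:
    form: str        # spelling as it appears in the source text
    normalized: str  # modernized or otherwise normalized form
    lemma: str       # dictionary headword
    pos: str         # part-of-speech tag (the tagset here is an assumption)

@dataclass
class AnnotatedText:
    # Basic metadata categories named on the slide
    language: str
    date: str
    author: str
    title: str
    text_type: str
    tokens: list = field(default_factory=list)

    def find_lemma(self, lemma):
        """Return every token with the given lemma, whatever its spelling."""
        return [t for t in self.tokens if t.lemma == lemma]

# Invented example: an early modern spelling with its annotation layers.
doc = AnnotatedText(
    language="en", date="1600", author="Anonymous",
    title="Illustrative fragment", text_type="prose",
    tokens=[
        Token("She", "she", "she", "PRON"),
        Token("loueth", "loves", "love", "VERB"),
        Token("bookes", "books", "book", "NOUN"),
    ],
)
print([t.form for t in doc.find_lemma("love")])
```

Searching on the lemma or the normalized form rather than the original spelling is what allows one query to retrieve variant spellings, which matters a great deal for historical texts.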
Digital scholarship in the Humanities
and Digital Science
Issues and assumptions in scientific research:
● Consensus (and compromise) about funding priorities
● Adoption of technical standards
● Standards for the representation of knowledge and interpretations (agreement on concepts and categories!)
● Reproducibility and replicability of research
● Sharing of generic tools
● Curation of tools and data in professional service centres
● Support for software sustainability
● Promotion of interoperability of resources and tools
● Sharing research outputs
● Research leading to an accumulation of knowledge
● Increasingly data-driven research
CLARIN ERIC: members and centres
Official membership
• 23 members
• 3 observers
• 1 linked party
A distributed network of >60 centres
25 CTS certified data centres,
strong focus on FAIRness & interoperability
• federated login:
• central metadata harvesting for easy discovery:
• chained services:
• language data - in written, spoken, video or multimodal form
• advanced tools - to discover, explore, exploit, annotate, analyse
or combine data sets, wherever they are located
CLARIN corpus resources and tools
Corpora: at least 4130 - see VLO (https://vlo.clarin.eu/) !
Online interfaces:
● Corpuscle
● Korp
● KonText
● NoSketch Engine
● D* (DiaCollo demo)
● TEITOK
Federated content search: https://contentsearch.clarin.eu/
Resource Families:
● 13 curated guides to different types of corpora and how to get them
● Coming soon: Desktop corpus tools and Online corpus tools
Online and desktop tools for corpus analysis
“Corpus, concordance, collocation”
Diachronic collocations in a text collection: DiaCollo from the Deutsches Textarchiv
Types of Text Analysis: Further Reading
● Baker, P (2006), Using Corpora in Discourse Analysis, London: Continuum [summary and further information at https://www.lancaster.ac.uk/staff/bakerjp/usingcorpora.htm]
● Baker, P (2012), ‘Acceptable Bias? Using Corpus Linguistics Methods with Critical Discourse Analysis’, Critical Discourse Studies 9.3: 247-56.
● Bode, K (2017), ‘The Equivalence of “Close” and “Distant” Reading; or, Toward a New Object for Data-Rich Literary History’, Modern Language Quarterly 78 (1): 77–106. DOI 10.1215/00267929-3699787
● Cheng, W (2013), ‘Corpus-based linguistic approaches to critical discourse analysis’, in The Encyclopedia of Applied Linguistics (pp. 1-8), Wiley-Blackwell. https://doi.org/10.1002/9781405198431.wbeal0262 [full book chapter available from https://www.researchgate.net/publication/262070226]
● Gadd, Ian (2009), ‘The Use and Misuse of Early English Books Online’, Literature Compass 6/3: 680–692. https://doi.org/10.1111/j.1741-4113.2009.00632.x
● Hamed, D (2020), ‘Keywords and collocations in US presidential discourse since 1993: a corpus-assisted analysis’, Journal of Humanities and Applied Social Sciences, Vol. 3 No. 2, 2021, pp. 137-158, Emerald Publishing Limited 2632-279X. DOI 10.1108/JHASS-01-2020-0019
● Kichuk, Diana (2007), ‘Metamorphosis: Remediation in Early English Books Online (EEBO)’, Literary and Linguistic Computing 22.3: 291–303 [available from https://hfroehlich.files.wordpress.com/2016/07/lit-linguist-computing-2007-kichuk-291-303.pdf]
● Leech, G. N., & Short, M. H. (1981), Style in Fiction, London: Longman.
● Mahlberg, M (2013), Corpus Stylistics and Dickens’s Fiction, Routledge.
● Martin, Shawn (2007), ‘EEBO, Microfilm, and Umberto Eco: Historical Lessons and Future Directions for Building Electronic Collections’, Microform & Imaging Review 36.4: 159–64 [available from https://repository.upenn.edu/cgi/viewcontent.cgi?article=1072&context=library_papers]
● Showalter, E (2002), Teaching Literature, London: Wiley-Blackwell.
● Sinclair, J (1991), Corpus, Concordance, Collocation, Oxford: OUP.
● Rockwell, G (2005), ‘What is Text Analysis’ [https://web.archive.org/web/20150410205354/http://tada.mcmaster.ca/Main/WhatTA]
● Underwood, Ted (2015), ‘Seven ways humanists are using computers to understand text’ [blog post at https://tedunderwood.com/2015/06/04/seven-ways-humanists-are-using-computers-to-understand-text/]
● Unsworth, John (2008), ‘How Not To Read A Million Books’, with Tanya Clement, Sara Steger, and Kirsten Uszkalo, Harvard University, Cambridge, MA (October 2008) [blog post at https://people.brandeis.edu/~unsworth/hownot2read.rutgers.html]
● Text Analysis in ‘Tooling up for Digital Humanities’ blog at http://toolingup.stanford.edu/?page_id=981
● More information about the Text Creation Partnership: https://quod.lib.umich.edu/e/eebogroup/

Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 

MacroMicroZoom.pdf

  • 1. Microscope, macroscope and zoom lens: close, distant and scalable reading in the Humanities Digital Humanities Summer School, University of Oxford, 7th July 2023 Martin Wynne Senior Researcher in Corpus Linguistics Faculty of Linguistics, Philology and Phonetics https://orcid.org/0000-0002-4155-0530 martin.wynne@ling-phil.ox.ac.uk
  • 2. Martin Wynne Text Analysis 2 Summary ● What is text analysis? ● Close reading, distant reading, textual interpretation ● Corpus linguistics: the vanguard of digital humanities
  • 3. Types of text analysis ● the study of rhetoric ('how language is used to persuade') ● close reading ('focus on the words') ● stylistics ('study of the language of literature') ● stylometry ('quantifying aspects of the language of texts, especially for authorship attribution and investigating genres') ● corpus linguistics ('developing and analysing large electronic datasets representative of particular language varieties') ● distant reading ('studying and doing things with more text than you can read') ● macroanalysis (‘plotting features in large corpora over time’) ● discourse analysis ('categorizing and analysing structural elements of discourse') ● critical corpus discourse analysis ('using corpus linguistic methods to reveal hidden agendas and motivations in texts') ● deconstruction ('what the words are not saying or failing to say') ● forensic linguistics (‘gathering legal evidence about attribution and meaning’) ● qualitative social science ('annotation and analysis of interviews, survey results, etc.') ● ...and more...
  • 4. My meaning of text analysis A diverse, open and fluid set of methods, datasets and tools, to be used in support of a variety of research processes, with the aim of interpreting texts
  • 5. My sort of text analysis ...is not just for linguists, nor just for literary scholars, historians, political scientists, sociologists, journalists, activists, forensic scientists… ...and it can be useful for all or any of them
  • 6. “Text analysis tools aid the interpreter asking questions of texts” Geoffrey Rockwell https://web.archive.org/web/20150410205354/http://tada.mcmaster.ca/Main/WhatTA
  • 7. Methods and Techniques Search - search large texts quickly - search a collection of texts or corpus - search enhanced by linguistic annotation - complex searches Analyze - patterns of words - collocations - expanded co-text around words - wordlists, keywords - clusters, ngrams Compare - compare texts - compare sections of a text with each other - compare a text with a reference corpus Visualize - concordances - distribution of features in a text or corpus
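Two of the techniques listed on this slide, wordlists and clusters (n-grams), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tokenizer is a deliberately crude regular expression, the sample sentence is invented, and the function names `tokenize`, `wordlist` and `ngrams` are hypothetical rather than taken from any of the tools mentioned in the talk.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude rule, assumed for illustration)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def wordlist(tokens):
    """Frequency list of word types."""
    return Counter(tokens)

def ngrams(tokens, n):
    """Frequency list of contiguous n-token sequences (clusters)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Invented sample text, not drawn from any corpus
sample = "The aftermath of the war was long. The aftermath of the coup was short."
tokens = tokenize(sample)
print(wordlist(tokens).most_common(3))
print(ngrams(tokens, 3).most_common(2))
```

Real corpus tools add much more on top of this (indexing, annotation-aware search, statistics), so the sketch is best read as a picture of what a wordlist and a cluster list are, not of how any particular tool computes them.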
  • 8. Close Reading Elaine Showalter describes close reading as: “...slow reading, a deliberate attempt to detach ourselves from the magical power of story-telling and pay attention to language, imagery, allusion, intertextuality, syntax and form.” It is, in her words, ‘a form of defamiliarisation we use in order to break through our habitual and casual reading practices’ (Teaching Literature, p.98). Further introductory reading: ● https://www.york.ac.uk/english/writing-at-york/writing-resources/close-reading/ ● https://writingcenter.fas.harvard.edu/pages/how-do-close-reading ● http://theliterarylink.com/closereading.html
  • 9. Close Reading ● Traditional criticism (biographical, social, historical, psychological) ...the new paradigm... ● Practical criticism, New Criticism (concentrate on the ‘words on the page’) ● Hermeneutics (theory and practice of interpretation) ● Interpretation (always provisional, never final) ● Inductive reasoning (not deductive and mathematical, based on experience and probabilities) ...the next new paradigm... ● New Historicism (literature must be understood in its historical context) ... https://www.english.cam.ac.uk/classroom/pracrit.htm https://www.oxfordbibliographies.com/view/document/obo-9780190221911/obo-9780190221911-0015.xml
  • 10. Close reading as the paradigm for text-based humanities scholarship
  • 11. But what do you do with a million books? There are only about 30,000 days in a human life -- at a book a day, it would take 30 lifetimes to read a million books and our research libraries contain more than ten times that number. Only machines can read through the 400,000 books already publicly available for free download from the Open Content Alliance.  Gregory Crane, “What do you do with a million books?” D-Lib Magazine, March 2006
  • 12. And 5 million books? We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of “culturomics” focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. “Culturomics” extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities. www.sciencexpress.org / 16 December 2010
  • 13.
  • 14. Distant reading: where distance, let me repeat it, is a condition of knowledge: it allows you to focus on units that are much smaller or much larger than the text: devices, themes, tropes—or genres and systems. And if, between the very small and the very large, the text itself disappears, well, it is one of those cases when one can justifiably say, less is more. If we want to understand the system in its entirety, we must accept losing something. We always pay a price for theoretical knowledge: reality is infinitely rich; concepts are abstract, are poor. But it’s precisely this ‘poverty’ that makes it possible to handle them, and therefore to know. This is why less is actually more. Franco Moretti, “Conjectures on World Literature” Distant Reading, 2013.
  • 15. Distant Reading A canon of 200 novels, for instance, sounds very large for 19th-century Britain (and is much larger than the current one), but is still less than 1% of the novels that were actually published […] and close reading won’t help here, a novel a day every day of the year would take a century or so … And it’s not even a matter of time, but of method: a field this large cannot be understood by stitching together separate bits of knowledge about individual cases, because it isn’t a sum of individual cases: it’s a collective system, that should be grasped as such, as a whole. Franco Moretti, Graphs, Maps, Trees: Abstract Models for Literary History, 2005
  • 16.
  • 17.
  • 18. What are we ultimately aiming for when it comes to digital scholarship in the Humanities? Ways to combine close reading with big data approaches.
  • 19. From “distant” (not) reading to close reading and back again... Digital Humanities as a locus for “scalable” reading practices DATA: digitally assisted text analysis Martin Mueller, Northwestern
  • 20. What do you need to know in order to move to interpretation? 1. You need to know what’s in your dataset. 2. You need to know how to find what you are looking for. 3. You need to know how to make sense of what you find.
  • 21. Software tools ● AntConc ● Sketch Engine ● CQPweb ● #LancsBox ● English-corpora.org ● KonText ● Voyant Tools ● CliC ● Hansard at Huddersfield ● ...and more
  • 22. Finding resources ● CLARIN Virtual Language Observatory (https://vlo.clarin.eu/) ● CLARIN Resource Families (https://www.clarin.eu/resource-families/)
  • 23. Corpus Query Tools: a CLARIN Resource Family https://www.clarin.eu/resource-families/corpus-query-tools
  • 24. The 'aftermath' of the seminar Subject: Les Francais des Corpus – Aftermath Dear colleagues, First, many thanks for presenting at/attending the Francais des Corpus Workshop and for making it such a success. I promised I would keep you in touch with one another and hope that the full list of your e-mail addresses above makes that possible. …
  • 25. KWIC concordance from Written BNC2014 generated in #lancsbox X (a representative corpus of British English released in 2021).
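A KWIC (key word in context) concordance lines up every occurrence of a search word with its co-text on either side. The following is a minimal sketch of the idea, not the way #LancsBox X or any other tool implements it; the function name `kwic`, the `width` parameter and the sample sentence are all assumptions made for illustration.

```python
import re

def kwic(text, node, width=30):
    """Return one aligned line per occurrence of `node`,
    with up to `width` characters of co-text on each side."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = r"\b" + re.escape(node) + r"\b"
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left co-text so that the node words form a column
        lines.append(f"{left:>{width}} {m.group(0)} {right:<{width}}")
    return lines

# Invented sample text, not taken from the Written BNC2014
sample = ("In the aftermath of the war, prices rose. "
          "The aftermath was worse than the storm.")
for line in kwic(sample, "aftermath"):
    print(line)
```

Because the left co-text is padded to a fixed width, the search word always starts in the same column, which is what makes recurring patterns to its left and right easy to scan by eye.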
  • 26. 'aftermath' Collocates: War Gulf coup World disaster Tiananmen death revolution defeat Chernobyl affair riots battle massacre wars election Crisis events explosion invasion trial fire June Square victory accident attempt Significant collocates in the British National Corpus (a representative corpus of British English released in 1994). BNCWeb parameters: There are 1486 different types in your collocation database for the query "[word="aftermath"%c] [word="of"%c]". (Your query "aftermath of" returned 544 hits in 337 different texts) The selected range was 1 to 4. Corpus basis for calculation: the whole BNC. Type of calculation: Log-likelihood Tag restriction: any noun Collocates occur at least 5 times in the whole BNC. Words collocate at least 5 times.
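The parameters on this slide (a span of 1 to 4, log-likelihood as the statistic, minimum frequencies) can be made concrete with a small sketch. It uses the standard log-likelihood ratio (G2) over a 2x2 contingency table, which may differ in detail from what BNCweb computes; the function names, the right-hand-only window and the toy token list are assumptions for illustration, and no part-of-speech restriction is applied.

```python
import math
from collections import Counter

def g2(o11, o12, o21, o22):
    """Log-likelihood ratio (G2) for a 2x2 table of observed counts."""
    n = o11 + o12 + o21 + o22
    r1, r2 = o11 + o12, o21 + o22
    c1, c2 = o11 + o21, o12 + o22
    cells = ((o11, r1 * c1 / n), (o12, r1 * c2 / n),
             (o21, r2 * c1 / n), (o22, r2 * c2 / n))
    # Cells with an observed count of zero contribute nothing to the sum
    return 2 * sum(o * math.log(o / e) for o, e in cells if o > 0)

def collocates(tokens, node, span=4, min_freq=1):
    """Score words found 1..span tokens to the right of `node`.
    Only words more frequent in the window than expected are kept."""
    n = len(tokens)
    window = set()
    for i, tok in enumerate(tokens):
        if tok == node:
            window.update(range(i + 1, min(i + 1 + span, n)))
    in_window = Counter(tokens[i] for i in window)
    overall = Counter(tokens)
    w = len(window)
    scores = []
    for word, o11 in in_window.items():
        expected = w * overall[word] / n
        if o11 < min_freq or o11 <= expected:
            continue
        o12 = w - o11              # other words inside the window
        o21 = overall[word] - o11  # this word outside the window
        o22 = n - w - o21          # other words outside the window
        scores.append((word, g2(o11, o12, o21, o22)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Invented toy data; real collocation work needs a large corpus
toy = ("in the aftermath of the war people left and in the aftermath "
       "of the war prices rose while markets elsewhere stayed calm and quiet").split()
print(collocates(toy, "aftermath", span=3))
```

On a corpus the size of the BNC, thresholds such as the slide's minimum of 5 occurrences matter a great deal; on toy data like this the scores only show the mechanics.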
  • 27. J. R. Firth (1890-1960) “The complete meaning of a word is always contextual, and no study of meaning apart from context can be taken seriously.” J. R. Firth (1935). "The Technique of Semantics." Transactions of the Philological Society, 36-72; p. 37 (Reprinted in Firth (1957)). “You shall know a word by the company it keeps.” J. R. Firth (1957). "Papers in Linguistics, 1934-1951". Oxford: Oxford University Press.
  • 28. What is a corpus? “…a collection of pieces of language, selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language.” (Sinclair 1996)
  • 29. What is Corpus Linguistics? (1) Focus on linguistic performance, rather than competence (2) Focus on linguistic description, rather than linguistic universals (3) Focus on quantitative, as well as qualitative models of language (4) Focus on a more empiricist, rather than rationalist view of scientific inquiry. (Leech 1992)
  • 30. AntConc: explore your own texts and corpora ● Download for free from https://www.laurenceanthony.net/software/antconc/ ● Use with any 'plain text' ● Multilingual capabilities ● Does not interpret mark-up or metadata
  • 31. #LancsBox Download for free from https://lancsbox.lancs.ac.uk/ ● Works with your own data or existing corpora ● Visualizes language data ● Analyses data in any language ● Automatically annotates data for part-of-speech (for some languages) ● Wizard tool produces a prose report ● Works with major operating systems (Windows, Mac, Linux) ● Latest version #LancsBox X launched 2023
  • 32. CQPweb: Online interface for indexed corpora http://cqpweb.lancs.ac.uk ...but now also with a new feature to upload data, in limited ways...
  • 33. SketchEngine: an online interface for your corpus https://www.sketchengine.eu/ Access to Sketch Engine is by paid subscription. Individual licences are available from €6.56 per month, with free trials available.
  • 34. A new opportunity "It is not easy to justify assertions about the alleged frequency or infrequency of some particular belief or attitude in the past. How many examples does one need to cite in order to prove the point? Lacking any satisfactory method of quantifying these matters, all I can do is to record my impressions after long immersion in the period." Keith Thomas, The Ends of Life, Oxford University Press, 2009. “But the sad truth is that much of what it has taken me a lifetime to build up by painful accumulation can now be achieved by a moderately diligent student in the course of a morning.” Keith Thomas, Diary, London Review of Books, 10 June 2010.
  • 35. Some (more or less) testable assertions Tudor ● “The idea of a "Tudor era" in history is a misleading invention, claims an Oxford University historian. Cliff Davies says his research shows the term "Tudor" was barely ever used during the time of Tudor monarchs.” (http://www.bbc.co.uk/news/education-18240901 May 2012) Holocaust ● “I will argue that “The Holocaust” is an ideological representation of the Nazi holocaust...Until recently, however, the Nazi holocaust barely figured in American life. Between the end of World War II and the late 60s, only a handful of books and films touched on the subject”. (Norman Finkelstein, The Holocaust Industry. Verso, 2000.) State ● “...no political writer before the middle of the sixteenth century used the word 'state' in anything like its modern political sense” [referring to the machinery of government and social control] (Quentin Skinner, The Foundations of Modern Political Thought, Cambridge University Press, 1978).
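Claims like these are "more or less" testable because they imply something about how often a word occurs in texts from a given period. One way to picture the first step is a relative frequency per decade, sketched below. The `(year, text)` input format, the function name `per_million_by_decade` and the three toy documents are invented for illustration; they are not evidence for or against any of the quoted claims, and a count of word forms says nothing by itself about the sense in which a word is used.

```python
import re
from collections import defaultdict

def per_million_by_decade(docs, term):
    """docs: iterable of (year, text) pairs (an assumed, hypothetical format).
    Returns {decade: occurrences of `term` per million tokens}."""
    hits = defaultdict(int)
    sizes = defaultdict(int)
    for year, text in docs:
        tokens = re.findall(r"[a-z]+", text.lower())
        decade = (year // 10) * 10
        sizes[decade] += len(tokens)
        hits[decade] += tokens.count(term.lower())
    return {d: 1_000_000 * hits[d] / sizes[d] for d in sorted(sizes) if sizes[d]}

# Invented toy documents, far too small to support any historical claim
docs = [
    (1530, "the king and the realm"),
    (1535, "the state of the realm"),
    (1590, "the state and the state"),
]
print(per_million_by_decade(docs, "state"))
```

Normalizing by the number of tokens per decade matters because the amount of surviving text varies enormously between periods; raw counts would mostly reflect corpus size. Moving from a frequency curve to a claim about meaning still requires reading the concordance lines.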
  • 36. Annotation Annotation of texts should include structural markup, metadata, and linguistic annotation, including: - Standardized metadata for basic categories such as language, relevant dates, author, title and text type; - Part-of-speech tagging; - Lemmatization; and - Modernized (or otherwise normalized) forms ...and these can be the basis for further levels of annotation, such as: - semantic tags - named entity recognition - etc.
  • 37.
  • 38.
  • 39. Digital scholarship in the Humanities and Digital Science Issues and assumptions in scientific research: ● Consensus (and compromise) about funding priorities ● Adoption of technical standards ● Standards for the representation of knowledge and interpretations (agreement on concepts and categories!) ● Reproducibility and replicability of research ● Sharing of generic tools ● Curation of tools and data in professional service centres ● Support for software sustainability ● Promotion of interoperability of resources and tools ● Sharing research outputs ● Research leading to an accumulation of knowledge ● Increasingly data-driven research
  • 40. CLARIN ERIC in members and centres Official membership • 23 members • 3 observers • 1 linked party A distributed network of >60 centres 25 CTS certified data centres, strong focus on FAIRness & interoperability • federated login • central metadata harvesting for easy discovery • chained services • language data - in written, spoken, video or multimodal form • advanced tools - to discover, explore, exploit, annotate, analyse or combine data sets, wherever they are located
  • 41. CLARIN corpus resources and tools Corpora: at least 4130 - see VLO (https://vlo.clarin.eu/) ! Online interfaces: ● Corpuscle ● Korp ● KonText ● NoSketch Engine ● D* (Diacollo demo) ● TEITOK Federated content search: https://contentsearch.clarin.eu/ Resource Families: ● 13 curated guides to different types of corpora and how to get them ● Coming soon: Desktop corpus tools and Online corpus tools
  • 42.
  • 43. Online and desktop tools for corpus analysis “Corpus, concordance, collocation”
  • 44. Diachronic collocations in a text collection: DiaCollo from the Deutsches Textarchiv
  • 45. Diachronic collocations in a text collection: DiaCollo from the Deutsches Textarchiv
  • 46.
  • 47.
  • 48. Types of Text Analysis: Further Reading
    ● Baker, P (2006), Using Corpora in Discourse Analysis, London: Continuum [summary and further information at https://www.lancaster.ac.uk/staff/bakerjp/usingcorpora.htm ]
    ● Baker, P (2012), ‘Acceptable Bias? Using Corpus Linguistics Methods with Critical Discourse Analysis’, Critical Discourse Studies 9.3 (2012): 247-56. Web.
    ● Bode, K (2017), The Equivalence of “Close” and “Distant” Reading; or, Toward a New Object for Data-Rich Literary History, Modern Language Quarterly (2017) 78 (1): 77–106. DOI 10.1215/00267929-3699787
    ● Cheng, W. (2013). ‘Corpus-based linguistic approaches to critical discourse analysis’. In The encyclopedia of applied linguistics (pp. 1-8). Wiley-Blackwell. https://doi.org/10.1002/9781405198431.wbeal0262 [full book chapter available from https://www.researchgate.net/publication/262070226]
    ● Gadd, Ian. ‘The Use and Misuse of Early English Books Online’ in Literature Compass 6/3 (2009): 680–692 https://doi.org/10.1111/j.1741-4113.2009.00632.x
    ● Hamed, D (2020), ‘Keywords and collocations in US presidential discourse since 1993: a corpus-assisted analysis’, in Journal of Humanities and Applied Social Sciences, Vol. 3 No. 2, 2021 pp. 137-158 Emerald Publishing Limited 2632-279X DOI 10.1108/JHASS-01-2020-0019
    ● Kichuk, Diana. ‘Metamorphosis: Remediation in Early English Books Online (EEBO)’. Literary and Linguistic Computing 22.3 (2007): 291–303. [available from https://hfroehlich.files.wordpress.com/2016/07/lit-linguist-computing-2007-kichuk-291-303.pdf ]
    ● Leech, G. N., & Short, M. H. (1981). Style in Fiction. London: Longman.
    ● Mahlberg, M. (2013), Corpus Stylistics and Dickens’s Fiction, Routledge.
    ● Martin, Shawn. ‘EEBO, Microfilm, and Umberto Eco: Historical Lessons and Future Directions for Building Electronic Collections’. Microform & Imaging Review 36.4 (2007): 159–64 [available from https://repository.upenn.edu/cgi/viewcontent.cgi?article=1072&context=library_papers ]
    ● Showalter, E (2002), Teaching Literature, London: Wiley-Blackwell.
    ● Sinclair, J (1991), Corpus, Concordance, Collocation, Oxford: OUP.
    ● Rockwell, G (2005), ‘What is Text Analysis’ [https://web.archive.org/web/20150410205354/http://tada.mcmaster.ca/Main/WhatTA]
    ● Underwood, Ted (2015), Seven ways humanists are using computers to understand text. (blog post at https://tedunderwood.com/2015/06/04/seven-ways-humanists-are-using-computers-to-understand-text/ )
    ● John Unsworth, “How Not To Read A Million Books,” with Tanya Clement, Sara Steger, and Kirsten Uszkalo, Harvard University, Cambridge, MA (October 2008) [blog post at https://people.brandeis.edu/~unsworth/hownot2read.rutgers.html ]
    ● Text Analysis in ‘Tooling up for Digital Humanities’ blog at http://toolingup.stanford.edu/?page_id=981
    ● More information about the Text Creation Partnership https://quod.lib.umich.edu/e/eebogroup/