Corpus Approaches to the
Language of Literature
Martin Wynne
Oxford Text Archive
University of Oxford
martin.wynne@oucs.ox.ac.uk
PALA 2008
6
What is corpus stylistics?
The use of the resources, tools and
methodologies of corpus linguistics to
carry out literary analysis on the basis
of the language of literature.
7
Corpus Stylistics - Methods
● Examining and analysing texts and
corpora
● Comparing texts and corpora
● Building and annotating resources
8
“I'm just going out to
commit certain deeds.”
In an episode of The Simpsons, Homer has
planned with Moe to steal Moe's car and drive it
into the water, so that Moe can claim the
insurance money. Before Homer goes out to steal
the car, he is eating dinner with the family, and is
trying to act innocently, as if it were a normal
evening. He makes various mistakes, and when
he gets up to leave, he says, “I'm just going out to
commit certain deeds.”
9
Consult a corpus to see how a
word / phrase / construction /
collocation 'normally' occurs.
For example, we can look at 'commit' and 'deeds'
in the British National Corpus, and try to answer
questions like “Why is this funny?”, and “Why are
commit and deeds the wrong words to use here?”
Requirements: access to a general reference
corpus and analysis tools (preferably online) for
concordance, collocation, cluster, distribution,
word frequency lists
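A concordance of this kind can be sketched in a few lines of Python. This is an illustration only: it uses a naive regular-expression tokenizer and a few invented sample sentences instead of the BNC, which in practice would be queried through a dedicated corpus interface. The names tokenize and kwic are hypothetical.

```python
import re

def tokenize(text):
    """Lower-case word tokens from a deliberately simple regular expression."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def kwic(tokens, node, width=4):
    """Return one key-word-in-context line per occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the node words line up in a column.
            lines.append(f"{left:>30}  [{tok}]  {right}")
    return lines

# Invented sample sentences, standing in for real corpus data.
sample = ("He was charged with intent to commit a crime. "
          "They commit offences and commit suicide. "
          "Good deeds were done.")
for line in kwic(tokenize(sample), "commit"):
    print(line)
```

Reading the aligned right-hand contexts is what reveals the usual company of a word such as commit (crimes, offences and the like), against which Homer's phrase stands out.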
10
Analyse an electronic
version of a literary text,
using text analysis tools.
● How does author X use expression Y?
● How often does she use Y?
● Does she prefer another expression in certain
contexts?
● In what parts of the novel / play / poem does
she tend to use Y?
Requirements: (reliable) electronic version of the
text (in an appropriate format), plus relevant tools
(preferably online)
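Counting an expression and seeing where it falls in a text can be approximated as follows. This is a minimal sketch, assuming the text is available as plain text; the function names, the phrase and the number of sections are illustrative, and a phrase that straddles a section boundary will be missed.

```python
import re

def count_phrase(text, phrase):
    """Count case-insensitive, whole-word occurrences of `phrase`, allowing any whitespace between its words."""
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in phrase.split()) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def distribution(text, phrase, n_sections=10):
    """Split `text` into at most n_sections runs of roughly equal length (in words)
    and count `phrase` in each run.

    A phrase that straddles a section boundary is not counted (a known limitation).
    """
    words = text.split()
    size = max(1, -(-len(words) // n_sections))  # ceiling division
    sections = [" ".join(words[i:i + size]) for i in range(0, len(words), size)]
    return [count_phrase(section, phrase) for section in sections]

# Example usage (hypothetical file name):
# with open("novel.txt", encoding="utf-8") as f:
#     novel = f.read()
# print(count_phrase(novel, "as if"), distribution(novel, "as if"))
```

Splitting by word count is a crude stand-in for the text's own structure; where chapters or acts are marked up, counting per structural unit is usually more informative.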
11
Analysing a literary corpus
Ask questions like those above, but across the
oeuvre of an author, or across a literary genre or
time period.
Furthermore, analyse variation in an author's work
(e.g. compare one novel with the rest)
Requirements: a relevant corpus, plus tools that
allow for internal comparisons
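Comparing one novel with the rest of an author's work requires normalised frequencies, because the texts differ in length. A minimal sketch, assuming the corpus is held as a dictionary mapping (hypothetical) titles to plain text; rates are expressed per 10,000 words.

```python
import re

def per_10k(text, word):
    """Occurrences of `word` per 10,000 word tokens of `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return 10_000 * tokens.count(word.lower()) / len(tokens)

def one_vs_rest(corpus, target, word):
    """Compare the rate of `word` in one work with its pooled rate in all the others.

    corpus: dict mapping a title to the plain text of that work.
    Returns (rate in target, rate in the rest), both per 10,000 words.
    """
    rest = " ".join(text for title, text in corpus.items() if title != target)
    return per_10k(corpus[target], word), per_10k(rest, word)
```

Pooling the remaining works into one comparison text is the simplest design; comparing the target with each other work separately would also show how much the author varies from novel to novel.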
12
Analysing an author's work
Clusters >4 words in Dickens -
among the top 25:
AS IF HE HAD BEEN
IN THE COURSE OF THE
A QUARTER OF AN HOUR
AT THE BOTTOM OF THE
WHAT DO YOU THINK OF
IN THE MIDDLE OF THE
AS IF IT HAD BEEN
AT THE TOP OF THE
ON THE OTHER SIDE OF
AT THE END OF THE
AS A MATTER OF COURSE
THE OTHER SIDE OF THE
UP AND DOWN THE ROOM
Categorisation of cluster types (more than 5 words):
– Names and labels: 21
– Speech: 16
– “As if”: 6
– Body parts: 12
– Other: 22
Mahlberg, M. (2007). “Corpus stylistics: bridging the gap
between linguistic and literary studies.” In M. Hoey,
M. Mahlberg, M. Stubbs and W. Teubert, Text,
Discourse and Corpora. London: Continuum.
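Cluster lists like the one above are, in essence, ranked counts of recurring word sequences (n-grams). The following sketch approximates the idea but does not reproduce the procedure behind Mahlberg's figures. It lower-cases the text, discards punctuation, ignores sentence boundaries and counts every contiguous five-word sequence. The function name clusters is hypothetical.

```python
import re
from collections import Counter

def clusters(text, n=5, top=25):
    """Return the `top` most frequent contiguous n-word sequences (clusters) in `text`."""
    # Naive tokenization: lower-case words with punctuation dropped, so clusters
    # may run across sentence boundaries (some cluster tools can be set to avoid this).
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    # Zipping n staggered copies of the token list yields every n-gram in order.
    grams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram).upper() for gram in grams)
    return counts.most_common(top)

# Example usage (hypothetical file of collected Dickens texts):
# with open("dickens.txt", encoding="utf-8") as f:
#     for cluster, freq in clusters(f.read()):
#         print(freq, cluster)
```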
13
Making internal
comparisons within a text
● Comparing the speech of one character with the
rest, e.g. Romeo and Juliet.
● Comparing one act or scene with the rest.
● Comparing the style of one section of a novel
with the rest.
Requirements: text processing tools to separate
text elements, or markup to tag text structure and
markup-aware tools, plus keywords software
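When a play is encoded in TEI XML, each speech is typically an sp element whose who attribute identifies the speaker. A markup-aware script can use this to separate one character's words from everyone else's. The sketch below assumes that encoding. A who value such as "#Romeo" is an assumption about a particular edition, not a fixed convention. The script uses only Python's standard library, and its two outputs could then be passed to keywords software.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # TEI namespace as used by ElementTree tag names

def speech_by_speaker(xml_text, who):
    """Split TEI <sp> speeches into the target speaker's text and everyone else's."""
    root = ET.fromstring(xml_text)
    target, rest = [], []
    for sp in root.iter(TEI + "sp"):
        # Join verse lines and prose paragraphs, skipping the <speaker> label itself.
        words = " ".join("".join(el.itertext()) for el in sp if el.tag in (TEI + "l", TEI + "p"))
        (target if sp.get("who") == who else rest).append(words)
    return " ".join(target), " ".join(rest)

# A tiny invented TEI fragment for illustration.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<sp who="#Romeo"><speaker>ROMEO</speaker><l>But soft, what light through yonder window breaks?</l></sp>
<sp who="#Juliet"><speaker>JULIET</speaker><l>O Romeo, Romeo, wherefore art thou Romeo?</l></sp>
</body></text></TEI>"""
romeo, others = speech_by_speaker(sample, "#Romeo")
```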
14
Comparing a text to a
reference corpus
Compare the frequency, distribution and usage of
words in the text with a reference corpus.
E.g. A Connecticut Yankee in King Arthur's Court
by Mark Twain, compared to the British National
Corpus (BNC)
Requirements: many reference corpora, literary
and non-literary, different languages, genres, time
periods, etc.
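Keyword lists are usually produced by testing, word by word, whether a word is notably more frequent in the text than in the reference corpus. The sketch below uses Dunning's log-likelihood (G2), which is one widely used keyness statistic. The slides do not say which statistic was used. Tokenization and any significance cut-off are also left to the user, and the function names are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Dunning's log-likelihood (G2) for a word occurring a times in a text of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the text
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a:
        ll += a * math.log(a / e1)
    if b:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(text_tokens, ref_tokens, top=20):
    """Rank words that are relatively more frequent in the text than in the reference corpus."""
    t, r = Counter(text_tokens), Counter(ref_tokens)
    c, d = len(text_tokens), len(ref_tokens)
    scored = []
    for word, a in t.items():
        b = r.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]
```

Both arguments are token lists, so the same function serves for a single text against the BNC or, as in the following slides, a whole literary corpus against a reference corpus.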
15
Comparing texts and
corpora
I, AND, SIR, KING, YE, IT, MY, LAUNCELOT, ME, WAS,
KNIGHTS, MERLIN, KNIGHT, ARMOR, CLARENCE, THING,
SANDY, HIM, MARHAUS, THAT, UPON, TOWARD, MORDRED,
GAWAINE, CAMELOT, SAGRAMOR, SO, DOWLEY, YES,
COULDN'T, MILRAYS, THEN, BUT, THEY, HUNDRED,
PRESENTLY, KING'S, ARTHUR'S, WOULD, MAN, HAD, WE,
ALL, YONDER, THOU, SLAVE, MIRACLE, OUT, ARTHUR,
GOOD, UNTO, COULD, AH, HATH, MYSELF, ERRANTRY, LET,
SMOTE, ALONG, WELL, MAGICIAN, NOBLE, HIS, GOT,
WHEREFORE, SWORD, HE, EVERYBODY, THEE, SPEAR,
YOU, ABBOT, PERADVENTURE, OFFENSE, HERMIT, THEM,
PROCESSION, STRAIGHTWAY, A, YET, MONKS, KAY, EVER,
GUENEVER
16
Comparing a literary corpus
to a general reference corpus
Identifying and characterizing an author's style,
e.g. comparing all of Mark Twain's work with US
fiction in the period 1870-1910;
Identifying and characterizing literary style (of a
period, or genre, etc),
e.g. comparing a corpus of US fiction with a
corpus of non-fiction from the same period, or
comparing dramatic dialogue in plays with real
conversation in a spoken corpus.
Requirements: More literary corpora, more
reference corpora, more computing power!
17
Tracing historical change
Diachronic studies of the language of literature,
studying language change, changes in style,
genre, etc.
Requirements: sets of historical literary corpora
of various time periods, or a diachronic corpus
which allows internal comparisons, or a collection
of texts (with dates) which can be cross-searched
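Given a collection of dated texts, a basic diachronic profile is a word's normalised frequency in each period. A minimal sketch, assuming each text comes with a publication year and grouping the texts by decade; the function name and the decade granularity are illustrative choices.

```python
import re
from collections import defaultdict

def rate_by_decade(dated_texts, word):
    """Occurrences of `word` per 10,000 words, grouped by decade.

    dated_texts: iterable of (year, plain_text) pairs.
    Returns a dict {decade: rate}, with decades in ascending order.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for year, text in dated_texts:
        decade = year // 10 * 10  # e.g. 1878 -> 1870
        tokens = re.findall(r"[a-z]+", text.lower())
        hits[decade] += tokens.count(word.lower())
        totals[decade] += len(tokens)
    return {d: 10_000 * hits[d] / totals[d] for d in sorted(totals) if totals[d]}
```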
18
Annotating and manually
analyzing texts and
corpora
Can be used to test, refine and develop theories
about the language of literature.
Theories must be supported by textual evidence
and must account for all textual phenomena.
Frequencies and relative frequencies can be
calculated.
Requirements: lots of time, money and expertise!
19
Building and Annotating
The Speech, Thought and Writing Presentation Corpus
Elena Semino, Mick Short, Martin Wynne et al
Lancaster University
Identifying, categorising and analysing the functions of all
occurrences of reported speech, thought and writing (e.g.
direct speech, indirect speech, free indirect speech, direct
thought, etc.) in a small corpus of fictional and non-fictional
texts (and later also speech)
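Once every occurrence has been annotated, the frequencies and relative frequencies of each category can be compared across text types. A minimal sketch, assuming the annotations have been exported as (text type, category) pairs. This data format is hypothetical and is not the STWP corpus's actual encoding.

```python
from collections import Counter, defaultdict

def category_profile(annotations):
    """annotations: iterable of (text_type, category) pairs, one per annotated occurrence.

    Returns {text_type: {category: (count, percentage of that text type's occurrences)}}.
    """
    by_type = defaultdict(Counter)
    for text_type, category in annotations:
        by_type[text_type][category] += 1
    profile = {}
    for text_type, counts in by_type.items():
        total = sum(counts.values())
        profile[text_type] = {cat: (n, 100 * n / total) for cat, n in counts.items()}
    return profile
```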
20
Building and annotating (2)
VICI
Free University of Amsterdam
Gerard Steen et al
Identifying and categorising metaphorical
expressions in a subset of the BNC;
analysing usage and distributions across text
types and modes
21
Further types of analysis
● More levels of annotation: parsing, semantic
tagging, etc.
● Stylometry
● Text mining
● Multilingual, parallel, comparable, translation
corpora
● Socio-cultural and historical investigations in literary
corpora
corpora
But note, please, that you don't need annotation for
many useful techniques!
Requirements: various!
22
A new type of Shakespeare
dictionary: Jonathan Culpeper
A proposal for a dictionary of the language of Shakespeare, involving
better integration of linguistic description, frequency information and
non-linguistic information.
− How often does X occur?
− How often do the particular meanings of X occur?
− What kind of words does X tend to co-occur with?
− How often do the particular ‘grammatical categories’ of X occur?
− What kinds of register does X co-occur with?
− What kinds of speaker/addressee does X co-occur with?
− Is X part of a particular lexical field (semantic category) and how does
that field distribute across the plays?
− How can the above help differentiate X word from Y word?
− Etc.
(1) a particular theoretical approach to meanings, (2) a particular
methodology ….. enter Corpus Linguistics
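The question of what kind of words X tends to co-occur with is usually answered with a collocation statistic. The sketch below scores collocates by mutual information (MI), using one common formulation in which the expected frequency is scaled by the window size. The span, the frequency threshold and the choice of MI over alternatives such as log-likelihood or t-score are all assumptions. None of them is part of the proposal described above.

```python
import math
from collections import Counter

def collocates(tokens, node, span=4, min_freq=2):
    """Rank words occurring within `span` tokens either side of `node` by mutual information.

    MI = log2(O / E), where O is the observed co-occurrence count and
    E = f(node) * f(collocate) * (2 * span) / N, one common window-adjusted formulation.
    """
    n = len(tokens)
    freq = Counter(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # Collect the words to the left and right of this occurrence of the node.
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            observed.update(window)
    scores = {}
    for word, o in observed.items():
        # Skip the node itself and rare pairs, for which MI is notoriously unstable.
        if word == node or o < min_freq:
            continue
        expected = freq[node] * freq[word] * (2 * span) / n
        scores[word] = math.log2(o / expected)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```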
23
Using large-scale literary
corpora
● For example, Matthew Jockers, Sarah Allison
and others at Stanford University, using large
collections of literary texts, from commercial
providers, applying corpus linguistic and data
mining techniques to address literary research
questions
e.g. Joe Shapiro comparing the quantity of narrative
v. descriptive passages in 19th-century US
literature
● There is perhaps particular potential for historical literary
and linguistic studies
24
Basic methods: summary
1. Examine the norms in a general reference corpus
2. Perform text analysis on an electronic literary text
3. Make internal comparisons in a literary text
4. Analyse a literary corpus
5. Make internal comparisons in a literary corpus
6. Compare a text to a reference corpus
7. Compare a literary corpus to a non-literary corpus
8. Compare different literary corpora with each other
9. Build and annotate corpora
10. Others!
25
Methods: conclusion
It is becoming increasingly possible to test claims
about the language of literature empirically,
to search for and provide evidence from texts, and
to establish the norms of literary and non-literary
style.
Stylistics typically makes use of a toolkit of
linguistic techniques, methods and resources.
Corpus stylistics will become a powerful addition
to this toolkit in the future.
26
Resources for Corpus
Stylistics
What do we need?
● Reliable electronic editions of literary texts
● Relevant reference corpora
● Analysis tools
● Interoperability
● Shared access
● Sustainability
● Methodology
● Expertise
27
Research Infrastructure
The vision is for a set of relevant texts, corpora and tools,
hosted in various locations around the world, available
online from the user's desktop, via a single sign-on; all
the resources and tools working together using high-
speed connections and high-performance computing.
Plus tools for showing, sharing and collaborating in a
virtual workspace.
CLARIN is working to build this infrastructure for the use
of language resources and technologies across the
humanities and social sciences.
28
Links
Oxford Text Archive (OTA)
http://www.ota.ox.ac.uk/
PALA Corpus Stylistics Special Interest Group
http://www.pala.ac.uk/sigs/corpus-style/
Corpus-style mailing list
http://www.jiscmail.ac.uk/lists/corpus-style.html
Speech, Thought and Writing Presentation Project
http://bowland-files.lancs.ac.uk/stwp/
British National Corpus
http://www.natcorp.ox.ac.uk/
Brigham Young University Corpora from Mark Davies
http://corpus.byu.edu/

Contenu connexe

Tendances

Sociolinguistics_English as a Global Language
Sociolinguistics_English as a Global LanguageSociolinguistics_English as a Global Language
Sociolinguistics_English as a Global LanguageAndrea Jang
 
Stylistics and its Levels.pptx
Stylistics and its Levels.pptxStylistics and its Levels.pptx
Stylistics and its Levels.pptxFarooqNiaz2
 
Norman Fairclough 3D Model and Critical Discourse Analysis
Norman Fairclough 3D Model and Critical Discourse AnalysisNorman Fairclough 3D Model and Critical Discourse Analysis
Norman Fairclough 3D Model and Critical Discourse AnalysisMurk Razzaque
 
Conversational writing style
Conversational writing styleConversational writing style
Conversational writing styleUnaiza Saeed
 
Post Structuralism and Deconstruction
Post Structuralism and DeconstructionPost Structuralism and Deconstruction
Post Structuralism and DeconstructionBharat008
 
Historicism- The school of thought
Historicism- The school of thoughtHistoricism- The school of thought
Historicism- The school of thoughtNaqvisailya
 
1 computational linguistics an introduction
1 computational linguistics   an introduction1 computational linguistics   an introduction
1 computational linguistics an introductionThennarasuSakkan
 
Explanation of discourse analysis
Explanation of discourse analysisExplanation of discourse analysis
Explanation of discourse analysisEika Matari
 
Pidgins and creoles
Pidgins and creolesPidgins and creoles
Pidgins and creolesHassa Alfafa
 
Rhetorical devices used in scripted speeches
Rhetorical devices used in scripted speechesRhetorical devices used in scripted speeches
Rhetorical devices used in scripted speechesGuerillateacher
 
Optimality theory.pptx
Optimality theory.pptxOptimality theory.pptx
Optimality theory.pptxamjadnaasir
 
Stylistics and it’s relation with linguistics and literature
Stylistics and it’s relation with linguistics and literatureStylistics and it’s relation with linguistics and literature
Stylistics and it’s relation with linguistics and literatureMuhammad Adnan Ejaz
 
Modern linguistics
Modern linguisticsModern linguistics
Modern linguisticsamoresyoh99
 
English for specific purposes
English for specific purposesEnglish for specific purposes
English for specific purposesupadhyaydevangana
 

Tendances (20)

Sociolinguistics_English as a Global Language
Sociolinguistics_English as a Global LanguageSociolinguistics_English as a Global Language
Sociolinguistics_English as a Global Language
 
Structuralism
StructuralismStructuralism
Structuralism
 
Language planning
Language planningLanguage planning
Language planning
 
Stylistics and its Levels.pptx
Stylistics and its Levels.pptxStylistics and its Levels.pptx
Stylistics and its Levels.pptx
 
Discourse & newspaper
Discourse &  newspaperDiscourse &  newspaper
Discourse & newspaper
 
Norman Fairclough 3D Model and Critical Discourse Analysis
Norman Fairclough 3D Model and Critical Discourse AnalysisNorman Fairclough 3D Model and Critical Discourse Analysis
Norman Fairclough 3D Model and Critical Discourse Analysis
 
Political Discourse
Political DiscoursePolitical Discourse
Political Discourse
 
Michael halliday
Michael hallidayMichael halliday
Michael halliday
 
Conversational writing style
Conversational writing styleConversational writing style
Conversational writing style
 
An ode to death
An ode to deathAn ode to death
An ode to death
 
Post Structuralism and Deconstruction
Post Structuralism and DeconstructionPost Structuralism and Deconstruction
Post Structuralism and Deconstruction
 
Historicism- The school of thought
Historicism- The school of thoughtHistoricism- The school of thought
Historicism- The school of thought
 
1 computational linguistics an introduction
1 computational linguistics   an introduction1 computational linguistics   an introduction
1 computational linguistics an introduction
 
Explanation of discourse analysis
Explanation of discourse analysisExplanation of discourse analysis
Explanation of discourse analysis
 
Pidgins and creoles
Pidgins and creolesPidgins and creoles
Pidgins and creoles
 
Rhetorical devices used in scripted speeches
Rhetorical devices used in scripted speechesRhetorical devices used in scripted speeches
Rhetorical devices used in scripted speeches
 
Optimality theory.pptx
Optimality theory.pptxOptimality theory.pptx
Optimality theory.pptx
 
Stylistics and it’s relation with linguistics and literature
Stylistics and it’s relation with linguistics and literatureStylistics and it’s relation with linguistics and literature
Stylistics and it’s relation with linguistics and literature
 
Modern linguistics
Modern linguisticsModern linguistics
Modern linguistics
 
English for specific purposes
English for specific purposesEnglish for specific purposes
English for specific purposes
 

Similaire à Corpus Approaches to the Language of Literature 2008

Computationalstylistics tbpresented
Computationalstylistics   tbpresentedComputationalstylistics   tbpresented
Computationalstylistics tbpresentedeiza_89
 
Computational stylistic 24 april
Computational stylistic 24 aprilComputational stylistic 24 april
Computational stylistic 24 aprilsyila239
 
Computational stylistics ppt
Computational stylistics pptComputational stylistics ppt
Computational stylistics pptsyila239
 
Intro. to Stylistics
Intro. to StylisticsIntro. to Stylistics
Intro. to StylisticsFreelancer
 
Computational stylistics (2)[1]
Computational stylistics (2)[1]Computational stylistics (2)[1]
Computational stylistics (2)[1]Hajj Latiff
 
Research methods and materials
Research methods and materialsResearch methods and materials
Research methods and materialsGarret Raja
 
Comparative Literature Studies
Comparative Literature StudiesComparative Literature Studies
Comparative Literature StudiesDilip Barad
 
Essays And Grammar
Essays And GrammarEssays And Grammar
Essays And Grammarguest61dc4ad
 
MacroMicroZoom.pdf
MacroMicroZoom.pdfMacroMicroZoom.pdf
MacroMicroZoom.pdfMartin Wynne
 
Comparative literature- summary
Comparative literature- summaryComparative literature- summary
Comparative literature- summaryrobinsonia
 
Literary criticismpowerpoint
Literary criticismpowerpointLiterary criticismpowerpoint
Literary criticismpowerpointNishant Pandya
 
Poetry and The Merchant of Venice and The Poet X
Poetry and The Merchant of Venice and The Poet XPoetry and The Merchant of Venice and The Poet X
Poetry and The Merchant of Venice and The Poet XAbrilRodriguez37
 
12. literary criticism. fb college
12. literary criticism. fb college12. literary criticism. fb college
12. literary criticism. fb collegeArchie ibay
 
module3 teaching and assessment of lit.studies - Copy.pptx
module3 teaching and assessment of lit.studies - Copy.pptxmodule3 teaching and assessment of lit.studies - Copy.pptx
module3 teaching and assessment of lit.studies - Copy.pptxAnalieCabanlit1
 
Primary Source Responses When I assign you to read one or.docx
Primary Source Responses  When I assign you to read one or.docxPrimary Source Responses  When I assign you to read one or.docx
Primary Source Responses When I assign you to read one or.docxstilliegeorgiana
 

Similaire à Corpus Approaches to the Language of Literature 2008 (20)

Computationalstylistics tbpresented
Computationalstylistics   tbpresentedComputationalstylistics   tbpresented
Computationalstylistics tbpresented
 
Computational stylistic 24 april
Computational stylistic 24 aprilComputational stylistic 24 april
Computational stylistic 24 april
 
Computational stylistics ppt
Computational stylistics pptComputational stylistics ppt
Computational stylistics ppt
 
Intro. to Stylistics
Intro. to StylisticsIntro. to Stylistics
Intro. to Stylistics
 
Computational stylistics (2)[1]
Computational stylistics (2)[1]Computational stylistics (2)[1]
Computational stylistics (2)[1]
 
MLS
MLSMLS
MLS
 
Research methods and materials
Research methods and materialsResearch methods and materials
Research methods and materials
 
Comparative Literature Studies
Comparative Literature StudiesComparative Literature Studies
Comparative Literature Studies
 
Mls 1
Mls 1Mls 1
Mls 1
 
National trust 3
National trust 3National trust 3
National trust 3
 
Essays And Grammar
Essays And GrammarEssays And Grammar
Essays And Grammar
 
MacroMicroZoom.pdf
MacroMicroZoom.pdfMacroMicroZoom.pdf
MacroMicroZoom.pdf
 
Comparative literature- summary
Comparative literature- summaryComparative literature- summary
Comparative literature- summary
 
Literary criticismpowerpoint
Literary criticismpowerpointLiterary criticismpowerpoint
Literary criticismpowerpoint
 
Vivian
VivianVivian
Vivian
 
Poetry and The Merchant of Venice and The Poet X
Poetry and The Merchant of Venice and The Poet XPoetry and The Merchant of Venice and The Poet X
Poetry and The Merchant of Venice and The Poet X
 
12. literary criticism. fb college
12. literary criticism. fb college12. literary criticism. fb college
12. literary criticism. fb college
 
8 how to teach literature (and comics)
8 how to teach literature (and comics) 8 how to teach literature (and comics)
8 how to teach literature (and comics)
 
module3 teaching and assessment of lit.studies - Copy.pptx
module3 teaching and assessment of lit.studies - Copy.pptxmodule3 teaching and assessment of lit.studies - Copy.pptx
module3 teaching and assessment of lit.studies - Copy.pptx
 
Primary Source Responses When I assign you to read one or.docx
Primary Source Responses  When I assign you to read one or.docxPrimary Source Responses  When I assign you to read one or.docx
Primary Source Responses When I assign you to read one or.docx
 

Plus de Martin Wynne

CLARIN Supporting Horizon Europe proposals
CLARIN Supporting Horizon Europe proposalsCLARIN Supporting Horizon Europe proposals
CLARIN Supporting Horizon Europe proposalsMartin Wynne
 
CLARIN - Corpora, corpus tools and collaboration
CLARIN - Corpora, corpus tools and collaborationCLARIN - Corpora, corpus tools and collaboration
CLARIN - Corpora, corpus tools and collaborationMartin Wynne
 
Forty-five Years of the OTA
Forty-five Years of the OTAForty-five Years of the OTA
Forty-five Years of the OTAMartin Wynne
 
Exploring rhetoric in the Electronic Enlightenment
Exploring rhetoric in the Electronic EnlightenmentExploring rhetoric in the Electronic Enlightenment
Exploring rhetoric in the Electronic EnlightenmentMartin Wynne
 
Corpus Linguistics for Language Teaching and Learning
Corpus Linguistics for Language Teaching and LearningCorpus Linguistics for Language Teaching and Learning
Corpus Linguistics for Language Teaching and LearningMartin Wynne
 
Forty Years of the OTA
Forty Years of the OTAForty Years of the OTA
Forty Years of the OTAMartin Wynne
 
Big data and Digital Transformations in the Humanities
Big data and Digital Transformations in the HumanitiesBig data and Digital Transformations in the Humanities
Big data and Digital Transformations in the HumanitiesMartin Wynne
 
Hacking EEBO: colour terms
Hacking EEBO: colour termsHacking EEBO: colour terms
Hacking EEBO: colour termsMartin Wynne
 
When will there be a digital revolution in the humanities?
When will there be a digital revolution in the humanities?When will there be a digital revolution in the humanities?
When will there be a digital revolution in the humanities?Martin Wynne
 
Annotated Corpora for Research in the Humanities
Annotated Corpora for Research in the HumanitiesAnnotated Corpora for Research in the Humanities
Annotated Corpora for Research in the HumanitiesMartin Wynne
 

Plus de Martin Wynne (10)

CLARIN Supporting Horizon Europe proposals
CLARIN Supporting Horizon Europe proposalsCLARIN Supporting Horizon Europe proposals
CLARIN Supporting Horizon Europe proposals
 
CLARIN - Corpora, corpus tools and collaboration
CLARIN - Corpora, corpus tools and collaborationCLARIN - Corpora, corpus tools and collaboration
CLARIN - Corpora, corpus tools and collaboration
 
Forty-five Years of the OTA
Forty-five Years of the OTAForty-five Years of the OTA
Forty-five Years of the OTA
 
Exploring rhetoric in the Electronic Enlightenment
Exploring rhetoric in the Electronic EnlightenmentExploring rhetoric in the Electronic Enlightenment
Exploring rhetoric in the Electronic Enlightenment
 
Corpus Linguistics for Language Teaching and Learning
Corpus Linguistics for Language Teaching and LearningCorpus Linguistics for Language Teaching and Learning
Corpus Linguistics for Language Teaching and Learning
 
Forty Years of the OTA
Forty Years of the OTAForty Years of the OTA
Forty Years of the OTA
 
Big data and Digital Transformations in the Humanities
Big data and Digital Transformations in the HumanitiesBig data and Digital Transformations in the Humanities
Big data and Digital Transformations in the Humanities
 
Hacking EEBO: colour terms
Hacking EEBO: colour termsHacking EEBO: colour terms
Hacking EEBO: colour terms
 
When will there be a digital revolution in the humanities?
When will there be a digital revolution in the humanities?When will there be a digital revolution in the humanities?
When will there be a digital revolution in the humanities?
 
Annotated Corpora for Research in the Humanities
Annotated Corpora for Research in the HumanitiesAnnotated Corpora for Research in the Humanities
Annotated Corpora for Research in the Humanities
 

Dernier

18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 

Dernier (20)

18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 

Corpus Approaches to the Language of Literature 2008

  • 1. Corpus Approaches to the Language of Literature Martin Wynne Oxford Text Archive University of Oxford martin.wynne@oucs.ox.ac.uk
  • 2. OTA
  • 5.
  • 6. 6 What is corpus stylistics? The use of the resources, tools and methodologies of corpus linguistics to carry out literary analysis on the basis of the language of literature.
  • 7. 7 Corpus Stylistics - Methods  Examining and analysing texts and corpora  Comparing texts and corpora  Building and annotating resources
  • 8. 8 “I'm just going out to commit certain deeds.” In an episode of The Simpsons, Homer has planned with Moe to steal Moe's car and drive it into the water, so that Moe can claim the insurance money. Before Homer goes out to steal the car, he is eating dinner with the family, and is trying to act innocently, as if it is a normal evening. He makes various mistakes, and when he gets up to leave, he says, “I'm just going out to commit certain deeds.”
  • 9. Consult a corpus to see how a word / phrase / construction / collocation 'normally' occurs. For example, we can look at 'commit' and 'deeds' in the British National Corpus and try to answer questions like “Why is this funny?” and “Why are commit and deeds the wrong words to use here?” Requirements: access to a general reference corpus and analysis tools (preferably online) for concordance, collocation, clusters, distribution and word frequency lists.
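A concordance is the first of the tools listed above. The sketch below is a minimal key-word-in-context (KWIC) display in Python. It assumes a toy plain-text sample standing in for the BNC (which in practice is queried through a licensed copy or an online interface), and `tokenize` and `concordance` are hypothetical helper names, not part of any corpus tool.

```python
import re

def tokenize(text):
    # Lower-case word tokens; an apostrophe inside a word (e.g. "i'm") is kept.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def concordance(tokens, node, width=4):
    """Key-word-in-context lines for every occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the node words line up in a column.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines

# Invented toy text; a real study would query a reference corpus such as the BNC.
sample = ("They will commit a crime tonight. He did not commit murder. "
          "Good deeds are rewarded.")
for line in concordance(tokenize(sample), "commit"):
    print(line)
```

Reading down the right-hand column of a real concordance shows what 'commit' typically takes as its object, which is the kind of evidence the Simpsons question calls for.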
  • 10. Analyse an electronic version of a literary text using text analysis tools. ● How does author X use expression Y? ● How often does she use Y? ● Does she prefer another expression in certain contexts? ● In what parts of the novel / play / poem does she tend to use Y? Requirements: a (reliable) electronic version of the text (in an appropriate format), plus relevant tools (preferably online)
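The questions “how often” and “in what parts” can be sketched as a normalised frequency and a distribution across equal slices of the text. This is a minimal illustration under simple assumptions (naive tokenisation, an invented toy sentence, hypothetical function names); dedicated tools also offer dispersion plots and dispersion statistics.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def find_phrase(tokens, phrase):
    """Start positions of every occurrence of a (possibly multi-word) expression."""
    words = phrase.lower().split()
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]

def per_thousand(tokens, phrase):
    """How often: occurrences per 1,000 tokens, so texts of different length compare."""
    return 1000 * len(find_phrase(tokens, phrase)) / len(tokens)

def distribution(tokens, phrase, n_segments=5):
    """Where: number of hits in each of n equal slices, from beginning to end."""
    counts = [0] * n_segments
    for i in find_phrase(tokens, phrase):
        counts[i * n_segments // len(tokens)] += 1
    return counts

tokens = tokenize("As if nothing had happened she sat down as if")
print(per_thousand(tokens, "as if"))
print(distribution(tokens, "as if"))
```

With a full novel, `n_segments` might be set to the number of chapters instead, if chapter boundaries are available.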
  • 11. Analysing a literary corpus: ask questions like those above, but across the oeuvre of an author, or across a literary genre or time period. Furthermore, analyse variation in an author's work (e.g. compare one novel with the rest). Requirements: a relevant corpus, plus tools that allow for internal comparisons
  • 12. Analysing an author's work: clusters of more than 4 words in Dickens, among the top 25: AS IF HE HAD BEEN; IN THE COURSE OF THE; A QUARTER OF AN HOUR; AT THE BOTTOM OF THE; WHAT DO YOU THINK OF; IN THE MIDDLE OF THE; AS IF IT HAD BEEN; AT THE TOP OF THE; ON THE OTHER SIDE OF; AT THE END OF THE; AS A MATTER OF COURSE; THE OTHER SIDE OF THE; UP AND DOWN THE ROOM. Categorisation of cluster types (more than 5 words): names and labels 21; speech 16; “as if” 6; body parts 12; other 22. Mahlberg, M. “Corpus stylistics: bridging the gap between linguistic and literary studies”. In M. Hoey, M. Mahlberg, M. Stubbs, W. Teubert, Text, Discourse, and Corpora. London: Continuum, 2007.
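Clusters of this kind are recurrent word sequences (n-grams). A minimal sketch of extracting the most frequent 5-word clusters, assuming naive tokenisation that ignores punctuation (so clusters can run across sentence boundaries, which concordancers may handle differently). The toy text is invented and the function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def clusters(tokens, n=5, top=25):
    """The `top` most frequent n-word sequences, upper-cased as on the slide."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [(" ".join(g).upper(), count) for g, count in grams.most_common(top)]

text = ("He looked as if he had been ill, and she looked "
        "as if he had been away.")
for cluster, count in clusters(tokenize(text), n=5, top=3):
    print(count, cluster)
```

Run over a whole Dickens corpus, the ranked output would be the raw material for a manual categorisation like Mahlberg's.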
  • 13. Making internal comparisons within a text: ● Comparing the speech of one character with the rest, e.g. in Romeo and Juliet. ● Comparing one act or scene with the rest. ● Comparing the style of one section of a novel with the rest. Requirements: text processing tools to separate text elements, or markup to tag text structure plus markup-aware tools, plus keywords software
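Separating one character's speech from the rest is the text-processing step this slide mentions. A minimal sketch, assuming a hypothetical plain-text layout where each speech begins with `SPEAKER:` and unlabelled lines continue the current speech; a marked-up edition (for example TEI `<sp>` and `<speaker>` elements) would be parsed with an XML library instead. The two word-frequency lists it returns are the input a keywords comparison needs.

```python
import re
from collections import Counter

# Hypothetical layout: "ROMEO: text" starts a speech; other lines continue it.
SPEECH = re.compile(r"^([A-Z][A-Z ]*[A-Z]):\s*(.*)$")

def split_by_speaker(play_text, speaker):
    """Word counts for one character's speeches versus everything else."""
    target, rest = Counter(), Counter()
    current = None  # lines before the first speech label count as "rest"
    for line in play_text.splitlines():
        match = SPEECH.match(line)
        if match:
            current, line = match.group(1), match.group(2)
        words = re.findall(r"[a-z]+(?:'[a-z]+)?", line.lower())
        (target if current == speaker else rest).update(words)
    return target, rest

play = "ROMEO: But soft, what light\nJULIET: Ay me\nROMEO: She speaks\nO speak again"
romeo, others = split_by_speaker(play, "ROMEO")
print(romeo.most_common(3), others.most_common(3))
```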
  • 14. Comparing a text to a reference corpus: compare the frequency, distribution and usage of words in the text with a reference corpus, e.g. A Connecticut Yankee in King Arthur's Court by Mark Twain, compared to the British National Corpus (BNC). Requirements: many reference corpora, literary and non-literary, in different languages, genres, time periods, etc.
  • 15. Comparing texts and corpora: I, AND, SIR, KING, YE, IT, MY, LAUNCELOT, ME, WAS, KNIGHTS, MERLIN, KNIGHT, ARMOR, CLARENCE, THING, SANDY, HIM, MARHAUS, THAT, UPON, TOWARD, MORDRED, GAWAINE, CAMELOT, SAGRAMOR, SO, DOWLEY, YES, COULDN'T, MILRAYS, THEN, BUT, THEY, HUNDRED, PRESENTLY, KING'S, ARTHUR'S, WOULD, MAN, HAD, WE, ALL, YONDER, THOU, SLAVE, MIRACLE, OUT, ARTHUR, GOOD, UNTO, COULD, AH, HATH, MYSELF, ERRANTRY, LET, SMOTE, ALONG, WELL, MAGICIAN, NOBLE, HIS, GOT, WHEREFORE, SWORD, HE, EVERYBODY, THEE, SPEAR, YOU, ABBOT, PERADVENTURE, OFFENSE, HERMIT, THEM, PROCESSION, STRAIGHTWAY, A, YET, MONKS, KAY, EVER, GUENEVER
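Keyword lists like this one are produced by comparing word frequencies in a text with those in a reference corpus. The slide does not say which statistic was used; the sketch below uses log-likelihood (G²), one common keyness measure, with invented toy counts and hypothetical function names.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """G2 for a word seen a times in a text of c tokens, b times in a reference of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the text
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    g2 = 0.0
    if a:
        g2 += a * math.log(a / e1)
    if b:
        g2 += b * math.log(b / e2)
    return 2 * g2

def keywords(text_counts, ref_counts, top=20, threshold=3.84):
    """Words relatively more frequent in the text than in the reference, ranked by G2."""
    c, d = sum(text_counts.values()), sum(ref_counts.values())
    scored = []
    for word, a in text_counts.items():
        b = ref_counts.get(word, 0)
        if a / c > b / d:  # keep only 'positive' keywords (overused in the text)
            g2 = log_likelihood(a, b, c, d)
            if g2 >= threshold:  # 3.84 is the critical value for p < 0.05, 1 d.f.
                scored.append((word, g2))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]

# Invented toy counts, not real figures for Twain or the BNC.
text = Counter({"knight": 20, "the": 80})
reference = Counter({"knight": 1, "the": 999})
print(keywords(text, reference))
```

Passing in the two `Counter`s returned by a speaker-splitting step would give keywords for one character against the rest of a play.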
  • 16. Comparing a literary corpus to a general reference corpus: identifying and characterizing an author's style, e.g. comparing all of Mark Twain's work with US fiction in the period 1870-1910; identifying and characterizing literary style (of a period, genre, etc.), e.g. comparing a corpus of US fiction with a corpus of non-fiction from the same period, or comparing dramatic dialogue in plays with real conversation in a spoken corpus. Requirements: more literary corpora, more reference corpora, more computing power!
  • 17. Tracing historical change: diachronic studies of the language of literature, studying language change and changes in style, genre, etc. Requirements: sets of historical literary corpora from various time periods, or a diachronic corpus which allows internal comparisons, or a collection of dated texts which can be cross-searched
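Because historical corpora for different periods rarely have the same size, diachronic comparisons usually rest on normalised rather than raw frequencies. A minimal sketch, assuming a hypothetical list of `(year, text)` pairs grouped by decade; the function names and toy texts are illustrative only.

```python
import re
from collections import defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def rate_by_period(dated_texts, word, span=10):
    """Frequency of `word` per 10,000 tokens, grouped into periods of `span` years."""
    hits, sizes = defaultdict(int), defaultdict(int)
    for year, text in dated_texts:
        period = year - year % span  # e.g. 1873 -> 1870 for decades
        tokens = tokenize(text)
        hits[period] += tokens.count(word)
        sizes[period] += len(tokens)
    return {p: 10_000 * hits[p] / sizes[p] for p in sorted(sizes) if sizes[p]}

texts = [(1871, "thou art here thou art"), (1875, "you are here"),
         (1903, "you are there you are")]
print(rate_by_period(texts, "thou"))
```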
  • 18. Annotating and manually analyzing texts and corpora: this can be used to test, refine and develop theories about the language of literature. Theories are forced to demonstrate textual evidence and to account for all textual phenomena. Frequencies and relative frequencies can be calculated. Requirements: lots of time, money and expertise!
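Once a text is annotated, raw counts of each category can be normalised into relative frequencies. The tag format below (`<DS>...</DS>` for direct speech, `<IS>...</IS>` for indirect speech) is a hypothetical inline scheme chosen for illustration, not the markup of any particular project, and the sketch assumes tags are not nested.

```python
import re
from collections import Counter

# Hypothetical, non-nested inline tags such as <DS>...</DS>.
TAG = re.compile(r"<(\w+)>(.*?)</\1>", re.S)

def category_frequencies(annotated, per=1000):
    """Map each tag to (raw count, occurrences per `per` words of running text)."""
    plain = TAG.sub(lambda m: m.group(2), annotated)  # strip tags, keep their text
    n_words = len(re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", plain))
    raw = Counter(m.group(1) for m in TAG.finditer(annotated))
    return {cat: (n, per * n / n_words) for cat, n in raw.items()}

sample = ("She said <DS>'Come here.'</DS> He asked <IS>whether she was angry</IS>. "
          "She replied <DS>'No.'</DS>")
print(category_frequencies(sample))
```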
  • 19. Building and annotating: The Speech, Thought and Writing Presentation Corpus. Elena Semino, Mick Short, Martin Wynne et al., Lancaster University. Identifying, categorising and analysing the functions of all occurrences of reported speech, thought and writing (e.g. direct speech, indirect speech, free indirect speech, direct thought, etc.) in a small corpus of fictional and non-fictional texts (and later also speech)
  • 20. Building and annotating (2): VICI, Free University of Amsterdam, Gerard Steen et al. Identifying and categorising metaphorical expressions in a subset of the BNC; analysing usage and distributions across text types and modes
  • 21. Further types of analysis: ● More levels of annotation: parsing, semantic tagging, etc. ● Stylometry ● Text mining ● Multilingual, parallel, comparable and translation corpora ● Socio-cultural and historical investigations in literary corpora. But please note that you don't need annotation for many useful techniques! Requirements: various!
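Stylometry is listed here without detail. As one illustration, the sketch below implements a simplified form of Burrows's Delta, a widely used authorship-attribution measure that the slides do not discuss: it compares z-scored frequencies of the most frequent words. The toy texts are invented, and the function names are hypothetical; real studies use far longer texts, more candidates and typically hundreds of frequent words.

```python
import re
import statistics
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def rel_freqs(tokens):
    counts = Counter(tokens)
    return {w: c / len(tokens) for w, c in counts.items()}

def burrows_delta(candidates, disputed, n_words=30):
    """Delta between a disputed text and each candidate's text (lower = more similar)."""
    tokenized = {name: tokenize(text) for name, text in candidates.items()}
    profiles = {name: rel_freqs(toks) for name, toks in tokenized.items()}
    # The most frequent words across all candidate texts form the feature set.
    pooled = Counter(w for toks in tokenized.values() for w in toks)
    mfw = [w for w, _ in pooled.most_common(n_words)]
    target = rel_freqs(tokenize(disputed))
    deltas = {name: 0.0 for name in candidates}
    used = 0
    for w in mfw:
        values = [profiles[name].get(w, 0.0) for name in candidates]
        mean, sd = statistics.mean(values), statistics.pstdev(values)
        if sd == 0:
            continue  # a word used at identical rates cannot discriminate
        used += 1
        z_target = (target.get(w, 0.0) - mean) / sd
        for name in candidates:
            z = (profiles[name].get(w, 0.0) - mean) / sd
            deltas[name] += abs(z_target - z)
    # Delta is the mean absolute z-score difference over the words used.
    return {name: total / used for name, total in deltas.items()} if used else deltas

authors = {"A": "the cat sat on the mat the cat", "B": "a dog ran to a park a dog ran"}
print(burrows_delta(authors, "the cat sat the mat"))
```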
  • 22. A new type of Shakespeare dictionary: Jonathan Culpeper. A proposal for a dictionary of the language of Shakespeare, involving better integration of linguistic description, frequency information and non-linguistic information. − How often does X occur? − How often do the particular meanings of X occur? − What kind of words does X tend to co-occur with? − How often do the particular ‘grammatical categories’ of X occur? − What kinds of register does X co-occur with? − What kinds of speaker/addressee does X co-occur with? − Is X part of a particular lexical field (semantic category), and how does that field distribute across the plays? − How can the above help differentiate word X from word Y? − Etc. (1) a particular theoretical approach to meanings, (2) a particular methodology... enter Corpus Linguistics
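The question “What kind of words does X tend to co-occur with?” is usually answered with collocation statistics. A minimal sketch using a fixed window and a mutual information (MI) score, where expected co-occurrence is estimated as f(node) × f(collocate) × window span / N; this is one common formulation, and corpus tools differ in the details. The function names and toy line are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def collocates(tokens, node, window=3, min_freq=2):
    """Words within `window` tokens of `node`, ranked by mutual information."""
    freq = Counter(tokens)
    n = len(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            for j in range(max(0, i - window), min(n, i + window + 1)):
                if j != i:
                    observed[tokens[j]] += 1
    span = 2 * window
    scores = []
    for word, o in observed.items():
        if o >= min_freq:  # MI overrates rare pairs, so require a minimum count
            expected = freq[node] * freq[word] * span / n
            scores.append((word, math.log2(o / expected)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

text = "O my good lord, my lord, good my lord and my good king"
for word, mi in collocates(tokenize(text), "lord", window=2):
    print(f"{word:<8} {mi:.2f}")
```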
  • 23. Using large-scale literary corpora: ● For example, Matthew Jockers, Sarah Allison and others at Stanford University are using large collections of literary texts from commercial providers, applying corpus linguistic and data mining techniques to literary research questions, e.g. Joe Shapiro comparing the quantity of narrative vs. descriptive passages in 19th-century US literature. ● There is perhaps particular potential for historical literary and linguistic studies.
  • 24. Basic methods: summary. 1. Examine the norms in a general reference corpus 2. Perform text analysis on an electronic literary text 3. Make internal comparisons in a literary text 4. Analyse a literary corpus 5. Make internal comparisons in a literary corpus 6. Compare a text to a reference corpus 7. Compare a literary corpus to a non-literary corpus 8. Compare different literary corpora with each other 9. Build and annotate corpora 10. Others!
  • 25. Methods: conclusion. It is becoming increasingly possible to test claims about the language of literature empirically, to search for and provide evidence from texts, and to establish the norms of literary and non-literary style. Stylistics typically makes use of a toolkit of linguistic techniques, methods and resources. Corpus stylistics will become a powerful addition to this toolkit in the future.
  • 26. Resources for Corpus Stylistics. What do we need? ● Reliable electronic editions of literary texts ● Relevant reference corpora ● Analysis tools ● Interoperability ● Shared access ● Sustainability ● Methodology ● Expertise
  • 27. Research Infrastructure: the vision is for a set of relevant texts, corpora and tools, hosted in various locations around the world and available online from the user's desktop via a single sign-on, with all the resources and tools working together using high-speed connections and high-performance computing, plus tools for showing, sharing and collaborating in a virtual workspace. CLARIN is working to build this infrastructure for the use of language resources and technologies across the humanities and social sciences.
  • 28. Links: ● Oxford Text Archive (OTA) http://www.ota.ox.ac.uk/ ● PALA Corpus Stylistics Special Interest Group http://www.pala.ac.uk/sigs/corpus-style/ ● Corpus-style mailing list http://www.jiscmail.ac.uk/lists/corpus-style.html ● Speech, Thought and Writing Presentation Project http://bowland-files.lancs.ac.uk/stwp/ ● British National Corpus http://www.natcorp.ox.ac.uk/ ● Brigham Young University Corpora from Mark Davies http://corpus.byu.edu/