Presentation of the eMargin collaborative annotation tool given at the Higher Education Academy #tagginganna workshop at the University of Leicester, 5 July 2012
1. A collaborative textual annotation tool
Andrew Kehoe & Matt Gee
Research & Development
Unit for English Studies
emargin.bcu.ac.uk
2. Background
• Corpus Linguistics: developing software to build and
analyse large text collections: crawling, indexing,
annotation, search.
• Our own large-scale search engine for linguistic study.
• 10bn words of web text (part-of-speech tagged).
• Includes collections of news and blogs.
• Lets users extract examples of words/phrases in
context, monitor change across time, etc.
www.webcorp.org.uk
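Extracting examples of a word in context is usually done with a keyword-in-context (KWIC) concordance. The following is a minimal sketch of that idea only, not WebCorp's implementation; the function name `kwic` and the `width` parameter are assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of `keyword`."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Up to `width` characters of context on each side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

sample = ("The doctor seemed especially troubled by the fact of the robbery "
          "having been unexpected, and attempted in the night-time.")
for line in kwic(sample, "the", width=20):
    print(line)
```

A real concordancer would run over an indexed corpus rather than a single string, but the aligned output format is the same.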
3. New Audiences
• Bringing Corpus Linguistic techniques to new
audiences:
i. School (A-Level) English students
ii. Literary colleagues (teachers/researchers/critics)
• A move toward literary texts and Corpus
Stylistic approaches
4. New Corpora
• Literary collections, including:
– Novels of Charles Dickens
– Works of Thomas Carlyle
– Works of James Joyce
– Works of Samuel Beckett
– Poems of Percy Bysshe Shelley
– Restoration Drama
– Science Fiction
• Downloaded and processed whole of Project
Gutenberg (23,484 texts; 1.6 billion words)
5. Colleagues’ Own Examples
The doctor seemed especially troubled by the fact of the robbery having
been unexpected, and attempted in the night-time; as if it were the
established custom of gentlemen in the housebreaking way to transact
business at noon, and to make an appointment, by post, a day or two
previous.
(Oliver Twist)
But there was no hitch in the conversation nevertheless; for one gentleman,
who travelled in the perfumery line, exhibited an interesting nick-nack, in
the way of a remarkable cake of shaving soap which he had lately met with
in Germany;
(Martin Chuzzlewit)
7. Testing Intuitions
“Dickens is known for a rich range of writing styles-
indignant, ironical, melodramatic, and sentimental,
all of which appear in David Copperfield. To set the
nostalgic tone for this novel, he also uses certain
words like "little" and "old" more than usual, so
his language seems especially sentimental.”
(Barron’s Book Notes: David Copperfield, 1985, p.32)
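A claim that an author uses a word "more than usual" can be checked by comparing normalised frequencies in the target text and in a reference corpus. The sketch below assumes a simple regular-expression tokeniser and uses two invented toy strings in place of real corpora; `tokenize` and `relative_freq` are hypothetical helper names.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def relative_freq(word, tokens, per=1_000_000):
    """Occurrences of `word` per `per` tokens (0.0 for an empty token list)."""
    if not tokens:
        return 0.0
    return tokens.count(word.lower()) / len(tokens) * per

# Toy data only: stand-ins for a target novel and a reference corpus.
target = tokenize("My little old aunt kept a little old house.")
reference = tokenize("The house was large and the garden was old.")

for word in ("little", "old"):
    print(word,
          round(relative_freq(word, target)),
          round(relative_freq(word, reference)))
```

Normalising per million words makes texts of different lengths comparable; deciding whether a difference is meaningful would need a significance measure on top of this.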
10. Limitations
• Literary scholars saw benefits of corpus linguistic
techniques but were concerned about straying too far
from the text.
• Literary language is highly creative/variable.
• Corpus Linguistic techniques work best with exact
repetitions, not so good at finding paraphrases in
a fully automated way.
• Difficult to pick up themes/motifs without human
input.
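The point about exact repetition can be illustrated with a repeated n-gram count. This is a minimal sketch under stated assumptions: the sentence is invented toy data, and `repeated_ngrams` is a hypothetical helper, not part of any tool named in these slides.

```python
from collections import Counter

def repeated_ngrams(tokens, n=3, min_count=2):
    """Return every n-gram that occurs at least `min_count` times."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {gram: count for gram, count in grams.items() if count >= min_count}

# Two clauses that paraphrase each other without repeating exactly.
tokens = ("he put his hands in his pockets and later "
          "he thrust his hands into his pockets").split()

print(repeated_ngrams(tokens, n=2))  # short exact repeats are found
print(repeated_ngrams(tokens, n=3))  # the paraphrase is missed at this length
```

The two clauses describe the same gesture, but only their shared two-word sequences are detected; recognising them as the same motif still needs a human reader.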
11. “corpus stylistics can make an important
contribution to the investigation of the interplay
between conventional, idiosyncratic and
creative patterns of language use. Corpus
stylistics also highlights that intuition and
automatic processes should work together”
(Mahlberg, 2007:224)
13. Literary Study
How do you study a literary text?
‘Close Reading’: detailed study of short text extracts
down to individual word level.
14. An Established Tradition
• Can be traced back to 11th Century.
Martin Luther:
Lectures on Romans
(1515)
Glossae: student’s
notes in the margins
Image from:
Cummings, B. (2002)
The Literary Culture of
the Reformation
(Oxford: OUP).
15. • (re-)read the text
• underline important words
• make notes in margin
• colour-code
• draw out themes/motifs
• Text quickly becomes cluttered with underlining/
notes on each re-reading
• Annotations tied to printed copy of text
• Difficult to share / combine in class
• Annotations not archivable / searchable
16. Increasing emphasis on e-texts but surprising lack of software to
support close reading.
• Difficult to annotate
• Difficult to share annotations
• Not fine-grained enough for academic study
17. Limitations of Traditional Model
• ‘Book Lovers Fear Dim
Future for Notes in the
Margins’, New York
Times, Feb 20 2011:
–writing comments
alongside passages…is a
rich literary pastime,
sometimes regarded as a
tool of literary
archaeology, …but it has
an uncertain fate in a
digitalized world
18. Our Solution
• Web-based collaborative annotation system operating
down to word level.
• Initial prototype late-2007
allowing basic highlighting/
commenting.
• Classroom trials at BCU
and Leicester.
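One way to picture annotation "down to word level" is to store each highlight or comment against a range of word positions, so that several users can annotate the same words. The sketch below is purely illustrative and is not eMargin's actual data model; the `Annotation` class, its fields and `annotations_at` are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """Hypothetical record: one user's highlight/comment on a word range."""
    author: str
    start: int                 # index of the first annotated word (inclusive)
    end: int                   # index of the last annotated word (inclusive)
    colour: str = "yellow"
    comment: str = ""
    tags: list[str] = field(default_factory=list)

def annotations_at(annotations, word_index):
    """Return all annotations whose range covers the given word position."""
    return [a for a in annotations if a.start <= word_index <= a.end]

words = ("as if it were the established custom of gentlemen "
         "in the housebreaking way").split()

notes = [
    Annotation("student_a", 5, 6, comment="Mock-formal register"),
    Annotation("student_b", 5, 5, colour="green", comment="Irony?"),
    Annotation("student_b", 11, 11, tags=["euphemism"]),
]

for a in annotations_at(notes, 5):
    print(a.author, words[a.start:a.end + 1], a.comment)
```

Anchoring to word positions rather than to a printed page is what would make such annotations shareable and searchable.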
19. Pilot Study
• Structured feedback collected from 25 Leicester students
across 3 modules (2 BA, 1 MA).
– 96% found word-level commenting useful.
– 88% found highlighting useful.
– 92% agreed that “reading others’ comments helped me
formulate my own ideas”.
– 96% found prototype ‘easy’ to use.
• Pilot study suggested which features were of most use.
• JISC Learning & Teaching Innovation grant
(June 2011–May 2012) to build fully-functioning,
open-source system.
35. Case Study: Student Projects
• Individual research projects on 3rd year BA Narrative
Analysis module at BCU.
• Making connections between literary and linguistic
study by examining narrative theories.
• Example: April’s study of newspaper narratives
– 10 articles each from The Sun and The Guardian
– Analysed using 3 narrative models:
Labov (1972), White (1997), Hoey (2001).
– In eMargin: used a different colour for each model and
tags to indicate the different stages of the model.
– Shows that eMargin can be used individually as well as
collaboratively.
36. Future Plans
• Separate layers of annotation
• Retain text layout and formatting
• Import and Export
37. Future Plans
• Integrate linguistic analysis features
– Corpora
– Tools
• Concordancing
• Wordlists
• Keywords
• Collocation
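Of the analysis features listed, collocation is perhaps the least self-explanatory: it asks which words tend to occur near a chosen node word. The sketch below counts raw co-occurrences within a fixed window; it is an assumption-laden illustration (the `collocates` function and the window size are hypothetical), and real tools normally rank collocates with an association measure rather than raw counts.

```python
from collections import Counter

def collocates(tokens, node, window=4):
    """Count words occurring within `window` tokens either side of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            counts.update(tokens[max(0, i - window):i])     # left context
            counts.update(tokens[i + 1:i + 1 + window])     # right context
    return counts

# Toy data only.
tokens = "the old man and the old dog".split()
print(collocates(tokens, "old", window=1).most_common())
```

A wordlist is the same idea with no node word, simply `Counter(tokens)`, and a keyword list compares such counts between two corpora.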
38. Phase 2: 2012-13
• JISC Embedding Benefits funds for integration
with Virtual Learning Environments (VLEs) using
IMS Learning Tools Interoperability specification:
– Single sign-in for seamless transition from VLE to eMargin
– Easier group management - import class lists from VLE
– Compatible with all major VLEs (Moodle, Blackboard
Learn, WebCT, etc.)
– Explore potential of eMargin as an e-assessment tool
39. Beyond English
• English Literature in first instance but transferable to
any text-based discipline: Law, Social Sciences,
Theology, Languages (and potential beyond text…)
• Trialled at Birmingham School of Acting
• Collaborative research/editing tool
• Beyond HE: United World College of SE Asia
• Working to increase uptake across disciplines
emargin.bcu.ac.uk