This paper was presented at the COST Digital Humanities Conference: Reassembling the Republic of Letters. My brief was to give a general, introductory discussion of the history, limits and future of encoded text, particularly XML and particularly the Text Encoding Initiative.
Reborn Digital: coding text
1. Reborn Digital: coding text
Pip Willcox
Curator of Digital Special Collections
Bodleian Libraries, University of Oxford
@pipwillcox
Bodleian Libraries
UNIVERSITY OF OXFORD
COST Digital Humanities Conference: Reassembling the Republic of Letters
22–23 March 2015, University of Oxford
http://www.slideshare.net/PipWillcox/reborn-digital-coding-text
2. Republics of Letters
Quod feliciter vortat academici
Oxoniens bibliothecam hanc
vobis reipublicaeque
literatorum T.B.P.
Thomas Bodley has built
this library for you and for
the Republic of Letters.
May the gift turn out well.
Photo: Pip Willcox
3. The many forms of digital text
• Metadata — Early Modern Letters Online
• Image — Early English Books Online (EEBO)
• Optical Character Recognition (OCR) — Google Books
• Handwritten Character Recognition (HCR) — Transcribe Bentham
• Transcribed — EEBO Text Creation Partnership (EEBO-TCP)
• Encoded — Shakespeare Quartos Archive
• Edited — Digital Renaissance Editions
• Digital print — Oxford Scholarly Editions Online
4. The many forms of digital text
• Publisher-led editions
• Library-led editions
• Academic-led editions
• Social editions
5. The many forms of digital text
• Publisher-led editions
• Library-led editions
• Academic-led editions
• Social editions
• Licensed for reuse
• Freely available
• Subscription
• Private
6. The many forms of digital text
• Licensed for reuse
• Freely available
• Subscription
• Private
• Discoverable
• Citable
• Reusable
• Sustainable
7. The many forms of digital text
• Licensed for reuse
• Freely available
• Subscription
• Private
• Discoverable
• Citable
• Reusable
• Sustainable
• Provenance
• Conditions of re-use
• Editorial principles
8. Affordances of digital text
• Read it — dissemination, preservation
• Free text search
• Distant reading
• At scale
• Automated tagging, e.g. linguistic, geographic
Photo: Pip Willcox
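As a rough illustration of free text search and distant reading, the sketch below runs both over a three-line toy corpus. The corpus is hypothetical (the lines are the Hamlet lines quoted later in this deck), and the function names `tokenize`, `search` and `word_counts` are invented here for illustration; real projects work at a very different scale and with more careful tokenisation.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: three lines quoted later in this deck.
corpus = {
    "line1": "With all my imperfections on my head.",
    "line2": "Oh horrible, O horrible, most horrible,",
    "line3": "If thou hast nature in thee beare it not,",
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def search(corpus, term):
    """Free text search: ids of the documents that contain the term."""
    term = term.lower()
    return sorted(doc_id for doc_id, text in corpus.items()
                  if term in tokenize(text))


def word_counts(corpus):
    """Distant reading in miniature: count every word across the corpus."""
    counts = Counter()
    for text in corpus.values():
        counts.update(tokenize(text))
    return counts


counts = word_counts(corpus)
print(search(corpus, "horrible"))
print(counts.most_common(2))
```

Neither function needs any markup: this is what plain digital text affords before any encoding is added.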
9. Affordances of hand-encoded text
• First pick your Extensible Markup Language (XML):
• Resource Description Framework (RDF)
• Encoded Archival Description (EAD)
• Text Encoding Initiative (TEI)
• …anything to separate your data from your interface
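One way to picture separating data from interface is to keep a single encoded source and derive more than one presentation from it. The sketch below is a minimal example using Python's standard library; the two-line `<lg>` fragment and the function names are assumptions made for illustration, not part of any project described in this deck.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical encoded source: a line group holding two verse lines.
SOURCE = """<lg>
  <l>With all my imperfections on my head.</l>
  <l>If thou hast nature in thee beare it not,</l>
</lg>"""


def verse_lines(xml_text):
    """The data: the text of every <l> element, in document order."""
    root = ET.fromstring(xml_text)
    return ["".join(line.itertext()) for line in root.iter("l")]


def as_plain_text(xml_text):
    """Interface 1: plain text, one verse line per output line."""
    return "\n".join(verse_lines(xml_text))


def as_html(xml_text):
    """Interface 2: an HTML ordered list built from the same data."""
    items = "".join(f"<li>{escape(text)}</li>" for text in verse_lines(xml_text))
    return f"<ol>{items}</ol>"


print(as_plain_text(SOURCE))
print(as_html(SOURCE))
```

The source is never edited to change how it looks; only the rendering functions differ.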
11. Affordances of XML
• Machine-readable and human-readable(ish)
• Interoperable open standard (W3C)
• Extensible semantic markup
12. Affordances of XML
• Machine-readable and human-readable(ish)
• Interoperable open standard (W3C)
• Extensible semantic markup
• Not always the answer
• Not an end in itself: a research/publication tool
• Hierarchical structure
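The hierarchical structure is both an affordance and a constraint: elements must nest cleanly, so a parser can walk the tree, but markup that overlaps is rejected outright. The sketch below shows both sides with Python's standard XML parser; the two fragments and the helper names `outline` and `is_well_formed` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: one properly nested, one with overlapping tags.
nested = ("<sp><speaker>Ham.</speaker>"
          "<l>Oh horrible, O horrible, most horrible,</l></sp>")
overlapping = "<sp><l>Oh horrible,</sp> most horrible,</l>"


def is_well_formed(xml_text):
    """True if the parser accepts the text as a single XML tree."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def outline(xml_text):
    """(depth, tag) pairs in document order, showing the nesting."""
    result = []

    def walk(element, depth):
        result.append((depth, element.tag))
        for child in element:
            walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 0)
    return result


print(outline(nested))
print(is_well_formed(nested), is_well_formed(overlapping))
```

Features of a text that cross structural boundaries therefore need workarounds rather than simple nested tags.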
13. Affordances of the Text Encoding Initiative (TEI)
• An XML international standard
• A set of Guidelines
• For encoding historical text
• A community of practice:
• conference, mailing list, journal, wiki, SourceForge, toolchain
• Future-proof
14. Affordances of the Text Encoding Initiative (TEI)
• An XML international standard
• A set of Guidelines
• For encoding historical text
• A community of practice:
• conference, mailing list, journal, wiki, SourceForge, toolchain
• Future-proof(ish)
16. A case study: EEBO-TCP
• Early English Books Online Text Creation Partnership: books in the Short Title Catalogue
• Scale:
• c. 130,000 metadata records and image sets
• TCP Phase I: c. 25,000 digital texts
• TCP Phase II: c. 40,000 digital texts (and counting)
• Scope: searchable, readable, marked-up, digital, full texts
17. A case study: EEBO-TCP
• Early English Books Online Text Creation Partnership: books in the Short Title Catalogue
• Scale:
• c. 130,000 metadata records and image sets
• TCP Phase I: c. 25,000 digital texts: available!
• TCP Phase II: c. 40,000 digital texts (and counting)
• Scope: searchable, readable, marked-up, digital, full texts
18. A case study: EEBO-TCP
http://gateway.proquest.com/openurl?ctx_ver=Z39.88-2003&res_id=xri:eebo&rft_id=xri:eebo:image:98209
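The EEBO link above is an OpenURL: a fixed address followed by key=value parameters, which makes it straightforward to take apart by machine. The sketch below does so with Python's standard library. Reading the last segment of `rft_id` as an image-set identifier is an assumption based only on how the value looks, and `describe_openurl` is a name invented for this example.

```python
from urllib.parse import parse_qs, urlsplit

url = ("http://gateway.proquest.com/openurl?"
       "ctx_ver=Z39.88-2003&res_id=xri:eebo&rft_id=xri:eebo:image:98209")


def describe_openurl(url):
    """Split an OpenURL into its host, path and query parameters."""
    parts = urlsplit(url)
    # parse_qs returns a list per key; each key occurs once here.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"host": parts.netloc, "path": parts.path, **params}


info = describe_openurl(url)
# Assumption: the final colon-separated segment identifies the image set.
image_id = info["rft_id"].rsplit(":", 1)[-1]

print(info)
print(image_id)
```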
21. A case study: EEBO-TCP
• Some things we mark up:
• Textual divisions, with descriptions
• Opening material, e.g. arguments, salutes
• Closing material, e.g. signatures, dates
• Letters, lists and tables
• Speakers, speeches, stage directions, quotations
• Textual notes
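Once features like these are marked up, they can be counted and pulled out by name. The sketch below is a minimal example over an invented fragment: the element names follow TEI conventions (`<div type="…">`, `<opener>`, `<salute>`, `<closer>`, `<signed>`, `<dateline>`, `<stage>`, `<sp>`, `<speaker>`), but the content is hypothetical and whether EEBO-TCP files use exactly these elements is an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment, invented for illustration only.
SAMPLE = """<text>
  <body>
    <div type="letter">
      <opener><salute>Sir,</salute></opener>
      <p>A hypothetical paragraph of text.</p>
      <closer><signed>A. B.</signed> <dateline>London, 1683</dateline></closer>
    </div>
    <div type="scene">
      <stage>Enter Ghost.</stage>
      <sp><speaker>Ham.</speaker><l>Oh horrible, O horrible, most horrible,</l></sp>
    </div>
  </body>
</text>"""

root = ET.fromstring(SAMPLE)

# How often is each kind of feature marked up?
tag_counts = Counter(element.tag for element in root.iter())

# Textual divisions, with their descriptions.
div_types = [div.get("type") for div in root.iter("div")]

# Closing material: signatures only.
signatures = [element.text for element in root.iter("signed")]

print(tag_counts["div"], div_types, signatures)
```

A question such as "which texts contain letters?" then becomes a query over `div_types` rather than a guess based on wording.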
22. A case study: EEBO-TCP
• …and some things we don’t:
• Non-Roman alphabets
• Music
• Complex mathematical material
• Illegible characters
• Manuscript
• Damaged or missing material
23. EEBO-TCP: a buildable resource
Distant reading — Duhaime and Zimmer, DocuScope, AdornMorph
Close reading — Verse Miscellanies Online, Digital Anthology of Early English Drama, Forms Online Renaissance to Modern
24. Early English Print in the HathiTrust
Kevin Page & Pip Willcox
Anon. A true and perfect description of the strange and wonderful she-elephant, sent from the Indies, which arrived at London, August 1. 1683. With the true portraicture of that wonder in nature (London: 1683). Ashm. H 24 [42]. Image: Bodleian Libraries.
Coryate, Thomas, Thomas Coriate traueller for the English vvits: greeting From the court of the Great Mogul, resident at the towne of Asmere, in easterne India (London: 1616). Via EEBO: http://gateway.proquest.com/openurl?ctx_ver=Z39.88-2003&res_id=xri:eebo&rft_id=xri:eebo:image:9182
Anon, A full and true relation of the elephant that is brought over into England from the Indies, and landed at London, August 3d. 1675. Giving likewise a true account of the wonderful nature, understanding, breeding, taking and taming of elephants (London, 1675). Via EEBO: http://gateway.proquest.com/openurl?ctx_ver=Z39.88-2003&res_id=xri:eebo&rft_id=xri:eebo:image:184581
Terhi Nurmikko-Fuller
25. A case study: SQA
• The origins: pre-1642 quartos from JISC/NEH Transatlantic Digitization Collaboration Grant
http://quartos.org/
26. A case study: SQA
Bodleian, Arch. G d.41
28. A case study: SQA
Folger, STC 22279, copy 5
29. A case study: SQA
Folger, STC 22279, copy 5
<l>With all my imperfections on my head.</l>
<l><add place="margin-left" hand="#af" type="intervention"
resp="#fol">Ham</add>Oh horrible, O horrible, most horrible,</l>
<l>If thou hast nature in thee beare it not,</l>
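To show what this encoding makes possible, the sketch below parses the three lines and separates the handwritten addition from the printed text. It is a minimal example under stated assumptions: the attribute values are written with straight quotation marks, which an XML parser requires; the `<fragment>` wrapper is hypothetical, added only so the lines have a single root; and `printed_text` is a helper invented here.

```python
import xml.etree.ElementTree as ET

# The three lines from the slide, inside a hypothetical <fragment> wrapper.
SNIPPET = """<fragment>
<l>With all my imperfections on my head.</l>
<l><add place="margin-left" hand="#af" type="intervention"
resp="#fol">Ham</add>Oh horrible, O horrible, most horrible,</l>
<l>If thou hast nature in thee beare it not,</l>
</fragment>"""

root = ET.fromstring(SNIPPET)


def printed_text(line):
    """Text of a line as printed, leaving out anything inside <add>."""
    parts = [line.text or ""]
    for child in line:
        if child.tag != "add":
            parts.append("".join(child.itertext()))
        # Text following a child element belongs to the line itself.
        parts.append(child.tail or "")
    return "".join(parts).strip()


# Every addition, with where it was written and by which hand.
additions = [
    {"text": "".join(add.itertext()),
     "place": add.get("place"),
     "hand": add.get("hand")}
    for add in root.iter("add")
]
printed = [printed_text(line) for line in root.iter("l")]

print(additions)
print(printed[1])
```

The same file can therefore be read as the printed quarto or as this particular annotated copy, depending on which elements the reader chooses to include.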
30. A case study: SQA
Folger, STC 22279, copy 5
<l>With all my imperfections on my head.</l>
<l><add place="margin-left" hand="#af" type="intervention"
resp="#fol">Ham</add>Oh horrible, O horrible, most horrible,</l>
<l>If thou hast nature in thee beare it not,</l>
<delSpan> surrounding the original <l>
<anchor> (for the <delSpan>)
<addSpan>
closing </sp> (speech)
opening <sp>
opening <speaker> with its associated attributes
the line, in its entirety
second closing </sp>
<anchor> (for the <addSpan>)
opening <sp> (to reopen the printed speech)
opening <speaker> (to repeat the original speaker)
31. A case study: SQA
Nobody has ever answered “yes” to
“Let me show you my XML”
...except a computer
Heather Froehlich
David De Roure
http://firstfolio.bodleian.ox.ac.uk/
32. Limitations of the Text Encoding Initiative (TEI)
• Time and funding
• Expert editors
• A learning curve
• “An extended subset”
• An XML international standard
• A set of Guidelines
• For encoding historical text
• A community of practice:
• conference, mailing list, journal, wiki, SourceForge, toolchain
• Future-proof(ish)
33. The Future, or, An Invitation to Hubris
David De Roure
http://www.slideshare.net/davidderoure/future-of-scholarly-communications
34. The Future, or, An Invitation to Hubris
• More connections across: texts, programs, communities…
• More integration between: semantic interoperability…
• More tools, more animation
• Co-constitution
• Heterogeneous actors, human and machine
• Performative and social
Susan Halford et al.:
http://eprints.soton.ac.uk/271033/
35. Find out more
• Teach Yourself TEI:
http://www.tei-c.org/Support/Learn/tutorials.xml
• TEI Massive Open Online Course (MOOC) is coming
• TEI Conference, 28–31 October 2015, Lyon, France:
Text Encoding Initiative: connect, animate, innovate