ChartEx: Discovering spatial descriptions and
relationships in medieval charters
Introduction
www.chartex.org
@ChartExProject
Digging into Data Challenge
“As the world becomes increasingly digital, new
techniques will be needed to search, analyze, and
understand these everyday materials. Digging into Data
challenges the research community to help create the
new research infrastructure for 21st century scholarship”
UK: AHRC, ESRC, JISC
US: NEH, IMLS, NSF
CAN: SSHRC
Netherlands: Organisation for Scientific Research
www.diggingintodata.org
ChartEx Project Partners
University of York: History and Human Computer
Interaction
University of Brighton: Natural Language Processing
University of Leiden: Data Mining
University of Washington: History, Web Services
University of Toronto: History and Digital Archives
Columbia University: History and Digital Libraries
Data Repositories: The National Archives (UK), Borthwick,
DEEDS Project U of Toronto, Columbia Digital Humanities
Advisory Panel: various cognate projects
ChartEx Team
History: Sarah Rees Jones, Adam Kosto, Michael Gervers, Robin Sutherland-Harris, Stefania Perring, Jon Crump
NLP: Roger Evans, Lynne Cahill
Data Mining: Arno Knobbe, Marvin Meeng
HCI: Helen Petrie, Christopher Power, David Swallow
Digital Charter Collections
Charters of the Vicars Choral of York Minster. City of York
and its suburbs to 1546. Latin charters and English
translations.
The National Archives (UK) Ward 2. Abridged English
translations
The Borthwick Institute, Yarburgh Muniments. Abridged
English translations
DEEDS, University of Toronto. Latin charters of English
provenance.
Cluny. Latin charters of French provenance (CBMA)
York
corner of Stonegate and Petergate
opposite Minster Gates
detail of OS 1852
Urban Historic Topography
A charter from the English edition of the
Vicars Choral collection
408. Grant by Thomas son of Josce goldsmith and citizen of York to
his younger son Jeremy of half his land lying in length from
Petergate at the churchyard of St. Peter to houses of the prebend
of Ampleford and in breadth from Steyngate to land which mag.
Simon de Evesham inhabited; paying Thomas and his heirs 1d. or
[a pair of] white gloves worth 1 d. at Christmas. Warranty. Seal.
Witnesses: Geoffrey Gunwar, William de
Gerford[b]y,' chaplains, Robert de Farnham,
Robert le Spicer, John le plastrer, Walter de
Alna goldsmith, Nicholas Page, Thomas
talliator, Hugh le bedel, John de Glouc',
clerks, and others.
January 1252
Chronological Sequencing and
Spatial Sequencing
408. Grant by Thomas son of Josce goldsmith and
citizen of York to his younger son Jeremy of half his
land lying in length from Petergate at the churchyard
of St. Peter to houses of the prebend of Ampleford
and in breadth from Steyngate to land which mag.
Simon de Evesham inhabited
409. Grant by Mariot widow of Thomas son of Josce
goldsmith of York and by Jeremy (Jeremias) son of
Thomas and Mariot to mag. Simon de Evesham
canon of York of land with buildings in Steyngate,
granted to them by Thomas, lying in length between
Petergate and land of the prebend of Ampleford, and
in breadth between Steyngate and land once of
Geoffrey de Norwyz precentor of York
Conceptual plan of logical
relationships
ChartEx Architecture
Charter documents
Analysed individual documents
Natural language processing
Data mining
Analysed integrated documents
ChartEx workbench
People, places and events in
charters: exploring the language of
charters within ChartEx
Robin Sutherland-Harris
University of Toronto
Roger Evans
NLTG, University of Brighton
15 November 2013
Grant from Walter de Soke to Richard de Forde
Somerset Heritage Centre, DDWHb/244
Final concord made in the Court of Common Pleas at
Westminster before James Dyer, Thomas Meade, Francis
Wyndam, William Peryam, justices, in dispute between
Richard Broke and George Armond, plaintiffs, and Robert
Jenner, defendant, over lands (specified) in Fordham, Essex
Sample document summary
Summary provided by the National Archives, Ward2_55A_188_30
Fig. 1 The phases of the Kanga Methodology [36]. The white boxes indicate the phases
performed by domain experts. The formal structuring is done by domain experts using Rabbit, while the translation to OWL is ...
Ronald Denaux, Catherine Dolbear, Glen Hart, Vania Dimitrova, Anthony G. Cohn.
Supporting domain experts to construct conceptual ontologies: A holistic approach.
Web Semantics: Science, Services and Agents on the World Wide Web, Volume 9, Issue 2, 2011, 113-127.
http://dx.doi.org/10.1016/j.websem.2011.02.001
Kanga Methodology
BRAT rapid annotation tool
Pontus Stenetorp, Sampo Pyysalo, Goran Topić, Tomoko Ohta, Sophia Ananiadou and Jun'ichi Tsujii
(2012). brat: a Web-based Tool for NLP-Assisted Text Annotation. In Proceedings of the Demonstrations
Session at EACL 2012. (http://brat.nlplab.org/)
The NLP Task
408. Grant by Thomas son of Josce goldsmith and
citizen of York to his younger son Jeremy of half
his land lying in length from Petergate at the
churchyard of St. Peter to houses of the prebend
of Ampleford and in breadth from Steyngate to
land which mag. Simon de Evesham inhabited;
paying Thomas and his heirs 1d. or [a pair of]
white gloves worth 1 d. at Christmas. Warranty.
Seal.
Get from a text like this … to some ‘semantic’ data like this
… which is not really very different from this (BRAT)
(http://brat.nlplab.org/)
… although actually to us, it looks more like this:
T1 Document 0 17 vicars-choral-387
T2 Transaction 18 23 Grant
T3 Person 27 34 William
T4 Institution 73 90 prebend of Masham
T5 Person 42 49 Richard
T6 Occupation 51 64 canon of York
…
R3 is_son_of Arg1:T3 Arg2:T5
R4 is_grantor_in Arg1:T3 Arg2:T2
R5 occupation_is Arg1:T3 Arg2:T6
(http://brat.nlplab.org/)
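The listing above can be read back into entities and relations with a few lines of Python. This is a minimal sketch, not the ChartEx code: it assumes whitespace-separated fields as shown on the slide, handles only text-bound (`T`) and binary relation (`R`) lines, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: str
    type: str
    start: int  # character offset where the span begins
    end: int    # character offset just past the span
    text: str


@dataclass(frozen=True)
class Relation:
    id: str
    type: str
    arg1: str  # id of the first entity
    arg2: str  # id of the second entity


def parse_standoff(lines):
    """Parse T (text-bound) and R (relation) lines; ignore everything else."""
    entities, relations = {}, []
    for line in lines:
        line = line.strip()
        if line.startswith("T"):
            ann_id, ann_type, start, end, text = line.split(None, 4)
            entities[ann_id] = Entity(ann_id, ann_type, int(start), int(end), text)
        elif line.startswith("R"):
            ann_id, rel_type, arg1, arg2 = line.split()
            relations.append(
                Relation(ann_id, rel_type, arg1.split(":", 1)[1], arg2.split(":", 1)[1])
            )
    return entities, relations


SAMPLE = """\
T1 Document 0 17 vicars-choral-387
T2 Transaction 18 23 Grant
T3 Person 27 34 William
T4 Institution 73 90 prebend of Masham
T5 Person 42 49 Richard
T6 Occupation 51 64 canon of York
R3 is_son_of Arg1:T3 Arg2:T5
R4 is_grantor_in Arg1:T3 Arg2:T2
R5 occupation_is Arg1:T3 Arg2:T6
"""

entities, relations = parse_standoff(SAMPLE.splitlines())
for rel in relations:
    # Print each relation with the surface text of its two arguments.
    print(entities[rel.arg1].text, rel.type, entities[rel.arg2].text)
```

Each relation refers to its arguments by entity id, so resolving `T3` and `T5` gives the readable triple "William is_son_of Richard".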
NLP development methodology
Step 1: find out what the ‘mark up’ should be (from historians)
Step 2: ask historians to mark up some documents manually
Step 3: use some of these examples to develop an NLP system that does ‘the same’
Step 4: use the rest to evaluate its performance (in progress)
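Steps 3 and 4 amount to a development/held-out split followed by a comparison of system output against the historians' annotations. The sketch below illustrates one common way to do this. The slides do not say which measure ChartEx uses, so precision, recall and F1 are an assumption here, as are the function names, the document ids beyond vicars-choral-387 and the example spans.

```python
import random


def split_documents(doc_ids, dev_fraction=0.5, seed=0):
    """Shuffle annotated documents and split them into development and held-out sets."""
    ids = sorted(doc_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * dev_fraction)
    return ids[:cut], ids[cut:]


def precision_recall_f1(gold, predicted):
    """Compare two sets of (type, start, end) annotations by exact match."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical document ids for illustration.
doc_ids = [f"vicars-choral-{n}" for n in range(387, 397)]
dev_docs, held_out_docs = split_documents(doc_ids)

# Manual annotation (gold) versus a hypothetical system output in which
# one span has been given the wrong type.
gold = {("Person", 27, 34), ("Person", 42, 49),
        ("Occupation", 51, 64), ("Institution", 73, 90)}
predicted = {("Person", 27, 34), ("Person", 42, 49),
             ("Occupation", 51, 64), ("Person", 73, 90)}

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```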
The manual annotation
Grant by Thomas son of Josce goldsmith and citizen of York to his younger son Jeremy of
half his land lying in length from Petergate at the churchyard of St. Peter to houses of the
prebend of Ampleford and in breadth from Steyngate to land which mag. Simon de
Evesham inhabited;
NLP – layered pattern matching
Word layer: basic information about individual words, collected ‘outside’ the main ChartEx
NLP system (using external tools – part of speech tagger, stemmer, Soundex coding etc.)
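Soundex coding is useful at the word layer because it gives variant spellings of a name the same key. The following is a minimal sketch of standard American Soundex, written for illustration. The slides do not say which Soundex tool or variant ChartEx uses.

```python
def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return (result + "000")[:4]


# Spelling variants of the same York street share one code.
for street in ("Stonegate", "Steyngate", "Staingate"):
    print(street, soundex(street))
```

All three spellings of the street that appear in these slides reduce to the same key, which is what makes a later matching step insensitive to the spelling variation.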
NLP – layered pattern matching
Token layer: identify semantic types and intrinsic properties (e.g. gender) of known
individual words (not all of these types feature in the final output). Some types are
inferred – e.g. unknown capitalised words are proper nouns.
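A lexicon lookup with a fallback rule is one simple way to picture the token layer. The sketch below is illustrative only: the lexicon entries, type names and the `tag_tokens` function are hypothetical, not taken from the ChartEx system.

```python
# Hypothetical lexicon of known words with a semantic type and, where
# relevant, an intrinsic property such as gender.
LEXICON = {
    "grant": {"type": "Transaction"},
    "son": {"type": "Kinship", "gender": "male"},
    "goldsmith": {"type": "Occupation"},
    "thomas": {"type": "Name", "gender": "male"},
}


def tag_tokens(words):
    """Attach a type to each word; infer ProperNoun for unknown capitalised words."""
    tagged = []
    for word in words:
        entry = LEXICON.get(word.lower())
        if entry is not None:
            tagged.append((word, dict(entry)))
        elif word[:1].isupper():
            tagged.append((word, {"type": "ProperNoun", "inferred": True}))
        else:
            tagged.append((word, {"type": "Unknown"}))
    return tagged


tagged = tag_tokens("Grant by Thomas son of Josce goldsmith".split())
for word, info in tagged:
    print(word, info)
```

Here "Josce" is not in the lexicon, so it is typed by the capitalisation rule, while "Thomas" is recognised directly and carries its gender.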
NLP – layered pattern matching
Lexical layer: identify simple lexical phrases – groups of tokens that act as individual units.
Promote other individual tokens to lexical items (with lexical types).
NLP – layered pattern matching
Syntax layer: build (local) syntactic structure to identify basic constituents of the sentence.
NLP – layered pattern matching
Phrasal layer: use part-of-speech tags to build lexical items into local syntactic/semantic
structures. These have lots of unbound arguments – like jigsaw pieces waiting to be
slotted together.
NLP – layered pattern matching
Phrasal layer: use part-of-speech tags to build lexical items into local syntactic/semantic
structures – and then ‘collapse’ all the equals relations.
NLP – layered pattern matching
Semantic layer: build semantic relationships by gluing the pieces together. Again, first we
use equals relations to capture the structure
NLP – layered pattern matching
Semantic layer: build semantic relationships – and then we collapse them down
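The "collapse" step can be pictured as merging every group of items linked by equals relations into one representative, then rewriting the remaining relations in terms of those representatives. The sketch below does this with a small union-find structure. It is an assumption about how such a step could work, not the ChartEx implementation, and the `?`-prefixed placeholder names are hypothetical stand-ins for unbound arguments.

```python
def collapse_equals(relations):
    """Merge nodes joined by 'equals' and rewrite the other relations accordingly."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return
        # Keep a concrete item as representative in preference to a placeholder.
        if root_a.startswith("?"):
            parent[root_a] = root_b
        else:
            parent[root_b] = root_a

    for rel, a, b in relations:
        if rel == "equals":
            union(a, b)

    collapsed = []
    for rel, a, b in relations:
        if rel != "equals":
            triple = (rel, find(a), find(b))
            if triple not in collapsed:
                collapsed.append(triple)
    return collapsed


relations = [
    ("is_son_of", "?child", "?parent"),      # local structure with unbound arguments
    ("is_grantor_in", "?grantor", "Grant"),
    ("equals", "?child", "Thomas"),          # gluing the pieces together
    ("equals", "?parent", "Josce"),
    ("equals", "?grantor", "Thomas"),
]
collapsed = collapse_equals(relations)
print(collapsed)
```

After collapsing, the placeholders disappear and only relations between concrete items remain.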
How did we do?
Grant by Thomas son of Josce goldsmith and citizen of York to his younger son Jeremy of
half his land lying in length from Petergate at the churchyard of St. Peter to houses of the
prebend of Ampleford and in breadth from Steyngate to land which mag. Simon de
Evesham inhabited;
Historians vs NLP
• NLP has to be told very clearly what it is looking for
• NLP has to make a lot of information explicit that people don’t notice
• NLP sees ambiguity where people do not
– Really struggles with coordination and attachment
• NLP is programmed by computer scientists (or sometimes statisticians), not historians
• NLP can do it, but it definitely needs help!
Reconstructing spatial relationships from
charters: a collaboration between
Data Mining and Historical Topography
Stefania Merlo Perring, Sarah Rees Jones, York
Arno Knobbe, Leiden
This presentation will:
• Introduce the process used by Data Mining to reproduce the methodology of historic topography
• Discuss the results and further potential
• Reflect on how Data Mining of charters and cartularies can add new meanings to Diplomatics
Charter Annotation
Fragmented Relational Information
Vicars Choral 408
Matching Relational Information
Vicars Choral 408
Vicars Choral 409
Statistics
[Bar chart, names: frequency (axis 0 to 300) of John, Thomas, Robert, William, Richard, Walter, Nicholas, Henry, George, Peter, Simon, Geoffrey, Roger, Edward, Margaret, Stephen, Edmund, Hugh, Adam]
[Bar chart, occupations: frequency (axis 0 to 60) of yeoman, yeomen, gent, gentleman, esq, esquire, esquires, clerk, clerks, goldsmith, mayor, chaplain, chaplains, widow, bailiffs, husbandman, knight, knt, kt]
Thomas son of Josce, goldsmith
• Statistics
p(Thomas) = 0.12 (common name)
p(Josce) = 0.0015 (uncommon name)
p(Goldsmith) = 0.04 (common profession)
• Dating
vc-408 1252-1253
vc-409 1253-1261
• Final confidence
conf (Thomas 408, Thomas 409) = 0.9993
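The slide does not give the formula behind the final confidence, so the sketch below does not try to reproduce 0.9993. It only illustrates why a rare patronymic is such strong evidence: assuming the three attributes are independent (an assumption), the chance that an unrelated person shares all of them is their product. The date check and its 40-year window are likewise hypothetical.

```python
from math import prod


def chance_agreement(frequencies):
    """Probability that an unrelated person shares every attribute, assuming independence."""
    return prod(frequencies)


def date_ranges_compatible(range_a, range_b, max_gap_years=40):
    """True if two (start, end) year ranges overlap or lie within max_gap_years of each other."""
    (a_start, a_end), (b_start, b_end) = range_a, range_b
    gap = max(a_start, b_start) - min(a_end, b_end)  # zero or negative when the ranges overlap
    return gap <= max_gap_years


# Frequencies taken from the slide above.
p_chance = chance_agreement([0.12, 0.0015, 0.04])  # Thomas, Josce, goldsmith
compatible = date_ranges_compatible((1252, 1253), (1253, 1261))  # vc-408, vc-409

print(f"chance of coincidental agreement: {p_chance:.2e}")
print(f"date ranges compatible: {compatible}")
```

The uncommon name Josce contributes by far the smallest factor, so it does most of the work in separating this Thomas from the many other men called Thomas.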
Ranking of candidate Person-Person matches
0.99999 Alexander/Alexander, glover, witness
0.99998 Erard/Erard, witness
0.99998 Bernard/Erard, canons, witness
0.99998 Maurice/Maurice, canons, witness
0.99997 Adam/Ada, fisherman, sibling Richard
0.99997 Gilbert/Gilbert, chaplains, witness
0.99995 Roger/Roger, saddler, witness
0.99995 Roger/Roger, saddler
0.99994 Serlo/Serlo of Staingate, chaplains
…
Thomas son of Josce matched
At the corner of Petergate and Stonegate
Connecting persons and sites between charters
Charter Networks
corner of Steyngate/Petergate
third property on Petergate
Charters of the Vicars Choral (selection of 124 charters), confidence 0.99
Cartularies as networks
Free from previous hierarchies
People, sites and events: all are nodes in a network
New indexes and catalogues
Conclusions/suggestions for debate
• People, institutions, things and technical devices represented as nodes (actors) in a network are freed from previous categories and hierarchies (Actor-Network-Theory, Callon 1991, Latour 1992, 2005)
• Nodes and patterns in a network are starting points for analysis and can generate new research questions and insights
• Networks connecting information from different fonds have potential for creating alternative virtual archives
• These can be created for a person or for a place (e.g. grouping charters concerning the same location)
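Grouping charters by a shared person or place follows directly once charters, people and sites are all nodes in one graph. The sketch below uses a plain adjacency mapping built from charters 408 and 409 as summarised earlier. The node labels and function names are hypothetical, and it assumes entity matching has already decided that a name in both charters refers to the same person or site.

```python
from collections import defaultdict

# Undirected edges between charter nodes and the people or sites they mention.
EDGES = [
    ("vc-408", "Thomas son of Josce"),
    ("vc-408", "Jeremy"),
    ("vc-408", "Simon de Evesham"),
    ("vc-408", "Petergate"),
    ("vc-408", "Steyngate"),
    ("vc-409", "Mariot"),
    ("vc-409", "Jeremy"),
    ("vc-409", "Simon de Evesham"),
    ("vc-409", "Petergate"),
    ("vc-409", "Steyngate"),
]


def build_graph(edges):
    """Build an adjacency mapping in which every kind of node is treated alike."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def charters_mentioning(graph, node):
    """Return the charters linked to a person or site: a small virtual archive."""
    return sorted(n for n in graph.get(node, ()) if n.startswith("vc-"))


graph = build_graph(EDGES)
print(charters_mentioning(graph, "Steyngate"))
print(charters_mentioning(graph, "Mariot"))
```

The same lookup serves for a person or a place, which reflects the point that nodes are not sorted into separate indexes in advance.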
The ChartEx Virtual Workbench
Helen Petrie, Chris Power, Dave Swallow
University of York
The ChartEx Virtual Workbench
A video presentation of the workbench is available on the ChartEx website at:
http://www.chartex.org/docs/Chartex-Montreal-12102013-VIDEO.mov
Thank you!

Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 

Dernier (20)

Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
Food processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsFood processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture hons
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 

Sarah Rees Jones (York) and Helen Petrie: 'Chartex overview and next steps'

  • 1. ChartEx: Discovering spatial descriptions and relationships in medieval charters Introduction www.chartex.org @ChartExProject
  • 2. Digging into Data Challenge “As the world becomes increasingly digital, new techniques will be needed to search, analyze, and understand these everyday materials. Digging into Data challenges the research community to help create the new research infrastructure for 21st century scholarship” UK: AHRC, ESRC, JISC US: NEH, IMLS, NSF CAN: SSHRC Netherlands: Organisation for Scientific Research www.diggingintodata.org
  • 3. ChartEx Project Partners University of York: History and Human Computer Interaction University of Brighton: Natural Language Processing University of Leiden: Data Mining University of Washington: History, Web Services University of Toronto: History and Digital Archives Columbia University: History and Digital Libraries Data Repositories: The National Archives (UK), Borthwick, DEEDS Project U of Toronto, Columbia Digital Humanities Advisory Panel: various cognate projects
  • 4. ChartEx Team History: Sarah Rees Jones, Adam Kosto, Michael Gervers, Robin Sutherland-Harris, Stefania Perring, Jon Crump NLP: Roger Evans, Lynne Cahill Data Mining: Arno Knobbe, Marvin Meeng HCI: Helen Petrie, Christopher Power, David Swallow
  • 5. Digital Charter Collections Charters of the Vicars Choral of York Minster. City of York and its suburbs to 1546. Latin charters and English translations. The National Archives (UK) Ward 2. Abridged English translations The Borthwick Institute, Yarburgh Muniments. Abridged English translations DEEDS, University of Toronto. Latin charters of English provenance. Cluny. Latin charters of French provenance (CBMA)
  • 6. York corner of Stonegate and Petergate opposite Minster Gates detail of OS 1852 Urban Historic Topography
  • 7. A charter from English Edition of Vicars Choral collection 408. Grant by Thomas son of Josce goldsmith and citizen of York to his younger son Jeremy of half his land lying in length from Petergate at the churchyard of St. Peter to houses of the prebend of Ampleford and in breadth from Steyngate to land which mag. Simon de Evesham inhabited; paying Thomas and his heirs 1d. or [a pair of] white gloves worth 1 d. at Christmas. Warranty. Seal. Witnesses: Geoffrey Gunwar, William de Gerford[b]y,' chaplains, Robert de Farnham, Robert le Spicer, John le plastrer, Walter de Alna goldsmith, Nicholas Page, Thomas talliator, Hugh le bedel, John de Glouc', clerks, and others. January 1252
  • 8. Chronological Sequencing and Spatial Sequencing 408. Grant by Thomas son of Josce goldsmith and citizen of York to his younger son Jeremy of half his land lying in length from Petergate at the churchyard of St. Peter to houses of the prebend of Ampleford and in breadth from Steyngate to land which mag. Simon de Evesham inhabited 409. Grant by Mariot widow of Thomas son of Josce goldsmith of York and by Jeremy (Jeremias) son of Thomas and Mariot to mag. Simon de Evesham canon of York of land with buildings in Steyngate, granted to them by Thomas, lying in length between Petergate and land of the prebend of Ampleford, and in breadth between Steyngate and land once of Geoffrey de Norwyz precentor of York
  • 9. N Conceptual plan of logical relationships
  • 11. People, places and events in charters: exploring the language of charters within ChartEx Robin Sutherland-Harris University of Toronto Roger Evans NLTG, University of Brighton 15 November 2013
  • 12. Grant from Walter de Soke to Richard de Forde Somerset Heritage Centre, DDWHb/244
  • 13. Final concord made in the Court of Common Pleas at Westminster before James Dyer, Thomas Meade, Francis Wyndam, William Peryam, justices, in dispute between Richard Broke and George Armond, plaintiffs, and Robert Jenner, defendant, over lands (specified) in Fordham, Essex Sample document summary Summary provided by the National Archives, Ward2_55A_188_30
  • 14. Fig. 1 The phases of the Kanga Methodology [36]. The white boxes indicate the phases performed by domain experts. The formal structuring is done by domain experts using Rabbit, while the translation to OWL is ... Ronald Denaux, Catherine Dolbear, Glen Hart, Vania Dimitrova, Anthony G. Cohn. Supporting domain experts to construct conceptual ontologies: A holistic approach. Web Semantics: Science, Services and Agents on the World Wide Web, Volume 9, Issue 2, 2011, 113-127. http://dx.doi.org/10.1016/j.websem.2011.02.001 Kanga Methodology
  • 15.
  • 16.
  • 17. BRAT rapid annotation tool Pontus Stenetorp, Sampo Pyysalo, Goran Topić, Tomoko Ohta, Sophia Ananiadou and Jun'ichi Tsujii (2012). brat: a Web-based Tool for NLP-Assisted Text Annotation. In Proceedings of the Demonstrations Session at EACL 2012. (http://brat.nlplab.org/)
  • 20. The NLP Task Get from a text like this … 408. Grant by Thomas son of Josce goldsmith and citizen of York to his younger son Jeremy of half his land lying in length from Petergate at the churchyard of St. Peter to houses of the prebend of Ampleford and in breadth from Steyngate to land which mag. Simon de Evesham inhabited; paying Thomas and his heirs 1d. or [a pair of] white gloves worth 1 d. at Christmas. Warranty. Seal. … to some ‘semantic’ data like this
  • 21. The NLP Task Get from a text like this … 408. Grant by Thomas son of Josce goldsmith and citizen of York to his younger son Jeremy of half his land lying in length from Petergate at the churchyard of St. Peter to houses of the prebend of Ampleford and in breadth from Steyngate to land which mag. Simon de Evesham inhabited; paying Thomas and his heirs 1d. or [a pair of] white gloves worth 1 d. at Christmas. Warranty. Seal. … which is not really very different from this (BRAT) (http://brat.nlplab.org/)
  • 22. The NLP Task Get from a text like this … 408. Grant by Thomas son of Josce goldsmith and citizen of York to his younger son Jeremy of half his land lying in length from Petergate at the churchyard of St. Peter to houses of the prebend of Ampleford and in breadth from Steyngate to land which mag. Simon de Evesham inhabited; paying Thomas and his heirs 1d. or [a pair of] white gloves worth 1 d. at Christmas. Warranty. Seal. … although actually to us, it looks more like this (http://brat.nlplab.org/)
      T1 Document 0 17 vicars-choral-387
      T2 Transaction 18 23 Grant
      T3 Person 27 34 William
      T4 Institution 73 90 prebend of Masham
      T5 Person 42 49 Richard
      T6 Occupation 51 64 canon of York
      …
      R3 is_son_of Arg1:T3 Arg2:T5
      R4 is_grantor_in Arg1:T3 Arg2:T2
      R5 occupation_is Arg1:T3 Arg2:T6
  • 23. NLP development methodology Step 1 find out what the ‘mark up’ should be (from historians) Step 2 ask historians to mark up some documents manually Step 3 use some of these examples to develop an NLP system that does ‘the same’ Step 4 use the rest to evaluate its performance (in progress)
  • 24. The manual annotation Grant by Thomas son of Josce goldsmith and citizen of York to his younger son Jeremy of half his land lying in length from Petergate at the churchyard of St. Peter to houses of the prebend of Ampleford and in breadth from Steyngate to land which mag. Simon de Evesham inhabited;
  • 25–28. NLP – layered pattern matching Word layer: basic information about individual words, collected ‘outside’ the main ChartEx NLP system (using external tools – part-of-speech tagger, stemmer, Soundex coding, etc.)
  • 29–30. NLP – layered pattern matching Token layer: identify semantic types and intrinsic properties (e.g. gender) of known individual words (not all of these types feature in the final output). Some types are inferred – e.g. unknown capitalised words are proper nouns.
  • 31–32. NLP – layered pattern matching Lexical layer: identify simple lexical phrases – groups of tokens that act as individual units. Promote other individual tokens to lexical items (with lexical types).
  • 33. NLP – layered pattern matching Syntax layer: build (local) syntactic structure to identify basic constituents of the sentence.
  • 34. NLP – layered pattern matching Phrasal layer: use part-of-speech tags to build lexical items into local syntactic/semantic structures. These have lots of unbound arguments – like jigsaw pieces waiting to be slotted together.
  • 35. NLP – layered pattern matching Phrasal layer: use part-of-speech tags to build lexical items into local syntactic/semantic structures – and then ‘collapse’ all the equals relations.
  • 36. NLP – layered pattern matching Semantic layer: build semantic relationships by gluing the pieces together. Again, first we use equals relations to capture the structure
  • 37. NLP – layered pattern matching Semantic layer: build semantic relationships – and then we collapse them down
  • 38. How did we do? Grant by Thomas son of Josce goldsmith and citizen of York to his younger son Jeremy of half his land lying in length from Petergate at the churchyard of St. Peter to houses of the prebend of Ampleford and in breadth from Steyngate to land which mag. Simon de Evesham inhabited;
  • 39. How did we do?
  • 40. Historians vs NLP
      • NLP has to be told very clearly what it is looking for
      • NLP has to make a lot of information explicit that people don’t notice
      • NLP sees ambiguity where people do not
          – Really struggles with coordination and attachment
      • NLP is programmed by computer scientists (or sometimes statisticians), not historians
      • NLP can do it, but it definitely needs help!
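The standoff annotation shown on slide 22 can be read back into structured records with very little code. The following is a minimal sketch, not the ChartEx implementation: it assumes the whitespace-separated form printed on the slide (real brat `.ann` files separate fields with tabs), and the names `Entity`, `Relation` and `parse_standoff` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    id: str      # e.g. "T3"
    type: str    # e.g. "Person"
    start: int   # character offset where the span begins
    end: int     # character offset where the span ends
    text: str    # the annotated text itself


@dataclass
class Relation:
    id: str      # e.g. "R3"
    type: str    # e.g. "is_son_of"
    arg1: str    # id of the first entity
    arg2: str    # id of the second entity


def parse_standoff(lines):
    """Parse slide-style standoff lines into entities and relations.

    Assumption: fields are separated by whitespace, as on the slide.
    """
    entities, relations = {}, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("T"):
            # The entity text may contain spaces, so split at most 4 times.
            ident, etype, start, end, text = line.split(None, 4)
            entities[ident] = Entity(ident, etype, int(start), int(end), text)
        elif line.startswith("R"):
            ident, rtype, a1, a2 = line.split()
            relations.append(
                Relation(ident, rtype, a1.split(":", 1)[1], a2.split(":", 1)[1])
            )
    return entities, relations


# The annotation lines quoted on slide 22.
SAMPLE = [
    "T1 Document 0 17 vicars-choral-387",
    "T2 Transaction 18 23 Grant",
    "T3 Person 27 34 William",
    "T4 Institution 73 90 prebend of Masham",
    "T5 Person 42 49 Richard",
    "T6 Occupation 51 64 canon of York",
    "R3 is_son_of Arg1:T3 Arg2:T5",
    "R4 is_grantor_in Arg1:T3 Arg2:T2",
    "R5 occupation_is Arg1:T3 Arg2:T6",
]

entities, relations = parse_standoff(SAMPLE)

# Resolve each relation to the text of the entities it links.
triples = [
    (entities[r.arg1].text, r.type, entities[r.arg2].text) for r in relations
]
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

Resolving the relation arguments to entity text yields subject/predicate/object triples, which is the kind of ‘semantic’ data the slides describe as the target of the NLP task.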
  • 41. Reconstructing spatial relationships from charters: a collaboration between Data Mining and Historical Topography Stefania Merlo Perring, Sarah Rees Jones, York Arno Knobbe, Leiden
  • 42. This presentation will:
      • Introduce the process used by Data Mining to reproduce the methodology of historic topography
      • Discuss the results and further potential
      • Reflect on how Data Mining of charters and cartularies can add new meanings to Diplomatics
  • 45. Matching Relational Information Vicars Choral 408 Vicars Choral 409
  • 47. Thomas son of Josce, goldsmith
      • Statistics: p(Thomas) = 0.12 (common name); p(Josce) = 0.0015 (uncommon name); p(Goldsmith) = 0.04 (common profession)
      • Dating: vc-408 1252-1253; vc-409 1253-1261
      • Final confidence: conf(Thomas 408, Thomas 409) = 0.9993
  • 48. Ranking of candidate Person-Person matches
      0.99999 Alexander/Alexander, glover, witness
      0.99998 Erard/Erard, witness
      0.99998 Bernard/Erard, canons, witness
      0.99998 Maurice/Maurice, canons, witness
      0.99997 Adam/Ada, fisherman, sibling Richard
      0.99997 Gilbert/Gilbert, chaplains, witness
      0.99995 Roger/Roger, saddler, witness
      0.99995 Roger/Roger, saddler
      0.99994 Serlo/Serlo of Staingate, chaplains
      …
  • 49. Thomas son of Josce matched
  • 50. At the corner of Petergate and Stonegate
  • 51. Connecting persons and sites between charters
  • 53. Charters of the Vicars Choral (selection of 124 charters) confidence 0.99 Cartularies as networks
  • 54. Free from previous hierarchies People, sites and events: all are nodes in a network
  • 55. New indexes and catalogues
  • 56. Conclusions/suggestions for debate
      • People, institutions, things and technical devices represented as nodes (actors) in a network are freed from previous categories and hierarchies (Actor-Network-Theory, Callon 1991, Latour 1992, 2005)
      • Nodes and patterns in a network are starting points for analysis and can generate new research questions and insights
      • Networks connecting information from different fonds have potential for creating alternative virtual archives
      • These can be created for a person or for a place (e.g. grouping charters concerning the same location)
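The person-matching statistics on slide 47 suggest that rarer shared attributes give stronger evidence that two mentions refer to the same person. The sketch below illustrates that intuition only. It assumes the attributes are independent and takes the confidence as one minus the probability that all of them coincide by chance. This is an assumption, not the ChartEx method: it does not reproduce the 0.9993 reported on the slide, which also takes the dating of the charters into account in a way the slides do not specify. The function names are hypothetical.

```python
from math import prod


def coincidence_probability(attribute_frequencies):
    """Probability that an unrelated person shares every attribute,
    assuming (for illustration only) that attributes are independent."""
    return prod(attribute_frequencies)


def match_confidence(attribute_frequencies):
    """Naive confidence that two mentions are the same person:
    one minus the chance-coincidence probability."""
    return 1.0 - coincidence_probability(attribute_frequencies)


# Frequencies quoted on slide 47 for "Thomas son of Josce, goldsmith".
shared = {"Thomas": 0.12, "Josce": 0.0015, "goldsmith": 0.04}

conf = match_confidence(shared.values())
print(f"naive confidence: {conf:.7f}")

# A common name on its own is weak evidence by the same measure.
name_only = match_confidence([shared["Thomas"]])
print(f"first name only:  {name_only:.2f}")
```

The uncommon patronymic does most of the work here: dropping `Josce` from the shared attributes raises the chance-coincidence probability by a factor of several hundred.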
  • 57. The ChartEx Virtual Workbench Helen Petrie, Chris Power, Dave Swallow University of York
  • 58.
  • 59. The ChartEx Virtual Workbench A video presentation of the workbench is available on the ChartEx website at: http://www.chartex.org/docs/Chartex-Montreal-12102013-VIDEO.mov
  • 61. ChartEx: Discovering spatial descriptions and relationships in medieval charters www.chartex.org @ChartExProject