Agnes Thomas, Francesco Mambrini & Matteo Romanello (DAI, Berlin)
'Insights in the World of Thucydides: The Hellespont Project as a research environment for Digital History'.
Digital Classicist London & Institute of Classical Studies seminar 2013, Friday August 9th.
The Hellespont Project (German Archaeological Institute and Tufts University) aims to integrate two of the largest online collections for the study of Antiquity, the Perseus Digital Library and the Arachne archaeological database, in a dynamic digital research environment. Historians will have access to heterogeneous materials and resources, such as ancient texts, archaeological evidence, historical background, and modern scholarly literature, while the documents related to each historical event identified in the textual evidence will be interconnected through the CIDOC-CRM model.
Hellespont as a case study focuses on a limited historical period: the 50-year period in the history of Athens between the end of the Persian Wars (479 BCE) and the outbreak of the Peloponnesian War (431 BCE). It follows the narrative of the most important written source, chapters 1.89-118 of the Histories of Thucydides, who was a contemporary of some of the events. One of the points of departure for the project is the annotation of Thucydides' text with multiple layers of linguistic information. Our goal is to create a "digital sourcebook" rich in machine-actionable information, where historians can find references to sources as well as tools that support linguistic analysis of the original texts.
Documents are bridged using the event-based CIDOC-CRM. We work with two different concepts of event. In the CIDOC ontology, events encompass all changes of state in cultural systems and are identified by reference to historical scholarship. In Ancient History, where event reconstruction rests mostly on the interpretation of written sources, this definition is insufficient. We are therefore implementing a data-driven approach, based on the semantic and syntactic strategies by which language expresses change in the external world. We aim to identify such strategies through a fine-grained semantic annotation of the ancient texts.
We will present the digitally analysed text of Thucydides, together with different kinds of additional information, in a single Virtual Research Environment (VRE). The interface, which is still being implemented, is based on the same idea as GapVis: a visual interface for reading texts that gives the user multiple views of the same passage. In the presentation we will show the most important parts of the different views available in the interface.
Digital Classicist London Seminars 2013 - Seminar 10 - Agnes Thomas et al.
1. Introduction
The GapVis Interface
Event annotation
Secondary Literature
Insights in the World of Thucydides
The Hellespont Project as a research environment for Digital History
A. Thomas (a, b), F. Mambrini (b), M. Romanello (b, c)
(a) Universität zu Köln
(b) Deutsches Archäologisches Institut, Berlin
(c) King’s College, London
August 9, 2013
Thomas, Mambrini, Romanello The Hellespont Project
Outline
1 Introduction
2 The GapVis Interface
3 Event annotation
Manual event annotation
Linguistic annotation
4 Secondary Literature
The Hellespont Project
Integrating Arachne and Perseus
October 2010 - September 2013
http://arachne.uni-koeln.de/drupal/?q=de/node/231
Cooperating Institutions and Persons
German Archaeological Institute Berlin:
Ortwin Dally
Reinhard Förtsch
Francesco Mambrini
Matteo Romanello
Wolfgang Schmidle
The Perseus Project:
Bridget Almas
Alison Babeu
Lisa Cerrato
Gregory Crane
Cologne Digital Archaeology Laboratory:
Carina Berning
Robert Kummer
Alexander Recht
Marcel Riedel
Karen Schwane
Agnes Thomas
GapVis for Hellespont
Named entities, linguistic information, event annotation, and
bibliography connected in one interface:
A case study on Thuc. 1.89-118
Different formats (TEI, CIDOC-CRM, AGDT, PML, ...)
User interface based on GapVis:
http://nrabinowitz.github.io/gapvis
Going through secondary literature
Event List
Oinophyta Event
Myronides as a general
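One way to picture how a manually annotated event ties actors, places and sources together is as a small set of CIDOC-CRM statements. The sketch below is illustrative only: every `ex:` identifier is hypothetical, the class and property labels follow the published CIDOC-CRM, and the data model actually used in Hellespont may differ.

```python
# Minimal sketch: an event as subject-predicate-object statements.
# All "ex:" identifiers are hypothetical; only the crm: names come
# from the CIDOC-CRM itself.
TRIPLES = [
    ("ex:oinophyta", "rdf:type", "crm:E7_Activity"),
    ("ex:oinophyta", "crm:P7_took_place_at", "ex:place_oinophyta"),
    ("ex:oinophyta", "crm:P14_carried_out_by", "ex:myronides"),
    ("ex:oinophyta", "crm:P11_had_participant", "ex:athenians"),
    ("ex:oinophyta", "crm:P11_had_participant", "ex:boeotians"),
    ("ex:oinophyta", "crm:P70i_is_documented_in", "ex:thuc_1.108"),
    ("ex:myronides", "rdf:type", "crm:E21_Person"),
    ("ex:place_oinophyta", "rdf:type", "crm:E53_Place"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def events_of(actor, triples=TRIPLES):
    """Return the events an actor carried out or took part in."""
    roles = {"crm:P14_carried_out_by", "crm:P11_had_participant"}
    return sorted({s for s, p, o in triples if p in roles and o == actor})


print(objects("ex:oinophyta", "crm:P11_had_participant"))
print(events_of("ex:myronides"))
```

Because the event is the hub of the statements, the same lookup serves both directions: from an event to its documents, and from a person such as Myronides to the events he is connected with.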
Natural language and events
Thuc. 1.102.2
μάλιστα δ᾿ αὐτοὺς ἐπεκαλέσαντο ὅτι τειχομαχεῖν ἐδόκουν δυνατοὶ
εἶναι, τοῖς δὲ πολιορκίας μακρᾶς καθεστηκυίας τούτου ἐνδεᾶ
ἐφαίνετο: βίᾳ γὰρ ἂν εἷλον τὸ χωρίον.
Translation
[The siege of Ithome proved tedious, and the Lacedaemonians
called in, among other allies, the Athenians . . . ]
[They] invited them especially because [they] considered [them]
particularly skilled in siege operations, while, since the siege for
them was dragging on, [their] own deficiency in that sort of
warfare was clear: for otherwise [they] would have taken the
place by force.
NLP Pipeline
Tokenization
POS-Tagging
Syntactic Parsing
Thematic Roles
Information Structure
Coreference Resolution
NLP Pipeline
NLP Process Ancient Greek?
Chunking
Lemmatization
POS-tagging
Syntactic parsing
Word-sense disambiguation
Co-reference resolution
Semantic role annotation
Using and Enhancing the available resources
The Ancient Greek Dependency Treebank
AGDT: treebank with word-by-word morphological and
dependency-based syntactical description
a step forward: semantic information
Analytical Level
“Surface” syntax
a-a-1999.01.0199_book1-chapter89_3 | AuxS
οἱ | Atr
γὰρ | AuxY
Ἀθηναῖοι | Sb
τρόπῳ | Adv
τοιῷδε | Atr
ἦλθον | Pred
ἐπὶ | AuxP
τὰ | Atr
πράγματα | Obj
ἐν
ηὐξήθησαν | Atr
. | AuxK
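The analytical layer can be read as a list of words, each pointing to its head and carrying a relation label. The following sketch loads a simplified fragment written in the style of the AGDT XML and pairs each predicate with its subject. The relation labels are those shown on the slide; the `head` values are assumptions made for illustration, and the relative clause is left out.

```python
import xml.etree.ElementTree as ET

# Simplified fragment in the style of the AGDT XML. The head values are
# assumed for illustration; the relative clause of the sentence is omitted.
SENTENCE_XML = """
<sentence id="example">
  <word id="1" form="οἱ" head="3" relation="Atr"/>
  <word id="2" form="γὰρ" head="6" relation="AuxY"/>
  <word id="3" form="Ἀθηναῖοι" head="6" relation="Sb"/>
  <word id="4" form="τρόπῳ" head="6" relation="Adv"/>
  <word id="5" form="τοιῷδε" head="4" relation="Atr"/>
  <word id="6" form="ἦλθον" head="0" relation="Pred"/>
  <word id="7" form="ἐπὶ" head="6" relation="AuxP"/>
  <word id="8" form="τὰ" head="9" relation="Atr"/>
  <word id="9" form="πράγματα" head="7" relation="Obj"/>
</sentence>
"""


def load_words(xml_text):
    """Parse the <word> elements into plain dictionaries."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "id": int(w.get("id")),
            "form": w.get("form"),
            "head": int(w.get("head")),
            "relation": w.get("relation"),
        }
        for w in root.iter("word")
    ]


def subjects_of_predicates(words):
    """Return (predicate form, subject form) pairs."""
    by_id = {w["id"]: w for w in words}
    pairs = []
    for w in words:
        head = by_id.get(w["head"])  # None for the root (head 0)
        if w["relation"] == "Sb" and head and head["relation"] == "Pred":
            pairs.append((head["form"], w["form"]))
    return pairs


words = load_words(SENTENCE_XML)
print(subjects_of_predicates(words))
```

A lookup of this kind only reaches the surface syntax; it says that the Athenians are the grammatical subject of the verb, not which role they play in the event.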
Valency
“The verbal node expresses a little drama. As a drama, it implies a process and, most of the time, actors and circumstances.”
L. Tesnière
Tectogrammatical annotation
t-t_tree-grc-s1-root | root
γάρ1 | PREC | atom
Ἀθηναῖος1 | ACT | n.denot
ἔρχομαι1 enunc | PRED | v
πρᾶγμα1 | DIR3 state | n.denot
ὅς1 | ACMP circ | n.denot
#PersPron | ACT | n.denot
αὐξάνω1 | RSTR | v
τρόπος1 | MANN | n.denot
τοιόσδε1 | RSTR | adj.pron.def.demon
From treebanks to event data-bases
What can you do with multi-layer trees?
“Meaningful” relations between NEs
[The Athenians]. . . brought
the territories of Boeotia and
Phocis under their obedience,
and withal razed the walls of
Tanagra and took of the
wealthiest of the Locrians of
Opus a hundred hostages,
and finished also at the same
time their long walls at home
(1.108.3)
Maps with semantically relevant relations
E.g. travels by sea
πλέω (sail)
Actor
DIR3 (to)
DIR1 (from)
The Athenians
Other NEs
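The map view described above depends on turning role-annotated verbs into origin and destination pairs. A minimal sketch of that step follows, assuming each annotated verb has already been flattened into a dictionary of functors. The records and place names are invented placeholders, not project data.

```python
SAIL = "πλέω"  # lemma of the verb "to sail"

# Invented placeholder records: one dictionary of functors per verb.
EVENTS = [
    {"lemma": SAIL, "roles": {"ACT": "Athenians", "DIR1": "Place A", "DIR3": "Place B"}},
    {"lemma": SAIL, "roles": {"ACT": "Athenians", "DIR3": "Place C"}},
    {"lemma": "other-verb", "roles": {"ACT": "Athenians"}},
]


def travel_edges(events, lemma=SAIL):
    """Return (actor, origin, destination) for each journey.

    A journey needs at least a destination (DIR3); the origin (DIR1)
    is often left implicit in the text and is then None.
    """
    edges = []
    for event in events:
        roles = event["roles"]
        if event["lemma"] == lemma and "DIR3" in roles:
            edges.append((roles.get("ACT"), roles.get("DIR1"), roles["DIR3"]))
    return edges


for actor, origin, destination in travel_edges(EVENTS):
    print(actor, origin, "->", destination)
```

Each resulting tuple can be drawn as an arrow between two named entities once the place names have been resolved to coordinates.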
What can you do with multi-layer trees?
Extraction and analysis of events
What actions do the Athenians perform?
What actions do the Spartans perform?
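Questions of this kind can be answered by looking for nodes with the functor ACT and reading off the verb they depend on. The sketch below does this over a hand-built node list that reuses the lemmas and functors of the tectogrammatical tree for Thuc. 1.89.1. The parent links are assumed for illustration, and the lemmas are written without their numeric suffix.

```python
from collections import Counter

# (node id, lemma, functor, parent id); parent links are assumptions.
NODES = [
    (1, "ἔρχομαι", "PRED", 0),
    (2, "γάρ", "PREC", 1),
    (3, "Ἀθηναῖος", "ACT", 1),
    (4, "τρόπος", "MANN", 1),
    (5, "τοιόσδε", "RSTR", 4),
    (6, "πρᾶγμα", "DIR3", 1),
    (7, "αὐξάνω", "RSTR", 6),
    (8, "ὅς", "ACMP", 7),
    (9, "#PersPron", "ACT", 7),
]


def actions_of(nodes, actor_lemma):
    """Count the governing lemmas of every ACT node with this lemma."""
    lemma_by_id = {node_id: lemma for node_id, lemma, _, _ in nodes}
    counts = Counter()
    for _, lemma, functor, parent in nodes:
        if functor == "ACT" and lemma == actor_lemma and parent in lemma_by_id:
            counts[lemma_by_id[parent]] += 1
    return counts


athenian = NODES[2][1]
print(actions_of(NODES, athenian))
```

The second actor in the tree is the reconstructed `#PersPron` node, so the action it governs is only attributed to the Athenians once coreference has been resolved. That is why coreference resolution appears as a separate stage in the pipeline.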
Related Secondary Literature (from JSTOR)
Figure: http://tiny.cc/GapVis-SecLit
Mining JSTOR:
Where is Thuc. “hiding”?
A meaningful subsample
mining citations from all ~171k journal articles is not the best approach
curated bibliography compiled in 2009, before the project started (CiteULike)
articles in JSTOR related to Thuc. 1.89-118
343 articles, 62 journals
journals from bibliography as “seeds”
samples ~73k articles (out of ~171k)
top-down vs bottom-up bibliographic approach
Pros and Cons
comprehensive coverage; > 2 centuries; multilingual
data not openly licensed
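The seed-based sampling mentioned above can be sketched in a few lines: the journals that occur in the curated bibliography select the articles to mine. The field names and the toy records are hypothetical and stand in for whatever metadata the JSTOR data actually provides.

```python
def seed_journals(bibliography):
    """Collect the journals that occur in the curated bibliography."""
    return {entry["journal"] for entry in bibliography}


def sample_articles(articles, seeds):
    """Keep only the articles published in one of the seed journals."""
    return [article for article in articles if article["journal"] in seeds]


# Toy records with hypothetical field names and values.
bibliography = [
    {"title": "Entry 1", "journal": "Journal A"},
    {"title": "Entry 2", "journal": "Journal B"},
    {"title": "Entry 3", "journal": "Journal A"},
]
articles = [
    {"id": 1, "journal": "Journal A"},
    {"id": 2, "journal": "Journal C"},
    {"id": 3, "journal": "Journal B"},
]

seeds = seed_journals(bibliography)
sample = sample_articles(articles, seeds)
print(len(seeds), [article["id"] for article in sample])
```

The filter trades recall for relevance: articles in journals outside the bibliography are never examined, which is the price of starting from a top-down selection.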
Extracting Citations: Challenges
sentence segmentation
sentence = sensible unit of context
both for extraction and data analysis (co-citation)
dirty OCR
invalid character sequences (e.g. n)
“inconsistent” use of punctuation
1, 110-15 ; 1.89.1, 1.90 ; I 1, 102, 1
solution: reason based on domain knowledge
similar references, surface similarity
fragments, papyri, inscriptions
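Reasoning from domain knowledge can be illustrated with a small normaliser for references to Thucydides. This is a sketch under stated assumptions rather than the project's extractor: it accepts a book given as a Roman or Arabic numeral, tolerates either punctuation style, expands abbreviated ranges such as `110-15`, and rejects any book number above 8, since the Histories have eight books. Forms it cannot interpret, such as `I 1, 102, 1`, return `None`.

```python
import re

ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4,
         "V": 5, "VI": 6, "VII": 7, "VIII": 8}

REF = re.compile(
    r"""^\s*
        (?P<book>[IV]+|\d)                  # book: Roman numeral or one digit
        \s*[.,\s]\s*                        # separator: dot, comma or space
        (?P<chap>\d{1,3})                   # chapter
        (?:-(?P<chap_end>\d{1,3}))?         # optional end of a chapter range
        (?:\s*[.,]\s*(?P<sect>\d{1,2}))?    # optional section
        \s*$""",
    re.VERBOSE,
)


def normalize(ref):
    """Return (book, first chapter, last chapter, section) or None."""
    match = REF.match(ref)
    if not match:
        return None
    raw_book = match.group("book")
    book = int(raw_book) if raw_book.isdigit() else ROMAN.get(raw_book)
    if book is None or not 1 <= book <= 8:  # Thucydides has eight books
        return None
    start = match.group("chap")
    end = match.group("chap_end") or start
    if len(end) < len(start):               # "110-15" means 110-115
        end = start[: len(start) - len(end)] + end
    if int(end) < int(start):
        return None
    section = match.group("sect")
    return (book, int(start), int(end), int(section) if section else None)


for text in ["1, 110-15", "1.89.1", "1.90", "I, 102, 1", "I 1, 102, 1"]:
    print(repr(text), normalize(text))
```

Mapping every surface form onto the same tuple is what makes later steps such as co-citation analysis possible, because two references can then be compared regardless of how they were printed.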
Thank you!
Our contacts and temporary development server
agnes.thomas@uni-koeln.de
francesco.mambrini@dainst.de
mattero.romanello@dainst.de
http://www.tiny.cc/GapVis-Hellespont