Automatic Reconstruction of Emperor Itineraries
from the Regesta Imperii
Juri Opitz, Leo Born, Vivi Nastase, Yannick Pultar
The RI corpus
● more than 150,000 “regests”
○ abstracts of charters issued by the Holy Roman Emperors
■ and also events (battles, births, etc.)
○ reference time span: almost 1,000 years
○ starting from the Carolingian dynasty
○ ... to Maximilian I
Historic itinerary research
● examines traveling paths of historic entities
● to examine their influence and reach
Regests have a given place name
[Figure: example regests, annotated "was he really here?" and "empty?!"]
Let’s just put “Ofen” on a map: easy
● is it?
OpenStreetMap returns many places; however, not the correct one
Google returns places where I can buy ovens (“Ofen” is German for “oven”).
GeoNames: 19 places scattered across the globe ... among them Ofen, a part of Budapest
To sum up...
● It’s not so easy to map regest place names from a time span of almost 1,000 years onto maps
● To address this, we tackle two main problems:
○ place name prediction: many place names are unknown or return zero candidates
○ coordinate prediction: place name queries return large candidate sets, from which the correct point must be chosen
Place name prediction
● Experiments with logistic regression
○ features: last known place name, text unigrams, emperor
● baseline 1: most frequent place name
● baseline 2: last known place name
○ i.e. the place name of the closest preceding regest in time
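The two baselines above can be sketched in a few lines. This is only an illustration under assumptions: the `(date, place)` tuple layout, the function names, and the toy regests are hypothetical and not taken from the RI corpus or from the authors' code.

```python
from collections import Counter


def most_frequent_baseline(known_places):
    """Baseline 1: always predict the most frequent known place name."""
    counts = Counter(p for p in known_places if p is not None)
    return counts.most_common(1)[0][0] if counts else None


def last_known_baseline(regests):
    """Baseline 2: fill a missing place name with the closest preceding known one.

    `regests` is a list of (iso_date, place_or_None) tuples for one issuer
    (an assumed layout, chosen for this sketch).
    """
    predictions = []
    last_known = None
    # ISO dates sort chronologically as plain strings.
    for date, place in sorted(regests, key=lambda r: r[0]):
        if place is not None:
            last_known = place
        predictions.append((date, place if place is not None else last_known))
    return predictions


# Invented toy data, for illustration only.
toy_regests = [
    ("1356-01-10", "Nuremberg"),
    ("1356-01-17", None),
    ("1356-03-02", "Prague"),
    ("1356-03-05", None),
    ("1356-04-01", "Nuremberg"),
]

most_frequent = most_frequent_baseline([place for _, place in toy_regests])
filled = last_known_baseline(toy_regests)
```

A regest that precedes every known place name stays unresolved (`None`) under baseline 2, which is one reason a learned model with further features can help.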
Place name prediction results
[Table: per method, % of issuers where the method performed best; % of correct choices; % of correct choices, mean over all issuers]
For every place name...
● the place name predictions ensure that we have a non-empty set of candidate points
○ problem: sometimes the correct point is not contained in the candidate set (future work)
Coordinate prediction
● we model the itinerary of an emperor as a directed acyclic graph (DAG)
● Assumption: lowest-cost path approximates the true itinerary
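A minimal sketch of the lowest-cost path idea, assuming one layer of candidate points per regest in chronological order, with edges only between consecutive layers. The layered layout, the function names, and the plain great-circle cost are assumptions for illustration; the coordinates are rough, illustrative values.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) points."""
    lat1, lng1, lat2, lng2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def lowest_cost_path(layers, cost=haversine_km):
    """Pick one candidate per layer so that the summed edge cost is minimal.

    layers[i] holds the candidate points of the i-th regest; every layer
    must be non-empty. Solved by dynamic programming over the layered DAG.
    """
    if not layers:
        return []
    best = [0.0] * len(layers[0])  # cheapest cost of reaching each candidate
    back = []                      # back-pointers, one list per transition
    for prev, cur in zip(layers, layers[1:]):
        new_best, pointers = [], []
        for c in cur:
            costs = [best[j] + cost(p, c) for j, p in enumerate(prev)]
            j_min = min(range(len(prev)), key=costs.__getitem__)
            new_best.append(costs[j_min])
            pointers.append(j_min)
        best = new_best
        back.append(pointers)
    # Trace the cheapest path back from the last layer.
    idx = min(range(len(best)), key=best.__getitem__)
    chosen = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        chosen.append(idx)
    chosen.reverse()
    return [layers[i][k] for i, k in enumerate(chosen)]


# Toy itinerary: Vienna -> "Ofen" (two candidates) -> Pressburg.
layers = [
    [(48.21, 16.37)],                 # Vienna
    [(53.00, 8.00), (47.50, 19.04)],  # a far-away "Ofen" vs. Buda
    [(48.15, 17.11)],                 # Pressburg (Bratislava)
]
path = lowest_cost_path(layers)
```

In the toy itinerary the path through Buda is cheaper than the detour via the far-away candidate, so the second coordinate of `path` is the Buda one.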
Edge cost heuristic
● bias towards crowded places (many places of medieval significance are still crowded today, e.g. Rome, Nuremberg, etc.)
● straight-line distance
● bias towards highly ranked results
● bias towards exact name matches (we still want to keep inexact matches, e.g. Franckfurt -> autocorrect -> Frankfurt)
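The slide's actual formula is not recoverable from the text, so the following is a purely hypothetical way to combine the four ingredients named above. The `Candidate` fields, the weights, and the multiplicative form are all assumptions made for this sketch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    lat: float
    lng: float
    population: int    # modern population reported by the gazetteer (assumed field)
    rank: int          # position in the query result list, 0 = top hit (assumed field)
    exact_match: bool  # True if the query matched the name without autocorrection


def straight_line_km(a, b):
    """Great-circle distance in km between two candidates."""
    lat1, lng1, lat2, lng2 = map(math.radians, (a.lat, a.lng, b.lat, b.lng))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def edge_cost(a, b, w_pop=1.0, w_rank=0.1, mismatch_penalty=1.5):
    """Hypothetical cost of travelling from candidate a to candidate b.

    Distance is scaled down for populous targets and scaled up for
    low-ranked or inexactly matched targets. The weights are made up.
    """
    crowd_discount = 1.0 + w_pop * math.log10(1 + b.population)
    rank_penalty = 1.0 + w_rank * b.rank
    name_penalty = 1.0 if b.exact_match else mismatch_penalty
    return straight_line_km(a, b) * rank_penalty * name_penalty / crowd_discount


# Illustrative values only.
vienna = Candidate("Vienna", 48.21, 16.37, 1_900_000, 0, True)
big = Candidate("Ofen", 47.50, 19.04, 1_700_000, 0, True)
small = Candidate("Ofen", 47.50, 19.04, 50, 0, True)
low_ranked = Candidate("Ofen", 47.50, 19.04, 1_700_000, 5, True)
inexact = Candidate("Offen", 47.50, 19.04, 1_700_000, 0, False)
```

Because inexact matches are only penalised, not excluded, an autocorrected candidate can still win if it fits the route much better than any exact match.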
Shortest path selection enables us to obtain...
● for every regest/event a tuple of predicted lat-lng coordinates
● additionally we compute centroids
○ i.e. for every place name we compute the most centered coordinate
○ many frequent place names have unequivocal points of reference (Rome, Nuremberg, etc.)
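One possible reading of "most centered coordinate" is the medoid: among all coordinates predicted for a place name, the one with the smallest summed distance to the others. The sketch below assumes that reading; the function names and the toy predictions are hypothetical.

```python
import math
from collections import defaultdict


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) points."""
    lat1, lng1, lat2, lng2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def most_centered(points):
    """Return the medoid: the point with the smallest summed distance to all points."""
    return min(points, key=lambda p: sum(haversine_km(p, q) for q in points))


def centroids_by_name(predictions):
    """Map each place name to the most centered of its predicted coordinates.

    `predictions` is a list of (place_name, (lat, lng)) pairs, one per regest.
    """
    grouped = defaultdict(list)
    for name, point in predictions:
        grouped[name].append(point)
    return {name: most_centered(points) for name, points in grouped.items()}


# Invented event-level predictions; one "Rome" was resolved to a far-away namesake.
predictions = [
    ("Rome", (41.9, 12.5)),
    ("Rome", (41.9, 12.5)),
    ("Rome", (41.9, 12.5)),
    ("Rome", (41.9, -85.0)),
    ("Nuremberg", (49.45, 11.08)),
]
centroids = centroids_by_name(predictions)
```

Unlike an arithmetic mean of coordinates, the medoid is always one of the predicted points, so a single outlier cannot drag it into the middle of nowhere.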
Gold standard
● approx. 10k place names manually resolved by student assistants (HiWis) at the place name level
○ this means that the gold standard cannot account for the case where a king visited two different places with the same name
● our resolutions: event-level
● nevertheless, it’s the best we have to evaluate against
Results of different path searches vs. time
[Plot: distance to gold standard over time, lower = better; annotated: the Staufers’ Italian travels, "very hard even for historians"]
Did we find the correct Ofen?
Naive selection (random) vs optimal path
Detection of human labeling error
[Figure: the HiWi label is false, the automatic resolution is correct]
Conclusions
● the optimal path yields better predictions than greedy search and much better ones than random selection
○ evidence that our edge cost heuristic formula contains some useful information
● method can capture human annotation errors
● in some time periods, places are much harder to resolve than in others
Future work
● improve place name prediction
○ try time-series prediction models which model geo-spatial-temporal context better
○ place name normalization (Franckfurt, Vrankenforde, Franckenfurt → Frankfurt a. Main)
● improve coordinate prediction
○ improve cost heuristic
○ try historical place gazetteers instead of modern geo databases
■ caveat: how well will they generalize across Europe and over almost 1,000 years?
● mine and resolve the rich place names and place name references inside the texts
○ difficult but yields new large-scale resources and options for statistical historic itinerary research!
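As a rough illustration of the place name normalization item above, plain string similarity from Python's standard `difflib` module already maps some historic spellings to a modern form. The vocabulary and the function name are hypothetical; this is a sketch of one simple option, not the approach the authors plan.

```python
import difflib

# Hypothetical target vocabulary of normalized names.
MODERN_NAMES = ["Frankfurt a. Main", "Nuremberg", "Rome", "Budapest"]


def normalize_place_name(historic, vocabulary=MODERN_NAMES, cutoff=0.6):
    """Return the most similar name from the vocabulary, or None if none is close enough."""
    lowered = {name.lower(): name for name in vocabulary}
    hits = difflib.get_close_matches(historic.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


normalized = {name: normalize_place_name(name) for name in ("Franckfurt", "Franckenfurt")}
```

Variants further from the modern spelling, such as Vrankenforde, will likely fall below the similarity cutoff, so they would need phonetic rules or learned models rather than character overlap alone.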
Thank you for your attention!
