Agora: putting museum objects
into their art-historic context

          Marieke van Erp
          marieke@cs.vu.nl



             EURECOM July 2012
Introduction

• BA, MA & PhD
  Computational Linguistics/
  Information Extraction
  @Tilburg University

• Since 2009: SemWeb group
  @VU University Amsterdam
Overview

• The Agora Project
• Digital Hermeneutics
• Building an Event Thesaurus
  for Dutch
  • Experiments & Results
  • Outlook

                                Image src: http://www.artrage.com.au/dreamgirl/filesend/223/EarthFromAbove_EXPOTVDC212_prog.jpg
The Agora Project

• Collaboration VU CS &
  History departments,
  Netherlands Institute for
  Sound and Vision and
  Rijksmuseum Amsterdam

• Facilitate and investigate
  digitally mediated public
  history
Digitising Heritage


•   Galleries, libraries, archives and
    museums (GLAMs) are digitising
    their data and presenting it online
•   This changes the role of GLAMs
    from information interpreters to
    information providers
•   In the online setting, objects can
    easily start to lead their own lives


                                           Image source: http://terracebay.library.on.ca/wp-content/uploads/2011/04/clip_image002.jpg
Digital Hermeneutics


• An object on its own has no
    meaning; event descriptions
    provide historical context
•   A single event only gives part
    of the historical context;
    chains of events (narratives)
    provide a more complete
    overview
                                     Image src: http://3.bp.blogspot.com/-7nXcVdW0_wc/Th0JDRIT1GI/AAAAAAAAIEk/IoPReKrojkY/s1600/42st.jpg
Event Dimension

[Figure: RDF graph linking the painting "Three Fighter Aircraft in the Sky" to two events]

• Painting: Three Fighter Aircraft in the Sky
  (rma:maker Mohammed Toha; rma:creationDate 19/12/1948;
  rma:creationPlace Yogyakarta)
• agora:depictsEvent: The Attack on Yogyakarta
  (rdf:type sem:Event; sem:hasActor Netherlands;
  sem:hasPlace Yogyakarta; sem:hasBeginTimeStamp 19/12/1948)
• agora:createsEvent: Mohammed Toha Paints "Three Fighter
  Aircraft in the Sky"
  (rdf:type sem:Event; sem:hasActor Mohammed Toha;
  sem:hasPlace Yogyakarta; sem:hasBeginTimeStamp 19/12/1948)
• Netherlands and Mohammed Toha: rdf:type sem:Actor
• Yogyakarta: rdf:type sem:Place
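
The graph on this slide can be written down as subject-predicate-object triples. The following is a minimal Python sketch using plain tuples rather than an RDF library. The predicates are the ones shown on the slide; the node identifiers with the `ex:` prefix, the edge directions read off the diagram, and the helper names `objects` and `events_for` are assumptions made for illustration.

```python
# Triples read off the Event Dimension slide.
# Identifiers with the ex: prefix are hypothetical stand-ins for real URIs.
TRIPLES = [
    ("ex:painting", "rma:maker", "Mohammed Toha"),
    ("ex:painting", "rma:creationDate", "19/12/1948"),
    ("ex:painting", "rma:creationPlace", "Yogyakarta"),
    ("ex:painting", "agora:depictsEvent", "ex:attack"),
    ("ex:painting", "agora:createsEvent", "ex:paintingEvent"),
    ("ex:attack", "rdf:type", "sem:Event"),
    ("ex:attack", "sem:hasActor", "Netherlands"),
    ("ex:attack", "sem:hasPlace", "Yogyakarta"),
    ("ex:paintingEvent", "rdf:type", "sem:Event"),
    ("ex:paintingEvent", "sem:hasActor", "Mohammed Toha"),
    ("ex:paintingEvent", "sem:hasPlace", "Yogyakarta"),
]


def objects(subject, predicate, triples=TRIPLES):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def events_for(museum_object, triples=TRIPLES):
    """Events reachable from a museum object via an agora:*Event predicate."""
    return [
        o for s, p, o in triples
        if s == museum_object and p.startswith("agora:") and p.endswith("Event")
    ]


# Historical context of the painting: the actors of every linked event.
for event in events_for("ex:painting"):
    print(event, objects(event, "sem:hasActor"))
```

Starting from the object and following its event links is what gives the painting its context: the depicted event brings in the Netherlands as an actor, the creation event brings in the maker.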
Narratives

[Figure: three events with the actor KNIL, chained by agora:hasBiographicalRelation]

• The Attack on Yogyakarta
  (sem:eventType Armed Conflict; sem:hasTimeStamp 1945 - 1946;
  sem:hasPlace Indonesia; sem:hasActor KNIL)
• Operation Crow
  (sem:eventType Armed Conflict;
  sem:hasTimeStamp 19/12/1948 - 31/12/1948;
  sem:hasPlace Sumatra; sem:hasActor KNIL)
• The Attack on Yogyakarta
  (sem:eventType Attack; sem:hasTimeStamp 01/03/1949;
  sem:hasPlace Yogyakarta; sem:hasActor KNIL)
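
A chain like the one on this slide could be assembled by collecting the events that share an actor and ordering them in time. The sketch below is one possible reading, not the project's implementation: interpreting `agora:hasBiographicalRelation` as a link between consecutive events of the same actor is an assumption, as are the dict layout and the function names. The year-only range "1945 - 1946" is mapped to 1 January 1945 purely so that it can be sorted.

```python
from datetime import date

# Events as listed on the Narratives slide (begin dates only).
EVENTS = [
    {"name": "The Attack on Yogyakarta", "type": "Attack",
     "place": "Yogyakarta", "actor": "KNIL", "begin": date(1949, 3, 1)},
    {"name": "Operation Crow", "type": "Armed Conflict",
     "place": "Sumatra", "actor": "KNIL", "begin": date(1948, 12, 19)},
    {"name": "The Attack on Yogyakarta", "type": "Armed Conflict",
     "place": "Indonesia", "actor": "KNIL", "begin": date(1945, 1, 1)},
]


def narrative(events, actor):
    """Events involving `actor`, ordered by their begin date."""
    return sorted((e for e in events if e["actor"] == actor),
                  key=lambda e: e["begin"])


def chain_links(chain):
    """Link each event in the chain to the one that follows it."""
    return [(a["name"], "agora:hasBiographicalRelation", b["name"])
            for a, b in zip(chain, chain[1:])]


chain = narrative(EVENTS, "KNIL")
for link in chain_links(chain):
    print(link)
```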
Event-driven Browsing
Building an Event Thesaurus

•   There are no extensive structured
    event descriptions
•   Rijksmuseum Amsterdam has a
    flat list of 1,693 ‘events’: only
    names and very much focused on
    17th century Holland
•   Our goal:
       • create a list of historically
           relevant events
       • provide actors, locations,
           times & types for each event
                                          Image src: http://www.collinsdictionary.com/static/graphics/default.png
First Attempt
•   Pattern-based event-name
    extraction
       • In Dutch Wikipedia we
         found 2,444 event
         candidates
       • 1,209 (56.3%) correct
       • 169 (13.9%) partially
         correct
•   Off-the-shelf named entity
    recognition (P/R/F1)
       • Person 77/77/77
       • Location 75/58/66
       • Organisation 32/37/34
                                 Image src: http://www.spaceg.com/multimedia/collection/explosions/atomic%20explosion%205.jpg
First Attempt
• Co-occurrence-based
  event-relation finder
     • An actor, location and/
        or date was found for
        only 392 events
     • Actor correct: 49.6%
     • Location correct: 41.1%
     • Date correct: 51.5%



                                   Image src: http://www.spaceg.com/multimedia/collection/explosions/atomic%20explosion%205.jpg
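
A co-occurrence-based relation finder can be sketched as counting how often a candidate entity appears in the same sentence as an event name and keeping the most frequent candidate. The slides do not describe the actual implementation, so the function name, the sentence-level window and the toy sentences below are all assumptions made for illustration.

```python
from collections import Counter


def most_cooccurring(event, candidates, sentences):
    """Return the candidate sharing the most sentences with `event`, or None."""
    counts = Counter()
    for sentence in sentences:
        if event not in sentence:
            continue
        for candidate in candidates:
            if candidate in sentence:
                counts[candidate] += 1
    if not counts:
        return None  # the event is never mentioned together with a candidate
    return counts.most_common(1)[0][0]


# Toy sentences, invented for this example.
SENTENCES = [
    "Operation Crow was launched by the KNIL in December 1948.",
    "The KNIL captured Yogyakarta during Operation Crow.",
    "Operation Crow also covered parts of Sumatra.",
]

print(most_cooccurring("Operation Crow", ["KNIL", "Netherlands"], SENTENCES))
```

With only a handful of mentions the counts are small and easily tied, which illustrates why such an approach depends on an event being mentioned often.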
First Attempt
• Problems with event element recognition:
     •   Shallow grammatical
         processing (the phrase "post-war
         rebuilding and during the North
         Sea flood" is recognised as 1 event)
     •   Missing locations (the "Battle of
         LOC" pattern fails)
     •   No distinction between
         entities and action nouns
         ("German Occupation" vs "German
         Occupants" look the same to
         the approach)
     •   Named Entity Recogniser not
         suited to the domain
                                           Image src: http://www.spaceg.com/multimedia/collection/explosions/atomic%20explosion%205.jpg
First Attempt
• Problems with the event relation
  finder:
    • Relies on redundancy in
      the data, so it only works
      for ‘popular’ events
    • Too coarse-grained (who
      were the actors/locations
      in WWII?)
    • Evaluation is hard!

                                  Image src: http://www.spaceg.com/multimedia/collection/explosions/atomic%20explosion%205.jpg
Back to the drawing board...
• Analysis of event names
      •   Combinations of sortal nouns with
          a PP and a named entity, e.g., Battle
          of Stalingrad, Death of John Lennon
      •   Combinations of nominalized verbs
          with a PP and a named entity, e.g.,
          Excavation of Troy, Election of
          Obama
      •   Combinations of a referential
          adjective with an event type and
          named entity, e.g., the American
          invasion of Iraq
      •   Transparent proper names: Great
          War
      •   Opaque proper names: event
          names that cannot be decomposed
          on morphological grounds, e.g.,
          Holocaust, Spanish Fury
                                                 Image src: http://www.northescambia.com/wp-content/uploads/2010/01/molinotrashfire10.jpg
Back to the drawing board...
• Improve Named Entity
  Recognition
    • Add gazetteers for
      historical names
    • Post-processing for titles
      and improved NE
      boundaries




                                   Image src: http://www.northescambia.com/wp-content/uploads/2010/01/molinotrashfire10.jpg
Back to the drawing board...
• Finding Event Relations
     • Use the structure of
        Wikipedia/DBpedia
     • Shallow parsing
     • Hierarchies of actors &
        locations




                                  Image src: http://www.northescambia.com/wp-content/uploads/2010/01/molinotrashfire10.jpg
Current Work
              Spotlight (P/R/F1)   Stanford (P/R/F1)   Freire (P/R/F1)
Person        54.05/7.52/13.20     58.60/34.46/43.40   79.17/71.16/74.95
Location      64.52/30.77/41.67    67.19/66.15/66.67   80.00/61.54/69.57
Organisation  0/0/0                9.78/25.71/14.17    89.66/74.29/81.25



• Still some work to be done, but
  Freire et al. (2012) show that smart
  features can work with small amounts
  of training data
• Combine classifiers
• Add post-processing
• MISC class remains to be done...
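
The third number in each P/R/F1 cell is the harmonic mean of the first two. As a small sanity check, the sketch below recomputes F1 from the precision and recall values in the table on this slide and reports the largest deviation from the reported F1. The function names and the tuple layout are assumptions; the numbers are copied from the table.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0  # convention for the 0/0/0 cell
    return 2 * precision * recall / (precision + recall)


# (system, class, precision, recall, reported F1)
ROWS = [
    ("Spotlight", "Person", 54.05, 7.52, 13.20),
    ("Spotlight", "Location", 64.52, 30.77, 41.67),
    ("Spotlight", "Organisation", 0.0, 0.0, 0.0),
    ("Stanford", "Person", 58.60, 34.46, 43.40),
    ("Stanford", "Location", 67.19, 66.15, 66.67),
    ("Stanford", "Organisation", 9.78, 25.71, 14.17),
    ("Freire", "Person", 79.17, 71.16, 74.95),
    ("Freire", "Location", 80.00, 61.54, 69.57),
    ("Freire", "Organisation", 89.66, 74.29, 81.25),
]


def max_deviation(rows=ROWS):
    """Largest absolute gap between recomputed and reported F1."""
    return max(abs(f1(p, r) - reported) for _, _, p, r, reported in rows)


print(round(max_deviation(), 3))
```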
Current Work
Word       POS   CHUNK   NER
U.N.       NNP   I-NP    I-ORG
official   NN    I-NP    O
Ekeus      NNP   I-NP    I-PER
heads      VBZ   I-VP    O
for        IN    I-PP    O
Baghdad    NNP   I-NP    I-LOC
.          .     O       O
                                                                                                   [CoNLL2003]
focus,minthree,mintwo,minone,plusone,plustwo,fnfreq,lnfreq,ncfreq,orgfreq,geo,n,v,a,adv,pn,cap,allcaps,beg,end,length,capfreq,class
"is","wood",")","and","painted","dark",0,0,0,2.45253198865684,0,0,0,1,0,0,0,0,0,0,2,0,"O"
"painted",")","and","is","dark","grey",0,0,0,0,0,0,0,0,1,0,0,0,0,0,7,0,"O"
"dark","and","is","painted","grey",".",0,0,0,0.493875418347986,0,0,1,0,1,0,0,0,0,0,4,0,"O"
"grey","is","painted","dark",".","William",0,0,0,0.0768052510316108,0,1,1,1,1,0,0,0,0,0,4,0,"O"
".","painted","dark","grey","William","Herschel",0,0,0,2.36647279037729,0,0,0,0,0,0,0,1,0,0,1,0,"O"
"William","dark","grey",".","Herschel","made",8.2034429051892,3.27892030900003,0,4.67158565874127,0,0,0,0,0,0,1,0,0,0,7,0,"B-PER"
"Herschel","grey",".","William","made","many",2.36726761611533,2.39936346938848,0,0.443930767784,0,1,1,0,0,0,1,0,0,0,8,0,"I-PER"
"made",".","William","Herschel","many","telescopes",0,0,0,0.493875418347986,0,0,0,1,1,0,0,0,0,0,4,0,"O"
"many","William","Herschel","made","telescopes","of",0,0,0,0.0768052510316108,0,0,0,0,1,0,0,0,0,0,4,0,"O"
"telescopes","Herschel","made","many","of","this",0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,0,"O"


                                                                                                   [Freire et al. 2012]
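
The feature rows above place each focus token next to its three preceding and two following tokens. The sketch below rebuilds just those window columns plus the `cap` and `length` columns for the sentence fragment in the example rows. It is not the code of Freire et al.: the function name and the `_` padding symbol for positions outside the text are assumptions, and the frequency, gazetteer and part-of-speech columns are left out because they need external resources.

```python
# Token sequence reconstructed from the example rows on this slide.
TOKENS = ["wood", ")", "and", "is", "painted", "dark", "grey", ".",
          "William", "Herschel", "made", "many", "telescopes", "of", "this"]


def window_features(tokens, i, pad="_"):
    """Focus token, its 3-left/2-right context, and two orthographic features."""
    def at(j):
        # Positions outside the text get a padding symbol (an assumption).
        return tokens[j] if 0 <= j < len(tokens) else pad

    focus = tokens[i]
    return {
        "focus": focus,
        "minthree": at(i - 3),
        "mintwo": at(i - 2),
        "minone": at(i - 1),
        "plusone": at(i + 1),
        "plustwo": at(i + 2),
        "cap": int(focus[:1].isupper()),  # starts with a capital letter
        "length": len(focus),             # token length in characters
    }


print(window_features(TOKENS, TOKENS.index("William")))
```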
Current Work

• Build smarter extractors for
  event names
    • First focus on ‘regular’
       event names (e.g., Battle
       of LOC, War of YEAR)
    • Use knowledge about
       action nouns vs static
       nouns (WordNet)
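
An extractor for 'regular' event names such as "Battle of LOC" and "War of YEAR" could start from a regular expression whose location slot is checked against known place names. This is a rough sketch under stated assumptions: the seed nouns come from the examples on the slides, the `LOCATIONS` set stands in for the output of a named entity recogniser or gazetteer, and English examples are used although the project targets Dutch.

```python
import re

EVENT_NOUNS = ["Battle", "War"]          # seed nouns from the slide examples
LOCATIONS = {"Stalingrad", "Troy"}       # hypothetical NER/gazetteer output

# <event noun> of <four-digit year | capitalised word>
PATTERN = re.compile(
    r"\b(%s) of (\d{4}|[A-Z][a-z]+)\b" % "|".join(EVENT_NOUNS)
)


def extract_event_names(text, locations=LOCATIONS):
    """Return 'NOUN of X' matches where X is a year or a known location."""
    names = []
    for match in PATTERN.finditer(text):
        argument = match.group(2)
        if argument.isdigit() or argument in locations:
            names.append(match.group(0))
    return names


text = "After the Battle of Stalingrad and the War of 1812, a Battle of Wits began."
print(extract_event_names(text))
```

Checking the slot against known locations is what rejects "Battle of Wits" here; it also means the extractor misses events whose location the recogniser fails to find.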
The Story So Far

• It takes time to learn to
    communicate in an
    interdisciplinary project
•   Don’t try to solve too much
    in one go
•   Cycles of error analysis
•   Domain adaptation is difficult:
    optimise for precision
Outlook

• Redesign of Agora demo (new
    version autumn/winter)
•   Include different perspectives
    (together with Semantics of
    History)
•   Ship model use case
•   Historical Named Entity
    Recognition for English & Dutch
•   2nd round user studies (spring
    2013)
Questions?

marieke@cs.vu.nl
http://www.cs.vu.nl/~marieke

Image src: http://www.rijksmuseum.nl/collectie/SK-A-2963/portret-van-don-ram%C3%B3n-satu%C3%A9-1765-1824
Image source: http://www.amichelleblakeley.com/storage/question%20marks.jpg?__SQUARESPACE_CACHEVERSION=1295297003883

Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Agora: putting museum objects into their art-historic context

  • 1. Agora: putting museum objects into their art-historic context Marieke van Erp marieke@cs.vu.nl EURECOM July 2012
  • 2. Introduction • BA, MA & PhD Computational Linguistics/ Information Extraction @Tilburg University • Since 2009: SemWeb group @VU University Amsterdam
  • 3. Overview • The Agora Project • Digital Hermeneutics • Building an Event Thesaurus for Dutch • Experiments & Results • Outlook Image src: http://www.artrage.com.au/dreamgirl/filesend/223/EarthFromAbove_EXPOTVDC212_prog.jpg
  • 4. The Agora Project • Collaboration VU CS & History departments, Netherlands Institute for Sound and Vision and Rijksmuseum Amsterdam • Facilitate and investigate digitally mediated public history
  • 5. Digitising Heritage • Galleries, libraries, archives and museums (GLAMS) are digitising their data and presenting it online • This changes the role of GLAMS from information interpreters to information providers • In the online setting, objects can easily start to lead their own lives Image source: http://terracebay.library.on.ca/wp-content/uploads/2011/04/clip_image002.jpg
  • 6.
  • 7. Digital Hermeneutics • An object on its own has no meaning; event descriptions provide historical context • A single event only gives part of the historical context; chains of events (narratives) provide a more complete overview Image src: http://3.bp.blogspot.com/-7nXcVdW0_wc/Th0JDRIT1GI/AAAAAAAAIEk/IoPReKrojkY/s1600/42st.jpg
  • 8. Event Dimension [Diagram: RDF graph linking a museum object to events] • Nodes: Painting: Three Fighter Aircraft in the Sky; Mohammed Toha; Netherlands; Yogyakarta; The Attack on Yogyakarta; Mohammed Toha Paints "Three Fighter Aircraft in the Sky"; 19/12/1948 • Classes: sem:Event, sem:Actor, sem:Place • Properties: rdf:type, rma:maker, rma:creationDate, rma:creationPlace, sem:hasActor, sem:hasPlace, sem:hasBeginTimeStamp, agora:depictsEvent, agora:createsEvent
  • 9. Narratives [Diagram: three events linked by agora:hasBiographicalRelation] • The Attack on Yogyakarta: sem:eventType Armed Conflict; sem:hasTimeStamp 1945 - 1946; sem:hasPlace Indonesia; sem:hasActor KNIL • Operation Crow: sem:eventType Armed Conflict; sem:hasTimeStamp 19/12/1948 - 31/12/1948; sem:hasPlace Sumatra; sem:hasActor KNIL • The Attack on Yogyakarta: sem:eventType Attack; sem:hasTimeStamp 01/03/1949; sem:hasPlace Yogyakarta; sem:hasActor KNIL
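
The event descriptions and narrative chain on slides 8 and 9 can be read as subject-predicate-object triples. The following is a minimal sketch in plain Python, not the Agora implementation. The node identifiers (`event:OperationCrow`, `event:AttackOnYogyakarta`) and the helper names `describe` and `narrative` are hypothetical; only the property names and values are taken from the slides, and the direction of `agora:hasBiographicalRelation` is an assumption.

```python
from collections import defaultdict

# Triples transcribed from the slide 9 diagram. Node identifiers are
# hypothetical; property names and values come from the slides.
TRIPLES = [
    ("event:OperationCrow", "rdf:type", "sem:Event"),
    ("event:OperationCrow", "sem:eventType", "Armed Conflict"),
    ("event:OperationCrow", "sem:hasTimeStamp", "19/12/1948 - 31/12/1948"),
    ("event:OperationCrow", "sem:hasPlace", "Sumatra"),
    ("event:OperationCrow", "sem:hasActor", "KNIL"),
    ("event:AttackOnYogyakarta", "rdf:type", "sem:Event"),
    ("event:AttackOnYogyakarta", "sem:eventType", "Attack"),
    ("event:AttackOnYogyakarta", "sem:hasTimeStamp", "01/03/1949"),
    ("event:AttackOnYogyakarta", "sem:hasPlace", "Yogyakarta"),
    ("event:AttackOnYogyakarta", "sem:hasActor", "KNIL"),
    # Assumption: the relation points from the earlier to the later event.
    ("event:OperationCrow", "agora:hasBiographicalRelation",
     "event:AttackOnYogyakarta"),
]


def describe(subject, triples):
    """Group all property values of one subject by property name."""
    properties = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            properties[p].append(o)
    return dict(properties)


def narrative(start, triples, link="agora:hasBiographicalRelation"):
    """Follow `link` edges from `start` and return the chain of events."""
    chain = [start]
    while True:
        successors = [o for s, p, o in triples
                      if s == chain[-1] and p == link and o not in chain]
        if not successors:
            return chain
        chain.append(successors[0])
```

Calling `narrative("event:OperationCrow", TRIPLES)` walks the chain of linked events, which is the sense in which a narrative gives more context than a single event description.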
  • 13. Building an Event Thesaurus • There are no extensive structured event descriptions • Rijksmuseum Amsterdam has a flat list of 1,693 ‘events’: only names and very much focused on 17th century Holland • Our goal: • create a list of historically relevant events • provide actors, locations, times & types for each event Image src: http://www.collinsdictionary.com/static/graphics/default.png
  • 14. First Attempt • Pattern-based event-name extraction • In Dutch Wikipedia we found 2,444 event candidates • 1,209 (56.3%) correct • 169 (13.9%) partially correct • Off-the-shelf named entity recognition (P/R/F1) • Person 77/77/77 • Location 75/58/66 • Organisation 32/37/34 Image src: http://www.spaceg.com/multimedia/collection/explosions/atomic%20explosion%205.jpg
  • 15. First Attempt • Co-occurrence based event-relation finder • only actor, location and/or date found for 392 events • 49.6% actor is correct • 41.1% location is correct • 51.5% date is correct Image src: http://www.spaceg.com/multimedia/collection/explosions/atomic%20explosion%205.jpg
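
The slides do not spell out how the co-occurrence based relation finder works, so the following is only a sketch of the general idea: for each entity type, pick the entity that most often appears in the same sentence as the event name. The function name `cooccurring_entities`, the entity type labels and the toy sentences are all hypothetical, and the entities are assumed to be already tagged by a named entity recogniser.

```python
from collections import Counter


def cooccurring_entities(event_name, sentences):
    """Return the most frequent co-occurring entity per entity type.

    `sentences` is a list of (text, entities) pairs, where `entities`
    is a list of (surface_form, entity_type) tuples.
    """
    counts = {}
    for text, entities in sentences:
        if event_name not in text:
            continue  # only sentences that mention the event count
        for surface, entity_type in entities:
            counts.setdefault(entity_type, Counter())[surface] += 1
    return {entity_type: counter.most_common(1)[0][0]
            for entity_type, counter in counts.items()}


# Toy input, invented for illustration only.
SENTENCES = [
    ("The Battle of Stalingrad started in 1942.",
     [("1942", "DATE")]),
    ("The Red Army won the Battle of Stalingrad in 1943.",
     [("Red Army", "ACTOR"), ("1943", "DATE")]),
    ("The Battle of Stalingrad was fought in Stalingrad from 1942.",
     [("Stalingrad", "LOCATION"), ("1942", "DATE")]),
    ("The Red Army later advanced west.",
     [("Red Army", "ACTOR")]),
]

RELATIONS = cooccurring_entities("Battle of Stalingrad", SENTENCES)
```

A counting approach like this needs the same event to be mentioned several times before the right entity wins, so it can only work for events that are written about often.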
  • 16. First Attempt • Problems with event element recognition: • Shallow grammatical processing (post-war rebuilding and during the North Sea flood recognised as 1 event) • Missing locations (Battle of LOC pattern fails) • No distinction between entities and action nouns (German Occupancy vs German Occupants look the same for the approach) • Named Entity Recogniser not suited for domain Image src: http://www.spaceg.com/multimedia/collection/explosions/atomic%20explosion%205.jpg
  • 17. First Attempt • Problems with the event relation finder: • Relies on redundancy in the data, only works for ‘popular’ events • Too coarse-grained (who were the actors/locations in WWII) • Evaluation is hard! Image src: http://www.spaceg.com/multimedia/collection/explosions/atomic%20explosion%205.jpg
  • 18. Back to the drawing board... • Analysis of event names • Combinations of sortal nouns with a PP and a named entity, e.g., Battle of Stalingrad, Death of John Lennon • Combinations of nominalized verbs with a PP and a named entity, e.g., Excavation of Troy, Election of Obama • Combinations of a referential adjective with an event type and named entity, e.g., the American invasion of Iraq • Transparent proper names: Great War • Opaque proper names: event names that cannot be decomposed on morphological grounds, e.g., Holocaust, Spanish Fury Image src: http://www.northescambia.com/wp-content/uploads/2010/01/molinotrashfire10.jpg
  • 19. Back to the drawing board... • Improve Named Entity Recognition • Add gazetteers for historical names • Post-processing for titles and improved NE boundaries Image src: http://www.northescambia.com/wp-content/uploads/2010/01/molinotrashfire10.jpg
  • 20. Back to the drawing board... • Finding Event Relations • Use the structure of Wikipedia/DBpedia • Shallow parsing • Hierarchies of actors & locations Image src: http://www.northescambia.com/wp-content/uploads/2010/01/molinotrashfire10.jpg
  • 21. Current Work
        Class          Spotlight (P/R/F1)   Stanford (P/R/F1)    Freire (P/R/F1)
        Person         54.05/7.52/13.20     58.60/34.46/43.40    79.17/71.16/74.95
        Location       64.52/30.77/41.67    67.19/66.15/66.67    80.00/61.54/69.57
        Organisation   0/0/0                9.78/25.71/14.17     89.66/74.29/81.25
    • Still some work to be done, but Freire et al. (2012) show that smart features can work with small amounts of training data • Combine classifiers • Add post-processing • MISC class remains to be done...
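
The P/R/F1 triples in the results for the three recognisers are related by the standard F1 formula, the harmonic mean of precision and recall. A short sketch that recomputes a few of the reported F1 values from the reported precision and recall (the function name `f1` and the `REPORTED` structure are illustrative, the numbers are copied from the slide):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (system, class): (precision, recall, reported F1), copied from slide 21.
REPORTED = {
    ("Spotlight", "Person"): (54.05, 7.52, 13.20),
    ("Stanford", "Location"): (67.19, 66.15, 66.67),
    ("Freire", "Person"): (79.17, 71.16, 74.95),
    ("Freire", "Organisation"): (89.66, 74.29, 81.25),
    ("Spotlight", "Organisation"): (0.0, 0.0, 0.0),
}

for (system, label), (p, r, reported) in REPORTED.items():
    # The recomputed value should match the slide up to rounding.
    print(f"{system:10s} {label:13s} F1={f1(p, r):6.2f} (slide: {reported:.2f})")
```

The Spotlight Person row shows why F1 is reported next to precision: a precision of 54.05 with a recall of only 7.52 yields an F1 of about 13.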
  • 22. Current Work
    [CoNLL2003]
        Word      POS   CHUNK  NER
        U.N.      NNP   I-NP   I-ORG
        official  NN    I-NP   O
        Ekeus     NNP   I-NP   I-PER
        heads     VBZ   I-VP   O
        for       IN    I-PP   O
        Baghdad   NNP   I-NP   I-LOC
        .         .     O      O
    [Freire et al. 2012]
        focus,minthree,mintwo,minone,plusone,plustwo,fnfreq,lnfreq,ncfreq,orgfreq,geo,n,v,a,adv,pn,cap,allcaps,beg,end,length,capfreq,class
        "is","wood",")","and","painted","dark",0,0,0,2.45253198865684,0,0,0,1,0,0,0,0,0,0,2,0,"O"
        "painted",")","and","is","dark","grey",0,0,0,0,0,0,0,0,1,0,0,0,0,0,7,0,"O"
        "dark","and","is","painted","grey",".",0,0,0,0.493875418347986,0,0,1,0,1,0,0,0,0,0,4,0,"O"
        "grey","is","painted","dark",".","William",0,0,0,0.0768052510316108,0,1,1,1,1,0,0,0,0,0,4,0,"O"
        ".","painted","dark","grey","William","Herschel",0,0,0,2.36647279037729,0,0,0,0,0,0,0,1,0,0,1,0,"O"
        "William","dark","grey",".","Herschel","made",8.2034429051892,3.27892030900003,0,4.67158565874127,0,0,0,0,0,0,1,0,0,0,7,0,"B-PER"
        "Herschel","grey",".","William","made","many",2.36726761611533,2.39936346938848,0,0.443930767784,0,1,1,0,0,0,1,0,0,0,8,0,"I-PER"
        "made",".","William","Herschel","many","telescopes",0,0,0,0.493875418347986,0,0,0,1,1,0,0,0,0,0,4,0,"O"
        "many","William","Herschel","made","telescopes","of",0,0,0,0.0768052510316108,0,0,0,0,1,0,0,0,0,0,4,0,"O"
        "telescopes","Herschel","made","many","of","this",0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,0,"O"
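
In the [Freire et al. 2012] rows, the first six columns are the focus token, the three tokens before it and the two tokens after it, and the `length` column matches the character length of the focus token. The sketch below rebuilds only those columns plus `cap` and `allcaps`. The definitions of `cap` (first character is upper case) and `allcaps` (whole token is upper case) are assumptions read off the column names, the padding symbol `_` is hypothetical, and the frequency and part-of-speech features are left out because the slide does not say how they are computed.

```python
PAD = "_"  # hypothetical padding symbol for positions outside the text

# Token sequence recovered from the feature rows on the slide.
TOKENS = ["wood", ")", "and", "is", "painted", "dark", "grey", ".",
          "William", "Herschel", "made", "many", "telescopes", "of", "this"]


def window_features(tokens, i):
    """Build the context-window and surface features for tokens[i]."""
    def at(j):
        # Return the token at position j, or PAD outside the sequence.
        return tokens[j] if 0 <= j < len(tokens) else PAD

    focus = tokens[i]
    return {
        "focus": focus,
        "minthree": at(i - 3),
        "mintwo": at(i - 2),
        "minone": at(i - 1),
        "plusone": at(i + 1),
        "plustwo": at(i + 2),
        "cap": int(focus[:1].isupper()),   # assumed definition
        "allcaps": int(focus.isupper()),   # assumed definition
        "length": len(focus),
    }


WILLIAM = window_features(TOKENS, TOKENS.index("William"))
```

For the token `William` this reproduces the corresponding columns of the `B-PER` row: the context `dark`, `grey`, `.`, `Herschel`, `made`, with `cap` 1, `allcaps` 0 and `length` 7.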
  • 23. Current Work • Build smarter extractors for event names • First focus on ‘regular’ event names (e.g., Battle of LOC, War of YEAR) • Use knowledge about action nouns vs static nouns (WordNet)
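
A rough illustration of what an extractor for 'regular' event names such as Battle of LOC or War of YEAR could look like. This is a sketch with English examples and a plain regular expression, not the extractor used in the project: the head-noun list `EVENT_HEADS` is hypothetical, and a capitalised word sequence stands in for a proper LOC or PER tag from a named entity recogniser.

```python
import re

# Hypothetical list of event head nouns (sortal nouns and nominalized verbs).
EVENT_HEADS = ("Battle", "Siege", "War", "Death", "Election", "Excavation")

# "<head> of <YEAR>" or "<head> of <Capitalised Name>".
EVENT_NAME = re.compile(
    r"\b(?:%s) of (?:\d{4}|[A-Z][a-z]+(?: [A-Z][a-z]+)*)" % "|".join(EVENT_HEADS)
)


def extract_event_names(text):
    """Return all substrings of `text` that match the event-name pattern."""
    return [match.group(0) for match in EVENT_NAME.finditer(text)]


EXAMPLE = ("After the Battle of Stalingrad and the War of 1812, "
           "the Death of John Lennon was discussed.")
FOUND = extract_event_names(EXAMPLE)
```

The pattern is deliberately narrow, in line with optimising for precision: a name such as "Battle of the Bulge" is missed because of the article, and separating action nouns from static nouns would need a lexical resource such as WordNet rather than a fixed list.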
  • 24. The Story So Far • It takes time to learn to communicate in an interdisciplinary project • Don’t try to solve too much in one go • Cycles of error analysis • Domain adaptation is difficult: optimise for precision
  • 25. Outlook • Redesign of Agora demo (new version autumn/winter) • Include different perspectives (together with Semantics of History) • Ship model use case • Historical Named Entity Recognition for English & Dutch • 2nd round user studies (spring 2013)
  • 26. Questions? marieke@cs.vu.nl http://www.cs.vu.nl/~marieke Image src: http://www.rijksmuseum.nl/collectie/SK-A-2963/portret-van-don-ram%C3%B3n-satu%C3%A9-1765-1824 Image source: http://www.amichelleblakeley.com/storage/question%20marks.jpg?__SQUARESPACE_CACHEVERSION=1295297003883