Reusing Linguistic Resources
Tasks and Goals for a Linked Data Approach


              Marieke van Erp
              marieke@cs.vu.nl



                    LDL2012
Introduction

• BA, MA & PhD in computational
  linguistics/information extraction
  @Tilburg University

• Since 2009: SemWeb group
  @VU University Amsterdam
Why Reuse Linguistic Resources?
• Linguistic resources are
    expensive to create
•   ...and difficult to use for
    ‘outsiders’

• How can we reach out to the
    ‘outside world’?



                                 Image Source: http://cyberbrethren.com/wp-content/uploads/2012/02/language1.jp
Make reuse easier!


• Increased visibility
• Social value:
     • stimulates collaboration
     • accelerates innovation
• External quality control




                                  Image Source: http://th02.deviantart.net/fs71/PRE/i/2010/146/b/3/DON__T_PANIC_by_VigilantMeadow.jpg
What’s holding us back?



• Fear?
• Habit?




             Image Source: http://mindfulbalance.files.wordpress.com/2011/02/hesitate1.jpg
Practical Constraints

1. Task specificity
2. Formats
3. Different conceptual
   models
4. No machine-readable
   definitions
5. Lack of metadata



                          Image Source: http://bogdankipko.com/wp-content/uploads/2011/12/barriers.jpg
1. Task-specificity


• Resources are often geared
  towards one specific task
  e.g., part-of-speech tagging,
  named entity recognition

• How can we make our
  resources more flexible?



                                  Image Source: http://thelearnersguild.files.wordpress.com/2008/07/the-informal-learners-toolkit1.jpg
2. Formats

• XML, inline XML, CSV, one
  word per line, one sentence
  per line, slashtags, ARFF,
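
Two of the formats listed above can be read into one shared structure with very little code. The following is a minimal sketch, not a reference converter: it assumes slashtags look like `word/TAG` and that the one-word-per-line format separates word and tag with a tab. Both layouts, and the function names, are assumptions made for illustration.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, tag)


def parse_slashtags(line: str) -> List[Token]:
    """Parse a 'word/TAG word/TAG' sentence into (word, tag) pairs."""
    tokens = []
    for item in line.split():
        # Split on the last slash so words that contain '/' stay intact.
        word, _, tag = item.rpartition("/")
        tokens.append((word, tag))
    return tokens


def parse_word_per_line(text: str) -> List[Token]:
    """Parse 'word<TAB>TAG' lines (one token per line) into (word, tag) pairs."""
    tokens = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, tag = line.split("\t")
        tokens.append((word, tag))
    return tokens


# The same (hypothetical) tagged sentence in both formats.
slash = "Obama/NNP signed/VBD the/DT act/NN"
per_line = "Obama\tNNP\nsigned\tVBD\nthe\tDT\nact\tNN\n"

same = parse_slashtags(slash) == parse_word_per_line(per_line)
```

Once both inputs are lists of (word, tag) pairs, any downstream tool only has to understand one structure.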




                                Image Source: http://www.elec-intro.com/EX/05-13-03/kf_compact_data.jpg
3. Conceptual Models
• An NP is an NP is an NP?
• “President Obama signed the
  National Defense
  Authorization Act after
  months of debate”
  • NE: “President Obama”?
  • NE: “Obama”?
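
The disagreement above can be made concrete by writing both candidate entities as character spans over the same sentence. This is a sketch only: the `Span` class, the `PER` label, and the helper names are hypothetical and do not come from any particular annotation scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str


def find_span(text: str, phrase: str, label: str) -> Span:
    """Locate the first occurrence of phrase in text and wrap it as a Span."""
    start = text.index(phrase)
    return Span(start, start + len(phrase), label)


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character."""
    return a.start < b.end and b.start < a.end


SENTENCE = ("President Obama signed the National Defense "
            "Authorization Act after months of debate")

# Two resources annotating the same mention under different guidelines.
with_title = find_span(SENTENCE, "President Obama", "PER")
name_only = find_span(SENTENCE, "Obama", "PER")

exact_match = with_title == name_only          # False: boundaries differ
same_mention = overlaps(with_title, name_only)  # True: spans overlap
```

An exact-match comparison treats the two annotations as unrelated, while an overlap test shows that they point at the same mention.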

                                Image Source: http://www.w3.org/2001/sw/BestPractices/WNET/wordnet-sw-20040713-fig01.png
4. Lack of Machine-Readable Definitions
• For integration or reuse
  manual effort is needed
  • time consuming
  • difficult to track definitions
  • not scalable




                                   Image Source: http://www.barcode1.co.uk/images/samplejplarge.jpg
5. Lack of Metadata

• Can I trust this data provider?
• How was this data created?
• How many annotators?
  • for the entire data set?
  • per instance?
• If generated automatically,
  what were the parameters?



                                    Image Source: http://darwin-online.org.uk/converted/published/1859_Origin_F373/1859_Origin_F373_fig02.jpg
A Linked Data Approach
• Linked Data is not a magic
  solution to all problems

• ...but it is better than what
  we’ve got at this moment




                                  Image Source: http://linkeddata.org/static/images/lod-datasets_2009-07-14_cropped.png
1. Using RDF

• RDF is not inherently better
  than some other formats, but
  it is used by many

• + SPARQL makes it easy to
  retrieve data
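
To give a feel for the kind of retrieval meant here, the sketch below stores annotations as subject-predicate-object triples and answers a query by pattern matching. It is plain Python standing in for an RDF store and a SPARQL engine, not a real one; the `http://example.org/` namespace, the property names, and the `match` helper are all hypothetical.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

EX = "http://example.org/"  # hypothetical namespace, for illustration only

TRIPLES: List[Triple] = [
    (EX + "token1", EX + "word", "Obama"),
    (EX + "token1", EX + "posTag", "NNP"),
    (EX + "token2", EX + "word", "signed"),
    (EX + "token2", EX + "posTag", "VBD"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples that agree with every non-None position of the pattern."""
    for t in triples:
        if ((s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)):
            yield t


# Roughly what a SPARQL query such as
#   SELECT ?token WHERE { ?token ex:posTag "NNP" }
# would ask for.
proper_nouns = [s for s, _, _ in match(TRIPLES, p=EX + "posTag", o="NNP")]
```

A real SPARQL endpoint adds joins across patterns, filters, and querying over remote data, but the basic idea of leaving some positions open as variables is the same.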



                                 Image Source: http://www.247ha.com/images/rdf.jpg
2. Mapping Annotations
• A single conceptual
  model for all linguistic
  resources is not going
  to happen

• ...but can we spot the
  similarities between
  models and utilise
  that?
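
One way to use such similarities is to map each resource's labels onto a small shared set of categories rather than forcing one model on everyone. The sketch below does this for a few part-of-speech tags. The coarse category names are invented for illustration, and the tag readings (Penn Treebank `NNP` and `DT`, Brown `NP` and `AT`) should be checked against the tagset documentation before relying on them.

```python
from typing import Dict

# Partial, illustrative mappings from two tagsets to shared coarse categories.
PENN_TO_COARSE: Dict[str, str] = {
    "NN": "noun", "NNS": "noun", "NNP": "proper-noun",
    "DT": "determiner", "VB": "verb", "JJ": "adjective",
}
BROWN_TO_COARSE: Dict[str, str] = {
    "NN": "noun", "NNS": "noun", "NP": "proper-noun",
    "AT": "determiner", "VB": "verb", "JJ": "adjective",
}

UNMAPPED = "unmapped"


def to_coarse(tag: str, mapping: Dict[str, str]) -> str:
    """Return the shared category for a tag, or UNMAPPED if none is known."""
    return mapping.get(tag, UNMAPPED)


def same_category(penn_tag: str, brown_tag: str) -> bool:
    """True if both tags map to the same known shared category."""
    a = to_coarse(penn_tag, PENN_TO_COARSE)
    b = to_coarse(brown_tag, BROWN_TO_COARSE)
    return a != UNMAPPED and a == b
```

For example, `same_category("NNP", "NP")` treats the two proper-noun tags as equivalent even though the labels differ, while tags missing from either mapping are never counted as a match.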


                             Image Source: http://www.webology.org/2006/v3n3/images/sample.JPG
3. Grounding
• It’s only linked data if you link
  it to other sources

• Added bonus: automatic
  sense disambiguation + access
  to a wealth of extra
  knowledge about your data
  item


                                      Image Source: http://mj-services.com/wallpaper/More_WallPaper/Trees/Giants,%20Calaveras%20State%20Park%20-%201600x1200%20-%20ID%2015.jpg
4. Define Your Metadata
• Include your data model
• Preferably give each instance’s
  provenance
    • collection
    • annotation/creation
    • previous versions
    • confidence
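
The provenance items above could be attached to each instance along the following lines. This is a sketch with hypothetical field names and example values, not an existing metadata schema; the `trusted` helper shows one use a consumer might make of the confidence value.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Provenance:
    collection: str                      # where the source text came from
    created_by: str                      # annotator id or tool name + settings
    previous_versions: List[str] = field(default_factory=list)
    confidence: Optional[float] = None   # None if no confidence was recorded


@dataclass
class Annotation:
    text: str
    label: str
    provenance: Provenance


def trusted(annotations: List[Annotation], threshold: float) -> List[Annotation]:
    """Keep annotations whose recorded confidence is at least threshold."""
    return [
        a for a in annotations
        if a.provenance.confidence is not None
        and a.provenance.confidence >= threshold
    ]


# Hypothetical example instances.
manual = Annotation(
    "Obama", "PER",
    Provenance("news-sample", "annotator-A", ["v1"], confidence=0.95),
)
automatic = Annotation(
    "Act", "MISC",
    Provenance("news-sample", "tagger-X (default parameters)", confidence=0.40),
)
undocumented = Annotation("debate", "O", Provenance("news-sample", "unknown"))

reliable = trusted([manual, automatic, undocumented], threshold=0.8)
```

An instance with no recorded confidence is excluded rather than guessed at, which mirrors the question of whether a data provider can be trusted when the metadata is missing.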


                                    Image Source: http://www.wineaustralia.com/australia/Portals/2/November%20E-news/Wines%20of%20Provenance%20Final.jpg
Conclusions
• Look for similarities between
    resources
•   Say where your resource
    comes from
•   Use standards, or make it
    easy for others to convert
    your data to a standard
•   Link to other data


                                  Image Source: http://efr0702.files.wordpress.com/2012/02/puzzle.jpg
Questions?



marieke@cs.vu.nl
http://www.cs.vu.nl/~marieke

                                  Image Source: http://www.amichelleblakeley.com/storage/question%20marks.jpg?__SQUARESPACE_CACHEVERSION=1295297003883
Acknowledgment

• This work is funded by
  NWO in the CATCH
  programme, grant
  640.004.801

 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Ldl2012

  • 1. Reusing Linguistic Resources Tasks and Goals for a Linked Data Approach Marieke van Erp marieke@cs.vu.nl LDL2012
  • 2. Introduction • BA, MA & PhD compling/information extraction @Tilburg University • Since 2009: SemWeb group @VU University Amsterdam
  • 3. Why Reuse Linguistic Resources? • Linguistic resources are expensive to create • ...and difficult to use for ‘outsiders’ • How can we reach out to the ‘outside world’? Image Source: http://cyberbrethren.com/wp-content/uploads/2012/02/language1.jp
  • 4. Make reuse easier! • Increased visibility • Social value: • stimulates collaboration • accelerates innovation • External quality control Image Source: http://th02.deviantart.net/fs71/PRE/i/2010/146/b/3/DON__T_PANIC_by_VigilantMeadow.jpg
  • 5. What’s holding us back? • Fear? • Habit? Image Source: http://mindfulbalance.files.wordpress.com/2011/02/hesitate1.jpg
  • 6. Practical Constraints 1. Task specificity 2. Formats 3. Different conceptual models 4. No machine-readable definitions 5. Lack of metadata Image Source: http://bogdankipko.com/wp-content/uploads/2011/12/barriers.jpg
  • 7. 1. Task-specificity • Resources are often geared towards one specific task e.g., part-of-speech tagging, named entity recognition • How can we make our resources more flexible? Image Source: http://thelearnersguild.files.wordpress.com/2008/07/the-informal-learners-toolkit1.jpg
  • 8. 2. Formats • XML, inline XML, CSV, one word per line, one sentence per line, slashtags, ARFF, Image Source: http://www.elec-intro.com/EX/05-13-03/kf_compact_data.jpg
  • 9. 3. Conceptual Models • An NP is an NP is an NP? • “President Obama signed the National Defense Authorization Act after months of debate” • NE: “President Obama”? • NE: “Obama”? Image Source: http://www.w3.org/2001/sw/BestPractices/WNET/wordnet-sw-20040713-fig01.png
  • 10. 4. Lack of Machine-Readable definitions • For integration or reuse manual effort is needed • time consuming • difficult to track definitions • not scalable Image Source: http://www.barcode1.co.uk/images/samplejplarge.jpg
  • 11. 5. Lack of Metadata • Can I trust this data provider? • How was this data created? • How many annotators? • for the entire data set? • per instance? • If generated automatically, what were the parameters? Image Source: http://darwin-online.org.uk/converted/published/1859_Origin_F373/1859_Origin_F373_fig02.jpg
  • 12. A Linked Data Approach • Linked Data is not a magic solution to all problems • ...but it is better than what we’ve got at this moment Image Source: http://linkeddata.org/static/images/lod-datasets_2009-07-14_cropped.png
  • 13. 1. Using RDF • RDF is not inherently better than some other formats, but it is used by many • + SPARQL makes it easy to retrieve data Image Source: http://www.247ha.com/images/rdf.jpg
  • 14. 2. Mapping Annotations • A single conceptual model for all linguistic resources is not going to happen • ...but can we spot the similarities between models and utilise that? Image Source: http://www.webology.org/2006/v3n3/images/sample.JPG
  • 15. 3. Grounding • It’s only linked data if you link it to other sources • Added bonus: automatic sense disambiguation + access to a wealth of extra knowledge about your data item Image Source: http://mj-services.com/wallpaper/More_WallPaper/Trees/Giants,%20Calaveras%20State%20Park%20-%201600x1200%20-%20ID%2015.jpg
  • 16. 4. Define Your Metadata • Include your data model • Preferably give each instance’s provenance • collection • annotation/creation • previous versions • confidence Image Source: http://www.wineaustralia.com/australia/Portals/2/November%20E-news/Wines%20of%20Provenance%20Final.jpg
  • 17. Conclusions • Look for similarities between resources • Say where your resource comes from • Use standards, or make it easy for others to convert your data to a standard • Link to other data Image Source: http://efr0702.files.wordpress.com/2012/02/puzzle.jpg
  • 18. Questions? marieke@cs.vu.nl http://www.cs.vu.nl/~marieke Image Source: http://www.amichelleblakeley.com/storage/question%20marks.jpg?__SQUARESPACE_CACHEVERSION=1295297003883
  • 19. Acknowledgment • This work is funded by NWO in the CATCH programme, grant 640.004.801
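
The ideas on slides 13, 15 and 16 (triples, grounding, per-instance provenance) can be illustrated with a small sketch. It is a plain-Python stand-in and not an RDF library: triples are tuples, and the hypothetical `match` helper plays the role that a single SPARQL triple pattern would. Everything under `http://example.org/` (including the `refersTo`, `annotatorCount` and `confidence` predicates) and the provenance values are invented for illustration; the DBpedia URI is only an example of an external source one might link to.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace for a corpus; not part of the original slides.
EX = "http://example.org/corpus/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# One named-entity mention from the example sentence on slide 9,
# described as subject-predicate-object triples.
triples = [
    (EX + "mention1", RDF_TYPE, EX + "NamedEntity"),
    (EX + "mention1", EX + "surfaceForm", "President Obama"),
    # Grounding: link the mention to an external resource.
    (EX + "mention1", EX + "refersTo", "http://dbpedia.org/resource/Barack_Obama"),
    # Per-instance provenance (made-up values).
    (EX + "mention1", EX + "annotatorCount", "2"),
    (EX + "mention1", EX + "confidence", "0.9"),
]


def match(
    store: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for triple in store:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# Which external resources are the mentions grounded in?
grounded = [obj for _, _, obj in match(triples, p=EX + "refersTo")]

# Everything known about one mention, provenance included.
about_mention1 = list(match(triples, s=EX + "mention1"))
```

With a real triple store the same two lookups would be written as SPARQL queries; the point of the sketch is only that a uniform triple shape makes annotation, grounding and provenance retrievable in the same way.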