BIS – 2013/04/15 – Page 1 http://lod2.eu
Creating Knowledge out of Interlinked Data
AKSW, Universität Leipzig
Sebastian Hellmann
PhD thesis intermediate report
NLP Interchange Format (NIF) 2.0
http://nlp2rdf.org
http://lod2.eu
http://slideshare.net/kurzum
DISCLAIMER:
this presentation is work in progress, example RDF is outdated
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to
achieve interoperability between Natural Language Processing (NLP) tools,
language resources and annotations.
• NIF 2.0 will be published in 6-8 weeks
• Very likely to become the de facto standard for modelling RDF tool
output in the NLP domain
NLP Interchange Format 2.0
Introduction
Components have pre- and postconditions
Auto-configuration is theoretically possible, but in reality it takes a lot of manual work
Huge potential to save time and money at the interfaces
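To make the idea of pre- and postconditions concrete, here is a minimal sketch of how a pipeline could be ordered automatically once every component declares what it needs and what it produces. The component names, the layer names and the function plan_pipeline are hypothetical and only serve as an illustration; they are not part of NIF.

```python
def plan_pipeline(components, available):
    """Order components so that each one's preconditions are met before it runs.

    components: dict mapping a name to (preconditions, postconditions)
    available:  annotation layers present before any component runs
    """
    have = set(available)
    pending = dict(components)
    order = []
    while pending:
        # Components whose required layers are all available right now.
        runnable = [name for name, (pre, _) in pending.items() if set(pre) <= have]
        if not runnable:
            raise ValueError(f"unsatisfiable preconditions: {sorted(pending)}")
        for name in runnable:
            order.append(name)
            have |= set(pending.pop(name)[1])
    return order


# Hypothetical components and layer names, for illustration only.
components = {
    "parser": ({"tokens", "pos"}, {"parse"}),
    "tagger": ({"tokens"}, {"pos"}),
    "tokenizer": ({"text"}, {"tokens"}),
}
print(plan_pipeline(components, {"text"}))
```

Such planning only works if all tools describe their input and output in a shared vocabulary, which is the gap a common format is meant to close.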
Core problems:
1. Too much heterogeneity
2. Almost no standards available
3. No open collaboration
4. Difficult and large domain
Problem analysis
Technical heterogeneity
• Technologies: XML, Relational Databases, CSV, DOC, PDF
• Similar to other domains
• Formats: Negra, CoNLL, GrAF, Paula, CAS (UIMA), Penn
• Virtually every tool implements readers for 5-6 of these formats plus its
own serialization
• Programming languages: Java, Python, ...
• Java predominates
Problem analysis
Domain heterogeneity
• Multilingualism
• Over 100 part-of-speech tags (several for each language)
• No open mappings exist
• About 20 different tasks listed on:
http://en.wikipedia.org/wiki/Natural_language_processing#Major_tasks_in_NLP
• Natural language is a difficult topic:
• The roulette dealer siad: “Rien ne va plus!”
– 8 words, 4 French, 4 English, one spelling mistake, impossible to
decide the language of the whole.
• Ban on Nude Dancing on Governor's Desk
Problem analysis
Open collaboration
• LAF/GrAF is a recently released ISO standard
• But it is not open (60 Euros to view the document)
• Not in RDF (the main requirement for any Semantic Web tool)
• Large frameworks tend to only be “inward” compatible
• UIMA advocates say: “Why don't you just use UIMA?”
• Gate advocates: “Integrate it into GATE!”
• Generally, a large time investment and lock-in
Problem analysis
Summary:
Hardly any reusability
• Free software (as in free beer), but no open licenses
• No standards and no mappings
• Integration is hard-wired (you have to write software)
Problem analysis
• Definition for text normalization + URI Schemes (give URIs to Strings)
• NIF Core Ontology: default vocabulary for the most frequently used annotations
• Predefined modules for most use cases
• Infrastructure
• for open collaboration / discussion
• persistent hosting
• validation and demo services
• Reference implementation
• Data conversion
NIF Overview
Text Normalization + URI Schemes
NIF 1.0: http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729
NIF 2.0 uses RFC 5147 as base form:
http://www.w3.org/DesignIssues/LinkedData.html#char=717,729
User extensions possible:
http://www.w3.org/DesignIssues/LinkedData.html#your_own_scheme
(but you have to link to documentation on how it was created)
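The two URI forms above can be sketched as follows. This is a minimal illustration, assuming that begin and end are character positions counted between characters starting at 0 (as in RFC 5147, which matches Python slicing) and that NIF 1.0 offsets use the same indexing. The function names are hypothetical and do not come from any NIF library.

```python
import re


def nif2_uri(document_uri: str, begin: int, end: int) -> str:
    """Build an RFC 5147 style URI (NIF 2.0 base form) for text[begin:end]."""
    if not 0 <= begin <= end:
        raise ValueError("expected 0 <= begin <= end")
    return f"{document_uri}#char={begin},{end}"


# Fragment of a NIF 1.0 offset URI as shown on the slide.
_NIF1_FRAGMENT = re.compile(r"^offset_(\d+)_(\d+)$")


def upgrade_nif1_uri(uri: str) -> str:
    """Rewrite a NIF 1.0 '#offset_B_E' URI into the NIF 2.0 '#char=B,E' form."""
    base, sep, fragment = uri.partition("#")
    match = _NIF1_FRAGMENT.match(fragment)
    if not sep or match is None:
        raise ValueError(f"not a NIF 1.0 offset URI: {uri!r}")
    return nif2_uri(base, int(match.group(1)), int(match.group(2)))


doc = "http://www.w3.org/DesignIssues/LinkedData.html"
print(nif2_uri(doc, 717, 729))
print(upgrade_nif1_uri(doc + "#offset_717_729"))
```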
As a Web Service
curl \
  --data-urlencode prefix="http://prefix.given.by/theClient#" \
  --data-urlencode input="[...]" \
  --data-urlencode source="http://www.w3.org/DesignIssues/LinkedData.html" \
  http://nlp2rdf.lod2.eu/demo/NIFStanfordCore
(the source parameter is optional)
The new namespace is http://persistence.uni-leipzig.org/nlp2rdf/nif-core#
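The same call can be prepared from Python. The sketch below only builds the POST request with the form fields from the curl example (prefix, input and the optional source) and does not send it; whether the demo endpoint is reachable is not checked here. The function name build_nif_request and the sample sentence are assumptions made for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

SERVICE = "http://nlp2rdf.lod2.eu/demo/NIFStanfordCore"


def build_nif_request(text, prefix, source=None):
    """Build a form-encoded POST request carrying the fields of the curl call."""
    params = {"prefix": prefix, "input": text}
    if source is not None:
        params["source"] = source  # optional, as in the curl example
    body = urlencode(params).encode("utf-8")
    return Request(SERVICE, data=body, method="POST")


req = build_nif_request(
    "The roulette dealer said: Rien ne va plus!",  # illustrative input text
    "http://prefix.given.by/theClient#",
)
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
# To actually send it: urllib.request.urlopen(req).read()
```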
Ontologies:
• NIF Core Ontology (URI Scheme, String, Context, but also Token, Sentence,
lemma, stem, etc.) for frequently used annotations.
• Simple Error Ontology to describe errors (fatal, message, timestamp)
• Vocabulary Modules for each purpose or ontology or project
Overview of Ontologies
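As a rough illustration of how String and Context fit together with the URI scheme, the sketch below emits Turtle for one substring of a text. Only the namespace, nif:String and nif:Context are mentioned in these slides; the property names nif:isString, nif:referenceContext, nif:beginIndex, nif:endIndex and nif:anchorOf are assumptions that should be checked against the published ontology, as are the plain integer literals used for the indices. The function names are hypothetical.

```python
import json

NIF = "http://persistence.uni-leipzig.org/nlp2rdf/nif-core#"


def turtle_literal(value):
    """Quote a string for Turtle; JSON escaping is sufficient for simple text."""
    return json.dumps(value, ensure_ascii=False)


def annotate(prefix, text, begin, end):
    """Return Turtle lines describing text[begin:end] inside its context."""
    context = f"<{prefix}char=0,{len(text)}>"
    substring = f"<{prefix}char={begin},{end}>"
    return [
        f"@prefix nif: <{NIF}> .",
        f"{context} a nif:Context ;",
        f"    nif:isString {turtle_literal(text)} .",
        f"{substring} a nif:String ;",
        f"    nif:referenceContext {context} ;",  # assumed property name
        f"    nif:beginIndex {begin} ;",          # assumed property name
        f"    nif:endIndex {end} ;",              # assumed property name
        f"    nif:anchorOf {turtle_literal(text[begin:end])} .",
    ]


print("\n".join(annotate("http://example.org/doc#", "Rien ne va plus!", 0, 4)))
```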
Each ontology consists of three sets of axioms:
- Terminology model (definitions)
- Inference model (especially transitivity)
- Validation model (consistency)
1) nif-core.ttl
2) nif-core-inf.ttl imports 1
3) nif-core-val.ttl imports 1 and 2
Logical Modularity
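The import chain above determines which files a consumer has to load for a given level. A minimal sketch, using only the three file names from the slide; the function load_order is hypothetical and assumes the import graph has no cycles.

```python
# Import relations as listed on the slide.
IMPORTS = {
    "nif-core.ttl": [],
    "nif-core-inf.ttl": ["nif-core.ttl"],
    "nif-core-val.ttl": ["nif-core.ttl", "nif-core-inf.ttl"],
}


def load_order(target, imports=IMPORTS):
    """Return the files needed for `target`, dependencies first."""
    ordered = []

    def visit(name):
        if name in ordered:
            return
        for dependency in imports[name]:
            visit(dependency)
        ordered.append(name)

    visit(target)
    return ordered


print(load_order("nif-core-val.ttl"))
```

A client that only needs the definitions loads nif-core.ttl alone; a validator loads all three.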
NIF simple:
• Only one truth
• Easy to understand and to query
• Least amount of triples
NIF + Stanbol (Apache Project)
• Several ranked alternatives
• Provenance of annotations
• In collaboration with Apache Stanbol
Open Annotation (W3C group)
• Rich model
• Not only text, but everything (images)
Granularity Modularity
Well-defined conversions between the different levels:
Towards the richer models:
- more triples
- more complexity
- worse usability
- lossless conversion
Towards the simpler models:
- easier queries
- higher performance
- lossy conversion
Structural Interoperability:
- URI schemes provide normalization
- RDF provides the graph data model
- OWL provides the logical model
Conceptual Interoperability
- NIF Core Ontology and mapping to the most frequently used annotations,
e.g. lemma, stem
- Vocabulary Module to include other terminologies and ontologies
Interoperability
• ITS 2.0
• FISE used in Apache Stanbol (IKS-EU Project)
• LAF/GrAF XML – ISO standard, recently published
• Fragment Identifiers by IETF and W3C
• Lemon ontology from Monnet EU Project
• NERD ontology from EURECOM and LinkedTV EU Project
• Xpointer/XPath URI scheme
• Open Annotation
• ISOCat
NIF 2.0 tries to be compatible with these (via Vocabulary Modules)
• Tibeto-Burman languages: http://purl.org/olia/tibet.owl#VNst
• Russian TreeTagger :
http://purl.org/olia/russ.owl#partizip_prt_sg_neut_passiv_gen_langform
• German STTS: http://purl.org/olia/stts.owl#VAPP
• English Penn: http://purl.org/olia/penn.owl#VBG
→ all map to http://purl.org/olia/olia.owl#NonFiniteVerb
The Ontologies of Linguistic Annotation (OLiA) contain mappings for over 50 tagsets
(free and open, CC-BY)
Vocabulary Module: OLiA
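The practical effect of the mapping can be sketched with a plain lookup table. The four tag URIs and the target class are the ones listed above; the flat dictionary and the function same_category are simplifying assumptions, since OLiA itself expresses these mappings as ontology axioms rather than a table.

```python
OLIA = "http://purl.org/olia/olia.owl#"

# Tagset-specific URIs from the slide, all mapped to the same OLiA class.
TAG_TO_OLIA = {
    "http://purl.org/olia/tibet.owl#VNst": OLIA + "NonFiniteVerb",
    "http://purl.org/olia/russ.owl#partizip_prt_sg_neut_passiv_gen_langform": OLIA + "NonFiniteVerb",
    "http://purl.org/olia/stts.owl#VAPP": OLIA + "NonFiniteVerb",
    "http://purl.org/olia/penn.owl#VBG": OLIA + "NonFiniteVerb",
}


def same_category(tag_a, tag_b, mapping=TAG_TO_OLIA):
    """True if both tags are known and map to the same OLiA class."""
    category_a = mapping.get(tag_a)
    category_b = mapping.get(tag_b)
    return category_a is not None and category_a == category_b


print(same_category(
    "http://purl.org/olia/stts.owl#VAPP",
    "http://purl.org/olia/penn.owl#VBG",
))
```

With such a mapping, a query for non-finite verbs works across German STTS and English Penn output without knowing either tagset.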
NIF can be extended by Vocabulary Modules
OLiA
http://purl.org/olia
Conceptual Interoperability
• Java-Maven implementation
• PHP implementation
• Reference implementations: DBpedia Spotlight, Stanford Parser, Korean POS
tagger, Keyword Search
• Wiki: http://wiki.nlp2rdf.org
• Validators
• Code generators (convert vocabulary modules to code stubs)
• NIF is free and open (CC-0 / CC-BY / Apache)
• All ontologies will be hosted persistently by the University of Leipzig
• http://persistence.uni-leipzig.org/nlp2rdf/
NIF 2.0 Infrastructure for adoption
• Huge collection of use cases
• e.g. Ali wants to exchange different NLP services for RDFace
• LOD2 from Wolters Kluwer
• A selection will be implemented. Assumption:
• NIF is good, if it fulfills many use cases
Evaluation 1
• There are about 10 to 20 third party implementations
Evaluation 2
Analysis of existing frameworks and formats. Criteria:
• Convertibility (Adequacy)
• Do the graph models match?
• Coverage
• Quantitative analysis of used annotations
• Does NIF Core provide terms for the most common annotations, are
there any gaps?
Evaluation 3
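The coverage criterion can be sketched as a simple computation: count how often each annotation type occurs in existing data, then check which share NIF Core has terms for and which types are gaps. The counts below are made-up placeholder numbers, not results; the term set uses only names mentioned on the ontology overview slide, and the function coverage is hypothetical.

```python
from collections import Counter


def coverage(observed, core_terms):
    """Return (share of annotation occurrences covered, sorted list of gaps)."""
    total = sum(observed.values())
    covered = sum(count for term, count in observed.items() if term in core_terms)
    gaps = sorted(term for term in observed if term not in core_terms)
    return (covered / total if total else 0.0), gaps


# Placeholder counts for illustration only; not measured data.
observed = Counter({"Token": 50, "Sentence": 30, "lemma": 15, "dependency": 5})
core_terms = {"Token", "Sentence", "lemma", "stem"}

share, gaps = coverage(observed, core_terms)
print(share, gaps)
```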
Data Conversion
Data is available as
free, open, interoperable (FOI) language resources at
http://linguistics.okfn.org/resources/llod/
(work in progress)
Project has a very good impact:
• Many adopters
• Industrial uptake
• Inclusion in a W3C standard for ITS 2.0:
http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html
• Several projects involved as stakeholders (LOD2, Monnet, ...)
• Several motivated open-source developers
• Funding is coming in
Critical judgement
Scientific merit?
• Provides scientific infrastructure
• Easier to write and combine software
• Free, open, interoperable (FOI) language resources
• Free, open NLP test benchmarks (Future work)
• What part is scientific and what part is community work and negotiation?
• No progress on the state of the art in NLP methods yet
• Difficult to judge where to put the emphasis. Many “soft evaluation”
topics, no key performance indicators (KPIs).
Critical judgement
• 2011: Open Knowledge Conference
• 2012: Workshop and book “Linked Data in Linguistics”
• 2012: Linked Data Cup @ I-Semantics
• 2012: Web of Linked Entities @ ISWC
• 2012: MLODE @ Sabre
• 2013: Semantic Web Journal: Special Issue on Multilingual Linked Open Data
(MLOD)
• Future work: DBpedia & NLP @ ISWC 2013
Conference + Workshops + Proceedings
Thanks for your attention
