www.isocat.org




                         Linking to
                 Linguistic Data Categories
                          in ISOcat

                                              Menzo Windhouwer (a), Sue Ellen Wright (b)
                      (a) The Language Archive - MPI for Psycholinguistics, (b) Kent State University
                                        menzo.windhouwer@mpi.nl, sellenwright@gmail.com

                                 Outline
     • A short introduction to data categories
           – the ISOcat registry
     • How to refer to ISOcat data categories
           – using PIDs
           – from XML and RDF resources
     • Fine-tuning (personal) relationships between
       data categories
           – the RELcat registry
     • Status
     7-9 March 2012        Linked Data in Linguistics - DGfS 2012

          ISOcat: a Data Category Registry
     • An implementation of ISO 12620:2009
           – Terminology and other content and language resources —
             Specification of data categories and management of a Data
             Category Registry for language resources
                  • Successor to ISO 12620:1999 which contained a hardcoded list of
                    Data Categories
     • A data category
           – is the result of the specification of a given data field
           – an elementary descriptor in a linguistic structure or an
             annotation scheme



                       Data Category example
     • Data category: /Grammatical gender/
           – Administrative part:
                  • Identifier: grammaticalGender
                  • PID: http://www.isocat.org/datcat/DC-1297
           – Descriptive part:
                  • English definition: Category based on (depending on languages)
                    the natural distinction between sex and formal criteria.
                  • French definition: Catégorie fondée (selon la langue) sur la
                    distinction naturelle entre les sexes ou d'autres critères formels.
           – Conceptual domain:
                  • Morphosyntax conceptual domain:
                    /masculine/, /feminine/, /neuter/, /common/
           – Linguistic part:
                  • French conceptual domain: /masculine/, /feminine/


                         Data Category types
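     • Complex data categories
           – open: writtenForm (string)
           – closed: grammaticalGender (string), with the simple data categories
             masculine, feminine and neuter as its values
           – constrained: email (string), Constraint: .+@.+
     • Simple data categories: masculine, feminine, neuter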
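The three kinds of complex data category on this slide differ only in which values they accept. A minimal Python sketch of that distinction follows; the class name `ComplexDataCategory` and its fields are illustrative assumptions and not part of ISOcat or ISO 12620.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ComplexDataCategory:
    """Hypothetical model of a complex data category (not an ISOcat API)."""
    identifier: str
    kind: str                              # "open", "closed" or "constrained"
    values: FrozenSet[str] = frozenset()   # simple data categories (closed only)
    constraint: Optional[str] = None       # regular expression (constrained only)

    def accepts(self, value: str) -> bool:
        if self.kind == "open":
            # An open data category accepts any value of its data type.
            return True
        if self.kind == "closed":
            # A closed data category only accepts members of its value domain.
            return value in self.values
        if self.kind == "constrained":
            # A constrained data category accepts values matching its constraint.
            return re.fullmatch(self.constraint, value) is not None
        raise ValueError(f"unknown kind: {self.kind}")


# The three examples from the slide.
written_form = ComplexDataCategory("writtenForm", "open")
grammatical_gender = ComplexDataCategory(
    "grammaticalGender", "closed",
    values=frozenset({"masculine", "feminine", "neuter"}),
)
email = ComplexDataCategory("email", "constrained", constraint=r".+@.+")
```

Here `grammatical_gender.accepts("neuter")` holds, while `email.accepts("isocat")` does not, because the value lacks the `@` required by the constraint.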





                       Data Category types
     • Container data categories (diagram):
           – lexicon
                  • language
                       – japanese
                  • alphabet
                       – ipa
                  • entry
                       – lemma
                            • writtenForm





                  Data Category relationships
     • Value domain membership
     • Subsumption relationships between simple data
       categories (legacy)
     • Relationships between complex/container data
       categories are not stored in the DCR
     • Diagram: partOfSpeech (string) has /pronoun/ in its value domain;
       /pronoun/ subsumes /personal pronoun/





                       ISOcat: a Data Category Registry
     • You can:
           – Find Data Categories relevant for your resources and embed references to
             them so the semantics of (parts of) your resources are made explicit
                  • This can be supported by tools you use, e.g., ELAN, LEXUS and the CMDI Component Editor
                    directly interact with ISOcat
           – Interact with Data Category owners to improve (the coverage of) their Data
             Categories
           – Create (together with others) new Data Categories and/or selections needed
             for your resources and share those
           – Submit (your) Data Categories for standardization
                  • ISOcat is the DCR for ISO TC 37


           – Free of charge
           – Grass roots approach
                                                      www.isocat.org


                       The usage of data categories?
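     [Figure: two example schemas whose elements are linked to data categories]
     • A (schema for a) typological database
           – table Language with columns BWO (/wordOrder/) and genders (/grammaticalGender/)
     • A (schema for a) lexicon
           – Lexicon 1..* Lexical Entry (/partOfSpeech/)
           – Lexical Entry: Lemma (/writtenForm/), 1..* Form, 0..* Sense
           – Word Form (/writtenForm/, /grammaticalGender/)
           – further labels in the diagram: /lexicalType/ and a second 0..* multiplicity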

                 Referencing Data Categories
     • Each Data Category should be uniquely identifiable
           – Ambiguity: different domains use the same term but mean different
             ‘things’
           – Semantic rot: even in the same domain the meaning of a term
             changes over time
           – Persistence: for archived resources Data Category references should
             still be resolvable and point to the specification as it was at/close to
             time of creation

     • Persistent IDentifiers
           – ISO 24619:2011 Language resource management - Persistent
             identification and sustainable access (PISA)
           – ISOcat uses ‘cool URIs’:
               • http://www.isocat.org/datcat/DC-1297 (/grammaticalGender/)
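Because every PID follows the same pattern, a resource can store just the URI and a tool can recover the data category number from it. The sketch below assumes that all PIDs have the form `http://www.isocat.org/datcat/DC-<number>`, as in the examples on these slides; the helper names `make_pid` and `parse_pid` are hypothetical.

```python
import re

# Base of the data category PIDs as shown on the slide.
PID_BASE = "http://www.isocat.org/datcat/"
# Assumption: a PID is the base followed by "DC-" and a decimal number.
_PID_PATTERN = re.compile(r"^http://www\.isocat\.org/datcat/DC-(\d+)$")


def make_pid(number: int) -> str:
    """Build the PID for a data category number, e.g. 1297."""
    return f"{PID_BASE}DC-{number}"


def parse_pid(uri: str) -> int:
    """Return the data category number of a PID, or raise ValueError."""
    match = _PID_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"not an ISOcat data category PID: {uri!r}")
    return int(match.group(1))


grammatical_gender_pid = make_pid(1297)
```

The PID, and not the mnemonic identifier `grammaticalGender`, is what a resource should embed, since identifiers can be ambiguous across domains.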


           XML – DC Reference vocabulary
     • ISO 12620:2009 is rather XML oriented
           – why not RDF?
                  • history
                       – terminology management is a separate tradition from Semantic Web/Linked Data
                       – DCIF -> GMT (TMF) -> own XML vocabulary based on UML data model
                  • but there is an RDF representation
                       – needs to cover more of the data model
     • Annex A provides the DC reference vocabulary
           – dcr:datcat to link to any DC
           – dcr:valueDatcat to link to a simple DC
                                        www.isocat.org/12620/
     • Preferably annotate a schema, e.g., a Relax NG or W3C XML Schema
       document
     • XML vocabularies might also provide their own means to link to a data
       category
           – TBX XCS, TEI ODD, CMDI, ..., TEI (?)
     • (Semantics by reference)


                                  LMF Example
     <LexicalResource xmlns:dcr="http://www.isocat.org/ns/dcr">
       <GlobalInformation>
         <feat att="languageCoding" dcr:datcat=".../DC-2008" val="ISO 639-3"/>
       </GlobalInformation>
       <Lexicon>
         <feat att="language" dcr:datcat=".../DC-1969" val="eng"/>
         <LexicalEntry>
           <feat att="partOfSpeech" dcr:datcat=".../DC-1345"
               val="commonNoun" dcr:valueDatcat=".../DC-1256"/>
           <Lemma>
               <feat att="writtenForm" dcr:datcat=".../DC-1836"
                 val="clergyman"/>
           </Lemma>
           ...
           <WordForm>
               <feat att="writtenForm" dcr:datcat=".../DC-1836" val="clergymen"/>
               <feat att="grammaticalNumber" dcr:datcat=".../DC-1298"
                 val="plural" dcr:valueDatcat=".../DC-1354"/>
           </WordForm></LexicalEntry></Lexicon></LexicalResource>
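A consumer of such a lexicon can collect the data category references with any namespace-aware XML parser. The following Python sketch uses only the standard library and works on a shortened copy of the example above; the function name `collect_references` is an assumption, and the abbreviated `.../DC-…` values are kept exactly as they appear on the slide.

```python
import xml.etree.ElementTree as ET

# Namespace of the DC reference vocabulary, as declared in the LMF example.
DCR_NS = "http://www.isocat.org/ns/dcr"

# Shortened copy of the LMF example; the ".../" prefixes are elided on the slide.
LMF_FRAGMENT = """
<LexicalResource xmlns:dcr="http://www.isocat.org/ns/dcr">
  <Lexicon>
    <LexicalEntry>
      <feat att="partOfSpeech" dcr:datcat=".../DC-1345"
            val="commonNoun" dcr:valueDatcat=".../DC-1256"/>
      <Lemma>
        <feat att="writtenForm" dcr:datcat=".../DC-1836" val="clergyman"/>
      </Lemma>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>
"""


def collect_references(xml_text):
    """Return (att, datcat, valueDatcat) for every <feat> element, in document order."""
    root = ET.fromstring(xml_text)
    references = []
    for feat in root.iter("feat"):
        references.append((
            feat.get("att"),
            feat.get(f"{{{DCR_NS}}}datcat"),       # reference for the attribute
            feat.get(f"{{{DCR_NS}}}valueDatcat"),  # reference for the value, if any
        ))
    return references


references = collect_references(LMF_FRAGMENT)
```

Features with an open value domain, such as `writtenForm`, carry no `dcr:valueDatcat`, so the third element of their tuple is `None`.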

              RDF – DC annotation property
     • The dcr:datcat RDF annotation property mimics the DC
       Reference vocabulary
           – minimizes impact, i.e., allows the data model to use its own terminology
           – can be tuned using OWL (2) equivalentClass, equivalentProperty or sameAs
           – problem: annotating literals with simple Data Categories (names can be
             ambiguous)

     @prefix dcr: <http://www.isocat.org/ns/dcr.rdf#> .
     @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
     # Assumption: placeholder default namespace so that :headword and :partOfSpeech resolve
     @prefix : <http://example.org/lexicon#> .

     :headword dcr:datcat <http://www.isocat.org/datcat/DC-258> ;
              rdfs:label "head word"@en ;
              rdfs:comment "A lemma heading a dictionary entry."@en .

     :partOfSpeech dcr:datcat <http://www.isocat.org/datcat/DC-396> ;
               rdfs:label "part of speech"@en ;
               rdfs:comment "A category assigned to a word based on its grammatical and semantic properties."@en .


        RDF – directly use Data Category PIDs
     • Container Data Categories as RDF classes
     • Complex Data Categories as RDF properties
     • Simple Data Categories
           – as RDF literals
                  • problem: names can be ambiguous
           – as RDF classes
                  • (GrAF example <f name="" val=".../DC-3581"/> vs <f name="" val="plural noun"
                    dcr:datcat=".../DC-3581"/>)

     @prefix cat: <http://www.isocat.org/datcat/> .
     @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

     cat:DC-258 rdfs:label "head word"@en ;
                rdfs:comment "A lemma heading a dictionary entry."@en .

     cat:DC-396 rdfs:label "part of speech"@en ;
                rdfs:comment "A category assigned to a word based on its grammatical and semantic properties."@en .




                       Data Category Relations
     • In the linked data world it is natural to
       have ontological relationships next to
       structural ones
           – RDFS, OWL (2), SKOS, ...
     • But other resource/schema formats lack these
       features
     • Relationships between Data Categories (also
       across vocabularies) are important for
       federated search, i.e., to find semantically
       related resources in another archive

                       RELcat: a Relation Registry
     • Stores relationships among Data Categories and also with ‘other’ concept
       registries
           – Dublin Core, OLAC, GOLD
           – (OLiA, OntoLingAnnot)
           – relationships can be the individual view of a (group of) linguist(s)
                  • RELcat is a quad store (graph, subject, predicate, object)
     • Based on a ‘private’ relation type taxonomy so existing relationships
       specified in other vocabularies can easily be loaded
           – OWL (2), SKOS
           – normalized RELcat queries
     • The aim is to support various levels of traversing the semantic
       network, not formal reasoning
           – conflicting (theoretical) views
                  • (parameters of variation)
           – but within a known combination of relation sets, reasoning may well be possible
           – also targets semantic search outside of the RDF domain



                        Relation type taxonomy
     1.     related
          1.      same as (a symmetric and transitive relationship)
          2.      almost same as (a symmetric relationship)
          3.      broader than (a transitive relationship and the inverse of the
                  'narrower than' relationship)
                 1.    superclass of (a transitive relationship and the inverse of the 'subclass of'
                       relationship)
                 2.    has part (a transitive relationship and the inverse of the 'part of'
                       relationship)
                       1.   has direct part (the inverse of the 'direct part of' relationship)
          4.      narrower than (a transitive relationship and the inverse of the
                  'broader than' relationship)
                 1.    subclass of (a transitive relationship and the inverse of the 'superclass of'
                       relationship)
                 2.    part of (a transitive relationship and the inverse of the 'has part'
                       relationship)
                       1.   direct part of (the inverse of the 'has direct part' relationship)
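The taxonomy can be held as a simple parent table plus a table of inverses, which is enough to answer questions such as "is 'direct part of' a kind of 'narrower than'?". This is a sketch only: `sameAs` and `subClassOf` appear as RELcat relation names elsewhere in these slides, while the other camel-case names, and the helpers `generalizations` and `is_a`, are assumptions made for illustration.

```python
# Each relation type points to its parent in the taxonomy (None for the root).
PARENT = {
    "related": None,
    "sameAs": "related",
    "almostSameAs": "related",
    "broaderThan": "related",
    "superClassOf": "broaderThan",
    "hasPart": "broaderThan",
    "hasDirectPart": "hasPart",
    "narrowerThan": "related",
    "subClassOf": "narrowerThan",
    "partOf": "narrowerThan",
    "directPartOf": "partOf",
}

# Inverse pairs named in the taxonomy, stored in both directions.
_INVERSE_PAIRS = [
    ("broaderThan", "narrowerThan"),
    ("superClassOf", "subClassOf"),
    ("hasPart", "partOf"),
    ("hasDirectPart", "directPartOf"),
]
INVERSE = {}
for left, right in _INVERSE_PAIRS:
    INVERSE[left] = right
    INVERSE[right] = left


def generalizations(relation):
    """Return the relation followed by all its ancestors, most specific first."""
    chain = []
    while relation is not None:
        chain.append(relation)
        relation = PARENT[relation]
    return chain


def is_a(relation, ancestor):
    """True if `relation` equals `ancestor` or is a specialization of it."""
    return ancestor in generalizations(relation)
```

A query for 'narrower than' can then also accept statements made with `subClassOf`, `partOf` or `directPartOf`, which is one way to read "various levels of traversing the semantic network".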




                             Relation set
     @prefix relcat: <http://www.isocat.org/relcat/set/> .
     @prefix rel: <http://www.isocat.org/relcat/relations#> .
     @prefix dc: <http://purl.org/dc/elements/1.1/> .
     @prefix cat: <http://www.isocat.org/datcat/> .

     relcat:cmdi {
             cat:DC-2573 rel:sameAs dc:identifier .
             cat:DC-2482 rel:sameAs dc:language .
             ...
             cat:DC-2556 rel:subClassOf dc:contributor .
             cat:DC-2502 rel:subClassOf dc:coverage .
     }



                                   Extension
     1. related
          1. same as (a symmetric and transitive relationship)
                 1.    owl:equivalentClass
                 2.    owl:equivalentProperty
                 3.    owl:sameAs
                 4.    skos:exactMatch
          2. almost same as (a symmetric relationship)
                 1. skos:closeMatch
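Read as a lookup table, this extension lets relationships stated with OWL or SKOS predicates be rewritten to the RELcat relation types before they are queried. A minimal sketch, assuming prefixed names as plain strings: `rel:sameAs` is taken from the relation set shown earlier, `rel:almostSameAs` is an assumed name for 'almost same as', and the input triples are made up for illustration.

```python
# Mapping from external predicates to RELcat relation types, following the slide.
EXTENSION = {
    "owl:equivalentClass": "rel:sameAs",
    "owl:equivalentProperty": "rel:sameAs",
    "owl:sameAs": "rel:sameAs",
    "skos:exactMatch": "rel:sameAs",
    "skos:closeMatch": "rel:almostSameAs",  # assumed name for 'almost same as'
}


def normalize(triples):
    """Rewrite known OWL/SKOS predicates to RELcat relation types; keep the rest."""
    return [(s, EXTENSION.get(p, p), o) for s, p, o in triples]


# Hypothetical input statements, for illustration only.
loaded = [
    ("cat:DC-2482", "skos:exactMatch", "dc:language"),
    ("cat:DC-2573", "owl:sameAs", "dc:identifier"),
    ("cat:DC-2556", "rel:subClassOf", "dc:contributor"),
]
normalized = normalize(loaded)
```

After normalization a single query on `rel:sameAs` covers all four 'same as' predicates, whichever vocabulary the relationship was originally stated in.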




                       Normalized query
     PREFIX rel:<http://www.isocat.org/relcat/relations#>
     PREFIX cat:<http://www.isocat.org/datcat/>

     SELECT ?c WHERE { cat:DC-2482 rel:sameAs ?c . }

     • Finds the same-as clique for /languageID/ (DC-2482)
       specified in any vocabulary, e.g., RELcat (CMDI) for
       Dublin Core and annotated OWL for GOLD
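Since 'same as' is symmetric and transitive, the clique is the set of everything reachable over `rel:sameAs` statements in either direction, across all graphs of the quad store. The sketch below computes that set in plain Python; it illustrates the idea and does not describe how RELcat evaluates the query. The `relcat:cmdi` quads repeat the relation set shown earlier, whereas the `ex:gold` graph and `ex:someGoldConcept` are hypothetical.

```python
from collections import deque

# Quads: (graph, subject, predicate, object).
QUADS = [
    ("relcat:cmdi", "cat:DC-2482", "rel:sameAs", "dc:language"),
    ("relcat:cmdi", "cat:DC-2573", "rel:sameAs", "dc:identifier"),
    ("relcat:cmdi", "cat:DC-2556", "rel:subClassOf", "dc:contributor"),
    # Hypothetical statement from another vocabulary, already normalized.
    ("ex:gold", "ex:someGoldConcept", "rel:sameAs", "cat:DC-2482"),
]


def same_as_clique(start, quads):
    """Return all resources connected to `start` by rel:sameAs, in any graph."""
    # Build an undirected adjacency map, because 'same as' is symmetric.
    neighbours = {}
    for _graph, subject, predicate, obj in quads:
        if predicate == "rel:sameAs":
            neighbours.setdefault(subject, set()).add(obj)
            neighbours.setdefault(obj, set()).add(subject)

    # Breadth-first search follows chains of statements, because it is transitive.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other in neighbours.get(node, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)

    seen.discard(start)
    return seen


language_clique = same_as_clique("cat:DC-2482", QUADS)
```

With the sample quads, the clique for `cat:DC-2482` contains `dc:language` and the hypothetical `ex:someGoldConcept`, although the two statements come from different graphs.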



                          Semantic network
     [Figure: a semantic network connecting a linguistic resource (schema) with a linguistic knowledge base]
     • Legend: data categories, containers, concepts, relations
     • Registries involved:
           – Schema Registry - SCHEMAcat
           – Data Category Registry - ISOcat
           – Concept Registry
           – Relation Registry - RELcat

                                   Status
     • ISOcat: in production, mainly lacking in
       standardization
           – http://www.isocat.org/
     • RELcat: alpha version gives read-only access to
       some relation sets; some reasoning and the UI
       are still lacking
           – http://lux13.mpi.nl/isocat/relcat/
     • SCHEMAcat: design phase





                       Thank you for your attention!

                                   Visit
                               www.isocat.org

                                 Questions?
                            www.isocat.org/forum/
                               isocat@mpi.nl



Sustainable operability: Keeping complex linguistic resources alive.
 

ISOcat Data Category Registry

  • 1. www.isocat.org
      Linking to Linguistic Data Categories in ISOcat
      Menzo Windhouwer (a), Sue Ellen Wright (b)
      (a) The Language Archive - MPI for Psycholinguistics, (b) Kent State University
      menzo.windhouwer@mpi.nl, sellenwright@gmail.com
  • 2. Outline
      • A short introduction to data categories
          – the ISOcat registry
      • How to refer to ISOcat data categories
          – using PIDs
          – from XML and RDF resources
      • Fine-tuning (personal) relationships between data categories
          – the RELcat registry
      • Status
      7-9 March 2012, Linked Data in Linguistics - DGfS 2012
  • 3. ISOcat: a Data Category Registry
      • An implementation of ISO 12620:2009
          – Terminology and other content and language resources - Specification of data categories and management of a Data Category Registry for language resources
              • Successor to ISO 12620:1999, which contained a hardcoded list of Data Categories
      • A data category
          – is the result of the specification of a given data field
          – an elementary descriptor in a linguistic structure or an annotation scheme
  • 4. Data Category example
      • Data category: /Grammatical gender/
          – Administrative part:
              • Identifier: grammaticalGender
              • PID: http://www.isocat.org/datcat/DC-1297
          – Descriptive part:
              • English definition: Category based on (depending on languages) the natural distinction between sex and formal criteria.
              • French definition: Catégorie fondée (selon la langue) sur la distinction naturelle entre les sexes ou d'autres critères formels.
          – Conceptual domain:
              • Morphosyntax conceptual domain: /masculine/, /feminine/, /neuter/, /common/
          – Linguistic part:
              • French conceptual domain: /masculine/, /feminine/
  • 5. Data Category types
      [Diagram]
      • complex:
          – open: writtenForm (string)
          – closed: grammaticalGender (string), with value domain masculine, feminine, neuter
          – constrained: email (string), Constraint: .+@.+
      • simple: masculine, feminine, neuter
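The three complex data category types on this slide differ only in which values they admit. The following is a minimal Python sketch of that distinction, not ISOcat's implementation: the class name `ComplexDataCategory` and its fields are hypothetical, and only the example identifiers, the three simple values and the `.+@.+` constraint come from the slide.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ComplexDataCategory:
    """Hypothetical model of a complex data category (not the ISOcat data model)."""
    identifier: str
    kind: str                          # "open", "closed" or "constrained"
    values: frozenset = frozenset()    # simple data categories, used when kind == "closed"
    constraint: Optional[str] = None   # regular expression, used when kind == "constrained"

    def accepts(self, value: str) -> bool:
        if self.kind == "open":
            # An open data category admits any value of its data type.
            return True
        if self.kind == "closed":
            # A closed data category admits only members of its value domain.
            return value in self.values
        if self.kind == "constrained":
            # A constrained data category admits values matching its constraint.
            return re.fullmatch(self.constraint, value) is not None
        raise ValueError(f"unknown kind: {self.kind}")


# The three examples shown on the slide.
written_form = ComplexDataCategory("writtenForm", "open")
grammatical_gender = ComplexDataCategory(
    "grammaticalGender", "closed",
    values=frozenset({"masculine", "feminine", "neuter"}),
)
email = ComplexDataCategory("email", "constrained", constraint=r".+@.+")
```

Under these assumptions, `grammatical_gender.accepts("neuter")` holds while `grammatical_gender.accepts("common")` does not, because /common/ is not in the value domain shown on this slide.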
  • 6. Data Category types
      [Diagram] container; labels shown: lexicon, language, alphabet, entry, japanese, ipa, lemma, writtenForm
  • 7. Data Category relationships
      • Value domain membership
      • Subsumption relationships between simple data categories (legacy)
      • Relationships between complex/container data categories are not stored in the DCR
      [Diagram] labels shown: partOfSpeech (string), pronoun, personal pronoun
  • 8. ISOcat: a Data Category Registry
      • You can:
          – Find Data Categories relevant for your resources and embed references to them so the semantics of (parts of) your resources are made explicit
              • This can be supported by tools you use, e.g., ELAN, LEXUS and the CMDI Component Editor directly interact with ISOcat
          – Interact with Data Category owners to improve (the coverage of) their Data Categories
          – Create (together with others) new Data Categories and/or selections needed for your resources and share those
          – Submit (your) Data Categories for standardization
              • ISOcat is the DCR for ISO TC 37
          – Free of charge
          – Grass roots approach
  • 9. The usage of data categories?
      [Diagram] A (schema for a) typological database; labels shown: Language, BWO, genders, wordOrder, grammaticalGender
      [Diagram] A (schema for a) lexicon; labels shown: Lexicon, Lexical Entry, Lemma, Form, Sense, Word Form, partOfSpeech, writtenForm, grammaticalGender, lexicalType; multiplicities 1..* and 0..*
  • 10. Referencing Data Categories
      • Each Data Category should be uniquely identifiable
          – Ambiguity: different domains use the same term but mean different ‘things’
          – Semantic rot: even in the same domain the meaning of a term changes over time
          – Persistence: for archived resources Data Category references should still be resolvable and point to the specification as it was at/close to time of creation
      • Persistent IDentifiers
          – ISO 24619:2011 Language resource management - Persistent identification and sustainable access (PISA)
          – ISOcat uses ‘cool URIs’:
              • http://www.isocat.org/datcat/DC-1297 (/grammaticalGender/)
  • 11. XML – DC Reference vocabulary
      • ISO 12620:2009 is rather XML oriented
          – why not RDF?
              • history
                  – terminology management is a separate tradition from Semantic Web/Linked Data
                  – DCIF -> GMT (TMF) -> own XML vocabulary based on UML data model
              • but there is an RDF representation
                  – needs to cover more of the data model
      • Annex A provides the DC reference vocabulary (www.isocat.org/12620/)
          – dcr:datcat to link to any DC
          – dcr:valueDatcat to link to a simple DC
      • Preferably annotate a schema, e.g., a Relax NG or W3C XML Schema document
      • XML vocabularies might also provide their own means to link to a data category
          – TBX XCS, TEI ODD, CMDI, ..., TEI (?)
      • (Semantics by reference)
  • 12. LMF Example
          <LexicalResource xmlns:dcr="http://www.isocat.org/ns/dcr">
            <GlobalInformation>
              <feat att="languageCoding" dcr:datcat=".../DC-2008" val="ISO 639-3"/>
            </GlobalInformation>
            <Lexicon>
              <feat att="language" dcr:datcat=".../DC-1969" val="eng"/>
              <LexicalEntry>
                <feat att="partOfSpeech" dcr:datcat=".../DC-1345" val="commonNoun" dcr:valueDatcat=".../DC-1256"/>
                <Lemma>
                  <feat att="writtenForm" dcr:datcat=".../DC-1836" val="clergyman"/>
                </Lemma>
                ...
                <WordForm>
                  <feat att="writtenForm" dcr:datcat=".../DC-1836" val="clergymen"/>
                  <feat att="grammaticalNumber" dcr:datcat=".../DC-1298" val="plural" dcr:valueDatcat=".../DC-1354"/>
                </WordForm>
              </LexicalEntry>
            </Lexicon>
          </LexicalResource>
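A consumer of such a resource can recover the data category references by reading the namespaced `dcr:datcat` attribute. The sketch below uses Python's standard `xml.etree.ElementTree`. The `SAMPLE` document and the function name `datcat_references` are hypothetical illustrations; only the namespace URI and the /grammaticalGender/ PID are taken from the slides.

```python
import xml.etree.ElementTree as ET

# Namespace URI as declared on the LMF example slide.
DCR_NS = "http://www.isocat.org/ns/dcr"

# Hypothetical, reduced sample document; the only PID used is the
# /grammaticalGender/ PID quoted elsewhere in this deck.
SAMPLE = """\
<LexicalResource xmlns:dcr="http://www.isocat.org/ns/dcr">
  <Lexicon>
    <LexicalEntry>
      <WordForm>
        <feat att="writtenForm" val="clergymen"/>
        <feat att="grammaticalGender"
              dcr:datcat="http://www.isocat.org/datcat/DC-1297"
              val="masculine"/>
      </WordForm>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>
"""


def datcat_references(xml_text):
    """Return (att, PID) pairs for every <feat> carrying a dcr:datcat attribute."""
    root = ET.fromstring(xml_text)
    refs = []
    for elem in root.iter("feat"):
        # ElementTree addresses namespaced attributes as "{namespace-uri}localname".
        pid = elem.get(f"{{{DCR_NS}}}datcat")
        if pid is not None:
            refs.append((elem.get("att"), pid))
    return refs
```

Features without a `dcr:datcat` attribute, such as the first `feat` in the sample, are skipped, so the result lists only the parts of the resource whose semantics are made explicit.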
  • 13. RDF – DC annotation property
      • The dcr:datcat RDF annotation property mimics the DC Reference vocabulary
          – minimizes impact, i.e., allows the data model to use its own terminology
          – can be tuned using OWL (2) equivalentClass, equivalentProperty or sameAs
          – problem: annotating literals with simple Data Categories (names can be ambiguous)

          @prefix dcr: <http://www.isocat.org/ns/dcr.rdf#> .

          :headword
              dcr:datcat <http://www.isocat.org/datcat/DC-258> ;
              rdfs:label "head word"@en ;
              rdfs:comment "A lemma heading a dictionary entry."@en .

          :partOfSpeech
              dcr:datcat <http://www.isocat.org/datcat/DC-396> ;
              rdfs:label "part of speech"@en ;
              rdfs:comment "A category assigned to a word based on its grammatical and semantic properties."@en .
  • 14. RDF – directly use Data Category PIDs
      • Container Data Categories as RDF classes
      • Complex Data Categories as RDF properties
      • Simple Data Categories
          – as RDF literals
              • problem: names can be ambiguous
          – as RDF classes
              • (GrAF example <f name="" val=".../DC-3581"/> vs <f name="" val="plural noun" dcr:datcat=".../DC-3581"/>)

          @prefix cat: <http://www.isocat.org/datcat/> .

          cat:DC-258
              rdfs:label "head word"@en ;
              rdfs:comment "A lemma heading a dictionary entry."@en .

          cat:DC-396
              rdfs:label "part of speech"@en ;
              rdfs:comment "A category assigned to a word based on its grammatical and semantic properties."@en .
  • 15. Data Category Relations
      • In the linked data world it is natural to have ontological relationships next to structural ones
          – RDFS, OWL (2), SKOS, ...
      • But other resource/schema formats lack these features
      • Relationships between Data Categories (also across vocabularies) are important for federated search, i.e., to find semantically related resources in another archive
  • 16. RELcat: a Relation Registry
      • Stores relationships among Data Categories and also with ‘other’ concept registries
          – Dublin Core, OLAC, GOLD
          – (OLiA, OntoLingAnnot)
          – relationships can be the individual view of a (group of) linguist(s)
      • RELcat is a quad store (graph, subject, predicate, object)
      • Based on a ‘private’ relation type taxonomy so existing relationships specified in other vocabularies can easily be loaded
          – OWL (2), SKOS
          – normalized RELcat queries
      • The aim is to support various levels of traversing the semantic network, not formal reasoning
          – conflicting (theoretical) views
              • (parameters of variation)
          – but within a known combination of sets reasoning may well be possible
          – also targets semantic search outside of the RDF domain
  • 17. Relation type taxonomy
      1. related
          1. same as (a symmetric and transitive relationship)
          2. almost same as (a symmetric relationship)
          3. broader than (a transitive relationship and the inverse of the ‘narrower than’ relationship)
              1. superclass of (a transitive relationship and the inverse of the ‘subclass of’ relationship)
              2. has part (a transitive relationship and the inverse of the ‘part of’ relationship)
                  1. has direct part (the inverse of the ‘direct part of’ relationship)
          4. narrower than (a transitive relationship and the inverse of the ‘broader than’ relationship)
              1. subclass of (a transitive relationship and the inverse of the ‘superclass of’ relationship)
              2. part of (a transitive relationship and the inverse of the ‘has part’ relationship)
                  1. direct part of (the inverse of the ‘has direct part’ relationship)
  • 18. Relation set
          @prefix relcat: <http://www.isocat.org/relcat/set/> .
          @prefix rel: <http://www.isocat.org/relcat/relations#> .
          @prefix dc: <http://purl.org/dc/elements/1.1/> .
          @prefix cat: <http://www.isocat.org/datcat/> .

          relcat:cmdi {
              cat:DC-2573 rel:sameAs dc:identifier .
              cat:DC-2482 rel:sameAs dc:language .
              ...
              cat:DC-2556 rel:subClassOf dc:contributor .
              cat:DC-2502 rel:subClassOf dc:coverage .
          }
  • 19. Extension
      1. related
          1. same as (a symmetric and transitive relationship)
              1. owl:equivalentClass
              2. owl:equivalentProperty
              3. owl:sameAs
              4. skos:exactMatch
          2. almost same as (a symmetric relationship)
              1. skos:closeMatch
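Placing OWL and SKOS predicates under the taxonomy means that a relation loaded from another vocabulary can be read as its RELcat parent type. One way to picture this is a lookup table, sketched below in Python. This is an illustration under stated assumptions, not RELcat's implementation: the names `EXTENSIONS` and `normalize` are hypothetical, and the local name `almostSameAs` is assumed, since the slides only show `rel:sameAs` and `rel:subClassOf` as concrete identifiers.

```python
# Namespace for RELcat relation types, as given on the relation set slide.
REL = "http://www.isocat.org/relcat/relations#"
# Standard OWL and SKOS namespaces.
OWL = "http://www.w3.org/2002/07/owl#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Predicate -> parent relation type, following the extension slide.
# "almostSameAs" is an assumed local name for the "almost same as" type.
EXTENSIONS = {
    OWL + "equivalentClass": REL + "sameAs",
    OWL + "equivalentProperty": REL + "sameAs",
    OWL + "sameAs": REL + "sameAs",
    SKOS + "exactMatch": REL + "sameAs",
    SKOS + "closeMatch": REL + "almostSameAs",
}


def normalize(predicate: str) -> str:
    """Map a predicate to its RELcat relation type; unknown predicates pass through."""
    return EXTENSIONS.get(predicate, predicate)
```

With such a table, a query phrased only in terms of the same-as relation type would also cover statements that were originally expressed with `owl:sameAs` or `skos:exactMatch`.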
  • 20. Normalized query
          PREFIX rel: <http://www.isocat.org/relcat/relations#>
          PREFIX cat: <http://www.isocat.org/datcat/>

          SELECT ?c
          WHERE {
              cat:DC-2482 rel:sameAs ?c .
          }
      • Finds the same-as clique for /languageID/ (DC-2482) specified in any vocabulary, e.g., RELcat (CMDI) for Dublin Core and annotated OWL for GOLD
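Because "same as" is symmetric and transitive, the clique for a data category is everything reachable over same-as edges in either direction. The Python sketch below illustrates that idea with a breadth-first traversal over an in-memory list of quads. It is not how RELcat evaluates the query: the function name `same_as_clique` is hypothetical, the first three quads are taken from the relation set slide, and the fourth quad (graph `ex:hypothetical`, object `ex:language`) is invented purely to show transitivity.

```python
from collections import defaultdict, deque

# Quads are (graph, subject, predicate, object), written with prefixed names.
QUADS = [
    ("relcat:cmdi", "cat:DC-2573", "rel:sameAs", "dc:identifier"),
    ("relcat:cmdi", "cat:DC-2482", "rel:sameAs", "dc:language"),
    ("relcat:cmdi", "cat:DC-2556", "rel:subClassOf", "dc:contributor"),
    # Hypothetical quad from a second relation set, added for illustration only.
    ("ex:hypothetical", "dc:language", "rel:sameAs", "ex:language"),
]


def same_as_clique(start, quads):
    """Return all nodes connected to `start` via rel:sameAs, excluding `start`."""
    neighbours = defaultdict(set)
    for _graph, subject, predicate, obj in quads:
        if predicate == "rel:sameAs":
            # Symmetric: record the edge in both directions.
            neighbours[subject].add(obj)
            neighbours[obj].add(subject)

    # Transitive: breadth-first search over the same-as edges.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen - {start}
```

The traversal ignores the graph component, which corresponds to querying across all loaded relation sets; restricting it to selected graphs would model a personal view on the relations.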
  • 21. Semantic network
      [Diagram] labels shown: Linguistic resource (schema); Linguistic knowledge base; Data categories; Containers; Concepts; Relation; Schema Registry - SCHEMAcat; Data Category Registry - ISOcat; Concept Registry; Relation Registry - RELcat
  • 22. Status
      • ISOcat: in production, mainly lacking in standardization
          – http://www.isocat.org/
      • RELcat: alpha version gives read-only access to some relation sets, lacking some reasoning and UI
          – http://lux13.mpi.nl/isocat/relcat/
      • SCHEMAcat: design phase
  • 23. Thank you for your attention!
      Visit www.isocat.org
      Questions? www.isocat.org/forum/ isocat@mpi.nl