www.isocat.org




                          Collaboratively Defining
                 Widely Accepted Linguistic Data Categories
                     in the ISOcat Data Category Registry


                                                                       Menzo Windhouwer
                                                                      The Language Archive – DANS
                                                                                         tla.mpi.nl
                                                                  menzo.windhouwer@dans.knaw.nl




     28 March 2013             eHg - New Trends in e-Humanities                                  1

                     The Language Archive
     • Founded in September 2011
     • Supported by MPG, BBAW and KNAW (DANS)
     • Grown out of the Technical Group at the MPI for
       Psycholinguistics
     • Since the 1990s: the challenge of archiving digital data
     • 2000 – 2016: Volkswagen Foundation DOBES
       project on Endangered Languages
     • Active in many European infrastructure projects:
       CLARIN, EUDAT, DASISH, …


           Language Archiving Technology
     • Full lifecycle support
           – Core: resources
           – Key: metadata
           – ‘New’: CMDI, ISOcat, AV recognition,
             …
     • Archive size:
           –     70 TB of resources
           –     22,000 hours of AV recordings
           –     75,000 sessions (metadata)
           –     5 million annotated segments
           –     50 lexica
     • My focus: Knowledge Systems
           – LEXUS, an online lexicon tool
           – ISOcat and companions

           Typological Database Nijmegen
            TOP NOTION tds:Noun GROUPS{
              NOTION tdn:GrammaticalDistinctions
                LABEL "Grammatical distinctions for nouns."
                GROUPS {
                 NOTION tdn:AgentNouns
                  LABEL "Agent nouns."
                  DESCRIPTION "Nouns can function as the agent of a clause."
                  LINK TO CONCEPT agentRole
                  GROUPS {
                   NOTION tdn:v098_plusAffix
                    LABEL "Agent nouns formed by verb stem plus affix."
                    LINK TO CONCEPTS (agentRole, verbalMorphology, boundAffix)
                    DESCRIPTION
                     <p>Agent nouns are formed by a verb stem plus an affix, e.g. English <qv>walk-er</qv>.</p>
                    NOTE AUTHOR IS "TDS" TYPE IS "original TDN label" "AGENT NOUNS ARE VERB STEM PLUS AFFIX"
                    IS FIELD v098;
            ...




    Notes: TDN is not archived in TLA, but curated in TDS, a previous project I worked on, and now archived at DANS;
    also, this is not a TDN punchcard

                     DOBES corpora





                    Oxford English Dictionary




      Source: http://www.oxford-royale.co.uk/news/2010/12/04/new-online-edition-of-oxford-english-dictionary.html

          Terminology Community of Practice
     • Community started out on paper (A5 fiches),
       just like the OED
     • 1980s and 1990s: projects to standardize the names of
       data categories, i.e., the ‘fields’ on the fiches, in the
       files or in database records
     • ISO 12620:1999 Data Categories is a companion
       standard to ISO 12200 Machine-readable
       terminology interchange format (MARTIF)


                     ISO 12620:1999





         Towards a Data Category Registry
     • Problems with ISO 12620:1999, a hardcoded list of data categories
           – Not easily extensible
           – Ordering heavily debated
           – Outdated and limited in range at the moment of release
     • Developments
           – In the SALT project an interchange model (TBX) based on MARTIF/data
             categories was created, which was widely adopted
           – ISO 11179 Metadata Registries was released, which describes the
             standardization of data element concepts for metadata
           – ISO released Annex ST Standards as databases, which describes an ISO
             procedure to standardize registry entries
           – In the LIRICS project a pilot Data Category Registry, SYNTAX, was
             created





                                  ISO 12620:2009
     • Terminology and other content and language resources — Specification of
       data categories and management of a Data Category Registry for language
       resources
        – A data model for data category specifications inspired by ISO 11179
        – A procedure to standardize data category specifications compliant with
           Annex ST
        – Each data category gets a unique Persistent Identifier (PID)
        – The Max Planck Institute for Psycholinguistics is appointed as the
           Registration Authority of the ISO/TC 37 DCR
     • In use by a growing number of ISO TC 37 standards
           –     Lexical Markup Framework (LMF)
           –     Linguistic Annotation Framework (LAF)
           –     Morpho-syntactic Annotation Framework (MAF)
           –     …
           –     could be more, e.g., Feature System Declarations (FSD)

         Example Data Category specification
     • Data category: /Grammatical gender/
           – Administrative part:
                 • Identifier: grammaticalGender
                 • PID: http://www.isocat.org/datcat/DC-1297
           – Descriptive part:
                 • English definition: Category based on (depending on languages)
                   the natural distinction between sex and formal criteria.
                 • French definition: Catégorie fondée (selon la langue) sur la
                   distinction naturelle entre les sexes ou d'autres critères formels.
           – Linguistic part:
                 • Morphosyntax conceptual domain: /masculine/, /feminine/,
                   /neuter/
                 • French conceptual domain: /masculine/, /feminine/
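
The three parts of the specification above (administrative, descriptive, linguistic) can be pictured as one record. The following is a minimal sketch only: the class name `DataCategory`, its field names and the `allowed_values` helper are assumptions made for illustration and are not the ISOcat data model or API. The values are copied from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class DataCategory:
    """Illustrative record for a data category specification (not the ISOcat model)."""

    # Administrative part
    identifier: str
    pid: str
    # Descriptive part: language code -> definition text
    definitions: dict[str, str] = field(default_factory=dict)
    # Linguistic part: profile or language name -> values in its conceptual domain
    conceptual_domains: dict[str, list[str]] = field(default_factory=dict)

    def allowed_values(self, domain: str) -> list[str]:
        """Return the conceptual domain for a profile or language, or an empty list."""
        return self.conceptual_domains.get(domain, [])


gender = DataCategory(
    identifier="grammaticalGender",
    pid="http://www.isocat.org/datcat/DC-1297",
    definitions={
        "en": "Category based on (depending on languages) the natural "
              "distinction between sex and formal criteria.",
    },
    conceptual_domains={
        "Morphosyntax": ["masculine", "feminine", "neuter"],
        "French": ["masculine", "feminine"],
    },
)

print(gender.allowed_values("French"))
```

Keeping the conceptual domains keyed by profile or language mirrors the slide, where the French domain is narrower than the general morphosyntax one.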


                     Standardization procedure
     [Figure: flow diagram of the standardization procedure. Labels: Submission
      group; Decision Group; Thematic Domain Group (Evaluation, rejected);
      Data Category Registry Board (Validation, rejected); Publication;
      Stewardship group.]
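
One way to read the flow on this slide is as a small state machine: submission, evaluation by a Thematic Domain Group, validation by the Data Category Registry Board, then publication, with rejection possible at either review step. The sketch below is a hedged illustration of that reading. The names `Stage`, `TRANSITIONS` and `advance` are hypothetical, and treating "rejected" as a final state is an assumption, since the slide does not say what happens after a rejection.

```python
from enum import Enum


class Stage(Enum):
    SUBMITTED = "submitted"      # handed in by a submission group
    EVALUATION = "evaluation"    # Thematic Domain Group
    VALIDATION = "validation"    # Data Category Registry Board
    PUBLISHED = "published"
    REJECTED = "rejected"


# Allowed transitions as read off the diagram.
# Assumption: "rejected" and "published" are terminal states.
TRANSITIONS = {
    Stage.SUBMITTED: {Stage.EVALUATION},
    Stage.EVALUATION: {Stage.VALIDATION, Stage.REJECTED},
    Stage.VALIDATION: {Stage.PUBLISHED, Stage.REJECTED},
    Stage.PUBLISHED: set(),
    Stage.REJECTED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move to the target stage, or raise if the diagram has no such arrow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


stage = Stage.SUBMITTED
for step in (Stage.EVALUATION, Stage.VALIDATION, Stage.PUBLISHED):
    stage = advance(stage, step)
print(stage.value)
```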





                     Thematic Domain Groups
     TDG 1: Metadata
     TDG 2: Morphosyntax
     TDG 3: Semantic Content Representation
     TDG 4: Syntax
     TDG 6: Language Resource Ontology
     TDG 7: Lexicography
     TDG 8: Language Codes
     TDG 9: Terminology
     TDG 11: Multilingual Information Management
     TDG 12: Lexical Resources
     TDG 13: Lexical Semantics

     • TDGs are the owners and guardians of a coherent subset of the DCR
     • TDGs own one or more profiles
     • Each TDG has a chair
     • A number of members assigned by SC P members
     • A number of expert members invited by the chair (up to 50%)
     • TDGs are constituted at the TC37/SC plenary
     • New TDGs need to be proposed by an SC
           1. Translation
           2. (Sign language)

                     ISOcat - the ISO TC 37/DCR
     • A (coherent) set of Data Categories, in our case for
       linguistic resources
     • A system to manage this set:
           – Create and edit Data Categories
           – Share Data Categories, e.g., resolve PID references
           – Standardize Data Categories
     • An API for tools to access the DCR

     • Grass roots approach
           – Anyone can access the DCR and use or
              create the data categories (s)he needs



        Referring to ISOcat data categories
     • PIDs of data categories can easily be embedded in XML documents
          <lmf:LexicalEntry>
            <tei:f
              name="partOfSpeech"
              dcr:datcat="http://www.isocat.org/datcat/DC-1345"
              fVal="commonNoun"
              dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256"/>
            <lmf:Lemma type="Form">
              <tei:f
                name="writtenForm"
                dcr:datcat="http://www.isocat.org/datcat/DC-1836"
                fVal="clergyman"/>
            </lmf:Lemma>
          </lmf:LexicalEntry>

     • Also embedding in other formats is possible, e.g., via comments
     • Preferably annotate schemas, so a whole range of resources is annotated
         in one go
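
A tool that consumes such annotated XML could collect the embedded PIDs with a standard XML parser. The sketch below is only an illustration under stated assumptions: the slide's fragment carries no namespace declarations, so the sample adds them. The TEI namespace URI is the standard one, while the URIs used for the `dcr` and `lmf` prefixes, and the function name `collect_pids`, are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace URIs. The TEI one is the standard TEI namespace; the dcr and lmf
# URIs are assumptions made so that the sample is well-formed.
TEI_NS = "http://www.tei-c.org/ns/1.0"
DCR_NS = "http://www.isocat.org/ns/dcr"
LMF_NS = "http://example.org/lmf"

SAMPLE = f"""\
<lmf:LexicalEntry xmlns:lmf="{LMF_NS}" xmlns:tei="{TEI_NS}" xmlns:dcr="{DCR_NS}">
  <tei:f name="partOfSpeech"
         dcr:datcat="http://www.isocat.org/datcat/DC-1345"
         fVal="commonNoun"
         dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256"/>
  <lmf:Lemma type="Form">
    <tei:f name="writtenForm"
           dcr:datcat="http://www.isocat.org/datcat/DC-1836"
           fVal="clergyman"/>
  </lmf:Lemma>
</lmf:LexicalEntry>
"""


def collect_pids(xml_text: str) -> dict[str, str]:
    """Map each feature name to the data category PID it is annotated with."""
    root = ET.fromstring(xml_text)
    datcat_attr = f"{{{DCR_NS}}}datcat"  # ElementTree spells namespaced names as {uri}local
    pids = {}
    for feature in root.iter(f"{{{TEI_NS}}}f"):
        pid = feature.get(datcat_attr)
        if pid is not None:
            pids[feature.get("name")] = pid
    return pids


for name, pid in collect_pids(SAMPLE).items():
    print(name, pid)
```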

                     A glimpse of ISOcat





                     Collaboration in ISOcat
     • Registered users can contact each other via
       mediated email
           – Ask the owner if a data category can be adapted a
             little to your needs
     • Registered users can start up a group and invite
       other users to join
           – Work together on a set of data categories
           – Interact via a public and/or private forum
     • A group can submit data categories for ISO
       standardization

         Component MetaData Infrastructure
     • CMDI is developed by CLARIN and on its way to
       standardization by ISO TC 37
           – Limitations of existing metadata schemas: DC/OLAC,
             IMDI, TEI header
                 • Inflexible: too many (IMDI) or too few (OLAC) metadata
                   elements
                 • Limited interoperability (both semantic and syntactic)
                 • Problematic (unfamiliar) terminology for some
                   sub-communities
                 • Limited support for descriptions of LT tools and services
           – The idea is to address this by:
                 • Explicitly defined schemas and semantics
                 • User/project/community-defined components


                           CMDI architecture

     [Figure: CMDI architecture. Components: metadata catalogue; search &
      semantic mapping; ISOcat; Relation Registry; component registry & editor;
      metadata editor; joint metadata repository (OAI-PMH service provider);
      local metadata repository (OAI-PMH data provider); data. Roles: metadata
      user, metadata modeler, metadata creator, metadata curator.]

                          Athens Core
     • Bootstrapped the selection of metadata data categories
       in ISOcat
           – Based on existing metadata standards, e.g., DC,
             OLAC, IMDI, TEI
           – Many translations into European languages
     • Users add the data categories they need to
       the Metadata profile and use them in CMDI



                           CMDI architecture

     [Figure: CMDI architecture. Components: metadata catalogues (VLO, MI);
      search & semantic mapping; ISOcat; Relation Registry; component registry &
      editor; metadata editor; joint metadata repository (OAI-PMH service
      provider); local metadata repository (OAI-PMH data provider); data.
      Roles: metadata user, metadata modeler, metadata creator, metadata
      curator.]

                 CMDI (intermediate) results
     • Diverse metadata profiles
           – Centers or projects create specific ones, but reuse components where
             possible
     • Shared and explicit semantics help to overcome
           – Terminological differences
           – Differences in structure
     • Future
           – Get more context sensitive
                 • e.g. documentation language vs. speaker language
           – Crosswalks
                 • equivalent metadata data categories are easily introduced due to the open nature
                   of ISOcat
           – User specific relationships
                 • e.g. theory-specific differences can be more important to one user than to another
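
The crosswalk idea can be sketched as follows: if the elements of two metadata profiles are annotated with data category PIDs, and some PIDs are declared equivalent, elements can be matched through those PIDs instead of through their names. Everything in this sketch is hypothetical: the profiles, element names, placeholder PIDs, the `EQUIVALENT` table (standing in for statements that a relation registry might hold) and the function names. It also assumes equivalences are only one step deep.

```python
from collections import defaultdict

# Hypothetical annotations: metadata element -> data category PID.
# Element names and PIDs are placeholders, not real registry entries.
PROFILE_A = {
    "Title": "http://example.org/datcat/A1",
    "Language": "http://example.org/datcat/A2",
}
PROFILE_B = {
    "resourceName": "http://example.org/datcat/B1",
    "subjectLanguage": "http://example.org/datcat/A2",
    "creator": "http://example.org/datcat/B3",
}

# Hypothetical equivalence statements: PID -> PID it is declared equivalent to.
EQUIVALENT = {"http://example.org/datcat/B1": "http://example.org/datcat/A1"}


def canonical(pid: str) -> str:
    """Replace a PID by its declared equivalent, if there is one (one step only)."""
    return EQUIVALENT.get(pid, pid)


def crosswalk(source: dict[str, str], target: dict[str, str]) -> dict[str, list[str]]:
    """Match source elements to target elements that share a (canonical) PID."""
    by_pid = defaultdict(list)
    for element, pid in target.items():
        by_pid[canonical(pid)].append(element)
    return {
        element: by_pid[canonical(pid)]
        for element, pid in source.items()
        if canonical(pid) in by_pid
    }


print(crosswalk(PROFILE_A, PROFILE_B))
```

Matching on PIDs rather than element names is what lets differently named elements line up; elements without a shared or equivalent data category, such as `creator` here, simply stay unmatched.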

                                   Metadata TDG
     • Standardization efforts of the Metadata TDG stalled
           – Large overlap with the work/people at the Athens-Core meetings
                 • Community-level agreement may be enough
           – Motivation for activity should not depend on one person only, the TDG chair
                 • The need for explicit and shared semantics is not clear enough yet … more evangelization
                   needed
           – Unfamiliarity with the work
                 • Terminologists are more used to this kind of review work
                 • Online review vs. old ISO ‘paper’ process
           – Members have little time; it is difficult to sync schedules
                 • TDG experts tend to be senior scientists
                 • Continuous process vs. sporadic bursts of activity
           – Unpaid work
                 • Project funding vs. wide acceptance in the community
                 • However, a project might bootstrap a thematic domain
     • The same problems hold for other TDGs
           – Current tendency to tie data category (selection) standardization to a
             new/revised standard, e.g., MAF and TBX
           – Redesign of the standardization process is coming up
                 • ISO is not actively supporting Annex ST Standards as Databases anymore

                          Community efforts
     • LMF-related: UBY, RELISH/GOLD
     • Sign Language
     • CLARIN
           – CMDI, Athens Core
           – CLARIN-NL/VL
                 • Call 1 – 4 projects created CMDI and annotated
                   resources/schemas
                 • ISOcat content coordinator: Ineke Schuurman
                     – Tutorials, guidelines (do’s and don’ts) and feedback
     • Better community support in ISOcat
           – Views, e.g., CLARIN-NL/VL
           – Recommended by, e.g., DC-4949
           –…

                 Conclusions and future work
     • Communities can already create a coherent view on ISOcat
           – the CMDI use case shows potential
           – funder support may be needed to bootstrap specific domains
     • The standardized core will take (a long) time
           – like all standardization work

     • Next to metadata also content
           – explicit semantics would be profitable even when not shared and/or used for
             resource discovery
           – tools that support ISOcat will make it easier to create such
             resources
     • Companion registries:
           – relations between data categories (RELcat)
           – annotated schemas for language resources (SCHEMAcat)
           – interaction with the CLARIN vocabulary service (CLAVAS)
     • Data categories vs. concepts


      Detour: ISOcat and LOD/Semantic Web
     • Archives and infrastructures look at the resources as
       they are, i.e., in general no conversions to triples
     • However, ISOcat data categories can easily be used in
       RDF resources
          :partOfSpeech dcr:datcat <http://www.isocat.org/datcat/DC-396> ;
               rdfs:label "part of speech"@en ;
               rdfs:comment "A category assigned to a word based on its grammatical and semantic properties."@en .
     • The Relation Registry, which is a triple store, will in
       general support lightweight, semi-formal ontologies
     M. Windhouwer, S.E. Wright. Linking to linguistic data categories in ISOcat. LDL 2012.
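
Statements like the one on this slide can also be generated from a program. The sketch below builds the Turtle text with plain string formatting and escapes the literal so that it stays on one line. The helper names are hypothetical, and the prefix URIs for the empty prefix and for `dcr` are assumptions, since the slide does not declare them. A real application would more likely use an RDF library.

```python
# Prefix URIs: rdfs is the standard RDF Schema namespace; the empty prefix and
# the dcr prefix are assumptions made for this sketch.
PREFIXES = {
    "": "http://example.org/vocab#",
    "dcr": "http://www.isocat.org/ns/dcr.rdf#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def turtle_literal(text: str, lang: str) -> str:
    """Render a language-tagged, single-line Turtle string literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"@{lang}'


def annotate(local_name: str, pid: str, label: str, comment: str, lang: str = "en") -> str:
    """Return a Turtle document linking a local property to a data category PID."""
    lines = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    lines.append("")
    lines.append(f":{local_name} dcr:datcat <{pid}> ;")
    lines.append(f"    rdfs:label {turtle_literal(label, lang)} ;")
    lines.append(f"    rdfs:comment {turtle_literal(comment, lang)} .")
    return "\n".join(lines) + "\n"


print(annotate(
    "partOfSpeech",
    "http://www.isocat.org/datcat/DC-396",
    "part of speech",
    "A category assigned to a word based on its grammatical and semantic properties.",
))
```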





                       Thank you for your attention!

                                                    Visit
                                                www.isocat.org

                                            Questions?
                                       www.isocat.org/forum/
                                          isocat@mpi.nl
                                                          Acknowledgements
           Thanks to everyone at TLA, Sue Ellen Wright, Ineke Schuurman, Marc Kemps-Snijders, CLARIN-NL, CLARIN, ISO TC 37





                     A whole litter of cats!
     [Figure: diagram relating a linguistic resource (schema) and a linguistic
      knowledge base to the Schema Registry (SCHEMAcat), the Data Category
      Registry (ISOcat), a Concept Registry and the Relation Registry (RELcat).
      Legend: data categories, containers, concepts, relation.]

           ISO 11179: concepts vs. data elements/categories




                                        ISO 12620 Data Categories




(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)OpenAIRE
 
The Biodiversity Information Standards (TDWG): Opportunities for Collaboratio...
The Biodiversity Information Standards (TDWG): Opportunities for Collaboratio...The Biodiversity Information Standards (TDWG): Opportunities for Collaboratio...
The Biodiversity Information Standards (TDWG): Opportunities for Collaboratio...Martin Kalfatovic
 
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...OpenAIRE
 
OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...
OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...
OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...OpenAIRE
 
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation Research Data Alliance
 
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation Research Data Alliance
 
SSHOC at EOSC-hub Week - Managing Training Materials Beyond Individual Projec...
SSHOC at EOSC-hub Week - Managing Training Materials Beyond Individual Projec...SSHOC at EOSC-hub Week - Managing Training Materials Beyond Individual Projec...
SSHOC at EOSC-hub Week - Managing Training Materials Beyond Individual Projec...SSHOC
 
eLanguage.net: Shifting the paradigm in Linguistics
eLanguage.net: Shifting the paradigm in LinguisticseLanguage.net: Shifting the paradigm in Linguistics
eLanguage.net: Shifting the paradigm in LinguisticsCornelius Puschmann
 
Opening up MOOCs for OER management on the Web of linked data
Opening up MOOCs for OER management on the Web of linked dataOpening up MOOCs for OER management on the Web of linked data
Opening up MOOCs for OER management on the Web of linked dataGilbert Paquette
 
Data Management Planning at the DCC
Data Management Planning at the DCCData Management Planning at the DCC
Data Management Planning at the DCCMartin Donnelly
 
OAIS and It's Applicability for Libraries, Archives, and Digital Repositories...
OAIS and It's Applicability for Libraries, Archives, and Digital Repositories...OAIS and It's Applicability for Libraries, Archives, and Digital Repositories...
OAIS and It's Applicability for Libraries, Archives, and Digital Repositories...faflrt
 
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...BigData_Europe
 
Introduction to Digital Humanities: Metadata standards and ontologies
Introduction to Digital Humanities: Metadata standards and ontologies Introduction to Digital Humanities: Metadata standards and ontologies
Introduction to Digital Humanities: Metadata standards and ontologies LIBIS
 
KOS Management - The case of the Organic.Edunet Ontology
KOS Management - The case of the Organic.Edunet OntologyKOS Management - The case of the Organic.Edunet Ontology
KOS Management - The case of the Organic.Edunet OntologyVassilis Protonotarios
 

Similaire à Collaboratively Defining Widely Accepted Linguistic Data Categories in the ISOcat Data Category Registry (20)

TDWG VoMaG Vocabulary management workflow, 2013-10-31
TDWG VoMaG Vocabulary management workflow, 2013-10-31TDWG VoMaG Vocabulary management workflow, 2013-10-31
TDWG VoMaG Vocabulary management workflow, 2013-10-31
 
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)
 
Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-
 
Presentation 16 may keynote karin bredenberg
Presentation 16 may keynote karin bredenbergPresentation 16 may keynote karin bredenberg
Presentation 16 may keynote karin bredenberg
 
The Biodiversity Information Standards (TDWG): Opportunities for Collaboratio...
The Biodiversity Information Standards (TDWG): Opportunities for Collaboratio...The Biodiversity Information Standards (TDWG): Opportunities for Collaboratio...
The Biodiversity Information Standards (TDWG): Opportunities for Collaboratio...
 
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
 
OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...
OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...
OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...
 
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
 
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
 
SSHOC at EOSC-hub Week - Managing Training Materials Beyond Individual Projec...
SSHOC at EOSC-hub Week - Managing Training Materials Beyond Individual Projec...SSHOC at EOSC-hub Week - Managing Training Materials Beyond Individual Projec...
SSHOC at EOSC-hub Week - Managing Training Materials Beyond Individual Projec...
 
eLanguage.net: Shifting the paradigm in Linguistics
eLanguage.net: Shifting the paradigm in LinguisticseLanguage.net: Shifting the paradigm in Linguistics
eLanguage.net: Shifting the paradigm in Linguistics
 
Opening up MOOCs for OER management on the Web of linked data
Opening up MOOCs for OER management on the Web of linked dataOpening up MOOCs for OER management on the Web of linked data
Opening up MOOCs for OER management on the Web of linked data
 
Data Management Planning at the DCC
Data Management Planning at the DCCData Management Planning at the DCC
Data Management Planning at the DCC
 
OAIS and It's Applicability for Libraries, Archives, and Digital Repositories...
OAIS and It's Applicability for Libraries, Archives, and Digital Repositories...OAIS and It's Applicability for Libraries, Archives, and Digital Repositories...
OAIS and It's Applicability for Libraries, Archives, and Digital Repositories...
 
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
 
Introduction to Digital Humanities: Metadata standards and ontologies
Introduction to Digital Humanities: Metadata standards and ontologies Introduction to Digital Humanities: Metadata standards and ontologies
Introduction to Digital Humanities: Metadata standards and ontologies
 
NISO/DCMI Webinar: Metadata for Public Sector Administration
NISO/DCMI Webinar: Metadata for Public Sector AdministrationNISO/DCMI Webinar: Metadata for Public Sector Administration
NISO/DCMI Webinar: Metadata for Public Sector Administration
 
KOS Management - The case of the Organic.Edunet Ontology
KOS Management - The case of the Organic.Edunet OntologyKOS Management - The case of the Organic.Edunet Ontology
KOS Management - The case of the Organic.Edunet Ontology
 
Knowledge Organization Systems (KOS): Management of Classification Systems in...
Knowledge Organization Systems (KOS): Management of Classification Systems in...Knowledge Organization Systems (KOS): Management of Classification Systems in...
Knowledge Organization Systems (KOS): Management of Classification Systems in...
 
Metadata : Concentrating on the data, not on the scheme
Metadata : Concentrating on the data, not on the schemeMetadata : Concentrating on the data, not on the scheme
Metadata : Concentrating on the data, not on the scheme
 

Plus de Menzo Windhouwer

Fedora Commons in the CLARIN Infrastructure
Fedora Commons in the CLARIN InfrastructureFedora Commons in the CLARIN Infrastructure
Fedora Commons in the CLARIN InfrastructureMenzo Windhouwer
 
ISOcat and RELcat, two cooperating semantic registries
	ISOcat and RELcat, two cooperating semantic registries	ISOcat and RELcat, two cooperating semantic registries
ISOcat and RELcat, two cooperating semantic registriesMenzo Windhouwer
 
Semantic Mapping in CLARIN Component Metadata.
Semantic Mapping in CLARIN Component Metadata.Semantic Mapping in CLARIN Component Metadata.
Semantic Mapping in CLARIN Component Metadata.Menzo Windhouwer
 
A CMD Core Model for CLARIN Web Services
A CMD Core Model for CLARIN Web ServicesA CMD Core Model for CLARIN Web Services
A CMD Core Model for CLARIN Web ServicesMenzo Windhouwer
 
LDL 2012 - Linking to ISOcat Data Categories
LDL 2012 - Linking to ISOcat Data CategoriesLDL 2012 - Linking to ISOcat Data Categories
LDL 2012 - Linking to ISOcat Data CategoriesMenzo Windhouwer
 
What do cats have to do with explicit semantics?
What do cats have to do with explicit semantics?What do cats have to do with explicit semantics?
What do cats have to do with explicit semantics?Menzo Windhouwer
 
On the way to a Relation Registry for ISOcat data categories
On the way to a Relation Registry for ISOcat data categoriesOn the way to a Relation Registry for ISOcat data categories
On the way to a Relation Registry for ISOcat data categoriesMenzo Windhouwer
 
ISOcat: a short introduction
ISOcat: a short introductionISOcat: a short introduction
ISOcat: a short introductionMenzo Windhouwer
 
Sustainable operability: Keeping complex linguistic resources alive.
Sustainable operability: Keeping complex linguistic resources alive.Sustainable operability: Keeping complex linguistic resources alive.
Sustainable operability: Keeping complex linguistic resources alive.Menzo Windhouwer
 

Plus de Menzo Windhouwer (13)

CMD2RDF
CMD2RDFCMD2RDF
CMD2RDF
 
Fedora Commons in the CLARIN Infrastructure
Fedora Commons in the CLARIN InfrastructureFedora Commons in the CLARIN Infrastructure
Fedora Commons in the CLARIN Infrastructure
 
ISOcat and RELcat, two cooperating semantic registries
	ISOcat and RELcat, two cooperating semantic registries	ISOcat and RELcat, two cooperating semantic registries
ISOcat and RELcat, two cooperating semantic registries
 
Semantic Mapping in CLARIN Component Metadata.
Semantic Mapping in CLARIN Component Metadata.Semantic Mapping in CLARIN Component Metadata.
Semantic Mapping in CLARIN Component Metadata.
 
A CMD Core Model for CLARIN Web Services
A CMD Core Model for CLARIN Web ServicesA CMD Core Model for CLARIN Web Services
A CMD Core Model for CLARIN Web Services
 
LDL 2012 - Linking to ISOcat Data Categories
LDL 2012 - Linking to ISOcat Data CategoriesLDL 2012 - Linking to ISOcat Data Categories
LDL 2012 - Linking to ISOcat Data Categories
 
What do cats have to do with explicit semantics?
What do cats have to do with explicit semantics?What do cats have to do with explicit semantics?
What do cats have to do with explicit semantics?
 
ISOcat to LMF to TEI
ISOcat to LMF to TEIISOcat to LMF to TEI
ISOcat to LMF to TEI
 
On the way to a Relation Registry for ISOcat data categories
On the way to a Relation Registry for ISOcat data categoriesOn the way to a Relation Registry for ISOcat data categories
On the way to a Relation Registry for ISOcat data categories
 
The ISO-DCR
The ISO-DCRThe ISO-DCR
The ISO-DCR
 
Use of ISOcat within CMDI
Use of ISOcat within CMDIUse of ISOcat within CMDI
Use of ISOcat within CMDI
 
ISOcat: a short introduction
ISOcat: a short introductionISOcat: a short introduction
ISOcat: a short introduction
 
Sustainable operability: Keeping complex linguistic resources alive.
Sustainable operability: Keeping complex linguistic resources alive.Sustainable operability: Keeping complex linguistic resources alive.
Sustainable operability: Keeping complex linguistic resources alive.
 

Collaboratively Defining Widely Accepted Linguistic Data Categories in the ISOcat Data Category Registry

  • 1. Collaboratively Defining Widely Accepted Linguistic Data Categories in the ISOcat Data Category Registry
      Menzo Windhouwer, The Language Archive – DANS (tla.mpi.nl, menzo.windhouwer@dans.knaw.nl)
      28 March 2013, eHg - New Trends in e-Humanities
      www.isocat.org
  • 2. The Language Archive
      • Founded in September 2011
      • Supported by MPG, BBAW and KNAW (DANS)
      • Grown out of the Technical Group at the MPI for Psycholinguistics
      • Since the 1990s: challenge of archiving digital data
      • 2000 – 2016 VolkswagenFoundation DOBES project on Endangered Languages
      • Active in many European infrastructure projects: CLARIN, EUDAT, DASISH, …
  • 3. Language Archiving Technology
      • Full lifecycle support
          – Core: resources
          – Key: metadata
          – ‘New’: CMDI, ISOcat, AV recognition, …
      • Archive size:
          – 70 TB of resources
          – 22,000 hours AV recordings
          – 75,000 sessions (metadata)
          – 5 million annotated segments
          – 50 lexica
      • My focus: Knowledge Systems
          – LEXUS, an online lexicon tool
          – ISOcat and companions
  • 4. Typological Database Nijmegen
      TOP NOTION tds:Noun GROUPS{
        NOTION tdn:GrammaticalDistinctions
          LABEL "Grammatical distinctions for nouns."
          GROUPS {
            NOTION tdn:AgentNouns
              LABEL "Agent nouns."
              DESCRIPTION "Nouns can function as the agent of a clause."
              LINK TO CONCEPT agentRole
              GROUPS {
                NOTION tdn:v098_plusAffix
                  LABEL "Agent nouns formed by verb stem plus affix."
                  LINK TO CONCEPTS (agentRole, verbalMorphology, boundAffix)
                  DESCRIPTION <p>Agent nouns are formed by a verb stem plus an affix, e.g. English <qv>walk-er</qv>.</p>
                  NOTE AUTHOR IS "TDS" TYPE IS "original TDN label" "AGENT NOUNS ARE VERB STEM PLUS AFFIX"
                  IS FIELD v098;
                  ...
      Notes: TDN is not archived in TLA; it was curated in TDS, a previous project I worked on, and is now archived at DANS. Also, this is not a TDN punch card.
  • 5. DOBES corpora
  • 6. Oxford English Dictionary
      Source: http://www.oxford-royale.co.uk/news/2010/12/04/new-online-edition-of-oxford-english-dictionary.html
  • 7. Terminology Community of Practice
      • Community started out on paper (A5 fiches), just like the OED
      • 1980s - 1990s: projects to standardize data category names, i.e., the names of the ‘fields’ on the fiches, in the files or in the database records
      • ISO 12620:1999 Data Categories, a companion standard to ISO 12200 Machine-readable terminology interchange format (MARTIF)
  • 8. ISO 12620:1999
  • 9. Towards a Data Category Registry
      • Problems with ISO 12620:1999, a hardcoded list of data categories
          – Not easily extensible
          – Ordering heavily debated
          – Outdated and limited in range at the moment of release
      • Developments
          – In the SALT project an interchange model (TBX) based on MARTIF/data categories was created, which was widely adopted
          – ISO 11179 Metadata Registries was released, which describes the standardization of data element concepts for metadata
          – ISO released Annex ST Standards as databases, which describes an ISO procedure to standardize registry entries
          – In the LIRICS project a pilot Data Category Registry, SYNTAX, was created
  • 10. ISO 12620:2009
      • Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
          – A data model for data category specifications inspired by ISO 11179
          – A procedure to standardize data category specifications, compliant with Annex ST
          – Each data category gets a unique Persistent Identifier (PID)
          – The Max Planck Institute for Psycholinguistics is appointed as the Registration Authority of the ISO/TC 37 DCR
      • In use by a growing number of ISO TC 37 standards
          – Lexical Markup Framework (LMF)
          – Linguistic Annotation Framework (LAF)
          – Morpho-syntactic Annotation Framework (MAF)
          – …
          – could be more, e.g., Feature System Declarations (FSD)
  • 11. Example Data Category specification
      • Data category: /Grammatical gender/
          – Administrative part:
              • Identifier: grammaticalGender
              • PID: http://www.isocat.org/datcat/DC-1297
          – Descriptive part:
              • English definition: Category based on (depending on languages) the natural distinction between sex and formal criteria.
              • French definition: Catégorie fondée (selon la langue) sur la distinction naturelle entre les sexes ou d'autres critères formels.
          – Linguistic part:
              • Morphosyntax conceptual domain: /masculine/, /feminine/, /neuter/
              • French conceptual domain: /masculine/, /feminine/
  • 12. Standardization procedure
      [Flow diagram. Steps: Submission, Evaluation, Validation, Publication, with a "rejected" exit after Evaluation and after Validation. Groups shown: Thematic Domain Group, Data Category Registry Board, Stewardship group, Decision Group.]
  • 13. Thematic Domain Groups
      • TDGs are the owners and guardians of a coherent subset of the DCR
      • TDGs own one or more profiles
      • Each TDG has a chair
      • A number of members assigned by SC P members
      • A number of expert members invited by the chair (up to 50%)
      • TDGs are constituted at the TC37/SC plenary
      • New TDGs need to be proposed by an SC
      TDG 1: Metadata
      TDG 2: Morphosyntax
      TDG 3: Semantic Content Representation
      TDG 4: Syntax
      TDG 6: Language Resource Ontology
      TDG 7: Lexicography
      TDG 8: Language Codes
      TDG 9: Terminology
      TDG 11: Multilingual Information Management
      TDG 12: Lexical Resources
      TDG 13: Lexical Semantics
      1. Translation
      2. (Sign language)
  • 14. ISOcat - the ISO TC 37/DCR
      • A (coherent) set of Data Categories, in our case for linguistic resources
      • A system to manage this set:
          – Create and edit Data Categories
          – Share Data Categories, e.g., resolve PID references
          – Standardize Data Categories
      • An API for tools to access the DCR
      • Grass roots approach
          – Anyone can access the DCR and use or create the data categories (s)he needs
  • 15. Referring to ISOcat data categories
      • PIDs of data categories can easily be embedded in XML documents
          <lmf:LexicalEntry>
            <tei:f name="partOfSpeech"
                   dcr:datcat="http://www.isocat.org/datcat/DC-1345"
                   fVal="commonNoun"
                   dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256"/>
            <lmf:Lemma type="Form">
              <tei:f name="writtenForm"
                     dcr:datcat="http://www.isocat.org/datcat/DC-1836"
                     fVal="clergyman"/>
            </lmf:Lemma>
          </lmf:LexicalEntry>
      • Embedding in other formats is also possible, e.g., via comments
      • Preferably annotate schemas, so a whole range of resources is annotated in one go
  • 16. A glimpse of ISOcat
  • 17. Collaboration in ISOcat
      • Registered users can contact each other via mediated email
          – Ask the owner if a data category can be adapted a little to your needs
      • Registered users can start up a group and invite other users to join
          – Work together on a set of data categories
          – Interact via a public and/or private forum
      • A group can submit data categories for ISO standardization
  • 18. Component MetaData Infrastructure
      • CMDI is developed by CLARIN and is on its way to standardization by ISO TC 37
          – Limitations of existing metadata schemas: DC/OLAC, IMDI, TEI header
              • Inflexible: too many (IMDI) or too few (OLAC) metadata elements
              • Limited interoperability (both semantic and syntactic)
              • Problematic (unfamiliar) terminology for some sub-communities
              • Limited support for LT tool & services descriptions
          – The idea is to address this by:
              • Explicitly defined schema & semantics
              • User/project/community defined components
  • 19. CMDI architecture
      [Architecture diagram. Labels: ISOcat, Relation Registry, component registry & editor, metadata modeler, metadata catalogue, metadata editor, metadata creator, metadata user, search & semantic mapping, joint metadata repository, local metadata repository, metadata curator (twice), OAI-PMH service provider, OAI-PMH data provider, DATA.]
  • 20. Athens Core
      • Bootstrapped the Metadata data categories selection in ISOcat
          – Based on existing metadata standards, e.g., DC, OLAC, IMDI, TEI
          – Many translations in European languages
      • Users add the data categories they need to the Metadata profile and use them in CMDI
  • 21. CMDI architecture
      [Architecture diagram. Labels: ISOcat, Relation Registry, component registry & editor, metadata modeler, metadata catalogue, metadata editor, metadata creator, metadata user, search & semantic mapping, joint metadata repository, local metadata repository, metadata curator (twice), OAI-PMH service provider, OAI-PMH data provider, DATA.]
  • 22. CMDI architecture
      [Architecture diagram. Labels: ISOcat, Relation Registry, component registry & editor, metadata modeler, metadata catalogues (VLO, MI), metadata editor, metadata creator, metadata user, search & semantic mapping, joint metadata repository, local metadata repository, metadata curator (twice), OAI-PMH service provider, OAI-PMH data provider, DATA.]
  • 23. CMDI (intermediate) results
      • Diverse metadata profiles
          – Centers or projects create specific ones, but reuse components where possible
      • Shared and explicit semantics help to overcome
          – Terminological differences
          – Differences in structure
      • Future
          – Get more context sensitive
              • e.g. documentation language vs. speaker language
          – Crosswalks
              • equivalent metadata data categories are easily introduced due to the open nature of ISOcat
          – User specific relationships
              • e.g. theory specific differences can be more important to one user than to another
  • 24. Metadata TDG
      • Standardization efforts of the Metadata TDG stalled
          – Large overlap with the work/people at the Athens-Core meetings
              • Community level agreement is maybe enough
          – Activity and motivation should not depend on one person only, the TDG chair
              • The need for explicit and shared semantics is not clear enough yet … more evangelization needed
          – Unfamiliarity with the work
              • Terminologists are more used to this kind of review work
              • Online review vs. old ISO ‘paper’ process
          – Members have little time, it is difficult to sync schedules
              • TDG experts tend to be senior scientists
              • Continuous process vs. sporadic bursts of activity
          – Unpaid work
              • Project funding vs. wide acceptance in the community
              • However, a project might bootstrap a thematic domain
      • The same problems hold for other TDGs
          – Current tendency to tie data category (selection) standardization to a new/revised standard, e.g., MAF and TBX
          – Redesign of the standardization process is coming up
              • ISO is not actively supporting Annex ST Standards as Databases anymore
  • 25. Community efforts
      • LMF-related: UBY, RELISH/GOLD
      • Sign Language
      • CLARIN
          – CMDI, Athens Core
          – CLARIN-NL/VL
              • Call 1 – 4 projects created CMDI and annotated resources/schemas
              • ISOcat content coordinator: Ineke Schuurman
                  – Tutorials, guidelines (do’s and don’ts) and feedback
      • Better community support in ISOcat
          – Views, e.g., CLARIN-NL/VL
          – Recommended by, e.g., DC-4949
          – …
  • 26. Conclusions and future work
      • Communities can already create a coherent view on ISOcat
          – the CMDI use case shows potential
          – maybe funder support is needed to bootstrap specific domains
      • The standardized core will take (a long) time
          – like all standardization work
      • Next to metadata also content
          – explicit semantics would be profitable even when not shared and/or used for resource discovery
          – tools that support ISOcat will make it easier to create such resources
      • Companion registries:
          – relations between data categories (RELcat)
          – annotated schemas for language resources (SCHEMAcat)
          – interaction with the CLARIN vocabulary service (CLAVAS)
      • Data categories vs. concepts
  • 27. Detour: ISOcat and LOD/Semantic Web
      • Archives and infrastructures look at the resources as they are, i.e., in general no conversions to triples
      • However, ISOcat data categories can easily be used in RDF resources
          :partOfSpeech
              dcr:datcat <http://www.isocat.org/datcat/DC-396> ;
              rdfs:label "part of speech"@en ;
              rdfs:comment "A category assigned to a word based on its grammatical and semantic properties."@en .
      • The Relation Registry, which is a triple store, will in general support lightweight, semi-formal ontologies
      M. Windhouwer, S.E. Wright. Linking to linguistic data categories in ISOcat. LDL 2012.
  • 28. Thank you for your attention!
      Visit www.isocat.org
      Questions? www.isocat.org/forum/ or isocat@mpi.nl
      Acknowledgements: thanks to everyone at TLA, Sue Ellen Wright, Ineke Schuurman, Marc Kemps-Snijders, CLARIN-NL, CLARIN, ISO TC 37
  • 29. A whole litter of cats!
      [Diagram. Labels: Linguistic resource (schema), Linguistic knowledge base, Data categories, Containers, Concepts, Relation; Schema Registry - SCHEMAcat, Data Category Registry - ISOcat, Concept Registry, Relation Registry - RELcat.]
  • 30. ISO 11179: concepts vs. data elements/categories
      [Diagram. Label: ISO 12620 Data Categories.]
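
Slide 15 shows data category PIDs embedded in XML through dcr:datcat and dcr:valueDatcat attributes. As a minimal sketch of how a tool might collect those references, the Python below parses a copy of that lexical entry and lists each annotated feature with its PIDs. The namespace URIs are assumptions added only to make the fragment parseable (the slide does not declare them, and the lmf one is a placeholder), and collect_datcat_refs is a hypothetical helper name, not part of any ISOcat API.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the dcr: prefix; the slide does not declare it.
DCR_NS = "http://www.isocat.org/ns/dcr"

# Copy of the slide 15 entry. The xmlns declarations are added for this sketch;
# the lmf URI is a placeholder and the tei/dcr URIs are assumptions.
ENTRY = """\
<lmf:LexicalEntry xmlns:lmf="http://example.org/lmf"
                  xmlns:tei="http://www.tei-c.org/ns/1.0"
                  xmlns:dcr="http://www.isocat.org/ns/dcr">
  <tei:f name="partOfSpeech"
         dcr:datcat="http://www.isocat.org/datcat/DC-1345"
         fVal="commonNoun"
         dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256"/>
  <lmf:Lemma type="Form">
    <tei:f name="writtenForm"
           dcr:datcat="http://www.isocat.org/datcat/DC-1836"
           fVal="clergyman"/>
  </lmf:Lemma>
</lmf:LexicalEntry>
"""


def collect_datcat_refs(xml_text):
    """Return one (name, value, datcat PID, valueDatcat PID or None) tuple
    for every element that carries a dcr:datcat attribute."""
    root = ET.fromstring(xml_text)
    # ElementTree stores namespaced attributes under "{uri}local" keys.
    datcat_key = "{%s}datcat" % DCR_NS
    value_key = "{%s}valueDatcat" % DCR_NS
    refs = []
    for elem in root.iter():  # document order, root included
        if datcat_key in elem.attrib:
            refs.append((
                elem.get("name"),
                elem.get("fVal"),
                elem.get(datcat_key),
                elem.get(value_key),  # None when the value is not a data category
            ))
    return refs


refs = collect_datcat_refs(ENTRY)
for name, value, pid, value_pid in refs:
    print(name, value, pid, value_pid)
```

The same loop would work on an annotated schema instead of an instance document, which matches the slide's advice to annotate schemas so that a whole range of resources is covered in one go.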

Editor's notes

  1. PWMNLP chapter: “The most well-known early example of a structured community-based effort to create a major language resource was perhaps the Oxford English Dictionary (OED): begun in 1857, the ‘community’ in question grew from a relatively small group of dictionary-aficionados to include hundreds of men and women scholars scattered across the English-speaking world documenting words and word forms using quasi-uniform ‘slips’ designed primarily to document usage and provenance as identified in significant works of English literature. Here the designation of types of information (main forms, part of speech, etymology, etc.) in word-oriented lexical entries is achieved by the now-famous Oxford entry layout, which uses font variation to represent the different kinds of information contained in a lexicographical entry.”