Graph-based Ontology Analysis in the Linked Open Data

Lihua Zhao, Ryutaro Ichise
September 5, 2012, I-Semantics2012, Graz, Austria
Outline
   Introduction
   Related Work
   Our Approach
     Graph Pattern Extraction
     <Predicate, Object> Collection
     Related Classes and Predicates Grouping
     Integration for All Graph Patterns
     Manual Revision
   Experiments
     Experimental Data
     Graph Patterns of Linked Instances
     Class-level Analysis
     Predicate-level Analysis
   Comparison with Previous Work
   Conclusion and Future Work
                 Lihua Zhao, Ryutaro Ichise | Graph-based Ontology Analysis in the Linked Open Data | 2
Introduction
Linked Open Data (LOD)
   295 data sets, 31 billion RDF triples (as of Sep. 2011).
   Interlinked instances (owl:sameAs).
[Figure: LOD cloud diagram of interlinked data sets, including DBpedia, Freebase, GeoNames, MusicBrainz, DBLP, and data.gov.uk sources.]
                                                                                                                                                                                                          lingvoj                                                            Disea-
                                              (RKB                                                                                                                                                                                                                           some              SIDER                                                                          RAE2001               castle                                          LOCAH
                                            Explorer)                                                                    Linked                                                                                                                                                                                                       Eurécom
                            CORDIS                                                                                                                                                                                                                           Drug                                                                                                                                                      Roma
                                                                                               Eurostat               Sensor Data                                                                                                                                                                                                                   CiteSeer
                             (FUB)                                                            (Ontology                                                                                                                                                      Bank
                                                               GovTrack                                                (Kno.e.sis)                 riese               Open                                                                                                                                           Pfam                                                                                                          Course-
                                                                                               Central)                                                                                                             Enipedia         LinkedCT
                                                                                                                                                                        Cyc              Lexvo                                                                                                                                                                                                                                       ware
                                              Linked                                                                                                                                                                                                                        UniProt                 PDB                                     VIVO
                        EURES                EDGAR                                                                                                                                                                                                                                                                                                                ePrints                       dotAC
                                                                                 US SEC                                                                                                                                                                                                                                                   Indiana                                                                        IEEE
                                            (Ontology                                                                                                                                                totl.net
                                                                               (rdfabout)
                                             Central)                                                                                                           WordNet                                                                                                                                                                                                                                                                   RISKS
                                                                                                                                                                 (VUA)                                                             Taxo-                UniProt
                                                                                                   US Census                EUNIS                Twarql                                                                                                (Bio2RDF)                                                     HGNC
                                                            Semantic                               (rdfabout)                                                                       Cornetto                                       nomy                                                                                                                   VIVO
                                 FTS                          XBRL                                                                                                                                                                                                              PRO-           ProDom                                 STITCH             Cornell                LAAS
                                                                                                                                                                                                                                                                                SITE                                                                                                                                            NSF
                                             Scotland                                                                                                                                                                                                                                                                                                                                                       KISTI
                                                                                Geo-                                                                                                                        LODE
                                               Geo-
                                              graphy                           WordNet                                                                           WordNet            WordNet                                                                                                                                                                                                 JISC
                                                                                                                                                                  (W3C)               (RKB                                                  Affy-
                                                                                                                       Climbing
                                                                                                                                              Linked                                                                                                                                                                                                KEGG
                                                                                                     SMC                                                                            Explorer)                              SISVU            metrix                                                                             Pub                  Drug               VIVO UF
                                                                Piedmont                                                                     GeoData                                                                                                             PubMed                                                                                                                                          ECCO-                            Media
                                                Finnish                                            Journals                                                                                                                                                                              Gene                 SGD             Chem
                                                                Accomo-                                                                                                                                                                                                                                                                                                                                           TCP
                                                Munici-          dations          El Viajero                                                                                                                                                                                            Ontology
                                                palities                                                                                                         Alpine                           AGROVOC                                                                                                                                                                                bible
                                                                                   Tourism                                                                         Ski                                                                                                                                                                                                                 ontology                                          Geographic
                                                                                                                                                                 Austria
                                                                                                                                                                                                                                                                                                                                                                KEGG
                                                                                                    Ocean                                                                                                                                                                                                                                                      Enzyme                                   PBAC
                                                                                                                                                                               GEMET                                               ChEMBL
                                                                       Italian                     Drilling                           Metoffice                                                                                                           OMIM                                                                               KEGG
                                                                                                                      AEMET            Weather                                                   Open                                                                                                                                                                                                                                   Publications
                                                                       public                      Codices                                                                                                          Linked                                                                          MGI                                   Pathway
                                                                                                                                      Forecasts                                                  Data                                                                           InterPro                                GeneID
                                                                      schools
                                                                                                                                                               EARTh                            Thesau-              Open                                                                                                                                                             KEGG
                                                                                     Turismo                                                                                                      rus               Colors                                                                                                                                                           Reaction                          User-generated content
                                                                                        de
                                                                                    Zaragoza                                                                                 Product                                                 Smart                                                                                                                              KEGG
                                                                                                                                                Weather                        DB                                                     Link                                                                    Medi                                                      Glycan
                                                                                                           Janus                                Stations                                   Product                                                                                                            Care                                        KEGG
                                                                                                                                                                                                                                                                                                                                                                                                                                        Government
                                                                                                            AMP                                                                                                                                      UniParc             UniRef              UniSTS
                                                                                                                                                                                            Types                Italian
                                                                                                                                                                                                                                                                                                                                        Homolo-
                                                                                                                                                                                                                                                                                                                                                          Com-
                                                                                                                                Yahoo!                         Airports                    Ontology             Museums                                                                                                                                   pound
                                                                                                                                                                                                                              Google                                                                                                     Gene                                                                                         Cross-domain
                                                                                                                                 Geo
                                                                                                                                                                                                                                Art
                                                                                                                                Planet          National                                                                      wrapper
                                                                                                                                                                                                                                                                                                                         Chem2
                                                                                                                                                 Radio-                                                                                                                                                                 Bio2RDF
                                                                                                                                                activity                                                                                                                                                    Uni                                                                                                                         Life sciences
                                                                                                                                                   JP                       Sears               Open                                                 Linked                              OGOLOD           Pathway
                                                                                                                                                                                                Corpo-           Amster-                                              Reactome
                                                                                                                                                                                                                  dam               medu-             Open
                                                                                                                                                                                                 rates                                              Numbers
                                                                                                                                                                                                                 Museum             cator
                                                                                                                                                                                                                                                                                                                                                                                                          As of September 2011




Challenging Problems

It is infeasible to understand the ontology schemas of all linked data sets.
   Ontology heterogeneity problem
      Heterogeneous ontology classes
         DBpedia: http://dbpedia.org/ontology/Country.
         Geonames: http://www.geonames.org/ontology#A.PCLI.
         LinkedMDB: http://data.linkedmdb.org/resource/movie/country.
      Heterogeneous ontology predicates
         http://dbpedia.org/property/populationTotal.
         http://dbpedia.org/property/population.
   Time-consuming and infeasible to inspect large ontologies
      Misuse of classes and predicates
      DBpedia: 320 classes and thousands of predicates.



Solution for the Problems



Automatically or semi-automatically integrate different ontologies
by analyzing interlinked instances.
   Semi-automatic ontology integration
      Reduce the ontology heterogeneity.
      Identify important ontology classes and predicates that link instances.
      Provide a simple, easy-to-understand integrated ontology.
      Simplify the queries on various data sets.




Related Work
  Find useful attributes from frequent graph patterns. [Le, et al.,
  2010]
     Only for geographic data.
  Analysis of basic predicates of SameAs network, Pay-Level-Domain
  network and Class-Level Similarity network. [Ding, et al., 2010]
     Only frequent types are considered to analyze how data are connected.
  A debugging method for mapping lightweight ontologies. [Meilicke,
  et al., 2008]
     Limited to expressive lightweight ontologies.
  Construct intermediate-layer ontology from geospatial, zoology, and
  genetics data resources. [Parundekar, et al., 2010]
     Only for specific domains, and only considers the class level.
  Construct an integrated mid-ontology from DBpedia, Geonames,
  and NYTimes. [Zhao, et al., 2011]
     Needs a hub data set, and only considers the predicate level.
Our Approach




Step 1: Graph Pattern Extraction




Graph Pattern Extraction
Extract graph patterns from interlinked instances to discover
related ontology classes and predicates.
   SameAs Graph SG = (V, E, I), where V is a set of data set labels,
   E ⊆ V × V is a set of edges between data sets, and I is a set of
   URIs of the interlinked instances.




   Example: SGAustria = (V, E, I)
      V = {D, G, N, M}
      E = {(D,G), (D,N), (G,N), (G,M)}
      I = { db:Austria, geo:2782113, nyt:66221058161318373601,
      mdb-country:AT}.
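
As a rough illustration of the definition above, the following sketch builds (V, E, I) from a list of owl:sameAs links between instance URIs. The prefix-to-label table, the function names, and the direction of each link in the Austria example are assumptions made for this sketch; the slides only give the resulting V, E, and I.

```python
from collections import namedtuple

SameAsGraph = namedtuple("SameAsGraph", ["V", "E", "I"])

# Hypothetical mapping from URI prefix to data set label
# (D = DBpedia, G = Geonames, N = NYTimes, M = LinkedMDB).
LABELS = {"db": "D", "geo": "G", "nyt": "N", "mdb-country": "M"}


def label_of(uri):
    """Return the data set label for a prefixed instance URI."""
    prefix = uri.split(":", 1)[0]
    return LABELS[prefix]


def build_same_as_graph(links):
    """Build SG = (V, E, I) from (subject, object) owl:sameAs links.

    Assumes all links belong to one group of interlinked instances.
    """
    V, E, I = set(), set(), set()
    for subj, obj in links:
        I.update((subj, obj))
        V.update((label_of(subj), label_of(obj)))
        E.add((label_of(subj), label_of(obj)))
    return SameAsGraph(frozenset(V), frozenset(E), frozenset(I))


# Assumed link directions, chosen to reproduce the E given on the slide.
austria_links = [
    ("db:Austria", "geo:2782113"),
    ("db:Austria", "nyt:66221058161318373601"),
    ("geo:2782113", "nyt:66221058161318373601"),
    ("geo:2782113", "mdb-country:AT"),
]
sg_austria = build_same_as_graph(austria_links)
```

Here `sg_austria.E` plays the role of the graph pattern: other groups of interlinked instances with the same labeled edges would share it.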
Step 2: <Predicate, Object> Collection




<Predicate, Object> Collection


An instance is described by a collection of <subject, predicate, object> triples.
(instance URI → subject, property → predicate, class → object)
   <predicate, object> (PO) pairs form the content of a SameAs Graph.
   Classify PO pairs into five types
      Class: rdf:type and skos:inScheme.
      Date: XMLSchema:date, gYear, gMonthDay, etc.
      Number: XMLSchema:integer, int, float, double, etc.
      URI: starts with “http://” or is typed as XMLSchema:anyURI.
      String: XMLSchema:string and others.
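
The five-way classification can be sketched as a small function. This is only an illustration under assumptions: the `xsd:` prefixed datatype strings, the function name, and the order in which the rules are checked are not given on the slides, and the "etc." datatypes are left out.

```python
# Predicates whose objects are treated as classes (from the slide).
CLASS_PREDICATES = {"rdf:type", "skos:inScheme"}
# Only the datatypes named on the slide; the "etc." cases are omitted.
DATE_TYPES = {"xsd:date", "xsd:gYear", "xsd:gMonthDay"}
NUMBER_TYPES = {"xsd:integer", "xsd:int", "xsd:float", "xsd:double"}


def classify_po(predicate, obj, datatype=None):
    """Return one of 'Class', 'Date', 'Number', 'URI', 'String'.

    `obj` is the lexical value as a string; `datatype` is an optional
    prefixed XML Schema datatype name (an assumed representation).
    """
    if predicate in CLASS_PREDICATES:
        return "Class"
    if datatype in DATE_TYPES:
        return "Date"
    if datatype in NUMBER_TYPES:
        return "Number"
    if datatype == "xsd:anyURI" or obj.startswith("http://"):
        return "URI"
    # xsd:string and everything else falls back to String.
    return "String"


examples = [
    classify_po("rdf:type", "db-onto:Country"),
    classify_po("nyt-prop:first_use", "2004-10-04", "xsd:date"),
    classify_po("db-prop:populationEstimate", "8356707", "xsd:integer"),
    classify_po("db-onto:wikiPageExternalLink", "http://www.austria.mu/"),
    classify_po("rdfs:label", "Austria"),
]
```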




An Example of Collected PO pairs

               Table: PO pairs and types for SGAustria
          Predicate                       Object                      Type
          rdf:type                        owl:Thing                   Class
          rdf:type                        db-onto:Place               Class
          rdf:type                        db-onto:PopulatedPlace      Class
          rdf:type                        db-onto:Country             Class
          rdfs:label                      “Austria”@en                String
          db-onto:wikiPageExternalLink    http://www.austria.mu/      URI
          db-prop:populationEstimate      8356707                     Number
          ......                          ......                      ......
          geo-onto:name                   Austria                     String
          geo-onto:alternateName          “Austria”@en                String
          geo-onto:alternateName          “Republic of Austria”@en    String
          geo-onto:featureClass           geo-onto:A                  Class
          geo-onto:featureCode            geo-onto:A.PCLI             Class
          geo-onto:population             8205000                     Number
          ......                          ......                      ......
          rdf:type                        mdb:country                 Class
          mdb:country_name                Austria                     String
          ......                          ......                      ......
          skos:inScheme                   nyt:nytd_geo                Class
          skos:prefLabel                  “Austria”@en                String
          nyt-prop:first_use              2004-10-04                  Date



Step 3: Related Classes and Predicates Grouping




Related Classes Grouping

Group related classes from each SameAs Graph by tracking the
subsumption relations rdfs:subClassOf and skos:inScheme.
   < C1 rdfs:subClassOf C2 > or < C1 skos:inScheme C2 > means the
   concept of class C1 is more specific than the concept of class C2.
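
One plausible reading of "tracking subsumption relations" is to keep, for each data set, only the classes that have nothing more specific below them in the collected set. The sketch below follows that reading; the function name and the input representation are assumptions, and it only looks at direct relations between classes that were actually collected.

```python
def most_specific_classes(classes, narrower_than):
    """Return the classes that have no more specific class in `classes`.

    `narrower_than` is a set of (C1, C2) pairs meaning that C1 is more
    specific than C2. Only direct relations are considered (assumption).
    """
    has_narrower = {
        c2 for c1, c2 in narrower_than
        if c1 in classes and c1 != c2
    }
    return {c for c in classes if c not in has_narrower}


# Classes collected for db:Austria in the PO table, with an
# illustrative subsumption chain between them.
dbpedia_classes = {
    "owl:Thing", "db-onto:Place",
    "db-onto:PopulatedPlace", "db-onto:Country",
}
relations = {
    ("db-onto:Country", "db-onto:PopulatedPlace"),
    ("db-onto:PopulatedPlace", "db-onto:Place"),
    ("db-onto:Place", "owl:Thing"),
}
specific = most_specific_classes(dbpedia_classes, relations)
```

Under this reading, the most specific class of each data set in a SameAs Graph would then be placed in the same group of related classes.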




Related Predicates Grouping
Perform pairwise comparison on <predicate, object> (PO) pairs to
find out related predicates (properties).
    Discover related predicates using different methods for the
    types of Date, URI, Number, and String.
      Date, URI: exact matching.
      Number, String: exact matching + similarity matching.
Exact matching on PO pairs to create initial sets of PO pairs.
                If $O_{PO_i} = O_{PO_j}$ or $P_{PO_i} = P_{PO_j}$
                            $\Rightarrow S_k \leftarrow PO_i, PO_j$
            $O_{PO}$: the object of PO.
            $P_{PO}$: the predicate of PO.
            $S$: initial set of PO pairs.
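
The exact-matching rule can be sketched with a small union-find structure: two PO pairs land in the same initial set when they share a predicate or an object. Treating the rule as transitive (chains of matches end up in one set) and the function name are assumptions of this sketch.

```python
def initial_sets(po_pairs):
    """Group (predicate, object) pairs that share a predicate or an object."""
    parent = list(range(len(po_pairs)))

    def find(i):
        # Find the representative of i, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # ("P", predicate) or ("O", object) -> first index
    for idx, (pred, obj) in enumerate(po_pairs):
        for key in (("P", pred), ("O", obj)):
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    groups = {}
    for idx, po in enumerate(po_pairs):
        groups.setdefault(find(idx), []).append(po)
    return list(groups.values())


pairs = [
    ("geo-onto:name", "Austria"),
    ("mdb:country_name", "Austria"),
    ("geo-onto:alternateName", "Republic of Austria"),
    ("geo-onto:alternateName", "Austria"),
    ("geo-onto:population", "8205000"),
]
sets = initial_sets(pairs)
```

In this example the four name-like pairs form one initial set (linked by the shared object "Austria" and the shared predicate geo-onto:alternateName), while the population pair stays alone.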
Related Predicates Grouping


Similarity matching on PO pairs of type Number and String.
   Similarity between $PO_i$ and $PO_j$:

   $$Sim(PO_i, PO_j) = \frac{ObjSim(PO_i, PO_j) + PreSim(PO_i, PO_j)}{2}$$

   Merge similar initial sets $S_i$ and $S_j$
   if $Sim(PO_i, PO_j) \geq \theta$, where $PO_i \in S_i$ and $PO_j \in S_j$.
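
The merging step can be sketched as follows, with the similarity function and the threshold passed in by the caller. The sketch assumes that a single pair above the threshold is enough to merge two sets; the function name, the placeholder labels, the score table, and the threshold value 0.9 are all hypothetical.

```python
from itertools import product


def merge_similar_sets(initial_sets, sim, theta):
    """Merge sets that contain at least one cross pair with sim >= theta."""
    merged = []
    for current in initial_sets:
        current = list(current)
        keep = []
        for other in merged:
            # Absorb every already-built set that is similar to `current`.
            if any(sim(a, b) >= theta for a, b in product(current, other)):
                current.extend(other)
            else:
                keep.append(other)
        merged = keep + [current]
    return merged


# Hypothetical precomputed similarity scores between three PO pairs.
scores = {frozenset(("PO1", "PO2")): 0.95, frozenset(("PO2", "PO3")): 0.40}


def table_sim(a, b):
    """Stand-in for Sim: look the score up, defaulting to 0."""
    return scores.get(frozenset((a, b)), 0.0)


merged_sets = merge_similar_sets([["PO1"], ["PO2"], ["PO3"]], table_sim, theta=0.9)
```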




Related Predicates Grouping
   Similarity of objects between two PO pairs.

   $$ObjSim(PO_i, PO_j) = \begin{cases} 1 - \dfrac{|O_{PO_i} - O_{PO_j}|}{O_{PO_i} + O_{PO_j}} & \text{if } O_{PO} \text{ is Number} \\ StrSim(O_{PO_i}, O_{PO_j}) & \text{if } O_{PO} \text{ is String} \end{cases}$$

     $O_{PO}$: the object of PO.
     $StrSim(O_{PO_i}, O_{PO_j})$: the average of three string-based similarity
     values: Jaro-Winkler, Levenshtein distance, and n-gram.
   Similarity of predicates between $PO_i$ and $PO_j$.

   $$PreSim(PO_i, PO_j) = WNSim(T_{PO_i}, T_{PO_j})$$

     $T_{PO}$: the pre-processed terms of the predicates in PO.
     $WNSim(T_{PO_i}, T_{PO_j})$: the average of the nine applied WordNet-based
     similarity values.
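
A hedged sketch of ObjSim follows. The numeric branch implements the formula above directly, assuming non-negative numbers. The string branch is a simplification: it averages only two measures (a length-normalized Levenshtein similarity and a bigram Dice coefficient) and omits Jaro-Winkler for brevity; the normalizations are assumptions, since the slide does not state how the three values are scaled. PreSim and the WordNet-based measures are not covered.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]


def levenshtein_sim(a, b):
    # Assumed normalization: 1 - distance / length of the longer string.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def bigram_sim(a, b):
    # Assumed n-gram measure: Dice coefficient over character bigram sets.
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a and not grams_b:
        return 1.0 if a == b else 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def str_sim(a, b):
    # Simplified StrSim: two measures instead of the three named on the slide.
    return (levenshtein_sim(a, b) + bigram_sim(a, b)) / 2


def obj_sim(o_i, o_j):
    if isinstance(o_i, (int, float)) and isinstance(o_j, (int, float)):
        total = o_i + o_j
        # With non-negative inputs, total == 0 means both values are zero.
        return 1.0 if total == 0 else 1 - abs(o_i - o_j) / total
    return str_sim(str(o_i), str(o_j))


# Population values taken from the PO pair example for Austria.
population_sim = obj_sim(8356707, 8205000)
name_sim = obj_sim("Austria", "Republic of Austria")
```

The two population figures differ by well under 1% of their sum, so the numeric branch returns a value close to 1.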
Step 4: Integration for All Graph Patterns




Integration for All Graph Patterns
Groups of related classes and predicates are independent for each
graph pattern. Hence, we integrate them for all the graph patterns
to construct an integrated ontology.
    Select terms for integrated ontology.
      ex-onto:ClassTerm: select one concept from a set of classes.
      ex-prop:propTerm: select one concept from a set of predicates.
   Construct relations.
      ex-prop:hasMemberClasses: link sets of classes with
      ex-onto:ClassTerm.
      ex-prop:hasMemberDataTypes: link sets of predicates with
      ex-prop:propTerm.
   Construct an integrated ontology.
      Sets of related classes and predicates.
      Selected terms: ClassTerm and propTerm.
      Constructed relations.
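
As a rough illustration of the relation-construction step, the sketch below emits plain triples that link each selected term to its member classes or predicates. The direction of the relation (term as subject), the function name `build_relations`, and the idea that the terms have already been chosen are assumptions; the slide does not specify how a term is selected. The sample groups reuse names that appear elsewhere in the deck.

```python
def build_relations(class_groups, predicate_groups):
    """Return (subject, predicate, object) triples for the integrated ontology."""
    triples = []
    for term, members in class_groups.items():
        for member in sorted(members):
            triples.append((term, "ex-prop:hasMemberClasses", member))
    for term, members in predicate_groups.items():
        for member in sorted(members):
            triples.append((term, "ex-prop:hasMemberDataTypes", member))
    return triples


# Selected terms mapped to their member sets (terms assumed to be chosen already).
class_groups = {
    "ex-onto:Country": {"db-onto:Country", "geo-onto:A.PCLI", "mdb:country"},
}
predicate_groups = {
    "ex-prop:birthDate": {"db-onto:birthDate", "db-prop:dateOfBirth"},
}
triples = build_relations(class_groups, predicate_groups)
```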
Step 5: Manual Revision




Manual Revision


Minor revision process on the automatically constructed ontology.
   Modify incorrect terms
     Not all the terms for classes and predicates are properly selected.
   Add domain information
     About 40% of the predicate sets lack rdfs:domain information.
   Modify incorrectly grouped classes and predicates
     We cannot guarantee 100% accuracy.




Experiments



Analyze the characteristics of linked instances with the integrated
ontology constructed with our approach.
   Experimental Data
   Graph Patterns of Linked Instances
   Class-level Analysis
   Predicate-level Analysis




Experimental Data




   DBpedia: cross-domain, 3.5 million things, 8.9 million URIs.
   Geonames: geographical domain, 7 million URIs.
   NYTimes: media domain, 10,467 subject headings.
   LinkedMDB: media domain, 0.5 million entities.
Graph Patterns of Linked Instances

   13 graph patterns
   Frequent graph patterns: GP1, GP2, GP3
   N,G,D: GP4, GP5, GP7, GP8
   N,M,D: GP6
   M,G,D: GP9
   M,D,N,G: GP10, GP11, GP12, GP13

Class-level Analysis
Successfully integrated related classes from the extracted graph patterns.
   Characteristics of graph patterns
            Class Type                              Graph Pattern
            Actor                                   GP2, GP6
            Person (Athlete, Politician, etc.)      GP3
            Organization/Agent                      GP1, GP3, GP8
            Film                                    GP2
            City/Settlement                         GP1, GP4, GP5, GP7, GP8
            Country                                 GP9, GP10, GP11, GP12, GP13
            Place (Mountain, River, etc.)           GP1, GP3, GP7

   Integrated 97 classes into 48 groups
      Example: ex-onto:Country
       db-onto:Country     geo-onto:A.PCLI
       mdb:country         nyt:nytd_geo

Class-level Analysis

   Discover missing class information
   Example: db:Shingo_Katori
     db:Shingo_Katori rdf:type dbpedia-owl:MusicalArtist.
     mdb-actor:27092 owl:sameAs db:Shingo_Katori.
   Therefore, db:Shingo_Katori rdf:type db-onto:Actor.
   Main classes of each data set.
     NYTimes: person, organization, and place.
     LinkedMDB: movie, actor, and country.
     Geonames: A (country, administrative region), P (city, settlement), T
     (mountain), S (building, school), and H (lake, river).
     DBpedia: person (artist, politician, athlete), organization (company,
     educational institute, sports team), work (film), and place (populated
     place, natural place, architectural structure).
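
The missing-class example above (db:Shingo_Katori) can be sketched as a lookup over owl:sameAs links and integrated class groups. Two facts used here are assumptions not stated on the slide: that mdb-actor:27092 is typed as `mdb:actor`, and that `mdb:actor` and `db-onto:Actor` belong to the same integrated group. The function name and the prefix-based filter are likewise hypothetical.

```python
def suggest_missing_types(same_as, types, class_groups, target_prefix):
    """Suggest classes for linked instances based on integrated class groups."""
    suggestions = {}
    for source, target in same_as:
        known = types.get(target, set())
        for cls in types.get(source, set()):
            for group in class_groups:
                if cls not in group:
                    continue
                for candidate in group:
                    # Only propose classes from the target data set's ontology
                    # that the target instance does not already have.
                    if candidate.startswith(target_prefix) and candidate not in known:
                        suggestions.setdefault(target, set()).add(candidate)
    return suggestions


same_as = [("mdb-actor:27092", "db:Shingo_Katori")]
types = {
    "mdb-actor:27092": {"mdb:actor"},                      # assumed type
    "db:Shingo_Katori": {"dbpedia-owl:MusicalArtist"},
}
class_groups = [{"mdb:actor", "db-onto:Actor"}]            # assumed group
suggested = suggest_missing_types(same_as, types, class_groups, "db-onto:")
```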


Predicate-level Analysis
   Integrated 367 predicates into 38 groups
   Example: ex-prop:birthDate
         Predicate                                       Number of Instances
         db-onto:birthDate                                           287,327
         db-prop:datebirth                                             1,675
         db-prop:dateofbirth                                          87,364
         db-prop:dateOfBirth                                         163,876
         db-prop:born                                                 34,832
         db-prop:birthdate                                            70,630
         db-prop:birthDate                                           101,121
   Recommend standard predicates
     <db-onto:birthDate, rdfs:domain, db-onto:Person>
     “db-onto:birthDate” has the highest frequency of usage
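
The recommendation rule stated above (pick the predicate with the highest usage frequency in its group) amounts to a one-line maximum. The sketch uses the instance counts from the table; the function name is an assumption.

```python
# Instance counts for the ex-prop:birthDate group, copied from the table above.
birth_date_group = {
    "db-onto:birthDate": 287_327,
    "db-prop:datebirth": 1_675,
    "db-prop:dateofbirth": 87_364,
    "db-prop:dateOfBirth": 163_876,
    "db-prop:born": 34_832,
    "db-prop:birthdate": 70_630,
    "db-prop:birthDate": 101_121,
}


def recommend_standard_predicate(usage_counts):
    """Return the predicate used by the largest number of instances."""
    return max(usage_counts, key=usage_counts.get)


recommended = recommend_standard_predicate(birth_date_group)
```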

Comparison with Previous Work

Compare our ontology integration approach with the mid-ontology
approach [Zhao, et al., JIST2011].


 Mid-ontology approach                                        Our approach
 A hub data set for data collection.                          No hub data set.
 String-based similarity measures for all types of objects.   Different similarity measures for different types of objects.
 105 predicates in 22 groups.                                 367 predicates in 38 groups.
 No classes.                                                  97 classes in 48 groups.



Conclusion and Future Work

   Conclusion
     Integrate heterogeneous ontologies from various data sets.
     Identify the characteristics of graph patterns using the integrated
     ontology classes.
     Recommend standard predicates using the integrated ontology
     predicates.
     Reduce the heterogeneity of ontologies.
     Construct an integrated ontology without learning the entire ontology
     schema.
   Future Work
     Use more data sets in the LOD cloud.
     Apply the MapReduce method to address the scalability and ontology
     heterogeneity problems.


Questions?
 Lihua Zhao, lihua@nii.ac.jp
Ryutaro Ichise, ichise@nii.ac.jp





Graph-based Ontology Analysis in the Linked Open Data

  • 1. Graph-based Ontology Analysis in the Linked Open Data Lihua Zhao, Ryutaro Ichise September 5, 2012, I-Semantics2012, Graz, Austria
  • 2. Outline Introduction Related Work Our Approach Graph Pattern Extraction <Predicate, Object> Collection Related Classes and Predciates Grouping Integration for All Graph Patterns Manual Revision Experiments Experimental Data Graph Patterns of Linked Instances Class-level Analysis Predicate-level Analysis Comparison with Previous Work Conclusion and Future Work Lihua Zhao, Ryutaro Ichise | Graph-based Ontology Analysis in the Linked Open Data | 2
  • 3. Introduction Linked Open Data (LOD) 295 data sets, 31 billion RDF triples (as of Sep. 2011). Interlinked instances (owl:sameAs). Linked LOV User Slide- tags2con Audio Feedback share2RDF delicious Moseley Scrobbler Bricklink Sussex Folk (DBTune) Reading St. GTAA Magna- Klapp- Lists Andrews tune Resource stuhl- NTU DB club Lists Resource Tropes Lotico Semantic yovisto John Music Man- Lists Music Tweet chester Hellenic Peel Brainz NDL (DBTune) (Data Brainz Reading subjects FBD (zitgist) Lists Open EUTC Incubator) Linked Hellenic Library t4gm Produc- Crunch- Open PD tions Surge RDF info base Library Radio Discogs ohloh Ontos Source Code Crime (Data Plymouth (Talis) News Ecosystem Reading RAMEAU LEM Reports business. Incubator) Portal Crime Linked Data Lists SH UK data.gov. Music Jamendo (En- uk AKTing) Brainz (DBtune) Linked Ox FanHubz gnoss ntnusc (DBTune) SSW LCCN Points Last.FM Poké- Thesau- Thesau- Popula- artists pédia Didac- rus rus W LIBRIS tion (En- (DBTune) Last.FM talia theses. LCSH Rådata reegle AKTing) research. patents. MARC data.gov. data.gov. (rdfize) my fr Codes nå! NHS uk uk Good- Experi- List Ren. Classical Energy (En- win flickr ment (DB Pokedex Family wrappr Norwe- Genera- AKTing) Mortality BBC Sudoc PSH Tune) gian tors (En- Program- AKTing) MeSH mes semantic IdRef GND CO2 education. OpenEI BBC web.org Energy SW Sudoc ndlna Emission data.gov. Music Dog VIAF EEA (En- uk Chronic- Linked (En- Food AKTing) ling Event MDB Portu- UB Mann- AKTing) Europeana America Media guese heim BBC DBpedia Calames Recht- Wildlife Deutsche Ord- Revyu DDC Open Openly spraak. Finder Bio- Election nance lobid Local graphie NSZL Data legislation Survey nl Tele- RDF Book data Ulm Resources Swedish Project data.gov.uk New Catalog EU Insti- graphis Mashup bnf.fr Open tutions York URI Greek Open P20 Cultural UK Post- Times Heritage Burner DBpedia Calais codes statistics. ECS Wiki lobid GovWILD data.gov. 
Taxon iServe South- Organi- uk LOIUS BNB Concept ECS ampton sations Brazilian Geo World BibBase STW GESIS OS ECS Poli- ESD Names Fact- South- ampton (RKB ticians stan- reference. book Budapest dards data.gov.uk Freebase EPrints Explorer) data.gov. intervals NASA uk Project OAI Lichfield transport. (Data Incu- DBpedia data Pisa Spen- Guten- dcs data.gov. bator) Fishes berg RESEX Scholaro- ding DBLP ISTAT uk of DBLP meter Immi- Scotland Geo (FU (L3S) Texas Uberblic gration Pupils & Species data- Berlin) DBLP IRIT Exams Euro- dbpedia (RKB London stat TCM open- ACM lite Gene ac- Explorer) IBM NVD Traffic Gazette (FUB) Geo Scotland TWC LOGD Eurostat Daily DIT uk Linked UN/ Data UMBEL Med ERA Data LOCODE DEPLOY Gov.ie CORDIS YAGO New- lingvoj Disea- (RKB some SIDER RAE2001 castle LOCAH Explorer) Linked Eurécom CORDIS Drug Roma Eurostat Sensor Data CiteSeer (FUB) (Ontology Bank GovTrack (Kno.e.sis) riese Open Pfam Course- Central) Enipedia LinkedCT Cyc Lexvo ware Linked UniProt PDB VIVO EURES EDGAR ePrints dotAC US SEC Indiana IEEE (Ontology totl.net (rdfabout) Central) WordNet RISKS (VUA) Taxo- UniProt US Census EUNIS Twarql (Bio2RDF) HGNC Semantic (rdfabout) Cornetto nomy VIVO FTS XBRL PRO- ProDom STITCH Cornell LAAS SITE NSF Scotland KISTI Geo- LODE Geo- graphy WordNet WordNet WordNet JISC (W3C) (RKB Affy- Climbing Linked KEGG SMC Explorer) SISVU metrix Pub Drug VIVO UF Piedmont GeoData PubMed ECCO- Media Finnish Journals Gene SGD Chem Accomo- TCP Munici- dations El Viajero Ontology palities Alpine AGROVOC bible Tourism Ski ontology Geographic Austria KEGG Ocean Enzyme PBAC GEMET ChEMBL Italian Drilling Metoffice OMIM KEGG AEMET Weather Open Publications public Codices Linked MGI Pathway Forecasts Data InterPro GeneID schools EARTh Thesau- Open KEGG Turismo rus Colors Reaction User-generated content de Zaragoza Product Smart KEGG Weather DB Link Medi Glycan Janus Stations Product Care KEGG Government AMP UniParc UniRef UniSTS Types Italian Homolo- Com- Yahoo! 
Airports Ontology Museums pound Google Gene Cross-domain Geo Art Planet National wrapper Chem2 Radio- Bio2RDF activity Uni Life sciences JP Sears Open Linked OGOLOD Pathway Corpo- Amster- Reactome dam medu- Open rates Numbers Museum cator As of September 2011 Lihua Zhao, Ryutaro Ichise | Graph-based Ontology Analysis in the Linked Open Data | 3
  • 4. Challenging Problems Infeasible to understand all the ontology schema of linked data sets. Ontology heterogeneity problem Heterogeneous ontology classes DBpedia: http://dbpedia.org/ontology/Country. Geonames: http://www.geonames.org/ontology#A.PCLI. LinkedMDB: http://data.linkedmdb.org/resource/movie/country. Heterogeneous ontology predicates http://dbpedia.org/property/populationTotal. http://dbpedia.org/property/population. Time-consuming and infeasible to inspect large ontologies Misuse of classes and predicates DBpedia: 320 classes and thousands of predicates. Lihua Zhao, Ryutaro Ichise | Graph-based Ontology Analysis in the Linked Open Data | 4
  • 5. Solution for the Problems Automatically or semi-automatically integrate different ontologies by analyzing interlinked instances. Semi-automatic ontology integration Reduce the ontology heterogeneity. Identify important ontology classes and predicates that link instances. Easy to understand simple integrated ontology. Simplify the queries on various data sets. Lihua Zhao, Ryutaro Ichise | Graph-based Ontology Analysis in the Linked Open Data | 5
  • 6. Related Work Find useful attributes from frequent graph patterns. [Le, et al., 2010] Only for geographic data. Analysis of basic predicates of SameAs network, Pay-Level-Domain network and Class-Level Similarity network. [Ding, et al., 2010] Only frequent types are considered to analyze how data are connected. A debugging method for mapping lightweight ontologies. [Meilicke, et al., 2008] Limited to the expressive lightweight ontologies. Construct intermediate-layer ontology from geospatial, zoology, and genetics data resources. [Parundekar, et al., 2010] Only for specific domains and only considers at class-level. Construct an integrated mid-ontology from DBpedia, Geonames, and NYTimes. [Zhao, et al., 2011] Needs a hub data set and only considers at predicate-level. Lihua Zhao, Ryutaro Ichise | Graph-based Ontology Analysis in the Linked Open Data | 6
  • 7. Our Approach Lihua Zhao, Ryutaro Ichise | Graph-based Ontology Analysis in the Linked Open Data | 7
  • 8. Step 1: Graph Pattern Extraction Lihua Zhao, Ryutaro Ichise | Graph-based Ontology Analysis in the Linked Open Data | 8
  • 9. Graph Pattern Extraction Extract graph patterns from interlinked instances to discover related ontology classes and predicates. SameAs Graph SG = (V, E, I), V is a set of labels of data sets, E ⊆ V × V, I is a set of URIs of the interlinked instances. Example: SGAustria = (V, E, I) V = {D, G, N, M} E = {(D,G), (D,N), (G,N), (G,M)} I = { db:Austria, geo:2782113, nyt:66221058161318373601, mdb-country:AT}. Lihua Zhao, Ryutaro Ichise | Graph-based Ontology Analysis in the Linked Open Data | 9
  • 10. Step 2: <Predicate, Object> Collection Lihua Zhao, Ryutaro Ichise | Graph-based Ontology Analysis in the Linked Open Data | 10
  • 11. <Predicate, Object> Collection An instance has a collection of <subject, predicate, object>. (instance URI → subject, property → predicate, class → object) <predicate, object> (PO) pairs as the content of a SameAs Graph. Classify PO pairs into five types Class: rdf:type and skos:inScheme. Date: XMLSchema:date, gYear, gMonthDay, etc. Number: XMLSchema:integer, int, float, double, etc. URI: starts with “http://” and XMLSchema:anyURI. String: XMLSchema:string and Others. Lihua Zhao, Ryutaro Ichise | Graph-based Ontology Analysis in the Linked Open Data | 11
  • 12. An Example of Collected PO pairs Table: PO pairs and types for SGAustria Predicate Object Type rdf:type owl:Thing Class rdf:type db-onto:Place Class rdf:type db-onto:PopulatedPlace Class rdf:type db-onto:Country Class rdfs:label “Austria”@en String db-onto:wikiPageExternalLink http://www.austria.mu/ URI db-prop:populationEstimate 8356707 Number ...... ...... ...... geo-onto:name Austria String geo-onto:alternateName “Austria”@en String geo-onto:alternateName “Republic of Austria”@en String geo-onto:featureClass geo-onto:A Class geo-onto:featureCode geo-onto:A.PCLI Class geo-onto:population 8205000 Number ...... ...... ...... rdf:type mdb:country Class mdb:country name Austria String ...... ...... ...... skos:inScheme nyt:nytd geo Class skos:prefLabel “Austria”@en String nyt-prop:first use 2004-10-04 Date Lihua Zhao, Ryutaro Ichise | Graph-based Ontology Analysis in the Linked Open Data | 12
  • 13. Step 3: Related Classes and Predicates Grouping Lihua Zhao, Ryutaro Ichise | Graph-based Ontology Analysis in the Linked Open Data | 13
  • 14. Related Classes Grouping Group related classes from each SameAs Graph by tracking subsumption relations owl:subClassOf and skos:inScheme. < C1 owl:subClassOf C2 > or < C1 skos:inScheme C2 > means the concept of class C1 is more specific than the concept of class C2 . Lihua Zhao, Ryutaro Ichise | Graph-based Ontology Analysis in the Linked Open Data | 14
  • 15. Related Predicates Grouping Perform pairwise comparison on <predicate, object> (PO) pairs to find out related predicates (properties). Discover related predicates using different methods for the types of Date, URI, Number, and String. Date, URI: exact matching. Number, String: exact matching + similarity matching. Exact matching on PO pairs to create initial sets of PO pairs. If OPOi = OPOj or PPOi = PPOj ⇒ Sk ← POi , POj OPO : the object of PO. PPO : the predicate of PO. S: Initial set of PO pairs. Lihua Zhao, Ryutaro Ichise | Graph-based Ontology Analysis in the Linked Open Data | 15
  • 16. Related Predicates Grouping
      Similarity matching on PO pairs of type Number and String.
      Similarity between $PO_i$ and $PO_j$:
        $Sim(PO_i, PO_j) = \frac{ObjSim(PO_i, PO_j) + PreSim(PO_i, PO_j)}{2}$
      Merge similar initial sets $S_i$ and $S_j$ if $Sim(PO_i, PO_j) \geq \theta$, where $PO_i \in S_i$ and $PO_j \in S_j$.
  • 17. Related Predicates Grouping
      Similarity of objects between two PO pairs:
        $ObjSim(PO_i, PO_j) = \begin{cases} 1 - \frac{|O_{PO_i} - O_{PO_j}|}{O_{PO_i} + O_{PO_j}} & \text{if } O_{PO} \text{ is Number} \\ StrSim(O_{PO_i}, O_{PO_j}) & \text{if } O_{PO} \text{ is String} \end{cases}$
        $O_{PO}$: the object of PO.
        $StrSim(O_{PO_i}, O_{PO_j})$: the average of three string-based similarity values: Jaro-Winkler, Levenshtein distance, and n-gram.
      Similarity of predicates between $PO_i$ and $PO_j$:
        $PreSim(PO_i, PO_j) = WNSim(T_{PO_i}, T_{PO_j})$
        $T_{PO}$: the pre-processed terms of the predicates in PO.
        $WNSim(T_{PO_i}, T_{PO_j})$: the average of the nine applied WordNet-based similarity values.
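As a worked illustration of the numeric case on slides 16 and 17, the sketch below computes the object similarity for the two population values from the Austria example and combines it with a predicate similarity. Several things here are assumptions rather than values from the slides: the function names, the predicate similarity of 0.8 (a placeholder standing in for the WordNet-based score), the threshold of 0.5 (the slides do not give a value for θ), and the handling of a zero denominator.

```python
def obj_sim_number(a, b):
    """Numeric object similarity: 1 - |a - b| / (a + b).

    Assumes non-negative numbers. The zero-sum case is not covered by
    the slides; returning 1.0 for two zeros is a choice made here.
    """
    if a + b == 0:
        return 1.0 if a == b else 0.0
    return 1.0 - abs(a - b) / (a + b)


def sim(obj_sim, pre_sim):
    """Overall similarity: the mean of object and predicate similarity."""
    return (obj_sim + pre_sim) / 2.0


def should_merge(obj_sim, pre_sim, theta):
    """Two initial sets are merged when the similarity reaches theta."""
    return sim(obj_sim, pre_sim) >= theta


# db-prop:populationEstimate vs. geo-onto:population (slide 12).
o = obj_sim_number(8356707, 8205000)

# Placeholder values: neither comes from the slides.
assumed_pre_sim = 0.8
assumed_theta = 0.5

print(round(o, 4))
print(should_merge(o, assumed_pre_sim, assumed_theta))
```

The two population figures differ by less than 2%, so their object similarity is close to 1; whether the two predicates end up in the same group then depends mostly on the predicate similarity and the chosen threshold.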
  • 18. Step 4: Integration for All Graph Patterns
  • 19. Integration for All Graph Patterns
      Groups of related classes and predicates are independent for each graph pattern. Hence, we integrate them across all the graph patterns to construct an integrated ontology.
      Select terms for the integrated ontology.
        ex-onto:ClassTerm: select one concept from a set of classes.
        ex-prop:propTerm: select one concept from a set of predicates.
      Construct relations.
        ex-prop:hasMemberClasses: link sets of classes with ex-onto:ClassTerm.
        ex-prop:hasMemberDataTypes: link sets of predicates with ex-prop:propTerm.
      Construct an integrated ontology from:
        Sets of related classes and predicates.
        Selected terms: ClassTerm and propTerm.
        Constructed relations.
  • 20. Step 5: Manual Revision
  • 21. Manual Revision
      Minor revision process on the automatically constructed ontology.
      Modify incorrect terms.
        Not all the terms for classes and predicates are properly selected.
      Add domain information.
        About 40% of the predicate sets lack rdfs:domain information.
      Modify incorrectly grouped classes and predicates.
        We cannot guarantee 100% accuracy.
  • 22. Experiments
      Analyze the characteristics of linked instances with the integrated ontology constructed with our approach.
        Experimental Data
        Graph Patterns of Linked Instances
        Class-level Analysis
        Predicate-level Analysis
  • 23. Experimental Data
      DBpedia: cross-domain, 3.5 million things, 8.9 million URIs.
      Geonames: geographical domain, 7 million URIs.
      NYTimes: media domain, 10,467 subject headings.
      LinkedMDB: media domain, 0.5 million entities.
  • 24. Graph Patterns of Linked Instances
      13 graph patterns.
      Frequent graph patterns: GP1, GP2, GP3.
      Patterns by linked data sets (N: NYTimes, G: Geonames, D: DBpedia, M: LinkedMDB):
        N, G, D: GP4, GP5, GP7, GP8
        N, M, D: GP6
        M, G, D: GP9
        M, D, N, G: GP10, GP11, GP12, GP13
  • 25. Class-level Analysis
      Successfully integrated related classes from the extracted graph patterns.
      Characteristics of graph patterns:
      Class Type                            Graph Pattern
      Actor                                 GP2, GP6
      Person (Athlete, Politician, etc.)    GP3
      Organization/Agent                    GP1, GP3, GP8
      Film                                  GP2
      City/Settlement                       GP1, GP4, GP5, GP7, GP8
      Country                               GP9, GP10, GP11, GP12, GP13
      Place (Mountain, River, etc.)         GP1, GP3, GP7
      Integrated 97 classes into 48 groups.
      Example: ex-onto:Country
        db-onto:Country
        geo-onto:A.PCLI
        mdb:country
        nyt:nytd_geo
  • 26. Class-level Analysis
      Discover missing class information.
        Example: db:Shingo_Katori
          db:Shingo_Katori rdf:type dbpedia-owl:MusicalArtist.
          mdb-actor:27092 owl:sameAs db:Shingo_Katori.
          Therefore, db:Shingo_Katori rdf:type db-onto:Actor.
      Main classes of each data set.
        NYTimes: person, organization, and place.
        LinkedMDB: movie, actor, and country.
        Geonames: A (country, administrative region), P (city, settlement), T (mountain), S (building, school), and H (lake, river).
        DBpedia: person (artist, politician, athlete), organization (company, educational institute, sports team), work (film), and place (populated place, natural place, architectural structure).
  • 27. Predicate-level Analysis
      Integrated 367 predicates into 38 groups.
      Example: ex-prop:birthDate
      Predicate               Number of Instances
      db-onto:birthDate       287,327
      db-prop:datebirth       1,675
      db-prop:dateofbirth     87,364
      db-prop:dateOfBirth     163,876
      db-prop:born            34,832
      db-prop:birthdate       70,630
      db-prop:birthDate       101,121
      Recommend standard predicates.
        <db-onto:birthDate, rdfs:domain, db-onto:Person>
        “db-onto:birthDate” has the highest frequency of usage.
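The recommendation on slide 27 picks the most frequently used predicate within a group. A minimal sketch of that selection, using the instance counts from the slide, might look like the following; the function name `recommend_standard_predicate` is hypothetical, and choosing purely by highest count is an assumption about how the recommendation is made.

```python
def recommend_standard_predicate(usage_counts):
    """Return the predicate with the highest instance count in a group."""
    return max(usage_counts, key=usage_counts.get)


# Instance counts for the ex-prop:birthDate group (slide 27).
birth_date_group = {
    "db-onto:birthDate": 287327,
    "db-prop:datebirth": 1675,
    "db-prop:dateofbirth": 87364,
    "db-prop:dateOfBirth": 163876,
    "db-prop:born": 34832,
    "db-prop:birthdate": 70630,
    "db-prop:birthDate": 101121,
}

print(recommend_standard_predicate(birth_date_group))
```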
  • 28. Comparison with Previous Work
      Compare our ontology integration approach with the mid-ontology approach [Zhao, et al., JIST2011].
      Mid-Ontology approach                      Our approach
      A hub data set for data collection.        No hub data set.
      String-based similarity measures           Different similarity measures
      for all types of objects.                  for different types of objects.
      105 predicates into 22 groups.             367 predicates into 38 groups.
      No classes.                                97 classes into 48 groups.
  • 29. Conclusion and Future Work
      Conclusion
        Integrate heterogeneous ontologies from various data sets.
          Identify the characteristics of graph patterns using the integrated ontology classes.
          Recommend standard predicates using the integrated ontology predicates.
        Reduce the heterogeneity of ontologies.
        Construct an integrated ontology without learning the entire ontology schema.
      Future Work
        Use more data sets in the LOD cloud.
        Apply the MapReduce method to address the scalability and ontology heterogeneity problems.
  • 30. Questions?
      Lihua Zhao, lihua@nii.ac.jp
      Ryutaro Ichise, ichise@nii.ac.jp