Information Content based Ranking Metric for Linked Open Vocabularies
Ghislain A. Atemezing (@gatemezing), Raphaël Troncy (@rtroncy)

This talk was presented in Leipzig at the SEMANTiCS 2014 conference, in September. It gives an overview of how Information Content Theory metrics can be applied to the Semantic Web, and especially to vocabularies. The results of the proposed ranking metrics can be applied in three areas: (i) vocabulary life-cycle management, (ii) Semantic Web visualizations and (iii) the interlinking process.


  1. Information Content based Ranking Metric for Linked Open Vocabularies
     Ghislain A. Atemezing (@gatemezing), Raphaël Troncy (@rtroncy)
  2. Goal and Agenda
     • Goal: present a new ranking metric for reusing vocabularies
     • Motivation
       - Combine Information Theory with metadata information
       - Find a new assessment metric for vocabularies
     • Current situation
       - Popularity is the only metric in use (e.g. prefix.cc or LODStats)
       - Only ONE dimension is used for assessing vocabularies
     • Proposal: compute the informativeness of LOV terms
     • Experiments and results
     • Applications
     2014/09/05 SEMANTiCS 2014 - Leipzig, Germany
  3. Vocabulary Purpose
     • A model for understanding a domain’s semantics
     • Vocabulary terms contain information
       - A term = Class, Object Property or Data Property
     • Essential for publishing data on the Web
     • How to quantify the value of a term?
       - Informativeness is inversely related to probability: the less probable a term, the more informative it is
  4. Existing catalogs of vocabularies
     [Figure: some catalogs of vocabularies]
  5. Linked Open Vocabularies (LOV)
     • A curated list of vocabularies
       - More than 420 vocabularies
       - Each of them described by the vocabulary-of-a-friend (voaf) schema
     • Tracks the (temporal) evolution of vocabularies
     • Some related services
       - SPARQL endpoint: http://lov.okfn.org/endpoint/lov
       - Search function: http://lov.okfn.org/dataset/lov/search
       - An aggregator endpoint: http://lov.okfn.org/endpoint/lov_aggregator
       - An intelligent bot agent for updates: http://lov.okfn.org/dataset/lov/bot
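The SPARQL endpoint listed on this slide can be queried over HTTP. Below is a minimal sketch that only builds the request URL, assuming the endpoint accepts the standard `query` parameter of the SPARQL Protocol and that vocabularies are typed `voaf:Vocabulary`. The helper name `build_request_url` is hypothetical, and the endpoint address is the one from the slide, which may no longer be live.

```python
from urllib.parse import urlencode

# Endpoint address as given on the slide; it may have moved since 2014.
LOV_ENDPOINT = "http://lov.okfn.org/endpoint/lov"

# Assumption: LOV types each vocabulary as voaf:Vocabulary.
QUERY = """
PREFIX voaf: <http://purl.org/vocommons/voaf#>
SELECT ?vocab WHERE { ?vocab a voaf:Vocabulary } LIMIT 10
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Return a GET URL that passes the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(LOV_ENDPOINT, QUERY)
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) is left out on purpose, since the availability and result format of the service are not stated in the slides.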
  6. LOV description: http://lov.okfn.org/dataset/lov/
     LOV described with the framework of [d’Aquin-Noy2012-Survey]

     Core feature of the framework   LOV
     Domain                          General
     Intended use                    Promote and facilitate the reuse of vocabularies in the linked data ecosystem
     Collection                      Submitted by any user via the LOV-Suggest tool
     Gatekeeping                     Manual curation and automatic URI validation
     Number of ontologies            450+
     Dynamics                        Growing
     Search metadata                 Yes, with visual depiction
     Search within ontology          Yes
     Search across ontologies        Keyword-based; structured search (query-based)
     Navigation criteria             Ordered by prefix, namespace, title and visual links navigation
     Metrics                         Reuse popularity on the LOD Cloud
     Comments and review             N/A - only by the curators
     Ranking                         Metric-based
     Web service access              API
     SPARQL endpoint                 Yes
     Content available               Ontology metadata, URI
     Read/Write                      Read
     Ontology directory              Yes
     Ontology registry               Yes
     Application platform            Yes
  7. LOV Evolution since March 2011
     • Quasi-linear growth, starting from 75 vocabularies
     • The glitch in 2012 corresponds to the migration to OKFN
  8. Proposal: Metrics for Ranking LOV
     • Metrics
       - Information Content metric (IC): the value of information associated with a given entity
       - Partition Information Content metric (PIC)
       - A proposed ranking based on IC and PIC
     • Method
       - Adapt the IC and PIC functions to semantics
       - Select candidate vocabularies in the LOV catalog
       - Compute the scores
  9. Information Content Metrics for LOV
     • Information Content
       - Formula: [equation image not preserved in this transcript]
       - N = maximum number of occurrences of a term in LOV
       - φ(t) = number of occurrences of term t in LOV
     • Partitioned IC
       - LOV is a semantic network of resources
       - Formula: [equation image not preserved in this transcript]
       - wf = weight for vocabulary f
       - +objectURI+ = owl:ObjectProperty / owl:DatatypeProperty; rdf:Property
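The two formulas on this slide were images and are missing from the transcript. The block below is a plausible reconstruction, not the authors' confirmed definition. It assumes that IC is the usual self-information of a term's relative frequency, which matches the earlier remark that informativeness is inversely related to probability, and that PIC multiplies the summed IC of a vocabulary's terms by its weight. The logarithm base and the symbol T_f (the set of candidate terms of vocabulary f) are assumptions introduced here.

```latex
% Assumption: IC is the self-information of the relative frequency of term t.
IC(t) = -\log_2\left(\frac{\varphi(t)}{N}\right)

% Assumption: PIC of vocabulary f is its weight times the summed IC of its terms T_f.
PIC(f) = w_f \sum_{t \in T_f} IC(t)
```

Under this reading, a term that occurs as often as the most frequent term (φ(t) = N) has IC = 0, and rarer terms score higher.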
  10. Information Content Metrics for LOV
      • (Light)weighting scheme
        - wf = 2 if datasets are using the vocabulary
        - wf = 1 if the vocabulary reuses other vocabularies
        - wf = 3 if the vocabulary is reused elsewhere
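The weighting scheme can be sketched as a small function. The slide does not say how the three cases combine; the running example (wf = 1 + 2 + 3 = 6 for both dcterms and foaf) suggests that the weights add up when several conditions hold, so the sketch below assumes addition. The function and parameter names are hypothetical.

```python
def vocabulary_weight(reuses_others: bool,
                      used_by_datasets: bool,
                      reused_elsewhere: bool) -> int:
    """Return wf, assuming the per-condition weights are summed."""
    weight = 0
    if reuses_others:      # the vocabulary reuses other vocabularies
        weight += 1
    if used_by_datasets:   # datasets are using the vocabulary
        weight += 2
    if reused_elsewhere:   # the vocabulary is reused elsewhere
        weight += 3
    return weight


# All three conditions hold, as for dcterms and foaf in the running example.
print(vocabulary_weight(True, True, True))
```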
  11. Ranking Algorithm
      1- Candidate term selection in LOV
      2- Grouping terms by namespace and weight assignment
      3- Compute the IC score
      4- Compute the PIC score
      Output: ranking
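The four steps can be sketched end to end on toy data. This is an illustration under stated assumptions, not the authors' implementation: IC is taken as -log2(φ(t)/N), PIC as wf times the sum of the IC values of a namespace's terms, candidate terms are those with at least one occurrence, and a namespace without a known weight defaults to 1. All names and the toy counts are invented for the example.

```python
import math
from collections import defaultdict


def information_content(occurrences: int, max_occurrences: int) -> float:
    """Assumed IC: self-information of the term's relative frequency."""
    return -math.log2(occurrences / max_occurrences)


def rank_vocabularies(term_counts, weights):
    """term_counts: {(namespace, term): occurrences}; weights: {namespace: wf}."""
    # Step 1: select candidate terms (here: terms that occur at least once).
    candidates = {key: n for key, n in term_counts.items() if n > 0}
    n_max = max(candidates.values())  # N = maximum term occurrence

    # Steps 2 and 3: group terms by namespace and compute each term's IC.
    ic_by_namespace = defaultdict(list)
    for (namespace, _term), count in candidates.items():
        ic_by_namespace[namespace].append(information_content(count, n_max))

    # Step 4: PIC = weight of the namespace times the sum of its terms' IC.
    pic = {ns: weights.get(ns, 1) * sum(ics) for ns, ics in ic_by_namespace.items()}

    # Output: namespaces ordered by decreasing PIC.
    return sorted(pic.items(), key=lambda item: item[1], reverse=True)


toy_counts = {
    ("ex1:", "A"): 8,
    ("ex1:", "b"): 2,
    ("ex2:", "C"): 4,
    ("ex2:", "d"): 0,  # never used, so not a candidate
}
toy_weights = {"ex1:": 6, "ex2:": 3}
ranking = rank_vocabularies(toy_counts, toy_weights)
print(ranking)
```

With these assumptions the most frequent term contributes an IC of 0, so a vocabulary's score comes from its rarer terms and from its weight.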
  12. Running Example: dcterms vs foaf
      • dcterms: http://purl.org/dc/terms/
        - Candidate terms: 53 (39 properties + 14 classes)
        - wf = 1 + 2 + 3 = 6
        - PIC = 1724.844
      • foaf: http://xmlns.com/foaf/0.1/
        - Candidate terms: 35 (26 properties + 9 classes)
        - wf = 1 + 2 + 3 = 6
        - PIC = 1033.197
      • PIC(dcterms) > PIC(foaf)
  13. Results on Ranking
      [Figures: top-15 terms by IC value; top-15 vocabularies by PIC value]
  14. Comparison
      • Relatively stable position of foaf in the prefix.cc, vocab.cc and LODStats catalogues
      • LOV-PIC/LODStats: skos and dcterms have a “relatively” stable ranking
      • List of “most popular” vocabularies: foaf, skos, dcterms, time, dce, prov
  15. Applications of the Ranking Metrics
      • Vocabulary life-cycle management
        - Help assess the use of terms and vocabulary updates
        - Monitor the use of http://www.w3.org/2003/06/sw-vocab-status/ns#term_status or owl:deprecated
      • Semantic Web applications
        - Vocabularies with a higher PIC might be proposed to a user preferentially, e.g. for choosing properties to display in a faceted browsing interface
      • Interlinking datasets
        - Generate sameAs links with data based on vocabulary terms with lower IC values
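The faceted-browsing use case can be illustrated with a short sketch that orders candidate properties by the PIC of the vocabulary they belong to. The two PIC values are the ones reported in the running example; the function name, the prefix-matching rule and the score of 0 for unknown namespaces are assumptions made for this illustration.

```python
# PIC values for dcterms and foaf as reported in the running example.
PIC_BY_NAMESPACE = {
    "http://purl.org/dc/terms/": 1724.844,
    "http://xmlns.com/foaf/0.1/": 1033.197,
}


def order_facets(properties, pic_by_namespace):
    """Sort property URIs so that those from higher-PIC vocabularies come first."""
    def pic_of(uri):
        # The longest matching namespace wins; unknown namespaces score 0.
        matches = [ns for ns in pic_by_namespace if uri.startswith(ns)]
        return pic_by_namespace[max(matches, key=len)] if matches else 0.0

    return sorted(properties, key=pic_of, reverse=True)


facets = order_facets(
    [
        "http://xmlns.com/foaf/0.1/name",
        "http://example.org/p",
        "http://purl.org/dc/terms/title",
    ],
    PIC_BY_NAMESPACE,
)
print(facets)
```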
  16. Conclusion and Future Work
      • We have presented new metrics for ranking vocabularies
        - By applying the Information Content concept to LOV
        - By taking more dimensions into account in the ranking metrics
      • The metrics can be applied to vocabulary reuse, ontology modelling and visualizations
      • Future work
        - Add equivalence axioms to the ranking model
        - Compare (P)IC with other graph-based rankings (e.g. PageRank)
        - Investigate the dependency ranking between vocabularies
  17. Thanks for your attention!
      Q/A Session
