The choice of which vocabulary to reuse when modeling and publishing Linked Open Data (LOD) is far from trivial. So far, no study has investigated the different strategies for reusing vocabularies in LOD modeling and publishing. In this paper, we present the results of a survey with 79 participants that examines which vocabulary reuse strategies are preferred in LOD modeling. The participants, LOD publishers and practitioners, were asked to assess different vocabulary reuse strategies and explain their ranking decisions. We found significant differences between the modeling strategies, which range from reusing popular vocabularies and minimizing the number of vocabularies to staying within one domain vocabulary. One notable insight is that popularity, in the sense of how frequently a vocabulary is used in a data source, is more important than how often individual classes and properties are used in the LOD cloud. Overall, the results of this survey help to better understand the strategies by which data engineers reuse vocabularies and may also be used to develop future vocabulary engineering tools.
Survey on Common Strategies of Vocabulary Reuse in Linked Open Data Modeling @ESWC2014
1. Survey on Common Strategies of Vocabulary
Reuse in Linked Open Data Modeling
Johann Schaible
GESIS Leibniz-Institute for the
Social Sciences, Cologne,
Germany
johann.schaible@gesis.org
Thomas Gottron
Institute for Web Science and
Technologies, University of Koblenz-
Landau, Germany
gottron@uni-koblenz.de
Ansgar Scherp
Kiel University and Leibniz
Information Center for Economics,
Kiel, Germany
mail@ansgarscherp.net
1) Extended Version as technical report: http://bit.ly/lodsurveyreport
2) Raw result data and survey in PDF: http://bit.ly/lodsurveydata #eswc2014Schaible
2. • How to…
– …choose which vocabulary to reuse?
– …find an appropriate mix of vocabularies?
• In order to support aspects such as
– providing a clear data structure
– making data easier to consume
– achieving ontological agreement
Leads to different reuse strategies
Based on experience and “gut-feeling”
Motivation…
3. …and Contribution
Condense and aggregate experts’ knowledge
and experience (“gut-feeling”)
1. Which aspects for reusing vocabularies are
most important
2. Which vocabulary reuse strategy to follow
in a real-world scenario
4. Survey Design
Ranking
Task T1
Ranking
Task T2
Ranking
Task T3
Aspects for reusing
vocabularies
Reasons for
ranking decision
Reasons for
ranking decision
Reuse vs. Interlink Appropriate Mix of
vocabularies
Additional Meta-
Information
• Perspective of a LOD modeler
• “Suppose, you have to model data as LOD…”
5. Ranking Tasks Structure
Assignment:
• Model data from a specific
domain as LOD
• Need to reuse vocabularies
• “Which of the provided
options do you consider the
better vocabulary reuse
strategy”
6. Ranking Tasks Example
Strategy minV:
Reuse a minimum
amount of vocabularies
Strategy pop:
Reuse mainly popular
vocabularies
7. Features for Popularity
Number of datasets using vocabulary V
Total occurrence of vocabulary term vi
Strategy:
minV
Strategy:
pop
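The two popularity features named on this slide can be sketched as follows. This is a minimal illustration, not the survey's actual computation: the toy datasets, the prefix-based way of deriving a vocabulary from a term, and the function names are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy data: each dataset lists the vocabulary terms it uses.
# A term is written as "prefix:localName"; the prefix stands for vocabulary V.
datasets = {
    "dataset_a": ["foaf:Person", "foaf:name", "dc:title"],
    "dataset_b": ["foaf:Person", "foaf:Person", "mo:MusicArtist"],
    "dataset_c": ["dc:title", "dc:creator"],
}


def vocabulary_of(term):
    """Return the vocabulary prefix of a term (assumes 'prefix:localName')."""
    return term.split(":", 1)[0]


def datasets_using_vocabulary(data):
    """Feature 1: number of datasets that use vocabulary V at least once."""
    counts = Counter()
    for terms in data.values():
        # A set ensures each vocabulary is counted once per dataset.
        for vocab in {vocabulary_of(t) for t in terms}:
            counts[vocab] += 1
    return counts


def term_occurrences(data):
    """Feature 2: total occurrence of each vocabulary term across datasets."""
    counts = Counter()
    for terms in data.values():
        counts.update(terms)
    return counts


by_vocabulary = datasets_using_vocabulary(datasets)
by_term = term_occurrences(datasets)
print(by_vocabulary["foaf"])    # datasets using the foaf vocabulary
print(by_term["foaf:Person"])   # total occurrences of one term
```

The two features can disagree: a term may occur very often inside a single dataset while its vocabulary is used by few datasets overall, which is why the slides treat them as separate pieces of meta-information.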
8. Ranking Task T1
Reuse vs. Interlink
• Domain: Movies and actors
• Vocabulary reuse strategies:
1. pop: Reuse popular vocabularies
2. link: Define own vocabulary and link it to existing
popular vocabulary ()
3. max: Reuse a maximum amount of vocabularies
(lower boundary)
• Number of possible models to choose from: 3
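The difference between the pop and link strategies can be sketched with plain triples. This is an illustration only: the `ex:` and `my:` prefixes, the concrete terms, and the use of `rdfs:subClassOf` and `rdfs:subPropertyOf` as the linking properties are assumptions chosen for the example, since the slide does not state which link property the survey models used.

```python
# Triples are (subject, predicate, object) tuples written with prefixes.
# "ex:" marks instance data and "my:" marks a self-defined vocabulary;
# both prefixes are hypothetical.

# Strategy pop: describe the data directly with terms of popular vocabularies.
pop_model = [
    ("ex:film1", "rdf:type", "schema:Movie"),
    ("ex:film1", "schema:actor", "ex:actor1"),
    ("ex:actor1", "rdf:type", "foaf:Person"),
]

# Strategy link: describe the data with own terms, then link those terms
# to existing popular vocabularies on the schema level.
link_model = [
    ("ex:film1", "rdf:type", "my:Film"),
    ("ex:film1", "my:hasActor", "ex:actor1"),
    ("ex:actor1", "rdf:type", "my:Actor"),
    ("my:Film", "rdfs:subClassOf", "schema:Movie"),
    ("my:hasActor", "rdfs:subPropertyOf", "schema:actor"),
    ("my:Actor", "rdfs:subClassOf", "foaf:Person"),
]


def instance_terms(triples):
    """Collect the classes and properties used to describe ex: resources."""
    terms = set()
    for subject, predicate, obj in triples:
        if not subject.startswith("ex:"):
            continue  # skip schema-level link triples
        if predicate == "rdf:type":
            terms.add(obj)        # the class assigned to the resource
        else:
            terms.add(predicate)  # the property used on the resource
    return terms


pop_terms = instance_terms(pop_model)
link_terms = instance_terms(link_model)
print(sorted(pop_terms))
print(sorted(link_terms))
```

In the pop model a consumer sees the reused terms directly on the data, whereas in the link model the reused terms are only reachable by following the schema-level links.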
9. Ranking Task T2
Find appropriate mix of different vocabularies
• Domain: Publications and authors
• Vocabulary reuse strategies:
1. minV: Reuse a minimum amount of vocabularies
2. max: Reuse a maximum amount of vocabularies (lower
boundary)
3. pop: Reuse popular vocabularies
4. minC: Reuse a minimum amount of vocabularies per
concept
• Number of possible models to choose from: 4
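The quantities that distinguish minV, max, and minC can be made concrete with two simple counts. The sketch below is a hedged illustration: the two candidate models, their term choices, and the function names are hypothetical and are not the models shown to the survey participants.

```python
def vocabulary_of(term):
    """Return the vocabulary prefix of a term (assumes 'prefix:localName')."""
    return term.split(":", 1)[0]


def num_vocabularies(model):
    """Distinct vocabularies in the whole model (minimized by minV)."""
    return len({vocabulary_of(t) for terms in model.values() for t in terms})


def max_vocabularies_per_concept(model):
    """Largest number of vocabularies used for one concept (kept low by minC)."""
    return max(len({vocabulary_of(t) for t in terms}) for terms in model.values())


# Hypothetical models: each concept maps to the terms used to describe it.
model_few = {
    "Publication": ["dc:title", "dc:date", "dc:creator"],
    "Author": ["foaf:name", "foaf:homepage"],
}
model_many = {
    "Publication": ["dc:title", "bibo:doi", "swrc:year"],
    "Author": ["foaf:name", "swrc:affiliation"],
}

for name, model in [("model_few", model_few), ("model_many", model_many)]:
    print(name, num_vocabularies(model), max_vocabularies_per_concept(model))
```

Here `model_few` uses two vocabularies overall and one per concept, while `model_many` mixes four vocabularies, three of them on a single concept.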
10. Ranking Task T3
Vocabulary reuse given additional
meta-information
• Domain: Music and musical artists
• Vocabulary reuse strategies:
1. minD: Reuse only domain specific vocabularies
2. minV: Reuse a minimum amount of vocabularies
3. pop: Reuse popular vocabularies
• Number of possible models to choose from: 3
11. Results of Ranking Tasks
Key insights
• Reusing over interlinking
• Popular vocabularies over minimizing number of vocabularies
• Additional meta-information has an effect on choice
12. Meta-Information Useful?
Key insights
• No definite favorite support
• # of datasets using a vocabulary over total term occurrence
• Information on the most common use by others: not valuable
13. Aspects for vocabulary reuse
[Bar chart: ratings on a 5-point Likert scale (axis 0 to 5) for the aspects Clear Data Structure, Data easier to be consumed, and Ontological Agreement; series: before ranking tasks, after first ranking task, after second ranking task]
14. • Linked Data experts and practitioners
• Acquired through LOD and Semantic Web mailing lists
• N = 79 (16 female, 63 male) (n.s. difference in answers)
• 67% academia, 23% industry, 10% both
• Research associates (22), postdocs (14), professors (8),
engineers and other professions (27).
• Age: M = 34.6, SD = 8.6
• Experience in LOD (in years): M = 4, SD = 2.64
• Expertise in consuming and publishing LOD:
M = 3.64, SD = 1 (on a 5-point Likert scale)
(n.s. difference in answers of group > 4 and group < 4)
Participants
15. • Which aspects are most important?
– All aspects are “somewhat important” (Mdn = 4)
– Aspects are rated higher in theory than in real-life
• Which strategy to follow?
– Preferred choice: reuse popular vocabularies
Better than minimizing number of vocabularies
– Popular vs. domain specific vocabularies: unclear
– Interlinking does not have good uptake
• Which meta-information is most useful?
– # of datasets using a vocabulary
– Most common use has no good uptake
Conclusion
16. 1) Extended Version as technical report: http://bit.ly/lodsurveyreport
2) Raw result data and survey in PDF: http://bit.ly/lodsurveydata #eswc2014Schaible
Questions?
Thank you very much for participating in the survey and helping me
with my research
Editor's notes
Provide clear data structure
Make data easier to be consumed
Establish an ontological agreement in data representation