2. Outline
• Background
• What is the RDF Data Cube Vocabulary?
• Semi-automated approach
• OntoWiki's CSVImport plug-in
• RDFized GHO data
• Limitations and Future Work
3. Background
• Biomedical statistical data
• Published as Excel sheets
• Advantage
• Readable by humans
• Disadvantages
• Cannot be queried efficiently
• Difficult to integrate with other data (in different formats)
• Our approach
• Converting data into a single data model - RDF
• Using the RDF Data Cube Vocabulary*
• Designed specifically to represent multidimensional statistical data in RDF
*http://www.w3.org/TR/vocab-data-cube/
5. What is the RDF Data Cube Vocabulary?
• Dimensions
• Attributes
• Measures
• Observations
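To make the four terms concrete, here is a minimal sketch (not taken from the original slides) of how one observation could be written out as N-Triples. The `qb:` namespace is the one defined by the RDF Data Cube Vocabulary; the `http://example.org/` namespace, the property names, and all values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

QB = "http://purl.org/linked-data/cube#"  # RDF Data Cube namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
EX = "http://example.org/"  # hypothetical namespace for this sketch


@dataclass
class Observation:
    """One cell of a statistical table."""
    uri: str
    dataset: str
    dimensions: dict  # property URI -> value URI; together they identify the cell
    measure: tuple    # (property URI, number); the observed value itself
    attributes: dict  # property URI -> value URI; qualify the value (e.g. unit)

    def to_ntriples(self):
        lines = [
            f"<{self.uri}> <{RDF_TYPE}> <{QB}Observation> .",
            f"<{self.uri}> <{QB}dataSet> <{self.dataset}> .",
        ]
        # Dimensions and attributes are both plain property/value links.
        for prop, value in {**self.dimensions, **self.attributes}.items():
            lines.append(f"<{self.uri}> <{prop}> <{value}> .")
        # The measure carries a typed literal.
        prop, number = self.measure
        lines.append(f'<{self.uri}> <{prop}> "{number}"^^<{XSD_DECIMAL}> .')
        return lines


# Made-up example values, not real GHO data.
obs = Observation(
    uri=EX + "obs/1",
    dataset=EX + "dataset/demo",
    dimensions={EX + "country": EX + "country/A", EX + "year": EX + "year/2000"},
    measure=(EX + "value", 12.5),
    attributes={EX + "unit": EX + "unit/percent"},
)
triples = obs.to_ntriples()
print("\n".join(triples))
```

In a full data cube, the dimension, measure, and attribute properties would also be declared as `qb:DimensionProperty`, `qb:MeasureProperty`, and `qb:AttributeProperty` in a data structure definition; the sketch leaves that part out.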
6. Semi-automated approach
• Transforming CSV to RDF in a fully automated way is not feasible.
• Dimensions are often encoded in the heading or label of a sheet
• Our semi-automated approach:
• Implemented as a plug-in for OntoWiki#
• OntoWiki is a semantic collaboration platform developed by the AKSW research group.
• A CSV file is converted into RDF using the RDF Data Cube Vocabulary: http://aksw.org/Projects/Stats2RDF
# Sören Auer, Sebastian Tramp (né Dietzold), Jens Lehmann, and Thomas Riechert: OntoWiki: A Tool for Social Semantic Collaboration. In: Proceedings of the Workshop on Social and Collaborative Construction of Structured Knowledge (CKC 2007) at the 16th International World Wide Web Conference (WWW2007), Banff, Canada, May 8, 2007.
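The idea behind the semi-automated conversion can be sketched as follows. This is not the plug-in's actual code; it only assumes that a user has already marked which columns are dimensions, which column holds the measure, and which dimension values apply to the whole sheet (for example, read off the sheet heading). The function name, the `http://example.org/` base URI, and the sample CSV are hypothetical.

```python
import csv
import io

QB = "http://purl.org/linked-data/cube#"  # RDF Data Cube namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
EX = "http://example.org/"  # hypothetical base URI for this sketch


def slug(text):
    """Turn a header or cell value into a simple URI path segment."""
    return text.strip().lower().replace(" ", "_")


def csv_to_observations(csv_text, dimension_columns, measure_column, sheet_dimensions):
    """Return N-Triples lines, one qb:Observation per CSV row.

    dimension_columns: column names the user marked as dimensions.
    measure_column:    column name holding the measured value.
    sheet_dimensions:  dimension name -> value that holds for every row,
                       supplied by the user (e.g. from the sheet heading).
    """
    triples = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        obs = f"<{EX}obs/{i}>"
        triples.append(f"{obs} <{RDF_TYPE}> <{QB}Observation> .")
        # Dimensions stored in ordinary columns.
        for col in dimension_columns:
            triples.append(
                f"{obs} <{EX}{slug(col)}> <{EX}{slug(col)}/{slug(row[col])}> ."
            )
        # Dimensions that are not in any column and must come from the user.
        for name, value in sheet_dimensions.items():
            triples.append(
                f"{obs} <{EX}{slug(name)}> <{EX}{slug(name)}/{slug(value)}> ."
            )
        # The measured value as a typed literal.
        triples.append(
            f'{obs} <{EX}{slug(measure_column)}> "{row[measure_column]}"^^<{XSD_DECIMAL}> .'
        )
    return triples


# Made-up sample sheet, not real GHO data.
SAMPLE = "Country,Year,Value\nA,2000,1.5\nB,2000,2.5\n"
triples = csv_to_observations(
    SAMPLE,
    dimension_columns=["Country", "Year"],
    measure_column="Value",
    sheet_dimensions={"indicator": "demo indicator"},
)
print("\n".join(triples))
```

The `sheet_dimensions` argument is where the manual step shows up: a value that appears only in a heading cannot be recovered from the rows, so it has to be supplied by a person.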
15. RDFized GHO data
• Available at http://gho.aksw.org
• 50 datasets
• ~ 8 million triples
• Paper published in the Semantic Web Journal (SWJ) call for dataset descriptions: http://www.semantic-web-journal.net/content/publishing-and-interlinking-global-health-observatory-dataset
16. Limitations and Future Work
• Conversion
• Coherence
• Temporal Comparability
• Exploring GHO