The document outlines the plans for a PhD research project on enhancing semantic interoperability among spreadsheets. The research will build upon a previous master's degree which identified construction patterns in spreadsheets and linked labels to ontologies within a single domain. The PhD plans to address limitations of the previous work by considering multiple domains, developing a model to relate elements across spreadsheets, and linking spreadsheet structure to ontologies at the concept level. Key research questions involve defining when spreadsheets share the same purpose, canonical representations among similar spreadsheets, and using representations to predict spreadsheet purpose and domain. The goal is achieving semantic interoperability across spreadsheets.
13. Which elements must be considered in this interpretation process?
Unity Interpretation
14. Related Work
isolated label
(Han et al., 2008) - RDF123: from spreadsheets to RDF, The Semantic Web. Lecture Notes in Computer Science, vol. 5318. Springer
(Langegger & Wöß, 2009) - XLWrap: Querying and Integrating Arbitrary Spreadsheets with SPARQL, The Semantic Web. Lecture Notes in Computer Science, vol. 5823. Springer
15. Related Work
template
(Abraham & Erwig, 2006) - Inferring Templates from Spreadsheets, Proceedings of the International Conference on Software Engineering
16. Related Work
instances
(Zhao et al., 2010) - A spreadsheet system based on data semantic object, IEEE International Conference on Information Management and Engineering
17. Related Work
isolated label associated with linked data
(Syed et al., 2010) - Exploiting a Web of Semantic Data for Interpreting Tables, Proceedings of the Web Science Conference
18. Related Work
correlation of labels associated with linked data
(Venetis et al., 2011) - Recovering Semantics of Tables on the Web, Proceedings of the VLDB Endowment
(Mulwad et al., 2010) - Using linked data to interpret tables, Proceedings of the International Workshop on Consuming Linked Data
19. Related Work
correlation between several spreadsheet elements associated with linked data
(Limaye et al., 2010) - Annotating and Searching Web Tables Using Entities, Types and Relationships, Proceedings of the VLDB Endowment
20. How far can the system interpret, considering labels and their correlations?
26. Research Strategy
1. To identify construction patterns followed by biologists during the creation of these spreadsheets
2. To verify whether these construction patterns could lead us to recognize the spreadsheet purpose
3. To achieve semantic interoperability among these spreadsheets
43. Architecture Evaluation
Automatic analysis of 11,150 spreadsheets
the system recognized 1,151 spreadsheets
806 spreadsheets were classified as catalogue
345 spreadsheets were classified as collection
Total: 748,459 records analyzed
44. Architecture Evaluation - Results
• A random subset of 1,203 spreadsheets was selected to evaluate precision/recall
– Precision: 0.84
– Recall: 0.76
– Specificity: 0.95
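For reference, the three measures above follow from the counts of a binary confusion matrix. The sketch below is a minimal illustration of those standard definitions; the class name `ConfusionCounts` and the example counts are hypothetical and are not the figures behind the results reported on this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary classifier's outcomes (hypothetical helper)."""
    tp: int  # spreadsheets correctly recognized
    fp: int  # spreadsheets wrongly recognized
    fn: int  # spreadsheets that should have been recognized but were missed
    tn: int  # spreadsheets correctly rejected


def precision(c: ConfusionCounts) -> float:
    # Share of recognized spreadsheets that were truly relevant.
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    # Share of relevant spreadsheets that the system recognized.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Share of irrelevant spreadsheets that the system rejected.
    return c.tn / (c.tn + c.fp)


# Illustrative counts only; they are NOT taken from the evaluation above.
example = ConfusionCounts(tp=8, fp=2, fn=2, tn=38)
print(precision(example), recall(example), specificity(example))
```

High specificity together with lower recall, as on this slide, means the system rarely accepts a spreadsheet it should reject but misses some that it should accept.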
46. Main Limitations
● Single Domain
○ Specific spreadsheets (catalogue and collection)
● Lack of a Model to represent construction patterns
○ the models built later for construction patterns were isolated from each other
● Linking labels to ontologies
○ not able to aggregate different labels belonging to the same concept
○ the ontology was selected by us; it is not necessarily the best representation for the spreadsheets' data
47. Main Limitations and how the PhD addresses them
● Single Domain → Multiple Domains
● Lack of a Model to represent construction patterns → Model as an association network
○ relates elements and concepts of several spreadsheets
● Linking labels to ontologies → Linking spreadsheet structure to ontologies
○ the link is made between concepts
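One way to picture an association network that relates elements and concepts of several spreadsheets is a mapping from (spreadsheet, label) pairs to shared concepts. The sketch below is only an illustration under that assumption: the class `AssociationNetwork`, its methods, the file names, and the `concept:` identifiers are all hypothetical and do not describe the model proposed in this work.

```python
from collections import defaultdict


class AssociationNetwork:
    """Hypothetical sketch: links (spreadsheet, label) elements to concepts."""

    def __init__(self):
        self._concept_of = {}                 # (spreadsheet, label) -> concept
        self._elements_of = defaultdict(set)  # concept -> {(spreadsheet, label)}

    @staticmethod
    def _key(spreadsheet, label):
        # Normalize labels so "Species" and " species " are the same element.
        return (spreadsheet, label.strip().lower())

    def link(self, spreadsheet, label, concept):
        key = self._key(spreadsheet, label)
        previous = self._concept_of.get(key)
        if previous is not None:
            # Re-linking an element moves it away from its old concept.
            self._elements_of[previous].discard(key)
        self._concept_of[key] = concept
        self._elements_of[concept].add(key)

    def concepts_in(self, spreadsheet):
        return {c for (s, _), c in self._concept_of.items() if s == spreadsheet}

    def shared_concepts(self, first, second):
        # Concepts that appear in both spreadsheets, whatever the labels used.
        return self.concepts_in(first) & self.concepts_in(second)

    def elements_of(self, concept):
        return set(self._elements_of.get(concept, set()))


# Made-up spreadsheets and concept identifiers, for illustration only.
network = AssociationNetwork()
network.link("catalogue.xls", "Species", "concept:ScientificName")
network.link("collection.xls", "Scientific name", "concept:ScientificName")
network.link("collection.xls", "Collector", "concept:Collector")

print(network.shared_concepts("catalogue.xls", "collection.xls"))
```

Because the link is made at the concept level, two different labels ("Species" and "Scientific name") are aggregated under one concept, which is the limitation of label-level linking noted above.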
87. Research Questions
• When can spreadsheets be considered to have the same purpose?
• Is there a canonical representation among spreadsheets of the same purpose?
• Is it possible to define a canonical representation for a spreadsheet group?
• Can this representation be used to predict spreadsheets of a given purpose?
88. Acknowledgements
● Laboratory of Information Systems (LIS)
● UNICAMP
● FAPESP
● Microsoft Research FAPESP Virtual Institute (NavScales project)
● CNPq (MuZOO Project and PRONEX-FAPESP)
● INCT in Web Science (CNPq 557.128/2009-9)
● CAPES