A presentation given at the IDEAS 2014 conference on database modelling using triple stores for research data management.
IDEAS '14, July 07 - 09 2014, Porto, Portugal.
Paper Abstract:
Most current research data management solutions rely on a fixed set of descriptors (e.g. Dublin Core Terms) for the description of the resources that they manage. While these are easy to understand and use, their semantics are limited to general concepts, leaving out domain-specific metadata and representing values as sets of text values. While this enables retrieval through free-text search, faceted search and dataset interlinking become limited. From the point of view of the relational database schema modeler, designing a more flexible metadata model represents a non-trivial challenge because of the open nature of the model. This work reviews the approaches followed by current open-source platforms and proposes a graph-based model for achieving modular, ontology-based metadata for interlinked data assets in the Semantic Web. The proposed model was implemented in a collaborative research data management platform currently under development at the University of Porto.
Ontology-based multi-domain metadata for research data management using triple stores
1. Ontology-based multi-domain
metadata for research data
management using triple stores
João Rocha da Silva
joaorosilva@gmail.com
Faculdade de Engenharia da Universidade do Porto / INESC TEC
Cristina Ribeiro
mcr@fe.up.pt
DEI, Faculdade de Engenharia da Universidade do Porto / INESC TEC
João Correia Lopes
jlopes@fe.up.pt
IDEAS '14, July 07 - 09 2014, Porto, Portugal
2. Contents
• Diverse metadata: relational modeling challenges
• Current approaches built on relational databases
• Dendro: graph-based research data management
• Live demo
• Conclusions
5. Common challenges in RDB
schema modeling
• Entities with unknown attributes at time of
modeling
• Time-variant attribute values
• Inheritance / sub-class mapping
• Resource hierarchies (parents of parents…)
• Schemas rely on external documentation
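To make the first and last challenges concrete, here is a minimal sketch of one common relational workaround, a generic attribute/value table. This pattern and all table and column names are assumptions for illustration and are not taken from the slides. It shows that unknown attributes can be stored without altering the schema, but every value becomes text and its meaning has to be documented elsewhere.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- One row per (resource, attribute) pair instead of one column per attribute.
CREATE TABLE attribute_value (
    resource_id INTEGER NOT NULL REFERENCES resource(id),
    attribute   TEXT NOT NULL,
    value       TEXT NOT NULL
);
""")

conn.execute("INSERT INTO resource (id, name) VALUES (1, 'dataset-a')")
conn.executemany(
    "INSERT INTO attribute_value VALUES (?, ?, ?)",
    [
        (1, "title", "Dataset A"),
        # An attribute nobody anticipated at modeling time: no ALTER TABLE
        # is needed, but the number is stored as text and its unit is
        # only known through external documentation.
        (1, "sampling_depth_m", "12.5"),
    ],
)


def attributes_of(conn, resource_id):
    """Return all attribute/value pairs recorded for one resource."""
    rows = conn.execute(
        "SELECT attribute, value FROM attribute_value "
        "WHERE resource_id = ? ORDER BY attribute",
        (resource_id,),
    )
    return dict(rows.fetchall())
```

Calling `attributes_of(conn, 1)` returns both attributes as strings, which illustrates why typed queries and validation become harder under this design.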
7. DSpace
• Academic publications management platform
• Not targeted specifically at data
• More than 1000 active installations
• Mature open-source codebase
8. DSpace
• Designed for self-deposit by common users
• Good deposit workflow (validation, licensing…)
13. New requirements
• Metadata profiles for objects other than Items
• Descriptor hierarchy for specialization
• Collaborative schema derivation
• Validation of metadata completeness against different schemas
• Restricting possible metadata for each type of resource
25. Invenio
• Software behind Zenodo, a data publishing portal
• Static metadata model
• Very complex relational schema generated by
business logic code
• Tight coupling between DB and code
• Open-Source
38. Semantic MediaWiki
• Semantic extension of MediaWiki, the code behind
Wikipedia
• Semantic Links between pages
• Uses ontologies
• Strong emphasis on page versioning
• DB schema built around the time dimension
41. Semantic Forms
From DataNotes + UPBox
http://purl.pt/24107/1/iPres2013_PDF/UPBox%20and%20DataNotes%20a%20collaborative%20data%20management%20environment%20for%20the%20long%20tail%20of%20research%20data.pdf
49. Issues review
• Entities with unknown attributes at time of modeling
• Time-variant attribute values
• Inheritance / sub-classing
• Hierarchies (parents of parents of parents…)
• Need for external documentation
51. Graph databases
• Represent entities (Users, Products, Places…) as
vertexes (entity types are called classes)
• Connections between them are directed graph
edges (edge types are called properties)
• The meaning of these connections is expressed in
ontologies that can be shared and reused
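The vertex/edge model described above can be sketched as a set of (subject, predicate, object) triples. This is only an illustration in plain Python, not the storage layer of any particular triple store; the `ex:` identifiers are hypothetical, while `rdf:type` and `dcterms:creator` are abbreviated names of terms from the RDF and Dublin Core vocabularies.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str    # the vertex the edge starts from
    predicate: str  # the edge type (a "property")
    obj: str        # the target vertex or a literal value


RDF_TYPE = "rdf:type"  # links a vertex to its class

# A tiny graph: one user, one project, and the edges between them.
graph = {
    Triple("ex:alice", RDF_TYPE, "ex:User"),
    Triple("ex:project1", RDF_TYPE, "ex:Project"),
    Triple("ex:project1", "dcterms:creator", "ex:alice"),
    Triple("ex:project1", "dcterms:title", "Soil samples 2014"),
}


def match(graph, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]


def vertices_of_class(graph, cls: str):
    """List all vertices declared as instances of the given class."""
    return sorted(t.subject for t in match(graph, p=RDF_TYPE, o=cls))
```

Adding a new kind of attribute to `ex:project1` only means adding another triple; no structure has to be declared in advance.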
52. Getting all my Projects
• Will fetch all the projects created by the user
• Will also return their attributes (“database columns”)
• Different projects may have different attributes
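The query on this slide is not reproduced in the text, so the following is a hedged sketch of the behaviour the bullets describe, written as pattern matching over triples in plain Python. All identifiers (`ex:Project`, `ex:alice`, `ex:instrument`) and the sample data are hypothetical. In SPARQL, a roughly equivalent pattern would be `?project rdf:type ex:Project ; dcterms:creator ex:alice ; ?p ?o`.

```python
# Sample triples: (subject, predicate, object). All data is made up.
triples = [
    ("ex:p1", "rdf:type", "ex:Project"),
    ("ex:p1", "dcterms:creator", "ex:alice"),
    ("ex:p1", "dcterms:title", "Gravimetry"),
    ("ex:p1", "ex:instrument", "Gravimeter"),  # only p1 has this attribute
    ("ex:p2", "rdf:type", "ex:Project"),
    ("ex:p2", "dcterms:creator", "ex:alice"),
    ("ex:p2", "dcterms:title", "Vehicle simulation"),
    ("ex:p3", "rdf:type", "ex:Project"),
    ("ex:p3", "dcterms:creator", "ex:bob"),
]


def projects_of(triples, user):
    """Return {project: {property: [values]}} for projects created by user."""
    projects = {s for s, p, o in triples
                if p == "rdf:type" and o == "ex:Project"}
    mine = {s for s, p, o in triples
            if p == "dcterms:creator" and o == user and s in projects}
    result = {}
    for s, p, o in triples:
        if s in mine:
            # Each project keeps whatever properties it happens to have.
            result.setdefault(s, {}).setdefault(p, []).append(o)
    return result
```

`projects_of(triples, "ex:alice")` returns two projects whose attribute sets differ, without any per-project schema.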
54. Loading an ontology
• Load ontology straight from the web
• No platform-specific syntax (like in SMW)
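As a rough illustration of what loading an ontology involves, the sketch below parses a small ontology fragment in a simplified subset of the N-Triples syntax and extracts human-readable labels for its descriptors. It is an assumption-laden toy: the `http://example.org/onto#depth` descriptor and the inline sample are invented, the regular expression handles only plain IRIs and unescaped literals, and a real platform would fetch the document from the ontology's URL and use a full RDF parser.

```python
import re

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Simplified N-Triples line: <s> <p> <o> .   or   <s> <p> "literal" .
LINE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$'
)

# Inline stand-in for a document that would normally come from the web.
SAMPLE = '''
<http://example.org/onto#depth> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
<http://example.org/onto#depth> <http://www.w3.org/2000/01/rdf-schema#label> "Sampling depth" .
'''


def parse(text):
    """Parse the simplified syntax into (subject, predicate, object) tuples."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"unsupported line: {line!r}")
        s, p, iri, literal = m.groups()
        triples.append((s, p, iri if iri is not None else literal))
    return triples


def descriptor_labels(triples):
    """Map each described term to its rdfs:label."""
    return {s: o for s, p, o in triples if p == RDFS_LABEL}
```

Because the ontology is ordinary RDF, the same triples that define a descriptor also document it, for example through its label.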
55. Nothing comes for free
• Aggregation operators slow
• No ACID properties
• Transactions are not supported in standard
SPARQL
• (“SPARQL 1.1 Query/Update Services should be atomic but that they are
not required to be atomic.”)
• Graph DBMS Solutions are in early stages (many
bugs, many “beta”s, many mailing lists…)
56. Dendro
• Dropbox and File/Folder description platform
• Variable descriptions
• Time-dependent values
• Directory structures (hierarchy)
• Need for simple querying…
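The directory-structure requirement can be sketched as a walk along parent edges in a triple set. This is a minimal illustration in plain Python, not Dendro's implementation: the `ex:parent` property, the resource names and the single-parent assumption are all hypothetical.

```python
PARENT = "ex:parent"  # hypothetical property linking a resource to its folder

# Sample triples: a file inside nested folders, plus one descriptive value.
triples = [
    ("ex:file.csv", PARENT, "ex:run-01"),
    ("ex:run-01", PARENT, "ex:experiments"),
    ("ex:experiments", PARENT, "ex:project"),
    ("ex:file.csv", "dcterms:title", "Raw readings"),
]


def ancestors(triples, node):
    """Return the chain of folders above node, nearest first.

    Assumes each resource has at most one parent.
    """
    parent_of = {s: o for s, p, o in triples if p == PARENT}
    chain, seen = [], {node}
    while node in parent_of:
        node = parent_of[node]
        if node in seen:
            raise ValueError("cycle in hierarchy")
        seen.add(node)
        chain.append(node)
    return chain
```

The hierarchy can be arbitrarily deep ("parents of parents") without changing the data model, since each level is just one more edge.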
64. Conclusions
• Recording rich metadata requires data model
flexibility
• Unknown attributes, time-variant information or
hierarchies can be hard to model in a relational
database
• Several current solutions make compromises due
to their relational database layer
65. Conclusions (cont’d)
• Graph-based models are more flexible and easily
extensible through ontology loading
• Ontologies are shareable on the web, and document
the database “schema”
• Queries become simpler due to the graph model’s
ability to easily model challenging scenarios for RDBs
• Dendro is a collaborative data management platform
fully built on a graph model
66. João Rocha da Silva is an Informatics Engineering PhD student at the Faculty of Engineering of the University of Porto. He specializes in research data management, applying the latest Semantic Web technologies to the adequate preservation and discovery of research data assets.
He is also an experienced freelance iOS developer with several apps published on the App Store, and a self-taught DIY mechanic with a special interest in classic cars, particularly his 1987 Toyota Corolla GT Twin Cam, also known as Hachi-Roku or AE86.
João Rocha da Silva: Research Data Management and Semantic Web Researcher, Web & iPhone Developer
João Correia Lopes is an Assistant Professor in Informatics Engineering at Universidade do Porto and a researcher at INESC TEC. He graduated in Electrical Engineering from the University of Porto in 1984 and received a PhD in Computing Science from Glasgow University in 1997. His teaching includes undergraduate and graduate courses in databases and web applications, software engineering and object-oriented programming, markup languages and semantic web. He has been involved in research projects in the area of long-term preservation, service-oriented architectures and e-Science. Currently his main research interests are e-Science and the management of research data.
João Correia Lopes: Assistant Professor in Informatics Engineering at Universidade do Porto, Researcher at INESC TEC
Cristina Ribeiro is an Assistant Professor in Informatics Engineering at Universidade do Porto and a researcher at INESC TEC. She graduated in Electrical Engineering and holds a Master's degree in Electrical and Computer Engineering and a Ph.D. in Informatics. Her teaching includes undergraduate and graduate courses in information retrieval, digital libraries, knowledge representation and markup languages. She has been involved in research projects in the areas of cultural heritage, multimedia databases and information retrieval. Currently her main research interests are information retrieval, digital preservation and the management of research data.
Cristina Ribeiro: Assistant Professor in Informatics Engineering at Universidade do Porto, Researcher at INESC TEC