FAIRPORT is an international project to develop a lightweight interoperability architecture for biomedical (and potentially other) data repositories.
This slide deck was presented to the FAIRPORT technical team. It describes a proposed model for supporting domain-specific search metadata using a common schema across all repositories.
The proposal makes use of the following existing technologies, with minor extensions:
- the W3C DCAT model for dataset description
- the W3C SKOS knowledge organization system
- the W3C OWL 2 Web Ontology Language
- the Dublin Core (DCMI) vocabulary
- the NCBO BioPortal collection of biomedical ontologies
2. Fairport Metadata: Use Case 1
UC1 - Dataset discovery
Without knowing the dataset’s UID, find it on the web using
a Google-like search or a faceted search.
• Example 1.1: Find datasets relevant to these terms:
<Mus musculus> <C57/Bl6J> <LT-HSC> <Flk2>
<CD34> <Mouse Genome 430 2.0 Array>
• Example 1.2: Find datasets by [all / some] of these
authors: <Rossi, Derrick J> <Bryder, David> <Zahn, Jacob M>
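The [all / some] choice in Example 1.2 comes down to two set tests. A minimal Python sketch, assuming each dataset record carries a set of author names; the `DatasetRecord` type and the dataset IDs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical minimal dataset record: an ID plus author names."""
    uid: str
    authors: frozenset = field(default_factory=frozenset)


def find_by_authors(records, wanted, mode="all"):
    """Return the UIDs of records matching the wanted authors.

    mode="all":  every wanted author is listed (subset test).
    mode="some": at least one wanted author is listed (non-empty intersection).
    """
    wanted = frozenset(wanted)
    if mode == "all":
        return [r.uid for r in records if wanted <= r.authors]
    if mode == "some":
        return [r.uid for r in records if wanted & r.authors]
    raise ValueError(f"unknown mode: {mode!r}")


records = [
    DatasetRecord("ds-1", frozenset({"Rossi, Derrick J", "Bryder, David", "Zahn, Jacob M"})),
    DatasetRecord("ds-2", frozenset({"Rossi, Derrick J"})),
    DatasetRecord("ds-3", frozenset({"Doe, Jane"})),
]
wanted = ["Rossi, Derrick J", "Bryder, David", "Zahn, Jacob M"]
print(find_by_authors(records, wanted, "all"))   # ['ds-1']
print(find_by_authors(records, wanted, "some"))  # ['ds-1', 'ds-2']
```

In a real repository the same two predicates would usually be pushed down into the search index rather than evaluated in application code.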
3. UC 1 Requirements
• UC 1 only requires that we can search across
well-known structures that describe datasets and
are linked to commonly agreed terms and fields.
• In the next example, the repository has its own local
vocabulary set up for each facet.
• These vocabularies are subsets of terms drawn from
various relevant ontologies.
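A faceted search over such local vocabularies can be sketched in Python as below. This is an illustration only, assuming each dataset is annotated with term IRIs per facet; the facet names, dataset IDs and the example.org IRI are hypothetical, while the two NCBITaxon IRIs are the real OBO identifiers for mouse and human.

```python
from collections import Counter

MOUSE = "http://purl.obolibrary.org/obo/NCBITaxon_10090"  # Mus musculus
HUMAN = "http://purl.obolibrary.org/obo/NCBITaxon_9606"   # Homo sapiens
LT_HSC = "http://example.org/term/LT-HSC"                 # placeholder IRI (hypothetical)

# Hypothetical local vocabularies: facet -> {term IRI: label}.
# Each one is a small subset of terms taken from larger ontologies.
FACET_VOCABULARIES = {
    "organism": {MOUSE: "Mus musculus", HUMAN: "Homo sapiens"},
    "cellType": {LT_HSC: "LT-HSC"},
}

# Hypothetical annotations: dataset ID -> {facet: set of term IRIs}.
DATASETS = {
    "ds-1": {"organism": {MOUSE}, "cellType": {LT_HSC}},
    "ds-2": {"organism": {MOUSE}},
    "ds-3": {"organism": {HUMAN}},
}


def facet_counts(datasets, facet):
    """Count datasets per term of one facet, ignoring terms outside the local vocabulary."""
    vocab = FACET_VOCABULARIES[facet]
    counts = Counter()
    for annotations in datasets.values():
        for iri in annotations.get(facet, ()):
            if iri in vocab:
                counts[vocab[iri]] += 1
    return counts


def filter_by_term(datasets, facet, iri):
    """Return the sorted IDs of datasets annotated with `iri` under `facet`."""
    return sorted(uid for uid, ann in datasets.items() if iri in ann.get(facet, ()))


print(facet_counts(DATASETS, "organism"))           # Counter({'Mus musculus': 2, 'Homo sapiens': 1})
print(filter_by_term(DATASETS, "organism", MOUSE))  # ['ds-1', 'ds-2']
```

Because counting goes through the local vocabulary, the facet list shown to users stays limited to the repository's chosen terms even if annotations contain others.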
4. UC 2 Requirements
• UC 2 requires that we let users characterize their
datasets with commonly agreed terms and fields
drawn from NCBO ontologies.
• But without requiring them to choose from the
overly large, comprehensive term sets in NCBO
BioPortal.
• There are also repositories, such as Figshare,
that support only folksonomic tagging.
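One way to keep users away from the full BioPortal term lists is to drive both autocomplete and tag matching from the repository's own view. A minimal Python sketch, assuming a view is a plain dict from term IRI to label and that a free-text tag counts as matched only when it equals a label case-insensitively; the view contents and the example.org IRI are hypothetical.

```python
MOUSE = "http://purl.obolibrary.org/obo/NCBITaxon_10090"
HUMAN = "http://purl.obolibrary.org/obo/NCBITaxon_9606"
PLATFORM = "http://example.org/term/Mouse-Genome-430-2.0-Array"  # placeholder IRI (hypothetical)

# Hypothetical repository view: a handful of IRIs instead of all of BioPortal.
VIEW = {MOUSE: "Mus musculus", HUMAN: "Homo sapiens", PLATFORM: "Mouse Genome 430 2.0 Array"}


def suggest_terms(view, prefix, limit=10):
    """Autocomplete within the view: (label, IRI) pairs whose label starts with prefix."""
    p = prefix.casefold()
    hits = sorted((label, iri) for iri, label in view.items() if label.casefold().startswith(p))
    return hits[:limit]


def map_tags(view, tags):
    """Split free-text tags into those matching a view label and those left as plain tags."""
    by_label = {label.casefold(): iri for iri, label in view.items()}
    matched, unmatched = {}, []
    for tag in tags:
        iri = by_label.get(tag.strip().casefold())
        if iri is None:
            unmatched.append(tag)
        else:
            matched[tag] = iri
    return matched, unmatched


print(suggest_terms(VIEW, "mo"))
print(map_tags(VIEW, ["mus musculus", "stem cells"]))
```

Keeping unmatched tags rather than discarding them means folksonomy-only repositories lose nothing; the matched subset simply gains ontology IRIs.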
5. Fairport Metadata: Use Cases 2 & 3
UC2 - Core metadata characterization
• Example 2.1: User attaches metadata to indicate
the name of the study, the authors, the date and
version.
UC3 - Domain-specific metadata characterization:
• Example 3.1: Indicate the organism species &
strain, cell type, associated gene names, and
technology platform used to produce the dataset.
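Treating UC2 and UC3 as two layers of one record makes it easy to report what a user still has to supply. A minimal Python sketch, assuming a record is a dict with a `core` part and a `domain` part mapping facets to term IRIs; the field and facet names are hypothetical labels for the items listed in Examples 2.1 and 3.1, and the record contents are made up.

```python
# Hypothetical field lists: core fields from Example 2.1, facets from Example 3.1.
CORE_FIELDS = ("title", "creators", "date", "version")
DOMAIN_FACETS = ("organism", "strain", "cellType", "gene", "platform")


def missing_metadata(record):
    """Return (missing core fields, missing domain facets) for one dataset record.

    A field counts as missing if it is absent or empty.
    """
    core = record.get("core", {})
    domain = record.get("domain", {})
    return (
        [f for f in CORE_FIELDS if not core.get(f)],
        [f for f in DOMAIN_FACETS if not domain.get(f)],
    )


record = {
    "core": {
        "title": "Hypothetical HSC expression study",
        "creators": ["Doe, Jane"],
        "date": "2012-01-01",
        "version": "",  # left blank by the user
    },
    "domain": {
        "organism": ["http://purl.obolibrary.org/obo/NCBITaxon_10090"],
        "cellType": ["http://example.org/term/LT-HSC"],  # placeholder IRI (hypothetical)
    },
}
print(missing_metadata(record))  # (['version'], ['strain', 'gene', 'platform'])
```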
7. Ontology Views
• The repository as a whole implements a “view” on the terms from
OBI, EFO, NCI and NCBI Taxon that are relevant to its users (its “domain”),
by implementing Drupal taxonomies containing the terms and their URIs.
• There is a much more elegant way to define ontology views, using
SKOS and OWL 2 punning, outlined in
S. Jupp et al., “Taking a view on bio-ontologies”, Proceedings of
ICBO 2012, Graz, Austria.
• Download PDF: http://ceur-ws.org/Vol-897/session4-paper22.pdf
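The punning idea can be illustrated without an RDF library: the view reuses each ontology class IRI unchanged and also types it as a skos:Concept, so the same IRI acts as an OWL class in the source ontology and as an individual in the view. The sketch below is a simplification, not the method of Jupp et al.; triples are plain Python tuples, and the scheme IRI and the ontology triple are assumptions for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
SKOS = "http://www.w3.org/2004/02/skos/core#"
OBI_TERM = "http://purl.obolibrary.org/obo/OBI_0000951"
VIEW_SCHEME = "http://example.org/view/my-repository"  # hypothetical scheme IRI


def view_triples(scheme, terms):
    """Build (s, p, o) triples that reuse each ontology class IRI as a skos:Concept in the view."""
    triples = {(scheme, RDF_TYPE, SKOS + "ConceptScheme")}
    for iri in terms:
        triples.add((iri, RDF_TYPE, SKOS + "Concept"))
        triples.add((iri, SKOS + "inScheme", scheme))
    return triples


def punned_iris(triples):
    """IRIs typed both as owl:Class and as skos:Concept, i.e. punned under OWL 2."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return sorted(s for s, t in types.items() if {OWL_CLASS, SKOS + "Concept"} <= t)


ontology = {(OBI_TERM, RDF_TYPE, OWL_CLASS)}  # assumed: how the source ontology declares the term
merged = ontology | view_triples(VIEW_SCHEME, [OBI_TERM])
print(punned_iris(merged))  # ['http://purl.obolibrary.org/obo/OBI_0000951']
```

Because the IRI is not copied or renamed, anything that already links to the ontology term also links to the view's concept.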
8. Advantages of Ontology Views
• They allow useful term sets from multiple ontologies
to be combined.
• They allow you to restrict the terms only to those
needed in your domain or specific repository.
• They avoid user confusion …
• …while preserving the generality provided by the
underlying ontologies.
10. Domain-specific metadata template in SKOS
• Create a domain-specific metadata “template” as a
SKOS Concept Scheme, which defines the view your
repository takes over a set of ontology terms.
• The Concept Scheme has a tree structure.
• Top node -> facet -> facetTerm
• Facet example: StudyDesignType
• facetTerm example:
<http://purl.obolibrary.org/obo/OBI_0000951> (OBI,
“compound treatment design”)
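A minimal sketch of such a template, assuming the facets sit directly under the scheme as top concepts and each facetTerm points to its facet with skos:broader (one reasonable SKOS encoding of the tree, not necessarily the one FAIRPORT adopts). The scheme and facet IRIs are hypothetical; the facetTerm is the OBI IRI from the slide.

```python
def template_turtle(scheme, facets):
    """Render a Top node -> facet -> facetTerm tree as SKOS Turtle.

    `facets` maps a facet IRI to the list of ontology term IRIs under it.
    """
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{scheme}> a skos:ConceptScheme .",
    ]
    for facet, terms in facets.items():
        # Facet level: a top concept of the scheme.
        lines.append(f"<{facet}> a skos:Concept ; skos:inScheme <{scheme}> ; skos:topConceptOf <{scheme}> .")
        for term in terms:
            # facetTerm level: the ontology IRI itself, placed under its facet.
            lines.append(f"<{term}> a skos:Concept ; skos:inScheme <{scheme}> ; skos:broader <{facet}> .")
    return "\n".join(lines)


SCHEME = "http://example.org/template/my-repository"           # hypothetical
STUDY_DESIGN = "http://example.org/template/StudyDesignType"  # hypothetical facet IRI
print(template_turtle(SCHEME, {STUDY_DESIGN: ["http://purl.obolibrary.org/obo/OBI_0000951"]}))
```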
11. W3C DCAT + SKOS Ontology Views
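• W3C DCAT already provides a standard vocabulary
for describing datasets.
• It already uses SKOS: dcat:theme links a dataset to a
skos:Concept, and dcat:themeTaxonomy links a catalog
to the concept scheme(s) its themes come from.
• DCAT therefore assumes the SKOS Concept Scheme
applies at the whole-repository (catalog) level.
• This may not be the case for multi-domain
repositories such as Dryad, Dataverse and Figshare.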
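The mismatch can be made concrete with a small check. This Python sketch is a simplified model, not DCAT itself: it assumes we already know which concept scheme each dataset's dcat:theme terms belong to, and flags datasets in a multi-domain catalog whose schemes are not among those declared through dcat:themeTaxonomy. All IRIs and IDs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    uid: str
    theme_schemes: set = field(default_factory=set)  # schemes its dcat:theme terms come from


@dataclass
class Catalog:
    theme_taxonomies: set  # schemes declared once for the whole catalog via dcat:themeTaxonomy
    datasets: list


def out_of_taxonomy(catalog):
    """IDs of datasets whose themes come from a scheme the catalog does not declare."""
    return [d.uid for d in catalog.datasets if not d.theme_schemes <= catalog.theme_taxonomies]


catalog = Catalog(
    theme_taxonomies={"http://example.org/scheme/biomedical"},
    datasets=[
        Dataset("ds-bio", {"http://example.org/scheme/biomedical"}),
        Dataset("ds-eco", {"http://example.org/scheme/ecology"}),
    ],
)
print(out_of_taxonomy(catalog))  # ['ds-eco']
```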
12. W3C DCAT Model
Each dataset also has a DCAT theme described by
terms from a SKOS vocabulary or “concept scheme”.
Each dataset and distribution is described by
a set of standard DCMI terms.
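For illustration, a dataset with one theme and one distribution could be serialized as JSON-LD using only the standard library. This is a hand-written sketch, not output from any FAIRPORT tool: the property names are real DCAT and DCMI terms, but the dataset, distribution and download IRIs are hypothetical, and using the NCBI Taxon IRI as a theme assumes the repository's SKOS view includes it as a concept.

```python
import json

# Prefixes for compact IRIs; the property names below are DCAT / DCMI terms.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

dataset = {
    "@context": CONTEXT,
    "@id": "http://example.org/dataset/ds-1",  # hypothetical
    "@type": "dcat:Dataset",
    "dct:title": "Hypothetical HSC expression dataset",
    "dct:description": "Illustrative record only.",
    # Theme: a term from the repository's SKOS view (assumed to include this IRI).
    "dcat:theme": [{"@id": "http://purl.obolibrary.org/obo/NCBITaxon_10090"}],
    "dcat:distribution": [{
        "@id": "http://example.org/dataset/ds-1/csv",  # hypothetical
        "@type": "dcat:Distribution",
        "dct:title": "CSV export",
        "dcat:mediaType": {"@id": "https://www.iana.org/assignments/media-types/text/csv"},
        "dcat:downloadURL": {"@id": "http://example.org/files/ds-1.csv"},  # hypothetical
    }],
}

print(json.dumps(dataset, indent=2))
```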