The availability of high-quality metadata is key to facilitating discovery in the large variety of scientific datasets that are increasingly becoming publicly available. However, despite the recent focus on metadata, the diversity of metadata representation formats and the poor support for semantic markup typically result in metadata that are of poor quality. There is a pressing need for a metadata representation format that provides strong interoperation capabilities together with robust semantic underpinnings. In this talk, we describe such a format, together with open-source Web-based tools that support the acquisition, search, and management of metadata. We outline an initial evaluation using metadata from a variety of biomedical repositories.
An open repository model for acquiring scientific metadata
1. An Open Repository Model for Acquiring Knowledge about Scientific
Experiments
EKAW 2016 – November 21st, 2016
Bologna, Italy
Martin O’Connor, Marcos Martínez-Romero, Attila L. Egyedi, Debra Willrett,
John Graybeal, and Mark A. Musen
Stanford University, Stanford, CA, USA
metadatacenter.org
3. Metadata: Key to Addressing
the Problem
• Crucial for reproducibility in biomedicine
– Locate experimental datasets online
– Understand how the experiments were performed
– Reuse the data to perform new analyses
• Journals and funding agencies increasingly
require making experimental data and metadata
available
7. Result: Poor Metadata
Variants of ‘age’ metadata field in Gene Expression Omnibus (GEO) repository:
age
Age
AGE
`Age
age (after birth)
age (in years)
age (y)
age (year)
age (years)
Age (years)
Age (Years)
age (yr)
age (yr-old)
age (yrs)
Age (yrs)
age [y]
age [year]
age [years]
age in years
age of patient
Age of patient
age of subjects
age(years)
Age(years)
Age(yrs.)
Age, year
age, years
age, yrs
age.year
age_years
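To make the cost of this variation concrete, here is a minimal Python sketch that collapses a sample of the labels above into (field, unit) pairs. It is an illustration only: the `normalize` function, the `UNIT_WORDS` set, and the filler-word list are assumptions made for this example and are not part of GEO or CEDAR.

```python
import re
from collections import Counter
from typing import Optional, Tuple

# A sample of the field-name variants listed above (copied verbatim).
VARIANTS = [
    "age", "Age", "AGE", "`Age", "age (years)", "Age (Years)",
    "age (yrs)", "age [y]", "age in years", "Age(yrs.)", "age, yrs",
    "age.year", "age_years", "age of patient", "age (after birth)",
]

# Hypothetical list of unit spellings, assumed for this sketch.
UNIT_WORDS = {"y", "yr", "yrs", "year", "years"}
# Hypothetical filler words that carry no meaning of their own.
FILLER = {"in", "old"}


def normalize(label: str) -> Tuple[str, Optional[str]]:
    """Split a raw label into a field name and an optional unit."""
    # Keep only runs of letters; punctuation, brackets and case are discarded.
    tokens = re.findall(r"[a-z]+", label.lower())
    unit = "years" if any(t in UNIT_WORDS for t in tokens) else None
    field = " ".join(t for t in tokens if t not in UNIT_WORDS and t not in FILLER)
    return field, unit


counts = Counter(normalize(v) for v in VARIANTS)
for key, n in counts.most_common():
    print(key, n)
```

Rules like these have to be rediscovered for every field in every repository, and they still leave labels such as "age of patient" unresolved. That is the argument for constraining field names when metadata are authored rather than cleaning them up afterwards.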
8. Our Solution: CEDAR - A Metadata
Ecosystem
• Overcome the impediments to creating high-quality
metadata
• Facilitate
– Creation
– Acquisition
– Use
– Evaluation
– Refinement
• Key goal: create a sharable metadata exchange
format (a template model) for publishing, searching,
and exchanging metadata
9. CEDAR Template Model Goals
• Must describe composite
structure of templates
• Implemented using standard
formats
• Express semantics
• Metadata instances:
– Linked to controlled terms
– Easily serializable
– Easily validated
– Easily indexed
– Interchange with RDF
– Highly readable
– Produced/consumed via
REST APIs and usable in
JavaScript front ends
– Meets FAIR goals
[Figure: example metadata template. Labels shown: Study, Principal Investigator, Description, Name, Institution, Name, ZIP, Title. Callouts: Metadata Template, Fields, Template Elements.]
10. JSON Schema + JSON-LD
Using JSON Schema and JSON-LD for CEDAR Template Model
11. What is JSON Schema?
• Technology for describing and validating the
structure of JSON documents
• Provides a structural description of any JSON
document
• JSON documents that are specified with JSON
Schema can be structurally validated against their
associated schemas
• Analogous to XML Schema
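As a rough illustration of structural validation, the sketch below writes a JSON Schema as a Python dict and checks two documents against it. The "Study" property names are hypothetical and do not reproduce the actual CEDAR template model. The `check` function is a deliberately tiny stand-in that understands only the `type`, `required`, and `properties` keywords; real projects would use a conforming validator, such as the `jsonschema` package for Python.

```python
# A JSON Schema for a hypothetical "Study" record (property names assumed).
STUDY_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "required": ["title", "principalInvestigator"],
    "properties": {
        "title": {"type": "string"},
        "description": {"type": "string"},
        "principalInvestigator": {
            "type": "object",
            "required": ["name"],
            "properties": {"name": {"type": "string"}},
        },
    },
}

# Only these JSON Schema type names are recognised by this toy checker.
_TYPES = {"object": dict, "string": str, "array": list}


def check(instance, schema, path="$"):
    """Return a list of structural errors (empty if the instance conforms)."""
    expected = schema.get("type")
    python_type = _TYPES.get(expected)
    if python_type is not None and not isinstance(instance, python_type):
        return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        # Every name listed under "required" must be present.
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        # Recurse into the properties that are present and described.
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(check(instance[key], subschema, f"{path}.{key}"))
    return errors


good = {"title": "Pilot study", "principalInvestigator": {"name": "A. Researcher"}}
bad = {"title": 3, "principalInvestigator": {}}
print(check(good, STUDY_SCHEMA))
print(check(bad, STUDY_SCHEMA))
```

The schema describes only the shape of a document: which properties exist, which are required, and what type each holds. It says nothing about what a property means.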
12. What is JSON-LD?
• A lightweight syntax to serialize Linked Data in JSON
• Allows existing JSON to be interpreted as Linked Data with
minimal changes
• JSON-LD is primarily intended to be a way to:
– use Linked Data in Web-based programming environments
– build interoperable Web services
– store Linked Data in JSON-based storage engines
• Core contribution: add semantics to JSON documents
• W3C Recommendation: https://www.w3.org/TR/json-ld/
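The idea of adding semantics to plain JSON can be sketched in a few lines of Python. The document below is ordinary JSON plus an `@context` that maps each key to an IRI. The `example.org` IRIs and the `expand_keys` helper are assumptions made for this illustration. The helper shows only the key-to-IRI mapping and is not the expansion algorithm defined by the JSON-LD specification, which also normalises values and handles many more cases.

```python
import json

# Ordinary JSON with a JSON-LD @context; the example.org IRIs are placeholders.
doc = json.loads("""
{
  "@context": {
    "name": "http://schema.org/name",
    "age": "http://example.org/terms/age"
  },
  "@id": "http://example.org/subjects/1",
  "name": "Subject 1",
  "age": 42
}
""")


def expand_keys(document):
    """Replace each plain key with the IRI that the @context maps it to."""
    context = document.get("@context", {})
    expanded = {}
    for key, value in document.items():
        if key == "@context":
            continue
        # Keywords such as @id, and keys without a mapping, are kept unchanged
        # in this simplified sketch.
        expanded[context.get(key, key)] = value
    return expanded


print(json.dumps(expand_keys(doc), indent=2))
```

Two documents that spell a key differently but map it to the same IRI can then be recognised as describing the same property, without changing how existing JSON consumers read the data.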
21. Initial Results
• Public alpha release in September 2016
• Represented all public metadata in
ImmPort repository (146 studies)
• Represented an array of public ISA-
created biomedical studies (~300)
• Represented 60k ISO 11179-based
Common Data Elements from NCI
• Currently working with Stanford Digital
Repository and several research groups
22. Summary
• We have developed a standards-based
template model for representing,
publishing, and sharing templates and
metadata
• Provides strong interoperation with Linked
Open Data
• Metadata easy to create/consume using
off-the-shelf tools
• Very easy to work with using CEDAR tools
23. CEDAR Resources
• Web site: http://metadatacenter.org
• Workbench: https://cedar.metadatacenter.net
• GitHub: https://metadatacenter.github.io