An electronic laboratory notebook (ELN) can be characterized as a system that allows scientists to capture the data and resources used in performing scientific experiments. This allows users to easily organize and find their data; however, little information about the scientific process is recorded.
In this paper we highlight the current status of progress toward semantic representation of science in ELNs.
Toward Semantic Representation of Science in Electronic Laboratory Notebooks (ELNs)
1. Toward Semantic Representation of Science in Electronic Laboratory Notebooks (ELNs)
Stuart J. Chalk
Department of Chemistry, University of North Florida
schalk@unf.edu
CINF Paper 50 – 251st ACS Meeting Spring 2016
#ACSCINFDataSummit
2. Utopia: A Global Research Network
What is an Electronic Notebook?
The Semantics of Semantics
What Needs to be Semantically Represented?
Current lay of the land
ELN Item Manifest
P-PLAN Ontology
VIVO-ISF Ontology
Chemical Analysis Metadata Platform
HCLS Community Profiles
Electronic Notebook Ontology
A generic scientific data model
Experimental information for LD (ExptLD)
Take Home
Conclusion
Outline
3. “Big Data” and the “Semantic Web” are the buzzwords du jour, but what do they mean for chemistry?
Lots of heterogeneous data and metadata with even more “semantic” data to represent it
Look at what we want rather than what we have…
We want chemical data that is:
Easy to share, find, and compare
Freely available but with provenance
Globally sourced and without IP restrictions on reuse
Utopia: A Global Research Network
4. An electronic way to record data…
...equivalent to a laboratory notebook
But ELNs should not be thought of so lowly...
An ELN must:*
Keep track of research data
Reference resources used in research and…
…capture the story of research
What is an Electronic Laboratory Notebook?
* Insight from Tony Williams
5. The interface should mirror a laboratory notebook
Behind the scenes, though, it should use state-of-the-art software, data formats, data/metadata practices, and web technologies to manage data generation, workflows, remote data access, authentication, etc.
As a result, it needs to speak the same language as other data sources and store data in a format that others can read and reuse
Foundational building block of a Global Research Network
What should an ELN be?
6. Semantics is the study of meaning
-> We need to give meaning to what is created in an ELN
Meaning is described for computers using the Resource Description Framework (RDF), which:
Makes statements about objects…
… their relationships to other objects...
...using subject-predicate-object “triples”
RDF allows knowledge representation
Meaning is represented by using one or more ontologies
The Semantics of Semantics
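The subject-predicate-object pattern on this slide can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not an RDF toolkit. The `http://example.org/eln/` IRIs, the experiment, and the person are hypothetical placeholders; the Dublin Core and FOAF property IRIs are existing vocabulary terms.

```python
# Minimal sketch: RDF-style statements as (subject, predicate, object) tuples.
# All example.org IRIs are hypothetical placeholders.
EX = "http://example.org/eln/"
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

triples = [
    (EX + "experiment/42", DCT + "title", '"Titration of acetic acid"'),
    (EX + "experiment/42", DCT + "creator", EX + "person/alice"),
    (EX + "person/alice", FOAF + "name", '"Alice"'),
]


def to_ntriples(statements):
    """Serialize triples as N-Triples lines: IRIs in <>, literals left quoted."""
    def term(value):
        return value if value.startswith('"') else f"<{value}>"
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in statements)


def objects(statements, subject, predicate):
    """Return every object asserted for a given subject and predicate."""
    return [o for s, p, o in statements if s == subject and p == predicate]


print(to_ntriples(triples))
print(objects(triples, EX + "experiment/42", DCT + "creator"))
```

Because the creator of the experiment is itself an IRI, the second statement links one object to another, which is what lets separate ELN records be connected into a graph.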
9. Everything!
What areas?
Data, Results and Resources
Models, Tools for Data Workup (Equations, Tests, Stats)
General Workflows (Protocols and Procedures)
The Research Story (What, Why, How)
User discussion and annotation
ELN usage timeline
The Science (Area, Hypotheses, Theories)
The People (Expertise, Provenance, Integrity, Eminence)
What Needs to be Semantically Represented?
11. The P-PLAN Ontology
http://purl.org/net/p-plan
Workflows
Implement in Kepler, Taverna, Knime?
12. People: The VIVO-ISF Ontology
https://wiki.duraspace.org/download/attachments/51052811/PeopleOrgsRolesGrants.2014-03-14.png
13. The Chemical Analysis Metadata Platform (ChAMP)
http://champ-project.org/
Identification of metadata related to chemical analysis and definition of an ontology to describe terms
Examples in both XML and JSON-LD with associated XML Schema and JSON-LD context
Journal Article
Standard Method of Analysis
Reference Material
The Science: ChAMP (an example)
16. The Health Care and Life Sciences (HCLS) Community Profile is a Note from the Semantic Web HCLS Interest Group
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. This document describes a consensus among participating stakeholders in the Health Care and the Life Sciences domain on the description of datasets using the Resource Description Framework (RDF). This specification meets key functional requirements, reuses existing vocabularies to the extent that it is possible, and addresses elements of data description, versioning, provenance, discovery, exchange, query, and retrieval.
Data Descriptions:
HCLS Community Profile
http://www.w3.org/TR/hcls-dataset/
17. Describes three levels for description of datasets
Summary Level
Type declaration (rdf:type = dctypes:Dataset)
Title (dct:title = rdf:langString)
Description (dct:description = rdf:langString)
Publisher (dct:publisher = IRI)
Version Level
Type declaration (rdf:type = dctypes:Dataset)
Title (dct:title = rdf:langString)
Description (dct:description = rdf:langString)
Creator (dct:creator = IRI)
Publisher (dct:publisher = IRI)
Version identifier (pav:version = xsd:string)
Version linking (dct:isVersionOf = IRI)
Distribution Level
Type declaration (rdf:type = void:Dataset OR dcat:Distribution)
Title (dct:title = rdf:langString)
Description (dct:description = rdf:langString)
Creator (dct:creator = IRI)
Publisher (dct:publisher = IRI)
License (dct:license = IRI)
Data Descriptions:
HCLS Community Profile
http://www.w3.org/TR/hcls-dataset/#datasetdescriptionlevels
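As a rough illustration of the description levels, the sketch below writes a summary-level description as a JSON-LD-style Python dictionary and checks it against the properties listed on this slide. The `missing_properties` helper and every `example.org` IRI are hypothetical; the HCLS note defines more properties and requirement levels than are checked here.

```python
import json

# Properties named on the slide for two of the levels (illustrative subset only).
LEVEL_PROPERTIES = {
    "summary": ["@type", "dct:title", "dct:description", "dct:publisher"],
    "version": ["@type", "dct:title", "dct:description", "dct:creator",
                "dct:publisher", "pav:version", "dct:isVersionOf"],
}


def missing_properties(description, level):
    """List the slide's properties for `level` that the description lacks."""
    return [p for p in LEVEL_PROPERTIES[level] if p not in description]


# Hypothetical summary-level description of a dataset.
summary = {
    "@context": {
        "dct": "http://purl.org/dc/terms/",
        "dctypes": "http://purl.org/dc/dcmitype/",
        "pav": "http://purl.org/pav/",
    },
    "@id": "http://example.org/dataset/titrations",
    "@type": "dctypes:Dataset",
    "dct:title": {"@value": "Acid-base titrations", "@language": "en"},
    "dct:description": {"@value": "Titration curves recorded in an ELN.",
                        "@language": "en"},
    "dct:publisher": {"@id": "http://example.org/org/lab"},
}

print(json.dumps(summary, indent=2))
print(missing_properties(summary, "summary"))
print(missing_properties(summary, "version"))
```

The same description passes the summary-level check but would need a creator, a version identifier, and a version link before it could serve as a version-level description.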
21. Use a Generic Scientific Data Model
Captures data and metadata about datasets and links to related data
JSON-LD is an ideal file format
Data and Resources
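To make the idea of one file holding data, metadata, and links concrete, here is a minimal JSON-LD-style sketch in Python. It is not the data model from the talk: the `ex:` vocabulary, the property names, and the `example.org` IRIs are hypothetical, and only `dct:title` and `dct:relation` are existing Dublin Core terms.

```python
import json

# Hypothetical document: metadata, data points, and a link to a related dataset.
document = {
    "@context": {
        "dct": "http://purl.org/dc/terms/",
        "ex": "http://example.org/vocab#",  # placeholder vocabulary
    },
    "@id": "http://example.org/dataset/ph-run-1",
    "dct:title": "pH measurements, run 1",
    "ex:metadata": {"ex:instrument": "pH meter", "ex:temperatureCelsius": 25.0},
    "ex:data": [
        {"ex:sample": "A", "ex:pH": 4.76},
        {"ex:sample": "B", "ex:pH": 7.01},
    ],
    "dct:relation": [{"@id": "http://example.org/dataset/ph-calibration"}],
}

# JSON-LD is plain JSON, so the standard library can write and read it.
serialized = json.dumps(document, indent=2)
restored = json.loads(serialized)

# Follow the links to related datasets.
linked = [link["@id"] for link in restored["dct:relation"]]
print(linked)
```

Since the file is ordinary JSON, any tool can read the values, while the `@context` and `@id` entries carry the information a JSON-LD processor needs to interpret them as linked data.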
22. A specification (written in XML) that describes the different types of information recorded during the scientific process (http://exptml.sourceforge.net)
Experiment Markup Language (ExptML)
Sample
Solution
Space
Specimen
Substance
Task
Template
Timeline
User
Vendor
Annotation
Api
Calculation
Chemical
Citation
Communication
Customer
Data
Dataset
Definition
Element
Equipment
Event
Experiment
Group
Project
Protocol
Quote
Report
Result
23. Experimental Linked Data (ExptLD)
Define data packets that capture the metadata of
Resources
Data
Integrate with other ExptLD packets to create a SciData document
Or convert to RDF and store in a triplestore
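One way to picture packets being integrated into a single document is the sketch below, which merges small JSON-LD-style packets under a shared `@graph`. The packet layout, the `ex:` vocabulary, and the `build_document` helper are assumptions made for illustration and are not taken from the ExptLD specification.

```python
import json

# Hypothetical shared context; "ex" is a placeholder vocabulary.
CONTEXT = {"dct": "http://purl.org/dc/terms/", "ex": "http://example.org/vocab#"}

resource_packet = {
    "@context": CONTEXT,
    "@id": "http://example.org/equipment/ph-meter-1",
    "@type": "ex:Equipment",
    "dct:title": "Benchtop pH meter",
}
data_packet = {
    "@context": CONTEXT,
    "@id": "http://example.org/data/ph-run-1",
    "@type": "ex:Data",
    "ex:measuredWith": {"@id": "http://example.org/equipment/ph-meter-1"},
    "ex:values": [4.76, 4.78, 4.75],
}


def build_document(packets, context):
    """Combine packets into one document with a single context and an @graph."""
    nodes = {}
    for packet in packets:
        node = {k: v for k, v in packet.items() if k != "@context"}
        nodes[node["@id"]] = node  # a later packet with the same @id replaces an earlier one
    return {"@context": context, "@graph": list(nodes.values())}


document = build_document([resource_packet, data_packet], CONTEXT)
print(json.dumps(document, indent=2))
```

The data packet refers to the equipment packet by its `@id`, so the link survives the merge. Converting such a document to RDF for a triplestore would require a JSON-LD processor, which is outside the scope of this sketch.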
24. A lot exists to semantically represent the scientific process that can be leveraged as part of an ELN system
A data standard needs to be agreed upon
Agreeing on implementation standards will take time because of the size of the user community
Integration and coverage of ontologies will be necessary to fully implement a system that underpins a Global Research Network
Domain-specific knowledge representation is needed in many areas
Take Home