To integrate science into the semantic web it is important to capture the context of research as it is done. ExptML is designed to store information and workflows from the scientific process.
247th ACS Meeting: Experiment Markup Language (ExptML)
1. Experiment Markup Language:
A Combined Markup Language and
Ontology to Represent Science
Stuart J. Chalk
Department of Chemistry
University of North Florida
schalk@unf.edu
2014 Spring ACS Meeting – CINF Paper 19
2. Digital Representation of Science
Electronic Notebooks
The Eureka Research Workbench
Experiment Markup Language
ExptML Schema and Files
Semantic Data and Ontologies
File Storage
Eureka Interface
Web Interface
Conclusion
Outline
3. Most research on digital science is focused on the data
Standards exist for the digital representation of
Data -> individual measurements, time series, spectra
Molecules
Chemical Reactions
Context is important!
Context can be added ad-hoc
Needs to be added systematically - to be searchable
We need a digital representation of the scientific process
Digital Representation of Science
4. Conceptualized in 2006
Need a way to store
Research activities
Laboratory resources
Data
Need to capture the workflow of scientists – not define it
Writing in a lab notebook is equivalent to blogging…
…but the context of the entries is important and varies
Many data types, so how to capture information?
Experiment Markup Language (ExptML)
Eureka Research Workbench
5. A specification (written in XML) that describes different
types of information recorded during the scientific process
(http://exptml.sourceforge.net)
Experiment Markup Language (ExptML)
Annotation
Api
Calculation
Chemical
Citation
Customer
Data
Dataset
Definition
Element
Equipment
Event
Experiment
Group
Message
Project
Protocol
Quote
Report
Result
Sample
Solution
Space
Specimen
Substance
Task
Template
Timeline
User
Vendor
9. To allow ExptML to capture a scientific workflow, an ontology
is needed to represent the structure
Needs to be
Flexible – able to be used in a wide variety of areas
Logical – the links make sense in the context of science
Searchable – so we can find research done in a similar way
Comprehensive! This is the BIG problem
Many existing ontologies
Linking ExptML Files
10. In computer science, an ontology
“formally represents knowledge as a set of concepts within
a domain, and the relationships between those concepts. It
can be used to model a domain and support reasoning about
concepts.”*
In essence, an ontology allows us to define the
relationships and assertions about concepts
For samples represented in ExptML we define
isSample (assertion)
hasSample (relationship)
isSampleOf (relationship)
ExptML Ontology
*https://en.wikipedia.org/wiki/Ontology_(information_science)
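One way to picture these three statements is as subject-predicate-object triples. The sketch below is illustrative only: the identifiers exptml:exp1 and exptml:smp1, the Triple type and the describe_sample helper are hypothetical, and it assumes that hasSample points from an experiment to a sample with isSampleOf as its inverse, which the slide does not spell out.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def describe_sample(experiment_id: str, sample_id: str) -> List[Triple]:
    """Build the three statements for one sample used in one experiment."""
    return [
        # Assertion: the node is a sample (modeled as a boolean-valued
        # statement here, which is an assumption of this sketch).
        Triple(sample_id, "isSample", "true"),
        # Relationship: the experiment makes use of the sample.
        Triple(experiment_id, "hasSample", sample_id),
        # Relationship: assumed inverse of hasSample.
        Triple(sample_id, "isSampleOf", experiment_id),
    ]


triples = describe_sample("exptml:exp1", "exptml:smp1")
for triple in triples:
    print(triple.subject, triple.predicate, triple.obj)
```

Keeping every statement in the same three-part shape is what makes the links searchable: finding all samples of an experiment becomes a filter on the predicate and subject.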
12. XML is nice for storage, archiving and transmitting
information…
…but it is not so easy to use in software
Many XML readers exist, but each has its own syntax
Can be cumbersome to deal with in software
File size (XML is verbose)
Namespaces
Data types (e.g. string, decimal, etc…)
So the solution is…
Developments in ExptML
13. JSONize it!
Compact string representation of arrays of data
Used in AJAX requests in web browsers
JavaScript Object Notation (JSON)
{
"exptmlid": "exptml:ann1",
"anntype": "comment",
"text": "Had to wait for the biochemistry lab to finish using the spectrophotometer before I could get on it. The standards sat around for 1 hr 30 minutes before I could run them.",
"date": "2011-11-25T11:05:17-04:00"
}
<annotation id="exptml_ann1" xmlns="urn:exptml:schema:draft:0.4"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:exptml:schema:draft:0.4
http://exptml.sourceforge.net/files/schema/exptml_annotation.xsd"
version="0.4">
<anntype>comment</anntype>
<text>Had to wait for the biochemistry lab to finish using
the spectrophotometer before I could get on it. The standards
sat around for 1 hr 30 minutes before I could run them.</text>
<date>2011-11-25T11:05:17-04:00</date>
</annotation>
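The XML-to-JSON step for this annotation could be sketched as follows, using only the Python standard library. This is a minimal illustration, not the ExptML tooling itself: the annotation_to_dict helper is hypothetical, the schema-location attributes are omitted for brevity, and the id attribute is copied as written in the XML ("exptml_ann1"), since the slides do not describe how it maps to the "exptml:ann1" form shown in the JSON.

```python
import json
import xml.etree.ElementTree as ET

ANNOTATION_XML = """<annotation id="exptml_ann1"
    xmlns="urn:exptml:schema:draft:0.4" version="0.4">
  <anntype>comment</anntype>
  <text>Had to wait for the biochemistry lab to finish using
  the spectrophotometer before I could get on it. The standards
  sat around for 1 hr 30 minutes before I could run them.</text>
  <date>2011-11-25T11:05:17-04:00</date>
</annotation>"""


def annotation_to_dict(xml_text: str) -> dict:
    """Flatten a single ExptML annotation element into a plain dict."""
    root = ET.fromstring(xml_text)
    record = {"exptmlid": root.get("id")}
    for child in root:
        # ElementTree reports tags as "{namespace}name"; keep the local name.
        local_name = child.tag.split("}", 1)[-1]
        # Collapse the line wrapping inside the element text.
        record[local_name] = " ".join((child.text or "").split())
    return record


record = annotation_to_dict(ANNOTATION_XML)
print(json.dumps(record, indent=2))
```

The namespace handling in the loop is an instance of the cumbersome detail the previous slide mentions; once the data is a dict, none of it is visible to downstream code.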
14. JSON-based Serialization for Linked Data
Current W3C recommendation*
Allows us to define a specification for the JSON data
“@context” is equivalent to an XML Schema
JSON-LD
*http://www.w3.org/TR/json-ld
{
"@context":
{
"exptmlid": "http://www.w3.org/2001/XMLSchema#string",
"anntype": "http://www.w3.org/2001/XMLSchema#string",
"text": "http://www.w3.org/2001/XMLSchema#string",
"date": "http://www.w3.org/2001/XMLSchema#dateTime"
}
}
16. @id represents an Internationalized Resource Identifier (IRI)
The IRI identifies a node and allows this data to be linked
JSON-LD
{
"@context": "http://exptld.org/annotation.jsonld",
"@id": "https://eureka.coas.unf.edu/exptml:ann1",
"anntype": "comment",
"text": "Had to wait for the biochemistry lab to finish using the spectrophotometer before I could get on it. The standards sat around for 1 hr 30 minutes before I could run them.",
"date": "2011-11-25T11:05:17-04:00"
}
17. Currently the ontology defines generic relationships
Should be expanded to provide additional context
Developments in the Ontology
<rdf:Property rdf:ID="http://exptml.sourceforge.net/exptml_ontology.owl#hasSolution">
<rdfs:label>has solution</rdfs:label>
<rdfs:comment>Indicates that an experiment makes use of a particular
solution</rdfs:comment>
<rdfs:subPropertyOf rdf:resource="http://exptml.sourceforge.net/exptml_ontology.owl#rels"/>
</rdf:Property>
<rdf:Property rdf:ID="http://exptml.sourceforge.net/exptml_ontology.owl#hasBuffer">
<rdfs:label>has buffer</rdfs:label>
<rdfs:comment>Indicates that an experiment makes use of a buffer (solution)</rdfs:comment>
<rdfs:subPropertyOf rdf:resource="http://exptml.sourceforge.net/exptml_ontology.owl#hasSolution"/>
</rdf:Property>
<rdf:Property rdf:ID="http://exptml.sourceforge.net/exptml_ontology.owl#hasReagent">
<rdfs:label>has reagent</rdfs:label>
<rdfs:comment>Indicates that an experiment makes use of a reagent (solution)</rdfs:comment>
<rdfs:subPropertyOf rdf:resource="http://exptml.sourceforge.net/exptml_ontology.owl#hasSolution"/>
</rdf:Property>
<rdf:Property rdf:ID="http://exptml.sourceforge.net/exptml_ontology.owl#hasCalibrationStandard">
<rdfs:label>has calibration standard</rdfs:label>
<rdfs:comment>Indicates that an experiment makes use of a calibration standard</rdfs:comment>
<rdfs:subPropertyOf rdf:resource="http://exptml.sourceforge.net/exptml_ontology.owl#hasSolution"/>
</rdf:Property>
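The practical effect of these rdfs:subPropertyOf declarations is that a search for the generic relationship can also match the more specific ones. A minimal sketch of that lookup, assuming the hierarchy has already been read into a dictionary (the SUB_PROPERTY_OF table and both helper functions are hypothetical, and the property names are shortened to their fragment identifiers):

```python
from typing import List

# Child property -> parent property, taken from the declarations above.
SUB_PROPERTY_OF = {
    "hasSolution": "rels",
    "hasBuffer": "hasSolution",
    "hasReagent": "hasSolution",
    "hasCalibrationStandard": "hasSolution",
}


def ancestors(prop: str) -> List[str]:
    """Return the chain of super-properties, nearest first."""
    chain = []
    while prop in SUB_PROPERTY_OF:
        prop = SUB_PROPERTY_OF[prop]
        chain.append(prop)
    return chain


def matches(prop: str, wanted: str) -> bool:
    """True if prop is the wanted property or one of its sub-properties."""
    return prop == wanted or wanted in ancestors(prop)


# A query for hasSolution also finds experiments that recorded a buffer.
print(matches("hasBuffer", "hasSolution"))
# The reverse does not hold: a generic solution is not known to be a buffer.
print(matches("hasSolution", "hasBuffer"))
```

A full RDFS reasoner would derive the same result from the OWL file directly; the dictionary only stands in for that inference here.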
18. BIG Problem!
Context is specific to the science and the scientist
How many sub-properties of “hasSolution” are needed?
Additional context is domain specific so…
… we need to integrate other related ontologies
Map “hasSolution” to predicates in other ontologies
Use VIVO to choose the ‘best’ domain specific ontology
Aggregate science ontologies? – requires software/time
Evaluate ElasticSearch (http://www.elasticsearch.org)
Expand the Ontology
19. JSON-LD is a concrete RDF syntax!*
JSON-LD can be converted to triples
Combine ML and Ontology?
*http://www.w3.org/TR/json-ld/#relationship-to-rdf
{
"@context": "http://exptld.org/annotation.jsonld",
"@id": "https://eureka.coas.unf.edu/exptml:ann1",
"anntype": "comment",
"text": "Had to wait for the biochemistry lab to finish using the spectrophotometer before I could get on it. The standards sat around for 1 hr 30 minutes before I could run them.",
"date": "2011-11-25T11:05:17-04:00",
"hasUser": [
{ "@id": "https://eureka.coas.unf.edu/exptml:usr1" },
{ "@id": "https://eureka.coas.unf.edu/exptml:usr11" }
],
"hasExperiment": { "@id": "https://eureka.coas.unf.edu/exptml:exp1" }
}
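To show what "converted to triples" means for this document, here is a deliberately simplified sketch. It is not the JSON-LD to RDF conversion defined by the W3C specification: it does not fetch or apply the @context, so predicates stay as the short terms rather than full IRIs, and it only handles a flat document like the one above. The to_triples function is a hypothetical helper written for this illustration.

```python
from typing import Any, Dict, List, Tuple

DOC: Dict[str, Any] = {
    "@context": "http://exptld.org/annotation.jsonld",
    "@id": "https://eureka.coas.unf.edu/exptml:ann1",
    "anntype": "comment",
    "text": "Had to wait for the biochemistry lab to finish using the "
            "spectrophotometer before I could get on it. The standards "
            "sat around for 1 hr 30 minutes before I could run them.",
    "date": "2011-11-25T11:05:17-04:00",
    "hasUser": [
        {"@id": "https://eureka.coas.unf.edu/exptml:usr1"},
        {"@id": "https://eureka.coas.unf.edu/exptml:usr11"},
    ],
    "hasExperiment": {"@id": "https://eureka.coas.unf.edu/exptml:exp1"},
}


def to_triples(doc: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Turn one flat JSON-LD node into (subject, predicate, object) tuples."""
    subject = doc["@id"]
    triples = []
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # keywords such as @context and @id are not predicates
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict) and "@id" in item:
                # A node reference: the object is another resource's IRI.
                triples.append((subject, key, item["@id"]))
            else:
                # A plain literal value.
                triples.append((subject, key, str(item)))
    return triples


triples = to_triples(DOC)
for s, p, o in triples:
    print(s, p, o)
```

Each hasUser entry becomes its own triple, which is how a single annotation file ends up linked to several other ExptML records.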
20. Nice start - allows for conceptual evaluation of the approach
Needs work – “science cannot be described by one alone”
TODO
Integrate and aggregate existing ontologies
Work with ELN developers e.g. LabTrove and elnItemManifest*
Encourage ontology development in areas where gaps exist
e.g. Chemical Analysis
Contribute to standards development
e.g. Research Data Alliance (RDA) – http://rd-alliance.org
Conclusion
* “First steps towards semantic descriptions of electronic laboratory notebook records”,
S J Coles, J G Frey, C L Bird, R J Whitby and A E Day, J. Cheminformatics, 2013, 5:52 http://dx.doi.org/10.1186/1758-2946-5-52