1. SMART Protocols: SeMAntic
RepresenTation for
Experimental Protocols
Olga Giraldo
ogiraldo@fi.upm.es
Ontology engineering group (OEG)
Universidad Politécnica de Madrid
2. Agenda
• What is a lab protocol
• Motivation
• Our general research question
• Our assumption
• Our proposal
• Preliminary results
• Future work
3. What is a lab protocol
• Laboratory protocols are like cooking recipes
• They have ingredients: reagents and samples
• They have appliances: equipment
• They have a total time
• They have a list of instructions
• They have critical steps
• Laboratory protocols are the “how to” of an experiment.
4. Some problems in lab protocols
Some protocols present insufficient granularity, and their instructions
can be imprecise or ambiguous because they are written in natural language.
• Incubate the
centrifuge tubes in a
water bath.
• Incubate the samples
for 5 min with gentle
shaking.
• Rinse DNA briefly in
1-2 ml of wash.
• Incubate at -20C
overnight.
5. Why do we need to formalize and extract information from
lab protocols?
Because we want a recommendation system…
• That matches protocols according to my situation, for
instance
• samples I have,
• availability of equipment, reagents, lab conditions
• expertise
We also want content-based information retrieval
• Meaningful sentences, sample used, purpose of the
protocol, applicability, critical steps, etc. Also,
identification of instructions
• Find all protocols for DNA extraction that have been used in
Oryza sativa that are suitable for processing a large number of
samples with a low execution time.
Motivation
7. Our assumption
“Experimental protocols
are fundamental
information structures that
should support the
description of the
processes by means of
which results are
generated in experimental
research”
9. Methods to represent and extract information
• Ontology model representing lab protocols
• Gazetteer-based method: use existing lists of named
entities
Lists of proper nouns, which refer to real-life entities
• Rule-based approaches: write manual extraction rules
• Combination of the above
12. SMART Protocols - document
The Protocol as a document
[Diagram: sp:experimental protocol related to iao:document, iao:document part, iao:textual entity and iao:data set through owl:subClassOf, ro:hasPart and ro:partOf. Classes grouped by section, as laid out in the diagram:]
sp:introduction section
• sp:application of the protocol
• sp:advantage of the protocol
• sp:limitation of the protocol
• sp:provenance of the protocol
• sp:purpose of the protocol
sp:materials section
• sp:buffer list
• sp:equipment and supplies list
• sp:kit list
• sp:primer list
• sp:reagent list
• sp:software list
• sp:solution list
sp:methods section
• exact:caution
• sp:critical step
• sp:hint
• sp:pause point
• sp:storage condition
• sp:timing
• sp:troubleshooting
exact:alert message
Rhetorical and structural components (e.g. introduction, materials, and methods);
Information like application of the protocol, advantages and limitations, list of reagents,
critical steps.
13. SMART Protocols - wf
Basic Steps of DNA Extraction
[Diagram: steps and variables linked by p-plan:hasInputVariable, p-plan:hasOutputVariable, owl:subClassOf and bfo:isPrecededBy]
• Steps (p-plan:Step, sp:basic step of DNA extraction): sp:cell disruption, sp:digestion reaction, sp:DNA purification
• Variables (p-plan:Variable): sp:plant tissue, sp:powdered tissue, sp:digested contaminant, obi:DNA extract
Representation of the workflow aspects of protocols:
the implicit order of the instructions follows their input/output structure.
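The input/output idea can be sketched in a few lines of Python. This is only an illustration, not the SMART Protocols implementation: the pairing of each step with its input and output variable is read off the diagram labels and is an assumption, and `Step` and `order_steps` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """A protocol step with one input and one output variable (simplified)."""
    name: str
    input_var: str
    output_var: str


# Assumed pairing of steps and variables, read off the diagram labels.
# The list is deliberately out of order.
STEPS = [
    Step("DNA purification", "digested contaminant", "DNA extract"),
    Step("cell disruption", "plant tissue", "powdered tissue"),
    Step("digestion reaction", "powdered tissue", "digested contaminant"),
]


def order_steps(steps):
    """Order steps so each one consumes the output of the step before it."""
    by_input = {s.input_var: s for s in steps}
    outputs = {s.output_var for s in steps}
    # The first step is the one whose input no other step produces.
    starts = [s for s in steps if s.input_var not in outputs]
    if len(starts) != 1:
        raise ValueError("expected exactly one starting step")
    ordered = [starts[0]]
    # Follow output -> input links; the length guard stops on cyclic data.
    while ordered[-1].output_var in by_input and len(ordered) < len(steps):
        ordered.append(by_input[ordered[-1].output_var])
    return ordered


ordered_names = [s.name for s in order_steps(STEPS)]
print(ordered_names)
```

The order is never stated explicitly in the data; it falls out of matching each step's output to the next step's input, which is the role bfo:isPrecededBy plays in the diagram.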
14. SMART Protocols documentation
• SMART Protocols ontology is available here:
• http://vocab.linkeddata.es/SMARTProtocols/
• Paper accepted at the Linked Science 2014 workshop
(LISC2014)
• Authors: Giraldo O, Garcia A, Corcho O
• Title: SMART Protocols: SeMAntic RepresenTation for
Experimental Protocols
• Co-located with the 13th International Semantic Web
Conference (ISWC2014)
• http://xurl.es/smart_protocols
16. Classification of the protocols
• Our corpus of protocols was classified according to
purpose:
DNA / RNA extraction
DNA amplification
Electrophoresis or sequencing of nucleic acids
Genetic transformation
• Identification of instructions common to a type of protocol
Key steps in DNA extraction: cell disruption, digestion reaction and
DNA purification
≈ pasta recipe
Key steps of a pasta recipe: boil water, cook the pasta in it,
and mix the sauce with the pasta.
17. Creation of gazetteer lists
Lists of keywords, used to find occurrences of those keywords in
the text.
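As a rough illustration of the idea, the Python sketch below matches a small gazetteer against one instruction. The list entries, the entity types, and the function name `annotate` are assumptions made for this example; they are not the gazetteer lists built in this work.

```python
import re

# Hypothetical gazetteer: entity type -> keywords (illustrative entries only).
GAZETTEER = {
    "action": ["incubate", "rinse"],
    "equipment": ["water bath", "centrifuge tubes"],
    "sample": ["DNA", "samples"],
}


def annotate(text, gazetteer):
    """Return (start, end, entity_type, matched_text) for each keyword hit."""
    hits = []
    for entity_type, keywords in gazetteer.items():
        for keyword in keywords:
            # Word boundaries avoid matching inside longer words.
            pattern = r"\b" + re.escape(keyword) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((m.start(), m.end(), entity_type, m.group()))
    return sorted(hits)


instruction = "Incubate the centrifuge tubes in a water bath."
annotations = annotate(instruction, GAZETTEER)
for start, end, entity_type, matched in annotations:
    print(f"{start:2d}-{end:2d} {entity_type:9s} {matched}")
```

A lookup like this finds known terms but says nothing about how they relate to each other, which is why it is combined with extraction rules.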
22. Continue…
• Analysis of the protocols, focusing on the identification of
keywords and/or constructs in English (e.g. instructions,
actions).
• Writing rules.
• Executing, testing and debugging the rules.
23. Goal of the internship
To take advantage of our previous experience
in the formalization of lab protocols and apply
it in a new OHSU-Elsevier project focused
on research data management systems.
As I mentioned before, an experimental protocol is the “how to” of an experiment. For this reason, our assumption is that experimental protocols are…
What do we propose?
A set of methods to represent and extract information from laboratory protocols. The first one is an ontology model…
The second is a gazetteer-based method, which uses lists of the entities or objects in lab protocols that we would like to retrieve.
The third is the manual creation of rules.
And the last is a combination of all of these methods.
Which results have we obtained?
We have developed two ontology modules: one represents the metadata needed to report a laboratory protocol, and the other represents the protocol as an executable element.
The ontologies are available here, and a paper describing the ontology design was recently accepted at the Linked Science 2014 workshop.
So far, we have covered how to formally report a lab protocol.
Now we start describing the methods used to extract linguistic patterns from lab protocols.
The first step in this stage was the classification of lab protocols according to their purpose. In our corpus we identified four types of protocols: protocols designed for nucleic acid extraction, for DNA amplification, for electrophoresis or sequencing, and for genetic transformation.
Then rules were created to find occurrences of the basic steps in lab protocols. Here is an example: this rule was created to annotate an instruction associated with the digestion reaction. The rule describes a quantity of a reagent that participates only in the digestion reaction stage. The annotation continues until the nearest period.
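A rule of this kind could look roughly like the following Python regular expression. This is a sketch of the idea, not the rule actually written for the corpus: the reagent list, the units, the sample text, and the name `annotate_digestion` are all assumptions made for illustration.

```python
import re

# Assumed list of reagents used only in the digestion stage (illustrative).
DIGESTION_REAGENTS = ["proteinase K", "RNase A"]

# Quantity = a number (optionally a range) followed by a unit, e.g. "20 ul".
QUANTITY = r"\d+(?:\.\d+)?(?:\s*-\s*\d+(?:\.\d+)?)?\s*(?:ul|ml|mg|ug)"
REAGENT = "|".join(re.escape(r) for r in DIGESTION_REAGENTS)

# Match "<quantity> [of] <reagent>" and extend to the nearest period.
DIGESTION_RULE = re.compile(
    rf"{QUANTITY}\s+(?:of\s+)?(?:{REAGENT})\b[^.]*\.",
    flags=re.IGNORECASE,
)


def annotate_digestion(text):
    """Return the spans annotated as digestion-reaction instructions."""
    return [m.group() for m in DIGESTION_RULE.finditer(text)]


# Hypothetical protocol text, not taken from the corpus.
text = (
    "Grind the tissue in liquid nitrogen. "
    "Add 20 ul of proteinase K and mix gently. "
    "Incubate at -20C overnight."
)
digestion_spans = annotate_digestion(text)
print(digestion_spans)
```

Stopping at the nearest period keeps the rule simple, but it would cut an annotation short at a decimal point later in the sentence, so a real rule would need a more careful sentence boundary.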