The Evidence and Conclusion Ontology (ECO) describes types of evidence relevant to biological investigations. First developed in the early 2000s, ECO now consists of over 1700 defined classes and is used by a large and growing list of resources. ECO imports close to 1000 classes from the Ontology for Biomedical Investigations and the Gene Ontology for use in logical definitions. Historically, ECO terms have generally been categorized by either the biological context of the evidence (e.g. gene expression) or the technique used to generate the evidence (e.g. PCR-based evidence). As a result, terms with related biological context are sometimes found under different, unrelated nodes. To address this, we have been performing a rigorous review of the structure and logic of the branches of ECO. Working with additional input from collaborators through the issue tracker on GitHub, we are evaluating and updating term labels, definitions, and relationships. The goal of these changes is to increase the logical consistency of ECO, make it easier for users to find and understand terms, and allow ECO to continue to grow and support its users. In addition to the structural review, we have been working with CollecTF to use ECO for automated text mining. To generate a curated corpus for this effort, we have been annotating sentences that contain evidence-based assertions about gene products, taxonomic entities, and sequence features with ECO terms. From this effort we have developed clearly defined annotation guidelines that have been passed on to a team of undergraduates who are continuing the curation effort. Annotations are limited to single sentences, or to two consecutive sentences, containing the evidence instance and assertion clause. The quality of the mapping to ECO and the strength of the author’s assertion are also captured. ECO is freely available at http://evidenceontology.org/ and https://github.com/evidenceontology.
BioCuration 2019 - Evidence and Conclusion Ontology 2019 Update
The Evidence and Conclusion Ontology systematically describes scientific evidence types that support biological assertions. ECO is structured around two root classes: 'evidence' and 'assertion method'. Terms describing types of evidence are grouped under 'evidence', while 'assertion method' provides a mechanism for recording whether a particular assertion was made by a human or in an automated fashion. ECO supports >20 user groups in their annotation efforts; e.g. UniProt-Gene Ontology Annotation [1] (UniProt-GOA) has >628 million evidence-linked GO annotations [2]. ECO is released into the public domain under the CC0 1.0 Universal (CC0 1.0) license.
James B. Munro1, Elizabeth T. Hobbs2, Suvarna Nadendla1*, Rebecca C. Tauber1*, Stephen Goralski2, Ivan Erill2, Marcus C. Chibucos1, & Michelle Giglio1
1Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD
2Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD
*Contact: email - rctauber@gmail.com; snadendla@som.umaryland.edu
Thank you to our collaborators and various user groups for supporting
the growth of ECO.
Collaborations:
ECO is supported by the National Science
Foundation (NSF) Division of Biological
Infrastructure (DBI) under Award Number 1458400.
Find us at http://evidenceontology.org/
1. E.C. Dimmer, R.P. Huntley, Y. Alam-Faruque, T. Sawford, C. O'Donovan, M.J. Martin, … R. Apweiler. (2012). The UniProt-GO Annotation database in 2011. Nucleic Acids Res., 40, D565-D570.
2. M.C. Chibucos, D.A. Siegele, J.C. Hu, & M. Giglio. (2017). The Evidence and Conclusion Ontology (ECO): Supporting GO Annotations. Methods Mol. Biol., 1446, 245-259.
3. S. Kilic, E.R. White, D.M. Sagitova, J.P. Cornish, & I. Erill. (2014). CollecTF: A database of experimentally validated transcription factor-binding sites in bacteria. Nucleic Acids Res., 42, D156-D160.
4. The Gene Ontology Consortium. (2015). Gene Ontology Consortium: going forward. Nucleic Acids Res., 43, D1049-D1056.
5. A. Bandrowski, R. Brinkman, M. Brochhausen, M.H. Brush, B. Bug, M.C. Chibucos, et al. (2016). The Ontology for Biomedical Investigations. PLoS One, 11(4), e0154556.
6. L.M. Schriml, E. Mitraka, J. Munro, B. Tauber, M. Schor, L. Nickle, V. Felix, L. Jeng, C. Bearer, et al. (2019). Human Disease Ontology 2018 update: classification, content and workflow expansion. Nucleic Acids Res., 47(D1), D955-D962.
7. M.C. Chibucos, A.E. Zweifel, J.C. Herrera, W. Meza, S. Eslamfam, P. Uetz, … M.G. Giglio. (2014). An ontology for microbial phenotypes. BMC Microbiol., 14, 294.
8. Wikidata. https://www.wikidata.org/wiki/Wikidata:Main_Page
• ECO currently contains 1760 terms, all of which have textual definitions.
• 1339 ECO terms have logical definitions. Of these, 186 link out to other vocabularies such as GO [4] and OBI [5], 1147 link the class to an ECO assertion method, and 6 link to other internal classes.
Future directions
• Continue to work with our collaborators.
• Collaborate with the Confidence Information Ontology to expand the model for capturing confidence information.
• The Human Disease Ontology [6], to incorporate classes representing definition sources.
• The Ontology for Microbial Phenotypes [7], to expand classes for phenotype annotations.
• The Ontology for Biomedical Investigations [5], to complete the harmonization project.
• The Gene Ontology [4], to continue supporting the representation of evidence in gene product annotations.
• Wikidata [8], to support annotations of genes, proteins and diseases in its structured data storage repository.
Increased Logical Consistency
Node Expansion
We have been working with CollecTF [3] to use ECO in an automated text mining effort. As part of this project, a curated corpus of high-quality experimental evidence annotations is being generated from sentences in scientific articles. The annotations cover gene products, sequence features, phenotypes, and taxonomy/phylogeny. This corpus is used as an annotated training set for building an automated text mining model.
Guidelines for annotation
Annotation process
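As a rough illustration of the annotation unit described in the abstract (one sentence or two consecutive sentences, plus the mapping quality and the strength of the author's assertion), a record might be modeled as below. This is only a sketch: the class name, field names, and the example values "high" and "strong" are hypothetical and are not taken from the ECO or CollecTF annotation guidelines.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EvidenceAnnotation:
    """Hypothetical record for one ECO annotation of article text."""

    eco_id: str                       # ECO term identifier assigned by the curator
    sentence_indices: Tuple[int, ...]  # positions of the annotated sentences in the article
    mapping_quality: str              # hypothetical label for how well the text maps to the ECO term
    assertion_strength: str           # hypothetical label for the strength of the author's assertion

    def __post_init__(self) -> None:
        idx = self.sentence_indices
        # An annotation spans a single sentence or two sentences.
        if len(idx) not in (1, 2):
            raise ValueError("annotation must cover one or two sentences")
        # When two sentences are used, they must be consecutive.
        if len(idx) == 2 and idx[1] != idx[0] + 1:
            raise ValueError("two-sentence annotations must be consecutive")


# Placeholder identifier and labels, for illustration only.
single = EvidenceAnnotation("ECO:0000000", (12,), "high", "strong")
pair = EvidenceAnnotation("ECO:0000000", (12, 13), "high", "strong")
```

Enforcing the span rule at construction time keeps malformed annotations out of the corpus before they reach a training set.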
Interactive Text Mining (Future plan)
Inter-Annotator Agreement
Kappa equation: $\kappa = \frac{A_o - A_e}{1 - A_e}$
where $A_o$ = observed agreement and $A_e$ = expected agreement.
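A minimal sketch of how this agreement statistic could be computed for two annotators is shown below. It assumes Cohen's formulation, in which expected agreement is derived from each annotator's own label frequencies; the poster does not state which kappa variant was used. The function name and the labels in the example are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Kappa = (A_o - A_e) / (1 - A_e) for two annotators (Cohen's variant, assumed)."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both annotators.
    a_o = sum(x == y for x, y in zip(labels_a, labels_b)) / n

    # Expected agreement: chance that both pick the same label, given each
    # annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    a_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if a_e == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined (0/0), so report full agreement by convention.
        return 1.0
    return (a_o - a_e) / (1.0 - a_e)


# Hypothetical labels from two annotators for the same four sentences.
annotator_1 = ["term_x", "term_x", "term_y", "term_y"]
annotator_2 = ["term_x", "term_y", "term_y", "term_y"]
kappa = cohen_kappa(annotator_1, annotator_2)  # A_o = 0.75, A_e = 0.5
```

In this example the annotators agree on three of four sentences, but half of that agreement is expected by chance, which is why kappa is lower than the raw agreement rate.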