Chemical ontologies: what are they, what are they for, and what are the challenges
1. EBI is an Outstation of the European Molecular Biology Laboratory.
Chemical Ontologies
What are they?
What are they for?
What are the challenges?
Janna Hastings, EBI Chemoinformatics and Metabolism
6th German Conference on Chemoinformatics, Goslar, 8 November 2010
2. Problem
How do we find the information we need?
Data deluge
Multiple databases, heterogeneous data
Ambiguity, multiple synonyms
J. Hastings Chemical Ontology30.01.152
Data lost in
3.
Intelligent systems
The answer is 42
I’ll show you why
4. Logical inference
All men are mortal
Socrates is a man
Therefore, Socrates is mortal
finding the implications of what you know
5.
Community terminological standardisation
Dictionary: synonyms, definitions
Hierarchical organisation
Logical model allowing computer inferences beyond what is explicitly encoded
Knowledge-based applications
6. Ontologies to filter and organise data
The Web Ontology Language (OWL)
Hierarchical organisation
Synonyms
Cross-references
Logical definitions
Can be re-used in multiple applications
root
leaves
7.
ChEBI Ontology
Chemical entity Role
catecholamines
Biological role
Application
hormone
vasodilator agent
(R)-adrenaline
CHEMINF Ontology
Descriptor
Software library
ACD Labs
logP
-.539
-2.369
logD
8.
Chemical entity
carboxylic acid
acetylsalicylic acid
(aspirin)
chlorfenvinphos
organophosphorus compound
aldehyde
organic molecular entity
inorganic molecular entity
pyridoxal
(vitamin B6)
sodium chloride
Molecular entity
Group
hydroxy group
Chemical substance
9.
Role
analgesic
acetylsalicylic acid
(aspirin)
chlorfenvinphos
insecticide
vitamin
pyridoxal
(vitamin B6)
Biological role Application
drug
pesticide
Chemical role
acid
sulfuric acid
10.
Chemical information entity
Descriptor
Software library
CDK
logP
OpenBabel
Algorithm
Molecular Descriptor
implements calculates
Substance Descriptor
atom count
boiling point
melting point
largest chain
fused cycles
Hueckel’s aromaticity
11.
Chemical database
Bioactivity database
Metabolism database
Pathway database
Literature
Chemical entities
Roles
Properties
Unified browsing and querying
13. Chemicals and roles
de Matos, P. et al: Chemical Entities of Biological Interest: an update. NAR Database issue 2010
vitamin
hormone
neurotransmitter
CNS stimulant
carboxylic acid
peptide
trimethylxanthine
polycyclic cage
has role
14. Chemicals and structures
J. Hastings, C. Batchelor, C. Steinbeck, S. Schulz: What are chemical structures and their relations? FOIS 2010
chemical entity
molecule
chemical graph
molecular structure
has attribute
What is the structure of Vancomycin?
15. Representing complex structures
Chemical classes can be defined by parts of structures and/or properties of structures
carboxylic acid: if molecule has part some carboxy group
cyclic molecule: if molecule has property cyclic, i.e. a self-connected cyclic path exists through the molecule’s atoms
16.
Pre-compute and assert all parts and properties
Represent atoms and bonds in ontology
Integration of chemoinformatics and ontology tools
J. Hastings et al.: Representing chemicals using OWL, description graphs and rules. OWLED 2010
17. Purpose and mode of action
epinastine
application
antiallergic drug
is a
biological role
histamine antagonist
is a
has role
C. Batchelor, J. Hastings, C. Steinbeck: Ontological dependence, dispositions and institutional reality in chemistry. FOIS 2010
Single molecule
Independent of intent
Bulk quantity of molecules
Depends on human intent (e.g. license, prescription)
18.
Conditions in bioactivity models
Consider aspirin as treatment for a headache
Too few individual molecules will have no effect
Too many tablets will have unpleasant additional effects
Image credit: tell.fll.purdue.edu
J. Hastings, C. Steinbeck, L. Jansen, S. Schulz: Substance concentrations as conditions for the realization of dispositions. ISMB Bio-Ontologies SIG 2010
19.
Christoph Steinbeck
Paula de Matos
Marcus Ennis
Steve Turner
Adriano Dekker
Kenneth Haug
Rafael Alcántara Martin
Zara Josephs
Pablo Moreno
Kalai Jayaseelan
Mark Rijnbeek
Nico Adams
Colin Batchelor, RSC
Stefan Schulz, Freiburg
Egon Willighagen, Uppsala
Michel Dumontier, Carleton
Leonid Chepelev, Carleton
Editor's notes
Researchers in increasingly data-overwhelmed scientific domains face ever-growing difficulty in working their way through the mounds of data spread across different resources, interfaces, languages and databases.
We increasingly need computational tools to mediate between researchers and the mountains of distributed, heterogeneous data. We need annotations to shared, controlled IDs in order to harmonise data across heterogeneous sources.
The human mind is an amazing thing: most people are able to correctly answer very quickly when asked the following questions: Are there any footprints on the moon? (YES) Are there any purple dogs on the moon? (NO) (nor bats, nor dinosaurs, nor trees...)
How do they do that? They are not taught inventories of what things are on the moon in high school. Rather, they are taught the simple fact that there is no life on the moon at all. From this fact they are able to infer that there are no purple dogs on the moon, because purple dogs are a kind of life form.
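The moon example can be sketched in a few lines of Python. This is only an illustration of the idea: the hierarchy, the `ABSENT_ON_MOON` fact and both function names are made up for this sketch and come from no ontology tool.

```python
# Hypothetical toy hierarchy: child -> parent ("is a").
IS_A = {
    "purple dog": "dog",
    "dog": "life form",
    "bat": "life form",
    "tree": "life form",
    "footprint": "physical trace",
}

# The single stated fact: nothing of this class exists on the moon.
ABSENT_ON_MOON = {"life form"}


def ancestors(term):
    """Return every class above the term, following is-a links upward."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def may_be_on_moon(term):
    """False when the term, or any class above it, is known to be absent."""
    return not any(c in ABSENT_ON_MOON for c in [term] + ancestors(term))


print(may_be_on_moon("purple dog"))  # False: inferred, never stated directly
print(may_be_on_moon("footprint"))   # True: nothing rules it out
```

One stated fact about "life form" answers the question for every class beneath it, which is the compactness the notes refer to.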
What is an ontology? It is at least all of these things: a community-wide standardised terminology and dictionary of terms in a particular domain; a hierarchically organised map of entities in the domain; a logical model which allows compact representation but logical inference to additional implications; and a tool which supports multiple, knowledge-based applications.
Ontologies are organised hierarchically from a very general root term to the most specialised leaf terms (utility: grouping items at different levels)
They gather together synonyms and other metadata (utility: ‘glue’ for data integration)
They provide logical definitions to allow automatic inferences thus providing a compact storage mechanism (utility: automated reasoning and query answering)
They therefore provide a sophisticated searching and organising medium for multiple applications
And there is one standard (OWL) format for ontology development which is supported by many tools and resources
This slide illustrates our chemical ontologies (currently in development at the EBI and with collaborators)
Software libraries implement algorithms. Algorithms calculate descriptors.
Descriptors are about chemical entities of various sorts (molecules, substances, atoms...)
Now, because you have a single ontology on top of multiple annotations across several databases (a standard), you can perform cross-database querying for data related to the same thing. But that’s not all: not only can you query across several databases, but your query is semantic. It *knows* that leukemia is a kind of cancer, and you don’t have to implement a custom search solution in each database capable of inferring this, because the hierarchy and the synonyms live outside of any one database, in the community-wide shared ontology. Image: different databases, literature resources. Organising ontology: semantic searching, multi-level aggregation.
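A minimal sketch of such a semantic, cross-database query follows. The database names, record IDs, synonym table and hierarchy are all invented for illustration; a real system would load them from the shared ontology and from the annotated resources.

```python
# Hypothetical shared ontology: hierarchy and synonyms live outside any database.
IS_A = {"leukemia": "cancer", "lymphoma": "cancer", "cancer": "disease"}
SYNONYMS = {"leukaemia": "leukemia", "malignant neoplasm": "cancer"}

# Hypothetical databases, each annotated with free-text labels.
DATABASES = {
    "bioactivity_db": [("record-1", "leukaemia"), ("record-2", "asthma")],
    "literature_db": [("paper-1", "lymphoma"), ("paper-2", "malignant neoplasm")],
}


def normalise(label):
    """Map a label or synonym to its preferred ontology term."""
    label = label.lower()
    return SYNONYMS.get(label, label)


def is_a(term, target):
    """True if term equals target or sits anywhere beneath it."""
    while term is not None:
        if term == target:
            return True
        term = IS_A.get(term)
    return False


def semantic_query(target):
    """Collect records from every database annotated with target or a subclass."""
    target = normalise(target)
    hits = []
    for db_name, records in DATABASES.items():
        for record_id, annotation in records:
            if is_a(normalise(annotation), target):
                hits.append((db_name, record_id))
    return hits


print(semantic_query("cancer"))
```

Neither database needs to know that leukaemia is a synonym of leukemia or that leukemia is a kind of cancer; both facts are resolved once, in the shared layer.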
What are the challenges?
Many chemical classification systems (e.g. MeSH) do not differentiate between structure-based and role-based classification. They therefore say that caffeine IS A ‘CNS stimulant’ in exactly the same way that they say caffeine IS A ‘trimethylxanthine’. Humans can distinguish between these two types of classification and make correct inferences, but computers asked to reason over such a classification draw invalid inferences, since the terms on the left share structural features while those on the right do not. The terms on the left are ‘timeless, condition-less’ properties of the chemical entities, while the terms on the right describe context-specific behaviour of chemical entities. We therefore separated the structure-based and role-based classifications and introduced the has-role cross-ontology relationship.
A term such as ‘antibiotic’ is ambiguous between an activity (role) and a particular chemical entity which may have that activity.
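The separation can be illustrated with a toy model in which is-a links stay inside one hierarchy and has-role links cross from the structural hierarchy to the role hierarchy. The data below is simplified from the slides and is not actual ChEBI content; the function names are hypothetical.

```python
# Two separate is-a hierarchies kept in one table for brevity (toy data).
IS_A = {
    "caffeine": "trimethylxanthine",
    "trimethylxanthine": "chemical entity",
    "CNS stimulant": "biological role",
    "biological role": "role",
}

# Cross-ontology relationship: chemical entity -> roles it can bear.
HAS_ROLE = {"caffeine": ["CNS stimulant"]}


def superclasses(term):
    """Follow is-a links upward; never crosses a has-role link."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def roles_of(term):
    """Roles asserted for the term or its superclasses, plus broader roles."""
    roles = []
    for cls in [term] + superclasses(term):
        for role in HAS_ROLE.get(cls, []):
            roles.append(role)
            roles.extend(superclasses(role))
    return roles


print(superclasses("caffeine"))  # structural classification only
print(roles_of("caffeine"))      # role classification only
```

Because ‘CNS stimulant’ never appears among the superclasses of caffeine, a reasoner cannot conclude that everything sharing a role also shares structural features.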
In common language (particularly in the realm of databases), chemical ‘structure’ and chemical ‘entity’ are referred to synonymously. For example the GDB database refers to its total size in terms of ‘organic structures’ while calling itself a database of ‘molecules’. However, it is crucial to differentiate these senses in classification, since it is possible to have a chemical entity and not know its structure, or be mistaken about its structure (e.g. vancomycin).
If you pre-compute all parts of a molecule and all properties, you can make ontology definitions for classes which use those properties, BUT your ontology becomes very, very large in asserted parts/properties. Better is if, at least for simple properties and parts, the minimal information needed to deduce the relationship can be included in the ontology itself.
Research in our group is investigating the applicability of description graphs, a new ontology extension, for adding elements of chemical structures to the ontology, so that structure-based classification can be more automated in the easier cases.
The difficulty is that this appears to be reinventing a wheel that has already been well invented by the cheminformatics community, and our challenge moving forward is to bring in the cheminformatics libraries and toolkits and integrate them with the ontology ones.
One of the challenges which we are investigating is to accurately include in the ontology model the relevant conditions under which bioactivity holds. These conditions might be concentrations of the active substance in the organism, or the organism itself. These conditions are often THRESHOLD phenomena, that is, it is not sufficient to merely indicate a fixed border at which an effect starts to take place.