The "International Chemical Ontology Network" is established by major chemical companies in order to map and understand the connections within the Big Data collections of chemistry and pharmacology. These collections come from different natural and human sciences and can only be interpreted through their overlapping structures and connections. The aim is to define innovative approaches, generate new knowledge, and develop and market products. The whole value creation process can thus be simulated and defined in the initial phase of a project, which makes it possible to counteract unfavorable developments at a very early stage. A clear structure that incorporates all relevant and common rules will improve the understanding of overlapping structures in research, development and production processes. This is expected to save considerable costs and resources and to help realize the full innovation potential of the Industry 4.0 approach.
II-SDV 2017: The "International Chemical Ontology Network"
1. INCOT.NET
René Deplanque | International Chemical Ontology Network
@The International Information Conference on Search, Data Mining and Visualization.
3. SUMMARY
• The final goal of this project is to build a system that is
available to all members of the network.
• It will be developed to make Big Data collections from various
fields of chemistry / pharmacology manageable using parent
ontologies.
• Thus the development of an innovative industry 4.0 approach
will be simplified and accelerated.
4. PAIN POINTS OF TODAY'S DATA COLLECTIONS:
Access Issues => problems with finding and/or getting access to data
Audience Issues => who is looking at data, how they perceive it, perspectives, language of discipline
Chemical Structure Representation Issues => what areas are problems: inorganic, organometallic, large molecules, mixtures, chiral centers
Community Issues => policies, procedures, and best practices we need to adopt to move things forward
Data Issues => standardization/interoperability, metadata, gaps, scale, sharing, and dark data
Ontology/Vocabulary Issues => consensus on terms, maintenance, versions, optimal vocabularies, areas where needed
Tools to Help Data/Metadata Capture Issues => adding metadata, feedback, consistency, synchronization
6. Source: Krallinger, M. et al. (2005) Text-mining approaches in molecular biology and biomedicine. DDT 10(6) 440
7. Ontology Defined
Google Definitions on the web
• An ontology is a controlled vocabulary that describes objects and the
relations between them in a formal way, and has a grammar for using the
vocabulary terms to express something meaningful within a specified domain
of interest. Source: members.optusnet.com.au/~webindexing/Webbook2Ed/glossary.htm
• Ontology is the newest label attached to some KOSs. Ontologies are being
developed as specific concept models by the Knowledge Management
community. They can represent complex relationships between objects, and
include the rules and axioms missing from semantic networks. Ontologies
that describe knowledge in a specific area are often connected with systems
for data mining and knowledge management.
Source: www.und.nodak.edu/dept/library/Departments/abc/SACSEM-SemInGlossary.htm
8. CREATING A COMPUTABLE CHEMICAL TAXONOMY
REQUIRES THREE KEY COMPONENTS:
A well-defined hierarchical taxonomic structure;
A dictionary of chemical classes (with full definitions
and category mappings); and
Computable rules or algorithms for assigning chemicals
to taxonomic categories.
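The three components listed above can be illustrated with a minimal Python sketch. Everything in it is a toy assumption made for illustration, not part of the presentation: the five-class hierarchy, the definitions, the `Compound` record with precomputed functional groups, and the rule predicates are all hypothetical. A real system would derive its rules from chemical structures rather than from hand-labelled groups.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Compound:
    """Simplified compound record (hypothetical; real systems work on structures)."""
    name: str
    elements: FrozenSet[str]
    groups: FrozenSet[str]  # functional groups, assumed to be detected upstream


# Component 1: hierarchical taxonomic structure (child class -> parent class).
TAXONOMY: Dict[str, Optional[str]] = {
    "chemical entities": None,
    "organic compounds": "chemical entities",
    "inorganic compounds": "chemical entities",
    "alcohols": "organic compounds",
    "carboxylic acids": "organic compounds",
}

# Component 2: dictionary of chemical classes with toy plain-text definitions.
DEFINITIONS: Dict[str, str] = {
    "chemical entities": "Root of the toy taxonomy.",
    "organic compounds": "Compounds containing carbon (toy criterion).",
    "inorganic compounds": "Compounds not matched by any organic rule.",
    "alcohols": "Organic compounds bearing a hydroxyl group.",
    "carboxylic acids": "Organic compounds bearing a carboxyl group.",
}

# Component 3: computable rules, ordered from most specific to most general.
RULES: List[Tuple[str, Callable[[Compound], bool]]] = [
    ("carboxylic acids", lambda c: "carboxyl" in c.groups),
    ("alcohols", lambda c: "hydroxyl" in c.groups),
    ("organic compounds", lambda c: "C" in c.elements),
    ("inorganic compounds", lambda c: True),  # fallback when nothing else matches
]


def lineage(chemical_class: str) -> List[str]:
    """Return the path from the taxonomy root down to the given class."""
    path: List[str] = []
    current: Optional[str] = chemical_class
    while current is not None:
        path.append(current)
        current = TAXONOMY[current]
    return path[::-1]


def classify(compound: Compound) -> List[str]:
    """Assign the first matching class and return its full lineage."""
    for chemical_class, rule in RULES:
        if rule(compound):
            return lineage(chemical_class)
    raise ValueError(f"no rule matched {compound.name}")


if __name__ == "__main__":
    ethanol = Compound("ethanol", frozenset({"C", "H", "O"}), frozenset({"hydroxyl"}))
    assigned = classify(ethanol)
    print(" > ".join(assigned))
    print(DEFINITIONS[assigned[-1]])
```

Ordering the rules from most specific to most general keeps the sketch short, but it assigns only one class per compound. An ontology that allows several parents per class would need to collect every matching rule instead.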
9. Semantic Web
The Semantic Web "layer cake" as presented by Tim Berners-Lee.
Source: Hendler, J. (2001) Agents and the semantic web. http://www.cs.umd.edu/users/hendler/AgentWeb.html
10. KNOWN CLASSIFICATION SYSTEMS OF
CHEMICAL SUBSTANCES
Classification as defined by EU regulations
Regulation (EC) No 1272/2008 on classification, labelling and packaging of
substances and mixtures (the 'CLP Regulation').
Classification as defined by UBA (Germany’s environmental protection
agency)
These criteria and limiting values help to determine hazardous physical-chemical
properties as well as health and environmental hazards
The Anatomical Therapeutic Chemical Classification System (ATC/DDD of
the World Health Organization, WHO)
The purpose of the ATC/DDD system is to serve as a tool for drug utilization research
in order to improve quality of drug use.
11. GUIDANCE ON THE CLASSIFICATION OF HAZARDOUS CHEMICALS UNDER
THE WHS REGULATIONS
This Guidance is intended for manufacturers and importers of
substances, mixtures and articles who have a duty under the Work
Health and Safety (WHS) Act and Regulations to classify them
under the Globally Harmonised System of Classification and Labelling of
Chemicals (the GHS).
The WHS Regulations also implement the harmonised hazard
communication elements of the GHS that are to appear on labels and
safety data sheets (SDS).
The Chemical Fragmentation Coding system
It was developed in 1963 by the Derwent World Patent Index (DWPI) to
facilitate the manual classification of chemical compounds reported in
patents.
The system consists of 2200 numerical codes corresponding to a set of
pre-defined, chemically significant structure fragments
12. Tools for developing Chemical Ontologies
HOSE (Hierarchical Organisation of Spherical Environments) code.
This hierarchical substructure system allows one to automatically
characterize atoms and complete rings in terms of their spherical
environment.
Chemical Ontology (CO) system.
It was one of the first open-source, automated functional group ontologies
to be formalized.
CO functional groups can be automatically assigned to a given structure
by Checkmol, a freely available program. CO's assignment of functional
groups is accurate and consistent, and it has been applied to several
small datasets. However, the CO system is limited to just ~200 chemical
groups.
SODIAC tool for automatic compound classification.
It uses a comprehensive chemical ontology and an elegant structure-
based reasoning logic.
The underlying chemical ontology can be freely downloaded and the
SODIAC software, which is closed-source, is free for academics
13. WHAT ARE THE MAJOR PROBLEMS
➢ In contrast to biology, geology, and many other scientific
disciplines, the world of chemistry still lacks a standardized
chemical ontology or taxonomy
➢ The chemical classification of a compound could help predict its
metabolic fate in humans, its druggability or the potential hazards
associated with it.
➢ The sheer number (tens of millions of compounds) and complexity
of chemical structures is such that any manual classification effort
would prove to be near impossible
14. FRAGMENT OF CHEMICAL ONTOLOGY
[Figure: ontology fragment (grouped_by_chemistry) with IsA relations, shown next to the structural formulas of morphinan and morphine]
Path 1: two-ring heterocyclic compounds > isoquinolines > isoquinoline alkaloids > morphinans > morphine
Path 2: molecules > organic molecules > heterocyclic compounds > bridged-ring heterocyclic compounds > morphinans > morphine
Source: Ennis, M. (2004) ChEBI A Dictionary of Chemical Entities with an Associated Ontology.
SOFG-2, Philadelphia, October 23-26 2004
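The morphine fragment above can be read as a small graph of IsA relations in which a term may have more than one parent. The following Python sketch shows one way to store and query such a fragment. The exact edges are an assumption reconstructed from the term lists on the slide (in particular that "morphinans" sits under both "isoquinoline alkaloids" and "bridged-ring heterocyclic compounds"), and the function names are hypothetical.

```python
from typing import Dict, List, Set

# IsA edges (term -> direct parents), reconstructed from the slide as an assumption.
IS_A: Dict[str, List[str]] = {
    "morphine": ["morphinans"],
    "morphinans": ["isoquinoline alkaloids", "bridged-ring heterocyclic compounds"],
    "isoquinoline alkaloids": ["isoquinolines"],
    "isoquinolines": ["two-ring heterocyclic compounds"],
    "bridged-ring heterocyclic compounds": ["heterocyclic compounds"],
    "heterocyclic compounds": ["organic molecules"],
    "organic molecules": ["molecules"],
}


def ancestors(term: str) -> Set[str]:
    """Collect every class reachable from `term` by following IsA edges."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in IS_A.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_a(term: str, candidate: str) -> bool:
    """True if `candidate` is a direct or inherited parent of `term`."""
    return candidate in ancestors(term)


if __name__ == "__main__":
    for parent in sorted(ancestors("morphine")):
        print("morphine IsA", parent)
```

Because IsA is transitive, a query such as `is_a("morphine", "organic molecules")` succeeds even though no direct edge links the two terms. This inherited membership is what lets a parent ontology answer questions across domain-specific classifications.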
21. WHAT DO WE HAVE - WHAT DO WE NEED
➢Chemists have a standardized nomenclature (IUPAC, CAS,
Reaxys)
➢Chemists have standardized methods for drawing or
exchanging chemical structures
➢Chemistry still lacks a standardized, comprehensive, and
clearly defined chemical taxonomy or chemical ontology
22. WHAT WAS DONE
➢ Chemists have developed domain-specific ontologies
➢ Medicinal chemists classify according to pharmaceutical
activities (antibacterial, antihypertensive)
➢ Biochemists classify according to biosynthetic origin
(nucleic acids, terpenoids)
➢ These classifications do not fit together
➢ In the PubChem database only 0.12% of the >91,000,000 compounds (as
of June 2016) are classified via the MeSH thesaurus
23. WHO AND WHAT IS INCOT.NET
The problem of defining overlapping ontologies is of such
a magnitude that it cannot be solved by a single
organisation.
INCOT.NET is an organisation based on an idea, need and
interest of major chemical companies.
It is organized as an independent partnership.
It is attempting to coordinate a large variety of
organisations to solve major pre-production problems.
One of the prototype problems will be the use of
ontologies in the development of new methodologies for
the development of new antibiotics.
24. Thank you for your patience
you will need it for your future