Muthukumarasamy Karthikeyan (CSIR-National Chemical Laboratory, India)
Surojit Sadhu (Advent Informatics, India)
Virtual screening (VS) and chemical data extracted from evidence-based sources form the backbone of the computational drug discovery workflow, an indispensable component of all drug design programs. VS involves a host of modelling techniques, from simple similarity search methods to advanced algorithms that find the bioactive conformation in which a molecule binds to its target. Chemoinformatics supports virtual screening at multiple levels during lead optimization: by integrating data from multiple sources with the knowledge derived from them, it suggests suitable filters for screening and provides the decision support that drug discovery and development require. It is therefore pertinent to develop chemoinformatics tools, data and emerging methods, and to understand fully their role and applications in virtual screening. We have recently developed chemical informatics tools that assist drug discovery through chemical data extraction from the literature, virtual library design, and analysis and screening methods, applied to selected case studies.
II-PIC 2017: Drug Discovery of Novel Molecules using Chemical Data Mining tool
1. Muthukumarasamy Karthikeyan, Ph.D.
CSIR-National Chemical Laboratory.
Chem. Engg. and Proc. Devp. Div. & Digital Inf. Res. Centre, Pune – 411 008, INDIA
November 2-3, 2017 – The International Indian Patent Information Conference, Bangalore, India
Drug Discovery of Novel Molecules using Chemical Data Mining tool
2. Outline
• Chemoinformatics methods to harvest chemical data from the text and images of published literature (journals, patents and other scientific reports)
• 50+ years of data, tools and methods
• 30+ million publications (indexed by PubMed, CAS, etc.)
• 10 million patents (a source for chemical data mining in drug discovery)
• Drug discovery and pharmaceuticals (NCEs, side effects and risk, toxicity assessment)
3. Chemoinformatics for Drug Design
• Identification of new lead structures
• Optimization of lead structures
• Establishment of quantitative structure-activity relationships
• Comparison of chemical libraries
• Definition and analysis of structural diversity
• Planning of chemical libraries
• Analysis of high-throughput data
• Docking of a ligand into a receptor
• de novo design of ligands
• Modeling of ADME-Tox properties
• Prediction of the metabolism of xenobiotics
• Analysis of biochemical pathways
4. Powerful Molecular Mining Tools
• Chemistry has already produced an enormous amount of data, and the volume is increasing rapidly.
• More than 70 million chemical compounds are known, and this number increases by several million each year.
5. Chemical Structure Representation & Challenges
• Machine readable chemical structure representations
• Building connection tables that represent molecules
• Building databases of chemical structures and reactions.
• Challenges: search (exact, similarity, substructure, superstructure)
• Markush representation of molecules for patent applications
• Mapping the chemical-biological space
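The connection-table idea and the similarity and substructure search challenges can be illustrated with a minimal, library-free sketch. It assumes a hypothetical dictionary layout (`atoms` plus `bonds` as `(i, j, order)` tuples with implicit hydrogens), encodes each molecule as the set of linear atom-bond paths of up to two bonds, and compares molecules with the Tanimoto coefficient. Toolkits such as RDKit or CDK use far richer fingerprints and also handle aromaticity, charges and stereochemistry, which this sketch ignores.

```python
# Minimal connection tables: atom symbols plus bonds as (atom_i, atom_j, bond_order).
# Hydrogens are implicit; aromaticity, charges and stereochemistry are ignored.
LIBRARY = {
    "methanol": {"atoms": ["C", "O"], "bonds": [(0, 1, 1)]},
    "ethanol": {"atoms": ["C", "C", "O"], "bonds": [(0, 1, 1), (1, 2, 1)]},
    "1-propanol": {"atoms": ["C", "C", "C", "O"], "bonds": [(0, 1, 1), (1, 2, 1), (2, 3, 1)]},
    "acetic acid": {"atoms": ["C", "C", "O", "O"], "bonds": [(0, 1, 1), (1, 2, 2), (1, 3, 1)]},
}

BOND_SYMBOL = {1: "-", 2: "=", 3: "#"}


def adjacency(ct):
    """Build a neighbour list from the connection table's bond list."""
    adj = {i: [] for i in range(len(ct["atoms"]))}
    for i, j, order in ct["bonds"]:
        adj[i].append((j, order))
        adj[j].append((i, order))
    return adj


def path_features(ct, max_bonds=2):
    """Collect every linear path of up to max_bonds bonds as a canonical string."""
    atoms, adj = ct["atoms"], adjacency(ct)
    features = set()

    def walk(path, labels):
        # A path and its reverse describe the same fragment; keep the smaller string.
        features.add(min("".join(labels), "".join(reversed(labels))))
        if len(path) - 1 < max_bonds:
            for nxt, order in adj[path[-1]]:
                if nxt not in path:  # simple paths only: no atom visited twice
                    walk(path + [nxt], labels + [BOND_SYMBOL[order], atoms[nxt]])

    for start in range(len(atoms)):
        walk([start], [atoms[start]])
    return features


def tanimoto(a, b):
    """Tanimoto coefficient |A & B| / |A | B| of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_search(query, library, threshold=0.0):
    """Rank library entries by Tanimoto similarity to the query structure."""
    qfp = path_features(query)
    hits = [(name, tanimoto(qfp, path_features(ct))) for name, ct in library.items()]
    return sorted([h for h in hits if h[1] >= threshold], key=lambda h: h[1], reverse=True)


def passes_substructure_screen(query, target):
    """Necessary (not sufficient) test: every query path must occur in the target."""
    return path_features(query) <= path_features(target)


if __name__ == "__main__":
    for name, score in similarity_search(LIBRARY["ethanol"], LIBRARY):
        print(f"{name:12s} {score:.3f}")
```

With ethanol as the query, the ranking is ethanol (1.000), 1-propanol (0.833), acetic acid (0.625) and methanol (0.600). The subset test only screens out impossible substructure matches quickly; a confirmed match still needs a full subgraph-isomorphism check.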
6. Chemical Mining of Relevant Data for Drug Discovery
SPC: Structure-Property Correlation
• Intrinsic properties: molar volume, connectivity indices, charge distribution, molecular weight, polar surface area
• Chemical properties: pKa, log P, solubility, stability
• Biological properties: activity, toxicity, biotransformation, pharmacokinetics
• Others: synthesis and processes, spectral / characterization data
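Of the intrinsic properties listed, molecular weight is the simplest to derive from structure. A minimal sketch, assuming the input is a flat molecular formula (no parentheses, charges or isotopes) and using rounded standard atomic weights for an assumed handful of elements:

```python
import re

# Rounded IUPAC standard atomic weights for a few common elements (assumed subset).
ATOMIC_WEIGHT = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
    "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904,
}

ELEMENT_COUNT = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Turn a flat formula such as 'C9H8O4' into {'C': 9, 'H': 8, 'O': 4}."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = {}
    for symbol, digits in ELEMENT_COUNT.findall(formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


def molecular_weight(formula):
    """Sum atomic weights over the element counts (raises KeyError for unlisted elements)."""
    return sum(ATOMIC_WEIGHT[symbol] * n for symbol, n in parse_formula(formula).items())


if __name__ == "__main__":
    for name, formula in [("ethanol", "C2H6O"), ("aspirin", "C9H8O4"), ("caffeine", "C8H10N4O2")]:
        print(f"{name:10s} {formula:10s} {molecular_weight(formula):.3f}")
```

Properties such as polar surface area or connectivity indices need the full connection table rather than just the formula, which is why descriptor software works from the structure itself.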
7. Molecular Descriptors (2D & 3D)
• In most cases, physical, chemical, or biological properties cannot be calculated directly from the structure of a compound.
• Instead, the structure of the compound is represented by structure descriptors, or 'features', and a relationship (models, equations, ...) is then established between the structure descriptors and the property.
• A variety of structure descriptors have been developed, encoding 1D, 2D, or 3D structural information (Practical Chemoinformatics).
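The step of relating descriptors to a property can be sketched as an ordinary least-squares fit of a single descriptor against a single property. The data below are invented purely for illustration; real QSAR/QSPR models usually combine many descriptors and require feature selection and external validation.

```python
from statistics import mean


def fit_linear_model(x, y):
    """Ordinary least squares for property = intercept + slope * descriptor."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y, intercept, slope):
    """Fraction of the property's variance explained by the fitted line."""
    my = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Invented training data: one descriptor value and one measured property per compound.
DESCRIPTOR = [1.0, 2.0, 3.0, 4.0]
PROPERTY = [2.0, 3.5, 5.0, 6.5]

if __name__ == "__main__":
    b0, b1 = fit_linear_model(DESCRIPTOR, PROPERTY)
    print(f"property = {b0:.2f} + {b1:.2f} * descriptor, R^2 = {r_squared(DESCRIPTOR, PROPERTY, b0, b1):.3f}")
    print(f"prediction for descriptor = 5.0: {b0 + b1 * 5.0:.2f}")
```

Once fitted, the same equation predicts the property for new compounds from their descriptor values alone, which is the point of the descriptor-based approach.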
11. Text Mining and Data Mining Using Chemoinformatics
• Text mining has the potential to transform the way we look at data.
• It converts unstructured data into structured form.
• It helps detect patterns and hidden connections in chemistry-specific data.
• Automation can address the data explosion problem by collecting and consolidating the data.
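As a small illustration of converting unstructured text into structured records, the sketch below pulls CAS Registry Number-like strings out of free text and checks each against the CAS check-digit rule. It is an illustrative example only, not the authors' extraction tool; the sample sentence is made up, and recognizing chemical names or structure images requires far more sophisticated methods.

```python
import re

# CAS Registry Number layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")


def cas_checksum_ok(first, second, check):
    """CAS rule: weight digits 1, 2, 3, ... from the right (excluding the check digit)."""
    digits = (first + second)[::-1]
    total = sum(weight * int(d) for weight, d in enumerate(digits, start=1))
    return total % 10 == int(check)


def extract_cas_numbers(text):
    """Return one structured record per CAS-like string found in free text."""
    records = []
    for match in CAS_PATTERN.finditer(text):
        first, second, check = match.groups()
        records.append({
            "cas": match.group(0),
            "offset": match.start(),
            "valid": cas_checksum_ok(first, second, check),
        })
    return records


SAMPLE = "Aspirin (CAS 50-78-2) was compared with ethanol (64-17-5); 12-34-5 fails the check."

if __name__ == "__main__":
    for record in extract_cas_numbers(SAMPLE):
        print(record)
```

The checksum step matters in practice: strings with the right shape but a wrong check digit (here `12-34-5`) are flagged instead of being stored as chemical identifiers.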
12. Text Mining Workflow
• Tokenization: sentence boundary and token detection; removal of non-scientific terms (stopwords, punctuation)
• Stemming: finding the root of a word, which reduces the dimension of the word space
• POS tagging: the best part of speech is determined, which is often context dependent
• Named entity recognition: recognition of entities, e.g. proteins, genes, drugs; methods are statistical or dictionary based
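The workflow above can be sketched end to end in plain Python. Everything here is a simplifying assumption: sentence splitting and tokenization use regular expressions, the stopword list and entity dictionary are tiny hypothetical examples, the stemmer is a toy suffix stripper rather than the Porter algorithm, and POS tagging is omitted because it needs a trained tagger.

```python
import re

# Hypothetical stopword list and entity dictionary, far smaller than real resources.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "is", "was", "were", "to", "by", "with", "for"}
ENTITY_DICTIONARY = {
    "aspirin": "DRUG",
    "ibuprofen": "DRUG",
    "cyclooxygenase": "PROTEIN",
    "ptgs2": "GENE",
}


def split_sentences(text):
    """Naive sentence-boundary detection: break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Token detection: alphanumeric runs (internal hyphens kept); punctuation dropped."""
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", sentence)


def remove_stopwords(tokens):
    return [t for t in tokens if t.lower() not in STOPWORDS]


def crude_stem(token):
    """Toy suffix stripping (not the Porter stemmer): maps inflected forms to one root."""
    word = token.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tag_entities(tokens):
    """Dictionary-based named entity recognition on lower-cased tokens."""
    return [(t, ENTITY_DICTIONARY[t.lower()]) for t in tokens if t.lower() in ENTITY_DICTIONARY]


def process(text):
    results = []
    for sentence in split_sentences(text):
        tokens = remove_stopwords(tokenize(sentence))
        results.append({
            "sentence": sentence,
            "stems": [crude_stem(t) for t in tokens],
            "entities": tag_entities(tokens),
        })
    return results


SAMPLE = "Aspirin inhibits cyclooxygenase. Ibuprofen inhibited PTGS2 in the assay!"

if __name__ == "__main__":
    for result in process(SAMPLE):
        print(result)
```

Stemming collapses "inhibits" and "inhibited" to the same root, which is the word-space reduction the slide describes; a statistical NER model would replace the dictionary lookup when entity names are too varied to enumerate.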
13. Case Study on Chemical Data Mining of Literature
• Corpus building: 66,000 articles
• Specific classes of terms: common drugs, proteins, genes and diseases
• Resources: BindingDB, PharmGKB, CTD
• Co-occurrence-based connections
• Document fingerprints using the corpus (binary bit strings, e.g. 10100010100010101010)
• Document clustering using ED and TC
Chemical Text Mining
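The co-occurrence, document-fingerprint and clustering steps of this case study can be sketched in plain Python. Everything here is an assumption for illustration: the four "documents" are hypothetical sets of already-recognized terms, ED and TC are taken to mean Euclidean distance and Tanimoto coefficient, and the single-linkage threshold grouping stands in for whatever clustering method the original study used.

```python
from itertools import combinations
from math import sqrt

# Hypothetical documents, each already reduced to the set of recognized terms.
DOCUMENTS = {
    "doc1": {"aspirin", "PTGS2", "inflammation"},
    "doc2": {"aspirin", "PTGS2", "pain"},
    "doc3": {"metformin", "AMPK", "diabetes"},
    "doc4": {"metformin", "AMPK", "cancer"},
}
VOCABULARY = sorted(set().union(*DOCUMENTS.values()))


def document_fingerprint(terms, vocabulary=VOCABULARY):
    """One bit per vocabulary term: 1 if the document mentions it, else 0."""
    return [1 if term in terms else 0 for term in vocabulary]


FINGERPRINTS = {name: document_fingerprint(terms) for name, terms in DOCUMENTS.items()}


def tanimoto(fp1, fp2):
    both = sum(a & b for a, b in zip(fp1, fp2))
    either = sum(a | b for a, b in zip(fp1, fp2))
    return both / either if either else 1.0


def euclidean(fp1, fp2):
    return sqrt(sum((a - b) ** 2 for a, b in zip(fp1, fp2)))


def cooccurrence_counts(documents):
    """Number of documents in which each unordered pair of terms appears together."""
    counts = {}
    for terms in documents.values():
        for pair in combinations(sorted(terms), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


def cluster_by_tanimoto(fingerprints, threshold):
    """Single-linkage grouping: any pair with TC >= threshold ends up in the same cluster."""
    parent = {name: name for name in fingerprints}

    def find(name):
        while parent[name] != name:
            name = parent[name]
        return name

    for a, b in combinations(fingerprints, 2):
        if tanimoto(fingerprints[a], fingerprints[b]) >= threshold:
            parent[find(a)] = find(b)
    clusters = {}
    for name in fingerprints:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


if __name__ == "__main__":
    print("TC(doc1, doc2) =", tanimoto(FINGERPRINTS["doc1"], FINGERPRINTS["doc2"]))
    print("ED(doc1, doc3) =", round(euclidean(FINGERPRINTS["doc1"], FINGERPRINTS["doc3"]), 3))
    print("clusters:", cluster_by_tanimoto(FINGERPRINTS, threshold=0.4))
    print("top pairs:", sorted(cooccurrence_counts(DOCUMENTS).items(), key=lambda kv: -kv[1])[:2])
```

With a threshold of 0.4, the two aspirin/PTGS2 documents form one cluster and the two metformin/AMPK documents another, and those same term pairs are the most frequent co-occurrences, which is the kind of connection the case study mines at corpus scale.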
35. Open source Chemoinformatics Tools
• OpenBabel
• CDK
• RDKit
• C++
• Java
• Python
• Perl
• Ruby
ISBN 978-81-322-1780-0
36. Practical Cheminformatics (Do It Yourself)
• Chemoinformatics for Drug Discovery
• Open Source Tools, Techniques and Data in Chemoinformatics
• Chemoinformatics Approach for the Design and Screening of focused virtual libraries
• Machine Learning Methods in Chemoinformatics for Drug Discovery
• Docking and pharmacophore modeling for virtual screening
• Active site directed pose prediction programs for efficient filtering of molecules
• Representation, fingerprinting and modeling of chemical reactions
• Predictive methods for Organic Spectral data Simulation
• Chemical Text mining for Lead Discovery
• Integration of Automated Workflow in Chemoinformatics for drug discovery
• Cloud computing Infrastructure for Chemoinformatics
39. Questions?
karthincl@gmail.com (send your queries)
+91 020-2590 2483 / +91 9767427981 (whatsapp)
skype: karthincl
http://moltable.ncl.res.in/
(Please visit the link for updates on recent publications and patents.)