The document discusses open source drug discovery (OSDD) for neglected tropical diseases like tuberculosis. Key points:
- OSDD takes a collaborative, open innovation approach to drug discovery by involving research groups, industry, and individual participants in open data sharing.
- OSDD's first disease target is tuberculosis, which newly infects at least one person every second and kills over 1,000 people per day.
- OSDD has built computational resources and databases with community participation to facilitate drug discovery. They have also integrated over 300 tools into their ChemBio toolkit.
- OSDD utilizes grid computing resources like the Garuda Grid in India to enable complex computational analysis for experimental biologists and chemists working on the
Indo us 2012
1. Open Source Drug Discovery
CSIR-led Team India Consortium with Global Partnership
Affordable Healthcare for All
Cheminformatics and Open Source Drug Discovery: a
case study in academic collaboration between the
U.S. and India
Abhik Seal
(PhD Student, Indiana University)
(Researcher OSDD CSIR)
Anshu Bhardwaj
Scientist, OSDD Unit
Council of Scientific & Industrial Research
Delhi, India
http://www.osdd.net 23rd March 2012, Washington DC
2. OSDD Focus :
Tropical Neglected Diseases
First Disease Target : Tuberculosis
Tuberculosis (TB) is one of the leading causes of death, ranking second only to HIV as
the deadliest infectious disease of adults worldwide.
New TB cases 2010
At least one person in
the world is newly
infected with TB bacilli
every second
Over 1000 deaths a day or
3 deaths every 2 mins
Source:
http://www.globalhealthfacts.org/data/topic/map.aspx?ind=12
3. Countries that had reported at least
one XDR-TB case by end March 2011
Argentina, Armenia, Australia, Austria, Azerbaijan, Bangladesh, Belgium, Bhutan, Botswana, Brazil,
Burkina Faso, Cambodia, Canada, Chile, China, Colombia, Czech Republic, Ecuador, Egypt, Estonia,
France, Georgia, Germany, Greece, India, Indonesia, Iran (Islamic Rep. of), Ireland, Israel, Italy,
Japan, Kazakhstan, Kenya, Kyrgyzstan, Latvia, Lesotho, Lithuania, Mexico, Mozambique, Myanmar,
Namibia, Nepal, Netherlands, Norway, Pakistan, Peru, Philippines, Poland, Portugal, Qatar,
Republic of Korea, Republic of Moldova, Romania, Russian Federation, Slovenia, South Africa, Spain,
Swaziland, Sweden, Tajikistan, Thailand, Togo, Tunisia, Ukraine, United Arab Emirates,
United Kingdom, United States of America, Uzbekistan, Viet Nam
5. World TB Day is 24th March 2012
It commemorates the discovery of the TB
bacillus (Mycobacterium tuberculosis)
through sputum microscopy, which is
still the diagnostic method used to detect TB!
No progress whatsoever, and we are
discussing 'network communications'
6. Challenges with Drug Discovery
of Neglected Diseases
• Lack of market incentives
• TB is a complex disease – latency, relapse, resistance
• Clinical trials take a long time, and the study of relapse
needs long follow-up (up to 18 months)
• Patient access is not direct, is through government
agencies
7. Conventional vs Open Innovation Approach to Drug Discovery
[Diagram: conventional approach. Corporate R&D HQ with R&D units (Diabetes, Cancer, Neurological Disorder, …) and in-house functions: Sales, Production, Packaging, Formulation, Pre-Clinical Trial, Clinical Trial, …]
8. Conventional vs Open Innovation Approach to Drug Discovery
Research groups
Industry collaboration
Individual participation
Open Data Sharing
9. OSDD Process Flow
Clinical
trials
Public Funding of
Clinical Trials
Government of India commitment - $46 million
10. Status: OSDD Projects
[Chart: number of OSDD projects at each pipeline stage, September 2008 to March 2012]
Stages: Drug Target Identification, Virtual Screening, Chemical Synthesis/Library, Screening/Hit Identification, Hit to Lead, Clinical Trials Candidate
Project counts shown on the chart: 45, 19, 9, 6, 2, 1
Other projects aim to develop tools, databases and repositories for the OSDD community
11. OSDD Platform
System Architecture
“Collaborative tools to accelerate neglected diseases research” in the book “Collaborative
Computational Technologies for Biomedical Research”. Wiley and Sons. 2011
12. Post-genomics data on Mtb is ‘Linked’
from disparate resources
More than a Million Data Points are now “Linked”
[Diagram: data types linked around Mtb: Pathway/Networks, Gene/operon predictions, Gene Expression Data, Drug targets, Regulatory Elements, Orthologs, Variation and repeats]
* This is representative set of post-genomics data available on TB
Collaborator:
Deeksha Bhartiya Nitin Kumar
Dr. Vinod Scaria
13. Comparison of Browsers
S. No  Source                                                                        Tracks
1      UCSC Genome Browser on Mycobacterium tuberculosis H37Rv 06/20/1998 Assembly   6
2      WebTb                                                                         Operon Map
3      Argo Genome Browser                                                           not web based
4      PGBrowser: Pathogen Genome Browser                                            3
5      BioHealthBase                                                                 16
6      Ensembl                                                                       ~15
7      Tbrowse                                                                       100
14. OpenLabNoteBook on SysBorgTB
Deeksha Bhartiya, Nitin Kumar
http://sysborgtb.osdd.net/bin/view/OpenLabNotebook/TBMapDataset
15. Biology is complex !!
From a mathematical point
of view, to create an
accurate model of a single
mammalian cell may require
generating and then solving
somewhere between
100,000 to one million
equations
The human brain can only process seven pieces of data at a time!!!
Need automation & new technology to address the complexity
http://news.vanderbilt.edu/2011/10/robot-biologist/
16. The “Connect to Decode” Programme
OSDD C2D Collaborative Community Curation
[Diagram elements: Literature, Annotation Tools, Genomic Databases, Raw Annotations, 800+ Student Researchers, Curated Annotations]
Pathway/Interactome | Gene Ontology | Protein Structure/Fold | Glycomics | Immunome
17. Working on the cloud..
Online discussion
Right (mark in green), Wrong (mark in red)
Community Curation!!
Many eyeballs make the ‘bug’ shallow!!!
19. Mtb Metabolome Map on Payao
Sub-map of the metabolic network
on Payao
SBI developed
customized plug ins for
OSDD for generating
the metabolic map
23. Within weeks, 830 volunteered to re-annotate the entire M.
tuberculosis genome. The work started in December 2009 and
was completed by April 2010, packing nearly 300 man-years into
4 months!
Source: Munos B. Can Open-Source Drug R&D
Repower Pharmaceutical Innovation?
Clin Pharmacol Ther 2010;87:534–536
Social engineering for
virtual 'big science' in
systems biology
Source: Hiroaki Kitano
Nature Chemical Biology 7, 323–326 (2011)
25. A large student community from colleges and universities is
cloning, expressing and purifying selected Mtb genes
To clone and express select genes
of Mycobacterium tuberculosis
Open Access Repository of Mtb
clones
More than 120 sequence
confirmed clones are
ready for distribution
http://sysborg2.osdd.net/group/sysborgtb/project-details/-/projects/show/3212
26. OSDDChem: Open Chemistry Initiative
A Large number of
molecules are being
submitted for screening
27. Computational Resources developed
with Community participation
http://tbrowse.osdd.net (Bhardwaj et al. Tuberculosis (Edinb). 2009 Sep;89(5):386-7)
http://sysborg2.osdd.net (Bhardwaj et al. 2011 John Wiley & Sons, Inc.)
Chembio Toolkit: Workflow engine with federated resources
TrapTB: Mtb drug targets database
AmPhyDB: Antimycobacterial Phytomolecule Database
Mtb essential genes database
A comprehensive database of Mtb transporters
Mtb-Human Interaction Database
28. Enabling Complex Computational Analysis
For Experimental Biologists/Chemists
Q. Find novel genes and mutations & map known drug resistance mutations
on genome of an MDR-TB strain
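As a rough illustration of the second half of this question, the sketch below separates a list of variant calls into known resistance markers and uncharacterised changes. It is not the OSDD workflow: the `Variant` type, the `classify_variants` helper and the input calls (including the gene name `geneX`) are hypothetical, and the three-entry lookup table is only a tiny sample of well-documented markers, where a real analysis would use a curated catalogue.

```python
from dataclasses import dataclass

# A few well-documented resistance markers (H37Rv numbering).
# Illustrative only; a real analysis would load a curated catalogue.
KNOWN_RESISTANCE = {
    ("rpoB", "S450L"): "rifampicin",
    ("katG", "S315T"): "isoniazid",
    ("embB", "M306V"): "ethambutol",
}


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one amino-acid change called in a strain."""
    gene: str
    change: str  # e.g. "S315T"


def classify_variants(variants):
    """Split variants into known resistance markers and uncharacterised ones."""
    known, novel = [], []
    for variant in variants:
        drug = KNOWN_RESISTANCE.get((variant.gene, variant.change))
        if drug is not None:
            known.append((variant, drug))
        else:
            novel.append(variant)
    return known, novel


# Made-up variant calls for an MDR-TB isolate (not real data).
calls = [Variant("katG", "S315T"), Variant("rpoB", "S450L"), Variant("geneX", "A10V")]
known, novel = classify_variants(calls)
for variant, drug in known:
    print(f"{variant.gene} {variant.change}: associated with {drug} resistance")
print("uncharacterised:", [(v.gene, v.change) for v in novel])
```

Anything left in `novel` is only a candidate; deciding whether it is a genuinely new mutation needs the upstream alignment and annotation steps that a workflow engine would chain together.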
29. Galaxy provides -
Simplified GUI design
Ease of integrating modules
Fewer components for creating workflows
Sharable workflows for better collaboration
30. Custom APIs for importing input files
from OSDD’s open lab note books
Get data customized for extracting
files from open lab note book
31. Custom APIs for exporting results to
OSDD’s Open lab note book
Workflows and the result of the workflows are stored as separate lab note books
Lab note book has details of the experiments performed
Results of one experiment may be invoked for analysis in another experiment
All versions of the workflow and the results are stored
Flexibility to execute nested workflows
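The storage behaviour listed above (every version kept, results reusable across experiments, nested workflows) can be sketched in a few lines. This is an in-memory stand-in under stated assumptions: `LabNotebook` and `run_workflow` are hypothetical names, not the OSDD or Galaxy API, and a real notebook would persist to the SysBorg portal.

```python
class LabNotebook:
    """In-memory stand-in for one open lab notebook with full version history."""

    def __init__(self, name):
        self.name = name
        self._versions = []  # every saved version is kept, oldest first

    def save(self, content):
        """Store a new version and return its 1-based version number."""
        self._versions.append(content)
        return len(self._versions)

    def load(self, version=None):
        """Return the requested version, or the latest one when version is None."""
        if not self._versions:
            raise LookupError(f"notebook {self.name!r} has no versions")
        if version is None:
            return self._versions[-1]
        if not 1 <= version <= len(self._versions):
            raise LookupError(f"notebook {self.name!r} has no version {version}")
        return self._versions[version - 1]


def run_workflow(steps, data):
    """Run steps in order; a step that is itself a list is a nested workflow."""
    for step in steps:
        data = run_workflow(step, data) if isinstance(step, list) else step(data)
    return data


# Experiment 1: a nested clean-up workflow followed by one more step.
cleanup = [str.strip, str.upper]
exp1 = LabNotebook("experiment-1-results")
v1 = exp1.save(run_workflow([cleanup, lambda seq: seq[::-1]], "  atgc "))

# Experiment 2 invokes the stored result of experiment 1 as its input.
exp2 = LabNotebook("experiment-2-results")
exp2.save(run_workflow([len], exp1.load(v1)))
```

Keeping versions append-only is the simplest way to guarantee that an earlier result referenced by another experiment never changes underneath it.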
32. Our Approach :
Data & Tool integration
In addition to accessing heterogeneous data sources like BioMart
Central/UCSC Table Browser (http://genome.ucsc.edu/), the open lab
notebook of http://sysborg2.osdd.net is interfaced with Galaxy
Standalone databases and tools
Tools as web services:
• Web services can be added as tools in Galaxy
• Extends the potential of galaxy workflows
The process:
Identify the module -> Search for the WSDL -> Code for client -> Write XML for Galaxy -> Configure & Integrate to Galaxy
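The "Write XML for Galaxy" step can be illustrated by generating a minimal Galaxy tool wrapper around a web service client script. This is a sketch, not the wrapper OSDD used: the tool id `ws_getentry`, the script name `getentry_client.py` and the `accession` parameter are hypothetical, and only the basic tool elements (`command`, `inputs`, `outputs`, `help`) are shown.

```python
import xml.etree.ElementTree as ET


def build_tool_xml(tool_id, name, client_script, param_name, param_label):
    """Return a minimal Galaxy tool definition that calls a client script."""
    tool = ET.Element("tool", id=tool_id, name=name, version="0.1.0")
    ET.SubElement(tool, "description").text = "web service client (illustrative)"

    # Galaxy substitutes $<param> and $output into the command line at run time.
    command = ET.SubElement(tool, "command")
    command.text = f"python {client_script} '${param_name}' '$output'"

    inputs = ET.SubElement(tool, "inputs")
    ET.SubElement(inputs, "param", name=param_name, type="text", label=param_label)

    outputs = ET.SubElement(tool, "outputs")
    ET.SubElement(outputs, "data", name="output", format="txt")

    ET.SubElement(tool, "help").text = "Calls a remote web service and saves the reply."
    return ET.tostring(tool, encoding="unicode")


# Hypothetical wrapper for a sequence-retrieval service client.
xml_text = build_tool_xml(
    tool_id="ws_getentry",
    name="GetEntry (web service)",
    client_script="getentry_client.py",
    param_name="accession",
    param_label="Accession ID",
)
print(xml_text)
```

The generated file would still have to be registered in the Galaxy instance's tool configuration, which corresponds to the final "Configure & Integrate" step.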
33. ChemBio toolkit :
>300 Modules integrated by OSDD Community
S. No Resources Clients
1 KEGG: Kyoto Encyclopedia of Genes and Genomes 60
2 GetEntry: DDBJ sequence search by accessionID 43
3 GPSR : tools 33
4 PDB : Protein Data Bank 30
5 BioModel:mathematical models of biological DB 25
6 Gtps : Gene Trek in Prokaryote Space 8
7 WSDbfetch: retrieve entries from biological dbs using entry identifiers or accession no. 7
8 Gibv: Genome Information Broker for Viruses 7
9 DDBJ :DNA Data bank of Japan 7
10 Mafft: a multiple sequence alignment program 4
11 Fasta:- DDBJ database 4
12 Ensembl : maintains automatic annotation 4
13 VecScreen vector contamination 4
14 OMIM:Online Mendelian Inheritance in man 4
15 Gtop: Gene-product Informatics 3
16 GO: Gene Ontology 3
17 SPS : Splicing Profile based Score 2
18 GIBIS: Genome Information Broker for Insertion Sequence 1
19 RefSeq: database of sequence 1
20 GIB: Genome Information Broker 1
21 GIBEnv- DDBJ database 1
22 TxSearch: Database indexing & searching 1
35. Data amplification: Cheminformatics
[Diagram: PubChem Bioassay data (approx. 100,000 molecules/dataset) -> 6000 descriptors/molecule -> Successful Models -> Screen PubChem (30 million) -> Potential Hits]
o Downsizing and random validation require multiple calculations to validate the results
o Cross-validation is repeated up to 50+ times for each experiment
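To show why this validation scheme multiplies the compute load, here is a small standard-library sketch of repeated random validation on a downsized dataset. Several things are assumptions rather than the authors' method: "downsizing" is read as under-sampling the majority (inactive) class, a nearest-centroid classifier stands in for the real models, and the toy data uses 5 random descriptors per molecule instead of 6000.

```python
import random


def downsize(samples, rng):
    """Under-sample the larger class so actives and inactives are balanced."""
    actives = [s for s in samples if s[1] == 1]
    inactives = [s for s in samples if s[1] == 0]
    n = min(len(actives), len(inactives))
    return rng.sample(actives, n) + rng.sample(inactives, n)


def centroid(rows):
    """Column-wise mean of a list of descriptor vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def evaluate_once(samples, rng, test_fraction=0.2):
    """One random train/test split scored with a nearest-centroid classifier."""
    data = downsize(samples, rng)
    rng.shuffle(data)
    cut = int(len(data) * (1 - test_fraction))
    train, test = data[:cut], data[cut:]
    active_c = centroid([x for x, y in train if y == 1])
    inactive_c = centroid([x for x, y in train if y == 0])
    hits = sum(
        (1 if sq_dist(x, active_c) < sq_dist(x, inactive_c) else 0) == y
        for x, y in test
    )
    return hits / len(test)


def repeated_validation(samples, repeats=50, seed=0):
    """Repeat the downsize/split/score cycle; each repeat is an independent job."""
    rng = random.Random(seed)
    return [evaluate_once(samples, rng) for _ in range(repeats)]


# Toy data: 20 "actives" and 200 "inactives" with 5 made-up descriptors each.
data_rng = random.Random(1)
toy = [([data_rng.gauss(1.0, 1.0) for _ in range(5)], 1) for _ in range(20)]
toy += [([data_rng.gauss(0.0, 1.0) for _ in range(5)], 0) for _ in range(200)]

scores = repeated_validation(toy, repeats=50)
print(f"mean accuracy over {len(scores)} runs: {sum(scores) / len(scores):.2f}")
```

Because the 50 repeats do not depend on one another, they can be farmed out as separate jobs, which is the kind of workload a grid scheduler handles well.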
36. C-DAC’s Garuda Grid –
Indian Grid Computing Initiative
C-DAC is an R&D organization under the Ministry of
Communication & Information
Technology, India
C-DAC’s Garuda Grid is targeted at providing
a facility for the scientific community,
which would enable them to seamlessly
access the distributed resources.
Compute power of GARUDA: ~70 TFLOPS (6000 CPUs)
Currently there are 55 Garuda Partners
Has NKN (National Knowledge
Network) connectivity at 10 Gbps
37. Customized Galaxy on GARUDA
Features:
• Integrated with Grid Authentication mechanism - Indian Grid Certificate
Authority (IGCA)
• Integrated with Gridway Metascheduler - Job scheduling and
management
• Integrated OSDD tools - Weka (for data mining) and Autodock (Virtual
screening).
• Provided support to upload multiple input files as tar file
• Data libraries of OSDD community are uploaded and are shared by all
users
• Integrated with PostgreSQL
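The multi-file upload feature listed above relies on the user packing inputs into a single tar file first. A minimal sketch of that client-side step with Python's standard `tarfile` module follows; the helper names and the placeholder ligand files are illustrative assumptions, and nothing here reflects how the Garuda Galaxy server unpacks the archive.

```python
import tarfile
import tempfile
from pathlib import Path


def bundle_inputs(paths, archive_path):
    """Pack several input files into one uncompressed tar, stored by base name."""
    with tarfile.open(str(archive_path), "w") as tar:
        for path in map(Path, paths):
            tar.add(str(path), arcname=path.name)
    return archive_path


def list_bundle(archive_path):
    """Return the sorted member names of a tar archive."""
    with tarfile.open(str(archive_path), "r") as tar:
        return sorted(tar.getnames())


# Demo with throwaway placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    work = Path(workdir)
    inputs = []
    for filename in ("ligand_1.pdbqt", "ligand_2.pdbqt"):
        path = work / filename
        path.write_text("placeholder contents\n")
        inputs.append(path)
    archive = bundle_inputs(inputs, work / "inputs.tar")
    names = list_bundle(archive)

print(names)
```

Storing members by base name keeps local directory structure out of the archive, so the server side sees a flat set of input files.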
40. Garuda- Galaxy Job Submission - Flow
[Diagram components: Internet, Galaxy GUI, Galaxy Job Manager, Gridway job runner, Garuda-OSDD Server]
1. User selects tool and input parameters
2. Based on the tool, the Galaxy Job Manager sends the job to the correct runner
3. The Gridway job runner uses the user’s Garuda proxy file for job submission
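The three-step flow can be sketched as a small dispatcher that picks a runner per tool and refuses grid submission without a proxy file. All class names, the tool-to-runner mapping and the proxy path are hypothetical; this does not reproduce Galaxy's job runner interface or the GridWay API.

```python
class LocalRunner:
    """Stand-in for a runner that executes jobs on the Galaxy server itself."""

    def submit(self, tool, params, proxy_file=None):
        return f"local:{tool}"


class GridwayRunner:
    """Stand-in for a runner that hands jobs to the grid metascheduler."""

    def submit(self, tool, params, proxy_file=None):
        # Grid jobs run under the user's identity, so a proxy file is mandatory.
        if proxy_file is None:
            raise PermissionError("a Garuda proxy file is required for grid submission")
        return f"gridway:{tool}:{proxy_file}"


class JobManager:
    """Step 2 of the flow: choose the runner based on the selected tool."""

    def __init__(self, runners, tool_to_runner, default="local"):
        self.runners = runners
        self.tool_to_runner = tool_to_runner
        self.default = default

    def submit(self, tool, params, proxy_file=None):
        runner = self.runners[self.tool_to_runner.get(tool, self.default)]
        return runner.submit(tool, params, proxy_file)


manager = JobManager(
    runners={"local": LocalRunner(), "gridway": GridwayRunner()},
    tool_to_runner={"autodock": "gridway", "weka": "gridway"},
)

# Step 1: the user picks a tool and parameters; step 3: the grid runner uses the proxy.
grid_job = manager.submit("autodock", {"ligands": "inputs.tar"}, proxy_file="/tmp/x509up_demo")
local_job = manager.submit("text_filter", {"pattern": "Rv"})
print(grid_job, local_job)
```

Routing by tool keeps lightweight utilities off the grid while compute-heavy tools such as docking and data mining go to the metascheduler.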
44. Customized Galaxy with applications as Web Services and
on the Grid for Open Source Drug Discovery (OSDD)
A CSIR led team India consortium with global partnership for affordable healthcare
Anshu Bhardwaj
Council of Scientific & Industrial Research (CSIR),
India
Chintalapati Janaki,
Center for Development of Advanced Computing (C-DAC),
India
www.osdd.net 25-26 May 2011
45. “In the long history of humankind those who
have learned to collaborate and improvise most
effectively have prevailed.” --
Charles Darwin
46. Cheminformatics: a strong case for
community collaborative science
There is now an incredibly rich resource of public
information relating compounds, targets, genes, pathways,
and diseases. Just for starters there is in the public domain
information on:
~30 million compounds and ~500,000 bioassays (PubChem, ChemSpider)
~60 million compound bioactivities (PubChem Bioassay)
~5,000 drugs (DrugBank)
~9 million protein sequences (SwissProt) and ~60,000 3D structures (PDB)
~14 million human nucleotide sequences (EMBL)
~20 million life science publications (PubMed)
Multitude of other sets (drugs, toxicogenomics, chemogenomics, metagenomics …)
47. Community Speaks: What excites
them about Cheminformatics
I have thus chosen ‘Cheminformatics’ to study the vast pool of chemical compounds in much more
detail and analyze them so as to narrow down to potential drug candidates. With the unique
combination of IT and Chemistry, I am confident that one can actually derive much more
meaningful information about a chemical entity on this earth. --- Rajdeep (BioIT)
I am an organic chemist. I have prepared several organic molecules. We test for biological activity,
and most of the time it gives a negative result. But with the help of informatics in chemistry we can
predict molecular properties. We can replace many ligands, substituents or functional
groups easily. And we can design our desired molecule. --- Chirupulo
I am doing my M.Pharm in pharmaceutical chemistry, and I like cheminformatics because I need
accurate results, and soon... and I am really interested in molecular modelling... so I am here.
--- Haffy manaf
Cheminformatics deals with information about chemicals. It combines tools and techniques of IT
to put information about chemical entities at one's fingertips, at the click of a mouse. Databases are
available for properties and descriptors. Software helps to calculate molecular
properties. Cheminformatics thus comes as a handy tool for learning chemistry. --- Dr Keshav
Mohan
48. Challenges in implementation of
Cheminformatics projects
• Access to Journals for Chemical Structures
• Lack of proper communication systems other than Skype
• Lack of software tools for accelerated drug discovery
• Need of high speed internet
• Need more experts to teach/train community members
• Proper time schedule of IU cheminformatics classes
50. Tools Developed for Large Scale
Bio-Chemical Data Mining
Association Search – visualize literature supported associations
between any two entities (compound, drug, gene, pathway,
disease, side effect). PLoS One, in press.
Semantic Link Association Prediction (SLAP) – find most highly
associated entities (compound, drug, gene, pathway, disease,
side effect) to any other entity, based on probabilistic weightings
of graph edges based on public experimental datasets. Paper in
preparation
BioLDA – find most highly associated entities to any other entity
based on a complex topic model analysis of the literature
(PubMed). PLoS One, 2011, 6 (3), e17243
See also: WENDI (J. Cheminf., 2010,2,6); Chemogenomic Explorer
(BMC Bio. 2011,12,256), ChemLDA, ChemBioGrid (J. Chem. Inf.
Model., 2007; 47(4) pp 1303-1307)
52. Cheminformatics
Community of About 400
[Diagram: data sources PubChem, ChEMBL, DrugBank; activities: Curated molecule datasets, Cheminformatics Models, HT Virtual screening, Data Mining and Analysis, Experimental Assays]
Other Active Communities:
•OSDD Women Scientists Forum
•OSDD Junior Scientists Forum
53. Ideal Case US-India
Cheminformatics Collaboration
[Diagram elements: Research; Wet lab research; IU CCRG; Industry partnerships; Education; OSDD; Many interested students; Open cheminfo. group]
54. But in order to sustain…?
Funding for
research in U.S.
$1.3m NIH Funding for
$360,000 Eli Lilly research in
$120,000 Pfizer $0 osdd
$46m Govt
55. What should be our approach
to reach out and integrate?
Most of the biologists and chemists do not use computational
workflows for their analysis
Awareness about the advantages of using such workflow engines
The Community needs to be trained for using the workflows
The Community needs to be trained for integrating applications
Web services vs standalone applications: each has its own set
of advantages and limitations
Developers of algorithms should be encouraged to report results
in globally accepted standard formats with standard ontologies
56. OSDD Open Access Resources
Assembly line for drug discovery
I Biological Repository
i. Open access clinical strains repository
ii. Open access clone repository
iii. Open access protein repository
II Chemical Repository
i. Open access small molecule repository
III Open Screening Facility
i. Submit your compounds for anti-tuberculosis screening
57. Public Private Partnerships as Open Collaborative
Endeavors to solve Scientific Challenges
[Chemical structures: compounds s12, s14, s15]
• Five synthetic ‘thiophene containing trisubstituted methanes’, which showed an MIC of <1.56 µg/ml and no cytotoxicity in mammalian cells, are being synthesised in PPP Mode
Inhibition of FAAL and FACL enzymes by acyl-sulfamoyl analogues
Preclinical development of thiophene containing trisubstituted methanes
58. Collaboration with TB Alliance on Human Clinical Trials
PA-824 in combination with other drugs
59. [Diagram elements: Target based approach; Ligand based approach; Hit to Lead; Systems Biology; Human Clinical Trials]
60. An Innovative Approach to Drug Discovery:
A New Paradigm
Innovation Funnel (axes: Value, Risk): Biology/Genomics; Target Identification; Target Validation; Hit(s); Validated/Quality Lead; Optimised Candidate Drug; Clinical Trials; Registered Drug
High Risk, Innovation Driven Sphere. Strategy -> Open Innovation with best minds from academia/industry
Process Oriented. Strategy -> Industry CRO’s Participation
Clinical Trials. Strategy -> OSDD to support clinical trials in collaboration with pharma
Drugs to be available without IP encumbrances
61. Major International Collaborations
Metabolic Map Network Generation
Structural Interactome to predict Off-
Site Interactions of Drug Candidates
Cheminformatics and e-learning
62. Geek Nation:
How Indian Science Is Taking Over The World
Author, Angela Saini
http://www.sunday-guardian.com/bookbeat/tour-of-indian-science-that-fails-to-see-full-picture
66. Some of the OSDD PIs
[Diagram: PI research areas: Mtb Systems Biology; Target Validation; Cloning of potential drug targets; PPI Validation; OSDDChem; Cheminformatics Community + E-learning; Mtb Genome Analysis; Galaxy Integration with Grid]
Email: anshub@osdd.net Skype: anshu.bhardwaj
67. OSDD : A Global Community -
More than 5500 members from over 130 countries
Statistics as of March 2012
68. Open Source Drug Discovery (OSDD) Model
“Team India Consortium with International Participation”
[Diagram elements: Open Synthesis and Exchange of Knowledge; Mycobacterium tuberculosis; Candidate Targets; in silico SCREENING; Lead Molecules; in vivo VALIDATION; PRECLINICAL & CLINICAL TRIAL; Drug; Wiki Portal; Academia & Hospitals; Contract Research Organisations; Exchange of Ideas/Results; Community Participation]
Lead Organization: Council of Scientific and Industrial Research (CSIR), India
[Current Partners: not recoverable from the slide text]
69. Together we can …
.. and we should !
http://www.osdd.net
http://c2d.osdd.net
http://sysborg2.osdd.net
Email: info@osdd.net
anshub@osdd.net
abhik1368@gmail.com
Skype: anshu.bhardwaj
http://scienceopenscience.blogspot.com/2011/12/osdd-song.html
Matt Smadley | Flickr.com