Chemoinformatics and
information management

Peter Willett, University of Sheffield, UK
Overview
• What is chemoinformatics and why is it
  necessary
• Managing structural information
• Typical facilities in chemoinformatics
  software
• Examples of current research
Drug discovery: I
• Drug discovery is a vastly complex, multi-disciplinary
  task that can extend over two decades
• The total cost for the discovery and development of a
  novel therapeutic agent is now ca. $1.5B
• Even so, only about 1 in 3 drugs covers its R&D costs
   • But when they do, the pay-offs can be massive: Lipitor
     made $12.5B in 2006 (cf. MS Windows and Boeing 747)
• Patent cover is 20 years from initial announcement
   • Time is money, so there is a need to find potential drugs (and to
     reject non-drugs) much faster (and similarly for agrochemicals)
Drug discovery: II
• Chemoinformatics is one way of increasing the cost
  effectiveness of drug discovery
• Initial work in chemoinformatics as early as the Sixties:
  current interest because of developments in
   • Combinatorial chemistry
   • High throughput screening (HTS)
   • Change from sequential to massively parallel processing
• Resulting explosion in the amounts of data available in
  drug-discovery programmes, and an increased interest
  in computational methods
   • Focus on chemical structure diagram, cf development of other
     types of -informatics specialisms
Definitions
•   F.K. Brown (1998). Annual Reports in Medicinal Chemistry, 33,
    375-384
    • “The use of information technology and management has become
      a critical part of the drug discovery process. Chemoinformatics is
      the mixing of those information resources to transform data into
      information and information into knowledge for the intended
      purpose of making better decisions faster in the area of drug lead
      identification and optimization”
•   G. Paris (August 1999 ACS meeting), quoted by W.A. Warr at
    http://www.warr.com/warrzone.htm
    • “Chem(o)informatics is a generic term that encompasses the
      design, creation, organization, management, retrieval, analysis,
      dissemination, visualization and use of chemical information”
•   J. Gasteiger and T. Engel (editors) (2003). Chemoinformatics:
    A Textbook. Wiley-VCH.
    •   “Chemoinformatics is the application of informatics methods to
        solve chemical problems.”
Representation of molecules
• Need for a machine-readable representation
  • 1D – computed/experimental global properties
  • 2D – the chemical structure diagram
  • 3D – atomic coordinate data
• 1D representations handled using conventional
  DBMS software
• Need to manipulate 2D and 3D data
Connection tables

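[Figure: 2D structure diagram with atoms numbered 1 to 9 (atom 9 is the oxygen), shown beside its connection table]

Atom  Element  Neighbour and bond-order pairs
1     C        2 2   6 1   7 1
2     C        1 2   3 1
3     C        2 1   4 2
4     C        3 2   5 1
5     C        4 1   6 2
6     C        1 1   5 2
7     C        1 1   8 1   9 2
8     C        7 1
9     O        7 2
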
• An unambiguous representation of a 2D chemical
  structure diagram
• A connection table is a graph, the underlying data
  structure in chemoinformatics
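
The connection table above can be held directly as a graph data structure. The following is a minimal sketch, not the layout of any particular chemoinformatics package: the dictionary format and the helper names `bond_list` and `heavy_atom_formula` are assumptions made for illustration. It checks that every bond is recorded consistently on both of its atoms and derives a simple secondary representation (a count of the non-hydrogen atoms).

```python
from collections import Counter

# Connection table from the slide:
# atom number -> (element, [(neighbour atom, bond order), ...])
CONNECTION_TABLE = {
    1: ("C", [(2, 2), (6, 1), (7, 1)]),
    2: ("C", [(1, 2), (3, 1)]),
    3: ("C", [(2, 1), (4, 2)]),
    4: ("C", [(3, 2), (5, 1)]),
    5: ("C", [(4, 1), (6, 2)]),
    6: ("C", [(1, 1), (5, 2)]),
    7: ("C", [(1, 1), (8, 1), (9, 2)]),
    8: ("C", [(7, 1)]),
    9: ("O", [(7, 2)]),
}


def bond_list(table):
    """Return {(low atom, high atom): bond order} for every bond (graph edge).

    Raises ValueError if a bond is not listed from both of its atoms
    with the same order, i.e. if the table is internally inconsistent.
    """
    seen = {}
    for atom, (_element, neighbours) in table.items():
        for other, order in neighbours:
            key = (min(atom, other), max(atom, other))
            seen.setdefault(key, []).append(order)
    bonds = {}
    for key, orders in seen.items():
        if len(orders) != 2 or orders[0] != orders[1]:
            raise ValueError(f"bond {key} is not recorded consistently")
        bonds[key] = orders[0]
    return bonds


def heavy_atom_formula(table):
    """Count the atoms per element (hydrogens are implicit in the table)."""
    counts = Counter(element for element, _ in table.values())
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in sorted(counts.items()))


if __name__ == "__main__":
    print(len(bond_list(CONNECTION_TABLE)), "bonds")
    print(heavy_atom_formula(CONNECTION_TABLE))
```

Storing each bond on both atoms is redundant, but it makes neighbour lookup immediate, which is what graph matching algorithms need most often.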
Graph theory and chemistry
• Graph theory
   • Branch of mathematics that describes sets of objects, called
     nodes, and the relationships between them, called edges
• A 2D connection table is a graph:
   • Nodes correspond to atoms
   • Edges correspond to bonds
• Graph matching algorithms
   • Search chemical databases
• Generation of other representations
Types of search
• Exact structure search (hashed connection table with
  graph isomorphism for collision handling)
• Substructure search (subgraph isomorphism)
   • cf partial or boolean matching in text
• Similarity searching (maximal common subgraph
  isomorphism (or simpler))
   • cf best match search or web searching
• Graph matching algorithms are effective
   • But worst-case running time grows factorially with the number of nodes
   • Need for efficient heuristics
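
The factorial cost of graph matching is easiest to see in a brute-force substructure search. The sketch below is an illustration only, assuming a simplified representation (element symbols plus unordered atom-index pairs, with bond orders ignored); the function name `is_substructure` is hypothetical. It tries every assignment of query atoms to target atoms, which is exactly the exhaustive enumeration that practical systems avoid through screening and heuristics.

```python
from itertools import permutations


def is_substructure(query_atoms, query_bonds, target_atoms, target_bonds):
    """Brute-force subgraph isomorphism test.

    query_atoms / target_atoms: lists of element symbols.
    query_bonds / target_bonds: iterables of (i, j) atom-index pairs.
    Bond orders are ignored in this simplified sketch.
    """
    target_edges = {frozenset(bond) for bond in target_bonds}
    n_query = len(query_atoms)
    # Every ordered choice of n_query distinct target atoms is a candidate mapping.
    for mapping in permutations(range(len(target_atoms)), n_query):
        if any(query_atoms[i] != target_atoms[mapping[i]] for i in range(n_query)):
            continue  # element labels disagree
        if all(frozenset((mapping[i], mapping[j])) in target_edges
               for i, j in query_bonds):
            return True  # every query bond is present in the target
    return False


if __name__ == "__main__":
    # Target: the nine-atom structure from the connection-table slide, 0-indexed.
    target_atoms = ["C"] * 8 + ["O"]
    target_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
                    (0, 6), (6, 7), (6, 8)]
    print(is_substructure(["C", "C", "O"], [(0, 1), (1, 2)],
                          target_atoms, target_bonds))
```

With a 3-atom query and a 9-atom target there are only 9 x 8 x 7 = 504 candidate mappings, but the count grows factorially as both graphs get larger.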
Fingerprints
[Figure: example fragment substructures encoded in a fingerprint]

• A fingerprint (or fragment bit-string) is a binary vector
  encoding the presence (“1”) or absence (“0”) of
  fragment substructures in a molecule
• Each bit in the fingerprint represents one molecular
  fragment. Typical length is ~1000 bits
• An approximate representation, but one that can be
  processed very efficiently and hence often used as a
  precursor to graph matching
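
A fingerprint screen can be sketched with ordinary integers used as bit-strings. Everything specific here is an assumption for illustration: the six-entry `FRAGMENTS` dictionary is hypothetical (real dictionaries hold on the order of a thousand fragments), and the fragments present in each molecule are supplied by hand rather than detected from the structure. The point is the screening step: a molecule can only contain the query substructure if every bit set in the query is also set in the molecule.

```python
# Hypothetical fragment dictionary: each fragment owns one bit position.
FRAGMENTS = ["C-C", "C=C", "C=O", "C-N", "C-O", "c1ccccc1"]


def fingerprint(fragments_present):
    """Encode the given fragments as an integer bit-string."""
    bits = 0
    for fragment in fragments_present:
        bits |= 1 << FRAGMENTS.index(fragment)
    return bits


def passes_screen(query_fp, target_fp):
    """True if every bit set in the query is also set in the target.

    Passing is necessary but not sufficient for a substructure match,
    so survivors still go on to full graph matching.
    """
    return (query_fp & target_fp) == query_fp


if __name__ == "__main__":
    target = fingerprint(["C-C", "C=C", "C=O", "c1ccccc1"])
    print(passes_screen(fingerprint(["C-C", "C=O"]), target))
    print(passes_screen(fingerprint(["C-N"]), target))
```

The screen costs one AND and one comparison per molecule, which is why it is run before the far more expensive graph matching stage.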
Chemoinformatics facilities
• Database searching as described previously
   • Structure and substructure searching originally
   • Similarity searching from mid-Eighties
   • 3D substructure searching from mid-Nineties (first rigid then
     flexible)
• Applications
   • Database clustering
   • Molecular diversity analysis
   • Drug-likeness
   • Virtual screening
      • Ligand-based
      • Structure-based
3D substructure searching
• Generation of pharmacophore patterns
• Use of MOGA and hyperstructure approaches
[Figure: pharmacophore pattern of O and N atoms with inter-atomic distance ranges a = 8.62 ± 0.58 Angstroms, b = 7.08 ± 0.56 Angstroms and c = 3.35 ± 0.65 Angstroms, shown alongside example structures]
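
A rigid 3D substructure search against a distance-range pattern can be sketched as follows. The three distance ranges are taken from the slide; which pair of atoms each distance spans, and the atom types of the three points, cannot be recovered from the figure, so `PATTERN_TYPES` and `PATTERN_EDGES` are assumptions, as are the function names and the test coordinates. Conformational flexibility is not handled.

```python
from itertools import permutations
from math import dist

# Distance ranges from the slide, in Angstroms: (centre, tolerance).
RANGES = {"a": (8.62, 0.58), "b": (7.08, 0.56), "c": (3.35, 0.65)}

# ASSUMPTION: the pattern has points p0 = O, p1 = O, p2 = N, and each
# distance joins the pair of points given below.
PATTERN_TYPES = ("O", "O", "N")
PATTERN_EDGES = {"a": (1, 2), "b": (0, 2), "c": (0, 1)}


def within(distance, name):
    """True if the distance falls inside the named range."""
    centre, tolerance = RANGES[name]
    return abs(distance - centre) <= tolerance


def find_matches(atoms):
    """Return every atom-index triple that fits the pattern.

    atoms: list of (element, (x, y, z)) for one rigid conformation.
    """
    hits = []
    for triple in permutations(range(len(atoms)), 3):
        if tuple(atoms[i][0] for i in triple) != PATTERN_TYPES:
            continue
        if all(within(dist(atoms[triple[i]][1], atoms[triple[j]][1]), name)
               for name, (i, j) in PATTERN_EDGES.items()):
            hits.append(triple)
    return hits


if __name__ == "__main__":
    # Made-up coordinates chosen to satisfy the three ranges.
    atoms = [("O", (0.0, 0.0, 0.0)), ("O", (3.35, 0.0, 0.0)),
             ("N", (-1.93, 6.81, 0.0)), ("C", (1.0, 1.0, 1.0))]
    print(find_matches(atoms))
```

A flexible search would have to repeat this test over the accessible conformations of each molecule, or work with distance bounds rather than single coordinates.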
Similarity searching using 2D fingerprints
• Use of data fusion methods to enhance performance,
  combining information from multiple searches
[Figure: a query structure and example structures retrieved by similarity searching]
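
Fingerprint similarity searching with data fusion can be sketched in a few lines. The slide does not say which similarity coefficient or fusion rule is used, so two common choices are assumed here: the Tanimoto coefficient, and MAX fusion over several query structures (each database molecule keeps its best score against any query). Fingerprints are represented as sets of on-bit positions, and the function names and toy data are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def fused_ranking(queries, database):
    """Rank database molecules by MAX fusion over several queries.

    queries: list of fingerprints (sets of on-bit positions).
    database: {molecule name: fingerprint}.
    Returns (name, score) pairs, best first.
    """
    scores = {name: max(tanimoto(query, fp) for query in queries)
              for name, fp in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy fingerprints, made up for illustration.
    database = {"m1": {1, 2, 3}, "m2": {7, 8}, "m3": {2, 3, 4, 5}}
    queries = [{1, 2, 3}, {7, 8, 9}]
    for name, score in fused_ranking(queries, database):
        print(name, round(score, 3))
```

A single query would rank m2 last or first depending on which query is chosen; fusing the two searches lets molecules similar to either query rise in the list.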
Molecular modelling and QSAR
• Use of computational chemistry to obtain the structures
  and properties of small molecules
   • Quantum mechanics
   • Molecular dynamics
   • Molecular modelling
• Statistical correlation of structure (however described)
  with physical, chemical and biological properties
   • Initially biological activity (QSAR)
   • Now pharmacokinetics and toxicity (ADMET)
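
The statistical correlation at the heart of QSAR can be illustrated with the simplest possible model: a least-squares line relating one structural descriptor to one measured property. This is a sketch under strong assumptions. Real QSAR models use many descriptors and more robust statistics, and the numbers below are made up purely to exercise the code; they are not experimental data. The function names are hypothetical.

```python
def fit_line(descriptor, activity):
    """Ordinary least-squares fit of activity = slope * descriptor + intercept."""
    n = len(descriptor)
    mean_x = sum(descriptor) / n
    mean_y = sum(activity) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptor)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(descriptor, activity))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, value):
    """Predict the property of a new molecule from its descriptor value."""
    slope, intercept = model
    return slope * value + intercept


if __name__ == "__main__":
    # Made-up training set: one descriptor value and one activity per molecule.
    descriptor = [1.0, 2.0, 3.0, 4.0]
    activity = [2.1, 3.9, 6.1, 7.9]
    model = fit_line(descriptor, activity)
    print(model)
    print(predict(model, 5.0))
```

The same fit-then-predict pattern applies whether the modelled property is biological activity or an ADMET end-point; only the training data change.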
Integration with database searching
• Related, but largely separate, research areas for many
  years
   • Simple search operations on very large numbers of molecules
   • Increasingly complex operations on smaller and smaller
     (normally homogeneous) datasets
   • Substructural analysis as an early, notable exception
• The future lies in the integration of these two
  approaches, applying more sophisticated methods on
  larger datasets
   • Docking now well established
   • Property calculations at a database level
   • ADMET
General references
J. Gasteiger (ed.), Handbook of Chemoinformatics (Wiley-VCH,
   Weinheim, 2003).
W.L. Chen, Chemoinformatics: past, present and future, Journal of
  Chemical Information and Modeling 46 (2006) 2230-2255.
D.J. Wild and G.D. Wiggins, Challenges for chemoinformatics
   education in drug discovery, Drug Discovery Today 11 (2006) 436-
   439.
A.R. Leach and V.J. Gillet, An Introduction to Chemoinformatics
   (Kluwer, Dordrecht, 2nd edition, 2007).
P. Willett, A bibliometric analysis of chemoinformatics, Aslib
   Proceedings 60 (2008) 4-17.

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

Chemoinformatics and information management

  • 1. Chemoinformatics and information management Peter Willett, University of Sheffield, UK
  • 2. Overview • What is chemoinformatics and why is it necessary • Managing structural information • Typical facilities in chemoinformatics software • Examples of current research
  • 3. Drug discovery: I • Drug discovery is a vastly complex, multi-disciplinary task that can extend over two decades • The total cost for the discovery and development of a novel therapeutic agent is now ca. $1.5B • Even so, only about 1 in 3 drugs cover their R&D costs • But when they do, the pay-offs can be massive: Lipitor in 2006 made $12.5B (cf MS Windows and Boeing 747) • Patent cover is 20 years from initial announcement • Time is money, so there is a need to find potential drugs (and to reject non-drugs) much faster (and similarly for agrochemicals)
  • 4. Drug discovery: II • Chemoinformatics is one way of increasing the cost effectiveness of drug discovery • Initial work in chemoinformatics as early as the Sixties: current interest because of developments in • Combinatorial chemistry • High throughput screening (HTS) • Change from sequential to massively parallel processing • Resulting explosion in the amounts of data available in drug-discovery programmes, and an increased interest in computational methods • Focus on chemical structure diagram, cf development of other types of -informatics specialisms
  • 5. Definitions • F.K. Brown (1998). Annual Reports in Medicinal Chemistry, 33, 375-384 • “The use of information technology and management has become a critical part of the drug discovery process. Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization” • G. Paris (August 1999 ACS meeting), quoted by W.A. Warr at http://www.warr.com/warrzone.htm • “Chem(o)informatics is a generic term that encompasses the design, creation, organization, management, retrieval, analysis, dissemination, visualization and use of chemical information” • J. Gasteiger and T. Engel (editors) (2003). Chemoinformatics: a textbook. Wiley-VCH. • “Chemoinformatics is the application of informatics methods to solve chemical problems.”
  • 6. Representation of molecules • Need for a machine-readable representation • 1D – computed/experimental global properties • 2D – the chemical structure diagram • 3D – atomic coordinate data • 1D representations handled using conventional DBMS software • Need to manipulate 2D and 3D data
  • 7. Connection tables • [Figure: structure diagram with atoms numbered 1 to 9] • Connection table rows (atom number, element, then neighbour and bond-order pairs): 1 C 2 2 6 1 7 1; 2 C 1 2 3 1; 3 C 2 1 4 2; 4 C 3 2 5 1; 5 C 4 1 6 2; 6 C 1 1 5 2; 7 C 1 1 8 1 9 2; 8 C 7 1; 9 O 7 2 • An unambiguous representation of a 2D chemical structure diagram • A connection table is a graph, the underlying data structure in chemoinformatics
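As a minimal sketch of the idea that a connection table is a graph, the table on this slide can be held as a Python dictionary. The row layout used here (atom number, element, then neighbour and bond-order pairs) is a reading of the flattened slide text rather than something the slide states, and the helper names `bonds` and `is_consistent` are hypothetical.

```python
# Connection table transcribed from the slide, assuming each row reads:
# atom number, element, then (neighbour atom, bond order) pairs.
CONNECTION_TABLE = {
    1: ("C", {2: 2, 6: 1, 7: 1}),
    2: ("C", {1: 2, 3: 1}),
    3: ("C", {2: 1, 4: 2}),
    4: ("C", {3: 2, 5: 1}),
    5: ("C", {4: 1, 6: 2}),
    6: ("C", {1: 1, 5: 2}),
    7: ("C", {1: 1, 8: 1, 9: 2}),
    8: ("C", {7: 1}),
    9: ("O", {7: 2}),
}


def bonds(table):
    """Return each bond (graph edge) once as (low atom, high atom, order)."""
    edges = set()
    for atom, (_, neighbours) in table.items():
        for other, order in neighbours.items():
            edges.add((min(atom, other), max(atom, other), order))
    return sorted(edges)


def is_consistent(table):
    """Check that every bond is listed from both ends with the same order."""
    return all(
        table[other][1].get(atom) == order
        for atom, (_, neighbours) in table.items()
        for other, order in neighbours.items()
    )


if __name__ == "__main__":
    print(len(CONNECTION_TABLE), "atoms,", len(bonds(CONNECTION_TABLE)), "bonds")
    print("consistent:", is_consistent(CONNECTION_TABLE))
```

Atoms are the nodes and bonds are the edges, so any graph algorithm can work directly on this dictionary. The consistency check is only a sanity test on the transcription, since each bond appears twice in a table of this kind.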
  • 8. Graph theory and chemistry • Graph theory • Branch of mathematics that describes sets of objects, called nodes, and the relationships between them, called edges • A 2D connection table is a graph: • Nodes correspond to atoms • Edges correspond to bonds • [Figure: example structure diagram] • Graph matching algorithms • Search chemical databases • Generation of other representations
  • 9. Types of search • Exact structure search (hashed connection table with graph isomorphism for collision handling) • Substructure search (subgraph isomorphism) • cf partial or boolean matching in text • Similarity searching (maximal common subgraph isomorphism (or simpler)) • cf best match search or web searching • Graph matching algorithms are effective • But time is factorial with the number of nodes • Need for efficient heuristics
  • 10. Fingerprints • [Figure: example structure] • A fingerprint (or fragment bit-string) is a binary vector encoding the presence (“1”) or absence (“0”) of fragment substructures in a molecule • Each bit in the fingerprint represents one molecular fragment. Typical length is ~1000 bits • An approximate representation, but one that can be processed very efficiently and hence often used as a precursor to graph matching
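The use of a fingerprint as a cheap precursor to graph matching can be sketched as a bitwise screen: a database molecule can only contain the query substructure if every bit set in the query fingerprint is also set in its own. The fragment dictionary and molecule names below are hypothetical and far smaller than the ~1000 bits mentioned on the slide. A molecule that passes the screen would still need a full subgraph isomorphism check.

```python
# Hypothetical fragment dictionary: one bit position per fragment.
FRAGMENT_BITS = {"C=O": 0, "c:c": 1, "C-C": 2, "C-N": 3, "C-Br": 4}


def fingerprint(fragments):
    """Encode a collection of fragment names as an integer bit-string."""
    bits = 0
    for name in fragments:
        bits |= 1 << FRAGMENT_BITS[name]
    return bits


def passes_screen(query_fp, target_fp):
    """True if every bit set in the query is also set in the target."""
    return (query_fp & target_fp) == query_fp


if __name__ == "__main__":
    query = fingerprint(["C=O", "C-C"])
    database = {
        "mol_a": fingerprint(["C=O", "c:c", "C-C"]),
        "mol_b": fingerprint(["c:c", "C-N"]),
    }
    # Only candidates that survive the screen go on to graph matching.
    candidates = [name for name, fp in database.items() if passes_screen(query, fp)]
    print(candidates)
```

Python integers are used as bit-strings here only for brevity. A production system would more likely use fixed-width bit arrays.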
  • 11. Chemoinformatics facilities • Database searching as described previously • Structure and substructure searching originally • Similarity searching from mid-Eighties • 3D substructure searching from mid-Nineties (first rigid then flexible) • Applications • Database clustering • Molecular diversity analysis • Drug-likeness • Virtual screening (ligand-based and structure-based)
  • 12. 3D substructure searching • Generation of pharmacophore patterns • Use of MOGA and hyperstructure approaches • [Figure: pharmacophore pattern with inter-feature distances a = 8.62 ± 0.58 Angstroms, b = 7.08 ± 0.56 Angstroms, c = 3.35 ± 0.65 Angstroms, shown with example structures]
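A pharmacophore pattern of this kind is a set of distance ranges between features, so a rigid 3D match reduces to checking measured distances against those ranges. The sketch below uses the three ranges from the slide. Which pair of features each of a, b and c refers to cannot be recovered from the transcript, so the pairing in the code is an assumption, as are the function name `matches_pattern` and the coordinates. A real search would also have to find which atoms play which role, which is again a graph matching problem.

```python
import math

# Distance constraints from the slide as (centre, tolerance) in Angstroms.
CONSTRAINTS = {"a": (8.62, 0.58), "b": (7.08, 0.56), "c": (3.35, 0.65)}


def matches_pattern(p1, p2, p3, constraints=CONSTRAINTS):
    """Check three feature positions against the distance ranges.

    Assumed pairing (not stated on the slide):
    a = p1 to p2, b = p1 to p3, c = p2 to p3.
    """
    measured = {
        "a": math.dist(p1, p2),
        "b": math.dist(p1, p3),
        "c": math.dist(p2, p3),
    }
    return all(
        abs(measured[name] - centre) <= tolerance
        for name, (centre, tolerance) in constraints.items()
    )


if __name__ == "__main__":
    # Hypothetical coordinates chosen to satisfy all three ranges.
    print(matches_pattern((0.0, 0.0, 0.0), (8.6, 0.0, 0.0), (6.56, 2.66, 0.0)))
    # Moving the third point breaks the b and c constraints.
    print(matches_pattern((0.0, 0.0, 0.0), (8.6, 0.0, 0.0), (4.0, 0.0, 0.0)))
```

This covers only the rigid case. Flexible searching, mentioned on the previous slide, would need to consider the range of distances reachable across conformations.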
  • 13. Similarity searching using 2D fingerprints • Use of data fusion methods to enhance performance, combining information from multiple searches • [Figure: query structure and further example structures]
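One way to picture fingerprint-based similarity searching with data fusion is shown below. The slide names neither a similarity coefficient nor a fusion rule, so both choices here are assumptions: the Tanimoto coefficient, which is widely used for binary fingerprints, and a simple rule that keeps each database molecule's best score over several reference structures. The function names and the tiny 4-bit fingerprints are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints held as Python ints."""
    common = bin(fp_a & fp_b).count("1")
    total = bin(fp_a | fp_b).count("1")
    return common / total if total else 0.0


def max_fusion(reference_fps, database):
    """Rank database entries by their best similarity to any reference.

    Each reference fingerprint is one search; the fused score for a
    molecule is the maximum of its scores across those searches.
    """
    scores = {
        name: max(tanimoto(ref, fp) for ref in reference_fps)
        for name, fp in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    references = [0b1011, 0b0111]
    database = {"m1": 0b1011, "m2": 0b0100, "m3": 0b0011}
    for name, score in max_fusion(references, database):
        print(name, round(score, 3))
```

Other fusion rules, such as summing ranks across searches, fit the same structure by replacing the `max` call.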
  • 14. Molecular modelling and QSAR • Use of computational chemistry to obtain the structures and properties of small molecules • Quantum mechanics • Molecular dynamics • Molecular modelling • Statistical correlation of structure (however described) with physical, chemical and biological properties • Initially biological activity (QSAR) • Now pharmacokinetics and toxicity (ADMET)
  • 15. Integration with database searching • Related, but largely separate, research areas for many years • Simple search operations on very large numbers of molecules • Increasingly complex operations on smaller and smaller (normally homogeneous) datasets • Substructural analysis as an early, notable exception • The future lies in the integration of these two approaches, applying more sophisticated methods on larger datasets • Docking now well established • Property calculations at a database level • ADMET
  • 16. General references • J. Gasteiger (ed.), Handbook of Chemoinformatics (Wiley-VCH, Weinheim, 2003). • W.L. Chen, Chemoinformatics: past, present and future, Journal of Chemical Information and Modeling 46 (2006) 2230-2255. • D.J. Wild and G.D. Wiggins, Challenges for chemoinformatics education in drug discovery, Drug Discovery Today 11 (2006) 436-439. • A.R. Leach and V.J. Gillet, An Introduction to Chemoinformatics (Kluwer, Dordrecht, 2nd edition, 2007). • P. Willett, A bibliometric analysis of chemoinformatics, Aslib Proceedings 60 (2008) 4-17.