Molecular similarity searching methods in drug discovery
By: Haytham Hijazi
Advisor: Univ-Prof. Hon-Prof. Dr. Dieter Roller

A presentation in the advanced graphical engineering systems seminar 2011/2012
In this work, I propose a contribution to the field of “cheminformatics”, which means solving chemical problems using computational methods [1].

1. James Rhodes, Stephen Boyer, Jeffrey Kreulen, Ying Chen, Patricia Ordonez, “Mining patents using molecular similarity search”, IBM Almaden Services Research, Pacific Symposium on Biocomputing 12:304-315 (2007).




Agenda
• The main question in this research
• The principle of similarity
• Drug discovery as an application
• Research problem
• Molecular representations (1D, 2D…)
• Searching the similarity
• Similarity coefficient calculations
• The probabilistic model (BIM)
• The contribution (MDC)
• Experiments, conclusions and discussion
“The similarity is in the eye of the beholder”
[Figure: objects compared by shape, colour, size and pattern]
Question: Which molecules in a database are similar to the query molecule?
Application:
• Finding compounds that are better than the initial lead compound (drug discovery)
• Predicting the properties of unknown compounds
Structurally similar molecules are assumed to have similar biological properties.

Similar biological properties → drug discovery. [1]

1. Sylvaine Roy and Laurence Lafanechère, “Chemogenomics and Chemical Genetics: A User's Introduction for Biologists, Chemists and Informaticians”, chapter “Molecular similarity”, Springer Berlin, 1st edition, ISBN 978-3-642-19614-0.
Claim: General manufacturing problems!
Molecule representation → Feature selection → Similarity coefficient calculation and ranking for search
• Historical progression
  ◦ Complete structure
  ◦ Substructure
• Descriptors
  ◦ 1D (physicochemical properties), 2D, 3D, and 4D
• Connectivity tables and graph theory!

Image source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.
SMILES

CC(=O)OC1=CC=CC=C1C(=O)O
CCCC1=NN(C2=C1NC(=NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C

SMILES – Simplified Molecular Input Line Entry System

Source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.
A fingerprint is a vector encoding the presence (‘1’) or absence (‘0’) of fragment substructures in a molecule.

Dictionary-based and hash-based fingerprints

Bit (descriptor)   Fragment
1                  AR
2                  CCCCN
3                  Me
9                  NH2

[1] [2]

2. Source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.
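To make the dictionary-based versus hash-based distinction concrete, here is a minimal Python sketch. It assumes each molecule has already been broken into fragment strings (fragment perception itself needs a cheminformatics toolkit and is not shown). The dictionary reuses the example keys from the table above; the hashed variant folds any fragment string into a fixed-length bit vector using CRC-32. The names `DICTIONARY`, `dictionary_fingerprint` and `hashed_fingerprint`, the vector lengths, and the choice of CRC-32 are illustrative assumptions, not the method used in this work.

```python
import zlib

# Hypothetical fragment dictionary: bit position -> fragment (example keys from the slide)
DICTIONARY = {1: "AR", 2: "CCCCN", 3: "Me", 9: "NH2"}


def dictionary_fingerprint(fragments: set[str], length: int = 16) -> list[int]:
    """Set bit i to 1 if the dictionary fragment assigned to position i is present."""
    bits = [0] * length
    for position, fragment in DICTIONARY.items():
        if fragment in fragments:
            bits[position] = 1
    return bits


def hashed_fingerprint(fragments: set[str], length: int = 32) -> list[int]:
    """Fold every fragment string into a fixed-length vector with a deterministic hash (CRC-32)."""
    bits = [0] * length
    for fragment in fragments:
        bits[zlib.crc32(fragment.encode("utf-8")) % length] = 1
    return bits


# Toy input: fragments assumed to have been perceived in some molecule
frags = {"AR", "NH2", "CCO"}
print(dictionary_fingerprint(frags))  # bits 1 and 9 set; "CCO" is not in the dictionary
print(hashed_fingerprint(frags))      # up to 3 bits set (collisions are possible)
```

The trade-off shows up directly: the dictionary fingerprint is interpretable but ignores fragments it does not list (here `CCO`), while the hashed fingerprint covers any fragment at the cost of possible bit collisions.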
• In 3D keys, the position of each bit corresponds to a certain range of distances or angles.
• Computationally complex

Source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.
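A minimal sketch of the 3D-key idea, assuming atom coordinates are already available: each interatomic distance is binned into one of several distance ranges, and the bit for that range is set. The bin edges (in ångström), the function name `distance_key` and the toy coordinates are hypothetical; real 3D keys usually also encode atom types and angles, which this sketch omits. The pairwise loop also hints at why 3D keys are computationally complex: the work grows quadratically with the number of atoms.

```python
import itertools
import math

# Hypothetical distance ranges in Å: bit k is set if some atom pair lies in [BIN_EDGES[k], BIN_EDGES[k + 1])
BIN_EDGES = [0.0, 1.5, 2.5, 4.0, 6.0, 10.0]


def distance_key(coords: list[tuple[float, float, float]]) -> list[int]:
    """Return one bit per distance range, set when any atom pair falls inside that range."""
    bits = [0] * (len(BIN_EDGES) - 1)
    for p, q in itertools.combinations(coords, 2):  # O(N^2) atom pairs
        d = math.dist(p, q)
        for k in range(len(bits)):
            if BIN_EDGES[k] <= d < BIN_EDGES[k + 1]:
                bits[k] = 1
                break
    return bits


toy = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 3.0, 0.0)]
print(distance_key(toy))  # → [1, 0, 1, 0, 0]
```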
Molecule representation → Feature selection → Similarity coefficient calculation and ranking for search
• Structure search
  ◦ Exact structure search
  ◦ Substructure search
• Similarity searching: maximum common subgraph isomorphism, Tanimoto/Dice/Cosine coefficients
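The slide lists substructure search without detail. A widely used speed-up, shown here as a hedged sketch and not as the author's method, is fingerprint screening: if a bit set in the query's fingerprint is absent from a molecule's fingerprint, that molecule cannot contain the query substructure. Only molecules that pass the screen then need the expensive subgraph isomorphism check, which is not shown. This reasoning holds for fragment-based fingerprints, where every fragment of the query is also a fragment of any molecule that contains it. The fingerprints below are hypothetical integers.

```python
def passes_substructure_screen(query_fp: int, molecule_fp: int) -> bool:
    """Necessary but not sufficient: every bit set in the query must also be set in the molecule."""
    return (query_fp & molecule_fp) == query_fp


# Toy fingerprints stored as Python ints (hypothetical values)
database = {"mol1": 0b1011_0110, "mol2": 0b0001_0010, "mol3": 0b1111_0111}
query = 0b0011_0010

candidates = [name for name, fp in database.items() if passes_substructure_screen(query, fp)]
print(candidates)  # → ['mol1', 'mol3']
```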
• The similarity measure (coefficient) is a quantitative measure of similarity
• It is used to rank the results of the query
• Results are listed in decreasing order of similarity
• Classes of coefficients:
  ◦ Distance coefficients
  ◦ Probabilistic coefficients
  ◦ Correlation coefficients
  ◦ Association coefficients
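The ranking step can be sketched in a few lines of Python, assuming fingerprints are stored as integers and the Tanimoto coefficient is the chosen measure (any other coefficient could be swapped in). The function names and the toy database are illustrative; treating two all-zero fingerprints as similarity 0 is an assumption to avoid division by zero.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto = |A AND B| / |A OR B| for fingerprints stored as Python ints."""
    common = bin(fp_a & fp_b).count("1")
    union = bin(fp_a | fp_b).count("1")
    return common / union if union else 0.0  # two empty fingerprints: treated as 0 (assumption)


def rank_by_similarity(query_fp: int, database: dict[str, int]) -> list[tuple[str, float]]:
    """Score every database entry against the query and sort in decreasing order of similarity."""
    scores = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical 4-bit fingerprints
database = {"x": 0b1100, "y": 0b0001, "z": 0b1110}
for name, score in rank_by_similarity(0b1110, database):
    print(name, round(score, 3))
```

Sorting the whole database is fine for small collections; for large ones, a partial sort such as `heapq.nlargest` would return just the top hits.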
Notation: $a$ = number of bits set in A, $b$ = number of bits set in B, $c$ = number of bits set in both, $d$ = number of bits set in neither, $n = a+b-c+d$ (fingerprint length).

Association coefficients
  Simple matching coefficient    $(c+d)/(a+b-c+d)$
  Jaccard measure (Tanimoto)     $c/(a+b-c)$ = |A AND B| / |A OR B|
  Cosine (Ochiai)                $c/\sqrt{ab}$
  Dice                           $2c/(a+b)$
Distance coefficients
  Hamming distance               $a+b-2c$
  Euclidean distance             $\sqrt{a+b-2c}$
  Soergel distance               $(a+b-2c)/(a+b-c)$
Other coefficients
  Pattern difference             $(a-c)(b-c)/n^2$
  Size                           $(a-b)^2/n^2$

Naomie Salim, “The study of probability model for compound similarity searching”, UTM Research Management Centre Project Vote – 75207, University of Malaysia, 2009
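The coefficients can be computed directly from the bit counts. This sketch uses the notation a = bits set in A, b = bits set in B, c = bits set in both, n = fingerprint length, and derives d = n - (a + b - c) as the bits set in neither. It assumes non-empty fingerprints (a, b > 0) so that no denominator is zero. The function name and dictionary keys are illustrative. The sample counts a = 13, b = 8, c = 6 over 32 bits are those of the Tanimoto worked example on the next slide.

```python
import math


def coefficients(a: int, b: int, c: int, n: int) -> dict[str, float]:
    """Binary similarity/distance coefficients from bit counts of two fingerprints A and B."""
    d = n - (a + b - c)  # bits set in neither fingerprint
    return {
        # association coefficients
        "simple_matching": (c + d) / n,
        "tanimoto": c / (a + b - c),
        "cosine": c / math.sqrt(a * b),
        "dice": 2 * c / (a + b),
        # distance coefficients
        "hamming": a + b - 2 * c,
        "euclidean": math.sqrt(a + b - 2 * c),
        "soergel": (a + b - 2 * c) / (a + b - c),  # equals 1 - Tanimoto
        # other coefficients
        "pattern_difference": (a - c) * (b - c) / n**2,
        "size": (a - b) ** 2 / n**2,
    }


for name, value in coefficients(13, 8, 6, 32).items():
    print(f"{name:20s} {value:.4f}")
```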
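• Assume we generate fragment-based fingerprint bits:
  Molecule A: 00010100010101000101010011110100
  Molecule B: 00000000100101001001000011100000
• Tanimoto coefficient $= c/(a+b-c)$, where $c$ is the number of bits set in A AND B, and $a$ and $b$ are the numbers of bits set in A and B
• Here $a = 13$, $b = 8$, $c = 6$, so Tanimoto $= 6/(13+8-6) = 6/15 = 0.4$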
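The arithmetic of this worked example can be checked mechanically. This Python sketch parses the two bit strings as integers, counts the set bits of A, B and A AND B, and applies the Tanimoto formula. The function name is illustrative.

```python
def tanimoto_from_bitstrings(fp_a: str, fp_b: str) -> tuple[int, int, int, float]:
    """Return (a, b, c, Tanimoto) for two equal-length '0'/'1' fingerprint strings."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    x, y = int(fp_a, 2), int(fp_b, 2)
    a = bin(x).count("1")      # bits set in A
    b = bin(y).count("1")      # bits set in B
    c = bin(x & y).count("1")  # bits set in both (A AND B)
    return a, b, c, c / (a + b - c)


A = "00010100010101000101010011110100"
B = "00000000100101001001000011100000"
print(tanimoto_from_bitstrings(A, B))  # → (13, 8, 6, 0.4)
```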
• Associate the relevance of a structure with explicit features
  $p_i$ = probability that bit $b_i$ appears in an active structure
  $q_i$ = probability that bit $b_i$ appears in an inactive structure
  $\alpha_i$ is a binary selector: $\alpha_i = 1$ if the bit occurs in the structure; otherwise $\alpha_i = 0$ and the bit enters the model negated.
  $P(A|S)$ is the probability that a structure is active, given S.
  $P(NA|S)$ is the probability that a structure is inactive, given S.
  $P(A)$ is the probability of actives.
  $P(NA)$ is the probability of inactives.

Naomie Salim, “The study of probability model for compound similarity searching”, UTM Research Management Centre Project Vote – 75207, University of Malaysia, 2009
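The slide's formula did not survive extraction. As a hedged sketch, the standard binary-independence formulation built from the quantities defined above is: assuming bits are independent given the class, $\log\frac{P(A|S)}{P(NA|S)} = \log\frac{P(A)}{P(NA)} + \sum_i \left[\alpha_i \log\frac{p_i}{q_i} + (1-\alpha_i)\log\frac{1-p_i}{1-q_i}\right]$. This may differ in detail from the formula in the cited report. The code below estimates $p_i$ and $q_i$ from toy training fingerprints, using a +0.5 smoothing term (an assumption that avoids log(0)), and scores structures by this log-odds value. The function names and data are illustrative.

```python
import math


def estimate_probabilities(actives, inactives, n_bits):
    """Estimate p_i (actives) and q_i (inactives) from 0/1 fingerprints.

    The +0.5 / +1 smoothing is an assumption that keeps every probability strictly
    between 0 and 1, so the logarithms below are always defined.
    """
    p = [(sum(fp[i] for fp in actives) + 0.5) / (len(actives) + 1) for i in range(n_bits)]
    q = [(sum(fp[i] for fp in inactives) + 0.5) / (len(inactives) + 1) for i in range(n_bits)]
    return p, q


def log_odds_active(structure, p, q, prior_active):
    """log[P(A|S) / P(NA|S)] under the bit-independence assumption."""
    score = math.log(prior_active / (1 - prior_active))  # log[P(A) / P(NA)]
    for alpha, p_i, q_i in zip(structure, p, q):
        if alpha:  # alpha_i = 1: bit present
            score += math.log(p_i / q_i)
        else:  # alpha_i = 0: bit absent, use the negated probabilities
            score += math.log((1 - p_i) / (1 - q_i))
    return score


# Toy training data (hypothetical 4-bit fingerprints)
actives = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 0, 1]]
inactives = [[0, 0, 1, 1], [0, 1, 1, 0], [0, 0, 0, 1]]
p, q = estimate_probabilities(actives, inactives, n_bits=4)
prior = len(actives) / (len(actives) + len(inactives))

for candidate in ([1, 1, 0, 0], [0, 0, 1, 1]):
    print(candidate, round(log_odds_active(candidate, p, q, prior), 3))
```

A positive score means the model considers the structure more likely active than inactive, so database structures can be ranked by decreasing score.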
Claim: General manufacturing problems!
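[Diagram of the proposed approach (MDC): a molecular dynamics simulation tool computes physicochemical properties; a classification algorithm assigns compounds to Class 1, Class 2, ..., Class n; voting relates the classes to an active compounds database.]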
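The diagram does not name the classification algorithm or the voting rule. As a purely hypothetical illustration of a "classify, then vote" step, this sketch swaps in a k-nearest-neighbour majority vote: a query compound's property vector is compared with reference compounds of known class, and the k closest ones vote. The two-dimensional property vectors, the labels and `k = 3` are invented toy values, not outputs of any simulation tool.

```python
import math
from collections import Counter


def knn_vote(query, labelled, k=3):
    """Return the majority class among the k reference compounds closest to `query` in property space."""
    nearest = sorted(labelled, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical reference compounds: (property vector, class)
reference = [
    ((0.9, 0.1), "active"),
    ((0.8, 0.2), "active"),
    ((0.7, 0.3), "active"),
    ((0.1, 0.9), "inactive"),
    ((0.2, 0.8), "inactive"),
]
print(knn_vote((0.75, 0.25), reference))  # → active
```

Any misassigned neighbour can flip the vote, which is one concrete form of the "classification errors" drawback listed on the next slide.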
• Better insight into similarity in terms of bioactivity, toxicity, reactivity, ... (+)
• Searching time (+)
• Prediction and voting possibilities (+)
• Cost of simulation tools (-)
• Classification errors (-)
• Materials Explorer
• Itemtracker - Freezer/Cryogen sample tracking system
• CHARMM
• MDynaMix
Fingerprint generation time
[Figure: fingerprint generation time (ms, 0 to 30) versus maximum path length (4 to 8), with series for 2, 3 and 4 bits]

Consider what happens with more than 1000 bits!

Data source: simulating tool indicated in the report [17]
Hit rate
[Figure: hit rate (0 to 0.18) versus selection size (0 to 2500)]

As the size of the feature selection increases, the hit rate of finding actives decreases.

Data source: simulating tool indicated in the report [17]
• Even fragment-based fingerprint searching is time-consuming
• Probabilistic models and machine learning have introduced substantial changes
• Mixing more than one type of descriptor seems efficient in terms of both time and result quality
• Experimental results are still needed
Thank you for listening
Haytham Hijazi

How to convert PDF to text with Nanonets
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Molecular similarity searching methods, seminar

  • 1. Molecular similarity searching methods in drug discovery. By: Haytham Hijazi. Advisor: Univ-Prof. Hon-Prof. Dr. Dieter Roller. A presentation in the advanced graphical engineering systems seminar 2011/2012.
  • 2. In this work, I propose a contribution to the field of cheminformatics. Cheminformatics means solving chemical problems using computational methods [1]. [1] James Rhodes, Stephen Boyer, Jeffrey Kreulen, Ying Chen, Patricia Ordonez, “Mining patents using molecular similarity search”, IBM Almaden Services Research, Pacific Symposium on Biocomputing 12:304-315 (2007).
  • 3. Agenda: the main question in this research; the principle of similarity; drug discovery as an application; research problem; molecular representations (1D, 2D…); searching the similarity; similarity coefficient calculations; the probabilistic model (BIM); the contribution (MDC); experiments, conclusions and discussion.
  • 4. “The similarity is in the eye of the beholder”: shape, colour, size, pattern.
  • 5. Question: which molecules in a database are similar to the query molecule? Applications: finding better compounds than the initial lead compound (drug discovery); predicting the properties of an unknown compound.
  • 6. Structurally similar molecules are assumed to have similar biological properties; similar biological properties → drug discovery [1]. [1] Sylvaine Roy and Laurence Lafanechère, “Chemogenomics and Chemical Genetics: A User's Introduction for Biologists, Chemists and Informaticians”, Molecular similarity, Springer Berlin, 1st edition, ISBN 978-3-642-19614-0.
  • 8. Stages: molecule representation; feature selection for search; similarity coefficient calculation and ranking.
  • 9. Historical progression: complete structure; substructure. Descriptors: 1D (physicochemical properties), 2D, 3D and 4D. Connectivity tables and graph theory! Image source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.
  • 10. SMILES (Simplified Molecular Input Line Entry System) examples: CCCC1=NN(C2=C1NC(=NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C and CC(=O)OC1=CC=CC=C1C(=O)O. Source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.
  • 11. A fingerprint is a vector encoding the presence (‘1’) or absence (‘0’) of fragment substructures in a molecule. Fingerprints are either dictionary-based or hash-based. Example dictionary (descriptor → fragment): 1 → AR, 2 → CCCCN, 3 → Me, 9 → NH2 [1] [2]. [2] Source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.
  • 12. In 3D keys, the position of each bit corresponds to a certain range of distances or angles; this is computationally complex. Source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.
  • 13. Stages: molecule representation; feature selection for search; similarity coefficient calculation and ranking.
  • 14. Structure search: exact structure search; substructure search; similarity searching (maximal common subgraph isomorphism; Tanimoto/Dice/Cosine coefficients).
  • 15. The similarity measure (coefficient) is a quantitative measure of similarity. It is used to rank the results of the query, which are ordered by decreasing similarity. Types: distance coefficients, probabilistic coefficients, correlation coefficients, association coefficients.
  • 16. Similarity coefficients for fingerprints A and B, where $a$ = bits set in A, $b$ = bits set in B, $c$ = bits set in both, and $d$ = bits set in neither:
    Association: simple matching $(c+d)/(a+b-c+d)$; Jaccard (Tanimoto) $c/(a+b-c)$ = AND/OR; cosine (Ochiai) $c/\sqrt{ab}$; Dice $2c/(a+b)$.
    Distance: Hamming $a+b-2c$; Euclidean $\sqrt{a+b-2c}$; Soergel $(a+b-2c)/(a+b-c)$.
    Other: pattern difference $(a-c)(b-c)/(a+b-c+d)^2$; size $(a-b)^2/(a+b-c+d)^2$.
    Source: Naomie Salim, “The study of probability model for compound similarity searching”, UTM Research Management Centre Project Vote – 75207, University of Malaysia, 2009.
  • 17. Assume we generate fragment-based fingerprint bits. Molecule A: 00010100010101000101010011110100; Molecule B: 00000000100101001001000011100000. Tanimoto coefficient $= c/(a+b-c)$, where $a = 13$ (bits set in A), $b = 8$ (bits set in B) and $c = 6$ (bits set in A AND B). Tanimoto $= 6/(13+8-6) = 6/15 = 0.4$.
  • 18. The model associates the relevance of a structure with its explicit features (fingerprint bits): $p_i$ = probability that bit $b_i$ appears in an active structure; $q_i$ = probability that bit $b_i$ appears in an inactive structure; $\alpha_i$ is a binary selector: $\alpha_i = 1$ if the bit occurs in the structure, otherwise $\alpha_i = 0$ and the complementary probabilities ($1-p_i$, $1-q_i$) are used; $P(A|S)$ = probability that a structure is active given $S$; $P(NA|S)$ = probability that a structure is inactive given $S$; $P(A)$ = probability of actives; $P(NA)$ = probability of inactives. Source: Naomie Salim, “The study of probability model for compound similarity searching”, UTM Research Management Centre Project Vote – 75207, University of Malaysia, 2009.
  • 20. Diagram components: molecular dynamics simulation tool; active compounds; database; physicochemical properties; classification algorithm with voting; Class 1, Class 2, …, Class n.
  • 21. Better insight into similarity in terms of bioactivity, toxicity, reactivity... (+); searching time (+); prediction and voting possibilities (+); cost of simulation tools (-); classification errors (-).
  • 22. Materials Explorer; Itemtracker (freezer/cryogen sample tracking system); CHARMM; MDynaMix.
  • 23. [Chart: fingerprint generation time (ms) vs. maximum path length (4–8); series: 2 bits, 3 bits, 4 bits.] Consider the cost when we have more than 1000 bits! Data source: simulation tool indicated in the report [17].
  • 24. [Chart: hit rate vs. selection size (0–2500).] The larger the feature selection size, the lower the hit rate of finding actives. Data source: simulation tool indicated in the report [17].
  • 25. Even fragment-based fingerprinting is time-consuming. Probabilistic models and machine learning have introduced substantial changes. Mixing more than one type of descriptor seems efficient in terms of both time and result quality. Experimental results are still needed.
  • 26. Molecular similarity searching methods in drug discovery. Haytham Hijazi. A presentation to the advanced graphical engineering systems seminar 2011/2012. Thank you for listening.
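Slide 11 distinguishes dictionary-based from hash-based fingerprints. The Python sketch below illustrates the difference under simplifying assumptions: the molecule is described by a hypothetical, already-detected set of fragment labels (real substructure detection needs a cheminformatics toolkit), the dictionary reuses the slide's example entries, and the hashed variant uses SHA-256 modulo the vector length as an arbitrary stable hash. It is not the fingerprint scheme used in the presentation's experiments.

```python
import hashlib

# Dictionary entries taken from the slide's example (bit position -> fragment label).
# Index 0 is unused so that positions match the slide's numbering.
FRAGMENT_DICTIONARY = {1: "AR", 2: "CCCCN", 3: "Me", 9: "NH2"}


def dictionary_fingerprint(fragments: set[str], length: int = 10) -> list[int]:
    """Set a bit to 1 only if the dictionary fragment assigned to it is present."""
    bits = [0] * length
    for position, fragment in FRAGMENT_DICTIONARY.items():
        if fragment in fragments:
            bits[position] = 1
    return bits


def hashed_fingerprint(fragments: set[str], length: int = 32) -> list[int]:
    """Map every fragment (no dictionary needed) to one bit via a stable hash."""
    bits = [0] * length
    for fragment in fragments:
        digest = hashlib.sha256(fragment.encode("utf-8")).digest()
        bits[int.from_bytes(digest[:4], "big") % length] = 1
    return bits


# Hypothetical molecule, given as fragments assumed to be detected beforehand.
molecule_fragments = {"AR", "Me", "NH2", "OH"}
print(dictionary_fingerprint(molecule_fragments))  # [0, 1, 0, 1, 0, 0, 0, 0, 0, 1]
print(hashed_fingerprint(molecule_fragments))
```

A fragment missing from the dictionary (here "OH") leaves no trace in the dictionary-based vector, whereas the hashed vector records it, possibly colliding with another fragment on the same bit.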
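Slide 15 says the coefficient is used to rank query results in decreasing order of similarity. A minimal sketch of that ranking step, assuming fingerprints stored as sets of on-bit positions and the Tanimoto coefficient as the measure; the molecule names and fingerprints are hypothetical.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient for fingerprints stored as sets of 'on' bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_database(query: set[int], database: dict[str, set[int]]) -> list[tuple[str, float]]:
    """Score every database molecule against the query; highest similarity first."""
    scores = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical query and database fingerprints.
query = {1, 3, 5, 8}
database = {
    "mol_2": {1, 3, 9},
    "mol_3": {2, 4, 6},
    "mol_1": {1, 3, 5, 8},
}
for name, score in rank_database(query, database):
    print(f"{name}: {score:.2f}")
```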
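The coefficients of slide 16 can be computed directly from the counts $a$, $b$, $c$ and $d$. A minimal Python sketch, assuming equal-length bit-string fingerprints with at least one bit set in each; the two five-bit fingerprints in the example are made up for illustration.

```python
import math


def counts(fp_a: str, fp_b: str) -> tuple[int, int, int, int]:
    """Return (a, b, c, d) for two equal-length bit strings."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = fp_a.count("1")                                       # bits set in A
    b = fp_b.count("1")                                       # bits set in B
    c = sum(1 for x, y in zip(fp_a, fp_b) if x == y == "1")   # bits set in both
    d = len(fp_a) - a - b + c                                 # bits set in neither
    return a, b, c, d


def coefficients(fp_a: str, fp_b: str) -> dict[str, float]:
    a, b, c, d = counts(fp_a, fp_b)
    if a == 0 or b == 0:
        raise ValueError("each fingerprint needs at least one bit set")
    n = a + b - c + d  # total number of bits
    return {
        # association coefficients
        "simple_matching": (c + d) / n,
        "tanimoto": c / (a + b - c),
        "cosine": c / math.sqrt(a * b),
        "dice": 2 * c / (a + b),
        # distance coefficients
        "hamming": a + b - 2 * c,
        "euclidean": math.sqrt(a + b - 2 * c),
        "soergel": (a + b - 2 * c) / (a + b - c),
        # other coefficients
        "pattern_difference": (a - c) * (b - c) / n ** 2,
        "size": (a - b) ** 2 / n ** 2,
    }


for name, value in coefficients("11010", "10011").items():
    print(f"{name}: {value:.3f}")
```

Note that Soergel is the complement of Tanimoto, so both rank molecules in the same (reversed) order.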
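Slide 17 computes the Tanimoto coefficient by hand, and slide 16 notes that it equals AND/OR. A short sketch of that bitwise form on the slide's two fingerprints, treating the bit strings as Python integers (an implementation choice, not something the slides prescribe):

```python
def tanimoto_and_or(fp_a: str, fp_b: str) -> float:
    """Tanimoto = popcount(A AND B) / popcount(A OR B) on bit-string fingerprints."""
    x, y = int(fp_a, 2), int(fp_b, 2)  # leading zeros do not affect the bit counts
    both = bin(x & y).count("1")       # c: bits set in both molecules
    either = bin(x | y).count("1")     # a + b - c: bits set in at least one molecule
    if either == 0:
        raise ValueError("both fingerprints are empty")
    return both / either


A = "00010100010101000101010011110100"
B = "00000000100101001001000011100000"
print(tanimoto_and_or(A, B))  # prints 0.4
```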
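Slide 18 lists the ingredients of the binary independence model (BIM), but its formula did not survive extraction. Assuming the standard BIM form, in which bits are treated as independent, Bayes' rule gives $\log\frac{P(A|S)}{P(NA|S)} = \log\frac{P(A)}{P(NA)} + \sum_i \left[\alpha_i \log\frac{p_i}{q_i} + (1-\alpha_i)\log\frac{1-p_i}{1-q_i}\right]$. The sketch below scores structures with that log-odds. Estimating $p_i$ and $q_i$ from training actives and inactives with +0.5/+1 smoothing, and the toy four-bit training data, are assumptions for illustration, not the estimator used in the cited study.

```python
import math


def estimate(fingerprints: list[list[int]]) -> list[float]:
    """Per-bit probability that the bit is set, with +0.5/+1 smoothing (assumed)."""
    n = len(fingerprints)
    length = len(fingerprints[0])
    return [(sum(fp[i] for fp in fingerprints) + 0.5) / (n + 1) for i in range(length)]


def log_odds(structure: list[int], p: list[float], q: list[float], prior_active: float) -> float:
    """log P(A|S)/P(NA|S) under the bit-independence assumption."""
    score = math.log(prior_active / (1 - prior_active))  # log P(A)/P(NA)
    for alpha, p_i, q_i in zip(structure, p, q):
        if alpha:  # bit present: alpha_i = 1
            score += math.log(p_i / q_i)
        else:      # bit absent: complementary probabilities
            score += math.log((1 - p_i) / (1 - q_i))
    return score


# Hypothetical training fingerprints.
actives = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0]]
inactives = [[0, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1], [1, 0, 0, 1]]
p = estimate(actives)
q = estimate(inactives)
prior = len(actives) / (len(actives) + len(inactives))

for candidate in ([1, 1, 0, 0], [0, 0, 1, 1]):
    print(candidate, round(log_odds(candidate, p, q, prior), 3))
```

A positive score means the structure is judged more likely active than inactive; ranking a database by this score replaces a similarity coefficient with a probability estimate.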
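Slide 20 shows simulated physicochemical properties feeding a classification algorithm with voting. The slides do not name the classifier, so the sketch below uses k-nearest-neighbour majority voting purely as an example; the two-dimensional property vectors and class labels are hypothetical stand-ins for simulation output.

```python
import math
from collections import Counter


def knn_vote(query: tuple[float, ...],
             labelled: list[tuple[tuple[float, ...], str]],
             k: int = 3) -> str:
    """Assign the class that wins the majority vote among the k nearest compounds."""
    distances = sorted((math.dist(query, props), label) for props, label in labelled)
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical property vectors of active compounds with known classes.
labelled = [
    ((0.1, 0.2), "class_1"),
    ((0.2, 0.1), "class_1"),
    ((0.9, 0.8), "class_2"),
    ((0.8, 0.9), "class_2"),
    ((0.85, 0.75), "class_2"),
]

# Hypothetical database compounds to classify.
for compound in [(0.15, 0.2), (0.8, 0.8)]:
    print(compound, knn_vote(compound, labelled))
```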
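Slide 23 reports fingerprint generation time against maximum path length for 2, 3 and 4 bits. Assuming a path-based hashed fingerprint in which every linear path up to the maximum length sets a fixed number of bits (the slides do not specify the scheme), the sketch below shows where the cost comes from: the number of enumerated paths grows with the path length, and each path is hashed once per bit. The molecular graph is a hypothetical six-atom example, and the code does not reproduce the measured times.

```python
import hashlib

# Hypothetical heavy-atom graph (adjacency lists) and atom labels.
GRAPH = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1, 5], 5: [4]}
LABELS = {0: "C", 1: "C", 2: "C", 3: "O", 4: "C", 5: "N"}


def enumerate_paths(graph, labels, max_length):
    """Return every simple path with 0..max_length bonds as a canonical label string."""
    paths = set()

    def extend(path):
        seq = [labels[i] for i in path]
        forward, backward = "-".join(seq), "-".join(reversed(seq))
        paths.add(min(forward, backward))  # a path and its reverse count once
        if len(path) - 1 == max_length:
            return
        for neighbour in graph[path[-1]]:
            if neighbour not in path:
                extend(path + [neighbour])

    for atom in graph:
        extend([atom])
    return paths


def path_fingerprint(paths, length=1024, bits_per_path=2):
    """Hash each path to `bits_per_path` positions in a `length`-bit vector."""
    bits = [0] * length
    for path in paths:
        for k in range(bits_per_path):
            digest = hashlib.sha256(f"{k}:{path}".encode("utf-8")).digest()
            bits[int.from_bytes(digest[:4], "big") % length] = 1
    return bits


for max_length in range(1, 5):
    found = enumerate_paths(GRAPH, LABELS, max_length)
    for bits_per_path in (2, 3, 4):
        fp = path_fingerprint(found, bits_per_path=bits_per_path)
        print(f"max length {max_length}, {bits_per_path} bits/path: "
              f"{len(found)} paths, {sum(fp)} bits set")
```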
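The slides do not define the hit rate plotted on slide 24. A common definition in similarity searching is the fraction of selected compounds that are active; the sketch below assumes that definition, with a hypothetical ranked list and set of known actives.

```python
def hit_rate(selected: list[str], actives: set[str]) -> float:
    """Fraction of the selected compounds that are known actives."""
    if not selected:
        raise ValueError("selection is empty")
    return sum(1 for compound in selected if compound in actives) / len(selected)


# Hypothetical ranked search result and known actives.
ranked = ["m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8"]
actives = {"m1", "m3", "m4"}

for size in (2, 4, 8):
    print(size, hit_rate(ranked[:size], actives))
```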

Editor's Notes

  2. Each bit in the fingerprint represents one molecular fragment