Structure-Activity Relationships and Networks: A Generalized Approach to Exploring Structure-Activity Landscapes

Rajarshi Guha
NIH Chemical Genomics Center / NIH Center for Translational Therapeutics
March 29, 2011

NIH Chemical Genomics Center
•  Founded 2004 as part of NIH Roadmap Molecular Libraries Initiative
    –  NCGC staffed with 90+ scientists: biologists, chemists, informaticians, engineers
    –  Post-doc program
•  Mission
    –  MLPCN (screening & chemical synthesis; compound repository; PubChem database; funding for assay, library and technology development)
        •  Complements individual investigator-initiated research programs
        •  Enables “pharma-level” HTS and early chemical optimization
    –  Develop new chemical probes for basic research and leads for therapeutic development, particularly for rare/neglected diseases
    –  New paradigms & applications of HTS for chemical biology / chemical genomics
•  All NCGC projects are collaborations with a target or disease expert; currently >200 collaborations with investigators worldwide
    –  75% NIH extramural, 10% NIH intramural, 15% Foundations/Research Consortia/Pharma/Biotech

NCGC Project Diversity
[Figure: (A) Disease areas; (B) Target types; (C) Detection methods]

qHTS: High Throughput Dose Response
A  Assay concentration ranges over 4 logs (high: ~100 μM); 1536-well plates, inter-plate dilution series; assay volumes 2–5 μL
B  Automated concentration-response data collection; ~1 CRC/sec
C  Informatics pipeline: automated curve fitting and classification; 300K samples

Background
•  Cheminformatics methods
    –  QSAR, diversity analysis, virtual screening, fragments, polypharmacology, networks
•  More recently
    –  RNAi screening, high content imaging
•  Extensive use of machine learning
•  All tied together with software development
    –  User-facing GUI tools
    –  Low level programmatic libraries
•  Believer & practitioner of Open Source

Outline
•  Structure-activity relationships
•  Characterizing activity cliffs
•  Working with the structure-activity landscape

Structure Activity Relationships
•  Similar molecules will have similar activities
•  Small changes in structure will lead to small changes in activity
•  One implication is that SARs are additive
•  This is the basis for QSAR modeling

Martin, Y.C. et al., J. Med. Chem., 2002, 45, 4350–4358

Exceptions Are Easy to Find
[Figure: four closely related analogs, with Ki = 39.0 nM, 1.8 nM, 10.0 nM and 1.0 nM]

Tran, J.A. et al., Bioorg. Med. Chem. Lett., 2007, 15, 5166–5176

Structure Activity Landscapes
•  Rugged gorges or rolling hills?
    –  Small structural changes associated with large activity changes represent steep slopes in the landscape
    –  But traditionally, QSAR assumes gentle slopes
    –  Machine learning is not very good for special cases

Maggiora, G.M., J. Chem. Inf. Model., 2006, 46, 1535–1535

Structure Activity Landscapes

Characterizing the Landscape
•  A cliff can be numerically characterized
•  Structure Activity Landscape Index (SALI)

$$\mathrm{SALI}_{i,j} = \frac{|A_i - A_j|}{1 - \mathrm{sim}(i,j)}$$

•  Cliffs are characterized by elements of the matrix with very large values

Guha, R.; Van Drie, J.H., J. Chem. Inf. Model., 2008, 48, 646–658
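
The SALI definition translates directly into a pairwise computation over all compounds. The following is a minimal sketch, not the authors' implementation: the function name `sali_matrix`, the choice to pass in a precomputed similarity matrix, the handling of pairs with sim(i, j) = 1, and the example numbers are all assumptions made for illustration.

```python
from itertools import combinations


def sali_matrix(activities, similarity):
    """Return the symmetric SALI matrix for a set of compounds.

    activities -- list of activity values A_i (e.g. on a log scale)
    similarity -- square matrix with similarity[i][j] = sim(i, j) in [0, 1]
    """
    n = len(activities)
    sali = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        diff = abs(activities[i] - activities[j])
        dissimilarity = 1.0 - similarity[i][j]
        if dissimilarity > 0.0:
            value = diff / dissimilarity
        else:
            # Assumption: sim == 1 makes the ratio undefined, so any
            # activity difference is treated as an infinitely steep cliff.
            value = float("inf") if diff > 0.0 else 0.0
        sali[i][j] = sali[j][i] = value
    return sali


# Hypothetical data: three compounds with activities on a log scale.
activities = [9.0, 7.4, 8.7]
similarity = [
    [1.0, 0.9, 0.5],
    [0.9, 1.0, 0.6],
    [0.5, 0.6, 1.0],
]
for row in sali_matrix(activities, similarity):
    print(["%.2f" % v for v in row])
```

In this toy data the pair (0, 1) combines high similarity with a large activity difference, so it receives the largest SALI value and would be the first candidate for an activity cliff.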
  
Visualizing the SALI Matrix

Fingerprints

    1 0 1 1 0 0 0 1 0

•  Lots of types of fingerprints
•  Each bit indicates the presence or absence of a structural feature
•  Length can vary from 166 to 4096 bits or more
•  Fingerprints are usually compared using the Tanimoto metric
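
For binary fingerprints, the Tanimoto metric is the number of bits set in both fingerprints divided by the number of bits set in at least one. A minimal sketch follows; the function name `tanimoto`, the second fingerprint, and the convention for two all-zero fingerprints are assumptions for illustration.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two equal-length binary fingerprints."""
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    # both: bits set in both fingerprints; either: bits set in at least one.
    both = sum(1 for x, y in zip(bits_a, bits_b) if x and y)
    either = sum(1 for x, y in zip(bits_a, bits_b) if x or y)
    # Assumption: two all-zero fingerprints are treated as identical.
    return both / either if either else 1.0


fp_slide = [1, 0, 1, 1, 0, 0, 0, 1, 0]  # bit string shown on the slide
fp_other = [1, 0, 1, 0, 0, 0, 0, 1, 1]  # hypothetical second molecule
print(tanimoto(fp_slide, fp_other))      # 3 shared bits / 5 set in either = 0.6
```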
  
Varying Fingerprint Methods
[Figure: density of pairwise Tanimoto similarity for three fingerprints: BCI 1052 bit, MACCS 166 bit, CDK 1024 bit]
•  Shorter fingerprints will lead to more “similar” pairs
•  Requires a higher cutoff to focus on significant cliffs

Varying the Similarity Metric

Different Activity Representations
•  Using the Hill parameters from a dose-response curve represents richer data than a single IC50

[Figure: dose-response curve, Activity vs. Concentration, with S0, Sinf, the 50% level and AC50 marked]

$$P = (S_0,\ S_{\mathrm{inf}},\ AC_{50},\ H), \qquad \mathrm{SALI}_{i,j} = \frac{d(P_i, P_j)}{1 - \mathrm{sim}(i,j)}$$
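
To make the curve-based variant concrete, here is a minimal sketch under stated assumptions. The slides do not specify the distance d, so a Euclidean distance is assumed; AC50 is taken on a log10 scale so that potency differences are comparable with the other parameters, which is also an assumption. The four-parameter Hill equation is the standard form, and all names and numbers are hypothetical.

```python
import math


def hill_response(conc, s0, s_inf, ac50, h):
    """Four-parameter Hill curve: response at concentration `conc`."""
    return s0 + (s_inf - s0) / (1.0 + (ac50 / conc) ** h)


def curve_distance(p_i, p_j):
    """Assumed d(P_i, P_j): Euclidean distance between parameter vectors.

    Each vector is (S0, Sinf, log10 AC50, H).
    """
    return math.dist(p_i, p_j)


def sali_from_curves(p_i, p_j, sim):
    """SALI for one pair, using curve parameters instead of a single IC50."""
    if sim >= 1.0:
        raise ValueError("SALI is undefined for sim(i, j) = 1")
    return curve_distance(p_i, p_j) / (1.0 - sim)


# Hypothetical curves: same efficacy and slope, 100-fold potency difference.
p_a = (0.0, -100.0, math.log10(1e-6), 1.0)
p_b = (0.0, -100.0, math.log10(1e-8), 1.0)
print(sali_from_curves(p_a, p_b, sim=0.8))
```

Because the parameters live on different scales, some normalization of each component would likely be needed before the distance is meaningful on real data.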
Visualizing SALI Values
•  Alternatives?
    –  A heatmap is an easy-to-understand visualization
    –  Coupled with brushing, can be a handy tool
    –  A more flexible approach is to consider a network view of the matrix
•  The SALI graph
    –  Compounds are nodes
    –  Nodes i,j are connected if SALI(i,j) > X
    –  Only display connected nodes
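
The three rules for the SALI graph can be sketched in a few lines. This is an illustrative sketch only: the function name `sali_graph`, the adjacency-dict representation, and the example matrix are assumptions, and edges are kept undirected because the rules as stated give no direction.

```python
def sali_graph(sali, cutoff):
    """Build an undirected graph with an edge i-j wherever SALI(i, j) > cutoff.

    Returns a dict mapping each connected node to the set of its neighbours;
    compounds with no edge above the cutoff are simply absent.
    """
    graph = {}
    n = len(sali)
    for i in range(n):
        for j in range(i + 1, n):
            if sali[i][j] > cutoff:
                graph.setdefault(i, set()).add(j)
                graph.setdefault(j, set()).add(i)
    return graph


# Hypothetical 4-compound SALI matrix.
sali = [
    [0.0, 16.0, 0.6, 0.2],
    [16.0, 0.0, 3.25, 0.1],
    [0.6, 3.25, 0.0, 0.4],
    [0.2, 0.1, 0.4, 0.0],
]
print(sali_graph(sali, cutoff=3.0))  # compound 3 has no edge, so it is dropped
```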
  
Visualizing SALI Values
[Figure: example SALI graph of numbered compound nodes]

Varying the Cutoff
•  The cutoff controls the complexity of the graph
•  Higher cutoffs will highlight the most significant activity cliffs
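
The effect of the cutoff on graph complexity can be checked numerically. The slides express the cutoff as a percentage without defining it; the sketch below assumes it means a percentile of the finite off-diagonal SALI values (nearest-rank method). That interpretation, the function names, and the example matrix are all assumptions.

```python
import math


def cutoff_from_percent(sali, percent):
    """Map a percentage to a SALI threshold (nearest-rank percentile).

    Assumption: "Cutoff = p%" is read as the p-th percentile of the finite
    off-diagonal SALI values.
    """
    n = len(sali)
    values = sorted(
        sali[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if math.isfinite(sali[i][j])
    )
    if not values:
        raise ValueError("no finite SALI values")
    rank = min(len(values), max(1, math.ceil(percent / 100.0 * len(values))))
    return values[rank - 1]


def edge_count(sali, percent):
    """Number of pairs whose SALI value exceeds the cutoff for `percent`."""
    x = cutoff_from_percent(sali, percent)
    n = len(sali)
    return sum(1 for i in range(n) for j in range(i + 1, n) if sali[i][j] > x)


# Hypothetical SALI matrix whose six pairwise values are 1 through 6.
sali = [
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 0.0, 4.0, 5.0],
    [2.0, 4.0, 0.0, 6.0],
    [3.0, 5.0, 6.0, 0.0],
]
for percent in (90, 50, 20):
    print(percent, edge_count(sali, percent))
```

Under this reading, lowering the percentage can only add edges, which matches the observation that the cutoff controls the complexity of the graph.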
  


[Figure: SALI graphs at Cutoff = 90%, Cutoff = 50% and Cutoff = 20%]
                                                                                                                                                                                                                                                                                          40




                  !   4                                                                                                          !   4
                                                                                                                                                                                                                                                                                                       !
                                                                                                                                                                                                                                                                                                       66
Better Visualization - SALIViewer

http://sali.rguha.net
  
What Can We Do With SALIs?

•  SALI characterizes cliffs & non-cliffs
•  For a given molecular representation, SALI values give us an idea of the smoothness of the SAR landscape
•  Models try to encode this landscape
•  Use the landscape to guide descriptor or model selection
  
Descriptor Space Smoothness

[Figure: SALI graph views with drugs as nodes (e.g. astemizole, cisapride, terfenadine, gatifloxacin), shown next to a plot of Number of Edges in SALI Graph (0 to 400) against SALI Cutoff (0.0 to 1.0)]

•  Edge count of the SALI graph for varying cutoffs
•  Measures smoothness of the descriptor space
•  Can reduce this to a single number (AUC)
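A minimal Python sketch of the edge-count curve and its AUC. It assumes the usual SALI definition, |A_i - A_j| / (1 - sim_ij), and that the cutoff is a fraction of the observed SALI range. The function names and the toy data are illustrative and are not taken from SALIViewer.

```python
from itertools import combinations


def sali_values(activities, similarity):
    """Pairwise SALI = |A_i - A_j| / (1 - sim_ij).

    Pairs with similarity 1 are skipped because the index is undefined there.
    """
    values = {}
    for i, j in combinations(range(len(activities)), 2):
        sim = similarity[i][j]
        if sim < 1.0:
            values[(i, j)] = abs(activities[i] - activities[j]) / (1.0 - sim)
    return values


def edge_counts(values, cutoffs):
    """Number of SALI graph edges kept at each cutoff.

    Assumption: a cutoff c in [0, 1] keeps pairs whose SALI is at least
    min + c * (max - min).
    """
    lo, hi = min(values.values()), max(values.values())
    counts = []
    for c in cutoffs:
        threshold = lo + c * (hi - lo)
        counts.append(sum(1 for v in values.values() if v >= threshold))
    return counts


def auc(xs, ys):
    """Trapezoidal area under the curve defined by the points (xs, ys)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


# Toy data: four molecules with made-up activities and similarities.
activities = [5.0, 6.0, 8.0, 5.5]
similarity = [[1.0, 0.5, 0.75, 0.5],
              [0.5, 1.0, 0.5, 0.75],
              [0.75, 0.5, 1.0, 0.5],
              [0.5, 0.75, 0.5, 1.0]]
cutoffs = [0.0, 0.25, 0.5, 0.75, 1.0]

values = sali_values(activities, similarity)
counts = edge_counts(values, cutoffs)
print(counts, auc(cutoffs, counts))
```

A curve that drops quickly gives a small AUC, and a curve that stays high gives a large one, so the single number summarizes the shape of the edge-count plot.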
  
Other Examples

•  Instead of fingerprints, we use molecular descriptors
•  SALI denominator now uses Euclidean distance
•  2D & 3D random descriptor sets
    –  None are really good
    –  Too rough, or
    –  Too flat

[Figure: two panels, labeled 2D and 3D, each plotting Number of Edges in SALI Graph (0 to 400) against SALI Cutoff (0.0 to 1.0)]
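A short sketch of the descriptor-based variant, assuming the Euclidean distance between descriptor vectors replaces (1 - similarity) in the SALI denominator without further normalization. The slide does not say whether descriptors are scaled first, so any scaling is left to the caller, and the function names are illustrative.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def sali_descriptor(activities, descriptors):
    """Pairwise SALI with a Euclidean distance denominator.

    Pairs at distance 0 are skipped because the index is undefined there.
    """
    values = {}
    for i, j in combinations(range(len(activities)), 2):
        d = euclidean(descriptors[i], descriptors[j])
        if d > 0.0:
            values[(i, j)] = abs(activities[i] - activities[j]) / d
    return values


# Toy data: molecules 0 and 2 share a descriptor vector, so that pair is skipped.
print(sali_descriptor([1.0, 6.0, 2.0], [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]))
```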
Feature Selection Using SALI

•  Surprisingly, exhaustive search of 66,000 4-descriptor combinations did not yield semi-smoothly decreasing curves
•  Not entirely clear what type of curve is desirable
  
SALI Graphs & Predictive Models

•  The graph view allows us to view SARs and identify trends easily
•  The aim of a QSAR model is to encode SARs
•  Traditionally, we consider the quality of a model in terms of RMSE or R²
•  But in general, we're not as interested in RMSEs as we are in whether the model predicted something as more active than something else
    –  What we want to have is the correct ordering
    –  We assume the model is statistically significant
  
Measuring Model Quality

•  A QSAR model should easily encode the “rolling hills”
•  A good model captures the most significant cliffs
•  Can be formalized as

     How many of the edge orderings of a SALI graph does the model predict correctly?

•  Define S(X), representing the number of edges correctly predicted for a SALI network at a threshold X
•  Repeat for varying X and obtain the SALI curve
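A minimal sketch of how a SALI curve could be computed. The slide defines S(X) as a count of correctly predicted edges, while the SALI curve plots run from -1 to 1, so this sketch assumes a normalized score: +1 for each edge whose ordering the model gets right, -1 otherwise, divided by the number of edges at that threshold. The cutoff is assumed to be a fraction of the observed SALI range, and all names are illustrative.

```python
def sali_curve(observed, predicted, sali, cutoffs):
    """Return S(X) for each cutoff X in [0, 1].

    observed, predicted: activities indexed by molecule.
    sali: dict mapping a molecule pair (i, j) to its SALI value.
    """
    lo, hi = min(sali.values()), max(sali.values())
    curve = []
    for x in cutoffs:
        threshold = lo + x * (hi - lo)
        edges = [pair for pair, v in sali.items() if v >= threshold]
        if not edges:
            # Only reachable through floating-point rounding at X = 1:
            # fall back to the single steepest pair.
            edges = [max(sali, key=sali.get)]
        score = 0
        for i, j in edges:
            obs_diff = observed[i] - observed[j]
            pred_diff = predicted[i] - predicted[j]
            # Same sign means the model ranks the pair correctly;
            # ties count as incorrect in this sketch.
            score += 1 if obs_diff * pred_diff > 0 else -1
        curve.append(score / len(edges))
    return curve


# Toy data: the model misorders only the pair (1, 3).
observed = [5.0, 6.0, 8.0, 5.5]
predicted = [5.1, 5.9, 7.0, 6.2]
sali = {(0, 1): 2.0, (0, 2): 12.0, (0, 3): 1.0,
        (1, 2): 4.0, (1, 3): 2.0, (2, 3): 5.0}
print(sali_curve(observed, predicted, sali, [0.0, 0.25, 1.0]))
```

Under this scoring a model that orders every edge correctly gives S(X) = 1 at every threshold, and one that reverses every edge gives -1.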
  
SALI Curves

[Figure: two panels plotting S(X) (-1.0 to 1.0) against X (0.0 to 1.0). Left: curves for the 3-descriptor, 5-descriptor, and scrambled 3-descriptor models. Right: a curve annotated SCI = 0.12]
Model Search Using the SCI

•  We’ve used the SALI to retrospectively analyze models
•  Can we use SALI to develop models?
    –  Identify a model that captures the cliffs
•  Tricky
    –  Cliffs are fundamentally outliers
    –  Optimizing for good SALI values implies overfitting
    –  Need to trade off between SALI & generalizability
  
The Objective Function

•  S0 is a measure of the model's ability to summarize the dataset (analogous to RMSE)
•  S100 measures the model's ability to capture cliffs
•  Ideally, the curve starts high and stays high

[Figure: S(X) (0.6 to 1.0) against SALI Cutoff (0.0 to 1.0), with S0 and S100 marked on the curve]

$$F = \frac{1}{S_{100}} \qquad F = \frac{1}{S_0} + \frac{S_{100} - S_0}{2} \qquad F = \frac{1}{\mathrm{SCI}}$$
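The three candidate forms of F on this slide can be evaluated from a SALI curve as in the sketch below. It assumes S0 and S100 are the curve values at cutoffs 0 and 1, and that SCI is the trapezoidal area under S(X) over [0, 1]. The slide does not say how F is used by the search, so the code only evaluates the expressions, and the names are illustrative.

```python
import math


def trapezoid(xs, ys):
    """Trapezoidal area under the curve defined by the points (xs, ys)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def inverse(x):
    """1/x, with infinity standing in for the undefined case x == 0."""
    return math.inf if x == 0 else 1.0 / x


def objectives(cutoffs, curve):
    """Evaluate the three candidate objective functions for one SALI curve."""
    s0, s100 = curve[0], curve[-1]   # assumed: S(X) at the first and last cutoff
    sci = trapezoid(cutoffs, curve)  # assumed definition of the SCI
    return {
        "1/S100": inverse(s100),
        "1/S0 + D/2": inverse(s0) + (s100 - s0) / 2.0,  # D = S100 - S0
        "1/SCI": inverse(sci),
    }


# Toy curve that rises from 0.5 at cutoff 0 to 1.0 at cutoff 1.
print(objectives([0.0, 0.5, 1.0], [0.5, 0.75, 1.0]))
```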
SALI Based Model Selection

•  Considered the BZR dataset from Sutherland et al
•  Identified “best” models using a GA to select from a pool of 2D descriptors
•  While SALI based optimization can lead to a “better” curve, it doesn’t give the best model

[Figure: two panels plotting S(X) (-0.5 to 0.5) against SALI Cutoff for models selected with the RMSE, SCI, and S(100) objective functions. One panel covers cutoffs 0.0 to 1.0, the other 0.00 to 0.08]

Sutherland, J. et al, J. Chem. Inf. Comput. Sci., 2003, 43, 1906-1915
  
SALI Based Model Selection

•  107 aryl azoles as ER-β agonists
•  Used a GA and 2D descriptors to identify models
•  In this case, a SALI based objective function was able to identify the best model
•  Interestingly, SCI does not seem to perform very well

[Figure: two panels plotting S(X) (-0.5 to 0.5) against SALI Cutoff for models selected with the RMSE, SCI, and S(0) + D/2 objective functions. One panel covers cutoffs 0.0 to 1.0, the other 0.00 to 0.08]

Malamas, M.S. et al, J Med Chem, 2004, 47, 5021-5040
  
SALI Based Model Selection

•  The size of the solution space explored depends on the SALI objective function

[Figure: two panels of RMSE by Objective Function. BZR: RMSE axis 0.90 to 1.15, objective functions RMSE, S(100), SCI. ER-β: RMSE axis 0.55 to 0.65, objective functions 1/S(0) + D/2, RMSE, SCI]
Predicting the Landscape

•  Rather than predicting activity directly, we can try to predict the SAR landscape
•  Implies that we attempt to directly predict cliffs
    –  Observations are now pairs of molecules
•  A more complex problem
    –  Choice of features is trickier
    –  Still face the problem of cliffs as outliers
    –  Somewhat similar to predicting activity differences

Scheiber et al, Statistical Analysis and Data Mining, 2009, 2, 115-122
  
Predicting Cliffs

•  Dependent variables are pairwise SALI values, calculated using fingerprints
•  Independent variables are molecular descriptors, but considered pairwise
    –  Absolute difference of descriptor pairs, or
    –  Geometric mean of descriptor pairs
    –  …
•  Develop a model to correlate pairwise descriptors to pairwise SALI values
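The two pairwise combinations named on this slide can be sketched as below. The geometric mean is taken as sqrt(x * y), which assumes non-negative descriptor values. The function name and the `mode` argument are illustrative.

```python
import math
from itertools import combinations


def pairwise_features(descriptors, mode="absdiff"):
    """Build one feature row per unordered molecule pair.

    descriptors: one descriptor vector per molecule.
    mode: "absdiff" for |x - y|, "geomean" for sqrt(x * y).
    """
    rows = {}
    for i, j in combinations(range(len(descriptors)), 2):
        a, b = descriptors[i], descriptors[j]
        if mode == "absdiff":
            rows[(i, j)] = [abs(x - y) for x, y in zip(a, b)]
        elif mode == "geomean":
            # Assumes x * y >= 0; math.sqrt raises ValueError otherwise.
            rows[(i, j)] = [math.sqrt(x * y) for x, y in zip(a, b)]
        else:
            raise ValueError("mode must be 'absdiff' or 'geomean'")
    return rows


# Toy data: three molecules, two descriptors each.
descriptors = [[1.0, 4.0], [4.0, 9.0], [9.0, 1.0]]
print(pairwise_features(descriptors, "absdiff"))
print(pairwise_features(descriptors, "geomean"))
```

Each row lines up with the SALI value of the same pair, so the rows and the pairwise SALI values together form the training set for the pairwise model.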
  
A Test Case

•  We first consider the Cavalli CoMFA dataset of 30 molecules with pIC50s
•  Evaluate topological and physicochemical descriptors
•  Developed random forest models
    –  On the original observed values (30 obs)
    –  On the SALI values (435 observations)
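The 435 observations match the number of unordered pairs among 30 molecules, assuming every pair contributes one SALI value:

```latex
\binom{30}{2} = \frac{30 \cdot 29}{2} = 435
```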
  


Cavalli, A. et al, J Med Chem, 2002, 45, 3844-3853
  
Double Counting Structures?

•  The dependent and independent variables both encode structure.
•  But pretty low correlations between individual pairwise descriptors and the SALI values

[Figure: histograms of Percent of Total (0 to 60) against R² (0.00 to 0.15), in two panels: GeoMean and AbsDiff]
Model Summaries

[Figure: three scatter plots of predicted values. Original pIC50: RMSE = 0.97 (y-axis Predicted pIC50). SALI, AbsDiff: RMSE = 1.10 (y-axis Predicted SALI). SALI, GeoMean: RMSE = 1.04 (y-axis Predicted SALI)]
                                                                                                                     ! ! !!!!!!! ! ! ! ! !
                                                                                                                                !            !      !                                        !!      !
                                                                                                                                                                                                    !!
                                                                                                                                                                                                          ! !! !
                                                                                                                                                                                                       ! ! ! ! ! !! ! !
                  6                       !                                                                               ! ! !! ! ! ! !
                                                                                                                              ! ! !!      !                                                               ! ! !!
                                                                                                                                                                                      ! ! !!!! ! !!!!!!!! ! ! ! ! ! !
                                                                                                                                                                                                   ! ! !!! !!
                                     !                   !                                                  !      ! ! ! !!! ! ! ! ! ! !!!! ! ! !
                                                                                                                    !! ! !!! !!
                                                                                                                          !       !                                                         ! !!! ! !!!!! !
                                                                                                                                                                                                !
                                                                                                                                                                                                ! ! ! !! !!
                                    ! !
                                                                                                           ! ! !! ! !!!!! ! !!!! !
                                                                                                            ! ! ! ! !! !! ! !! ! !
                                                                                                                   ! ! !   !    !     !       !!                                        ! ! !! !!!!!! !!!!! !!
                                                                                                                                                                                                     ! ! !! ! !
                                                                                                                                                                                                     !
                                                                 !                                            !     !
                                                                                                          !! ! ! !!! !!!! !!!! !!! ! ! !
                                                                                                                                !!
                                                                                                              ! !!!!! !! ! ! ! ! ! !   !                                                 !      ! ! !! ! !
                                                                                                                                                                                                 ! ! !
                                                                                                                                                                                      ! !! !! ! !! !! !! !! ! !!
                                                                                                                                                                                         ! !    !! ! !!
                        !             !                                                                    !!! !!!!!!!!!! !! ! ! !!
                                                                                                                 !! !!!! ! ! ! !
                                                                                                                   !! !! ! ! !                                                       !!!!!!!!!!!! ! !! ! !
                                                                                                                                                                                          !
                                                                                                                                                                                     !! !!!!!!!!! !!!!! !!
                                                                                                                                                                                       ! ! ! !! ! !
                                        !                                                             2   !!!!!!! ! !!
                                                                                                           ! ! ! ! !
                                                                                                               ! !
                                                                                                           ! !! !! ! !
                                                                                                          ! !!!! !!!! ! !!
                                                                                                                        !
                                                                                                           ! !!!! ! ! ! ! !!
                                                                                                          ! !!!!!!! !!! !!                                                       2               !
                                                                                                                                                                                       ! ! !!!!!!! !!! !
                                                                                                                                                                                       ! !!!!!! ! ! ! ! !
                                                                                                                                                                                     ! ! ! ! !!!! ! !! !
                                                                                                                                                                                               ! ! !            !
                                                                                                          !!!!!!!!!!!! !! ! !
                                                                                                           ! !! ! !!!
                                                                                                                  ! !
                                                                                                           ! !! !!!!! ! !! ! ! !
                                                                                                                                                                                      ! !
                                                                                                                                                                                     ! !!!!!!!
                                                                                                                                                                                     ! !!! !
                                                                                                                                                                                      !!! !! ! ! ! ! !
                                                                                                                                                                                     ! ! ! ! !!! ! ! ! !
                  5                                                                                       ! ! !
                                                                                                           !     !
                                                                                                            ! ! !!!!! ! !
                                                                                                          ! !! ! !                 !                                                 !!! !!! !!!!! !
                                                                                                                                                                                     !!! !!! !!!! ! !
                                                                                                                                                                                      !! ! ! ! !
                                                                                                                                                                                         ! !
                                                                                                                                                                                         !! !
                            !                                                                             ! ! ! !
                                                                                                              !
                                                                                                             ! ! !! ! ! !
                                                                                                             !                                                                       ! ! ! !!!! ! !
                                                                                                                                                                                      ! ! !! !!
                                                                                                                                                                                         ! !! !
                                                                                                                                                                                            !!          !
                        !                                                                                     ! !       !                                                             !
                                !                                                                         !!!                                                                         !
                                                                                                                                                                                     !! !
                                                                                                                                                                                       !!
                                                                                                                                                                                       !!
                  4
                                                                                                      0                                                                          0


                        4           5        6               7           8       9                        0            2               4           6                                 0           2           4            6

                                    Observed pIC50                                                                    Observed SALI                                                             Observed SALI



                      •  All models explain a similar % of variance of their respective datasets
                      •  Using the geometric mean as the descriptor aggregation function seems to perform best
                      •  SALI models are more robust due to the larger size of the dataset
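
A minimal sketch of the geometric-mean aggregation named in the bullets above. It assumes each molecule has a vector of non-negative descriptor values and that the pair-level descriptor is formed element by element; the function names `geometric_mean_pair` and `pair_descriptors` are illustrative and do not come from the talk.

```python
import math
from itertools import combinations


def geometric_mean_pair(x, y):
    """Combine two per-molecule descriptor vectors into one pair-level vector.

    Assumption: descriptor values are non-negative, since the geometric
    mean sqrt(a * b) is not meaningful for negative inputs.
    """
    if len(x) != len(y):
        raise ValueError("descriptor vectors must have the same length")
    if any(v < 0 for v in list(x) + list(y)):
        raise ValueError("geometric mean requires non-negative descriptors")
    return [math.sqrt(a * b) for a, b in zip(x, y)]


def pair_descriptors(descriptors):
    """Build aggregated descriptors for every unordered pair of molecules.

    `descriptors` is a list of per-molecule descriptor vectors; the result
    maps index pairs (i, j) with i < j to the aggregated vector.
    """
    return {
        (i, j): geometric_mean_pair(descriptors[i], descriptors[j])
        for i, j in combinations(range(len(descriptors)), 2)
    }
```

Because SALI is defined on pairs, N molecules give N(N-1)/2 rows for the SALI model, which is consistent with the larger dataset size noted in the last bullet.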
  
Test Case 2

             •  Considered the Holloway docking dataset, 32 molecules with pIC50’s and Einter
             •  Similar strategy as before
             •  Need to transform SALI values
             •  Descriptors show minimal correlation

[Figure: two histograms of the pairwise SALI values; y-axis: Percent of Total; x-axes: SALI (left), log10 (SALI) (right)]

Holloway, M.K. et al, J Med Chem, 1995, 38, 305-317
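
The transformation step can be sketched as follows. This assumes the usual SALI definition, |A_i - A_j| / (1 - sim(i, j)), and the log10 transform suggested by the histogram axis label; the function names and the choice to drop zero or infinite values before taking logs are illustrative assumptions, not the author's stated procedure.

```python
import math


def sali(act_i, act_j, sim_ij):
    """Structure-Activity Landscape Index for one pair of molecules.

    act_i, act_j: activities (e.g. pIC50); sim_ij: similarity in [0, 1].
    Pairs with similarity 1 have an undefined ratio; this sketch returns inf.
    """
    if sim_ij >= 1.0:
        return math.inf
    return abs(act_i - act_j) / (1.0 - sim_ij)


def log10_sali(values):
    """Log-transform SALI values, skipping those where log10 is undefined.

    Assumption: zero SALI (equal activities) and infinite SALI (identical
    fingerprints) are dropped rather than imputed.
    """
    return [math.log10(v) for v in values if v > 0 and math.isfinite(v)]
```

A raw SALI distribution is typically strongly right-skewed, so the log scale gives a response variable that is easier to model.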

AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 

Structure-Activity Relationships and Networks: A Generalized Approach to Exploring Structure-Activity Landscapes

  • 1. Structure-Activity Relationships and Networks: A Generalized Approach to Exploring Structure-Activity Landscapes
      Rajarshi Guha
      NIH Chemical Genomics Center / NIH Center for Translational Therapeutics
      March 29, 2011
  • 2. NIH Chemical Genomics Center
      • Founded 2004 as part of NIH Roadmap Molecular Libraries Initiative
          – NCGC staffed with 90+ scientists: biologists, chemists, informaticians, engineers
          – Post-doc program
      • Mission
          – MLPCN (screening & chemical synthesis; compound repository; PubChem database; funding for assay, library and technology development)
              • Complements individual investigator-initiated research programs
              • Enables “pharma-level” HTS and early chemical optimization
          – Develop new chemical probes for basic research and leads for therapeutic development, particularly for rare/neglected diseases
          – New paradigms & applications of HTS for chemical biology / chemical genomics
      • All NCGC projects are collaborations with a target or disease expert; currently >200 collaborations with investigators worldwide
          – 75% NIH extramural, 10% NIH intramural, 15% Foundations/Research Consortia/Pharma/Biotech
  • 3. NCGC Project Diversity
      [Figure: three panels: (A) Disease areas, (B) Target types, (C) Detection methods]
  • 4. qHTS: High Throughput Dose Response
      [Figure: three-panel workflow, panels labelled A to C in the original]
      • Assay concentration ranges over 4 logs (high: ~100 μM)
      • 1536-well plates, inter-plate dilution series
      • Assay volumes 2–5 μL
      • Automated concentration-response data collection, ~1 CRC/sec
      • Informatics pipeline: automated curve fitting and classification, 300K samples
  • 5. Background
      • Cheminformatics methods
          – QSAR, diversity analysis, virtual screening, fragments, polypharmacology, networks
      • More recently
          – RNAi screening, high content imaging
      • Extensive use of machine learning
      • All tied together with software development
          – User-facing GUI tools
          – Low level programmatic libraries
      • Believer & practitioner of Open Source
  • 6. Outline
      • Structure-activity relationships
      • Characterizing activity cliffs
      • Working with the structure-activity landscape
  • 7. Structure Activity Relationships
      • Similar molecules will have similar activities
      • Small changes in structure will lead to small changes in activity
      • One implication is that SARs are additive
      • This is the basis for QSAR modeling
      Martin, Y.C. et al., J. Med. Chem., 2002, 45, 4350–4358
  • 8. Exceptions Are Easy to Find
      [Figure: four closely related structures with Ki = 39.0 nM, Ki = 1.8 nM, Ki = 10.0 nM and Ki = 1.0 nM]
      Tran, J.A. et al., Bioorg. Med. Chem. Lett., 2007, 15, 5166–5176
  • 9. Structure Activity Landscapes
      • Rugged gorges or rolling hills?
          – Small structural changes associated with large activity changes represent steep slopes in the landscape
          – But traditionally, QSAR assumes gentle slopes
          – Machine learning is not very good for special cases
      Maggiora, G.M., J. Chem. Inf. Model., 2006, 46, 1535–1535
  • 11. Characterizing the Landscape
      • A cliff can be numerically characterized
      • Structure Activity Landscape Index (SALI)
          $$\mathrm{SALI}_{i,j} = \frac{|A_i - A_j|}{1 - \mathrm{sim}(i,j)}$$
      • Cliffs are characterized by elements of the matrix with very large values
      Guha, R.; Van Drie, J.H., J. Chem. Inf. Model., 2008, 48, 646–658
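The SALI definition on slide 11 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the talk: the function name `sali_matrix`, the toy activities and the made-up similarity table are all assumptions, and treating a pair with similarity 1 as an infinite cliff is a choice made only for this sketch.

```python
import math

def sali_matrix(activities, similarity):
    """Return the symmetric SALI matrix for a list of activities.

    `similarity` is a callable (i, j) -> value in [0, 1]; it stands in
    for whatever fingerprint similarity is being used.
    """
    n = len(activities)
    sali = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            gap = abs(activities[i] - activities[j])
            denom = 1.0 - similarity(i, j)
            # sim == 1 gives a zero denominator; reporting that as an
            # infinite cliff is an assumption of this sketch.
            if denom > 0:
                value = gap / denom
            else:
                value = math.inf if gap > 0 else 0.0
            sali[i][j] = sali[j][i] = value
    return sali

# Toy data (hypothetical): three compounds with pIC50-like activities.
acts = [6.0, 8.0, 6.5]
sims = {(0, 1): 0.9, (0, 2): 0.5, (1, 2): 0.6}
matrix = sali_matrix(acts, lambda i, j: sims[(min(i, j), max(i, j))])
```

The pair (0, 1) is both very similar and two log units apart in activity, so it dominates the matrix; that is the kind of element the slide calls a cliff.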
  • 13. Fingerprints
      [Figure: example bit string 1 0 1 1 0 0 0 1 0]
      • Lots of types of fingerprints
      • Indicates the presence or absence of a structural feature
      • Length can vary from 166 to 4096 bits or more
      • Fingerprints usually compared using the Tanimoto metric
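For binary fingerprints the Tanimoto coefficient is the number of bits set in both fingerprints divided by the number set in either. A small sketch follows; the helper names and the second bit string are made up for illustration, and only the first bit string comes from the slide.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union

def on_bits(bitstring):
    """Positions of the '1' characters in a bit string."""
    return {i for i, ch in enumerate(bitstring) if ch == "1"}

fp1 = on_bits("101100010")  # bit pattern shown on the slide
fp2 = on_bits("100100011")  # hypothetical second fingerprint
score = tanimoto(fp1, fp2)  # 3 shared bits out of 5 set in either
```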
  • 14. Varying Fingerprint Methods
      [Figure: density of pairwise Tanimoto similarity for three fingerprint types: BCI 1052 bit, MACCS 166 bit, CDK 1024 bit]
      • Shorter fingerprints will lead to more “similar” pairs
      • Requires a higher cutoff to focus on significant cliffs
  • 16. Different Activity Representations
      • Using the Hill parameters from a dose-response curve represents richer data than a single IC50
          $$P = \{S_0,\ S_{\inf},\ AC_{50},\ H\}, \qquad \mathrm{SALI}_{i,j} = \frac{d(P_i, P_j)}{1 - \mathrm{sim}(i,j)}$$
      [Figure: dose-response curve, Activity against Concentration, marking S0, Sinf and the AC50 at 50% response]
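The slide does not say which distance d is used between Hill parameter vectors. The sketch below assumes a plain Euclidean distance over (S0, Sinf, AC50, H), which is only sensible if the parameters have been put on comparable scales beforehand; the function names and the numbers are hypothetical.

```python
import math

def hill_distance(p, q):
    """Euclidean distance between two Hill parameter vectors (an assumed choice of d)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def sali_from_curves(p, q, sim):
    """SALI for one pair, using curve-parameter distance as the numerator."""
    return hill_distance(p, q) / (1.0 - sim)

# Hypothetical curves as (S0, Sinf, log AC50, H); they differ only in potency.
curve_a = (0.0, -100.0, -6.0, 1.0)
curve_b = (0.0, -100.0, -7.0, 1.0)
value = sali_from_curves(curve_a, curve_b, sim=0.75)
```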
  • 17. Visualizing SALI Values
      • Alternatives?
          – A heatmap is an easy to understand visualization
          – Coupled with brushing, can be a handy tool
          – A more flexible approach is to consider a network view of the matrix
      • The SALI graph
          – Compounds are nodes
          – Nodes i,j are connected if SALI(i,j) > X
          – Only display connected nodes
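The three rules for the SALI graph translate directly into code. In this sketch the function name `sali_graph` and the toy matrix are assumptions; the threshold X is passed as `cutoff`.

```python
def sali_graph(sali, cutoff):
    """Return (nodes, edges) of the SALI graph at the given cutoff.

    Nodes i and j are joined when SALI(i, j) > cutoff, and only nodes
    that take part in at least one edge are returned.
    """
    n = len(sali)
    edges = [(i, j)
             for i in range(n)
             for j in range(i + 1, n)
             if sali[i][j] > cutoff]
    nodes = sorted({v for edge in edges for v in edge})
    return nodes, edges

# Hypothetical symmetric SALI matrix for four compounds.
sali = [
    [0.0, 20.0, 1.0, 0.2],
    [20.0, 0.0, 3.75, 0.1],
    [1.0, 3.75, 0.0, 0.3],
    [0.2, 0.1, 0.3, 0.0],
]
nodes, edges = sali_graph(sali, cutoff=2.0)
```

Compound 3 has no pair above the cutoff, so it drops out of the displayed graph.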
  • 18. Visualizing SALI Values
      • The SALI graph
          – Compounds are nodes
          – Nodes i,j are connected if SALI(i,j) > X
          – Only display connected nodes
      [Figure: example SALI graph with numbered compound nodes]
  • 19. Varying the Cutoff
      • The cutoff controls the complexity of the graph
      • Higher cutoffs will highlight the most significant activity cliffs
      [Figure: SALI graphs of the same dataset at Cutoff = 90%, Cutoff = 50% and Cutoff = 20%]
  • 20. Better Visualization - SALIViewer
      http://sali.rguha.net
  • 21. What Can We Do With SALIs?
      • SALI characterizes cliffs & non-cliffs
      • For a given molecular representation, SALIs give us an idea of the smoothness of the SAR landscape
      • Models try and encode this landscape
      • Use the landscape to guide descriptor or model selection
  • 22. Descriptor Space Smoothness
      [Figure: number of edges in the SALI graph (0 to 400) against SALI cutoff (0.0 to 1.0), with SALI graphs at selected cutoffs whose nodes are labelled with drug names such as astemizole, grepafloxacin, sildenafil, moxifloxacin and gatifloxacin]
      • Edge count of the SALI graph for varying cutoffs
      • Measures smoothness of the descriptor space
      • Can reduce this to a single number (AUC)
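A rough sketch of the edge-count curve and its reduction to one number. The slide plots the cutoff on a 0 to 1 axis without stating how it is scaled; this sketch assumes the cutoff is a fraction of the largest finite SALI value and uses a trapezoid rule for the area. Both choices, along with the function names and the toy matrix, are assumptions.

```python
def edge_count_curve(sali, steps=11):
    """Edge counts of the SALI graph over evenly spaced cutoffs in [0, 1].

    Assumes all SALI values are finite; the cutoff is taken as a
    fraction of the largest value (an assumption of this sketch).
    """
    n = len(sali)
    values = [sali[i][j] for i in range(n) for j in range(i + 1, n)]
    top = max(values)
    cutoffs = [k / (steps - 1) for k in range(steps)]
    counts = [sum(1 for v in values if v > c * top) for c in cutoffs]
    return cutoffs, counts

def trapezoid_auc(xs, ys):
    """Area under a piecewise-linear curve."""
    return sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2.0
               for k in range(len(xs) - 1))

# Hypothetical symmetric SALI matrix for four compounds.
sali = [
    [0.0, 20.0, 1.0, 0.2],
    [20.0, 0.0, 3.75, 0.1],
    [1.0, 3.75, 0.0, 0.3],
    [0.2, 0.1, 0.3, 0.0],
]
cutoffs, counts = edge_count_curve(sali)
auc = trapezoid_auc(cutoffs, counts)
```

A curve that collapses quickly, as this toy one does, means a few pairs carry almost all of the cliff signal.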
  • 23. Other Examples
      • Instead of fingerprints, we use molecular descriptors
      • SALI denominator now uses Euclidean distance
      • 2D & 3D random descriptor sets
          – None are really good
          – Too rough, or
          – Too flat
      [Figure: two panels, labelled 2D and 3D, showing number of edges in the SALI graph (0 to 400) against SALI cutoff (0.0 to 1.0)]
  • 24. Feature Selection Using SALI
      • Surprisingly, exhaustive search of 66,000 4-descriptor combinations did not yield semi-smoothly decreasing curves
      • Not entirely clear what type of curve is desirable
  • 25. SALI Graphs & Predictive Models
      • The graph view allows us to view SARs and identify trends easily
      • The aim of a QSAR model is to encode SARs
      • Traditionally, we consider the quality of a model in terms of RMSE or R2
      • But in general, we’re not as interested in RMSEs as we are in whether the model predicted something as more active than something else
          – What we want to have is the correct ordering
          – We assume the model is statistically significant
  • 26. Measuring Model Quality
      • A QSAR model should easily encode the “rolling hills”
      • A good model captures the most significant cliffs
      • Can be formalized as: how many of the edge orderings of a SALI graph does the model predict correctly?
      • Define S(X), representing the number of edges correctly predicted for a SALI network at a threshold X
      • Repeat for varying X and obtain the SALI curve
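One way to score edge orderings is sketched below. The slide defines S(X) only as the number of edges predicted correctly; this sketch assumes the normalisation (correct − incorrect) / total, chosen because it gives the −1 to 1 range seen on SALI curve plots. The function name, the handling of ties and empty graphs, and the toy data are assumptions as well.

```python
def sali_curve_point(sali, observed, predicted, cutoff):
    """S(X) at one cutoff: (correct - incorrect) / total edges.

    An edge (i, j) counts as correct when the model ranks the two
    compounds in the same order as the observed activities.
    """
    n = len(sali)
    correct = wrong = total = 0
    for i in range(n):
        for j in range(i + 1, n):
            if sali[i][j] <= cutoff:
                continue  # not an edge at this cutoff
            total += 1
            agreement = (observed[i] - observed[j]) * (predicted[i] - predicted[j])
            if agreement > 0:
                correct += 1
            elif agreement < 0:
                wrong += 1
            # ties count as neither correct nor incorrect
    return (correct - wrong) / total if total else 0.0

# Hypothetical data for four compounds.
sali = [
    [0.0, 20.0, 1.0, 0.2],
    [20.0, 0.0, 3.75, 0.1],
    [1.0, 3.75, 0.0, 0.3],
    [0.2, 0.1, 0.3, 0.0],
]
observed = [6.0, 8.0, 6.5, 6.4]
predicted = [6.2, 7.5, 7.8, 6.0]
curve = [sali_curve_point(sali, observed, predicted, x) for x in (0.0, 2.0, 5.0)]
```

Evaluating the function over a grid of cutoffs gives the SALI curve.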
  • 27. SALI Curves
      [Figure: SALI curves, S(X) from −1.0 to 1.0 against X from 0.0 to 1.0, for a 3-descriptor model, a 5-descriptor model and a scrambled 3-descriptor model; annotated SCI = 0.12]
  • 28. Model Search Using the SCI
      • We’ve used the SALI to retrospectively analyze models
      • Can we use SALI to develop models?
          – Identify a model that captures the cliffs
      • Tricky
          – Cliffs are fundamentally outliers
          – Optimizing for good SALI values implies overfitting
          – Need to trade off between SALI & generalizability
  • 29. The Objective Function
      • S0 is a measure of the model’s ability to summarize the dataset (analogous to RMSE)
      • S100 measures the model’s ability to capture cliffs
      • Ideally, the curve starts high and stays high
      [Figure: SALI curve, S(X) against SALI cutoff (0.0 to 1.0), with S0 and S100 marked at the two ends]
          $$F = \frac{1}{S_{100}} \qquad F = \frac{1}{S_0} + \frac{S_{100} - S_0}{2} \qquad F = \frac{1}{\mathrm{SCI}}$$
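The three candidate objective functions are simple enough to write out directly. The function names below are made up, the inputs are assumed to be positive (the reciprocals are undefined at zero), and the slide does not say how the search uses F, so this sketch only evaluates it.

```python
def objective_s100(s100):
    """F = 1 / S100."""
    return 1.0 / s100

def objective_combined(s0, s100):
    """F = 1 / S0 + (S100 - S0) / 2."""
    return 1.0 / s0 + (s100 - s0) / 2.0

def objective_sci(sci):
    """F = 1 / SCI."""
    return 1.0 / sci

# Hypothetical curve summary values.
f1 = objective_s100(0.8)
f2 = objective_combined(0.5, 0.9)
f3 = objective_sci(0.25)
```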
  • 30. SALI Based Model Selection
      • Considered the BZR dataset from Sutherland et al
      • Identified “best” models using a GA to select from a pool of 2D descriptors
      • While SALI based optimization can lead to a “better” curve, it doesn’t give the best model
      [Figure: two panels of S(X) against SALI cutoff (0.0 to 1.0 and 0.00 to 0.08) for models selected using RMSE, SCI and S(100)]
      Sutherland, J. et al, J. Chem. Inf. Comput. Sci., 2003, 43, 1906-1915
  • 31. SALI Based Model Selection
      • 107 aryl azoles as ER-β agonists
      • Used a GA and 2D descriptors to identify models
      • In this case, a SALI based objective function was able to identify the best model
      • Interestingly, SCI does not seem to perform very well
      [Figure: two panels of S(X) against SALI cutoff (0.0 to 1.0 and 0.00 to 0.08) for models selected using RMSE, SCI and S(0) + D/2]
      Malamas, M.S. et al, J Med Chem, 2004, 47, 5021-5040
  • 32. SALI Based Model Selection
      • The size of the solution space explored depends on the SALI objective function
      [Figure: RMSE of the models explored under each objective function, with one panel for BZR and one for ER-β; objective functions shown include RMSE, S(100), SCI and 1/S(0) + D/2]
  • 33. Predicting the Landscape
      • Rather than predicting activity directly, we can try to predict the SAR landscape
      • Implies that we attempt to directly predict cliffs
          – Observations are now pairs of molecules
      • A more complex problem
          – Choice of features is trickier
          – Still face the problem of cliffs as outliers
          – Somewhat similar to predicting activity differences
      Scheiber et al, Statistical Analysis and Data Mining, 2009, 2, 115-122
  • 34. Predicting Cliffs
      • Dependent variables are pairwise SALI values, calculated using fingerprints
      • Independent variables are molecular descriptors
          – but considered pairwise
          – Absolute difference of descriptor pairs, or
          – Geometric mean of descriptor pairs
          – …
      • Develop a model to correlate pairwise descriptors to pairwise SALI values
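The two pairwise aggregations named on the slide can be sketched as follows. The function name `pairwise_features` and the toy descriptor values are hypothetical, and the geometric mean is only defined here for non-negative descriptor values, which is an assumption about the descriptors.

```python
import math
from itertools import combinations

def pairwise_features(descriptors, how="absdiff"):
    """Build one feature row per unordered pair of molecules.

    `descriptors` is a list of equal-length numeric lists, one per molecule.
    """
    rows = {}
    for i, j in combinations(range(len(descriptors)), 2):
        a, b = descriptors[i], descriptors[j]
        if how == "absdiff":
            rows[(i, j)] = [abs(x - y) for x, y in zip(a, b)]
        elif how == "geomean":
            # requires x * y >= 0 for every descriptor
            rows[(i, j)] = [math.sqrt(x * y) for x, y in zip(a, b)]
        else:
            raise ValueError(f"unknown aggregation: {how}")
    return rows

# Hypothetical descriptor table: three molecules, two descriptors each.
table = [[1.0, 4.0], [4.0, 9.0], [9.0, 1.0]]
abs_rows = pairwise_features(table, "absdiff")
geo_rows = pairwise_features(table, "geomean")
```

Because every unordered pair becomes an observation, n molecules yield n(n − 1)/2 rows, so the training set grows quadratically with the number of compounds.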
  • 35. A Test Case
      • We first consider the Cavalli CoMFA dataset of 30 molecules with pIC50s
      • Evaluate topological and physicochemical descriptors
      • Developed random forest models
          – On the original observed values (30 observations)
          – On the SALI values (435 observations)
      Cavalli, A. et al, J Med Chem, 2002, 45, 3844-3853
  • 36. Double Counting Structures?
      • The dependent and independent variables both encode structure
      • But pretty low correlations between individual pairwise descriptors and the SALI values
      [Figure: histograms (percent of total) of R2, ranging from 0.00 to 0.15, for the GeoMean and AbsDiff pairwise descriptors]
  • 37. Model Summaries
      [Figure: three predicted-versus-observed scatter plots: original pIC50 (RMSE = 0.97), SALI with AbsDiff (RMSE = 1.10), SALI with GeoMean (RMSE = 1.04)]
      • All models explain similar % of variance of their respective datasets
      • Using geometric mean as the descriptor aggregation function seems to perform best
      • SALI models are more robust due to larger size of the dataset
  • 38. Test Case 2
      • Considered the Holloway docking dataset, 32 molecules with pIC50s and Einter
      • Similar strategy as before
      • Need to transform SALI values
      • Descriptors show minimal correlation
      [Figure: histograms (percent of total) of SALI (0 to 120) and log10(SALI) (−1 to 2)]
      Holloway, M.K. et al, J Med Chem, 1995, 38, 305-317