Chemogenomics in the cloud
Is the sky the limit?

Rajarshi Guha, Ph.D.
NIH Center for Translational Therapeutics
June 28, 2012
  
The cloud as infrastructure
•  Cloud computing is a service for
    –  Infrastructure
    –  Platform
    –  Software
•  Many of the benefits of cloud computing are
    –  Economic
    –  Political
•  Won't be discussing the remote hosting aspects of clouds
  
Characteristics of the cloud
[Diagram: "Cloud Computing" at the center, surrounded by six characteristics]
•  Virtually assemble
•  Pay-per-use
•  Offsite technology
•  Shared workloads
•  Massive scale
•  On-demand self service
Source: http://www.slideshare.net/haslinatuanhim/slides-cloud-computing
  
Parallel computing in the cloud
•  Modern cloud vendors make provisioning compute resources easy
    –  Allows one to handle unpredictable loads easily
    –  Pay only for what you need
•  Chemistry applications don't usually have very dynamic loads
•  But large scale resources are an opportunity for large scale (parallel) computations
  
Storing chemical information
•  Fill up a hard drive, mail to Amazon
•  Copy over the network
    –  Aspera
    –  GridFTP
•  Still need to pay for storage space
•  Lots of options on the cloud: S3, relational DB's
•  See Chris Dagdigian's talk for views on storage
Source: http://www.slideshare.net/chrisdag/2012-trends-from-the-trenches
  
Recoding for the cloud?
•  Only if we really have to
•  Large amounts of legacy code run perfectly well on local clusters
    –  May not make sense to recode as a map-reduce job
    –  May not be possible to
•  Different levels of HPC on the cloud
    –  Legacy HPC
    –  'Cloudy' HPC
    –  Big Data HPC
Source: http://www.slideshare.net/chrisdag/mapping-life-science-informatics-to-the-cloud
  
Recoding for the cloud?
Legacy HPC
•  Use cloud resources in the same way as a local cluster
•  MIT StarCluster makes this easy to do
Cloudy HPC
•  Make use of cloud capabilities
•  Old algorithms, new infrastructure
•  Spot instances, SNS, SQS, SimpleDB, S3, etc
Big Data HPC
•  Huge datasets
•  Candidates for map-reduce
•  Involves algorithm (re)design
Source: http://www.slideshare.net/chrisdag/mapping-life-science-informatics-to-the-cloud
  
How does the cloud enable science?
•  How does the cloud change computational chemistry, cheminformatics, …
    –  The way we do them
    –  The scale at which we do them

Are there problems that we can address that we could not have if we didn't have on-demand, scalable cloud resources?
  
Big data & cheminformatics
•  Computation over large chemical databases
    –  PubChem, ChEMBL, …
•  What types of computations?
    –  Searches (substructure, pharmacophore, …)
    –  QSAR models over large data
    –  Predictions for large data
•  Certain applications just need structures
•  Access to correspondingly massive experimental datasets is tough (impossible?)
  
Big data & cheminformatics
•  GDB-13 is a truly big database: 977 million different structures
    –  Current search interface is based on NN searches using a reduced representation
    –  Could be a good candidate for a Hadoop based analysis
•  More generally, enumerated virtual libraries can also lead to very big data
    –  Time required to enumerate is a bottleneck
  
Big data & cheminformatics
•  Fundamentally, "big chemical data" lets us explore larger chemical spaces
    –  Can plow through large catalogs
    –  e.g., identifying PKR inhibitors by LBVS of the ChemNavigator collection [Bryk et al]
•  This can push predictive models to their limits
    –  Brings us back to the global vs local arguments
  
The Hadoop ecosystem
•  A framework for the map-reduce algorithm
    –  Not something you can download and just run
    –  Need to implement the infrastructure and then develop code to run using the infrastructure
•  Low level Hadoop programs can be large, complex and tedious
•  Abstractions have been developed that make Hadoop queries more SQL-like, which results in much more concise code
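
The map-reduce pattern itself can be illustrated without any Hadoop infrastructure. The following is a minimal in-memory sketch, not Hadoop code: the class name `AtomCountSketch` and its methods are hypothetical, and the SMILES tokenizer is deliberately naive (it assumes only Cl, Br, single-letter organic-subset atoms and lowercase aromatic atoms, and does not parse bracket atoms correctly). The map step emits one element symbol per atom, and the reduce step sums the emissions per symbol.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AtomCountSketch {
    // Map phase: emit one element symbol per heavy atom in a SMILES string.
    // Naive tokenizer (assumption): bracket atoms such as [Na+] are not handled.
    static List<String> map(String smiles) {
        List<String> symbols = new ArrayList<>();
        for (int i = 0; i < smiles.length(); i++) {
            char c = smiles.charAt(i);
            boolean hasNext = i + 1 < smiles.length();
            if (c == 'C' && hasNext && smiles.charAt(i + 1) == 'l') {
                symbols.add("Cl");
                i++; // skip the 'l'
            } else if (c == 'B' && hasNext && smiles.charAt(i + 1) == 'r') {
                symbols.add("Br");
                i++; // skip the 'r'
            } else if ("BCNOPSFI".indexOf(c) >= 0) {
                symbols.add(String.valueOf(c));
            } else if ("cnops".indexOf(c) >= 0) {
                // aromatic atoms are written in lowercase
                symbols.add(String.valueOf(Character.toUpperCase(c)));
            }
        }
        return symbols;
    }

    // Reduce phase: sum the emitted symbols into per-element counts.
    static Map<String, Integer> reduce(List<String> symbols) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String s : symbols) {
            counts.merge(s, 1, Integer::sum);
        }
        return counts;
    }

    // Driver: run the map step over every record, then reduce once.
    static Map<String, Integer> countAtoms(List<String> smilesList) {
        List<String> emitted = new ArrayList<>();
        for (String smi : smilesList) {
            emitted.addAll(map(smi));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        System.out.println(countAtoms(List.of("CCO", "c1ccccc1Cl", "NC(=O)C(=O)N")));
    }
}
```

In a real Hadoop job the map and reduce steps would run on different nodes and the framework would handle the grouping by key; here both run in a single process purely to show the shape of the computation.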
  
The Hadoop ecosystem
[Diagram: layered view of the Hadoop stack]
•  Surrounding projects: Chukwa, Zookeeper, Flume, Pig, HBase, Mahout, Avro, Whirr, Hama, Hive
•  Core: Map Reduce Engine and Hadoop Distributed Filesystem, on top of Hadoop Common
Based on http://www.slideshare.net/informaticacorp/101111-part-3-matt-aslett-the-hadoop-ecosystem
  
Simplifying Hadoop applications
•  Raw Hadoop programs can be very tedious to write
[Code screenshot: SMARTS based substructure search]
  	
  
Pig & Pig Latin
•  Pig Latin programs are much simpler to write and get translated to Hadoop code

SMARTS search in Pig Latin:

    A = load 'medium.smi' as (smiles:chararray);
    B = filter A by net.rguha.dc.pig.SMATCH(smiles, 'NC(=O)C(=O)N');
    store B into 'output.txt';

•  SQL-like, requires UDF to be implemented to perform non-standard tasks

UDF for SMARTS search:

    public class SMATCH extends FilterFunc {
        static SMARTSQueryTool sqt;
        static {
            try {
                sqt = new SMARTSQueryTool("C");
            } catch (CDKException e) {
                System.out.println(e);
            }
        }
        static SmilesParser sp = new SmilesParser(DefaultChemObjectBuilder.getInstance());

        public Boolean exec(Tuple tuple) throws IOException {
            if (tuple == null || tuple.size() < 2) return false;
            String target = (String) tuple.get(0);
            String query = (String) tuple.get(1);
            try {
                sqt.setSmarts(query);
                IAtomContainer mol = sp.parseSmiles(target);
                return sqt.matches(mol);
            } catch (CDKException e) {
                throw WrappedIOException.wrap("Error in SMARTS pattern or SMILES string " + query, e);
            }
        }
    }

Working on top of Hadoop
•  Hadoop doesn't know anything about cheminformatics
    –  Need to write your own code, UDF's etc
•  But application layers have been developed for other purposes
    –  Apache Mahout: a library for machine learning on data stored in Hadoop clusters
    –  Possible to build virtual screening pipelines based on the Hadoop framework
  
What Hadoop is not for
•  Doesn't replace an actual database
•  It's not uniformly fast or efficient
•  Not good for ad hoc or realtime analysis
•  Not effective unless dealing with massive datasets
•  Not all algorithms are amenable to the map-reduce method
    –  CPU bound methods and those requiring communication
  
Cheminformatics on Hadoop
•  Hadoop and Atom Counting
•  Hadoop and SD Files
•  Cheminformatics, Hadoop and EC2
•  Pig and Cheminformatics

But are cheminformatics problems really big enough to justify all of this?
  
How big is big?
•  Bryk et al performed an LBVS of 5 million compounds to identify PKR inhibitors
    –  Pharmacophore fingerprints + perceptron
    –  Required conformer generation
•  Given that conformer and descriptor generation are one-time tasks, screening 5M compounds doesn't take long
•  Example: RF models built on 512 bit binary fingerprints give us predictions for 5M fingerprints in 12 min [Single core, 3 GHz Xeon, OS X 10.6.8]
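
As a rough worked calculation, the timing quoted above implies the following single-core throughput. The extrapolation to GDB-13 (977 million structures, from the earlier slide) is only an estimate: it assumes prediction time scales linearly with the number of fingerprints and that the same fingerprint and model are used, neither of which the slides state.

```latex
\[
\text{throughput} \approx \frac{5 \times 10^{6}\ \text{fingerprints}}{12 \times 60\ \text{s}}
\approx 6.9 \times 10^{3}\ \text{fingerprints/s}
\]
\[
t_{\text{GDB-13}} \approx \frac{9.77 \times 10^{8}}{5 \times 10^{6}} \times 12\ \text{min}
\approx 2345\ \text{min} \approx 39\ \text{h (single core)}
\]
```

Under those assumptions, even the largest enumerated database mentioned in this talk is within reach of a small cluster for the prediction step alone.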
  
Going beyond chunking?
•  All the preceding use cases are embarrassingly parallel
    –  Chunking the input data and applying the same operation to each chunk
    –  Very nice when you have a big cluster

Are there algorithms in cheminformatics that can employ map-reduce at the algorithmic level?
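
The chunking pattern described on this slide can be sketched on a single machine with Java parallel streams. This is only an illustration under stated assumptions: the class `ChunkedScreen`, the 64-bit fingerprints stored in a `long`, and the `predict` rule (a bit-count threshold) are all hypothetical stand-ins for a real model such as the RF example earlier.

```java
import java.util.stream.IntStream;

class ChunkedScreen {
    // Hypothetical stand-in for a real model: a fingerprint is a "hit"
    // when at least minBits of its 64 bits are set.
    static boolean predict(long fingerprint, int minBits) {
        return Long.bitCount(fingerprint) >= minBits;
    }

    // The same operation applied to one chunk [lo, hi) of the input.
    static int countHitsInChunk(long[] fps, int lo, int hi, int minBits) {
        int hits = 0;
        for (int i = lo; i < hi; i++) {
            if (predict(fps[i], minBits)) {
                hits++;
            }
        }
        return hits;
    }

    // Split the input into chunks, process the chunks in parallel, sum the results.
    static int countHits(long[] fps, int minBits, int chunkSize) {
        int nChunks = (fps.length + chunkSize - 1) / chunkSize;
        return IntStream.range(0, nChunks)
                .parallel()
                .map(c -> countHitsInChunk(fps, c * chunkSize,
                        Math.min((c + 1) * chunkSize, fps.length), minBits))
                .sum();
    }

    public static void main(String[] args) {
        long[] fps = {0L, 1L, 3L, 7L, 15L, -1L};
        System.out.println(countHits(fps, 3, 2));
    }
}
```

No chunk depends on any other, which is what makes the problem embarrassingly parallel: the only coordination is the final sum.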
  
Going beyond chunking?
•  Applications that make use of pairwise (or higher order) calculations could benefit from a map-reduce incarnation
    –  Doesn't always avoid the O(N^2) barrier
    –  Bioisostere identification is one case that could be rephrased as a map-reduce problem
•  Search algorithms such as GA's, particle swarms can make use of map-reduce
    –  GA based docking
    –  Feature selection for QSAR models
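
To make the O(N^2) point concrete, here is a minimal sketch of an all-pairs similarity calculation. It is an assumption-laden illustration, not the bioisostere method: the class `PairwiseSimilarity` is hypothetical, fingerprints are 64-bit `long` values, and the Tanimoto coefficient is used as the pairwise measure. Each row index is an independent "map" task and the sum is the "reduce"; the work is distributed but there are still N(N-1)/2 distinct pairs to evaluate.

```java
import java.util.stream.IntStream;

class PairwiseSimilarity {
    // Tanimoto coefficient on bit fingerprints: |A and B| / |A or B|.
    static double tanimoto(long a, long b) {
        int union = Long.bitCount(a | b);
        return union == 0 ? 0.0 : (double) Long.bitCount(a & b) / union;
    }

    // Number of distinct unordered pairs among n items.
    static long pairCount(int n) {
        return (long) n * (n - 1) / 2;
    }

    // Map: row i compares item i with every later item j.
    // Reduce: sum the per-row counts of pairs at or above the threshold.
    static int countSimilarPairs(long[] fps, double threshold) {
        return IntStream.range(0, fps.length)
                .parallel()
                .map(i -> {
                    int hits = 0;
                    for (int j = i + 1; j < fps.length; j++) {
                        if (tanimoto(fps[i], fps[j]) >= threshold) {
                            hits++;
                        }
                    }
                    return hits;
                })
                .sum();
    }

    public static void main(String[] args) {
        // Pair count for a 5M compound collection like the one screened earlier.
        System.out.println(pairCount(5_000_000));
    }
}
```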
  
Going beyond chunking?
•  Machine learning for massive chemical datasets?
    –  MR jobs (descriptor generation) + Mahout (model building) let us handle this in a straightforward manner
•  But will QSAR models benefit from more data?
    –  Helgee et al suggest global models are preferable
    –  But diversity and the structure of the chemical space will affect performance of global models
    –  Unsupervised methods may be more relevant
    –  Philosophical question?
  
Going beyond chunking?
•  Many clustering algorithms are amenable to map-reduce style
    –  K-means, Spectral, EM, minhash, …
    –  Many are implemented in Mahout

Problems where we generate large numbers of combinations can be amenable to map-reduce
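
As an illustration of why K-means fits the map-reduce style, one iteration can be written as an assignment step (map) followed by an averaging step (reduce). The sketch below is plain in-memory Java and does not use Mahout; the class `KMeansStep` and the choice to keep an empty cluster's old centroid are assumptions made for this example.

```java
class KMeansStep {
    // Map: index of the nearest centroid by squared Euclidean distance.
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0.0;
            for (int j = 0; j < p.length; j++) {
                double diff = p[j] - centroids[c][j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // Reduce: each new centroid is the mean of the points assigned to it.
    // An empty cluster keeps its previous centroid (a choice made for this sketch).
    static double[][] step(double[][] points, double[][] centroids) {
        int k = centroids.length;
        int d = centroids[0].length;
        double[][] sums = new double[k][d];
        int[] counts = new int[k];
        for (double[] p : points) {
            int c = nearest(p, centroids);
            counts[c]++;
            for (int j = 0; j < d; j++) {
                sums[c][j] += p[j];
            }
        }
        double[][] next = new double[k][d];
        for (int c = 0; c < k; c++) {
            for (int j = 0; j < d; j++) {
                next[c][j] = counts[c] == 0 ? centroids[c][j] : sums[c][j] / counts[c];
            }
        }
        return next;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 2}, {10, 10}, {10, 12}};
        double[][] next = step(points, new double[][]{{1, 1}, {9, 9}});
        System.out.println(java.util.Arrays.deepToString(next));
    }
}
```

The per-cluster sums and counts are all a reducer needs, so the points themselves never have to be gathered in one place; the step is repeated until the centroids stop moving.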
  
Networks & integration
•  Network models of molecules, and targets are common
    –  Allows for the incorporation of lots of associated information
    –  Diseases, pathways, OTE's
[Figure: Yildirim, M.A. et al]
•  When linked with clinical data & outcomes, we can generate massive networks
    –  Adverse events (FDA AERS)
    –  Analysis by Cloudera considered > 10E6 drug-drug-reaction triples
  
Networks & integration
•  SAR data can be viewed in a network form
    –  SALI, SARI based networks
    –  Usually requires pairwise calculations of the metric
[Figures: Peltason, L et al; http://sali.rguha.net/]
•  Current studies have focused on small datasets (< 1000 molecules)
•  Hadoop + Giraph could let us apply this to HTS-scale datasets
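
The pairwise nature of a SALI network can be sketched as follows. This is a minimal illustration, assuming the usual definition SALI(i,j) = |A_i - A_j| / (1 - sim(i,j)), with Tanimoto similarity on 64-bit fingerprints standing in for whatever similarity measure is actually used. The class `SaliSketch`, the `cutoff` parameter and the edge representation are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SaliSketch {
    // Tanimoto coefficient on bit fingerprints (assumed similarity measure).
    static double tanimoto(long a, long b) {
        int union = Long.bitCount(a | b);
        return union == 0 ? 0.0 : (double) Long.bitCount(a & b) / union;
    }

    // SALI for one pair: activity difference scaled by structural dissimilarity.
    // Identical fingerprints with different activities give +Infinity.
    static double sali(double actI, double actJ, double similarity) {
        return Math.abs(actI - actJ) / (1.0 - similarity);
    }

    // Build the network: one edge {i, j} for every pair whose SALI reaches the cutoff.
    // This loop is the O(N^2) pairwise step that limits dataset size.
    static List<int[]> edges(long[] fps, double[] activity, double cutoff) {
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < fps.length; i++) {
            for (int j = i + 1; j < fps.length; j++) {
                double value = sali(activity[i], activity[j], tanimoto(fps[i], fps[j]));
                if (value >= cutoff) { // NaN (identical structure and activity) never passes
                    result.add(new int[]{i, j});
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long[] fps = {0b1111L, 0b0111L, 0b1000L};
        double[] activity = {6.0, 8.0, 6.5};
        System.out.println(edges(fps, activity, 5.0).size());
    }
}
```

Pairs that are structurally similar but differ strongly in activity get the largest values, so raising the cutoff keeps only the steepest parts of the SAR landscape as network edges.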
  
Networks & integration
•  When we apply a network view we can consider many interesting applications & make use of cloud scale infrastructure
    –  Network based similarity
    –  Community detection (aka clustering)
    –  PageRank style ranking (of targets, compounds, …)
    –  Generate network metrics, which can be used as input to predictive models (for interactions, effects, …)
[Figure: Bauer-Mehren et al]
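
For the PageRank style ranking mentioned above, a minimal power-iteration sketch is shown below. It is a single-machine illustration under stated assumptions: nodes are integer indices (for example compounds or targets), `outLinks[u]` lists the nodes that `u` points to, and the class `PageRankSketch`, the damping factor and the iteration count are choices made for this example rather than anything taken from the talk.

```java
import java.util.Arrays;

class PageRankSketch {
    // Power iteration. Each pass is one map-reduce round: every node sends
    // an equal share of its rank along its out-links (map), and every node
    // sums what it receives (reduce).
    static double[] pageRank(int[][] outLinks, double damping, int iterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0; // rank held by nodes with no out-links
            for (int u = 0; u < n; u++) {
                if (outLinks[u].length == 0) {
                    dangling += rank[u];
                    continue;
                }
                double share = rank[u] / outLinks[u].length;
                for (int v : outLinks[u]) {
                    next[v] += share;
                }
            }
            for (int v = 0; v < n; v++) {
                // dangling rank is spread evenly so the ranks keep summing to 1
                next[v] = (1 - damping) / n + damping * (next[v] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Nodes 0 and 1 both point to node 2, which points back to node 0.
        int[][] graph = {{2}, {2}, {0}};
        System.out.println(Arrays.toString(pageRank(graph, 0.85, 50)));
    }
}
```

Graph frameworks built for this kind of iteration distribute the same per-node send and sum steps across a cluster, which is what makes the approach attractive for very large networks.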
  
Conclusions
•  Cheminformatics applications can be rewritten to take advantage of cloud resources
    –  Remotely hosted
    –  Embarrassingly parallel / chunked
    –  Map/reduce
•  Ability to process larger structure collections lets us explore more chemical space
•  Integrating chemistry with clinical & pharmacological data can lead to big datasets
  
Conclusions
•  Q: But are cheminformatics problems really big enough to justify all of this?
•  A: Yes, given virtual libraries and the integration of chemical structure with other types and scales of data

•  Q: Are there algorithms in cheminformatics that can employ map-reduce at the algorithmic level?
•  A: Yes, especially when we consider problems with a combinatorial flavor
  

Contenu connexe

Tendances

Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Lu Wei
 
Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Managementrightsize
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop EcosystemJ Singh
 
YARN - Hadoop's Resource Manager
YARN - Hadoop's Resource ManagerYARN - Hadoop's Resource Manager
YARN - Hadoop's Resource ManagerVertiCloud Inc
 
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Sumeet Singh
 
Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Ryu Kobayashi
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 
TriHUG - Beyond Batch
TriHUG - Beyond BatchTriHUG - Beyond Batch
TriHUG - Beyond Batchboorad
 
Challenges & Capabilites in Managing a MapR Cluster by David Tucker
Challenges & Capabilites in Managing a MapR Cluster by David TuckerChallenges & Capabilites in Managing a MapR Cluster by David Tucker
Challenges & Capabilites in Managing a MapR Cluster by David TuckerMapR Technologies
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programPraveen Kumar Donta
 
MapR LucidWorks Joint Webinar 121211
MapR LucidWorks Joint Webinar 121211MapR LucidWorks Joint Webinar 121211
MapR LucidWorks Joint Webinar 121211MapR Technologies
 
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Emilio Coppa
 
Hadoop for Scientific Workloads__HadoopSummit2010
Hadoop for Scientific Workloads__HadoopSummit2010Hadoop for Scientific Workloads__HadoopSummit2010
Hadoop for Scientific Workloads__HadoopSummit2010Yahoo Developer Network
 
알쓸신잡
알쓸신잡알쓸신잡
알쓸신잡youngick
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo pptPhil Young
 

Tendances (20)

Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]
 
Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Management
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop Ecosystem
 
YARN - Hadoop's Resource Manager
YARN - Hadoop's Resource ManagerYARN - Hadoop's Resource Manager
YARN - Hadoop's Resource Manager
 
10c introduction
10c introduction10c introduction
10c introduction
 
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
 
Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014
 
Hadoop YARN
Hadoop YARNHadoop YARN
Hadoop YARN
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
TriHUG - Beyond Batch
TriHUG - Beyond BatchTriHUG - Beyond Batch
TriHUG - Beyond Batch
 
Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0
 
Challenges & Capabilites in Managing a MapR Cluster by David Tucker
Challenges & Capabilites in Managing a MapR Cluster by David TuckerChallenges & Capabilites in Managing a MapR Cluster by David Tucker
Challenges & Capabilites in Managing a MapR Cluster by David Tucker
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce program
 
MapR LucidWorks Joint Webinar 121211
MapR LucidWorks Joint Webinar 121211MapR LucidWorks Joint Webinar 121211
MapR LucidWorks Joint Webinar 121211
 
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)
 
Hadoop for Scientific Workloads__HadoopSummit2010
Hadoop for Scientific Workloads__HadoopSummit2010Hadoop for Scientific Workloads__HadoopSummit2010
Hadoop for Scientific Workloads__HadoopSummit2010
 
알쓸신잡
알쓸신잡알쓸신잡
알쓸신잡
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo ppt
 
Scaling hadoopapplications
Scaling hadoopapplicationsScaling hadoopapplications
Scaling hadoopapplications
 

En vedette

Characterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network ModelsCharacterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network ModelsRajarshi Guha
 
Robots, Small Molecules & R
Robots, Small Molecules & RRobots, Small Molecules & R
Robots, Small Molecules & RRajarshi Guha
 
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action: Bridging Chemistry and Biology with Informatics at NCATSFrom Data to Action: Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATSRajarshi Guha
 
What can your library do for you?
What can your library do for you?What can your library do for you?
What can your library do for you?Rajarshi Guha
 
So I have an SD File … What do I do next?
So I have an SD File … What do I do next?So I have an SD File … What do I do next?
So I have an SD File … What do I do next?Rajarshi Guha
 
Enhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS PlatformEnhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS PlatformRajarshi Guha
 

En vedette (6)

Characterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network ModelsCharacterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network Models
 
Robots, Small Molecules & R
Robots, Small Molecules & RRobots, Small Molecules & R
Robots, Small Molecules & R
 
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action: Bridging Chemistry and Biology with Informatics at NCATSFrom Data to Action: Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATS
 
What can your library do for you?
What can your library do for you?What can your library do for you?
What can your library do for you?
 
So I have an SD File … What do I do next?
So I have an SD File … What do I do next?So I have an SD File … What do I do next?
So I have an SD File … What do I do next?
 
Enhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS PlatformEnhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
 

Similaire à Chemogenomics in the cloud: Is the sky the limit?

Big Data and Hadoop in Cloud - Leveraging Amazon EMR
Big Data and Hadoop in Cloud - Leveraging Amazon EMRBig Data and Hadoop in Cloud - Leveraging Amazon EMR
Big Data and Hadoop in Cloud - Leveraging Amazon EMRVijay Rayapati
 
Hadoop on Azure, Blue elephants
Hadoop on Azure,  Blue elephantsHadoop on Azure,  Blue elephants
Hadoop on Azure, Blue elephantsOvidiu Dimulescu
 
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Agile Testing Alliance
 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09Chris Purrington
 
An Introduction to Apache Hadoop, Mahout and HBase
An Introduction to Apache Hadoop, Mahout and HBaseAn Introduction to Apache Hadoop, Mahout and HBase
An Introduction to Apache Hadoop, Mahout and HBaseLukas Vlcek
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsGeoffrey Fox
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsGeoffrey Fox
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)outstanding59
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldRichard McDougall
 
App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)outstanding59
 
Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop TechnologyManish Borkar
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
Learn what is Hadoop-and-BigData
Learn  what is Hadoop-and-BigDataLearn  what is Hadoop-and-BigData
Learn what is Hadoop-and-BigDataThanusha154
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataPaco Nathan
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopJosh Patterson
 
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNAFirst Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNATomas Cervenka
 
Hadoop live online training
Hadoop live online trainingHadoop live online training
Hadoop live online trainingHarika583
 
Next Generation of Hadoop MapReduce
Next Generation of Hadoop MapReduceNext Generation of Hadoop MapReduce
Next Generation of Hadoop MapReducehuguk
 

Similaire à Chemogenomics in the cloud: Is the sky the limit? (20)

Big Data and Hadoop in Cloud - Leveraging Amazon EMR
Big Data and Hadoop in Cloud - Leveraging Amazon EMRBig Data and Hadoop in Cloud - Leveraging Amazon EMR
Big Data and Hadoop in Cloud - Leveraging Amazon EMR
 
Hadoop on Azure, Blue elephants
Hadoop on Azure,  Blue elephantsHadoop on Azure,  Blue elephants
Hadoop on Azure, Blue elephants
 
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Chemogenomics in the cloud: Is the sky the limit?

  • 1. Chemogenomics in the cloud: Is the sky the limit?
    Rajarshi Guha, Ph.D.
    NIH Center for Translational Therapeutics
    June 28, 2012
  • 2. The cloud as infrastructure
    •  Cloud computing is a service for
        –  Infrastructure
        –  Platform
        –  Software
    •  Much of the benefits of cloud computing are
        –  Economic
        –  Political
    •  Won’t be discussing the remote hosting aspects of clouds
  • 3. Characteristics of the cloud
    [Figure: "Cloud Computing" at the centre, surrounded by six characteristics]
        –  Virtually assemble
        –  Pay-per-use
        –  Offsite technology
        –  Shared workloads
        –  On-demand self service
        –  Massive scale
    http://www.slideshare.net/haslinatuanhim/slides-cloud-computing
  • 4. Parallel computing in the cloud
    •  Modern cloud vendors make provisioning compute resources easy
        –  Allows one to handle unpredictable loads easily
        –  Pay only for what you need
    •  Chemistry applications don’t usually have very dynamic loads
    •  But large scale resources are an opportunity for large scale (parallel) computations
  • 5. Storing chemical information
    •  Fill up a hard drive, mail to Amazon
    •  Copy over the network
        –  Aspera
        –  GridFTP
    •  Still need to pay for storage space
    •  Lots of options on the cloud: S3, relational DBs
    •  See Chris Dagdigian’s talk for views on storage
    http://www.slideshare.net/chrisdag/2012-trends-from-the-trenches
  • 6. Recoding for the cloud?
    •  Only if we really have to
    •  Large amounts of legacy code, runs perfectly well on local clusters
        –  May not make sense to recode as a map-reduce job
        –  May not be possible to
    •  Different levels of HPC on the cloud
        –  Legacy HPC
        –  ‘Cloudy’ HPC
        –  Big Data HPC
    http://www.slideshare.net/chrisdag/mapping-life-science-informatics-to-the-cloud
  • 7. Recoding for the cloud?
    Legacy HPC
        •  Use cloud resources in the same way as a local cluster
        •  MIT StarCluster makes this easy to do
    Cloudy HPC
        •  Make use of cloud capabilities
        •  Old algorithms, new infrastructure
        •  Spot instances, SNS, SQS, SimpleDB, S3, etc.
    Big Data HPC
        •  Huge datasets
        •  Candidates for map-reduce
        •  Involves algorithm (re)design
    http://www.slideshare.net/chrisdag/mapping-life-science-informatics-to-the-cloud
  • 8. How does the cloud enable science?
    •  How does the cloud change computational chemistry, cheminformatics, …
        –  The way we do them
        –  The scale at which we do them
    Are there problems that we can address that we could not have if we didn’t have on-demand, scalable cloud resources?
  • 9. Big data & cheminformatics
    •  Computation over large chemical databases
        –  PubChem, ChEMBL, …
    •  What types of computations?
        –  Searches (substructure, pharmacophore, …)
        –  QSAR models over large data
        –  Predictions for large data
    •  Certain applications just need structures
    •  Access to correspondingly massive experimental datasets is tough (impossible?)
  • 10. Big data & cheminformatics
    •  GDB-13 is a truly big database: 977 million different structures
        –  Current search interface is based on NN searches using a reduced representation
        –  Could be a good candidate for a Hadoop based analysis
    •  More generally, enumerated virtual libraries can also lead to very big data
        –  Time required to enumerate is a bottleneck
  • 11. Big data & cheminformatics
    •  Fundamentally, “big chemical data” lets us explore larger chemical spaces
        –  Can plow through large catalogs
        –  e.g., identifying PKR inhibitors by LBVS of the ChemNavigator collection [Bryk et al]
    •  This can push predictive models to their limits
        –  Brings us back to the global vs local arguments
  • 12. The Hadoop ecosystem
    •  A framework for the map-reduce algorithm
        –  Not something you can download and just run
        –  Need to implement the infrastructure and then develop code to run using the infrastructure
    •  Low level Hadoop programs can be large, complex and tedious
    •  Abstractions have been developed that make Hadoop queries more SQL-like, which results in much more concise code
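To make the map-reduce pattern named on this slide concrete, the following is a minimal in-memory sketch in plain Java, with no Hadoop involved. It counts heavy atoms by element across a set of SMILES strings: the map step emits one element symbol per atom and the reduce step sums the emissions per symbol. The class name `AtomCountSketch` and the toy tokenizer are assumptions made for illustration; the tokenizer only recognises a handful of organic-subset symbols and skips bracket atoms, so it is not a substitute for a real SMILES parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of Hadoop or the CDK.
class AtomCountSketch {

    // "Map" step: emit one element symbol per atom found in a SMILES string.
    // Toy tokenizer (assumption): handles B, C, N, O, S, P, F, I, Cl, Br and
    // aromatic lowercase forms; anything inside [...] is skipped entirely.
    static List<String> map(String smiles) {
        List<String> emitted = new ArrayList<>();
        int i = 0;
        while (i < smiles.length()) {
            char c = smiles.charAt(i);
            boolean hasNext = i + 1 < smiles.length();
            if (c == '[') {
                int close = smiles.indexOf(']', i);
                i = (close < 0) ? smiles.length() : close + 1;
            } else if (c == 'C' && hasNext && smiles.charAt(i + 1) == 'l') {
                emitted.add("Cl");
                i += 2;
            } else if (c == 'B' && hasNext && smiles.charAt(i + 1) == 'r') {
                emitted.add("Br");
                i += 2;
            } else if ("BCNOSPFI".indexOf(c) >= 0) {
                emitted.add(String.valueOf(c));
                i++;
            } else if ("bcnosp".indexOf(c) >= 0) {
                emitted.add(String.valueOf(Character.toUpperCase(c)));
                i++;
            } else {
                i++; // bonds, ring closures, branches, charges
            }
        }
        return emitted;
    }

    // "Reduce" step: sum the number of emissions per element symbol.
    static Map<String, Long> reduce(List<String> emitted) {
        return emitted.stream().collect(
                Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    // Driver: each molecule is mapped independently, then all emissions are reduced.
    static Map<String, Long> countAtoms(List<String> smilesList) {
        List<String> emitted = smilesList.parallelStream()
                .flatMap(s -> map(s).stream())
                .collect(Collectors.toList());
        return reduce(emitted);
    }
}
```

Calling `AtomCountSketch.countAtoms(List.of("CCO", "c1ccccc1", "ClCCl"))` groups the emitted symbols into counts for C, Cl and O. In a real Hadoop job the map and reduce steps would run on different nodes, with the framework handling the shuffle in between.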
  • 13. The Hadoop ecosystem
    [Figure: components of the Hadoop ecosystem]
        –  Chukwa, Zookeeper, Flume, Pig, HBase, Mahout, Avro, Whirr, Hama, Hive
        –  Map Reduce Engine
        –  Hadoop Distributed Filesystem
        –  Hadoop Common
    Based on http://www.slideshare.net/informaticacorp/101111-part-3-matt-aslett-the-hadoop-ecosystem
  • 14. Simplifying Hadoop applications
    •  Raw Hadoop programs can be very tedious to write
    [Figure: code listing for a SMARTS based substructure search]
  • 15. Pig & Pig Latin
    •  Pig Latin programs are much simpler to write and get translated to Hadoop code

    SMARTS search in Pig Latin:

        A = load 'medium.smi' as (smiles:chararray);
        B = filter A by net.rguha.dc.pig.SMATCH(smiles, 'NC(=O)C(=O)N');
        store B into 'output.txt';

    •  SQL-like, requires UDF to be implemented to perform non-standard tasks

    UDF for SMARTS search:

        public class SMATCH extends FilterFunc {
            static SMARTSQueryTool sqt;
            static {
                try {
                    sqt = new SMARTSQueryTool("C");
                } catch (CDKException e) {
                    System.out.println(e);
                }
            }
            static SmilesParser sp = new SmilesParser(DefaultChemObjectBuilder.getInstance());

            public Boolean exec(Tuple tuple) throws IOException {
                if (tuple == null || tuple.size() < 2) return false;
                String target = (String) tuple.get(0);
                String query = (String) tuple.get(1);
                try {
                    sqt.setSmarts(query);
                    IAtomContainer mol = sp.parseSmiles(target);
                    return sqt.matches(mol);
                } catch (CDKException e) {
                    throw WrappedIOException.wrap("Error in SMARTS pattern or SMILES string "+query, e);
                }
            }
        }
  • 16. Working on top of Hadoop
    •  Hadoop doesn’t know anything about cheminformatics
        –  Need to write your own code, UDFs, etc.
    •  But application layers have been developed for other purposes
        –  Apache Mahout: a library for machine learning on data stored in Hadoop clusters
        –  Possible to build virtual screening pipelines based on the Hadoop framework
  • 17. What Hadoop is not for
    •  Doesn’t replace an actual database
    •  It’s not uniformly fast or efficient
    •  Not good for ad hoc or realtime analysis
    •  Not effective unless dealing with massive datasets
    •  Not all algorithms are amenable to the map-reduce method
        –  CPU bound methods and those requiring communication
  • 18. Cheminformatics on Hadoop
    •  Hadoop and Atom Counting
    •  Hadoop and SD Files
    •  Cheminformatics, Hadoop and EC2
    •  Pig and Cheminformatics
    But are cheminformatics problems really big enough to justify all of this?
  • 19. How big is big?
    •  Bryk et al performed a LBVS of 5 million compounds to identify PKR inhibitors
        –  Pharmacophore fingerprints + perceptron
        –  Required conformer generation
    •  Given that conformer and descriptor generation are one-time tasks, screening 5M compounds doesn’t take long
    •  Example: RF models built on 512-bit binary fingerprints give us predictions for 5M fingerprints in 12 min [single core, 3 GHz Xeon, OS X 10.6.8]
  • 20. Going beyond chunking?
    •  All the preceding use cases are embarrassingly parallel
        –  Chunking the input data and applying the same operation to each chunk
        –  Very nice when you have a big cluster
    Are there algorithms in cheminformatics that can employ map-reduce at the algorithmic level?
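The chunking pattern described on this slide can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not the speaker's code: the class `ChunkedScreen` is hypothetical, and the `Predicate` stands in for whatever per-molecule operation is being applied (a substructure match, a model prediction, and so on). Each chunk is processed independently, which is what makes the workload embarrassingly parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration class for chunked, embarrassingly parallel screening.
class ChunkedScreen {

    // Split the input into consecutive chunks of at most chunkSize items.
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += chunkSize) {
            chunks.add(items.subList(i, Math.min(i + chunkSize, items.size())));
        }
        return chunks;
    }

    // Apply the same filter to every chunk. Chunks do not depend on each other,
    // so they can be processed in parallel; results are concatenated in input order.
    static <T> List<T> screen(List<T> items, int chunkSize, Predicate<T> keep) {
        return chunk(items, chunkSize).parallelStream()
                .map(c -> c.stream().filter(keep).collect(Collectors.toList()))
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }
}
```

For example, `ChunkedScreen.screen(smilesList, 1000, s -> s.contains("N"))` keeps the entries containing the character N, using a plain string test as a crude stand-in for a real substructure query. On a cluster, each chunk would be a separate job or map task rather than a thread.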
  • 21. Going beyond chunking?
    •  Applications that make use of pairwise (or higher order) calculations could benefit from a map-reduce incarnation
        –  Doesn’t always avoid the O(N^2) barrier
        –  Bioisostere identification is one case that could be rephrased as a map-reduce problem
    •  Search algorithms such as GAs, particle swarms can make use of map-reduce
        –  GA based docking
        –  Feature selection for QSAR models
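As one way to picture a pairwise calculation in map-reduce terms, the sketch below treats each row of the similarity matrix as an independent map task that emits the pairs exceeding a cutoff. It uses the Tanimoto coefficient on `java.util.BitSet` fingerprints. The class `PairwiseTanimoto`, the helper `fp`, and the cutoff-based output are assumptions chosen for illustration. Note that the total work is still N(N-1)/2 comparisons; the rows are only distributed, which matches the slide's caveat about the O(N^2) barrier.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical illustration class for row-wise pairwise similarity.
class PairwiseTanimoto {

    // Convenience helper (assumption): build a fingerprint from its set bit positions.
    static BitSet fp(int... bits) {
        BitSet b = new BitSet();
        for (int bit : bits) b.set(bit);
        return b;
    }

    // Tanimoto coefficient: |A and B| / |A or B|, defined here as 0 for two empty fingerprints.
    static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        return or.cardinality() == 0 ? 0.0 : (double) and.cardinality() / or.cardinality();
    }

    // "Map" task for row i: compare fingerprint i with every j > i and
    // emit the index pair {i, j} when the similarity reaches the cutoff.
    static List<int[]> mapRow(List<BitSet> fps, int i, double cutoff) {
        List<int[]> out = new ArrayList<>();
        for (int j = i + 1; j < fps.size(); j++) {
            if (tanimoto(fps.get(i), fps.get(j)) >= cutoff) {
                out.add(new int[] {i, j});
            }
        }
        return out;
    }

    // Driver: rows are independent, so they can run in parallel; the "reduce"
    // here is simply the concatenation of the emitted pairs.
    static List<int[]> similarPairs(List<BitSet> fps, double cutoff) {
        return IntStream.range(0, fps.size()).parallel().boxed()
                .flatMap(i -> mapRow(fps, i, cutoff).stream())
                .collect(Collectors.toList());
    }
}
```

A more elaborate reduce step could aggregate per molecule instead, for example keeping the nearest neighbour of each compound.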
  • 22. Going beyond chunking?
    •  Machine learning for massive chemical datasets?
        –  MR jobs (descriptor generation) + Mahout (model building) lets us handle this in a straightforward manner
    •  But will QSAR models benefit from more data?
        –  Helgee et al suggest global models are preferable
        –  But diversity and the structure of the chemical space will affect performance of global models
        –  Unsupervised methods may be more relevant
        –  Philosophical question?
  • 23. Going beyond chunking?
    •  Many clustering algorithms are amenable to map-reduce style
        –  K-means, Spectral, EM, minhash, …
        –  Many are implemented in Mahout
    Problems where we generate large numbers of combinations can be amenable to map-reduce
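K-means is a convenient example of why clustering fits the map-reduce style: one iteration is a map step (assign each point to its nearest centroid) followed by a reduce step (average the points assigned to each centroid). The sketch below shows a single iteration in plain Java. It is an illustration only and does not reflect how Mahout implements k-means; the class `KMeansStep` and the choice to leave an empty cluster's centroid unchanged are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration class: one k-means iteration as map + reduce.
class KMeansStep {

    // Index of the centroid closest to p (squared Euclidean distance).
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int k = 0; k < centroids.length; k++) {
            double dist = 0.0;
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - centroids[k][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = k;
            }
        }
        return best;
    }

    static double[][] iterate(List<double[]> points, double[][] centroids) {
        // "Map": key every point by the index of its nearest centroid.
        Map<Integer, List<double[]>> groups = points.stream()
                .collect(Collectors.groupingBy(p -> nearest(p, centroids)));

        // "Reduce": the new centroid for each key is the mean of its points.
        double[][] updated = new double[centroids.length][];
        for (int k = 0; k < centroids.length; k++) {
            List<double[]> members = groups.get(k);
            if (members == null) {
                updated[k] = centroids[k].clone(); // empty cluster: keep old centroid
                continue;
            }
            double[] mean = new double[centroids[k].length];
            for (double[] p : members) {
                for (int d = 0; d < mean.length; d++) mean[d] += p[d];
            }
            for (int d = 0; d < mean.length; d++) mean[d] /= members.size();
            updated[k] = mean;
        }
        return updated;
    }
}
```

Repeating `iterate` until the centroids stop moving gives the usual algorithm; in a distributed setting each iteration would be one map-reduce job over the full dataset.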
  • 24. Networks & integration
    •  Network models of molecules and targets are common
        –  Allows for the incorporation of lots of associated information
        –  Diseases, pathways, OTE’s, …
    [Figure credit: Yildirim, M.A. et al]
    •  When linked with clinical data & outcomes, we can generate massive networks
        –  Adverse events (FDA AERS)
        –  Analysis by Cloudera considered > 10E6 drug-drug-reaction triples
  • 25. Networks & integration
    •  SAR data can be viewed in a network form
        –  SALI, SARI based networks
        –  Usually requires pairwise calculations of the metric
    [Figure credits: Peltason, L et al; http://sali.rguha.net/]
    •  Current studies have focused on small datasets (< 1000 molecules)
    •  Hadoop + Giraph could let us apply this to HTS-scale datasets
  • 26. Networks & integration
    •  When we apply a network view we can consider many interesting applications & make use of cloud scale infrastructure
        –  Network based similarity
        –  Community detection (aka clustering)
        –  PageRank style ranking (of targets, compounds, …)
        –  Generate network metrics, which can be used as input to predictive models (for interactions, effects, …)
    [Figure credit: Bauer-Mehren et al]
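The "PageRank style ranking" item can be illustrated with a small power-iteration sketch in plain Java. The graph encoding (an array of outgoing-link lists), the class name `PageRankSketch`, and the handling of nodes with no outgoing links (their rank is spread evenly over all nodes) are assumptions made for this example. Each iteration has a map-reduce shape: every node emits a share of its rank to its targets, and the shares are summed per target.

```java
import java.util.Arrays;

// Hypothetical illustration class: PageRank by power iteration on a small graph.
class PageRankSketch {

    // out[i] lists the nodes that node i links to.
    static double[] pageRank(int[][] out, double damping, int iterations) {
        int n = out.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0;

            // "Map": each node emits an equal share of its rank to every target.
            for (int i = 0; i < n; i++) {
                if (out[i].length == 0) {
                    dangling += rank[i]; // no outgoing links: redistribute evenly
                    continue;
                }
                double share = rank[i] / out[i].length;
                for (int j : out[i]) next[j] += share; // "Reduce": sum per target
            }

            // Apply damping and add the evenly spread dangling mass.
            for (int j = 0; j < n; j++) {
                next[j] = (1.0 - damping) / n + damping * (next[j] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }
}
```

With a three-node graph where nodes 0 and 1 both link to node 2 and node 2 links back to node 0, `PageRankSketch.pageRank(new int[][] {{2}, {2}, {0}}, 0.85, 50)` ranks node 2 highest and node 1, which has no incoming links, lowest. In a compound-target network the nodes would be compounds or targets and the edges their measured or predicted interactions.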
  • 27. Conclusions
    •  Cheminformatics applications can be rewritten to take advantage of cloud resources
        –  Remotely hosted
        –  Embarrassingly parallel / chunked
        –  Map/reduce
    •  Ability to process larger structure collections lets us explore more chemical space
    •  Integrating chemistry with clinical & pharmacological data can lead to big datasets
  • 28. Conclusions
    •  Q: But are cheminformatics problems really big enough to justify all of this?
    •  A: Yes: virtual libraries, integrating chemical structure with other types and scales of data
    •  Q: Are there algorithms in cheminformatics that can employ map-reduce at the algorithmic level?
    •  A: Yes, especially when we consider problems with a combinatorial flavor