SlideShare une entreprise Scribd logo
1  sur  28
Télécharger pour lire hors ligne
Lecture	
  6:	
  Parallel	
  compu0ng,	
  cloud	
  
compu0ng	
  and	
  working	
  on	
  Amazon	
  
                  Web	
  Services	
  


               Greg	
  Caporaso	
  
          gregcaporaso@gmail.com	
  
Some	
  last	
  thoughts	
  on	
  regular	
  
           expressions	
  	
  
Robust	
  searches	
  
•  Some0mes	
  your	
  queries	
  will	
  fail	
  
    –  Won’t	
  produce	
  output	
  (good)	
  
    –  Will	
  produce	
  incorrect	
  output	
  (bad)	
  
•  Fail	
  loudly!	
  Produce	
  a	
  (useful)	
  error	
  message	
  
   on	
  failure.	
  	
  
Designing	
  robust	
  searches	
  
                                         	
  
•  Make	
  assump0ons	
  explicit	
  
    –  If	
  you’re	
  assuming	
  that	
  your	
  records	
  start	
  with	
  ‘>’,	
  
       search	
  for	
  ‘^>’	
  to	
  avoid	
  matching	
  ‘>’	
  characters	
  
       that	
  show	
  up	
  in	
  other	
  places	
  
•  Match	
  full	
  lines	
  by	
  including	
  ^	
  and	
  $	
  in	
  your	
  
   search	
  query	
  
•  Check	
  the	
  number	
  of	
  matches	
  that	
  were	
  
   made:	
  is	
  it	
  reasonable?	
  	
  
Tes0ng	
  of	
  soWware	
  
•  Start	
  thinking	
  about	
  what	
  posi0ve	
  and	
  
   nega0ve	
  controls	
  for	
  these	
  terms	
  might	
  look	
  
   like.	
  SoWware	
  tes0ng	
  is	
  something	
  we’ll	
  be	
  
   discussing	
  regularly	
  through-­‐out	
  the	
  semester.	
  
Why	
  is	
  parallel	
  compu0ng	
  important	
  in	
  
                 bioinforma0cs?	
  
Cluster	
  compu0ng	
  
•  Many	
  computers	
  connected	
  to	
  one	
  another	
  to	
  
   serve	
  as	
  a	
  larger	
  compute	
  resource.	
  
•  Compute-­‐intensive	
  jobs	
  can	
  be	
  split	
  over	
  
   many	
  systems	
  and	
  run	
  in	
  parallel.	
  	
  
•  Similar	
  to	
  desktop	
  compute	
  hardware,	
  but	
  
   different	
  casing,	
  no	
  (or	
  only	
  few)	
  displays/
   keyboards	
  directly	
  connected.	
  
•  Owned	
  and	
  maintained	
  “in-­‐house”.	
  
Why	
  is	
  parallel	
  compu0ng	
  important	
  in	
  
                  bioinforma0cs?	
  
                                                              454	
        Illumina	
  Genome	
              Illumina	
  
          Pla$orm	
                   Sanger	
                                                                                        Illumina	
  MiSeq	
  
                                                          (Titanium)	
         Analyzer	
  II	
             HiSeq2000	
  

                                                                                                                                      150	
  (single	
  end);	
  
  Read	
  Length	
  (bases)	
          ~1000	
               ~400	
         150	
  (single	
  end)	
     100	
  (single	
  end)	
  
                                                                                                                                         250	
  soon	
  


    Number	
  of	
  reads	
         96	
  or	
  384	
     ~1,000,000	
       ~100,000,000	
              ~1,600,000,000	
                ~10,000,000	
  


 Maximum	
  number	
  of	
                                                 12,000	
  (barcode-­‐         24,000	
  (barcode-­‐         2500	
  (barcode-­‐
                                         n/a	
               1000	
  
  samples	
  per	
  run	
                                                      limited)	
                    limited)	
                   limited)	
  


   Sequences	
  per	
  $1	
  
                                        0.44	
                100	
                 5000	
                     200,000	
                     12,500	
  
(sequencing	
  costs	
  only)	
  
The	
  “benchtop”	
  sequencer	
  
OTU	
  picking:	
  example	
  of	
  a	
  compute	
  
                   intensive	
  process	
  

 What taxa are represented in
 each sample?
                                                                        Reference tree
                                                                      of non-redundant
                                                                    full length sequences

>PC.634_1 FLP3FBN01ELBSX
CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTTACCCTCTCAGGCCGG
CTACGCATCATCGCCTTGGTGGGCCGTTACCTCACCAACTAGCTAATG
CGCCGCAGGTCCATCCATGTTCACGCCTTGATGGGCGCTTTAATATAC
TGAGCATGCGCTCTGTATACCTATCCGGTTTTAGCTACCGTTTCCAGC
AGTTATCCCGGACACATGGGCTAGG
>PC.634_2 FLP3FBN01EG8AX
TTGGACCGTGTCTCAGTTCCAATGTGGGGGCCTTCCTCTCAGAACCCC
TATCCATCGAAGGCTTGGTGGGCCGTTACCCCGCCAACAACCTAATGG
AACGCATCCCCATCGATGACCGAAGTTCTTTAATAGTTCTACCATGCG
GAAGAACTATGCCATCGGGTATTAATCTTTCTTTCGAAAGGCTATCCC
                                                   BLAST against
CGAGTCATCGGCAGGTTGGATACGTGTTACTCACCCGTGCGCCGGT
>PC.354_3 FLP3FBN01EEWKD
                                                   reference tree
TTGGGCCGTGTCTCAGTCCCAATGTGGCCGATCAGTCTCTTAACTCGG
CTATGCATCATTGCCTTGGTAAGCCGTTACCTTACCAACTAGCTAATG
CACCGCAGGTCCATCCAAGAGTGATAGCAGAACCATCTTTCAAACTCT
AGACATGCGTCTAGTGTTGTTATCCGGTATTAGCATCTGTTTCCAGGT
GTTATCCCAGTCTCTTGGG
...
OTU	
  picking:	
  example	
  of	
  a	
  compute	
  
              intensive	
  process	
  

 What taxa are represented in                                           Reference tree
                                                                      of non-redundant
 each sample?                                                       full length sequences

>PC.634_1 FLP3FBN01ELBSX
CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTTACCCTCTCAGGCCGG
CTACGCATCATCGCCTTGGTGGGCCGTTACCTCACCAACTAGCTAATG
CGCCGCAGGTCCATCCATGTTCACGCCTTGATGGGCGCTTTAATATAC
TGAGCATGCGCTCTGTATACCTATCCGGTTTTAGCTACCGTTTCCAGC
AGTTATCCCGGACACATGGGCTAGG
>PC.634_2 FLP3FBN01EG8AX
TTGGACCGTGTCTCAGTTCCAATGTGGGGGCCTTCCTCTCAGAACCCC
TATCCATCGAAGGCTTGGTGGGCCGTTACCCCGCCAACAACCTAATGG
AACGCATCCCCATCGATGACCGAAGTTCTTTAATAGTTCTACCATGCG
GAAGAACTATGCCATCGGGTATTAATCTTTCTTTCGAAAGGCTATCCC   BLAST against
CGAGTCATCGGCAGGTTGGATACGTGTTACTCACCCGTGCGCCGGT
>PC.354_3 FLP3FBN01EEWKD                           reference tree
TTGGGCCGTGTCTCAGTCCCAATGTGGCCGATCAGTCTCTTAACTCGG
CTATGCATCATTGCCTTGGTAAGCCGTTACCTTACCAACTAGCTAATG
CACCGCAGGTCCATCCAAGAGTGATAGCAGAACCATCTTTCAAACTCT
AGACATGCGTCTAGTGTTGTTATCCGGTATTAGCATCTGTTTCCAGGT
GTTATCCCAGTCTCTTGGG
...




              Clusters of “Operational Taxonomic Units” (OTUs);
                       Per sample hits on reference tree;
                            Taxonomic assignments
OTU	
  Picking	
  	
  
•  For	
  1	
  billion	
  sequence	
  reads,	
  the	
  ini0al	
  step	
  
   ran	
  for	
  ~116	
  hours	
  on	
  110	
  processors	
  
   requiring	
  4GB	
  of	
  RAM	
  per	
  job	
  for	
  workers	
  and	
  
   64GB	
  of	
  RAM	
  for	
  master.	
  
OTU	
  Picking	
  	
  
•  For	
  1	
  billion	
  sequence	
  reads,	
  the	
  ini0al	
  step	
  
   ran	
  for	
  ~116	
  hours	
  on	
  110	
  processors	
  
   requiring	
  4GB	
  of	
  RAM	
  per	
  job	
  for	
  workers	
  and	
  
   64GB	
  of	
  RAM	
  for	
  the	
  master	
  job.	
  
•  So,	
  on	
  a	
  single	
  processor	
  desktop	
  with	
  64GB	
  
   of	
  RAM…	
  12760	
  hours	
  or	
  532	
  days!	
  
OTU	
  Picking	
  	
  
•  For	
  1	
  billion	
  sequence	
  reads,	
  the	
  ini0al	
  step	
  
   ran	
  for	
  ~116	
  hours	
  on	
  110	
  processors	
  
   requiring	
  4GB	
  of	
  RAM	
  per	
  job	
  for	
  workers	
  and	
  
   64GB	
  of	
  RAM	
  for	
  the	
  master	
  job.	
  
•  So,	
  on	
  a	
  single	
  processor	
  desktop	
  with	
  64GB	
  
   of	
  RAM…	
  12760	
  hours	
  or	
  532	
  days!	
  
•  One	
  HiSeq2000	
  generates	
  this	
  data	
  in	
  a	
  week!	
  
Cloud	
  compu0ng	
  
•  Implemented	
  on	
  a	
  cluster	
  (or	
  grid),	
  but	
  
   compute	
  power	
  is	
  rented	
  as	
  a	
  service	
  to	
  
   support	
  arbitrary	
  applica0ons.	
  	
  
Maintaining	
  hardware	
  is	
  expensive	
  
•  Temperature	
  (redundant	
  cooling	
  systems)	
  
•  Redundant	
  network	
  connec0ons	
  
•  Hardware	
  maintenance	
  (e.g.,	
  replacing	
  hard	
  
   drives)	
  
•  Fire	
  suppression	
  
•  Back-­‐up	
  power	
  
•  System	
  administrator	
  ($$)	
  
Pay-­‐as-­‐you-­‐go	
  compute	
  power	
  
•  Public	
  clouds	
  (e.g.,	
  Amazon)	
  rent	
  compute	
  
   resources	
  
•  Log	
  in,	
  boot	
  virtual	
  machine	
  image,	
  run	
  
   analyses,	
  and	
  terminate	
  instance.	
  
•  Cheaper	
  for	
  many	
  tasks	
  than	
  buying,	
  
   maintaining,	
  and	
  suppor0ng	
  a	
  compute	
  
   cluster.	
  
Types	
  of	
  cloud	
  offerings	
  
•  Applica0ons/SaaS	
  (e.g.,	
  Google	
  Docs,	
  gmail,	
  
   Dropbox,	
  iCloud)	
  
•  Compu0ng	
  planorm/PaaS	
  (e.g.,	
  Google	
  App	
  
   Engine)	
  
•  Raw	
  compute	
  resources/IaaS	
  (e.g.,	
  Amazon	
  
   Elas0c	
  Compute	
  Cloud	
  (EC2))	
  
Cloud	
  compu0ng	
  op0ons	
  
•  Amazon	
  Elas0c	
  Compute	
  Cloud	
  (EC2)	
  
•  Magellan	
  –	
  Argonne's	
  DOE	
  Cloud	
  Compu0ng	
  
•  Data	
  Intensive	
  Academic	
  Grid	
  (DIAG)	
  –	
  	
  	
  
   Ins0tute	
  for	
  Genome	
  Sciences	
  (IGS),	
  University	
  
   of	
  Maryland	
  School	
  of	
  Medicine	
  (UMSOM)	
  
Interac0ng	
  with	
  the	
  Amazon	
  Cloud	
  
•  Boot	
  virtual	
  machine	
  image	
  via	
  web	
  interface	
  
   (or	
  a	
  third-­‐party	
  tool	
  like	
  StarCluster).	
  
•  Log	
  in	
  and	
  work	
  via	
  terminal	
  (or	
  via	
  web	
  
   interface	
  with	
  IPython	
  Notebook)	
  
•  Move	
  data	
  back	
  and	
  forth	
  via	
  sWp/scp	
  or	
  a	
  
   graphical	
  sWp	
  client	
  (e.g.,	
  Cyberduck	
  [free/
   cross-­‐planorm])	
  
Virtual	
  machines	
  
•  A	
  “guest”	
  opera0ng	
  system	
  running	
  within	
  a	
  
   “host”	
  opera0ng	
  system	
  
•  A	
  soWware	
  implementa0on	
  of	
  a	
  computer,	
  
   that	
  operates	
  like	
  a	
  physical	
  computer.	
  	
  
•  A	
  developer	
  can	
  create	
  a	
  virtual	
  machine	
  
   image	
  which	
  contains	
  their	
  tools	
  pre-­‐installed.	
  
   Users	
  can	
  then	
  instan)ate	
  that	
  image	
  to	
  work	
  
   with	
  those	
  tools.	
  

                      Browse	
  this	
  page:	
  hvp://en.wikipedia.org/wiki/Virtual_machine	
  
Benefits	
  that	
  virtual	
  machines	
  offer	
  
                bioinforma0cs	
  
•  Reproducibility:	
  can	
  publish	
  protocols	
  with	
  a	
  
   virtual	
  machine	
  instance	
  id.	
  
•  Updates	
  are	
  burden	
  of	
  developer,	
  not	
  user.	
  
•  Coupled	
  with	
  cloud	
  compu0ng,	
  it’s	
  the	
  perfect	
  
   model	
  for	
  users	
  with	
  sporadic	
  compute	
  needs.	
  
EC2	
  costs:	
  www.ec2instances.info	
  
QIIME	
  virtual	
  machine	
  
•  The	
  QIIME	
  package	
  distributes	
  an	
  EC2	
  virtual	
  
   machine	
  with	
  QIIME	
  and	
  its	
  (many)	
  dependencies	
  
   pre-­‐installed.	
  
•  Dependencies	
  include	
  commonly	
  used	
  tools	
  like	
  
   BLAST,	
  muscle,	
  FastTree,	
  uclust,	
  IPython,	
  and	
  a	
  
   lot	
  more.	
  A	
  par0al	
  list	
  is	
  available	
  here:	
  
   hvp://qiime.org/install/install.html	
  
•  Latest	
  machine	
  iden0fier	
  can	
  always	
  be	
  found	
  at:	
  
   hvp://qiime.org/home_sta0c/dataFiles.html	
  
I	
  think	
  there	
  is	
  a	
  world	
  market	
  for	
  
maybe	
  five	
  computers.	
  	
  
	
  -­‐	
  Thomas	
  Watson,	
  IBM	
  Founder,	
  1943	
  
I	
  think	
  there	
  is	
  a	
  world	
  market	
  for	
  
maybe	
  five	
  computers.	
  	
  
	
  -­‐	
  Thomas	
  Watson,	
  IBM	
  Founder,	
  1943	
  
                                Units sold by year




                                 All	
  figures	
  are	
  in	
  units	
  of	
  1000.	
  
                                 hvp://jeremyreimer.com/postman/node/329	
  
The	
  democra0za0on	
  of	
  DNA	
  sequencing	
  



                      +	
                    +	
  

  Affordable	
  	
               Cloud	
        Open-­‐source	
  
  sequencing	
                compu0ng	
        soWware	
  
This	
  work	
  is	
  licensed	
  under	
  the	
  Crea0ve	
  Commons	
  Avribu0on	
  3.0	
  United	
  States	
  License.	
  To	
  view	
  a	
  
copy	
  of	
  this	
  license,	
  visit	
  
hvp://crea0vecommons.org/licenses/by/3.0/us/	
  or	
  send	
  a	
  lever	
  to	
  Crea0ve	
  Commons,	
  171	
  
Second	
  Street,	
  Suite	
  300,	
  San	
  Francisco,	
  California,	
  94105,	
  USA.	
  
	
  
Feel	
  free	
  to	
  use	
  or	
  modify	
  these	
  slides,	
  but	
  please	
  credit	
  me	
  by	
  placing	
  the	
  following	
  avribu0on	
  
informa0on	
  where	
  you	
  feel	
  that	
  it	
  makes	
  sense:	
  Greg	
  Caporaso,	
  www.caporaso.us.	
  	
  

Contenu connexe

Tendances

MongoDB Tokyo - Monitoring and Queueing
MongoDB Tokyo - Monitoring and QueueingMongoDB Tokyo - Monitoring and Queueing
MongoDB Tokyo - Monitoring and QueueingBoxed Ice
 
Initiation concrète-à-la-virtualisation-devoxx-fr-2021
Initiation concrète-à-la-virtualisation-devoxx-fr-2021Initiation concrète-à-la-virtualisation-devoxx-fr-2021
Initiation concrète-à-la-virtualisation-devoxx-fr-2021Pierre-Antoine Grégoire
 
MongoDB - Monitoring & queueing
MongoDB - Monitoring & queueingMongoDB - Monitoring & queueing
MongoDB - Monitoring & queueingBoxed Ice
 
marko_go_in_badoo
marko_go_in_badoomarko_go_in_badoo
marko_go_in_badooMarko Kevac
 
PL/Perl - New Features in PostgreSQL 9.0
PL/Perl - New Features in PostgreSQL 9.0PL/Perl - New Features in PostgreSQL 9.0
PL/Perl - New Features in PostgreSQL 9.0Tim Bunce
 
(PFC302) Performance Benchmarking on AWS | AWS re:Invent 2014
(PFC302) Performance Benchmarking on AWS | AWS re:Invent 2014(PFC302) Performance Benchmarking on AWS | AWS re:Invent 2014
(PFC302) Performance Benchmarking on AWS | AWS re:Invent 2014Amazon Web Services
 
NoSQL 동향
NoSQL 동향NoSQL 동향
NoSQL 동향NAVER D2
 
JavaOne 2012 - JVM JIT for Dummies
JavaOne 2012 - JVM JIT for DummiesJavaOne 2012 - JVM JIT for Dummies
JavaOne 2012 - JVM JIT for DummiesCharles Nutter
 
DTrace - Miracle Scotland Database Forum
DTrace - Miracle Scotland Database ForumDTrace - Miracle Scotland Database Forum
DTrace - Miracle Scotland Database ForumDoug Burns
 
Down to Stack Traces, up from Heap Dumps
Down to Stack Traces, up from Heap DumpsDown to Stack Traces, up from Heap Dumps
Down to Stack Traces, up from Heap DumpsAndrei Pangin
 

Tendances (10)

MongoDB Tokyo - Monitoring and Queueing
MongoDB Tokyo - Monitoring and QueueingMongoDB Tokyo - Monitoring and Queueing
MongoDB Tokyo - Monitoring and Queueing
 
Initiation concrète-à-la-virtualisation-devoxx-fr-2021
Initiation concrète-à-la-virtualisation-devoxx-fr-2021Initiation concrète-à-la-virtualisation-devoxx-fr-2021
Initiation concrète-à-la-virtualisation-devoxx-fr-2021
 
MongoDB - Monitoring & queueing
MongoDB - Monitoring & queueingMongoDB - Monitoring & queueing
MongoDB - Monitoring & queueing
 
marko_go_in_badoo
marko_go_in_badoomarko_go_in_badoo
marko_go_in_badoo
 
PL/Perl - New Features in PostgreSQL 9.0
PL/Perl - New Features in PostgreSQL 9.0PL/Perl - New Features in PostgreSQL 9.0
PL/Perl - New Features in PostgreSQL 9.0
 
(PFC302) Performance Benchmarking on AWS | AWS re:Invent 2014
(PFC302) Performance Benchmarking on AWS | AWS re:Invent 2014(PFC302) Performance Benchmarking on AWS | AWS re:Invent 2014
(PFC302) Performance Benchmarking on AWS | AWS re:Invent 2014
 
NoSQL 동향
NoSQL 동향NoSQL 동향
NoSQL 동향
 
JavaOne 2012 - JVM JIT for Dummies
JavaOne 2012 - JVM JIT for DummiesJavaOne 2012 - JVM JIT for Dummies
JavaOne 2012 - JVM JIT for Dummies
 
DTrace - Miracle Scotland Database Forum
DTrace - Miracle Scotland Database ForumDTrace - Miracle Scotland Database Forum
DTrace - Miracle Scotland Database Forum
 
Down to Stack Traces, up from Heap Dumps
Down to Stack Traces, up from Heap DumpsDown to Stack Traces, up from Heap Dumps
Down to Stack Traces, up from Heap Dumps
 

Similaire à Lecture6.pptx

InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxData
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014P. Taylor Goetz
 
Real-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormReal-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormJohn Georgiadis
 
Super scaling singleton inserts
Super scaling singleton insertsSuper scaling singleton inserts
Super scaling singleton insertsChris Adkin
 
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a LaptopProject Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a LaptopDatabricks
 
FPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow SwitchFPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow SwitchYutaka Yasuda
 
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Altinity Ltd
 
Columnar processing for SQL-on-Hadoop: The best is yet to come
Columnar processing for SQL-on-Hadoop: The best is yet to comeColumnar processing for SQL-on-Hadoop: The best is yet to come
Columnar processing for SQL-on-Hadoop: The best is yet to comeWang Zuo
 
Performance and predictability (1)
Performance and predictability (1)Performance and predictability (1)
Performance and predictability (1)RichardWarburton
 
Performance and Predictability - Richard Warburton
Performance and Predictability - Richard WarburtonPerformance and Predictability - Richard Warburton
Performance and Predictability - Richard WarburtonJAXLondon2014
 
Discovery and annotation of variants by exome analysis using NGS
Discovery and annotation of variants by exome analysis using NGSDiscovery and annotation of variants by exome analysis using NGS
Discovery and annotation of variants by exome analysis using NGScursoNGS
 
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New FeaturesAmazon Web Services
 
Analyze database system using a 3 d method
Analyze database system using a 3 d methodAnalyze database system using a 3 d method
Analyze database system using a 3 d methodAjith Narayanan
 
Oracle Performance Tuning DE(v1.2)-part2.ppt
Oracle Performance Tuning DE(v1.2)-part2.pptOracle Performance Tuning DE(v1.2)-part2.ppt
Oracle Performance Tuning DE(v1.2)-part2.pptVenugopalChattu1
 
Galaxy RNA-Seq Analysis: Tuxedo Protocol
Galaxy RNA-Seq Analysis: Tuxedo ProtocolGalaxy RNA-Seq Analysis: Tuxedo Protocol
Galaxy RNA-Seq Analysis: Tuxedo ProtocolHong ChangBum
 
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break GlassCassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glassaaronmorton
 

Similaire à Lecture6.pptx (20)

InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014
 
Real-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormReal-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and Storm
 
Super scaling singleton inserts
Super scaling singleton insertsSuper scaling singleton inserts
Super scaling singleton inserts
 
Final_Presentation_Docker_KP
Final_Presentation_Docker_KPFinal_Presentation_Docker_KP
Final_Presentation_Docker_KP
 
Storm Anatomy
Storm AnatomyStorm Anatomy
Storm Anatomy
 
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a LaptopProject Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
 
FPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow SwitchFPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow Switch
 
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
 
Columnar processing for SQL-on-Hadoop: The best is yet to come
Columnar processing for SQL-on-Hadoop: The best is yet to comeColumnar processing for SQL-on-Hadoop: The best is yet to come
Columnar processing for SQL-on-Hadoop: The best is yet to come
 
Performance and predictability (1)
Performance and predictability (1)Performance and predictability (1)
Performance and predictability (1)
 
Performance and Predictability - Richard Warburton
Performance and Predictability - Richard WarburtonPerformance and Predictability - Richard Warburton
Performance and Predictability - Richard Warburton
 
Discovery and annotation of variants by exome analysis using NGS
Discovery and annotation of variants by exome analysis using NGSDiscovery and annotation of variants by exome analysis using NGS
Discovery and annotation of variants by exome analysis using NGS
 
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
 
Analyze database system using a 3 d method
Analyze database system using a 3 d methodAnalyze database system using a 3 d method
Analyze database system using a 3 d method
 
Blast fasta 4
Blast fasta 4Blast fasta 4
Blast fasta 4
 
Oracle Performance Tuning DE(v1.2)-part2.ppt
Oracle Performance Tuning DE(v1.2)-part2.pptOracle Performance Tuning DE(v1.2)-part2.ppt
Oracle Performance Tuning DE(v1.2)-part2.ppt
 
Serial-War
Serial-WarSerial-War
Serial-War
 
Galaxy RNA-Seq Analysis: Tuxedo Protocol
Galaxy RNA-Seq Analysis: Tuxedo ProtocolGalaxy RNA-Seq Analysis: Tuxedo Protocol
Galaxy RNA-Seq Analysis: Tuxedo Protocol
 
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break GlassCassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
 

Dernier

Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxannathomasp01
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfDr Vijay Vishwakarma
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Pooja Bhuva
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxPooja Bhuva
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 

Dernier (20)

Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 

Lecture6.pptx

  • 1. Lecture  6:  Parallel  compu0ng,  cloud   compu0ng  and  working  on  Amazon   Web  Services   Greg  Caporaso   gregcaporaso@gmail.com  
  • 2. Some  last  thoughts  on  regular   expressions    
  • 3. Robust  searches   •  Some0mes  your  queries  will  fail   –  Won’t  produce  output  (good)   –  Will  produce  incorrect  output  (bad)   •  Fail  loudly!  Produce  a  (useful)  error  message   on  failure.    
  • 4. Designing  robust  searches     •  Make  assump0ons  explicit   –  If  you’re  assuming  that  your  records  start  with  ‘>’,   search  for  ‘^>’  to  avoid  matching  ‘>’  characters   that  show  up  in  other  places   •  Match  full  lines  by  including  ^  and  $  in  your   search  query   •  Check  the  number  of  matches  that  were   made:  is  it  reasonable?    
  • 5. Tes0ng  of  soWware   •  Start  thinking  about  what  posi0ve  and   nega0ve  controls  for  these  terms  might  look   like.  SoWware  tes0ng  is  something  we’ll  be   discussing  regularly  through-­‐out  the  semester.  
  • 6. Why  is  parallel  compu0ng  important  in   bioinforma0cs?  
  • 7. Cluster  compu0ng   •  Many  computers  connected  to  one  another  to   serve  as  a  larger  compute  resource.   •  Compute-­‐intensive  jobs  can  be  split  over   many  systems  and  run  in  parallel.     •  Similar  to  desktop  compute  hardware,  but   different  casing,  no  (or  only  few)  displays/ keyboards  directly  connected.   •  Owned  and  maintained  “in-­‐house”.  
  • 8. Why  is  parallel  compu0ng  important  in   bioinforma0cs?   454   Illumina  Genome   Illumina   Pla$orm   Sanger   Illumina  MiSeq   (Titanium)   Analyzer  II   HiSeq2000   150  (single  end);   Read  Length  (bases)   ~1000   ~400   150  (single  end)   100  (single  end)   250  soon   Number  of  reads   96  or  384   ~1,000,000   ~100,000,000   ~1,600,000,000   ~10,000,000   Maximum  number  of   12,000  (barcode-­‐ 24,000  (barcode-­‐ 2500  (barcode-­‐ n/a   1000   samples  per  run   limited)   limited)   limited)   Sequences  per  $1   0.44   100   5000   200,000   12,500   (sequencing  costs  only)  
  • 10. OTU  picking:  example  of  a  compute   intensive  process   What taxa are represented in each sample? Reference tree of non-redundant full length sequences >PC.634_1 FLP3FBN01ELBSX CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTTACCCTCTCAGGCCGG CTACGCATCATCGCCTTGGTGGGCCGTTACCTCACCAACTAGCTAATG CGCCGCAGGTCCATCCATGTTCACGCCTTGATGGGCGCTTTAATATAC TGAGCATGCGCTCTGTATACCTATCCGGTTTTAGCTACCGTTTCCAGC AGTTATCCCGGACACATGGGCTAGG >PC.634_2 FLP3FBN01EG8AX TTGGACCGTGTCTCAGTTCCAATGTGGGGGCCTTCCTCTCAGAACCCC TATCCATCGAAGGCTTGGTGGGCCGTTACCCCGCCAACAACCTAATGG AACGCATCCCCATCGATGACCGAAGTTCTTTAATAGTTCTACCATGCG GAAGAACTATGCCATCGGGTATTAATCTTTCTTTCGAAAGGCTATCCC BLAST against CGAGTCATCGGCAGGTTGGATACGTGTTACTCACCCGTGCGCCGGT >PC.354_3 FLP3FBN01EEWKD reference tree TTGGGCCGTGTCTCAGTCCCAATGTGGCCGATCAGTCTCTTAACTCGG CTATGCATCATTGCCTTGGTAAGCCGTTACCTTACCAACTAGCTAATG CACCGCAGGTCCATCCAAGAGTGATAGCAGAACCATCTTTCAAACTCT AGACATGCGTCTAGTGTTGTTATCCGGTATTAGCATCTGTTTCCAGGT GTTATCCCAGTCTCTTGGG ...
  • 11. OTU  picking:  example  of  a  compute   intensive  process   What taxa are represented in Reference tree of non-redundant each sample? full length sequences >PC.634_1 FLP3FBN01ELBSX CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTTACCCTCTCAGGCCGG CTACGCATCATCGCCTTGGTGGGCCGTTACCTCACCAACTAGCTAATG CGCCGCAGGTCCATCCATGTTCACGCCTTGATGGGCGCTTTAATATAC TGAGCATGCGCTCTGTATACCTATCCGGTTTTAGCTACCGTTTCCAGC AGTTATCCCGGACACATGGGCTAGG >PC.634_2 FLP3FBN01EG8AX TTGGACCGTGTCTCAGTTCCAATGTGGGGGCCTTCCTCTCAGAACCCC TATCCATCGAAGGCTTGGTGGGCCGTTACCCCGCCAACAACCTAATGG AACGCATCCCCATCGATGACCGAAGTTCTTTAATAGTTCTACCATGCG GAAGAACTATGCCATCGGGTATTAATCTTTCTTTCGAAAGGCTATCCC BLAST against CGAGTCATCGGCAGGTTGGATACGTGTTACTCACCCGTGCGCCGGT >PC.354_3 FLP3FBN01EEWKD reference tree TTGGGCCGTGTCTCAGTCCCAATGTGGCCGATCAGTCTCTTAACTCGG CTATGCATCATTGCCTTGGTAAGCCGTTACCTTACCAACTAGCTAATG CACCGCAGGTCCATCCAAGAGTGATAGCAGAACCATCTTTCAAACTCT AGACATGCGTCTAGTGTTGTTATCCGGTATTAGCATCTGTTTCCAGGT GTTATCCCAGTCTCTTGGG ... Clusters of “Operational Taxonomic Units” (OTUs); Per sample hits on reference tree; Taxonomic assignments
  • 12. OTU  Picking     •  For  1  billion  sequence  reads,  the  ini0al  step   ran  for  ~116  hours  on  110  processors   requiring  4GB  of  RAM  per  job  for  workers  and   64GB  of  RAM  for  master.  
  • 13. OTU  Picking     •  For  1  billion  sequence  reads,  the  ini0al  step   ran  for  ~116  hours  on  110  processors   requiring  4GB  of  RAM  per  job  for  workers  and   64GB  of  RAM  for  the  master  job.   •  So,  on  a  single  processor  desktop  with  64GB   of  RAM…  12760  hours  or  532  days!  
  • 14. OTU  Picking     •  For  1  billion  sequence  reads,  the  ini0al  step   ran  for  ~116  hours  on  110  processors   requiring  4GB  of  RAM  per  job  for  workers  and   64GB  of  RAM  for  the  master  job.   •  So,  on  a  single  processor  desktop  with  64GB   of  RAM…  12760  hours  or  532  days!   •  One  HiSeq2000  generates  this  data  in  a  week!  
  • 15. Cloud  compu0ng   •  Implemented  on  a  cluster  (or  grid),  but   compute  power  is  rented  as  a  service  to   support  arbitrary  applica0ons.    
  • 16. Maintaining  hardware  is  expensive   •  Temperature  (redundant  cooling  systems)   •  Redundant  network  connec0ons   •  Hardware  maintenance  (e.g.,  replacing  hard   drives)   •  Fire  suppression   •  Back-­‐up  power   •  System  administrator  ($$)  
  • 17. Pay-­‐as-­‐you-­‐go  compute  power   •  Public  clouds  (e.g.,  Amazon)  rent  compute   resources   •  Log  in,  boot  virtual  machine  image,  run   analyses,  and  terminate  instance.   •  Cheaper  for  many  tasks  than  buying,   maintaining,  and  suppor0ng  a  compute   cluster.  
  • 18. Types  of  cloud  offerings   •  Applica0ons/SaaS  (e.g.,  Google  Docs,  gmail,   Dropbox,  iCloud)   •  Compu0ng  planorm/PaaS  (e.g.,  Google  App   Engine)   •  Raw  compute  resources/IaaS  (e.g.,  Amazon   Elas0c  Compute  Cloud  (EC2))  
  • 19. Cloud  compu0ng  op0ons   •  Amazon  Elas0c  Compute  Cloud  (EC2)   •  Magellan  –  Argonne's  DOE  Cloud  Compu0ng   •  Data  Intensive  Academic  Grid  (DIAG)  –       Ins0tute  for  Genome  Sciences  (IGS),  University   of  Maryland  School  of  Medicine  (UMSOM)  
  • 20. Interac0ng  with  the  Amazon  Cloud   •  Boot  virtual  machine  image  via  web  interface   (or  a  third-­‐party  tool  like  StarCluster).   •  Log  in  and  work  via  terminal  (or  via  web   interface  with  IPython  Notebook)   •  Move  data  back  and  forth  via  sWp/scp  or  a   graphical  sWp  client  (e.g.,  Cyberduck  [free/ cross-­‐planorm])  
  • 21. Virtual  machines   •  A  “guest”  opera0ng  system  running  within  a   “host”  opera0ng  system   •  A  soWware  implementa0on  of  a  computer,   that  operates  like  a  physical  computer.     •  A  developer  can  create  a  virtual  machine   image  which  contains  their  tools  pre-­‐installed.   Users  can  then  instan)ate  that  image  to  work   with  those  tools.   Browse  this  page:  hvp://en.wikipedia.org/wiki/Virtual_machine  
  • 22. Benefits  that  virtual  machines  offer   bioinforma0cs   •  Reproducibility:  can  publish  protocols  with  a   virtual  machine  instance  id.   •  Updates  are  burden  of  developer,  not  user.   •  Coupled  with  cloud  compu0ng,  it’s  the  perfect   model  for  users  with  sporadic  compute  needs.  
  • 24. QIIME  virtual  machine   •  The  QIIME  package  distributes  an  EC2  virtual   machine  with  QIIME  and  its  (many)  dependencies   pre-­‐installed.   •  Dependencies  include  commonly  used  tools  like   BLAST,  muscle,  FastTree,  uclust,  IPython,  and  a   lot  more.  A  par0al  list  is  available  here:   hvp://qiime.org/install/install.html   •  Latest  machine  iden0fier  can  always  be  found  at:   hvp://qiime.org/home_sta0c/dataFiles.html  
  • 25. I  think  there  is  a  world  market  for   maybe  five  computers.      -­‐  Thomas  Watson,  IBM  Founder,  1943  
  • 26. I  think  there  is  a  world  market  for   maybe  five  computers.      -­‐  Thomas  Watson,  IBM  Founder,  1943   Units sold by year All  figures  are  in  units  of  1000.   hvp://jeremyreimer.com/postman/node/329  
  • 27. The  democra0za0on  of  DNA  sequencing   +   +   Affordable     Cloud   Open-­‐source   sequencing   compu0ng   soWware  
  • 28. This  work  is  licensed  under  the  Crea0ve  Commons  Avribu0on  3.0  United  States  License.  To  view  a   copy  of  this  license,  visit   hvp://crea0vecommons.org/licenses/by/3.0/us/  or  send  a  lever  to  Crea0ve  Commons,  171   Second  Street,  Suite  300,  San  Francisco,  California,  94105,  USA.     Feel  free  to  use  or  modify  these  slides,  but  please  credit  me  by  placing  the  following  avribu0on   informa0on  where  you  feel  that  it  makes  sense:  Greg  Caporaso,  www.caporaso.us.