Extending the iPlant Cyberinfrastructure: From Plants to Microbes
iVirus and iMicrobe
Bonnie Hurwitz, PhD
Arizona Health Sciences Center
The iPlant Collaborative
Community Cyberinfrastructure for Life Science
http://www.iplantcollaborative.org
Funding / Staff:
Joaquin Ruiz, PhD, Dean, College of Science
Shane Burgess, PhD, Dean, CALS
Matt Sullivan, PhD
Darren Boss
Devesh Chourasiya
The iPlant Collaborative
Vision
Enable life science researchers and
educators to use and extend
cyberinfrastructure to understand and
ultimately predict the complexity of
biological systems
How iPlant CI Enables Discovery
Challenge: Create an easy-to-use platform powerful enough to handle data-intensive biology
Many bioinformatics tools are “off limits” to those without specialized computational backgrounds.
iPlant is a collaborative virtual
organization
The iPlant Collaborative
Who makes up iPlant?
The iPlant Collaborative
How is iPlant funded?
iPlant Renewed by NSF
September 2013 begins next 5 year period
Scientific Advisory Board
Focus on Genotype-Phenotype science
NSF Recommended expansion of
scope beyond plants
	
  
iPlant collaborates to enable access to the solutions that work the
best for the community…
The iPlant Collaborative
Who does iPlant collaborate with?
How iPlant CI Enables Discovery
Overview of resources
End Users, Computational Users, Teragrid, XSEDE
✓  Storage
✓  Computation
✓  Hosting
✓  Web Services
✓  Scalability
Building a platform that can support diverse and constantly evolving needs.
iPlant Data Store
✓  Initial 100 GB allocation – TB allocations available
✓  Automatic data backup
✓  Easy upload/download and sharing
The resources you need to share and manage
data with your lab, colleagues and community
Discovery Environment
Hundreds of bioinformatics Apps in an easy-to-use interface
✓  A platform that can run almost any bioinformatics application
✓  Seamlessly integrated with data and high performance computing
✓  User extensible – add your own applications
Agave API
Fully customize iPlant resources
✓  Science-as-a-service platform
✓  Define your own compute and storage resources (local and iPlant)
✓  Build your own app store of scientific code and workflows
Atmosphere
Cloud computing for the life sciences
✓  Simple: One-click access to more than 100 virtual machine images
✓  Flexible: Fully customize your software setup
✓  Powerful: Integrated with iPlant computing and data resources
DNA Subway
Educational workflows for Genomes, DNA Barcoding, RNA-Seq
✓  Commonly used bioinformatics tools in streamlined workflows
✓  Teach important concepts in biology and bioinformatics
✓  Inquiry-based experiments for novel discovery and publication of data
Bisque
Image analysis, management, and metadata
✓  Secure image storage, analysis, and data management
✓  Integrate existing applications or create new ones
✓  Custom visualization and image handling routines and APIs
Typical End Users, Computational Users, Teragrid, XSEDE
iMicrobe and iVirus Leverage the iPlant Cyberinfrastructure
Using iPlant for:
✓  Storage
✓  Computation
✓  Analysis
✓  App dev.
✓  Pipeline dev.
✓  Code distrib.
✓  Data Discoverability
What’s Under the Hood?
  
Stampede - High Level Overview
•  Base Cluster (Dell/Intel/Mellanox):
–  Intel Sandy Bridge processors
–  Dell dual-socket nodes w/32GB RAM (2GB/core)
–  6,400 nodes
–  56 Gb/s Mellanox FDR InfiniBand interconnect
–  More than 100,000 cores, 2.2 PF peak performance
•  Co-Processors:
–  Intel Xeon Phi “MIC” Many Integrated Core processors
–  Special release of “Knights Corner” (61 cores)
–  All MIC cards are on site at TACC; more than 6000 installed; final installation ongoing for formal summer acceptance
–  7+ PF peak performance
•  Max Total Concurrency:
–  exceeds 500,000 cores
–  1.8M threads
•  Entered production operations on January 7, 2013
iMicrobe/ iVirus: New App Development
June 2013 – May 2014:
13: New Apps
1: High-throughput analysis pipeline
Forging Ahead with iPlant
•  Build a metagenomics toolkit
•  Streamline metagenomics workflows
•  Enable high-throughput computing
•  Provide key datasets for computation
iPlant Data Store
The resources you need to share
and manage data with your lab,
colleagues and community
Overview of the iPlant Data Store
Some Complications of Big Data
•  Difficult/slow transfers
•  Expense for storage/backup
•  Difficult to share and publish
•  Metadata
•  Analysis
iPlant Supports the Life Cycle of Data
Store, Markup, Search, Transfer, Analyze, Visualize, Collaborate, Share
Diagram labels: Data, Algo1, Results A, Algo2, Results B; Pre-Publication, Post-Publication; Teragrid, XSEDE
Overview of the iPlant Data Store
Scalable, Reliable, Redundant, High-performance
•  Access your data from multiple iPlant services
•  Automatic data backup (redundant between University of Arizona and University of Texas)
•  Multiple ways to share data with collaborators
•  Multi-threaded high speed transfers
•  Default 100GB allocation. >1TB allocations available with justification
Overview of the iPlant Data Store
Some important items we won’t see

Source            Destination    Copy Method    Time (seconds)
CD                My Computer    cp             320
Berkeley Server   My Computer    scp            150
External Drive    My Computer    cp             36
USB2.0 Flash      My Computer    cp             30
iDS               My Computer    iget           18
My Computer       My Computer    cp             15

Close to optimum conditions; transfer between Univ. of Arizona and UC Berkeley
100GB: 29m15s
1 GB / 17.5 seconds
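The two transfer figures quoted above (100 GB in 29m15s, and roughly 17.5 seconds per GB) can be checked against each other with a few lines of arithmetic. This sketch assumes decimal units (1 GB = 1000 MB), which the slide does not specify; the variable names are illustrative.

```python
# Figures quoted on the slide for the Arizona <-> Berkeley transfer.
total_gb = 100
elapsed_s = 29 * 60 + 15          # 29m15s expressed in seconds

# Average time to move one gigabyte.
seconds_per_gb = elapsed_s / total_gb

# Average throughput, assuming 1 GB = 1000 MB (an assumption, not stated on the slide).
mb_per_s = total_gb * 1000 / elapsed_s

print(f"{elapsed_s} s total, {seconds_per_gb:.2f} s/GB, about {mb_per_s:.0f} MB/s")
```

The per-gigabyte time works out to 17.55 seconds, consistent with the rounded "1 GB / 17.5 seconds" on the slide.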
Discovery Environment
Hundreds of bioinformatics
Apps in an easy-to-use
interface
Overview of the iPlant Discovery Environment
Through the Discovery Environment you have:
•  High-powered computing
•  iPlant data store
•  Easy to use interface
•  Virtually limitless apps
•  Analysis history (provenance)
What can you do in the iPlant DE?
Scalable platform for powerful computing, data, and application resources
•  Navigate the components of the DE
•  Access and manipulate data
•  Start and complete an analysis
•  Track your analysis and see your results
Why is iPlant DE Scalable?
Democratize your code
•  Rich platform for bioinformatics: ~400 apps (and counting)
•  Data co-localized with analysis
•  Easy to use interface, with access to support
•  Easy to integrate and customize your own tools
Discovery Environment Example
Goal: Create a metagenomic assembly.
Task 1: Upload metagenomic FASTA file to your personal data store
Task 2: Run quality control on your raw sequence reads
Task 3: Find and select an assembly tool (e.g. MetaVelvet)
Task 4: Specify parameters and your input files. Run the assembly App.
Task 5: Monitor the progress of your analysis and save parameters.
Task 6: View your results.
Sequence Quality Control in the iPlant DE
Genome, Metagenome,
and Transcriptome
Assembly
Genome and Metagenome Assembly:
ALLPATHS-LG, Newbler, SOAPdenovo, Velvet, MetaVelvet, ABySS, SPA, Digital Norm., IDBA-UD
Transcriptome Assembly:
De novo: Trinity, SOAPdenovo-Trans, Velvet/Oasis, Trans-ABySS
Reference-guided: Tophat, Cufflinks
Key: In the DE
Where is the sample data?
Where is the Assembly App?
Specify Data and Assembly
Parameters
Specify Run Settings
Track Analyses and Results
What about Annotations?
•  Annotations are descriptions of features on contigs in a
genome / metagenome
–  Ab initio gene predictions
–  Protein homology (Genbank nr, SIMAP)
–  Curated protein resources (COG, Kegg, …)
•  Secondary annotations
–  InterPro Scan (Pfam, PIR, Prosite, …)
–  GO and other ontologies
–  Pathway Mapping (Kegg, Metacyc, Ecocyc)
Assembly & Annotation at iPlant
Genome and Metagenome Assembly:
ALLPATHS-LG, Newbler, SOAPdenovo, Velvet, MetaVelvet, ABySS, SPA, Digital Norm., IDBA-UD
Transcriptome Assembly:
De novo: Trinity, SOAPdenovo-Trans, Velvet/Oasis, Trans-ABySS
Reference-guided: Tophat, Cufflinks
Ab initio Gene Prediction:
Glimmer, Prodigal, FragGeneScan, Metagene, MetaGeneMark
Inputs: Meta-Genome input, Evidence input
Conversion Tools: tophat2gff, cufflinks2gff
Annotation:
Primary: BLAST, k-mer based
Secondary: InterProScan, InterPro2GO
Visualization: JBrowse, Web-Apollo
Data Commons:
Genomes and Metagenomes
Proteins / Genes
Reference Annotations
Metadata (in irods)
Key: At TACC, In the DE, Under Development
✓  Storage
✓  Computation
✓  Analysis
✓  Data Access
✓  Code Distr.
✓  Query by metadata
The Louis Pasteur Method:
We can’t “see” all bacteria using culture-based approaches
Razumov (1932) “The Great Plate Anomaly.”
Diagram labels: Community Genomics, Isolate, Metagenomics
The Post-Genomic Era: from Pasteur to CSI
Workflow steps: Environmental Sample, Extract DNA, library creation, High throughput sequencing, Assemble reads, Gene Prediction
Making Sense of Metagenomes
•  Function
•  Taxonomy
•  Compare to known proteins
Viromes are dominated by the Unknown
Pie charts (Photic and Aphotic panels):
•  Bacteria 5%, Eukaryota 1%, Archaea 0%, Viruses 3%, Unknown 91%
•  Viruses 7%, Bacteria 4%, Eukaryota 1%, Archaea 0%, Unknown 88%
We need new tools!
Hurwitz BL & Sullivan MB. The Pacific Ocean Virome (POV). PLoS One. 8: e57355.
Phage Function based on Environment
PcPipe: a Vignette in Viral Metagenomics
Diagram labels: Input reads, Assemble, Find Genes, Cluster Genes, Protein Clusters, BIN
Organizing the Unknown
27K High-Confidence Viral Protein Clusters
•  GOS: 50%
•  POV + GOS: 22%
•  POV: 28%
•  Isolate Phage: 1%
2X environmental viral protein clusters
70% of data now included
Yooseph S, et al. (2007) The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol 5(3):e16.
Hurwitz BL & Sullivan MB. (2013) The Pacific Ocean Virome (POV). PLoS One. 8: e57355.
Ocean Microbial Communities Vary by Environmental Factors
Pacific Ocean Virome:
•  Geographic Region
•  Location on a Transect
•  Season
•  Depth
Hurwitz BL & Sullivan MB. (2013) The Pacific Ocean Virome (POV). PLoS One. 8: e57355.
[Figure: 32 POV sample labels (for example GDS, GFS, M5OD, LJ4S, SFCS) listed on two axes, grouped into Aphotic and Photic]
Protein Clusters group by photic zone
Panels: Photic vs Photic; Aphotic vs Photic; Aphotic vs Aphotic; Photic vs Aphotic
Legend: Many PCs shared; Some PCs shared; Few PCs shared
Hurwitz BL, Brum J. and Sullivan MB. Depth Stratified Functional and Taxonomic Niche Specialization in the ‘Core’ and ‘Flexible’ Pacific Ocean Virome. In Review.
Host Genes that Promote Viral Replication
Niche Defining Photic Core:
•  Fe-S cluster biogenesis and function
•  DNA/Protein biosynthesis and repair
•  Host “wake-up”
•  Energy production in photosynthesis
Hurwitz BL, Hallam S., Sullivan MB. (2013) Metabolic Reprogramming by Viruses in the Sunlit and Dark Ocean. Genome Biology, 14, R123.
Hurwitz BL, Brum J. and Sullivan MB. Depth Stratified Functional and Taxonomic Niche Specialization in the ‘Core’ and ‘Flexible’ Pacific Ocean Virome. In Review.
Adaptive for High Pressure Environments
Niche Defining Aphotic Core:
•  DNA replication initiation
•  DNA repair
•  Motility
•  Energy production in the TCA cycle
Hurwitz BL, Hallam S., Sullivan MB. (2013) Metabolic Reprogramming by Viruses in the Sunlit and Dark Ocean. Genome Biology, 14, R123.
Hurwitz BL, Brum J. and Sullivan MB. Depth Stratified Functional and Taxonomic Niche Specialization in the ‘Core’ and ‘Flexible’ Pacific Ocean Virome. In Review.
iPlant Discovery Environment: Automated Workflows
Workflow diagram:
•  New.fastq
•  QC sequences: FASTQ_shrinker
•  Assembly part 1: Velveth
•  Assembly part 2: Velvetg
•  Find Genes: MetaGeneMark
•  New.a.faa
•  POV PCs
•  pcpipe part 1: cd-hit-2d
•  pcpipe part 2: cd-hit
•  POV + Novel PCs
•  Input to Analyses: Blastx to nr, QIIME, Rarefaction
PCpipe: creating protein clusters for viral ecology
Creating Workflows: Easy as 1-2-3-4
1.  Select the Apps
2.  Order the Apps
3.  Map Outputs to Inputs
4.  Run the analysis
Create a New Workflow
Provide Workflow Information
Select the Apps
Add the Apps
Remove an App
Order the Apps
Map Outputs to Inputs: New.a.faa, POV PCs
A New Workflow: User’s ORFs, POV PCs
Run the Workflow
Automated workflows cannot use Apps that run on the HPC
Gotchas in the PCpipe Workflow
pcpipe workflow diagram:
•  New.fastq
•  QC sequences: FASTQ_shrinker
•  Assembly part 1: Velveth
•  Assembly part 2: Velvetg
•  Find Genes: MetaGeneMark
•  New.a.faa
•  POV PCs
•  pcpipe part 1: cd-hit-2d
•  pcpipe part 2: cd-hit
•  POV + Novel PCs
•  Annotation: Protein annotation, Secondary annotation
Callouts:
•  Foundation API: Runs on XSEDE (HPC); cannot be used in a workflow
•  Foundation API: Runs on XSEDE
Diagram labels:
•  Components: iPlant App, iMicrobe adapter, iMicrobe condor node
•  Pipeline Management; Foundation; Code; HPC; Job distribution
•  Tools: BLAST vs SIMAP, cd-hit-2d, cd-hit, extract proteins in novel PCs, SIMAP Annotation
•  Steps 1 to 5 run, in order: on condor, on condor, on condor, on TACC, on condor
•  Input 1: User ORFs
•  Input 2: Existing Protein Clusters
•  Output 1: ORFs in existing clusters
•  Output 2: ORFs in new clusters
•  Output 3: Annotation for new clusters
An Integrated PCPipe
•  Existing PCs (POV)
•  Directory of User defined ORFs
•  PCPipe App
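To make the integrated PCPipe's inputs and outputs concrete, the toy sketch below splits a set of user ORFs into those that fall into existing protein clusters and those left over for new clustering. It does not run cd-hit-2d or cd-hit, which the slides name as the actual clustering tools; the `partition_orfs` function, the `hits` mapping, and the identifiers are hypothetical stand-ins for whatever the clustering step reports.

```python
from typing import Dict, Iterable, List, Tuple


def partition_orfs(
    orf_ids: Iterable[str],
    hits: Dict[str, str],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split ORFs by whether a clustering step assigned them to an existing cluster.

    hits maps an ORF id to the id of the existing protein cluster it joined;
    ORFs missing from hits are treated as candidates for new clusters.
    """
    in_existing: Dict[str, List[str]] = {}
    unassigned: List[str] = []
    for orf in orf_ids:
        cluster = hits.get(orf)
        if cluster is None:
            unassigned.append(orf)
        else:
            in_existing.setdefault(cluster, []).append(orf)
    return in_existing, unassigned


# Hypothetical example: two ORFs join an existing cluster, one does not.
orfs = ["orf1", "orf2", "orf3"]
hits = {"orf1": "PC_0001", "orf3": "PC_0001"}

existing, novel_candidates = partition_orfs(orfs, hits)
print(existing)           # corresponds to "ORFs in existing clusters"
print(novel_candidates)   # would go on to be clustered into new PCs
```

In the pipeline described on the slides, the second group would then be clustered among themselves and annotated, giving the "ORFs in new clusters" and "Annotation for new clusters" outputs.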
Collaborating with iPlant
•  Solve computational bottlenecks
•  Make tools easier to use
•  Share Data
•  Provide community input
Collaboration
Questions or Comments?
Bonnie Hurwitz, PhD
PCpipe: Protein Cluster Pipeline
Steps in iPlant DE
(HPC or Cloud)
•  New.fastq
•  QC sequences: FASTQ_shrinker
•  Assembly: Velvet
•  Find Genes: Prodigal
•  ORFs
•  PCs
•  pcpipe part 1: cd-hit-2d
•  pcpipe part 2: cd-hit
•  PCs + Novel PCs
•  Gene Annotation: SIMAP, GO, PFAM…

Forensic limnology of diatoms by Sanjai.pptxkumarsanjai28051
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx023NiWayanAnggiSriWa
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptJoemSTuliba
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXDole Philippines School
 

Dernier (20)

Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptx
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.ppt
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
 

iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes

  • 1. Bonnie Hurwitz, PhD, Arizona Health Sciences Center. Extending the iPlant Cyberinfrastructure: From Plants to Microbes
  • 2. The iPlant Collaborative: Community Cyberinfrastructure for Life Science. http://www.iplantcollaborative.org
  • 3. iVirus and iMicrobe. Joaquin Ruiz, PhD (Dean, College of Science); Shane Burgess, PhD (Dean, CALS); Matt Sullivan, PhD; Darren Boss; Devesh Chourasiya. Slide labels: Funding, Staff
  • 4. The iPlant Collaborative. Vision: Enable life science researchers and educators to use and extend cyberinfrastructure to understand and ultimately predict the complexity of biological systems
  • 5. How iPlant CI Enables Discovery. Challenge: Create an easy-to-use platform powerful enough to handle data-intensive biology. Many bioinformatics tools are “off limits” to those without specialized computational backgrounds.
  • 6. The iPlant Collaborative: Who makes up iPlant? iPlant is a collaborative virtual organization
  • 7. The iPlant Collaborative: How is iPlant funded? iPlant renewed by NSF; the next 5-year period begins September 2013. Scientific Advisory Board. Focus on genotype-phenotype science. NSF recommended expansion of scope beyond plants
  • 8. The iPlant Collaborative: Who does iPlant collaborate with? iPlant collaborates to enable access to the solutions that work best for the community
  • 9. How iPlant CI Enables Discovery: Overview of resources. End users; computational users; TeraGrid/XSEDE. ✓ Storage ✓ Computation ✓ Hosting ✓ Web Services ✓ Scalability. Building a platform that can support diverse and constantly evolving needs.
  • 10. iPlant Data Store: The resources you need to share and manage data with your lab, colleagues and community. ✓ Initial 100 GB allocation; TB allocations available ✓ Automatic data backup ✓ Easy upload/download and sharing
  • 11. Discovery Environment: Hundreds of bioinformatics Apps in an easy-to-use interface. ✓ A platform that can run almost any bioinformatics application ✓ Seamlessly integrated with data and high-performance computing ✓ User extensible: add your own applications
  • 12. Agave API: Fully customize iPlant resources. ✓ Science-as-a-service platform ✓ Define your own compute and storage resources (local and iPlant) ✓ Build your own app store of scientific code and workflows
  • 13. Atmosphere: Cloud computing for the life sciences. ✓ Simple: One-click access to more than 100 virtual machine images ✓ Flexible: Fully customize your software setup ✓ Powerful: Integrated with iPlant computing and data resources
  • 14. DNA Subway: Educational workflows for Genomes, DNA Barcoding, RNA-Seq. ✓ Commonly used bioinformatics tools in streamlined workflows ✓ Teach important concepts in biology and bioinformatics ✓ Inquiry-based experiments for novel discovery and publication of data
  • 15. Bisque: Image analysis, management, and metadata. ✓ Secure image storage, analysis, and data management ✓ Integrate existing applications or create new ones ✓ Custom visualization and image handling routines and APIs
  • 16. iMicrobe and iVirus Leverage the iPlant Cyberinfrastructure. Typical end users; computational users; TeraGrid/XSEDE. Using iPlant for: ✓ Storage ✓ Computation ✓ Analysis ✓ App dev. ✓ Pipeline dev. ✓ Code distrib. ✓ Data discoverability
  • 17. What’s Under the Hood? Stampede: High-Level Overview
      • Base cluster (Dell/Intel/Mellanox):
          – Intel Sandy Bridge processors
          – Dell dual-socket nodes with 32 GB RAM (2 GB/core)
          – 6,400 nodes
          – 56 Gb/s Mellanox FDR InfiniBand interconnect
          – More than 100,000 cores, 2.2 PF peak performance
      • Co-processors:
          – Intel Xeon Phi “MIC” (Many Integrated Core) processors
          – Special release of “Knights Corner” (61 cores)
          – All MIC cards are on site at TACC; more than 6,000 installed; final installation ongoing for formal summer acceptance
          – 7+ PF peak performance
      • Max total concurrency:
          – Exceeds 500,000 cores
          – 1.8M threads
      • Entered production operations on January 7, 2013
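As a rough consistency check on the base-cluster figures on this slide, the sketch below derives the cores per node from the quoted memory numbers (32 GB per node at 2 GB per core) and multiplies by the node count. It assumes every node is identical; the variable names are illustrative only.

```python
# Figures quoted on the slide for the Stampede base cluster.
NODES = 6_400
RAM_PER_NODE_GB = 32
RAM_PER_CORE_GB = 2
SOCKETS_PER_NODE = 2  # "dual-socket nodes"

# 32 GB per node at 2 GB per core implies the core count per node.
cores_per_node = RAM_PER_NODE_GB // RAM_PER_CORE_GB
cores_per_socket = cores_per_node // SOCKETS_PER_NODE
total_cores = NODES * cores_per_node

print(f"{cores_per_node} cores per node ({cores_per_socket} per socket)")
print(f"{total_cores:,} cores in the base cluster")
```

The result, 102,400 cores, agrees with the slide's "more than 100,000 cores".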
  • 18. iMicrobe/iVirus: New App Development. June 2013 to May 2014: 13 new Apps; 1 high-throughput analysis pipeline
  • 19. Forging Ahead with iPlant: • Build a metagenomics toolkit • Streamline metagenomics workflows • Enable high-throughput computing • Provide key datasets for computation
  • 20. iPlant Data Store: The resources you need to share and manage data with your lab, colleagues and community
  • 21. Overview of the iPlant Data Store: Some Complications of Big Data. • Difficult/slow transfers • Expense for storage/backup • Difficult to share and publish • Metadata • Analysis
  • 22. iPlant Supports the Life Cycle of Data: Store, Markup, Search, Transfer, Analyze, Visualize, Collaborate, Share. Diagram labels: Data, Algo1, Algo2, Results A, Results B; Pre-publication, Post-publication
  • 23. Overview of the iPlant Data Store: Scalable, Reliable, Redundant, High-performance (TeraGrid/XSEDE). • Access your data from multiple iPlant services • Automatic data backup (redundant between University of Arizona and University of Texas) • Multiple ways to share data with collaborators • Multi-threaded high-speed transfers • Default 100 GB allocation; >1 TB allocations available with justification
  • 24. Overview of the iPlant Data Store: Some important items we won’t see
      Source            Destination    Copy Method   Time (seconds)
      CD                My Computer    cp            320
      Berkeley Server   My Computer    scp           150
      External Drive    My Computer    cp            36
      USB 2.0 Flash     My Computer    cp            30
      iDS               My Computer    iget          18
      My Computer       My Computer    cp            15
      Close to optimum conditions; transfer between Univ. of Arizona and UC Berkeley. 100 GB: 29m15s (1 GB / 17.5 seconds).
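The two timing figures on this slide (100 GB in 29m15s, and 1 GB per 17.5 seconds) can be checked against each other with a little arithmetic. The sketch below is only that check; it assumes decimal units (1 GB = 1000 MB), which the slide does not specify.

```python
# Figures quoted on the slide: 100 GB moved in 29 min 15 s.
TOTAL_GB = 100
ELAPSED_S = 29 * 60 + 15  # 1755 seconds

seconds_per_gb = ELAPSED_S / TOTAL_GB

# Assumption: decimal units, 1 GB = 1000 MB.
throughput_mb_s = TOTAL_GB * 1000 / ELAPSED_S

print(f"{seconds_per_gb:.2f} s per GB")
print(f"{throughput_mb_s:.1f} MB/s sustained")
```

This gives 17.55 s per GB, matching the slide's rounded 17.5 seconds, or roughly 57 MB/s sustained.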
  • 25. Discovery Environment: Hundreds of bioinformatics Apps in an easy-to-use interface
  • 26. Overview of the iPlant Discovery Environment. Through the Discovery Environment you have: • High-powered computing • iPlant Data Store • Easy-to-use interface • Virtually limitless apps • Analysis history (provenance)
  • 27. What can you do in the iPlant DE? Scalable platform for powerful computing, data, and application resources. • Navigate the components of the DE • Access and manipulate data • Start and complete an analysis • Track your analysis and see your results
  • 28. Why is iPlant DE Scalable? Democratize your code. • Rich platform for bioinformatics: ~400 apps (and counting) • Data co-localized with analysis • Easy-to-use interface, with access to support • Easy to integrate and customize your own tools
  • 29. Discovery Environment Example. Goal: Create a metagenomic assembly. Task 1: Upload a metagenomic FASTA file to your personal data store. Task 2: Run quality control on your raw sequence reads. Task 3: Find and select an assembly tool (e.g. MetaVelvet). Task 4: Specify parameters and your input files, then run the assembly App. Task 5: Monitor the progress of your analysis and save parameters. Task 6: View your results.
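The slide does not say which quality-control App Task 2 uses. Purely as an illustration of what such a step does, the sketch below keeps reads whose mean base quality meets a threshold. It assumes four-line FASTQ records with Phred+33 quality encoding; the function names and the threshold of 20 are hypothetical and are not taken from the Discovery Environment.

```python
from io import StringIO
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line) stops the reader
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(handle: TextIO, min_mean_q: float = 20.0) -> Iterator[Record]:
    """Keep only reads whose mean quality meets the threshold."""
    for record in read_fastq(handle):
        if record[2] and mean_phred(record[2]) >= min_mean_q:
            yield record


# Tiny in-memory demo: 'I' encodes Phred 40, '!' encodes Phred 0.
demo = StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n")
kept = list(filter_reads(demo))
for header, sequence, _ in kept:
    print(header, sequence)
```

In the demo only `@read1` survives the filter. Real QC tools typically also trim low-quality ends and remove adapters, which this sketch does not attempt.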
  • 30. Sequence Quality Control in the iPlant DE
  • 31. Genome, Metagenome, and Transcriptome Assembly. Genome and Metagenome Assembly: ALLPATHS-LG, Newbler, SOAPdenovo, Velvet, MetaVelvet, ABySS, SPA, Digital Norm., IDBA-UD. Transcriptome Assembly, de novo: Trinity, SOAPdenovo-Trans, Velvet/Oases, Trans-ABySS; reference-guided: TopHat, Cufflinks. Key: In the DE
  • 32. Where is the sample data?
  • 33. Where is the Assembly App?
  • 34. Specify Data and Assembly Parameters
  • 37. What about Annotations? • Annotations are descriptions of features on contigs in a genome / metagenome – Ab initio gene predictions – Protein homology (GenBank nr, SIMAP) – Curated protein resources (COG, KEGG, …) • Secondary annotations – InterProScan (Pfam, PIR, PROSITE, …) – GO and other ontologies – Pathway mapping (KEGG, MetaCyc, EcoCyc)
  • 38. Assembly & Annotation at iPlant. Genome and Metagenome Assembly: ALLPATHS-LG, Newbler, SOAPdenovo, Velvet, MetaVelvet, ABySS, SPA, Digital Norm., IDBA-UD. Ab initio Gene Prediction: Glimmer, Prodigal, FragGeneScan, MetaGene, MetaGeneMark. Transcriptome Assembly, de novo: Trinity, SOAPdenovo-Trans, Velvet/Oases, Trans-ABySS; reference-guided: TopHat, Cufflinks. Conversion Tools: tophat2gff, cufflinks2gff. Annotation, primary: BLAST, k-mer based; secondary: InterProScan, InterPro2GO. Visualization: JBrowse, Web-Apollo. Data Commons: Genomes and Metagenomes, Proteins / Genes, Reference Annotations, Metadata (in iRODS). Inputs: meta-genome input, evidence input. Key: At TACC, In the DE, Under Development. ✓ Storage ✓ Computation ✓ Analysis ✓ Data Access ✓ Code Distr. ✓ Query by metadata
  • 39.
  • 40. The Louis Pasteur Method: We can’t “see” all bacteria using culture-based approaches. Razumov (1932), “The Great Plate Anomaly.”
  • 41. The Post-Genomic Era: from Pasteur to CSI. Diagram labels: Isolate, Community, Genomics, Metagenomics
  • 42. Making Sense of Metagenomes: Environmental sample → Extract DNA → Library creation → High-throughput sequencing → Assemble reads → Gene prediction → Compare to known proteins → Function, Taxonomy
  • 43. Viromes are dominated by the Unknown. Two pie charts (Photic, Aphotic): one shows Viruses 7%, Bacteria 4%, Eukaryota 1%, Archaea 0%, Unknown 88%; the other shows Bacteria 5%, Viruses 3%, Eukaryota 1%, Archaea 0%, Unknown 91%. We need new tools! Hurwitz BL & Sullivan MB. The Pacific Ocean Virome (POV). PLoS One. 8: e57355.
  • 44. PCPipe: a Vignette in Viral Metagenomics. Phage Function based on Environment
  • 45. Organizing the Unknown. Diagram labels: Input reads, Assemble, Find Genes, Cluster Genes, Protein Clusters, BIN. Yooseph S, et al. (2007) The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol 5(3):e16.
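The "Cluster Genes" step on this slide groups predicted proteins by sequence similarity. The toy sketch below illustrates greedy incremental clustering, the general strategy used by tools such as cd-hit: sequences are processed longest first, and each one joins the first cluster whose representative is similar enough, or else founds a new cluster. It is not the pipeline's implementation. `difflib.SequenceMatcher.ratio()` stands in for a real alignment identity, and the 0.6 threshold and the example sequences are made up.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def greedy_cluster(proteins: Dict[str, str], min_identity: float = 0.6) -> List[List[str]]:
    """Greedy incremental clustering of named protein sequences.

    The first member of each returned list is the cluster representative.
    """
    # Longest sequences are considered first, so they become representatives.
    order = sorted(proteins, key=lambda name: len(proteins[name]), reverse=True)
    clusters: List[List[str]] = []
    for name in order:
        for cluster in clusters:
            rep_seq = proteins[cluster[0]]
            # Crude similarity score; a real tool would use alignment identity.
            score = SequenceMatcher(None, rep_seq, proteins[name]).ratio()
            if score >= min_identity:
                cluster.append(name)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters.append([name])
    return clusters


# Hypothetical ORFs: orf2 is a truncated copy of orf1, orf3 is unrelated.
example = {
    "orf1": "MKTAYIAKQRQISFVKSHFSRQ",
    "orf2": "MKTAYIAKQRQISFVKSHF",
    "orf3": "GGGPLLLWWWNNN",
}
clusters = greedy_cluster(example)
print(clusters)
```

Here `orf1` and `orf2` fall into one cluster and `orf3` forms its own. The greedy order matters: a sequence is only ever compared with representatives, never with other cluster members.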
  • 46. 27K High-Confidence Viral Protein Clusters: GOS 50%, POV 28%, POV + GOS 22%, Isolate Phage 1%. 2X environmental viral protein clusters; 70% of data now included. Hurwitz BL & Sullivan MB. (2013) The Pacific Ocean Virome (POV). PLoS One. 8: e57355.
  • 47. Ocean Microbial Communities Vary by Environmental Factors. Pacific Ocean Virome: Geographic Region, Location on a Transect, Season, Depth. Hurwitz BL & Sullivan MB. (2013) The Pacific Ocean Virome (POV). PLoS One. 8: e57355.
  • 48. Protein Clusters group by photic zone. [Figure: pairwise comparison of samples, with sample IDs on both axes grouped as Photic and Aphotic. Panels: Photic vs Photic, Aphotic vs Photic, Aphotic vs Aphotic, Photic vs Aphotic. Legend: Many PCs shared, Some PCs shared, Few PCs shared.] Hurwitz BL, Brum J. and Sullivan MB. Depth Stratified Functional and Taxonomic Niche Specialization in the ‘Core’ and ‘Flexible’ Pacific Ocean Virome. In Review.
  • 49. Niche Defining Photic Core: Host Genes that Promote Viral Replication. Fe-S cluster biogenesis and function; DNA/protein biosynthesis and repair; host “wake-up”; energy production in photosynthesis. Hurwitz BL, Hallam S., Sullivan MB. (2013) Metabolic Reprogramming by Viruses in the Sunlit and Dark Ocean. Genome Biology, 14, R123. Hurwitz BL, Brum J. and Sullivan MB. Depth Stratified Functional and Taxonomic Niche Specialization in the ‘Core’ and ‘Flexible’ Pacific Ocean Virome. In Review.
  • 50. Niche Defining Aphotic Core: Adaptive for High Pressure Environments. DNA replication initiation; DNA repair; motility; energy production in the TCA cycle. Hurwitz BL, Hallam S., Sullivan MB. (2013) Metabolic Reprogramming by Viruses in the Sunlit and Dark Ocean. Genome Biology, 14, R123. Hurwitz BL, Brum J. and Sullivan MB. Depth Stratified Functional and Taxonomic Niche Specialization in the ‘Core’ and ‘Flexible’ Pacific Ocean Virome. In Review.
  • 51. PCpipe: creating protein clusters for viral ecology. iPlant Discovery Environment: Automated Workflows. Steps: QC sequences (FASTQ_shrinker); Assembly part 1 (velveth); Assembly part 2 (velvetg); Find Genes (MetaGeneMark); pcpipe part 1 (cd-hit-2d); pcpipe part 2 (cd-hit); Input to Analyses (blastx to nr, QIIME, rarefaction). Files: New.fastq, New.a.faa, POV PCs, POV + Novel PCs
  • 52. Creating Workflows: Easy as 1-2-3-4. 1. Select the Apps 2. Order the Apps 3. Map Outputs to Inputs 4. Run the analysis
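The four steps on this slide amount to building an ordered list of Apps in which each App reads named inputs and writes a named output. The sketch below models that idea in generic Python. It is not the Discovery Environment's API: the `Step` class, `run_workflow`, and the step names are hypothetical, the file labels are borrowed loosely from the PCpipe slides, and each placeholder App only records what it would have produced.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    name: str
    inputs: List[str]        # labels this step reads
    output: str              # label this step writes
    run: Callable[..., str]  # placeholder standing in for a real App


def run_workflow(steps: List[Step], data: Dict[str, str]) -> Dict[str, str]:
    """Run steps in order, feeding each one the outputs mapped to its inputs."""
    results = dict(data)
    for step in steps:
        missing = [label for label in step.inputs if label not in results]
        if missing:
            raise ValueError(f"{step.name}: unmapped inputs {missing}")
        results[step.output] = step.run(*(results[label] for label in step.inputs))
    return results


# 1. Select and 2. order the Apps; 3. map outputs to inputs via shared labels.
workflow = [
    Step("qc", ["reads"], "clean_reads", lambda r: f"qc({r})"),
    Step("assemble", ["clean_reads"], "contigs", lambda r: f"assemble({r})"),
    Step("find_genes", ["contigs"], "orfs", lambda c: f"genes({c})"),
    Step("cluster", ["orfs", "known_pcs"], "clusters",
         lambda o, k: f"cluster({o}, {k})"),
]

# 4. Run the analysis with the starting files.
results = run_workflow(workflow, {"reads": "New.fastq", "known_pcs": "POV PCs"})
print(results["clusters"])
```

Checking for unmapped inputs before each step runs is a deliberate choice: a broken output-to-input mapping fails immediately with the name of the offending step rather than partway through a long analysis.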
  • 53. Create a New Workflow
  • 59. Map Outputs to Inputs: New.a.faa, POV PCs
  • 61. Run the Workflow: User’s ORFs, POV PCs
  • 62. Automated workflows cannot use Apps that run on the HPC
  • 63. Gotchas in the PCpipe Workflow. pcpipe workflow steps: QC sequences (FASTQ_shrinker); Assembly part 1 (velveth); Assembly part 2 (velvetg); Find Genes (MetaGeneMark); pcpipe part 1 (cd-hit-2d); pcpipe part 2 (cd-hit); Annotation (protein annotation, secondary annotation). Files: New.fastq, New.a.faa, POV PCs, POV + Novel PCs. Foundation API: runs on XSEDE (HPC), cannot be used in a workflow
  • 64. An Integrated PCPipe. iPlant App; iMicrobe adapter; iMicrobe condor node. Pipeline Management: Foundation Code, HPC Job distribution. Pipeline components, arranged as Steps 1 to 5 (four labeled “on condor”, one “on TACC”): cd-hit-2d, cd-hit, extract proteins in novel PCs, BLAST vs SIMAP, SIMAP Annotation. Input 1: User ORFs. Input 2: Existing Protein Clusters. Output 1: ORFs in existing clusters. Output 2: ORFs in new clusters. Output 3: Annotation for new clusters
  • 65. PCPipe App: Existing PCs (POV); Directory of user-defined ORFs
  • 66. Collaborating with iPlant: • Solve computational bottlenecks • Make tools easier to use • Share data • Provide community input
  • 67. Questions or Comments? Bonnie Hurwitz, PhD
  • 68. PCpipe: Protein Cluster Pipeline. Steps in iPlant DE (HPC or Cloud): QC sequences (FASTQ_shrinker); Assembly (Velvet); Find Genes (Prodigal); pcpipe part 1 (cd-hit-2d); pcpipe part 2 (cd-hit); Gene Annotation (SIMAP, GO, PFAM…). Files: New.fastq, ORFs, PCs, PCs + Novel PCs