Bioinformatics Primary Analysis Tutorial
Phil Richmond, PRA
Dowell Lab
University of Colorado, Biofrontiers Institute
  
	
  
Outline
• Intro
  – Things that will be covered
  – Things that won’t be covered
• Workflow
• Mapping with Bowtie
• File Conversion with Samtools
• Visualization with IGV
• Extras
  
Sequencing
• There are many different types of sequencing, including 454, Illumina, SOLiD, IonTorrent, and more.
• If you are interested in each type of sequencing…
  
Things that will be covered
• The primary analysis that I will walk through is a “bare bones” analysis, meant to take your reads from the Illumina sequencer to a visualizer, along with some organizational practices
  – Mapping (Bowtie/BWA)
  – File format conversion
  – Visualization
  
Things that won’t be covered
• Post/preprocessing steps that I’m leaving out include:
  – FastX analysis of raw reads and adapter clipping, etc.
  – PCR duplicate marking (Illumina) on raw reads
  – Base Quality Score Recalibration (GATK) on mapped reads
  – Local Realignment around indels on mapped reads
• Any Secondary or Tertiary analysis or scripting techniques
  – Secondary analysis by personal appt.
  – Scripting techniques by joining Dave Knox’s Python class
  
Login to Tuxedo
• Log in with the -X option to open an X11 viewer.
• On a PC…see me for separate instructions to pipe visualization
• ssh -X richmonp@tuxedo.colorado.edu
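
A quick way to confirm that X11 forwarding is working after logging in is to look at the DISPLAY variable, which ssh normally sets on the remote side when -X succeeds. The sketch below assumes that behaviour; the helper name x11_status is hypothetical and not part of the tutorial.

```shell
# Report whether this session has an X11 display to draw on.
# x11_status is a hypothetical helper name, not part of the tutorial.
x11_status() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "X11 forwarding looks active (DISPLAY=$DISPLAY)"
    else
        echo "DISPLAY is not set: log out and reconnect with ssh -X"
    fi
}

x11_status
```

If DISPLAY is empty, graphical programs such as IGV will not be able to open a window from the remote machine.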
  
Working Directory
• We will be working in /data/Tutorial/<Student>
  – cd /data/Tutorial/Phil/
• The necessary files for the tutorial are in /data/Tutorial/Files/
  – Parent113010.fa is the reference (E. coli) genome
  – Parent120710.gff is the annotation file
  – Sample1_single.fastq is the reads file we are working with
  
Organization
• In your own directory (/data/Tutorial/<Student>/) create the following sub-directories:
  – Genome/
    • Keep the fasta and gff files here
  – Bowtie/
    • Keep the Bowtie alignments, and post-processing of bowtie alignments here
  – Fastq/
    • Keep the raw fastq files here
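
The layout above can be set up in one step. This is a minimal sketch, assuming the tutorial files should be copied (rather than linked) into your own directory; STUDENT_DIR and FILES_DIR are illustrative variable names, and the default for STUDENT_DIR is a scratch directory so the commands can be tried away from the cluster.

```shell
# On tuxedo, set STUDENT_DIR=/data/Tutorial/<Student> before running.
# The scratch-directory default is an assumption made for this sketch.
STUDENT_DIR="${STUDENT_DIR:-$(mktemp -d)}"
FILES_DIR="${FILES_DIR:-/data/Tutorial/Files}"

# Create the three sub-directories named on the slide.
mkdir -p "$STUDENT_DIR/Genome" "$STUDENT_DIR/Bowtie" "$STUDENT_DIR/Fastq"

# Copy the tutorial files only when the shared directory is actually present.
if [ -d "$FILES_DIR" ]; then
    cp "$FILES_DIR"/Parent113010.fa "$FILES_DIR"/Parent120710.gff "$STUDENT_DIR/Genome/"
    cp "$FILES_DIR"/Sample1_single.fastq "$STUDENT_DIR/Fastq/"
fi

# Show the resulting layout.
ls "$STUDENT_DIR"
```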
  
Workflow
Raw Reads (Fastq) → Mapping (Bowtie) → Mapped Reads (SAM) → File Conversion (SAMTOOLS) → Binary Mapped Reads (SORTED.BAM) → Visualization (IGV)
  
Fastq file
• File extension .fastq or .fq
• Example:
  @Read_identifier_and_flowcell_info   (Read ID)
  ACGTCCGGTTNNN…   (Read Sequence)
  +   (Read QV ID)
  B$!?NP[%&C…   (Read QV Sequence)
• For more info on ASCII encoding of QV scores…go to Wikipedia
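
To make the four-line record structure concrete, the sketch below counts the reads in a small FASTQ file and decodes one quality character. The two reads are made up, and the Phred+33 offset is an assumption (older Illumina pipelines used an offset of 64), so check which encoding your own files use.

```shell
# Build a two-read FASTQ file with made-up reads so the example is self-contained.
fq=$(mktemp)
cat > "$fq" <<'EOF'
@read1
ACGTCCGGTT
+
IIIIIIIIII
@read2
TTGGCCAANN
+
BBBBBBBB!!
EOF

# Every read occupies exactly four lines, so reads = lines / 4.
n_lines=$(wc -l < "$fq")
n_reads=$((n_lines / 4))
echo "reads: $n_reads"

# The fourth line of each record (NR % 4 == 0) is the quality string.
first_qual=$(awk 'NR % 4 == 0 { print; exit }' "$fq")

# A quality character encodes a score as its ASCII code minus an offset.
# Offset 33 is assumed here.
first_char=$(printf '%s' "$first_qual" | cut -c1)
ascii=$(printf '%d' "'$first_char")
echo "first quality char '$first_char' -> Q$((ascii - 33))"
rm -f "$fq"
```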
  
Mapping the Short Reads
• Taking each read and mapping it to a reference genome
  – Bowtie

TGCATGCATGCATGCATGCATGCATGCATGCATGCAAAAAGCATGCATGCA
TGCATGAATGCAAAAAGCATGCA
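
As a rough illustration of what mapping means, the sketch below slides the short read from this slide along the longer sequence and keeps the start position with the fewest mismatches. This brute-force scan is only a teaching aid and is not how Bowtie works internally (Bowtie searches a Burrows-Wheeler index of the genome); the variable names are illustrative.

```shell
ref="TGCATGCATGCATGCATGCATGCATGCATGCATGCAAAAAGCATGCATGCA"
short_read="TGCATGAATGCAAAAAGCATGCA"

# Try every start position and count mismatching bases at each one.
best=$(awk -v ref="$ref" -v rd="$short_read" 'BEGIN {
    n = length(ref); m = length(rd)
    best_pos = 0; best_mm = m + 1
    for (i = 1; i + m - 1 <= n; i++) {          # i is a 1-based start position
        mm = 0
        for (j = 1; j <= m; j++)
            if (substr(ref, i + j - 1, 1) != substr(rd, j, 1)) mm++
        if (mm < best_mm) { best_mm = mm; best_pos = i }
    }
    print best_pos, best_mm
}')

best_pos=${best% *}
best_mm=${best#* }
echo "best start (1-based): $best_pos, mismatches: $best_mm"
```

With the two sequences from the slide, the best placement contains a single mismatch, which is the kind of near-match a short-read mapper is designed to find.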
  
Bowtie-Build Command
• In order to map the reads to a genome, you must acquire the genome in the .fasta (.fa) format, and then index it.
• bowtie-build -f <in.fasta> <out_prefix>
  – $ bowtie-build SGDv4.fasta SGDv4_bowtie
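
Before indexing, it can help to confirm that the reference really is FASTA and to see which sequence names it contains. A minimal sketch with a made-up two-line reference follows; the output prefix Parent113010_bowtie in the last command is a hypothetical choice, and that command is printed rather than executed.

```shell
# Tiny stand-in reference (made-up sequence) so the check can run anywhere.
fa=$(mktemp)
cat > "$fa" <<'EOF'
>chr_demo example sequence
TGCATGCATGCATGCA
TGCAAAAAGCATGCA
EOF

# Each sequence in a FASTA file starts with a '>' header line.
n_seqs=$(grep -c '^>' "$fa")
echo "sequences in reference: $n_seqs"

# Sequence names: the text after '>' up to the first space.
seq_names=$(grep '^>' "$fa" | cut -c2- | cut -d' ' -f1)
echo "$seq_names"

# Indexing command for the tutorial reference (printed only; prefix is hypothetical).
index_cmd="bowtie-build -f Genome/Parent113010.fa Genome/Parent113010_bowtie"
echo "$index_cmd"
rm -f "$fa"
```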
  
	
  
Bowtie command
• Now we map back to the reference we just indexed.
• bowtie <reference_in.prefix> -q <in.fastq> -S <out.SAM> 2> <out.stderr>
  – $ bowtie /data/Tutorial/Phil/Genome/Bowtie_index/SGDv3_bowtie -q Sample1.fastq -S Sample1_bowtie.sam 2> Sample1_bowtie.stderr
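
One way to keep outputs organized is to derive the SAM and log names from the reads file. The sketch below only prints the resulting command; it assumes the Genome/, Bowtie/ and Fastq/ layout from the Organization slide, and the index prefix Parent113010_bowtie is hypothetical.

```shell
# The index prefix is hypothetical; use whatever prefix you gave bowtie-build.
index="Genome/Parent113010_bowtie"
reads="Fastq/Sample1_single.fastq"

# Strip the directory and the .fastq extension to get the sample name.
sample=$(basename "$reads" .fastq)
out="Bowtie/${sample}_bowtie"

# -q: input is FASTQ; -S: write SAM; 2>: save what bowtie prints to stderr.
map_cmd="bowtie $index -q $reads -S $out.sam 2> $out.stderr"
echo "$map_cmd"
```

Once the paths are right, the printed line can be pasted at the prompt from inside /data/Tutorial/<Student>/.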
  
Sam File
• Tab Delimited
• http://genome.sph.umich.edu/wiki/SAM
• Open Example SAM
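
Because SAM is tab delimited, standard tools such as awk can pull out individual columns. The sketch below uses one made-up header line and one made-up alignment line; the column names follow the SAM specification.

```shell
# One made-up header line and one made-up alignment line (\t is a tab).
sam=$(mktemp)
printf '@SQ\tSN:chr_demo\tLN:51\n' > "$sam"
printf 'read1\t0\tchr_demo\t25\t255\t23M\t*\t0\t0\tTGCATGAATGCAAAAAGCATGCA\tIIIIIIIIIIIIIIIIIIIIIII\n' >> "$sam"

# Header lines start with '@'; alignment lines carry 11 mandatory columns:
# QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
summary=$(awk -F'\t' '!/^@/ { print $1, $3, $4, $6 }' "$sam")
echo "read, reference, position, CIGAR: $summary"

# Count the tab-separated columns of the first alignment line.
n_cols=$(awk -F'\t' '!/^@/ { print NF; exit }' "$sam")
echo "columns in first alignment line: $n_cols"
rm -f "$sam"
```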
  
Samtools Commands
• samtools view -bS <in.sam> -o <out.bam>
  – $ samtools view -bS Sample1_bowtie.sam -o Sample1_bowtie.bam
• samtools sort <in.bam> <out.sorted>
  – $ samtools sort Sample1_bowtie.bam Sample1_bowtie.sorted
• samtools index <in.sorted.bam>
  – $ samtools index Sample1_bowtie.sorted.bam
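
The three steps can be chained in a small script. The sketch below is a dry run by default (it prints the commands and executes them only when DRY_RUN=0); run is a hypothetical helper. The sort syntax matches the slide; newer samtools releases expect samtools sort -o <out.bam> <in.bam> instead, so check the usage message of the version installed on your system.

```shell
# Dry-run helper (hypothetical): print each command, execute only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

sample="Sample1_bowtie"

# SAM -> BAM, then sort, then index; && stops the chain at the first failing step.
run samtools view -bS "$sample.sam" -o "$sample.bam" &&
run samtools sort "$sample.bam" "$sample.sorted" &&
run samtools index "$sample.sorted.bam"
```

The sorted and indexed BAM file (plus its .bai index) is what gets loaded into IGV in the next step.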
  
IGV
• Located at /data2/IGV/
• Several different versions are available; I recommend either:
  • /data2/IGV/IGV_2.1.19/igv.jar
  • /data2/IGV/IGV_1.5.64/igv.jar
• To run IGV:
  – java -Xmx5g -jar <igv.jar>
    • $ java -Xmx5g -jar /data2/IGV/IGV_1.5.64/igv.jar &
  
IGV: Creating a genome
• Reference Instructions on sheet.
  
Bowtie and Bfast IGV
[Figure: tracks labeled Bowtie, Bfast, Gene]
  
Advantages to Bfast Gapped Mapping
[Figure: tracks labeled Bowtie, Bfast, Gene]
  
Bfast Mapping Loosely
[Figure: tracks labeled Bowtie, Bfast, Gene]
  
If you are getting the hang of it quickly…
• Try going through the next few commands
  
BWA Paired end
• /usr/local/src/bwa-0.6.2/bwa index -a is -f <in.fasta>
• Map each read in the pair independently (run aln once per FASTQ file, giving <in_1.sai> and <in_2.sai>)
• /usr/local/src/bwa-0.6.2/bwa aln <reference.prefix> <in_1.fq> > <out.sai>
• Finalize the mapping by converting (for both reads) both the .SAI and the .FQ into a final SAM alignment:
• /usr/local/src/bwa-0.6.2/bwa sampe <reference.prefix> <in_1.sai> <in_2.sai> <in_1.fq> <in_2.fq> > <out_paired.sam>
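
Since the slide shows bwa aln for only one file of the pair, here is a sketch of the full sequence for both mates. The commands are printed, not executed; the sample name Sample1, the _1/_2 file suffixes, the BWA/ output directory and the reference prefix are all hypothetical.

```shell
bwa="/usr/local/src/bwa-0.6.2/bwa"
ref="Genome/Parent113010.fa"     # hypothetical index prefix
sample="Sample1"                 # hypothetical pair: Sample1_1.fq and Sample1_2.fq

# bwa aln is run once per mate file, giving one .sai file per mate.
aln_cmds=$(for mate in 1 2; do
    echo "$bwa aln $ref Fastq/${sample}_${mate}.fq > BWA/${sample}_${mate}.sai"
done)
echo "$aln_cmds"

# bwa sampe then combines both .sai files with both FASTQ files into one SAM file.
sampe_cmd="$bwa sampe $ref BWA/${sample}_1.sai BWA/${sample}_2.sai Fastq/${sample}_1.fq Fastq/${sample}_2.fq > BWA/${sample}_paired.sam"
echo "$sampe_cmd"
```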
  	
  
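Bowtie Unique Mapping
• Investigate the different Bowtie options:
  – Look at -m (suppress reads with more than this many reportable alignments; -m 1 keeps only uniquely mapping reads) and -v (maximum number of mismatches allowed across the whole read; the per-seed limit is -n)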
  
TopHat Spliced Mapping
• /usr/local/src/tophat-2.0.4.Linux_x86_64/tophat -G <in.gff> -o <output_directory> <bowtie_index> <in.fastq>
  	
  
The end…for now.
  
