Surya Saha
Sol Genomics Network (SGN)
Boyce Thompson Institute, Ithaca, NY
ss2489@cornell.edu // @SahaSurya
BTI PGRP Summer Internship Program 2014
http://www.acgt.me/blog/2014/3/7/next-generation-sequencing-must-die
Why Sequencing?
• Targeted interrogation of genome
• Economical
• Technological developments
• High-throughput assays
• But requires subsequent validation
7/8/2014 BTI PGRP Summer Internship Program 2014 2
Sequencing over the Ages
• 1953: DNA structure discovery
• 1977: Sanger DNA sequencing by chain-terminating inhibitors
• 1984: Epstein-Barr virus (170 Kb)
• 1987: ABI 370 sequencer
• 1995: Haemophilus influenzae (1.83 Mb)
• 2001: Homo sapiens (3.0 Gb)
• 2005-2007: 454, Solexa, SOLiD
• 2011: Ion Torrent, PacBio
[Timeline figure; remaining labels: 2012, 2013, Illumina, Illumina HiSeq X, Pinus taeda (24 Gb)]
Slide credit: Aureliano Bombarely
First generation sequencing
Sanger method
Frederick Sanger
13 Aug 1918 – 19 Nov 2013
Won the Nobel Prize in Chemistry in 1958 and 1980. Published the dideoxy chain-termination method, or “Sanger method”, in 1977.
http://dailym.ai/1f1XeTB
Sanger method
http://bit.ly/1g6Cudq
http://bit.ly/1lcQO4J
First generation sequencing
• Very high quality sequences (99.999%)
• Very low throughput

Platform                           Run Time   Read Length   Reads / Run   Total nucleotides sequenced   Cost / MB
Capillary Sequencing (ABI3730xl)   20m-3h     400-900 bp    96 or 384     1.9-84 Kb                     $2400
http://bit.ly/1clLps3
http://1.usa.gov/1cLqIRd
Refer to the specific technology used to generate the data
– Illumina HiSeq/MiSeq/NextSeq
– Pacific Biosciences RS1/RSII
– Ion Torrent Proton/PGM
– SOLiD
http://www.acgt.me/blog/2014/3/10/next-generation-sequencing-must-diepart-2
454 Pyrosequencing
One purified DNA fragment, to one bead, to one read.
http://bit.ly/1ehwxWN
GS FLX Titanium
http://bit.ly/1ehAcEh
Illumina
Output            15 GB        120 GB        1000 GB                            1800 GB
Number of Reads   25 Million   400 Million   4 Billion                          6 Billion
Read Length       2x300 bp     2x150 bp      2x125 bp (2x250 update mid-2014)   2x150 bp
Cost              $99K         $250K         $740K                              $10M
Source: Illumina
Illumina
http://1.usa.gov/1fP9ybl
Illumina:Moleculo
http://bit.ly/1aEPOBn
Pacific Biosciences SMRT sequencing
Single Molecule Real Time sequencing
http://bit.ly/1naxgTe
Pacific Biosciences SMRT sequencing
Error correction methods
• Hierarchical genome-assembly process (HGAP)
• PBJelly (English et al., PLOS ONE, 2012)
PBJelly
7/8/2014 Centre for Agricultural Bioinformatics, Pusa 15
Pacific Biosciences SMRT sequencing
Read Lengths
Oxford Nanopore
https://www.nanoporetech.com/
• No data yet??
• Error model
http://erlichya.tumblr.com/post/66376172948/hands-on-experience-with-oxford-nanopore-minion
Others
• Ion Torrent Proton/PGM
• Helicos
• Nabsys
• SOLiD
• ……
Comparison
Next generation sequencing
Platform              Run Time   Read Length   Quality                       Total nucleotides sequenced   Cost / MB
454 Pyrosequencing    24h        700 bp        Q20-Q30                       0.7 GB                        $10
Illumina MiSeq        27h        2x250 bp      >Q30                          15 GB                         $0.15
Illumina HiSeq 2500   11 days    2x125 bp      >Q30                          1000 GB                       $0.05
Ion Torrent           2h         400 bp        >Q20                          50 MB-1 GB                    $1
Pacific Biosciences   2h         8.5-20 kb     >Q30 consensus, >Q10 single   400-850 MB / SMRT cell        $0.33-$1
http://bit.ly/1clLps3
http://1.usa.gov/1cLqIRd
http://omicsmaps.com/
Next Generation Genomics:
World Map of High-throughput Sequencers
Real cost of Sequencing!!
Sboner, Genome Biology, 2011
Library Types
Single end
Paired end (PE, 150-800 bp, Fwd:/1, Rev:/2)
Mate pair (MP, 2 Kb to 20 Kb)
[Figure: forward (F) and reverse (R) read orientations for each library type, with 454/Roche and Illumina layouts labeled]
Slide credit: Aureliano Bombarely
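The /1 and /2 suffixes mentioned for paired-end reads can be used to match forward and reverse mates. The following is a minimal sketch, assuming read IDs end in /1 or /2 exactly as on the slide; the function name and the example IDs are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def pair_reads(read_ids: Iterable[str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Group read IDs by fragment name using the /1 (forward) and /2 (reverse) suffixes."""
    mates: Dict[str, List[Optional[str]]] = {}
    for read_id in read_ids:
        if read_id.endswith("/1"):
            slot = 0  # forward read
        elif read_id.endswith("/2"):
            slot = 1  # reverse read
        else:
            continue  # not a paired-end style ID; skipped in this sketch
        fragment = read_id[:-2]
        mates.setdefault(fragment, [None, None])[slot] = read_id
    return {fragment: (pair[0], pair[1]) for fragment, pair in mates.items()}


# Hypothetical read IDs: frag2 and frag3 each lack one mate.
ids = ["frag1/1", "frag2/1", "frag1/2", "frag3/2"]
pairs = pair_reads(ids)
for fragment, (fwd, rev) in pairs.items():
    print(fragment, fwd, rev)
```

Fragments with a missing mate are kept with None in the empty slot, so orphaned reads can be handled separately.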
Implications of Choice of Library
Slide credit: Aureliano Bombarely
[Figure: assembly hierarchy]
• Reads → consensus sequence (contig)
• Contigs + paired-read information → scaffold (or supercontig), with gaps shown as NNNNN
• Scaffolds + genetic information (markers) → pseudomolecule (or ultracontig)
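The scaffold step in the figure can be illustrated with a short sketch: contigs whose order is known from paired-read information are joined with runs of N that stand in for the unsequenced gaps. The function name, the contig sequences, and the gap sizes below are all hypothetical; real scaffolders estimate gap sizes from library insert sizes.

```python
from typing import Sequence


def build_scaffold(contigs: Sequence[str], gap_sizes: Sequence[int], gap_char: str = "N") -> str:
    """Join ordered contigs into one scaffold, filling each gap with gap_char."""
    if not contigs:
        raise ValueError("at least one contig is required")
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of adjacent contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        parts.append(gap_char * gap)  # unknown bases between two contigs
        parts.append(contig)
    return "".join(parts)


# Three hypothetical contigs separated by gaps of 5 and 2 unknown bases.
scaffold = build_scaffold(["ACGT", "GGCC", "TTAA"], [5, 2])
print(scaffold)
```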
Multiplexing Libraries
Use of different tags (4-6 nucleotides) to identify different samples in the same lane/sector.
Slide credit: Aureliano Bombarely
[Figure: two samples tagged with AGTCGT and TGAGCA, pooled and sequenced together]
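After sequencing, pooled reads are assigned back to their samples by tag. The sketch below uses the two tags from the slide (AGTCGT and TGAGCA) and assumes, for illustration only, that the tag is the first six bases of each read; on some platforms the tag is read separately. The sample names, read IDs, and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def demultiplex(
    reads: Iterable[Tuple[str, str]],
    tag_to_sample: Dict[str, str],
    tag_length: int = 6,
) -> Dict[str, List[Tuple[str, str]]]:
    """Assign each (read_id, sequence) to a sample by its leading tag and strip the tag."""
    by_sample: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for read_id, seq in reads:
        tag, insert = seq[:tag_length], seq[tag_length:]
        # Reads with an unknown tag are collected under "undetermined".
        sample = tag_to_sample.get(tag, "undetermined")
        by_sample[sample].append((read_id, insert))
    return dict(by_sample)


tags = {"AGTCGT": "sample_A", "TGAGCA": "sample_B"}  # hypothetical sample names
pooled = [
    ("r1", "AGTCGTACGTTGCA"),
    ("r2", "TGAGCAGGGTTTAA"),
    ("r3", "AGTCGTTTTTCCCC"),
    ("r4", "NNNNNNACGTACGT"),
]
samples = demultiplex(pooled, tags)
for name, sample_reads in samples.items():
    print(name, sample_reads)
```

Exact matching is the simplest choice; tolerating one mismatch in the tag would recover more reads at the risk of misassignment when tags are similar.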
Fasta files:
It is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes.
-Wikipedia
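As a rough illustration of the format, here is a minimal FASTA reader. It relies on the standard convention that each record starts with a ">" header line followed by one or more sequence lines; the record names and sequences are made up, and this is a sketch rather than a replacement for an established parser.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {record_id: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token after '>'.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may wrap; concatenate them.
            records[current_id] += line
    return records


example = """>seq1 hypothetical nucleotide record
ACGTACGT
ACGT
>pep1 hypothetical peptide record
MKVLA
""".splitlines()

fasta_records = parse_fasta(example)
print(fasta_records)
```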
File Formats
Slide credit: Aureliano Bombarely
Fastq files:
FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores.
-Wikipedia
• Single ID line with the at symbol (“@”) in the first column.
• The sequence can span multiple lines after the ID line.
• Single line with the plus symbol (“+”) in the first column, marking the start of the quality values.
• The “+” line may repeat the ID.
• Quality values can span multiple lines after the “+” line, but their total length must equal the sequence length.
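The rules above can be sketched as a small reader. Because "@" is also a valid quality character, the sketch decides where a record ends by counting quality characters against the sequence length rather than by looking for the next "@". The function name and the example reads are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) and accept multi-line sequence and quality."""
    it = (ln.rstrip("\r\n") for ln in lines)
    for header in it:
        if not header:
            continue  # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        read_id = header[1:]
        seq_parts = []
        for ln in it:
            if ln.startswith("+"):  # the '+' line ends the sequence block
                break
            seq_parts.append(ln)
        seq = "".join(seq_parts)
        qual = ""
        # Read quality lines until as many characters as bases were collected.
        while len(qual) < len(seq):
            try:
                qual += next(it)
            except StopIteration:
                raise ValueError(f"truncated quality for read {read_id!r}") from None
        if len(qual) != len(seq):
            raise ValueError(f"quality and sequence lengths differ for read {read_id!r}")
        yield read_id, seq, qual


example = """@read1
ACGT
ACGT
+read1
IIII
IIII
@read2
GGCC
+
@@II
""".splitlines()

fastq_records = list(parse_fastq(example))
for record in fastq_records:
    print(record)
```

The second read's quality string begins with "@", which is the case that breaks parsers that split records on "@" alone.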
Slide credit: Aureliano Bombarely
File Formats
Quality control: Encoding
Fastq files:
!"#$%&'()*+,-./0123456789 Offset by 33 (Phred+33)
KLMNOPQRSTUVWXYZ[\]^_`abcdefgh Offset by 64 (Phred+64)
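Decoding either encoding is a matter of subtracting the offset from each character's ASCII code. The sketch below shows this, plus a rough heuristic for guessing the offset; the heuristic is an assumption for illustration and cannot distinguish the two encodings when every character is "@" or higher. The function names are hypothetical.

```python
from typing import List


def decode_qualities(quality_string: str, offset: int = 33) -> List[int]:
    """Convert an encoded quality string to a list of Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def guess_offset(quality_string: str) -> int:
    """Heuristic: characters below '@' (ASCII 64) cannot be non-negative Phred+64 scores."""
    return 33 if min(ord(ch) for ch in quality_string) < 64 else 64


print(decode_qualities("!+5I", offset=33))  # '!' is ASCII 33, 'I' is ASCII 73
print(decode_qualities("@JTh", offset=64))  # '@' is ASCII 64, 'h' is ASCII 104
print(guess_offset("!+5I"), guess_offset("KLMh"))
```

Both example strings decode to the same scores, which is why the encoding has to be known (or guessed carefully) before any quality filtering.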
Quality control: Encoding
http://bit.ly/N28yUd
Phred score of a base is:
Qphred = -10 log10(e)
where e is the estimated probability of a base being wrong
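A short worked example of the formula, converting in both directions. The function names are hypothetical; the arithmetic follows directly from the definition above.

```python
import math


def phred_from_error(e: float) -> float:
    """Q = -10 * log10(e), where e is the probability that the base call is wrong."""
    return -10 * math.log10(e)


def error_from_phred(q: float) -> float:
    """Inverse of the formula: e = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# An error probability of 1 in 100 corresponds to Q20, 1 in 1000 to Q30.
for e in (0.1, 0.01, 0.001, 0.00001):
    print(f"e = {e:g} -> Q = {phred_from_error(e):.1f}")

for q in (10, 20, 30, 40):
    print(f"Q = {q} -> e = {error_from_phred(q):g}")
```

By the same arithmetic, the 99.999% accuracy quoted earlier for capillary sequencing corresponds to an error probability of 0.00001, that is, Q50.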
Pre-processing: Tools
Trimming
• FastQC
• FASTX toolkit
• Trimmomatic
• Scythe
Joining paired-end reads
• fastq-join
• FLASH
• PANDAseq
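To make the idea of quality trimming concrete, here is a deliberately simple sketch that removes low-quality bases from the 3' end of a read. It is not the algorithm used by any of the tools listed above; the threshold of Q20, the Phred+33 default, the function name, and the example read are assumptions for illustration.

```python
from typing import Tuple


def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Drop trailing bases whose Phred score is below min_q."""
    end = len(seq)
    # Walk back from the 3' end until a base meets the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# Hypothetical read: the last two bases have scores 2 ('#') and 0 ('!').
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIII5#!")
print(trimmed_seq, trimmed_qual)
```

Dedicated trimmers typically add refinements such as sliding-window averages, adapter removal, and minimum-length filters.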
Pre-processing: Error correction
Thank you!!

SGN Internship Program Overview

  • 1. Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, Ithaca, NY ss2489@cornell.edu // @SahaSurya BTI PGRP Summer Internship Program 2014 http://www.acgt.me/blog/2014/3/7/next-generation-sequencing-must-die
  • 2. Why Sequencing? • Targeted interrogation of genome • Economical • Technological developments • High-throughput assays • But requires subsequent validation 7/8/2014 BTI PGRP Summer Internship Program 2014 2
  • 3. Sequencing over the Ages (slide credit: Aureliano Bombarely). Timeline: 1953 DNA structure discovery; 1977 Sanger DNA sequencing by chain-terminating inhibitors; 1984 Epstein-Barr virus (170 Kb); 1987 ABI 370 sequencer; 1995 Haemophilus influenzae (1.83 Mb); 2001 Homo sapiens (3.0 Gb); 2005 to 2013 (marked years: 2005, 2007, 2011, 2012, 2013): 454, Solexa, SOLiD, Illumina, Ion Torrent, PacBio, Illumina HiSeq X; Pinus taeda (24 Gb)
  • 4. First generation sequencing
  • 5. Sanger method: Frederick Sanger (13 Aug 1918 – 19 Nov 2013) won the Nobel Prize for Chemistry in 1958 and 1980. He published the dideoxy chain termination method, or “Sanger method”, in 1977. http://dailym.ai/1f1XeTB
  • 6. Sanger method http://bit.ly/1g6Cudq http://bit.ly/1lcQO4J
  • 7. First generation sequencing • Very high quality sequences (99.999% accuracy) • Very low throughput
       Capillary Sequencing (ABI3730xl): run time 20 min-3 h; read length 400-900 bp; reads per run 96 or 384; total nucleotides sequenced 1.9-84 Kb; cost per Mb $2400
       http://bit.ly/1clLps3 http://1.usa.gov/1cLqIRd
  • 8. Name the specific technology used to generate the data: – Illumina HiSeq/MiSeq/NextSeq – Pacific Biosciences RS1/RSII – Ion Torrent Proton/PGM – SOLiD http://www.acgt.me/blog/2014/3/10/next-generation-sequencing-must-diepart-2
  • 9. 454 Pyrosequencing: One purified DNA fragment, to one bead, to one read. http://bit.ly/1ehwxWN GS FLX Titanium http://bit.ly/1ehAcEh
  • 10. Illumina platform comparison, one column per instrument (source: Illumina)
       Output: 15 Gb | 120 Gb | 1000 Gb | 1800 Gb
       Number of reads: 25 million | 400 million | 4 billion | 6 billion
       Read length: 2x300 bp | 2x150 bp | 2x125 bp (2x250 update mid-2014) | 2x150 bp
       Cost: $99K | $250K | $740K | $10M
  • 11. Illumina http://1.usa.gov/1fP9ybl
  • 12. Illumina: Moleculo http://bit.ly/1aEPOBn
  • 13. Pacific Biosciences SMRT sequencing: Single Molecule Real Time sequencing http://bit.ly/1naxgTe
  • 14. Pacific Biosciences SMRT sequencing: Error correction methods • Hierarchical genome-assembly process (HGAP) • PBJelly (English et al., PLoS ONE, 2012)
  • 15. 7/8/2014 Centre for Agricultural Bioinformatics, Pusa 15 Pacific Biosciences SMRT sequencing Read Lengths
  • 16. Oxford Nanopore https://www.nanoporetech.com/ • No data yet?? • Error model http://erlichya.tumblr.com/post/66376172948/hands-on-experience-with-oxford-nanopore-minion
  • 17. Others • Ion Torrent Proton/PGM • Helicos • Nabsys • SOLiD • …
  • 18. Comparison
  • 19. Next generation sequencing
       Platform: run time; read length; quality; total nucleotides sequenced; cost per Mb
       454 Pyrosequencing: 24 h; 700 bp; Q20-Q30; 0.7 Gb; $10
       Illumina MiSeq: 27 h; 2x250 bp; >Q30; 15 Gb; $0.15
       Illumina HiSeq 2500: 11 days; 2x125 bp; >Q30; 1000 Gb; $0.05
       Ion Torrent: 2 h; 400 bp; >Q20; 50 Mb-1 Gb; $1
       Pacific Biosciences: 2 h; 8.5-20 kb; >Q30 consensus, >Q10 single; 400-850 Mb per SMRT cell; $0.33-$1
       http://bit.ly/1clLps3 http://1.usa.gov/1cLqIRd
  • 20. Next Generation Genomics: World Map of High-throughput Sequencers http://omicsmaps.com/
  • 22. Real cost of Sequencing!! (Sboner et al., Genome Biology, 2011)
  • 23. Library Types • Single end • Paired end (PE, 150-800 bp, Fwd: /1, Rev: /2) • Mate pair (MP, 2 Kb to 20 Kb) [Diagram: forward (F) and reverse (R) read orientations for 454/Roche and Illumina libraries] Slide credit: Aureliano Bombarely
  • 24. Implications of Choice of Library • Reads are assembled into a consensus sequence (contig) • Pair read information joins contigs into a scaffold (or supercontig), with gaps shown as NNNNN • Genetic information (markers) orders scaffolds into a pseudomolecule (or ultracontig) Slide credit: Aureliano Bombarely
  • 25. Multiplexing Libraries: Use of different tags (4-6 nucleotides) to identify different samples in the same lane/sector. Example tags shown on the slide: AGTCGT and TGAGCA, attached to the reads of two samples that are pooled for sequencing. Slide credit: Aureliano Bombarely
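
The tag idea on the multiplexing slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tag is assumed to sit at the very start of each read and to match exactly, and the sample names (`sample_A`, `sample_B`) and the function `demultiplex` are hypothetical. Real instruments often report the tag as a separate index read and tolerate mismatches.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by the tag found at the start of each read.

    reads:    iterable of (read_id, sequence) tuples
    barcodes: dict mapping tag sequence -> sample name (hypothetical names)
    Assumes exact matches and tags located at the 5' end of the read.
    """
    bins = defaultdict(list)
    for read_id, seq in reads:
        for tag, sample in barcodes.items():
            if seq.startswith(tag):
                # Strip the tag so only the biological sequence is kept.
                bins[sample].append((read_id, seq[len(tag):]))
                break
        else:
            # No tag matched: keep the read untouched in a separate bin.
            bins["unassigned"].append((read_id, seq))
    return dict(bins)


# The two tags are the ones shown on the slide; the reads are made up.
barcodes = {"AGTCGT": "sample_A", "TGAGCA": "sample_B"}
reads = [
    ("read1", "AGTCGTACGTACGT"),
    ("read2", "TGAGCATTTTGGGG"),
    ("read3", "CCCCCCAAAATTTT"),
]
result = demultiplex(reads, barcodes)
```

Reads whose start matches no tag are kept in an `unassigned` bin rather than dropped, so the loss from tag errors can be inspected later.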
  • 26. File Formats: FASTA files. “It is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes.” (Wikipedia) Slide credit: Aureliano Bombarely
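
To make the FASTA description concrete, here is a minimal reader sketch. It assumes the usual layout, where each record starts with a header line beginning with ">" and the sequence may wrap over several lines; the function name `parse_fasta` and the example records are made up for illustration.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up example: one wrapped nucleotide record and one peptide record.
example = """>seq1 hypothetical fragment
ACGTACGT
TTGGCCAA
>seq2
MKVLA
""".splitlines()
records = list(parse_fasta(example))
```

Wrapped sequence lines are joined, so each record comes back as a single string regardless of the line width used in the file.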
  • 27. File Formats: FASTQ files. “FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores.” (Wikipedia) • A single ID line with an at symbol (“@”) in the first column. • The sequence may span multiple lines after the ID line. • A single line with a plus symbol (“+”) in the first column marks the start of the quality values. • The “+” line may optionally repeat the ID. • Quality values may span multiple lines after the “+” line, but their total length must equal the sequence length. Slide credit: Aureliano Bombarely
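
The FASTQ rules listed on this slide translate fairly directly into a small reader. The sketch below is one possible reading of those rules, not a reference implementation: `parse_fastq` and the example records are hypothetical. Quality lines are consumed by length rather than by looking for the next "@", because "@" is itself a valid quality character.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from an iterable of FASTQ lines."""
    it = (line.rstrip("\r\n") for line in lines)
    for line in it:
        if not line:
            continue  # skip blank lines between records
        if not line.startswith("@"):
            raise ValueError(f"expected '@' ID line, got: {line!r}")
        read_id = line[1:]

        # Sequence lines run until the '+' separator line.
        seq_chunks = []
        for line in it:
            if line.startswith("+"):
                break
            seq_chunks.append(line)
        else:
            raise ValueError(f"record {read_id!r} has no '+' line")
        seq = "".join(seq_chunks)

        # Quality lines are read until they cover the whole sequence.
        qual = ""
        while len(qual) < len(seq):
            try:
                qual += next(it)
            except StopIteration:
                raise ValueError(f"record {read_id!r}: truncated quality") from None
        if len(qual) != len(seq):
            raise ValueError(f"record {read_id!r}: quality/sequence length mismatch")
        yield read_id, seq, qual


# Made-up example: the first record wraps over two lines, and its second
# quality line starts with '@' to show why length-based reading matters.
example = [
    "@read1 sample=A",
    "ACGT",
    "TTGA",
    "+",
    "IIII",
    "@@@#",
    "@read2",
    "GGCC",
    "+read2",
    "!5?I",
]
records = list(parse_fastq(example))
```

Most current files use exactly four lines per record, in which case the same function still works unchanged.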
  • 28. Quality control: Encoding (FASTQ files) • !"#$%&'()*+,-./0123456789 : offset by 33 (Phred+33) • KLMNOPQRSTUVWXYZ[\]^_`abcdefgh : offset by 64 (Phred+64)
  • 29. Quality control: Encoding • !"#$%&'()*+,-./0123456789 : offset by 33 (Phred+33) • KLMNOPQRSTUVWXYZ[\]^_`abcdefgh : offset by 64 (Phred+64)
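
The two offsets can be illustrated with a short decoder: each quality character is turned into a Phred score by subtracting the offset from its ASCII code. The function name `decode_quality` and the example strings are made up; the offset itself has to be known (or inferred) from the sequencing platform and pipeline version.

```python
def decode_quality(quality, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores.

    offset is 33 for Phred+33 and 64 for Phred+64.
    """
    scores = [ord(ch) - offset for ch in quality]
    if min(scores, default=0) < 0:
        # A negative score means the string cannot be in this encoding.
        raise ValueError(f"character below offset {offset}; wrong encoding?")
    return scores


phred33 = decode_quality("!5?I", offset=33)  # '!' is ASCII 33, 'I' is ASCII 73
phred64 = decode_quality("@Th", offset=64)   # '@' is ASCII 64, 'h' is ASCII 104
```

Decoding a Phred+33 string with offset 64 usually produces negative scores, which is why the function raises an error instead of returning them silently.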
  • 30. Quality control: Encoding. The Phred score of a base is Q_phred = -10 × log10(e), where e is the estimated probability that the base call is wrong. http://bit.ly/N28yUd
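
As a quick worked example of the Phred formula, the sketch below converts in both directions. The function names are hypothetical; the only relation used is Q = -10 × log10(e) and its inverse e = 10^(-Q/10).

```python
import math


def phred_from_error(e):
    """Phred score for an estimated error probability e (0 < e <= 1)."""
    return -10 * math.log10(e)


def error_from_phred(q):
    """Estimated error probability for a Phred score q."""
    return 10 ** (-q / 10)


q_one_in_thousand = phred_from_error(0.001)  # about Q30
e_q20 = error_from_phred(20)                 # about 1 error in 100 bases
q_sanger = phred_from_error(1 - 0.99999)     # 99.999% accuracy is about Q50
```

Every 10 points of Phred score correspond to a tenfold drop in the error probability, so the 99.999% accuracy quoted for capillary sequencing is roughly Q50, while the Q20 and Q30 thresholds in the platform comparison mean 1 error in 100 and 1 in 1000 bases.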
  • 31. Pre-processing: Tools. Trimming: • FastQC • FASTX toolkit • Trimmomatic • Scythe. Joining paired-end reads: • fastq-join • FLASH • PANDAseq
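
To give a feel for what quality trimming does, here is a deliberately simple sketch that removes low-quality bases from the 3' end of a read. It is not the algorithm of any tool listed on the slide; the function `trim_3prime`, the threshold of Q20, and the Phred+33 default are assumptions chosen for illustration.

```python
def trim_3prime(seq, quality, min_q=20, offset=33):
    """Cut bases from the 3' end until one has a Phred score >= min_q.

    seq and quality must have the same length; offset is the FASTQ
    quality encoding offset (33 assumed here).
    """
    end = len(seq)
    while end > 0 and ord(quality[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], quality[:end]


# Made-up read: the last two bases have scores 2 and 0 and are removed.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIII5#!")
```

The sequence and its quality string are cut at the same position so the two stay the same length, which the FASTQ format requires.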
  • 32. Pre-processing: Error correction
  • 33. Thank you!!