1. Surya Saha
Cornell University & Boyce Thompson Institute
suryasaha@cornell.edu // Twitter:@SahaSurya
IIT Indore
May 29, 2014
Slides: http://bit.ly/IITIndoreSeq
http://www.acgt.me/blog/2014/3/7/next-generation-sequencing-must-die
2. 5/29/2014 IIT Indore 2
You are free to:
Copy, share, adapt, or re-mix;
Photograph, film, or broadcast;
Blog, live-blog, or post video of;
This presentation. Provided that:
You attribute the work to its author and
respect the rights and licenses associated
with its components.
Slide Concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only ccZero. Social Media Icons adapted with
permission from originals by Christopher Ross. Original images are available under GPL at
http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites
3. Sequencing over the Ages
[Timeline figure; recoverable labels:]
1953: DNA structure discovery
1977: Sanger DNA sequencing by chain-terminating inhibitors
1984: Epstein-Barr virus (170 Kb)
1987: ABI 370 Sequencer
1995: Haemophilus influenzae (1.83 Mb)
2001: Homo sapiens (3.0 Gb)
2005 to 2007: 454, Solexa, SOLiD
2011: Ion Torrent, PacBio
2014: Pinus taeda (24 Gb), MinION
Also marked: 2012, 2013, Illumina, Illumina HiSeq X
Slide credit: Aureliano Bombarely
4. It's all about the $£€¥
http://www.genome.gov/sequencingcosts/
6. Sanger method
Frederick Sanger
13 Aug 1918 – 19 Nov 2013
Won the Nobel Prize for Chemistry in 1958 and
1980. Published the dideoxy chain termination
method or “Sanger method” in 1977
http://dailym.ai/1f1XeTB
10. First generation sequencing
• Very high quality sequences (99.999%)
• Very low throughput
Capillary Sequencing (ABI3730xl)
Run Time: 20m-3h
Read Length: 400-900 bp
Reads / Run: 96 or 384
Total nucleotides sequenced: 1.9-84 Kb
Cost / Mb: $2400
http://bit.ly/1clLps3
http://1.usa.gov/1cLqIRd
13. Use the specific technology used
to generate the data
– Illumina HiSeq/MiSeq/NextSeq
– Pacific Biosciences RS I/RS II
– Ion Torrent Proton/PGM
– SOLiD
– 454
http://www.acgt.me/blog/2014/3/10/next-generation-sequencing-must-diepart-2
14. 454 Pyrosequencing
One purified DNA fragment, to one bead, to one read.
http://bit.ly/1ehwxWN
GS FLX
Titanium
http://bit.ly/1ehAcEh
15. Illumina
Output: 15 Gb | 120 Gb | 1000 Gb | 1800 Gb
Number of Reads: 25 Million | 400 Million | 4 Billion | 6 Billion
Read Length: 2x300 bp | 2x150 bp | 2x125 bp (2x250 update mid-2014) | 2x150 bp
Cost: $99K | $250K | $740K | $10M
Source: Illumina
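As a rough check on the table above, output is approximately reads × read length, and the average depth of coverage over a genome is output divided by genome size. The sketch below assumes the read counts are clusters sequenced from both ends (two reads each) and uses the 3.0 Gb Homo sapiens genome size from the timeline slide; the function names are illustrative only.

```python
# Rough throughput and coverage arithmetic for the four columns of the table.
# Assumption: "Number of Reads" counts clusters, each read from both ends (2x).

HUMAN_GENOME_BP = 3.0e9  # Homo sapiens, 3.0 Gb


def output_bp(clusters, read_length_bp, ends=2):
    """Total bases produced in one run."""
    return clusters * read_length_bp * ends


def coverage(total_bp, genome_bp=HUMAN_GENOME_BP):
    """Average depth of coverage: total bases / genome size."""
    return total_bp / genome_bp


# (clusters, read length in bp) for each column, left to right
runs = [(25e6, 300), (400e6, 150), (4e9, 125), (6e9, 150)]

for clusters, length in runs:
    total = output_bp(clusters, length)
    print(f"{total / 1e9:7.0f} Gb  ->  {coverage(total):5.0f}x of a 3.0 Gb genome")
```

Under these assumptions the four columns reproduce the listed outputs (15, 120, 1000 and 1800 Gb) and correspond to roughly 5x, 40x, 333x and 600x of a 3.0 Gb genome per run.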
16. Illumina
$1000 human genome??
19. Pacific Biosciences SMRT sequencing
Single Molecule Real-Time sequencing
http://bit.ly/1naxgTe
20. Pacific Biosciences SMRT sequencing
Error correction methods
Hierarchical genome-assembly process (HGAP)
PBJelly (English et al., PLOS ONE, 2012)
21. Pacific Biosciences SMRT sequencing
Read Lengths
http://www.igs.umaryland.edu/labs/grc/
Mean Read Length: 8391 bp
Maximum Subread Length: 24585 bp
22. Oxford Nanopore
https://www.nanoporetech.com/
• No data yet
• Error model
http://erlichya.tumblr.com/post/66376172948/hands-on-experience-with-oxford-nanopore-minion
29. Real cost of Sequencing!!
Sboner et al., Genome Biology, 2011
30. Library Types
Single end
Paired end (PE, 150-800 bp, Fwd:/1, Rev:/2)
Mate pair (MP, 2 Kb to 20 Kb)
[Figure: forward (F) and reverse (R) read layout for each library type, labelled 454/Roche and Illumina]
Slide credit: Aureliano Bombarely
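The Fwd:/1, Rev:/2 naming above means that both reads of a pair share an identifier and differ only in the suffix. A minimal sketch of pairing reads on that convention follows; the read names and the helper `pair_reads` are made up for illustration and are not part of any particular tool.

```python
from collections import defaultdict


def pair_reads(read_ids):
    """Group read IDs ending in /1 (forward) or /2 (reverse) by fragment name.

    IDs without a /1 or /2 suffix are ignored in this sketch.
    """
    fragments = defaultdict(dict)
    for read_id in read_ids:
        name, sep, mate = read_id.rpartition("/")
        if sep and mate in ("1", "2"):
            fragments[name][mate] = read_id
    # Fragments with both mates are proper pairs; the rest are orphans.
    paired = {n: (m["1"], m["2"]) for n, m in fragments.items() if len(m) == 2}
    orphans = [r for m in fragments.values() if len(m) == 1 for r in m.values()]
    return paired, orphans


# Hypothetical read identifiers
ids = ["frag_a/1", "frag_b/1", "frag_a/2", "frag_c/2"]
paired, orphans = pair_reads(ids)
print(paired)   # {'frag_a': ('frag_a/1', 'frag_a/2')}
print(orphans)  # ['frag_b/1', 'frag_c/2']
```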
31. Implications of Choice of Library
Slide credit: Aureliano Bombarely
Reads assemble into a consensus sequence (Contig)
Contigs plus pair read information give a Scaffold (or Supercontig), with gaps shown as NNNNN
Scaffolds plus genetic information (markers) give a Pseudomolecule (or ultracontig)
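The contig-to-scaffold step can be illustrated by joining contigs with runs of N, like the NNNNN gaps above. This is only a toy sketch: real scaffolders estimate order, orientation, and gap size from the pair read information, whereas here the contig order and gap sizes are simply given as hypothetical inputs.

```python
def build_scaffold(contigs, gaps):
    """Join contigs in the given order, padding each junction with 'N' characters.

    contigs: list of sequence strings, already ordered and oriented (assumed).
    gaps:    list of estimated gap sizes, one per junction between contigs.
    """
    if len(gaps) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    pieces = [contigs[0]]
    for gap, contig in zip(gaps, contigs[1:]):
        pieces.append("N" * gap)  # unknown bases between neighbouring contigs
        pieces.append(contig)
    return "".join(pieces)


# Hypothetical contigs and gap estimates
contigs = ["ACGTAC", "GGCTA", "TTAGC"]
gaps = [5, 2]
print(build_scaffold(contigs, gaps))  # ACGTACNNNNNGGCTANNTTAGC
```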
32. Quality control: Encoding
http://bit.ly/N28yUd
Phred score of a base is:
Qphred = -10 log10 (e)
where e is the estimated probability of a base
being incorrect
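The formula above can be applied in both directions, and quality strings in FASTQ files store each score as a single ASCII character. The sketch below assumes the common offset of 33 (Sanger and Illumina 1.8+); older Illumina pipelines used an offset of 64, so the offset is left as a parameter. The function names are illustrative.

```python
import math


def phred_from_error(e):
    """Q = -10 * log10(e), where e is the probability that the base is wrong."""
    return -10 * math.log10(e)


def error_from_phred(q):
    """Inverse of the formula: e = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def decode_quality(qual_string, offset=33):
    """Convert an ASCII-encoded quality string to a list of Phred scores.

    offset=33 is an assumption (Sanger / Illumina 1.8+ encoding).
    """
    return [ord(ch) - offset for ch in qual_string]


print(f"{phred_from_error(0.001):.1f}")  # 30.0
print(f"{error_from_phred(20):.4f}")     # 0.0100
print(decode_quality("I5+!"))            # [40, 20, 10, 0]
```

By the same formula, the 99.999% accuracy quoted for first generation sequencing (e = 0.00001) corresponds to a Phred score of 50.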
33. Which technology to use??
• Microbial genomes
• Eukaryotic genomes
• Resequencing genomes
• RNAseq and other XXXseq methods
http://bit.ly/1ko9Kgh
34. Looking into the Crystal ball
• Desktop sequencing
• Diagnostics in the clinic
• Large scale environmental sequencing of microbes
• But challenges remain...
35. • International Society for Computational Biology (ISCB)
• ISCB SC RSG India
• > 1500 members
• Contact
– rsg-india@googlegroups.com
– http://www.iscbsc.org/rsg/rsg-india
– https://groups.google.com/forum/#!forum/compbio_discussion
36. • Collaborate with student organizations
• Organize workshops and journal clubs
• Attend international meetings
37. Position available at Solgenomics
Cassavabase project
Plant Breeding + Bioinformatician
● Familiar with breeding
● Programming in Perl, R, SQL, Hadoop
● Linux
● Africa
● Genius
http://www.cassavabase.org/forum/posts.pl?topic_id=9