High Throughput Sequencing Technologies: What We Can Know

Brian Krueger, PhD
Duke University
Center for Human Genome Variation

2nd Generation Sequencing Overview
• Genomic DNA
• Fragmented DNA
• Ligate adaptors
• Bind library and create clusters
• Sequencing cycle
– Add bases
– Wash
– Image
– Cleave
– Wash
– Repeat hundreds of times on billions of clusters
• Align reads to a reference genome
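The last step of the workflow, aligning reads to a reference genome, can be sketched in a few lines. This is a toy illustration only: the function names are hypothetical, the reference is a 14-base string, and exact string matching stands in for a real aligner, which must tolerate mismatches and handle billions of reads.

```python
def fragment(genome, read_length, step):
    """Cut a genome string into overlapping fixed-length reads (toy fragmentation)."""
    return [genome[i:i + read_length]
            for i in range(0, len(genome) - read_length + 1, step)]


def align_exact(reads, reference):
    """Place each read at its first exact match in the reference (-1 if absent)."""
    return [(read, reference.find(read)) for read in reads]


# Toy reference; real reads are hundreds of bases long, not six.
reference = "ATCGGGTCATGTCA"
reads = fragment(reference, read_length=6, step=4)
alignments = align_exact(reads, reference)

for read, position in alignments:
    print(read, position)
```

Exact matching fails as soon as a read carries a variant or a sequencing error, which is why production aligners use indexed, mismatch-tolerant search instead.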

2nd Generation Sequencing Advances
• V3 System Chemistry
– 300GB per Flowcell
– 11 Days to Data
– Genome: $4700, Exome: $790
• V4 System Chemistry
– 600GB per Flowcell
– 6 Days to Data
– Genome: $3000, Exome: $640
• X System Chemistry
– 1TB per Patterned Flowcell
– 3 Days to Data
– Genome: $1500, Exome: $500
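The arithmetic behind these advances can be made explicit. The sketch below uses only the figures quoted on this slide; the dictionary layout and function names are hypothetical, and throughput is computed for V3 and V4 only.

```python
# Figures copied from the slide; key names are illustrative assumptions.
CHEMISTRIES = {
    "V3": {"gb_per_flowcell": 300, "days_to_data": 11, "genome_usd": 4700},
    "V4": {"gb_per_flowcell": 600, "days_to_data": 6, "genome_usd": 3000},
}
X_GENOME_USD = 1500  # genome price quoted for the X system chemistry


def gb_per_day(name):
    """Average data yield per day for one flowcell of the given chemistry."""
    chem = CHEMISTRIES[name]
    return chem["gb_per_flowcell"] / chem["days_to_data"]


def cost_drop(old_usd, new_usd):
    """Fractional price reduction when moving from old_usd to new_usd."""
    return 1 - new_usd / old_usd


for name in CHEMISTRIES:
    print(f"{name}: {gb_per_day(name):.1f} GB/day")
print(f"V3 to X genome price drop: {cost_drop(4700, X_GENOME_USD):.0%}")
```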

Techniques for Acquiring Data
• Whole Genome Sequencing
– Obtain whole blood or tissue sample
– Create sequencing libraries of all DNA fragments
• Whole Exome Sequencing
– Utilizes a selection protocol to fish out ONLY coding
DNA sequences
– Create sequencing libraries from enriched DNA
– Reduces cost and analysis time
• Custom Capture
– Same protocol as Exome sequencing
– Only target desired DNA sequences
• Amplicon Sequencing
– Use PCR to amplify target DNA
– Sequence amplified DNA (Amplicon)
• RNA-Seq
– Extract RNA, capture mRNA, convert to cDNA
– Used for differential gene expression analyses, RNA
isoform detection

Common DNA Mutations
• Reference
– Chromosome: A B C D
– Sequence: ATCGGGTCATGTCA
• Sequence variants
– Single nucleotide variant: ATCGGGTCATATCA
– Small insertion: ATCGGGTCATGACGTCA
– Small deletion: ATCGGGTCAT
• Structural variants
– Deletion: A C D
– Duplication: A B C C D
– Inversion: A B D C
– Translocation: A B E F G
Credit: Elizabeth Ruzzo, PhD, CHGV
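The three small sequence variants on this slide can be told apart with a very naive comparison against the reference. The sketch below is an assumption-laden illustration, not how a variant caller works: it takes two already-extracted sequences covering the same locus and classifies them by length and mismatch count. The function name is hypothetical.

```python
def classify_small_variant(reference, sample):
    """Naively label a sample sequence relative to the reference at the same locus."""
    if sample == reference:
        return "no variant"
    if len(sample) == len(reference):
        # Same length: count the positions where the bases differ.
        mismatches = sum(r != s for r, s in zip(reference, sample))
        if mismatches == 1:
            return "single nucleotide variant"
        return "multiple substitutions"
    # Different length: bases were gained or lost.
    if len(sample) > len(reference):
        return "small insertion"
    return "small deletion"


REFERENCE = "ATCGGGTCATGTCA"  # reference sequence from the slide
for sample in ("ATCGGGTCATATCA", "ATCGGGTCATGACGTCA", "ATCGGGTCAT"):
    print(sample, "->", classify_small_variant(REFERENCE, sample))
```

Real callers work from aligned reads and must separate true variants from sequencing and amplification errors, which this comparison ignores.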

Disadvantages of Current Techniques
• Amplification errors
– All polymerases have an inherent error rate (10^-6 to 10^-7)
• GC bias
– PCR bias against GC rich sequences
– Exome capture bias against GC rich sequences
• Trouble detecting small insertions and deletions
– Capture baits may not hybridize well
– Capture cannot be used to reliably detect large CNVs
• Cannot be used for De novo assembly
– Read length too short to span long repeat regions
– Not good for detecting trinucleotide repeat
expansions
• Miss large structural variations
– Translocations and inversions likely will be missed
– Require significant read depth at break points for
these variations to be detected
• Trouble with RNA-seq isoform detection
– Like large structural variations, hard to accurately
detect all splice isoforms using short read technology
[Figure: repeat expansions (A B B B C D; A B B B B C D; A B B B B B B C D) and structural variants marked X as missed (deletion A C D; translocation A E F G)]
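Two of these limitations lend themselves to a quick calculation: GC content, which drives PCR and capture bias, and whether a short read is long enough to span a repeat. The sketch below is illustrative only. The function names and the `min_flank` parameter (unique bases required on each side of the repeat to anchor the read) are assumptions, not values from the talk.

```python
def gc_fraction(sequence):
    """Fraction of bases in the sequence that are G or C."""
    if not sequence:
        return 0.0
    gc = sum(base in "GC" for base in sequence.upper())
    return gc / len(sequence)


def read_spans_repeat(read_length, unit_length, unit_count, min_flank=20):
    """True if one read can cover the whole repeat plus min_flank unique bases per side."""
    return unit_length * unit_count + 2 * min_flank <= read_length


print(gc_fraction("ATCGGGTCATGTCA"))

# A 300 bp read against a trinucleotide repeat of 40 units versus 200 units.
print(read_spans_repeat(300, unit_length=3, unit_count=40))
print(read_spans_repeat(300, unit_length=3, unit_count=200))
```

Once the repeat plus its flanks exceeds the read length, no single read can report how many units are present, which is the problem with trinucleotide repeat expansions noted above.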

Solutions!
• Solutions for many of these problems exist
– As always, come at a cost
• Whole Genome Sequencing - $1500
– Reduce Exome Artifacts
• Better Indel Detection and higher
coverage in high GC regions
• Can be used to detect large copy
number variations
• PCR Free Whole Genome Sequencing
– Reduces amplification bias and polymerase
error artifacts
• WGS will miss large structural variations
(Inversions, Translocations, microsatellites)
– Combine with long read technologies
– Added cost of $1000-$10,000
– Higher cost = better detection

Long-ish Read Sequencing Technologies
• Mate-Pair Sequencing
– Insert size increased from 300bp to 3-8KB
– Sequence ends of mate-pairs to pair reads
over much longer distances
– Use short reads to fill gaps
– Adds $1000 to Genome cost
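One way mate pairs help with structural variation is through the distance between the two mapped ends. The sketch below assumes both ends are already aligned to the same chromosome and flags pairs whose span falls outside the 3-8KB insert range quoted above. The `MatePair` class, its field names, and the idea of using the insert range as a hard cutoff are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class MatePair:
    name: str
    left_pos: int   # mapped position of the leftmost end (hypothetical field)
    right_pos: int  # mapped position of the rightmost end (hypothetical field)


def is_concordant(pair, min_insert=3000, max_insert=8000):
    """True if the mapped span matches the expected mate-pair insert size."""
    span = pair.right_pos - pair.left_pos
    return min_insert <= span <= max_insert


pairs = [
    MatePair("pair_1", 10_000, 15_000),  # span 5,000: as expected
    MatePair("pair_2", 10_000, 60_000),  # span 50,000: possible deletion in the sample
    MatePair("pair_3", 10_000, 10_400),  # span 400: possible insertion in the sample
]
discordant = [p.name for p in pairs if not is_concordant(p)]
print(discordant)
```

A real analysis would also require several independent pairs to support the same breakpoint before calling a variant.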

Long-ish Read Sequencing Technologies
• Illumina Synthetic Long Reads
– Fragment Genomic DNA to 10KB
– Dilute across a 384 well plate
– Fragment clonal 10KB fragments into
300bp fragments and barcode
– Sequence fragments and use barcodes to
re-create the long reads synthetically
– Use as a short read scaffold to perform De
Novo sequencing
– Has been used in HLA sequencing and De
Novo assembly of the Drosophila genome
including accurate mapping of 80% of the
transposable elements
– Adds $1800 to Genome cost
[Figure: 10kb fragmentation, barcoding and clonal amplification, Nextera prep, sequencing]
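The barcode step above can be sketched as follows: short reads are grouped by the barcode of their well, then stitched back together by overlap. This is a toy version under strong assumptions: reads within a barcode are supplied in left-to-right order, overlaps are exact, and the barcodes and function names are hypothetical. Real synthetic long-read assembly has to order the reads itself and tolerate errors.

```python
from collections import defaultdict


def group_reads_by_barcode(tagged_reads):
    """Collect read sequences under the barcode of the well they came from."""
    groups = defaultdict(list)
    for barcode, sequence in tagged_reads:
        groups[barcode].append(sequence)
    return dict(groups)


def merge_by_overlap(left, right):
    """Join two reads on their longest exact suffix/prefix overlap."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return left + right  # no overlap found: simply concatenate


def rebuild_long_read(ordered_reads):
    """Stitch reads that are assumed to be in order along the original fragment."""
    long_read = ordered_reads[0]
    for read in ordered_reads[1:]:
        long_read = merge_by_overlap(long_read, read)
    return long_read


tagged = [
    ("BC01", "ATCGGGTC"), ("BC02", "TTTTAAAA"),
    ("BC01", "GGTCATGT"), ("BC01", "ATGTCA"),
]
groups = group_reads_by_barcode(tagged)
print(rebuild_long_read(groups["BC01"]))
```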

True Long Read Sequencing Technologies
• Defined as single molecule sequencing
• Less complex sample prep and much longer read length
(1-100kb) compared to 200-400bp for 2nd Gen
• Two categories
– Sequencing by synthesis
• Pioneered by Pacific Biosciences
• Sequencer uses super microscopes and polymerase bound
nanowells to WATCH DNA as it is sequenced in real time
• Nanowells filled with DNA bases
• Fluorescence of base only detected at the polymerase
– Direct sequencing by passing DNA through a nanopore
• Bases fed through a membrane bound nanopore
• Ionic difference between both sides of the membrane
• Detect how ion flow changes at the pore as each base passes
through
• Oxford Nanopore, Base4, Stratos Genomics, Genia
• Bleeding edge technology
– Many technical hurdles with very high error rates (10-40%)
– Current best use is to create scaffolds for De Novo assembly
– Very expensive technology
• Costs 3-10x as much as Illumina to do whole genome
sequencing
[Figure panels: PacBio, Oxford Nanopore]
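To put the quoted error rates and costs in perspective, the sketch below works through the arithmetic using only numbers from this talk: a 10-40% error rate, read lengths up to 100kb, a 3-10x cost multiplier, and the $1500 Illumina genome price. The function names and the 10kb example read are illustrative assumptions.

```python
def expected_errors(read_length_bp, error_rate):
    """Expected number of erroneous bases in a single read."""
    return read_length_bp * error_rate


def long_read_genome_cost(illumina_cost_usd, multiplier):
    """Genome cost when long-read sequencing costs a multiple of the Illumina price."""
    return illumina_cost_usd * multiplier


ILLUMINA_GENOME_USD = 1500  # whole genome price quoted earlier in the talk

for rate in (0.10, 0.40):
    errors = expected_errors(10_000, rate)
    print(f"10kb read at {rate:.0%} error: about {errors:.0f} wrong bases")

low = long_read_genome_cost(ILLUMINA_GENOME_USD, 3)
high = long_read_genome_cost(ILLUMINA_GENOME_USD, 10)
print(f"Long-read genome cost range: ${low} to ${high}")
```

With thousands of wrong bases in every read, single reads are unreliable for variant calling, which fits the slide's point that the current best use is scaffolding for de novo assembly.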

Questions??
• Reading/Viewing Material:
– Sequencing Methods Ecosystem - http://res.illumina.com/documents/products/research_reviews/sequencing-methods-review.pdf
– Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly repetitive transposable elements - http://biorxiv.org/content/early/2014/01/19/001834
– Characterization of the human ESC transcriptome by hybrid sequencing - http://www.pnas.org/content/110/50/E4821.short
– Nanopore Sequencing Web Conference - http://www.youtube.com/watch?v=UtXlr19xTh8
