Course: Bioinformatics for Biomedical Research (2014).
Session: 2.1.1- Next Generation Sequencing. Technologies and Applications. Part I: NGS Introduction and Technology Overview.
Statistics and Bioinformatics Unit (UEB) & High Technology Unit (UAT), Vall d'Hebron Research Institute (www.vhir.org), Barcelona.
Vall d’Hebron Institut de Recerca (VHIR)
Rosa Prieto
Head of the High Tech Unit
rosa.prieto@vhir.org
15/05/2014
Health research institute accredited by the Instituto de Salud Carlos III (ISCIII)
NEXT GENERATION SEQUENCING
TECHNOLOGIES AND APPLICATIONS
COURSE ON BIOINFORMATICS
FOR BIOMEDICAL RESEARCH
Index
1 INTRODUCTION TO NGS
2 NGS TECHNOLOGY OVERVIEW
3 NGS APPLICATIONS OVERVIEW
4 WHAT IS NEXT IN SEQUENCING TECHNOLOGIES?
Introduction
Personalized medicine era
Biomarker identification:
•Diagnostic
•Susceptibility/risk (prevention)
•Prognostic (indolent vs. aggressive)
•Predictive (response)
-The right therapeutic strategy for the right person at the right time
-Predisposition to disease
-Early and targeted prevention
Introduction: “omics”
“Omics”
Omics aims at the collective characterization and quantification of pools of biological molecules that
translate into the structure, function, and dynamics of an organism or organisms (Wikipedia).
http://www.genomicglossaries.com/content/omes.asp
High-throughput technologies: Genomics, Epigenomics, Metagenomics, Transcriptomics, Proteomics, Metabolomics, Lipidomics
NGS increases capacity and reduces costs
Moore’s Law (1965): the number of transistors in an integrated circuit doubles roughly every 2 years.
Source - NHGRI : http://www.genome.gov/sequencingcosts/
Date     Cost per Mb   Cost per Genome   % cost vs. Sep-01
Sep-01   $5,292.39     $95,263,072       100%
Sep-02   $3,413.80     $61,448,422       64.5039%
Oct-03   $2,230.98     $40,157,554       42.1544%
Oct-04   $1,028.85     $18,519,312       19.4402%
Oct-05   $766.73       $13,801,124       14.4874%
Oct-06   $581.92       $10,474,556       10.9954%
Oct-07   $397.09       $7,147,571        7.5030%
Oct-08   $3.81         $342,502          0.3595%
Oct-09   $0.78         $70,333           0.0738%
Oct-10   $0.32         $29,092           0.0305%
Oct-11   $0.086        $7,743            0.0081%
Oct-12   $0.074        $6,618            0.0069%
Oct-13   $0.057        $5,096            0.0053%
Jan-14   $0.045        $4,008            0.0042%
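The comparison between the NHGRI table above and Moore's Law can be made concrete with a few lines of Python. This is a minimal sketch, not part of the original slides: it assumes a "Moore-style" cost curve that halves every 2 years starting from the Sep-01 cost per genome, and the function names (moore_projection, percent_of_baseline) are illustrative only. Only four rows of the table are used.

```python
from datetime import date

# Cost per genome (USD), copied from four rows of the NHGRI table above.
COST_PER_GENOME = [
    (date(2001, 9, 1), 95_263_072),
    (date(2007, 10, 1), 7_147_571),
    (date(2008, 10, 1), 342_502),
    (date(2014, 1, 1), 4_008),
]


def moore_projection(start_cost, start, when, halving_years=2.0):
    """Cost at `when` if the cost halved every `halving_years` since `start`.

    The 2-year halving period is an assumption that mirrors Moore's Law.
    """
    elapsed_years = (when - start).days / 365.25
    return start_cost / 2 ** (elapsed_years / halving_years)


def percent_of_baseline(cost, baseline):
    """Cost expressed as a percentage of the baseline (Sep-01) cost."""
    return 100.0 * cost / baseline


start_date, start_cost = COST_PER_GENOME[0]
for when, cost in COST_PER_GENOME:
    projected = moore_projection(start_cost, start_date, when)
    print(
        f"{when:%b-%y}  observed ${cost:>12,}  "
        f"Moore-style ${projected:>14,.0f}  "
        f"{percent_of_baseline(cost, start_cost):.4f}% of Sep-01"
    )
```

Under this assumption the observed cost stays close to the projected curve until 2007 and then falls far below it, which coincides with the rows where the table drops by orders of magnitude.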
Sanger sequencing vs. NGS (2nd and 3rd generation)

Sanger:
1. DNA fragmentation
2. Cloning into vectors; transformation of bacteria; growth and isolation of vector DNA
3. Cycle sequencing (template sequence, e.g. CTATGCTCG; primer; polymerase; dNTPs; labelled ddNTPs), followed by electrophoresis (1 sequence per capillary)
4. Image processing

2nd generation NGS:
1. DNA fragmentation
2. In vitro adapter ligation and clonal amplification
3. Massively parallel sequencing
4. Image processing and data analysis

3rd generation NGS:
1. DNA fragmentation
2. and 3. In vitro adapter ligation and massively parallel sequencing WITHOUT amplification
4. Image processing and data analysis
Comparison of different NGS platforms
-Similarities (and differences vs. Sanger):
•library preparation:
    starting material: short fragments of nucleic acids
    adapter ligation
    multiplexing (MID tags)
•clonal amplification (not for 3rd generation sequencing)
•massively parallel sequencing
•physical location is used to identify unique reads, a critical concept for all next generation sequencing systems. The density of the reads and the ability to record them without interfering noise determine the throughput of a given instrument.
•the signal must be processed and post-treated to obtain the individual sequences
•data analysis is complex because of the large volume of data
-Differences:
•Clonal amplification method/sequencing technology/signal detection
•Throughput
•Read-length
•Run time
•Cost per base
NGS general workflow
1. Library preparation: DNA fragmentation and in vitro adaptor ligation. Different kinds of libraries (amplicons, shotgun, cDNA...)
2. Clonal amplification: emulsion PCR (454 sequencing, Ion Proton/PGM) or bridge PCR (Illumina technology)
3. Cyclic array sequencing: pyrosequencing (454 sequencing), semiconductor sequencing (Ion Proton/PGM) or 4-colour fluorescent nucleotides (Illumina technology)
Clonal amplification by emPCR (454, Ion)
emPCR-based systems (Roche, SOLiD, Ion)
- 1 starting effective fragment per microreactor
- ~10^6 microreactors per ml
- All processed in parallel (clonal amplification)
- High-speed shaker (to form the emulsion)
Clonal amplification by emPCR (454, Ion)
Is the amplification really clonal? Requirements:
- No empty beads
- No beads containing more than one amplified fragment
How to get there:
1) Bead vs. starting DNA quantity titration
2) Optimal enrichment (5-20% OK):
- Melt the dsDNA
- Anneal a biotin-labelled primer to the capture beads carrying ssDNA
- Add streptavidin-coated magnetic beads
- Melt
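The titration step trades empty beads against beads carrying more than one fragment. A rough way to see that trade-off is sketched below in Python. It rests on an assumption that is not stated in the slides: that fragments are distributed over beads at random, following a Poisson distribution with a mean of `mean_fragments_per_bead` fragments per bead. The function name bead_fractions is hypothetical.

```python
import math


def bead_fractions(mean_fragments_per_bead):
    """Return (empty, clonal, mixed) bead fractions under Poisson loading.

    Assumption (not from the slides): the number of starting fragments
    per bead follows a Poisson distribution with the given mean.
    """
    lam = mean_fragments_per_bead
    empty = math.exp(-lam)           # P(0 fragments)
    clonal = lam * math.exp(-lam)    # P(exactly 1 fragment)
    mixed = 1.0 - empty - clonal     # P(2 or more fragments)
    return empty, clonal, mixed


for lam in (0.05, 0.1, 0.2, 0.5, 1.0):
    empty, clonal, mixed = bead_fractions(lam)
    loaded = clonal + mixed
    print(
        f"mean={lam:<4}  loaded beads={loaded:6.1%}  "
        f"clonal among loaded={clonal / loaded:6.1%}"
    )
```

In this simplified model, low DNA-to-bead ratios leave most beads empty but keep almost all loaded beads clonal, while higher ratios load more beads at the price of more mixed ones. Real emulsions deviate from the model, so the numbers are only indicative.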
Bridge amplification (Illumina)
Cluster generation: bridge PCR, 100-200 million clusters
HiSeq2500: 2 flow cells, 8 lanes per flow cell
- Binding of single strands to the adapters
- Clonal double-stranded clusters
- Removal of the reverse strands
- Blocking and addition of the sequencing primer
Metal coated PTP reduces crosstalk
29 μm well diameter (20/bead)
3,400,000 wells per PTP
GS FLX 454 sequencing
Pyrosequencing (sequencing by synthesis)
CCD Camera
“flowgram” (signal intensity is proportional to the number of nucleotides incorporated in the sequence)
- throughput limited by the number of wells in the PTP
- errors in homopolymers (454)
- long sequences (up to 1000 bp) are achieved
- low throughput, very expensive reagents
GS FLX 454 sequencing
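The flowgram idea (signal proportional to the number of nucleotides incorporated per flow) can be illustrated with a small Python sketch. It is a simplified, noise-free model and not the vendor's base caller: the flow order "TACG" and the function names simulate_flowgram and call_bases are assumptions made for illustration.

```python
from itertools import cycle

FLOW_ORDER = "TACG"  # assumed nucleotide flow order, for illustration only


def simulate_flowgram(read, flow_order=FLOW_ORDER):
    """Ideal flowgram for `read`: one (nucleotide, signal) pair per flow.

    The signal equals the length of the homopolymer run incorporated
    during that flow (0.0 when the flowed nucleotide does not match).
    """
    unknown = set(read) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    signals = []
    pos = 0
    flows = cycle(flow_order)
    while pos < len(read):
        base = next(flows)
        run = 0
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append((base, float(run)))
    return signals


def call_bases(flowgram):
    """Turn (nucleotide, signal) pairs back into a sequence by rounding."""
    return "".join(base * max(0, round(signal)) for base, signal in flowgram)


flowgram = simulate_flowgram("TTACGGGA")
print(flowgram)
print(call_bases(flowgram))

# A small amount of noise on a long homopolymer changes the call:
print(call_bases([("A", 5.4)]), call_bases([("A", 5.6)]))
```

The last line hints at why homopolymers are error-prone in this chemistry: the relative difference between a run of 5 and a run of 6 is small, so modest signal noise is enough to add or drop a base.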
Illumina sequencing
Reversible dye terminator nucleotides (sequencing by synthesis)
Sequencing cycle:
1. Delivery of the 4 fluorescent nucleotides in each cycle
2. Incorporation
3. Image capture
4. Removal of the 3’ terminator
- Limited by the fragment length that can “bridge”
- Labelled nucleotides are not incorporated as efficiently as native ones
- Short sequences
- Strand-specific errors, substitutions towards the end of the read, base substitution errors (systematic error GGT > GGG)
- High throughput, expensive machines, acceptable cost per Mb
Ion Torrent sequencing
ION TORRENT (Life Techn.)
Fragmentation & adapter sequences
Clonal amplification (emPCR on beads)
Deposition of the beads+DNA into the wells of the chip
1. Sequential delivery of unmodified nucleotides
2. Incorporation of a nucleotide by the polymerase releases an H+
3. Direct and simultaneous detection of the pH change in all wells.
•pH meter, no optical system: output improves rapidly with new chips
•Fast runs (native nucleotides)
•Inexpensive machine and reagents
•Errors in homopolymer detection
NGS platforms comparison

PLATFORM               ROCHE GS FLX+ 454   ILLUMINA HISEQ 2500          ION PROTON
Library preparation    emPCR               Bridge amplification         emPCR
Sequencing chemistry   Pyrosequencing      Reversible dye terminators   pH change
Read length            Up to 1000 bp       From 2x125 bp to 2x300 bp    Up to 200 bp
Run time               22 hrs              7 hrs-6 days                 From 2 to 4 hrs
Throughput/run         Up to 700 Mb        500-1000 Gb (1 Tb)           10 Gb (PI), 100 Gb (PII)
Equipment cost         $500,000            $750,000                     $250,000
Reagents cost/run      $8,000              $5,500                       $1,000

GOOD!
- Roche GS FLX+ 454: longest read length
- Illumina HiSeq 2500: high throughput, low cost per base, ease of use
- Ion Proton: quick, easy to use and cheap

BAD!
- Roche GS FLX+ 454: high error rate in homopolymers (>6); very expensive; low throughput; not automated at all
- Illumina HiSeq 2500: short sequences; strand-specific errors, substitutions towards the end of the read, base substitution errors (systematic error GGT > GGG)
- Ion Proton: errors in homopolymers; higher bias than Illumina
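The "cost per base" contrast in the table above can be made explicit by dividing the reagent cost per run by the throughput per run. The Python sketch below does that with the table's own figures. It is only an estimate under stated assumptions: it uses the upper-bound throughput of each platform, takes the PI chip for the Ion Proton, treats 700 Mb as 0.7 Gb, and ignores equipment, labour and library preparation. The names PLATFORMS and reagent_cost_per_gb are illustrative.

```python
# Reagent cost per run (USD) and maximum throughput per run (Gb),
# taken from the comparison table above.
PLATFORMS = {
    "Roche GS FLX+ 454": {"reagents_usd": 8_000, "throughput_gb": 0.7},
    "Illumina HiSeq 2500": {"reagents_usd": 5_500, "throughput_gb": 1_000},
    "Ion Proton (PI chip)": {"reagents_usd": 1_000, "throughput_gb": 10},
}


def reagent_cost_per_gb(reagents_usd, throughput_gb):
    """Reagent cost in USD per gigabase of raw output."""
    return reagents_usd / throughput_gb


costs = {
    name: reagent_cost_per_gb(spec["reagents_usd"], spec["throughput_gb"])
    for name, spec in PLATFORMS.items()
}

# Print platforms from cheapest to most expensive per Gb.
for name, usd_per_gb in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name:<22} ${usd_per_gb:>10,.2f} per Gb")
```

With these inputs the cheapest run (Ion Proton) is not the cheapest per base, which is why the table lists low cost per base under Illumina and low overall cost under Ion Proton.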
NGS High-Throughput Platforms comparison
HiSeq 2500
Two modes: Rapid Run and High Output
Single/Dual Flow Cells
PE 2 x 125 bp
120 Gb in 27 hours (Rapid)
1 Tb in 6 days (High)
20 exomes in a day
1 human genome in a day
30 RNAseq samples in 5 hours
Human exome, 30x: approx. 800-1000 €
Human RNAseq (30 M reads, 100 bp PE, strand-specific): approx. 800-1000 €
Human whole genome, 30x: 4000 €
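The link between run output and "1 human genome in a day" follows from the usual coverage relation, mean coverage = sequenced bases / genome size. The Python sketch below applies it to the output figures listed above. The human genome size of 3.1 Gb is an assumption that does not appear in the slides, and the calculation uses raw yield, so duplicates, low-quality reads and uneven coverage are ignored. The function names are illustrative.

```python
HUMAN_GENOME_GB = 3.1  # assumed haploid human genome size in Gb (not from the slides)


def mean_coverage(yield_gb, genome_gb=HUMAN_GENOME_GB):
    """Average depth of coverage obtained from `yield_gb` of raw sequence."""
    return yield_gb / genome_gb


def genomes_per_run(yield_gb, target_coverage=30, genome_gb=HUMAN_GENOME_GB):
    """Whole genomes that fit in one run at the target mean coverage."""
    return int(yield_gb // (target_coverage * genome_gb))


# Output figures quoted above: 120 Gb (Rapid Run) and 1 Tb (High Output).
for mode, yield_gb in (("Rapid Run, 27 hours", 120), ("High Output, 6 days", 1_000)):
    print(
        f"{mode}: {mean_coverage(yield_gb):.1f}x on one genome, "
        f"or {genomes_per_run(yield_gb)} genome(s) at 30x"
    )
```

Under these assumptions a Rapid Run covers one human genome at slightly more than 30x, in line with the "1 human genome in a day" figure, while a High Output run has room for several genomes.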
HiSeq X Ten (10 HiSeq X)
Only High Output mode
Single/Dual Flow Cells
PE 2 x 150 bp
600 Gb in a day (dual flow cell)
1.8 Tb in 3 days (4x faster than HiSeq2500)
HiSeq X Ten: 10,000 genomes at 30x per year
Ion Proton
Ion PI chip:
Up to 20 Gb output (specification: 10 Gb)
Read length: up to 200 bp
Run time: 2-4 hrs
1 human exome (approx. 1000 €)
Ion PII chip:
Up to 100 Gb output (expected 2014), now reduced to 20-30 Gb at launch
Run time: 2-4 hrs
Read length: 100 bp
Human Whole Genome (10x, ?)
Ion PIII chip (???): 200 Gb output per run
Source: Nextgenseek.com & Allseq.com.
All these costs are indicative as of May 2014 and are in no way binding for the UAT.