A presentation for people interested in understanding how Illumina adapter ligation, clustering and SBS sequencing work. Follow core-genomics http://core-genomics.blogspot.co.uk/
How to cluster and sequence an NGS library (James Hadfield, 160416)
1. ‘How to prepare, cluster and sequence an NGS library’
AN OVERVIEW OF NGS IN THE GENOMICS CORE
– Introduction
– Understanding library prep
– Understanding clustering and sequencing
– Understanding instruments
– NGS QC
– NGS applications
2. A potted history of Illumina sequencing
[Figure: timeline of Illumina sequencing. Years marked: 1994, 1998, 2004, 2006, 2007, 2010, 2011, 2012, 2014, 2015. Yields marked: 1Gb, 25Gb, 200Gb, 500Gb, 1000Gb, 1500Gb.]
19. Understanding cluster generation (2500 etc)
A) Diluted & denatured libraries are annealed to lawn oligos at their 3’ end, and a
polymerase creates a covalently attached copy of the library molecule.
B) The original strand is removed by denaturation with NaOH.
C) In non-denaturing conditions the library molecule bends and hybridises to a lawn
oligo complementary to the 5’ end, and a polymerase creates a second covalently
attached molecule. This amplification is repeated to create a cluster with around 1000
copies of the original library molecule.
[Figure: panels A, B and C illustrate the steps above]
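As a rough sanity check on the "around 1000 copies" figure, the sketch below counts how many amplification cycles would be needed to grow one molecule into a cluster of a given size. Perfect doubling per cycle is an assumption made here for illustration only; real bridge amplification is less efficient, so the result is a lower bound. The function name `cycles_to_reach` and its `efficiency` parameter are hypothetical.

```python
def cycles_to_reach(target_copies: int, efficiency: float = 1.0) -> int:
    """Count the cycles needed to grow one molecule to at least target_copies.

    efficiency is the assumed fraction of strands copied in each cycle
    (1.0 means perfect doubling; this is an illustrative assumption,
    not a measured value for any instrument).
    """
    if target_copies < 1:
        raise ValueError("target_copies must be at least 1")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")

    copies = 1.0
    cycles = 0
    while copies < target_copies:
        copies *= 1.0 + efficiency  # each strand is copied with this probability
        cycles += 1
    return cycles


if __name__ == "__main__":
    print(cycles_to_reach(1000))        # perfect doubling
    print(cycles_to_reach(1000, 0.5))   # half of the strands copied per cycle
```

With perfect doubling, ten cycles are enough to pass 1000 copies (2^10 = 1024); lowering the assumed efficiency raises the cycle count.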
20. Understanding cluster generation (2500 etc)
[Figure: panels labelled D, E, C, G, H illustrate the steps below]
D) Clusters are linearized by cleavage at the 3’ end of the original library molecule, and
denaturation leaves the single stranded DNA which will be sequenced. A sequencing
primer is hybridised* and sequencing-by-synthesis generates the first read in your fastq
file.
-) For single-end indexing the SBS template is removed by denaturation, and the index
1 sequencing primer is hybridised ready to generate index1 (i7). Dual-indexing is
complicated and differs on single- or paired-end flowcells but the process is essentially the
same to generate index two (i5).
E-G) For paired-end sequencing the SBS template is removed by denaturation, the cluster
is re-amplified for several cycles and cleaved at the 5’ end, and the paired-end sequencing
primer is hybridised ready to generate read 2.
*Beware: if you create new adapters let us know if you need a custom sequencing primer
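The order in which reads come off a cluster can be summarised in a few lines. The sketch below lists the segments of a run in the order described above: read 1, then index 1 (i7), then index 2 (i5) for dual-indexed libraries, then read 2 for paired-end runs. The function names, their parameters and the 8-cycle index length in the usage example are illustrative assumptions, not values from the slides, and the sketch ignores the chemistry differences between single- and paired-end flowcells.

```python
from typing import List, Tuple

Segment = Tuple[str, int]  # (segment name, number of sequencing cycles)


def run_segments(read_len: int, paired_end: bool = False,
                 index1_len: int = 0, index2_len: int = 0) -> List[Segment]:
    """Return the read segments of a run in the order they are sequenced."""
    segments: List[Segment] = [("read 1", read_len)]
    if index1_len:
        segments.append(("index 1 (i7)", index1_len))
    if index2_len:
        segments.append(("index 2 (i5)", index2_len))
    if paired_end:
        segments.append(("read 2", read_len))
    return segments


def total_cycles(segments: List[Segment]) -> int:
    """Sum the cycles over all segments."""
    return sum(cycles for _, cycles in segments)


if __name__ == "__main__":
    # Hypothetical dual-indexed PE125 run with 8-cycle indexes (assumed lengths).
    segments = run_segments(125, paired_end=True, index1_len=8, index2_len=8)
    for name, cycles in segments:
        print(f"{name}: {cycles} cycles")
    print("total:", total_cycles(segments))
```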
21. Understanding cluster generation (X Ten & 4000)
Exclusion Amplification
The same hybridisation and solid-surface amplification occurs but in an all-in-one
phase called “exclusion amplification” (ExAmp). Once a library molecule “lands” in a
well it should occupy it completely.
30. Different sequencing configurations
2500 Rapid: 150M reads; SE 50bp 85% Q30; PE 250bp 75% Q30; PE 150 in 2 days
2500 High output: 250M reads; SE 50bp 85% Q30; PE 125bp 80% Q30; PE 125 in 6 days
4000 High output: 312M reads; SE 50bp 85% Q30; PE 150bp 75% Q30; PE 150 in 3 days
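To compare these configurations by sequence yield rather than read count, the read numbers can be converted to bases. The sketch below is a minimal worked example of that arithmetic; it assumes that each counted read is a cluster that gives two reads in a paired-end run, and it uses the paired-end read lengths quoted for each configuration. The slide does not say what unit the read counts refer to (lane, flowcell or run), so the yields are per whatever that unit is. The names `yield_gb` and `CONFIGS` are hypothetical.

```python
def yield_gb(reads: int, read_len: int, paired_end: bool = True) -> float:
    """Return the yield in gigabases for a given number of clusters."""
    reads_per_cluster = 2 if paired_end else 1
    return reads * read_len * reads_per_cluster / 1e9


# (clusters, paired-end read length in bp), taken from the figures quoted above
CONFIGS = {
    "2500 Rapid": (150_000_000, 250),
    "2500 High output": (250_000_000, 125),
    "4000 High output": (312_000_000, 150),
}

if __name__ == "__main__":
    for name, (reads, read_len) in CONFIGS.items():
        print(f"{name}: PE{read_len} gives {yield_gb(reads, read_len):.1f} Gb")
```

Under these assumptions the three configurations give 75.0, 62.5 and 93.6 Gb respectively.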
31. HiSeq 4000 considerations
CLUSTERING IS VERY DIFFERENT FROM 2500
– PE150: quality beyond 125bp is not great*
– %Q30 “passes Illumina spec”*
– ExAmp duplicates*
– Need to consider how you handle duplicates
– RNA-seq is fine
– Exome-seq is fine
– Genomes are fine
34. NGS QC – library prep
QUALITY CONTROL OF LIBRARIES IS IMPORTANT.
TITRATION FLOWCELLS AND FAILED RUNS ARE EXPENSIVE.
TRY TO IDENTIFY ISSUES BEFORE RUNNING ANY LANES.
QC IS SPECIFIC TO YOUR SAMPLES.
QUANTITATION OF LIBRARIES IS IMPORTANT.
SOME QC CAN ONLY BE DONE ONCE YOU HAVE GENERATED DATA
[Figure: examples of good and bad libraries, with panels for Bioanalyser, qPCR and analysis]
37. NGS QC – MGA
LIBRARY QC – CONTAMINANT DETECTION
SAMPLE 100,000 READS FROM FASTQ
READS TRIMMED TO 36BP
ALIGN TO MULTIPLE GENOMES USING BOWTIE
LIBRARY QC – ADAPTER DETECTION
SAMPLE 100,000 READS FROM FASTQ
READS CONVERTED TO FASTA
ALIGN TO “ADAPT-OME” USING EXONERATE
LIBRARY QC- YIELD
COUNT NUMBER OF READS (SINGLE-END ONLY)
DISPLAY NUMBER ON A PRE-DEFINED SCALE
DISPLAY LANES IN FLOWCELL CONFIGURATION
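The read-handling steps listed above (count the reads, sample 100,000 of them, trim to 36bp, convert to FASTA) can be sketched with the Python standard library. This is not the MGA implementation; it is a minimal illustration that assumes well-formed four-line FASTQ records and uses single-pass reservoir sampling, which is one reasonable way to sample from a file of unknown length. The alignment steps with bowtie and exonerate are not shown, and all function names are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (name, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from four-line FASTQ input (an incomplete final record is dropped)."""
    stripped = (line.rstrip("\n") for line in lines)
    for header, seq, _plus, qual in zip(stripped, stripped, stripped, stripped):
        yield header[1:], seq, qual  # drop the leading '@'


def sample_reads(records: Iterable[Record], n: int,
                 seed: int = 0) -> Tuple[List[Record], int]:
    """Reservoir-sample up to n records in one pass; also return the total read count."""
    rng = random.Random(seed)
    reservoir: List[Record] = []
    total = 0
    for total, record in enumerate(records, start=1):
        if len(reservoir) < n:
            reservoir.append(record)
        else:
            j = rng.randrange(total)  # keep the new record with probability n / total
            if j < n:
                reservoir[j] = record
    return reservoir, total


def trim(record: Record, length: int = 36) -> Record:
    """Trim sequence and quality to the first `length` bases."""
    name, seq, qual = record
    return name, seq[:length], qual[:length]


def to_fasta(records: Iterable[Record]) -> str:
    """Convert records to FASTA text, discarding qualities."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _ in records)


if __name__ == "__main__":
    # Tiny in-memory FASTQ used only to exercise the functions.
    fastq = ["@r1\n", "ACGT" * 10 + "\n", "+\n", "I" * 40 + "\n",
             "@r2\n", "TTTT\n", "+\n", "IIII\n"]
    sampled, total = sample_reads(read_fastq(fastq), n=100_000)
    trimmed = [trim(r) for r in sampled]
    print("reads counted:", total)
    print(to_fasta(trimmed), end="")
```

For a real file, `read_fastq` can be given an open file handle (for example from `gzip.open(path, "rt")`) in place of the list of lines.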
42. The Genomics Core sequencing services
This Tweet is 6 hours old
There are 13 samples in the queue
It will take about 1 week to sequence your sample
There is 1x paired-end 125bp sample in the queue
This is driven by our Genologics LIMS
Sequencing is on our Illumina sequencers
43. Service metrics Jan 2016
– TAT has been 2-3 weeks (often as little as 1 week)
– Most sequencing works very well, but…
48. A genomic case report
NFKBIA S32G
SIFT: deleterious(0)
PolyPhen: probably_damaging(0.979)
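Predictions written as `label(score)`, as in the two lines above, are easy to split into a label and a number when filtering variants. The sketch below is a minimal parser for that text pattern only; the names `Prediction` and `parse_prediction` are hypothetical, and it makes no assumptions about the tool that produced the annotation.

```python
import re
from typing import NamedTuple


class Prediction(NamedTuple):
    label: str
    score: float


# Matches text such as "deleterious(0)" or "probably_damaging(0.979)".
_PATTERN = re.compile(r"^\s*([A-Za-z_]+)\(([0-9]*\.?[0-9]+)\)\s*$")


def parse_prediction(text: str) -> Prediction:
    """Split a 'label(score)' annotation into its label and numeric score."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a label(score) annotation: {text!r}")
    return Prediction(match.group(1), float(match.group(2)))


if __name__ == "__main__":
    print(parse_prediction("deleterious(0)"))
    print(parse_prediction("probably_damaging(0.979)"))
```

Note that the two scores run in opposite directions: for SIFT, scores near 0 are the most damaging, whereas for PolyPhen, scores near 1 are.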
Editor's notes
Show my own starting point in radionucleotide four base sequencing and semi-automated basecalling…through Solexa, Manteia, Illumina GAI, MiSeq, HiSeq, NextSeq and X Ten
Most people looking at these slides will already understand PCR, qPCR and Bioanalyser.
You only need to understand End-repair, A-tailing and adapter-ligation to be reasonably competent at NGS library prep.
Spend time understanding these key steps and you can easily move between the many different library prep technologies.
The next few slides will show how the simple steps of PCR, qPCR and Bioanalyser can be combined with End-repair, A-tailing and adapter-ligation to deliver an NGS library prep protocol – exemplified in this case as Illumina TruSeq Nano.
And by skipping PCR or starting from quantified ChIP DNA you turn TruSeq Nano into TruSeq PCR-free or ChIP-seq, respectively
And by starting with RNA you turn TruSeq Nano into RNA-seq
Comparing Nextera and Nextera exomes to TruSeq
Comparing Rubicon to TruSeq
All library prep types shown together for comparison. Almost all other preps rely on varying the starting material or adding specific steps to the workflow. Look at http://www.illumina.com/content/dam/illumina-marketing/documents/applications/ngs-library-prep/ForAllYouSeqMethods.pdf
Still the best introductory paper
An overview of the library prep workflow
Another (mine) overview of the library prep workflow
Another (mine) overview of the library prep workflow, this time focussing on how the adapters work (you really need the PPT animation to work)
Users generally start with tiny amounts of DNA; it gets massively amplified in the library prep so we can carefully quantify it, before it gets massively diluted for sequencing. Can we go straight across the bottom of this graph?
An overview of the Nextera library prep workflow
An overview of the clustering workflow
Illumina sequencing is not single-molecule detection. Single molecules are amplified as ‘clusters’ on a ‘flow-cell’: think PCR on a slide…
Clustering explained: an introduction to exclusion amplification chemistry
Clustering explained: how Illumina are likely to increase yield in the future
Sanger sequencing
Pyrosequencing
Illumina sequencing-by-synthesis (SBS)
Illumina sequencing-by-synthesis (SBS) explained
Different versions of Illumina sequencing-by-synthesis (SBS) on different sequencers
The anatomy of a HiSeq
Different HiSeq flowcells
HiSeq 4000 specifications
MiSeq and long read sequencing
NextSeq (not really explained at all!)
QC of NGS libraries and data
QC of NGS libraries and data: FastQC
QC of NGS libraries and data: MGA
QC of NGS libraries and data: MGA, a good flowcell
QC of NGS libraries and data: MGA, a bad flowcell
The CRUK-CI Genomics Core sequencing facility
Follow us on Twitter to get updates on how long our queue is
Our Twitter queue Tweets explained
Performance of the Genomics Core sequencing service on HiSeq 2500
@RNA_seq : a Twitter bot that posts PubMed papers on RNA-seq
@Exome_seq : a Twitter bot that posts PubMed papers on Exome sequencing
The beautiful labour of love that is “For all you seq”
Sequencing of a child with a severe immunodeficiency to try and identify a causal mutation.
Illumina TruSeq PCR-free library prep of the trio and sequencing on HiSeq 2500
Sequencing of a child with a severe immunodeficiency to try and identify a causal mutation
Identification of a de novo mutation in NFKBIA that abrogates B-cell response
A one-mismatch bone-marrow transplant leads to significant improvement in the patient's condition