High Throughput Sequencing Technologies: What We Can Know 
Brian Krueger, PhD 
Duke University 
Center for Human Genome Variation
2nd Generation Sequencing Overview 
• Fragment genomic DNA 
• Ligate adaptors to the fragmented DNA 
• Bind library and create clusters 
• Sequencing cycle: add bases, wash, image, cleave, wash 
• Repeat hundreds of times on billions of clusters 
• Align reads to a reference genome
2nd Generation Sequencing Advances 
• V3 System Chemistry 
– 300GB per Flowcell 
– 11 Days to Data 
– Genome: $4700, Exome: $790 
• V4 System Chemistry 
– 600GB per Flowcell 
– 6 Days to Data 
– Genome: $3000, Exome: $640 
• X System Chemistry 
– 1TB per Patterned Flowcell 
– 3 Days to Data 
– Genome: $1500, Exome: $500
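The price and turnaround figures above can be compared with a little arithmetic. The following is a minimal sketch that uses only the numbers on this slide; the dictionary layout and the `percent_drop` helper are illustrative names, not part of any sequencing software.

```python
# Figures copied from the slide; everything computed below is plain arithmetic.
chemistries = {
    "V3": {"days_to_data": 11, "genome_usd": 4700, "exome_usd": 790},
    "V4": {"days_to_data": 6, "genome_usd": 3000, "exome_usd": 640},
    "X": {"days_to_data": 3, "genome_usd": 1500, "exome_usd": 500},
}


def percent_drop(old, new):
    """Percentage reduction when going from `old` to `new`."""
    return 100.0 * (old - new) / old


baseline = chemistries["V3"]
for name, c in chemistries.items():
    genome_drop = percent_drop(baseline["genome_usd"], c["genome_usd"])
    exome_drop = percent_drop(baseline["exome_usd"], c["exome_usd"])
    ratio = c["genome_usd"] / c["exome_usd"]
    print(
        f"{name}: genome cost down {genome_drop:.0f}% vs V3, "
        f"exome cost down {exome_drop:.0f}% vs V3, "
        f"genome/exome cost ratio {ratio:.1f}x"
    )
```

On these numbers the genome price falls faster than the exome price, so the genome-to-exome cost ratio narrows from roughly 6x to 3x across the three chemistries.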
Techniques for Acquiring Data 
• Whole Genome Sequencing 
– Obtain whole blood or tissue sample 
– Create sequencing libraries of all DNA fragments 
• Whole Exome Sequencing 
– Utilizes a selection protocol to fish out ONLY coding 
DNA sequences 
– Create sequencing libraries from enriched DNA 
– Reduces cost and analysis time 
• Custom Capture 
– Same protocol as Exome sequencing 
– Only target desired DNA sequences 
• Amplicon Sequencing 
– Use PCR to amplify target DNA 
– Sequence amplified DNA (Amplicon) 
• RNA-Seq 
– Extract RNA, capture mRNA, convert to cDNA 
– Used for differential gene expression analyses, RNA 
isoform detection
Common DNA Mutations 
Reference: chromosome segments A B C D, sequence ATCGGGTCATGTCA 
• Sequence variants 
– Single nucleotide variant: ATCGGGTCATATCA 
– Small insertion: ATCGGGTCATGACGTCA 
– Small deletion: ATCGGGTCAT 
• Structural variants 
– Deletion: A C D 
– Duplication: A B C C D 
– Inversion: A B D C 
– Translocation: A B E F G 
Credit: Elizabeth Ruzzo, PhD, CHGV
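The three small sequence variants on this slide can be told apart by comparing each example with the reference sequence. This is a toy sketch under a strong assumption: the sequences are already paired up and differ by a single event, so length and mismatch count are enough. Real variant callers work from aligned reads; `classify_small_variant` is a hypothetical helper written for this illustration.

```python
def classify_small_variant(reference: str, observed: str) -> str:
    """Classify `observed` against `reference` using length and mismatch count only."""
    if observed == reference:
        return "no variant"
    if len(observed) > len(reference):
        return "small insertion"
    if len(observed) < len(reference):
        return "small deletion"
    # Same length: count positions where the bases differ.
    mismatches = sum(r != o for r, o in zip(reference, observed))
    if mismatches == 1:
        return "single nucleotide variant"
    return "multiple substitutions"


reference = "ATCGGGTCATGTCA"  # reference sequence from the slide
examples = ["ATCGGGTCATATCA", "ATCGGGTCATGACGTCA", "ATCGGGTCAT"]

for seq in examples:
    print(f"{seq}: {classify_small_variant(reference, seq)}")
```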
Disadvantages of Current Techniques 
• Amplification errors 
– All polymerases have an inherent error rate (10^-6 to 10^-7) 
• GC bias 
– PCR bias against GC rich sequences 
– Exome capture bias against GC rich sequences 
• Trouble detecting small insertions and deletions 
– Capture baits may not hybridize well 
– Capture cannot be used to reliably detect large CNVs 
• Cannot be used for De novo assembly 
– Read length too short to span long repeat regions 
– Not good for detecting trinucleotide repeat 
expansions 
• Miss large structural variations 
– Translocations and inversions likely will be missed 
– Require significant read depth at break points for 
these variations to be detected 
• Trouble with RNA-seq isoform detection 
– Like large structural variations, hard to accurately 
detect all splice isoforms using short read technology 
[Figure: repeat expansions of increasing length (A B B B C D; A B B B B C D; A B B B B B B C D) and two rearranged chromosomes marked X (A C D; A E F G)]
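The point about read length and repeats can be made concrete: a single read can size a repeat only if it covers the whole repeat plus some unique sequence on each side. The sketch below assumes a 20-base flanking anchor, and the read lengths and repeat copy numbers are illustrative values chosen for this example, not figures from the slides.

```python
def spans_repeat(read_length: int, repeat_length: int, anchor: int = 20) -> bool:
    """True if one read can cover the whole repeat plus `anchor` unique bases per side."""
    return read_length >= repeat_length + 2 * anchor


REPEAT_UNIT = 3  # trinucleotide repeat

for copies in (30, 100, 500):  # hypothetical expansion sizes
    repeat_length = REPEAT_UNIT * copies
    for read_length in (300, 10_000):  # illustrative short and long read lengths
        verdict = "spans" if spans_repeat(read_length, repeat_length) else "cannot span"
        print(f"{read_length} bp read {verdict} a {repeat_length} bp repeat ({copies} copies)")
```

Under these assumptions a 300 bp read sizes the 30-copy repeat but not the 100-copy or 500-copy expansions, while the 10 kb read spans all three.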
Solutions! 
• Solutions for many of these problems exist 
– As always, come at a cost 
• Whole Genome Sequencing - $1500 
– Reduce Exome Artifacts 
• Better Indel Detection and higher 
coverage in high GC regions 
• Can be used to detect large copy 
number variations 
• PCR Free Whole Genome Sequencing 
– Reduces amplification bias and polymerase 
error artifacts 
• WGS will miss large structural variations 
(Inversions, Translocations, microsatellites) 
– Combine with long read technologies 
– Added cost of $1000-$10,000 
– Higher cost = better detection
Long-ish Read Sequencing Technologies 
• Mate-Pair Sequencing 
– Insert size increased from 300bp to 3-8KB 
– Sequence ends of mate-pairs to pair reads 
over much longer distances 
– Use short reads to fill gaps 
– Adds $1000 to Genome cost
Long-ish Read Sequencing Technologies 
• Illumina Synthetic Long Reads 
– Fragment Genomic DNA to 10KB 
– Dilute across a 384 well plate 
– Fragment clonal 10KB fragments into 
300bp fragments and barcode 
– Sequence fragments and use barcodes to 
re-create the long reads synthetically 
– Use as a short read scaffold to perform De 
Novo sequencing 
– Has been used in HLA sequencing and De 
Novo assembly of the Drosophila genome 
including accurate mapping of 80% of the 
transposable elements 
– Adds $1800 to Genome cost 
[Figure: workflow from 10kb fragmentation to barcoding and clonal amplification, Nextera prep, then sequencing]
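The barcode step above amounts to sorting short reads back into the well they came from, so that each group can be assembled on its own into one synthetic long read. A minimal sketch of that grouping step follows. The barcode labels and toy sequences are invented for illustration, and the assembly of each group, which a real pipeline would hand to an assembler, is not shown.

```python
from collections import defaultdict


def group_by_barcode(reads):
    """Collect short-read sequences under the barcode they carry."""
    groups = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return dict(groups)


# Toy (barcode, sequence) pairs; names and bases are made up for this example.
reads = [
    ("well_A1", "ATCGGG"),
    ("well_B7", "TTGACC"),
    ("well_A1", "GGGTCA"),
    ("well_B7", "ACCGTA"),
    ("well_A1", "TCATGT"),
]

groups = group_by_barcode(reads)
for barcode, sequences in sorted(groups.items()):
    print(f"{barcode}: {len(sequences)} reads -> {sequences}")
```

Because each well holds only a few long fragments, reads sharing a barcode almost always come from the same molecule, which is what makes the per-group assembly tractable.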
True Long Read Sequencing Technologies 
• Defined as single molecule sequencing 
• Less complex sample prep and much longer read length 
(1-100kb) compared to 200-400bp for 2nd Gen 
• Two categories 
– Sequencing by synthesis 
• Pioneered by Pacific Biosciences 
• Sequencer uses super microscopes and polymerase bound 
nanowells to WATCH DNA as it is sequenced in real time 
• Nanowells filled with DNA bases 
• Fluorescence of base only detected at the polymerase 
– Direct sequencing by passing DNA through a nanopore 
• Bases fed through a membrane bound nanopore 
• Ionic difference between both sides of the membrane 
• Detect how ion flow changes at the pore as each base passes 
through 
• Oxford Nanopore, Base4, Stratos Genomics, Genia 
• Bleeding edge technology 
– Many technical hurdles with very high error rates (10-40%) 
– Current best use is to create scaffolds for De Novo assembly 
– Very expensive technology 
• Costs 3-10x as much as Illumina to do whole genome 
sequencing 
[Figure labels: PacBio, Oxford Nanopore]
Questions?? 
• Reading/Viewing Material: 
• Sequencing Methods Ecosystem - 
http://res.illumina.com/documents/products/research_reviews/sequencing-methods-review.pdf 
• Illumina TruSeq synthetic long-reads empower de novo assembly and resolve 
complex, highly repetitive transposable elements - 
http://biorxiv.org/content/early/2014/01/19/001834 
• Characterization of the human ESC transcriptome by hybrid sequencing - 
http://www.pnas.org/content/110/50/E4821.short 
• Nanopore Sequencing Web Conference - http://www.youtube.com/watch?v=UtXlr19xTh8

Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 

High Throughput Sequencing Technologies: What We Can Know

  • 1. High Throughput Sequencing Technologies: What We Can Know Brian Krueger, PhD Duke University Center for Human Genome Variation
  • 2. 2nd Generation Sequencing Overview • Genomic DNA • Fragmented DNA • Ligate adaptors • Bind library and create clusters • Sequencing cycle: add bases, wash, image, cleave, wash • Repeat hundreds of times on billions of clusters • Align reads to a reference genome
  • 3. 2nd Generation Sequencing Advances • V3 System Chemistry – 300GB per Flowcell – 11 Days to Data – Genome: $4700, Exome: $790 • V4 System Chemistry – 600GB per Flowcell – 6 Days to Data – Genome: $3000, Exome: $640 • X System Chemistry – 1GB per Patterned Flowcell – 3 Days to Data – Genome: $1500, Exome: $500
  • 4. Techniques for Acquiring Data • Whole Genome Sequencing – Obtain whole blood or tissue sample – Create sequencing libraries of all DNA fragments • Whole Exome Sequencing – Utilizes a selection protocol to fish out ONLY coding DNA sequences – Create sequencing libraries from enriched DNA – Reduces cost and analysis time • Custom Capture – Same protocol as Exome sequencing – Only target desired DNA sequences • Amplicon Sequencing – Use PCR to amplify target DNA – Sequence amplified DNA (Amplicon) • RNA-Seq – Extract RNA, capture mRNA, convert to cDNA – Used for differential gene expression analyses, RNA isoform detection
  • 5. Common DNA Mutations • Sequence variants – Reference: ATCGGGTCATGTCA – Single nucleotide variant: ATCGGGTCATATCA – Small insertion: ATCGGGTCATGACGTCA – Small deletion: ATCGGGTCAT • Structural variants (chromosome segments) – Reference: A B C D – Deletion: A C D – Duplication: A B C C D – Inversion: A B D C – Translocation: A B E F G • Credit: Elizabeth Ruzzo, PhD, CHGV
  • 6. Disadvantages of Current Techniques • Amplification errors – All polymerases have an inherent error rate (10^-6 to 10^-7) • GC bias – PCR bias against GC rich sequences – Exome capture bias against GC rich sequences • Trouble detecting small insertions and deletions – Capture baits may not hybridize well – Capture cannot be used to reliably detect large CNVs • Cannot be used for De novo assembly – Read length too short to span long repeat regions – Not good for detecting trinucleotide repeat expansions • Miss large structural variations – Translocations and inversions likely will be missed – Require significant read depth at break points for these variations to be detected • Trouble with RNA-seq isoform detection – Like large structural variations, hard to accurately detect all splice isoforms using short read technology • Figure: repeat expansions (A B B B C D; A B B B B C D; A B B B B B B C D) and structural variants marked as missed (A C D; A E F G)
  • 7. Solutions! • Solutions for many of these problems exist – As always, come at a cost • Whole Genome Sequencing - $1500 – Reduce Exome Artifacts • Better Indel Detection and higher coverage in high GC regions • Can be used to detect large copy number variations • PCR Free Whole Genome Sequencing – Reduces amplification bias and polymerase error artifacts • WGS will miss large structural variations (Inversions, Translocations, microsatellites) – Combine with long read technologies – Added cost of $1000-$10,000 – Higher cost = better detection
  • 8. Long-ish Read Sequencing Technologies • Mate-Pair Sequencing – Insert size increased from 300bp to 3-8KB – Sequence ends of mate-pairs to pair reads over much longer distances – Use short reads to fill gaps – Adds $1000 to Genome cost
  • 9. Long-ish Read Sequencing Technologies • Illumina Synthetic Long Reads – Fragment Genomic DNA to 10KB – Dilute across a 384 well plate – Fragment clonal 10KB fragments into 300bp fragments and barcode – Sequence fragments and use barcodes to re-create the long reads synthetically – Use as a short read scaffold to perform De Novo sequencing – Has been used in HLA sequencing and De Novo assembly of the Drosophila genome including accurate mapping of 80% of the transposable elements – Adds $1800 to Genome cost • Figure workflow: 10kb fragmentation, barcoding and clonal amplification, Nextera prep, sequencing
  • 10. True Long Read Sequencing Technologies • Defined as single molecule sequencing • Less complex sample prep and much longer read length (1-100kb) compared to 200-400bp for 2nd Gen • Two categories – Sequencing by synthesis • Pioneered by Pacific Biosciences • Sequencer uses super microscopes and polymerase bound nanowells to WATCH DNA as it is sequenced in real time • Nanowells filled with DNA bases • Fluorescence of base only detected at the polymerase – Direct sequencing by passing DNA through a nanopore • Bases fed through a membrane bound nanopore • Ionic difference between both sides of the membrane • Detect how ion flow changes at the pore as each base passes through • Oxford Nanopore, Base4, Stratos Genomics, Genia • Bleeding edge technology – Many technical hurdles with very high error rates (10-40%) – Current best use is to create scaffolds for De Novo assembly – Very expensive technology • Costs 3-10x as much as Illumina to do whole genome sequencing • Figures: PacBio, Oxford Nanopore
  • 11. Questions?? • Reading/Viewing Material: • Sequencing Methods Ecosystem - http://res.illumina.com/documents/products/research_reviews/sequencing-methods-review.pdf • Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly repetitive transposable elements - http://biorxiv.org/content/early/2014/01/19/001834 • Characterization of the human ESC transcriptome by hybrid sequencing - http://www.pnas.org/content/110/50/E4821.short • Nanopore Sequencing Web Conference - http://www.youtube.com/watch?v=UtXlr19xTh8
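
As a rough back-of-the-envelope check on the slide 3 figures, the sketch below compares the V3 and V4 chemistries by daily yield and by how many genomes fit on one flowcell. The flowcell yields, run times and prices come from the slide; the human genome size of about 3.1 gigabases and the function names are assumptions made for illustration, and the 40x depth is the one quoted in the speaker notes. The X system is left out because its per-flowcell yield as printed on the slide is ambiguous.

```python
# Figures for V3 and V4 are taken from slide 3.
CHEMISTRIES = {
    "V3": {"yield_gb": 300, "days": 11, "genome_usd": 4700, "exome_usd": 790},
    "V4": {"yield_gb": 600, "days": 6, "genome_usd": 3000, "exome_usd": 640},
}

# Assumption: approximate haploid human genome size in gigabases.
HUMAN_GENOME_GB = 3.1
# Depth quoted in the speaker notes for a whole genome.
TARGET_DEPTH = 40


def genomes_per_flowcell(yield_gb, depth=TARGET_DEPTH, genome_gb=HUMAN_GENOME_GB):
    """Number of genomes at the given depth that one flowcell's yield can cover."""
    return yield_gb / (depth * genome_gb)


def gb_per_day(yield_gb, days):
    """Average sequencing yield per day of run time."""
    return yield_gb / days


def summarize(chemistries=CHEMISTRIES):
    """Return one summary row per chemistry."""
    rows = {}
    for name, spec in chemistries.items():
        rows[name] = {
            "genomes_per_flowcell": round(genomes_per_flowcell(spec["yield_gb"]), 2),
            "gb_per_day": round(gb_per_day(spec["yield_gb"], spec["days"]), 1),
            "genome_usd": spec["genome_usd"],
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize().items():
        print(name, row)
```

Under these assumptions, doubling the yield while roughly halving the run time raises daily throughput by more than a factor of three, which is consistent with the price drop between the two chemistries.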

Editor's notes

  1. And while the title of my talk is High Throughput Sequencing Technologies: What We Can Know, a better subtitle would be, “How much would you like to spend?” Because in many cases in genomics, the limitation of our knowledge isn’t the technology but how much money we can reasonably allocate for obtaining that knowledge.
  2. First off I wanted to start with a quick refresher. The dominant sequencing technology today is Illumina sequencing by synthesis. This is commonly referred to as second generation or “next” generation sequencing. In this system, genomic DNA is sheared to 300-400 base pair fragments. These fragments then undergo library prep to repair the sheared ends and add adaptors with known sequences that are then used in the sequencing process. Library DNA is bound to a flowcell, clustered using bridge amplification and the sequencing cycle is initiated. Through this process, a camera takes a picture of the flowcell during each cyclic addition of fluorescently tagged nucleotides. Because the clusters never change their position, we can determine the sequence of the bases by watching how the color of the cluster changes after each cycle. This is done hundreds of times on billions of clusters to create the final sequenced short reads. This data is then aligned to a reference genome and further downstream analyses are performed to pull out variants and structural defects.
  3. There have been two exciting advances in this space over the past year. Prior to January, the chemistry run on the current iteration of the HiSeq system was version 3 which produced 300 gigabases of data per flowcell in 11 days. Using this chemistry a 40X genome cost $4700 and a 100X exome ran about $800. In April of this year Illumina released its new version 4 chemistry which allows for the generation of twice as much data in about half of the time. One added benefit to this new chemistry is that the reagent price dropped significantly which resulted in a $1700 reduction in the cost of a genome and a $150 reduction in the cost of an exome. Illumina also announced this year that they were releasing a higher capacity system called the HiSeqX which could generate twice as much data as a V4 system in half the time. This is achieved through the use of additional cameras for imaging and also an improved clustering scheme. Cost of a genome on the HiSeqX runs about $1500 once bioinformatics is accounted for, and when Illumina allows these systems to be used for exomes, they should run about $500. Unfortunately, Illumina only allows the X systems to be purchased as a 10 pack which limits the access of this technology; however, our group should have one of these systems in place early next year through a collaborative endeavor.
  4. While I only listed genome and exome prices on the previous slide we do also perform a wide variety of assays in the sequencing lab. Of course whole genome sequencing is done by acquiring genomic DNA from whole blood or tissue and sequencing those libraries directly, while whole exome sequencing is performed using a selection protocol and we use DNA baits to only pull out coding DNA sequences. Over the past year we have also performed over 10,000 custom capture sequencing runs. This process is similar to whole exome sequencing except the capture size is smaller and only targets desired genomic locations. A lot of companies are also now calling these custom panels. We have performed small amplicon sequencing projects in the past, but in most cases they’re expensive and don’t offer much of an added benefit over a custom capture. In the past year we’ve started doing more RNA-seq. This is typically combined with whole genome sequencing to see if we can correlate differential gene expression or differential isoform expression with genomic changes that occur outside the coding regions that are surveyed by exome sequencing.
  5. Of course the reason we perform these assays is to determine the genomic make-up of a patient or study subject with a focus on finding sequence variants. The techniques I mentioned on the previous slide have differing success in detecting these variations. For example, amplicon sequencing is best used for profiling SNVs or small indels in a small set of target amplicons while custom capture and exome sequencing are best suited for discovering SNVs and small indels across the coding DNA. Unfortunately, these two techniques will likely miss large structural variations such as deletions or duplications and this is due to a number of factors that I’ll discuss on the next slide. Whole genome sequencing does a better job of discovering indels along with deletions and duplications; however, because we use short reads that can’t span large regions we will miss things like inversions, translocations or large repeat expansions.
  6. As I said, there are some major disadvantages of using the current sequencing technology. This isn’t meant to scare you but it’s certainly something to keep in mind when reviewing your data. Second generation sequencing relies on a number of amplification steps both after library preparation and while generating sequencing clusters. Because polymerases have an inherent error rate of 10^-6 to 10^-7 per base, an error could be introduced into the sequence roughly every 1 to 10 million bases. The current techniques also suffer from a bias against GC rich sequences. This is both a PCR efficiency issue and, in the case of exome sequencing, a sequence capture issue. GC rich sequences do not amplify as efficiently and they are also harder to elute from capture baits. Because of these factors, GC rich sequences are usually underrepresented in datasets. Exome capture manufacturers have realized this is a problem and the most recent versions of these kits try to account for it with varying degrees of success. Exome sequencing also has a problem detecting small insertions and deletions depending on their size, because capture baits may not hybridize well to sequences that have a high degree of variation, causing these sequences to be underrepresented. Exome capture also can’t be used to reliably detect large CNVs due to the differential capture efficiency of individual probes; it’s hard to tell if the read depth variation is due to true variation or just capture efficiency. Because the technology we use relies on short reads we also can’t use it for de novo assembly of a genome, and it will miss trinucleotide repeat expansions. Additionally, the short reads make it hard to detect large structural variants such as inversions and translocations, because without significant coverage at break points these are very hard to detect as variations. For this very same reason, short reads also aren’t ideal for RNA-seq when the goal is to detect isoforms, because if we don’t have reads that span all of the splice sites in a transcript, it is very hard to identify individual transcript isoforms.
  7. However, solutions to most of these problems exist, but as with anything they come at a cost. One way to counter the indel and GC bias problem of exome sequencing is to perform whole genome sequencing. This has the added benefit of allowing us to detect many of the large structural variations. To further reduce the GC bias problem, there are now kits available for PCR-free whole genome sequencing which reduce PCR biases and polymerase artifacts. Since the cost of whole genome sequencing is dropping rapidly, it may make sense in the near future to perform whole genome sequencing over whole exome sequencing. Finally, to detect large structural variations including repeat expansions, translocations and inversions we can use some of the newly developed long read technologies. We need to keep in mind though that these techniques have a varying degree of success and generally the more expensive the technique the better the data quality. These can add anywhere from $1000 to $10,000 to the cost of a genome run but in some cases the added cost may be worth it.
  8. At CHGV we are currently reviewing two of the short read based technologies that allow for some access in repetitive regions. We’re performing a side-by-side analysis of these technologies on a C9orf72 ALS sample. One of these techniques is mate-pair sequencing which is essentially the same technique we use to create sequencing libraries except the insert size, or space between sequenced regions, is much larger. The larger insert size allows us to span longer distances with the hope that variant sequence will be detected to better inform the alignment of both the short and mate-pair reads.
  9. The second technique we’re testing is a new protocol released by Illumina called Synthetic long reads. This technique uses a dilution scheme to recreate 10 kilobase reads in silico from barcoded subfragments. In this scheme the genomic DNA is sheared to 10KB, barcoded, fragmented again into short reads and sequenced. Because we know the size of the original fragment and the source of the subfragments we can use this information to recreate the original long reads using short reads. This technique has been used to sequence the HLA region and also to perform de novo sequencing of the Drosophila genome including accurate mapping of 80% of the Drosophila transposable elements. We remain optimistic about the success of these techniques, and hope they perform well on our C9orf72 sample, but we’ll see what the final result is after our evaluation of both protocols.
  10. However, to get a more accurate view of these repeat expansions and large structural variations we could use a true long read sequencing technology. These techniques have become more popular over the past few years as their base accuracy has improved and their cost has become more reasonable. These single molecule sequencing techniques offer some advantages over second generation sequencing in that their sample preparation is less complex and they provide much longer reads on the order of 1 kilobase to 100 kilobases. There are currently two long read technologies available. One of them is Pacific Biosciences sequencing by synthesis which uses a microscope to watch a polymerase as it sequences DNA and records the flash that appears in the active site. The second type of single molecule sequencing is nanopore based and in this system DNA is fed through a porous membrane and the DNA sequence is detected by sensing changes in the flow of ions through the pore. Because DNA bases are different sizes they restrict the flow of ions to a different degree and this can be used to determine the sequence of the bases. There are a few proof of concept nanopore systems in development and I expect this technology to expand rapidly over the next 5 years. Oxford Nanopore has finally allowed us to start using its technology and while the base accuracy is awful, the results are promising for a number of use cases, particularly in bacterial community profiling. Quality should improve significantly either through Oxford Nanopore or through one of the other companies that are developing similar ingenious technology. While the error rate of these techniques is currently on the order of 10-40% and the cost is 3-10 times as much as Illumina sequencing, these long reads can be used as a scaffold for highly accurate Illumina short reads and give us access to sequence information that is impossible to detect with Illumina chemistry alone. To illustrate this, a researcher recently performed RNA-seq analysis using both Illumina and PacBio reads and showed that the Illumina system missed 90% of the RNA isoforms.
  11. Finally I’d like to end with a slide that offers some additional reading and viewing material. The Garvan Institute in Australia has released the first test run data from the HiSeqX and has made it available to the public. If you’re interested in playing with some new data, I recommend visiting their DNAnexus page and downloading the data. While I listed 5 techniques that we perform routinely in the sequencing lab, there is a wide array of other techniques in this space we’d be willing to explore. Many of these are detailed in this sequencing methods overview from Illumina. If you’re interested in synthetic long reads and how the technology works you can read the paper detailing the use of this system for de novo sequencing of the Drosophila genome.
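
The synthetic long read idea from note 9 (barcoded subfragments are sequenced as short reads and then regrouped to rebuild the original long fragment) can be illustrated with a toy sketch. This is not Illumina's pipeline: the greedy suffix-prefix overlap merge, the function names, the well labels and the tiny reads are all assumptions chosen for illustration, and the two test fragments reuse the reference and single nucleotide variant sequences from slide 5.

```python
from collections import defaultdict


def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs


def reconstruct_by_barcode(tagged_reads, min_len=3):
    """Group (barcode, sequence) reads by barcode, then assemble each group."""
    groups = defaultdict(list)
    for barcode, seq in tagged_reads:
        groups[barcode].append(seq)
    return {bc: greedy_assemble(seqs, min_len) for bc, seqs in groups.items()}


if __name__ == "__main__":
    # Hypothetical reads: two fragments that share a prefix but differ by one base.
    reads = [
        ("well_A", "ATCGGGT"), ("well_B", "ATCGGGT"),
        ("well_A", "GGGTCATG"), ("well_B", "GGGTCATA"),
        ("well_A", "CATGTCA"), ("well_B", "CATATCA"),
    ]
    for barcode, contigs in reconstruct_by_barcode(reads).items():
        print(barcode, contigs)
```

The barcode is what keeps the two fragments apart here: both share the read ATCGGGT, so pooling all reads without the well label would leave the assembler unable to tell which downstream read belongs to which molecule.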