For Research Use Only. Not for use in diagnostic procedures. © Copyright 2019 by Pacific Biosciences of California, Inc. All rights reserved.
Evaluation of HG002 v4 draft benchmark
against GATK calls on PacBio HiFi reads
William Rowell, Sr Bioinformatics Scientist, 2019-10-15
@nothingclever
AGENDA
-PacBio Sequencing Modes: Long reads (CLR) vs HiFi
-HiFi datasets available through GIAB
-Detecting variants in HiFi reads with GATK HaplotypeCaller
-Evaluation of v4 draft benchmark
TWO MODES OF PACBIO SMRT SEQUENCING
Continuous Long Read Sequencing (CLR)
Long Read 1 ... Long Read n
Long reads >20 kb, 90% accuracy
Circular Consensus Sequencing (CCS)
Subread 1 ... Subread n ➔ consensus sequence ➔ HiFi read
HiFi reads ≤20 kb, >99% accuracy
HIFI READS MAP THROUGH DIFFICULT REGIONS
Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol 37, 1155–1162 (2019).
[Figure: read alignments at STRC. Tracks: short reads, PacBio HiFi]
STRC is a congenital deafness gene that requires long reads to cover all exons.
PACBIO HIFI DATASETS FOR GIAB SAMPLES
Each dataset sequenced to approximately 30-fold coverage
Sample  Insert length  Platform          Reads (SRA)             Alignments
HG002   10 kb          Sequel System     https://bit.ly/2OCLeA2  https://bit.ly/2OCLeA2
HG002   15 kb          Sequel System     PRJNA520771             https://bit.ly/2p1ISA8
HG002   11 kb          Sequel II System  PRJNA527278             https://bit.ly/2VqdJm1
HG001   11 kb          Sequel II System  PRJNA540705             https://bit.ly/2AWtVSM
HG005   11 kb          Sequel II System  PRJNA540706             https://bit.ly/2ogGbuI
DETECTING VARIANTS IN HIFI READS WITH GATK HAPLOTYPECALLER
Workflow: HiFi reads ➔ pbmm2 (SMRT Link, Mapping) ➔ HaplotypeCaller ➔ VariantFiltration (GATK4) ➔ variant calls (VCF)
-High SNP recall and precision
-Lower indel recall and precision, due to 1 bp indel errors
-HaplotypeCaller optimized for error mode of short reads
[Figure: read error types, indel vs. mismatch; labels: 96.6% (PacBio HiFi), 99.1% (short reads)]
-We recommend using a caller that can adapt to the error mode of long reads, such as DeepVariant (see Pi-Chuan Chang’s lightning talk)
DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 43, 491–498 (2011).
Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol 37, 1155–1162 (2019).
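The workflow on this slide (pbmm2 mapping, then GATK4 HaplotypeCaller, then VariantFiltration) can be sketched as three command lines. The Python below only assembles and prints them; it runs nothing. File names, the sample name, the filter name, and the `QD < 2.0` filter expression are placeholders chosen for illustration, not the settings used for the calls in this talk. Reference indexing and dictionary creation are assumed to have been done already.

```python
import shlex


def build_pipeline_commands(reference, hifi_reads, sample,
                            filter_expression="QD < 2.0"):
    """Return argv lists for mapping, calling, and filtering (not executed)."""
    aligned_bam = f"{sample}.aligned.bam"
    raw_vcf = f"{sample}.raw.vcf.gz"
    filtered_vcf = f"{sample}.filtered.vcf.gz"

    # Mapping: pbmm2 with its CCS preset, writing a sorted BAM.
    align = ["pbmm2", "align", reference, hifi_reads, aligned_bam,
             "--preset", "CCS", "--sort", "--sample", sample]

    # Calling: GATK4 HaplotypeCaller with only the required arguments.
    call = ["gatk", "HaplotypeCaller",
            "-R", reference, "-I", aligned_bam, "-O", raw_vcf]

    # Filtering: the expression is a placeholder, not a recommended value.
    filt = ["gatk", "VariantFiltration",
            "-R", reference, "-V", raw_vcf, "-O", filtered_vcf,
            "--filter-name", "example_filter",
            "--filter-expression", filter_expression]

    return [align, call, filt]


commands = build_pipeline_commands("ref.fasta", "hifi_reads.bam", "HG002")
for argv in commands:
    print(shlex.join(argv))
```

Building the commands as lists rather than strings keeps each argument separate, so the same lists could later be handed to `subprocess.run` without shell quoting issues.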
V4 DRAFT BENCHMARK INCREASES PRECISION AND TRUE POSITIVE VARIANTS
References shown: GRCh38, hs37d5 (one pair of tables per reference)

Benchmark  Type    Recall  Precision  TP
v3.3.2     SNVs    99.8%   99.6%      3,042,089
v3.3.2     Indels  86.3%   92.4%      401,306
v4 draft   SNVs    99.7%   99.8%      3,314,633
v4 draft   Indels  86.1%   92.8%      444,945
v4 draft vs. v3.3.2: + ~272k TP SNPs, + ~44k TP INDELs

Benchmark  Type    Recall  Precision  TP
v3.3.2     SNVs    99.8%   99.5%      3,022,502
v3.3.2     Indels  83.8%   92.5%      398,726
v4 draft   SNVs    99.7%   99.7%      3,306,764
v4 draft   Indels  85.9%   92.7%      444,342
v4 draft vs. v3.3.2: + ~284k TP SNPs, + ~46k TP INDELs
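The "+ ~272k TP SNPs" and "+ ~44k TP INDELs" figures can be reproduced from the TP counts on this slide. The sketch below assumes that the tables with 3,314,633 and 3,042,089 SNV true positives belong to the same reference and correspond to the v4 draft and v3.3.2 respectively, an assumption supported by their difference matching the slide. The F1 score is added only as a convenient single-number summary; it does not appear on the slide.

```python
def f1_score(recall, precision):
    """Harmonic mean of recall and precision (both as fractions, not percent)."""
    return 2 * recall * precision / (recall + precision)


def tp_gain(tp_new, tp_old):
    """Additional true positives evaluated under the newer benchmark."""
    return tp_new - tp_old


# (recall, precision, TP) copied from the slide; pairing is an assumption.
v332 = {"SNVs": (0.998, 0.996, 3_042_089), "Indels": (0.863, 0.924, 401_306)}
v4_draft = {"SNVs": (0.997, 0.998, 3_314_633), "Indels": (0.861, 0.928, 444_945)}

for variant_type in ("SNVs", "Indels"):
    old_recall, old_precision, old_tp = v332[variant_type]
    new_recall, new_precision, new_tp = v4_draft[variant_type]
    print(variant_type,
          "TP gain:", tp_gain(new_tp, old_tp),
          "F1 v3.3.2:", round(f1_score(old_recall, old_precision), 4),
          "F1 v4 draft:", round(f1_score(new_recall, new_precision), 4))
```

The SNV gain evaluates to 272,544 and the indel gain to 43,639, which round to the ~272k and ~44k shown.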
MANUAL CURATION OF FP AND FN
General themes:
GATK misses or makes incorrect indel calls in homopolymer stretches
GATK false positives due to mis-mapped LINE elements and segmental duplications
GATK false negatives due to low coverage depth or mapping quality
[Figure: Manually Curated Discordant Variants. Stacked bars for Putative FN and Putative FP on a 0% to 100% axis; categories: Benchmark Correct, GATK Callset Correct, Unsure; data labels: 15, 19, 2, 3, 1]
Opportunities to improve variant calling:
-incorrect indel calls in homopolymer stretches (FP + FN)
-mis-mapped LINE elements and segmental duplications (FP)
-low mapping quality (FN)
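Since homopolymer stretches are the recurring theme for indel errors, one way to triage discordant indels is to check whether a call falls inside a homopolymer run. This is a minimal sketch, not the curation method used in the talk: the function names and the minimum run length of 4 are hypothetical choices, and positions are 0-based offsets into a sequence string.

```python
def homopolymer_runs(sequence, min_length=4):
    """Return (start, end, base) for each single-base run of at least
    min_length; start is 0-based inclusive, end is exclusive."""
    seq = sequence.upper()
    runs = []
    start = 0
    for i in range(1, len(seq) + 1):
        # Close the current run at the end of the sequence or on a base change.
        if i == len(seq) or seq[i] != seq[start]:
            if i - start >= min_length:
                runs.append((start, i, seq[start]))
            start = i
    return runs


def in_homopolymer(position, runs):
    """True if a 0-based position lies inside any of the given runs."""
    return any(start <= position < end for start, end, _ in runs)


example = "ACGTTTTTTGCAAAAC"
runs = homopolymer_runs(example)
print(runs)                      # [(3, 9, 'T'), (11, 15, 'A')]
print(in_homopolymer(5, runs))   # True
print(in_homopolymer(9, runs))   # False
```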
FN IN CALLSET - UNSURE ABOUT BENCHMARK
Benchmark - homozygous T➔A
[Figure: read alignments. Tracks: Illumina, PacBio HiFi, ONT, 10X. Genotype labels: A/A, A/TA, T/A, A/TA]
FP IN CALLSET - UNSURE ABOUT BENCHMARK
Benchmark - no call
[Figure: read alignments. Tracks: Illumina, PacBio HiFi, ONT, 10X. Track labels: no coverage, C/T, C/T, C/T (odd allele frequency)]
FP IN CALLSET - UNSURE ABOUT BENCHMARK (CONT’D)
[Figure: read alignments. Tracks: Illumina, PacBio HiFi, ONT, 10X]
FP + FN IN CALLSET - BENCHMARK INCORRECT FOR STR CONTRACTION
Benchmark - GGAG⨯9 deletion
[Figure: read alignments. Tracks: Illumina, PacBio HiFi, ONT, 10X. Track labels: low coverage, GGAG⨯2 deletion, ~GGAG⨯2 deletion, GGAG⨯2 deletion]
CONCLUSIONS
-v4 draft benchmark satisfies GIAB goal for GATK calls on HiFi reads:
  -75% of putative FN and 95% of putative FP are clearly errors in the GATK callset
-Suggestions for improving the benchmark:
  -Exclude regions with SNV disagreements between long/linked read datasets or odd SNV frequencies (2:1, 3:1) in long/linked read datasets
  -Require support from long reads for indels in repetitive regions with low short read coverage
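The "odd SNV frequencies (2:1, 3:1)" suggestion could be prototyped as a simple allele-balance check on read counts at a heterozygous site. The sketch below is an illustration only: the function names and the 1.5 ratio cutoff are hypothetical, and a real filter would likely account for sampling noise at low depth (for example with a binomial test) rather than use a fixed ratio.

```python
def allele_ratio(ref_count, alt_count):
    """Ratio of the more frequent allele to the less frequent one."""
    major = max(ref_count, alt_count)
    minor = min(ref_count, alt_count)
    if minor == 0:
        return float("inf")
    return major / minor


def has_odd_allele_frequency(ref_count, alt_count, max_ratio=1.5):
    """Flag a site whose allele ratio is further from 1:1 than max_ratio."""
    return allele_ratio(ref_count, alt_count) > max_ratio


# Hypothetical read counts (ref, alt) at three heterozygous SNV sites.
for ref_count, alt_count in [(15, 14), (20, 10), (8, 24)]:
    print(ref_count, alt_count,
          allele_ratio(ref_count, alt_count),
          has_odd_allele_frequency(ref_count, alt_count))
```

With these made-up counts, the 15:14 site passes, while the 2:1 and 3:1 sites are flagged, matching the two ratios named in the suggestion.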
www.pacb.com
Poster 1866/W
Booth 1020

SGK HÓA SINH NĂNG LƯỢNG SINH HỌC 2006.pdf
 
Culture and Health Disorders Social change.pptx
Culture and Health Disorders Social change.pptxCulture and Health Disorders Social change.pptx
Culture and Health Disorders Social change.pptx
 

GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK

  • 1. For Research Use Only. Not for use in diagnostic procedures. © Copyright 2019 by Pacific Biosciences of California, Inc. All rights reserved. Evaluation of HG002 v4 draft benchmark against GATK calls on PacBio HiFi reads. William Rowell, Sr Bioinformatics Scientist, 2019-10-15, @nothingclever
  • 2. AGENDA: - PacBio Sequencing Modes: Long reads (CLR) vs HiFi; - HiFi datasets available through GIAB; - Detecting variants in HiFi reads with GATK HaplotypeCaller; - Evaluation of v4 draft benchmark
  • 3. TWO MODES OF PACBIO SMRT SEQUENCING. Continuous Long Read Sequencing (CLR): long reads >20 kb, 90% accuracy. [Figure labels: consensus sequence; Long Read 1 … Long Read n]
  • 4. TWO MODES OF PACBIO SMRT SEQUENCING. Continuous Long Read Sequencing (CLR): long reads >20 kb, 90% accuracy. [Figure labels: consensus sequence; Long Read 1 … Long Read n] Circular Consensus Sequencing (CCS): HiFi reads ≤20 kb, >99% accuracy. [Figure labels: HiFi read; Subread 1 … Subread n]
  • 5. HIFI READS MAP THROUGH DIFFICULT REGIONS. STRC is a congenital deafness gene that requires long reads to cover all exons. [Figure labels: Short reads; PacBio HiFi; locus STRC] Reference: Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol 37, 1155–1162 (2019).
  • 6. PACBIO HIFI DATASETS FOR GIAB SAMPLES. Each dataset sequenced to approximately 30-fold coverage.
      Sample | Insert length | Platform         | Reads (SRA)            | Alignments
      HG002  | 10 kb         | Sequel System    | https://bit.ly/2OCLeA2 | https://bit.ly/2OCLeA2
      HG002  | 15 kb         | Sequel System    | PRJNA520771            | https://bit.ly/2p1ISA8
      HG002  | 11 kb         | Sequel II System | PRJNA527278            | https://bit.ly/2VqdJm1
      HG001  | 11 kb         | Sequel II System | PRJNA540705            | https://bit.ly/2AWtVSM
      HG005  | 11 kb         | Sequel II System | PRJNA540706            | https://bit.ly/2ogGbuI
  • 7. DETECTING VARIANTS IN HIFI READS WITH GATK HAPLOTYPECALLER. Pipeline: HiFi reads → pbmm2 (SMRT Link, mapping) → HaplotypeCaller → VariantFiltration (GATK4) → variant calls (vcf). References: DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 43, 491–498 (2011). Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol 37, 1155–1162 (2019).
  • 8. DETECTING VARIANTS IN HIFI READS WITH GATK HAPLOTYPECALLER (CONT’D): - High SNP recall and precision; - Lower indel recall and precision, due to 1 bp indel errors
  • 9. DETECTING VARIANTS IN HIFI READS WITH GATK HAPLOTYPECALLER (CONT’D): - HaplotypeCaller optimized for error mode of short reads. [Chart labels: Indel; Mismatch; 96.6% PacBio HiFi; 99.1% Short reads]
  • 10. DETECTING VARIANTS IN HIFI READS WITH GATK HAPLOTYPECALLER (CONT’D): - We recommend using a caller that can adapt to the error mode of long reads, such as DeepVariant (see Pi-Chuan Chang’s lightning talk)
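  • 11. V4 DRAFT BENCHMARK INCREASES PRECISION AND TRUE POSITIVE VARIANTS
      Panel | Benchmark | Type   | Recall | Precision | TP
      1     | v4 draft  | SNVs   | 99.7%  | 99.8%     | 3,314,633
      1     | v4 draft  | Indels | 86.1%  | 92.8%     | 444,945
      1     | v3.3.2    | SNVs   | 99.8%  | 99.6%     | 3,042,089
      1     | v3.3.2    | Indels | 86.3%  | 92.4%     | 401,306
      2     | v4 draft  | SNVs   | 99.7%  | 99.7%     | 3,306,764
      2     | v4 draft  | Indels | 85.9%  | 92.7%     | 444,342
      2     | v3.3.2    | SNVs   | 99.8%  | 99.5%     | 3,022,502
      2     | v3.3.2    | Indels | 83.8%  | 92.5%     | 398,726
      v4 draft vs v3.3.2, panel 1: + ~272k TP SNPs, + ~44k TP INDELs. Panel 2: + ~284k TP SNPs, + ~46k TP INDELs.
      Reference labels on the slide: GRCh38, hs37d5 (the transcript does not preserve which panel carries which label).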
  • 12. MANUAL CURATION OF FP AND FN. General themes: - GATK misses or makes incorrect indel calls in homopolymer stretches; - GATK false positives due to mis-mapped LINE elements and segmental duplications; - GATK false negatives due to low coverage depth or mapping quality. [Chart: Manually Curated Discordant Variants; bars: Putative FN, Putative FP; categories: Benchmark Correct, GATK Callset Correct, Unsure; data labels: 15, 19, 2, 3, 1] Opportunities to improve variant calling: - incorrect indel calls in homopolymer stretches (FP + FN); - mis-mapped LINE elements and segmental duplications (FP); - low mapping quality (FN)
  • 13. FN IN CALLSET - UNSURE ABOUT BENCHMARK. Benchmark - homozygous T➔A. Tracks: Illumina, PacBio HiFi, ONT, 10X. Genotype labels: A/A, A/TA, T/A, A/TA.
  • 14. FP IN CALLSET - UNSURE ABOUT BENCHMARK. Benchmark - no call. Tracks: Illumina, PacBio HiFi, ONT, 10X. Labels: no coverage, C/T, C/T, C/T (odd allele frequency).
  • 15. FP IN CALLSET - UNSURE ABOUT BENCHMARK (CONT’D). Tracks: Illumina, PacBio HiFi, ONT, 10X.
  • 16. FP + FN IN CALLSET - BENCHMARK INCORRECT FOR STR CONTRACTION. Benchmark - GGAG⨯9 deletion. Tracks: Illumina, PacBio HiFi, ONT, 10X. Labels: low coverage, GGAG⨯2 deletion, ~GGAG⨯2 deletion, GGAG⨯2 deletion.
  • 17. CONCLUSIONS: - v4 draft benchmark satisfies GIAB goal for GATK calls on HiFi reads: 75% of putative FN and 95% of putative FP are clearly errors in the GATK callset. - Suggestions for improving the benchmark: (a) exclude regions with SNV disagreements between long/linked read datasets or odd SNV frequencies (2:1, 3:1) in long/linked read datasets; (b) require support from long reads for indels in repetitive regions with low short read coverage.
  • 18. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx. FEMTO Pulse and Fragment Analyzer are trademarks of Agilent Technologies Inc. All other trademarks are the sole property of their respective owners. www.pacb.com Poster 1866/W Booth 1020
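
The workflow on slide 7 (HiFi reads, then pbmm2, then HaplotypeCaller, then VariantFiltration) can be sketched as three command lines. The snippet below is only an illustration that builds and prints them without running anything. The file names, the `HG002` prefix, the filter name `LowQD`, and the expression `QD < 2.0` are all placeholders chosen for this sketch; the slides do not state the parameters actually used, and the flags shown should be checked against the installed pbmm2 and GATK4 versions.

```python
import shlex
from typing import List


def build_hifi_gatk_commands(
    reference: str,
    hifi_reads: str,
    prefix: str,
    filter_name: str = "LowQD",           # hypothetical filter label
    filter_expression: str = "QD < 2.0",  # placeholder threshold, not from the slides
) -> List[List[str]]:
    """Return argv lists for mapping, calling, and filtering, in run order."""
    aligned_bam = f"{prefix}.aligned.bam"
    raw_vcf = f"{prefix}.raw.vcf.gz"
    filtered_vcf = f"{prefix}.filtered.vcf.gz"
    return [
        # 1. Map HiFi reads with pbmm2 and write a sorted BAM.
        ["pbmm2", "align", reference, hifi_reads, aligned_bam,
         "--preset", "CCS", "--sort"],
        # 2. Call variants with GATK4 HaplotypeCaller.
        ["gatk", "HaplotypeCaller",
         "-R", reference, "-I", aligned_bam, "-O", raw_vcf],
        # 3. Annotate calls that match the filter expression.
        ["gatk", "VariantFiltration",
         "-R", reference, "-V", raw_vcf, "-O", filtered_vcf,
         "--filter-name", filter_name,
         "--filter-expression", filter_expression],
    ]


# Hypothetical inputs; nothing is executed, the commands are only printed.
commands = build_hifi_gatk_commands("ref.fasta", "hifi_reads.bam", "HG002")
for argv in commands:
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Returning argv lists rather than one shell string keeps each step inspectable and lets a caller pass them to `subprocess.run` unchanged once the parameters have been confirmed.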
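
The approximate true-positive gains quoted on slide 11 can be reproduced by subtracting the v3.3.2 TP counts from the v4 draft TP counts. The sketch below assumes the pairing implied by those gains: the first two tables in the transcript are the v4 draft results and the last two are v3.3.2. The names "panel 1" and "panel 2" are placeholders, because the transcript does not preserve which panel is GRCh38 and which is hs37d5.

```python
# TP counts transcribed from slide 11. Panel names are placeholders; the
# v4 draft / v3.3.2 pairing is an assumption inferred from the quoted gains.
TP_COUNTS = {
    "panel 1": {
        "v4 draft": {"SNV": 3_314_633, "indel": 444_945},
        "v3.3.2": {"SNV": 3_042_089, "indel": 401_306},
    },
    "panel 2": {
        "v4 draft": {"SNV": 3_306_764, "indel": 444_342},
        "v3.3.2": {"SNV": 3_022_502, "indel": 398_726},
    },
}


def tp_gain(panel: str, variant_type: str) -> int:
    """Extra true positives under the v4 draft relative to v3.3.2."""
    counts = TP_COUNTS[panel]
    return counts["v4 draft"][variant_type] - counts["v3.3.2"][variant_type]


for panel in TP_COUNTS:
    for variant_type in ("SNV", "indel"):
        print(f"{panel} {variant_type}: +{tp_gain(panel, variant_type):,}")
```

The differences come to 272,544 and 43,639 for panel 1 and 284,262 and 45,616 for panel 2, in line with the "~272k / ~44k" and "~284k / ~46k" figures on the slide.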
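
The 75% and 95% figures in the conclusions follow from the data labels on the slide 12 chart, under one assumption made here: the labels 15, 2, and 3 belong to the putative FN bar and 19 and 1 to the putative FP bar, with 15 and 19 being the "Benchmark Correct" counts. The transcript does not say which of the remaining labels is "GATK Callset Correct" and which is "Unsure", so the sketch lumps them together as `other`.

```python
def percent(part: int, total: int) -> float:
    """Share of `part` in `total`, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


# Assumed split of the chart's data labels (15, 19, 2, 3, 1) between the bars.
CURATED = {
    "putative FN": {"benchmark_correct": 15, "other": 2 + 3},
    "putative FP": {"benchmark_correct": 19, "other": 1},
}

shares = {}
for category, counts in CURATED.items():
    total = counts["benchmark_correct"] + counts["other"]
    shares[category] = percent(counts["benchmark_correct"], total)
    print(f"{category}: {counts['benchmark_correct']}/{total} "
          f"benchmark correct = {shares[category]:.0f}%")
```

Under that split each bar covers 20 curated sites, so 15/20 and 19/20 give the quoted 75% and 95%.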