GIAB ASHG 2019 Small Variant poster
1. Results from Adding Long and Linked Reads
NIST hosts the Genome in a Bottle (GIAB) Consortium that develops
metrology infrastructure for characterization of human whole genome
variant detection. Consortium products include:
• Characterization of seven broadly-consented human genomes including
2 son-mother-father trios released as Reference Materials (RMs)
• Reference data associated with RMs are benchmark variants and
genomic regions covering, for example, 87.8% of assembled bases in
chromosomes 1-22 in GRCh37 for the sample HG002
A limitation of the current GIAB benchmark is that short-read variant callers
perform poorly in genomic locations with high homology, such as segmental
duplications and low-complexity, repeat-rich regions. We incorporated
PacBio CCS long reads and 10x Genomics linked reads to generate a draft of
a new GIAB benchmark. Initial results show that long and linked reads add
more than 276,840 SNPs and 42,980 insertions/deletions to the
benchmark, mostly in regions that are difficult to map with short reads.
Overview
Integration data for HG002
Using long and linked reads to generate
a new Genome in a Bottle small variant benchmark
J. Wagner1, A. Carroll6, I.T. Fiddes3, A.M. Wenger2, W.J. Rowell2, N. Olson1, L. Harris1, J. McDaniel1, C. Xiao5, M. Salit4, J. Zook1, Genome in a Bottle Consortium
1) Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899; 2) Pacific Biosciences, 1305 O'Brien Drive, Menlo Park
CA 94025; 3) 10x Genomics, 7068 Koll Center Parkway, Pleasanton CA 94566; 4) Joint Initiative for Metrology in Biology, Stanford, CA 94305; 5) National Center for
Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20894; 6) Google, Inc. Mountain View, CA
Ongoing and Future work
Integration Pipeline Process
Benchmark includes more bases, variants, and segmental duplications in v4
Comparison of Illumina RTG VCF against benchmark sets
• SNP FNs increase by a factor of more than 3, mostly due to new
benchmark variants in difficult-to-map regions and segmental duplications
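As a rough illustration of what a false negative (FN) means in this comparison, the sketch below counts benchmark variants that fall inside the benchmark regions but are missing from a query callset. It is a toy model under stated assumptions, not the comparison method used for the poster: the `Variant` tuple, the 1-based inclusive region convention, and the example coordinates are all hypothetical, and exact-match comparison ignores the variant normalization and haplotype-aware matching that real benchmarking tools perform.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Minimal variant record (hypothetical; real VCF records carry more fields)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def in_regions(variant, regions):
    """Return True if the variant position lies in any (chrom, start, end) region.

    Regions are assumed to be 1-based and inclusive for this sketch.
    """
    return any(
        chrom == variant.chrom and start <= variant.pos <= end
        for chrom, start, end in regions
    )


def count_false_negatives(benchmark, query, regions):
    """Count benchmark variants inside the regions that the query callset lacks."""
    query_set = set(query)
    return sum(
        1 for v in benchmark if in_regions(v, regions) and v not in query_set
    )


# Made-up example data, for illustration only.
benchmark = [
    Variant("1", 100, "A", "G"),
    Variant("1", 250, "C", "T"),
    Variant("1", 900, "G", "A"),  # outside the benchmark regions below
]
query = [Variant("1", 100, "A", "G")]
regions = [("1", 1, 500)]

fn_count = count_false_negatives(benchmark, query, regions)
print(fn_count)
```

In this toy model, enlarging the benchmark regions to include more difficult sequence can raise the FN count for an unchanged query callset, because more benchmark variants become eligible for comparison.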
Performance in medically-relevant genes in GRCh37
• v4 draft covers more of the MHC region; see poster 1707W for details
• Outside of the MHC updates, the top 5 genes by increase in benchmark
variants from v3.3.2 to the v4 draft: TSPEAR (31), LAMA5 (28), FCGBP (18),
TPSAB1 (15), HSPG2 (13)
• Among ACMG59 genes, PMS2 has 2 more variants, and RET, SCN5A, and TNNI3
each have 1 more variant, covered in the v4 draft benchmark than in v3.3.2
Sanger sequencing
• Performed long range PCR before
sequencing
• Confirmed 12 variants in CYP21A2,
which is a medically-relevant gene in
the MHC region
• Confirmed 6 variants in PMS2
Genome in a Bottle Consortium
Platform | Characteristics | Alignment; Variant Calling
PacBio Sequel II | ~11Kbp reads; ~32x coverage | minimap2; GATK4
PacBio Sequel II | ~11Kbp reads; ~32x coverage | minimap2; DeepVariant
10x Genomics | Linked reads; ~84x coverage | LongRanger Pipeline
[Figure: integration pipeline. Inputs from each method: PASS variants #1 and #2 with callable regions #1 and #2, shown with example genotypes (0/1, 1/1). Outputs: benchmark calls and benchmark regions. Site categories: concordant; discordant, unresolved; discordant, arbitrated; concordant, not callable.]
Variants in Medical Exome
(genes from OMIM, HGMD, ClinVar, UniProt)
Benchmark Regions v3.3.2 8,209
Benchmark Regions v4 draft 9,527
Difficult Region Description | Bases Covered in GRCh37 | Bases Covered in GRCh38
v0.6 SV Benchmark | 32,596,754 | 32,872,907
Potential copy number variation | 51,713,344 | 62,666,746
Tandem Repeats > 10kb | 5,731,885 | 71,942,255
Highly similar and high depth segmental duplications | 1,232,701 | 2,094,143
Regions that are collapsed and expanded from GRCh37/38 Primary Assembly Alignments | 17,979,597 | N/A
Modeled centromere and heterochromatin | N/A | 62,304,573
Subset | v3.3.2 FNs | v4 FNs
All SNPs | 8,594 | 30,229
Low mappability | 6,708 | 25,295
Segmental duplications | 1,429 | 14,008
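The fold increase in false negatives can be checked directly from the FN counts in the table above. The short sketch below only redoes that arithmetic; the dictionary layout and the function name `fold_increase` are illustrative choices, not part of any GIAB tooling.

```python
# False-negative (FN) counts copied from the table above:
# subset -> (FNs against v3.3.2, FNs against v4 draft)
fn_counts = {
    "All SNPs": (8_594, 30_229),
    "Low mappability": (6_708, 25_295),
    "Segmental duplications": (1_429, 14_008),
}


def fold_increase(v3_fn, v4_fn):
    """Ratio of v4 draft FNs to v3.3.2 FNs for one subset."""
    return v4_fn / v3_fn


fold = {subset: fold_increase(*counts) for subset, counts in fn_counts.items()}

for subset, ratio in fold.items():
    print(f"{subset}: {ratio:.2f}x")
```

The ratio for all SNPs comes out above 3, and the ratio for segmental duplications is the largest of the three subsets.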
• Refine use of genome stratifications
• Add variant calls from raw PacBio and Oxford Nanopore data
• Improve benchmark for larger indels, homopolymers, and tandem repeats
• Improve normalization of complex variants
• Generate benchmark variants from diploid assemblies
• Apply machine learning
- Outlier detection, active learning
The input data for GIAB benchmark v3.3.2 consisted of Illumina, Complete
Genomics, Ion, 10x Genomics, and SOLiD technologies. The draft v4 benchmark
incorporates new PacBio CCS and 10x Genomics linked-read data.
New members welcome! Sign up for newsletters at www.genomeinabottle.org
Volunteer to evaluate draft benchmark by emailing: justin.zook@nist.gov
Excluded from all methods:
The following regions are excluded for every technology and method other
than the listed exceptions:
• Tandem repeats < 51bp, except GATK from Illumina PCR-free, Complete
Genomics, and CCS DeepVariant
• Tandem repeats > 51bp and < 200bp, except GATK from Illumina PCR-free
and CCS DeepVariant
• Tandem repeats > 200bp, except CCS DeepVariant
• Homopolymers > 6bp, except GATK from Illumina PCR-free, Complete
Genomics, Ion Exome, CCS
• Imperfect homopolymers > 10bp, except GATK from Illumina PCR-free
• Difficult-to-map regions for short reads, except 10x and CCS
• LINE:L1Hs > 500, except Illumina MatePair, 10x, and CCS
• Segmental duplications, except 10x and CCS
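One way to picture the exclusion rules above is as a lookup table from region type to the callsets that remain usable there. The sketch below is only an illustration: the dictionary keys, the callset label strings, and the function `is_excluded` are hypothetical names, and reading each "except" list as a set of separate callset labels is an assumption about how the list should be parsed.

```python
# Region type -> callsets that are NOT excluded in that region type.
# Keys and labels are illustrative strings transcribed from the list above.
EXEMPT = {
    "tandem_repeat_lt_51bp": {
        "GATK Illumina PCR-free", "Complete Genomics", "CCS DeepVariant",
    },
    "tandem_repeat_51_to_200bp": {"GATK Illumina PCR-free", "CCS DeepVariant"},
    "tandem_repeat_gt_200bp": {"CCS DeepVariant"},
    "homopolymer_gt_6bp": {
        "GATK Illumina PCR-free", "Complete Genomics", "Ion Exome", "CCS",
    },
    "imperfect_homopolymer_gt_10bp": {"GATK Illumina PCR-free"},
    "difficult_to_map_short_reads": {"10x", "CCS"},
    "line_l1hs_gt_500": {"Illumina MatePair", "10x", "CCS"},
    "segmental_duplication": {"10x", "CCS"},
}


def is_excluded(region_type, callset):
    """Return True if the callset is excluded in this region type.

    Region types without an entry in EXEMPT carry no blanket exclusion
    in this sketch, so nothing is excluded there.
    """
    if region_type not in EXEMPT:
        return False
    return callset not in EXEMPT[region_type]


print(is_excluded("segmental_duplication", "GATK Illumina PCR-free"))
print(is_excluded("segmental_duplication", "10x"))
```

Labels are matched as exact strings here, so "CCS" and "CCS DeepVariant" are treated as different callsets; a real implementation would need an explicit rule for how technology-level and caller-level labels relate.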
Evaluation by GIAB collaborators
Compared benchmark to callsets from a variety of technologies and variant
calling methods including:
• Illumina PCR-Free and Dragen
• 10x Genomics and Aquila (variants from local diploid assembly)
• PacBio CCS and GATK4
• PacBio CCS and Clair (Next generation of Clairvoyante)
• PacBio CCS and DeepVariant
• ONT Promethion and Clair
Preliminary results suggest that at a majority of FP and FN sites the
benchmark is correct and the tested callset is in error.
Metric | v4 draft GRCh37 | v4 draft GRCh38
Base pairs | 2,504,027,936 | 2,509,269,277
Reference covered | 93.2% | 91.03%
SNPs | 3,323,773 | 3,314,941
Indels | 519,152 | 519,494
Base pairs in Segmental Duplications | 64,300,499 | 73,819,342
Arbitration Example
[Figure: bar chart; y-axis "Percent of reference covered", 80% to 95%]
[Figure: SNPs and INDELs found only in v3.3.2 vs. only in v4 draft, GRCh37]
[Figure: SNPs and INDELs found only in v3.3.2 vs. only in v4 draft, GRCh38]
Bar values as extracted: 343,358; 69,495; 77,324; 23,828; 376,653; 91,837; 91,719; 48,753
More volunteers welcomed