GA4GH 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH
1. Benchmarking results for the GIAB Ashkenazim Jewish Trio demonstrating our recommendations for visualizing benchmarking results. See Zook et al. 2019 for methods used to generate the dataset.
Figure 6. Example IGV Session for Manual Curation.
Interpreting Small Variant Benchmarking Results
Figure 5. Precision-Recall scatter plots. Points represent different stratifications. Poor-performing stratifications are in the top right corner of the plots. Colors are used to indicate low-performing stratifications.
Manually curating a subset of FPs and FNs confirms whether they are errors in the query set and helps identify the biases responsible for each error.
Figure 4. Performance metrics for high-level stratifications. Colored circles indicate the observed performance for the three genomes. The gray bars represent the 95% highest density intervals (HDI) and black crossbars the estimated mean, calculated using happyR (https://illumina.github.io/happyR/).
1-Precision and 1-Recall scatter plots help identify poor-performing stratifications. For better separation, plot 1 minus the metric on a log10 scale.
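As a sketch of this transformation, with illustrative counts that are not GIAB data:

```python
def one_minus_metrics(tp, fp, fn):
    """Return (1 - precision, 1 - recall) for one stratification."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 1 - precision, 1 - recall

# Illustrative counts for an easy stratification and a harder one.
easy = one_minus_metrics(tp=99_000, fp=100, fn=50)
hard = one_minus_metrics(tp=9_000, fp=900, fn=1_200)
```

On linear axes, well-performing stratifications collapse near the origin; plotting 1-Precision against 1-Recall on log10 axes spreads them apart, so the poor performers stand out in the top right.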
Confidence intervals aid stratification comparisons by accounting
for stratification size differences.
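The poster's Figure 4 uses Bayesian highest density intervals from happyR; as a simpler frequentist sketch of the same idea, a Wilson score interval on recall shows how interval width shrinks as stratification size grows (all counts below are hypothetical):

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion,
    e.g. recall = TP / (TP + FN) within one stratification."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Same observed recall (90%) in a small and a large stratification:
small = wilson_interval(90, 100)        # wide interval
large = wilson_interval(9_000, 10_000)  # narrow interval
```

Two stratifications with identical point estimates can therefore support very different conclusions once their interval widths are shown.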
Structural variant (SV) benchmark set developed using variant calls and assemblies from multiple technologies (Fig. 8, Zook et al. 2019).
• Methods for structural variant benchmarking
• Truvari (https://github.com/spiralgenetics/truvari)
• SVanalyzer (https://svanalyzer.readthedocs.io/en/latest/)
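SV benchmarking tools match calls approximately rather than exactly. A minimal sketch in the spirit of Truvari-style comparison, with illustrative thresholds and a simplified SV encoding that is not either tool's actual data model:

```python
def sv_match(a, b, max_dist=500, min_size_sim=0.7):
    """Approximate SV match: same type and chromosome, breakpoints
    within max_dist bp, and reciprocal size similarity of at least
    min_size_sim. Thresholds are illustrative defaults."""
    if a["type"] != b["type"] or a["chrom"] != b["chrom"]:
        return False
    if abs(a["pos"] - b["pos"]) > max_dist:
        return False
    size_sim = min(a["size"], b["size"]) / max(a["size"], b["size"])
    return size_sim >= min_size_sim

truth = {"chrom": "1", "pos": 100_000, "type": "DEL", "size": 300}
call = {"chrom": "1", "pos": 100_120, "type": "DEL", "size": 280}
```

Here the call matches the truth SV despite a 120 bp breakpoint shift and a 20 bp size difference, which exact small-variant matching would count as an FP plus an FN.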
Figure 8. SV benchmark set development process. The objective is that, when compared to a new callset, most FPs and FNs are errors in the callset, not the benchmark.
Structural Variant Benchmarking
New stratifications enabling benchmarking of GRCh38 variant calls.
Validating GRCh38 stratifications by comparing GRCh37 and GRCh38 stratification genome coverage (Fig. 8), region size distributions (Fig. 9), and benchmarking results.
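The coverage comparison can be sketched as summing BED interval lengths per reference and taking their ratio (intervals below are hypothetical, not the actual stratification files):

```python
def bed_bases(intervals):
    """Total bases covered by BED-style half-open (start, end) intervals,
    assumed non-overlapping."""
    return sum(end - start for start, end in intervals)

# Hypothetical intervals for the same stratification on each reference.
grch37 = [(0, 1_000), (5_000, 7_500)]
grch38 = [(0, 1_100), (5_200, 7_800)]
ratio = bed_bases(grch38) / bed_bases(grch37)
```

A ratio near 1 suggests the GRCh38 stratification covers a comparable fraction of the genome to its GRCh37 counterpart; large deviations flag regions for closer inspection.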
GRCh38 Stratifications
Figure 9. Region size distribution comparison between GRCh37 and GRCh38 functional region stratifications.
Figure 8. Ratio of bases covered by the GRCh37 and GRCh38 functional region stratifications.
We extended the small variant benchmark set to include more challenging regions such as low complexity regions and segmental duplications.
• We extended the benchmark set by using long- and linked-read data to characterize low complexity regions and redefined filtering parameters for more precise method-specific filtering.
• The V4 benchmark set covers ~93.2% of the reference genome, a 5.4% increase over V3.3.2 with GRCh37, including more SNPs, indels, and bases in segmental duplications (Fig. 7).
GIAB Small Variant V4 Benchmark Set
Subset v3.3.2 FNs v4 FNs
All SNPs 8,594 30,229
Low mappability 6,708 25,295
Segmental duplications 1,429 14,008
Benchmarking results are benchmark set dependent. For example, there
are ~3X more false negatives in the Illumina Real Time Genomics HG002
variant callset when compared to V4 versus V3.3.2 (Table 3).
Figure 7. Comparison of GIAB HG002 V3.3.2 and V4 beta benchmark set.
Table 3. Number of Illumina Real Time Genomics HG002 variant callset false negatives (FNs), in total and in low mappability and segmental duplication regions, for V3.3.2 and V4.
The GIAB benchmark includes benchmark small variant calls and regions (Fig. 1). When users compare a query variant call set to the benchmark set, differences are likely errors in the query set.
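The calls-plus-regions logic can be sketched as follows. This is a deliberately simplified exact-match comparison (real tools additionally normalize variant representations); the variant encoding is illustrative:

```python
def classify_query_call(call, benchmark_calls, benchmark_regions):
    """Classify one query call against a benchmark set.
    A call is a (chrom, pos, ref, alt) tuple; regions are
    half-open (start, end) intervals."""
    _, pos, _, _ = call
    in_region = any(start <= pos < end for start, end in benchmark_regions)
    if not in_region:
        return "not assessed"  # outside benchmark regions: no verdict
    return "TP" if call in benchmark_calls else "FP"

benchmark_calls = {("chr1", 100, "A", "G")}
benchmark_regions = [(50, 500)]
```

The key point is the third outcome: a call outside the benchmark regions is neither confirmed nor counted as an error, because the benchmark makes no claim there.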
Figure 1. Small variant integration pipeline (Zook et al. 2019). Colored squares represent different input small variant callsets. Tan hexagons indicate callset-specific parameters used to filter low confidence variants. A machine learning outlier detection algorithm is used to identify callset-specific high confidence variants. Finally, a heuristic-based arbitration algorithm generates the benchmark callset.
GIAB Small Variant Benchmark Set
• The GA4GH Benchmarking team has defined best practices for small variant benchmarking (Fig. 2), including:
• Variant comparison methods that account for differences in variant representation (Fig. 3),
• Defining matching stringencies and performance metrics (Table 1), and
• Stratifying performance metrics by variant type and genomic context.
• The GA4GH best practices are summarized in Table 2 (Supplementary Table 1 of Krusche et al. 2019).
• The benchmarking team implemented these best practices in the tool hap.py (https://github.com/Illumina/hap.py), which is available as a web application on the precisionFDA platform (https://precision.fda.gov; account required for access, available upon request).
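A typical command-line invocation of hap.py takes the benchmark (truth) VCF and the query VCF as positional arguments, plus the benchmark regions, reference, and stratifications. File names below are placeholders; consult the hap.py documentation for the options available in your version:

```shell
# Placeholder file names; see the hap.py docs for full option details.
hap.py HG002_benchmark.vcf.gz query.vcf.gz \
  -f HG002_benchmark_regions.bed \
  -r GRCh37.fa \
  --stratification stratifications.tsv \
  -o hg002_benchmark_results
```

The output prefix collects summary and extended CSVs of stratified performance metrics, which feed plots like Figures 4 and 5.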
Small Variant Benchmarking
Figure 2. Benchmarking workflow described in Krusche et al. 2019 and implemented in hap.py (https://github.com/Illumina/hap.py).
Figure 3. Example alternative variant representations (Krusche et al. 2019).
Table 1. GA4GH match definitions for true positives (TP), false positives (FP), false negatives (FN), and matching stringencies allele match (AL) and genotype match (GT) (Krusche et al. 2019).
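The two matching stringencies can be sketched as follows. The dict-based variant encoding is illustrative, not hap.py's internal representation:

```python
def match_type(truth, query):
    """Sketch of GA4GH matching stringencies: a genotype (GT) match
    requires the same alleles and the same genotype, while an allele
    (AL) match requires only the same alleles."""
    if truth["alleles"] != query["alleles"]:
        return "no match"            # counted as FP (query) and FN (truth)
    if truth["gt"] == query["gt"]:
        return "TP under GT match"
    return "TP under AL match only"  # right alleles, wrong genotype

truth = {"alleles": ("A", "G"), "gt": (0, 1)}
het_call = {"alleles": ("A", "G"), "gt": (0, 1)}
hom_call = {"alleles": ("A", "G"), "gt": (1, 1)}
```

A homozygous call at a heterozygous truth site is thus a true positive under AL stringency but a genotype error under GT stringency, which is why the two stringencies yield different performance metrics.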
Table 2. GA4GH germline variant call benchmarking best practices (Krusche et al. 2019).
Since its start in 2012, the Genome in a Bottle Consortium has developed and characterized the first set of human whole genome reference materials. GIAB has developed small variant benchmark sets for 7 genomes (HG001-HG007): NA12878 and two trios from the Personal Genome Project. Approximately 90% of the non-gapped bases of the genome, primarily easy-to-characterize regions, are covered by the small variant benchmark sets.
2012 GIAB Consortium formed – no human genome Reference Materials
2014 Small variant genotypes for ~77% of pilot genome NA12878
2015 NIST releases first human genome Reference Material
2016 Small variants for 90% of 7 genomes for GRCh37/38
2018 Draft SV Benchmark
2019+ Characterizing difficult variants and regions
Genome In A Bottle
Integrating variant calls across laboratories and sequencing methods requires a comprehensive understanding of false positive and false negative rates for different types of variants in diverse genome contexts. Similarly, the clinical use of genomics requires validation of genome sequencing and variant calling methods. Clinical laboratories can validate their sequencing and variant calling methods using the GA4GH benchmarking tool and the NIST human genome reference materials developed with the Genome In A Bottle (GIAB) Consortium. The GIAB reference materials are authoritatively characterized for differences to the human reference genome and provided as benchmark sets enabling variant call benchmarking. The benchmark sets include benchmark calls, well-characterized sequence differences from the reference genome, and benchmark regions, positions that agree with the reference genome or a benchmark call. The benchmarking tool compares a user-provided variant callset to the GIAB benchmark set (benchmark variant calls and benchmark regions) in a sophisticated manner that accounts for variant representation differences and disagreement types. The tool calculates performance metrics stratified by genomic context from the comparison results. Stratified performance metrics allow users to critically evaluate method performance. However, the benchmarking tool's capabilities are limited by a lack of resources for benchmarking against GRCh38, difficult-to-interpret output, and the limited number of challenging variants in the GIAB benchmark sets. To address these challenges we are developing resources for benchmarking against GRCh38 and a summary report to simplify interpretation of the benchmarking results. We are also expanding the genomic coverage of the GIAB characterization to include more challenging regions such as low complexity regions and segmental duplications. Moreover, the GIAB consortium is developing benchmark sets and benchmarking methods for validating structural variants, allowing for high confidence structural variant detection. Expanding the GIAB genome characterization and improving the interoperability of the GA4GH benchmarking tool will help clinical genomics reach its full potential.
Abstract
Assuring Data Quality with Variant Benchmarking tools from GIAB and GA4GH
N. D. Olson1, J. McDaniel1, J. Wagner1, J. M. Zook1, the GA4GH Benchmarking Team, and the Genome In A Bottle Consortium
1 Biosystems and Biomaterials Division, Materials Measurement Laboratory, NIST
Benchmarking Results Interpretation
• Developing a benchmarking report document to guide the identification of challenging genomic contexts and variant types.
• Developing a new stratification approach that accounts for genome context interactions and data-driven stratification definitions.
GIAB Small Variant V4 Benchmark Set
• Currently released as a draft, with an official release for HG002 later this year.
• V4 benchmark sets for the other 6 GIAB genomes are expected next year.
GRCh38 Stratifications
• Currently undergoing validation; they will be released soon.
Structural Variant Benchmarking
• Working on a new structural variant benchmark set utilizing recent sequencing technology advances, including PacBio HiFi sequencing and Oxford Nanopore ultra-long reads.
• Integration pipeline development is underway, enabling the transparent and streamlined generation of new benchmark sets for all 7 GIAB genomes against both the GRCh37 and GRCh38 reference genomes.
• Somatic variant and diploid assembly benchmarking.
Future Work
We would like to thank our collaborators on the GA4GH Benchmarking Team and the Genome in a Bottle Consortium.
Opinions expressed in this paper are the authors' and do not necessarily reflect the policies and views of NIST or affiliated venues. Certain commercial equipment, instruments, or materials are identified in this poster only to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by NIST, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.
Official contribution of NIST; not subject to copyright in the USA.
Acknowledgments
• Krusche, Peter, et al. “Best practices for benchmarking germline small-variant calls in human genomes.” Nature Biotechnology 37.5 (2019): 555.
• Zook, Justin M., et al. “An open resource for accurately benchmarking small variant and reference calls.” Nature Biotechnology 37.5 (2019): 561.
• Zook, Justin M., et al. “A robust benchmark for germline structural variant detection.” bioRxiv (2019): 664623.
References