SlideShare une entreprise Scribd logo
1  sur  1
Télécharger pour lire hors ligne
Benchmarking results for the GIAB Ashkenazim Jewish Trio demon-
strating our recommendations for visualizing benchmarking results.
See Zook et al. 2019 for methods used to generate the dataset.
Figure 6. Example IGV Session for Manual Curation.
Interpreting Small Variant Benchmarking Results
Figure 5. Precision-Recall scatter plots. Points represent different
stratifications. Poor performing stratifications are in the top right
corner of the plots. Colors used to indicate low performing stratifi-
cations.
Manually curating a subset of FPs and FNs ensures they are errors in
the query set and helps determine biases responsible for the error.
Figure 4. Performance
metrics for high-level
stratifications. Colored
circles indicate the
observed performance
for the three genomes.
The gray bars repre-
sent the 95% highest
density intervals (HDI)
and black crossbars
the estimated mean
calculated using hap-
pyR (https://illumina.
github.io/happyR/).
1- Precision and 1-Recall scatter plots help
identify of poor performing stratifications.
For better separation plot 1 minus Metric on a log10
scale.
Confidence intervals aid stratification comparisons by accounting
for stratification size differences.
Structural variant (SV) benchmark set developed using variant calls and as-
semblies from multiple technologies (Fig. 8, Zook et al. 2019).
•	 Methods for structural variant benchmarking
•	 TRUVARI (https://github.com/spiralgenetics/truvari)
•	 SVanalyzer (https://svanalyzer.readthedocs.io/en/latest/)
Figure 8. SV Benchmark set development process. Theobjectiveisthat,when
comparedtonewcallset,mostFPandFNareerrorsinthecallset,notthebenchmark.
Structural Variant Benchmarking
New stratifications enabling
benchmarking of GRCh38 variant
calls.
Validating GRCh38 stratifications
by comparing GRCh37 and GRCh38
stratification genome coverage (Fig.
8), region size distributions (Fig. 9),
and benchmarking results.
GRCh38 Stratifications
Figure 9. Region size distribution comparison
between GRCh37 and GRCh38 functional region
stratifications.
Figure 9. Ratio of bases covered by the GRCh37andGRCh38functionalregionstratifications.
We extended the small variant benchmark set to include more challenging
regions such as low complexity and segmental duplications.
•	 We extended the benchmark set by using long- and linked-read data to char-
acterize low complexity regions and redefined filtering parameters for more
precise method specific filtering.
•	 The V4 benchmark set covers ~93.2% of the reference genome, a 5.4% in-
crease over V3.3.2 with GRCh37, including more SNPs, Indels, and bases in
segmental duplications (Fig . 7).
GIAB Small Variant V4 Benchmark Set
Subset v3.3.2 FNs v4 FNs
All SNPs 8,594 30,229
Low mappability 6,708 25,295
Segmental duplications 1,429 14,008
Benchmarking results are benchmark set dependent. For example, there
are ~3X more false negatives in the Illumina Real Time Genomics HG002
variant callset when compared to V4 versus V3.3.2 (Table 3).
Figure 7. Comparison of GIAB HG002 V3.3.2 and V4 beta benchmark set.
Table 3. Number of Illumina Real Time Genomes HG002 variant callset false negatives (FNs) total and in
low mappability and segmental duplications for V3.3.2 and V4.
The GIAB benchmark includes benchmark small variant calls and
regions (Fig. 1). When users compare a query variant call set to the
benchmark set differences are likely errors in the query set.
Figure 1. Small variant integration pipeline (Zook et al. 2019). Colored squares
represent different input small variant callsets. Tan hexagons indicate callset
specific parameters used to filter low confidence variants. A machine learning
outlier detection algorithm is used to identify callset specific high confidence
variants. Finally, a heuristic based arbitration algorithm is used to generate the
benchmark callset.
Outlier
Detection
Arbitration
Expert-Defined Parameters
and Heuristics
Computational Process
Legend
Input Callsets Filters High ConfidenceVariants
Benchmark Set
GIAB Small Variant Benchmark Set
•	 The GA4GH Benchmarking team has defined best practices for small variant
benchmarking including (Fig. 2);
•	 Variant comparison methods that account for differences in variant repre-
sentation (Fig. 3),
•	 Defining matching stringencies and performance metrics (Table 1), and
•	 Stratifying performance metrics by variant type and genomic context.
•	 The GA4GH best practices are summarized in Table 2 (Supplementary Table 1 of
Krusche et al. 2019).
•	 The benchmarking team implemented these best practices in the tool hap.py
(https://github.com/Illumina/hap.py), which is available as a web application on
the precisionFDA platform (https://precision.fda.gov, account required for ac-
cess, available upon request).
Small Variant Benchmarking
Figure 2. Benchmarking workflow described in Krusche et al. 2019 and imple-
mented in hap.py, https://github.com/Illumina/hap.py (Krusche et al. 2019).
Figure 3. Example al-
ternative variant rep-
resentations (Krusche
et al. 2019).
Table 1. GA4GH
match definitions
for true positives
(TP), false posi-
tives (FP), false
negatives (FN),
and matching
stringencies allele
match (AL) and
genotype match
(GT) (Krusche et
al. 2019).
Table 2. GA4GH germline variant call benchmarking best
practices (Krusche et al. 2019).
Started in 2012, over the last seven years the Genome in a Bottle
Consortium has developed and characterized the first set of human
whole genome reference materials. GIAB has developed small vari-
ant benchmark sets for 7 genomes (HG001 - HG007), NA12878 and
two trios from the Personal Genome Project. Approximately 90% of
non-gapped bases, primarily easy to characterize regions, of the ge-
nome are covered by the small variant benchmark sets.
2012 GIAB Consortium formed – no human genome Reference Materials
2014 Small variant genotypes for ~77% of pilot genome NA12878
2015 NIST releases first human genome Reference Material
2016 Small variants for 90% of 7 genomes for GRCh37/38
2018 Draft SV Benchmark
2019+ Characterizing difficult variants and regions
Genome In A Bottle
Integrating variant calls across laboratories and sequencing methods requires a comprehensive
understanding of false positive and false negative rates for different types of variants in diverse ge-
nome contexts. Similarly, the clinical use of genomics requires validation of genome sequencing
and variant calling methods. Clinical laboratories can validate their sequencing and variant calling
methods using the GA4GH benchmarking tool and the NIST human genome reference materials de-
veloped with the Genome In A Bottle (GIAB) Consortium. The GIAB reference materials are author-
itatively characterized for differences to the human reference genome and provided as benchmark
sets enabling variant call benchmarking. The benchmark sets include benchmark calls, well-char-
acterized sequence differences from the reference genome, and benchmark regions, positions that
agree with the reference genome or a benchmark call. The benchmarking tool compares a user-pro-
vided variant callset to the GIAB benchmark set (benchmark variant calls and benchmark regions) in
a sophisticated manner that accounts for variant representation differences and disagreement types.
The tool calculates performance metrics stratified by genomic context from the comparison results.
Stratified performance metrics allow users to critically evaluate method performance. However, the
benchmarking tool’s capabilities are limited due to; a lack of resources for benchmarking against
GRCh38, difficult to interpret output, and a limited challenging variants in the GIAB benchmark sets.
To address these challenges we are developing resourcesforbenchmarkingagainstGRCh38anda sum-
mary report to simplify interpretation of the benchmarking results. We are also expanding the GIAB
characterization genomic coverage to include more challenging regions such as low complexity re-
gions and segmental duplications. Moreover, the GIAB consortium is developing benchmark sets
and benchmarking methods for validating structural variants, allowing for high confidence structur-
al variant detection. Expanding the GIAB genome characterization and improving the interoperabili-
ty of the GA4GH benchmarking tool will inevitably help clinical genomics reach its full potential.
Abstract
Assuring Data Quality with Variant Benchmarking tools from GIAB and GA4GH
N. D. Olson1
, J. McDaniel1
,J. Wagner1
, J. M. Zook1
, the GA4GH Benchmarking Team, and the Genome In A Bottle Consortium
1
Biosystems and Biomaterials Division, Materials Measurement Laboratory, NIST
Benchmarking Results Interpretation
•	 Developing a benchmarking report document will guide the identifica-
tion of challenging genomic context and variant types.
•	 Development of a new stratification approach accounting for genome
context interactions and data driven stratification definitions.
GIAB Small Variant V4 Benchmark Set
•	 Currently released as a draft, with official release later this year for HG002.
•	 V4 benchmark sets for the other 6 GIAB genomes is expected next year.
GRCh38 Stratifications
•	 Currently undergoing validation and will release soon.
Structural Variant Benchmarking
•	 Working on a new structural variant benchmark set utilizing recent se-
quencing technology advances, including PacBio HiFi sequencing and Ox-
ford Nanopore ultra-long reads.
•	 Integration pipeline development is underway enabling the transparent
and streamlined generation of new benchmark sets for all 7 GIAB ge-
nomes against both GRCh37 and GRCh38 reference genomes.
Somatic variant and diploid assembly benchmarking.
Future Work
We would like to thank our collaborators on the GA4GH Benchmarking Team and Genome in a Bottle Con-
sortium.
Opinions expressed in this paper are the authors’and do not necessarily reflect the policies and views of
NIST, or affiliated venues. Certain commercial equipment, instruments, or materials are identified in this
poster only to specify the experimental procedure adequately. Such identification is not intended to imply
recommendation or endorsement by the NIST, nor is it intended to imply that the materials or equipment
identified are necessarily the best available for the purpose.
Official contribution of NIST; not subject to copyrights in USA.
Acknowledgments
•	 Krusche, Peter, et al.“Best practices for benchmarking germline small-variant
calls in human genomes.” Nature biotechnology 37.5 (2019): 555.
•	 Zook, Justin M., et al.“An open resource for accurately benchmarking small
variant and reference calls.” Nature biotechnology 37.5 (2019): 561.
•	 Zook, Justin M., et al.“A robust benchmark for germline structural variant de-
tection.” BioRxiv (2019): 664623.
References

Contenu connexe

Tendances

GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GenomeInABottle
 
Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...GenomeInABottle
 
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATKGIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATKGenomeInABottle
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...GenomeInABottle
 
Tools for Using NIST Reference Materials
Tools for Using NIST Reference MaterialsTools for Using NIST Reference Materials
Tools for Using NIST Reference MaterialsGenomeInABottle
 
New methods diploid assembly with graphs
New methods   diploid assembly with graphsNew methods   diploid assembly with graphs
New methods diploid assembly with graphsGenomeInABottle
 
How giab fits in the rest of the world seqc2 tumor normal
How giab fits in the rest of the world   seqc2 tumor normalHow giab fits in the rest of the world   seqc2 tumor normal
How giab fits in the rest of the world seqc2 tumor normalGenomeInABottle
 
GIAB and long reads for bio it world 190417
GIAB and long reads for bio it world 190417GIAB and long reads for bio it world 190417
GIAB and long reads for bio it world 190417GenomeInABottle
 
Giab for jax long read 190917
Giab for jax long read 190917Giab for jax long read 190917
Giab for jax long read 190917GenomeInABottle
 
2017 amp benchmarking_poster_justin
2017 amp benchmarking_poster_justin2017 amp benchmarking_poster_justin
2017 amp benchmarking_poster_justinGenomeInABottle
 
Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016GenomeInABottle
 
Aug2013 illumina platinum genomes
Aug2013 illumina platinum genomesAug2013 illumina platinum genomes
Aug2013 illumina platinum genomesGenomeInABottle
 
New data from giab genomes promethion
New data from giab genomes   promethionNew data from giab genomes   promethion
New data from giab genomes promethionGenomeInABottle
 
171114 best practices for benchmarking variant calls justin
171114 best practices for benchmarking variant calls justin171114 best practices for benchmarking variant calls justin
171114 best practices for benchmarking variant calls justinGenomeInABottle
 
160627 giab for festival sv workshop
160627 giab for festival sv workshop160627 giab for festival sv workshop
160627 giab for festival sv workshopGenomeInABottle
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshopGenomeInABottle
 
161115 precision fda giab
161115 precision fda giab161115 precision fda giab
161115 precision fda giabGenomeInABottle
 
GIAB Sep2016 Lightning megan cleveland targeted seq
GIAB Sep2016 Lightning megan cleveland targeted seqGIAB Sep2016 Lightning megan cleveland targeted seq
GIAB Sep2016 Lightning megan cleveland targeted seqGenomeInABottle
 
2017 agbt benchmarking_poster
2017 agbt benchmarking_poster2017 agbt benchmarking_poster
2017 agbt benchmarking_posterGenomeInABottle
 

Tendances (20)

GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015
 
Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...
 
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATKGIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
 
Tools for Using NIST Reference Materials
Tools for Using NIST Reference MaterialsTools for Using NIST Reference Materials
Tools for Using NIST Reference Materials
 
New methods diploid assembly with graphs
New methods   diploid assembly with graphsNew methods   diploid assembly with graphs
New methods diploid assembly with graphs
 
How giab fits in the rest of the world seqc2 tumor normal
How giab fits in the rest of the world   seqc2 tumor normalHow giab fits in the rest of the world   seqc2 tumor normal
How giab fits in the rest of the world seqc2 tumor normal
 
GIAB and long reads for bio it world 190417
GIAB and long reads for bio it world 190417GIAB and long reads for bio it world 190417
GIAB and long reads for bio it world 190417
 
Giab for jax long read 190917
Giab for jax long read 190917Giab for jax long read 190917
Giab for jax long read 190917
 
2017 amp benchmarking_poster_justin
2017 amp benchmarking_poster_justin2017 amp benchmarking_poster_justin
2017 amp benchmarking_poster_justin
 
Giab ashg 2017
Giab ashg 2017Giab ashg 2017
Giab ashg 2017
 
Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016
 
Aug2013 illumina platinum genomes
Aug2013 illumina platinum genomesAug2013 illumina platinum genomes
Aug2013 illumina platinum genomes
 
New data from giab genomes promethion
New data from giab genomes   promethionNew data from giab genomes   promethion
New data from giab genomes promethion
 
171114 best practices for benchmarking variant calls justin
171114 best practices for benchmarking variant calls justin171114 best practices for benchmarking variant calls justin
171114 best practices for benchmarking variant calls justin
 
160627 giab for festival sv workshop
160627 giab for festival sv workshop160627 giab for festival sv workshop
160627 giab for festival sv workshop
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
161115 precision fda giab
161115 precision fda giab161115 precision fda giab
161115 precision fda giab
 
GIAB Sep2016 Lightning megan cleveland targeted seq
GIAB Sep2016 Lightning megan cleveland targeted seqGIAB Sep2016 Lightning megan cleveland targeted seq
GIAB Sep2016 Lightning megan cleveland targeted seq
 
2017 agbt benchmarking_poster
2017 agbt benchmarking_poster2017 agbt benchmarking_poster
2017 agbt benchmarking_poster
 

Similaire à Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH

Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923GenomeInABottle
 
Benchmarking with GIAB 220907
Benchmarking with GIAB 220907Benchmarking with GIAB 220907
Benchmarking with GIAB 220907GenomeInABottle
 
GENETIC GAIN BY GENOMIC SELECTION PPT.pptx
GENETIC GAIN BY GENOMIC SELECTION PPT.pptxGENETIC GAIN BY GENOMIC SELECTION PPT.pptx
GENETIC GAIN BY GENOMIC SELECTION PPT.pptxPABOLU TEJASREE
 
Gene gain and loss: aCGH. ISACGH
Gene gain and loss: aCGH. ISACGHGene gain and loss: aCGH. ISACGH
Gene gain and loss: aCGH. ISACGHRafael C. Jimenez
 
GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GenomeInABottle
 
Vinayaka : A Semi-Supervised Projected Clustering Method Using Differential E...
Vinayaka : A Semi-Supervised Projected Clustering Method Using Differential E...Vinayaka : A Semi-Supervised Projected Clustering Method Using Differential E...
Vinayaka : A Semi-Supervised Projected Clustering Method Using Differential E...ijseajournal
 
GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517GenomeInABottle
 
Genome in a bottle for next gen dx v2 180821
Genome in a bottle for next gen dx v2 180821Genome in a bottle for next gen dx v2 180821
Genome in a bottle for next gen dx v2 180821GenomeInABottle
 
Cameron_Locker_variants_final_poster1
Cameron_Locker_variants_final_poster1Cameron_Locker_variants_final_poster1
Cameron_Locker_variants_final_poster1Cameron Locker, MPH
 
Final Report
Final ReportFinal Report
Final Reportimu409
 
Classification accuracy analyses using Shannon’s Entropy
Classification accuracy analyses using Shannon’s EntropyClassification accuracy analyses using Shannon’s Entropy
Classification accuracy analyses using Shannon’s EntropyIJERA Editor
 
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...IRJET Journal
 
Improving the effectiveness of information retrieval system using adaptive ge...
Improving the effectiveness of information retrieval system using adaptive ge...Improving the effectiveness of information retrieval system using adaptive ge...
Improving the effectiveness of information retrieval system using adaptive ge...ijcsit
 
Visual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient StratificationVisual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient StratificationNils Gehlenborg
 
Building Predictive Models R_caret language
Building Predictive Models R_caret languageBuilding Predictive Models R_caret language
Building Predictive Models R_caret languagejaved khan
 
Optimization techniques on fuzzy inference systems to detect Xanthomonas camp...
Optimization techniques on fuzzy inference systems to detect Xanthomonas camp...Optimization techniques on fuzzy inference systems to detect Xanthomonas camp...
Optimization techniques on fuzzy inference systems to detect Xanthomonas camp...IJECEIAES
 
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...cscpconf
 

Similaire à Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH (20)

Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
 
Benchmarking with GIAB 220907
Benchmarking with GIAB 220907Benchmarking with GIAB 220907
Benchmarking with GIAB 220907
 
GENETIC GAIN BY GENOMIC SELECTION PPT.pptx
GENETIC GAIN BY GENOMIC SELECTION PPT.pptxGENETIC GAIN BY GENOMIC SELECTION PPT.pptx
GENETIC GAIN BY GENOMIC SELECTION PPT.pptx
 
Kishor Presentation
Kishor PresentationKishor Presentation
Kishor Presentation
 
Gene gain and loss: aCGH. ISACGH
Gene gain and loss: aCGH. ISACGHGene gain and loss: aCGH. ISACGH
Gene gain and loss: aCGH. ISACGH
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005
 
Vinayaka : A Semi-Supervised Projected Clustering Method Using Differential E...
Vinayaka : A Semi-Supervised Projected Clustering Method Using Differential E...Vinayaka : A Semi-Supervised Projected Clustering Method Using Differential E...
Vinayaka : A Semi-Supervised Projected Clustering Method Using Differential E...
 
GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517
 
Genome in a bottle for next gen dx v2 180821
Genome in a bottle for next gen dx v2 180821Genome in a bottle for next gen dx v2 180821
Genome in a bottle for next gen dx v2 180821
 
Cameron_Locker_variants_final_poster1
Cameron_Locker_variants_final_poster1Cameron_Locker_variants_final_poster1
Cameron_Locker_variants_final_poster1
 
Final Report
Final ReportFinal Report
Final Report
 
Classification accuracy analyses using Shannon’s Entropy
Classification accuracy analyses using Shannon’s EntropyClassification accuracy analyses using Shannon’s Entropy
Classification accuracy analyses using Shannon’s Entropy
 
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
 
Improving the effectiveness of information retrieval system using adaptive ge...
Improving the effectiveness of information retrieval system using adaptive ge...Improving the effectiveness of information retrieval system using adaptive ge...
Improving the effectiveness of information retrieval system using adaptive ge...
 
Visual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient StratificationVisual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient Stratification
 
Building Predictive Models R_caret language
Building Predictive Models R_caret languageBuilding Predictive Models R_caret language
Building Predictive Models R_caret language
 
Optimization techniques on fuzzy inference systems to detect Xanthomonas camp...
Optimization techniques on fuzzy inference systems to detect Xanthomonas camp...Optimization techniques on fuzzy inference systems to detect Xanthomonas camp...
Optimization techniques on fuzzy inference systems to detect Xanthomonas camp...
 
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
 
Kk341721880
Kk341721880Kk341721880
Kk341721880
 

Plus de GenomeInABottle

GIAB Tumor Normal ASHG 2023
GIAB Tumor Normal ASHG 2023GIAB Tumor Normal ASHG 2023
GIAB Tumor Normal ASHG 2023GenomeInABottle
 
GIAB_ASHG_JZook_2023.pdf
GIAB_ASHG_JZook_2023.pdfGIAB_ASHG_JZook_2023.pdf
GIAB_ASHG_JZook_2023.pdfGenomeInABottle
 
New data from giab genomes pacbio ccs
New data from giab genomes   pacbio ccsNew data from giab genomes   pacbio ccs
New data from giab genomes pacbio ccsGenomeInABottle
 
New data from giab genomes strand-seq
New data from giab genomes   strand-seqNew data from giab genomes   strand-seq
New data from giab genomes strand-seqGenomeInABottle
 
New data from giab genomes intro and ultralong nanopore
New data from giab genomes   intro and ultralong nanoporeNew data from giab genomes   intro and ultralong nanopore
New data from giab genomes intro and ultralong nanoporeGenomeInABottle
 
How giab fits in the rest of the world mdic somatic reference samples
How giab fits in the rest of the world   mdic somatic reference samplesHow giab fits in the rest of the world   mdic somatic reference samples
How giab fits in the rest of the world mdic somatic reference samplesGenomeInABottle
 
How giab fits in the rest of the world telomere to telomere consortium
How giab fits in the rest of the world   telomere to telomere consortiumHow giab fits in the rest of the world   telomere to telomere consortium
How giab fits in the rest of the world telomere to telomere consortiumGenomeInABottle
 
How giab fits in the rest of the world human genome structural variation co...
How giab fits in the rest of the world   human genome structural variation co...How giab fits in the rest of the world   human genome structural variation co...
How giab fits in the rest of the world human genome structural variation co...GenomeInABottle
 
How giab fits in the rest of the world introduction
How giab fits in the rest of the world introductionHow giab fits in the rest of the world introduction
How giab fits in the rest of the world introductionGenomeInABottle
 

Plus de GenomeInABottle (11)

2023 GIAB AMP Update
2023 GIAB AMP Update2023 GIAB AMP Update
2023 GIAB AMP Update
 
GIAB Tumor Normal ASHG 2023
GIAB Tumor Normal ASHG 2023GIAB Tumor Normal ASHG 2023
GIAB Tumor Normal ASHG 2023
 
Stratomod ASHG 2023
Stratomod ASHG 2023Stratomod ASHG 2023
Stratomod ASHG 2023
 
GIAB_ASHG_JZook_2023.pdf
GIAB_ASHG_JZook_2023.pdfGIAB_ASHG_JZook_2023.pdf
GIAB_ASHG_JZook_2023.pdf
 
New data from giab genomes pacbio ccs
New data from giab genomes   pacbio ccsNew data from giab genomes   pacbio ccs
New data from giab genomes pacbio ccs
 
New data from giab genomes strand-seq
New data from giab genomes   strand-seqNew data from giab genomes   strand-seq
New data from giab genomes strand-seq
 
New data from giab genomes intro and ultralong nanopore
New data from giab genomes   intro and ultralong nanoporeNew data from giab genomes   intro and ultralong nanopore
New data from giab genomes intro and ultralong nanopore
 
How giab fits in the rest of the world mdic somatic reference samples
How giab fits in the rest of the world   mdic somatic reference samplesHow giab fits in the rest of the world   mdic somatic reference samples
How giab fits in the rest of the world mdic somatic reference samples
 
How giab fits in the rest of the world telomere to telomere consortium
How giab fits in the rest of the world   telomere to telomere consortiumHow giab fits in the rest of the world   telomere to telomere consortium
How giab fits in the rest of the world telomere to telomere consortium
 
How giab fits in the rest of the world human genome structural variation co...
How giab fits in the rest of the world   human genome structural variation co...How giab fits in the rest of the world   human genome structural variation co...
How giab fits in the rest of the world human genome structural variation co...
 
How giab fits in the rest of the world introduction
How giab fits in the rest of the world introductionHow giab fits in the rest of the world introduction
How giab fits in the rest of the world introduction
 

Dernier

Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Low Rate Call Girls Kochi Anika 8250192130 Independent Escort Service Kochi
Low Rate Call Girls Kochi Anika 8250192130 Independent Escort Service KochiLow Rate Call Girls Kochi Anika 8250192130 Independent Escort Service Kochi
Low Rate Call Girls Kochi Anika 8250192130 Independent Escort Service KochiSuhani Kapoor
 
(Rocky) Jaipur Call Girl - 09521753030 Escorts Service 50% Off with Cash ON D...
(Rocky) Jaipur Call Girl - 09521753030 Escorts Service 50% Off with Cash ON D...(Rocky) Jaipur Call Girl - 09521753030 Escorts Service 50% Off with Cash ON D...
(Rocky) Jaipur Call Girl - 09521753030 Escorts Service 50% Off with Cash ON D...indiancallgirl4rent
 
VIP Russian Call Girls in Varanasi Samaira 8250192130 Independent Escort Serv...
VIP Russian Call Girls in Varanasi Samaira 8250192130 Independent Escort Serv...VIP Russian Call Girls in Varanasi Samaira 8250192130 Independent Escort Serv...
VIP Russian Call Girls in Varanasi Samaira 8250192130 Independent Escort Serv...Neha Kaur
 
Call Girls Ooty Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Ooty Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Ooty Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Ooty Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...Call Girls in Nagpur High Profile
 
VIP Call Girls Indore Kirti 💚😋 9256729539 🚀 Indore Escorts
VIP Call Girls Indore Kirti 💚😋  9256729539 🚀 Indore EscortsVIP Call Girls Indore Kirti 💚😋  9256729539 🚀 Indore Escorts
VIP Call Girls Indore Kirti 💚😋 9256729539 🚀 Indore Escortsaditipandeya
 
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore EscortsCall Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escortsvidya singh
 
Call Girls Aurangabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Aurangabad Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Aurangabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Aurangabad Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Call Girls Service Jaipur Grishma WhatsApp ❤8445551418 VIP Call Girls Jaipur
Call Girls Service Jaipur Grishma WhatsApp ❤8445551418 VIP Call Girls JaipurCall Girls Service Jaipur Grishma WhatsApp ❤8445551418 VIP Call Girls Jaipur
Call Girls Service Jaipur Grishma WhatsApp ❤8445551418 VIP Call Girls Jaipurparulsinha
 
Lucknow Call girls - 8800925952 - 24x7 service with hotel room
Lucknow Call girls - 8800925952 - 24x7 service with hotel roomLucknow Call girls - 8800925952 - 24x7 service with hotel room
Lucknow Call girls - 8800925952 - 24x7 service with hotel roomdiscovermytutordmt
 
💎VVIP Kolkata Call Girls Parganas🩱7001035870🩱Independent Girl ( Ac Rooms Avai...
💎VVIP Kolkata Call Girls Parganas🩱7001035870🩱Independent Girl ( Ac Rooms Avai...💎VVIP Kolkata Call Girls Parganas🩱7001035870🩱Independent Girl ( Ac Rooms Avai...
💎VVIP Kolkata Call Girls Parganas🩱7001035870🩱Independent Girl ( Ac Rooms Avai...Taniya Sharma
 
Call Girls Cuttack Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Cuttack Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Cuttack Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Cuttack Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Call Girls Siliguri Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Siliguri Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Siliguri Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Siliguri Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Top Rated Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
Top Rated  Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...Top Rated  Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
Top Rated Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...chandars293
 
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...Garima Khatri
 
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...aartirawatdelhi
 
Chandrapur Call girls 8617370543 Provides all area service COD available
Chandrapur Call girls 8617370543 Provides all area service COD availableChandrapur Call girls 8617370543 Provides all area service COD available
Chandrapur Call girls 8617370543 Provides all area service COD availableDipal Arora
 
Bangalore Call Girl Whatsapp Number 100% Complete Your Sexual Needs
Bangalore Call Girl Whatsapp Number 100% Complete Your Sexual NeedsBangalore Call Girl Whatsapp Number 100% Complete Your Sexual Needs
Bangalore Call Girl Whatsapp Number 100% Complete Your Sexual NeedsGfnyt
 

Dernier (20)

Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
 
Low Rate Call Girls Kochi Anika 8250192130 Independent Escort Service Kochi
Low Rate Call Girls Kochi Anika 8250192130 Independent Escort Service KochiLow Rate Call Girls Kochi Anika 8250192130 Independent Escort Service Kochi
Low Rate Call Girls Kochi Anika 8250192130 Independent Escort Service Kochi
 
(Rocky) Jaipur Call Girl - 09521753030 Escorts Service 50% Off with Cash ON D...
(Rocky) Jaipur Call Girl - 09521753030 Escorts Service 50% Off with Cash ON D...(Rocky) Jaipur Call Girl - 09521753030 Escorts Service 50% Off with Cash ON D...
(Rocky) Jaipur Call Girl - 09521753030 Escorts Service 50% Off with Cash ON D...
 
VIP Russian Call Girls in Varanasi Samaira 8250192130 Independent Escort Serv...
VIP Russian Call Girls in Varanasi Samaira 8250192130 Independent Escort Serv...VIP Russian Call Girls in Varanasi Samaira 8250192130 Independent Escort Serv...
VIP Russian Call Girls in Varanasi Samaira 8250192130 Independent Escort Serv...
 
Call Girls Ooty Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Ooty Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Ooty Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Ooty Just Call 9907093804 Top Class Call Girl Service Available
 
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...
 
VIP Call Girls Indore Kirti 💚😋 9256729539 🚀 Indore Escorts
VIP Call Girls Indore Kirti 💚😋  9256729539 🚀 Indore EscortsVIP Call Girls Indore Kirti 💚😋  9256729539 🚀 Indore Escorts
VIP Call Girls Indore Kirti 💚😋 9256729539 🚀 Indore Escorts
 
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore EscortsCall Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
 
Call Girls Aurangabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Aurangabad Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Aurangabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Aurangabad Just Call 9907093804 Top Class Call Girl Service Available
 
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
 
Call Girls Service Jaipur Grishma WhatsApp ❤8445551418 VIP Call Girls Jaipur
Call Girls Service Jaipur Grishma WhatsApp ❤8445551418 VIP Call Girls JaipurCall Girls Service Jaipur Grishma WhatsApp ❤8445551418 VIP Call Girls Jaipur
Call Girls Service Jaipur Grishma WhatsApp ❤8445551418 VIP Call Girls Jaipur
 
Lucknow Call girls - 8800925952 - 24x7 service with hotel room
Lucknow Call girls - 8800925952 - 24x7 service with hotel roomLucknow Call girls - 8800925952 - 24x7 service with hotel room
Lucknow Call girls - 8800925952 - 24x7 service with hotel room
 
💎VVIP Kolkata Call Girls Parganas🩱7001035870🩱Independent Girl ( Ac Rooms Avai...
💎VVIP Kolkata Call Girls Parganas🩱7001035870🩱Independent Girl ( Ac Rooms Avai...💎VVIP Kolkata Call Girls Parganas🩱7001035870🩱Independent Girl ( Ac Rooms Avai...
💎VVIP Kolkata Call Girls Parganas🩱7001035870🩱Independent Girl ( Ac Rooms Avai...
 
Call Girls Cuttack Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Cuttack Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Cuttack Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Cuttack Just Call 9907093804 Top Class Call Girl Service Available
 
Call Girls Siliguri Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Siliguri Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Siliguri Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Siliguri Just Call 9907093804 Top Class Call Girl Service Available
 
Top Rated Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
Top Rated  Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...Top Rated  Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
Top Rated Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
 
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
 
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
 
Chandrapur Call girls 8617370543 Provides all area service COD available
Chandrapur Call girls 8617370543 Provides all area service COD availableChandrapur Call girls 8617370543 Provides all area service COD available
Chandrapur Call girls 8617370543 Provides all area service COD available
 
Bangalore Call Girl Whatsapp Number 100% Complete Your Sexual Needs
Bangalore Call Girl Whatsapp Number 100% Complete Your Sexual NeedsBangalore Call Girl Whatsapp Number 100% Complete Your Sexual Needs
Bangalore Call Girl Whatsapp Number 100% Complete Your Sexual Needs
 

Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH

  • 1. Benchmarking results for the GIAB Ashkenazim Jewish Trio demon- strating our recommendations for visualizing benchmarking results. See Zook et al. 2019 for methods used to generate the dataset. Figure 6. Example IGV Session for Manual Curation. Interpreting Small Variant Benchmarking Results Figure 5. Precision-Recall scatter plots. Points represent different stratifications. Poor performing stratifications are in the top right corner of the plots. Colors used to indicate low performing stratifi- cations. Manually curating a subset of FPs and FNs ensures they are errors in the query set and helps determine biases responsible for the error. Figure 4. Performance metrics for high-level stratifications. Colored circles indicate the observed performance for the three genomes. The gray bars repre- sent the 95% highest density intervals (HDI) and black crossbars the estimated mean calculated using hap- pyR (https://illumina. github.io/happyR/). 1- Precision and 1-Recall scatter plots help identify of poor performing stratifications. For better separation plot 1 minus Metric on a log10 scale. Confidence intervals aid stratification comparisons by accounting for stratification size differences. Structural variant (SV) benchmark set developed using variant calls and as- semblies from multiple technologies (Fig. 8, Zook et al. 2019). • Methods for structural variant benchmarking • TRUVARI (https://github.com/spiralgenetics/truvari) • SVanalyzer (https://svanalyzer.readthedocs.io/en/latest/) Figure 8. SV Benchmark set development process. Theobjectiveisthat,when comparedtonewcallset,mostFPandFNareerrorsinthecallset,notthebenchmark. Structural Variant Benchmarking New stratifications enabling benchmarking of GRCh38 variant calls. Validating GRCh38 stratifications by comparing GRCh37 and GRCh38 stratification genome coverage (Fig. 8), region size distributions (Fig. 9), and benchmarking results. GRCh38 Stratifications Figure 9. Region size distribution comparison between GRCh37 and GRCh38 functional region stratifications. Figure 9. Ratio of bases covered by the GRCh37andGRCh38functionalregionstratifications. We extended the small variant benchmark set to include more challenging regions such as low complexity and segmental duplications. • We extended the benchmark set by using long- and linked-read data to char- acterize low complexity regions and redefined filtering parameters for more precise method specific filtering. • The V4 benchmark set covers ~93.2% of the reference genome, a 5.4% in- crease over V3.3.2 with GRCh37, including more SNPs, Indels, and bases in segmental duplications (Fig . 7). GIAB Small Variant V4 Benchmark Set Subset v3.3.2 FNs v4 FNs All SNPs 8,594 30,229 Low mappability 6,708 25,295 Segmental duplications 1,429 14,008 Benchmarking results are benchmark set dependent. For example, there are ~3X more false negatives in the Illumina Real Time Genomics HG002 variant callset when compared to V4 versus V3.3.2 (Table 3). Figure 7. Comparison of GIAB HG002 V3.3.2 and V4 beta benchmark set. Table 3. Number of Illumina Real Time Genomes HG002 variant callset false negatives (FNs) total and in low mappability and segmental duplications for V3.3.2 and V4. The GIAB benchmark includes benchmark small variant calls and regions (Fig. 1). When users compare a query variant call set to the benchmark set differences are likely errors in the query set. Figure 1. Small variant integration pipeline (Zook et al. 2019). Colored squares represent different input small variant callsets. Tan hexagons indicate callset specific parameters used to filter low confidence variants. A machine learning outlier detection algorithm is used to identify callset specific high confidence variants. Finally, a heuristic based arbitration algorithm is used to generate the benchmark callset. Outlier Detection Arbitration Expert-Defined Parameters and Heuristics Computational Process Legend Input Callsets Filters High ConfidenceVariants Benchmark Set GIAB Small Variant Benchmark Set • The GA4GH Benchmarking team has defined best practices for small variant benchmarking including (Fig. 2); • Variant comparison methods that account for differences in variant repre- sentation (Fig. 3), • Defining matching stringencies and performance metrics (Table 1), and • Stratifying performance metrics by variant type and genomic context. • The GA4GH best practices are summarized in Table 2 (Supplementary Table 1 of Krusche et al. 2019). • The benchmarking team implemented these best practices in the tool hap.py (https://github.com/Illumina/hap.py), which is available as a web application on the precisionFDA platform (https://precision.fda.gov, account required for ac- cess, available upon request). Small Variant Benchmarking Figure 2. Benchmarking workflow described in Krusche et al. 2019 and imple- mented in hap.py, https://github.com/Illumina/hap.py (Krusche et al. 2019). Figure 3. Example al- ternative variant rep- resentations (Krusche et al. 2019). Table 1. GA4GH match definitions for true positives (TP), false posi- tives (FP), false negatives (FN), and matching stringencies allele match (AL) and genotype match (GT) (Krusche et al. 2019). Table 2. GA4GH germline variant call benchmarking best practices (Krusche et al. 2019). Started in 2012, over the last seven years the Genome in a Bottle Consortium has developed and characterized the first set of human whole genome reference materials. GIAB has developed small vari- ant benchmark sets for 7 genomes (HG001 - HG007), NA12878 and two trios from the Personal Genome Project. Approximately 90% of non-gapped bases, primarily easy to characterize regions, of the ge- nome are covered by the small variant benchmark sets. 2012 GIAB Consortium formed – no human genome Reference Materials 2014 Small variant genotypes for ~77% of pilot genome NA12878 2015 NIST releases first human genome Reference Material 2016 Small variants for 90% of 7 genomes for GRCh37/38 2018 Draft SV Benchmark 2019+ Characterizing difficult variants and regions Genome In A Bottle Integrating variant calls across laboratories and sequencing methods requires a comprehensive understanding of false positive and false negative rates for different types of variants in diverse ge- nome contexts. Similarly, the clinical use of genomics requires validation of genome sequencing and variant calling methods. Clinical laboratories can validate their sequencing and variant calling methods using the GA4GH benchmarking tool and the NIST human genome reference materials de- veloped with the Genome In A Bottle (GIAB) Consortium. The GIAB reference materials are author- itatively characterized for differences to the human reference genome and provided as benchmark sets enabling variant call benchmarking. The benchmark sets include benchmark calls, well-char- acterized sequence differences from the reference genome, and benchmark regions, positions that agree with the reference genome or a benchmark call. The benchmarking tool compares a user-pro- vided variant callset to the GIAB benchmark set (benchmark variant calls and benchmark regions) in a sophisticated manner that accounts for variant representation differences and disagreement types. The tool calculates performance metrics stratified by genomic context from the comparison results. Stratified performance metrics allow users to critically evaluate method performance. However, the benchmarking tool’s capabilities are limited due to; a lack of resources for benchmarking against GRCh38, difficult to interpret output, and a limited challenging variants in the GIAB benchmark sets. To address these challenges we are developing resourcesforbenchmarkingagainstGRCh38anda sum- mary report to simplify interpretation of the benchmarking results. We are also expanding the GIAB characterization genomic coverage to include more challenging regions such as low complexity re- gions and segmental duplications. Moreover, the GIAB consortium is developing benchmark sets and benchmarking methods for validating structural variants, allowing for high confidence structur- al variant detection. Expanding the GIAB genome characterization and improving the interoperabili- ty of the GA4GH benchmarking tool will inevitably help clinical genomics reach its full potential. Abstract Assuring Data Quality with Variant Benchmarking tools from GIAB and GA4GH N. D. Olson1 , J. McDaniel1 ,J. Wagner1 , J. M. Zook1 , the GA4GH Benchmarking Team, and the Genome In A Bottle Consortium 1 Biosystems and Biomaterials Division, Materials Measurement Laboratory, NIST Benchmarking Results Interpretation • Developing a benchmarking report document will guide the identifica- tion of challenging genomic context and variant types. • Development of a new stratification approach accounting for genome context interactions and data driven stratification definitions. GIAB Small Variant V4 Benchmark Set • Currently released as a draft, with official release later this year for HG002. • V4 benchmark sets for the other 6 GIAB genomes is expected next year. GRCh38 Stratifications • Currently undergoing validation and will release soon. Structural Variant Benchmarking • Working on a new structural variant benchmark set utilizing recent se- quencing technology advances, including PacBio HiFi sequencing and Ox- ford Nanopore ultra-long reads. • Integration pipeline development is underway enabling the transparent and streamlined generation of new benchmark sets for all 7 GIAB ge- nomes against both GRCh37 and GRCh38 reference genomes. Somatic variant and diploid assembly benchmarking. Future Work We would like to thank our collaborators on the GA4GH Benchmarking Team and Genome in a Bottle Con- sortium. Opinions expressed in this paper are the authors’and do not necessarily reflect the policies and views of NIST, or affiliated venues. Certain commercial equipment, instruments, or materials are identified in this poster only to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by the NIST, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose. Official contribution of NIST; not subject to copyrights in USA. Acknowledgments • Krusche, Peter, et al.“Best practices for benchmarking germline small-variant calls in human genomes.” Nature biotechnology 37.5 (2019): 555. • Zook, Justin M., et al.“An open resource for accurately benchmarking small variant and reference calls.” Nature biotechnology 37.5 (2019): 561. • Zook, Justin M., et al.“A robust benchmark for germline structural variant de- tection.” BioRxiv (2019): 664623. References