© 2016 Illumina, Inc. All rights reserved.
Structural variant validation using
population data
Peter Krusche
Genome in a Bottle workshop – January 2018
For Research Use Only. Not for use in Diagnostic Procedures.
Community resources and data we use for testing
● Platinum Genomes – WGS data for Platinum Genomes pedigree
- 6 samples available on ENA (HiSeq2000 2x100bp and soon 10X, HiSeqX & NovaSeq)
- 11 samples available on EGA soon (10X, HiSeqX & NovaSeq)
- 17 samples available on dbGaP (HiSeq2000 2x100bp)
- https://github.com/Illumina/PlatinumGenomes
● Polaris – WGS data for a larger cohort
- 150 1kGP samples available on ENA (HiSeqX 2x150bp)
- 51 1kGP samples to complete trios on above data soon (HiSeqX 2x150bp)
- 70 samples available on ENA (HiSeqX 2x150bp and soon 10X)
- Insertion/deletion variant calls validated with population statistics
- https://github.com/illumina/polaris
● Paragraph – graph-realigner for SV breakpoints
- Our targeted validation tools: https://github.com/illumina/paragraph
How we validate structural variants: targeted joint calling
● Given a putative SV, we can genotype it in any sample using targeted software
● Start with >1,000 unrelated samples for hypothesis-based testing
- Population datasets let us look at most variants rather than just those in NA12877 & NA12878
- Additionally genotype the variants in the 220 unrelated samples, 51 trios and the Platinum Genomes
● Validate the calls:
- Population-level metrics such as Hardy-Weinberg equilibrium (HWE)
- Mendelian consistency in the Platinum Genomes and trios
● The SVs can come from
- Aggregated calls within any sample
- Other projects (e.g. GiaB)
- We share information on variants that are common / observable in publicly available datasets.
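The Mendelian consistency check on trios can be sketched as follows. This is a minimal illustration, not the pipeline used for the slides: it assumes diploid genotypes given as pairs of allele indices (0 = reference, 1 = alternative), and the function names are hypothetical.

```python
def is_mendelian_consistent(child, mother, father):
    """Return True if the child genotype can be formed by taking
    one allele from the mother and one allele from the father.

    Genotypes are pairs of allele indices, e.g. (0, 1) for a het call.
    """
    a, b = child
    return (a in mother and b in father) or (a in father and b in mother)


def mendelian_consistency_rate(trios):
    """Fraction of (child, mother, father) genotype triples that are consistent."""
    trios = list(trios)
    if not trios:
        raise ValueError("no trios given")
    consistent = sum(is_mendelian_consistent(c, m, f) for c, m, f in trios)
    return consistent / len(trios)


# A het child of a hom-ref mother and a hom-alt father is consistent;
# a hom-alt child of a hom-ref mother is not.
example_trios = [
    ((0, 1), (0, 0), (1, 1)),
    ((1, 1), (0, 0), (0, 1)),
]
print(mendelian_consistency_rate(example_trios))
```

A variant whose genotypes are inconsistent in many trios is more likely to be a genotyping or representation error than a true de novo event.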
Validation of GiaB SV candidates using paragraph
Validation summary
Event type      Count
Bi-allelic      6232 (65%)
HWE-P > 0.05    3614 (58%)
Note: contains duplicates with different representations.
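An HWE p-value like the one used for the "HWE-P > 0.05" row could be computed along these lines. This is a minimal sketch that assumes a chi-square goodness-of-fit test with one degree of freedom on bi-allelic genotype counts; the slides do not say which HWE test was used, and the function name is hypothetical.

```python
import math


def hwe_chi2_pvalue(n_ref_ref, n_het, n_alt_alt):
    """Chi-square HWE test (1 degree of freedom) for a bi-allelic variant.

    Arguments are the observed counts of hom-ref, het and hom-alt genotypes.
    """
    n = n_ref_ref + n_het + n_alt_alt
    if n == 0:
        raise ValueError("no genotypes given")
    p = (2 * n_ref_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site: nothing to test
    observed = (n_ref_ref, n_het, n_alt_alt)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


# Counts that match HWE expectations exactly pass the filter;
# a site where every sample is heterozygous fails it.
print(hwe_chi2_pvalue(250, 500, 250) > 0.05)
print(hwe_chi2_pvalue(0, 1000, 0) > 0.05)
```

For rare variants with small expected counts, an exact test is usually preferred over the chi-square approximation.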
Comparing GiaB (Ashkenazi trio) with the Polaris callset
● 738 variants overlap between the Polaris set and the GiaB test set
● Over 70% of the overlapping variants have different descriptions, but most of them fail HWE in one or both call sets, or are likely STRs
● ~60 SVs have different descriptions in Polaris and GiaB, but both descriptions pass the HWE test
These provide test cases for how to better validate the calls, i.e. we want to validate both the variant and the representation
Working to improve representation with joint mapping
● For each variant, we remap reads to a graph consisting of the reference and the two alternative paths (as defined by the Polaris set and GiaB).
● The path with more uniquely mapped reads is more likely to be the better one.
[Figure: number of reads mapped to the Polaris path vs. number of reads mapped to the GiaB path]
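The comparison of uniquely mapped reads can be sketched as below. This is an illustration only: it assumes the graph realigner's output has already been reduced to a mapping from read name to the set of paths the read aligns to, and the path labels and function names are hypothetical rather than paragraph's actual output format.

```python
from collections import Counter


def count_unique_support(read_to_paths):
    """Count reads that align to exactly one path of the graph."""
    counts = Counter()
    for paths in read_to_paths.values():
        if len(paths) == 1:
            counts[next(iter(paths))] += 1
    return counts


def preferred_representation(read_to_paths, candidates=("polaris", "giab")):
    """Return the candidate path with more uniquely mapped reads, or None on a tie."""
    counts = count_unique_support(read_to_paths)
    first, second = candidates
    if counts[first] == counts[second]:
        return None
    return first if counts[first] > counts[second] else second


# Hypothetical alignments: r4 is ambiguous and supports neither description.
reads = {
    "r1": {"giab"},
    "r2": {"giab"},
    "r3": {"polaris"},
    "r4": {"giab", "polaris"},
    "r5": {"ref"},
}
print(preferred_representation(reads))
```

Reads that align equally well to both alternative paths are ignored, so only evidence that distinguishes the two descriptions contributes to the decision.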
Example: reads supporting a GiaB representation
● In the GiaB (Ashkenazi) call set, the event is described as a swap, while in Polaris it is a pure deletion.
● More reads are uniquely mapped to the GiaB description than to the Polaris one.
[Figure: mummerplot of the alternative allele sequences of the two descriptions (REF + GiaB vs. REF + Polaris); annotation: presence of a short insertion]
Example: SV with “better” description in Polaris than GiaB
● More reads are uniquely mapped to the Polaris description.
● Small insertions were observed in both representations, indicating that neither is fully correct.
[Figure: mummerplot of the alternative allele sequences of the two descriptions (REF + GiaB vs. REF + Polaris); annotation: the presence of short insertions highlights the need for improvement]
Future plans
● Run graph-realignment and validation genome-wide
- Our tools have become faster, so we can now run on more samples and on all events.
- We will share the results and genotypes for the Polaris samples and the Platinum Genomes (PG).
● Improve our targeted validation tools
https://github.com/illumina/paragraph
● Make graph visualisation for paragraph publicly available
- Based on https://github.com/vgteam/sequencetubemap, extended to use inputs from the paragraph tool.
● Share more data for our population datasets.
https://github.com/illumina/polaris
● Mike Eberle
● Egor Dolzhenko
● Sai Chen
● Mitchell Bekritsky
● Subramanian S Ajay
● Vani Rajan
● Sean Humphray
● Ryan J Taft
● David R Bentley
● Justin Zook
Thank you! Any questions?