© 2010 Illumina, Inc. All rights reserved.
Illumina, illuminaDx, Solexa, Making Sense Out of Life, Oligator, Sentrix, GoldenGate, GoldenGate Indexing, DASL, BeadArray, Array of Arrays, Infinium, BeadXpress, VeraCode, IntelliHyb, iSelect, CSPro,
GenomeStudio, Genetic Energy, HiSeq, and HiScan are registered trademarks or trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.
Platinum Genomes: Identifying variants using a large pedigree
Michael A. Eberle
GIAB August, 2013
2
Platinum Genome project: Improving technology & tools
Create a catalogue of highly accurate whole-genome variant calls within a well-characterized pedigree
– SNPs, indels & CNVs
– Including highly confident reference positions
– Provide direct supporting evidence for every variant call
Develop a framework to assess variant callers
Provide a path to improve variant callers by supplying better truth data for assessing sensitivity and precision
– Modifying the SNP filters to maximize accuracy
[Figure: comparison of Truth and Test call sets, with regions labelled Correct, FP and FN]
3
NIST GIAB – Pedigree analysis
[Pedigree figure, three rows: 12889, 12890, 12891, 12892 (top); 12877, 12878 (middle); 12879, 12880, 12881, 12882, 12883, 12884, 12885, 12886, 12887, 12888, 12893 (bottom)]
All 17 members sequenced to at least 50x depth (PCR-Free protocol)
Variants are called across the pedigree using different software & technology
Inheritance information provides high-confidence, direct validation of variant calls
Analysis of SNPs in the parents and 11 children
4
Pedigree Analysis – Using haplotypes to detect conflicts
[Figure: phased haplotypes at six sites for the Parents and six Children. Four distinct haplotypes appear (ACAGTA, ATCTGA, GTCGTC, GCATTA); one copy reads ACATTA, differing at the fourth site]
With a sufficiently large pedigree all four possible inheritance patterns will be observed and most of the genotypes can be phased into haplotypes
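How large is "sufficiently large"? The sketch below is not from the slides; it is a back-of-the-envelope calculation that assumes each child independently inherits one of the four parental haplotype pairings with probability 1/4 at a given locus. The function name is made up for illustration.

```python
from math import comb


def prob_all_patterns_observed(n_children: int, n_patterns: int = 4) -> float:
    """Probability that every inheritance pattern is seen at least once.

    Inclusion-exclusion over the number k of patterns that are missing,
    assuming (for this sketch) independent, equally likely patterns.
    """
    return sum(
        (-1) ** k
        * comb(n_patterns, k)
        * ((n_patterns - k) / n_patterns) ** n_children
        for k in range(n_patterns + 1)
    )


if __name__ == "__main__":
    for n in (4, 6, 11):
        print(f"{n} children: {prob_all_patterns_observed(n):.3f}")
```

Under these assumptions, eleven children give a probability of about 0.83 of seeing all four patterns at any single locus, compared with about 0.09 for four children.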
5
Using haplotypes to detect conflicts
[Figure: phased haplotypes at six sites for the Parents and six Children. Four distinct haplotypes appear (ACAGTA, ATCTGA, GTCGTC, GCATTA); one copy reads ACATTA and is marked "Error at this sample/position"]
Individual GT accuracy is assessed using surrounding genotype calls across the pedigree
Genotypes are parsimoniously phased to minimize the number of conflicts across the pedigree
Facilitates assigning conflicts to a sample, imputation of missing data and error correction
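A minimal sketch of the conflict check, assuming the parental haplotypes are already phased and each child's inherited pair is already known (the real analysis phases parsimoniously, which this does not attempt). The four haplotype strings are read off the figure; the labels A to D, the child names, the pair assigned to each child and the injected error are all hypothetical.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]

# Haplotype sequences from the figure; the A-D labels are an assumption.
PARENTAL_HAPLOTYPES: Dict[str, str] = {
    "A": "ACAGTA",
    "B": "ATCTGA",
    "C": "GTCGTC",
    "D": "GCATTA",
}

# Hypothetical inheritance: one paternal (A/B) and one maternal (C/D) haplotype.
INHERITANCE: Dict[str, str] = {
    "child1": "AC", "child2": "AD", "child3": "BC",
    "child4": "BD", "child5": "AD", "child6": "AD",
}


def expected_genotype(pair: str, site: int) -> Genotype:
    """Unordered genotype implied by the two inherited haplotypes."""
    a, b = sorted(PARENTAL_HAPLOTYPES[h][site] for h in pair)
    return (a, b)


def find_conflicts(
    inheritance: Dict[str, str], observed: Dict[str, List[Genotype]]
) -> List[Tuple[str, int]]:
    """Return (child, 1-based site) for every call that disagrees with inheritance."""
    conflicts = []
    for child, pair in inheritance.items():
        for site, genotype in enumerate(observed[child]):
            a, b = sorted(genotype)
            if (a, b) != expected_genotype(pair, site):
                conflicts.append((child, site + 1))
    return conflicts


# Simulated calls: start from the expected genotypes, then inject one error
# (T/T instead of G/T at the fourth site of child2).
OBSERVED: Dict[str, List[Genotype]] = {
    child: [expected_genotype(pair, s) for s in range(6)]
    for child, pair in INHERITANCE.items()
}
OBSERVED["child2"][3] = ("T", "T")

if __name__ == "__main__":
    print(find_conflicts(INHERITANCE, OBSERVED))
```

Because the other children carrying the same haplotypes agree with the expected genotypes, the single disagreeing call can be attributed to one sample and one position.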
6
Analysis of variant calls within the pedigree structure
First step is to define the inheritance of the parental chromosomes to the eleven children everywhere in the genome
– Identified 709 crossover events between the parents and eleven children
Variants called across the pedigree using multiple callers
– E.g. GATK, Cortex, Isaac & CGI for SNPs
Define accurate variants as those where the genotypes are 100% consistent with the transmission of the parental haplotypes
– At any position of the genome there are only 16 possible combinations of genotypes (biallelic & diploid) across the pedigree that are consistent with the inheritance pattern
– Out of 3^13 (~1.6M) possible genotype combinations
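The 16-versus-3^13 count can be reproduced with a short enumeration. This is an illustrative sketch rather than the project's code: it assumes a biallelic site, labels the four parental haplotypes A to D, and borrows the 13 haplotype-pair labels (two parents, eleven children) that appear on the read-count figure later in the deck as an example inheritance pattern.

```python
from itertools import product
from typing import List, Set, Tuple

# Haplotype pairs for the 2 parents and 11 children in one region of the genome.
SAMPLES: List[str] = "AB CD CB DA CB DB DA CB CA DB CB CA DA".split()


def consistent_genotype_vectors(samples: List[str]) -> Set[Tuple[int, ...]]:
    """All genotype vectors (alt-allele counts 0/1/2) that obey the inheritance.

    Each of the four parental haplotypes carries either the reference (0)
    or the alternate (1) allele, giving 2**4 = 16 assignments.
    """
    vectors = set()
    for alleles in product((0, 1), repeat=4):
        alt = dict(zip("ABCD", alleles))
        vectors.add(tuple(alt[h1] + alt[h2] for h1, h2 in samples))
    return vectors


def is_consistent(genotypes: Tuple[int, ...], samples: List[str]) -> bool:
    """True if the observed genotype vector matches one of the allowed vectors."""
    return tuple(genotypes) in consistent_genotype_vectors(samples)


if __name__ == "__main__":
    allowed = consistent_genotype_vectors(SAMPLES)
    print(len(allowed), "consistent vectors out of", 3 ** len(SAMPLES))
```

With all four inheritance patterns present among the children, the 16 haplotype assignments map to 16 distinct genotype vectors, so a random genotyping error has very little chance of landing on another allowed vector.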
7
Current state
Homozygous positions (GATK)
– ~2.6B positions identified as homozygous reference across the pedigree
SNPs (GATK, Cortex, Isaac & CGI)
– ~4.7M positions where SNPs agree with transmission of parental chromosomes
– >95% (4.5M) called consistent with transmission by multiple algorithms/technologies
– >98% (4.6M) with supporting evidence from other call sets (i.e. same variant called in at least one of the samples)
Indels (GATK, Cortex & CGI)
– ~640k indels consistent with transmission of parental chromosomes
– Events range in size from 1 to 350bp
CNVs (BreakDancer & Grouper)
– ~772 CNVs, mostly deletions though a couple of duplications
– Events range from 1kb to 322kb though still refining break points
8
CNVs
9
Incorporating larger variants
SNPs and small indels work well because the genotypes are highly accurate
– A single genotyping error in any of the 13 samples will almost never be consistent with the haplotype transmission
Developing approaches for other variant types that have lower calling accuracy
– Many CNV callers do not provide GT information
– Accuracy is too low to use pedigree-consistency
10
Incorporating CNVs into this framework
Make breakpoint calls within each sample using BreakDancer & Grouper
Identify regions of overlap between samples (keeping singletons)
Corroborate based on read counts within the putative CNV events
Refine to breakpoint resolution
[Figure: calls in NA12877, NA12878, NA12879, NA12880, NA12881 and NA12882 aligned to define Test Regions]
• Count the uniquely aligned reads within the defined break points for the test regions for each sample & identify events where the read counts are consistent with a deletion or duplication
• For internally-consistent events, follow up with targeted analysis to identify bp resolution of events
• On average ~150x depth for every event
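The "identify regions of overlap between samples (keeping singletons)" step could be sketched as a simple interval merge. This is one possible reading, not the method actually used: it assumes all calls lie on a single chromosome, treats touching intervals as overlapping, and uses invented coordinates.

```python
from typing import Dict, List, Tuple

Call = Tuple[str, int, int]  # (sample, start, end)

# Hypothetical per-sample CNV calls on one chromosome.
CALLS: List[Call] = [
    ("NA12877", 1_000, 9_500),
    ("NA12879", 1_200, 9_400),
    ("NA12881", 9_000, 9_800),
    ("NA12878", 50_000, 52_000),  # singleton: called in one sample only
]


def cluster_calls(calls: List[Call]) -> List[Dict]:
    """Merge overlapping calls into test regions, remembering who called them."""
    regions: List[Dict] = []
    for sample, start, end in sorted(calls, key=lambda c: c[1]):
        if regions and start <= regions[-1]["end"]:
            # Overlaps the current region: extend it and record the sample.
            regions[-1]["end"] = max(regions[-1]["end"], end)
            regions[-1]["samples"].add(sample)
        else:
            # No overlap: open a new region (singletons are kept as-is).
            regions.append({"start": start, "end": end, "samples": {sample}})
    return regions


if __name__ == "__main__":
    for region in cluster_calls(CALLS):
        print(region["start"], region["end"], sorted(region["samples"]))
```

Each resulting region would then be the unit in which uniquely aligned reads are counted for every sample, including samples in which no caller reported an event.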
11
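Using read counts to confirm deletions – 8.5kb deletion
[Figure: read counts (axis 0 to 2000) for the 13 samples, labelled by inherited haplotype pair: AB CD CB DA CB DB DA CB CA DB CB CA DA; a second axis marks copy number 0 (Zero-ploid), 1 (Haploid) and 2 (Diploid); the six samples carrying haplotype A are highlighted]
Best Sol’n: A=0 ; B=1 ; C=1 ; D=1
All samples with haplotype A are consistent with haploid based on read counts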
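One way to arrive at a "best solution" like the one above is a brute-force least-squares fit of per-haplotype copy numbers. The sketch below is an assumption about how such a fit might look, not the presenters' implementation. The haplotype pairs come from the figure; the read counts are synthetic, and the value of roughly 1000 reads per copy is only loosely suggested by the figure's axes.

```python
from itertools import product
from typing import Dict, List

# Haplotype pairs of the 13 samples, as labelled on the figure.
SAMPLES: List[str] = "AB CD CB DA CB DB DA CB CA DB CB CA DA".split()

# Synthetic read counts, invented for this example (one per sample).
READ_COUNTS: List[int] = [
    1010, 1980, 2050, 960, 1930, 2020, 1040, 1990, 980, 2060, 1950, 1020, 990,
]


def fit_haplotype_copy_numbers(
    samples: List[str],
    read_counts: List[int],
    reads_per_copy: float,
    max_copies: int = 2,
) -> Dict[str, int]:
    """Try every copy-number assignment for A-D and keep the best fit."""
    best: Dict[str, int] = {}
    best_error = float("inf")
    for copies in product(range(max_copies + 1), repeat=4):
        cn = dict(zip("ABCD", copies))
        # Squared error between observed counts and counts expected from
        # the summed copy number of each sample's two haplotypes.
        error = sum(
            (count - reads_per_copy * (cn[h1] + cn[h2])) ** 2
            for (h1, h2), count in zip(samples, read_counts)
        )
        if error < best_error:
            best, best_error = cn, error
    return best


if __name__ == "__main__":
    print(fit_haplotype_copy_numbers(SAMPLES, READ_COUNTS, reads_per_copy=1000))
```

Because every sample's expected count is tied to the same four haplotype copy numbers, a deletion is only accepted when the read counts of all carriers, and of all non-carriers, agree with a single assignment.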
12
Breakdown of 772 “accurate” CNVs (1kb to 322kb in size)
[Venn diagram of BreakDancer and Grouper calls; region counts 266, 408 and 98]
13
Next steps
Assembling breakpoints for the 772 CNVs
– Reassessing the “failed” calls where applicable
Incorporating different calling algorithms / methods
– E.g. SNP inheritance can help identify CNVs that are missed by other methods
– Including mate pair data (~2kb insert size)
Working on different methods to improve our catalogue of ~30bp to 2kb events & incorporating different callers
Assigning error modes for “failed” SNPs
– Many look like cell line mutations & alignment errors
Comparing our call set to other datasets to assess accuracy and completeness
– Other GIAB call sets
– Fosmid data (Jaffe & Kidd)
14
Acknowledgements
Illumina: Morten Kallberg, Xiaoyu Chen, Han-Yu Chuang, Phil Tedder, Sean Humphray, Elliott Margulies, David Bentley
Oxford: Zamin Iqbal, Gil McVean
This data and more available at www.platinumgenomes.org

Editor's notes

  1. Thank you Tanya and thanks to everyone for attending this seminar.
  2. This project grew out of an observation that there is no comprehensive truth set of variant calls and this gap is becoming increasingly problematic as sequencing moves to the clinic. Additionally, the validation that has been done using trio conflicts or orthogonal technologies usually assesses only a relatively small percentage of the variants. We are working to solve this by sequencing a large pedigree and using the parental inheritance to assess the accuracy of variant calls, with the goal of delivering a set of highly accurate variant calls, making the data publicly available as a community resource, and demonstrating a framework for validating variant calls and improving variant callers, especially for more complicated variants such as indels and structural variants.
  3. To demonstrate the utility of analyzing a full pedigree we have sequenced all 17 members of a well-characterized CEPH pedigree to 50x depth. In addition we have sequenced the trio highlighted in bold to 200x each and performed a technical replicate of the child of this trio (NA12882), again to 200x, so that we have a total of 400x sequence depth on this child. For the work I’m presenting today we will concentrate on SNP analysis in the parents and 11 children of the last two generations, but we are already looking at indels and larger variants.
  4. We gain power for error detection by being able to calculate the inheritance of the parental haplotypes. With a large number of children we will observe all 4 possible pairings of the parental haplotypes, and when that occurs we have much greater power to identify genotype errors. Because there are 11 siblings we have additional power, since internal replicates are built in for some inherited parental haplotype pairings. In this figure, I’ve highlighted the inheritance pattern for six of the children in a small region of chromosome 22 where a single inheritance pattern occurs, i.e. a region bounded by detected crossover events. Within this region we can convert genotypes to haplotypes as I’ve illustrated above.
  5. If we just look at the haplotypes in blue, we can immediately detect conflicts. For example, one child is the “odd man out”, showing a T rather than a G at the fourth site, indicating that there is an error in this genotype. This also illustrates the power of this method. Each genotype call is supported or not supported based on the surrounding genotype calls across the pedigree. In practice, when we calculate conflict rates we choose a parsimonious solution that agrees most closely with the observed genotypes and thus will underestimate the true error rate, though this effect is likely small. This method allows us to assign an error to a sample, impute missing calls and, in some cases, correct errors.