Genome in a Bottle Consortium
Progress Update
January 27, 2014
Justin Zook, Marc Salit, and the Genome in a
Bottle Consortium
Whole Genome RMs vs.
Current Validation Methods
• Sanger confirmation
– Limited by number of sites (and sometimes it’s wrong)

• High depth NGS confirmation
– May have same systematic errors

• Genotyping microarrays
– Limited to known (easier) variants
– Problems with neighboring “complex” variants, duplications

• Mendelian inheritance
– Can’t account for some systematic errors

• Simulated data
– Generally not very representative of errors in real data

• Ti/Tv
– Varies by region of genome, and only gives overall statistic
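For reference, a minimal sketch of how a Ti/Tv ratio is computed from a list of SNPs. The function name `ti_tv_ratio` and the toy input are illustrative assumptions, not part of any GIAB pipeline. It yields a single number for the whole call set, which is the limitation noted above.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T);
# every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snps):
    """Return Ti/Tv for an iterable of (ref, alt) single-base pairs."""
    ti = tv = 0
    for ref, alt in snps:
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions; Ti/Tv is undefined")
    return ti / tv


# Toy input (hypothetical): 4 transitions and 2 transversions.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "C"), ("G", "T")]
print(ti_tv_ratio(toy_snps))
```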
Goals for Data to Accompany RM
• ~0 false positive AND false negative calls in
confident regions
• Include as much of the genome as possible in
the confident regions (i.e., don’t just take the
intersection)
• Avoid bias towards any particular platform
– take advantage of strengths of each platform

• Avoid bias towards any particular
bioinformatics algorithms
Integrate 14 Datasets from 5
platforms

Integration of Data to
Form Highly Confident Genotype Calls
• Candidate variants
– Find all possible variant sites

• Concordant variants
– Find concordant sites across multiple datasets

• Find characteristics of bias
– Identify sites with atypical characteristics signifying
sequencing, mapping, or alignment bias

• Arbitrate using evidence of bias
– For each site, remove datasets with decreasingly atypical
characteristics until all datasets agree

• Confidence Level
– Even if all datasets agree, identify them as uncertain if
few have typical characteristics, or if they fall in known
segmental duplications, SVs, or long repeats
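The arbitration step can be sketched roughly as follows. This is an illustration under stated assumptions, not the NIST integration code: each dataset is assumed to carry one numeric "atypicality" score per site, and the names `arbitrate`, `min_datasets`, and the dataset labels are hypothetical.

```python
def arbitrate(calls, min_datasets=2):
    """Pick a consensus genotype for one site.

    calls maps a dataset name to (genotype, atypicality_score); a higher
    score means the dataset looks more atypical at this site (assumed
    scoring, for illustration only).
    Returns (genotype or None, datasets kept); None means "uncertain".
    """
    # Sort ascending so the most atypical dataset sits at the end.
    kept = sorted(calls, key=lambda name: calls[name][1])
    while kept:
        genotypes = {calls[name][0] for name in kept}
        if len(genotypes) == 1:
            if len(kept) < min_datasets:
                # Agreement rests on too few typical datasets.
                return None, kept
            return genotypes.pop(), kept
        kept.pop()  # drop the most atypical remaining dataset
    return None, kept


site = {
    "dataset_a": ("0/1", 0.1),
    "dataset_b": ("0/1", 0.3),
    "dataset_c": ("1/1", 2.5),  # disagrees and looks atypical
}
print(arbitrate(site))
```

Here `dataset_c` is removed first because it has the highest score, after which the remaining two datasets agree. A site where only one dataset survives is returned as uncertain, mirroring the confidence-level step.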
Verification of “Highly Confident”
Genotype accuracy
• Sanger sequencing
– 100% accuracy but only 100s of sites

• X Prize Fosmid sequencing
– Sometimes call only part of a complex variant

• Microarrays
– Differences appear to be FP or FN in arrays

• Broad 250bp HaplotypeCaller
– Very highly concordant

• Platinum genomes pedigree SNPs
– Some systematic errors are inherited; different representations
of complex variants

• Real Time Genomics SNPs and indels
– Some interesting sites called by RTG complex caller
GCAT – Interactive Performance
Metrics
• NIST is working with
GCAT to use our highly
confident variant calls
• Assess performance of
many combinations of
mappers and variant
callers
• www.bioplanet.com/gcat

Improvement of FreeBayes over 1 year with indels

Why do calls differ from our highly
confident genotypes?
Apparent False Positives
• Platform-specific systematic
sequencing errors for SNPs
• Analysis-specific
• Difficult to map regions
• Indels in long
homopolymers

Apparent False Negatives
• Different complex variant
representation
• Near indels
• Inside repeats

Complex variants have multiple correct
unphased representations
[Figure: alignments from BWA, ssaha2, CGTools, and Novoalign against the reference (Ref); labels: T insertion, TCTCT insertion]

                               FP SNPs        FP MNPs       FP indels
Traditional comparison         0.38% (610)    100% (915)    6.5% (733)
Comparison with realignment    0.15% (249)    4.2% (38)     2.6% (298)

• ~225,000 highly confident
variants are within 10bp of
another variant
• FPs and FNs are significantly
enriched for complex variants
• RTG vcfeval can fix this issue!
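To illustrate why position-by-position comparison miscounts complex variants, here is a minimal sketch that compares two call sets by the haplotype sequence they produce rather than by their records. It assumes a single haplotype and non-overlapping variants with 0-based positions. `apply_variants` is a hypothetical helper and is not how RTG vcfeval is implemented.

```python
def apply_variants(reference, variants):
    """Apply (pos, ref, alt) variants (0-based pos) to a reference string."""
    seq = reference
    # Apply right to left so that earlier coordinates stay valid.
    for pos, ref, alt in sorted(variants, reverse=True):
        if seq[pos:pos + len(ref)] != ref:
            raise ValueError(f"REF allele mismatch at position {pos}")
        seq = seq[:pos] + alt + seq[pos + len(ref):]
    return seq


# One MNP versus two adjacent SNPs: different records, same sequence.
mnp = [(1, "CG", "TA")]
two_snps = [(1, "C", "T"), (2, "G", "A")]
print(apply_variants("ACGT", mnp) == apply_variants("ACGT", two_snps))

# The same TC insertion placed at two positions within a TC repeat.
left = [(0, "G", "GTC")]
right = [(4, "C", "CTC")]
print(apply_variants("GTCTCTA", left) == apply_variants("GTCTCTA", right))
```

A record-level comparison would count each of these pairs as a false positive plus a false negative, even though the underlying sequences are identical.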
Reasons we exclude regions from high-confidence set
Structural variant analytical approach
• Depth of coverage (DOC)
– Control-FREEC
– CnD
• Paired-end mapping (PEM)
– Breakdancer
• Split read (SR)
– Pindel
• Assembly based (AS)
– Velvet
– ABySS
• Combination
– Genome-STRiP
• SVMerge
– List of structural variant calls
Validation parameters for each SV
• Coverage (mean and standard deviation)
• Paired-end distance/insert size (mean and
standard deviation)
• # of discordant paired-ends
• Soft clipping of the reads (mean and standard
deviation)
• Mapping quality (mean and standard deviation)
• # of heterozygous and homozygous SNP
genotype calls
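A minimal sketch of how some of these per-SV summaries could be computed once the per-base or per-read values for a candidate region have been extracted. The helper names are hypothetical, and defining "discordant" as an insert size more than `n_sd` standard deviations from the library mean is an assumption made here for illustration; read orientation is ignored.

```python
from statistics import mean, stdev


def mean_sd(values):
    """Return (mean, sample standard deviation) of a list of numbers."""
    return mean(values), stdev(values)


def count_discordant(insert_sizes, library_mean, library_sd, n_sd=3.0):
    """Count read pairs whose insert size is far from the library mean."""
    return sum(abs(size - library_mean) > n_sd * library_sd
               for size in insert_sizes)


# Toy values for one candidate region (not real data).
coverage = [10, 20, 30]
insert_sizes = [290, 310, 900, 305]

print(mean_sd(coverage))
print(count_discordant(insert_sizes, library_mean=300, library_sd=30))
```

The same `mean_sd` helper applies to soft-clipped lengths and mapping qualities.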
Challenges with assessing
performance
• All variant types are not
equal
• All regions of the genome
are not equal
– Homopolymers, STRs, duplications
– Can be similar or different
in different genomes

• Labeling difficult variants
as uncertain leads to
higher apparent accuracy
when assessing
performance
• Genotypes fall in 3+
categories (not
positive/negative)
– standard diagnostic
accuracy measures not
well posed
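To make the "3+ categories" point concrete, a small sketch that tabulates benchmark against test genotypes as a 3x3 table instead of a 2x2 positive/negative table. The category names and toy data are assumptions for illustration.

```python
from collections import Counter

CATEGORIES = ("hom_ref", "het", "hom_var")


def genotype_table(truth, test):
    """Rows are truth categories, columns are test categories."""
    counts = Counter(zip(truth, test))
    return [[counts[(t, q)] for q in CATEGORIES] for t in CATEGORIES]


# Toy genotypes at five sites (not real data).
truth = ["het", "het", "hom_var", "hom_ref", "hom_var"]
test = ["het", "hom_var", "hom_var", "het", "het"]

for label, row in zip(CATEGORIES, genotype_table(truth, test)):
    print(label, row)
```

A het called as hom_var is a correct detection but a wrong genotype, so whether it counts as a true positive depends on a convention that the 2x2 measures do not specify.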
Pedigree calls
• RTG and Illumina Platinum
Genomes working on this
• Sequence
NA12878, husband, and 11
children to identify high
confidence variants
– Identify cross-over events
– Determine if genotypes are
consistent with inheritance

• Should we integrate these
with the NIST high-confidence
genotypes?
• Should we find larger families
for future genomes?
• See afternoon presentations!

Source: Mike Eberle, Illumina
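A minimal sketch of the inheritance-consistency check for a single autosomal site, assuming diploid genotypes given as allele pairs. The function name is hypothetical, and this is not the RTG or Platinum Genomes method, which also uses cross-over events across the pedigree.

```python
from itertools import product


def is_mendelian_consistent(mother, father, child):
    """True if the child can inherit one allele from each parent.

    Genotypes are 2-tuples of allele indices, e.g. (0, 1) for a het.
    """
    possible = {tuple(sorted(pair)) for pair in product(mother, father)}
    return tuple(sorted(child)) in possible


# Toy site: het mother, hom-ref father, three children.
mother, father = (0, 1), (0, 0)
children = [(0, 1), (0, 0), (1, 1)]
print([is_mendelian_consistent(mother, father, c) for c in children])
```

With many children, a genotype error in one sample is likely to surface as an inconsistency, although an error inherited consistently by the whole family would still pass this check.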

Pedigree Calls in Uncertain Regions
GIAB Characterization of pilot RM
• NIST – 300x 150x150bp HiSeq (from 6 vials)
• NIST – 100x 75bp ECC SOLiD 5500W
• Illumina – 50x 100x100bp HiSeq
• Complete Genomics – Normal and LFR (non-RM)
• Garvan Institute – Illumina exome
• NCI – Ion Proton whole genome
• INOVA – Infinium SNP/CNV array
Homogeneity and Stability
Homogeneity
• Multiplex First and last vial
– 3 libraries x 33x HiSeq each

• Multiplex 4 Random vials
– 2 libraries x 12.5x HiSeq each

• Compare variability due to:
– vial
– library
– day
– flow cell
– lane
– sampling

• Run PFGE on each vial for size

Stability
• Run PFGE to detect DNA
degradation
• Freeze-thaw 2 and 5 times
• Vortex for 10s
• 4°C for 2 and 8 weeks
• 37°C for 2 and 8 weeks
FTP site and Amazon S3
• NCBI is hosting fastq, bam, and vcf files on the
giab ftp site
• These data are mirrored to Amazon S3, so we
encourage you to take advantage of this!
Pilot Reference Material
• High-confidence calls are available on the ftp
site and are already being used
• NIST plans to release this as a NIST Reference
Material in the next couple months
Future Directions
• Characterize more
“difficult” regions/variants
• Structural variants
• Compare to pedigree calls
• Examine potentially
clinically relevant
regions/variants in RMs
• Use long-read technologies
– Moleculo
– CG LFR
– PacBio
– BioNano Genomics
– future technologies??

• Use glia/platypus to realign
reads to candidate variants

• Analyze interlaboratory
study data
• Characterize PGP genomes
– Ashkenazim trio
– son in Asian trio
– DNA at NIST in Jan-Feb 2014
– Volunteers to sequence?

• Select future genomes
• Tumor-normal?
Topic #1: Moving beyond the easy
regions/variants
Presentations
• Emerging Technologies
– PacBio
– Complete Genomics LFR
– Moleculo
– BioNano Genomics

• Structural Variants
– Bina Technologies

Topics
• Structural Variants
• Phasing
• Validation
• Where should we set the
threshold(s) for confidence?
Topic #2: Cancer and Future Genomes
Cancer
• Spike-ins
• Mixtures of normal cell lines
• Tumor-normal cell line pair
• Transcriptome controls

Priorities for Future Genomes
• Diverse ancestry groups
• Larger families
• Recruitment with consent
for commercialization
• How many genomes?
• Should the parents be NIST
Reference Materials, or only
the child?
Working Group Questions
RM Selection & Design
• Spike-in controls
• FFPE
• Commercial RMs
• ABRF interlaboratory study
• Should we prioritize one or
two genomes?

RM Characterization
• Production mode for new
trios
– Pilot was characterized by
Illumina, SOLiD, Ion
Proton, and Complete
Genomics
– What resources should we
invest in measurements for
each new family?
Working Group Questions
Bioinformatics
• Storing data/pipelines
– Suggestions for ftp structure
– Data submission/accessioning
process
– Data model for genomic data
– Archiving pipelines and
reproducible research

• GRCh38
• How to use pedigree calls for pilot
genome?
• Clones for targeted regions (hard
regions if not whole genome)
• In which difficult regions should
we focus our characterization?

Performance Metrics
• Target audience
• Requirements for user
interface
– Establishing truth set(s)
– Inputs/Outputs
– Visualization

• Integration with GeT-RM

More Related Content

What's hot

RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1BITS
 
Functionally annotate genomic variants
Functionally annotate genomic variantsFunctionally annotate genomic variants
Functionally annotate genomic variantsDenis C. Bauer
 
Nida ws neale_seq_data_gen
Nida ws neale_seq_data_genNida ws neale_seq_data_gen
Nida ws neale_seq_data_genFonareerat
 
Giab v0.6 genoox sv benchmarking
Giab v0.6 genoox sv benchmarkingGiab v0.6 genoox sv benchmarking
Giab v0.6 genoox sv benchmarkingGenomeInABottle
 
The Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) ProjectThe Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) ProjectGenome Reference Consortium
 
Application of Molecular Markers SNP and DArT in Plant Breeding: A Review Paper
Application of Molecular Markers SNP and DArT in Plant Breeding: A Review PaperApplication of Molecular Markers SNP and DArT in Plant Breeding: A Review Paper
Application of Molecular Markers SNP and DArT in Plant Breeding: A Review PaperJournal of Agriculture and Crops
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...GenomeInABottle
 
Introduction to Single-cell RNA-seq
Introduction to Single-cell RNA-seqIntroduction to Single-cell RNA-seq
Introduction to Single-cell RNA-seqTimothy Tickle
 
GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GenomeInABottle
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisUniversity of California, Davis
 
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...StampedeCon
 
2015 Bioc4010 lecture1and2
2015 Bioc4010 lecture1and22015 Bioc4010 lecture1and2
2015 Bioc4010 lecture1and2Dan Gaston
 
Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015Kim D. Pruitt
 
RNA-seq Data Analysis Overview
RNA-seq Data Analysis OverviewRNA-seq Data Analysis Overview
RNA-seq Data Analysis OverviewSean Davis
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAGRF_Ltd
 
wings2014 Workshop 1 Design, sequence, align, count, visualize
wings2014 Workshop 1 Design, sequence, align, count, visualizewings2014 Workshop 1 Design, sequence, align, count, visualize
wings2014 Workshop 1 Design, sequence, align, count, visualizeAnn Loraine
 

What's hot (20)

171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1
 
Functionally annotate genomic variants
Functionally annotate genomic variantsFunctionally annotate genomic variants
Functionally annotate genomic variants
 
Nida ws neale_seq_data_gen
Nida ws neale_seq_data_genNida ws neale_seq_data_gen
Nida ws neale_seq_data_gen
 
Mane v2 final
Mane v2 finalMane v2 final
Mane v2 final
 
Giab v0.6 genoox sv benchmarking
Giab v0.6 genoox sv benchmarkingGiab v0.6 genoox sv benchmarking
Giab v0.6 genoox sv benchmarking
 
Rna seq
Rna seq Rna seq
Rna seq
 
The Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) ProjectThe Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) Project
 
Application of Molecular Markers SNP and DArT in Plant Breeding: A Review Paper
Application of Molecular Markers SNP and DArT in Plant Breeding: A Review PaperApplication of Molecular Markers SNP and DArT in Plant Breeding: A Review Paper
Application of Molecular Markers SNP and DArT in Plant Breeding: A Review Paper
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
 
Introduction to Single-cell RNA-seq
Introduction to Single-cell RNA-seqIntroduction to Single-cell RNA-seq
Introduction to Single-cell RNA-seq
 
GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
 
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
 
2015 Bioc4010 lecture1and2
2015 Bioc4010 lecture1and22015 Bioc4010 lecture1and2
2015 Bioc4010 lecture1and2
 
Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015
 
RNA-seq Data Analysis Overview
RNA-seq Data Analysis OverviewRNA-seq Data Analysis Overview
RNA-seq Data Analysis Overview
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysis
 
wings2014 Workshop 1 Design, sequence, align, count, visualize
wings2014 Workshop 1 Design, sequence, align, count, visualizewings2014 Workshop 1 Design, sequence, align, count, visualize
wings2014 Workshop 1 Design, sequence, align, count, visualize
 
Curso de Genómica - UAT (VHIR) 2012 - Análisis de datos de NGS
Curso de Genómica - UAT (VHIR) 2012 - Análisis de datos de NGSCurso de Genómica - UAT (VHIR) 2012 - Análisis de datos de NGS
Curso de Genómica - UAT (VHIR) 2012 - Análisis de datos de NGS
 

Similar to 140127 GIAB update and NIST high-confidence calls

Aug2013 NIST highly confident genotype calls for NA12878
Aug2013 NIST highly confident genotype calls for NA12878Aug2013 NIST highly confident genotype calls for NA12878
Aug2013 NIST highly confident genotype calls for NA12878GenomeInABottle
 
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511GenomeInABottle
 
Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016GenomeInABottle
 
Aug2014 giab status update and wg charge
Aug2014 giab status update and wg chargeAug2014 giab status update and wg charge
Aug2014 giab status update and wg chargeGenomeInABottle
 
Tools for Using NIST Reference Materials
Tools for Using NIST Reference MaterialsTools for Using NIST Reference Materials
Tools for Using NIST Reference MaterialsGenomeInABottle
 
GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517GenomeInABottle
 
150224 giab 30 min generic slides
150224 giab 30 min generic slides150224 giab 30 min generic slides
150224 giab 30 min generic slidesGenomeInABottle
 
Giab ashg webinar 160224
Giab ashg webinar 160224Giab ashg webinar 160224
Giab ashg webinar 160224GenomeInABottle
 
Giab for jax long read 190917
Giab for jax long read 190917Giab for jax long read 190917
Giab for jax long read 190917GenomeInABottle
 
2007. stephen chanock. technologic issues in gwas and follow up studies
2007. stephen chanock. technologic issues in gwas and follow up studies2007. stephen chanock. technologic issues in gwas and follow up studies
2007. stephen chanock. technologic issues in gwas and follow up studiesFOODCROPS
 
160627 giab for festival sv workshop
160627 giab for festival sv workshop160627 giab for festival sv workshop
160627 giab for festival sv workshopGenomeInABottle
 
GIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGenomeInABottle
 
2017 agbt benchmarking_poster
2017 agbt benchmarking_poster2017 agbt benchmarking_poster
2017 agbt benchmarking_posterGenomeInABottle
 
Genome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp LeidenGenome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp LeidenGenomeInABottle
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshopGenomeInABottle
 
140127 rm selection wg summary
140127 rm selection wg summary140127 rm selection wg summary
140127 rm selection wg summaryGenomeInABottle
 
Molecular Markers and Their Application in Animal Breed.pptx
Molecular Markers and Their Application in Animal Breed.pptxMolecular Markers and Their Application in Animal Breed.pptx
Molecular Markers and Their Application in Animal Breed.pptxTrilokMandal2
 
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...VHIR Vall d’Hebron Institut de Recerca
 
ICMP MPS SNP Panel for Missing Persons - Michelle Peck et al.
ICMP MPS SNP Panel for Missing Persons - Michelle Peck et al.ICMP MPS SNP Panel for Missing Persons - Michelle Peck et al.
ICMP MPS SNP Panel for Missing Persons - Michelle Peck et al.QIAGEN
 

Similar to 140127 GIAB update and NIST high-confidence calls (20)

Aug2013 NIST highly confident genotype calls for NA12878
Aug2013 NIST highly confident genotype calls for NA12878Aug2013 NIST highly confident genotype calls for NA12878
Aug2013 NIST highly confident genotype calls for NA12878
 
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
 
Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016
 
Aug2014 giab status update and wg charge
Aug2014 giab status update and wg chargeAug2014 giab status update and wg charge
Aug2014 giab status update and wg charge
 
Tools for Using NIST Reference Materials
Tools for Using NIST Reference MaterialsTools for Using NIST Reference Materials
Tools for Using NIST Reference Materials
 
GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517
 
Genotyping in Breeding programs
Genotyping in Breeding programsGenotyping in Breeding programs
Genotyping in Breeding programs
 
150224 giab 30 min generic slides
150224 giab 30 min generic slides150224 giab 30 min generic slides
150224 giab 30 min generic slides
 
Giab ashg webinar 160224
Giab ashg webinar 160224Giab ashg webinar 160224
Giab ashg webinar 160224
 
Giab for jax long read 190917
Giab for jax long read 190917Giab for jax long read 190917
Giab for jax long read 190917
 
2007. stephen chanock. technologic issues in gwas and follow up studies
2007. stephen chanock. technologic issues in gwas and follow up studies2007. stephen chanock. technologic issues in gwas and follow up studies
2007. stephen chanock. technologic issues in gwas and follow up studies
 
160627 giab for festival sv workshop
160627 giab for festival sv workshop160627 giab for festival sv workshop
160627 giab for festival sv workshop
 
GIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM Forum
 
2017 agbt benchmarking_poster
2017 agbt benchmarking_poster2017 agbt benchmarking_poster
2017 agbt benchmarking_poster
 
Genome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp LeidenGenome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp Leiden
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
140127 rm selection wg summary
140127 rm selection wg summary140127 rm selection wg summary
140127 rm selection wg summary
 
Molecular Markers and Their Application in Animal Breed.pptx
Molecular Markers and Their Application in Animal Breed.pptxMolecular Markers and Their Application in Animal Breed.pptx
Molecular Markers and Their Application in Animal Breed.pptx
 
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
 
ICMP MPS SNP Panel for Missing Persons - Michelle Peck et al.
ICMP MPS SNP Panel for Missing Persons - Michelle Peck et al.ICMP MPS SNP Panel for Missing Persons - Michelle Peck et al.
ICMP MPS SNP Panel for Missing Persons - Michelle Peck et al.
 

More from GenomeInABottle

GIAB Tumor Normal ASHG 2023
GIAB Tumor Normal ASHG 2023GIAB Tumor Normal ASHG 2023
GIAB Tumor Normal ASHG 2023GenomeInABottle
 
GIAB_ASHG_JZook_2023.pdf
GIAB_ASHG_JZook_2023.pdfGIAB_ASHG_JZook_2023.pdf
GIAB_ASHG_JZook_2023.pdfGenomeInABottle
 
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923GenomeInABottle
 
Benchmarking with GIAB 220907
Benchmarking with GIAB 220907Benchmarking with GIAB 220907
Benchmarking with GIAB 220907GenomeInABottle
 
Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...GenomeInABottle
 
GIAB Technical Germline Benchmark roadmap discussion
GIAB Technical Germline Benchmark roadmap discussionGIAB Technical Germline Benchmark roadmap discussion
GIAB Technical Germline Benchmark roadmap discussionGenomeInABottle
 
Giab agbt small_var_2020
Giab agbt small_var_2020Giab agbt small_var_2020
Giab agbt small_var_2020GenomeInABottle
 
Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH
Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GHGa4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH
Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GHGenomeInABottle
 
GIAB ASHG 2019 Structural Variant poster
GIAB ASHG 2019 Structural Variant posterGIAB ASHG 2019 Structural Variant poster
GIAB ASHG 2019 Structural Variant posterGenomeInABottle
 
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATKGIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATKGenomeInABottle
 
GIAB ASHG 2019 Small Variant poster
GIAB ASHG 2019 Small Variant posterGIAB ASHG 2019 Small Variant poster
GIAB ASHG 2019 Small Variant posterGenomeInABottle
 
GRC GIAB Workshop ASHG 2019 Small Variant Benchmark
GRC GIAB Workshop ASHG 2019 Small Variant BenchmarkGRC GIAB Workshop ASHG 2019 Small Variant Benchmark
GRC GIAB Workshop ASHG 2019 Small Variant BenchmarkGenomeInABottle
 
Jason Chin MHC diploid assembly
Jason Chin MHC diploid assemblyJason Chin MHC diploid assembly
Jason Chin MHC diploid assemblyGenomeInABottle
 
GIAB and long reads for bio it world 190417
GIAB and long reads for bio it world 190417GIAB and long reads for bio it world 190417
GIAB and long reads for bio it world 190417GenomeInABottle
 
New methods diploid assembly with graphs
New methods   diploid assembly with graphsNew methods   diploid assembly with graphs
New methods diploid assembly with graphsGenomeInABottle
 
How giab fits in the rest of the world seqc2 tumor normal
How giab fits in the rest of the world   seqc2 tumor normalHow giab fits in the rest of the world   seqc2 tumor normal
How giab fits in the rest of the world seqc2 tumor normalGenomeInABottle
 
New data from giab genomes pacbio ccs
New data from giab genomes   pacbio ccsNew data from giab genomes   pacbio ccs
New data from giab genomes pacbio ccsGenomeInABottle
 
New data from giab genomes strand-seq
New data from giab genomes   strand-seqNew data from giab genomes   strand-seq
New data from giab genomes strand-seqGenomeInABottle
 

More from GenomeInABottle (20)

2023 GIAB AMP Update
2023 GIAB AMP Update2023 GIAB AMP Update
2023 GIAB AMP Update
 
GIAB Tumor Normal ASHG 2023
GIAB Tumor Normal ASHG 2023GIAB Tumor Normal ASHG 2023
GIAB Tumor Normal ASHG 2023
 
Stratomod ASHG 2023
Stratomod ASHG 2023Stratomod ASHG 2023
Stratomod ASHG 2023
 
GIAB_ASHG_JZook_2023.pdf
GIAB_ASHG_JZook_2023.pdfGIAB_ASHG_JZook_2023.pdf
GIAB_ASHG_JZook_2023.pdf
 
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
 
Benchmarking with GIAB 220907
Benchmarking with GIAB 220907Benchmarking with GIAB 220907
Benchmarking with GIAB 220907
 
Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...
 
GIAB Technical Germline Benchmark roadmap discussion
GIAB Technical Germline Benchmark roadmap discussionGIAB Technical Germline Benchmark roadmap discussion
GIAB Technical Germline Benchmark roadmap discussion
 
Giab agbt small_var_2020
Giab agbt small_var_2020Giab agbt small_var_2020
Giab agbt small_var_2020
 
Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH
Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GHGa4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH
Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH
 
GIAB ASHG 2019 Structural Variant poster
GIAB ASHG 2019 Structural Variant posterGIAB ASHG 2019 Structural Variant poster
GIAB ASHG 2019 Structural Variant poster
 
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATKGIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
 
GIAB ASHG 2019 Small Variant poster
GIAB ASHG 2019 Small Variant posterGIAB ASHG 2019 Small Variant poster
GIAB ASHG 2019 Small Variant poster
 
GRC GIAB Workshop ASHG 2019 Small Variant Benchmark
GRC GIAB Workshop ASHG 2019 Small Variant BenchmarkGRC GIAB Workshop ASHG 2019 Small Variant Benchmark
GRC GIAB Workshop ASHG 2019 Small Variant Benchmark
 
Jason Chin MHC diploid assembly
Jason Chin MHC diploid assemblyJason Chin MHC diploid assembly
Jason Chin MHC diploid assembly
 
GIAB and long reads for bio it world 190417
GIAB and long reads for bio it world 190417GIAB and long reads for bio it world 190417
GIAB and long reads for bio it world 190417
 
New methods diploid assembly with graphs
New methods   diploid assembly with graphsNew methods   diploid assembly with graphs
New methods diploid assembly with graphs
 
How giab fits in the rest of the world seqc2 tumor normal
How giab fits in the rest of the world   seqc2 tumor normalHow giab fits in the rest of the world   seqc2 tumor normal
How giab fits in the rest of the world seqc2 tumor normal
 
New data from giab genomes pacbio ccs
New data from giab genomes   pacbio ccsNew data from giab genomes   pacbio ccs
New data from giab genomes pacbio ccs
 
New data from giab genomes strand-seq
New data from giab genomes   strand-seqNew data from giab genomes   strand-seq
New data from giab genomes strand-seq
 

140127 GIAB update and NIST high-confidence calls

  • 1. Genome in a Bottle Consortium Progress Update January 27, 2014 Justin Zook, Marc Salit, and the Genome in a Bottle Consortium
  • 2. Whole Genome RMs vs. Current Validation Methods • Sanger confirmation – Limited by number of sites (and sometimes it’s wrong) • High depth NGS confirmation – May have same systematic errors • Genotyping microarrays – Limited to known (easier) variants – Problems with neighboring “complex” variants, duplications • Mendelian inheritance – Can’t account for some systematic errors • Simulated data – Generally not very representative of errors in real data • Ti/Tv – Varies by region of genome, and only gives overall statistic 2
  • 3. Goals for Data to Accompany RM • ~0 false positive AND false negative calls in confident regions • Include as much of the genome as possible in the confident regions (i.e., don’t just take the intersection) • Avoid bias towards any particular platform – take advantage of strengths of each platform • Avoid bias towards any particular bioinformatics algorithms 3
  • 4. Integrate 12 14 Datasets from 5 platforms 4
  • 5. Integration of Data to Form Highly Confident Genotype Calls Candidate variants Find all possible variant sites Concordant variants Find concordant sites across multiple datasets Find characteristics of bias Identify sites with atypical characteristics signifying sequencing, mapping, or alignment bias Arbitrate using evidence of bias For each site, remove datasets with decreasingly atypical characteristics until all datasets agree Confidence Level Even if all datasets agree, identify them as uncertain if few have typical characteristics, or if they fall in known segmental duplications, SVs, or long repeats 5
  • 6. Verification of “Highly Confident” Genotype accuracy • Sanger sequencing – 100% accuracy but only 100s of sites • X Prize Fosmid sequencing – Sometimes call only part of a complex variant • Microarrays – Differences appear to be FP or FN in arrays • Broad 250bp HaplotypeCaller – Very highly concordant • Platinum genomes pedigree SNPs – Some systematic errors are inherited; different representations of complex variants • Real Time Genomics SNPs and indels – Some interesting sites called by RTG complex caller 6
  • 7. GCAT – Interactive Performance Metrics • NIST is working with GCAT to use our highly confident variant calls • Assess performance of many combinations of mappers and variant callers • www.bioplanet.com/gcat • Figure: Improvement of FreeBayes over 1 year with indels
  • 8. Why do calls differ from our highly confident genotypes? Apparent False Positives • Platform-specific systematic sequencing errors for SNPs • Analysis-specific • Difficult to map regions • Indels in long homopolymers Apparent False Negatives • Different complex variant representation • Near indels • Inside repeats 8
  • 9. Complex variants have multiple correct unphased representations • Figure labels: BWA, ssaha2, Novoalign, CGTools; Ref; T insertion; TCTCT insertion • FP SNPs / FP MNPs / FP indels: traditional comparison 0.38% (610) / 100% (915) / 6.5% (733); comparison with realignment 0.15% (249) / 4.2% (38) / 2.6% (298) • ~225,000 highly confident variants are within 10bp of another variant • FPs and FNs are significantly enriched for complex variants • RTG vcfeval can fix this issue!
  • 10. Reasons we exclude regions from high-confidence set
  • 11. Reasons we exclude regions from high-confidence set
  • 12. Structural variant analytical approach • Depth of coverage (DOC): Control-FREEC, CnD • Paired-end mapping (PEM): Breakdancer • Split read (SR): Pindel • Assembly based (AS): Velvet, ABySS • Combination: Genome-STRiP, SVMerge • Output: list of structural variant calls
  • 14. Validation parameters for each SV • Coverage (mean and standard deviation) • Paired-end distance/insert size (mean and standard deviation) • # of discordant paired-ends • Soft clipping of the reads (mean and standard deviation) • Mapping quality (mean and standard deviation) • # of heterozygous and homozygous SNP genotype calls
  • 15. Challenges with assessing performance • All variant types are not equal • All regions of the genome are not equal – Homopolymers, STRs, duplications – Can be similar or different in different genomes • Labeling difficult variants as uncertain leads to higher apparent accuracy when assessing performance • Genotypes fall in 3+ categories (not positive/negative) – standard diagnostic accuracy measures not well posed
  • 16. Pedigree calls • RTG and Illumina Platinum Genomes working on this • Sequence NA12878, husband, and 11 children to identify high confidence variants – Identify cross-over events – Determine if genotypes are consistent with inheritance • Should we integrate these with the NIST high-confidence genotypes? • Should we find larger families for future genomes? • See afternoon presentations! Source: Mike Eberle, Illumina 16
  • 17. Pedigree Calls in Uncertain Regions
  • 18. GIAB Characterization of pilot RM • NIST – 300x 150x150bp HiSeq (from 6 vials) • NIST – 100x 75bp ECC SOLiD 5500W • Illumina – 50x 100x100bp HiSeq • Complete Genomics – Normal and LFR (non-RM) • Garvan Institute – Illumina exome • NCI – Ion Proton whole genome • INOVA – Infinium SNP/CNV array
  • 19. Homogeneity and Stability • Homogeneity: • Multiplex First and last vial – 3 libraries x 33x HiSeq each • Multiplex 4 Random vials – 2 libraries x 12.5x HiSeq each • Compare variability due to: – vial – library – day – flow cell – lane – sampling • Run PFGE on each vial for size • Stability: • Run PFGE to detect DNA degradation • Freeze-thaw 2 and 5 times • Vortex for 10s • 4°C for 2 and 8 weeks • 37°C for 2 and 8 weeks
  • 20. FTP site and Amazon S3 • NCBI is hosting fastq, bam, and vcf files on the giab ftp site • These data are mirrored to Amazon S3, so we encourage you to take advantage of this!
  • 21. Pilot Reference Material • High-confidence calls are available on the ftp site and are already being used • NIST plans to release this as a NIST Reference Material in the next couple months
  • 22. Future Directions • Characterize more “difficult” regions/variants • Structural variants • Compare to pedigree calls • Examine potentially clinically relevant regions/variants in RMs • Use long-read technologies – Moleculo – CG LFR – PacBio – BioNano Genomics – future technologies?? • Use glia/platypus to realign reads to candidate variants • Analyze interlaboratory study data • Characterize PGP genomes – Ashkenazim trio – son in Asian trio – DNA at NIST in Jan-Feb 2014 – Volunteers to sequence? • Select future genomes • Tumor-normal?
  • 23. Topic #1: Moving beyond the easy regions/variants • Presentations: • Emerging Technologies – PacBio – Complete Genomics LFR – Moleculo – BioNano Genomics • Structural Variants – Bina Technologies • Topics: • Structural Variants • Phasing • Validation • Where should we set the threshold(s) for confidence?
  • 24. Topic #2: Cancer and Future Genomes • Cancer: • Spike-ins • Mixtures of normal cell lines • Tumor-normal cell line pair • Transcriptome controls • Priorities for Future Genomes: • Diverse ancestry groups • Larger families • Recruitment with consent for commercialization • How many genomes? • Should the parents be NIST Reference Materials, or only the child?
  • 25. Working Group Questions RM Selection & Design • Spike-in controls • FFPE • Commercial RMs • ABRF interlaboratory study • Should we prioritize one or two genomes? RM Characterization • Production mode for new trios – Pilot was characterized by Illumina, SOLiD, Ion Proton, and Complete Genomics – What resources should we invest in measurements for each new family?
  • 26. Working Group Questions Bioinformatics • Storing data/pipelines – Suggestions for ftp structure – Data submission/accessioning process – Data model for genomic data – Archiving pipelines and reproducible research • GRCh38 • How to use pedigree calls for pilot genome? • Clones for targeted regions (hard regions if not whole genome) • In which difficult regions should we focus our characterization? Performance Metrics • Target audience • Requirements for user interface – Establishing truth set(s) – Inputs/Outputs – Visualization • Integration with GeT-RM
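Slide 9 notes that complex variants have multiple correct unphased representations, so a naive record-by-record comparison reports false differences. The following is a minimal sketch of the underlying idea only: apply each set of variant records to the reference and compare the resulting sequences. It is not RTG vcfeval's algorithm. The function names, the toy reference, and the assumptions of a single haplotype and non-overlapping records are all illustrative.

```python
def apply_variants(reference, variants):
    """Apply (pos, ref, alt) records to a reference string.

    pos is 1-based, as in VCF. Records are assumed not to overlap
    (an assumption of this sketch). Applying them right to left keeps
    the coordinates of the remaining records valid.
    """
    seq = reference
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        start = pos - 1
        if seq[start:start + len(ref)] != ref:
            raise ValueError(f"REF allele {ref!r} does not match reference at {pos}")
        seq = seq[:start] + alt + seq[start + len(ref):]
    return seq


def same_haplotype(reference, variants_a, variants_b):
    """True if both variant sets produce the same sequence."""
    return apply_variants(reference, variants_a) == apply_variants(reference, variants_b)


# Toy reference containing a CT repeat (hypothetical, not from the slides).
REFERENCE = "GATCTCTA"

# The same CT insertion placed at the left or the right end of the repeat.
left_aligned = [(3, "T", "TCT")]
right_aligned = [(7, "T", "TCT")]

# One MNP versus two adjacent SNPs.
mnp = [(2, "AT", "GC")]
two_snps = [(2, "A", "G"), (3, "T", "C")]

if __name__ == "__main__":
    print(same_haplotype(REFERENCE, left_aligned, right_aligned))
    print(same_haplotype(REFERENCE, mnp, two_snps))
```

Comparing sequences rather than records is what lets differently placed indels, or an MNP split into SNPs, count as the same call. A real comparison would also need to handle diploid genotypes and overlapping records.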
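Slide 5 describes arbitration as removing datasets with decreasingly atypical characteristics until all remaining datasets agree, then flagging the site as uncertain if few typical datasets are left. A rough sketch of that loop for a single site is below. The single numeric `atypical_score`, the `min_supporting` threshold, and the tuple layout are hypothetical simplifications, not the consortium's actual annotations or cutoffs.

```python
def arbitrate(calls, min_supporting=2):
    """Arbitrate one site.

    calls: non-empty list of (dataset_name, genotype, atypical_score),
    where a higher score means stronger evidence of bias (hypothetical scale).
    Returns (genotype, confident, supporting_dataset_names).
    """
    # Most typical datasets first, so the most atypical one is always last.
    remaining = sorted(calls, key=lambda c: c[2])

    # Drop the most atypical dataset until the remaining genotypes agree.
    while len({genotype for _, genotype, _ in remaining}) > 1:
        remaining.pop()

    genotype = remaining[0][1]
    # Agreement among too few datasets is reported as uncertain.
    confident = len(remaining) >= min_supporting
    return genotype, confident, [name for name, _, _ in remaining]


if __name__ == "__main__":
    site = [
        ("dataset_A", "0/1", 0.1),
        ("dataset_B", "0/1", 0.2),
        ("dataset_C", "1/1", 0.9),  # discordant and most atypical
    ]
    print(arbitrate(site))
```

The loop always terminates because a single remaining dataset trivially agrees with itself; the confidence flag is what keeps such a site out of the confident set. The slide's additional exclusions (segmental duplications, SVs, long repeats) would be applied as a separate region filter.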

Editor's Notes

  1. Meeting Notes (5/28/13 17:05): ask Heng for decoy