2014 AGBT GIAB Progress Update
Genome in a Bottle Consortium: Update on a Public-Private-Academic Consortium Developing a Standards Infrastructure for Human Genome Sequencing
Marc Salit (1,2), Sarah A. Munro (1,2), Justin Zook (1), Genome in a Bottle Consortium
(1) National Institute of Standards and Technology, Gaithersburg, MD 20899; (2) National Institute of Standards and Technology Advances in Biomedical Measurement Sciences Program at Stanford University, Stanford, CA 94305
Overview
In 2012, NIST convened the Genome in a Bottle Consortium to develop the metrology infrastructure needed to enable confidence in human whole genome variant calls.
Consortium products will include:
• Well-characterized whole genome and synthetic DNA Reference Materials (RMs)
• Reference data associated with the RMs
• Reference methods (comparison tools, documentary standards)
[Figure: measurement workflow with stages Reference Materials, Sample Preparation, Sequencing, Bioinformatics, Variant List, Performance Metrics]
These Genome in a Bottle products will enable translation of whole genome sequencing to clinical applications.
Expected use cases of these products include:
• Enable regulated applications
• Validation, QC, and proficiency testing
• Identify and quantify sources of bias and variability
• Optimize measurement technologies
• Resolve structural variants
• Improve the reference assembly
• Integrate data from multiple platforms
New participants are welcome to join: www.genomeinabottle.org
Reference Material Selection and Design Group
• Personal Genome Project (PGP) samples with consent for commercialization
  • Ashkenazi Jewish trio
  • East Asian trio
  www.personalgenomes.org
• Looking for an additional large family
• Supporting interlaboratory analysis of potential commercial reference materials; new participants welcome
• COLO 829/COLO 829BL cancer/normal cell line
• Artificial structures as spike-ins for point mutations or more complex structural variants
• FFPE samples based on RM cell lines
[Figure: Point Mutation Control Plasmids (labels: Alien Barcode, Mutation of Interest), from M. Williams et al., Frederick National Laboratory for Cancer Research]
Measurements for Reference Material Characterization Group
• Initiated experiments to characterize the pilot RM (NA12878)
• Six institutions have signed Material Transfer Agreements with NIST for the pilot RM
• Other institutions are welcome to contact NIST to help with sequencing the PGP trio RMs
• Cell lines are also available from Coriell
[Figure: Vial of ~10 µg of NA12878 cell line genomic DNA]
Multiple measurement technologies and modes, e.g.:
• Illumina
• Life Technologies
• Complete Genomics
• Pacific Biosciences
• BioNano Genomics
Milestones
• Worked with Coriell to produce ~8300 vials of pilot RM from the NA12878 cell line; samples received by NIST in April 2013
• NCBI-led team developed the Genome in a Bottle FTP site containing curated data associated with the RMs (ftp://ftp-trace.ncbi.nih.gov/giab/ftp)
• NIST-led team prepared the data integration manuscript; preprint available on arXiv (http://arxiv.org/abs/1307.4661), in press in Nature Biotechnology
• Selected the next 2 families for genome RMs from the PGP collection; DNA samples at NIST in February 2014
• Decided on governance policies, including formation of a steering committee and a data release policy based on the Fort Lauderdale Principles, August 2013
• Pilot RM release planned for May 2014, with simultaneous release of pilot RM genotype calls
How you can get involved:
• Sequence and analyze the new Personal Genome Project trios
• Help with structural variant calls
• Help with analyzing data from long-read technologies
• Attend our biannual workshops (January in CA, August in MD)
• Help develop methods to measure performance using our well-characterized genomes
• Use our integrated SNP/indel genotypes for NA12878 and give us feedback
Bioinformatics, Data Integration, and Data Representation Group
• Developed data integration methods and genotype calls for NA12878
  • Multi-platform method
  • NIST-led team preprint on arXiv, accepted by Nature Biotechnology
  • Pedigree methods
    • Real Time Genomics (RTG)
    • Illumina Platinum Genomes
• Established data release and QC policies and an FTP site with curated data
[Figure: Real Time Genomics pedigree-based method, courtesy of Francisco De La Vega]
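Pedigree-based methods rely on the fact that, at an autosomal site, a child's genotype must be explainable by one allele inherited from each parent. As an illustration only (this is not the actual RTG or Illumina Platinum Genomes algorithm), here is a minimal Python sketch of a Mendelian consistency check on a trio, assuming diploid autosomal sites, VCF-style GT strings, and no missing calls. The function names and the example calls are hypothetical.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[int, int]  # unphased diploid genotype as allele indices
Site = Tuple[str, int]      # (chromosome, 1-based position)


def parse_gt(gt: str) -> Genotype:
    """Parse a VCF GT string such as '0/1' or '1|0' (assumes diploid, no '.' alleles)."""
    first, second = gt.replace("|", "/").split("/")
    return int(first), int(second)


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if one child allele can come from the mother and the other from the father."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def mendelian_violations(trio_calls: Dict[Site, Tuple[str, str, str]]) -> List[Site]:
    """Return the sites whose (child, mother, father) GT strings violate Mendelian inheritance."""
    violations = []
    for site, (child_gt, mother_gt, father_gt) in trio_calls.items():
        child, mother, father = (parse_gt(g) for g in (child_gt, mother_gt, father_gt))
        if not is_mendelian_consistent(child, mother, father):
            violations.append(site)
    return violations


# Hypothetical trio calls: (child, mother, father)
calls = {
    ("chr1", 1000): ("0/1", "0/0", "1/1"),
    ("chr1", 2000): ("1/1", "0/0", "0/1"),
    ("chr2", 500): ("0|1", "0/1", "0/1"),
}
print(mendelian_violations(calls))  # [('chr1', 2000)]
```

A violation flags either a genotyping error in one of the three samples or a rare de novo event; a real pedigree method also has to handle sex chromosomes, missing calls, and genotype likelihoods, which this sketch ignores.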
Performance Metrics Group
• Integration of software from partners:
  • GCAT (Genome Comparison & Analytic Testing) tool enables mapping and variant call comparisons in exomes; NIST analysis results are publicly available
  • NCBI/CDC GeT-RM browser supports visualization of variant calls and includes a NIST highly confident genotypes track
  • RTG vcfeval tool for comparing complex variants
  • VCFComparator
  • HSPH bcbio.variation tools
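The comparison tools listed above measure how well a call set agrees with a benchmark such as the NIST highly confident genotypes. As a hedged sketch of the basic idea only (not how GCAT, vcfeval, VCFComparator, or bcbio.variation are implemented), the Python below restricts both call sets to assumed BED-style high-confidence regions, matches variants on exact (chrom, pos, ref, alt), and reports true positives, false positives, false negatives, precision, and sensitivity. The `Variant` type, `compare_calls` function, and example data are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Tuple, Union


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based VCF position
    ref: str
    alt: str


Region = Tuple[str, int, int]  # BED-style: chrom, 0-based start, exclusive end


def in_regions(v: Variant, regions: Iterable[Region]) -> bool:
    # VCF position p corresponds to 0-based coordinate p - 1.
    return any(c == v.chrom and start <= v.pos - 1 < end for c, start, end in regions)


def compare_calls(truth: Iterable[Variant], query: Iterable[Variant],
                  regions: Iterable[Region]) -> Dict[str, Union[int, float]]:
    """Exact-match comparison of a query call set against a truth set within regions."""
    regions = list(regions)
    t = {v for v in truth if in_regions(v, regions)}
    q = {v for v in query if in_regions(v, regions)}
    tp = len(t & q)  # called and in the benchmark
    fp = len(q - t)  # called but not in the benchmark
    fn = len(t - q)  # in the benchmark but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": precision, "sensitivity": sensitivity}


truth = [Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
         Variant("chr1", 300, "G", "GA")]
query = [Variant("chr1", 100, "A", "G"), Variant("chr1", 300, "G", "GAA"),
         Variant("chr1", 400, "T", "C"), Variant("chr2", 50, "A", "C")]
regions = [("chr1", 0, 1000)]
print(compare_calls(truth, query, regions))  # TP=1, FP=2, FN=2; chr2 call is outside regions
```

Because matching is on exact alleles, two different representations of the same indel count as one false positive plus one false negative, and genotype (zygosity) is ignored; haplotype-aware tools such as RTG vcfeval are designed to handle those complex cases.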