ASHG 2015 Genome in a bottle
1. Genome in a Bottle: You’ve
sequenced. How well did you do?
October 9, 2015
Justin Zook, Marc Salit, and the
Genome in a Bottle Consortium
*Nothing to Disclose
4. Genome in a Bottle Consortium (GIAB)
Hosted by US National Institute of Standards and Technology
Goal: Provide infrastructure to assess
confidence in human variant calls
• Appropriately consented, widely
available DNA samples, distributed by
the Coriell Institute
– Also, QCed Reference Material (RM)
versions from controlled lots will be
available from NIST
– Also, PGP samples are commercially
available
• High-accuracy reference data for these
samples
• Tools to facilitate their use
– With the Global Alliance Data Working
Group Benchmarking Team
Global Alliance for Genomics and Health
ga4gh.org
Genome in a Bottle
genomeinabottle.org
5. GIAB Selected Samples
CEPH/Utah Pedigree 1463
– NA12889, NA12890, NA12891, NA12892
– NA12877, NA12878
– NA12879, NA12880, NA12881, NA12882, NA12883, NA12884, NA12885, NA12886, NA12887, NA12888, NA12893
– NA12878 is available as NIST RM8398
Ashkenazi Jewish Trio (new; Personal Genome Project)
– NA24149, NA24143, NA24385
Asian (Han Chinese) Trio (new; Personal Genome Project)
– NA24694, NA24695, NA24631
Note: Illumina and RTG have used data from the pedigree
to improve variant calls in the specific GIAB samples.
6. NGS Validation Process using Genomes in Bottles
Workflow: Sample → gDNA isolation → Library Prep → Sequencing → Alignment/Mapping → Variant Calling → Confidence Estimates → Downstream Analysis
Diagram labels: Pre-Analytical Process, Analytical Process, Genome in a Bottle Scope, Clinical Interpretation, GIAB Data
8. Integrated 14 datasets from 5 platforms
to establish Reference SNP/indel Calls for
NA12878
Zook et al., Nature Biotechnology, 2014.
~77 % High-confidence
~23 % Uncertain
9. Uses of GIAB NA12878
Oncology – Molecular and Cellular Tumor Markers
“Next Generation” Sequencing (NGS) guidelines for
somatic genetic variant detection
www.bioplanet.com/gcat
10. GeT-RM Browser from NCBI and CDC
• http://www.ncbi.nlm.nih.gov/variation/tools/get-rm/
11. Global Alliance for Genomics and Health
Benchmarking Task Team
• Developed standardized definitions for performance metrics like TP, FP, and FN.
• Developing sophisticated benchmarking tools
– vcfeval – Len Trigg
– hap.py – Peter Krusche
– vgraph – Kevin Jacobs
• Standardized bed files with difficult genome contexts for stratification
Credit: GA4GH, Abby Beeler, Ellie Wood
Stratification of FP Rates
Higher FP rates at Tandem Repeats
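A minimal sketch of what stratified benchmarking involves, assuming each compared call has already been labeled TP, FP, or FN and assigned to one genome context. The stratum names and counts below are hypothetical toy values, and this is not the implementation used by vcfeval, hap.py, or vgraph.

```python
from collections import Counter


def stratified_metrics(calls):
    """Compute precision and recall per stratum.

    calls: iterable of (stratum, outcome) pairs, outcome in {"TP", "FP", "FN"}.
    """
    counts = {}
    for stratum, outcome in calls:
        counts.setdefault(stratum, Counter())[outcome] += 1

    results = {}
    for stratum, c in counts.items():
        tp, fp, fn = c["TP"], c["FP"], c["FN"]
        # Precision: fraction of reported calls that match the benchmark.
        precision = tp / (tp + fp) if tp + fp else float("nan")
        # Recall (sensitivity): fraction of benchmark calls that were found.
        recall = tp / (tp + fn) if tp + fn else float("nan")
        results[stratum] = {"TP": tp, "FP": fp, "FN": fn,
                            "precision": precision, "recall": recall}
    return results


# Hypothetical toy data: one difficult context and everything else.
example = (
    [("tandem_repeat", "TP")] * 80
    + [("tandem_repeat", "FP")] * 20
    + [("tandem_repeat", "FN")] * 20
    + [("other", "TP")] * 990
    + [("other", "FP")] * 10
    + [("other", "FN")] * 10
)
metrics = stratified_metrics(example)
for name, m in sorted(metrics.items()):
    print(name, round(m["precision"], 3), round(m["recall"], 3))
```

Reporting the strata side by side makes it visible when a pipeline that looks accurate overall performs much worse in one context.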
14. GIAB Analysis Group – New Data Sets
Leaders
• Francisco de la Vega
• Chris Mason
• Tina Graves
• Valerie Schneider
• Justin Zook
• Marc Salit
Status
• Analysis Group Responsibilities:
– https://docs.google.com/document/d/10eA0DwB4iYTSFM_LPO9_2LyyN2xEqH49OXHhtNH1uzw/edit?usp=sharing
• Analysis Milestones:
– https://docs.google.com/spreadsheets/d/1Pj4nSzH742g40wJz2fA6f8kFtZYAToZpSZYVPiC5st4/edit?usp=sharing
• Analysis Methods:
– https://docs.google.com/spreadsheets/d/1Je2g85H7oK6kMXbBOoqQ1FMNrvGnFuUJTJn7deyYiS8/edit?usp=sharing
• Analysis Plan:
– https://drive.google.com/file/d/0B7Ao1qqJJDHQdnVEaVdqbWdEdkE/view?usp=sharing
• Collecting data and analyses on GIAB FTP site
• Recruiting people to help with the work.
Goal: Establish and distribute a set of authoritative benchmark variant calls of all
types and sizes, as well as homozygous reference regions, on GIAB PGP trios
15. Analysis Progress: AJ Trio
• SNPs/indels
– NIST working on integration
– 10X/moleculo/PacBio for difficult-to-map regions
• Assembly
– 2 de novo assemblies
– Useful for SV calling
• Structural variants
– Candidate calls being generated by 15+ groups with >20
different algorithms and 6 datasets
– 3+ integration methods
• Long-range Phasing
– 2 phased calls so far (CG LFR and 10X)
– Integration methods needed
• Other analyses
– CpG methylation with PacBio and Illumina
16. GIAB AJ Trio PacBio-only Assemblies
PacBio Only
Input | Algorithm | # of Contigs | N50 | Max | Total
Child | MHAP/Celera (Phillippy Lab) | 13,048 | 4.5Mb | 35.1Mb | 3.0Gb
Child | Daligner/Falcon (Chin/Bashir) | 9,973 | 7.1Mb | 39.2Mb | 3.0Gb
Mother | MHAP/Celera (Phillippy Lab) | 23,493 | 1.03Mb | 8.9Mb | 3.0Gb
Father | MHAP/Celera (Phillippy Lab) | 16,326 | 0.91Mb | 9.8Mb | 3.0Gb
Merged Trio | Daligner/Falcon (Chin/Bashir) | 5,680 | 9.25Mb | 50.3Mb | 2.9Gb
Credits: Ali Bashir, Jason Chin, Adam Phillippy, and Serge Koren
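For readers unfamiliar with the N50 statistic reported for these assemblies, here is a minimal sketch of the usual definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name and the toy lengths are illustrative only and are not taken from the GIAB assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running sum to half the total
        # assembly length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Toy example: total length 54, half is 27, and 10 + 9 + 8 reaches 27.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))  # prints 8
```

A larger N50 with a similar total length indicates that the same sequence is captured in fewer, longer pieces.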
17. GIAB AJ Trio Hybrid PacBio/BioNano Assembly
Hybrid (PacBio with BioNano)
Input | Assembly Notes | # of Scaffolds | N50 | Max | Total
HG002 | Falcon | 248 | 22.7Mb | 92.8Mb | 2.38Gb
Trio | Falcon | 210 | 29.3Mb | 87.6Mb | 2.32Gb
Two Step Trio | Celera (child) + Falcon (trio) | 187 | 34.3Mb | 98.0Mb | 2.6Gb
Credits: Ali Bashir, Jason Chin, Alex Hastie
Pendleton et al, Nature Methods, 2015
18. Proposed approach to form high-confidence SV (and non-SV) calls
• August 30, 2015: Generate candidate calls
• Nov 1, 2015: Compare/evaluate calls using Parliament/MetaSV/svclassify/others?; manual inspection
• Jan 1, 2016: Integrate new and revised calls; manual inspection
• Jan 26, 2016 and beyond: Combine integrated calls; manual inspection; targeted experimental validation?
19. Very Preliminary Confirmation of SVs
Integration results from AJ son
Parliament: BMC Genomics, 2015, 16:286 (performed by Andrew Carroll, DNAnexus)
MetaSV: Bioinformatics, 2015, 31:2741 (performed by Marghoob Mohiyuddin, Bina/Roche)
• Parliament
– Candidates from Illumina
– Confirmed by PacBio and/or
Illumina
– ~50% in both technologies
– ~4.5k deletions, 1k insertions
– 85% of Genotypes consistent
within Trio
• MetaSV
– Multiple types of evidence
from Illumina
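The "genotypes consistent within trio" figure in the Parliament bullets above is a Mendelian consistency check. The sketch below shows one simple way such a check can be written, assuming diploid autosomal genotypes encoded as pairs of allele codes. It is an illustration only, not the method Parliament or the GIAB analysis group used, and the genotypes are made up.

```python
def is_mendelian_consistent(child, father, mother):
    """True if the child can inherit one allele from each parent.

    Each genotype is a pair of allele codes, e.g. (0, 1) for a heterozygote.
    Assumes a diploid autosomal site with no de novo mutation.
    """
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def fraction_consistent(trio_genotypes):
    """Fraction of (child, father, mother) genotype triples that are consistent."""
    checked = [is_mendelian_consistent(c, f, m) for c, f, m in trio_genotypes]
    return sum(checked) / len(checked)


# Hypothetical genotypes at three sites: (child, father, mother).
sites = [
    ((0, 1), (0, 0), (1, 1)),  # consistent
    ((1, 1), (0, 0), (0, 1)),  # inconsistent: father has no alternate allele
    ((0, 1), (0, 1), (0, 0)),  # consistent
]
print(fraction_consistent(sites))
```

Inconsistent sites can reflect genotyping errors in any of the three samples, so this fraction is a sanity check, not a direct accuracy measure.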
Overlap between call sets (50 % reciprocal overlap)
– MetaSV total: 2809; 569 (20 %) overlap Parliament, 2240 (80 %) MetaSV only
– Parliament total: 5467; 977 (18 %) overlap MetaSV, 4490 (82 %) Parliament only
– Some overlap within Parliament calls
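A minimal sketch of the 50 % reciprocal overlap criterion, assuming two calls of the same SV type on the same chromosome with half-open coordinates. The function and parameter names are illustrative, and the real comparison may have applied additional rules.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end, min_fraction=0.5):
    """True if the intervals overlap by at least min_fraction of EACH interval.

    Coordinates are half-open: [start, end).
    """
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return False
    return (overlap >= min_fraction * (a_end - a_start)
            and overlap >= min_fraction * (b_end - b_start))


# Two 100 bp calls sharing 50 bp meet the 50 % threshold in both directions.
print(reciprocal_overlap(100, 200, 150, 250))   # True
# A small call inside a much larger one fails: it covers only 20 % of the
# larger call.
print(reciprocal_overlap(100, 1100, 100, 300))  # False
```

Requiring the overlap in both directions keeps a very large call from counting as a match to every small call it happens to contain.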
20. New GIAB GitHub Site
github.com/genome-in-a-bottle
Credit: Chunlin Xiao, NCBI
21. WARNINGS
• Easiest to benchmark only within the high-confidence bed file
• Benchmark calls/regions tend to be biased
towards easier variants and regions
– Some clinical tests are enriched for difficult sites
• Always manually inspect a subset of FPs/FNs
• Stratification by variant type and region is
important
• Always calculate confidence intervals
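As one way to act on the last warning, the sketch below computes a Wilson score interval for a proportion such as sensitivity (TP out of TP + FN). The choice of the Wilson method and the counts in the example are assumptions for illustration. The slides do not say which interval to use.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95 %)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / trials + z * z / (4 * trials * trials)
    ) / denom
    # Clamp to [0, 1] to guard against floating-point rounding at the edges.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical stratum: 95 of 100 benchmark variants detected.
low, high = wilson_interval(95, 100)
print(f"sensitivity 0.95, 95% CI {low:.3f} to {high:.3f}")
```

With only 100 variants in a stratum the interval spans roughly nine percentage points, which is why small strata need cautious interpretation.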
22. Acknowledgments
• FDA – Elizabeth
Mansfield, Computing
staff
• Many members of
Genome in a Bottle
– New members
welcome!
– Sign up on website for
email newsletters
Steering Committee
– Marc Salit
– Justin Zook
– David Mittelman
– Andrew Grupe
– Michael Eberle
– Steve Sherry
– Deanna Church
– Francisco De La Vega
– Christian Olsen
– Monica Basehore
– Lisa Kalman
– Christopher Mason
– Elizabeth Mansfield
– Liz Kerrigan
– Leming Shi
– Melvin Limson
– Alexander Wait Zaranek
– Nils Homer
– Fiona Hyland
– Steve Lincoln
– Don Baldwin
– Robyn Temple-Smolkin
– Chunlin Xiao
– Kara Norman
– Luke Hickey
23. For More Information
www.genomeinabottle.org - sign up for general GIAB and Analysis
Team google group emails
github.com/genome-in-a-bottle – Guide to GIAB data & ftp
www.slideshare.net/genomeinabottle
www.ncbi.nlm.nih.gov/variation/tools/get-rm/ - Get-RM Browser
Data: http://biorxiv.org/content/early/2015/09/15/026468
Global Alliance Benchmarking Team
– ga4gh.org/#/benchmarking-team
Twice yearly workshop
– Winter: January 28-29, 2016 at Stanford University, California, USA
– Summer at NIST, Maryland, USA
Public Meetings!
Justin Zook: jzook@nist.gov
Marc Salit: salit@nist.gov
Contribute calls or
critically evaluate
GIAB calls!
NIST/NRC Postdoc
Opportunities available!