2. NIST RM Development Plans
Genome(s): milestones by quarter, Q4 2014 to Q4 2015
HG-001/NA12878
• Q4 2014: Release NIST RM8398; preliminary large deletions
• After Q4 2014: Refined structural variants
HG-002 to HG-004 (Ashkenazim trio)
• Q4 2014: Illumina, Complete Genomics, Ion, BioNano, and SOLiD data
• Q1 2015: Preliminary SNPs/indels; 100x PacBio data; Illumina assembled long reads
• Q2 2015: Refined SNPs/indels; preliminary SVs
• Q3 2015: Refined structural variants
• Q4 2015: NIST RMs 8391/8392 release
HG-005 (son in Asian trio)
• Q4 2014: Illumina, Complete Genomics, Ion, BioNano, and SOLiD data
• Q1 2015: Illumina assembled long reads
• Q2 2015: Preliminary SNPs/indels
• Q3 2015: Refined SNPs/indels; refined structural variants
• Q4 2015: NIST RM8393 release
3. Preliminary uses of high-confidence NIST-GIAB genotypes for NA12878
• NIST has released several versions of high-confidence genotypes for its pilot RM
• These data are presently being used for benchmarking
  - prior to release of RMs
  - SNPs & indels
• ~77% of the genome
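To make the benchmarking use concrete, here is a minimal sketch of comparing a query callset against high-confidence calls, counting only sites inside the high-confidence regions. The function names, the tuple representation of a variant, and the exact-match comparison are all assumptions made for illustration; real comparisons also have to reconcile different representations of the same variant, which this sketch ignores.

```python
def in_regions(pos, intervals):
    """True if pos falls in any half-open (start, end) interval."""
    return any(start <= pos < end for start, end in intervals)


def benchmark(truth, query, regions):
    """Compare two callsets inside high-confidence regions (illustrative only).

    truth, query: sets of (chrom, pos, ref, alt) tuples (assumed representation).
    regions: dict mapping chrom -> list of half-open (start, end) intervals.
    """
    def restrict(calls):
        # Keep only calls that fall inside the high-confidence regions.
        return {v for v in calls if in_regions(v[1], regions.get(v[0], []))}

    t = restrict(truth)
    q = restrict(query)
    tp = len(t & q)   # called in both sets
    fp = len(q - t)   # called only in the query set
    fn = len(t - q)   # called only in the high-confidence set
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    precision = tp / (tp + fp) if (tp + fp) else None
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}


if __name__ == "__main__":
    # Toy data, invented for this example.
    regions = {"1": [(50, 300)]}
    truth = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("1", 900, "G", "A")}
    query = {("1", 100, "A", "G"), ("1", 250, "T", "C"), ("1", 950, "A", "T")}
    print(benchmark(truth, query, regions))
```

Calls outside the regions are dropped from both sets before counting, so a query call outside the high-confidence portion of the genome is neither rewarded nor penalized.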
4. Data Release Plans
Individual Datasets
• Uploaded to GIAB FTP site as they are collected
• May include raw reads, aligned reads, and variant/reference calls
Integrated High-confidence Calls
• First develop SNP, indel, and homozygous reference calls
• Then develop SV and non-SV calls
• Released calls are versioned
• Preliminary callsets will be made available to be critiqued
• Data jamboree??
5. Pilot RM (NA12878)
• HapMap/1000 Genomes sample
• Lots of public data and analyses
• Not consented for commercial redistribution
• Data from pedigree available and analyzed
• ~8000 units for NIST RM
• High-confidence calls released
  - integrates multiple datasets and phased pedigree analysis
• Developing SV calls
• Planned release as NIST RM8398 in Q4 2014
6. Ashkenazim PGP trio
• Personal Genome Project trio (huAA53E0/hu8E87A9/hu6E4515)
• Father/mother/son at Coriell (GM24143/GM24149/GM24385)
• Consented for commercial redistribution
• Most short-read data will be available Q3 2014
• 100x PacBio WGS completed ~Q1 2015
• 10x Illumina assembled long reads for son ~Q1 2015
• Planned NIST RM release ~Q4 2015
  - NIST RM 8391 will be only the son (~8000 units)
  - NIST RM 8392 will contain all 3 family members (~2500 units)
7. Asian PGP trio
• Personal Genome Project trio (hu91BD69/hu38168C/huCA017E)
• Father/mother/son at Coriell (GM24695/GM24694/GM24631)
• Only the son planned for NIST RM but trio will be characterized
• Consented for commercial redistribution
• Most short-read data will be available Q3-Q4 2014
• 10x Illumina assembled long reads for son ~Q1 2015
• Planned NIST RM release ~Q4 2015
  - NIST RM 8393 will be only the son (~11000 units)
8. New Platform-specific (-independent?) Integration Method
Normalize and take union of calls
Simple SNPs/indels
• Illumina/SOLiD: GATK HC force calls
• Ion: TVC force calls
  - If all biased or low qual, uncertain
  - Elseif all concordant, high-conf
  - Elseif all unbiased are concordant, high-conf
  - Else uncertain
• CG: use Ref file
Complex Variants
• Use vcfeval or SMASH for sequential pair-wise comparison
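The if/elseif chain for simple SNPs/indels can be sketched in code as follows. This is only an illustration of the decision order on the slide, not the actual integration pipeline: the `PlatformCall` record, its `biased` and `low_qual` flags, and the reading of "unbiased" as "not flagged as biased" are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PlatformCall:
    """One platform's forced call at a site (hypothetical record type)."""
    platform: str    # e.g. "Illumina", "SOLiD", "Ion"
    genotype: str    # e.g. "0/1"
    biased: bool     # assumed flag: evidence of systematic bias
    low_qual: bool   # assumed flag: low-quality call


def classify_site(calls: List[PlatformCall]) -> Tuple[str, Optional[str]]:
    """Return (status, genotype) following the slide's decision order."""
    # If all biased or low qual, uncertain (also covers a site with no calls).
    if all(c.biased or c.low_qual for c in calls):
        return ("uncertain", None)
    # Elseif all concordant, high-conf.
    genotypes = {c.genotype for c in calls}
    if len(genotypes) == 1:
        return ("high-conf", next(iter(genotypes)))
    # Elseif all unbiased are concordant, high-conf.
    unbiased = {c.genotype for c in calls if not c.biased}
    if len(unbiased) == 1:
        return ("high-conf", next(iter(unbiased)))
    # Else uncertain.
    return ("uncertain", None)


if __name__ == "__main__":
    # Toy calls, invented for this example.
    site = [
        PlatformCall("Illumina", "0/1", biased=False, low_qual=False),
        PlatformCall("SOLiD", "0/1", biased=False, low_qual=False),
        PlatformCall("Ion", "1/1", biased=True, low_qual=False),
    ]
    print(classify_site(site))
```

In the toy site, the platforms disagree overall, but the only dissenting call is flagged as biased, so the third rule applies and the site is reported as high-confidence with the genotype shared by the unbiased calls.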
9. Integration Method Plans
• Implement new integration methods on the cloud
  - Easier for…
    • distributed analysis
    • scalability
    • transparency
    • others to reproduce results
• First, analyze NA12878 RM data with new methods to ensure they work well
• Then, apply to PGP trios