Development & applications of a
segregation-phasing ground truth
GENOME-IN-A-BOTTLE WORKSHOP

Francisco M. De La Vega, D.Sc.
Visiting Scholar, Department of Genetics
Stanford University School of Medicine
In collaboration with Real Time Genomics, Inc.
Evaluating Variant Calls

O'Rawe, J. et al. Low concordance of multiple variant-calling pipelines: practical implications for exome and
genome sequencing. Genome Medicine 5, 28 (2013).
Beyond Venn Diagrams
Experimental validation (e.g. Sanger, qPCR)
 Expensive
 Limited by platform success
 Statistical sample
Reference orthogonal data available for some genomes
 SNP array data
 Sparse fosmid sequencing data
 Incomplete
Reference genomes sequenced by multiple platforms
 Arbitration methods (e.g. NIST, Genome-in-a-Bottle)
 Low FP, but unknown FN (genome-wide)
 Biases?
Mendelian segregation as “ground truth”
CEPH/Utah Pedigree 1463
Sequenced by CGI and Illumina (Platinum Genomes)
Started with 2x100bp 50X WGS Illumina Platinum data
 Aligned and variant-called with rtgVariant 1.1, then filtered by quality score (AVR ≥ 0.15) across the samples, excluding problematic sites

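[Figure: pedigree of the 17 sequenced family members. Grandparents: NA12889, NA12890, NA12891, NA12892. Parents: NA12877, NA12878. Offspring: NA12879, NA12880, NA12881, NA12882, NA12883, NA12884, NA12885, NA12886, NA12887, NA12888, NA12893.]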
Example: Heterozygous variant segregation

[Figure: the pedigree annotated with Trio Calling genotypes (0/0 or 0/1) for the genotyped members, showing a heterozygous variant segregating to some of the offspring.]
Segregation of heterozygous variants to offspring
[Figure: four histograms (SNV, All Variants, MNP, indel) of variant count against the number of offspring (1 to 11) to which a heterozygous variant segregates.]
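A minimal sketch of how the counts behind such histograms could be tallied: for each heterozygous parental variant, count the offspring whose genotype carries the variant allele, then histogram those counts. The function names and the example genotypes are illustrative assumptions, not part of the rtgVariant pipeline.

```python
from collections import Counter


def parse_gt(gt: str) -> tuple:
    """Parse a VCF-style genotype such as '0/1' or '0|1' into allele indices.

    Missing alleles ('.') are skipped.
    """
    return tuple(int(a) for a in gt.replace("|", "/").split("/") if a != ".")


def offspring_segregating(child_gts, allele: int = 1) -> int:
    """Count the offspring whose genotype carries the given allele."""
    return sum(allele in parse_gt(gt) for gt in child_gts)


def segregation_histogram(sites) -> Counter:
    """Map 'number of offspring carrying the allele' to the number of sites."""
    return Counter(offspring_segregating(child_gts) for child_gts in sites)


# Hypothetical genotypes of 11 offspring at two sites (illustration only).
sites = [
    ["0/0", "0/1", "0/1", "0/1", "0/1", "0/0", "0/1", "0/0", "0/1", "0/0", "0/0"],
    ["0/1", "0/1", "0/0", "0/0", "0/1", "0/1", "0/1", "0/0", "0/1", "0/0", "0/1"],
]
hist = segregation_histogram(sites)
print(sorted(hist.items()))
```

For a true heterozygous variant in one parent, each child inherits the allele with probability 1/2, so the counts are expected to centre around the middle of the 0 to 11 range.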
Steps for haplotype phasing in large family

Identify crossovers
Phase contiguity extension
Connect haplotype islands
Check calls vs haplotype framework
Phasing labels given parent and child genotypes
[Table: parental genotypes (father haplotypes fa/fb, mother haplotypes ma/mb) against child genotypes, giving the phasing labels (fa/ma, fa/mb, fb/ma, fb/mb) compatible with each child genotype. Genotypes shown include 0/0, 0/1, 1/1, 0/2, 0/3, 1/2, 1/3 and 2/3.]
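The labelling idea can be sketched in a few lines, assuming the first allele of the father's genotype sits on haplotype fa and the second on fb (likewise ma and mb for the mother). The function name `phasing_labels` is hypothetical; the labels follow the fa/fb, ma/mb notation of the slide.

```python
from itertools import product


def phasing_labels(father, mother, child):
    """List the labels (e.g. 'fa/mb') compatible with a child's genotype.

    father: (allele on fa, allele on fb)
    mother: (allele on ma, allele on mb)
    child:  unordered genotype, e.g. (0, 1)
    """
    paternal = {"fa": father[0], "fb": father[1]}
    maternal = {"ma": mother[0], "mb": mother[1]}
    target = sorted(child)
    return [
        f"{p_label}/{m_label}"
        for (p_label, p_allele), (m_label, m_allele)
        in product(paternal.items(), maternal.items())
        if sorted((p_allele, m_allele)) == target
    ]


print(phasing_labels((0, 1), (0, 1), (0, 1)))  # two compatible labels
print(phasing_labels((0, 1), (2, 3), (1, 3)))  # fully informative site
```

When both parents are heterozygous for different alleles, every child genotype maps to exactly one label, which is what makes such sites the most informative for phasing.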
Identification of recombination crossovers
[Figure panels: Chr 1, Mother; Chr 6, Mother]
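One simple way to locate crossovers, sketched under the assumption that each informative site has already been labelled with the parental haplotype ('a' or 'b') that a given child inherited: a crossover lies between two consecutive informative sites whose labels differ. The function name and input layout are illustrative, not the method used to produce the figures.

```python
def find_crossovers(positions, labels):
    """Return (left, right) intervals where the inherited haplotype switches.

    positions: site coordinates along one chromosome, in increasing order
    labels:    'a' or 'b' per site, or None where the site is uninformative
    """
    if len(positions) != len(labels):
        raise ValueError("positions and labels must have the same length")
    crossovers = []
    prev_pos = prev_label = None
    for pos, label in zip(positions, labels):
        if label is None:
            continue  # skip sites that do not reveal the inherited haplotype
        if prev_label is not None and label != prev_label:
            crossovers.append((prev_pos, pos))
        prev_pos, prev_label = pos, label
    return crossovers


print(find_crossovers([100, 200, 300, 400, 500], ["a", "a", None, "b", "b"]))
```

A label that flips at a single site and immediately flips back is more plausibly a genotyping error than a double crossover, so a practical version would probably require a minimum run of concordant sites on each side.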
Recombination crossovers statistics
Total: 686
[Figure: bar chart of crossover counts (y-axis 0 to 45) for chromosomes 1 to 22, with separate series for Father and Mother.]
Linking of phased regions
[Figure panels: Chr 1, Mother; Chr 6, Mother]
Testing for Phase Consistency

Example with 4 offspring
[Figure: for the father, the mother and offspring 1 to 4, the phasing labels (fa, fb, ma, mb), the called genotypes (0/0, 0/1, 1/1) and the candidate phasings tested for consistency.]
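A brute-force sketch of a phase-consistency test, assuming each child already carries a phasing label from the haplotype framework (for example 'fa/mb'): the calls at a site are consistent if some assignment of the parental alleles to fa, fb, ma and mb reproduces every child's genotype. The function names are hypothetical and this is not the rtgVariant implementation.

```python
from itertools import permutations


def _matches(haplotypes, label, genotype):
    """True if the alleles on the labelled haplotypes give this genotype."""
    paternal, maternal = label.split("/")
    return sorted((haplotypes[paternal], haplotypes[maternal])) == sorted(genotype)


def is_phase_consistent(father_gt, mother_gt, children):
    """Check a site against the inheritance pattern of the family.

    father_gt, mother_gt: unordered genotypes, e.g. (0, 1)
    children: list of (label, genotype), e.g. ("fa/mb", (0, 1))
    """
    # Try every way of placing each parent's two alleles on its haplotypes.
    for fa, fb in set(permutations(father_gt)):
        for ma, mb in set(permutations(mother_gt)):
            haplotypes = {"fa": fa, "fb": fb, "ma": ma, "mb": mb}
            if all(_matches(haplotypes, label, gt) for label, gt in children):
                return True
    return False


consistent = [("fa/ma", (0, 0)), ("fa/mb", (0, 1)), ("fb/ma", (0, 1)), ("fb/mb", (1, 1))]
print(is_phase_consistent((0, 1), (0, 1), consistent))
```

Two children with the same label must share the same genotype, so a single discordant call among them is enough to make the site inconsistent.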
Probability of a set of genotypes being phase-consistent
by chance
Given that there are d different genotypes across both the parents and children, and that the number of times each of these genotypes occurs is n_i and […], then the probability is:
[Equation not preserved in the slide text]

Cleary, J. G., et al. Joint variant and de novo mutation identification on pedigrees from high-throughput
sequencing data. bioRxiv (2014). doi:10.1101/001958
Probability of a set of genotypes being phase-consistent
by chance – some examples
Genotype Counts
0/0

0/1

1/1

0/2

1/2

13

Probability
1

13

3.01x10-1

6

7

1.01x10-2

1

12

1.11x10-1

1

11

1

1.36x10-2

4

4

5

5.53x10-4

3

3

3

4

6.13x10-5

1

3

3

12

3.68x10-1

1

5

6

1

2.75x10-4

1

11

13

1

7.46x10-2
Phasing consistent variants

Illumina 2x100 bp 50X WGS Data, RTG Trio Calls

                               Raw Call Set           AVR > 0.15
                               n             %        n             %
Phase consistent               5,224,138     77.35    4,606,574     99.28
Phase inconsistent             1,329,189     19.68       13,951      0.30
Repaired                         200,450      2.96       19,197      0.41
Calls inside phased segments   6,753,777     99.99    4,639,722     99.99

Y-chromosome excluded
Phasing consistent variants

Illumina 2x100 bp 50X WGS Data, BWA/GATK UG v1.7 Calls

                               Raw Call Set           VQSR 1st Tranche
                               n             %        n             %
Phase consistent                6,941,213    68.34    5,863,035     96.00
Phase inconsistent              2,263,975    22.29      184,169      3.01
Repaired                          951,682     9.36       59,592      0.97
Calls inside phased segments   10,156,870    99.53    6,106,796     99.98

Y-chromosome excluded
ROC curve: NA12878 vs Phased-Consistent
[Figure: ROC curves of True Positive (0 to 4,000,000) against False Positive (0 to 400,000) for four call sets: singleton, trio, trio-cohort, gatk.]
RTG sorted by AVR; GATK sorted by VQSLOD (1st tranche)
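A minimal sketch of how such a curve can be traced, assuming each call has a score (AVR or VQSLOD in the slides) and a flag saying whether it matches the phase-consistent truth set: sort by decreasing score and accumulate true-positive and false-positive counts as the threshold is lowered. The function name and the toy data are illustrative; this does not reproduce vcfeval.

```python
from itertools import groupby


def roc_points(calls):
    """Return cumulative (false_positives, true_positives) points.

    calls: iterable of (score, is_true) pairs. Calls sharing a score are
    added together, so each point corresponds to one score threshold.
    """
    ordered = sorted(calls, key=lambda call: call[0], reverse=True)
    points = [(0, 0)]
    tp = fp = 0
    for _, group in groupby(ordered, key=lambda call: call[0]):
        for _, is_true in group:
            if is_true:
                tp += 1
            else:
                fp += 1
        points.append((fp, tp))
    return points


# Toy data: (score, matches the truth set)
calls = [(0.9, True), (0.8, True), (0.8, False), (0.3, True), (0.1, False)]
print(roc_points(calls))
```

Plotting absolute counts rather than rates, as the slide does, lets call sets of different sizes be compared on the same axes.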
NIST GiaB arbitration vs Phase-Consistent
Confident regions
Genome-wide
Assessment of score recalibration models

rtgVariant v 1.1; NA12878

Assessment of MNP & indel calling (rtgVariant 1.0)
• In rtgVariant 1.0, longer insertions have more FPs than short insertions and deletions.
• More FPs in MNPs.
• Improvements in the aligner for v1.2.
[Figure: percentage of phase-inconsistent calls for Deletions, Insertions and SNV/MNPs, with a 0.5% value labeled; rtgVariant v 1.0; NA12878]
Summary & Perspectives
• Genetic segregation in a large family offers a unique
opportunity to identify “true” sets of variants
• Requires collecting data for whole family as new
chemistries and platforms become available (e.g.
2x250bp, Moleculo reads)
• Data from multiple platforms can be merged to create
a comprehensive phase-consistent ground truth
• Allows rational assessment of variant pipelines and
improvement of algorithms
• Some issues that need to be dealt with: cell line
artifacts, CNVs, systematic errors, SVs.
rtgTools v1.0
A toolkit to compare and analyze VCFs

• vcfeval – comparison of VCFs for ROC curves
• rocplot – draw ROC curves from vcfeval output
• mendelian – counts of Mendelian inheritance errors in pedigrees
• vcfstats – basic statistics of VCF files
• vcffilter – filtering of VCFs by scores, etc.
• vcfannotate – annotation of VCF files
• vcfmerge – merge VCF files

Java compiled code freely available at GiaB repository:
ftp://ftp-trace.ncbi.nih.gov/giab/ftp/tools/RTG/
http://biorxiv.org/content/early/2014/01/24/001958
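To make concrete what counting Mendelian inheritance errors means, here is a small sketch for autosomal diploid trios: a child's genotype is consistent when one allele can come from the father and the other from the mother. The function names are hypothetical and the code is an illustration of the concept, not the rtgTools implementation.

```python
def is_mendelian_consistent(father, mother, child):
    """True if the child's two alleles can be drawn one from each parent.

    All genotypes are unordered pairs of allele indices, e.g. (0, 1).
    Assumes an autosomal, diploid site with no missing calls.
    """
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def count_mendelian_errors(trios):
    """Count trios (father, mother, child) that violate Mendelian inheritance."""
    return sum(not is_mendelian_consistent(f, m, c) for f, m, c in trios)


trios = [
    ((0, 0), (0, 1), (0, 1)),  # consistent
    ((0, 0), (0, 0), (0, 1)),  # error: allele 1 is absent from both parents
    ((0, 1), (0, 1), (1, 1)),  # consistent
    ((0, 0), (1, 1), (0, 0)),  # error: the mother can only transmit allele 1
]
print(count_mendelian_errors(trios))
```

A trio check of this kind is weaker than the phase-consistency test described in the talk, since it looks at one child at a time and ignores the haplotype framework.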
Acknowledgements
RTG, Hamilton, New Zealand
 John Cleary
 Ross Braithwaite
 Len Trigg
RTG, San Bruno, CA
 Sahar Malakshah
 Minita Shah
Michael Eberle, Illumina, Inc. – Platinum Project data
Complete Genomics, Inc. – CEPH pedigree data
Justin Zook – NIST
Data and tools to compare with phased standard released publicly at NIST
Genome-in-a-Bottle repository (s3://giab)
This work was done while the presenter was employed by Real Time Genomics Inc.,
San Bruno, CA.
© 2014 Real Time Genomics, Inc. All rights reserved.

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 

140127 rtg phased pedigree analyses

  • 1. Development & applications of a segregation-phasing ground truth. Genome-in-a-Bottle Workshop. Francisco M. De La Vega, D.Sc., Visiting Scholar, Department of Genetics, Stanford University School of Medicine. In collaboration with Real Time Genomics, Inc.
  • 2. Evaluating Variant Calls. O'Rawe, J. et al. Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Medicine 5, 28 (2013).
  • 3. Beyond Venn Diagrams. Experimental validation (e.g. Sanger, qPCR): expensive; limited by platform success; statistical sample. Reference orthogonal data available for some genomes: SNP array data; sparse fosmid sequencing data; incomplete. Reference genomes sequenced by multiple platforms: arbitration methods (e.g. NIST, Genome-in-a-Bottle); low FP, but unknown FN (genome-wide); biases?
  • 4. Mendelian segregation as “ground truth”
  • 5. CEPH/Utah Pedigree 1463. Sequenced by CGI and Illumina (Platinum Genomes). Started with 2x100bp 50X WGS Illumina Platinum data: aligned and variant called with rtgVariant 1.1, filtered by quality score (AVR≥0.15) across the samples, excluding problematic sites. [Figure: pedigree diagram with members NA12889, NA12890, NA12891, NA12892, NA12877, NA12878, NA12879, NA12880, NA12881, NA12882, NA12883, NA12884, NA12885, NA12886, NA12887, NA12888, NA12893.]
  • 6. Example: Heterozygous variant segregation. [Figure: pedigree 1463 (NA12877 to NA12893) annotated with trio-calling genotypes, 0/0 or 0/1, per member.]
  • 7. Segregation of heterozygous variants to offspring. [Figure: four histograms (All Variants, SNV, MNP, indel) of variant count versus number of offspring segregating, 1 to 11.]
  • 8. Steps for haplotype phasing in a large family: identify crossovers; phase contiguity extension; connect haplotype islands; check calls vs haplotype framework.
  • 9. Phasing labels given parent and child genotypes Parents Children fa/fb ma/mb 0/0 0/1 fa/mb fb/ma fb/mb 0/0 0/1 1/1 fa/ma 0/1 0/1 fa/ma 0/1 0/0 fb/ma fb/mb fa/mb 0/0 2/3 fa/mb fb/mb 0/1 0/2 1/1 1/2 fa/ma 0/1 0/2 fb/ma 1/2 0/1 fa/ma 0/1 1/2 fa/mb fb/ma fb/mb 0/2 0/3 1/2 1/3 fa/ma fa/mb fb/ma fb/mb
  • 10. Identification of recombination crossovers. [Figure panels: Chr 1, Mother; Chr 6, Mother.]
  • 11. Recombination crossovers statistics. Total: 686. [Figure: bar chart of crossover counts per chromosome (1 to 22) for father and mother; y-axis 0 to 45.]
  • 12. Linking of phased regions. [Figure panels: Chr 1, Mother; Chr 6, Mother.]
  • 13. Testing for Phase Consistency Example with 4 offspring Father Phasing Labels fa Phasings Genotypes fb ma 0/1 0 0 1 1 Genotypes Phasings Mother mb Offspring 2 Offspring 3 Offspring 4 fa fa fb fb 0/1 1 1 0 0 0 1 0 1 0/0 0 0 Offspring 1 0/1 1 0 1 0 0 0 1 1 0/1 0 0 0 1 ma 0/0 0 1 0 1 0 0 1 1 0/0 1 0 0 0 mb 1/1 1 0 1 0 1 1 0 0 0/1 0 1 0 0 ma 0/1 0 1 0 1 1 1 0 0 0/0 1 0 0 0 mb 1 0 1 0 0/1 0 1 0 0 1 0
  • 14. Probability of a set of genotypes being phase-consistent by chance. Given that there are d different genotypes across both the parents and children, and that the number of times each of these genotypes occurs is n_i [...], the probability is: [equation not preserved in this transcript]. Source: Cleary, J. G., et al. Joint variant and de novo mutation identification on pedigrees from high-throughput sequencing data. bioRxiv (2014). doi:10.1101/001958
  • 15. Probability of a set of genotypes being phase-consistent by chance – some examples Genotype Counts 0/0 0/1 1/1 0/2 1/2 13 Probability 1 13 3.01x10^-1 6 7 1.01x10^-2 1 12 1.11x10^-1 1 11 1 1.36x10^-2 4 4 5 5.53x10^-4 3 3 3 4 6.13x10^-5 1 3 3 12 3.68x10^-1 1 5 6 1 2.75x10^-4 1 11 13 1 7.46x10^-2
  • 16. Phasing consistent variants. Illumina 2x100 bp 50X WGS data, RTG trio calls. Y-chromosome excluded.
                                    Raw Call Set         AVR >0.15
                                             n      %             n      %
      Phase consistent               5,224,138  77.35     4,606,574  99.28
      Phase inconsistent             1,329,189  19.68        13,951   0.30
      Repaired                         200,450   2.96        19,197   0.41
      Calls inside phased segments   6,753,777  99.99     4,639,722  99.99
  • 17. Phasing consistent variants. Illumina 2x100 bp 50X WGS data, BWA/GATK UG v1.7 calls. Y-chromosome excluded.
                                    Raw Call Set         VQSR 1st Tranche
                                             n      %             n      %
      Phase consistent               6,941,213  68.34     5,863,035  96.00
      Phase inconsistent             2,263,975  22.29       184,169   3.01
      Repaired                         951,682   9.36        59,592   0.97
      Calls inside phased segments  10,156,870  99.53     6,106,796  99.98
  • 18. ROC curve: NA12878 vs Phased-Consistent. RTG sorted by AVR; GATK sorted by VQSLOD (1st tranche). [Figure: true positives (0 to 4,000,000) versus false positives (0 to 400,000) for four call sets: singleton, trio, trio-cohort, gatk.]
  • 19. NIST GiaB arbitration vs Phase-Consistent. [Figure panels: Confident regions; Genome-wide.]
  • 20. Assessment of score recalibration models. rtgVariant v 1.1; NA12878.
  • 21. Assessment of MNP & indel calling (rtgVariant 1.0). In rtgVariant 1.0, longer insertions have higher FP than small insertions and deletions; more FP in MNPs; improvements in aligner for v1.2. [Figure panels: Deletions; Insertions; SNV/MNPs. Y-axis: percentage of phase-inconsistent calls. rtgVariant v 1.0; NA12878.]
  • 22. Summary & Perspectives. Genetic segregation in a large family offers a unique opportunity to identify “true” sets of variants. Requires collecting data for the whole family as new chemistries and platforms become available (e.g. 2x250bp, Moleculo reads). Data from multiple platforms can be merged to create a comprehensive phase-consistent ground truth. Allows rational assessment of variant pipelines and improvement of algorithms. Some issues that need to be dealt with: cell line artifacts, CNVs, systematic errors, SVs.
  • 23. rtgTools v1.0. A toolkit to compare and analyze VCFs: vcfeval (comparison of VCFs for ROC curves); rocplot (draw ROC curves from vcfeval output); mendelian (counts of Mendelian inheritance errors in pedigrees); vcfstats (basic statistics of VCF files); vcffilter (filtering of VCFs by scores, etc.); vcfannotate (annotation of VCF files); vcfmerge (merge VCF files). Java compiled code freely available at GiaB repository: ftp://ftp-trace.ncbi.nih.gov/giab/ftp/tools/RTG/
  • 25. Acknowledgements. RTG, Hamilton, New Zealand: John Cleary, Ross Braithwaite, Len Trigg. RTG, San Bruno, CA: Sahar Malakshah, Minita Shah. Michael Eberle, Illumina, Inc. (Platinum Project data). Complete Genomics, Inc. (CEPH pedigree data). Justin Zook, NIST. Data and tools to compare with phased standard released publicly at NIST Genome-in-a-Bottle repository (s3://giab). This work was done while the presenter was employed by Real Time Genomics Inc., San Bruno, CA. © 2014 Real Time Genomics, Inc. All rights reserved.
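
Slide 7 counts how many of the 11 offspring carry each heterozygous parental variant. As a rough point of comparison, the sketch below computes the distribution one would expect under a simple assumption that is not stated in the slides: a variant heterozygous in exactly one parent is transmitted to each child independently with probability 1/2. The function name `segregation_distribution` is hypothetical.

```python
from math import comb

def segregation_distribution(n_offspring=11, p_transmit=0.5):
    """Return P(k offspring carry the variant) for k = 0..n_offspring.

    Assumes independent transmission to each child (binomial model);
    this is an illustrative assumption, not the analysis from the talk.
    """
    return [
        comb(n_offspring, k) * p_transmit**k * (1 - p_transmit) ** (n_offspring - k)
        for k in range(n_offspring + 1)
    ]

dist = segregation_distribution()
for k, p in enumerate(dist):
    print(f"{k:2d} offspring: {p:.4f}")
```

Under this model the distribution is symmetric around 5 to 6 offspring, so a strong excess of variants seen in only one child would hint at calling errors rather than genuine segregation.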
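
The phase-consistency test of slide 13 can be illustrated with a brute-force sketch. It assumes each child already has a phasing label such as `fa/mb` (which paternal and maternal haplotype it inherited, as on slide 9) and asks whether some assignment of the parents' alleles to the four haplotypes fa, fb, ma, mb reproduces every child's genotype. The function names and the toy family are hypothetical, and this is not the RTG implementation.

```python
from itertools import permutations

def parse_genotype(gt):
    """Turn a VCF-style genotype such as '0/1' into a sorted tuple of allele indices."""
    return tuple(sorted(int(a) for a in gt.split("/")))

def is_phase_consistent(father_gt, mother_gt, children):
    """children: list of (phasing_label, genotype), e.g. ('fa/mb', '0/1').

    Tries every way of placing each parent's two alleles on its two
    haplotypes and checks whether all children match their labels.
    """
    father = parse_genotype(father_gt)
    mother = parse_genotype(mother_gt)
    for fa, fb in set(permutations(father)):
        for ma, mb in set(permutations(mother)):
            allele = {"fa": fa, "fb": fb, "ma": ma, "mb": mb}
            if all(
                tuple(sorted(allele[hap] for hap in label.split("/")))
                == parse_genotype(gt)
                for label, gt in children
            ):
                return True
    return False

# Toy family: father 0/1, mother 0/0.
ok = is_phase_consistent(
    "0/1", "0/0",
    [("fa/ma", "0/1"), ("fb/ma", "0/0"), ("fa/mb", "0/1"), ("fb/mb", "0/0")],
)
# Two children share haplotype fa but disagree on the allele it carries.
bad = is_phase_consistent("0/1", "0/0", [("fa/ma", "0/1"), ("fa/mb", "0/0")])
print(ok, bad)
```

With only two alleles per parent the search space is at most four assignments, so exhaustive enumeration is adequate for a sketch of this kind.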
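
The ROC curves of slide 18 rank calls by a score (AVR for RTG, VQSLOD for GATK). A minimal sketch of how such a curve might be accumulated follows, assuming that phase-consistent calls count as true positives and phase-inconsistent calls as false positives; that labelling is an interpretation of the slide, and `roc_points` and the toy data are hypothetical.

```python
def roc_points(calls):
    """calls: iterable of (score, is_phase_consistent).

    Returns cumulative (false_positives, true_positives) points,
    walking the calls from highest to lowest score.
    """
    points = []
    tp = fp = 0
    for _score, consistent in sorted(calls, key=lambda c: c[0], reverse=True):
        if consistent:
            tp += 1
        else:
            fp += 1
        points.append((fp, tp))
    return points

# Toy call set: (score, phase-consistent?)
toy_calls = [
    (0.95, True), (0.90, True), (0.40, False),
    (0.80, True), (0.10, False), (0.30, True),
]
points = roc_points(toy_calls)
print(points)
```

Plotting the second coordinate against the first gives a curve in absolute counts, the same axes as the slide, rather than in rates.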

Editor's notes

  1. The lengths of the female and male genetic maps are 1,817 cM and 1,386 cM, respectively.