TOWARDS PRECISION MEDICINE:
a cloud-based application for analysis of personal genomes
Reid J. Robison, MD MBA
December 6th, 2013
THE FALLING COST
of sequencing the human genome
$3 BILLION → $2000
THE SEQUENCING EXPLOSION
PACE OF DISCOVERY
of novel rare-disease-causing genes using whole-exome sequencing
(Chart: novel rare-disease-causing genes discovered by whole-exome sequencing, 2009 to 2012)
Boycott et al. Rare-disease genetics in the era of next-generation sequencing: discovery to translation. Nature Reviews Genetics 14, 681–691 (2013)
RATE OF APPROVAL
of rare disease drug products
(Chart: rare disease drug product approvals, 2000 to 2030)
Extrapolation from January 2013 Orphanet Report Series
“We are on the tipping point of a whole
new game in how we develop drugs.”
Janet Woodcock, M.D.
Director, Center for Drug Evaluation and Research, FDA
GENE-FINDING
Patient w/ unknown disease → Next-gen sequencing → Data processing → Variant calling
→ 3 million SNVs, 0.5 million indels, 1000 SVs
→ ?
STEPWISE REDUCTION
ANNOVAR: 450 citations, >40,000 downloads
Wang K et al. ANNOVAR: Functional annotation of genetic variants from next-generation sequencing data. Nucleic Acids Research, 38:e164, 2010

① ADHD and anemia?
25523 variants
→ Only non-synonymous or frameshift: 6423 variants
→ Conserved variants from 44-species alignment: 2935 variants
→ Remove variants in segmental duplication regions: 2652 variants
→ Remove variants with MAF>1%: 421 variants
→ Apply recessive model: 17 genes
→ Remove “dispensable” genes: 10 genes
→ Literature survey identifies PKLR as candidate gene (confirmed with biochemical assay)
Genome browser shot of the PKLR gene and the location
of the two causal mutations. Each of the two mutations sits
within an evolutionarily conserved region, and has been
reported once in patients affected with PKLR deficiency.
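Here is a minimal Python sketch of the stepwise reduction above. The `Variant` fields and effect labels are illustrative assumptions, and so is the recessive-model rule: a gene passes if it carries one homozygous variant or two or more variants, and phase is not checked. None of this reflects ANNOVAR's actual input format or logic. The sketch only shows that each filter is applied in sequence and that the surviving count is logged after every step.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    gene: str
    effect: str       # hypothetical labels: "nonsynonymous", "frameshift", "synonymous", ...
    conserved: bool   # inside a conserved element of a multi-species alignment
    in_segdup: bool   # overlaps a segmental duplication region
    maf: float        # population minor allele frequency (0.0 if never observed)
    genotype: str     # "het" or "hom"

# Variant-level filters, applied in order (labels mirror the slide).
VARIANT_FILTERS = [
    ("only non-synonymous or frameshift", lambda v: v.effect in {"nonsynonymous", "frameshift"}),
    ("conserved", lambda v: v.conserved),
    ("not in segmental duplication", lambda v: not v.in_segdup),
    ("MAF <= 1%", lambda v: v.maf <= 0.01),
]

def recessive_genes(variants):
    """Genes with a homozygous variant or >= 2 variants (assumed rule; phase NOT checked)."""
    by_gene = defaultdict(list)
    for v in variants:
        by_gene[v.gene].append(v)
    return {gene for gene, vs in by_gene.items()
            if len(vs) >= 2 or any(v.genotype == "hom" for v in vs)}

def stepwise_reduce(variants, dispensable=frozenset()):
    """Apply each filter in turn, logging (step, surviving count) after every step."""
    log = [("input variants", len(variants))]
    for label, keep in VARIANT_FILTERS:
        variants = [v for v in variants if keep(v)]
        log.append((label, len(variants)))
    genes = recessive_genes(variants)
    log.append(("recessive model (genes)", len(genes)))
    genes -= set(dispensable)
    log.append(("remove dispensable genes", len(genes)))
    return genes, log
```

The function returns the surviving genes and a list of (step, count) pairs. These counts play the same role as the numbers shown between the arrows on the slide.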
② BOOKMAN syndrome
3910156 variants
→ Keep only exonic/splicing variants: 21488 variants
→ Remove synonymous & non-frameshift variants: 10380 variants
→ Remove variants in 1000 Genomes Project: 1146 variants
→ Remove variants in ESP6500 database: 935 variants
→ Remove variants in dbSNP135: 582 variants
→ Keep only genes with multiple variants: 52 genes
→ Remove pseudogenes & questionable calls; remove olfactory receptor genes; Sanger sequencing validation
2 genes left: TAF1L and RBCK1
RBCK1: RanBP-type and C3HC4-type zinc finger containing 1 (mutation results in splicing error)
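The database-subtraction and gene-level steps of the second example could be sketched as follows. Several pieces are hypothetical. Variants are `(chrom, pos, ref, alt, gene)` tuples. The databases are plain sets of `(chrom, pos, ref, alt)` keys standing in for 1000 Genomes, ESP6500 and dbSNP135. Olfactory receptor genes are recognized by a naming-pattern heuristic (`OR`, then a family number, subfamily letters and a member number) rather than a curated list.

```python
import re
from collections import Counter

# Naming heuristic for HGNC-style olfactory receptor symbols such as OR4F5 or OR51E2
# (assumption: not an authoritative gene list; e.g. ORC1 is correctly NOT matched).
OLFACTORY_RECEPTOR = re.compile(r"^OR\d+[A-Z]+\d+")

def remove_known(variants, *databases):
    """Drop (chrom, pos, ref, alt, gene) variants whose key appears in any database set."""
    known = set().union(*databases)
    return [v for v in variants if v[:4] not in known]

def genes_with_multiple_variants(variants, min_count=2):
    """Genes hit by at least `min_count` remaining variants."""
    counts = Counter(v[4] for v in variants)
    return {gene for gene, n in counts.items() if n >= min_count}

def drop_olfactory_receptors(genes):
    """Remove genes whose symbol looks like an olfactory receptor."""
    return {g for g in genes if not OLFACTORY_RECEPTOR.match(g)}
```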
OGDEN SYNDROME
Clinical Features

Two Families
③ OGDEN syndrome
3441 variants on X chromosome
→ Keep only heterozygous: 2381 variants
→ Keep only Stop/NonSyn/FS/Splice: 136 variants
→ Remove variants in dbSNP: 40 variants
→ Remove variants in ClinSeq: 40 variants
→ Keep only variants in shared haplotype: 1 variant
NAA10 (encodes the catalytic subunit of the major human N-terminal acetyltransferase)
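For the last step of the third example, keeping variants inside a shared haplotype is an interval lookup. The sketch below assumes the shared haplotype arrives as sorted, non-overlapping, inclusive `(start, end)` intervals on one chromosome. The slides do not describe how those intervals were derived, and `SharedHaplotype` and `keep_shared` are hypothetical names.

```python
from bisect import bisect_right

class SharedHaplotype:
    """Non-overlapping, inclusive (start, end) intervals on a single chromosome."""

    def __init__(self, chrom, intervals):
        self.chrom = chrom
        self.intervals = sorted(intervals)
        self.starts = [start for start, _ in self.intervals]

    def contains(self, chrom, pos):
        if chrom != self.chrom:
            return False
        # Index of the last interval whose start is <= pos (-1 if none).
        i = bisect_right(self.starts, pos) - 1
        return i >= 0 and pos <= self.intervals[i][1]

def keep_shared(variants, haplotype):
    """Keep (chrom, pos, ref, alt) variants that fall inside the shared haplotype."""
    return [v for v in variants if haplotype.contains(v[0], v[1])]
```

Binary search keeps each lookup logarithmic in the number of intervals. That matters little at 40 variants but helps when the same check runs genome-wide.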
VARIANT
annotation & interpretation
• >60 annotation types (SIFT, PolyPhen, Allele Freq, HGMD…)
• User-driven filtering for step-wise reduction. Run gene panels.
• Robust, scalable, secure. Run case-control & family-based analyses.
• Machine-learning algorithms to generate TUTE score for prioritization of variants
THE TUTE SCORE
using machine learning to prioritize disease genes
① Select a set of functional prediction scores that can be assigned to both coding and non-coding variants
② Build SVM prediction models using SVMsensus
③ Identify the optimal hyperplane, i.e. the one with the largest margin between training points for neutral and deleterious variants & genes
④ Test & refine the prediction model using known disease variants from UniProt & synthetic data sets
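Steps ② and ③ can be illustrated with a small linear SVM trained by Pegasos, which is stochastic sub-gradient descent on the L2-regularized hinge loss. Minimizing that objective is what produces a large-margin separating hyperplane. This is plain Python and is neither SVMsensus nor the TUTE model. The two feature columns stand for hypothetical functional prediction scores scaled to [0, 1], the toy points are invented, and labels are +1 (deleterious) and -1 (neutral).

```python
import random

def train_linear_svm(X, y, lam=0.1, epochs=1000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.
    Labels must be +1/-1. A constant 1.0 feature is appended to act as the bias."""
    rng = random.Random(seed)
    data = [([float(v) for v in x] + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for xb, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wj * xj for wj, xj in zip(w, xb))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: gradient of the L2 penalty
            if margin < 1.0:  # hinge loss active: step toward the correct side
                w = [wj + eta * label * xj for wj, xj in zip(w, xb)]
    return w

def decision(w, x):
    """Signed score: positive means the deleterious side of the hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))

def predict(w, x):
    return 1 if decision(w, x) >= 0 else -1
```

Appending a constant feature means the bias is regularized along with the weights. This is a common simplification. A production model would more likely use a tested SVM library and tune `lam` by cross-validation.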
COMING SOON: more accurate variant calling
Toward more accurate variant calling for “personal genomes”
Jason O’Rawe1,2, Tao Jiang3, Guangqing Sun3, Yiyang Wu1,2, Wei Wang4, Jingchu Hu3, Paul Bodily5, Lifeng Tian6, Hakon Hakonarson6, W. Evan Johnson7, Reid J. Robison9, Zhi Wei4, Kai Wang8,9, Gholson J. Lyon1,2,9
1) Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, NY, USA; 2) Stony Brook University, Stony Brook, NY, USA; 3) BGI-Shenzhen, Shenzhen, China; 4) Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA; 5) Department of Computer Science, Brigham Young University, Provo, UT, USA; 6) Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA; 7) Department of Medicine, Boston University School of Medicine, Boston MA, USA; 8) Zilkha Neurogenetic Institute, Department of Psychiatry and Preventive Medicine, University of Southern California, Los Angeles, CA, USA; 9) Utah Foundation for Biomedical Research, Salt Lake City, UT, USA.

Background
To facilitate the clinical implementation of genomic medicine by next-generation sequencing, it will be critically important to obtain accurate and consistent variant calls on personal genomes. Multiple software tools for variant calling are available, but it is unclear how comparable these tools are or what their relative merits in real-world scenarios might be. Where “perfect” pipeline parameterization is unattainable, researchers and clinicians stand to benefit from a better understanding of the variability that different bioinformatics pipelines or sequencing platforms introduce into the discovery of human genetic variation.

Methods
We sequenced 15 exomes from four families using the Illumina HiSeq 2000 platform and the Agilent SureSelect v.2 capture kit, with ~120X average coverage. We analyzed the raw data using near-default parameters with five different alignment and variant calling pipelines (SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMTools). We additionally sequenced a single whole genome using the Complete Genomics (CG) sequencing and analysis pipeline (v2.0), with 95% of the exome region covered by 20 or more reads per base. Finally, we attempted to validate 919 SNVs and 841 indels, including similar fractions of GATK-only, SOAP-only, and shared calls, on the MiSeq platform by amplicon sequencing with ~5000X average coverage.

Results
SNV concordance between the five Illumina pipelines across all 15 exomes is 57.4%, while 0.5-5.1% of variants were called as unique to each pipeline. Indel concordance between the three indel calling pipelines is only 26.8%, even after left-normalizing and intervalizing genomic coordinates by 20 base pairs. A total of 2085 CG v2.0 variants that fall within the exome-targeted regions were not called by any of the Illumina-based exome analysis pipelines, likely due to poor capture efficiency in those regions. Based on targeted amplicon sequencing on the MiSeq platform, 97.1%, 60.2% and 99.1% of the GATK (v1.5)-only, SOAPsnp (v1.03)-only and shared SNVs could be validated, whereas only 54.0%, 44.6% and 78.1% of the GATK-only, SOAP-only and shared indels could be validated.
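The poster does not spell out its indel-matching procedure. What follows is only a sketch of the two ideas named above, assuming VCF-style alleles that share an anchor base and 0-based positions. First, each indel is left-normalized against the reference: a shared trailing base is trimmed, the allele is re-anchored one base to the left whenever one allele becomes empty, and redundant leading bases are then trimmed. Second, two indels are treated as matching when they lie on the same chromosome within 20 bp of each other. The function names and the window rule are assumptions.

```python
def left_normalize(ref_seq, pos, ref, alt):
    """Left-align an indel. `pos` is the 0-based position of the first base of `ref`."""
    if ref == alt:
        raise ValueError("ref and alt alleles are identical")
    while True:
        changed = False
        # Drop a trailing base shared by both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # An empty allele needs a new anchor base taken from the reference on the left.
        if not ref or not alt:
            if pos == 0:
                raise ValueError("cannot left-align past the start of the reference")
            pos -= 1
            ref, alt = ref_seq[pos] + ref, ref_seq[pos] + alt
            changed = True
        if not changed:
            break
    # Trim leading bases shared by both alleles, keeping at least one base in each.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt

def indels_match(a, b, window=20):
    """a, b are (chrom, pos) pairs of already-normalized indels."""
    return a[0] == b[0] and abs(a[1] - b[1]) <= window
```

For example, deleting one `CA` unit from the repeat in `GCACAG` can be written at two positions, and both spellings normalize to `(0, "GCA", "G")`. Without this step, two pipelines reporting the same event would be counted as discordant.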

• All Illumina exomes have at least 20 reads per base pair in more than 80% of the 44 Mb target region.
• Concordance rates with common SNPs genotyped on Illumina 610K genotyping chips were calculated.
• Sensitivities and specificities were calculated for each pipeline using the Illumina 610K genotyping chips as a gold standard.
• All pipelines show relatively high sensitivity and specificity when detecting known and common SNPs.
• Specificity generally increases for sets of variants detected by more than a single pipeline.
• All pipelines are very good at identifying already known, common SNPs.

A) SNV concordance was measured between all SNV calls made by the five Illumina data pipelines. Overall concordance is low: 57.4%.
B) SNV concordance is higher for already described variation (present in dbSNP135).
C) SNV concordance is lower for novel, undescribed human genetic variation (absent in dbSNP135).
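The sketch below computes concordance, pipeline-unique fractions and the "detected by >=k pipelines" sets from per-pipeline call sets. The poster does not state its exact denominators. This sketch assumes that concordance is the share of the union of all calls that every pipeline made, and that unique fractions use the same union as denominator. Variant keys are hypothetical `(chrom, pos, ref, alt)` tuples.

```python
from collections import Counter

def call_counts(calls_by_pipeline):
    """Map each variant to the number of pipelines that called it."""
    counts = Counter()
    for calls in calls_by_pipeline.values():
        counts.update(set(calls))
    return counts

def called_by_at_least(calls_by_pipeline, k):
    """Variants detected by >= k pipelines (the '>=2 pipelines' style rows)."""
    return {v for v, n in call_counts(calls_by_pipeline).items() if n >= k}

def concordance(calls_by_pipeline):
    """Share of the union of calls that every pipeline made (assumed definition)."""
    counts = call_counts(calls_by_pipeline)
    shared = sum(1 for n in counts.values() if n == len(calls_by_pipeline))
    return shared / len(counts)

def unique_fractions(calls_by_pipeline):
    """Per pipeline: share of the union called by that pipeline alone."""
    counts = call_counts(calls_by_pipeline)
    return {
        name: sum(1 for v in set(calls) if counts[v] == 1) / len(counts)
        for name, calls in calls_by_pipeline.items()
    }
```

The consensus sets involve a tradeoff. Each stricter threshold shrinks the call set, buying specificity at the cost of sensitivity.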

Concordance with common SNPs genotyped on Illumina 610K chips
Sample     Software   Compared sites   Concordant sites   Concordance rate
Mother-1   SOAPsnp    6088             6074               99.77%
           GATK       6249             6224               99.60%
           SNVer      5723             5708               99.74%
           GNUMAP     5458             5434               99.56%
           SAMTools   5885             5848               99.37%
Son-1      SOAPsnp    6366             6353               99.80%
           GATK       6341             6323               99.72%
           SNVer      6255             6239               99.74%
           GNUMAP     5850             5828               99.62%
           SAMTools   6383             6362               99.67%
Son-2      SOAPsnp    6412             6401               99.83%
           GATK       6426             6413               99.80%
           SNVer      6336             6325               99.83%
           GNUMAP     5906             5889               99.71%
           SAMTools   6477             6450               99.58%
Father-1   SOAPsnp    6247             6238               99.86%
           GATK       6304             6288               99.75%
           SNVer      6205             6192               99.79%
           GNUMAP     5805             5786               99.67%
           SAMTools   6344             6327               99.73%
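Each concordance rate above is a ratio of concordant to compared sites. For example, 6074 concordant sites out of 6088 compared gives 99.77%. The sketch below computes that ratio, plus sensitivity and specificity against array genotypes. It encodes genotypes as alternate-allele counts (0, 1, 2) and treats a site missing from the pipeline's calls as homozygous reference. The definitions of "compared site", sensitivity and specificity used here are assumptions, not the poster's documented method.

```python
def _ratio(num, den):
    return num / den if den else float("nan")

def chip_concordance(pipeline_gt, chip_gt):
    """(concordant, compared, rate) over pipeline-called sites that the chip also genotyped."""
    compared = [site for site in pipeline_gt if site in chip_gt]
    concordant = sum(1 for site in compared if pipeline_gt[site] == chip_gt[site])
    return concordant, len(compared), _ratio(concordant, len(compared))

def sensitivity_specificity(pipeline_gt, chip_gt):
    """Treat chip genotypes as truth: variant sites (>0) vs homozygous-reference sites (0)."""
    tp = fn = tn = fp = 0
    for site, truth in chip_gt.items():
        called = pipeline_gt.get(site, 0) > 0  # missing site => homozygous reference
        if truth > 0 and called:
            tp += 1
        elif truth > 0:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    return _ratio(tp, tp + fn), _ratio(tn, tn + fp)
```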

Software        Specificity       Sensitivity       Known SNPs                 Novel SNPs
                Mean*    SD       Mean*    SD       #Total   #cSNP    Ti/Tv    #Total  #cSNP  Ti/Tv
SOAPsnp         99.82    0.039    94.53    2.287    30,022   17,409   2.77     875     419    1.94
GATK v1.5       99.72    0.085    95.33    1.161    29,620   17,306   2.8      365     206    2.34
SNVer           99.78    0.044    92.32    4.339    28,242   17,111   2.85     490     253    2.52
GNUMAP          99.64    0.065    86.67    3.286    24,893   15,144   3.03     1,091   659    1.28
SAMTools        99.59    0.158    94.45    4.221    29,577   17,449   2.78     949     539    1.33
ANY pipeline    99.62    0.113    97.72    1.215    33,947   19,638   2.68     2,163   1,182  1.23
>=2 pipelines   99.69    0.074    96.68    2.298    31,099   18,108   2.77     639     323    2.17
>=3 pipelines   99.73    0.045    95.65    3.143    29,363   17,257   2.84     416     230    2.56
>=4 pipelines   99.82    0.041    92.63    3.412    26,772   16,097   2.91     318     193    2.67
5 pipelines     99.87    0.015    80.61    5.266    21,174   13,320   3.12     234     149    2.83

The similarity between SNV and indel calls made by two versions of GATK, v1.5 and v2.3-9, was measured. SNV and indel calls were made with both the UnifiedGenotyper and HaplotypeCaller modules on the same participant sample, k8101-49685.
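The Ti/Tv columns give the ratio of transitions to transversions among SNV calls. A transition is a change within the purines (A, G) or within the pyrimidines (C, T). Random substitutions give a Ti/Tv of about 0.5, because each base has two possible transversions but only one transition. A markedly lower ratio among novel calls than among known calls is therefore commonly read as a sign of false positives. Here is a minimal sketch, assuming SNVs are given as single-base `(ref, alt)` pairs:

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")

def is_transition(ref, alt):
    """True for A<->G or C<->T changes."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES

def ti_tv(snvs):
    """Transition/transversion ratio over (ref, alt) pairs; non-SNVs are skipped."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != 1 or len(alt) != 1 or ref == alt or not {ref, alt} <= set("ACGT"):
            continue  # skip indels, no-change records and ambiguous bases
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")
```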

• SNP concordance between the Illumina data calls and the Complete Genomics v2.0 data calls was calculated for a single sample, “k8101-49685”.
• Indel concordance between the three indel calling Illumina data pipelines (A) is low: 26.8%.
• MiSeq validation was performed on a combination of SNPs and indels (1756 in total) chosen from the sequencing data of sample “k8101-49685”.
• There are 2085 SNVs that Complete Genomics v2.0 detected but none of the five Illumina data pipelines detected, despite high mappability among these variants.
• Concordance is much better for known indels (B) and much lower for novel, unknown indels (C), as defined by presence or absence in dbSNP135.
• SNVs called uniquely by either the SOAPsnp v1.03/SOAPindel v2.01 pipeline or the GATK v1.5 pipeline validated relatively well, and SNVs called by both pipelines validated better.
• Indels unique to either GATK (v1.5) or SOAPindel (v2.01) validated poorly. Overlapping indel calls validated better, though still relatively poorly.
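Validation percentages such as 97.1%, 60.2% and 99.1% for GATK-only, SOAPsnp-only and shared SNVs are per-category success rates. The sketch below assumes each call set is a set of hypothetical variant keys. It also assumes that only calls actually sent for MiSeq amplicon sequencing count in the denominator.

```python
def validation_rates(gatk_calls, soap_calls, tested, validated):
    """Fraction of tested calls that validated, split by which pipeline(s) made them."""
    categories = {
        "GATK-only": gatk_calls - soap_calls,
        "SOAP-only": soap_calls - gatk_calls,
        "shared": gatk_calls & soap_calls,
    }
    rates = {}
    for name, calls in categories.items():
        attempted = calls & tested  # only calls that were actually amplicon-sequenced
        rates[name] = len(attempted & validated) / len(attempted) if attempted else float("nan")
    return rates
```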
#variantcallingproblems
• 15 exomes from 4 families
• 1 whole genome from Complete Genomics
• Illumina HiSeq platform and Agilent SureSelect capture kit
• 120X mean coverage
• Five NGS alignment + variant calling pipelines tested (SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMtools)
• Illumina 610k SNP array used as gold standard
• ~60% of SNVs are called by all five pipelines
• 0.5 to 5.1% of variants were called as unique to each pipeline

• All pipelines are ‘good’ with known, common SNPs
• Specificity increases for variants detected by more than one pipeline
• Indel concordance among all 3 indel calling pipelines was very low (26.8%)
• Complete Genomics picked up >2000 variants that weren’t seen on Illumina, despite high mappability in these regions
VARIANT CALLING CONCLUSIONS
① Significant discrepancy among all pipelines when applied to
the same Illumina datasets
② There exists a set of robust calls that are shared among all
pipelines even under lax parameters (although false negative
rate is high)

To get an accurate genome, you
need to run multiple algorithms.
A results portal that lets doctors, labs & researchers
give their patients access to important genetic findings
Reid J. Robison, MD MBA

reid@tutegenomics.com

Bhawanipatna Call Girls 📞9332606886 Call Girls in Bhawanipatna Escorts servic...
 

Towards Precision Medicine: Tute Genomics, a cloud-based application for analysis of personal genomes

  • 1. TOWARDS PRECISION MEDICINE: a cloud-based application for analysis of personal genomes Reid J. Robison, MD MBA December 6th, 2013
  • 2. THE FALLING COST of sequencing the human genome $3 BILLION $2000
  • 4. PACE OF DISCOVERY of novel rare-disease-causing genes using whole-exome sequencing [chart, 2009–2012]. Boycott et al. Rare-disease genetics in the era of next-generation sequencing: discovery to translation. Nature Reviews Genetics 14, 681–691 (2013)
  • 5. RATE OF APPROVAL of rare disease drug products [chart, 2000–2030]. Extrapolation from January 2013 Orphanet Report Series
  • 6. “We are on the tipping point of a whole new game in how we develop drugs.” Janet Woodcock, M.D. Director, Center for Drug Evaluation and Research, FDA
  • 8. Patient w/ unknown disease → Next-gen sequencing → Data processing → Variant calling → 3 million SNVs, 0.5 million indels, 1000 SVs → ?
  • 9. STEPWISE REDUCTION: ANNOVAR (450 citations, >40,000 downloads). Wang K et al. ANNOVAR: functional annotation of genetic variants from next-generation sequencing data. Nucleic Acids Research 38:e164 (2010)
  • 10. ① ADHD and anemia?
      25523 variants
      → Only non-synonymous or frameshift: 6423 variants
      → Conserved variants from 44-species alignment: 2935 variants
      → Remove variants in segmental duplication regions: 2652 variants
      → Remove variants with MAF>1%: 421 variants
      → Apply recessive model: 17 genes
      → Remove “dispensable” genes: 10 genes
      → Literature survey identifies PKLR as candidate gene (confirmed with biochemical assay)
  • 11. Genome browser shot of the PKLR gene and the location of the two causal mutations. Each of the two mutations sits within an evolutionarily conserved region, and has been reported once in patients affected with PKLR deficiency.
  • 13. ② BOOKMAN syndrome
      3910156 variants
      → Keep only exonic/splicing variants: 21488 variants
      → Remove synonymous & non-synonymous frameshift variants: 10380 variants
      → Remove variants in 1000 Genomes Project: 1146 variants
      → Remove variants in ESP6500 database: 935 variants
      → Remove variants in dbSNP135: 582 variants
      → Keep only genes with multiple variants: 52 genes
  • 14. 52 genes
      → Remove pseudogenes & questionable calls
      → Remove olfactory receptor genes
      → Sanger sequencing validation
      → 2 genes left: TAF1L and RBCK1 (RanBP-type and C3HC4-type zinc finger containing 1; mutation results in splicing error)
  • 16. ③ OGDEN syndrome
      3441 variants on X chromosome
      → Keep only heterozygous: 2381 variants
      → Keep only Stop/NonSyn/FS/Splice: 136 variants
      → Remove variants in dbSNP: 40 variants
      → Remove variants in ClinSeq: 40 variants
      → Keep only variants in shared haplotype: 1 variant, NAA10 (encodes the catalytic subunit of the major human N-terminal acetyltransferase)
  • 18. VARIANT annotation & interpretation
      • >60 annotation types (SIFT, PolyPhen, Allele Freq, HGMD…)
      • User-driven filtering for step-wise reduction
      • Run gene panels
      • Robust, scalable, secure
      • Run case-control & family-based analyses
      • Machine-learning algorithms to generate TUTE score for prioritization of variants
  • 19. THE TUTE SCORE: using machine learning to prioritize disease genes
      ① Select a set of functional prediction scores that can be assigned to both coding and non-coding variants
      ② Build SVM prediction models using SVMsensus
      ③ Identify the optimal hyperplane, i.e. the one with the largest margin between training points for neutral and deleterious variants & genes
      ④ Test & refine the prediction model using known disease variants from UniProt & synthetic data sets
  • 21. Toward more accurate variant calling for “personal genomes”
      Jason O’Rawe (1,2), Tao Jiang (3), Guangqing Sun (3), Yiyang Wu (1,2), Wei Wang (4), Jingchu Hu (3), Paul Bodily (5), Lifeng Tian (6), Hakon Hakonarson (6), W. Evan Johnson (7), Reid J. Robison (9), Zhi Wei (4), Kai Wang (8,9), Gholson J. Lyon (1,2,9)
      1) Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, NY, USA; 2) Stony Brook University, Stony Brook, NY, USA; 3) BGI-Shenzhen, Shenzhen, China; 4) Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA; 5) Department of Computer Science, Brigham Young University, Provo, UT, USA; 6) Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA; 7) Department of Medicine, Boston University School of Medicine, Boston, MA, USA; 8) Zilkha Neurogenetic Institute, Department of Psychiatry and Preventive Medicine, University of Southern California, Los Angeles, CA, USA; 9) Utah Foundation for Biomedical Research, Salt Lake City, UT, USA.
      Background: To facilitate the clinical implementation of genomic medicine by next-generation sequencing, it will be critically important to obtain accurate and consistent variant calls on personal genomes. Multiple software tools for variant calling are available, but it is unclear how comparable these tools are or what their relative merits in real-world scenarios might be. When “perfect” pipeline parameterization is unattainable, researchers and clinicians stand to benefit from a better understanding of the variability that different bioinformatics pipelines or different sequencing platforms introduce into the discovery of human genetic variation.
      Methods: We sequenced 15 exomes from four families using the Illumina HiSeq 2000 platform and the Agilent SureSelect v.2 capture kit, with ~120X average coverage. We analyzed the raw data using near-default parameters with five different alignment and variant calling pipelines (SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMTools). We additionally sequenced a single whole genome using the Complete Genomics (CG) sequencing and analysis pipeline (v2.0), with 95% of the exome region covered by 20 or more reads per base. Finally, we attempted to validate 919 SNVs and 841 indels, including similar fractions of GATK-only, SOAP-only, and shared calls, by amplicon sequencing on the MiSeq platform with ~5000X average coverage.
      Results: SNV concordance between the five Illumina pipelines across all 15 exomes is 57.4%, while 0.5-5.1% of variants were called as unique to each pipeline. Indel concordance between the three indel calling pipelines is only 26.8%, even after left-normalizing and intervalizing genomic coordinates by 20 base pairs. 2085 CG v2.0 variants that fall within regions targeted by exome sequencing were not called by any of the Illumina-based exome analysis pipelines, likely due to poor capture efficiency in those regions. In targeted amplicon sequencing on the MiSeq platform, 97.1%, 60.2% and 99.1% of the GATK (v1.5)-only, SOAPsnp (v1.03)-only and shared SNVs could be validated, but only 54.0%, 44.6% and 78.1% of the GATK-only, SOAP-only and shared indels.
      • All Illumina exomes have at least 20 reads per base pair across >80% of the 44 Mb target region.
      • Concordance rates with common SNPs genotyped on Illumina 610K genotyping chips were calculated.
      • Sensitivities and specificities were calculated for each pipeline, using the Illumina 610K genotyping chips as the gold standard.
      A) SNV concordance was measured between all SNV calls made by the five Illumina data pipelines. Overall concordance is low: 57.4%.
      • All pipelines show relatively high sensitivity and specificity when detecting known, common SNPs.
      B) SNV concordance is higher for already described variation (present in dbSNP135).
      • Specificity generally increases for sets of variants detected by more than one pipeline.
      C) SNV concordance is lower for novel, undescribed human genetic variation (absent in dbSNP135).
      • All pipelines are very good at identifying already known, common SNPs.
      Concordance with common SNPs on the Illumina 610K chip, by sample and pipeline:
      | Sample   | Software | Compared sites | Concordant sites | Concordance rate |
      |----------|----------|----------------|------------------|------------------|
      | Mother-1 | SOAPsnp  | 6088 | 6074 | 99.77% |
      | Mother-1 | GATK     | 6249 | 6224 | 99.60% |
      | Mother-1 | SNVer    | 5723 | 5708 | 99.74% |
      | Mother-1 | GNUMAP   | 5458 | 5434 | 99.56% |
      | Mother-1 | SAMTools | 5885 | 5848 | 99.37% |
      | Son-1    | SOAPsnp  | 6366 | 6353 | 99.80% |
      | Son-1    | GATK     | 6341 | 6323 | 99.72% |
      | Son-1    | SNVer    | 6255 | 6239 | 99.74% |
      | Son-1    | GNUMAP   | 5850 | 5828 | 99.62% |
      | Son-1    | SAMTools | 6383 | 6362 | 99.67% |
      | Son-2    | SOAPsnp  | 6412 | 6401 | 99.83% |
      | Son-2    | GATK     | 6426 | 6413 | 99.80% |
      | Son-2    | SNVer    | 6336 | 6325 | 99.83% |
      | Son-2    | GNUMAP   | 5906 | 5889 | 99.71% |
      | Son-2    | SAMTools | 6477 | 6450 | 99.58% |
      | Father-1 | SOAPsnp  | 6247 | 6238 | 99.86% |
      | Father-1 | GATK     | 6304 | 6288 | 99.75% |
      | Father-1 | SNVer    | 6205 | 6192 | 99.79% |
      | Father-1 | GNUMAP   | 5805 | 5786 | 99.67% |
      | Father-1 | SAMTools | 6344 | 6327 | 99.73% |
      Specificity, sensitivity, and known/novel SNP counts by pipeline or pipeline combination:
      | Software      | Specificity mean* (%) | Specificity SD | Sensitivity mean* (%) | Sensitivity SD | Known #Total | Known #cSNP | Known Ti/Tv | Novel #Total | Novel #cSNP | Novel Ti/Tv |
      |---------------|-------|-------|-------|-------|--------|--------|------|-------|-------|------|
      | SOAPsnp       | 99.82 | 0.039 | 94.53 | 2.287 | 30,022 | 17,409 | 2.77 | 875   | 419   | 1.94 |
      | GATK          | 99.72 | 0.085 | 95.33 | 1.161 | 29,620 | 17,306 | 2.8  | 365   | 206   | 2.34 |
      | SNVer         | 99.78 | 0.044 | 92.32 | 4.339 | 28,242 | 17,111 | 2.85 | 490   | 253   | 2.52 |
      | GNUMAP        | 99.64 | 0.065 | 86.67 | 3.286 | 24,893 | 15,144 | 3.03 | 1,091 | 659   | 1.28 |
      | SAMTools      | 99.59 | 0.158 | 94.45 | 4.221 | 29,577 | 17,449 | 2.78 | 949   | 539   | 1.33 |
      | ANY pipeline  | 99.62 | 0.113 | 97.72 | 1.215 | 33,947 | 19,638 | 2.68 | 2,163 | 1,182 | 1.23 |
      | >=2 pipelines | 99.69 | 0.074 | 96.68 | 2.298 | 31,099 | 18,108 | 2.77 | 639   | 323   | 2.17 |
      | >=3 pipelines | 99.73 | 0.045 | 95.65 | 3.143 | 29,363 | 17,257 | 2.84 | 416   | 230   | 2.56 |
      | >=4 pipelines | 99.82 | 0.041 | 92.63 | 3.412 | 26,772 | 16,097 | 2.91 | 318   | 193   | 2.67 |
      | 5 pipelines   | 99.87 | 0.015 | 80.61 | 5.266 | 21,174 | 13,320 | 3.12 | 234   | 149   | 2.83 |
      The similarity between SNV and indel calls made by two versions of GATK, v1.5 and v2.3-9, was measured. SNV and indel calls were made using both the UnifiedGenotyper and HaplotypeCaller modules on the same k8101-49685 participant sample.
      • SNP concordance between the Illumina data calls and the Complete Genomics v2.0 data calls was calculated for a single sample, “k8101-49685”.
      • Indel concordance between the three indel calling Illumina data pipelines (A) is low: 26.8%.
      • MiSeq validation was performed on a combination of SNPs and indels (1756 in total) chosen from the sequencing data of sample “k8101-49685”.
      • There are 2085 SNVs that Complete Genomics v2.0 detected but that were not detected by any of the five Illumina data pipelines, despite high mappability among these variants.
      • Concordance is much better for known indels (B) and much lower for novel, unknown indels (C), as defined by presence or absence in dbSNP135.
      • SNVs uniquely called by the SOAPsnp v1.03/SOAPindel v2.01 pipeline and by the GATK v1.5 pipeline validated relatively well, and SNVs called by both pipelines validated better still.
      • Indels validated poorly, both for calls unique to GATK (v1.5) and for calls unique to SOAPindel (v2.01). Overlapping indel calls validated better, though still relatively poorly.
  • 22. #variantcallingproblems
      • 15 exomes from 4 families
      • 1 whole genome from Complete Genomics
      • Illumina HiSeq platform and Agilent SureSelect capture kit
      • 120X mean coverage
      • Five NGS alignment+variant calling pipelines tested (SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMtools)
      • Illumina 610k SNP array used as gold standard
      • ~60% of SNVs are called by all five pipelines
      • 0.5 to 5.1% of variants were called as unique to each pipeline
  • 23.
      • All pipelines are ‘good’ with known, common SNPs
      • Specificity increases for variants detected by more than one pipeline
      • Indel concordance between the 3 indel calling pipelines was very low (26.8%)
      • Complete Genomics picked up >2000 variants that weren’t seen on Illumina, despite high mappability in these regions
      | Software      | Specificity mean* (%) | Specificity SD | Sensitivity mean* (%) | Sensitivity SD | Known #Total | Known #cSNP | Known Ti/Tv | Novel #Total | Novel #cSNP | Novel Ti/Tv |
      |---------------|-------|-------|-------|-------|--------|--------|------|-------|-------|------|
      | SOAPsnp       | 99.82 | 0.039 | 94.53 | 2.287 | 30,022 | 17,409 | 2.77 | 875   | 419   | 1.94 |
      | GATK v1.5     | 99.72 | 0.085 | 95.33 | 1.161 | 29,620 | 17,306 | 2.8  | 365   | 206   | 2.34 |
      | SNVer         | 99.78 | 0.044 | 92.32 | 4.339 | 28,242 | 17,111 | 2.85 | 490   | 253   | 2.52 |
      | GNUMAP        | 99.64 | 0.065 | 86.67 | 3.286 | 24,893 | 15,144 | 3.03 | 1,091 | 659   | 1.28 |
      | SAMTools      | 99.59 | 0.158 | 94.45 | 4.221 | 29,577 | 17,449 | 2.78 | 949   | 539   | 1.33 |
      | ANY pipeline  | 99.62 | 0.113 | 97.72 | 1.215 | 33,947 | 19,638 | 2.68 | 2,163 | 1,182 | 1.23 |
      | >=2 pipelines | 99.69 | 0.074 | 96.68 | 2.298 | 31,099 | 18,108 | 2.77 | 639   | 323   | 2.17 |
      | >=3 pipelines | 99.73 | 0.045 | 95.65 | 3.143 | 29,363 | 17,257 | 2.84 | 416   | 230   | 2.56 |
      | >=4 pipelines | 99.82 | 0.041 | 92.63 | 3.412 | 26,772 | 16,097 | 2.91 | 318   | 193   | 2.67 |
      | 5 pipelines   | 99.87 | 0.015 | 80.61 | 5.266 | 21,174 | 13,320 | 3.12 | 234   | 149   | 2.83 |
  • 24. VARIANT CALLING CONCLUSIONS
      ① Significant discrepancy among all pipelines when applied to the same Illumina datasets
      ② There exists a set of robust calls that are shared among all pipelines even under lax parameters (although the false-negative rate is high)
      To get an accurate genome, you need to run multiple algorithms.
  • 26. A results portal that lets doctors, labs & researchers give their patients access to important genetic findings
  • 27. Reid J. Robison, MD MBA reid@tutegenomics.com
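The stepwise reduction on slide 10 is a chain of filters. Each filter is applied to the survivors of the previous one, and the surviving count is recorded after every step. A minimal Python sketch of that pattern follows. The `Variant` fields, the consequence labels and the toy records are assumptions made for illustration; real pipelines would read these annotations from a tool such as ANNOVAR. Only the variant-level steps are shown, not the recessive-model or “dispensable”-gene steps.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Variant:
    gene: str
    consequence: str  # hypothetical labels, e.g. "nonsynonymous", "frameshift", "synonymous"
    conserved: bool   # hypothetical flag: conserved in a multi-species alignment
    in_segdup: bool   # hypothetical flag: overlaps a segmental duplication
    maf: float        # population minor allele frequency (0.0 if never seen)


Step = Tuple[str, Callable[[Variant], bool]]


def stepwise_reduce(variants: List[Variant], steps: List[Step]):
    """Apply each keep-predicate in order and log how many variants survive."""
    remaining = list(variants)
    log = [("input", len(remaining))]
    for name, keep in steps:
        remaining = [v for v in remaining if keep(v)]
        log.append((name, len(remaining)))
    return remaining, log


# Variant-level filters modelled on slide 10 (the 1% MAF threshold is from the slide).
STEPS: List[Step] = [
    ("non-synonymous or frameshift",
     lambda v: v.consequence in {"nonsynonymous", "frameshift"}),
    ("conserved", lambda v: v.conserved),
    ("outside segmental duplications", lambda v: not v.in_segdup),
    ("MAF <= 1%", lambda v: v.maf <= 0.01),
]

# Toy input; gene names other than PKLR are placeholders.
variants = [
    Variant("PKLR", "nonsynonymous", True, False, 0.0),
    Variant("GENE_A", "synonymous", True, False, 0.0),
    Variant("GENE_B", "frameshift", False, False, 0.0),
    Variant("GENE_C", "nonsynonymous", True, True, 0.0),
    Variant("GENE_D", "nonsynonymous", True, False, 0.2),
]
kept, log = stepwise_reduce(variants, STEPS)
for name, count in log:
    print(f"{name}: {count}")
```

Keeping each step as a named predicate makes the per-step counts on the slide easy to reproduce. It also lets steps be reordered or swapped without touching the loop.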
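Slide 13 ends with population-database exclusions (1000 Genomes, ESP6500, dbSNP135) followed by a gene-level step, "keep only genes with multiple variants". The sketch below shows one way to combine them. It assumes variants are keyed by a (chromosome, position, ref, alt) tuple normalized identically in every source. `novel_multi_hit_genes`, the gene names and the toy database contents are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (chromosome, position, ref allele, alt allele); assumes all sources use the same normalization
VariantKey = Tuple[str, int, str, str]


def novel_multi_hit_genes(
    calls: Dict[VariantKey, str],
    databases: List[Set[VariantKey]],
    min_variants: int = 2,
) -> Dict[str, List[VariantKey]]:
    """Drop variants seen in any reference database, then keep genes that
    still carry at least `min_variants` variants."""
    known: Set[VariantKey] = set().union(*databases)
    by_gene: Dict[str, List[VariantKey]] = defaultdict(list)
    for key, gene in calls.items():
        if key not in known:
            by_gene[gene].append(key)
    return {g: sorted(vs) for g, vs in by_gene.items() if len(vs) >= min_variants}


calls = {
    ("chr1", 100, "A", "G"): "GENE_X",
    ("chr1", 250, "C", "T"): "GENE_X",
    ("chr2", 500, "G", "A"): "GENE_Y",
    ("chr2", 800, "T", "C"): "GENE_Y",
    ("chr3", 900, "A", "T"): "GENE_Z",
}
thousand_genomes = {("chr2", 500, "G", "A")}  # toy stand-in for a population database
dbsnp: Set[VariantKey] = set()                 # toy stand-in, empty here
result = novel_multi_hit_genes(calls, [thousand_genomes, dbsnp])
print(result)
```

In this toy input GENE_Y loses one of its two variants to the database filter, so only GENE_X survives the multiple-variant step.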
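The Ogden syndrome reduction on slide 16 combines genotype, consequence, database and interval filters. This sketch assumes one genotype per variant, for example from a carrier female; the slide only says "keep only heterozygous". It represents the shared haplotype as hypothetical half-open intervals on chrX, which are merged and then searched with `bisect`. The field names and coordinates are illustrative only.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Callable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on chrX; hypothetical coordinates


@dataclass
class XVariant:
    pos: int
    genotype: str      # "0/1" = heterozygous, "1/1" = homozygous alternate
    effect: str        # hypothetical labels: "stop", "nonsyn", "frameshift", "splice", ...
    in_dbsnp: bool
    in_clinseq: bool


DAMAGING = {"stop", "nonsyn", "frameshift", "splice"}


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping intervals so a binary search over starts is valid."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def interval_test(intervals: List[Interval]) -> Callable[[int], bool]:
    ivs = merge(intervals)
    starts = [s for s, _ in ivs]

    def contains(pos: int) -> bool:
        i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
        return i >= 0 and pos < ivs[i][1]

    return contains


def x_linked_filter(variants: List[XVariant], shared_haplotype: List[Interval]) -> List[XVariant]:
    in_shared = interval_test(shared_haplotype)
    return [
        v for v in variants
        if v.genotype == "0/1"      # keep only heterozygous
        and v.effect in DAMAGING    # keep only stop/nonsyn/FS/splice
        and not v.in_dbsnp          # remove variants in dbSNP
        and not v.in_clinseq        # remove variants in ClinSeq
        and in_shared(v.pos)        # keep only variants in the shared haplotype
    ]


shared = [(500, 2000), (1500, 3000)]
variants = [
    XVariant(1200, "0/1", "nonsyn", False, False),
    XVariant(2500, "1/1", "stop", False, False),
    XVariant(2600, "0/1", "synonymous", False, False),
    XVariant(2700, "0/1", "frameshift", True, False),
    XVariant(4000, "0/1", "splice", False, False),
]
kept = x_linked_filter(variants, shared)
print([v.pos for v in kept])  # [1200]
```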
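Slide 19 describes SVM models that find the maximum-margin hyperplane between neutral and deleterious training variants. The slides do not describe the TUTE score's actual features or the SVMsensus implementation. The following is therefore only a generic linear SVM, trained with the Pegasos stochastic sub-gradient method on two hypothetical prediction scores per variant. For real work a maintained library, for example scikit-learn's `LinearSVC`, would be the usual choice.

```python
import random
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.
    Labels are +1 (deleterious) or -1 (neutral); a constant 1.0 feature acts as the bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]      # shrink: regularisation step
            if margin < 1.0:                                # hinge loss active: move towards point
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def svm_score(w: Sequence[float], x: Sequence[float]) -> float:
    """Raw decision value w.x + b; > 0 means the 'deleterious' side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


# Toy training set: two hypothetical prediction scores per variant, scaled to [0, 1].
X = [(0.9, 0.8), (0.8, 0.95), (0.95, 0.9), (0.85, 0.7),
     (0.1, 0.2), (0.2, 0.05), (0.15, 0.3), (0.05, 0.1)]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(X, y)
print([round(svm_score(w, x), 2) for x in X])
```

Folding the bias into `w` as a constant feature keeps the update rule short. It also regularizes the bias, which is a simplification of the textbook formulation.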
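In the per-sample table of slide 21, the concordance rate is concordant sites divided by compared sites. Sensitivity and specificity use the 610K array as truth. The sketch below spells out those ratios under our own assumptions, and the poster may define them differently:

- genotypes are VCF-style strings;
- a site missing from the pipeline output counts as reference;
- sensitivity and specificity are judged on variant versus reference rather than on the exact genotype.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, position)


def concordance_rate(calls: Dict[Site, str], array: Dict[Site, str]) -> Tuple[int, int, float]:
    """Genotype agreement at sites typed by both the pipeline and the array."""
    compared = [site for site in array if site in calls]
    concordant = sum(1 for site in compared if calls[site] == array[site])
    rate = concordant / len(compared) if compared else float("nan")
    return len(compared), concordant, rate


def sensitivity_specificity(calls: Dict[Site, str], array: Dict[Site, str]) -> Tuple[float, float]:
    """Array as truth: a site is positive if its array genotype is non-reference.
    Sites the pipeline did not report are treated as reference ("0/0")."""
    tp = fn = tn = fp = 0
    for site, truth_gt in array.items():
        truth = truth_gt != "0/0"
        called = calls.get(site, "0/0") != "0/0"
        if truth and called:
            tp += 1
        elif truth:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


array = {("chr1", 10): "0/1", ("chr1", 20): "1/1", ("chr1", 30): "0/0", ("chr1", 40): "0/1"}
calls = {("chr1", 10): "0/1", ("chr1", 20): "0/1", ("chr1", 30): "0/0"}
compared, concordant, rate = concordance_rate(calls, array)
sens, spec = sensitivity_specificity(calls, array)

# The same ratio applied to the Mother-1 / SOAPsnp row of the concordance table:
mother1_soapsnp = 100 * 6074 / 6088
print(f"{mother1_soapsnp:.2f}%")  # 99.77%
```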
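Slide 22 summarizes pipeline overlap with two numbers: the share of SNVs called by all five pipelines, and the share unique to each pipeline. A small set-based sketch of both numbers follows. It assumes calls are already matched by an exact key. It also uses the union of all calls as the denominator, which is our assumption because the slides do not state the denominator. Pipeline names and call sets are placeholders.

```python
from typing import Dict, Hashable, Set, Tuple


def overlap_summary(calls: Dict[str, Set[Hashable]]) -> Tuple[float, Dict[str, float]]:
    """Fraction of the union called by every pipeline, and, per pipeline,
    the fraction of the union that only that pipeline called."""
    union = set().union(*calls.values())
    shared = set.intersection(*calls.values())  # needs at least one pipeline
    unique = {}
    for name, mine in calls.items():
        others = set().union(*(s for other, s in calls.items() if other != name))
        unique[name] = len(mine - others) / len(union)
    return len(shared) / len(union), unique


calls = {
    "pipeline_A": {1, 2, 3, 4},
    "pipeline_B": {1, 2, 3, 5},
    "pipeline_C": {1, 2, 3, 6, 7},
}
all_fraction, unique_fraction = overlap_summary(calls)
print(round(all_fraction, 3), {k: round(v, 3) for k, v in unique_fraction.items()})
```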
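Indels are harder to compare than SNVs because the same event in a repeat can be written at several positions. The poster on slide 21 reports indel concordance after left-normalizing and intervalizing coordinates by 20 bp. The sketch below first shifts a single indel left against a reference string, then counts matches within a position window. The `Indel` representation and the position-only matching rule are assumptions, not the poster's exact procedure. Production tools such as `bcftools norm` handle many more cases.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Indel:
    chrom: str
    pos: int   # 0-based; a deletion starts here, an insertion goes just before this base
    kind: str  # "ins" or "del"
    seq: str   # inserted or deleted bases


def left_normalize(indel: Indel, ref: str) -> Indel:
    """Shift an indel left while the preceding reference base equals its last base.
    Each shift rotates the indel sequence and leaves the edited sequence unchanged."""
    pos, seq = indel.pos, indel.seq
    while pos > 0 and ref[pos - 1] == seq[-1]:
        seq = seq[-1] + seq[:-1]
        pos -= 1
    return Indel(indel.chrom, pos, indel.kind, seq)


def match_rate(calls_a: List[Indel], calls_b: List[Indel],
               refs: Dict[str, str], window: int = 20) -> float:
    """Fraction of indels in calls_a with any call in calls_b on the same chromosome
    whose left-normalized position lies within `window` bp."""
    norm_b = [left_normalize(b, refs[b.chrom]) for b in calls_b]
    hits = 0
    for a in calls_a:
        na = left_normalize(a, refs[a.chrom])
        if any(nb.chrom == na.chrom and abs(nb.pos - na.pos) <= window for nb in norm_b):
            hits += 1
    return hits / len(calls_a) if calls_a else float("nan")


# Toy reference: a run of five Ts, then a repeating ACGT tail.
refs = {"chr1": "ACGTTTTTGCA" + "ACGT" * 20}
a_calls = [Indel("chr1", 7, "del", "T"), Indel("chr1", 60, "del", "C")]
b_calls = [Indel("chr1", 3, "del", "T"), Indel("chr1", 50, "del", "T")]
strict = match_rate(a_calls, b_calls, refs, window=0)  # exact normalized position only
loose = match_rate(a_calls, b_calls, refs)             # within 20 bp
print(strict, loose)
```

In the toy data, the deletion reported at position 7 only matches the one at position 3 after normalization. The deletion at position 60 only matches once the 20 bp window is allowed.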
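The conclusion on slide 24 ("run multiple algorithms") and the ">=k pipelines" rows of the slide 23 table both point to consensus calling. In that table, going from any pipeline to all five raises specificity from 99.62 to 99.87 while sensitivity falls from 97.72 to 80.61. A minimal sketch of k-of-n voting over per-pipeline call sets follows. Pipeline names and variant IDs are placeholders, and calls are assumed to be matched by exact identifier.

```python
from collections import Counter
from typing import Dict, Hashable, Set


def consensus_calls(calls: Dict[str, Set[Hashable]], min_pipelines: int) -> Set[Hashable]:
    """Variants reported by at least `min_pipelines` of the pipelines."""
    # Each pipeline contributes a set, so a variant's count equals the number of pipelines calling it.
    support = Counter(v for per_pipeline in calls.values() for v in per_pipeline)
    return {v for v, n in support.items() if n >= min_pipelines}


calls = {
    "pipeline_1": {"v1", "v2", "v3"},
    "pipeline_2": {"v1", "v2"},
    "pipeline_3": {"v1", "v4"},
}
for k in range(1, len(calls) + 1):
    print(k, sorted(consensus_calls(calls, k)))
```

Raising `min_pipelines` trades sensitivity for specificity, which is the same pattern the table shows.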