3.
What’s in the Talk
• GIAB in PrecisionFDA
• Datasets on DNAnexus
• Example 1: Comparing mapper + variant caller combinations
• Example 2: Assessing structural variation in AJ-Trio
11.
Benchmarking well-known bioinformatics aligners and
variant callers using the Pilot Genome (NA12878)
A. B. Diallo, A. Carroll, B. Hannigan, M. Kinsella, S. Ma, N. Thangaraj
DNAnexus, Mountain View, CA. Email: adiallo@dnanexus.com
Mappers
BWA is used for mapping sequences against a large reference genome, such as the
human genome. It performs very well for low-divergence sequences or reads.
Bowtie2 is a memory-efficient tool for aligning sequencing reads to long reference
sequences. It performs extremely well for read lengths between 50 bp and
1,000 bp.
ISAAC, developed by Illumina, is a DNA sequence aligner and variant caller suite
that uses high-memory hardware to improve efficiency and accuracy.
SNAP is a relatively new aligner that is as accurate as existing tools like BWA-MEM,
Bowtie2 and Novoalign. SNAP was developed by a team from the UC Berkeley AMP
Lab, Microsoft, and UCSF.
12.
Variant Callers
Atlas is a variant caller known for differentiating genuine SNPs and
indels from sequencing and mapping errors. It is mainly used for whole-exome
data.
FreeBayes is a haplotype-based Bayesian genetic variant caller designed to find
small polymorphisms, specifically SNPs, indels, MNPs and complex events smaller
than the length of a short-read sequencing alignment.
GATK HaplotypeCaller is one of the most popular variant callers. It calls SNPs and
indels simultaneously using local de novo assembly and a Bayesian statistical
model.
ISAAC, developed by Illumina, is a DNA sequence aligner and variant caller suite
that uses high-memory hardware to improve efficiency and accuracy.
Platypus is an efficient variant detection tool that can detect SNPs, MNPs, short
indels and replacements up to several kb.
14.
SNPs: SENSITIVITY (chart y-axis labeled "Percentage"; values are fractions)
          Atlas    FreeBayes  GATK     ISAAC    Platypus
Bowtie2   0.82350  0.94507    0.95825  0.94282  0.89522
BWA       0.97194  0.94550    0.98176  0.93319  0.91955
ISAAC     0.88343  0.93066    0.96390  0.95965  0.90659
SNAP      0.86781  0.93111    0.97531  0.96337  0.91221
15.
SNPs: SPECIFICITY (chart y-axis labeled "Percentage"; values are fractions)
          Atlas    FreeBayes  GATK     ISAAC    Platypus
Bowtie2   0.99333  0.98662    0.98946  0.98716  0.98762
BWA       0.98641  0.99229    0.98857  0.93550  0.99093
ISAAC     0.99296  0.99244    0.98332  0.98800  0.99128
SNAP      0.96294  0.98893    0.97712  0.97778  0.99114
16.
SNPs: AVERAGE Sensitivity and Specificity By Mappers (chart y-axis labeled "Percentage"; values are fractions)
             Bowtie2  BWA      ISAAC    SNAP
Sensitivity  0.91297  0.95039  0.92885  0.93980
Precision    0.98884  0.97874  0.98960  0.97958
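The per-mapper averages appear to be unweighted means across the five variant callers in the two SNP tables. The sketch below recomputes them under that assumption (the dictionary and function names are illustrative, not from the talk). Seven of the eight values reproduce to five decimals; the SNAP sensitivity mean comes out near 0.930 rather than the 0.93980 shown, and the slides alone do not say which figure is off.

```python
# SNP metrics transcribed from the two SNP tables.
# Column order: Atlas, FreeBayes, GATK, ISAAC, Platypus.
SNP_SENSITIVITY = {
    "Bowtie2": [0.82350, 0.94507, 0.95825, 0.94282, 0.89522],
    "BWA":     [0.97194, 0.94550, 0.98176, 0.93319, 0.91955],
    "ISAAC":   [0.88343, 0.93066, 0.96390, 0.95965, 0.90659],
    "SNAP":    [0.86781, 0.93111, 0.97531, 0.96337, 0.91221],
}
SNP_SPECIFICITY = {
    "Bowtie2": [0.99333, 0.98662, 0.98946, 0.98716, 0.98762],
    "BWA":     [0.98641, 0.99229, 0.98857, 0.93550, 0.99093],
    "ISAAC":   [0.99296, 0.99244, 0.98332, 0.98800, 0.99128],
    "SNAP":    [0.96294, 0.98893, 0.97712, 0.97778, 0.99114],
}


def mean_by_mapper(table):
    """Unweighted mean across variant callers for each mapper (assumed method)."""
    return {
        mapper: round(sum(values) / len(values), 5)
        for mapper, values in table.items()
    }


avg_sensitivity = mean_by_mapper(SNP_SENSITIVITY)
avg_specificity = mean_by_mapper(SNP_SPECIFICITY)

for mapper in SNP_SENSITIVITY:
    print(f"{mapper:8s} {avg_sensitivity[mapper]:.5f} {avg_specificity[mapper]:.5f}")
```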
17.
Indels: SENSITIVITY
          Atlas    FreeBayes  GATK     ISAAC    Platypus
Bowtie2   0.49795  0.78538    0.81924  0.81839  0.85304
BWA       0.76214  0.83467    0.79286  0.78319  0.87780
ISAAC     0.50326  0.73087    0.74374  0.78490  0.83327
SNAP      0.39509  0.71881    0.64430  0.09749  0.68149
18.
Indels: SPECIFICITY (chart y-axis labeled "Percentage"; values are fractions)
          Atlas    FreeBayes  GATK     ISAAC    Platypus
Bowtie2   0.78217  0.72941    0.62806  0.81787  0.65759
BWA       0.73470  0.85447    0.73998  0.69092  0.67424
ISAAC     0.62956  0.72843    0.85435  0.84755  0.66588
SNAP      0.57962  0.70933    0.64279  0.07105  0.36763
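One way to compare the 20 indel combinations on a single axis is the harmonic mean of the two metrics. This is only a sketch: it assumes the values labeled "specificity" behave like precision (the averaged SNP slide labels the same quantity "Precision"), in which case the harmonic mean is an F1 score. If they are true specificity, read the result as a rough balance score only. All names in the code are illustrative.

```python
# Indel metrics transcribed from the two indel tables.
CALLERS = ["Atlas", "FreeBayes", "GATK", "ISAAC", "Platypus"]
INDEL_SENSITIVITY = {
    "Bowtie2": [0.49795, 0.78538, 0.81924, 0.81839, 0.85304],
    "BWA":     [0.76214, 0.83467, 0.79286, 0.78319, 0.87780],
    "ISAAC":   [0.50326, 0.73087, 0.74374, 0.78490, 0.83327],
    "SNAP":    [0.39509, 0.71881, 0.64430, 0.09749, 0.68149],
}
INDEL_SPECIFICITY = {
    "Bowtie2": [0.78217, 0.72941, 0.62806, 0.81787, 0.65759],
    "BWA":     [0.73470, 0.85447, 0.73998, 0.69092, 0.67424],
    "ISAAC":   [0.62956, 0.72843, 0.85435, 0.84755, 0.66588],
    "SNAP":    [0.57962, 0.70933, 0.64279, 0.07105, 0.36763],
}


def harmonic_mean(a, b):
    """Harmonic mean of two non-negative scores; 0.0 if both are zero."""
    return 2 * a * b / (a + b) if (a + b) > 0 else 0.0


# Score every (mapper, caller) pair.
scores = {
    (mapper, caller): harmonic_mean(
        INDEL_SENSITIVITY[mapper][i], INDEL_SPECIFICITY[mapper][i]
    )
    for mapper in INDEL_SENSITIVITY
    for i, caller in enumerate(CALLERS)
}

# Rank from most to least balanced.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
best_combo, best_score = ranked[0]
worst_combo, worst_score = ranked[-1]

for (mapper, caller), score in ranked:
    print(f"{mapper:8s} {caller:10s} {score:.4f}")
```

Under this assumption the ranking rewards combinations that score well on both tables, rather than ones that lead on only one of them.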
19.
Mappers CPU-hours
           Bowtie2  BWA    ISAAC  SNAP
CPU-hours  308.3    236    94.4   102.7

Variant Callers CPU-hours
           Atlas  FreeBayes  GATK   ISAAC  Platypus
CPU-hours  270.6  60.8       436.4  37.9   10.9
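For a rough end-to-end cost per pipeline, the two CPU-hour tables can be added together. This sketch assumes that a caller's CPU-hours do not depend on which mapper produced its input, which the slides do not state; treat the totals as estimates. The names are illustrative.

```python
# CPU-hours transcribed from the two tables above.
MAPPER_CPU_HOURS = {"Bowtie2": 308.3, "BWA": 236.0, "ISAAC": 94.4, "SNAP": 102.7}
CALLER_CPU_HOURS = {
    "Atlas": 270.6,
    "FreeBayes": 60.8,
    "GATK": 436.4,
    "ISAAC": 37.9,
    "Platypus": 10.9,
}

# Estimated total per (mapper, caller) pipeline.
# Assumption: mapper and caller costs are independent and simply add.
total_cpu_hours = {
    (mapper, caller): round(mapper_hours + caller_hours, 1)
    for mapper, mapper_hours in MAPPER_CPU_HOURS.items()
    for caller, caller_hours in CALLER_CPU_HOURS.items()
}

cheapest = min(total_cpu_hours, key=total_cpu_hours.get)
costliest = max(total_cpu_hours, key=total_cpu_hours.get)

for (mapper, caller), hours in sorted(total_cpu_hours.items(), key=lambda kv: kv[1]):
    print(f"{mapper:8s} {caller:10s} {hours:7.1f}")
```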
22.
Baylor College of Medicine
Characterizing large genomic variants is essential to expanding the
research & clinical applications of genome sequencing.
Adam English, Will Salerno, Narayanan Veeraraghavan, Singer Ma, Andrew Carroll
26.
GIAB Inheritance Benchmarks
DNAnexus is working actively with Genome in a Bottle to help
develop high-quality benchmark datasets for structural variations
in the Ashkenazi Jewish Trio, applying Parliament to
combine Illumina and PacBio data alongside other techniques.