Avoiding Nonsense Results 
in your NGS Variant Studies 
James Lyons-Weiler, PhD 
Scientific Director/ 
Senior Research Scientist 
Bioinformatics Analysis Core 
Genomics & Proteomics Core Laboratories 
University of Pittsburgh 
Pittsburgh, PA 
May 1, 2014
Two Parts 
• Identifying sites with low genotypic signal 
increases concordance among variant callers 
• Hazards in finding differentially expressed 
genes in RNASeq – how to do it more robustly.
23andMe: High risk of RA and psoriasis 
GTL: Low risk of RA and psoriasis
NYTimes Article, etc.
Data were from Illumina HiSeq 2000
Among-method average concordance: 
57.5% overall; 
32.7% at high coverage 
(O’Rawe et al.)
Information Theory 
Consensus Analysis 
e.g., 2/3, 3/4, set analysis 
(-> modeling) 
Improve Callers 
(fix errors, modeling) Bake Offs 
LOW CONCORDANCE (O’Rawe et al., 2013) 
VARIANT CALLERS 
MAPPER 
SEQUENCER 
TRUTH (BIOLOGICAL MOLECULAR SEQUENCE) 
Simulations 
Spiked Ins
Entropy of Base Distributions 
[Figure: three bar plots of base counts (A, T, C, G). Panels 1 and 2: low entropy, high enthalpy. Panel 3: high entropy, low enthalpy.]
Boltzmann Entropy 
• s = k ln w (Planck) 
• w = antiln(s/k) 
http://schneider.ncifcrf.gov/images/boltzmann/boltzmann-tomb-4.html
Rank Sorted Distribution of w 
(O’Rawe et al. data) 
Heterozygotes w = 2 
Homozygotes w = 1
Example w Density Distribution
w and FBVC 
  A    T    C    G   w         pw      Zygosity       Genotype
200    0    0    0   1         0       Homozygote     AA
 16  158   13   13   2.102558  0       Homozygote     TT
100  100    0    0   2         0       Heterozygote   AT
 58   30    1  111   2.768507  0       Heterozygote   AG
 28   80   14   78   3.303636  0       Heterozygote   TG
 76   38   29   57   3.758733  0       Heterozygote   AG
 33   49   60   58   3.895496  0.0126  Heterozygote?  CG?
 50   50   50   50   4         1       noise          unknown
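The slides do not spell out how w is computed from the four base counts. A minimal sketch, assuming w is the exponential of the Shannon entropy of the base frequencies (that is, w = antiln(s/k) with k = 1): this reproduces w = 1, 2 and 4 for the clean homozygote, heterozygote and noise rows, but it is an assumption, and the intermediate values in the table need not match it exactly. The function name `effective_states` is hypothetical.

```python
import math


def effective_states(counts):
    """Return exp(Shannon entropy) of the base counts at one site.

    Assumption: this stands in for w on the slides, with k = 1 and
    natural logarithms. It is not confirmed to be the authors' formula.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads at this site")
    entropy = 0.0
    for n in counts:
        if n > 0:  # 0 * ln(0) is taken as 0
            p = n / total
            entropy -= p * math.log(p)
    return math.exp(entropy)


# Counts are in the order A, T, C, G, as in the table above.
w_hom = effective_states([200, 0, 0, 0])
w_het = effective_states([100, 100, 0, 0])
w_noise = effective_states([50, 50, 50, 50])
print(w_hom, w_het, w_noise)
```

Read this way, w behaves like an effective number of distinct bases at a site: close to 1 for a clean homozygote, close to 2 for a balanced heterozygote, and close to 4 when the reads carry no genotypic signal.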
Operational* 
Equiprobable Null Distribution 
{f(A) = f(T) = f(G) = f(C)}
Convergence 
of significance (pw)
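One way a significance value such as pw could be estimated against the equiprobable null is by simulation. The sketch below is an assumption and not the authors' procedure: it draws read counts with f(A) = f(T) = f(G) = f(C) at the observed depth and reports the fraction of simulated sites whose w is at most the observed w. The names `effective_states` and `pw_monte_carlo`, the exp-of-entropy definition of w, and the simulation count are all hypothetical.

```python
import math
import random


def effective_states(counts):
    """exp(Shannon entropy) of base counts; assumed stand-in for w."""
    total = sum(counts)
    entropy = 0.0
    for n in counts:
        if n > 0:
            p = n / total
            entropy -= p * math.log(p)
    return math.exp(entropy)


def pw_monte_carlo(counts, n_sim=2000, seed=0):
    """Estimate P(w_null <= w_observed) under equiprobable bases."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    depth = sum(counts)
    observed = effective_states(counts)
    hits = 0
    for _ in range(n_sim):
        sim = [0, 0, 0, 0]
        # Each read is A, T, C or G with probability 1/4.
        for base in rng.choices(range(4), k=depth):
            sim[base] += 1
        # Small tolerance guards against floating-point ties.
        if effective_states(sim) <= observed + 1e-12:
            hits += 1
    return hits / n_sim


pw_hom = pw_monte_carlo([200, 0, 0, 0])
pw_noise = pw_monte_carlo([50, 50, 50, 50])
print(pw_hom, pw_noise)
```

Under this reading, a small pw means the site is far more ordered than random reads would be, and the estimate stabilizes as `n_sim` grows.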
What We Expect 
INCREASED CONCORDANCE 
Genotypic Signal Filtering 
VARIANT/BASE CALLERS 
MAPPER 
SEQUENCER 
TRUTH (BIOLOGICAL MOLECULAR SEQUENCE)
Phom Function
From the O’Rawe et al. generated results 
FBVC = frequency-based variant caller (Lyons-Weiler et al.) 
Caller    Signal    Concordance w/ FBVC  Hom    Het
gatk      ALL       0.5762               11868  17670
gatk      pw<=0.05  0.9976               11282   5676
gatk      pw>0.05   0.0074                 586  11994
samtools  ALL       0.5649               11541  18799
samtools  pw<=0.05  0.9917               11489   5761
samtools  pw>0.05   0.0002                  52  13038
snver     ALL       0.6006               11904  16729
snver     pw<=0.05  0.9934               11812   5470
snver     pw>0.05   0.0007                  92  11259
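The stratification in this table (ALL, pw<=0.05, pw>0.05) can be sketched as follows. This is a minimal illustration under stated assumptions: the site records, their field names (`caller_a`, `caller_b`, `pw`) and the toy genotypes are hypothetical and are not taken from the O’Rawe et al. data.

```python
def concordance(sites):
    """Fraction of sites where both callers report the same genotype."""
    if not sites:
        return float("nan")
    agree = sum(1 for s in sites if s["caller_a"] == s["caller_b"])
    return agree / len(sites)


def stratified_concordance(sites, alpha=0.05):
    """Concordance over all sites and within each pw stratum."""
    strong = [s for s in sites if s["pw"] <= alpha]  # sufficient signal
    weak = [s for s in sites if s["pw"] > alpha]     # low signal
    return {
        "ALL": concordance(sites),
        "pw<=alpha": concordance(strong),
        "pw>alpha": concordance(weak),
    }


# Toy data, invented for illustration only.
toy_sites = [
    {"caller_a": "AA", "caller_b": "AA", "pw": 0.0},
    {"caller_a": "AT", "caller_b": "AT", "pw": 0.0},
    {"caller_a": "TT", "caller_b": "TT", "pw": 0.01},
    {"caller_a": "AG", "caller_b": "GG", "pw": 0.20},
    {"caller_a": "CG", "caller_b": "CC", "pw": 0.60},
    {"caller_a": "TG", "caller_b": "TG", "pw": 0.30},
]
result = stratified_concordance(toy_sites)
print(result)
```

In the toy data, as in the table, disagreements concentrate in the pw>0.05 stratum, so filtering on genotypic signal raises concordance among the retained sites.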
Comparison    Tx                                     Signal    %Concordance
FBVC_vs_FBVC  Marked                                 ALL       85.64
                                                     pw<=0.05  91.08
                                                     pw>0.05   35.66
FBVC_vs_FBVC  Realigned                              ALL       83.82
                                                     pw<=0.05  91.69
                                                     pw>0.05   28.21
FBVC_vs_FBVC  Recalibrated                           ALL       93.14
                                                     pw<=0.05  ***99.39
                                                     pw>0.05   48.53
FBVC_vs_FBVC  Reduced                                ALL       21.54
                                                     pw<=0.05  24.57
                                                     pw>0.05   4.25
FBVC_vs_FBVC  Marked-Realigned                       ALL       76.91
                                                     pw<=0.05  86.11
                                                     pw>0.05   15.44
FBVC_vs_FBVC  Marked-Realigned-Recalibrated          ALL       76.73
                                                     pw<=0.05  85.99
                                                     pw>0.05   15.34
FBVC_vs_FBVC  Marked-Realigned-Recalibrated-Reduced  ALL       19.98
                                                     pw<=0.05  22.9
                                                     pw>0.05   2.66
Information Theory 
Consensus Analysis 
e.g., 2/3, 3/4, set analysis 
(-> modeling) 
Improve Callers 
(fix errors, modeling) Bake Offs 
LOW CONCORDANCE (O’Rawe et al., 2013) 
VARIANT CALLERS 
MAPPER 
SEQUENCER 
TRUTH (BIOLOGICAL MOLECULAR SEQUENCE) 
Simulations 
Spiked Ins
Lifescope reads (red) 
Shrimp2 reads (blue) 
Mappers must be systematically evaluated
Part 2: Good and Bad News for 
RNASeq (and everything else): 
The Bad News: 
Fold Change is Biased. 
The Good News: 
We have identified a much less biased method.
T-test is not appropriate 
for small N, large P data 
(such as RNASeq)
Fold Change > 2.0 
Delta > 25
FC(A/B) is Blind to Large Portions 
of Your Data 
FC(A/B) 
Delta 
(and J5: Patel & Lyons-Weiler, 2004)
Ratios are Hard to Interpret as 
Biological Differences 
Gene   A      B      delta (A-B)  FC(A/B)
gene1  5      3      2            1.667
gene2  50     30     20           1.667
gene3  500    300    200          1.667
gene4  5000   3000   2000         1.667
gene5  50000  30000  20000        1.667
A-B is a difference 
A/B is a quotient.
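The contrast between the difference and the quotient can be checked directly on the five genes from the table. The sketch below applies the two cutoffs shown earlier in the deck (Fold Change > 2.0, Delta > 25); the variable names are illustrative, and the log2 column is included only to show that a log transform leaves the ratio identical across genes.

```python
import math

# (A, B) expression values from the slide's table.
genes = {
    "gene1": (5, 3),
    "gene2": (50, 30),
    "gene3": (500, 300),
    "gene4": (5000, 3000),
    "gene5": (50000, 30000),
}

FC_CUTOFF = 2.0     # "Fold Change > 2.0" from the slides
DELTA_CUTOFF = 25   # "Delta > 25" from the slides

rows = []
for name, (a, b) in genes.items():
    delta = a - b                # difference
    fc = a / b                   # quotient
    rows.append((name, delta, round(fc, 3), round(math.log2(fc), 3)))

passes_fc = [name for name, (a, b) in genes.items() if a / b > FC_CUTOFF]
passes_delta = [name for name, (a, b) in genes.items() if a - b > DELTA_CUTOFF]

for row in rows:
    print(row)
print("pass FC cutoff:", passes_fc)
print("pass delta cutoff:", passes_delta)
```

Every gene has the same fold change, so the ratio cannot distinguish a 2-unit change from a 20000-unit change, while the delta cutoff separates the three most highly expressed genes from the rest.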
Log2 Transformation 
Does not Help 
Reveals Minor Delta (&J5) Bias 
Pink = FC(A/B) 
Black = Delta
G-Thresholding J5
FC Bias in 
Amyotrophic Lateral Sclerosis 
[Figure: scatter plot of expression, Control vs. ALS; legend: DEGy, FCDEGy.]
Black circles = FC(A/B). Pink = Gthr-J5 genes
FC(A/B) Bias in 
Alcohol-Induced Hepatitis 
Black circles = FC(A/B). Pink = Gthr-J5 genes
Conclusions 
• Not all NGS/HTS sites have sufficient genotypic signal to warrant a 
base call. High coverage alone does not provide a solution. 
• By measuring genotypic signal, we can determine which sites we 
can call with confidence. 
• Fold change (FC(A/B)) is blind to highly expressed genes and should 
be abandoned as a measure of differential expression altogether, 
even for single-gene or single-protein studies! 
• Published microarray data sets analyzed to date using FC(A/B) only 
are a gold mine for re-analysis using less biased methods.
Credits and Contact 
• pw, pHom, etc: James Lyons-Weiler, Alan Twaddle, Rahil Sethi. 
– (MS in preparation) 
– Our software is called Gconf (not yet available) 
• Fold-Change Bias: James Lyons-Weiler, Tamanna Sultana, Rick 
Jordan, Rahil Sethi 
– (Paper in review) 
– For now, read 
• Mariani TJ, Budhraja V, Mecham BH, Gu CC, Watson MA, Sadovsky Y. 2003. A 
variable fold change threshold determines significance for expression microarrays. 
FASEB J. 17:321-3. doi: 10.1096/fj.02-0351fje 
• Pearson, K. 1897. On a form of spurious correlation that may arise when indices are 
used for the measurement of organs. Proc Roy Soc Lond 60:489-498 doi: 
10.1098/rspl.1896.0076
