Single Nucleotide Polymorphism
Analysis
Asst. Prof. Vitara Pungpapong, Ph.D.
Department of Statistics
Faculty of Commerce and Accountancy
Chulalongkorn University
Outline
• What is a SNP array?
• Typical SNP analysis
• Challenges
• The ICM/M Method
• Results
Microarray
• Usually known as ChIP-chip.
• First publication in 1999
• Each known gene is one spot on the
chip.
• Laser induced fluorescence (LIF) is used to
obtain color and intensity of each gene.
• Varying colors show varying levels of gene
activity.
• A microarray chip can contain 10,000 –
20,000 genes.
Single Nucleotide Polymorphism
• Usually called ChIP-seq or SNP.
(https://en.wikipedia.org/wiki/Single-nucleotide_polymorphism)
Microarray vs SNPs
• Microarray is more suitable for small genomes
• More bias in microarray
• SNPs generally produce profiles with a better
signal-to-noise ratio, and allow detection of
more peaks and narrower peaks.
• SNPs generate more high-throughput data (>
1Tb), which requires more effort in analysis.
1000 Genomes Project
• http://www.1000genomes.org/
• The 1000 Genomes Project provides the largest public catalog of
human genetic variation.
• The project ran from 2008 to 2015.
• The human genome consists of approximately 3 billion DNA
base pairs and is estimated to carry around 20,000 protein
coding genes.
• The samples for the 1000 Genomes Project are anonymous
and have no associated medical or phenotype data.
• The project holds self-reported ethnicity and gender.
• All participants declared themselves to be healthy at the time
the samples were collected.
1000 Genomes Project
(http://www.1000genomes.org/)
dbGaP database
• http://www.ncbi.nlm.nih.gov/gap
Maize Genome
• http://www.panzea.org/
Genome-wide Association Study (GWAS)
Goal:
Identify genetic variants associated with phenotype of interest.
Typical (Simple) GWAS Analysis
Preprocessing Data → Univariate Analysis → Controlling FWER or FDR
GWAS Gold Standard: p < 5 × 10^-8
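The univariate-analysis and error-control steps of this pipeline can be sketched as follows. This is a minimal illustration on synthetic data, not the procedure used in the talk: the function names, the large-sample normal approximation for the test statistic, and the simulated genotypes are all assumptions made for the example.

```python
import math
import numpy as np

def univariate_pvalues(X, y):
    """Marginal test of each SNP against the phenotype (normal approximation)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Pearson correlation between each SNP column and the phenotype
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    # t statistic for H0: no association; approximately N(0, 1) for large n
    t = r * np.sqrt((n - 2) / (1.0 - r**2))
    # Two-sided normal tail probability: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])

def bonferroni(pvals, alpha=0.05):
    """Control the FWER: reject when p < alpha / (number of tests)."""
    return pvals < alpha / len(pvals)

def benjamini_hochberg(pvals, q=0.05):
    """Control the FDR with the Benjamini-Hochberg step-up procedure."""
    m = len(pvals)
    order = np.argsort(pvals)
    passed = pvals[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])  # largest rank meeting the bound
        reject[order[: k + 1]] = True
    return reject

rng = np.random.default_rng(0)
n, p = 500, 200
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # synthetic minor-allele counts
y = 1.0 * X[:, 0] + rng.normal(size=n)               # only SNP 0 is causal here
pvals = univariate_pvalues(X, y)
fwer_hits = bonferroni(pvals)
fdr_hits = benjamini_hochberg(pvals)
```

As a point of arithmetic, a Bonferroni bound of 0.05 spread over one million tests gives 0.05 / 10^6 = 5 × 10^-8, the same value as the gold-standard threshold.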
Preprocessing Data in GWAS
• SNP Call Rate (98-99%)
• Sample Call Rate (98-99%)
• Data Imputation
• Minor Allele Frequency (Remove extremely rare
SNPs, i.e., <5% frequency)
• Hardy-Weinberg Equilibrium
• Recode SNPs to the count of minor allele (0, 1, 2)
• For more information, refer to Turner et al.
(2011).
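A hedged sketch of these preprocessing filters is given below. The call-rate and minor-allele-frequency cutoffs follow the values listed above; the Hardy-Weinberg cutoff (`hwe_alpha=1e-6`), the function names, and the synthetic genotype matrix are assumptions for illustration, and the imputation step is left out.

```python
import math
import numpy as np

def hwe_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p
    expected = np.array([n * p * p, 2 * n * p * q, n * q * q])
    observed = np.array([n_aa, n_ab, n_bb], dtype=float)
    chi2 = np.sum((observed - expected) ** 2 / expected)
    return math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function, 1 df

def qc_filter(G, snp_call=0.98, sample_call=0.98, maf_min=0.05, hwe_alpha=1e-6):
    """G: samples x SNPs matrix of allele-B counts (0, 1, 2), NaN for missing calls."""
    keep_snp = np.mean(~np.isnan(G), axis=0) >= snp_call        # SNP call rate
    G = G[:, keep_snp]
    keep_sample = np.mean(~np.isnan(G), axis=1) >= sample_call  # sample call rate
    G = G[keep_sample]
    freq_b = np.nanmean(G, axis=0) / 2.0
    maf = np.minimum(freq_b, 1.0 - freq_b)
    hwe = np.array([hwe_pvalue(np.sum(g == 0), np.sum(g == 1), np.sum(g == 2))
                    for g in G.T])
    ok = (maf >= maf_min) & (hwe >= hwe_alpha)
    G, freq_b = G[:, ok], freq_b[ok]
    # Recode so that every column counts the minor allele
    flip = freq_b > 0.5
    G[:, flip] = 2.0 - G[:, flip]
    return G

rng = np.random.default_rng(1)
G = rng.binomial(2, 0.3, size=(400, 50)).astype(float)
G[:, 0] = rng.binomial(2, 0.01, size=400)   # rare SNP, expected to fail the MAF filter
G[:, 1] = rng.binomial(2, 0.8, size=400)    # allele B is the major allele here
G[rng.random(400) < 0.2, 2] = np.nan        # SNP with roughly 20% missing calls
clean = qc_filter(G)
```

The filters are applied in the order SNP call rate, sample call rate, then MAF and HWE; other orderings are possible and may keep slightly different sets.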
Biological Pathways
Databases:
- KEGG (http://www.genome.jp/kegg/pathway.html)
- Ingenuity (http://www.ingenuity.com/)
- etc.
Challenges in GWAS
- Want to incorporate biological pathways in GWAS
- Want to analyze all SNPs at once
High-dimensional Regression
• Regression with n < p
• Challenges in high-dimensional regression
– Large p small n problem
– Multicollinearity
– Sparsity
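The "large p, small n" problem can be seen in a small numerical sketch: when n < p the matrix X'X is singular, so least squares has infinitely many exact solutions. The dimensions and the synthetic data below are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 100
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]            # sparse truth: 3 nonzero effects
y = X @ beta_true + rng.normal(scale=0.1, size=n)

rank = np.linalg.matrix_rank(X.T @ X)       # at most n, so X'X is singular
beta_min_norm = np.linalg.pinv(X) @ y       # one of infinitely many exact fits
# Any vector in the null space of X can be added without changing the fit
null_vec = np.linalg.svd(X)[2][-1]          # right singular vector with X @ v = 0
beta_other = beta_min_norm + 5.0 * null_vec
```

Both `beta_min_norm` and `beta_other` reproduce `y` exactly, which is why a sparsity-inducing prior or penalty is needed to single out one solution.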
Bayesian Model Setup
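Consider a normal regression model:
$$\mathbf{Y} = \mathbf{X}\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n)$$
Prior to capture sparsity in the regression coefficients:
$$\beta_j \mid \tau_j \sim (1 - \tau_j)\,\delta_0(\beta_j) + \tau_j\,\gamma_\alpha(\beta_j \mid \sigma)$$
where $\delta_0(\cdot)$ is a Dirac delta function at zero, $\tau_j = 1_{\{\beta_j \neq 0\}}$, and
$$\gamma_\alpha(\beta_j \mid \sigma) = \frac{\alpha\sqrt{n-1}}{2\sigma} \exp\left(-\frac{\alpha\sqrt{n-1}}{\sigma}\,|\beta_j|\right)$$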
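To make the prior concrete, here is a small sketch that evaluates the Laplace slab density and draws coefficients given the indicators. It assumes the slab rate is α√(n−1)/σ; the function names and the parameter values in the example are hypothetical.

```python
import numpy as np

def laplace_slab_density(beta, sigma, alpha, n):
    """gamma_alpha(beta | sigma): Laplace density with rate alpha * sqrt(n - 1) / sigma."""
    rate = alpha * np.sqrt(n - 1) / sigma
    return 0.5 * rate * np.exp(-rate * np.abs(beta))

def sample_prior(tau, sigma, alpha, n, rng):
    """Draw beta_j given tau_j: exactly 0 when tau_j = 0, Laplace slab when tau_j = 1."""
    tau = np.asarray(tau)
    scale = sigma / (alpha * np.sqrt(n - 1))   # Laplace scale = 1 / rate
    slab = rng.laplace(loc=0.0, scale=scale, size=len(tau))
    return np.where(tau == 1, slab, 0.0)

rng = np.random.default_rng(6)
tau = np.array([0, 1, 0, 1])
beta_draw = sample_prior(tau, sigma=1.0, alpha=0.5, n=101, rng=rng)
```

Coefficients with τ_j = 0 are exactly zero, which is how the prior encodes sparsity.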
Bayesian Model Setup
• The Ising model is employed to model the relationships among
SNPs.
• The Ising model assumes that the relationship lies in an
undirected graph G = (V, E) where V is a set of vertices and E is
a set of edges.
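• The Ising prior for $\tau = (\tau_1, \dots, \tau_p)^t$, where $\tau_j = 1_{\{\beta_j \neq 0\}}$:
$$P(\tau) = \frac{1}{Z(a, b)} \exp\left(a \sum_{j} \tau_j + b \sum_{\langle j,k \rangle \in E} \tau_j \tau_k\right)$$
[Figure: example undirected graph with nodes τ1, τ2, τ3, τ4, τ5]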
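The Ising prior implies a simple conditional distribution for each indicator: given its neighbors, τ_j = 1 with probability logistic(a + b × number of active neighbors). The sketch below checks this numerically on a small graph; the edge list, the values of a and b, and the function names are hypothetical.

```python
import numpy as np

def ising_log_unnormalized(tau, edges, a, b):
    """a * sum_j tau_j + b * sum over edges of tau_j * tau_k (log Z(a, b) omitted)."""
    tau = np.asarray(tau, dtype=float)
    pair_sum = sum(tau[j] * tau[k] for j, k in edges)
    return a * tau.sum() + b * pair_sum

def conditional_prob_active(j, tau, edges, a, b):
    """P(tau_j = 1 | neighbors of j): logistic in the number of active neighbors."""
    active_neighbors = (sum(tau[k] for u, k in edges if u == j)
                        + sum(tau[u] for u, k in edges if k == j))
    return 1.0 / (1.0 + np.exp(-(a + b * active_neighbors)))

edges = [(0, 1), (0, 2), (2, 3), (2, 4)]   # hypothetical 5-node undirected graph
a, b = -2.0, 1.0
tau = [1, 0, 1, 1, 0]
p2 = conditional_prob_active(2, tau, edges, a, b)
```

Because the conditional depends only on the neighbors, the intractable normalizing constant Z(a, b) never has to be computed when working with one indicator at a time.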
The ICM/M Algorithm
• Pungpapong et al. (2015).
• Idea: The conditional distributions are used to estimate the
parameters.
• The ICM/M consists of two main parts:
– Conditional median for each regression coefficient
– Conditional mode for hyperparameters and auxiliary parameters
The ICM/M Algorithm
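• Obtain an initial estimate $(\beta, \sigma^2)$.
• Obtain $\tau = (\tau_1, \dots, \tau_p)^t$, where $\tau_j = 1_{\{\beta_j \neq 0\}}$.
• $(a, b) = \operatorname{mode} \prod_{j=1}^{p} P(\tau_j \mid \tau_{-j}; a, b) = \operatorname{mode} \prod_{j=1}^{p} P(\tau_j \mid \{\tau_k : \langle j,k \rangle \in E\}; a, b)$
• $\beta_j = \operatorname{median}(\beta_j \mid \mathbf{Y}, \mathbf{X}, \beta_{-j}, \sigma^2, a, b)$, $j = 1, \dots, p$,
where $\beta_{-j} = (\beta_1, \dots, \beta_{j-1}, \beta_{j+1}, \dots, \beta_p)$.
• $\sigma^2 = \operatorname{mode}(\sigma^2 \mid \mathbf{Y}, \mathbf{X}, \beta, a, b)$
• Convergence in $\boldsymbol{\beta}$? If yes, stop; otherwise repeat the updates.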
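The (a, b) update can be illustrated on its own. Maximizing the product of the conditionals P(τ_j | neighbors; a, b) is equivalent to a logistic regression of each τ_j on its count of active neighbors. The sketch below solves it with Newton's method, which is a choice made for this example rather than the solver from Pungpapong et al. (2015); the ring graph, the Gibbs-sampled indicators, and all function names are hypothetical. The conditional-median update for β_j is not shown.

```python
import numpy as np

def neighbor_sums(tau, edges):
    """s_j = number of active neighbors of node j."""
    s = np.zeros(len(tau))
    for j, k in edges:
        s[j] += tau[k]
        s[k] += tau[j]
    return s

def fit_ising_pseudolikelihood(tau, edges, n_iter=50, ridge=1e-8):
    """Maximize sum_j log P(tau_j | neighbors; a, b) over (a, b) by Newton steps."""
    tau = np.asarray(tau, dtype=float)
    D = np.column_stack([np.ones(len(tau)), neighbor_sums(tau, edges)])
    theta = np.zeros(2)                      # theta = (a, b)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(D @ theta)))
        grad = D.T @ (tau - prob)
        # Negative Hessian of the log pseudo-likelihood, lightly stabilized
        hess = (D * (prob * (1.0 - prob))[:, None]).T @ D + ridge * np.eye(2)
        step = np.linalg.solve(hess, grad)
        theta = theta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return theta

rng = np.random.default_rng(3)
p = 300
edges = [(j, (j + 1) % p) for j in range(p)]     # hypothetical ring graph
tau = rng.integers(0, 2, size=p).astype(float)
for _ in range(200):                             # Gibbs sweeps under a = -1, b = 0.8
    for j in range(p):
        s_j = tau[j - 1] + tau[(j + 1) % p]
        prob_j = 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * s_j)))
        tau[j] = float(rng.random() < prob_j)
a_hat, b_hat = fit_ising_pseudolikelihood(tau, edges)
```

In the full algorithm this step would be run on the indicators derived from the current coefficient estimates, not on simulated ones.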
Generalized Linear Models (GLMs)
Iteratively Reweighted Least Squares
Extension of the ICM/M to GLMs
• Borrows the idea of iteratively reweighted least squares
(IRLS).
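For reference, plain IRLS for logistic regression can be sketched as below. This shows only the standard IRLS iteration that the extension borrows from, not the ICM/M extension itself; the function name, tolerances, and synthetic data are assumptions.

```python
import numpy as np

def irls_logistic(X, y, n_iter=25, tol=1e-8):
    """Fit logistic regression by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(mu * (1.0 - mu), 1e-10, None)   # working weights
        z = eta + (y - mu) / w                      # working response
        WX = X * w[:, None]
        # Weighted least squares: solve (X' W X) beta = X' W z
        beta_new = np.linalg.solve(X.T @ WX, X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

rng = np.random.default_rng(5)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-0.5, 1.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)
beta_hat = irls_logistic(X, y)
```

Each pass replaces the GLM fit with a weighted normal-regression problem on the working response `z`, which is what makes methods built for the normal model reusable inside the loop.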
Simulation Studies
• A total of 1,782 SNPs were randomly selected from
the Framingham dataset (Cupples et al. 2007).
• 24 human regulatory pathways involving 1,502 genes were
retrieved from the KEGG database.
• 311 SNPs involved in 5 pathways were assumed to
have nonzero effects, with effect sizes
randomly generated from Uniform[0.5, 3].
• Phenotypes were simulated from the normal
regression model with error variance = 5.
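The phenotype-generation step of this design can be sketched as follows. The Framingham genotypes and KEGG pathways are not reproduced here: the genotype matrix is a synthetic stand-in, the causal SNPs are picked at random instead of from 5 pathways, and the sample size `n` is an assumption, since it is not stated above.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, n_causal = 500, 1782, 311                        # n is assumed; p and 311 as above
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)    # synthetic stand-in genotypes
causal = rng.choice(p, size=n_causal, replace=False)   # stand-in for SNPs in 5 pathways
beta = np.zeros(p)
beta[causal] = rng.uniform(0.5, 3.0, size=n_causal)    # effect sizes from Uniform[0.5, 3]
# Normal regression model with error variance = 5
y = X @ beta + rng.normal(scale=np.sqrt(5.0), size=n)
```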
Simulation Studies
• Results
Method           Prediction Error   False Positive   False Negative
Lasso            30.7 (.41)         .69 (.0004)      .02 (.0004)
Adaptive Lasso   206.2 (.57)        .07 (.0017)      .13 (.0002)
ICM/M            21.7 (.23)         .03 (.0015)      .04 (.0003)
Framingham Data Analysis
• Dataset: Framingham Heart Study (Cupples et al. 2007)
• Phenotype: log transformation of vitamin D level
• Sample size: 952 for the training set and 519 for the test set
• The gene-pathway information relevant to vitamin D
level was obtained from the KEGG database
• There are 84,834 SNPs residing in 2,167 genetic regions in
112 pathways.
• Univariate tests were applied as a screening step,
leaving 7,824 SNPs for the analysis.
Framingham Data Analysis
• Prediction errors and number of identified SNPs
Method           Prediction Error   No. of Identified SNPs
Lasso            .2560              14
Adaptive Lasso   .2085              5
ICM/M            .2121              5
Framingham Data Analysis
Chromosome - SNP            1-3887   4-0894   4-1174   5-2773   8-5143   17-3907   17-9089
β         Lasso             .0412    0        .0355    .0402    0        0         0
          Adaptive Lasso    .1521    0        .0434    .1539    -.0200   0         .0167
          ICM/M             .2417    -.0512   0        .3047    -.0857   .1093     0
P-value*  Lasso             .2694    1        1        .6050    1        1         1
          Adaptive Lasso    .2060    1        1        .0031    1        1         1
          ICM/M             .0837    1        1        .0034    1        1         1
* From the multi-sample split method (Meinshausen et al. (2009))
Parkinson’s Disease Data Analysis
• Data come from 3 different studies on PD
– Autopsy-Confirmed Parkinson Disease GWAS Consortium
(APDGC) (dbGaP Study Accession: phs000394.v1.p1)
– Genome-Wide Association Study of Parkinson Disease:
Genes and Environment (dbGaP Study Accession:
phs000196.v2.p1)
– NINDS-Genome-Wide Genotyping in Parkinson's Disease:
First Stage Analysis and Public Release of Data (n=1741)
(dbGaP Study Accession: phs000089.v3.p2)
• The three data sets were combined, keeping the overlapping
SNPs (n = 6,704, p = 888,398).
Parkinson’s Disease Data Analysis
• Pathways related to PD were retrieved from Ingenuity© IPA.
Parkinson’s Disease Data Analysis
• ICM/M found 46 SNPs having nonzero
regression coefficients across 22
chromosomes.
• 8 genes known to be related to PD were identified (e.g.,
TLR4, TNF, …).
References
• Cupples, L. A. et al. (2007). The Framingham Heart Study 100K SNP genome-wide
association study resource: Overview of 17 phenotype working group
reports. BMC Medical Genetics, 8(Suppl 1):S1.
• Ho et al. (2011). ChIP-chip versus ChIP-seq: Lessons for experimental
design and data analysis. BMC Genomics, 12:134.
• Meinshausen et al. (2009). P-values for high-dimensional regression.
Journal of the American Statistical Association, 104:1671–1681.
• Pungpapong et al. (2015). Selecting massive variables using an iterated
conditional modes/medians algorithm. Electronic Journal of Statistics,
9:1243–1266.
• Turner, S. et al. (2011). Quality control procedures for genome-wide association
studies. Curr Protoc Hum Genet, 68:1–19.1.18.

 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 

SNP Analysis Pathway ID

  • 1. Single Nucleotide Polymorphism Analysis. Asst. Prof. Vitara Pungpapong, Ph.D., Department of Statistics, Faculty of Commerce and Accountancy, Chulalongkorn University
  • 2. Outline
      • What is a SNP array?
      • Typical SNP analysis
      • Challenges
      • The ICM/M Method
      • Results
  • 3. Microarray
      • Usually known as ChIP-chip.
      • First publication in 1999.
      • Each known gene is one spot on the chip.
      • Laser-induced fluorescence (LIF) is used to obtain the color and intensity of each gene.
      • Varying colors show varying levels of gene activity.
      • A microarray chip can contain 10,000–20,000 genes.
  • 4. Single Nucleotide Polymorphism (https://en.wikipedia.org/wiki/Single-nucleotide_polymorphism)
      • Usually called ChIP-seq or SNP.
  • 5. Microarray vs SNPs
      • Microarrays are more suitable for small genomes.
      • Microarrays show more bias.
      • SNPs generally produce profiles with a better signal-to-noise ratio and allow detection of more and narrower peaks.
      • SNPs generate more high-throughput data (> 1 Tb), which requires more effort in analysis.
  • 6. 1000 Genomes Project
      • http://www.1000genomes.org/
      • The 1000 Genomes Project provides the largest public catalog of human genetic variation.
      • The project ran from 2008 and was completed in 2015.
      • The human genome consists of approximately 3 billion DNA base pairs and is estimated to carry around 20,000 protein-coding genes.
      • The samples for the 1000 Genomes Project are anonymous and have no associated medical or phenotype data.
      • The project holds self-reported ethnicity and gender.
      • All participants declared themselves to be healthy at the time the samples were collected.
  • 7. 1000 Genomes Project (http://www.1000genomes.org/)
  • 8. dbGaP database
      • http://www.ncbi.nlm.nih.gov/gap
  • 10. Genome-wide Association Study (GWAS)
      • Goal: Identify genetic variants associated with the phenotype of interest.
  • 11. Typical (Simple) GWAS Analysis
      • Preprocessing data → Univariate analysis → Controlling FWER or FDR
      • GWAS gold standard: $5 \times 10^{-8}$
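The error-control step on this slide can be sketched as follows. This is a minimal illustration, assuming the per-SNP p-values from the univariate tests are already available; the function names `bonferroni` and `benjamini_hochberg` and the example p-values are hypothetical and not taken from the slides. Bonferroni is one common way to control the FWER and Benjamini-Hochberg one common way to control the FDR. The $5 \times 10^{-8}$ threshold is often motivated as a Bonferroni correction of 0.05 for roughly one million independent tests.

```python
def bonferroni(pvalues, alpha=0.05):
    """Indices of tests rejected while controlling the FWER at alpha."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / m]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of tests rejected while controlling the FDR at alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff])


# Hypothetical p-values for five SNPs.
pvals = [1e-9, 0.004, 0.019, 0.2, 0.8]
fwer_hits = bonferroni(pvals)
fdr_hits = benjamini_hochberg(pvals)
```

With these illustrative inputs the FDR procedure rejects one more hypothesis than Bonferroni, which reflects the usual trade-off: FDR control is less conservative than FWER control.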
  • 12. Preprocessing Data in GWAS
      • SNP call rate (98–99%)
      • Sample call rate (98–99%)
      • Data imputation
      • Minor allele frequency (remove extremely rare SNPs, i.e., < 5% frequency)
      • Hardy-Weinberg equilibrium
      • Recode SNPs as the count of the minor allele (0, 1, 2)
      • For more information, refer to Turner et al. (2011).
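The per-SNP filters listed on this slide can be sketched in a few lines. This is a minimal sketch under stated assumptions: genotypes are coded 0/1/2 with `None` for a missing call, the call-rate and MAF cut-offs follow the slide, and the Hardy-Weinberg cut-off (a 1-df chi-square value of 23.93, roughly p = 1e-6) is an illustrative choice that the slides do not specify. All function names are hypothetical.

```python
def call_rate(genotypes):
    """Fraction of non-missing calls (missing calls are coded as None)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Minor allele frequency from 0/1/2 allele counts, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def hwe_chi_square(genotypes):
    """Chi-square statistic (1 df) for departure from Hardy-Weinberg equilibrium."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    q = sum(called) / (2 * n)  # frequency of the counted allele
    p = 1 - q
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [called.count(k) for k in (0, 1, 2)]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def passes_qc(genotypes, min_call_rate=0.98, min_maf=0.05, max_hwe_chi2=23.93):
    """Apply the three per-SNP filters; the HWE cut-off is an assumed value."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf
            and hwe_chi_square(genotypes) <= max_hwe_chi2)


# Hypothetical SNPs typed on 100 samples.
common_snp = [0] * 64 + [1] * 32 + [2] * 4   # MAF 0.2, exactly in HWE
rare_snp = [0] * 98 + [1] * 2                # MAF 0.01, fails the MAF filter
sparse_snp = [None] * 5 + [0] * 60 + [1] * 31 + [2] * 4  # call rate 0.95
```

The sample call rate is the same calculation applied across SNPs for one individual, so `call_rate` can be reused on the transposed genotype matrix.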
  • 13. Biological Pathways
      • Databases:
          – KEGG (http://www.genome.jp/kegg/pathway.html)
          – Ingenuity (http://www.ingenuity.com/)
          – etc.
  • 14. Challenges in GWAS
      • Incorporate biological pathway information into GWAS.
      • Analyze all SNPs at once.
  • 15. High-dimensional Regression
      • Regression with n < p
      • Challenges in high-dimensional regression
          – Large p, small n problem
          – Multicollinearity
          – Sparsity
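  • 16. Bayesian Model Setup
      • Consider a normal regression model: $\mathbf{Y} = \mathbf{X}\beta + \varepsilon$, $\varepsilon \sim N(0, \sigma^2 I_n)$.
      • Prior to capture sparsity in the regression coefficients: $\beta_j \mid \tau_j \sim (1 - \tau_j)\,\delta_0(\beta_j) + \tau_j\,\gamma_\alpha(\beta_j \mid \sigma)$,
        where $\delta_0(\cdot)$ is a Dirac delta function at zero, $\tau_j = 1_{\{\beta_j \neq 0\}}$, and
        $\gamma_\alpha(\beta_j \mid \sigma) = \frac{\alpha\sqrt{n-1}}{2\sigma} \exp\left(-\frac{\alpha\sqrt{n-1}}{\sigma}\,|\beta_j|\right)$.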
  • 18. Bayesian Model Setup
      • The Ising model is employed to model the relationship among SNPs.
      • The Ising model assumes that the relationship lies in an undirected graph $G = (V, E)$, where $V$ is a set of vertices and $E$ is a set of edges.
      • The Ising prior for $\tau = (\tau_1, \ldots, \tau_p)^t$, where $\tau_j = 1_{\{\beta_j \neq 0\}}$:
        $P(\tau) = \frac{1}{Z(a, b)} \exp\Big( a \sum_{j} \tau_j + b \sum_{\langle j,k \rangle \in E} \tau_j \tau_k \Big)$
      • [Figure: example undirected graph on nodes $\tau_1, \ldots, \tau_5$]
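Because each $\tau_j$ is a 0/1 indicator, the Ising prior on this slide implies that the conditional distribution of one indicator depends only on its neighbours in the graph: $P(\tau_j = 1 \mid \tau_{-j}) = 1 / (1 + \exp(-(a + b \sum_{k: \langle j,k \rangle \in E} \tau_k)))$. The sketch below checks this closed form against the unnormalized joint prior. The five-node chain graph, the values of `a` and `b`, and the function names are hypothetical choices for illustration; the slide's actual graph is not recoverable from the transcript.

```python
import math


def log_unnormalized_prior(tau, edges, a, b):
    """log P(tau) + log Z(a, b): a * sum_j tau_j + b * sum over edges of tau_j * tau_k."""
    return a * sum(tau) + b * sum(tau[j] * tau[k] for j, k in edges)


def conditional_prob_one(j, tau, edges, a, b):
    """P(tau_j = 1 | all other indicators); only the neighbours of j matter."""
    neighbour_sum = sum(tau[k] for u, k in edges if u == j)
    neighbour_sum += sum(tau[u] for u, k in edges if k == j)
    return 1.0 / (1.0 + math.exp(-(a + b * neighbour_sum)))


# Hypothetical chain graph on five indicators and illustrative parameters.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
a, b = -2.0, 1.5
tau = [1, 0, 1, 1, 0]

# Brute-force check for node 1: compare the two joint configurations directly.
j = 1
tau_on = tau[:j] + [1] + tau[j + 1:]
tau_off = tau[:j] + [0] + tau[j + 1:]
w_on = math.exp(log_unnormalized_prior(tau_on, edges, a, b))
w_off = math.exp(log_unnormalized_prior(tau_off, edges, a, b))
brute_force = w_on / (w_on + w_off)
closed_form = conditional_prob_one(j, tau, edges, a, b)
```

The normalizing constant $Z(a, b)$ cancels in the ratio, which is why the conditional probabilities are cheap to evaluate even when $Z(a, b)$ itself is intractable for large graphs.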
  • 19. The ICM/M Algorithm
      • Pungpapong et al. (2015).
      • Idea: Use the conditional distributions to obtain the parameter estimates.
      • The ICM/M consists of two main parts:
          – Conditional median for each regression coefficient
          – Conditional mode for hyperparameters and auxiliary parameters
  • 20. The ICM/M Algorithm
      • Obtain an initial estimate $(\hat\beta, \hat\sigma^2)$.
      • Obtain $\hat\tau = (\hat\tau_1, \ldots, \hat\tau_p)^t$, where $\hat\tau_j = 1_{\{\hat\beta_j \neq 0\}}$.
      • $(\hat a, \hat b) = \operatorname{mode} \prod_{j=1}^{p} P(\hat\tau_j \mid \hat\tau_{-j}; a, b) = \operatorname{mode} \prod_{j=1}^{p} P(\hat\tau_j \mid \{\hat\tau_k : \langle \tau_j, \tau_k \rangle \in E\}; a, b)$
      • $\hat\beta_j = \operatorname{median}(\beta_j \mid \mathbf{Y}, \mathbf{X}, \hat\beta_{-j}, \hat\sigma^2, \hat a, \hat b)$, $j = 1, \ldots, p$, where $\hat\beta_{-j} = (\hat\beta_1, \ldots, \hat\beta_{j-1}, \hat\beta_{j+1}, \ldots, \hat\beta_p)$.
      • $\hat\sigma^2 = \operatorname{mode}(\sigma^2 \mid \mathbf{Y}, \mathbf{X}, \hat\beta, \hat a, \hat b)$
      • Convergence in $\beta$? If yes, stop; otherwise repeat the updates.
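The update of $(a, b)$ in the algorithm maximizes a product of the conditional probabilities $P(\tau_j \mid \tau_{-j}; a, b)$, which under the Ising prior is a logistic-type pseudo-likelihood in the neighbour sums. The sketch below illustrates that single step only. It assumes no additional prior on $(a, b)$ and uses plain gradient ascent, which is a stand-in optimizer chosen for brevity and not necessarily what Pungpapong et al. (2015) use; the ring graph, the indicator vector, and the function names are hypothetical.

```python
import math


def neighbour_sums(tau, edges):
    """For each node j, the sum of tau_k over its neighbours k in the graph."""
    sums = [0] * len(tau)
    for j, k in edges:
        sums[j] += tau[k]
        sums[k] += tau[j]
    return sums


def fit_ising_pseudolikelihood(tau, edges, step=0.1, n_iter=5000):
    """Maximize sum_j log P(tau_j | neighbours; a, b) by gradient ascent."""
    s = neighbour_sums(tau, edges)
    a = b = 0.0
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for t_j, s_j in zip(tau, s):
            # Conditional probability that tau_j = 1 given its neighbours.
            p_j = 1.0 / (1.0 + math.exp(-(a + b * s_j)))
            grad_a += t_j - p_j
            grad_b += (t_j - p_j) * s_j
        a += step * grad_a
        b += step * grad_b
    return a, b


# Hypothetical ring graph on 8 SNPs and a current indicator vector.
edges = [(j, (j + 1) % 8) for j in range(8)]
tau = [1, 1, 0, 0, 1, 0, 0, 0]
a_hat, b_hat = fit_ising_pseudolikelihood(tau, edges)
```

In this toy example the neighbour sums take only the values 0 and 1, so the maximizer can be checked by hand: half of the nodes with neighbour sum 0 are active and one third of those with neighbour sum 1 are, giving $\hat a = 0$ and $\hat b = \log(1/2)$.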
  • 22. Generalized Linear Models (GLMs)
  • 23. Iteratively Reweighted Least Squares
  • 24. Extension of the ICM/M to GLMs
      • Borrow the idea of iteratively reweighted least squares (IRLS).
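For readers unfamiliar with IRLS, the sketch below shows the standard algorithm for an ordinary logistic regression with an intercept and one predictor. It is not the ICM/M extension itself, whose details are not given in the transcript; it only illustrates the idea being borrowed: at each iteration a working response and weights are formed from the current fit, and a weighted least-squares problem is solved. The data and the function name `irls_logistic` are hypothetical.

```python
import math


def irls_logistic(x, y, n_iter=25):
    """Fit logit P(y = 1) = b0 + b1 * x by iteratively reweighted least squares."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        # Accumulate the weighted normal equations (X'WX) beta = X'Wz.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for x_i, y_i in zip(x, y):
            eta = b0 + b1 * x_i                 # linear predictor
            mu = 1.0 / (1.0 + math.exp(-eta))   # fitted probability
            w = mu * (1.0 - mu)                 # IRLS weight
            z = eta + (y_i - mu) / w            # working response
            s_w += w
            s_wx += w * x_i
            s_wxx += w * x_i * x_i
            s_wz += w * z
            s_wxz += w * x_i * z
        # Solve the 2x2 weighted least-squares system in closed form.
        det = s_w * s_wxx - s_wx ** 2
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
    return b0, b1


# Hypothetical data: P(y = 1) is 1/4 when x = 0 and 3/4 when x = 1.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0_hat, b1_hat = irls_logistic(x, y)
```

With a binary predictor the maximum-likelihood fit reproduces the observed proportions, so the estimates should approach $-\log 3$ for the intercept and $2 \log 3$ for the slope.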
  • 25. Simulation Studies
      • A total of 1,782 SNPs were randomly selected from the Framingham dataset (Cupples et al. 2007).
      • 24 human regulatory pathways involving 1,502 genes were retrieved from the KEGG database.
      • 311 SNPs involved in 5 pathways were assumed to have nonzero effects, with effect sizes randomly generated from Uniform[0.5, 3].
      • Phenotypes were simulated from the normal regression model with error variance = 5.
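The phenotype-generation step of this simulation design can be sketched at a much smaller scale. The effect-size distribution Uniform[0.5, 3] and the error variance of 5 follow the slide; everything else is an assumption made for illustration: the genotypes are drawn at random with an allele frequency of 0.3 instead of being sampled from the Framingham data, the causal SNPs are picked at random instead of by pathway, and the sizes (50 samples, 20 SNPs, 4 causal) and the function name are hypothetical.

```python
import math
import random


def simulate_phenotype(genotypes, causal, rng, low=0.5, high=3.0, error_var=5.0):
    """Draw effects for the causal SNPs and simulate y = X beta + Gaussian noise."""
    p = len(genotypes[0])
    beta = [0.0] * p
    for j in causal:
        beta[j] = rng.uniform(low, high)   # effect size ~ Uniform[low, high]
    sd = math.sqrt(error_var)
    y = [sum(b * g for b, g in zip(beta, row)) + rng.gauss(0.0, sd)
         for row in genotypes]
    return beta, y


rng = random.Random(0)
n, p = 50, 20
# Assumed genotype model: two independent allele draws with frequency 0.3.
genotypes = [[sum(rng.random() < 0.3 for _ in range(2)) for _ in range(p)]
             for _ in range(n)]
causal = rng.sample(range(p), 4)
beta, y = simulate_phenotype(genotypes, causal, rng)
```

Fixing the random seed keeps a simulated data set reproducible, which matters when several methods are compared on the same replicates.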
  • 26. Simulation Studies
      • Results

        Method           Prediction Error   False Positive   False Negative
        Lasso            30.7 (.41)         .69 (.0004)      .02 (.0004)
        Adaptive Lasso   206.2 (.57)        .07 (.0017)      .13 (.0002)
        ICM/M            21.7 (.23)         .03 (.0015)      .04 (.0003)
  • 27. Framingham Data Analysis
      • Dataset: Framingham Heart Study (Cupples et al. 2007)
      • Phenotype: log transformation of vitamin D level
      • Sample size: 952 for the training set and 519 for the test set
      • The gene-pathway information relevant to vitamin D level was obtained from the KEGG database.
      • There are 84,834 SNPs residing in 2,167 genetic regions in 112 pathways.
      • Univariate tests were applied as a screening step, leaving 7,824 SNPs for the analysis.
  • 28. Framingham Data Analysis
      • Prediction errors and no. of identified SNPs

        Method           Prediction Error   No. of Identified SNPs
        Lasso            .2560              14
        Adaptive Lasso   .2085              5
        ICM/M            .2121              5
  • 29. Framingham Data Analysis

        Chromosome-SNP     1-3887   4-0894   4-1174   5-2773   8-5143   17-3907   17-9089
        β
          Lasso            .0412    0        .0355    .0402    0        0         0
          Adaptive Lasso   .1521    0        .0434    .1539    -.0200   0         .0167
          ICM/M            .2417    -.0512   0        .3047    -.0857   .1093     0
        P-value*
          Lasso            .2694    1        1        .6050    1        1         1
          Adaptive Lasso   .2060    1        1        .0031    1        1         1
          ICM/M            .0837    1        1        .0034    1        1         1

        * From the multi-sample split method (Meinshausen et al. (2009))
  • 30. Parkinson’s Disease Data Analysis
      • Data come from 3 different studies on PD:
          – Autopsy-Confirmed Parkinson Disease GWAS Consortium (APDGC) (dbGaP Study Accession: phs000394.v1.p1)
          – Genome-Wide Association Study of Parkinson Disease: Genes and Environment (dbGaP Study Accession: phs000196.v2.p1)
          – NINDS-Genome-Wide Genotyping in Parkinson's Disease: First Stage Analysis and Public Release of Data (n=1741) (dbGaP Study Accession: phs000089.v3.p2)
      • The three data sets were combined and the overlapping SNPs retained (n = 6,704, p = 888,398).
  • 31. Parkinson’s Disease Data Analysis
      • Pathways related to PD were retrieved from Ingenuity© IPA.
  • 32. Parkinson’s Disease Data Analysis
      • ICM/M found 46 SNPs with nonzero regression coefficients across 22 chromosomes.
      • 8 genes known to be related to PD were identified (e.g., TLR4, TNF, …).
  • 33. References
      • Cupples, L. A. et al. (2007). The Framingham Heart Study 100K SNP genome-wide association study resource: Overview of 17 phenotype working group reports. BMC Medical Genetics, 8(Suppl 1):S1.
      • Ho et al. (2011). ChIP-chip versus ChIP-seq: Lessons for experimental design and data analysis. BMC Genomics, 12:134.
      • Meinshausen et al. (2009). P-values for high-dimensional regression. Journal of the American Statistical Association, 104:1671–1681.
      • Pungpapong et al. (2015). Selecting Massive Variables Using an Iterated Conditional Modes/Medians Algorithm. Electronic Journal of Statistics, 9:1243–1266.
      • Turner, S. (2011). Quality control procedures for genome-wide association studies. Curr Protoc Hum Genet 2011;68:1–19.1.18.