SlideShare une entreprise Scribd logo
1  sur  22
Télécharger pour lire hors ligne
∞ Infinite Mixtures of Infinite Factor Analysers ∞
Nonparametric Model-Based Clustering via Latent Gaussian Models
Keefe Murphy 1, 2
Dr. Isobel Claire Gormley 1, 2
Dr. Cinzia Viroli 3
1School of Mathematics and Statistics, UCD
2Insight Centre for Data Analytics, UCD
3Department of Statistical Sciences, University of Bologna
keefe.murphy@ucd.ie
May 17th 2017
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 1 / 21
Motivation
Clustering problems are becoming increasingly high-dimensional; in such cases,
many common techniques perform poorly and may even be intractable
IMIFA offers a ‘choice-free’, Bayesian nonparametric approach to fitting
mixture models which assume FA covariance within each component. . .
. . . with particular focus on clustering N p data sets
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 2 / 21
Agenda
Mixtures of Factor Analysers
Mixtures of Infinite Factor Analysers
Examples & Results
Infinite Mixtures of Infinite Factor Analysers
Examples & Results
Discussion
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 3 / 21
(Finite) Mixtures of (Finite) Factor Analysers (MFA)
MFA is a Gaussian latent variable model, simultaneously achieving
dimension reduction and clustering in high-dimensional data settings
Supposes p-dimensional feature vector xi ∀ i = 1, . . . , N arises with
probability πg ∀ g = 1, . . . , G from a cluster-specific FA model:
xi − µg = Λgηi + ig
where:
µg
= mean vector
ηi
= q-vector of unobserved latent factors where q p
Λg = p × q loadings matrix
ig = error vector ∼ MVNp 0, Ψg
Ψg = diagonal matrix with non-zero entries known as uniquenesses
π = cluster mixing proportions
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 4 / 21
(Finite) Mixtures of (Finite) Factor Analysers (MFA)
A latent indicator zi is introduced, s.t.
zig =
1 if i ∈ cluster g
0 otherwise
Provides parsimonious covariance structure:
P (xi) =
G
g=1
πgMVNp µg, Λg Λg + Ψg
Conjugate priors assumed: facilitates MCMC sampling via Gibbs updates
Isotropic constraint: Ψg = σg
2
Ip provides link to MPPCA1
1
Tipping & Bishop (1999)
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 5 / 21
(Finite) Mixtures of (Finite) Factor Analysers (MFA)
Limitation of this approach is that ranges of values for the numbers of clusters
G and factors q must be specified in advance
Usually the pair which optimises some model selection criterion are chosen
Computationally intensive and generally only reasonable to fit models where
q is the same across clusters, e.g. PGMM2
However, unlike PGMM, q = 0 is allowed
2
McNicholas & Murphy (2008)
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 6 / 21
(Finite) Mixtures of Infinite Factor Analysers (MIFA)
Allows each Λg matrix to theoretically have infinitely many factors, using a
multiplicative gamma process shrinkage prior3
(MGP)
Loadings: λjkg ∼ N 0, φ−1
jkgτ−1
kg
Local Shrinkage: φjkg ∼ Ga (ν + 1, ν) (sparsity)
Global Shrinkage: τkg =
k
h=1
δhg δ1g ∼ Ga (α1, 1) δhg ∼ Ga (α2, 1) ∀ h ≥ 2
Increasingly shrinks loadings toward zero as column index k → ∞
3
Bhattacharya & Dunson (2011)
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 7 / 21
(Finite) Mixtures of Infinite Factor Analysers (MIFA)
Prior distribution of loadings in 1st
, 2nd
& 3rd
columns of typical Λg matrix
−4 −2 0 2 4
0.00.20.40.60.81.0
Density
λ
−4 −2 0 2 4
0.00.20.40.60.81.0
Density
λ
−4 −2 0 2 4
0.00.20.40.60.81.0
Density
λ
MIFA allows different clusters be modelled by different numbers of factors
MIFA significantly reduces model search to one for G only, as qg is
estimated automatically during model fitting
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 8 / 21
Practical Considerations. . .
Diminishing AGS balances keeping important factors vs. discarding redun-
dant ones: uses posterior mode of # of effective factors to estimate each ˆqg
Identifiability issues addressed via Procrustean methods4
; each sampled Λg
mapped to a template matrix at burn-in to ensure sensible posterior means
Label switching addressed offline by also mapping zi samples to a template,
using the square-assignment algorithm5
Optimal G chosen via BICM6
, which is particularly useful for nonparametric
models where the # of ‘free’ parameters is hard to quantify
4
Ghosh & Dunson (2008)
5
Carpaneto & Toth (1980)
6
Raftery et al. (2007)
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 9 / 21
Spectral Metabolomic Data7
189 spectral peaks from urine samples of 18 subjects (N p)
Half have epilepsy, half are control subjects
Chemical Shift (ppm)
9 8 7 6 5 4 3 2 1
Control
Epileptic
With fixed G = 2, BICM chooses qg = 3 for each MFA cluster:
4 subjects clustered incorrectly
For G ∈ {1, . . . , 5}, BICM chooses G = 2 MIFA components:
1 subject clustered incorrectly
7
Carmody et al. (2010)
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 10 / 21
Spectral Metabolomic Data
Uncertainty in ˆqg for MIFA clusters quantified by 95% credible intervals:
ˆq1 ∈ [1, 4] and ˆq2 ∈ [2, 5] – greater complexity in the epileptic cluster
1 2 3 4 5 6
qg
02000400060008000
Frequency
Cluster 1
Cluster 2
Cluster 1: q1 = 2
1 2
−1.5
−1.0
−0.5
0.0
0.5
1.0
1.5
Factors
ChemicalShift(ppm)
Cluster 2: q2 = 3
1 2 3
−1.5
−1.0
−0.5
0.0
0.5
1.0
1.5
Factors
ChemicalShift(ppm)Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 11 / 21
Infinite Mixtures of Infinite Factor Analysers (IMIFA)
Need to choose optimal number of latent factors in a MFA model has been
obviated using MIFA, but issue of model choice still not entirely resolved. . .
Extension: estimate G in a similarly ‘choice-free’ manner:
Overfitted Mixtures (OMIFA/OMFA)
(Poisson -) Dirichlet Processes (IMIFA/IMFA)
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 12 / 21
Dirichlet Processes (DP)
Probability distribution H ∼ DP (α, H0) if every marginal of H on finite
partitions of the domain Ω are Dirichlet distributed8
:
H (A1) , . . . , H (Ar) ∼ Dir αH0 (A1) , . . . , αH0 (Ar)
A1 ∪ . . . ∪ Ar = Ω
Base distribution: H0 – interpreted as the prior guess for the parameters of
the model or the mean of the DP:
E [H(A)] = H0(A)
Concentration parameter: α – expresses strength of belief in H0:
V [H(A)] =
H0(A)(1 − H0(A))
α + 1
8
Ferguson (1973)
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 13 / 21
Pitman-Yor Processes (PYP)9
PYP (α, d, H0) with ‘discount’ d ∈ [0, 1) : d = 0 reduces to DP (α, H0)
d > 0 obtains heavy-tailed less informative prior with no tractability sacrifices
Spike-&-slab prior on d: κδ0 + (1 − κ) Beta (d | a , b ) used to assess whether
data arose from DP or more general PYP at little extra computational cost
0.000.050.100.15
No. of Clusters (g)
P(G=g)
1 5 10 15 20 25 30 35 40 45 50
α = 3
α = 10
α = 25
0.000.050.100.15
No. of Clusters (g)
P(G=g)
1 5 10 15 20 25 30 35 40 45 50
α
19.23
6.47
1
d
0
0.47
0.73
9
Perman (1992)
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 14 / 21
Sampling Infinite Mixture Models
Stick-Breaking10
Representation:
vg ∼ Beta(1 − d, α + gd), θg ∼ H0, f xi ∝
∞
g=1
πg MVNp xi; θg
πg = vg
g−1
l=1
(1 − vl), H =
∞
g=1
πg δθg ∼ PYP (α, d, H0)
θ = {µ, Λ, Ψ} denotes the whole set of parameters of the FA models
Draws are composed of a weighted sum of infinitely many point masses
‘Independent’ Slice-Efficient Sampler11
facilitates adaptively truncating sufficient
# ‘active‘ components needed to be sampled at each iteration (variable, finite)
10
Sethuraman (1994)
11
Kalli et al. (2011)
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 15 / 21
Infinite Mixtures of Infinite Factor Analysers (IMIFA)
f(X, η, Z, u, V , θ) ∝ f(X | η, Z, u, V , θ)f(η | u)f(Z, u | V , π)f(V | α, d)f(θ)
=



N
i=1 g∈Aξ(ui)
MVNp(xi; µg + Λgηi, Ψg)zig






N
i=1 g∈Aξ(ui)
MVNq(ηi; 0, Iq)



N
i=1
∞
g=1
πg
ξg
1 ui < ξg
zig ∞
g=1
(1 − vg)α+gd−1
vd
g B(1 − d, α + gd)
f(θ)
Where f(θ) is the product of the conjugate priors
True G estimated by number of non-empty clusters visited most often;
cluster-specific inference conducted only on those visits
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 16 / 21
Olive Oil Benchmarking
Data on %-composition of 8 fatty acids found in 572 Italian olive oils, from
three Areas: Southern Italy, Sardinia, and Northern Italy12
Within each area there are a number of different Regions:
True G hypothesised to correspond to either 3 Areas or 9 Regions
Full suite of models fitted13
: results for the methods relying on ranges of
G &/or q values being supplied based on G = 1, . . . , 9 & q = 0, . . . , 6,
respectively, with the optimal model chosen by BICM where necessary
12
Forina et al. (1983)
13
Iterations=50,000, Burn-in=10,000, Thinning=2nd
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 17 / 21
Olive Oil Benchmarking
Model
#
Models
Time
(s)
Rel.
Time α d G Q
Adj.
Rand
Error
(%)
IMIFA 1 782 1.00 0.75 0.01 4 6, 2, 3, 2 0.93 8.57
IMFA 7 3661 4.68 0.88 0.01 5 6, 6, 6, 6, 6 0.92 10.31
OMIFA 1 818 1.05 0.02 – 5 6, 2, 2, 2, 2 0.90 15.56
OMFA 7 3336 4.27 0.02 – 5 6, 6, 6, 6, 6 0.85 15.91
MIFA 9 2608 3.33 1 – 5 5, 2, 2, 3, 2 0.91 14.86
MFA 63 11083 14.17 1 – 2 5, 5 0.82 17.13
IFA 1 78 0.10 – – 1 6 – –
FA 7 287 0.37 – – 1 6 – –
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 18 / 21
Olive Oil Benchmarking
Model
#
Models
Time
(s)
Rel.
Time α d G Q
Adj.
Rand
Error
(%)
IMIFA 1 782 1.00 0.75 0.01 4 6, 2, 3, 2 0.93 8.57
IMFA 7 3661 4.68 0.88 0.01 5 6, 6, 6, 6, 6 0.92 10.31
OMIFA 1 818 1.05 0.02 – 5 6, 2, 2, 2, 2 0.90 15.56
OMFA 7 3336 4.27 0.02 – 5 6, 6, 6, 6, 6 0.85 15.91
MIFA 9 2608 3.33 1 – 5 5, 2, 2, 3, 2 0.91 14.86
MFA 63 11083 14.17 1 – 2 5, 5 0.82 17.13
IFA 1 78 0.10 – – 1 6 – –
FA 7 287 0.37 – – 1 6 – –
Large Southern Italy cluster requires large amount of factors (6): flexibility to
model other clusters with less factors greatly improves clustering performance
Optimal models chosen by BICM were not all optimal in a clustering sense:
candidate G = 4 MIFA model yields Adj. Rand=0.94 & Error Rate=8.39%
Fully automatic IMIFA requires only one quick run, gives best clustering
performance, and does not rely on any model selection criteria
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 18 / 21
Olive Oil Benchmarking
Comparing IMIFA’s MAP clustering against Area suggests possible 4th
group,
separating Umbria from Liguria in the North, which makes geographic sense
Table: 3 Areas
Adj. Rand = 0.9345
Error Rate = 8.57%
1 2 3 4
South 323 0 0 0
Sardinia 0 97 1 0
North 0 0 103 48
Table: 4 Areas
Adj. Rand = 0.9961
Error Rate = 0.7%
1 2 3 4
South 323 0 0 0
Sardinia 0 97 1 0
Liguria 0 0 100 0
Umbria 0 0 3 48
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 19 / 21
Metabolomic IMIFA
IMIFA with isotropic uniquenesses finds a 3-cluster solution:
Each cluster has 2 latent factors
Outlying observation misclassified by MIFA now given its own component
Table: Urine Data
Adj. Rand = 0.8944
Error Rate = 5.56%
1 2 3
Control 9 0 0
Epileptic 0 8 1
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 20 / 21
References
________ __________________
/_ __/ |/ /_ __/ ___/ _ 
/ / / /|_// / / / / /__/ /_ 
_/ /_/ / / /_/ /_/ ___/ /___ 
/____/_/ /_/_____/_/ /_/ _ version 1.2.0
Murphy, K., Gormley, I. C. and Viroli, C. (2017) Infinite Mixtures of
Infinite Factor Analysers: Nonparametric Model-Based Clustering via
Latent Gaussian Models, arXiv:1701.07010v3
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 21 / 21

Contenu connexe

Similaire à Keefe Murphy - IMIFA - CASI 2017

Evaluating Graph Signal Processing for Neuroimaging Through Classification an...
Evaluating Graph Signal Processing for Neuroimaging Through Classification an...Evaluating Graph Signal Processing for Neuroimaging Through Classification an...
Evaluating Graph Signal Processing for Neuroimaging Through Classification an...Nicolas Farrugia
 
Reproducibility and differential analysis with selfish
Reproducibility and differential analysis with selfishReproducibility and differential analysis with selfish
Reproducibility and differential analysis with selfishtuxette
 
Time Series Forecasting using Neural Nets (GNNNs)
Time Series Forecasting using Neural Nets (GNNNs)Time Series Forecasting using Neural Nets (GNNNs)
Time Series Forecasting using Neural Nets (GNNNs)Sean Golliher
 
A quantum-inspired optimization heuristic for the multiple sequence alignment...
A quantum-inspired optimization heuristic for the multiple sequence alignment...A quantum-inspired optimization heuristic for the multiple sequence alignment...
A quantum-inspired optimization heuristic for the multiple sequence alignment...Konstantinos Giannakis
 
Accelerating Pseudo-Marginal MCMC using Gaussian Processes
Accelerating Pseudo-Marginal MCMC using Gaussian ProcessesAccelerating Pseudo-Marginal MCMC using Gaussian Processes
Accelerating Pseudo-Marginal MCMC using Gaussian ProcessesMatt Moores
 
Goodness–of–fit tests for regression models: the functional data case
Goodness–of–fit tests for regression models: the functional data caseGoodness–of–fit tests for regression models: the functional data case
Goodness–of–fit tests for regression models: the functional data caseNeuroMat
 
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...Chiheb Ben Hammouda
 
Bayesian selection of best subsets in high-dimensional regression
Bayesian selection of best subsets in high-dimensional regressionBayesian selection of best subsets in high-dimensional regression
Bayesian selection of best subsets in high-dimensional regressionCaleb (Shiqiang) Jin
 
Self-sampling Strategies for Multimemetic Algorithms in Unstable Computationa...
Self-sampling Strategies for Multimemetic Algorithms in Unstable Computationa...Self-sampling Strategies for Multimemetic Algorithms in Unstable Computationa...
Self-sampling Strategies for Multimemetic Algorithms in Unstable Computationa...Rafael Nogueras
 
Computational Tools and Techniques for Numerical Macro-Financial Modeling
Computational Tools and Techniques for Numerical Macro-Financial ModelingComputational Tools and Techniques for Numerical Macro-Financial Modeling
Computational Tools and Techniques for Numerical Macro-Financial ModelingVictor Zhorin
 
Asynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsAsynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsFabian Pedregosa
 

Similaire à Keefe Murphy - IMIFA - CASI 2017 (20)

Evaluating Graph Signal Processing for Neuroimaging Through Classification an...
Evaluating Graph Signal Processing for Neuroimaging Through Classification an...Evaluating Graph Signal Processing for Neuroimaging Through Classification an...
Evaluating Graph Signal Processing for Neuroimaging Through Classification an...
 
Deep Learning Opening Workshop - Horseshoe Regularization for Machine Learnin...
Deep Learning Opening Workshop - Horseshoe Regularization for Machine Learnin...Deep Learning Opening Workshop - Horseshoe Regularization for Machine Learnin...
Deep Learning Opening Workshop - Horseshoe Regularization for Machine Learnin...
 
Reproducibility and differential analysis with selfish
Reproducibility and differential analysis with selfishReproducibility and differential analysis with selfish
Reproducibility and differential analysis with selfish
 
Time Series Forecasting using Neural Nets (GNNNs)
Time Series Forecasting using Neural Nets (GNNNs)Time Series Forecasting using Neural Nets (GNNNs)
Time Series Forecasting using Neural Nets (GNNNs)
 
Talk iccf 19_ben_hammouda
Talk iccf 19_ben_hammoudaTalk iccf 19_ben_hammouda
Talk iccf 19_ben_hammouda
 
A quantum-inspired optimization heuristic for the multiple sequence alignment...
A quantum-inspired optimization heuristic for the multiple sequence alignment...A quantum-inspired optimization heuristic for the multiple sequence alignment...
A quantum-inspired optimization heuristic for the multiple sequence alignment...
 
Accelerating Pseudo-Marginal MCMC using Gaussian Processes
Accelerating Pseudo-Marginal MCMC using Gaussian ProcessesAccelerating Pseudo-Marginal MCMC using Gaussian Processes
Accelerating Pseudo-Marginal MCMC using Gaussian Processes
 
MUMS: Bayesian, Fiducial, and Frequentist Conference - Variable Selection in ...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Variable Selection in ...MUMS: Bayesian, Fiducial, and Frequentist Conference - Variable Selection in ...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Variable Selection in ...
 
GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...
GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...
GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...
 
Arch & Garch Processes
Arch & Garch ProcessesArch & Garch Processes
Arch & Garch Processes
 
Goodness–of–fit tests for regression models: the functional data case
Goodness–of–fit tests for regression models: the functional data caseGoodness–of–fit tests for regression models: the functional data case
Goodness–of–fit tests for regression models: the functional data case
 
MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...
 
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Bayesian selection of best subsets in high-dimensional regression
Bayesian selection of best subsets in high-dimensional regressionBayesian selection of best subsets in high-dimensional regression
Bayesian selection of best subsets in high-dimensional regression
 
AROPUB-IJPGE-14-30
AROPUB-IJPGE-14-30AROPUB-IJPGE-14-30
AROPUB-IJPGE-14-30
 
Self-sampling Strategies for Multimemetic Algorithms in Unstable Computationa...
Self-sampling Strategies for Multimemetic Algorithms in Unstable Computationa...Self-sampling Strategies for Multimemetic Algorithms in Unstable Computationa...
Self-sampling Strategies for Multimemetic Algorithms in Unstable Computationa...
 
Computational Tools and Techniques for Numerical Macro-Financial Modeling
Computational Tools and Techniques for Numerical Macro-Financial ModelingComputational Tools and Techniques for Numerical Macro-Financial Modeling
Computational Tools and Techniques for Numerical Macro-Financial Modeling
 
MUMS: Bayesian, Fiducial, and Frequentist Conference - Spatially Informed Var...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Spatially Informed Var...MUMS: Bayesian, Fiducial, and Frequentist Conference - Spatially Informed Var...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Spatially Informed Var...
 
Asynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsAsynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and Algorithms
 

Dernier

Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physicsvishikhakeshava1
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptxkhadijarafiq2012
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 

Dernier (20)

Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physics
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptx
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 

Keefe Murphy - IMIFA - CASI 2017

  • 1. ∞ Infinite Mixtures of Infinite Factor Analysers ∞ Nonparametric Model-Based Clustering via Latent Gaussian Models Keefe Murphy 1, 2 Dr. Isobel Claire Gormley 1, 2 Dr. Cinzia Viroli 3 1School of Mathematics and Statistics, UCD 2Insight Centre for Data Analytics, UCD 3Department of Statistical Sciences, University of Bologna keefe.murphy@ucd.ie May 17th 2017 Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 1 / 21
  • 2. Motivation Clustering problems are becoming increasingly high-dimensional; in such cases, many common techniques perform poorly and may even be intractable IMIFA offers a ‘choice-free’, Bayesian nonparametric approach to fitting mixture models which assume FA covariance within each component. . . . . . with particular focus on clustering N p data sets Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 2 / 21
  • 3. Agenda Mixtures of Factor Analysers Mixtures of Infinite Factor Analysers Examples & Results Infinite Mixtures of Infinite Factor Analysers Examples & Results Discussion Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 3 / 21
  • 4. (Finite) Mixtures of (Finite) Factor Analysers (MFA) MFA is a Gaussian latent variable model, simultaneously achieving dimension reduction and clustering in high-dimensional data settings Supposes p-dimensional feature vector xi ∀ i = 1, . . . , N arises with probability πg ∀ g = 1, . . . , G from a cluster-specific FA model: xi − µg = Λgηi + ig where: µg = mean vector ηi = q-vector of unobserved latent factors where q p Λg = p × q loadings matrix ig = error vector ∼ MVNp 0, Ψg Ψg = diagonal matrix with non-zero entries known as uniquenesses π = cluster mixing proportions Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 4 / 21
  • 5. (Finite) Mixtures of (Finite) Factor Analysers (MFA) A latent indicator zi is introduced, s.t. zig = 1 if i ∈ cluster g 0 otherwise Provides parsimonious covariance structure: P (xi) = G g=1 πgMVNp µg, Λg Λg + Ψg Conjugate priors assumed: facilitates MCMC sampling via Gibbs updates Isotropic constraint: Ψg = σg 2 Ip provides link to MPPCA1 1 Tipping & Bishop (1999) Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 5 / 21
  • 6. (Finite) Mixtures of (Finite) Factor Analysers (MFA) Limitation of this approach is that ranges of values for the numbers of clusters G and factors q must be specified in advance Usually the pair which optimises some model selection criterion are chosen Computationally intensive and generally only reasonable to fit models where q is the same across clusters, e.g. PGMM2 However, unlike PGMM, q = 0 is allowed 2 McNicholas & Murphy (2008) Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 6 / 21
  • 7. (Finite) Mixtures of Infinite Factor Analysers (MIFA) Allows each Λg matrix to theoretically have infinitely many factors, using a multiplicative gamma process shrinkage prior3 (MGP) Loadings: λjkg ∼ N 0, φ−1 jkgτ−1 kg Local Shrinkage: φjkg ∼ Ga (ν + 1, ν) (sparsity) Global Shrinkage: τkg = k h=1 δhg δ1g ∼ Ga (α1, 1) δhg ∼ Ga (α2, 1) ∀ h ≥ 2 Increasingly shrinks loadings toward zero as column index k → ∞ 3 Bhattacharya & Dunson (2011) Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 7 / 21
  • 8. (Finite) Mixtures of Infinite Factor Analysers (MIFA) Prior distribution of loadings in 1st , 2nd & 3rd columns of typical Λg matrix −4 −2 0 2 4 0.00.20.40.60.81.0 Density λ −4 −2 0 2 4 0.00.20.40.60.81.0 Density λ −4 −2 0 2 4 0.00.20.40.60.81.0 Density λ MIFA allows different clusters be modelled by different numbers of factors MIFA significantly reduces model search to one for G only, as qg is estimated automatically during model fitting Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 8 / 21
  • 9. Practical Considerations. . . Diminishing AGS balances keeping important factors vs. discarding redun- dant ones: uses posterior mode of # of effective factors to estimate each ˆqg Identifiability issues addressed via Procrustean methods4 ; each sampled Λg mapped to a template matrix at burn-in to ensure sensible posterior means Label switching addressed offline by also mapping zi samples to a template, using the square-assignment algorithm5 Optimal G chosen via BICM6 , which is particularly useful for nonparametric models where the # of ‘free’ parameters is hard to quantify 4 Ghosh & Dunson (2008) 5 Carpaneto & Toth (1980) 6 Raftery et al. (2007) Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 9 / 21
  • 10. Spectral Metabolomic Data7 189 spectral peaks from urine samples of 18 subjects (N p) Half have epilepsy, half are control subjects Chemical Shift (ppm) 9 8 7 6 5 4 3 2 1 Control Epileptic With fixed G = 2, BICM chooses qg = 3 for each MFA cluster: 4 subjects clustered incorrectly For G ∈ {1, . . . , 5}, BICM chooses G = 2 MIFA components: 1 subject clustered incorrectly 7 Carmody et al. (2010) Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 10 / 21
  • 11. Spectral Metabolomic Data Uncertainty in ˆqg for MIFA clusters quantified by 95% credible intervals: ˆq1 ∈ [1, 4] and ˆq2 ∈ [2, 5] – greater complexity in the epileptic cluster 1 2 3 4 5 6 qg 02000400060008000 Frequency Cluster 1 Cluster 2 Cluster 1: q1 = 2 1 2 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 Factors ChemicalShift(ppm) Cluster 2: q2 = 3 1 2 3 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 Factors ChemicalShift(ppm)Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 11 / 21
  • 12. Infinite Mixtures of Infinite Factor Analysers (IMIFA) Need to choose optimal number of latent factors in a MFA model has been obviated using MIFA, but issue of model choice still not entirely resolved. . . Extension: estimate G in a similarly ‘choice-free’ manner: Overfitted Mixtures (OMIFA/OMFA) (Poisson -) Dirichlet Processes (IMIFA/IMFA) Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 12 / 21
  • 13. Dirichlet Processes (DP) Probability distribution H ∼ DP (α, H0) if every marginal of H on finite partitions of the domain Ω are Dirichlet distributed8 : H (A1) , . . . , H (Ar) ∼ Dir αH0 (A1) , . . . , αH0 (Ar) A1 ∪ . . . ∪ Ar = Ω Base distribution: H0 – interpreted as the prior guess for the parameters of the model or the mean of the DP: E [H(A)] = H0(A) Concentration parameter: α – expresses strength of belief in H0: V [H(A)] = H0(A)(1 − H0(A)) α + 1 8 Ferguson (1973) Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 13 / 21
  • 14. Pitman-Yor Processes (PYP)9 PYP (α, d, H0) with ‘discount’ d ∈ [0, 1) : d = 0 reduces to DP (α, H0) d > 0 obtains heavy-tailed less informative prior with no tractability sacrifices Spike-&-slab prior on d: κδ0 + (1 − κ) Beta (d | a , b ) used to assess whether data arose from DP or more general PYP at little extra computational cost 0.000.050.100.15 No. of Clusters (g) P(G=g) 1 5 10 15 20 25 30 35 40 45 50 α = 3 α = 10 α = 25 0.000.050.100.15 No. of Clusters (g) P(G=g) 1 5 10 15 20 25 30 35 40 45 50 α 19.23 6.47 1 d 0 0.47 0.73 9 Perman (1992) Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 14 / 21
  • 15. Sampling Infinite Mixture Models Stick-Breaking10 Representation: vg ∼ Beta(1 − d, α + gd), θg ∼ H0, f xi ∝ ∞ g=1 πg MVNp xi; θg πg = vg g−1 l=1 (1 − vl), H = ∞ g=1 πg δθg ∼ PYP (α, d, H0) θ = {µ, Λ, Ψ} denotes the whole set of parameters of the FA models Draws are composed of a weighted sum of infinitely many point masses ‘Independent’ Slice-Efficient Sampler11 facilitates adaptively truncating sufficient # ‘active‘ components needed to be sampled at each iteration (variable, finite) 10 Sethuraman (1994) 11 Kalli et al. (2011) Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 15 / 21
  • 16. Infinite Mixtures of Infinite Factor Analysers (IMIFA) f(X, η, Z, u, V , θ) ∝ f(X | η, Z, u, V , θ)f(η | u)f(Z, u | V , π)f(V | α, d)f(θ) =    N i=1 g∈Aξ(ui) MVNp(xi; µg + Λgηi, Ψg)zig       N i=1 g∈Aξ(ui) MVNq(ηi; 0, Iq)    N i=1 ∞ g=1 πg ξg 1 ui < ξg zig ∞ g=1 (1 − vg)α+gd−1 vd g B(1 − d, α + gd) f(θ) Where f(θ) is the product of the conjugate priors True G estimated by number of non-empty clusters visited most often; cluster-specific inference conducted only on those visits Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 16 / 21
  • 17. Olive Oil Benchmarking Data on %-composition of 8 fatty acids found in 572 Italian olive oils, from three Areas: Southern Italy, Sardinia, and Northern Italy12 Within each area there are a number of different Regions: True G hypothesised to correspond to either 3 Areas or 9 Regions Full suite of models fitted13 : results for the methods relying on ranges of G &/or q values being supplied based on G = 1, . . . , 9 & q = 0, . . . , 6, respectively, with the optimal model chosen by BICM where necessary 12 Forina et al. (1983) 13 Iterations=50,000, Burn-in=10,000, Thinning=2nd Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 17 / 21
  • 18. Olive Oil Benchmarking Model # Models Time (s) Rel. Time α d G Q Adj. Rand Error (%) IMIFA 1 782 1.00 0.75 0.01 4 6, 2, 3, 2 0.93 8.57 IMFA 7 3661 4.68 0.88 0.01 5 6, 6, 6, 6, 6 0.92 10.31 OMIFA 1 818 1.05 0.02 – 5 6, 2, 2, 2, 2 0.90 15.56 OMFA 7 3336 4.27 0.02 – 5 6, 6, 6, 6, 6 0.85 15.91 MIFA 9 2608 3.33 1 – 5 5, 2, 2, 3, 2 0.91 14.86 MFA 63 11083 14.17 1 – 2 5, 5 0.82 17.13 IFA 1 78 0.10 – – 1 6 – – FA 7 287 0.37 – – 1 6 – – Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 18 / 21
  • 19. Olive Oil Benchmarking Model # Models Time (s) Rel. Time α d G Q Adj. Rand Error (%) IMIFA 1 782 1.00 0.75 0.01 4 6, 2, 3, 2 0.93 8.57 IMFA 7 3661 4.68 0.88 0.01 5 6, 6, 6, 6, 6 0.92 10.31 OMIFA 1 818 1.05 0.02 – 5 6, 2, 2, 2, 2 0.90 15.56 OMFA 7 3336 4.27 0.02 – 5 6, 6, 6, 6, 6 0.85 15.91 MIFA 9 2608 3.33 1 – 5 5, 2, 2, 3, 2 0.91 14.86 MFA 63 11083 14.17 1 – 2 5, 5 0.82 17.13 IFA 1 78 0.10 – – 1 6 – – FA 7 287 0.37 – – 1 6 – – Large Southern Italy cluster requires large amount of factors (6): flexibility to model other clusters with less factors greatly improves clustering performance Optimal models chosen by BICM were not all optimal in a clustering sense: candidate G = 4 MIFA model yields Adj. Rand=0.94 & Error Rate=8.39% Fully automatic IMIFA requires only one quick run, gives best clustering performance, and does not rely on any model selection criteria Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 18 / 21
  • 20. Olive Oil Benchmarking Comparing IMIFA’s MAP clustering against Area suggests possible 4th group, separating Umbria from Liguria in the North, which makes geographic sense Table: 3 Areas Adj. Rand = 0.9345 Error Rate = 8.57% 1 2 3 4 South 323 0 0 0 Sardinia 0 97 1 0 North 0 0 103 48 Table: 4 Areas Adj. Rand = 0.9961 Error Rate = 0.7% 1 2 3 4 South 323 0 0 0 Sardinia 0 97 1 0 Liguria 0 0 100 0 Umbria 0 0 3 48 Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 19 / 21
  • 21. Metabolomic IMIFA IMIFA with isotropic uniquenesses finds a 3-cluster solution: Each cluster has 2 latent factors Outlying observation misclassified by MIFA now given its own component Table: Urine Data Adj. Rand = 0.8944 Error Rate = 5.56% 1 2 3 Control 9 0 0 Epileptic 0 8 1 Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 20 / 21
  • 22. References ________ __________________ /_ __/ |/ /_ __/ ___/ _ / / / /|_// / / / / /__/ /_ _/ /_/ / / /_/ /_/ ___/ /___ /____/_/ /_/_____/_/ /_/ _ version 1.2.0 Murphy, K., Gormley, I. C. and Viroli, C. (2017) Infinite Mixtures of Infinite Factor Analysers: Nonparametric Model-Based Clustering via Latent Gaussian Models, arXiv:1701.07010v3 Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th 2017 21 / 21