1. ∞ Infinite Mixtures of Infinite Factor Analysers ∞
Nonparametric Model-Based Clustering via Latent Gaussian Models
Keefe Murphy 1, 2
Dr. Isobel Claire Gormley 1, 2
Dr. Cinzia Viroli 3
1School of Mathematics and Statistics, UCD
2Insight Centre for Data Analytics, UCD
3Department of Statistical Sciences, University of Bologna
keefe.murphy@ucd.ie
May 17th 2017
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 1 / 21
2. Motivation
Clustering problems are becoming increasingly high-dimensional; in such cases,
many common techniques perform poorly and may even be intractable
IMIFA offers a ‘choice-free’, Bayesian nonparametric approach to fitting
mixture models which assume FA covariance within each component. . .
. . . with particular focus on clustering N p data sets
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 2 / 21
3. Agenda
Mixtures of Factor Analysers
Mixtures of Infinite Factor Analysers
Examples & Results
Infinite Mixtures of Infinite Factor Analysers
Examples & Results
Discussion
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 3 / 21
4. (Finite) Mixtures of (Finite) Factor Analysers (MFA)
MFA is a Gaussian latent variable model, simultaneously achieving
dimension reduction and clustering in high-dimensional data settings
Supposes p-dimensional feature vector xi ∀ i = 1, . . . , N arises with
probability πg ∀ g = 1, . . . , G from a cluster-specific FA model:
xi − µg = Λgηi + ig
where:
µg
= mean vector
ηi
= q-vector of unobserved latent factors where q p
Λg = p × q loadings matrix
ig = error vector ∼ MVNp 0, Ψg
Ψg = diagonal matrix with non-zero entries known as uniquenesses
π = cluster mixing proportions
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 4 / 21
5. (Finite) Mixtures of (Finite) Factor Analysers (MFA)
A latent indicator zi is introduced, s.t.
zig =
1 if i ∈ cluster g
0 otherwise
Provides parsimonious covariance structure:
P (xi) =
G
g=1
πgMVNp µg, Λg Λg + Ψg
Conjugate priors assumed: facilitates MCMC sampling via Gibbs updates
Isotropic constraint: Ψg = σg
2
Ip provides link to MPPCA1
1
Tipping & Bishop (1999)
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 5 / 21
6. (Finite) Mixtures of (Finite) Factor Analysers (MFA)
Limitation of this approach is that ranges of values for the numbers of clusters
G and factors q must be specified in advance
Usually the pair which optimises some model selection criterion are chosen
Computationally intensive and generally only reasonable to fit models where
q is the same across clusters, e.g. PGMM2
However, unlike PGMM, q = 0 is allowed
2
McNicholas & Murphy (2008)
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 6 / 21
7. (Finite) Mixtures of Infinite Factor Analysers (MIFA)
Allows each Λg matrix to theoretically have infinitely many factors, using a
multiplicative gamma process shrinkage prior3
(MGP)
Loadings: λjkg ∼ N 0, φ−1
jkgτ−1
kg
Local Shrinkage: φjkg ∼ Ga (ν + 1, ν) (sparsity)
Global Shrinkage: τkg =
k
h=1
δhg δ1g ∼ Ga (α1, 1) δhg ∼ Ga (α2, 1) ∀ h ≥ 2
Increasingly shrinks loadings toward zero as column index k → ∞
3
Bhattacharya & Dunson (2011)
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 7 / 21
8. (Finite) Mixtures of Infinite Factor Analysers (MIFA)
Prior distribution of loadings in 1st
, 2nd
& 3rd
columns of typical Λg matrix
−4 −2 0 2 4
0.00.20.40.60.81.0
Density
λ
−4 −2 0 2 4
0.00.20.40.60.81.0
Density
λ
−4 −2 0 2 4
0.00.20.40.60.81.0
Density
λ
MIFA allows different clusters be modelled by different numbers of factors
MIFA significantly reduces model search to one for G only, as qg is
estimated automatically during model fitting
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 8 / 21
9. Practical Considerations. . .
Diminishing AGS balances keeping important factors vs. discarding redun-
dant ones: uses posterior mode of # of effective factors to estimate each ˆqg
Identifiability issues addressed via Procrustean methods4
; each sampled Λg
mapped to a template matrix at burn-in to ensure sensible posterior means
Label switching addressed offline by also mapping zi samples to a template,
using the square-assignment algorithm5
Optimal G chosen via BICM6
, which is particularly useful for nonparametric
models where the # of ‘free’ parameters is hard to quantify
4
Ghosh & Dunson (2008)
5
Carpaneto & Toth (1980)
6
Raftery et al. (2007)
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 9 / 21
10. Spectral Metabolomic Data7
189 spectral peaks from urine samples of 18 subjects (N p)
Half have epilepsy, half are control subjects
Chemical Shift (ppm)
9 8 7 6 5 4 3 2 1
Control
Epileptic
With fixed G = 2, BICM chooses qg = 3 for each MFA cluster:
4 subjects clustered incorrectly
For G ∈ {1, . . . , 5}, BICM chooses G = 2 MIFA components:
1 subject clustered incorrectly
7
Carmody et al. (2010)
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 10 / 21
11. Spectral Metabolomic Data
Uncertainty in ˆqg for MIFA clusters quantified by 95% credible intervals:
ˆq1 ∈ [1, 4] and ˆq2 ∈ [2, 5] – greater complexity in the epileptic cluster
1 2 3 4 5 6
qg
02000400060008000
Frequency
Cluster 1
Cluster 2
Cluster 1: q1 = 2
1 2
−1.5
−1.0
−0.5
0.0
0.5
1.0
1.5
Factors
ChemicalShift(ppm)
Cluster 2: q2 = 3
1 2 3
−1.5
−1.0
−0.5
0.0
0.5
1.0
1.5
Factors
ChemicalShift(ppm)Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 11 / 21
12. Infinite Mixtures of Infinite Factor Analysers (IMIFA)
Need to choose optimal number of latent factors in a MFA model has been
obviated using MIFA, but issue of model choice still not entirely resolved. . .
Extension: estimate G in a similarly ‘choice-free’ manner:
Overfitted Mixtures (OMIFA/OMFA)
(Poisson -) Dirichlet Processes (IMIFA/IMFA)
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 12 / 21
13. Dirichlet Processes (DP)
Probability distribution H ∼ DP (α, H0) if every marginal of H on finite
partitions of the domain Ω are Dirichlet distributed8
:
H (A1) , . . . , H (Ar) ∼ Dir αH0 (A1) , . . . , αH0 (Ar)
A1 ∪ . . . ∪ Ar = Ω
Base distribution: H0 – interpreted as the prior guess for the parameters of
the model or the mean of the DP:
E [H(A)] = H0(A)
Concentration parameter: α – expresses strength of belief in H0:
V [H(A)] =
H0(A)(1 − H0(A))
α + 1
8
Ferguson (1973)
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 13 / 21
14. Pitman-Yor Processes (PYP)9
PYP (α, d, H0) with ‘discount’ d ∈ [0, 1) : d = 0 reduces to DP (α, H0)
d > 0 obtains heavy-tailed less informative prior with no tractability sacrifices
Spike-&-slab prior on d: κδ0 + (1 − κ) Beta (d | a , b ) used to assess whether
data arose from DP or more general PYP at little extra computational cost
0.000.050.100.15
No. of Clusters (g)
P(G=g)
1 5 10 15 20 25 30 35 40 45 50
α = 3
α = 10
α = 25
0.000.050.100.15
No. of Clusters (g)
P(G=g)
1 5 10 15 20 25 30 35 40 45 50
α
19.23
6.47
1
d
0
0.47
0.73
9
Perman (1992)
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 14 / 21
15. Sampling Infinite Mixture Models
Stick-Breaking10
Representation:
vg ∼ Beta(1 − d, α + gd), θg ∼ H0, f xi ∝
∞
g=1
πg MVNp xi; θg
πg = vg
g−1
l=1
(1 − vl), H =
∞
g=1
πg δθg ∼ PYP (α, d, H0)
θ = {µ, Λ, Ψ} denotes the whole set of parameters of the FA models
Draws are composed of a weighted sum of infinitely many point masses
‘Independent’ Slice-Efficient Sampler11
facilitates adaptively truncating sufficient
# ‘active‘ components needed to be sampled at each iteration (variable, finite)
10
Sethuraman (1994)
11
Kalli et al. (2011)
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 15 / 21
16. Infinite Mixtures of Infinite Factor Analysers (IMIFA)
f(X, η, Z, u, V , θ) ∝ f(X | η, Z, u, V , θ)f(η | u)f(Z, u | V , π)f(V | α, d)f(θ)
=
N
i=1 g∈Aξ(ui)
MVNp(xi; µg + Λgηi, Ψg)zig
N
i=1 g∈Aξ(ui)
MVNq(ηi; 0, Iq)
N
i=1
∞
g=1
πg
ξg
1 ui < ξg
zig ∞
g=1
(1 − vg)α+gd−1
vd
g B(1 − d, α + gd)
f(θ)
Where f(θ) is the product of the conjugate priors
True G estimated by number of non-empty clusters visited most often;
cluster-specific inference conducted only on those visits
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 16 / 21
17. Olive Oil Benchmarking
Data on %-composition of 8 fatty acids found in 572 Italian olive oils, from
three Areas: Southern Italy, Sardinia, and Northern Italy12
Within each area there are a number of different Regions:
True G hypothesised to correspond to either 3 Areas or 9 Regions
Full suite of models fitted13
: results for the methods relying on ranges of
G &/or q values being supplied based on G = 1, . . . , 9 & q = 0, . . . , 6,
respectively, with the optimal model chosen by BICM where necessary
12
Forina et al. (1983)
13
Iterations=50,000, Burn-in=10,000, Thinning=2nd
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 17 / 21
19. Olive Oil Benchmarking
Model
#
Models
Time
(s)
Rel.
Time α d G Q
Adj.
Rand
Error
(%)
IMIFA 1 782 1.00 0.75 0.01 4 6, 2, 3, 2 0.93 8.57
IMFA 7 3661 4.68 0.88 0.01 5 6, 6, 6, 6, 6 0.92 10.31
OMIFA 1 818 1.05 0.02 – 5 6, 2, 2, 2, 2 0.90 15.56
OMFA 7 3336 4.27 0.02 – 5 6, 6, 6, 6, 6 0.85 15.91
MIFA 9 2608 3.33 1 – 5 5, 2, 2, 3, 2 0.91 14.86
MFA 63 11083 14.17 1 – 2 5, 5 0.82 17.13
IFA 1 78 0.10 – – 1 6 – –
FA 7 287 0.37 – – 1 6 – –
Large Southern Italy cluster requires large amount of factors (6): flexibility to
model other clusters with less factors greatly improves clustering performance
Optimal models chosen by BICM were not all optimal in a clustering sense:
candidate G = 4 MIFA model yields Adj. Rand=0.94 & Error Rate=8.39%
Fully automatic IMIFA requires only one quick run, gives best clustering
performance, and does not rely on any model selection criteria
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 18 / 21
20. Olive Oil Benchmarking
Comparing IMIFA’s MAP clustering against Area suggests possible 4th
group,
separating Umbria from Liguria in the North, which makes geographic sense
Table: 3 Areas
Adj. Rand = 0.9345
Error Rate = 8.57%
1 2 3 4
South 323 0 0 0
Sardinia 0 97 1 0
North 0 0 103 48
Table: 4 Areas
Adj. Rand = 0.9961
Error Rate = 0.7%
1 2 3 4
South 323 0 0 0
Sardinia 0 97 1 0
Liguria 0 0 100 0
Umbria 0 0 3 48
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 19 / 21
21. Metabolomic IMIFA
IMIFA with isotropic uniquenesses finds a 3-cluster solution:
Each cluster has 2 latent factors
Outlying observation misclassified by MIFA now given its own component
Table: Urine Data
Adj. Rand = 0.8944
Error Rate = 5.56%
1 2 3
Control 9 0 0
Epileptic 0 8 1
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 20 / 21
22. References
________ __________________
/_ __/ |/ /_ __/ ___/ _
/ / / /|_// / / / / /__/ /_
_/ /_/ / / /_/ /_/ ___/ /___
/____/_/ /_/_____/_/ /_/ _ version 1.2.0
Murphy, K., Gormley, I. C. and Viroli, C. (2017) Infinite Mixtures of
Infinite Factor Analysers: Nonparametric Model-Based Clustering via
Latent Gaussian Models, arXiv:1701.07010v3
Keefe Murphy et al. - CASI 2017 Infinite Mixtures of Infinite Factor Analysers May 17th
2017 21 / 21