Divergence-based center clustering and their applications
Frank Nielsen
École Polytechnique
Sony Computer Science Laboratories, Inc.
ICMS International Center for Mathematical Sciences
Edinburgh, Sep. 21-25, 2015
Computational information geometry for image and signal processing
Center-based clustering [12]: Setting up the context
Countless applications of clustering: quantization (coding), finding
categories (unsupervised-clustering), technique for speeding-up
computations (e.g., distances), and so on.
Minimize the objective/energy/loss function:
$$E(X = \{x_1, \ldots, x_n\};\ C = \{c_1, \ldots, c_k\}) = \min_C \sum_{i=1}^n \min_{j\in[k]} D(x_i : c_j)$$
Initialize the k cluster centers (seeds): at random (Forgy), by global k-means (discrete k-means), or by randomized k-means++ (expected Õ(log k) approximation guarantee)
Famous heuristics: Lloyd's batched allocation (assignment/center relocation) and Hartigan's single-point reassignment; both guarantee monotone convergence
Variational k-means: when the centroids $\arg\min_c \sum_{i=1}^n D(x_i : c)$ are not available in closed form, the relocated center only needs to be better (not best) to still guarantee monotone convergence (see the sketch below)
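To make the variational relocation idea concrete, here is a minimal Python sketch (an illustration under simplifying assumptions, not the code of any cited paper): the loop is written for an arbitrary divergence D and an arbitrary centroid routine that only needs to return an improved center. The squared Euclidean instantiation below, where the optimal centroid is the arithmetic mean, recovers plain Lloyd k-means.

```python
import numpy as np

def kmeans_generic(X, k, D, centroid, n_iter=50, seed=0):
    """Lloyd-type center-based clustering for a generic divergence D.

    D(X, c) returns the divergence of every row of X to a center c;
    centroid(Xc, c_old) returns a center for the points Xc that merely
    improves on c_old (the variational relaxation discussed above)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]    # random (Forgy) seeding
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = np.stack([D(X, c) for c in C], axis=1)  # (n, k) divergence table
        labels = dists.argmin(axis=1)                   # assignment step
        C = np.stack([centroid(X[labels == j], C[j]) for j in range(k)])  # relocation step
    return C, labels

# Squared Euclidean divergence: the exact centroid is the arithmetic mean.
sq_euclid = lambda X, c: ((X - c) ** 2).sum(axis=1)
mean_centroid = lambda Xc, c_old: Xc.mean(axis=0) if len(Xc) else c_old

X = np.random.default_rng(1).normal(size=(200, 2))
centers, labels = kmeans_generic(X, k=3, D=sq_euclid, centroid=mean_centroid)
```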
The trick of mixed divergences [13, 12]: Dual centroids per cluster
Mixed divergences [12]
Defined on three parameters p, q and r:
$$M_\lambda(p : q : r) := \lambda D(p : q) + (1 - \lambda) D(q : r)$$
for λ ∈ [0, 1].
Mixed divergences include:
the sided divergences for λ ∈ {0, 1},
the symmetrized (arithmetic mean) divergence for λ = 1/2,
the skew symmetrized divergences for λ ∈ (0, 1), λ ≠ 1/2.
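In code, the definition is a one-liner once a base divergence D is fixed. A minimal sketch, assuming the (extended) Kullback-Leibler divergence as an illustrative choice of D:

```python
import numpy as np

def kl(p, q):
    # extended Kullback-Leibler divergence between positive arrays
    return float(np.sum(p * np.log(p / q) - p + q))

def mixed_divergence(p, q, r, lam=0.5, D=kl):
    # M_lambda(p : q : r) = lambda * D(p : q) + (1 - lambda) * D(q : r)
    return lam * D(p, q) + (1.0 - lam) * D(q, r)

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.3, 0.3, 0.4])
print(mixed_divergence(p, q, p))           # lambda = 1/2: symmetrized divergence (D(p:q) + D(q:p)) / 2
print(mixed_divergence(p, q, p, lam=1.0))  # lambda = 1: the sided divergence D(p : q)
```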
Symmetrizing α-divergences
$$S_\alpha(p, q) = \frac{1}{2}\left(D_\alpha(p : q) + D_\alpha(q : p)\right) = S_{-\alpha}(p, q) = M_{\frac{1}{2}}(p : q : p)$$
For α = ±1, we get half of the Jeffreys divergence:
$$S_{\pm 1}(p, q) = \frac{1}{2}\sum_{i=1}^d (p^i - q^i)\log\frac{p^i}{q^i}$$
(same formula for probability/positive measures).
Centroids for symmetrized α-divergence usually not in closed
form.
How to perform center-based clustering without closed form
centroids?
Closed-form formula for Jeffreys positive centroid [7]
Jeffreys divergence is the symmetrized α = ±1 divergence.
The Jeffreys positive centroid $c = (c^1, \ldots, c^d)$ of a set $\{h_1, \ldots, h_n\}$ of n weighted positive histograms with d bins can be calculated component-wise exactly using the Lambert W analytic function:
$$c^i = \frac{a^i}{W\!\left(\frac{a^i}{g^i}\, e\right)}$$
where $a^i = \sum_{j=1}^n \pi_j h_j^i$ denotes the coordinate-wise arithmetic weighted mean and $g^i = \prod_{j=1}^n (h_j^i)^{\pi_j}$ the coordinate-wise geometric weighted mean.
The Lambert analytic function W (positive branch) is defined by $W(x)e^{W(x)} = x$ for $x \geq 0$.
→ Jeffreys k-means clustering. But for α ≠ ±1, how to cluster?
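Numerically, the centroid formula only requires the principal branch of the Lambert W function, available as scipy.special.lambertw. A minimal sketch, assuming uniform weights π_j = 1/n for illustration:

```python
import numpy as np
from scipy.special import lambertw

def jeffreys_positive_centroid(H, weights=None):
    """Coordinate-wise Jeffreys positive centroid of positive histograms.

    H: (n, d) array of positive histograms; weights: (n,) normalized weights.
    c^i = a^i / W(e * a^i / g^i), with a and g the weighted arithmetic and
    geometric means per bin."""
    H = np.asarray(H, dtype=float)
    n = H.shape[0]
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, float)
    a = w @ H                               # arithmetic weighted mean, per bin
    g = np.exp(w @ np.log(H))               # geometric weighted mean, per bin
    return a / lambertw(np.e * a / g).real  # principal branch of Lambert W

H = np.array([[1.0, 2.0, 3.0], [2.0, 2.0, 2.0], [0.5, 1.0, 4.0]])
print(jeffreys_positive_centroid(H))
```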
Mixed α-divergences/α-Jeffreys symmetrized divergence
Mixed α-divergence between a histogram x and two histograms p and q:
$$M_{\lambda,\alpha}(p : x : q) = \lambda D_\alpha(p : x) + (1-\lambda) D_\alpha(x : q) = \lambda D_{-\alpha}(x : p) + (1-\lambda) D_{-\alpha}(q : x) = M_{1-\lambda,-\alpha}(q : x : p)$$
The α-Jeffreys symmetrized divergence is obtained for λ = 1/2:
$$S_\alpha(p, q) = M_{\frac{1}{2},\alpha}(q : p : q) = M_{\frac{1}{2},\alpha}(p : q : p)$$
The skew symmetrized α-divergence is defined by:
$$S_{\lambda,\alpha}(p : q) = \lambda D_\alpha(p : q) + (1-\lambda) D_\alpha(q : p)$$
Mixed divergence-based k-means clustering
Initially, k distinct seeds from the dataset with li = ri .
Input: Weighted histogram set H, divergence D(·, ·), integer k > 0, real λ ∈ [0, 1];
Initialize left-sided/right-sided seeds C = {(l_i, r_i)}_{i=1}^k;
repeat
    // Assignment (as usual)
    for i = 1, 2, ..., k do
        C_i ← {h ∈ H : i = arg min_j M_λ(l_j : h : r_j)};
    end
    // Dual-sided centroid relocation (the trick!)
    for i = 1, 2, ..., k do
        r_i ← arg min_x D(C_i : x), where D(C_i : x) = Σ_{h∈C_i} w_h D(h : x);
        l_i ← arg min_x D(x : C_i), where D(x : C_i) = Σ_{h∈C_i} w_h D(x : h);
    end
until convergence;
Mixed α-hard clustering: MAhC(H, k, λ, α)
Input: Weighted histogram set H, integer k > 0, real λ ∈ [0, 1], real α ∈ ℝ;
Let C = {(l_i, r_i)}_{i=1}^k ← MAS(H, k, λ, α);
repeat
    // Assignment
    for i = 1, 2, ..., k do
        A_i ← {h ∈ H : i = arg min_j M_{λ,α}(l_j : h : r_j)};
    end
    // Centroid relocation (coordinate-wise weighted power means)
    for i = 1, 2, ..., k do
        r_i ← (Σ_{h∈A_i} w_h h^{(1−α)/2})^{2/(1−α)};
        l_i ← (Σ_{h∈A_i} w_h h^{(1+α)/2})^{2/(1+α)};
    end
until convergence;
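The relocation step above reduces to coordinate-wise weighted power means. A minimal sketch of these sided α-centroids, assuming uniform weights for illustration (α = ±1, which needs the limiting geometric/arithmetic formulas, is not handled):

```python
import numpy as np

def sided_alpha_centroids(A, alpha, weights=None):
    """Left- and right-sided alpha-centroids of the histograms in A (rows).

    r = (sum_h w_h h^{(1-alpha)/2})^{2/(1-alpha)}   (coordinate-wise)
    l = (sum_h w_h h^{(1+alpha)/2})^{2/(1+alpha)}
    Assumes alpha not in {-1, +1} so both exponents are non-zero."""
    A = np.asarray(A, dtype=float)
    w = np.full(len(A), 1.0 / len(A)) if weights is None else np.asarray(weights, float)
    e_r, e_l = (1.0 - alpha) / 2.0, (1.0 + alpha) / 2.0
    r = (w @ A ** e_r) ** (1.0 / e_r)
    l = (w @ A ** e_l) ** (1.0 / e_l)
    return l, r

A = np.array([[1.0, 2.0, 3.0], [2.0, 1.0, 3.0], [1.5, 1.5, 3.0]])
print(sided_alpha_centroids(A, alpha=0.5))
```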
Coupled k-Means++ α-Seeding (extending k-means++)
Algorithm 1: Mixed α-seeding; MAS(H, k, λ, α)
Input: Weighted histogram set H, integer k ≥ 1, real λ ∈ [0, 1], real α ∈ ℝ;
Let C ← {(h_j, h_j)} for a histogram h_j picked with uniform probability in H;
for i = 2, 3, ..., k do
    Pick at random a histogram h ∈ H with probability
        π_H(h) := w_h M_{λ,α}(c_h : h : c'_h) / Σ_{y∈H} w_y M_{λ,α}(c_y : y : c'_y),    (1)
    // where (c_h, c'_h) := arg min_{(z,z')∈C} M_{λ,α}(z : h : z');
    C ← C ∪ {(h, h)};
end
Output: Set of initial cluster centers C;
→ Guaranteed probabilistic bound. Just need to initialize! No centroid computations are needed, since Lloyd iterations are not theoretically required.
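A minimal sketch of this seeding step, written under the simplifying assumptions of unit weights and a caller-supplied mixed-divergence routine M(l, h, r):

```python
import numpy as np

def mixed_seeding(H, k, M, seed=0):
    """k-means++-style seeding for mixed divergences.

    H: (n, d) array of histograms; M(l, h, r) evaluates the mixed divergence
    M_{lambda,alpha}(l : h : r).  Returns k pairs of (left, right) seeds,
    initialized with l_i = r_i (unit weights assumed for simplicity)."""
    rng = np.random.default_rng(seed)
    C = [(H[rng.integers(len(H))],) * 2]                  # first seed, picked uniformly
    for _ in range(1, k):
        # divergence of each histogram to its closest current pair of centers
        d = np.array([min(M(l, h, r) for (l, r) in C) for h in H])
        p = d / d.sum()                                   # sampling probabilities
        h = H[rng.choice(len(H), p=p)]
        C.append((h, h))
    return C

kl = lambda p, q: float(np.sum(p * np.log(p / q) - p + q))
M = lambda l, h, r: 0.5 * kl(l, h) + 0.5 * kl(h, r)       # lambda = 1/2, KL as base divergence
H = np.abs(np.random.default_rng(1).normal(size=(50, 4))) + 0.1
seeds = mixed_seeding(H, k=3, M=M)
```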
Learning statistical mixtures with hard EM: k-GMLE [6] is fast, guaranteed, with a low memory footprint
Learning MMs: A geometric hard clustering viewpoint
Learn the parameters of a mixture $m(x) = \sum_{i=1}^k w_i\, p(x|\theta_i)$.
Maximize the complete data likelihood = clustering objective function:
$$\max_{W,\Lambda} l_c(W, \Lambda) = \sum_{i=1}^n \sum_{j=1}^k z_{i,j} \log(w_j\, p(x_i|\theta_j)) = \max_{\Lambda} \sum_{i=1}^n \max_{j\in[k]} \log(w_j\, p(x_i|\theta_j)) \equiv \min_{W,\Lambda} \sum_{i=1}^n \min_{j\in[k]} D_j(x_i),$$
where $c_j = (w_j, \theta_j)$ (cluster prototype) and $D_j(x_i) = -\log p(x_i|\theta_j) - \log w_j$ are potential distance-like functions.
⇒ further attach to each cluster (mixture component) a different family of probability distributions.
Generalized k-MLE: learning statistical EF mixtures [?, 16, 15, 1, 8]
Model-based clustering: assignment of points to clusters with
$$D_{w_j,\theta_j,F_j}(x) = -\log p_{F_j}(x; \theta_j) - \log w_j$$
k-GMLE:
1. Initialize the weights W ∈ ∆_k and the family type (F_1, ..., F_k) of each cluster
2. Solve $\min_\Lambda \sum_i \min_j D_j(x_i)$ (center-based clustering for W fixed) with the potential functions $D_j(x_i) = -\log p_{F_j}(x_i|\theta_j) - \log w_j$
3. Solve the family types maximizing the MLE in each cluster C_j by choosing the parametric family of distributions $F_j = F(\gamma_j)$ that yields the best likelihood: $\min_{F_1=F(\gamma_1),\ldots,F_k=F(\gamma_k)\in F(\gamma)} \sum_i \min_j D_{w_j,\theta_j,F_j}(x_i)$, with $\forall l,\ \gamma_l = \max_j F_j^*\!\left(\hat\eta_l = \frac{1}{n_l}\sum_{x\in C_l} t_j(x)\right) + \frac{1}{n_l}\sum_{x\in C_l} k(x)$
4. Update the weights W as the cluster point proportions
5. Test for convergence and go to step 2 otherwise.
Drawback = biased, non-consistent estimator due to the Voronoi partition (hard assignments).
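To make the hard-EM viewpoint concrete, here is a minimal sketch of a k-MLE-style loop for the simplest family choice, unit-variance spherical Gaussians (an illustrative simplification: the actual k-GMLE handles general exponential families and selects the family type per cluster):

```python
import numpy as np

def k_mle_spherical_gaussians(X, k, n_iter=50, seed=0):
    """Hard-EM (k-MLE style) for a mixture of unit-variance spherical Gaussians.

    Assignment uses D_j(x) = -log p(x|theta_j) - log w_j; the per-cluster MLE
    of the mean is the sample mean, and the weights are cluster proportions."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=k, replace=False)].copy()
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # D_j(x_i) = 0.5 * ||x_i - mu_j||^2 - log w_j   (up to an additive constant)
        D = 0.5 * ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1) - np.log(w)[None, :]
        z = D.argmin(axis=1)                           # hard assignment
        for j in range(k):                             # per-cluster MLE of theta_j
            if np.any(z == j):
                mu[j] = X[z == j].mean(axis=0)
        w = np.bincount(z, minlength=k) / n            # weights = cluster proportions
        w = np.clip(w, 1e-12, None)                    # guard against empty clusters
    return w, mu, z

X = np.vstack([np.random.default_rng(2).normal(m, 1.0, size=(100, 2)) for m in (0.0, 4.0)])
weights, means, labels = k_mle_spherical_gaussians(X, k=2)
```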
Conformal divergences and clustering (by analogy with a Riemannian metric tensor)
Geometrically designed divergences
Plot of the convex generator F: Bregman [10], Jensen (Burbea-Rao [9]), total Bregman [5].
[Figure: graph of F as (x, F(x)) with the points (p, F(p)), (q, F(q)) and the midpoint (p+q)/2, illustrating B(p : q), J(p, q) and tB(p : q).]
Divergences: Distortion measures
F a smooth convex function, the generator.
Skew Jensen divergences:
$$J_\alpha(p : q) = \alpha F(p) + (1-\alpha) F(q) - F(\alpha p + (1-\alpha) q) = (F(p)F(q))_\alpha - F((pq)_\alpha),$$
where $(pq)_\gamma = \gamma p + (1-\gamma)q = q + \gamma(p - q)$ and $(F(p)F(q))_\gamma = \gamma F(p) + (1-\gamma)F(q) = F(q) + \gamma(F(p) - F(q))$.
Bregman divergences = limit cases of (scaled) skew Jensen divergences:
$$B(p : q) = F(p) - F(q) - \langle p - q, \nabla F(q)\rangle,$$
$$\lim_{\alpha\to 0} \frac{1}{\alpha} J_\alpha(p : q) = B(p : q), \qquad \lim_{\alpha\to 1} \frac{1}{1-\alpha} J_\alpha(p : q) = B(q : p).$$
Statistical Bhattacharyya divergence = Jensen divergence for exponential families [9]:
$$\mathrm{Bhat}(p_1 : p_2) = -\log \int p_1(x)^{\alpha}\, p_2(x)^{1-\alpha}\, \mathrm{d}\nu(x) = J_\alpha(\theta_1 : \theta_2)$$
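A small numerical sketch of the two definitions above, using the illustrative generator F(x) = Σ_i x_i log x_i (whose Bregman divergence is the extended Kullback-Leibler divergence); the last line checks the scaled skew Jensen limit:

```python
import numpy as np

F = lambda x: float(np.sum(x * np.log(x)))            # convex generator (negative entropy)
gradF = lambda x: np.log(x) + 1.0

def skew_jensen(p, q, alpha):
    # J_alpha(p : q) = alpha F(p) + (1 - alpha) F(q) - F(alpha p + (1 - alpha) q)
    return alpha * F(p) + (1 - alpha) * F(q) - F(alpha * p + (1 - alpha) * q)

def bregman(p, q):
    # B(p : q) = F(p) - F(q) - <p - q, grad F(q)>
    return F(p) - F(q) - float(np.dot(p - q, gradF(q)))

p = np.array([0.2, 0.3, 0.5]); q = np.array([0.4, 0.4, 0.2])
print(bregman(p, q))                                   # equals KL(p : q) for this generator
print(skew_jensen(p, q, 1e-6) / 1e-6)                  # close to B(p : q): scaled skew Jensen limit
```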
Total Bregman divergences
Conformal divergence, with conformal factor ρ:
$$D'(p : q) = \rho(p, q)\, D(p : q)$$
ρ plays the role of a "regularizer" [17] and ensures robustness.
Invariance by rotation of the axes of the design space.
$$tB(p : q) = \frac{B(p : q)}{\sqrt{1 + \langle \nabla F(q), \nabla F(q)\rangle}} = \rho_B(q)\, B(p : q), \qquad \rho_B(q) = \frac{1}{\sqrt{1 + \langle \nabla F(q), \nabla F(q)\rangle}}.$$
Total squared Euclidean divergence:
$$tE(p, q) = \frac{1}{2}\, \frac{\langle p - q, p - q\rangle}{\sqrt{1 + \langle q, q\rangle}}.$$
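A minimal sketch of the two formulas above; the generic total Bregman routine and the specialized total squared Euclidean divergence (generator F(x) = ⟨x, x⟩/2, gradient x) agree on the example:

```python
import numpy as np

def total_bregman(p, q, F, gradF):
    # tB(p : q) = B(p : q) / sqrt(1 + ||grad F(q)||^2)
    B = F(p) - F(q) - np.dot(p - q, gradF(q))
    return B / np.sqrt(1.0 + np.dot(gradF(q), gradF(q)))

def total_sq_euclidean(p, q):
    # tE(p, q) = 0.5 * <p - q, p - q> / sqrt(1 + <q, q>)
    return 0.5 * np.dot(p - q, p - q) / np.sqrt(1.0 + np.dot(q, q))

p, q = np.array([1.0, 2.0]), np.array([0.0, 1.0])
print(total_sq_euclidean(p, q))
print(total_bregman(p, q, lambda x: 0.5 * np.dot(x, x), lambda x: x))  # same value
```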
Total Jensen divergence: Illustration of the principle
[Figure: graph of the generator F with the chord points (F(p)F(q))_α, (F(p)F(q))_β and F((pq)_α), illustrating the skew Jensen divergence J_α(p : q) and the total Jensen divergence tJ_α(p : q); the same construction is drawn for rotated coordinates (p', q') to illustrate rotation invariance.]
Total Jensen divergences
$$tB(p : q) = \rho_B(q)\, B(p : q), \qquad \rho_B(q) = \frac{1}{\sqrt{1 + \langle \nabla F(q), \nabla F(q)\rangle}}$$
$$tJ_\alpha(p : q) = \rho_J(p, q)\, J_\alpha(p : q), \qquad \rho_J(p, q) = \frac{1}{\sqrt{1 + \frac{(F(p) - F(q))^2}{\langle p - q, p - q\rangle}}}$$
Jensen-Shannon divergence, whose square root is a metric [3]:
$$JS(p, q) = \frac{1}{2}\sum_{i=1}^d p^i \log\frac{2p^i}{p^i + q^i} + \frac{1}{2}\sum_{i=1}^d q^i \log\frac{2q^i}{p^i + q^i}$$
Lemma
The square root of the total Jensen-Shannon divergence is not a
metric.
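A minimal sketch of the conformal factor ρ_J and the resulting total Jensen divergence, reusing the skew Jensen definition above (the generator F is an illustrative choice):

```python
import numpy as np

def rho_J(p, q, F):
    # rho_J(p, q) = 1 / sqrt(1 + (F(p) - F(q))^2 / <p - q, p - q>)
    return 1.0 / np.sqrt(1.0 + (F(p) - F(q)) ** 2 / np.dot(p - q, p - q))

def total_skew_jensen(p, q, F, alpha):
    # tJ_alpha(p : q) = rho_J(p, q) * J_alpha(p : q)
    J = alpha * F(p) + (1 - alpha) * F(q) - F(alpha * p + (1 - alpha) * q)
    return rho_J(p, q, F) * J

F = lambda x: float(np.sum(x * np.log(x)))    # illustrative convex generator
p, q = np.array([0.2, 0.3, 0.5]), np.array([0.4, 0.4, 0.2])
print(total_skew_jensen(p, q, F, alpha=0.5))
```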
Total Jensen divergences/Total Bregman divergences
Total Jensen is not a generalization of total Bregman.
In the limit cases α ∈ {0, 1}, we have:
$$\lim_{\alpha\to 0} \frac{1}{\alpha}\, tJ_\alpha(p : q) = \rho_J(p, q)\, B(p : q) \neq \rho_B(q)\, B(p : q),$$
$$\lim_{\alpha\to 1} \frac{1}{1-\alpha}\, tJ_\alpha(p : q) = \rho_J(p, q)\, B(q : p) \neq \rho_B(p)\, B(q : p),$$
since the conformal factors differ: $\rho_J(p, q) \neq \rho_B(q)$.
Conformal factor from mean value theorem
When $p \simeq q$, $\rho_J(p, q) \simeq \rho_B(q)$, and the total Jensen divergence tends to the total Bregman divergence for any value of α.
$$\rho_J(p, q) = \frac{1}{\sqrt{1 + \langle \nabla F(\xi), \nabla F(\xi)\rangle}} = \rho_B(\xi), \qquad \text{for some } \xi \in [p, q].$$
For univariate generators, the value of ξ is explicit:
$$\xi = (F')^{-1}\!\left(\frac{\Delta F}{\Delta}\right) = (F^*)'\!\left(\frac{\Delta F}{\Delta}\right),$$
where $\Delta F = F(q) - F(p)$, $\Delta = q - p$, and $F^*$ is the Legendre convex conjugate [9].
Centroids and statistical robustness
Centroids (barycenters) are minimizers of average (weighted) divergences:
$$L(x; w) = \sum_{i=1}^n w_i \times tJ_\alpha(p_i : x), \qquad c_\alpha = \arg\min_{x\in\mathcal{X}} L(x; w).$$
Is it unique?
Is it robust to outliers [4]?
Iterative convex-concave procedure (CCCP) [9]
Clustering: No closed-form centroid, no cry!
k-means++ [2] picks seeds at random; no centroid calculation is needed.
Algorithm 2: Total Jensen k-means++ seeding
Input: Histogram set H, number of clusters k ≥ 1;
Let C ← {h_j} for a histogram h_j picked with uniform probability in H;
for i = 2, 3, ..., k do
    Pick at random h ∈ H with probability
        π_H(h) = tJ_α(c_h : h) / Σ_{y∈H} tJ_α(c_y : y),
    where c_h = arg min_{z∈C} tJ_α(z : h);
    C ← C ∪ {h};
end
Output: Set of initial cluster centers C;
Total Jensen divergences: Recap
Total Jensen divergence = conformal divergence with a non-separable, double-sided conformal factor.
Invariant to axis rotation of the "design space".
Equivalent to total Bregman divergences [17, 5] only when p ≃ q.
The square root of the total Jensen-Shannon divergence is not a metric, but the square root of the Jensen-Shannon divergence is a metric.
Total Jensen k-means++ does not require centroid computations and has a guaranteed approximation.
Interest of conformal divergences in SVM [18] (double-sided
separable), in information geometry [14] (flattening).
Novel heuristics for NP-hard center-based clustering: merge-and-split and (k, l)-means [11]
The k-means merge-and-split heuristic
Generalize Hartigan’s single-point relocation heuristic...
Consider pairs of clusters (C_i, C_j) with centers c_i and c_j; merge them and split them again into two clusters with new centers c'_i and c'_j. Accept the move when the sum of these two cluster variances decreases:
$$\Delta(C_i, C_j) = V(C_i, c_i) + V(C_j, c_j) - \left(V(C'_i, c'_i) + V(C'_j, c'_j)\right)$$
How to split the two merged clusters again (the best splitting is NP-hard)?
a discrete 2-means: choose among the $n_{i,j} = n_i + n_j$ points of $C_{i,j}$ the two best centers (naively implemented in $O(n^3)$). This yields a 2-approximation of 2-means.
a 2-means++ heuristic: pick $c'_i$ at random, then pick $c'_j$ randomly according to the normalized distribution of the squared distances of the points of $C_{i,j}$ to $c'_i$ (see k-means++). Repeat this initialization for a given number α of rounds (say, $\alpha = 1 + 0.01\binom{n_{i,j}}{2}$) and keep the best one. (A code sketch is given below.)
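A minimal sketch of one merge-and-split attempt for squared Euclidean k-means, using the 2-means++ splitting variant with a single splitting round (a simplification of the α-round scheme described above):

```python
import numpy as np

def cluster_cost(X, c):
    # within-cluster variance (sum of squared distances to the center)
    return ((X - c) ** 2).sum()

def merge_and_split(Xi, Xj, ci, cj, rng):
    """Re-split the merged cluster Xi + Xj with a 2-means++ pick.

    Returns the new point sets and centers if the total cost decreases,
    otherwise the original pair (one splitting round only, for brevity)."""
    X = np.vstack([Xi, Xj])
    c1 = X[rng.integers(len(X))]                               # first new center, uniform
    d2 = ((X - c1) ** 2).sum(axis=1)
    c2 = X[rng.choice(len(X), p=d2 / d2.sum())]                # second center, k-means++-style
    labels = (((X - c2) ** 2).sum(axis=1) < d2).astype(int)    # assign to the closest of c1, c2
    parts = [X[labels == 0], X[labels == 1]]
    if min(len(part) for part in parts) == 0:
        return Xi, Xj, ci, cj
    new_centers = [part.mean(axis=0) for part in parts]
    old = cluster_cost(Xi, ci) + cluster_cost(Xj, cj)
    new = sum(cluster_cost(part, c) for part, c in zip(parts, new_centers))
    if new < old:                                              # accept only if the variance decreases
        return parts[0], parts[1], new_centers[0], new_centers[1]
    return Xi, Xj, ci, cj

rng = np.random.default_rng(0)
Xi, Xj = rng.normal(0, 1, (30, 2)), rng.normal(3, 1, (30, 2))
print(merge_and_split(Xi, Xj, Xi.mean(axis=0), Xj.mean(axis=0), rng)[2:])
```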
The k-means merge-and-split heuristic
ops = number of pivot operations

Data set                     Hartigan           Discrete Hartigan    Merge&Split
                             cost      #ops     cost      #ops       cost      #ops
Iris  (d=4,  n=150,  k=3)    112.35    35.11    101.69    33.54      83.95     31.36
Wine  (d=13, n=178,  k=3)    607303    97.88    593319    100.02     570283    100.47
Yeast (d=8,  n=1484, k=10)   47.10     1364.0   57.34     807.83     50.20     190.58

Data set                     Hartigan++         Discrete Hartigan++  Merge&Split++
                             cost      #ops     cost      #ops       cost      #ops
Iris  (d=4,  n=150,  k=3)    101.49    19.40    90.48     18.93      88.56     8.84
Wine  (d=13, n=178,  k=3)    3152616   18.76    2525803   24.61      2498107   9.67
Yeast (d=8,  n=1484, k=10)   47.41     1192.38  54.96     640.89     51.82     66.30
The (k, l)-means heuristic: navigating on the local minima!
Associate each $p_i$ to its l nearest cluster centers $\mathrm{NN}_l(p_i; K)$ (with $i\mathrm{NN}_l$ = the cluster center indexes), and minimize the (k, l)-means objective function (with 1 ≤ l ≤ k):
$$e(P, K; l) = \sum_{i=1}^n \sum_{a \in i\mathrm{NN}_l(p_i; K)} \|p_i - c_a\|^2.$$
The assignment/relocation steps guarantee a monotone decrease.
Higher l changes the local optima of the optimization landscape.
Conversion to k-means:
(k, l)↓-means: convert a (k, l)-means by assigning to each point $p_i$ its closest center (among the l assigned at the end of the (k, l)-means), then compute the centroids and launch a regular Lloyd k-means to finalize.
(k, l)-means cascading conversion to k-means: after convergence of the (k, l)-means, initialize a (k, l − 1)-means by dropping for each point $p_i$ its farthest cluster, perform a Lloyd (k, l − 1)-means, and so on, until we get a (k, 1)-means = k-means.
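A minimal sketch of one (k, l)-means assignment/relocation step for the squared Euclidean distance: each point is attached to its l nearest centers and each center moves to the mean of the points attached to it (uniform weights assumed for illustration):

```python
import numpy as np

def kl_means_step(X, C, l):
    """One assignment/relocation step of (k, l)-means (squared Euclidean)."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)      # (n, k) squared distances
    nn = np.argsort(d2, axis=1)[:, :l]                       # indices of the l nearest centers
    newC = C.copy()
    for a in range(len(C)):
        attached = X[(nn == a).any(axis=1)]                  # points that selected center a
        if len(attached):
            newC[a] = attached.mean(axis=0)
    # (k, l)-means objective for the current centers C and assignment nn
    cost = sum(d2[i, nn[i]].sum() for i in range(len(X)))
    return newC, nn, cost

X = np.random.default_rng(0).normal(size=(100, 2))
C = X[:4].copy()
for _ in range(10):
    C, nn, cost = kl_means_step(X, C, l=2)
print(cost)
```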
The (k, l)-means heuristic: 10000 trials
Data-set: Iris
k    win     k-means (min / avg)     (k, 2)↓-means (min / avg)
3    20.8    78.94 / 92.39           78.94 / 78.94
4    24.29   57.31 / 63.15           57.31 / 70.33
5    57.76   46.53 / 52.88           49.74 / 51.10
6    80.55   38.93 / 45.60           38.93 / 41.63
7    76.67   34.18 / 40.00           34.29 / 36.85
8    80.36   29.87 / 36.05           29.87 / 32.52
9    78.85   27.76 / 32.91           27.91 / 30.15
10   79.88   25.81 / 30.24           25.97 / 28.02

k    l    win     k-means (min / avg)     (k, l)-means (min / avg)
5    2    58.3    46.53 / 52.72           49.74 / 51.24
5    4    62.4    46.53 / 52.55           49.74 / 49.74
8    2    80.8    29.87 / 36.40           29.87 / 32.54
8    3    61.1    29.87 / 36.19           32.76 / 34.04
8    6    55.5    29.88 / 36.189          32.75 / 35.26
10   2    78.8    25.81 / 30.61           25.97 / 28.23
10   3    82.5    25.95 / 30.23           26.47 / 27.76
10   5    64.7    25.90 / 30.32           26.99 / 28.61
On average the converted (k, l)-means reaches a better cost, but the best local minima are found by the normal k-means...
Thank you!
Bibliography I
Christophe Saint-Jean and Frank Nielsen.
Hartigan's method for k-MLE: Mixture modeling with Wishart distributions and its application to motion retrieval.
In Frank Nielsen, editor, Geometric Theory of Information, Signals and Communication Technology, pages 301-330. Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-05317-2_11
David Arthur and Sergei Vassilvitskii.
k-means++: the advantages of careful seeding.
In Proceedings of the eighteenth annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages
1027–1035. Society for Industrial and Applied Mathematics, 2007.
Bent Fuglede and Flemming Topsoe.
Jensen-Shannon divergence and Hilbert space embedding.
In IEEE International Symposium on Information Theory, pages 31–31, 2004.
F. R. Hampel, P. J. Rousseeuw, E. Ronchetti, and W. A. Stahel.
Robust Statistics: The Approach Based on Influence Functions.
Wiley Series in Probability and Mathematical Statistics, 1986.
Meizhu Liu, Baba C. Vemuri, Shun-ichi Amari, and Frank Nielsen.
Shape retrieval using hierarchical total Bregman soft clustering.
Transactions on Pattern Analysis and Machine Intelligence, 34(12):2407–2419, 2012.
Frank Nielsen.
k-MLE: A fast algorithm for learning statistical mixture models.
CoRR, abs/1203.5181, 2012.
preliminary version in ICASSP.
Frank Nielsen.
Jeffreys centroids: A closed-form expression for positive histograms and a guaranteed tight approximation
for frequency histograms.
Signal Processing Letters, IEEE, 20(7):657–660, 2013.
Bibliography II
Frank Nielsen.
On learning statistical mixtures maximizing the complete likelihood.
Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2014),
1641:238–245, 2014.
Frank Nielsen and Sylvain Boltz.
The Burbea-Rao and Bhattacharyya centroids.
IEEE Transactions on Information Theory, 57(8):5455–5466, August 2011.
Frank Nielsen and Richard Nock.
Sided and symmetrized Bregman centroids.
Information Theory, IEEE Transactions on, 55(6):2882–2904, 2009.
Frank Nielsen and Richard Nock.
Further heuristics for k-means: The merge-and-split heuristic and the (k, l)-means.
arXiv preprint arXiv:1406.6314, 2014.
Frank Nielsen, Richard Nock, and Shun-ichi Amari.
On clustering histograms with k-means by using mixed α-divergences.
Entropy, 16(6):3273–3301, 2014.
Richard Nock, Panu Luosto, and Jyrki Kivinen.
Mixed Bregman clustering with approximation guarantees.
In Machine Learning and Knowledge Discovery in Databases, pages 154–169. Springer, 2008.
Atsumi Ohara, Hiroshi Matsuzoe, and Shun-ichi Amari.
A dually flat structure on the space of escort distributions.
Journal of Physics: Conference Series, 201(1):012012, 2010.
Bibliography III
Olivier Schwander and Frank Nielsen.
Fast learning of gamma mixture models with k-mle.
In Similarity-Based Pattern Recognition, pages 235–249. Springer, 2013.
Olivier Schwander, Aurelien J Schutz, Frank Nielsen, and Yannick Berthoumieu.
k-mle for mixtures of generalized Gaussians.
In Pattern Recognition (ICPR), 2012 21st International Conference on, pages 2825–2828. IEEE, 2012.
Baba Vemuri, Meizhu Liu, Shun-ichi Amari, and Frank Nielsen.
Total Bregman divergence and its applications to DTI analysis.
IEEE Transactions on Medical Imaging, pages 475–483, 2011.
Si Wu and Shun-ichi Amari.
Conformal transformation of kernel functions a data dependent way to improve support vector machine
classifiers.
Neural Processing Letters, 15(1):59–67, 2002.

Contenu connexe

Tendances

Clustering in Hilbert simplex geometry
Clustering in Hilbert simplex geometryClustering in Hilbert simplex geometry
Clustering in Hilbert simplex geometryFrank Nielsen
 
Tailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest NeighborsTailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest NeighborsFrank Nielsen
 
Clustering in Hilbert geometry for machine learning
Clustering in Hilbert geometry for machine learningClustering in Hilbert geometry for machine learning
Clustering in Hilbert geometry for machine learningFrank Nielsen
 
On learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodOn learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodFrank Nielsen
 
Density theorems for Euclidean point configurations
Density theorems for Euclidean point configurationsDensity theorems for Euclidean point configurations
Density theorems for Euclidean point configurationsVjekoslavKovac1
 
A T(1)-type theorem for entangled multilinear Calderon-Zygmund operators
A T(1)-type theorem for entangled multilinear Calderon-Zygmund operatorsA T(1)-type theorem for entangled multilinear Calderon-Zygmund operators
A T(1)-type theorem for entangled multilinear Calderon-Zygmund operatorsVjekoslavKovac1
 
On Clustering Histograms with k-Means by Using Mixed α-Divergences
 On Clustering Histograms with k-Means by Using Mixed α-Divergences On Clustering Histograms with k-Means by Using Mixed α-Divergences
On Clustering Histograms with k-Means by Using Mixed α-DivergencesFrank Nielsen
 
Density theorems for anisotropic point configurations
Density theorems for anisotropic point configurationsDensity theorems for anisotropic point configurations
Density theorems for anisotropic point configurationsVjekoslavKovac1
 
Estimates for a class of non-standard bilinear multipliers
Estimates for a class of non-standard bilinear multipliersEstimates for a class of non-standard bilinear multipliers
Estimates for a class of non-standard bilinear multipliersVjekoslavKovac1
 
Multilinear singular integrals with entangled structure
Multilinear singular integrals with entangled structureMultilinear singular integrals with entangled structure
Multilinear singular integrals with entangled structureVjekoslavKovac1
 

Tendances (20)

Clustering in Hilbert simplex geometry
Clustering in Hilbert simplex geometryClustering in Hilbert simplex geometry
Clustering in Hilbert simplex geometry
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Tailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest NeighborsTailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest Neighbors
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Appli...
 Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Appli... Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Appli...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Appli...
 
Clustering in Hilbert geometry for machine learning
Clustering in Hilbert geometry for machine learningClustering in Hilbert geometry for machine learning
Clustering in Hilbert geometry for machine learning
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
On learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodOn learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihood
 
Density theorems for Euclidean point configurations
Density theorems for Euclidean point configurationsDensity theorems for Euclidean point configurations
Density theorems for Euclidean point configurations
 
A T(1)-type theorem for entangled multilinear Calderon-Zygmund operators
A T(1)-type theorem for entangled multilinear Calderon-Zygmund operatorsA T(1)-type theorem for entangled multilinear Calderon-Zygmund operators
A T(1)-type theorem for entangled multilinear Calderon-Zygmund operators
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
On Clustering Histograms with k-Means by Using Mixed α-Divergences
 On Clustering Histograms with k-Means by Using Mixed α-Divergences On Clustering Histograms with k-Means by Using Mixed α-Divergences
On Clustering Histograms with k-Means by Using Mixed α-Divergences
 
Density theorems for anisotropic point configurations
Density theorems for anisotropic point configurationsDensity theorems for anisotropic point configurations
Density theorems for anisotropic point configurations
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Estimates for a class of non-standard bilinear multipliers
Estimates for a class of non-standard bilinear multipliersEstimates for a class of non-standard bilinear multipliers
Estimates for a class of non-standard bilinear multipliers
 
CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Multilinear singular integrals with entangled structure
Multilinear singular integrals with entangled structureMultilinear singular integrals with entangled structure
Multilinear singular integrals with entangled structure
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 

En vedette

INF442: Traitement des données massives
INF442: Traitement des données massivesINF442: Traitement des données massives
INF442: Traitement des données massivesFrank Nielsen
 
Traitement massif des données 2016
Traitement massif des données 2016Traitement massif des données 2016
Traitement massif des données 2016Frank Nielsen
 
(ISIA 5) Cours d'algorithmique (1995)
(ISIA 5) Cours d'algorithmique (1995)(ISIA 5) Cours d'algorithmique (1995)
(ISIA 5) Cours d'algorithmique (1995)Frank Nielsen
 
Traitement des données massives (INF442, A3)
Traitement des données massives (INF442, A3)Traitement des données massives (INF442, A3)
Traitement des données massives (INF442, A3)Frank Nielsen
 
Traitement des données massives (INF442, A2)
Traitement des données massives (INF442, A2)Traitement des données massives (INF442, A2)
Traitement des données massives (INF442, A2)Frank Nielsen
 
Traitement des données massives (INF442, A6)
Traitement des données massives (INF442, A6)Traitement des données massives (INF442, A6)
Traitement des données massives (INF442, A6)Frank Nielsen
 
Traitement des données massives (INF442, A5)
Traitement des données massives (INF442, A5)Traitement des données massives (INF442, A5)
Traitement des données massives (INF442, A5)Frank Nielsen
 
Traitement des données massives (INF442, A1)
Traitement des données massives (INF442, A1)Traitement des données massives (INF442, A1)
Traitement des données massives (INF442, A1)Frank Nielsen
 
Traitement des données massives (INF442, A7)
Traitement des données massives (INF442, A7)Traitement des données massives (INF442, A7)
Traitement des données massives (INF442, A7)Frank Nielsen
 
On representing spherical videos (Frank Nielsen, CVPR 2001)
On representing spherical videos (Frank Nielsen, CVPR 2001)On representing spherical videos (Frank Nielsen, CVPR 2001)
On representing spherical videos (Frank Nielsen, CVPR 2001)Frank Nielsen
 
Computational Information Geometry for Machine Learning
Computational Information Geometry for Machine LearningComputational Information Geometry for Machine Learning
Computational Information Geometry for Machine LearningFrank Nielsen
 

En vedette (11)

INF442: Traitement des données massives
INF442: Traitement des données massivesINF442: Traitement des données massives
INF442: Traitement des données massives
 
Traitement massif des données 2016
Traitement massif des données 2016Traitement massif des données 2016
Traitement massif des données 2016
 
(ISIA 5) Cours d'algorithmique (1995)
(ISIA 5) Cours d'algorithmique (1995)(ISIA 5) Cours d'algorithmique (1995)
(ISIA 5) Cours d'algorithmique (1995)
 
Traitement des données massives (INF442, A3)
Traitement des données massives (INF442, A3)Traitement des données massives (INF442, A3)
Traitement des données massives (INF442, A3)
 
Traitement des données massives (INF442, A2)
Traitement des données massives (INF442, A2)Traitement des données massives (INF442, A2)
Traitement des données massives (INF442, A2)
 
Traitement des données massives (INF442, A6)
Traitement des données massives (INF442, A6)Traitement des données massives (INF442, A6)
Traitement des données massives (INF442, A6)
 
Traitement des données massives (INF442, A5)
Traitement des données massives (INF442, A5)Traitement des données massives (INF442, A5)
Traitement des données massives (INF442, A5)
 
Traitement des données massives (INF442, A1)
Traitement des données massives (INF442, A1)Traitement des données massives (INF442, A1)
Traitement des données massives (INF442, A1)
 
Traitement des données massives (INF442, A7)
Traitement des données massives (INF442, A7)Traitement des données massives (INF442, A7)
Traitement des données massives (INF442, A7)
 
On representing spherical videos (Frank Nielsen, CVPR 2001)
On representing spherical videos (Frank Nielsen, CVPR 2001)On representing spherical videos (Frank Nielsen, CVPR 2001)
On representing spherical videos (Frank Nielsen, CVPR 2001)
 
Computational Information Geometry for Machine Learning
Computational Information Geometry for Machine LearningComputational Information Geometry for Machine Learning
Computational Information Geometry for Machine Learning
 

Similaire à Divergence-based clustering and applications of total Jensen divergences

Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...Frank Nielsen
 
Slides: A glance at information-geometric signal processing
Slides: A glance at information-geometric signal processingSlides: A glance at information-geometric signal processing
Slides: A glance at information-geometric signal processingFrank Nielsen
 
k-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture modelsk-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture modelsFrank Nielsen
 
Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...
Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...
Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...Frank Nielsen
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking componentsChristian Robert
 
Slides: The dual Voronoi diagrams with respect to representational Bregman di...
Slides: The dual Voronoi diagrams with respect to representational Bregman di...Slides: The dual Voronoi diagrams with respect to representational Bregman di...
Slides: The dual Voronoi diagrams with respect to representational Bregman di...Frank Nielsen
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Daisuke Yoneoka
 
THE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCE
THE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCETHE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCE
THE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCEFrank Nielsen
 
Automatic Bayesian method for Numerical Integration
Automatic Bayesian method for Numerical Integration Automatic Bayesian method for Numerical Integration
Automatic Bayesian method for Numerical Integration Jagadeeswaran Rathinavel
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Christian Robert
 
Slides: The Centroids of Symmetrized Bregman Divergences
Slides: The Centroids of Symmetrized Bregman DivergencesSlides: The Centroids of Symmetrized Bregman Divergences
Slides: The Centroids of Symmetrized Bregman DivergencesFrank Nielsen
 
prior selection for mixture estimation
prior selection for mixture estimationprior selection for mixture estimation
prior selection for mixture estimationChristian Robert
 
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...Frank Nielsen
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?Christian Robert
 
Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...Alexander Litvinenko
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheetSuvrat Mishra
 

Similaire à Divergence-based clustering and applications of total Jensen divergences (20)

Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...
 
Slides: A glance at information-geometric signal processing
Slides: A glance at information-geometric signal processingSlides: A glance at information-geometric signal processing
Slides: A glance at information-geometric signal processing
 
k-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture modelsk-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture models
 
Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...
Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...
Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...
 
Vancouver18
Vancouver18Vancouver18
Vancouver18
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
Slides: The dual Voronoi diagrams with respect to representational Bregman di...
Slides: The dual Voronoi diagrams with respect to representational Bregman di...Slides: The dual Voronoi diagrams with respect to representational Bregman di...
Slides: The dual Voronoi diagrams with respect to representational Bregman di...
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9
 
Igv2008
Igv2008Igv2008
Igv2008
 
THE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCE
THE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCETHE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCE
THE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCE
 
Automatic Bayesian method for Numerical Integration
Automatic Bayesian method for Numerical Integration Automatic Bayesian method for Numerical Integration
Automatic Bayesian method for Numerical Integration
 
Probability Cheatsheet.pdf
Probability Cheatsheet.pdfProbability Cheatsheet.pdf
Probability Cheatsheet.pdf
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13
 
cswiercz-general-presentation
cswiercz-general-presentationcswiercz-general-presentation
cswiercz-general-presentation
 
Slides: The Centroids of Symmetrized Bregman Divergences
Slides: The Centroids of Symmetrized Bregman DivergencesSlides: The Centroids of Symmetrized Bregman Divergences
Slides: The Centroids of Symmetrized Bregman Divergences
 
prior selection for mixture estimation
prior selection for mixture estimationprior selection for mixture estimation
prior selection for mixture estimation
 
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?
 
Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheet
 

Dernier

Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSSLeenakshiTyagi
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyDrAnita Sharma
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 

Dernier (20)

Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 

Divergence-based clustering and applications of total Jensen divergences

  • 1. Divergence-based center clustering and their applications Frank Nielsen ´Ecole Polytechnique Sony Computer Science Laboratories, Inc ICMS International Center for Mathematical Sciences Edinburgh, Sep. 21-25, 2015 Computational information geometry for image and signal processing c 2015 Frank Nielsen 1
  • 2. Center-based clustering [12]: Setting up the context Countless applications of clustering: quantization (coding), finding categories (unsupervised-clustering), technique for speeding-up computations (e.g., distances), and so on. Minimize objective/energy/loss function: E(X = {x1, ..., xk}; C = {c1, ..., ck}) = min C n i=1 min j∈[k] D(xi : cj ) Initialize k cluster centers (seeds): random (Forgy), global k-means (discrete k-means), randomized k-means++ (expected guarantee ˜O(log k)) Famous heuristics: Lloyd’s batched allocation (assignment/center relocation), Hartigan’s single point reassignment. Guarantees monotone convergence variational k-means: When centroids arg min n i=1 D(xi : c) not in closed form, center relocation just need to be better (not best) to still guarantee monotone convergence c 2015 Frank Nielsen 2
  • 3. The trick of mixed divergences [13, 12]: Dual centroids per cluster c 2015 Frank Nielsen 3
  • 4. Mixed divergences [12] Defined on three parameters p, q and r: Mλ(p : q : r) eq = λD(p : q) + (1 − λ)D(q : r) for λ ∈ [0, 1]. Mixed divergences include: the sided divergences for λ ∈ {0, 1}, the symmetrized (arithmetic mean) divergence for λ = 1 2, or skew symmetrized for λ ∈ (0, 1), λ = 1 2. c 2015 Frank Nielsen 4
  • 5. Symmetrizing α-divergences Sα(p, q) = 1 2 (Dα(p : q) + Dα(q : p)) = S−α(p, q), = M1 2 (p : q : p), For α = ±1, we get half of Jeffreys divergence: S±1(p, q) = 1 2 d i=1 (pi − qi ) log pi qi same formula for probability/positive measures. Centroids for symmetrized α-divergence usually not in closed form. How to perform center-based clustering without closed form centroids? c 2015 Frank Nielsen 5
  • 6. Closed-form formula for Jeffreys positive centroid [7] Jeffreys divergence is symmetrized α = ±1 divergences. The Jeffreys positive centroid c = (c1, ..., cd ) of a set {h1, ..., hn} of n weighted positive histograms with d bins can be calculated component-wise exactly using the Lambert W analytic function: ci = ai W ai gi e where ai = n j=1 πj hi j denotes the coordinate-wise arithmetic weighted means and gi = n j=1(hi j )πj the coordinate-wise geometric weighted means. The Lambert analytic function W (positive branch) is defined by W (x)eW (x) = x for x ≥ 0. → Jeffreys k-means clustering . But for α = 1, how to cluster? c 2015 Frank Nielsen 6
  • 7. Mixed α-divergences/α-Jeffreys symmetrized divergence Mixed α-divergence between a histogram x to two histograms p and q: Mλ,α(p : x : q) = λDα(p : x) + (1 − λ)Dα(x : q), = λD−α(x : p) + (1 − λ)D−α(q : x), = M1−λ,−α(q : x : p), α-Jeffreys symmetrized divergence is obtained for λ = 1 2: Sα(p, q) = M1 2 ,α(q : p : q) = M1 2 ,α(p : q : p) skew symmetrized α-divergence is defined by: Sλ,α(p : q) = λDα(p : q) + (1 − λ)Dα(q : p) c 2015 Frank Nielsen 7
  • 8. Mixed divergence-based k-means clustering Initially, k distinct seeds from the dataset with li = ri . Input: Weighted histogram set H, divergence D(·, ·), integer k > 0, real λ ∈ [0, 1]; Initialize left-sided/right-sided seeds C = {(li , ri )}k i=1; repeat // Assignment (as usual) for i = 1, 2, ..., k do Ci ← {h ∈ H : i = arg minj Mλ(lj : h : rj )}; end // Dual-sided centroid relocation (the trick!) for i = 1, 2, ..., k do ri ← arg minx D(Ci : x) = h∈Ci wj D(h : x); li ← arg minx D(x : Ci ) = h∈Ci wj D(x : h); end until convergence; c 2015 Frank Nielsen 8
  • 9. Mixed α-hard clustering: MAhC(H, k, λ, α) Input: Weighted histogram set H, integer k > 0, real λ ∈ [0, 1], real α ∈ R; Let C = {(li , ri )}k i=1 ← MAS(H, k, λ, α); repeat // Assignment for i = 1, 2, ..., k do Ai ← {h ∈ H : i = arg minj Mλ,α(lj : h : rj )}; end // Centroid relocation for i = 1, 2, ..., k do ri ← h∈Ai wi h 1−α 2 2 1−α ; li ← h∈Ai wi h 1+α 2 2 1+α ; end until convergence; c 2015 Frank Nielsen 9
  • 10. Coupled k-Means++ α-Seeding (extending k-means++) Algorithm 1: Mixed α-seeding; MAS(H, k, λ, α) Input: Weighted histogram set H, integer k ≥ 1, real λ ∈ [0, 1], real α ∈ R; Let C ← hj with uniform probability ; for i = 2, 3, ..., k do Pick at random histogram h ∈ H with probability: πH(h) eq = whMλ,α(ch : h : ch) y∈H wy Mλ,α(cy : y : cy ) , (1) // where (ch, ch) eq = arg min(z,z)∈C Mλ,α(z : h : z); C ← C ∪ {(h, h)}; end Output: Set of initial cluster centers C; → Guaranteed probabilistic bound. Just need to initialize! No centroid computations as iterations not theoretically required c 2015 Frank Nielsen 10
  • 11. Learning statistical mixtures with hard EM k-GMLE [6]: fast, guaranteed, low memory footprint c 2015 Frank Nielsen 11
  • 12. Learning MMs: A geometric hard clustering viewpoint Learn the parameters of a mixture m(x) = k i=1 wi p(x|θi ) Maximize the complete data likelihood=clustering objective function max W ,Λ lc(W , Λ) = n i=1 k j=1 zi,j log(wj p(xi |θj )) = max Λ n i=1 max j∈[k] log(wj p(xi |θj )) ≡ min W ,Λ n i=1 min j∈[k] Dj (xi ) , where cj = (wj , θj ) (cluster prototype) and Dj (xi ) = − log p(xi |θj ) − log wj are potential distance-like functions. ⇒ further attach to each cluster (mixture component) a different family of probability distributions. c 2015 Frank Nielsen 12
  • 13. Generalized k-MLE: learning statistical EF mixtures [?, 16, 15, 1, 8] Model-based clustering: Assignment of points to clusters: Dwj ,θj ,Fj (x) = − log pFj (x; θj ) − log wj k-GMLE : 1. Initialize weight W ∈ ∆k and family type (F1, ..., Fk) for each cluster 2. Solve minΛ i minj Dj (xi ) (center-based clustering for W fixed) with potential functions: Dj (xi ) = − log pFj (xi |θj ) − log wj 3. Solve family types maximizing the MLE in each cluster Cj by choosing the parametric family of distributions Fj = F(γj ) that yields the best likelihood: minF1=F(γ1),...,Fk =F(γk )∈F(γ) i minj Dwj ,θj ,Fj (xi ). ∀l, γl = maxj F∗ j (ˆηl = 1 nl x∈Cl tj (x)) + 1 nl x∈Cl k(x). 4. Update weight W as the cluster point proportion 5. Test for convergence and go to step 2) otherwise. Drawback = biased, non-consistent estimator due to Voronoic 2015 Frank Nielsen 13
  • 14. Conformal divergences and clustering. (by analogy to Riemannian tensor metric) c 2015 Frank Nielsen 14
  • 15. Geometrically designed divergences Plot of the convex generator F: Bregman [10], Jensen (Burbea-Rao [9]), total Bregman [5]. q p p+q 2 B(p : q) J(p, q) tB(p : q) F : (x, F(x)) (p, F(p)) (q, F(q)) c 2015 Frank Nielsen 15
  • 16. Divergences: Distortion measures F a smooth convex function, the generator. Skew Jensen divergences: Jα(p : q) = αF(p) + (1 − α)F(q) − F(αp + (1 − α)q), = (F(p)F(q))α − F((pq)α), where (pq)γ = γp + (1 − γ)q = q + γ(p − q) and (F(p)F(q))γ = γF(p)+(1−γ)F(q) = F(q)+γ(F(p)−F(q)). Bregman divergences = limit cases of skew Jensen B(p : q) = F(p) − F(q) − p − q, F(q) , lim α→0 Jα(p : q) = B(p : q), lim α→1 Jα(p : q) = B(q : p). Statistical Bhattacharrya divergence = Jensen for exponential families [9] Bhat(p1 : p2) = − log p1(x)α p2(x)1−α dν(x) = Jα(θ1 : θ2) c 2015 Frank Nielsen 16
  • 17. Total Bregman divergences Conformal divergence, conformal factor ρ: D (p : q) = ρ(p, q)D(p : q) plays the rˆole of “regularizer” [17] and ensures robustness Invariance by rotation of the axes of the design space tB(p : q) = B(p : q) 1 + F(q), F(q) = ρB(q)B(p : q), ρB(q) = 1 1 + F(q), F(q) . Total squared Euclidean divergence: tE(p, q) = 1 2 p − q, p − q 1 + q, q . c 2015 Frank Nielsen 17
  • 18. Total Jensen divergence: Illustration of the principle p q(pq)α F(p) F(q) (F(p)F(q))α (F(p)F(q))β Jα(p : q) F((pq)α) tJα(p : q) F(p ) F(q ) (F(p )F(q ))α (F(p )F(q ))β Jα(p : q ) F((p q )α) tJα(p : q ) p (p q )α qO O c 2015 Frank Nielsen 18
  • 19. Total Jensen divergences tB(p : q) = ρB(q)B(p : q), ρB(q) = 1 1 + F(q), F(q) tJα(p : q) = ρJ(p, q)Jα(p : q), ρJ(p, q) = 1 1 + (F(p)−F(q))2 p−q,p−q Jensen-Shannon divergence, square root is a metric [3]: JS(p, q) = 1 2 d i=1 pi log 2pi pi + qi + 1 2 d i=1 qi log 2qi pi + qi Lemma The square root of the total Jensen-Shannon divergence is not a metric. c 2015 Frank Nielsen 19
  • 20. Total Jensen divergences/Total Bregman divergences Total Jensen is not a generalization of total Bregman. limit cases α ∈ {0, 1}, we have: lim α→0 tJα(p : q) = ρJ(p, q)B(p : q) = ρB(q)B(p : q), lim α→1 tJα(p : q) = ρJ(p, q)B(q : p) = ρB(p)B(q : p), since conformal factors ρJ(p, q) = ρB(q). c 2015 Frank Nielsen 20
  • 21. Conformal factor from mean value theorem When p q, ρJ(p, q) ρB(q), and the total Jensen divergence tends to the total Bregman divergence for any value of α. ρJ(p, q) = 1 1 + F( ), F( ) = ρB( ), for ∈ [p, q]. For univariate generators, explicitly the value of : = F−1 ∆F ∆ = F∗ ∆F ∆ , where F∗ is the Legendre convex conjugate [9]. c 2015 Frank Nielsen 21
  • 22. Centroids and statistical robustness Centroids (barycenters) are minimizers of average (weighted) divergences: L(x; w) = n i=1 wi × tJα(pi : x), cα = arg min x∈X L(x; w), Is it unique? Is it robust to outliers [4]? Iterative convex-concave procedure (CCCP) [9] c 2015 Frank Nielsen 22
• 23. Clustering: No closed-form centroid, no cry!
k-means++ [2] picks seeds at random: no centroid calculation needed.
Algorithm 2: Total Jensen k-means++ seeding
Input: Number of clusters k ≥ 1;
Let C ← {h_j} chosen with uniform probability;
for i = 2, 3, ..., k do
    Pick at random h ∈ H with probability
    π_H(h) = tJ_α(c_h : h) / Σ_{y∈H} tJ_α(c_y : y), where c_h = arg min_{z∈C} tJ_α(z : h);
    C ← C ∪ {h};
end
Output: Set of initial cluster centers C;
c 2015 Frank Nielsen 23
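A sketch of the seeding above for an arbitrary divergence, e.g. the total_jensen helper defined earlier passed via a closure; the function divergence_kmeanspp_seeding and its arguments are illustrative.

```python
import numpy as np

def divergence_kmeanspp_seeding(H, k, div, rng=None):
    """k-means++-style seeding with an arbitrary divergence div(center, point) (sketch)."""
    rng = rng if rng is not None else np.random.default_rng()
    n = len(H)
    centers = [H[rng.integers(n)]]                       # first seed picked uniformly
    for _ in range(1, k):
        # divergence of each point to its closest current center
        d = np.array([min(div(c, h) for c in centers) for h in H])
        centers.append(H[rng.choice(n, p=d / d.sum())])  # pi_H(h) proportional to tJ_alpha(c_h : h)
    return centers

# e.g. divergence_kmeanspp_seeding(data, 3, lambda c, h: total_jensen(neg_entropy, c, h))
```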
• 24. Total Jensen divergences: Recap
Total Jensen divergence = conformal divergence with a non-separable, double-sided conformal factor.
Invariant to axis rotation of the "design space".
Equivalent to total Bregman divergences [17, 5] only when p ≃ q.
The square root of the total Jensen-Shannon divergence is not a metric, although the square root of the ordinary Jensen-Shannon divergence is a metric.
Total Jensen k-means++ does not require centroid computations and comes with a guaranteed approximation factor.
Interest of conformal divergences in SVMs [18] (double-sided separable factor) and in information geometry [14] (flattening).
c 2015 Frank Nielsen 24
  • 25. Novel heuristics for NP-hard center-based clustering: merge-and-split and (k, l)-means [11] c 2015 Frank Nielsen 25
• 26. The k-means merge-and-split heuristic
Generalizes Hartigan's single-point relocation heuristic...
Consider pairs of clusters (C_i, C_j) with centers c_i and c_j, merge them, and split them again into two clusters with new centers c'_i and c'_j.
Accept the move when the sum of the two cluster variances decreases:
∆(C_i, C_j) = V(C'_i, c'_i) + V(C'_j, c'_j) − (V(C_i, c_i) + V(C_j, c_j)) < 0
How to split the two merged clusters again (the best split is NP-hard)?
a discrete 2-means: choose among the n_{i,j} = n_i + n_j points of C_{i,j} the two best centers (naively implemented in O(n³)). This yields a 2-approximation of 2-means.
a 2-means++ heuristic: pick c'_i at random, then pick c'_j randomly according to the normalized distribution of the squared distances of the points of C_{i,j} to c'_i (see k-means++). Repeat this initialization a given number α of rounds (say, α = 1 + 0.01 n_{i,j}²) and keep the best one.
c 2015 Frank Nielsen 26
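A rough sketch of one merge-and-split pass for Euclidean k-means, with clusters stored as a label vector; the split uses a single round of 2-means++ seeding followed by a few Lloyd steps instead of the exact discrete 2-means, and all names are illustrative.

```python
import numpy as np

def cluster_cost(points, center):
    return ((points - center) ** 2).sum()             # V(C, c): within-cluster sum of squares

def two_means(points, rng, n_iters=10):
    c0 = points[rng.integers(len(points))]            # 2-means++ seeding
    d2 = ((points - c0) ** 2).sum(axis=1)
    c1 = points[rng.choice(len(points), p=d2 / d2.sum())]
    centers = np.stack([c0, c1]).astype(float)
    for _ in range(n_iters):                          # a few Lloyd refinements
        lab = ((points[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for a in range(2):
            if np.any(lab == a):
                centers[a] = points[lab == a].mean(axis=0)
    return centers, lab

def merge_and_split_pass(X, labels, k, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    for i in range(k):
        for j in range(i + 1, k):
            if not np.any(labels == i) or not np.any(labels == j):
                continue
            mask = (labels == i) | (labels == j)
            pts = X[mask]
            old = (cluster_cost(X[labels == i], X[labels == i].mean(0))
                   + cluster_cost(X[labels == j], X[labels == j].mean(0)))
            centers, lab = two_means(pts, rng)
            new = sum(cluster_cost(pts[lab == a], centers[a]) for a in range(2))
            if new < old:                             # accept only improving merge-and-split moves
                labels[mask] = np.where(lab == 0, i, j)
    return labels
```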
• 27. The k-means merge-and-split heuristic
#ops = number of pivot operations

Data set                   | Hartigan          | Discrete Hartigan  | Merge&Split
                           | cost      #ops    | cost      #ops     | cost      #ops
Iris (d=4, n=150, k=3)     | 112.35    35.11   | 101.69    33.54    | 83.95     31.36
Wine (d=13, n=178, k=3)    | 607303    97.88   | 593319    100.02   | 570283    100.47
Yeast (d=8, n=1484, k=10)  | 47.10     1364.0  | 57.34     807.83   | 50.20     190.58

Data set                   | Hartigan++        | Discrete Hartigan++ | Merge&Split++
                           | cost      #ops    | cost      #ops      | cost      #ops
Iris (d=4, n=150, k=3)     | 101.49    19.40   | 90.48     18.93     | 88.56     8.84
Wine (d=13, n=178, k=3)    | 3152616   18.76   | 2525803   24.61     | 2498107   9.67
Yeast (d=8, n=1484, k=10)  | 47.41     1192.38 | 54.96     640.89    | 51.82     66.30

c 2015 Frank Nielsen 27
• 28. The (k, l)-means heuristic: navigating the local minima!
Associate each p_i to its l nearest cluster centers NN_l(p_i; K) (with iNN_l = the cluster center indexes), and minimize the (k, l)-means objective function (with 1 ≤ l ≤ k):
e(P, K; l) = Σ_{i=1}^{n} Σ_{a ∈ iNN_l(p_i; K)} ||p_i − c_a||².
Assignment/relocation guarantees a monotone decrease.
Higher l explores different local optima of the optimization landscape; then convert back to k-means:
(k, l)↓-means: convert a (k, l)-means by assigning to each point p_i its closest center (among the l assigned at the end of the (k, l)-means), then compute the centroids and launch a regular Lloyd's k-means to finalize.
cascading (k, l)-means: after convergence of the (k, l)-means, initialize a (k, l − 1)-means by dropping for each point p_i its farthest cluster and perform a Lloyd's (k, l − 1)-means, etc., until we get a (k, 1)-means = k-means.
c 2015 Frank Nielsen 28
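A minimal sketch of the (k, l)-means assignment/relocation loop for squared Euclidean distances (no conversion back to k-means); the function kl_means and its parameters are illustrative.

```python
import numpy as np

def kl_means(X, k, l, n_iters=50, seed=0):
    """(k, l)-means sketch: each point is assigned to its l nearest centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(n_iters):
        d2 = ((X[:, None] - centers[None]) ** 2).sum(-1)    # (n, k) squared distances
        nn = np.argsort(d2, axis=1)[:, :l]                  # indexes of the l nearest centers
        for a in range(k):                                  # relocation: mean of all points
            members = np.any(nn == a, axis=1)               # having center a among their l nearest
            if np.any(members):
                centers[a] = X[members].mean(axis=0)
    d2 = ((X[:, None] - centers[None]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, :l]
    cost = np.take_along_axis(d2, nn, axis=1).sum()         # e(P, K; l)
    return centers, nn, cost
```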
• 29. The (k, l)-means heuristic: 10000 trials
Data set: Iris. Results compare regular k-means with the (k, l)↓-means and cascading (k, l)-means conversions defined on the previous slide.

k   | win (%) | k-means          | (k, 2)↓-means
    |         | min      avg     | min      avg
3   | 20.8    | 78.94    92.39   | 78.94    78.94
4   | 24.29   | 57.31    63.15   | 57.31    70.33
5   | 57.76   | 46.53    52.88   | 49.74    51.10
6   | 80.55   | 38.93    45.60   | 38.93    41.63
7   | 76.67   | 34.18    40.00   | 34.29    36.85
8   | 80.36   | 29.87    36.05   | 29.87    32.52
9   | 78.85   | 27.76    32.91   | 27.91    30.15
10  | 79.88   | 25.81    30.24   | 25.97    28.02

k   l | win (%) | k-means          | (k, l)-means
      |         | min      avg     | min      avg
5   2 | 58.3    | 46.53    52.72   | 49.74    51.24
5   4 | 62.4    | 46.53    52.55   | 49.74    49.74
8   2 | 80.8    | 29.87    36.40   | 29.87    32.54
8   3 | 61.1    | 29.87    36.19   | 32.76    34.04
8   6 | 55.5    | 29.88    36.189  | 32.75    35.26
10  2 | 78.8    | 25.81    30.61   | 25.97    28.23
10  3 | 82.5    | 25.95    30.23   | 26.47    27.76
10  5 | 64.7    | 25.90    30.32   | 26.99    28.61

On average a better cost, but the best local minima are found by the normal k-means...
c 2015 Frank Nielsen 29
  • 30. Thank you! c 2015 Frank Nielsen 30
• 31. Bibliography I
[1] Christophe Saint-Jean and Frank Nielsen. Hartigan's method for k-MLE: Mixture modeling with Wishart distributions and its application to motion retrieval. In Frank Nielsen, editor, Geometric Theory of Information, Signals and Communication Technology, pages 301–330. Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-05317-2_11
[2] David Arthur and Sergei Vassilvitskii. k-means++: The advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1027–1035. Society for Industrial and Applied Mathematics, 2007.
[3] Bent Fuglede and Flemming Topsoe. Jensen-Shannon divergence and Hilbert space embedding. In IEEE International Symposium on Information Theory, page 31, 2004.
[4] F. R. Hampel, P. J. Rousseeuw, E. Ronchetti, and W. A. Stahel. Robust Statistics: The Approach Based on Influence Functions. Wiley Series in Probability and Mathematical Statistics, 1986.
[5] Meizhu Liu, Baba C. Vemuri, Shun-ichi Amari, and Frank Nielsen. Shape retrieval using hierarchical total Bregman soft clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(12):2407–2419, 2012.
[6] Frank Nielsen. k-MLE: A fast algorithm for learning statistical mixture models. CoRR, abs/1203.5181, 2012. Preliminary version in ICASSP.
[7] Frank Nielsen. Jeffreys centroids: A closed-form expression for positive histograms and a guaranteed tight approximation for frequency histograms. IEEE Signal Processing Letters, 20(7):657–660, 2013.
c 2015 Frank Nielsen 31
• 32. Bibliography II
[8] Frank Nielsen. On learning statistical mixtures maximizing the complete likelihood. Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2014), 1641:238–245, 2014.
[9] Frank Nielsen and Sylvain Boltz. The Burbea-Rao and Bhattacharyya centroids. IEEE Transactions on Information Theory, 57(8):5455–5466, August 2011.
[10] Frank Nielsen and Richard Nock. Sided and symmetrized Bregman centroids. IEEE Transactions on Information Theory, 55(6):2882–2904, 2009.
[11] Frank Nielsen and Richard Nock. Further heuristics for k-means: The merge-and-split heuristic and the (k, l)-means. arXiv preprint arXiv:1406.6314, 2014.
[12] Frank Nielsen, Richard Nock, and Shun-ichi Amari. On clustering histograms with k-means by using mixed α-divergences. Entropy, 16(6):3273–3301, 2014.
[13] Richard Nock, Panu Luosto, and Jyrki Kivinen. Mixed Bregman clustering with approximation guarantees. In Machine Learning and Knowledge Discovery in Databases, pages 154–169. Springer, 2008.
[14] Atsumi Ohara, Hiroshi Matsuzoe, and Shun-ichi Amari. A dually flat structure on the space of escort distributions. Journal of Physics: Conference Series, 201(1):012012, 2010.
c 2015 Frank Nielsen 32
• 33. Bibliography III
[15] Olivier Schwander and Frank Nielsen. Fast learning of Gamma mixture models with k-MLE. In Similarity-Based Pattern Recognition, pages 235–249. Springer, 2013.
[16] Olivier Schwander, Aurelien J. Schutz, Frank Nielsen, and Yannick Berthoumieu. k-MLE for mixtures of generalized Gaussians. In Pattern Recognition (ICPR), 2012 21st International Conference on, pages 2825–2828. IEEE, 2012.
[17] Baba Vemuri, Meizhu Liu, Shun-ichi Amari, and Frank Nielsen. Total Bregman divergence and its applications to DTI analysis. IEEE Transactions on Medical Imaging, pages 475–483, 2011.
[18] Si Wu and Shun-ichi Amari. Conformal transformation of kernel functions: A data-dependent way to improve support vector machine classifiers. Neural Processing Letters, 15(1):59–67, 2002.
c 2015 Frank Nielsen 33