Non-exhaustive,
overlapping K-means
clustering
David F. Gleich
Purdue University
Real-world graph and point data
have overlapping clusters. 
[Figure: gene data; the axis labels are gene identifiers]
Social networks have overlapping
clusters because of social circles 
Genes have overlapping clusters
due to their role in multiple functions
SILO Seminar
David Gleich · Purdue
Overlapping research projects
are what got me here too!
PhD thesis on Google’s PageRank
MSR intern and Overlapping Clusters for Distributed Computation
Accelerated NCP plots and locally minimal communities
Neighborhood inflated seed expansion for overlapping communities
Non-exhaustive, overlapping K-means
1. NISE Clustering - Whang, Gleich, Dhillon, CIKM 2013
2. NEO-K-means - Whang, Gleich, Dhillon, SDM 2015
3. NEO-K-means SDP - Hou, Whang, Gleich, Dhillon, KDD 2015
4. Multiplier Methods for Overlapping K-Means - Hou, Whang, Gleich, Dhillon, Submitted
Overlapping communities via
seed set expansion works nicely. 
Phases: filtering, seeding, seed set expansion, propagation.
(Slide from Joyce Jiyoung Whang, The University of Texas at Austin, Conference on Information and Knowledge Management.)
[Figure 2: maximum conductance vs. coverage (percentage) on (a) AstroPh and (d) Flickr for the seeding strategies egonet, graclus centers, spread hubs, random, and bigclam. The caption notes that “graclus centers” outperforms other seeding strategies.]
We can cover 95% of network with
communities of cond. ~0.15.
Flickr social network
2M vertices, 22M edges
cond(S) = cut(S) / “size”(S)
We wanted a more principled
approach to achieve these results.
The state of the art for clustering
K-Means: Problem 1 😀, Problem 2 😊, Problem 3 😟, Problem 4 😢

The state of the art for clustering
K-Means: Problem 1 😀, Problem 2 😊
NEO-K-Means: Problem 3 😊, Problem 4 😊
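[Figure: a point x_i and its distances ||x_i − m_1|| and ||x_i − m_2|| to the centroids m_1 and m_2]
K-means as optimization.
Input: points x_1, ..., x_n.
Find an assignment matrix U that gives cluster assignments to minimize the objective. For example, with four points x_1, ..., x_4 (rows) and two clusters c_1, c_2 (columns):

$$U = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}$$

K-means objective:

$$\min_{U} \; \sum_{ij} U_{ij} \, \| x_i - m_j \|^2 \quad \text{subject to } U \text{ is an assignment to clusters}, \quad m_j = \frac{1}{\sum_i U_{ij}} \sum_i U_{ij} x_i$$

K-means objective with overlap?

$$\min_{U} \; \sum_{ij} U_{ij} \, \| x_i - m_j \|^2 \quad \text{subject to } U \text{ is a multi-assignment to clusters}, \quad m_j = \frac{1}{\sum_i U_{ij}} \sum_i U_{ij} x_i$$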
Overlap is not a natural addition
to optimization based clustering. 
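As a concrete reading of the objective, the following minimal Python sketch evaluates the sum over i, j of U_ij ||x_i − m_j||² for a 0/1 matrix U. The function name `kmeans_objective` and the sample points are illustrative assumptions, not taken from the talk; only the matrix U is the four-point example from the slide.

```python
def kmeans_objective(points, U):
    """Sum of squared distances from each point to the centroid of every
    cluster it is assigned to. U[i][j] is 1 if point i is in cluster j."""
    n, k, dim = len(points), len(U[0]), len(points[0])
    total = 0.0
    for j in range(k):
        members = [points[i] for i in range(n) if U[i][j]]
        if not members:
            continue  # an empty cluster contributes nothing
        # Centroid m_j = (1 / sum_i U_ij) * sum_i U_ij x_i
        m = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(sum((p[d] - m[d]) ** 2 for d in range(dim)) for p in members)
    return total


# Hypothetical 2-D points; U is the assignment matrix shown on the slide.
points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
U = [[1, 0], [1, 0], [0, 1], [0, 1]]
objective = kmeans_objective(points, U)
```

The same function accepts a multi-assignment U (rows with more than one 1, or all zeros), which is why the unconstrained overlapping version is easy to write down but hard to make meaningful.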
The NEO-K-means objective
balances overlap and outliers.
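$$\begin{aligned}
\text{minimize} \quad & \sum_{ij} U_{ij} \, \| x_i - m_j \|^2 \\
\text{subject to} \quad & U_{ij} \text{ is binary} \\
& \operatorname{trace}(U^T U) = (1 + \alpha) n \quad (\alpha n \text{ overlap}) \\
& e^T \operatorname{Ind}[U e] \ge (1 - \beta) n \quad (\text{up to } \beta n \text{ outliers}) \\
& m_j = \frac{1}{\sum_i U_{ij}} \sum_i U_{ij} x_i
\end{aligned}$$

· If α, β = 0, then we get back to K-means.
· Automatically choose α, β based on K-means. 😊
1. Make (1 + α)n total assignments.
2. Allow up to βn outliers.
[Figure: example point set; legend: Cluster 1, Cluster 2, Cluster 1 & 2, Not assigned]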
Lloyd’s algorithm for NEO-K-means
is just a wee-bit more complex.
Until done:
1. Update centroids.
2. Assign (1 − β)n nodes to closest centroid.
3. Make (α + β)n assignments based on minimizing distance.
[Figure: example output; legend: Cluster 1, Cluster 2, Cluster 1 & 2, Not assigned]
This algorithm correctly assigns our example case and even determines overlap and outlier parameters!
THEOREM: Lloyd’s algorithm decreases the objective monotonically.
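The three steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `neo_kmeans` and `sqdist`, the use of `round` to turn αn and βn into integers, the tie-breaking, and the test points are all hypothetical choices.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def neo_kmeans(points, centroids, alpha, beta, max_iter=100):
    """Lloyd-style iteration for the NEO-K-means objective (sketch)."""
    n, k = len(points), len(centroids)
    centroids = [list(m) for m in centroids]
    n_outliers = round(beta * n)
    n_first = n - n_outliers                  # (1 - beta) n points get a cluster
    n_extra = round(alpha * n) + n_outliers   # remaining (alpha + beta) n assignments
    U = None
    for _ in range(max_iter):
        dist = [[sqdist(p, m) for m in centroids] for p in points]
        closest = [min(range(k), key=lambda j: dist[i][j]) for i in range(n)]
        # Step 2: the (1 - beta) n points nearest to a centroid join that cluster.
        order = sorted(range(n), key=lambda i: dist[i][closest[i]])
        new_U = [[0] * k for _ in range(n)]
        for i in order[:n_first]:
            new_U[i][closest[i]] = 1
        # Step 3: fill the remaining assignments with the smallest unused distances.
        rest = sorted((dist[i][j], i, j)
                      for i in range(n) for j in range(k) if not new_U[i][j])
        for _, i, j in rest[:n_extra]:
            new_U[i][j] = 1
        if new_U == U:
            break  # assignments stopped changing
        U = new_U
        # Step 1: update each centroid to the mean of its members.
        for j in range(k):
            members = [points[i] for i in range(n) if U[i][j]]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return U, centroids


# Hypothetical data: two groups, two points between them, one far outlier.
points = [(0, 0), (1, 0), (0, 1), (1, 1),
          (10, 0), (11, 0), (10, 1),
          (5.5, 0), (5.5, 1),
          (5.5, 60)]
U, centroids = neo_kmeans(points, [(0.5, 0.5), (10.5, 0.5)], alpha=0.1, beta=0.1)
```

Note that with αn = 1 and βn = 1 the method makes (1 + α)n = 11 assignments over the 9 non-outlier points, so two points end up in both clusters.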
The non-exhaustiveness is
necessary for assignments.
[Figure: synthetic data (n=1,000, α=0.1, β=0.005) clustered by (b) the first extension of k-means and (c) NEO-K-Means; legend: Cluster 1, Cluster 2, Cluster 1 & 2, Not assigned. Green points indicate overlap.]
Left: output without assignment constraint (β = 1).
Right: NEO-K-means output (correct).
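The Weighted, Kernel NEO-K-Means objective.
•  Introduce weights for each data point.
•  Introduce feature maps for each data point too.

$$\begin{aligned}
\text{minimize} \quad & \sum_{ij} U_{ij} \, w_i \, \| \phi(x_i) - m_j \|^2 \\
\text{subject to} \quad & U_{ij} \text{ is binary} \\
& \operatorname{trace}(U^T U) = (1 + \alpha) n \quad (\alpha n \text{ overlap}) \\
& e^T \operatorname{Ind}[U e] \ge (1 - \beta) n \quad (\text{up to } \beta n \text{ outliers}) \\
& m_j = \frac{1}{\sum_i U_{ij} w_i} \sum_i w_i U_{ij} \, \phi(x_i)
\end{aligned}$$

With u_j the j-th column of U, W = diag(w), and K the kernel matrix:

$$\sum_{ij} U_{ij} \, w_i \, \| \phi(x_i) - m_j \|^2 = \sum_{j} \left( \sum_i U_{ij} w_i K_{ii} - \frac{u_j^T W K W u_j}{u_j^T W u_j} \right)$$

Theorem: If $K = \sigma D^{-1} + D^{-1} A D^{-1}$, then the NEO-K-Means objective is equivalent to overlapping conductance.
NOTE: This means that NEO-K-Means was the principled objective we were after!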
Conductance communities
Conductance is one of the most
important community scores [Schaeffer07]
The conductance of a set of vertices is the ratio of edges leaving to total edges:

$$\phi(S) = \frac{\operatorname{cut}(S)}{\min\left(\operatorname{vol}(S), \operatorname{vol}(\bar{S})\right)} \qquad \frac{\text{(edges leaving the set)}}{\text{(total edges in the set)}}$$

Equivalently, it’s the probability that a random edge leaves the set.
Small conductance ⇔ Good community
Example: cut(S) = 7, vol(S) = 33, vol(S̄) = 11, so φ(S) = 7/11.
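To make the definition concrete, here is a small Python sketch that computes conductance from an undirected edge list. The function name `conductance` and the example graph (two triangles joined by one edge) are illustrative assumptions; the graph on the slide itself is not recoverable from the transcript.

```python
def conductance(edges, S):
    """Conductance of vertex set S in an undirected graph given as (u, v) pairs."""
    S = set(S)
    # cut(S): edges with exactly one endpoint in S
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    # vol(S): sum of degrees of the vertices in S
    vol_S = sum((u in S) + (v in S) for u, v in edges)
    vol_rest = 2 * len(edges) - vol_S  # vol(S) + vol(complement) = 2 |E|
    return cut / min(vol_S, vol_rest)


# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
phi = conductance(edges, {0, 1, 2})  # cut = 1, vol = 7 on both sides
```

The slide's numbers follow the same formula: 7 / min(33, 11) = 7/11.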
Our theorem means that NEO-K-Means
can optimize the sum-conductance obj.
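Conductance vs. the normalized cut bi-partition:

$$\phi(S) \le \frac{\operatorname{cut}(S)}{\operatorname{vol}(S)} + \frac{\operatorname{cut}(\bar{S})}{\operatorname{vol}(\bar{S})}$$

NEO-K-Means objective:

$$\sum_{S \in C} \frac{\operatorname{cut}(S)}{\operatorname{vol}(S)} = \sum_{S \in C} \phi(S) \quad \text{if } \operatorname{vol}(S) \le \operatorname{vol}(\bar{S})$$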
When we use this
method to partition the
Karate club network, we
get reasonable solutions.
•  Inspired by Dhillon et al.’s
work on Graclus 
•  We have a multilevel
method to optimize the
graph case.
We get state of the art clustering
perf. on vector and graph datasets. 
F1 scores on vector datasets from the Mulan repository.

          moc     fuzzy   esp     isp     okm     rokm    NEO
synth1    0.833   0.959   0.977   0.985   0.989   0.969   0.996
synth2    0.836   0.957   0.952   0.973   0.967   0.975   0.996
synth3    0.547   0.919   0.968   0.952   0.970   0.928   0.996
yeast     -       0.308   0.289   0.203   0.311   0.203   0.366
music     0.534   0.533   0.527   0.508   0.527   0.454   0.550
scene     0.467   0.431   0.572   0.586   0.571   0.593   0.626

The Mulan testset has a number of appropriate datasets.

          n       dim.    avg. |C|   outliers   k
synth1    5,000   2       2,750      0          2
synth2    1,000   2       550        5          2
synth3    6,000   2       3,600      6          2
yeast     2,417   103     731.5      0          14
music     593     72      184.7      0          6
scene     2,407   294     430.8      0          6
NEO-K-Means with Lloyds is fast and
usually accurate but inconsistent. 
[Figure: a more complicated overlapping test case, and the output from NEO-K-Means with Lloyd’s method; legend: Cluster 1, Cluster 2, Cluster 1 & 2, Cluster 3, Not assigned]
Can we get a more robust method?

Yes! 
Towards better optimization
of the objective
1.  An SDP relaxation of the objective.
2.  A practical low-rank SDP heuristic.
3.  Faster optimization methods for the heuristic.
From assignments to co-occurrence matrices
There are three key variables in our formulation
1. The co-occurrence matrix $Z = \sum_j \dfrac{W u_j u_j^T W}{u_j^T W u_j}$
2. The overlap vector f
3. The assignment indicator g

Example:

$$U = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad f = \begin{bmatrix} 1 \\ 2 \\ 1 \\ 0 \end{bmatrix}, \qquad g = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \end{bmatrix}$$
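The three variables can be computed from an assignment matrix in a few lines. The sketch below is illustrative only: the function names and the choice of unit weights are assumptions; U is the example from the slide, with f = Ue and g marking the rows of U that are nonzero.

```python
def overlap_and_indicator(U):
    """f counts the clusters of each point (f = U e); g marks assigned points."""
    f = [sum(row) for row in U]
    g = [1 if count > 0 else 0 for count in f]
    return f, g


def cooccurrence(U, w):
    """Z = sum_j (W u_j)(W u_j)^T / (u_j^T W u_j), with W = diag(w)."""
    n, k = len(U), len(U[0])
    Z = [[0.0] * n for _ in range(n)]
    for j in range(k):
        wu = [w[i] * U[i][j] for i in range(n)]           # W u_j
        denom = sum(wu[i] * U[i][j] for i in range(n))    # u_j^T W u_j
        for a in range(n):
            for b in range(n):
                Z[a][b] += wu[a] * wu[b] / denom
    return Z


U = [[1, 0], [1, 1], [0, 1], [0, 0]]   # example from the slide
w = [1.0, 1.0, 1.0, 1.0]               # assumed unit weights
f, g = overlap_and_indicator(U)
Z = cooccurrence(U, w)
row_sums = [sum(row) for row in Z]                        # equals W f
weighted_trace = sum(Z[i][i] / w[i] for i in range(4))    # trace(W^-1 Z) = k
```

The last two quantities illustrate two properties that any Z built from an assignment matrix has: its row sums equal Wf, and trace(W⁻¹Z) equals the number of clusters k.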
We can convert our objective into a
trace minimization problem. 
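$$K_{ij} = \phi(x_i)^T \phi(x_j), \qquad d_i = w_i K_{ii}$$

$$\begin{aligned}
\sum_{ij} U_{ij} \, w_i \, \| \phi(x_i) - m_j \|^2
&= \sum_{j} \left( \sum_i U_{ij} w_i K_{ii} - \frac{u_j^T W K W u_j}{u_j^T W u_j} \right) \\
&= \sum_{ij} U_{ij} w_i K_{ii} - \sum_{j} \frac{u_j^T W K W u_j}{u_j^T W u_j} \\
&= f^T d - \operatorname{trace}(K Z)
\end{aligned}$$

Z = normalized co-occurrence
f = overlap count
g = assignment indicator
The objective function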
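A quick numerical check of this identity, written as a hedged Python sketch: it assumes a linear kernel (φ is the identity, so K_ij is the dot product of x_i and x_j) and uses made-up points and weights. The function names are hypothetical.

```python
def weighted_objective(X, w, U):
    """Direct evaluation of sum_ij U_ij w_i ||x_i - m_j||^2 with weighted centroids."""
    n, k, dim = len(X), len(U[0]), len(X[0])
    total = 0.0
    for j in range(k):
        idx = [i for i in range(n) if U[i][j]]
        wsum = sum(w[i] for i in idx)
        m = [sum(w[i] * X[i][d] for i in idx) / wsum for d in range(dim)]
        total += sum(w[i] * sum((X[i][d] - m[d]) ** 2 for d in range(dim)) for i in idx)
    return total


def trace_form(X, w, U):
    """Evaluation as f^T d - trace(K Z) for the linear kernel K = X X^T."""
    n, k = len(X), len(U[0])
    K = [[sum(a * b for a, b in zip(X[i], X[l])) for l in range(n)] for i in range(n)]
    f = [sum(row) for row in U]
    d = [w[i] * K[i][i] for i in range(n)]
    f_dot_d = sum(f[i] * d[i] for i in range(n))
    trace_KZ = 0.0
    for j in range(k):
        wu = [w[i] * U[i][j] for i in range(n)]           # W u_j
        numer = sum(wu[i] * K[i][l] * wu[l] for i in range(n) for l in range(n))
        denom = sum(wu[i] * U[i][j] for i in range(n))    # u_j^T W u_j
        trace_KZ += numer / denom
    return f_dot_d - trace_KZ


# Hypothetical data; U is the overlapping example used for Z, f, and g.
X = [(0.0, 0.0), (2.0, 0.0), (3.0, 1.0), (9.0, 9.0)]
w = [1.0, 2.0, 1.0, 3.0]
U = [[1, 0], [1, 1], [0, 1], [0, 0]]
direct = weighted_objective(X, w, U)
via_trace = trace_form(X, w, U)
```

Both evaluations should agree up to floating-point rounding, which is the content of the derivation.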
There is an SDP-like framework to
solve NEO-K-means.
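$$\begin{aligned}
\underset{Z, f, g}{\text{maximize}} \quad & \operatorname{trace}(K Z) - f^T d \\
\text{subject to} \quad & \operatorname{trace}(W^{-1} Z) = k, && (a) \\
& Z_{ij} \ge 0, && (b) \\
& Z \succeq 0, \; Z = Z^T, && (c) \\
& Z e = W f, && (d) \\
& e^T f = (1 + \alpha) n, && (e) \\
& e^T g \ge (1 - \beta) n, && (f) \\
& f \ge g, && (g) \\
& \operatorname{rank}(Z) = k, && (h) \\
& f \in \mathbb{Z}^n_{\ge 0}, \; g \in \{0, 1\}^n. && (i)
\end{aligned}$$

(a)-(c): Z must come from an assignment matrix.
(d)-(g): Overlap and assignment constraints.
(h)-(i): Combinatorial constraints.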
There is an SDP-relaxation to
approximate NEO-K-means.
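$$\begin{aligned}
\underset{Z, f, g}{\text{maximize}} \quad & \operatorname{trace}(K Z) - f^T d \\
\text{subject to} \quad & \operatorname{trace}(W^{-1} Z) = k, && (a) \\
& Z_{ij} \ge 0, && (b) \\
& Z \succeq 0, \; Z = Z^T, && (c) \\
& Z e = W f, && (d) \\
& e^T f = (1 + \alpha) n, && (e) \\
& e^T g \ge (1 - \beta) n, && (f) \\
& f \ge g, && (g) \\
& 0 \le g \le 1
\end{aligned}$$

(a)-(c): Z must come from an assignment matrix.
(d)-(g): Overlap and assignment constraints.
0 ≤ g ≤ 1: Relaxed constraints.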
This SDP can easily solve
simple problems.
[Figure: NEO-K-Means SDP solution]
Solution Z from CVX is even rank 2!
But SDP methods have a number of
issues for large-scale problems. 
1.  The number of variables is quadratic in the number of
data points
2.  The best solvers can only solve problems with a few
hundred or thousand points.
So like many before us (e.g. Burer & Monteiro; Kulis, Surendran, and Platt 2007; and more) we optimize a low-rank factorization of the solution.
Using the NEO-K-Means Low-Rank
SDP, we can find assignments directly. 
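[Figure: NEO-K-Means low-rank SDP solution Y Y^T, with ‖Z − Y Y^T‖ = 2.3 × 10^{-4}]

The Low-Rank NEO-K-Means SDP

$$\begin{aligned}
\underset{Y, f, g, s, r}{\text{maximize}} \quad & \operatorname{trace}(Y^T K Y) - f^T d \\
\text{subject to} \quad & k = \operatorname{trace}(Y^T W^{-1} Y) \\
& 0 = Y Y^T e - W f \\
& 0 = e^T f - (1 + \alpha) n \\
& 0 = f - g - s \\
& 0 = e^T g - (1 - \beta) n - r \\
& Y_{ij} \ge 0, \; s \ge 0, \; r \ge 0 \\
& 0 \le f \le k e, \; 0 \le g \le 1
\end{aligned}$$

We lose convexity but gain practicality.
We introduce slacks at this point.
Annotations: Y Y^T e is the icky non-convex term; the last two lines are simple bound constraints.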
We use an augmented Lagrangian
method to optimize this problem
SILO Seminar
David Gleich · Purdue 
Journal on Optimization, 18(1):186–205, 2007.
[29] K. Trohidis, G. Tsoumakas, G. Kalliris, and I. P.
Vlahavas. Multi-label classification of music into
emotions. In International Conference on Music
Information Retrieval, pages 325–330, 2008.
[30] J. J. Whang, I. S. Dhillon, and D. F. Gleich.
Non-exhaustive, overlapping k-means. In Proceedings
of the SIAM International Conference on Data
Mining, pages 936–944, 2015.
[31] J. J. Whang, D. Gleich, and I. S. Dhillon. Overlapping
community detection using seed set expansion. In
ACM International Conference on Information and
Knowledge Management, pages 2099–2108, 2013.
[32] L. F. Wu, T. R. Hughes, A. P. Davierwala, M. D.
Robinson, R. Stoughton, and S. J. Altschuler.
Large-scale prediction of saccharomyces cerevisiae
gene function using overlapping transcriptional
clusters. Nature Genetics, 31(3):255–265, June 2002.
[33] E. P. Xing and M. I. Jordan. On semidefinite
relaxations for normalized k-cut and connections to
spectral clustering. Technical Report
UCB/USD-3-1265, University of California, Berkeley,
2003.
[34] J. Yang and J. Leskovec. Overlapping community
detection at scale: a nonnegative matrix factorization
approach. In ACM International Conference on Web
Search and Data Mining, pages 587–596, 2013.
[35] S. X. Yu and J. Shi. Multiclass spectral clustering. In
IEEE International Conference on Computer Vision -
Volume 2, 2003.
APPENDIX
A. AUGMENTED LAGRANGIANS
The augmented Lagrangian framework is a general strategy to solve nonlinear optimization problems with equality [...] tion and the gradient vector.
B. GRADIENTS FOR NEO-LR
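We now describe the analytic form of the gradients for the augmented Lagrangian of the NEO-LR objective and a brief validation that these are correct. Consider the augmented Lagrangian (5). The gradient has five components for the five sets of variables: Y, f, g, s and r:

$$\begin{aligned}
\nabla_Y L_A(Y, f, g, s, r; \lambda, \mu, \gamma, \sigma) ={}& -2 K Y - e \mu^T Y - \mu e^T Y \\
& - 2 \left( \lambda_1 - \sigma \left( \operatorname{tr}(Y^T W^{-1} Y) - k \right) \right) W^{-1} Y \\
& + \sigma \left( Y Y^T e e^T Y + e e^T Y Y^T Y \right) - \sigma \left( W f e^T Y + e f^T W Y \right) \\
\nabla_f L_A(Y, f, g, s, r; \lambda, \mu, \gamma, \sigma) ={}& d + W \mu - \sigma \left( W Y Y^T e - W^2 f \right) - \lambda_2 e + \sigma \left( e^T f - (1 + \alpha) n \right) e \\
& - \gamma + \sigma (f - g - s) \\
\nabla_g L_A(Y, f, g, s, r; \lambda, \mu, \gamma, \sigma) ={}& \gamma - \sigma (f - g - s) - \lambda_3 e + \sigma \left( e^T g - (1 - \beta) n - r \right) e \\
\nabla_s L_A(Y, f, g, s, r; \lambda, \mu, \gamma, \sigma) ={}& \gamma - \sigma (f - g - s) \\
\nabla_r L_A(Y, f, g, s, r; \lambda, \mu, \gamma, \sigma) ={}& \lambda_3 - \sigma \left( e^T g - (1 - \beta) n - r \right)
\end{aligned}$$

Using analytic gradients in a black-box solver such as L-BFGS-B is problematic if the gradients are even slightly incorrectly computed. To guarantee the analytic gradients we derive are correct, we use a forward finite difference method to get a numerical approximation of the gradients based on the objective function. We compare these with our analytic gradient and expect to see small relative differences on the order of $10^{-5}$ or $10^{-6}$. This is exactly what Figure 4 shows.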
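The validation described here is a standard forward finite difference check. The sketch below shows the idea on a toy function rather than on the NEO-LR augmented Lagrangian; the function names, the step size h, and the toy objective are all assumptions made for illustration.

```python
import math


def forward_difference_gradient(func, x, h=1e-6):
    """Approximate the gradient of func at x with forward differences."""
    fx = func(x)
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += h
        grad.append((func(shifted) - fx) / h)
    return grad


def max_relative_difference(analytic, numeric):
    """Largest componentwise relative difference between two gradients."""
    return max(abs(a - b) / max(abs(a), abs(b), 1e-12)
               for a, b in zip(analytic, numeric))


# Toy objective (an assumption, not the talk's function) and its analytic gradient.
def toy(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1] + math.sin(x[1])


def toy_gradient(x):
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0] + math.cos(x[1])]


point = [1.0, 2.0]
difference = max_relative_difference(toy_gradient(point),
                                     forward_difference_gradient(toy, point))
```

A small `difference` supports the analytic gradient; a value near 1 usually signals a sign error or a missing term.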
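[...]ous studies of low-rank SDP approximations [6].
Let $\lambda = [\lambda_1; \lambda_2; \lambda_3]$ be the Lagrange multipliers associated with the three scalar constraints (s), (u), (w), and $\mu$ and $\gamma$ be the Lagrange multipliers associated with the vector constraints (t) and (v), respectively. Let $\sigma \ge 0$ be a penalty parameter. The augmented Lagrangian for (4) is:

$$\begin{aligned}
L_A(Y, f, g, s, r; \lambda, \mu, \gamma, \sigma) ={}& \underbrace{f^T d - \operatorname{trace}(Y^T K Y)}_{\text{the objective}} \\
& - \lambda_1 \left( \operatorname{trace}(Y^T W^{-1} Y) - k \right) + \frac{\sigma}{2} \left( \operatorname{trace}(Y^T W^{-1} Y) - k \right)^2 \\
& - \mu^T \left( Y Y^T e - W f \right) + \frac{\sigma}{2} \left( Y Y^T e - W f \right)^T \left( Y Y^T e - W f \right) \\
& - \lambda_2 \left( e^T f - (1 + \alpha) n \right) + \frac{\sigma}{2} \left( e^T f - (1 + \alpha) n \right)^2 \\
& - \gamma^T (f - g - s) + \frac{\sigma}{2} (f - g - s)^T (f - g - s) \\
& - \lambda_3 \left( e^T g - (1 - \beta) n - r \right) + \frac{\sigma}{2} \left( e^T g - (1 - \beta) n - r \right)^2 \qquad (5)
\end{aligned}$$

At each step in the augmented Lagrangian solution framework, we solve the following subproblem:

$$\text{minimize} \quad L_A(Y, f, g, s, r; \lambda, \mu, \gamma, \sigma)$$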
We use an augmented Lagrangian
method to optimize this problem
•  Use L-BFGS-B to optimize each step.
•  Update the multiplier estimates in the standard way.
•  Pick parameters in a modestly standard way.
•  Some variability between problems to show best results, only
a little variation in time/performance.
•  Faster than the NEOS solvers
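The outer loop of an augmented Lagrangian method can be illustrated on a toy problem. The sketch below is not the NEO-K-means solver: it minimizes a simple quadratic with one equality constraint and bound constraints, and it swaps in projected gradient descent for the inner solve where the talk uses L-BFGS-B. The function name, the toy problem, and the fixed penalty σ = 10 are all assumptions.

```python
def augmented_lagrangian_solve(a, target, lower, upper, sigma=10.0,
                               outer_iters=20, inner_iters=300):
    """Minimize sum_i (x_i - a_i)^2 subject to sum_i x_i = target and
    lower <= x_i <= upper, with multiplier updates on the equality."""
    x = [lower] * len(a)
    lam = 0.0
    # Step size 1 / (largest eigenvalue of the Hessian 2 I + sigma e e^T).
    step = 1.0 / (2.0 + sigma * len(a))
    for _ in range(outer_iters):
        # Inner solve: projected gradient descent on
        # L_A(x) = f(x) - lam * c(x) + (sigma / 2) * c(x)^2, c(x) = sum(x) - target.
        for _ in range(inner_iters):
            c = sum(x) - target
            grad = [2.0 * (xi - ai) - lam + sigma * c for xi, ai in zip(x, a)]
            x = [min(upper, max(lower, xi - step * gi)) for xi, gi in zip(x, grad)]
        # Standard multiplier update for this sign convention.
        lam -= sigma * (sum(x) - target)
    return x, lam


x, lam = augmented_lagrangian_solve([2.0, 1.0], target=2.0, lower=0.0, upper=3.0)
```

For this toy problem the constrained minimizer is the projection of (2, 1) onto the line x_0 + x_1 = 2, namely (1.5, 0.5), with multiplier −1.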


Low rank structure in NEO-K-Means solution: Explore low rank structure in NEO-K-Means SDP
Comparison with Solvers on NEOS Server
NEOS Server: State-of-the-Art Solvers for Numerical Optimization
Our solver with ALM approach is much faster than theirs (e.g., SNOPT which is suitable for large nonlinearly constrained problems with a modest number of degrees of freedom).

         Our solver, ALM (obj / time)   SNOPT solver (obj / time)
MUSIC    79514.130 / 92s                79515.156 / 306s
SCENE    18534.030 / 3798s              18534.021 / 8910s
YEAST    8902.253 / 4331s               Not solved
We win with our LRSDP solver vs. the CVX default solver
•  Dolphins (n=62) and Les Mis (n=77) are graph probs
•  LRSDP is much faster and just as accurate.

LRSDP is roughly an order of magnitude faster than CVX.
LRSDP generates solutions as good as the global optimum from CVX.
The objective values are different in light of the solution tolerances.
dolphins [1]: 62 nodes, 159 edges; les miserables [2]: 77 nodes, 254 edges

                        Objective value           Run time
                        SDP         LRSDP         SDP            LRSDP
dolphins
  k=2, α=0.2, β=0       -1.968893   -1.968329     107.03 secs    2.55 secs
  k=2, α=0.2, β=0.05    -1.969080   -1.968128     56.99 secs     2.96 secs
  k=3, α=0.3, β=0       -2.913601   -2.915384     160.57 secs    5.39 secs
  k=3, α=0.3, β=0.05    -2.921634   -2.922252     71.83 secs     8.39 secs
les miserables
  k=2, α=0.2, β=0       -1.937268   -1.935365     453.96 secs    7.10 secs
  k=2, α=0.3, β=0       -1.949212   -1.945632     447.20 secs    10.24 secs
  k=3, α=0.2, β=0.05    -2.845720   -2.845070     261.64 secs    13.53 secs
  k=3, α=0.3, β=0.05    -2.859959   -2.859565     267.07 secs    19.31 secs

[1] D. Lusseau et al., Behavioral Ecology and Sociobiology, 2003.
[2] D. E. Knuth. The Stanford GraphBase: A Platform for Combinatorial Computing. Addison-Wesley, 1993.
(Embedded slides: Yangyang Hou (Purdue CS), Low Rank Methods for Optimizing Clustering, Nov 2, 2015.)
Rounding and Improvement
are both important.
Input → Relaxed solution → Rounded solution → Improved solution
Rounding
f gives the number of clusters
g gives the set of assignments
Option 1: Use g and f to determine the number of assignments and go greedy.
Option 2: Just greedily assign based on $W^{-1} Y$.
Improvement
Run NEO-K-Means on the output.
Initialization
Run NEO-K-Means on the input.
The new method is more
robust, even in simple tests.
Consider clustering a cycle graph
We use disconnected nodes to measure the cluster quality.
[Figure: number of disconnected nodes vs. noise for random+onelevel neo, multilevel neo, and lrsdp]
As we increase the noise, only the LRSDP method can reliably find the true clustering.
We get improved vector and
graph clustering results too. 
Experimental Results on Data Clustering
Comparison of NEO-K-Means objective function values
Real-world datasets from Mulan (mulan.sourceforge.net/datasets.html)
By using the LRSDP solution as the initialization of the iterative algorithm, we can achieve better (smaller) objective function values.

                      worst    best     avg.
yeast   kmeans+neo    9611     9495     9549
        lrsdp+neo     9440     9280     9364
        slrsdp+neo    9471     9231     9367
music   kmeans+neo    87779    70158    77015
        lrsdp+neo     82323    70157    75923
        slrsdp+neo    82336    70159    75926
scene   kmeans+neo    18905    18745    18806
        lrsdp+neo     18904    18759    18811
        slrsdp+neo    18895    18760    18810

Experimental Results on Data Clustering
F1 scores on real-world vector datasets (the larger, the better)
NEO-K-Means-based methods outperform other methods.
Low-rank SDP method improves the clustering results.

                moc     esp     isp     okm     kmeans+neo   lrsdp+neo   slrsdp+neo
yeast   worst   -       0.274   0.232   0.311   0.356        0.390       0.369
        best    -       0.289   0.256   0.323   0.366        0.391       0.391
        avg.    -       0.284   0.248   0.317   0.360        0.391       0.382
music   worst   0.530   0.514   0.506   0.524   0.526        0.537       0.541
        best    0.544   0.539   0.539   0.531   0.551        0.552       0.552
        avg.    0.538   0.526   0.517   0.527   0.543        0.545       0.547
scene   worst   0.466   0.569   0.586   0.571   0.597        0.610       0.605
        best    0.470   0.582   0.609   0.576   0.627        0.614       0.625
        avg.    0.467   0.575   0.598   0.573   0.610        0.613       0.613
We have improved results – impressively so on the yeast dataset –
and only slightly worse on the scene data.
           Facebook1   Facebook2   HepPh   AstroPh
bigclam    0.830       0.640       0.625   0.645
demon      0.495       0.318       0.503   0.570
oslom      0.319       0.445       0.465   0.580
nise       0.297       0.293       0.102   0.153
m-neo      0.285       0.269       0.206   0.190
LRSDP      0.222       0.148       0.091   0.137

             No. of vertices   No. of edges
Facebook1    348               2,866
Facebook2    756               30,780
HepPh        11,204            117,619
AstroPh      17,903            196,972
For these graphs, we
dramatically improve
the conductance-vs-
coverage plots.
Lloyd’s iterative method takes O(1 second)
LRSDP method takes O(1 hour)

Now we want to improve the LRSDP time.
We can improve the
optimization beyond ALM.
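1. Proximal augmented Lagrangian (PALM)
Add a regularization term to the augmented Lagrangian:

$$x^{(k+1)} = \operatorname*{argmin}_{x} \; L_A(x; \lambda^{(k)}, \ldots) + \frac{1}{2\tau} \| x - x^{(k)} \|^2$$

Solve with L-BFGS-B.

2. ADMM method (5 blocks)

$$\begin{aligned}
Y^{k+1} &= \operatorname*{argmin}_{Y} \; L_A(Y, f^k, g^k, s^k, r^k; \lambda^k, \mu^k, \gamma^k, \sigma) \\
f^{k+1} &= \operatorname*{argmin}_{f} \; L_A(Y^{k+1}, f, g^k, s^k, r^k; \lambda^k, \mu^k, \gamma^k, \sigma) \\
g^{k+1} &= \operatorname*{argmin}_{g} \; L_A(Y^{k+1}, f^{k+1}, g, s^k, r^k; \lambda^k, \mu^k, \gamma^k, \sigma) \\
s^{k+1} &= \operatorname*{argmin}_{s} \; L_A(Y^{k+1}, f^{k+1}, g^{k+1}, s, r^k; \lambda^k, \mu^k, \gamma^k, \sigma) \\
r^{k+1} &= \operatorname*{argmin}_{r} \; L_A(Y^{k+1}, f^{k+1}, g^{k+1}, s^{k+1}, r; \lambda^k, \mu^k, \gamma^k, \sigma)
\end{aligned}$$

Legend: convex ☺, non-convex ☹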
We had to get a new convergence result for the proximal method.
Results for bound-constrained sub-problems?
Ours is a small adaptation of a general result due to Pennanen (2002).

Convergence analysis of PALM
Theorem 1. Let $(\bar{x}, \bar{\lambda})$ be a KKT pair satisfying the strong second-order sufficient condition and assume the gradients $\nabla c(\bar{x})$ are linearly independent. If the $\{\sigma_k\}$ are large enough with $\sigma_k \to \bar{\sigma} \le \infty$ and if $\|(x_0, \lambda_0) - (\bar{x}, \bar{\lambda})\|$ is small enough, then there exists a sequence $\{(x_k, \lambda_k)\}$ conforming to Algorithm 1 along with open neighborhoods $C_k$ such that for each k, $x_{k+1}$ is the unique solution in $C_k$ to $(P_k)$. Then also, the sequence $\{(x_k, \lambda_k)\}$ converges linearly and Fejér monotonically to $(\bar{x}, \bar{\lambda})$ with rate $r(\bar{\sigma}) < 1$ that is decreasing in $\bar{\sigma}$ and $r(\bar{\sigma}) \to 0$ as $\bar{\sigma} \to \infty$.
On the yeast dataset, we see no
difference in objective, but faster solves
[Figure: runtimes on YEAST for iterative, ALM, PALM, and ADMM]
[Figure: f(x) values on YEAST for ALM, PALM, and ADMM]
On yeast, we see much better
discrete objectives and F1 scores. 
[Figure: NEO-K-Means objectives on YEAST for iterative, ALM, PALM, and ADMM]
[Figure: F1 scores on YEAST for iterative, ALM, PALM, and ADMM]
Recap
For overlapping clustering of data and
overlapping community detection of graphs, we
have a new objective
•  Fast Lloyd-like iterative algorithm
•  SDP relaxation
•  Low-rank SDP relaxation
•  Proximal and ADMM acceleration techniques
1. NEO-K-means - Whang, Gleich, Dhillon, SDM 2015
2. NEO-K-means SDP + Aug. Lagrangian - Hou, Whang, Gleich, Dhillon, KDD 2015
3. Multiplier Methods for Overlapping K-Means - Hou, Whang, Gleich, Dhillon, Submitted
[Figure: plot(x), and log-log plots against the number of nonzeros; one axis label reads ‖D^{-1}(x − x*)‖_1]
Crawl of flickr from 2006: ~800k nodes, 6M edges, β = 1/2
(I − βP) x = (1 − β) s
nnz(x) ≈ 800k
Localized solutions of diffusion equations in large graphs. 
Joint with Kyle Kloster. WAW2013, KDD2014, WAW2015; J. Internet Math. 
[...] the answer [5]. Thus, just as in scientific computing, marrying the method to the model is key for the best scientific computing on social networks.

Ultimately, none of these steps differ from the practice of physical scientific computing. The challenges in creating models, devising algorithms, validating results, and comparing models just take on different challenges when the problems come from social data instead of physical models. Thus, let us return to our starting question: What does the matrix have to do with the social network? Just as in scientific computing, many interesting problems, models, and methods for social networks boil down to matrix computations. Yet, as in the expander example above, the types of matrix questions change dramatically in order to fit social network models. Let’s see what’s been done that’s enticingly and refreshingly different from the types of matrix computations encountered in physical scientific computing.

EXPANDER GRAPHS AND PARALLEL COMPUTING
Recently, a coalition of folks from academia, national labs, and industry set out to tackle the problems in parallel computing and expander graphs. They established the Graph 500 benchmark (http://www.graph500.org) to measure the performance of a parallel computer on a standard graph computation with an expander graph. Over the past three years, they’ve seen performance grow by more than 1,000-times [...]

[Figure: diffusion in a plate vs. movie interest in diffusion. Caption: The network, or mesh, from a typical problem in scientific computing [...]n a low dimensional space (think of two or three dimensions). These physical [...]ut limits on the size of the boundary or “surface area” of the space given its [...] No such limits exist in social networks and these two sets are usually about [...] size. A network with this property is called an expander network.]
Size of set » Size of boundary
“Networks” from PDEs are usually physical
Social networks are expanders
Higher order organization of complex networks
Joint with Austin Benson and Jure Leskovec
[Figure: a network with numbered nodes 0-11 and a cluster of labeled nodes: CEPDR, CEPVR, IL2R, OLLR, RIAL, RIAR, RIVL, RIVR, RMDDR, RMDL, RMDR, RMDVL, RMFL, SMDDL, SMDDR, SMDVR, URBR]
By using a new generalization of
spectral clustering methods, we are
able to find completely novel and
relevant structures in complex systems
such as the connectome and transport
networks.
SIAM Annual Meeting (AN16)
July 11-15, 2016
The Westin Waterfront, Boston, Massachusetts

David Gleich, Purdue
Mary Silber, Northwestern

Big Data, Data Science, and Privacy
Education, Communication, and Policy
Reproducibility and Ethics
Efficiency and Optimization
Integrating Models and Data (incl. computational social science, PDEs)
Dynamic Networks (learning, evolution, adaptation, and cooperation)
Applied Math, Statistics, and Machine Learning
Earth systems; environmental/ecological applications
Epidemiology
Future work
Even faster solvers
Understand why the solution seems to be rank-2.
Better init for Lloyd’s.
Solution Z from CVX is even rank 2!
1.  NEO-K-means - Whang, Gleich, Dhillon, SDM 2015
2.  NEO-K-means SDP + Aug. Lagrangian"
Hou, Whang, Gleich, Dhillon, KDD 2015
3.  Multiplier Methods for Overlapping K-Means"
Hou, Whang, Gleich, Dhillon, Submitted

Contenu connexe

En vedette

Using Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningUsing Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningDavid Gleich
 
Overlapping clusters for distributed computation
Overlapping clusters for distributed computationOverlapping clusters for distributed computation
Overlapping clusters for distributed computationDavid Gleich
 
Spacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsSpacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsDavid Gleich
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structuresDavid Gleich
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsDavid Gleich
 
Fast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and moreFast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and moreDavid Gleich
 
PageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresPageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresDavid Gleich
 
Localized methods in graph mining
Localized methods in graph miningLocalized methods in graph mining
Localized methods in graph miningDavid Gleich
 
Graph libraries in Matlab: MatlabBGL and gaimc
Graph libraries in Matlab: MatlabBGL and gaimcGraph libraries in Matlab: MatlabBGL and gaimc
Graph libraries in Matlab: MatlabBGL and gaimcDavid Gleich
 
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Non-exhaustive, Overlapping K-means

  • 2. Real-world graph and point data have overlapping clusters. [Figure: gene data plot; axis ticks 10 to 70, labels are gene and contig identifiers such as NM_003748 and Contig32125_RC.] Social networks have overlapping clusters because of social circles. Genes have overlapping clusters due to their role in multiple functions. SILO Seminar David Gleich · Purdue
  • 3. Overlapping research projects are what got me here too! PhD thesis on Google’s PageRank; MSR intern and Overlapping Clusters for Distributed Computation; Accelerated NCP plots and locally minimal communities; Neighborhood inflated seed expansion for overlapping communities; Non-exhaustive overlapping K-means. 1. NISE Clustering - Whang, Gleich, Dhillon, CIKM 2013. 2. NEO-K-means - Whang, Gleich, Dhillon, SDM 2015. 3. NEO-K-means SDP - Hou, Whang, Gleich, Dhillon, KDD 2015. 4. Multiplier Methods for Overlapping K-Means - Hou, Whang, Gleich, Dhillon, Submitted.
  • 4. SILO Seminar David Gleich · Purdue
  • 5. Overlapping communities via seed set expansion works nicely. Phases: Filtering, Seeding, Seed Set Expansion, Propagation. [Figure 2: maximum conductance vs. coverage (percentage); panels (a) AstroPh and (d) Flickr; seeding strategies egonet, graclus centers, spread hubs, random, bigclam. Source slide: Joyce Jiyoung Whang, The University of Texas at Austin, Conference on Information and Knowledge Management.] We can cover 95% of the network with communities of conductance ~0.15. Flickr social network: 2M vertices, 22M edges. cond(S) = cut(S)/“size”(S).
  • 6. We wanted a more principled approach to achieve these results.
  • 7. The state of the art for clustering. K-Means: Problem 1 😀, Problem 2 😊, Problem 3 😟, Problem 4 😢.
  • 8. The state of the art for clustering. K-Means: Problem 1 😀, Problem 2 😊. NEO-K-Means: Problem 3 😊, Problem 4 😊.
  • 9. K-means as optimization.
      Input: points $x_1, \ldots, x_n$. Find an assignment matrix $U$ that gives cluster assignments to minimize the objective. [Figure: a point $x_i$ with distances $\|x_i - m_1\|$ and $\|x_i - m_2\|$ to the centroids $m_1$ and $m_2$.]
      Example with rows $x_1, \ldots, x_4$ and columns $c_1, c_2$:
      $U = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}$
      K-means objective!
      minimize $\sum_{ij} U_{ij} \|x_i - m_j\|^2$ subject to: $U$ is an assignment to clusters, $m_j = \frac{1}{\sum_i U_{ij}} \sum_i U_{ij} x_i$
      K-means’ objective with overlap?!
      minimize $\sum_{ij} U_{ij} \|x_i - m_j\|^2$ subject to: $U$ is a multi-assignment to clusters, $m_j = \frac{1}{\sum_i U_{ij}} \sum_i U_{ij} x_i$
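The objective on this slide can be evaluated directly from an assignment matrix. The following is a minimal sketch, assuming NumPy and a small made-up point set; the function name `kmeans_objective` and the sample data are illustrative and not from the talk.

```python
import numpy as np

def kmeans_objective(X, U):
    """Sum over i, j of U[i, j] * ||x_i - m_j||^2, with m_j the mean of cluster j."""
    counts = U.sum(axis=0)                      # number of points in each cluster
    M = (U.T @ X) / counts[:, None]             # row j is the centroid m_j
    sq_dist = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)  # n x k distances
    return float((U * sq_dist).sum())

# Hypothetical data: two pairs of points, assigned as in the slide's example U.
X = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
U = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
obj = kmeans_objective(X, U)
print(obj)
```

The same function accepts a multi-assignment `U` (rows with more than one 1), which is the "with overlap" variant the slide asks about.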
  • 10. Overlap is not a natural addition to optimization based clustering.
  • 11. The NEO-K-means objective balances overlap and outliers.
      minimize $\sum_{ij} U_{ij} \|x_i - m_j\|^2$
      subject to
        $U_{ij}$ is binary
        $\operatorname{trace}(U^T U) = (1 + \alpha)n$   ($\alpha n$ overlap)
        $e^T \operatorname{Ind}[Ue] \ge (1 - \beta)n$   (up to $\beta n$ outliers)
        $m_j = \frac{1}{\sum_i U_{ij}} \sum_i U_{ij} x_i$
      1. Make $(1 + \alpha)n$ total assignments.
      2. Allow up to $\beta n$ outliers.
      · If $\alpha, \beta = 0$, then we get back to K-means.
      · Automatically choose $\alpha, \beta$ based on K-means. 😊
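The two counting constraints can be checked mechanically for any binary assignment matrix. This is a small sketch under the assumption that `Ind[Ue]` marks the rows of `U` with at least one assignment; the helper name `neo_feasible` is hypothetical.

```python
import numpy as np

def neo_feasible(U, alpha, beta):
    """Check trace(U^T U) = (1 + alpha) n and e^T Ind[Ue] >= (1 - beta) n."""
    n = U.shape[0]
    total_assignments = np.trace(U.T @ U)                # equals U.sum() for binary U
    assigned_points = np.count_nonzero(U.sum(axis=1))    # rows with at least one cluster
    return bool(np.isclose(total_assignments, (1 + alpha) * n)
                and assigned_points >= (1 - beta) * n)

# One point in both clusters, one point left out: 4 assignments on n = 4 points.
U = np.array([[1, 0], [1, 1], [0, 1], [0, 0]], dtype=float)
ok_with_outlier = neo_feasible(U, alpha=0.0, beta=0.25)
ok_without_outlier = neo_feasible(U, alpha=0.0, beta=0.0)
```

With `beta = 0` every point must be assigned, so the same `U` is rejected.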
  • 12. Lloyd’s algorithm for NEO-K-means is just a wee bit more complex.
      Until done:
      1. Update centroids.
      2. Assign $(1 - \beta)n$ nodes to the closest centroid.
      3. Make $(\alpha + \beta)n$ assignments based on minimizing distance.
      [Figure: scatter plot of the example; legend: Cluster 1, Cluster 2, Cluster 1 & 2, Not assigned.]
      This algorithm correctly assigns our example case and even determines overlap and outlier parameters!
      THEOREM: Lloyd’s algorithm decreases the objective monotonically.
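The three steps above can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: the rounding of $(1-\beta)n$ and $(1+\alpha)n$ to integers, the handling of empty clusters, the stopping rule, and the toy data are all assumptions.

```python
import numpy as np

def neo_kmeans(X, alpha, beta, U0, max_iter=100):
    """Lloyd-like iteration: centroids, (1-beta)n closest assignments, (alpha+beta)n extras."""
    n = X.shape[0]
    n_first = n - int(np.floor(beta * n))        # points that get their closest centroid
    n_total = int(round((1 + alpha) * n))        # total number of assignments
    U = U0.astype(float).copy()
    for _ in range(max_iter):
        # 1. Update centroids (an empty cluster keeps a zero centroid in this sketch).
        M = (U.T @ X) / np.maximum(U.sum(axis=0), 1)[:, None]
        D = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
        U_new = np.zeros_like(U)
        # 2. The n_first points nearest to any centroid go to their closest centroid.
        closest = D.argmin(axis=1)
        order = np.argsort(D.min(axis=1))[:n_first]
        U_new[order, closest[order]] = 1
        # 3. Remaining assignments: smallest distances among pairs not yet assigned.
        D_rest = np.where(U_new == 1, np.inf, D)
        extra = np.argsort(D_rest, axis=None)[: n_total - n_first]
        U_new[np.unravel_index(extra, D.shape)] = 1
        if np.array_equal(U_new, U):
            break
        U = U_new
    return U

# Toy 1-D data (made up): two groups, one point between them, one far outlier.
X = np.array([[-3.0], [-2.0], [-1.0], [0.2], [1.5], [2.5], [3.5], [50.0]])
U0 = np.repeat(np.eye(2), 4, axis=0)             # first four in cluster 0, rest in cluster 1
U = neo_kmeans(X, alpha=0.0, beta=0.125, U0=U0)
```

In this toy run the far point ends up unassigned and the point at 0.2 lands in both clusters, which mirrors the "Cluster 1 & 2" and "Not assigned" categories in the slide's figure.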
  • 13. The non-exhaustiveness is necessary for assignments. [Figure: synthetic data (n=1,000, α=0.1, β=0.005); panels (b) First extension of k-means and (c) NEO-K-Means; legend: Cluster 1, Cluster 2, Cluster 1 & 2, Not assigned; green points indicate overlap.] Left: output without assignment constraint (beta = 1). Right: NEO-K-means output (correct).
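  • 14. The Weighted, Kernel NEO-K-Means objective.
      •  Introduce weights for each data point.
      •  Introduce feature maps for each data point too.
      minimize $\sum_{ij} U_{ij} w_i \|\phi(x_i) - m_j\|^2$
      subject to
        $U_{ij}$ is binary
        $\operatorname{trace}(U^T U) = (1 + \alpha)n$   ($\alpha n$ overlap)
        $e^T \operatorname{Ind}[Ue] \ge (1 - \beta)n$   (up to $\beta n$ outliers)
        $m_j = \frac{1}{\sum_i U_{ij} w_i} \sum_i w_i U_{ij} \phi(x_i)$
      With $K_{ij} = \phi(x_i)^T \phi(x_j)$, $W = \operatorname{diag}(w)$, and $u_j$ the $j$-th column of $U$:
      $\sum_{ij} U_{ij} w_i \|\phi(x_i) - m_j\|^2 = \sum_j \left( \sum_i U_{ij} w_i K_{ii} - \frac{u_j^T W K W u_j}{u_j^T W u_j} \right)$
      Theorem: If $K = D^{-1} + D^{-1} A D^{-1}$, then the NEO-K-Means objective is equivalent to overlapping conductance.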
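The kernel form of the objective can be sanity-checked numerically. The sketch below assumes a linear kernel (so the feature map is the identity), random weights, and a small hand-written assignment matrix; none of these values come from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2
X = rng.normal(size=(n, 3))                 # phi(x) = x, i.e. a linear kernel (assumption)
w = rng.uniform(0.5, 2.0, size=n)           # positive point weights
U = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]], dtype=float)
K, W = X @ X.T, np.diag(w)

# Left side: weighted squared distances to the weighted centroids.
lhs = 0.0
for j in range(k):
    u = U[:, j]
    m = (u * w) @ X / (u @ w)
    lhs += np.sum(u * w * np.sum((X - m) ** 2, axis=1))

# Right side: the kernel expression, one term per cluster.
rhs = 0.0
for j in range(k):
    u = U[:, j]
    rhs += u @ (w * np.diag(K)) - (u @ W @ K @ W @ u) / (u @ W @ u)

print(lhs, rhs)
```

Both sides agree up to floating-point error, including for the point that belongs to two clusters and the point that belongs to none.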
  • 15. This means that NEO-K-Means was the principled objective we were after!
  • 16. Conductance communities. Conductance is one of the most important community scores [Schaeffer07]. The conductance of a set of vertices is the ratio of edges leaving to total edges:
      $\phi(S) = \frac{\operatorname{cut}(S)}{\min(\operatorname{vol}(S), \operatorname{vol}(\bar{S}))}$   (edges leaving the set) / (total edges in the set)
      Equivalently, it’s the probability that a random edge leaves the set. Small conductance ⇔ good community.
      Example from the figure: cut(S) = 7, vol(S) = 33, vol($\bar{S}$) = 11, so $\phi(S) = 7/11$.
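As a concrete instance of the definition, here is a short sketch that computes conductance from a dense adjacency matrix. The graph (two triangles joined by one edge) and the function name `conductance` are made up for illustration.

```python
import numpy as np

def conductance(A, S):
    """cut(S) / min(vol(S), vol(complement of S)) for an undirected adjacency matrix A."""
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[list(S)] = True
    degrees = A.sum(axis=1)
    cut = A[mask][:, ~mask].sum()               # edges with exactly one endpoint in S
    return cut / min(degrees[mask].sum(), degrees[~mask].sum())

# Two triangles {0, 1, 2} and {3, 4, 5} connected by the single edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
phi = conductance(A, [0, 1, 2])
```

Here one edge leaves the set and both sides have volume 7, so the score is 1/7, a small value that matches the intuition of a good community.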
  • 17. Our theorem means that NEO-K-Means can optimize the sum-conductance objective.
      Conductance vs. normalized cut bi-partition: $\phi(S) \le \frac{\operatorname{cut}(S)}{\operatorname{vol}(S)} + \frac{\operatorname{cut}(\bar{S})}{\operatorname{vol}(\bar{S})}$
      NEO-K-Means objective: $\sum_{S \in C} \frac{\operatorname{cut}(S)}{\operatorname{vol}(S)} = \sum_{S \in C} \phi(S)$ if $\operatorname{vol}(S) \le \operatorname{vol}(\bar{S})$
      When we use this method to partition the Karate club network, we get reasonable solutions.
      •  Inspired by Dhillon et al.’s work on Graclus.
      •  We have a multilevel method to optimize the graph case.
  • 18. We get state-of-the-art clustering performance on vector and graph datasets. The Mulan test set has a number of appropriate datasets.
      F1 scores on vector datasets from the Mulan repository:
              moc    fuzzy  esp    isp    okm    rokm   NEO
      synth1  0.833  0.959  0.977  0.985  0.989  0.969  0.996
      synth2  0.836  0.957  0.952  0.973  0.967  0.975  0.996
      synth3  0.547  0.919  0.968  0.952  0.970  0.928  0.996
      yeast   -      0.308  0.289  0.203  0.311  0.203  0.366
      music   0.534  0.533  0.527  0.508  0.527  0.454  0.550
      scene   0.467  0.431  0.572  0.586  0.571  0.593  0.626
      Dataset statistics ($\overline{|C|}$ is the average cluster size):
              n      dim.  avg. |C|  outliers  k
      synth1  5,000  2     2,750     0         2
      synth2  1,000  2     550       5         2
      synth3  6,000  2     3,600     6         2
      yeast   2,417  103   731.5     0         14
      music   593    72    184.7     0         6
      scene   2,407  294   430.8     0         6
  • 19. NEO-K-Means with Lloyd’s is fast and usually accurate but inconsistent. [Figure: a more complicated overlapping test case, and the output from NEO-K-Means with Lloyd’s method; legend: Cluster 1, Cluster 2, Cluster 1 & 2, Cluster 3, Not assigned.]
  • 20. Can we get a more robust method? Yes!
  • 21. Towards better optimization of the objective:
      1. An SDP relaxation of the objective.
      2. A practical low-rank SDP heuristic.
      3. Faster optimization methods for the heuristic.
  • 22. From assignments to co-occurrence matrices. There are three key variables in our formulation:
      1. The co-occurrence matrix $Z = \sum_j \frac{W u_j u_j^T W}{u_j^T W u_j}$
      2. The overlap vector $f$
      3. The assignment indicator $g$
      Example: $U = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$, $f = \begin{bmatrix} 1 \\ 2 \\ 1 \\ 0 \end{bmatrix}$, $g = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \end{bmatrix}$
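The three variables can be computed from the slide's example assignment matrix in a few lines. This sketch assumes unit weights ($W = I$), which the slide does not state; the helper name `cooccurrence` is hypothetical.

```python
import numpy as np

def cooccurrence(U, w):
    """Z = sum_j (W u_j)(W u_j)^T / (u_j^T W u_j) for binary U and weights w."""
    n, k = U.shape
    Z = np.zeros((n, n))
    for j in range(k):
        Wu = w * U[:, j]
        Z += np.outer(Wu, Wu) / (U[:, j] @ Wu)
    return Z

U = np.array([[1, 0], [1, 1], [0, 1], [0, 0]], dtype=float)
w = np.ones(4)                      # assumption: unit weights
Z = cooccurrence(U, w)
f = U.sum(axis=1)                   # overlap vector: number of clusters per point
g = (f > 0).astype(float)           # assignment indicator: 1 if the point is in any cluster
```

The resulting `f` and `g` match the vectors shown on the slide, and the last row and column of `Z` are zero because the fourth point is unassigned.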
  • 23. We can convert our objective into a trace minimization problem.
      $K_{ij} = \phi(x_i)^T \phi(x_j)$, $d_i = w_i K_{ii}$; $Z$ = normalized co-occurrence, $f$ = overlap count, $g$ = assignment indicator.
      The objective function:
      $\sum_{ij} U_{ij} w_i \|\phi(x_i) - m_j\|^2 = \sum_j \left( \sum_i U_{ij} w_i K_{ii} - \frac{u_j^T W K W u_j}{u_j^T W u_j} \right)$
      $= \sum_{ij} U_{ij} w_i K_{ii} - \sum_j \frac{u_j^T W K W u_j}{u_j^T W u_j}$
      $= f^T d - \operatorname{trace}(KZ)$
  • 24. There is an SDP-like framework to solve NEO-K-means.
      maximize over $Z, f, g$:  $\operatorname{trace}(KZ) - f^T d$
      subject to
        $\operatorname{trace}(W^{-1} Z) = k$   (a)
        $Z_{ij} \ge 0$   (b)
        $Z \succeq 0$, $Z = Z^T$   (c)
        $Ze = Wf$   (d)
        $e^T f = (1 + \alpha)n$   (e)
        $e^T g \ge (1 - \beta)n$   (f)
        $f \ge g$   (g)
        $\operatorname{rank}(Z) = k$   (h)
        $f \in \mathbb{Z}^n_{\ge 0}$, $g \in \{0, 1\}^n$   (i)
      (a)-(d): $Z$ must come from an assignment matrix. (e)-(g): overlap and assignment constraints. (h)-(i): combinatorial constraints.
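To see that a $Z$ built from an assignment matrix satisfies constraints (a) through (h), the following sketch checks each one numerically. The weights are random, the values of `alpha` and `beta` are chosen to fit the example, and the factor `Y` (columns $W u_j / \sqrt{u_j^T W u_j}$) is just a convenient way to form $Z$ here.

```python
import numpy as np

rng = np.random.default_rng(1)
U = np.array([[1, 0], [1, 1], [0, 1], [0, 0]], dtype=float)
n, k = U.shape
w = rng.uniform(0.5, 2.0, size=n)          # hypothetical positive weights
alpha, beta = 0.0, 0.25                    # 4 assignments on 4 points, 1 outlier allowed

Y = (w[:, None] * U) / np.sqrt(w @ U)      # column j is W u_j / sqrt(u_j^T W u_j)
Z = Y @ Y.T                                # normalized co-occurrence matrix
f = U.sum(axis=1)
g = (f > 0).astype(float)
e = np.ones(n)

checks = {
    "(a) trace(W^-1 Z) = k": bool(np.isclose(np.trace(Z / w[:, None]), k)),
    "(b) Z >= 0": bool((Z >= 0).all()),
    "(c) Z symmetric PSD": bool(np.allclose(Z, Z.T) and np.linalg.eigvalsh(Z).min() > -1e-10),
    "(d) Ze = Wf": bool(np.allclose(Z @ e, w * f)),
    "(e) e^T f = (1 + alpha) n": bool(np.isclose(e @ f, (1 + alpha) * n)),
    "(f) e^T g >= (1 - beta) n": bool(e @ g >= (1 - beta) * n),
    "(g) f >= g": bool((f >= g).all()),
    "(h) rank(Z) = k": bool(np.linalg.matrix_rank(Z) == k),
}
for name, ok in checks.items():
    print(name, ok)
```

Constraint (i) holds by construction, since `f` has non-negative integer entries and `g` is binary.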
  • 25. There is an SDP relaxation to approximate NEO-K-means.
      maximize over $Z, f, g$:  $\operatorname{trace}(KZ) - f^T d$
      subject to
        $\operatorname{trace}(W^{-1} Z) = k$   (a)
        $Z_{ij} \ge 0$   (b)
        $Z \succeq 0$, $Z = Z^T$   (c)
        $Ze = Wf$   (d)
        $e^T f = (1 + \alpha)n$   (e)
        $e^T g \ge (1 - \beta)n$   (f)
        $f \ge g$   (g)
        $0 \le g \le 1$
      (a)-(d): $Z$ must come from an assignment matrix. (e)-(g): overlap and assignment constraints. Last line: relaxed constraints.
  • 26. This SDP can easily solve simple problems. [Figure: NEO-K-Means SDP.] Solution Z from CVX is even rank 2!
  • 27. But SDP methods have a number of issues for large-scale problems.
      1. The number of variables is quadratic in the number of data points.
      2. The best solvers can only solve problems with a few hundred or thousand points.
      So like many before us (e.g. Burer & Monteiro; Kulis, Surendran, and Platt 2007; and more), we optimize a low-rank factorization of the solution.
  • 28. Using the NEO-K-Means Low-Rank SDP, we can find assignments directly. [Figure: NEO-K-Means low-rank SDP; panels $Y$ and $YY^T$.] $\|Z - YY^T\| = 2.3 \times 10^{-4}$
  • 29. The Low-Rank NEO-K-Means SDP. We lose convexity but gain practicality. We introduce slacks at this point.
      maximize over $Y, f, g, s, r$:  $\operatorname{trace}(Y^T K Y) - f^T d$
      subject to
        $k = \operatorname{trace}(Y^T W^{-1} Y)$
        $0 = YY^T e - Wf$   (icky non-convex term)
        $0 = e^T f - (1 + \alpha)n$
        $0 = f - g - s$
        $0 = e^T g - (1 - \beta)n - r$
        $Y_{ij} \ge 0$, $s \ge 0$, $r \ge 0$, $0 \le f \le ke$, $0 \le g \le 1$   (simple bound constraints)
  • 30. We use an augmented Lagrangian method to optimize this problem. [The slide shows a page of the paper; the recoverable text follows.]
      […]ous studies of low-rank SDP approximations [6]. Let $\lambda = [\lambda_1; \lambda_2; \lambda_3]$ be the Lagrange multipliers associated with the three scalar constraints (s), (u), (w), and $\mu$ and $\gamma$ be the Lagrange multipliers associated with the vector constraints (t) and (v), respectively. Let $\sigma \ge 0$ be a penalty parameter. The augmented Lagrangian for (4) is:
      $L_A(Y, f, g, s, r; \lambda, \mu, \gamma, \sigma) = f^T d - \operatorname{trace}(Y^T K Y)$   (the objective)
        $- \lambda_1 (\operatorname{trace}(Y^T W^{-1} Y) - k) + \frac{\sigma}{2} (\operatorname{trace}(Y^T W^{-1} Y) - k)^2$
        $- \mu^T (YY^T e - Wf) + \frac{\sigma}{2} (YY^T e - Wf)^T (YY^T e - Wf)$
        $- \lambda_2 (e^T f - (1 + \alpha)n) + \frac{\sigma}{2} (e^T f - (1 + \alpha)n)^2$
        $- \gamma^T (f - g - s) + \frac{\sigma}{2} (f - g - s)^T (f - g - s)$
        $- \lambda_3 (e^T g - (1 - \beta)n - r) + \frac{\sigma}{2} (e^T g - (1 - \beta)n - r)^2$   (5)
      At each step in the augmented Lagrangian solution framework, we solve the following subproblem: minimize $L_A(Y, f, g, s, r; \lambda, \mu, \gamma, \sigma)$
      APPENDIX A. AUGMENTED LAGRANGIANS. The augmented Lagrangian framework is a general strategy to solve nonlinear optimization problems with equality […]tion and the gradient vector.
      B. GRADIENTS FOR NEO-LR. We now describe the analytic form of the gradients for the augmented Lagrangian of the NEO-LR objective and a brief validation that these are correct. Consider the augmented Lagrangian (5). The gradient has five components for the five sets of variables: $Y$, $f$, $g$, $s$ and $r$:
      $\nabla_Y L_A = -2KY - e\mu^T Y - \mu e^T Y - 2(\lambda_1 - \sigma(\operatorname{tr}(Y^T W^{-1} Y) - k)) W^{-1} Y + \sigma(YY^T e e^T Y + e e^T Y Y^T Y) - \sigma(W f e^T Y + e f^T W Y)$
      $\nabla_f L_A = d + W\mu - \sigma(W YY^T e - W^2 f) - \lambda_2 e + \sigma(e^T f - (1 + \alpha)n)e - \gamma + \sigma(f - g - s)$
      $\nabla_g L_A = \gamma - \sigma(f - g - s) - \lambda_3 e + \sigma(e^T g - (1 - \beta)n - r)e$
      $\nabla_s L_A = \gamma - \sigma(f - g - s)$
      $\nabla_r L_A = \lambda_3 - \sigma(e^T g - (1 - \beta)n - r)$
      Using analytic gradients in a black-box solver such as L-BFGS-B is problematic if the gradients are even slightly incorrectly computed. To guarantee the analytic gradients we derive are correct, we use a forward finite difference method to get a numerical approximation of the gradients based on the objective function. We compare these with our analytic gradient and expect to see small relative differences on the order of $10^{-5}$ or $10^{-6}$. This is exactly what Figure 4 shows.
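The validation described in the excerpt (forward finite differences against the analytic gradient) can be sketched as below. This is an illustrative check, not the authors' code: it assumes the sign conventions $f(x) - \lambda^T c(x) + \frac{\sigma}{2}\|c(x)\|^2$ for the augmented Lagrangian, a symmetric kernel `K`, a packed variable vector `x = [Y, f, g, s, r]`, and random made-up problem data.

```python
import numpy as np

def lagrangian_and_grad(x, n, k, K, w, d, alpha, beta, lam, mu, gam, sigma):
    """Value and analytic gradient of L_A for the packed vector x = [Y, f, g, s, r]."""
    Y = x[:n * k].reshape(n, k)
    f, g, s = x[n*k:n*k + n], x[n*k + n:n*k + 2*n], x[n*k + 2*n:n*k + 3*n]
    r, e = x[-1], np.ones(n)
    WinvY = Y / w[:, None]
    c1 = np.sum(Y * WinvY) - k                  # trace(Y^T W^-1 Y) - k
    c2 = Y @ (Y.T @ e) - w * f                  # Y Y^T e - W f
    c3 = e @ f - (1 + alpha) * n
    c4 = f - g - s
    c5 = e @ g - (1 - beta) * n - r
    val = (f @ d - np.sum(Y * (K @ Y))
           - lam[0] * c1 + 0.5 * sigma * c1 ** 2
           - mu @ c2 + 0.5 * sigma * (c2 @ c2)
           - lam[1] * c3 + 0.5 * sigma * c3 ** 2
           - gam @ c4 + 0.5 * sigma * (c4 @ c4)
           - lam[2] * c5 + 0.5 * sigma * c5 ** 2)
    a = sigma * c2 - mu                         # multiplier and penalty terms on c2
    gY = (-2 * K @ Y - 2 * (lam[0] - sigma * c1) * WinvY
          + np.outer(a, e @ Y) + np.outer(e, a @ Y))
    gf = d - w * a + (sigma * c3 - lam[1]) * e + (sigma * c4 - gam)
    gg = gam - sigma * c4 + (sigma * c5 - lam[2]) * e
    gs = gam - sigma * c4
    gr = lam[2] - sigma * c5
    return val, np.concatenate([gY.ravel(), gf, gg, gs, [gr]])

rng = np.random.default_rng(0)
n, k = 5, 2
X = rng.normal(size=(n, 3))
K = X @ X.T                                     # symmetric kernel (assumption)
w = rng.uniform(0.5, 2.0, size=n)
d = w * np.diag(K)
params = (n, k, K, w, d, 0.2, 0.1, rng.normal(size=3),
          rng.normal(size=n), rng.normal(size=n), 2.0)
x = rng.uniform(0.0, 1.0, size=n * k + 3 * n + 1)

val, grad = lagrangian_and_grad(x, *params)
h = 1e-6
fd = np.array([(lagrangian_and_grad(x + h * np.eye(x.size)[i], *params)[0] - val) / h
               for i in range(x.size)])         # forward finite differences
rel_err = np.linalg.norm(fd - grad) / np.linalg.norm(grad)
print(rel_err)
```

A small relative error indicates that the analytic gradient and the objective agree; a sign or index mistake in any one term typically pushes the error to order one.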
  • 31. We use an augmented Lagrangian method to optimize this problem.
      •  Use L-BFGS-B to optimize each step.
      •  Update the multiplier estimates in the standard way.
      •  Pick parameters in a modestly standard way.
      •  Some variability between problems to show best results, only a little variation in time/performance.
      •  Faster than the NEOS solvers.
      Comparison with solvers on the NEOS Server (NEOS Server: State-of-the-Art Solvers for Numerical Optimization). Our solver with the ALM approach is much faster than theirs (e.g., SNOPT, which is suitable for large nonlinearly constrained problems with a modest number of degrees of freedom).
              Our solver, ALM (obj/time)   SNOPT solver (obj/time)
      MUSIC   79514.130 / 92s              79515.156 / 306s
      SCENE   18534.030 / 3798s            18534.021 / 8910s
      YEAST   8902.253 / 4331s             Not solved
  • 32. We win with our LRSDP solver vs. the CVX default solver.
      •  Dolphins (n=62) and Les Mis (n=77) are graph problems.
      •  LRSDP is much faster and just as accurate.
      LRSDP is roughly an order of magnitude faster than CVX. LRSDP generates solutions as good as the global optimum from CVX. The objective values are different in light of the solution tolerances.
      dolphins [1]: 62 nodes, 159 edges; les miserables [2]: 77 nodes, 254 edges.
                                        Objective value          Run time
                                        SDP        LRSDP         SDP          LRSDP
      dolphins
        k=2, α=0.2, β=0                 -1.968893  -1.968329     107.03 secs  2.55 secs
        k=2, α=0.2, β=0.05              -1.969080  -1.968128     56.99 secs   2.96 secs
        k=3, α=0.3, β=0                 -2.913601  -2.915384     160.57 secs  5.39 secs
        k=3, α=0.3, β=0.05              -2.921634  -2.922252     71.83 secs   8.39 secs
      les miserables
        k=2, α=0.2, β=0                 -1.937268  -1.935365     453.96 secs  7.10 secs
        k=2, α=0.3, β=0                 -1.949212  -1.945632     447.20 secs  10.24 secs
        k=3, α=0.2, β=0.05              -2.845720  -2.845070     261.64 secs  13.53 secs
        k=3, α=0.3, β=0.05              -2.859959  -2.859565     267.07 secs  19.31 secs
      [1] D. Lusseau et al., Behavioral Ecology and Sociobiology, 2003.
      [2] D. E. Knuth. The Stanford GraphBase: A Platform for Combinatorial Computing. Addison-Wesley, 1993.
      Source slide: Yangyang Hou (Purdue CS), Low Rank Methods for Optimizing Clustering, Nov 2, 2015.
  • 33. Rounding and Improvement are both important.
      Input → Relaxed solution → Rounded solution → Improved solution
      Initialization: run NEO-K-Means on the input.
      Rounding: $f$ gives the number of clusters, $g$ gives the set of assignments.
        Option 1: use $g$ and $f$ to determine the number of assignments and go greedy.
        Option 2: just greedily assign based on $W^{-1} Y$.
      Improvement: run NEO-K-Means on the output.
  • 34. The new method is more robust, even in simple tests. Consider clustering a cycle graph.
  • 35. We use disconnected nodes to measure the cluster quality. [Figure: number of disconnected nodes vs. noise for random+onelevel neo, multilevel neo, and lrsdp.] As we increase the noise, only the LRSDP method can reliably find the true clustering.
  • 36. We get improved vector and graph clustering results too. We have improved results, impressively so on the yeast dataset, and only slightly worse on the scene data.
      Comparison of NEO-K-Means objective function values on real-world datasets from Mulan (mulan.sourceforge.net/datasets.html). By using the LRSDP solution as the initialization of the iterative algorithm, we can achieve better (smaller) objective function values.
                          worst   best    avg.
      yeast  kmeans+neo   9611    9495    9549
             lrsdp+neo    9440    9280    9364
             slrsdp+neo   9471    9231    9367
      music  kmeans+neo   87779   70158   77015
             lrsdp+neo    82323   70157   75923
             slrsdp+neo   82336   70159   75926
      scene  kmeans+neo   18905   18745   18806
             lrsdp+neo    18904   18759   18811
             slrsdp+neo   18895   18760   18810
      F1 scores on real-world vector datasets (the larger, the better). NEO-K-Means-based methods outperform other methods. The low-rank SDP method improves the clustering results.
                     moc    esp    isp    okm    kmeans+neo  lrsdp+neo  slrsdp+neo
      yeast  worst   -      0.274  0.232  0.311  0.356       0.390      0.369
             best    -      0.289  0.256  0.323  0.366       0.391      0.391
             avg.    -      0.284  0.248  0.317  0.360       0.391      0.382
      music  worst   0.530  0.514  0.506  0.524  0.526       0.537      0.541
             best    0.544  0.539  0.539  0.531  0.551       0.552      0.552
             avg.    0.538  0.526  0.517  0.527  0.543       0.545      0.547
      scene  worst   0.466  0.569  0.586  0.571  0.597       0.610      0.605
             best    0.470  0.582  0.609  0.576  0.627       0.614      0.625
             avg.    0.467  0.575  0.598  0.573  0.610       0.613      0.613
  • 37. We get improved vector and graph clustering results too. For these graphs, we dramatically improve the conductance-vs-coverage plots.
               Facebook1  Facebook2  HepPh  AstroPh
      bigclam  0.830      0.640      0.625  0.645
      demon    0.495      0.318      0.503  0.570
      oslom    0.319      0.445      0.465  0.580
      nise     0.297      0.293      0.102  0.153
      m-neo    0.285      0.269      0.206  0.190
      LRSDP    0.222      0.148      0.091  0.137
                 No. of vertices  No. of edges
      Facebook1  348              2,866
      Facebook2  756              30,780
      HepPh      11,204           117,619
      AstroPh    17,903           196,972
  • 38. Lloyd’s iterative method takes O(1 second). The LRSDP method takes O(1 hour). Now we want to improve the LRSDP time.
  • 39. We can improve the optimization beyond ALM.
      1. Proximal augmented Lagrangian (PALM): add a regularization term to the augmented Lagrangian and solve with L-BFGS-B.
         $x^{(k+1)} = \arg\min_x \; L_A(x; \lambda^{(k)}, \ldots) + \frac{1}{2\tau} \|x - x^{(k)}\|^2$
      2. ADMM method (5 blocks):
         $Y^{k+1} = \arg\min_Y L_A(Y, f^k, g^k, s^k, r^k; \lambda^k, \mu^k, \gamma^k, \sigma)$
         $f^{k+1} = \arg\min_f L_A(Y^{k+1}, f, g^k, s^k, r^k; \lambda^k, \mu^k, \gamma^k, \sigma)$
         $g^{k+1} = \arg\min_g L_A(Y^{k+1}, f^{k+1}, g, s^k, r^k; \lambda^k, \mu^k, \gamma^k, \sigma)$
         $s^{k+1} = \arg\min_s L_A(Y^{k+1}, f^{k+1}, g^{k+1}, s, r^k; \lambda^k, \mu^k, \gamma^k, \sigma)$
         $r^{k+1} = \arg\min_r L_A(Y^{k+1}, f^{k+1}, g^{k+1}, s^{k+1}, r; \lambda^k, \mu^k, \gamma^k, \sigma)$
      Convex 😊  Non-convex 😟
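To illustrate the proximal augmented Lagrangian idea in isolation, here is a toy sketch on a problem unrelated to clustering: minimize $\|x\|^2$ subject to $e^T x = 1$. Because this subproblem is a convex quadratic, the sketch solves it in closed form instead of calling L-BFGS-B as the talk does; the function name `proximal_alm`, the parameter values, and the multiplier update $\lambda \leftarrow \lambda - \sigma c(x)$ are assumptions chosen for the example.

```python
import numpy as np

def proximal_alm(x0, sigma=10.0, tau=1.0, iters=50):
    """Proximal ALM for: minimize ||x||^2 subject to e^T x = 1 (toy problem)."""
    n = x0.size
    e = np.ones(n)
    x, lam = x0.astype(float).copy(), 0.0
    # Stationarity of  ||x||^2 - lam*(e^T x - 1) + sigma/2*(e^T x - 1)^2 + ||x - x_k||^2/(2 tau)
    # gives the linear system  H x = (lam + sigma) e + x_k / tau.
    H = (2.0 + 1.0 / tau) * np.eye(n) + sigma * np.outer(e, e)
    for _ in range(iters):
        x = np.linalg.solve(H, (lam + sigma) * e + x / tau)   # proximal subproblem
        lam -= sigma * (e @ x - 1.0)                          # multiplier update
    return x, lam

x_star, lam_star = proximal_alm(np.zeros(2))
print(x_star, lam_star)
```

For two variables the iterates approach $x = (0.5, 0.5)$ with multiplier 1. The extra $\frac{1}{2\tau}\|x - x_k\|^2$ term keeps each subproblem strongly convex here, which is the role the regularization plays in PALM.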
  • 40. We had to get a new convergence result for the proximal method. Results for bound-constrained sub-problems? Ours is a small adaptation of a general result due to Pennanen (2002).
      Convergence analysis of PALM. Theorem 1: Let $(\bar{x}, \bar{\lambda})$ be a KKT pair satisfying the strong second-order sufficient condition and assume the gradients $\nabla c(\bar{x})$ are linearly independent. If the $\{\tau_k\}$ are large enough with $\tau_k \to \bar{\tau} \le \infty$ and if $\|(x_0, \lambda_0) - (\bar{x}, \bar{\lambda})\|$ is small enough, then there exists a sequence $\{(x_k, \lambda_k)\}$ conforming to Algorithm 1 along with open neighborhoods $C_k$ such that for each $k$, $x_{k+1}$ is the unique solution in $C_k$ to $(P_k)$. Then also, the sequence $\{(x_k, \lambda_k)\}$ converges linearly and Fejér monotonically to $(\bar{x}, \bar{\lambda})$ with rate $r(\bar{\tau}) < 1$ that is decreasing in $\bar{\tau}$ and $r(\bar{\tau}) \to 0$ as $\bar{\tau} \to \infty$.
  • 41. On the yeast dataset, we see no difference in objective, but faster solves. [Figure: runtimes on YEAST for iterative, ALM, PALM, and ADMM (axis 0 to 4500); f(x) values on YEAST for ALM, PALM, and ADMM (axis 8700 to 9200).]
  • 42. On yeast, we see much better discrete objectives and F1 scores. [Figure: NEO-K-Means objectives on YEAST for iterative, ALM, PALM, and ADMM (axis 9000 to 9700); F1 scores on YEAST for the same methods (axis 0.34 to 0.39).]
  • 43. Recap. For overlapping clustering of data and overlapping community detection of graphs, we have a new objective.
      •  Fast Lloyd-like iterative algorithm
      •  SDP relaxation
      •  Low-rank SDP relaxation
      •  Proximal and ADMM acceleration techniques
      1. NEO-K-means - Whang, Gleich, Dhillon, SDM 2015
      2. NEO-K-means SDP + Aug. Lagrangian - Hou, Whang, Gleich, Dhillon, KDD 2015
      3. Multiplier Methods for Overlapping K-Means - Hou, Whang, Gleich, Dhillon, Submitted
  • 44. Localized solutions of diffusion equations in large graphs. Joint with Kyle Kloster. WAW2013, KDD2014, WAW2015; J. Internet Math.
      [Figure: plot(x) and log-log error plots against the number of nonzeros.] Crawl of flickr from 2006: ~800k nodes, 6M edges, beta = 1/2. $(I - \beta P)x = (1 - \beta)s$, $\operatorname{nnz}(x) \approx 800\text{k}$, $\|D^{-1}(x - x^*)\|_1$.
      “Networks” from PDEs are usually physical. Social networks are expanders: size of set ≈ size of boundary. [Figure panels: Diffusion in a plate; Movie interest in diffusion.]
      Article excerpt shown on the slide: “[…] the answer [5]. Thus, just as in scientific computing, marrying the method to the model is key for the best scientific computing on social networks. Ultimately, none of these steps differ from the practice of physical scientific computing. The challenges in creating models, devising algorithms, validating results, and comparing models just take on different challenges when the problems come from social data instead of physical models. Thus, let us return to our starting question: What does the matrix have to do with the social network? Just as in scientific computing, many interesting problems, models, and methods for social networks boil down to matrix computations. Yet, as in the expander example above, the types of matrix questions change dramatically in order to fit social network models. Let’s see what’s been done that’s enticingly and refreshingly different from the types of matrix computations encountered in physical scientific computing. EXPANDER GRAPHS AND PARALLEL COMPUTING. Recently, a coalition of folks from academia, national labs, and industry set out to tackle the problems in parallel computing and expander graphs. They established the Graph 500 benchmark (http://www.graph500.org) to measure the performance of a parallel computer on a standard graph computation with an expander graph. Over the past three years, they’ve seen performance grow by more than 1,000-times […]”
      Sidebar excerpt: “The network, or mesh, from a typical problem in scientific computing […] in a low dimensional space, think of two or three dimensions. These physical […] limits on the size of the boundary or “surface area” of the space given its […] No such limits exist in social networks and these two sets are usually about […] size. A network with this property is called an expander network.”
  • 45. Higher order organization of complex networks. Joint with Austin Benson and Jure Leskovec. [Figure: network diagrams; one with nodes numbered 0 to 11, one with neuron labels CEPDR, CEPVR, IL2R, OLLR, RIAL, RIAR, RIVL, RIVR, RMDDR, RMDL, RMDR, RMDVL, RMFL, SMDDL, SMDDR, SMDVR, URBR.] By using a new generalization of spectral clustering methods, we are able to find completely novel and relevant structures in complex systems such as the connectome and transport networks.
  • 46. SIAM Annual Meeting (AN16), July 11-15, 2016, The Westin Waterfront, Boston, Massachusetts. David Gleich, Purdue; Mary Silber, Northwestern.
      •  Big Data, Data Science, and Privacy
      •  Education, Communication, and Policy
      •  Reproducibility and Ethics
      •  Efficiency and Optimization
      •  Integrating Models and Data (incl. computational social science, PDEs)
      •  Dynamic Networks (learning, evolution, adaptation, and cooperation)
      •  Applied Math, Statistics, and Machine Learning
      •  Earth systems; environmental/ecological applications
      •  Epidemiology
  • 47. Future work.
      •  Even faster solvers.
      •  Understand why the solution seems to be rank-2. (Solution Z from CVX is even rank 2!)
      •  Better init for Lloyd’s.
      1. NEO-K-means - Whang, Gleich, Dhillon, SDM 2015
      2. NEO-K-means SDP + Aug. Lagrangian - Hou, Whang, Gleich, Dhillon, KDD 2015
      3. Multiplier Methods for Overlapping K-Means - Hou, Whang, Gleich, Dhillon, Submitted