This talk covers algorithmic anti-differentiation, an idea for explaining the success of widely used heuristic procedures. Formally, this means finding an optimization problem that an approximation algorithm or heuristic solves exactly.
Anti-differentiating approximation algorithms: A case study with min-cuts, spectral, and flow
1. Algorithmic Anti-Differentiation
A case study with min-cuts, spectral, and flow
David F. Gleich · Purdue University
Michael W. Mahoney · Berkeley ICSI
Code: www.cs.purdue.edu/homes/dgleich/codes/l1pagerank
2. Algorithmic Anti-differentiation
Understanding how and why heuristic procedures
• Early stopping
• Truncating small entries
• etc.
are actually algorithms for implicit objectives.
2
ICML
David Gleich · Purdue
3. The ideal world
Given Problem P → Derive solution characterization C → Show algorithm A finds a solution where C holds → Profit?
Example: Given "min-cut" → Derive "max-flow is equivalent to min-cut" → Show push-relabel solves max-flow → Profit!
4. (The ideal world)′
Given Problem P → Derive approximate solution characterization C′ → Show algorithm A′ finds a solution where C′ holds → Profit?
Example: Given "sparsest-cut" → Derive the Rayleigh-quotient approximation → Show the power method finds a good Rayleigh quotient → Profit?
(In academia!)
5. The real world
Given Task P → Hack around until you find something useful → Write paper presenting "novel heuristic" H for P → Profit!
Example: Given "find-communities" → Hack around … hidden … → Write paper on "three steps of the power method finds communities" → Profit!
6. (The ideal world)′′
Understand why H works: guess and check until you find something H solves, derive a characterization of heuristic H, and show that heuristic H solves P′.
Example: Given "find-communities" → Hack around → Write paper on "three steps of the power method finds communities" → Profit!
7. The real world: Algorithmic Anti-differentiation
If your algorithm is related to optimization, this is: given a procedure X, what objective does it optimize?
Given heuristic H, is there a problem P′ such that H is an algorithm for P′?
In the smooth, unconstrained case, this is just "anti-differentiation!"
8. Algorithmic Anti-differentiation in the literature
Mahoney & Orecchia (2011): three steps of the power method and p-norm regularization.
Dhillon et al. (2007): spectral clustering, trace minimization & kernel k-means.
Saunders (1995): LSQR & Craig iterative methods for Ax = b.
… many more …
9. Outline
1. A new derivation of the PageRank vector for an
undirected graph based on Laplacians, cuts, or flows.
2. An understanding of the implicit regularization of the PageRank "push" method.
3. The impact of this on a few applications.
10. The PageRank problem
The PageRank random surfer:
1. With probability β, follow a random-walk step.
2. With probability (1−β), jump randomly according to the distribution v.
Goal: find the stationary distribution x.
With symmetric adjacency matrix A, diagonal degree matrix D, and jump vector v, the solution x satisfies
(I − βAD⁻¹)x = (1−β)v.
This is equivalent to
[αD + L]z = αv, where β = 1/(1+α) and x = Dz,
and L = D − A is the combinatorial Laplacian.
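The two formulations above can be checked numerically. The sketch below (assuming NumPy; the 4-node example graph is an illustrative choice, not one from the talk) solves both linear systems and confirms x = Dz:

```python
import numpy as np

# A small symmetric adjacency matrix (a 4-cycle with one chord);
# an arbitrary example graph, not one from the talk.
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 1.],
              [0., 1., 0., 1.],
              [1., 1., 1., 0.]])
d = A.sum(axis=1)
D = np.diag(d)
L = D - A                        # combinatorial Laplacian

beta = 0.85                      # random-walk probability
v = np.array([1., 0., 0., 0.])   # jump distribution, seeded on node 0

# Standard PageRank linear system: (I - beta*A*D^{-1}) x = (1-beta) v
x = np.linalg.solve(np.eye(4) - beta * A @ np.diag(1.0 / d),
                    (1 - beta) * v)

# Equivalent Laplacian system: (alpha*D + L) z = alpha*v, beta = 1/(1+alpha)
alpha = (1 - beta) / beta
z = np.linalg.solve(alpha * D + L, alpha * v)

assert np.allclose(x, D @ z)       # x = D z, as on the slide
assert abs(x.sum() - 1.0) < 1e-9   # x is a probability distribution
print("PageRank and Laplacian formulations agree")
```

The same check works for any β in (0, 1) and any stochastic jump vector v.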
11. The Push Algorithm for PageRank
Proposed (in closest form) in Andersen, Chung & Lang (also by McSherry, Jeh & Widom) for personalized PageRank. Strongly related to Gauss-Seidel and coordinate descent. Derived to quickly approximate PageRank with sparsity.
The push method, with parameters τ, ρ:
1. x^(1) = 0, r^(1) = (1−β)e_i, k = 1
2. while any r_j > τ·d_j (d_j is the degree of node j):
3.   x^(k+1) = x^(k) + (r_j − τ·d_j·ρ)·e_j
4.   r_i^(k+1) = τ·d_j·ρ                          if i = j
     r_i^(k+1) = r_i^(k) + β(r_j − τ·d_j·ρ)/d_j   if i ∼ j
     r_i^(k+1) = r_i^(k)                          otherwise
5.   k ← k + 1
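A minimal sketch of the push method as stated above, in Python with NumPy. The function name, the default ρ = 1/2, and the tiny example graph are illustrative assumptions; the loop follows the five steps on the slide, and the closing assertions check the residual invariant that makes the method correct:

```python
import numpy as np

def push_pagerank(A, beta, seed, tau, rho=0.5):
    """Sketch of the push method from the slide (parameters tau, rho).

    Maintains x and a residual r, and "pushes" mass at any node j whose
    residual exceeds tau * d_j. Names and loop order are illustrative.
    """
    n = A.shape[0]
    d = A.sum(axis=1)
    x = np.zeros(n)
    r = np.zeros(n)
    r[seed] = 1.0 - beta                    # r^(1) = (1-beta) e_i
    while True:
        over = np.nonzero(r > tau * d)[0]   # any r_j > tau*d_j ?
        if len(over) == 0:
            break
        j = over[0]
        delta = r[j] - tau * d[j] * rho
        x[j] += delta                       # x += (r_j - tau*d_j*rho) e_j
        r[j] = tau * d[j] * rho             # i = j case
        nbrs = np.nonzero(A[j])[0]
        r[nbrs] += beta * A[j, nbrs] * delta / d[j]   # i ~ j case
    return x, r

# Tiny example graph (4-cycle with a chord); not a graph from the talk.
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 1.],
              [0., 1., 0., 1.],
              [1., 1., 1., 0.]])
beta, tau = 0.85, 1e-4
x, r = push_pagerank(A, beta, seed=0, tau=tau)

d = A.sum(axis=1)
n = len(d)
# Invariant: r always equals the PageRank residual of the current x.
res = (1 - beta) * np.eye(n)[0] - (np.eye(n) - beta * A @ np.diag(1 / d)) @ x
assert np.allclose(res, r)
assert np.all(r <= tau * d + 1e-12)   # stopping condition holds at exit
```

Because the stopping rule only demands r_j ≤ τ·d_j, the output x is an approximation of the seeded PageRank vector whose support can stay much smaller than the graph.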
13. Why do we care about push?
1. Used for empirical studies of "communities" and as an ingredient in an empirically successful community finder (Whang et al., CIKM 2013).
2. Used for "fast PageRank" approximation.
3. It produces sparse approximations to PageRank!
Example: on Newman's netscience graph (379 vertices, 1828 non-zeros), with v having a single one, the output is "zero" on most of the nodes.
14. The s-t min-cut problem
With unweighted incidence matrix B and diagonal cost matrix C:
minimize ‖Bx‖_{C,1} = Σ_{ij∈E} C_{i,j}·|x_i − x_j|
subject to x_s = 1, x_t = 0, x ≥ 0.
15. The localized cut graph
Related to a construction used in "FlowImprove" by Andersen & Lang (2008), and in Orecchia & Zhu (2014).
A_S = [ 0        α·d_Sᵀ    0
        α·d_S    A         α·d_S̄
        0        α·d_S̄ᵀ    0 ]
Connect s to vertices in S with weight α · degree. Connect t to vertices in S̄ with weight α · degree.
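Building A_S is a small bordered-matrix construction. A sketch assuming NumPy; the indexing convention (source s first, sink t last) and the helper name are illustrative choices:

```python
import numpy as np

def localized_cut_graph(A, S, alpha):
    """Adjacency matrix A_S of the localized cut graph on the slide.

    Index 0 is the source s, indices 1..n are the original vertices,
    and index n+1 is the sink t. Layout and name are illustrative.
    """
    n = A.shape[0]
    d = A.sum(axis=1)
    in_S = np.zeros(n, dtype=bool)
    in_S[S] = True
    dS = np.where(in_S, d, 0.0)       # degree vector restricted to S
    dSbar = np.where(~in_S, d, 0.0)   # restricted to the complement S-bar
    AS = np.zeros((n + 2, n + 2))
    AS[1:n+1, 1:n+1] = A                               # original graph
    AS[0, 1:n+1] = AS[1:n+1, 0] = alpha * dS           # s -- S edges
    AS[n+1, 1:n+1] = AS[1:n+1, n+1] = alpha * dSbar    # t -- S-bar edges
    return AS

# Example use on a small arbitrary graph.
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 1.],
              [0., 1., 0., 1.],
              [1., 1., 1., 0.]])
AS = localized_cut_graph(A, S=[0, 1], alpha=0.25)
assert np.allclose(AS, AS.T)          # the augmented graph is undirected
# total s-edge weight equals alpha * vol(S)
assert np.isclose(AS[0].sum(), 0.25 * A.sum(axis=1)[[0, 1]].sum())
```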
16. The localized cut graph & PageRank
Solve the s-t min-cut:
minimize ‖B_S·x‖_{C(α),1}
subject to x_s = 1, x_t = 0, x ≥ 0.
17. The localized cut graph & PageRank
Solve the "spectral" s-t min-cut:
minimize ‖B_S·x‖_{C(α),2}
subject to x_s = 1, x_t = 0, x ≥ 0.
The PageRank vector z that solves (αD + L)z = αv with v = d_S/vol(S) is a renormalized solution of the electrical cut computation:
minimize ‖B_S·x‖_{C(α),2}
subject to x_s = 1, x_t = 0.
Specifically, if x is the solution, then
x = [ 1 ; vol(S)·z ; 0 ].
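The renormalization claim can be verified on a small example: solve the PageRank system with v = d_S/vol(S), then solve the electrical cut as a Dirichlet problem on the localized cut graph's Laplacian and compare. The graph and parameters below are illustrative assumptions, not data from the talk:

```python
import numpy as np

# Small example graph (4-cycle with a chord), chosen arbitrarily.
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 1.],
              [0., 1., 0., 1.],
              [1., 1., 1., 0.]])
n = 4
d = A.sum(axis=1)
D = np.diag(d)
L = D - A
S = [0, 1]
alpha = 0.3
volS = d[S].sum()

# PageRank vector z with v = d_S / vol(S).
v = np.zeros(n)
v[S] = d[S] / volS
z = np.linalg.solve(alpha * D + L, alpha * v)

# Electrical ("spectral") s-t cut on the localized cut graph:
# minimize ||B_S x||_{C(alpha),2}^2 with x_s = 1, x_t = 0, solved as a
# Dirichlet problem on the augmented graph's Laplacian.
dS = np.zeros(n); dS[S] = d[S]
dSbar = d - dS
AS = np.zeros((n + 2, n + 2))
AS[1:n+1, 1:n+1] = A
AS[0, 1:n+1] = AS[1:n+1, 0] = alpha * dS
AS[n+1, 1:n+1] = AS[1:n+1, n+1] = alpha * dSbar
LS = np.diag(AS.sum(axis=1)) - AS
# Interior solve with boundary values x_s = 1 (index 0), x_t = 0 (index n+1).
x_int = np.linalg.solve(LS[1:n+1, 1:n+1], -LS[1:n+1, 0])

# The slide's renormalization: the interior of x equals vol(S) * z.
assert np.allclose(x_int, volS * z)
print("PageRank is a renormalized electrical-cut solution")
```

The interior block of LS is exactly αD + L and the boundary term contributes α·d_S, which is why the two solves agree up to the vol(S) scaling.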
18. Back to the push method
Let x be the output from the push method with 0 < β < 1, v = d_S/vol(S), ρ = 1, and τ > 0. Set α = (1−β)/β and κ = τ·vol(S)/β, and let z_G solve:
minimize ½‖B_S·z‖²_{C(α),2} + κ‖Dz‖₁   (the 1-norm term is regularization for sparsity)
subject to z_s = 1, z_t = 0, z ≥ 0,
where z = [ 1 ; z_G ; 0 ].
Then x = D·z_G/vol(S) (the division by vol(S) is the needed normalization).
Proof: write out the KKT conditions and show that the push method solves them. Slackness was the "tricky" part.
19. A simple example
The vectors x_pr, z, and x(α, S) are the PageRank vectors from Theorem 1, where x(α, S) solves Prob. (4) and the others are from the problems at the end of Section 2. The vector x_cut solves the cut Prob. (2), and z_G solves Prob. (6).

Deg.   x_pr     z        x(α, S)  x_cut   z_G
2      0.0788   0.0394   0.8276   1       0.2758
4      0.1475   0.0369   0.7742   1       0.2437
7      0.2362   0.0337   0.7086   1       0.2138
4      0.1435   0.0359   0.7533   1       0.2325
4      0.1297   0.0324   0.6812   1       0.1977
7      0.1186   0.0169   0.3557   0       0
3      0.0385   0.0128   0.2693   0       0
2      0.0167   0.0083   0.1749   0       0
4      0.0487   0.0122   0.2554   0       0
3      0.0419   0.0140   0.2933   0       0

Prob. (6) solves an ℓ1-regularized ℓ2 regression problem and has 24 non-zeros. The true "min-cut" set is large in both the 2-norm PageRank problem and the regularized problem. Thus, we identify the underlying graph feature correctly, but the implicitly regularized ACL procedure does so with many fewer non-zeros than the vanilla PageRank procedure.
20. Push's sparsity helps it identify the "right" graph feature with fewer non-zeros
Figure 2. Examples of the different cut vectors on a portion of the netscience graph: the set S (16 non-zeros), the min-cut solution (15 non-zeros), the PageRank solution (284 non-zeros), and the push solution (24 non-zeros). In the left subfigure, we show the set S highlighted with its vertices enlarged. In the other subfigures, we show the solution vectors from the various cut problems (from left to right, Probs. (2), (4), and (6), solved with min-cut, PageRank, and ACL) for this set S. Each vector determines the color and size of a vertex, where high values are large and dark. White vertices with outlines are numerically non-zero (which is why most of the vertices in the fourth figure are outlined, in contrast to the third figure). The true min-cut set is large in all vectors, but the implicitly regularized problem achieves this with many fewer non-zeros than the vanilla PageRank problem.
21. It's easy to make this apply broadly
It is easy to cook up interesting diffusion-like problems and adapt them to this framework. In particular, Zhou et al. (2004) gave a semi-supervised learning diffusion we are currently studying, based on the augmented graph
[ 0      e_Sᵀ    0
  e_S    θA      e_S̄
  0      e_S̄ᵀ    0 ].
The cut problem
minimize ½‖B_S·x̂‖₂² + κ‖x̂‖₁
subject to x̂_s = 1, x̂_t = 0, x̂ ≥ 0
is equivalent to
minimize ½·xᵀ(I + θL)x − xᵀe_S + κ‖x‖₁
subject to x ≥ 0.
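The equivalence of the two objectives holds up to an additive constant coming from the fixed boundary values x̂_s = 1, x̂_t = 0, and can be spot-checked numerically. In the sketch below (graph, parameters, and function names are illustrative assumptions), both objectives take the 1-norm only over the interior vertices, so the difference is the constant |S|/2:

```python
import numpy as np

# Arbitrary small example graph and parameters, not from the talk.
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 1.],
              [0., 1., 0., 1.],
              [1., 1., 1., 0.]])
n = 4
d = A.sum(axis=1)
L = np.diag(d) - A
S = [0, 1]
eS = np.zeros(n); eS[S] = 1.0
theta, kappa = 0.5, 0.05

def cut_objective(x):
    # (1/2)||B_S x_hat||_2^2 + kappa*||x||_1 with x_s = 1, x_t = 0:
    # unit-weight s--S and t--S-bar edges, graph edges weighted by theta.
    quad = (np.sum(eS * (1 - x) ** 2) + theta * (x @ L @ x)
            + np.sum((1 - eS) * x ** 2))
    return 0.5 * quad + kappa * np.abs(x).sum()

def qp_objective(x):
    # (1/2) x^T (I + theta*L) x - x^T e_S + kappa*||x||_1
    return (0.5 * (x @ x + theta * (x @ L @ x)) - x @ eS
            + kappa * np.abs(x).sum())

rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.random(n)   # any feasible x >= 0
    assert np.isclose(cut_objective(x) - qp_objective(x), len(S) / 2)
```

Expanding the s-edge terms Σ_{i∈S}(1 − x_i)² produces the −xᵀe_S linear term and the |S|/2 constant, which is the whole reduction.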
22. Recap & Conclusions
1. "Defined" algorithmic anti-differentiation to understand why heuristics work.
2. Found equivalences with PageRank and cut/flow.
3. Push & 1-norm regularization.
Key point: we don't solve the 1-norm regularized problem with a 1-norm solver, but with the efficient push method. Run push, and you get a 1-norm regularization with early stopping.
Open issues:
• Better treatment of directed graphs?
• An algorithm for ρ < 1? (ρ is set to ½ in most "uses"; this needs new analysis, coming soon.)
• Improvements to semi-supervised learning on graphs.
David Gleich · Purdue
Supported by NSF CAREER 1149756-CCF
www.cs.purdue.edu/homes/dgleich
23. PageRank → s-t min-cut
That equivalence works if s is degree-weighted. What if s is the uniform vector?
A(s) = [ 0        α·sᵀ       0
         α·s      A          α(d − s)
         0        α(d − s)ᵀ  0 ]
MMDS 2014