Beyond Convexity –
Submodularity in Machine Learning

            Andreas Krause, Carlos Guestrin
                    Carnegie Mellon University

      International Conference on Machine Learning | July 5, 2008




                                                         Carnegie Mellon
Acknowledgements
Thanks for slides and material to Mukund Narasimhan,
Jure Leskovec and Manuel Reyes Gomez

MATLAB Toolbox and details for references available at

    http://www.submodularity.org


Algorithms implemented in the toolbox are marked [M]
Optimization in Machine Learning
Classify + from – by finding a separating hyperplane (parameters w).
[Figure: + and – points with two candidate hyperplanes w1, w2 and the max-margin choice w*]
Which one should we choose?

Define loss L(w) = “1/size of margin”
Solve for best vector w* = argmin_w L(w)

Key observation: Many problems in ML are convex!
[Figure: convex loss L(w) as a function of w]
→ no local minima!!
Feature selection
Given random variables Y, X1, … Xn
Want to predict Y from subset XA = (Xi1,…,Xik)
                            Y
                          “Sick”
                                                 Naïve Bayes
                 X1        X2         X3         Model
              “Fever”    “Rash”     “Male”




Want k most informative features:

            A* = argmax IG(XA; Y) s.t. |A| ≤ k

where IG(XA; Y) = H(Y) - H(Y | XA)
(the uncertainty about Y before knowing XA, minus the uncertainty after knowing XA)
Factoring distributions
Given random variables X1,…,Xn,
partition the variables V into sets A and V\A that are as independent as possible.
[Figure: V = {X1,…,X7} split into A = {X1,X3,X4,X6} and V\A = {X2,X5,X7}]

Formally: Want

    A* = argmin_A I(XA; X_{V\A}) s.t. 0 < |A| < n

where I(XA; XB) = H(XB) - H(XB | XA)

Fundamental building block in structure learning
[Narasimhan & Bilmes, UAI ’04]
Combinatorial problems in ML
Given a (finite) set V and a function F: 2^V → ℝ, want
    A* = argmin F(A) s.t. some constraints on A

Solving combinatorial problems:
  Mixed integer programming?
    Often difficult to scale to large problems
  Relaxations? (e.g., L1 regularization, etc.)
    Not clear when they work
  This talk:
    Fully combinatorial algorithms (spanning tree, matching, …)
    Exploit problem structure to get guarantees about the solution!
Example: Greedy algorithm for feature selection
Given: finite set V of features, utility function F(A) = IG(XA; Y)
Want: A* ⊆ V such that A* = argmax F(A) s.t. |A| ≤ k    (NP-hard!)

Greedy algorithm: [M]
  Start with A = ∅
  For i = 1 to k
    s* := argmax_s F(A ∪ {s})
    A := A ∪ {s*}

How well can this simple heuristic do?
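A minimal Python sketch of this greedy heuristic (our own code, not the MATLAB toolbox; the toy coverage-style utility stands in for information gain):

```python
def greedy(F, V, k):
    """Greedily pick k elements of V, maximizing the set function F each step."""
    A = set()
    for _ in range(k):
        # Add the element whose inclusion gives the largest F value.
        s_star = max((s for s in V if s not in A), key=lambda s: F(A | {s}))
        A.add(s_star)
    return A

# Toy stand-in utility: number of distinct "symptoms explained" by A.
explains = {"X1": {"fever"}, "X2": {"rash", "fever"}, "X3": {"male"}}
F = lambda A: len(set().union(*(explains[s] for s in A))) if A else 0
print(greedy(F, explains.keys(), 2))  # {'X2', 'X3'}
```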
Key property: Diminishing returns
Compare selection A = {} with selection B = {X2 “Rash”, X3 “Male”}:
adding the new feature X1 “Fever” to A will help a lot, but adding X1 to B doesn’t help much.

Theorem [Krause, Guestrin UAI ‘05]: Information gain F(A) in
Naïve Bayes models is submodular!

Submodularity: adding s to the small set A gives a large improvement;
adding s to the large set B ⊇ A gives a small improvement:

    For A ⊆ B, F(A ∪ {s}) – F(A) ≥ F(B ∪ {s}) – F(B)
Why is submodularity useful?

Theorem [Nemhauser et al ‘78]
The greedy maximization algorithm returns A_greedy with:
     F(A_greedy) ≥ (1 - 1/e) max_{|A| ≤ k} F(A)    (1 - 1/e ≈ 63%)

Greedy algorithm gives near-optimal solution!
More details and exact statement later
For info-gain: guarantees best possible unless P = NP!
[Krause, Guestrin UAI ’05]
Submodularity in Machine Learning
In this tutorial we will see that many ML problems reduce to
optimizing a submodular function F:
Minimization: A* = argmin F(A)
  Structure learning (A* = argmin I(XA; X_{V\A}))
  Clustering
  MAP inference in Markov Random Fields
  …
Maximization: A* = argmax F(A)
  Feature selection
  Active learning
  Ranking
  …
Tutorial Overview
   Examples and properties of submodular functions

   Submodularity and convexity

   Minimizing submodular functions

   Maximizing submodular functions

   Research directions, …

    LOTS of applications to Machine Learning!!
Submodularity
Properties and Examples
Set functions
Finite set V = {1,2,…,n}
Function F: 2^V → ℝ
Will always assume F(∅) = 0 (w.l.o.g.)
Assume a black box that can evaluate F for any input A
  Approximate (noisy) evaluation of F is ok (e.g., [37])
Example: F(A) = IG(XA; Y) = H(Y) – H(Y | XA)
               = ∑_{y, xA} P(xA, y) [log P(y | xA) – log P(y)]
For the two Naïve Bayes models above: F({X1,X2}) = 0.9, F({X2,X3}) = 0.5
Submodular set functions
Set function F on V is called submodular if
     For all A, B ⊆ V: F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B)

Equivalent diminishing returns characterization:
     For A ⊆ B, s ∉ B: F(A ∪ {s}) – F(A) ≥ F(B ∪ {s}) – F(B)
(adding s to the small set A gives a large improvement; adding s to the large set B gives a small improvement)
Submodularity and supermodularity
Set function F on V is called submodular if
     1) For all A, B ⊆ V: F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B)
     2) Equivalently: for all A ⊆ B, s ∉ B, F(A ∪ {s}) – F(A) ≥ F(B ∪ {s}) – F(B)

F is called supermodular if –F is submodular
F is called modular if F is both sub- and supermodular
     for modular (“additive”) F, F(A) = ∑_{i∈A} w(i)
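Both characterizations quantify over exponentially many sets, so they can only be checked by brute force on small ground sets. A small hedged sketch of such a checker (our own helper, purely illustrative):

```python
from itertools import chain, combinations

def is_submodular(F, V):
    """Brute-force check of diminishing returns:
    for all A ⊆ B and s ∉ B, F(A ∪ {s}) - F(A) >= F(B ∪ {s}) - F(B)."""
    V = list(V)
    subsets = [frozenset(c) for c in chain.from_iterable(
        combinations(V, r) for r in range(len(V) + 1))]
    for B in subsets:
        for A in (S for S in subsets if S <= B):
            for s in (e for e in V if e not in B):
                if F(A | {s}) - F(A) < F(B | {s}) - F(B):
                    return False
    return True

# Modular ("additive") functions pass the check:
w = {"a": 1.0, "b": 2.0}
print(is_submodular(lambda A: sum(w[i] for i in A), w))  # True
```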
Example: Set cover
Place sensors in a building: want to cover the floorplan with discs.
[Figure: building floorplan with the set V of possible sensor locations; each node predicts values of positions within some radius]
For A ⊆ V: F(A) = “area covered by sensors placed at A”
Formally:
    W finite set, collection of n subsets S_i ⊆ W
    For A ⊆ V = {1,…,n} define F(A) = |∪_{i∈A} S_i|
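The formal definition translates directly into code (a sketch; the grid-cell representation of the discs is our own toy example):

```python
def coverage(sets):
    """Build the set-cover function F(A) = |union of S_i for i in A|."""
    def F(A):
        return len(set().union(*(sets[i] for i in A))) if A else 0
    return F

# Three overlapping "discs", each represented by the grid cells it covers:
S = [{1, 2, 3}, {3, 4}, {4, 5, 6}]
F = coverage(S)
print(F({0, 1}), F({0, 1, 2}))  # 4 6
```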
Set cover is submodular
[Figure: floorplan with discs S1, S2, S3, S4 and a new disc S’]
Consider A = {S1,S2} and B = {S1,S2,S3,S4}. The area newly covered by S’ on top of the small collection A is at least the area it adds on top of the larger collection B:

    F(A ∪ {S’}) – F(A) ≥ F(B ∪ {S’}) – F(B)
Example: Mutual information
Given random variables X1,…,Xn, let
F(A) = I(XA; X_{V\A}) = H(X_{V\A}) – H(X_{V\A} | XA)

Lemma: Mutual information F(A) is submodular

F(A ∪ {s}) – F(A) = H(Xs | XA) – H(Xs | X_{V\(A∪{s})})
The first term is nonincreasing in A (A ⊆ B ⇒ H(Xs|XA) ≥ H(Xs|XB));
the second term is nondecreasing in A.
Hence δ_s(A) = F(A ∪ {s}) – F(A) is monotonically nonincreasing
→ F submodular
Example: Influence in social networks
[Kempe, Kleinberg, Tardos KDD ’03]
[Figure: network over Alice, Bob, Charlie, Dorothy, Eric, Fiona; edges labeled with the probability of influencing, e.g., 0.2–0.5]

Who should get free cell phones?
V = {Alice, Bob, Charlie, Dorothy, Eric, Fiona}
F(A) = expected number of people influenced when targeting A
Influence in social networks is submodular
[Kempe, Kleinberg, Tardos KDD ’03]

Key idea: Flip coins c in advance → “live” edges

F_c(A) = people influenced under outcome c (a set cover function!)

F(A) = ∑_c P(c) F_c(A) is submodular as well!
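The “flip coins in advance” trick doubles as a Monte-Carlo estimator for F(A): sample live edges, then count reachable nodes. A sketch under the independent cascade model (the graph and numbers are our toy example):

```python
import random

def influence(graph, A, samples=2000, seed=0):
    """Estimate F(A) = expected # influenced, averaging F_c(A) over coin flips c.

    graph: dict u -> {v: probability that u influences v}.
    Each sample fixes the "live" edges up front; F_c(A) is then just
    reachability from A (a set cover function).
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        live = {u: [v for v, p in nbrs.items() if rng.random() < p]
                for u, nbrs in graph.items()}
        reached, frontier = set(A), list(A)
        while frontier:
            for v in live.get(frontier.pop(), []):
                if v not in reached:
                    reached.add(v)
                    frontier.append(v)
        total += len(reached)
    return total / samples

g = {"Alice": {"Bob": 0.3, "Dorothy": 0.5},
     "Dorothy": {"Eric": 0.2}, "Bob": {"Charlie": 0.5}}
print(influence(g, {"Alice"}))  # ≈ 2.05 for this toy graph
```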
Closedness properties
F1,…,Fm submodular functions on V and λ1,…,λm > 0
Then: F(A) = ∑_i λ_i F_i(A) is submodular!

Submodularity is closed under nonnegative linear combinations!

Extremely useful fact!!
  F_θ(A) submodular ⇒ ∑_θ P(θ) F_θ(A) submodular!
  Multicriterion optimization:
    F1,…,Fm submodular, λ_i ≥ 0 ⇒ ∑_i λ_i F_i(A) submodular
Submodularity and Concavity
Suppose g: ℕ → ℝ and F(A) = g(|A|)
Then F(A) is submodular if and only if g is concave!
  E.g., g could say “buying in bulk is cheaper”
[Figure: concave g(|A|) as a function of |A|]
Maximum of submodular functions
Suppose F1(A) and F2(A) are submodular.
Is F(A) = max(F1(A), F2(A)) submodular?
[Figure: pointwise maximum of two curves F1(A) and F2(A) over |A|]

max(F1, F2) is not submodular in general!
Minimum of submodular functions
Well, maybe F(A) = min(F1(A), F2(A)) instead?

    A        F1(A)   F2(A)   F(A)
    ∅        0       0       0
    {a}      1       0       0
    {b}      0       1       0
    {a,b}    1       1       1

Here F({b}) – F(∅) = 0 < F({a,b}) – F({a}) = 1,
so min(F1, F2) is not submodular in general!

But stay tuned – we’ll address min_i F_i later!
Duality
For F submodular on V, let G(A) = F(V) – F(V\A)
G is supermodular and called dual to F
Details about properties in [Fujishige ’91]
[Figure: F(A) and its dual G(A) as functions of |A|]
Tutorial Overview
Examples and properties of submodular functions
  Many problems submodular (mutual information, influence, …)
  SFs closed under positive linear combinations;
  not under min, max
Submodularity and convexity

Minimizing submodular functions

Maximizing submodular functions

Extensions and research directions
Submodularity and Convexity
Submodularity and convexity
For V = {1,…,n} and A ⊆ V, let
    w^A = (w1^A,…,wn^A) with w_i^A = 1 if i ∈ A, 0 otherwise

Key result [Lovasz ’83]: Every submodular function
F induces a function g on ℝ^n_+, such that
  F(A) = g(w^A) for all A ⊆ V
  g(w) is convex
  min_A F(A) = min_w g(w) s.t. w ∈ [0,1]^n

Let’s see how one can define g(w)
The submodular polyhedron P_F
P_F = {x ∈ ℝ^n : x(A) ≤ F(A) for all A ⊆ V}, where x(A) = ∑_{i∈A} x_i

Example: V = {a,b} with
    A        F(A)
    ∅        0
    {a}      -1
    {b}      2
    {a,b}    0
[Figure: P_F in the (x_a, x_b) plane, cut out by x({a}) ≤ F({a}), x({b}) ≤ F({b}), x({a,b}) ≤ F({a,b})]
Lovasz extension
Claim: g(w) = max_{x ∈ P_F} w^T x
where P_F = {x ∈ ℝ^n : x(A) ≤ F(A) for all A ⊆ V}
[Figure: for a given direction w, x_w = argmax_{x ∈ P_F} w^T x and g(w) = w^T x_w]

Evaluating g(w) this way requires solving a linear program with
exponentially many constraints!
Evaluating the Lovasz extension

g(w) = max_{x ∈ P_F} w^T x
P_F = {x ∈ ℝ^n : x(A) ≤ F(A) for all A ⊆ V}

Theorem [Edmonds ’71, Lovasz ‘83]:
For any given w, can get an optimal solution x_w to the LP
using the following greedy algorithm: [M]
    Order V = {e1,…,en} so that w(e1) ≥ … ≥ w(en)
    Let x_w(e_i) = F({e1,…,e_i}) – F({e1,…,e_{i-1}})
Then w^T x_w = g(w) = max_{x ∈ P_F} w^T x

Sanity check: If w = w^A and A = {e1,…,ek}, then
    w^A·x_w = ∑_{i=1}^k [F({e1,…,e_i}) – F({e1,…,e_{i-1}})] = F(A)
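Edmonds’ greedy rule makes g(w) cheap to evaluate: sort by weight, then accumulate telescoping marginal gains. A sketch of our own (F takes a frozenset and F(∅) = 0):

```python
def lovasz_extension(F, V, w):
    """Evaluate g(w) = max_{x in P_F} w^T x via Edmonds' greedy algorithm."""
    order = sorted(V, key=lambda e: -w[e])   # w(e1) >= ... >= w(en)
    g, x, prefix, f_prev = 0.0, {}, set(), 0.0
    for e in order:
        prefix.add(e)
        f_cur = F(frozenset(prefix))
        x[e] = f_cur - f_prev  # x_w(e_i) = F({e1..ei}) - F({e1..e_{i-1}})
        g += w[e] * x[e]
        f_prev = f_cur
    return g, x

# The running example: F(∅)=0, F({a})=-1, F({b})=2, F({a,b})=0
table = {frozenset(): 0, frozenset("a"): -1, frozenset("b"): 2, frozenset("ab"): 0}
print(lovasz_extension(lambda A: table[A], "ab", {"a": 0, "b": 1}))
# (2.0, {'b': 2, 'a': -2}) — matching g([0,1]) = 2 = F({b}) on the next slide
```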
Example: Lovasz extension
g(w) = max {w^T x : x ∈ P_F}, with F as before:
    A        F(A)
    ∅        0
    {a}      -1
    {b}      2
    {a,b}    0
Take w = [0,1], i.e., w(a) = 0, w(b) = 1; we want g(w).
Greedy ordering: e1 = b, e2 = a, since w(e1) = 1 > w(e2) = 0
    x_w(e1) = F({b}) – F(∅) = 2
    x_w(e2) = F({b,a}) – F({b}) = -2
so x_w = [-2,2] and g([0,1]) = [0,1]^T [-2,2] = 2 = F({b})
Similarly, with w = [1,1]: x_w = [-1,1] and g([1,1]) = [1,1]^T [-1,1] = 0 = F({a,b})
Why is this useful?

Theorem [Lovasz ’83]:
g(w) attains its minimum in [0,1]^n at a corner!

If we can minimize g on [0,1]^n, we can minimize F…
    (at corners, g and F take the same values)
F(A) submodular ⇒ g(w) convex (and efficient to evaluate)

Does the converse also hold, i.e., if some convex g agrees with F at the corners, must F be submodular?
No: consider g(w1,w2,w3) = max(w1, w2+w3) on V = {a,b,c}. g is convex, but
F({a,b}) – F({a}) = 0 < F({a,b,c}) – F({a,c}) = 1, violating diminishing returns.
Tutorial Overview
Examples and properties of submodular functions
  Many problems submodular (mutual information, influence, …)
  SFs closed under positive linear combinations;
  not under min, max
Submodularity and convexity
  Every SF induces a convex function with SAME minimum
  Special properties: Greedy solves LP over exponential polytope
Minimizing submodular functions

Maximizing submodular functions

Extensions and research directions
Minimization of submodular functions
Overview minimization
Minimizing general submodular functions



Minimizing symmetric submodular functions



Applications to Machine Learning




Minimizing a submodular function
Want to solve      A* = argmin_A F(A)
Need to solve
    min_w max_x w^T x    s.t. w ∈ [0,1]^n, x ∈ P_F
(the inner maximum is g(w))

Equivalently:
    min_{c,w} c
    s.t. c ≥ w^T x for all x ∈ P_F
         w ∈ [0,1]^n
Ellipsoid algorithm
[Grötschel, Lovasz, Schrijver ’81]
[Figure: ellipsoid method shrinking the feasible region along the optimality direction]
    min_{c,w} c
    s.t. c ≥ w^T x for all x ∈ P_F
         w ∈ [0,1]^n
Separation oracle: find the most violated constraint:
    max_x w^T x – c s.t. x ∈ P_F
Can solve separation using the greedy algorithm!!
→ Ellipsoid algorithm minimizes SFs in poly-time!
Minimizing submodular functions
Ellipsoid algorithm not very practical
Want a combinatorial algorithm for minimization!

Theorem [Iwata (2001)]
There is a fully combinatorial, strongly polynomial
algorithm for minimizing SFs, that runs in time
    O(n^8 log^2 n)

Polynomial-time = practical???
A more practical alternative?
[Fujishige ’91, Fujishige et al ‘06]
Example, with F as before:
    A        F(A)
    ∅        0
    {a}      -1
    {b}      2
    {a,b}    0
Base polytope: B_F = P_F ∩ {x : x(V) = F(V)}
[Figure: B_F is the face of P_F where x({a,b}) = F({a,b}); its minimum norm point is x* = [-1,1]]

Minimum norm algorithm: [M]
    1. Find x* = argmin ||x||_2 s.t. x ∈ B_F      (here: x* = [-1,1])
    2. Return A* = {i : x*(i) < 0}                (here: A* = {a})

Theorem [Fujishige ’91]: A* is an optimal solution!
Note: can solve step 1 using Wolfe’s algorithm
Runtime finite but unknown!!
Empirical comparison
[Fujishige et al ’06]
[Figure: running time in seconds (log scale, lower is better) vs. problem size (log scale, 64–1024), on cut functions from the DIMACS Challenge]
Minimum norm algorithm is orders of magnitude faster!
Our implementation can solve n = 10k in < 6 minutes!
Checking optimality (duality)
Example, with F and B_F = P_F ∩ {x : x(V) = F(V)} as before.

Theorem [Edmonds ’70]
min_A F(A) = max_x {x^–(V) : x ∈ B_F}
where x^–(s) = min {x(s), 0}

Testing how close A’ is to min_A F(A):
  Run the greedy algorithm for w = w^{A’} to get x_w
  Then F(A’) ≥ min_A F(A) ≥ x_w^–(V)
E.g., for A = {a}, F(A) = -1: w = [1,0], x_w = [-1,1], x_w^– = [-1,0],
so x_w^–(V) = -1 → A is optimal!
Overview minimization
Minimizing general submodular functions
  Can minimize in polytime using the ellipsoid method
  Combinatorial, strongly polynomial algorithm: O(n^8)
  Practical alternative: minimum norm algorithm?
Minimizing symmetric submodular functions

Applications to Machine Learning
What if we have special structure?
Worst-case complexity of the best known algorithm: O(n^8 log^2 n)

Can we do better for special cases?

Example (again): Given RVs X1,…,Xn,
F(A) = I(XA; X_{V\A}) = I(X_{V\A}; XA) = F(V\A)

Functions F with F(A) = F(V\A) for all A are called symmetric
Another example: Cut functions
[Figure: weighted graph on V = {a,b,c,d,e,f,g,h} with edge weights 1–3]
    F(A) = ∑ {w_{s,t} : s ∈ A, t ∈ V\A}
Example: F({a}) = 6; F({c,d}) = 10; F({a,b,c,d}) = 2

Cut functions are symmetric and submodular!
Minimizing symmetric functions
For any A, submodularity implies
    2 F(A) = F(A) + F(V\A)
           ≥ F(A ∩ (V\A)) + F(A ∪ (V\A))
           = F(∅) + F(V) = 2 F(∅) = 0
Hence, any symmetric SF attains its minimum at ∅
In practice, we want a nontrivial partition of V into
A and V\A, i.e., require that A is neither ∅ nor V

Want A* = argmin F(A) s.t. 0 < |A| < n

There is an efficient algorithm for doing that!
Queyranne’s algorithm (overview)
[Queyranne ’98]
Theorem: There is a fully combinatorial, strongly polynomial algorithm for solving [M]

    A* = argmin_A F(A) s.t. 0 < |A| < n

for symmetric submodular functions F

Runs in time O(n^3) [instead of O(n^8)…]

Note: also works for “posimodular” functions:
    F posimodular ⇔ for all A, B ⊆ V: F(A) + F(B) ≥ F(A\B) + F(B\A)
Gomory-Hu trees
[Figure: weighted graph G on {a,…,h} and a corresponding tree T]
A tree T is called a Gomory-Hu (GH) tree for SF F
if for any s, t ∈ V it holds that
    min {F(A) : s ∈ A and t ∉ A} =
    min {w_{i,j} : (i,j) is an edge on the s-t path in T}
“min s-t cut in T = min s-t cut in G”

Theorem [Queyranne ‘93]: GH-trees exist for any symmetric SF F!
But they are expensive to find in general!
Pendent pairs
For a function F on V and s, t ∈ V: (s,t) is a pendent pair if
    {s} ∈ argmin_A F(A) s.t. s ∈ A, t ∉ A
Pendent pairs always exist:
[Figure: Gomory-Hu tree T on {a,…,h} with edge weights]
Take any leaf s and its neighbor t; then (s,t) is pendent!
E.g., (a,c), (b,c), (f,e), …

Theorem [Queyranne ’95]: Can find pendent pairs in O(n^2)
(without needing the GH-tree!)
Why are pendent pairs useful?
Key idea: Let (s,t) be pendent and A* = argmin F(A). Then EITHER
  s and t are separated by A*, e.g., s ∈ A*, t ∉ A*.
  But then A* = {s}!! OR
  s and t are not separated by A* (both inside, or both outside).
In that case we can merge s and t…
Merging
Suppose F is a symmetric SF on V,
and we want to merge the pendent pair (s,t)
Key idea: “If we pick s, we get t for free”
    V’ = V \ {t}
    F’(A) = F(A ∪ {t})  if s ∈ A
          = F(A)        if s ∉ A
Lemma: F’ is still symmetric and submodular!
[Figure: merging (a,c) in the example graph contracts the two nodes and adds up the parallel edge weights]
Queyranne’s algorithm
Input:  symmetric SF F on V, |V| = n
Output: A* = argmin F(A) s.t. 0 < |A| < n

Initialize F’ ← F, and V’ ← V
For i = 1 to n-1
    (s,t) ← pendentPair(F’,V’)
    A_i = {s}
    (F’,V’) ← merge(F’,V’,s,t)
Return argmin_i F(A_i)

Running time: O(n^3) function evaluations
Note: Finding pendent pairs

Initialize v1 ← x (x is an arbitrary element of V)
For i = 1 to n-1 do
    W_i ← {v1,…,vi}
    v_{i+1} ← argmin_v F(W_i ∪ {v}) - F({v}) s.t. v ∈ V\W_i
Return pendent pair (v_{n-1}, v_n)

Requires O(n^2) evaluations of F
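Putting the two routines together gives a compact, if unoptimized, implementation. This is our own Python sketch of the pseudocode above (F takes a frozenset; elements are tracked as merged “groups” so that merging is a set union):

```python
def queyranne(F, V):
    """Minimize symmetric submodular F over A with 0 < |A| < n.

    Repeatedly find a pendent pair, record the candidate {s}, merge s and t;
    return the best recorded candidate. O(n^3) evaluations of F.
    """
    groups = [frozenset([v]) for v in V]

    def Fg(gs):  # F' on the merged ground set = F on the union of the groups
        return F(frozenset().union(*gs))

    candidates = []
    while len(groups) > 1:
        # pendentPair: grow W greedily, minimizing F(W ∪ {v}) - F({v})
        W, rest = [groups[0]], groups[1:]
        while rest:
            v = min(rest, key=lambda g: Fg(W + [g]) - Fg([g]))
            W.append(v)
            rest.remove(v)
        t, s = W[-2], W[-1]   # pendent pair (v_{n-1}, v_n); {s} is a candidate
        candidates.append(s)
        groups = [g for g in groups if g != s and g != t] + [s | t]

    return min(candidates, key=F)

# Example: minimum nontrivial cut of a small weighted graph
edges = {("a", "b"): 2, ("b", "c"): 1, ("c", "d"): 3, ("a", "c"): 1}
cut = lambda A: sum(w for (u, v), w in edges.items() if (u in A) != (v in A))
print(queyranne(cut, "abcd"))  # frozenset({'a','b'}) or its complement; cut = 2
```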
Overview minimization
Minimizing general submodular functions
  Can minimize in polytime using the ellipsoid method
  Combinatorial, strongly polynomial algorithm: O(n^8)
  Practical alternative: minimum norm algorithm?
Minimizing symmetric submodular functions
  Many useful submodular functions are symmetric
  Queyranne’s algorithm minimizes symmetric SFs in O(n^3)
Applications to Machine Learning
Application: Clustering
[Narasimhan, Jojic, Bilmes NIPS ’05]
Group data points V into “homogeneous clusters”
Find a partition V = A1 ∪ … ∪ Ak that minimizes

    F(A1,…,Ak) = ∑_i E(A_i)

where E(A_i) is the “inhomogeneity of A_i”. Examples for E(A):
  Entropy H(A)
  Cut function
Special case k = 2: then F(A) = E(A) + E(V\A) is symmetric!
If E is submodular, we can use Queyranne’s algorithm!
What if we want k > 2 clusters?
[Zhao et al ’05, Narasimhan et al ‘05]

Greedy Splitting algorithm: [M] (a sketch appears below)
Start with partition P = {V}
For i = 1 to k-1
  For each member C_j ∈ P do
    split cluster C_j:
        A* = argmin E(A) + E(C_j\A) s.t. 0 < |A| < |C_j|
        P_j ← P \ {C_j} ∪ {A*, C_j\A*}
    (P_j is the partition we get by splitting the j-th cluster)
  P ← argmin_j F(P_j)

Theorem: F(P) ≤ (2 - 2/k) F(P_opt)
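A compact sketch of greedy splitting (our own code, reusing the `queyranne` routine from above to solve each symmetric split; assumes k ≤ |V| and E(∅) = 0):

```python
def greedy_splitting(E, V, k):
    """Partition V into k clusters, greedily minimizing F(P) = sum_i E(A_i).

    E: submodular "inhomogeneity" on frozensets. Each candidate split of a
    cluster C minimizes the symmetric function A -> E(A) + E(C \ A) with
    Queyranne's algorithm. Guarantee: F(P) <= (2 - 2/k) F(P_opt).
    """
    P = [frozenset(V)]
    for _ in range(k - 1):
        best = None
        for j, C in enumerate(P):
            if len(C) < 2:
                continue  # singleton clusters cannot be split
            A = queyranne(lambda S: E(S) + E(C - S), list(C))
            Pj = P[:j] + P[j + 1:] + [A, C - A]
            cost = sum(E(B) for B in Pj)
            if best is None or cost < best[0]:
                best = (cost, Pj)
        P = best[1]
    return P
```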
Example: Clustering species
[Narasimhan et al ‘05]
[Figure: genome sequences of several species]
Common genetic information = # of common substrings

Can easily extend to sets of species
Example: Clustering species
    [Narasimhan et al ‘05]
The common genetic information ICG
  does not require alignment
  captures genetic similarity
  is smallest for maximally evolutionarily diverged species
  is a symmetric submodular function! 


Greedy splitting algorithm yields phylogenetic tree!




Example: SNPs
   [Narasimhan et al ‘05]
Study human genetic variation
(for personalized medicine, …)
Most human variation due to point mutations that occur
once in human history at that base location:
   Single Nucleotide Polymorphisms (SNPs)
Cataloging all variation too expensive
($10K-$100K per individual!!)




SNPs in the ACE gene
[Narasimhan et al ‘05]
[Figure: binary matrix; rows: individuals, columns: SNPs]
Which columns should we pick to reconstruct the rest?
Can find a near-optimal clustering (Queyranne’s algorithm)
Reconstruction accuracy
[Narasimhan et al ‘05]
[Figure: reconstruction accuracy vs. # of clusters]
Comparison with clustering based on
  Entropy
  Prediction accuracy
  Pairwise correlation
  PCA
Example: Speaker segmentation
[Reyes-Gomez, Jojic ‘07]
Given mixed waveforms of two speakers (e.g., Alice and Fiona each saying an unknown word such as “308” or “217”), partition the spectrogram (the time-frequency plane) into one region per speaker.
E(A) = -log p(XA): likelihood of “region” A
F(A) = E(A) + E(V\A) is symmetric & posimodular
→ partition the spectrogram using Queyranne’s algorithm
[Figure: spectrogram (frequency vs. time) with region A assigned to “Fiona”]
Example: Image denoising
[Figure: noisy input image and denoised result]
Example: Image denoising
Pairwise Markov Random Field:
[Figure: grid MRF; X_i: noisy pixels, Y_i: “true” pixels]
P(x1,…,xn, y1,…,yn) = ∏_{i,j} ψ_{i,j}(y_i,y_j) ∏_i φ_i(x_i,y_i)

Want argmax_y P(y | x)
   = argmax_y log P(x,y)
   = argmin_y ∑_{i,j} E_{i,j}(y_i,y_j) + ∑_i E_i(y_i)
where E_{i,j}(y_i,y_j) = -log ψ_{i,j}(y_i,y_j)

When is this MAP inference efficiently solvable
(in high-treewidth graphical models)?
MAP inference in Markov Random Fields
[Kolmogorov et al, PAMI ’04; see also: Hammer, Ops Res ‘65]

Energy E(y) = ∑_{i,j} E_{i,j}(y_i,y_j) + ∑_i E_i(y_i)

Suppose the y_i are binary; define
    F(A) = E(y^A) where y^A_i = 1 iff i ∈ A
Then min_y E(y) = min_A F(A)

Theorem
MAP inference problem solvable by graph cuts
  ⇔ For all i,j: E_{i,j}(0,0) + E_{i,j}(1,1) ≤ E_{i,j}(0,1) + E_{i,j}(1,0)
  ⇔ each E_{i,j} is submodular
“Efficient if we prefer that neighboring pixels have the same color”
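The per-edge condition is a one-liner to test (a small helper of our own, not from the paper):

```python
def pairwise_submodular(E_ij):
    """Graph-cut solvability test for one binary pairwise energy term.

    E_ij is a 2x2 table E_ij[yi][yj]; submodular iff
    E(0,0) + E(1,1) <= E(0,1) + E(1,0).
    """
    return E_ij[0][0] + E_ij[1][1] <= E_ij[0][1] + E_ij[1][0]

# A Potts-style smoothing term (penalty 1 when neighbors disagree) qualifies:
print(pairwise_submodular([[0, 1], [1, 0]]))  # True
```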
Constrained minimization
Have seen: if F is submodular on V, we can solve
    A* = argmin F(A)    s.t. A ⊆ V

What about
    A* = argmin F(A)    s.t. A ⊆ V and |A| ≤ k ?
E.g., clustering with a minimum # of points per cluster, …

In general, not much is known about constrained minimization. However, we can do:
  A* = argmin F(A) s.t. 0 < |A| < n
  A* = argmin F(A) s.t. |A| is odd/even [Goemans & Ramakrishnan ‘95]
  A* = argmin F(A) s.t. A ∈ argmin G(A) for G submodular [Fujishige ’91]
Overview minimization
Minimizing general submodular functions
  Can minimize in polytime using the ellipsoid method
  Combinatorial, strongly polynomial algorithm: O(n^8)
  Practical alternative: minimum norm algorithm?
Minimizing symmetric submodular functions
  Many useful submodular functions are symmetric
  Queyranne’s algorithm minimizes symmetric SFs in O(n^3)
Applications to Machine Learning
  Clustering [Narasimhan et al ’05]
  Speaker segmentation [Reyes-Gomez & Jojic ’07]
  MAP inference [Kolmogorov et al ’04]
Tutorial Overview
Examples and properties of submodular functions
   Many problems submodular (mutual information, influence, …)
   SFs closed under positive linear combinations; not under min, max
Submodularity and convexity
   Every SF induces a convex function with SAME minimum
   Special properties: Greedy solves LP over exponential polytope
Minimizing submodular functions
   Minimization possible in polynomial time (but O(n^8)…)
   Queyranne’s algorithm minimizes symmetric SFs in O(n^3)
   Useful for clustering, MAP inference, structure learning, …
Maximizing submodular functions

Extensions and research directions
Maximizing submodular functions
Maximizing submodular functions

Minimizing convex functions: polynomial-time solvable!
Minimizing submodular functions: polynomial-time solvable!

Maximizing convex functions: NP-hard!
Maximizing submodular functions: NP-hard!
  But we can get approximation guarantees
Maximizing influence
[Kempe, Kleinberg, Tardos KDD ’03]
[Figure: social network with influence probabilities, as before]

F(A) = expected # of people influenced when targeting A
F monotonic: if A ⊆ B then F(A) ≤ F(B). Hence V = argmax_A F(A)
More interesting: argmax_A F(A) – Cost(A)
Maximizing non-monotonic functions
Suppose we want, for a non-monotonic F,
    A* = argmax F(A) s.t. A ⊆ V
[Figure: F(A) vs. |A| with an interior maximum]
Example:
  F(A) = U(A) – C(A) where U(A) is a submodular utility,
  and C(A) is a supermodular cost function
  E.g.: trading off utility and privacy in personalized search
  [Krause & Horvitz AAAI ’08]

In general: NP-hard. Moreover:
If F(A) can take negative values, it is as hard to approximate as
maximum independent set (hard to approximate within a factor of n^{1-ε})
Maximizing positive submodular functions
[Feige, Mirrokni, Vondrak FOCS ’07]

Theorem
There is an efficient randomized local-search procedure that,
given a positive submodular function F with F(∅) = 0,
returns a set A_LS such that
    F(A_LS) ≥ (2/5) max_A F(A)

Picking a random set gives a 1/4 approximation
(1/2 approximation if F is symmetric!)
We cannot get better than a 3/4 approximation unless P = NP
Scalarization vs. constrained maximization
Given monotonic utility F(A) and cost C(A), optimize either:
  Option 1 (“scalarization”):              max_A F(A) – C(A)  s.t. A ⊆ V
  Option 2 (“constrained maximization”):   max_A F(A)  s.t. C(A) ≤ B

Option 1: can get a 2/5 approximation… but only if F(A) – C(A) ≥ 0
for all A ⊆ V, and positiveness is a strong requirement.
Option 2: coming up…
Constrained maximization: Outline
    max_A F(A)  s.t.  C(A) ≤ B
(F monotonic submodular, A the selected set, C the selection cost, B the budget)

  Subset selection: C(A) = |A|
  Robust optimization
  Complex constraints
Monotonicity
A set function is called monotonic if
    A ⊆ B ⊆ V ⇒ F(A) ≤ F(B)

Examples:
  Influence in social networks [Kempe et al KDD ’03]
  For discrete RVs, entropy F(A) = H(XA) is monotonic:
    suppose B = A ∪ C; then
    F(B) = H(XA, XC) = H(XA) + H(XC | XA) ≥ H(XA) = F(A)
  Information gain: F(A) = H(Y) - H(Y | XA)
  Set cover
  Matroid rank functions (dimension of vector spaces, …)
  …
Subset selection
Given:    finite set V, monotonic submodular function F, F(∅) = 0
Want:     A* ⊆ V such that
    A* = argmax F(A) s.t. |A| ≤ k
NP-hard!
Exact maximization of monotonic submodular functions
1) Mixed integer programming [Nemhauser et al ’81]

    max η
    s.t. η ≤ F(B) + ∑_{s ∈ V\B} α_s δ_s(B)  for all B ⊆ V
         ∑_s α_s ≤ k
         α_s ∈ {0,1}
    where δ_s(B) = F(B ∪ {s}) – F(B)

  Solved using constraint generation
2) Branch-and-bound: “data-correcting algorithm” [Goldengorin et al ’99] [M]

Both algorithms are worst-case exponential!
Approximate maximization
Given: finite set V, monotonic submodular function F(A)
Want:  A* ⊆ V such that
    A* = argmax F(A) s.t. |A| ≤ k    (NP-hard!)

Greedy algorithm: [M]
  Start with A_0 = ∅
  For i = 1 to k
    s_i := argmax_s F(A_{i-1} ∪ {s}) - F(A_{i-1})
    A_i := A_{i-1} ∪ {s_i}
Performance of greedy algorithm

Theorem [Nemhauser et al ‘78]
Given a monotonic submodular function F with F(∅) = 0, the
greedy maximization algorithm returns A_greedy with
    F(A_greedy) ≥ (1 - 1/e) max_{|A| ≤ k} F(A)    (1 - 1/e ≈ 63%)

Sidenote: the greedy algorithm gives a 1/2 approximation for
maximization over any matroid C! [Fisher et al ’78]
An “elementary” counterexample
X1, X2 ~ Bernoulli(0.5); Y = X1 XOR X2

Let F(A) = IG(XA; Y) = H(Y) – H(Y|XA)

Y | X1 and Y | X2 ~ Bernoulli(0.5) (entropy 1)
Y | X1, X2 is deterministic! (entropy 0)

Hence F({1,2}) – F({1}) = 1, but F({2}) – F(∅) = 0
→ F(A) is only submodular under some conditions! (later)
Example: Submodularity of info-gain
Y1,…,Ym, X1, …, Xn discrete RVs
F(A) = IG(Y; XA) = H(Y) - H(Y | XA)
F(A) is always monotonic
However, NOT always submodular

Theorem [Krause & Guestrin UAI’ 05]
If the X_i are all conditionally independent given Y,
then F(A) is submodular!
[Figure: graphical model with Y1, Y2, Y3 as parents of X1,…,X4]

Hence, the greedy algorithm works!
In fact, NO algorithm can do better than a (1-1/e) approximation!
Building a Sensing Chair
[Mutlu, Krause, Forlizzi, Guestrin, Hodgins UIST ‘07]
People sit a lot
Activity recognition in assistive technologies
Seating pressure as user interface
[Figure: chair equipped with 1 sensor per cm^2; costs $16,000! Postures: lean left, lean forward, slouch, …]
82% accuracy on 10 postures! [Tan et al]
Can we get similar accuracy with fewer, cheaper sensors?
How to place sensors on a chair?
Model sensor readings at locations V as random variables
Predict posture Y using a probabilistic model P(Y,V)
Pick sensor locations A* ⊆ V to minimize entropy:
    A* = argmin_A H(Y | XA)  s.t. |A| ≤ k

Placed sensors, did a user study:
              Accuracy     Cost
    Before    82%          $16,000
    After     79%          $100

Similar accuracy at <1% of the cost!
Variance reduction
(a.k.a. orthogonal matching pursuit, forward regression)
Let Y = ∑_i α_i X_i + ε, with (X1,…,Xn, ε) ∼ N(·; μ, Σ)
Want to pick a subset XA to predict Y
Var(Y | XA = xA): conditional variance of Y given XA = xA
Expected variance: Var(Y | XA) = ∫ p(xA) Var(Y | XA = xA) dxA
Variance reduction: F_V(A) = Var(Y) – Var(Y | XA)

F_V(A) is always monotonic

Theorem [Das & Kempe, STOC ’08]: F_V(A) is submodular*
(*under some conditions on Σ)
→ Orthogonal matching pursuit near optimal!
[see other analyses by Tropp, Donoho et al., and Temlyakov]
Batch mode active learning [Hoi et al, ICML’06]
Which data points should we label to minimize error?
[Figure: partially labeled data set with + and – labels and many unlabeled points o]
Want a batch A of k points to show an expert for labeling.

F(A) selects examples that are
  uncertain [σ²(s) = π(s)(1 - π(s)) is large]
  diverse (points in A are as different as possible)
  relevant (as close to V\A as possible, s^T s’ large)
F(A) is submodular and monotonic!
[approximation to the improvement in Fisher information]
Results about Active Learning
[Hoi et al, ICML’06]
[Figure: classification performance vs. number of labeled examples]
Batch mode active learning performs better than
  picking k points at random
  picking k points of highest entropy
Monitoring water networks
[Krause et al, J Wat Res Mgt 2008]
Contamination of drinking water could affect millions of people
[Figure: water distribution network (simulator from EPA) with a contamination event and sensors; a Hach sensor costs ~$14K]
Place sensors to detect contaminations
“Battle of the Water Sensor Networks” competition
Where should we place sensors to quickly detect contamination?
Model-based sensing
Utility of placing sensors is based on a model of the world
  For water networks: water flow simulator from EPA
F(A) = expected impact reduction when placing sensors at A
[Figure: set V of all network junctions; the model predicts high-, medium-, and low-impact contamination locations; a sensor reduces impact through early detection. E.g., high impact reduction: F(A) = 0.9; low impact reduction: F(A) = 0.01]

Theorem [Krause et al., J Wat Res Mgt ’08]:
Impact reduction F(A) in water networks is submodular!
Battle of the Water Sensor Networks Competition
Real metropolitan area network (12,527 nodes)
Water flow simulator provided by EPA
3.6 million contamination events
Multiple objectives:
  Detection time, affected population, …
Place sensors that detect well “on average”




Bounds on optimal solution
[Krause et al., J Wat Res Mgt ’08]
[Figure: population protected F(A) (higher is better) vs. number of sensors placed (0–20) on the water networks data; the greedy solution sits well below the offline (Nemhauser) bound F(A_greedy)/(1 - 1/e)]
The (1-1/e) bound is quite loose… can we get better bounds?
Data-dependent bounds
[Minoux ’78]
Suppose A is a candidate solution to
    argmax F(A) s.t. |A| ≤ k
and let A* = {s1,…,sk} be an optimal solution. Then

    F(A*) ≤ F(A ∪ A*)
          = F(A) + ∑_i [F(A ∪ {s1,…,si}) - F(A ∪ {s1,…,s_{i-1}})]
          ≤ F(A) + ∑_i [F(A ∪ {si}) - F(A)]
          = F(A) + ∑_i δ_{si}

For each s ∈ V\A, let δ_s = F(A ∪ {s}) - F(A), and order so that
δ_1 ≥ δ_2 ≥ … ≥ δ_n. Then F(A*) ≤ F(A) + ∑_{i=1}^k δ_i.  [M]
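As a small sketch (our own helper), computing this bound is one pass over the remaining elements:

```python
def data_dependent_bound(F, V, A, k):
    """Minoux's online upper bound on max_{|S| <= k} F(S), given candidate A.

    F(A*) <= F(A) + sum of the k largest marginals F(A ∪ {s}) - F(A).
    """
    A = frozenset(A)
    fA = F(A)
    deltas = sorted((F(A | {s}) - fA for s in V if s not in A), reverse=True)
    return fA + sum(deltas[:k])
```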
Bounds on optimal solution
[Krause et al., J Wat Res Mgt ’08]
[Figure: sensing quality F(A) (higher is better) vs. number of sensors placed on the water networks data; the data-dependent bound lies much closer to the greedy solution than the offline (Nemhauser) bound]
Submodularity gives data-dependent bounds on the
performance of any algorithm
BWSN Competition results
[Ostfeld et al., J Wat Res Mgt 2008]
13 participants
Performance measured in 30 different criteria
  G: Genetic algorithm    H: Other heuristic
  D: Domain knowledge     E: “Exact” method (MIP)
[Figure: total score (higher is better, 0–30) per entry, each marked G, H, D, or E]
24% better performance than runner-up!
What was the trick?
Simulated all 3.6M contaminations on 2 weeks / 40 processors
152 GB data on disk, 16 GB in main memory (compressed)
Very accurate computation of F(A), but very slow evaluation of F(A):
30 hours/20 sensors, 6 weeks for all 30 settings!

[Plot: running time (minutes, lower is better) vs. number of sensors
 selected (1-10), comparing exhaustive search (all subsets) and naive greedy.]

Submodularity to the rescue
                                                                           95
Scaling up greedy algorithm
    [Minoux ’78]
In round i+1,
  have picked Ai = {s1,…,si}
  pick si+1 = argmaxs F(Ai ∪ {s}) - F(Ai)
I.e., maximize “marginal benefit” δs(Ai)

    δs(Ai) = F(Ai ∪ {s}) - F(Ai)

Key observation: Submodularity implies

    i ≤ j  ⇒  δs(Ai) ≥ δs(Aj)

Marginal benefits can never increase!
                                                                           96
“Lazy” greedy algorithm
        [Minoux ’78]

  Lazy greedy algorithm:                M
      First iteration as usual
      Keep an ordered list of marginal benefits δi from previous iteration
      Re-evaluate δi only for the top element
      If δi stays on top, use it; otherwise re-sort

[Figure: priority list of candidate elements a-e ordered by benefit δs(A);
 only the top element needs re-evaluation.]

Note: Very easy to compute online bounds, lazy evaluations, etc.
      [Leskovec et al. ’07]
                                                                           97
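To make the bookkeeping concrete, here is a minimal Python sketch of the lazy
greedy rule using a max-heap of stale marginal benefits. The callable F, the
ground set V and the budget k are assumed to be supplied by the caller; this
is an illustrative sketch, not the toolbox implementation.

    import heapq

    def lazy_greedy(F, V, k):
        # Max-heap of stale marginal benefits (negated: heapq is a min-heap).
        # Assumes F is monotone submodular and k <= len(V).
        A, fA = set(), F(set())
        heap = [(-(F({s}) - fA), s) for s in V]
        heapq.heapify(heap)
        for _ in range(k):
            while True:
                neg_delta, s = heapq.heappop(heap)
                delta = F(A | {s}) - fA  # re-evaluate only the top element
                if not heap or delta >= -heap[0][0]:
                    # Still on top: submodularity guarantees s is the best pick.
                    A.add(s)
                    fA += delta
                    break
                heapq.heappush(heap, (-delta, s))  # stale; re-sort and retry
        return A

Because stale benefits only over-estimate the true ones (marginal benefits can
never increase), selecting an element that survives re-evaluation returns
exactly the same set as plain greedy.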
Result of lazy evaluation
Simulated all 3.6M contaminations on 2 weeks / 40 processors
152 GB data on disk, 16 GB in main memory (compressed)
Very accurate computation of F(A), but very slow evaluation of F(A):
30 hours/20 sensors, 6 weeks for all 30 settings!

[Plot: running time (minutes, lower is better) vs. number of sensors
 selected (1-10); fast (lazy) greedy stays near zero while exhaustive search
 (all subsets) and naive greedy blow up.]

Submodularity to the rescue:
Using “lazy evaluations”: 1 hour/20 sensors, done after 2 days!
                                                                           98
What about worst-case?
      [Krause et al., NIPS ’07]
Knowing the sensor locations, an adversary contaminates here!

[Figure: two placements of sensors S1-S4. Each placement detects well on
 “average-case” (accidental) contamination; the two placements have very
 different average-case impact, but the same worst-case impact.]

Where should we place sensors to quickly detect in the worst case?
                                                                           99
Constrained maximization: Outline

    A* = argmaxA F(A)  s.t.  C(A) ≤ B
    (utility function F, selected set A, selection cost C, budget B)

              Subset selection
    Robust optimization        Complex constraints
                                                                          100
Optimizing for the worst case
  Separate utility function Fi for each contamination i
  Fi(A) = impact reduction by sensors A for contamination i
  Want to solve

      A* = argmaxA mini Fi(A)

[Figure: sensors B detect contamination at node s well (Fs(B) is high), while
 sensors A detect contamination at node r well (Fr(A) is high, Fr(B) low).]

Each of the Fi is submodular
Unfortunately, mini Fi not submodular!
                                                                          101
How can we solve this robust optimization problem?
How does the greedy algorithm do?
V = {a, b, c}, can only buy k=2

    Set A    F1    F2    mini Fi
    {a}      1     0     0
    {b}      0     2     0
    {c}      ε     ε     ε
    {a,c}    1     ε     ε
    {b,c}    ε     2     ε
    {a,b}    1     2     1

Greedy picks c first (the only singleton with mini Fi > 0).
Then it can only reach {a,c} or {b,c}.
Optimal solution: {a,b} with score 1. Greedy score: ε.
Greedy does arbitrarily badly. Is there something better?

Theorem [NIPS ’07]: The problem max|A|≤k mini Fi(A)
does not admit any approximation unless P=NP

Hence we can’t find any approximation algorithm. Or can we?
                                                                          102
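As a sanity check, this toy instance can be replayed in a few lines of Python.
The element names and the max-of-weights form of F1, F2 below are illustrative
choices consistent with the table, not part of the original example.

    eps = 0.01
    w1 = {'a': 1.0, 'b': 0.0, 'c': eps}
    w2 = {'a': 0.0, 'b': 2.0, 'c': eps}

    # Fi(A) = largest weight in A (a simple submodular function).
    F1 = lambda A: max((w1[s] for s in A), default=0.0)
    F2 = lambda A: max((w2[s] for s in A), default=0.0)
    score = lambda A: min(F1(A), F2(A))

    A = set()
    for _ in range(2):  # budget k = 2
        A |= {max({'a', 'b', 'c'} - A, key=lambda s: score(A | {s}))}
    print(A, score(A))        # greedy ends with worst-case score eps
    print(score({'a', 'b'}))  # the optimal set {a,b} achieves 1.0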
Alternative formulation
If somebody told us the optimal value

    c* = max|A|≤k mini Fi(A),

can we recover the optimal solution A*?

Need to find a set A with  mini Fi(A) ≥ c*  and  |A| ≤ k

Is this any easier?

Yes, if we relax the constraint |A| ≤ k
                                                                          103
Solving the alternative problem
  Trick: For each Fi and c, define truncation

      F'i,c(A) = min{Fi(A), c}

  Remains submodular!

[Figure: Fi(A) grows with |A|; F'i,c(A) is capped at level c.]

  Problem 1 (last slide):              Problem 2:
  find smallest A with                 find smallest A with
  mini Fi(A) ≥ c                       F'avg,c(A) = c, where
                                       F'avg,c(A) = (1/m) ∑i min{Fi(A), c}

  Non-submodular!                      Submodular!
  Don’t know how to solve              But appears as constraint?

        Same optimal solutions! Solving one solves the other
                                                                          104
Maximization vs. coverage
  Previously: Wanted
      A* = argmax F(A) s.t. |A| ≤ k

  Now need to solve:
      A* = argmin |A| s.t. F(A) ≥ Q
                                   M
  Greedy algorithm:
    Start with A := ∅;                    For bound, assume
    While F(A) < Q and |A| < n            F is integral.
       s* := argmaxs F(A ∪ {s})           If not, just round it.
       A := A ∪ {s*}

Theorem [Wolsey et al]: Greedy will return Agreedy with
    |Agreedy| ≤ (1 + log maxs F({s})) |Aopt|
                                                                          105
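A direct transcription of this coverage greedy, assuming F is given as a
Python callable on sets (an illustrative sketch; the same loop is reused by
the SATURATE sketch further below):

    def greedy_coverage(F, V, Q):
        # Add the element with the largest value until the quota Q is met.
        A = set()
        while F(A) < Q and len(A) < len(V):
            A.add(max(V - A, key=lambda s: F(A | {s})))
        return A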
Solving the alternative problem
  Trick: For each Fi and c, define truncation F'i,c(A) = min{Fi(A), c}

[Figure: Fi(A) grows with |A|; F'i,c(A) is capped at level c.]

  Problem 1 (last slide):              Problem 2:
  Non-submodular!                      Submodular!
  Don’t know how to solve              Can use greedy algorithm!
                                                                          106
Back to our example

Guess c=1
    Set A    F1    F2    mini Fi    F'avg,1
    {a}      1     0     0          ½
    {b}      0     2     0          ½
    {c}      ε     ε     ε          ε
    {a,c}    1     ε     ε          (1+ε)/2
    {b,c}    ε     2     ε          (1+ε)/2
    {a,b}    1     2     1          1

First pick a, then pick b: optimal solution!

How do we find c?
Do binary search!
                                                                          107
SATURATE Algorithm
         [Krause et al, NIPS ‘07]
Given: set V, integer k and monotonic SFs F1,…,Fm   M
Initialize cmin = 0, cmax = mini Fi(V)
Do binary search: c = (cmin + cmax)/2
   Greedily find AG such that F'avg,c(AG) = c
   If |AG| ≤ α k: increase cmin
   If |AG| > α k: decrease cmax
until convergence

[Figure: placements found for increasing values of the truncation threshold c.]
                                                                          108
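Putting the pieces together, a compact sketch of SATURATE, reusing
greedy_coverage from above. The stopping tolerance eps and the
list-of-callables interface are assumptions of this sketch:

    def saturate(Fs, V, k, alpha, eps=1e-3):
        # Fs: list of monotone submodular callables; alpha: relaxation of |A|.
        m = len(Fs)
        def F_avg(A, c):  # truncated average, submodular for every c
            return sum(min(F(A), c) for F in Fs) / m
        c_min, c_max = 0.0, min(F(set(V)) for F in Fs)
        A_best = set()
        while c_max - c_min > eps:
            c = (c_min + c_max) / 2
            A_G = greedy_coverage(lambda A: F_avg(A, c), set(V), Q=c)
            if len(A_G) <= alpha * k:
                c_min, A_best = c, A_G  # c achievable: search higher
            else:
                c_max = c               # not achievable within alpha*k picks
        return A_best

Note that F_avg(A, c) = c holds exactly when all Fi(A) ≥ c, so the inner
coverage problem certifies whether the guessed worst-case value is reachable.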
Theoretical guarantees
    [Krause et al, NIPS ‘07]

Theorem: The problem max|A|≤k mini Fi(A)
does not admit any approximation unless P=NP

Theorem: SATURATE finds a solution AS such that

     mini Fi(AS) ≥ OPTk and |AS| ≤ α k

where      OPTk = max|A|≤k mini Fi(A)
               α = 1 + log maxs ∑i Fi({s})

Theorem:
If there were a polytime algorithm with better factor
β < α, then NP ⊆ DTIME(n^(log log n))
                                                                          109
Example: Lake monitoring
    Monitor pH values using robotic sensor

[Figure: pH value along a transect: observations A, prediction at unobserved
 locations, true (hidden) pH values, and the predictive variance Var(s | A)
 as a function of position s along the transect.]

Use probabilistic model (Gaussian processes) to estimate prediction error

Where should we sense to minimize our maximum error?

    Robust submodular optimization problem!
    (often) submodular [Das & Kempe ’08]
                                                                          110
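For concreteness, a minimal sketch of how Var(s | A) could be computed with a
Gaussian process. The squared-exponential kernel and all hyperparameters below
are illustrative assumptions, not the values used in these experiments:

    import numpy as np

    def gp_variance(s, A, lengthscale=1.0, noise=0.1):
        # Squared-exponential kernel on 1-D positions along the transect.
        k = lambda x, y: np.exp(-0.5 * (x - y) ** 2 / lengthscale ** 2)
        if not A:
            return 1.0  # prior variance k(s, s)
        X = np.array(sorted(A), dtype=float)
        K = k(X[:, None], X[None, :]) + noise ** 2 * np.eye(len(X))
        ks = k(X, s)
        # Posterior variance: k(s,s) - ks^T (K + noise^2 I)^{-1} ks
        return float(1.0 - ks @ np.linalg.solve(K, ks))

Minimizing the maximum of Var(s | A) over locations s then instantiates the
robust objective with one Fi per location of interest.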
Comparison with state of the art
  Algorithm used in geostatistics: Simulated Annealing
      [Sacks & Schiller ’88, van Groeningen & Stein ’98, Wiens ’05,…]
  7 parameters that need to be fine-tuned

[Plots: maximum marginal variance (lower is better) vs. number of sensors,
 on environmental monitoring and precipitation data. SATURATE matches or
 beats greedy and simulated annealing on both datasets.]

  SATURATE is competitive & 10x faster
  No parameters to tune!
                                                                          111
Results on water networks

[Plot: maximum detection time (minutes, lower is better) vs. number of sensors
 on the water networks data. Greedy shows no decrease until all contaminations
 are detected; SATURATE lies far below both greedy and simulated annealing.]

    60% lower worst-case detection time!
                                                                          112
Worst- vs. average case
Given: Set V, submodular functions F1,…,Fm

    Average-case score:  Fac(A) = (1/m) ∑i Fi(A)    Too optimistic?
    Worst-case score:    Fwc(A) = mini Fi(A)        Very pessimistic!

Want to optimize both average- and worst-case score!

Can modify SATURATE to solve this problem!
  Want:      Fac(A) ≥ cac and Fwc(A) ≥ cwc
  Truncate:  min{Fac(A), cac} + min{Fwc(A), cwc} ≥ cac + cwc
  (since each truncated term is capped at its threshold, the sum reaches
   cac + cwc exactly when both constraints hold)
                                                                          113
Worst- vs. average case

[Plot: worst-case impact vs. average-case impact (lower is better on both
 axes) on the water networks data. Optimizing only for the average case or
 only for the worst case gives extreme points; SATURATE traces the tradeoff
 curve, which has a clear knee.]

Can find good compromise between average- and worst-case score!
                                                                          114
Constrained maximization: Outline

    A* = argmaxA F(A)  s.t.  C(A) ≤ B
    (utility function F, selected set A, selection cost C, budget B)

              Subset selection
    Robust optimization        Complex constraints
                                                                          115
Other aspects: Complex constraints
   maxA F(A) or maxA mini Fi(A) subject to
      So far: |A| ≤ k
      In practice, more complex constraints:
      Different costs: C(A) ≤ B
      Locations need to be connected by paths (lake monitoring)
          [Chekuri & Pal, FOCS ’05] [Singh et al, IJCAI ’07]
      Sensors need to communicate, forming a routing tree (building monitoring)
                                                                          116
Non-constant cost functions
For each s ∈ V, let c(s) > 0 be its cost
(e.g., feature acquisition costs, …)
Cost of a set C(A) = ∑s∈A c(s) (modular function!)
Want to solve

    A* = argmax F(A) s.t. C(A) ≤ B

Cost-benefit greedy algorithm:                M
  Start with A := ∅;
  While there is an s ∈ V∖A s.t. C(A ∪ {s}) ≤ B

     s* := argmaxs: C(A∪{s})≤B [F(A ∪ {s}) - F(A)] / c(s)
     A := A ∪ {s*}
                                                                          117
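A sketch of this cost-benefit rule in Python (the callable interface is again
an illustrative assumption):

    def cost_benefit_greedy(F, costs, V, B):
        # Repeatedly pick the affordable element with the best
        # benefit/cost ratio.
        A, spent = set(), 0.0
        while True:
            fits = [s for s in V - A if spent + costs[s] <= B]
            if not fits:
                return A
            s = max(fits, key=lambda s: (F(A | {s}) - F(A)) / costs[s])
            A.add(s)
            spent += costs[s]

As the next slide shows, this natural rule alone can fail badly; it becomes
useful when combined with the unit-cost variant (see slide 119).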
Performance of cost-benefit greedy
Want
    maxA F(A) s.t. C(A) ≤ 1

    Set A    F(A)    C(A)
    {a}      2ε      ε
    {b}      1       1

Cost-benefit greedy picks a (ratio 2ε/ε = 2 beats ratio 1/1 = 1).
Then cannot afford b!

Cost-benefit greedy performs arbitrarily badly!
                                                                          118
Cost-benefit optimization
     [Wolsey ’82, Sviridenko ’04, Leskovec et al ’07]
Theorem [Leskovec et al. KDD ‘07]
   ACB: cost-benefit greedy solution and
   AUC: unit-cost greedy solution (i.e., ignore costs)
Then
       max { F(ACB), F(AUC) } ≥ ½ (1 - 1/e) OPT

Can still compute online bounds and
speed up using lazy evaluations

Note: Can also get
   (1-1/e) approximation in time O(n^4)             [Sviridenko ’04]
   Slightly better than ½ (1-1/e) in O(n^2)         [Wolsey ‘82]
                                                                          119
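The guarantee suggests simply running both variants and keeping the better
set. A small wrapper around the cost-benefit sketch above; the feasibility
convention for the unit-cost variant (pick by raw marginal benefit while
still respecting the true budget) is an assumption of this sketch:

    def unit_cost_greedy(F, costs, V, B):
        # Pick by raw marginal benefit among elements that still fit the budget.
        A, spent = set(), 0.0
        while True:
            fits = [s for s in V - A if spent + costs[s] <= B]
            if not fits:
                return A
            s = max(fits, key=lambda s: F(A | {s}) - F(A))
            A.add(s)
            spent += costs[s]

    def cef_select(F, costs, V, B):
        # max{F(A_CB), F(A_UC)} >= 1/2 (1 - 1/e) OPT by the theorem above.
        return max([cost_benefit_greedy(F, costs, V, B),
                    unit_cost_greedy(F, costs, V, B)], key=F)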
Example: Cascades in the Blogosphere
        [Leskovec, Krause, Guestrin, Faloutsos, VanBriesen, Glance ‘07]

[Figure: an information cascade spreading through blogs over time; readers of
 unselected blogs learn about the story after us!]

Which blogs should we read to learn about big cascades early?
                                                                          120
Water vs. Web

    Placing sensors in water networks  vs.  selecting informative blogs

In both problems we are given
   Graph with nodes (junctions / blogs) and edges (pipes / links)
   Cascades spreading dynamically over the graph (contamination / citations)
Want to pick nodes to detect big cascades early
In both applications, utility functions are submodular
      [Generalizes Kempe et al, KDD ’03]
                                                                          121
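A hedged sketch of what such a utility might look like: given a sample of
simulated cascades (each recording when it reaches each node), F(A) averages
the detection benefit over cascades. Since each term is a max-of-weights
function of A, F is submodular. The data layout below is an assumption of this
sketch:

    def detection_utility(cascades, horizon):
        # cascades: list of dicts mapping node -> time the cascade reaches it.
        def F(A):
            benefit = 0.0
            for hit in cascades:
                times = [hit[s] for s in A if s in hit]
                if times:
                    # Earlier detection gives a larger impact reduction.
                    benefit += max(0.0, horizon - min(times))
            return benefit / len(cascades)
        return F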
Performance on Blog selection

[Plots: (left) fraction of cascades captured (higher is better) vs. number of
 blogs; greedy beats the in-links, all-outlinks, #posts and random heuristics.
 (right) running time in seconds (lower is better) vs. number of blogs
 selected; exhaustive search and naive greedy blow up while fast greedy stays
 flat. Blog selection, ~45k blogs.]

   Outperforms state-of-the-art heuristics
   700x speedup using submodularity!
                                                                          122
Cost of reading a blog

  Naïve approach: Just pick 10 best blogs
  Selects big, well known blogs (Instapundit, etc.)
  These contain many posts, take long to read!

[Plot: cascades captured vs. Cost(A) = number of posts / day.
 Cost/benefit analysis captures far more per unit cost than ignoring cost.]

Cost-benefit optimization picks summarizer blogs!
                                                                          123
Predicting the “hot” blogs
  Want blogs that will be informative in the future
  Split data set; train on historic, test on future

[Plot: cascades captured vs. Cost(A) = number of posts / day.
 “Cheating” (greedy on future, test on future) does well;
 greedy on historic, test on future does much worse.]

[Plot: #detections per month (Jan-May) for the greedy selection:
 it detects well on the training period, poorly afterwards.]

Blog selection “overfits” to training data!
Poor generalization! Why’s that?
Want blogs that continue to do well!
Let’s see what goes wrong here.
                                                                          124
Robust optimization

Fi(A) = detections in interval i

“Overfit” blog selection A:
[Plot: #detections per month (Jan-May) under greedy;
 F1(A)=.5, F2(A)=.8, F3(A)=.6, F4(A)=.01, F5(A)=.02]

Optimize worst-case:

    A* = argmaxA mini Fi(A)

“Robust” blog selection A*:
[Plot: detections using SATURATE are spread evenly across Jan-May.]

Robust optimization ⇔ Regularization!
                                                                          125
Predicting the “hot” blogs

[Plot: cascades captured vs. Cost(A) = number of posts / day.
 “Cheating” (greedy on future, test on future) is best; the robust solution
 tested on future is close behind; greedy on historic, test on future is
 worst.]

    50% better generalization!
                                                                          126
Other aspects: Complex constraints
   maxA F(A) or maxA mini Fi(A) subject to
      So far: |A| ≤ k
      In practice, more complex constraints:
      Different costs: C(A) ≤ B
      Locations need to be connected by paths (lake monitoring)
          [Chekuri & Pal, FOCS ’05] [Singh et al, IJCAI ’07]
      Sensors need to communicate, forming a routing tree (building monitoring)
                                                                          127
Naïve approach: Greedy-connect

  Simple heuristic: Greedily optimize submodular utility function F(A)
  Then add nodes to minimize communication cost C(A)

[Figure: building floorplan with candidate sensor locations, contrasting
 placements: the most informative node gives F(A) = 4 at communication cost
 C(A) = 10 (very informative, but high communication cost!); a cheap nearby
 placement gives C(A) = 1 but only F(A) = 0.2 (efficient communication, but
 not very informative); the second most informative node allows no
 communication unless a relay node is added (F(A) = 3.5, C(A) = 3.5).]

Communication cost = Expected # of trials
  (learned using Gaussian Processes)
Want to find optimal tradeoff between information and communication cost
                                                                          128
The pSPIEL Algorithm
         [Krause, Guestrin, Gupta, Kleinberg IPSN 2006]
    pSPIEL: Efficient nonmyopic algorithm
    (padded Sensor Placements at Informative and cost-
    Effective Locations)

[Figure: sensing region decomposed into clusters C1-C4 with inter-cluster
 edge costs.]

    Decompose sensing region into small, well-separated clusters
    Solve cardinality constrained problem per cluster (greedy)
    Combine solutions using k-MST algorithm
                                                                          129
Submodularity slides
Submodularity slides
Submodularity slides
Submodularity slides
Submodularity slides
Submodularity slides
Submodularity slides
Submodularity slides
Submodularity slides
Submodularity slides
Submodularity slides
Submodularity slides
Submodularity slides
Submodularity slides
Submodularity slides
Submodularity slides
Submodularity slides
Submodularity slides
Submodularity slides
Submodularity slides
Submodularity slides

Contenu connexe

Tendances

Tendances (20)

A developers guide to machine learning
A developers guide to machine learningA developers guide to machine learning
A developers guide to machine learning
 
Desicion tree and neural networks
Desicion tree and neural networksDesicion tree and neural networks
Desicion tree and neural networks
 
ベルヌーイ分布における超パラメータ推定のための経験ベイズ法
ベルヌーイ分布における超パラメータ推定のための経験ベイズ法ベルヌーイ分布における超パラメータ推定のための経験ベイズ法
ベルヌーイ分布における超パラメータ推定のための経験ベイズ法
 
PRML Chapter 5
PRML Chapter 5PRML Chapter 5
PRML Chapter 5
 
Optimization as a model for few shot learning
Optimization as a model for few shot learningOptimization as a model for few shot learning
Optimization as a model for few shot learning
 
Deep Generative Models
Deep Generative ModelsDeep Generative Models
Deep Generative Models
 
십분딥러닝_17_DIM(Deep InfoMax)
십분딥러닝_17_DIM(Deep InfoMax)십분딥러닝_17_DIM(Deep InfoMax)
십분딥러닝_17_DIM(Deep InfoMax)
 
機械学習 / Deep Learning 大全 (2) Deep Learning 基礎編
機械学習 / Deep Learning 大全 (2) Deep Learning 基礎編機械学習 / Deep Learning 大全 (2) Deep Learning 基礎編
機械学習 / Deep Learning 大全 (2) Deep Learning 基礎編
 
Chapter2.3.6
Chapter2.3.6Chapter2.3.6
Chapter2.3.6
 
【DL輪読会】Universal Trading for Order Execution with Oracle Policy Distillation
【DL輪読会】Universal Trading for Order Execution with Oracle Policy Distillation【DL輪読会】Universal Trading for Order Execution with Oracle Policy Distillation
【DL輪読会】Universal Trading for Order Execution with Oracle Policy Distillation
 
Deep Learning A-Z™: Recurrent Neural Networks (RNN) - The Idea Behind Recurre...
Deep Learning A-Z™: Recurrent Neural Networks (RNN) - The Idea Behind Recurre...Deep Learning A-Z™: Recurrent Neural Networks (RNN) - The Idea Behind Recurre...
Deep Learning A-Z™: Recurrent Neural Networks (RNN) - The Idea Behind Recurre...
 
深層強化学習でマルチエージェント学習(後篇)
深層強化学習でマルチエージェント学習(後篇)深層強化学習でマルチエージェント学習(後篇)
深層強化学習でマルチエージェント学習(後篇)
 
Hyperbolic Deep Reinforcement Learning
Hyperbolic Deep Reinforcement LearningHyperbolic Deep Reinforcement Learning
Hyperbolic Deep Reinforcement Learning
 
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
 
ゼロから作るDeepLearning 2~3章 輪読
ゼロから作るDeepLearning 2~3章 輪読ゼロから作るDeepLearning 2~3章 輪読
ゼロから作るDeepLearning 2~3章 輪読
 
Precomputed atmospheric scattering(사전 계산 대기 산란)
Precomputed atmospheric scattering(사전 계산 대기 산란)Precomputed atmospheric scattering(사전 계산 대기 산란)
Precomputed atmospheric scattering(사전 계산 대기 산란)
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 
유니티의 라이팅이 안 이쁘다구요? (A to Z of Lighting)
유니티의 라이팅이 안 이쁘다구요? (A to Z of Lighting)유니티의 라이팅이 안 이쁘다구요? (A to Z of Lighting)
유니티의 라이팅이 안 이쁘다구요? (A to Z of Lighting)
 
Deep Learning Theory Seminar (Chap 1-2, part 1)
Deep Learning Theory Seminar (Chap 1-2, part 1)Deep Learning Theory Seminar (Chap 1-2, part 1)
Deep Learning Theory Seminar (Chap 1-2, part 1)
 
論文紹介:Dueling network architectures for deep reinforcement learning
論文紹介:Dueling network architectures for deep reinforcement learning論文紹介:Dueling network architectures for deep reinforcement learning
論文紹介:Dueling network architectures for deep reinforcement learning
 

Similaire à Submodularity slides

Functions for Grade 10
Functions for Grade 10Functions for Grade 10
Functions for Grade 10
Boipelo Radebe
 
Machine learning of structured outputs
Machine learning of structured outputsMachine learning of structured outputs
Machine learning of structured outputs
zukun
 
CVPR2010: higher order models in computer vision: Part 1, 2
CVPR2010: higher order models in computer vision: Part 1, 2CVPR2010: higher order models in computer vision: Part 1, 2
CVPR2010: higher order models in computer vision: Part 1, 2
zukun
 
Object Detection with Discrmininatively Trained Part based Models
Object Detection with Discrmininatively Trained Part based ModelsObject Detection with Discrmininatively Trained Part based Models
Object Detection with Discrmininatively Trained Part based Models
zukun
 

Similaire à Submodularity slides (20)

Functions for Grade 10
Functions for Grade 10Functions for Grade 10
Functions for Grade 10
 
Kernels and Support Vector Machines
Kernels and Support Vector  MachinesKernels and Support Vector  Machines
Kernels and Support Vector Machines
 
ijcai09submodularity.ppt
ijcai09submodularity.pptijcai09submodularity.ppt
ijcai09submodularity.ppt
 
Machine learning of structured outputs
Machine learning of structured outputsMachine learning of structured outputs
Machine learning of structured outputs
 
Functions.ppt
Functions.pptFunctions.ppt
Functions.ppt
 
Functions.ppt
Functions.pptFunctions.ppt
Functions.ppt
 
UMAP - Mathematics and implementational details
UMAP - Mathematics and implementational detailsUMAP - Mathematics and implementational details
UMAP - Mathematics and implementational details
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Fp in scala part 2
Fp in scala part 2Fp in scala part 2
Fp in scala part 2
 
Good functional programming is good programming
Good functional programming is good programmingGood functional programming is good programming
Good functional programming is good programming
 
CVPR2010: higher order models in computer vision: Part 1, 2
CVPR2010: higher order models in computer vision: Part 1, 2CVPR2010: higher order models in computer vision: Part 1, 2
CVPR2010: higher order models in computer vision: Part 1, 2
 
Object Detection with Discrmininatively Trained Part based Models
Object Detection with Discrmininatively Trained Part based ModelsObject Detection with Discrmininatively Trained Part based Models
Object Detection with Discrmininatively Trained Part based Models
 
Functionsandpigeonholeprinciple
FunctionsandpigeonholeprincipleFunctionsandpigeonholeprinciple
Functionsandpigeonholeprinciple
 
gfg
gfggfg
gfg
 
Truth, deduction, computation lecture f
Truth, deduction, computation   lecture fTruth, deduction, computation   lecture f
Truth, deduction, computation lecture f
 
Functions (1)
Functions (1)Functions (1)
Functions (1)
 
5.3 Basic functions. Dynamic slides.
5.3 Basic functions. Dynamic slides.5.3 Basic functions. Dynamic slides.
5.3 Basic functions. Dynamic slides.
 
Functions (Theory)
Functions (Theory)Functions (Theory)
Functions (Theory)
 
237654933 mathematics-t-form-6
237654933 mathematics-t-form-6237654933 mathematics-t-form-6
237654933 mathematics-t-form-6
 
Exponents)
Exponents)Exponents)
Exponents)
 

Dernier

Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 

Dernier (20)

Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 

Submodularity slides

  • 1. Beyond Convexity – Submodularity in Machine Learning Andreas Krause, Carlos Guestrin Carnegie Mellon University International Conference on Machine Learning | July 5, 2008 Carnegie Mellon
  • 2. Acknowledgements Thanks for slides and material to Mukund Narasimhan, Jure Leskovec and Manuel Reyes Gomez MATLAB Toolbox and details for references available at http://www.submodularity.org Algorithms implemented M 2
  • 3. Optimization in Machine Learning + Classify + from – by w1 finding a separating + + - - hyperplane (parameters w) + + - - w* - - Which one should we choose? w2 Define loss L(w) = “1/size of margin”  Solve for best vector w* = argminw L(w) L(w) Key observation: Many problems in ML are convex! w  no local minima!!  3
  • 4. Feature selection Given random variables Y, X1, … Xn Want to predict Y from subset XA = (Xi1,…,Xik) Y “Sick” Naïve Bayes X1 X2 X3 Model “Fever” “Rash” “Male” Want k most informative features: A* = argmax IG(XA; Y) s.t. |A| · k Uncertainty Uncertainty before knowing XA after knowing XA where IG(XA; Y) = H(Y) - H(Y | XA) 4
  • 5. Factoring distributions V Given random variables X1,…,Xn X1 X3 X2 X4 X6 X5 X7 Partition variables V into sets A and VnA as independent as possible X1 X3 X2 X7 X4 X6 X5 Formally: Want A VnA A* = argminA I(XA; XVnA) s.t. 0<|A|<n where I(XA,XB) = H(XB) - H(XB j XA) Fundamental building block in structure learning [Narasimhan&Bilmes, UAI ’04] 5
  • 6. Combinatorial problems in ML Given a (finite) set V, function F: 2V ! R, want A* = argmin F(A) s.t. some constraints on A Solving combinatorial problems: Mixed integer programming? Often difficult to scale to large problems Relaxations? (e.g., L1 regularization, etc.) Not clear when they work This talk: Fully combinatorial algorithms (spanning tree, matching, …) Exploit problem structure to get guarantees about solution! 6
  • 7. Example: Greedy algorithm for feature selection Given: finite set V of features, utility function F(A) = IG(XA; Y) Want: A*µ V such that Y “Sick” NP-hard! X1 X2 X3 Greedy algorithm: M “Fever” “Rash” “Male” Start with A = ; For i = 1 to k s* := argmaxs F(A [ {s}) A := A [ {s*} How well can this simple heuristic do? 7
  • 8. Key property: Diminishing returns Selection A = {} Selection B = {X2,X3} Y Y “Sick” “Sick” X2 X3 “Rash” “Male” Adding X1 X1 Adding X1 Theorem [Krause, Guestrin UAI ‘05]: Information gain F(A) in “Fever” will help a lot! doesn’t help much Naïve Bayes models is submodular! New feature X1 Submodularity: B A + s Large improvement + s Small improvement For Aµ B, F(A [ {s}) – F(A) ¸ F(B [ {s}) – F(B) 8
  • 9. Why is submodularity useful? Theorem [Nemhauser et al ‘78] Greedy maximization algorithm returns Agreedy: F(Agreedy) ¸(1-1/e) max|A|·k F(A) ~63% Greedy algorithm gives near-optimal solution! More details and exact statement later For info-gain: Guarantees best possible unless P = NP! [Krause, Guestrin UAI ’05] 9
  • 10. Submodularity in Machine Learning In this tutorial we will see that many ML problems are submodular, i.e., for F submodular require: Minimization: A* = argmin F(A) Structure learning (A* = argmin I(XA; XVnA)) Clustering MAP inference in Markov Random Fields … Maximization: A* = argmax F(A) Feature selection Active learning Ranking … 10
  • 11. Tutorial Overview  Examples and properties of submodular functions  Submodularity and convexity  Minimizing submodular functions  Maximizing submodular functions  Research directions, … LOTS of applications to Machine Learning!! 11
  • 13. Set functions Finite set V = {1,2,…,n} Function F: 2V ! R Will always assume F(;) = 0 (w.l.o.g.) Assume black-box that can evaluate F for any input A Approximate (noisy) evaluation of F is ok (e.g., [37]) Example: F(A) = IG(XA; Y) = H(Y) – H(Y | XA) = ∑y,x P(xA) [log P(y | xA) – log P(y)] A Y Y “Sick” “Sick” X1 X2 X2 X3 “Fever” “Rash” “Rash” “Male” F({X1,X2}) = 0.9 F({X2,X3}) = 0.5 13
  • 14. Submodular set functions Set function F on V is called submodular if For all A,B µ V: F(A)+F(B) ¸ F(A[B)+F(AÅB) + ¸ + A A [ BB AÅB Equivalent diminishing returns characterization: Submodularity: B A + S Large improvement + S Small improvement For AµB, s∉B, F(A [ {s}) – F(A) ¸ F(B [ {s}) – F(B) 14
  • 15. Submodularity and supermodularity Set function F on V is called submodular if 1) For all A,B µ V: F(A)+F(B) ¸ F(A[B)+F(AÅB)  2) For all AµB, s∉B, F(A [ {s}) – F(A) ¸ F(B [ {s}) – F(B) F is called supermodular if –F is submodular F is called modular if F is both sub- and supermodular for modular (“additive”) F, F(A) = ∑i2A w(i) 15
  • 16. Example: Set cover Place sensors Want to cover floorplan with discs in building Possible OFFICE OFFICE QUIET PHONE locations CONFERENCE STORAGE V LAB ELEC COPY SERVER KITCHEN Node predicts For A µ V: F(A) = “area values of positions covered by sensors placed at A” with some radius Formally: W finite set, collection of n subsets Si µ W For A µV={1,…,n} define F(A) = |i2 A Si| 16
  • 17. Set cover is submodular A={S1,S2} OFFICE OFFICE QUIET PHONE CONFERENCE S1 S2 STORAGE LAB ELEC COPY SERVER S’ KITCHEN F(A[{S’})-F(A) OFFICE OFFICE QUIET PHONE ¸ CONFERENCE F(B[{S’})-F(B) S1 S2 STORAGE LAB ELEC COPY S3 SERVER S4 KITCHEN S’ B = {S1,S2,S3,S4} 17
  • 18. Example: Mutual information Given random variables X1,…,Xn F(A) = I(XA; XVnA) = H(XVnA) – H(XVnA |XA) Lemma: Mutual information F(A) is submodular F(A [ {s}) – F(A) = H(Xsj XA) – H(Xsj XVn(A[{s}) ) Nonincreasing in A: Nondecreasing in A AµB ) H(Xs|XA) ¸ H(Xs|XB) δ s(A) = F(A[{s})-F(A) monotonically nonincreasing  F submodular  18
  • 19. Example: Influence in social networks [Kempe, Kleinberg, Tardos KDD ’03] Dorothy Eric Alice 0.2 Prob. of influencing 0.5 0.4 0.3 0.2 0.5 Bob 0.5 Fiona Charlie Who should get free cell phones? V = {Alice,Bob,Charlie,Dorothy,Eric,Fiona} F(A) = Expected number of people influenced when targeting A 19
  • 20. Influence in social networks is submodular [Kempe, Kleinberg, Tardos KDD ’03] Dorothy Eric Alice 0.2 0.5 0.4 0.3 0.2 0.5 Bob 0.5 Fiona Charlie Key idea: Flip coins c in advance  “live” edges Fc(A) = People influenced under outcome c (set cover!) F(A) = ∑ P(c) F (A) is submodular as well! 20
• 21. Closedness properties F1,…,Fm submodular functions on V and λ1,…,λm > 0 Then: F(A) = ∑i λi Fi(A) is submodular! Submodularity closed under nonnegative linear combinations! Extremely useful fact!! Fθ(A) submodular ⇒ ∑θ P(θ) Fθ(A) submodular! Multicriterion optimization: F1,…,Fm submodular, λi ≥ 0 ⇒ ∑i λi Fi(A) submodular 21
• 22. Submodularity and Concavity Suppose g: ℕ → ℝ and F(A) = g(|A|) Then F(A) submodular if and only if g concave! E.g., g could say “buying in bulk is cheaper” 22
• 23. Maximum of submodular functions Suppose F1(A) and F2(A) submodular. Is F(A) = max(F1(A), F2(A)) submodular? max(F1,F2) not submodular in general! 23
• 24. Minimum of submodular functions Well, maybe F(A) = min(F1(A), F2(A)) instead?
    A      F1(A)  F2(A)  F(A)
    ∅      0      0      0
    {a}    1      0      0
    {b}    0      1      0
    {a,b}  1      1      1
F({b}) − F(∅) = 0 < F({a,b}) − F({a}) = 1, so min(F1,F2) not submodular in general! But stay tuned: we’ll address mini Fi later! 24
• 25. Duality For F submodular on V let G(A) = F(V) − F(V\A) G is supermodular and called dual to F Details about properties in [Fujishige ’91] 25
  • 26. Tutorial Overview Examples and properties of submodular functions Many problems submodular (mutual information, influence, …) SFs closed under positive linear combinations; not under min, max Submodularity and convexity Minimizing submodular functions Maximizing submodular functions Extensions and research directions 26
  • 27. Submodularity and Convexity Carnegie Mellon
• 28. Submodularity and convexity For V = {1,…,n} and A ⊆ V, let wA = (w1A,…,wnA) with wiA = 1 if i ∈ A, 0 otherwise Key result [Lovász ’83]: every submodular function F induces a function g on ℝ^n₊ such that F(A) = g(wA) for all A ⊆ V g(w) is convex minA F(A) = minw g(w) s.t. w ∈ [0,1]^n Let’s see how one can define g(w) 28
• 29. The submodular polyhedron PF PF = {x ∈ ℝ^n : x(A) ≤ F(A) for all A ⊆ V}, where x(A) = ∑_{i∈A} xi Example: V = {a,b}
    A      F(A)
    ∅      0
    {a}    −1
    {b}    2
    {a,b}  0
PF is cut out by the constraints x({a}) ≤ F({a}), x({b}) ≤ F({b}), x({a,b}) ≤ F({a,b}) 29
• 30. Lovász extension Claim: g(w) = max_{x∈PF} wᵀx, where PF = {x ∈ ℝ^n : x(A) ≤ F(A) for all A ⊆ V} Writing xw = argmax_{x∈PF} wᵀx, we have g(w) = wᵀxw Evaluating g(w) requires solving a linear program with exponentially many constraints  30
• 31. Evaluating the Lovász extension g(w) = max_{x∈PF} wᵀx, PF = {x ∈ ℝ^n : x(A) ≤ F(A) for all A ⊆ V} Theorem [Edmonds ’71, Lovász ‘83]: for any given w, can get optimal solution xw to the LP using the following greedy algorithm: Order V = {e1,…,en} so that w(e1) ≥ … ≥ w(en) Let xw(ei) = F({e1,…,ei}) − F({e1,…,ei−1}) Then wᵀxw = g(w) = max_{x∈PF} wᵀx Sanity check: if w = wA and A = {e1,…,ek}, then wᵀxw = ∑_{i≤k} xw(ei) = F(A) 31
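A minimal sketch of this greedy evaluation in Python, assuming F is given as a lookup table (values from the PF slide: F(∅)=0, F({a})=−1, F({b})=2, F({a,b})=0):

    Fvals = {frozenset(): 0, frozenset("a"): -1,
             frozenset("b"): 2, frozenset("ab"): 0}
    F = lambda A: Fvals[frozenset(A)]

    def lovasz(w, V=("a", "b")):
        order = sorted(V, key=lambda e: -w[e])        # w(e1) >= ... >= w(en)
        g, prefix = 0.0, set()
        for e in order:
            x_e = F(prefix | {e}) - F(prefix)         # greedy coordinate x_w(e_i)
            g += w[e] * x_e
            prefix.add(e)
        return g

    print(lovasz({"a": 0, "b": 1}))   # 2.0 = F({b})
    print(lovasz({"a": 1, "b": 1}))   # 0.0 = F({a,b})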
• 32. Example: Lovász extension g(w) = max {wᵀx : x ∈ PF}
    A      F(A)
    ∅      0
    {a}    −1
    {b}    2
    {a,b}  0
Want g(w) for w = [0,1] Greedy ordering: e1 = b, e2 = a since w(e1) = 1 > w(e2) = 0 xw(e1) = F({b}) − F(∅) = 2, xw(e2) = F({b,a}) − F({b}) = −2, so xw = [−2,2] g([0,1]) = [0,1]ᵀ[−2,2] = 2 = F({b}) Similarly g([1,1]) = [1,1]ᵀ[−1,1] = 0 = F({a,b}) 32
• 33. Why is this useful? Theorem [Lovász ’83]: g(w) attains its minimum in [0,1]^n at a corner! If we can minimize g on [0,1]^n, can minimize F… (at corners, g and F take same values) F(A) submodular ⇒ g(w) convex (and efficient to evaluate) Does the converse also hold, i.e., is any F with a convex extension submodular? No: consider g(w1,w2,w3) = max(w1, w2+w3), which is convex, yet the induced F on {a,b,c} has F({a,b}) − F({a}) = 0 < F({a,b,c}) − F({a,c}) = 1 33
  • 34. Tutorial Overview Examples and properties of submodular functions Many problems submodular (mutual information, influence, …) SFs closed under positive linear combinations; not under min, max Submodularity and convexity Every SF induces a convex function with SAME minimum Special properties: Greedy solves LP over exponential polytope Minimizing submodular functions Maximizing submodular functions Extensions and research directions 34
  • 36. Overview minimization Minimizing general submodular functions Minimizing symmetric submodular functions Applications to Machine Learning 36
• 37. Minimizing a submodular function Want to solve A* = argminA F(A) Need to solve min_{w∈[0,1]^n} g(w) = minw maxx wᵀx s.t. w ∈ [0,1]^n, x ∈ PF Equivalently: min_{c,w} c s.t. c ≥ wᵀx for all x ∈ PF, w ∈ [0,1]^n 37
• 38. Ellipsoid algorithm [Grötschel, Lovasz, Schrijver ’81] min_{c,w} c s.t. c ≥ wᵀx for all x ∈ PF, w ∈ [0,1]^n Separation oracle: find most violated constraint: maxx wᵀx − c s.t. x ∈ PF Can solve separation using the greedy algorithm!! ⇒ Ellipsoid algorithm minimizes SFs in poly-time! 38
• 39. Minimizing submodular functions Ellipsoid algorithm not very practical Want combinatorial algorithm for minimization! Theorem [Iwata (2001)]: there is a fully combinatorial, strongly polynomial algorithm for minimizing SFs that runs in time O(n^8 log² n) Polynomial-time = practical??? 39
• 40. A more practical alternative? [Fujishige ’91, Fujishige et al ‘06]
    A      F(A)
    ∅      0
    {a}    −1
    {b}    2
    {a,b}  0
Base polytope: BF = PF ∩ {x : x(V) = F(V)} Minimum norm algorithm: 1. Find x* = argmin ‖x‖₂ s.t. x ∈ BF (here x* = [−1,1]) 2. Return A* = {i : x*(i) < 0} (here A* = {a}) Theorem [Fujishige ’91]: A* is an optimal solution! Note: can solve step 1 using Wolfe’s algorithm Runtime finite but unknown!!  40
• 41. Empirical comparison [Fujishige et al ’06] On cut functions from the DIMACS Challenge (problem sizes 64–1024, running time on a log-scale), the minimum norm algorithm is orders of magnitude faster than the alternatives Our implementation can solve n = 10k in < 6 minutes! 41
• 42. Checking optimality (duality)
    A      F(A)
    ∅      0
    {a}    −1
    {b}    2
    {a,b}  0
Base polytope: BF = PF ∩ {x : x(V) = F(V)} Theorem [Edmonds ’70]: minA F(A) = maxx {x⁻(V) : x ∈ BF}, where x⁻(s) = min {x(s), 0} Testing how close A’ is to minA F(A): run the greedy algorithm for w = wA’ to get xw; then F(A’) ≥ minA F(A) ≥ xw⁻(V) Example: A = {a}, F(A) = −1, w = [1,0], xw = [−1,1], xw⁻ = [−1,0], xw⁻(V) = −1 ⇒ A optimal! 42
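This certificate is cheap to compute; a sketch for the same toy F (greedy on w = wA' yields a point xw in BF, and the sum of its negative entries lower-bounds minA F(A)):

    Fvals = {frozenset(): 0, frozenset("a"): -1,
             frozenset("b"): 2, frozenset("ab"): 0}
    F = lambda A: Fvals[frozenset(A)]
    V = ("a", "b")

    def certificate(A_cand):
        w = {e: 1 if e in A_cand else 0 for e in V}
        order = sorted(V, key=lambda e: -w[e])
        x, prefix = {}, set()
        for e in order:
            x[e] = F(prefix | {e}) - F(prefix)
            prefix.add(e)
        lower = sum(min(x[e], 0) for e in V)          # x_w^-(V)
        return F(A_cand), lower                       # F(A') >= min_A F(A) >= lower

    print(certificate({"a"}))   # (-1, -1): bounds match, so {a} is optimal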
• 43. Overview minimization Minimizing general submodular functions Can minimize in polytime using ellipsoid method Combinatorial, strongly polynomial algorithm: O(n^8) Practical alternative: minimum norm algorithm? Minimizing symmetric submodular functions Applications to Machine Learning 43
• 44. What if we have special structure? Worst-case complexity of best known algorithm: O(n^8 log² n) Can we do better for special cases? Example (again): given RVs X1,…,Xn, F(A) = I(XA; XV\A) = I(XV\A; XA) = F(V\A) Functions F with F(A) = F(V\A) for all A are symmetric 44
• 45. Another example: Cut functions V = {a,b,c,d,e,f,g,h}, edges with weights ws,t F(A) = ∑ {ws,t : s ∈ A, t ∈ V\A} Example: F({a}) = 6; F({c,d}) = 10; F({a,b,c,d}) = 2 Cut function is symmetric and submodular! 45
• 46. Minimizing symmetric functions For any A, symmetry and submodularity imply 2F(A) = F(A) + F(V\A) ≥ F(A ∩ (V\A)) + F(A ∪ (V\A)) = F(∅) + F(V) = 2F(∅) = 0 Hence, any symmetric SF attains its minimum at ∅ In practice, want a nontrivial partition of V into A and V\A, i.e., require that A is neither ∅ nor V Want A* = argmin F(A) s.t. 0 < |A| < n There is an efficient algorithm for doing that!  46
• 47. Queyranne’s algorithm (overview) [Queyranne ’98] Theorem: there is a fully combinatorial, strongly polynomial algorithm for solving A* = argminA F(A) s.t. 0 < |A| < n for symmetric submodular functions Runs in time O(n³) [instead of O(n^8)…] Note: also works for “posimodular” functions: F posimodular ⇔ for all A,B ⊆ V: F(A) + F(B) ≥ F(A\B) + F(B\A) 47
• 48. Gomory Hu trees A tree T is called Gomory-Hu (GH) tree for SF F if for any s,t ∈ V it holds that min {F(A) : s ∈ A, t ∉ A} = min {wi,j : (i,j) is an edge on the s-t path in T} “min s-t cut in T = min s-t cut in G” Theorem [Queyranne ‘93]: GH-trees exist for any symmetric SF F! But expensive to find one in general!  48
• 49. Pendent pairs For function F on V, s,t ∈ V: (s,t) is a pendent pair if {s} ∈ argminA F(A) s.t. s ∈ A, t ∉ A Pendent pairs always exist: take any leaf s of a Gomory-Hu tree and its neighbor t, then (s,t) is pendent! E.g., (a,c), (b,c), (f,e), … Theorem [Queyranne ’95]: can find pendent pairs in O(n²) (without needing a GH-tree!) 49
• 50. Why are pendent pairs useful? Key idea: let (s,t) be pendent, A* = argmin F(A) Then EITHER s and t are separated by A* (e.g., s ∈ A*, t ∉ A*), but then A* = {s}!! OR s and t are not separated by A*, and then we can merge s and t… 50
• 51. Merging Suppose F is a symmetric SF on V, and we want to merge pendent pair (s,t) Key idea: “if we pick s, get t for free” V’ = V\{t} F’(A) = F(A ∪ {t}) if s ∈ A, F’(A) = F(A) if s ∉ A Lemma: F’ is still symmetric and submodular! (In the cut-function example, merging (a,c) contracts nodes a and c into one node and adds up the parallel edge weights) 51
• 52. Queyranne’s algorithm Input: symmetric SF F on V, |V| = n Output: A* = argmin F(A) s.t. 0 < |A| < n
    Initialize F’ ← F and V’ ← V
    For i = 1:n−1
        (s,t) ← pendentPair(F’,V’)
        Ai = {s}
        (F’,V’) ← merge(F’,V’,s,t)
    Return argmini F(Ai)
Running time: O(n³) function evaluations 52
• 53. Note: Finding pendent pairs
    Initialize v1 ← x (x is an arbitrary element of V)
    For i = 1 to n−1 do
        Wi ← {v1,…,vi}
        vi+1 ← argminv F(Wi ∪ {v}) − F({v}) s.t. v ∈ V\Wi
    Return pendent pair (vn−1, vn)
Requires O(n²) evaluations of F 53
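Putting slides 49–53 together, here is a compact Python sketch of the whole algorithm; merged elements are tracked as frozensets of original nodes, and the cut function and graph are our own toy example:

    edges = {("a", "b"): 2, ("a", "c"): 1, ("b", "d"): 3, ("c", "d"): 2}

    def cut(A):
        # weight of edges crossing between A and V \ A
        return sum(w for (s, t), w in edges.items() if (s in A) != (t in A))

    def pendent_pair(F, V):
        # Queyranne '95: (s, t) with {t} minimizing F among sets separating s, t
        W = [V[0]]
        for _ in range(len(V) - 1):
            rest = [v for v in V if v not in W]
            W.append(min(rest, key=lambda u: F(W + [u]) - F([u])))
        return W[-2], W[-1]

    def queyranne(F, nodes):
        V = [frozenset({v}) for v in sorted(nodes)]
        Func = lambda groups: F(set().union(*groups))
        best, best_val = None, float("inf")
        while len(V) > 1:
            s, t = pendent_pair(Func, V)
            if Func([t]) < best_val:                  # candidate A_i = {t}
                best, best_val = set(t), Func([t])
            V = [g for g in V if g not in (s, t)] + [s | t]   # merge (s, t)
        return best, best_val

    print(queyranne(cut, {"a", "b", "c", "d"}))   # ({'c'}, 3): a minimum cut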
• 54. Overview minimization Minimizing general submodular functions Can minimize in polytime using ellipsoid method Combinatorial, strongly polynomial algorithm: O(n^8) Practical alternative: minimum norm algorithm? Minimizing symmetric submodular functions Many useful submodular functions are symmetric Queyranne’s algorithm minimizes symmetric SFs in O(n³) Applications to Machine Learning 54
• 55. Application: Clustering [Narasimhan, Jojic, Bilmes NIPS ’05] Group data points V into “homogeneous clusters” Find a partition V = A1 ∪ … ∪ Ak that minimizes F(A1,…,Ak) = ∑i E(Ai), where E(Ai) is the “inhomogeneity of Ai” Examples for E(A): entropy H(A), cut function Special case: k = 2. Then F(A) = E(A) + E(V\A) is symmetric! If E is submodular, can use Queyranne’s algorithm!  55
• 56. What if we want k > 2 clusters? [Zhao et al ’05, Narasimhan et al ‘05] Greedy Splitting algorithm (a code sketch follows below):
    Start with partition P = {V}
    For i = 1 to k−1
        For each member Cj ∈ P do
            split cluster Cj: A* = argmin E(A) + E(Cj\A) s.t. 0 < |A| < |Cj|
            Pj ← P \ {Cj} ∪ {A, Cj\A} (partition we get by splitting the j-th cluster)
        P ← argminj F(Pj)
Theorem: F(P) ≤ (2 − 2/k) F(Popt) 56
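A brute-force Python sketch of greedy splitting; the inner 2-way split is exhaustive here (Queyranne's algorithm would replace it for large clusters), and E is an illustrative "inhomogeneity" of our own choosing, the diameter of a 1-D point set:

    from itertools import combinations

    def E(A):
        return max(A) - min(A) if A else 0

    def best_split(C):
        C = list(C)
        best = None
        for r in range(1, len(C)):
            for S in combinations(C, r):
                A = set(S); B = set(C) - A
                if best is None or E(A) + E(B) < best[0]:
                    best = (E(A) + E(B), A, B)
        return best

    def greedy_splitting(V, k):
        P = [set(V)]
        for _ in range(k - 1):
            # split the cluster giving the largest decrease in sum_j E(A_j)
            j, (_, A, B) = min(((i, best_split(C)) for i, C in enumerate(P)
                                if len(C) > 1),
                               key=lambda t: t[1][0] - E(P[t[0]]))
            P[j:j + 1] = [A, B]
        return P

    print(greedy_splitting({0, 1, 2, 10, 11, 20}, 3))  # [{0,1,2}, {10,11}, {20}]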
  • 57. Example: Clustering species [Narasimhan et al ‘05] Common genetic information = #of common substrings: Can easily extend to sets of species 57
  • 58. Example: Clustering species [Narasimhan et al ‘05] The common genetic information ICG does not require alignment captures genetic similarity is smallest for maximally evolutionarily diverged species is a symmetric submodular function!  Greedy splitting algorithm yields phylogenetic tree! 58
  • 59. Example: SNPs [Narasimhan et al ‘05] Study human genetic variation (for personalized medicine, …) Most human variation due to point mutations that occur once in human history at that base location: Single Nucleotide Polymorphisms (SNPs) Cataloging all variation too expensive ($10K-$100K per individual!!) 59
  • 60. SNPs in the ACE gene [Narasimhan et al ‘05] Rows: Individuals. Columns: SNPs. Which columns should we pick to reconstruct the rest? Can find near-optimal clustering (Queyranne’s algorithm) 60
• 61. Reconstruction accuracy [Narasimhan et al ‘05] Plot: prediction accuracy vs. # of clusters, comparing the submodular clustering against clustering based on entropy, pairwise correlation, and PCA 61
• 62. Example: Speaker segmentation [Reyes-Gomez, Jojic ‘07] Given a spectrogram (frequency × time) of mixed waveforms from speakers “Alice” and “Fiona”, partition the time-frequency plane into speaker regions E(A) = −log p(XA): likelihood of “region” A F(A) = E(A) + E(V\A) is symmetric & posimodular ⇒ can partition using Queyranne’s algorithm 62
• 64. Example: Image denoising Pairwise Markov Random Field over noisy pixels Xi and “true” pixels Yi P(x1,…,xn,y1,…,yn) = ∏i,j ψi,j(yi,yj) ∏i φi(xi,yi) Want argmaxy P(y | x) = argmaxy log P(x,y) = argminy ∑i,j Ei,j(yi,yj) + ∑i Ei(yi), where Ei,j(yi,yj) = −log ψi,j(yi,yj) When is this MAP inference efficiently solvable (in high treewidth graphical models)? 64
• 65. MAP inference in Markov Random Fields [Kolmogorov et al, PAMI ’04; see also: Hammer, Ops Res ‘65] Energy E(y) = ∑i,j Ei,j(yi,yj) + ∑i Ei(yi) Suppose yi are binary; define F(A) = E(yA) where yAi = 1 iff i ∈ A Then miny E(y) = minA F(A) Theorem: MAP inference problem solvable by graph cuts ⇔ for all i,j: Ei,j(0,0) + Ei,j(1,1) ≤ Ei,j(0,1) + Ei,j(1,0) ⇔ each Ei,j is submodular “Efficient if we prefer that neighboring pixels have the same color” 65
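The regularity condition is a one-line check per edge; a sketch with illustrative numbers:

    def graph_cut_solvable(E_ij):
        # E_ij maps (yi, yj) in {0,1}^2 to an energy value
        return E_ij[0, 0] + E_ij[1, 1] <= E_ij[0, 1] + E_ij[1, 0]

    potts = {(0, 0): 0.0, (1, 1): 0.0, (0, 1): 1.0, (1, 0): 1.0}  # prefers equal labels
    print(graph_cut_solvable(potts))   # True: MAP reduces to min-cut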
• 66. Constrained minimization Have seen: if F submodular on V, can solve A* = argmin F(A) s.t. A ⊆ V What about A* = argmin F(A) s.t. A ⊆ V and |A| ≤ k? E.g., clustering with minimum # points per cluster, … In general, not much known about constrained minimization  However, can do A* = argmin F(A) s.t. 0 < |A| < n A* = argmin F(A) s.t. |A| is odd/even [Goemans & Ramakrishnan ‘95] A* = argmin F(A) s.t. A ∈ argmin G(A) for G submodular [Fujishige ’91] 66
• 67. Overview minimization Minimizing general submodular functions Can minimize in polytime using ellipsoid method Combinatorial, strongly polynomial algorithm: O(n^8) Practical alternative: minimum norm algorithm? Minimizing symmetric submodular functions Many useful submodular functions are symmetric Queyranne’s algorithm minimizes symmetric SFs in O(n³) Applications to Machine Learning Clustering [Narasimhan et al’ 05] Speaker segmentation [Reyes-Gomez & Jojic ’07] MAP inference [Kolmogorov et al ’04] 67
• 68. Tutorial Overview Examples and properties of submodular functions Many problems submodular (mutual information, influence, …) SFs closed under positive linear combinations; not under min, max Submodularity and convexity Every SF induces a convex function with SAME minimum Special properties: Greedy solves LP over exponential polytope Minimizing submodular functions Minimization possible in polynomial time (but O(n^8)…) Queyranne’s algorithm minimizes symmetric SFs in O(n³) Useful for clustering, MAP inference, structure learning, … Maximizing submodular functions Extensions and research directions 68
  • 69. Maximizing submodular functions Carnegie Mellon
• 70. Maximizing submodular functions Minimizing convex functions: polynomial time solvable! Minimizing submodular functions: polynomial time solvable! Maximizing convex functions: NP hard! Maximizing submodular functions: NP hard! But can get approximation guarantees  70
• 71. Maximizing influence [Kempe, Kleinberg, Tardos KDD ’03] F(A) = expected # people influenced when targeting A F monotonic: if A ⊆ B then F(A) ≤ F(B) Hence V = argmaxA F(A) More interesting: argmaxA F(A) − Cost(A) 71
• 72. Maximizing non-monotonic functions Suppose we want A* = argmax F(A) s.t. A ⊆ V for F not monotonic (the maximum can be attained at an interior set size |A|) Example: F(A) = U(A) − C(A), where U(A) is a submodular utility and C(A) a supermodular cost function E.g.: trading off utility and privacy in personalized search [Krause & Horvitz AAAI ’08] In general: NP hard. Moreover: if F(A) can take negative values, it is as hard to approximate as maximum independent set 72
• 73. Maximizing positive submodular functions [Feige, Mirrokni, Vondrak FOCS ’07] Theorem: there is an efficient randomized local search procedure that, given a positive submodular function F with F(∅) = 0, returns a set ALS such that F(ALS) ≥ (2/5) maxA F(A) Picking a random set gives a ¼ approximation (½ approximation if F is symmetric!) We cannot get better than a ¾ approximation unless P = NP 73
• 74. Scalarization vs. constrained maximization Given monotonic utility F(A) and cost C(A), optimize: Option 1: maxA F(A) − C(A) s.t. A ⊆ V (“scalarization”) Option 2: maxA F(A) s.t. C(A) ≤ B (“constrained maximization”) For Option 1, can get 2/5 approx if F(A) − C(A) ≥ 0 for all A ⊆ V; positiveness is a strong requirement  Option 2: coming up… 74
• 75. Constrained maximization: Outline maxA F(A) s.t. C(A) ≤ B (monotonic submodular utility F, selected set A, selection cost C, budget B) Subset selection: C(A) = |A| Robust optimization Complex constraints 75
• 76. Monotonicity A set function is called monotonic if A ⊆ B ⊆ V ⇒ F(A) ≤ F(B) Examples: Influence in social networks [Kempe et al KDD ’03] For discrete RVs, entropy F(A) = H(XA) is monotonic: suppose B = A ∪ C; then F(B) = H(XA, XC) = H(XA) + H(XC | XA) ≥ H(XA) = F(A) Information gain: F(A) = H(Y) − H(Y | XA) Set cover Matroid rank functions (dimension of vector spaces, …) … 76
• 77. Subset selection Given: finite set V, monotonic submodular function F, F(∅) = 0 Want: A* = argmax F(A) s.t. |A| ≤ k NP-hard! 77
• 78. Exact maximization of monotonic submodular functions 1) Mixed integer programming [Nemhauser et al ’81]: max η s.t. η ≤ F(B) + ∑_{s∈V\B} αs δs(B) for all B ⊆ V, ∑s αs ≤ k, αs ∈ {0,1}, where δs(B) = F(B ∪ {s}) − F(B) Solved using constraint generation 2) Branch-and-bound: “data-correcting algorithm” [Goldengorin et al ’99] Both algorithms worst-case exponential! 78
• 79. Approximate maximization Given: finite set V, monotonic submodular function F(A) Want: A* = argmax F(A) s.t. |A| ≤ k (NP-hard!) Greedy algorithm (a Python sketch follows):
    Start with A0 = ∅
    For i = 1 to k
        si := argmaxs F(Ai−1 ∪ {s}) − F(Ai−1)
        Ai := Ai−1 ∪ {si}
79
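A generic sketch, where F is any monotonic submodular function given as a callable and the coverage instance is made up:

    def greedy(F, V, k):
        A = set()
        for _ in range(k):
            s = max(V - A, key=lambda e: F(A | {e}) - F(A))   # best marginal gain
            A.add(s)
        return A

    S = {1: {"u", "v"}, 2: {"v", "w"}, 3: {"x"}}
    F = lambda A: len(set().union(*(S[i] for i in A)))
    print(greedy(F, set(S), 2))   # covers 3 of the 4 elements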
• 80. Performance of greedy algorithm Theorem [Nemhauser et al ‘78]: given a monotonic submodular function F with F(∅) = 0, the greedy maximization algorithm returns Agreedy with F(Agreedy) ≥ (1 − 1/e) max_{|A|≤k} F(A), i.e., ~63% of optimal Sidenote: greedy algorithm gives a ½ approximation for maximization over any matroid C! [Fisher et al ’78] 80
• 81. An “elementary” counterexample X1, X2 ~ Bernoulli(0.5), Y = X1 XOR X2 Let F(A) = IG(XA; Y) = H(Y) − H(Y | XA) Y | X1 and Y | X2 ~ Bernoulli(0.5) (entropy 1), but Y | X1,X2 is deterministic! (entropy 0) Hence F({1,2}) − F({1}) = 1 > F({2}) − F(∅) = 0: not submodular! F(A) submodular under some conditions! (later) 81
• 82. Example: Submodularity of info-gain Y1,…,Ym, X1,…,Xn discrete RVs F(A) = IG(Y; XA) = H(Y) − H(Y | XA) F(A) is always monotonic However, NOT always submodular Theorem [Krause & Guestrin UAI’ 05]: if the Xi are all conditionally independent given Y, then F(A) is submodular! Hence, greedy algorithm works! In fact, NO algorithm can do better than a (1 − 1/e) approximation! 82
• 83. Building a Sensing Chair [Mutlu, Krause, Forlizzi, Guestrin, Hodgins UIST ‘07] People sit a lot Activity recognition in assistive technologies Seating pressure as user interface Equipped with 1 sensor per cm²: costs $16,000!  82% accuracy on 10 postures (lean left, lean forward, slouch, …) [Tan et al] Can we get similar accuracy with fewer, cheaper sensors? 83
• 84. How to place sensors on a chair? Sensor readings at locations V as random variables Predict posture Y using probabilistic model P(Y,V) Pick sensor locations A* ⊆ V to minimize entropy Placed sensors, did a user study: before: 82% accuracy at $16,000 ; after: 79% accuracy at $100  Similar accuracy at <1% of the cost! 84
• 85. Variance reduction (a.k.a. orthogonal matching pursuit, forward regression) Let Y = ∑i αi Xi + ε, with (X1,…,Xn,ε) ~ N(·; μ,Σ) Want to pick subset XA to predict Y Var(Y | XA = xA): conditional variance of Y given XA = xA Expected variance: Var(Y | XA) = ∫ p(xA) Var(Y | XA = xA) dxA Variance reduction: FV(A) = Var(Y) − Var(Y | XA) FV(A) is always monotonic Theorem [Das & Kempe, STOC ’08]: FV(A) is submodular* (*under some conditions on Σ) ⇒ orthogonal matching pursuit near optimal! [see other analyses by Tropp, Donoho et al., and Temlyakov] 85
• 86. Batch mode active learning [Hoi et al, ICML’06] Which data points should we label to minimize error? Want batch A of k points to show an expert for labeling F(A) selects examples that are uncertain [σ²(s) = π(s)(1 − π(s)) is large] diverse (points in A are as different as possible) relevant (as close to V\A as possible, sᵀs’ large) F(A) is submodular and monotonic! [approximation to improvement in Fisher information] 86
  • 87. Results about Active Learning [Hoi et al, ICML’06] Batch mode Active Learning performs better than Picking k points at random Picking k points of highest entropy 87
• 88. Monitoring water networks [Krause et al, J Wat Res Mgt 2008] Contamination of drinking water could affect millions of people Place sensors to detect contaminations (water flow simulator from EPA; Hach sensors at ~$14K each) “Battle of the Water Sensor Networks” competition Where should we place sensors to quickly detect contamination? 88
• 89. Model-based sensing Utility of placing sensors based on model of the world For water networks: water flow simulator from EPA F(A) = expected impact reduction from placing sensors at A (a sensor reduces impact through early detection; the model predicts low/medium/high impact per contamination location) E.g., high impact reduction F(A) = 0.9; low impact reduction F(A) = 0.01 Theorem [Krause et al., J Wat Res Mgt ’08]: impact reduction F(A) in water networks is submodular! 89
  • 90. Battle of the Water Sensor Networks Competition Real metropolitan area network (12,527 nodes) Water flow simulator provided by EPA 3.6 million contamination events Multiple objectives: Detection time, affected population, … Place sensors that detect well “on average” 90
• 91. Bounds on optimal solution [Krause et al., J Wat Res Mgt ’08] Plot (water networks data): population protected F(A) vs. number of sensors placed, for the greedy solution against the offline (Nemhauser) bound The (1 − 1/e) bound is quite loose… can we get better bounds? 91
• 92. Data dependent bounds [Minoux ’78] Suppose A is a candidate solution to argmax F(A) s.t. |A| ≤ k, and A* = {s1,…,sk} an optimal solution Then F(A*) ≤ F(A ∪ A*) = F(A) + ∑i [F(A ∪ {s1,…,si}) − F(A ∪ {s1,…,si−1})] ≤ F(A) + ∑i [F(A ∪ {si}) − F(A)] = F(A) + ∑i δsi For each s ∈ V\A, let δs = F(A ∪ {s}) − F(A); order so that δ1 ≥ δ2 ≥ … ≥ δn; then F(A*) ≤ F(A) + ∑_{i=1}^k δi 92
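In code, the bound is one sort away; a sketch on a toy coverage instance (names ours):

    def minoux_bound(F, V, A, k):
        # F(A*) <= F(A) + sum of the k largest marginal gains delta_s
        deltas = sorted((F(A | {s}) - F(A) for s in V - A), reverse=True)
        return F(A) + sum(deltas[:k])

    S = {1: {"u", "v"}, 2: {"v", "w"}, 3: {"x"}, 4: {"u"}}
    F = lambda A: len(set().union(*(S[i] for i in A)))
    print(F({1}), minoux_bound(F, set(S), {1}, k=2))   # 2, 4: so OPT_2 <= 4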
• 93. Bounds on optimal solution [Krause et al., J Wat Res Mgt ’08] Plot (water networks data): sensing quality F(A) vs. number of sensors placed; the data-dependent bound on the greedy solution is much tighter than the offline (Nemhauser) bound Submodularity gives data-dependent bounds on the performance of any algorithm 93
• 94. BWSN Competition results [Ostfeld et al., J Wat Res Mgt 2008] 13 participants; performance measured in 30 different criteria Entrants used genetic algorithms (G), domain knowledge (D), other heuristics (H), and “exact” methods / MIP (E) Total score: 24% better performance than runner-up!  94
• 95. What was the trick? Simulated all 3.6M contaminations on 2 weeks / 40 processors 152 GB data on disk, 16 GB in main memory (compressed) ⇒ very accurate computation of F(A), but very slow evaluation of F(A)  Exhaustive search (all subsets): 6 weeks for all 30 settings ; naive greedy: 30 hours / 20 sensors Submodularity to the rescue 95
• 96. Scaling up greedy algorithm [Minoux ’78] In round i+1, have picked Ai = {s1,…,si}; pick si+1 = argmaxs F(Ai ∪ {s}) − F(Ai) I.e., maximize “marginal benefit” δs(Ai) = F(Ai ∪ {s}) − F(Ai) Key observation: submodularity implies i ≤ j ⇒ δs(Ai) ≥ δs(Aj) Marginal benefits can never increase! 96
• 97. “Lazy” greedy algorithm [Minoux ’78] Lazy greedy algorithm: First iteration as usual Keep an ordered list of marginal benefits δi from the previous iteration Re-evaluate δi only for the top element If δi stays on top, use it; otherwise re-sort Note: very easy to compute online bounds, lazy evaluations, etc. [Leskovec et al. ’07] 97
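A heap-based sketch of lazy greedy; submodularity guarantees that a re-evaluated element still on top of the heap is the true argmax:

    import heapq

    def lazy_greedy(F, V, k):
        A, fA = set(), F(set())
        heap = [(-(F({s}) - fA), s) for s in V]    # negate: heapq is a min-heap
        heapq.heapify(heap)
        for _ in range(k):
            while True:
                _, s = heapq.heappop(heap)
                gain = F(A | {s}) - fA             # re-evaluate the top element
                if not heap or gain >= -heap[0][0] - 1e-12:
                    A.add(s); fA += gain           # still on top: take it
                    break
                heapq.heappush(heap, (-gain, s))   # stale: reinsert and retry
        return A

    S = {1: {"u", "v"}, 2: {"v", "w"}, 3: {"x"}}
    F = lambda A: len(set().union(*(S[i] for i in A)))
    print(lazy_greedy(F, set(S), 2))   # same solution as plain greedy, fewer F calls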
• 98. Result of lazy evaluation Simulated all 3.6M contaminations on 2 weeks / 40 processors 152 GB data on disk, 16 GB in main memory (compressed) ⇒ very accurate computation of F(A), but very slow evaluation of F(A)  Exhaustive search (all subsets): 6 weeks for all 30 settings; naive greedy: 30 hours / 20 sensors Fast greedy using “lazy evaluations”: 1 hour / 20 sensors; done after 2 days!  98
• 99. What about worst-case? [Krause et al., NIPS ’07] Knowing the sensor locations, an adversary contaminates here! A placement that detects well on “average-case” (accidental) contamination can have very different average-case impact but the same worst-case impact as another placement Where should we place sensors to quickly detect in the worst case? 99
• 100. Constrained maximization: Outline maxA F(A) s.t. C(A) ≤ B (utility function F, selected set A, selection cost C, budget B) Subset selection Robust optimization Complex constraints 100
• 101. Optimizing for the worst case Separate utility function Fi for each contamination i: Fi(A) = impact reduction by sensors A for contamination i E.g., for a contamination at node s, Fs(A) may be high while Fr(A) for a contamination at node r is low Want to solve A* = argmaxA mini Fi(A) Each of the Fi is submodular; unfortunately, mini Fi is not submodular! How can we solve this robust optimization problem? 101
• 102. How does the greedy algorithm do? V = {a, b, c}, can only buy k = 2
    Set A   F1   F2   mini Fi
    {a}     1    0    0
    {b}     0    2    0
    {c}     ε    ε    ε
    {a,c}   1    ε    ε
    {b,c}   ε    2    ε
    {a,b}   2    1    1
Greedy picks c first; then it can choose only {a,c} or {b,c} Optimal score: 1; greedy score: ε ⇒ greedy does arbitrarily badly. Is there something better? Theorem [NIPS ’07]: the problem max_{|A|≤k} mini Fi(A) does not admit any approximation unless P = NP 102
• 103. Alternative formulation If somebody told us the optimal value c, could we recover the optimal solution A*? Need to find A* with mini Fi(A*) ≥ c and |A*| ≤ k Is this any easier? Yes, if we relax the constraint |A| ≤ k 103
• 104. Solving the alternative problem Trick: for each Fi and c, define the truncation F’i,c(A) = min{Fi(A), c}, which remains submodular! Problem 1 (last slide): max_{|A|≤k} mini Fi(A): non-submodular , don’t know how to solve Problem 2: min |A| s.t. (1/m) ∑i F’i,c(A) ≥ c: submodular! For the right c, both problems have the same optimal solutions, so solving one solves the other But F appears as a constraint? 104
• 105. Maximization vs. coverage Previously: wanted A* = argmax F(A) s.t. |A| ≤ k Now need to solve: A* = argmin |A| s.t. F(A) ≥ Q Greedy algorithm:
    Start with A := ∅
    While F(A) < Q and |A| < n
        s* := argmaxs F(A ∪ {s})
        A := A ∪ {s*}
(For the bound, assume F is integral; if not, just round it.) Theorem [Wolsey et al]: greedy will return Agreedy with |Agreedy| ≤ (1 + log maxs F({s})) |Aopt| 105
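The coverage variant in Python (a sketch; Q is the required quality, and the instance is made up):

    def greedy_coverage(F, V, Q):
        A = set()
        while F(A) < Q and len(A) < len(V):
            A.add(max(V - A, key=lambda e: F(A | {e}) - F(A)))
        return A

    S = {1: {"u", "v"}, 2: {"v", "w"}, 3: {"x"}}
    F = lambda A: len(set().union(*(S[i] for i in A)))
    print(greedy_coverage(F, set(S), Q=4))   # {1, 2, 3}: all sets needed to cover 4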
• 106. Solving the alternative problem Trick: for each Fi and c, define the truncation F’i,c(A) = min{Fi(A), c} Problem 1 (last slide): non-submodular , don’t know how to solve Problem 2: submodular, so we can use the greedy (coverage) algorithm! 106
• 107. Back to our example Guess c = 1
    Set A   F1   F2   mini Fi   F’avg,1
    {a}     1    0    0         ½
    {b}     0    2    0         ½
    {c}     ε    ε    ε         ε
    {a,c}   1    ε    ε         (1+ε)/2
    {b,c}   ε    2    ε         (1+ε)/2
    {a,b}   2    1    1         1
Greedy first picks a, then picks b ⇒ optimal solution! How do we find c? Do binary search! 107
• 108. SATURATE Algorithm [Krause et al, NIPS ‘07] Given: set V, integer k and monotonic SFs F1,…,Fm
    Initialize cmin = 0, cmax = mini Fi(V)
    Do binary search: c = (cmin + cmax)/2
        Greedily find AG such that F’avg,c(AG) = c
        If |AG| ≤ αk: increase cmin
        If |AG| > αk: decrease cmax
    until convergence
108
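A compact sketch of SATURATE; α = 1 here purely for illustration (the guarantee on the next slide uses the larger α), and the two modular objectives are made up:

    def saturate(Fs, V, k, alpha=1.0, iters=30):
        best = set(V)
        cmin, cmax = 0.0, min(F(set(V)) for F in Fs)
        for _ in range(iters):
            c = (cmin + cmax) / 2
            Favg = lambda A: sum(min(F(A), c) for F in Fs) / len(Fs)  # truncated mean
            A = set()
            while Favg(A) < c - 1e-9 and len(A) < len(V):             # coverage greedy
                A.add(max(V - A, key=lambda e: Favg(A | {e}) - Favg(A)))
            if len(A) <= alpha * k:
                best, cmin = A, c        # feasible: try a larger c
            else:
                cmax = c                 # infeasible: lower c
        return best

    F1 = lambda A: sum({1: 1.0, 2: 0.0, 3: 0.5}[i] for i in A)
    F2 = lambda A: sum({1: 0.0, 2: 1.0, 3: 0.5}[i] for i in A)
    print(saturate([F1, F2], {1, 2, 3}, k=1))   # {3}: best single worst-case choice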
• 109. Theoretical guarantees [Krause et al, NIPS ‘07] Theorem: the problem max_{|A|≤k} mini Fi(A) does not admit any approximation unless P = NP  Theorem: SATURATE finds a solution AS such that mini Fi(AS) ≥ OPTk and |AS| ≤ αk, where OPTk = max_{|A|≤k} mini Fi(A) and α = 1 + log maxs ∑i Fi({s}) Theorem: if there were a polytime algorithm with better factor β < α, then NP ⊆ DTIME(n^{log log n}) 109
• 110. Example: Lake monitoring Monitor pH values using a robotic sensor transect Given observations A, predict pH at unobserved locations Use probabilistic model (Gaussian processes) to estimate the prediction error Var(s | A) at each position s along the transect Where should we sense to minimize our maximum error? Var(s | A) is (often) submodular [Das & Kempe ’08] ⇒ robust submodular optimization problem! 110
• 111. Comparison with state of the art Algorithm used in geostatistics: simulated annealing [Sacks & Schiller ’88, van Groeningen & Stein ’98, Wiens ’05, …], with 7 parameters that need to be fine-tuned Plots (maximum marginal variance vs. number of sensors, on environmental monitoring and precipitation data): SATURATE beats greedy and matches simulated annealing SATURATE is competitive & 10x faster, with no parameters to tune! 111
• 112. Results on water networks Plot (water networks data): maximum detection time (minutes) vs. number of sensors Greedy and simulated annealing show no decrease until all contaminations are detected; SATURATE achieves 60% lower worst-case detection time! 112
• 113. Worst- vs. average case Given: set V, submodular functions F1,…,Fm Average-case score Fac(A) = (1/m) ∑i Fi(A): too optimistic? Worst-case score Fwc(A) = mini Fi(A): very pessimistic! Want to optimize both average- and worst-case score! Want: Fac(A) ≥ cac and Fwc(A) ≥ cwc Truncate: min{Fac(A), cac} + min{Fwc(A), cwc} ≥ cac + cwc Can modify SATURATE to solve this problem!  113
• 114. Worst- vs. average case Plot (water networks data): worst-case impact vs. average-case impact (lower is better for both) Optimizing only for the average case or only for the worst case is poor on the other objective; SATURATE traces the tradeoff curve, which has a knee Can find a good compromise between average- and worst-case score! 114
• 115. Constrained maximization: Outline maxA F(A) s.t. C(A) ≤ B (utility function F, selected set A, selection cost C, budget B) Subset selection Robust optimization Complex constraints 115
• 116. Other aspects: Complex constraints maxA F(A) or maxA mini Fi(A) subject to: so far, |A| ≤ k In practice, more complex constraints: Different costs: C(A) ≤ B Locations need to be connected by paths [Chekuri & Pal, FOCS ’05] (lake monitoring) Sensors need to communicate (form a routing tree) [Singh et al, IJCAI ’07] (building monitoring) 116
• 117. Non-constant cost functions For each s ∈ V, let c(s) > 0 be its cost (e.g., feature acquisition costs, …) Cost of a set: C(A) = ∑_{s∈A} c(s) (a modular function!) Want to solve A* = argmax F(A) s.t. C(A) ≤ B Cost-benefit greedy algorithm:
    Start with A := ∅
    While there is an s ∈ V\A s.t. C(A ∪ {s}) ≤ B
        s* := argmaxs [F(A ∪ {s}) − F(A)] / c(s) over such affordable s
        A := A ∪ {s*}
117
• 118. Performance of cost-benefit greedy Want maxA F(A) s.t. C(A) ≤ 1
    Set A   F(A)   C(A)
    {a}     2ε     ε
    {b}     1      1
Cost-benefit greedy picks a (benefit/cost ratio 2 vs. 1). Then it cannot afford b!  Cost-benefit greedy performs arbitrarily badly! 118
• 119. Cost-benefit optimization [Wolsey ’82, Sviridenko ’04, Leskovec et al ’07] Theorem [Leskovec et al. KDD ‘07]: let ACB be the cost-benefit greedy solution and AUC the unit-cost greedy solution (i.e., ignore costs); then max {F(ACB), F(AUC)} ≥ ½ (1 − 1/e) OPT Can still compute online bounds and speed up using lazy evaluations Note: can also get a (1 − 1/e) approximation in time O(n⁴) [Sviridenko ’04], and slightly better than ½ (1 − 1/e) in O(n²) [Wolsey ‘82] 119
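A sketch of this max-of-two-greedys trick, run on the bad example from the previous slide with ε = 0.01:

    def budgeted_greedy(F, cost, V, B, use_ratio):
        A, spent = set(), 0.0
        while True:
            cand = [s for s in V - A if spent + cost[s] <= B]
            if not cand:
                return A
            s = max(cand, key=lambda e: (F(A | {e}) - F(A)) /
                                        (cost[e] if use_ratio else 1.0))
            A.add(s); spent += cost[s]

    def cef(F, cost, V, B):
        A_cb = budgeted_greedy(F, cost, V, B, use_ratio=True)    # cost-benefit
        A_uc = budgeted_greedy(F, cost, V, B, use_ratio=False)   # unit-cost
        return max((A_cb, A_uc), key=F)

    cost = {"a": 0.01, "b": 1.0}
    F = lambda A: 0.02 * ("a" in A) + 1.0 * ("b" in A)
    print(cef(F, cost, {"a", "b"}, B=1.0))   # {'b'}: unit-cost greedy saves the day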
  • 120. Example: Cascades in the Blogosphere [Leskovec, Krause, Guestrin, Faloutsos, VanBriesen, Glance ‘07] Time Learn about story after us! Information cascade Which blogs should we read to learn about big cascades early? 120
  • 121. Water vs. Web Placing sensors in Selecting water networks vs. informative blogs In both problems we are given Graph with nodes (junctions / blogs) and edges (pipes / links) Cascades spreading dynamically over the graph (contamination / citations) Want to pick nodes to detect big cascades early In both applications, utility functions submodular  [Generalizes Kempe et al, KDD ’03] 121
• 122. Performance on Blog selection Blog selection on ~45k blogs: cascades captured vs. number of blogs; greedy outperforms state-of-the-art heuristics (in-links, all outlinks, # posts, random) Running time: fast (lazy) greedy gives a 700x speedup over naive greedy using submodularity! (exhaustive search over all subsets is hopeless) 122
• 123. Cost of reading a blog Naïve approach: just pick the 10 best blogs Selects big, well-known blogs (Instapundit, etc.) These contain many posts, take long to read! Plot: cascades captured vs. Cost(A) = number of posts / day; cost/benefit analysis beats ignoring cost Cost-benefit optimization picks summarizer blogs! 123
• 124. Predicting the “hot” blogs Want blogs that will be informative in the future Split data set; train on historic, test on future Plot (cascades captured vs. Cost(A) = number of posts / day): greedy on future, tested on future (“cheating”) does well; greedy on historic, tested on future generalizes poorly Blog selection “overfits” the training data: it detects well on the training period, then poorly afterwards Want blogs that continue to do well! Why’s that? Let’s see what goes wrong here. 124
• 125. Robust optimization Let Fi(A) = detections in time interval i (e.g., F1(A) = .5, F2(A) = .8, F3(A) = .6, F4(A) = .01, F5(A) = .02) Greedy gives an “overfit” blog selection A whose detections concentrate early; optimizing the worst case over the intervals using SATURATE gives a “robust” blog selection A* whose detections are spread across all months Robust optimization ≈ regularization! 125
• 126. Predicting the “hot” blogs Plot (cascades captured vs. Cost(A) = number of posts / day): the robust solution tested on the future nearly matches greedy-on-future (“cheating”), while greedy on historic lags behind 50% better generalization! 126
• 127. Other aspects: Complex constraints maxA F(A) or maxA mini Fi(A) subject to: so far, |A| ≤ k In practice, more complex constraints: Different costs: C(A) ≤ B Locations need to be connected by paths [Chekuri & Pal, FOCS ’05] (lake monitoring) Sensors need to communicate (form a routing tree) [Singh et al, IJCAI ’07] (building monitoring) 127
• 128. Naïve approach: Greedy-connect Simple heuristic: greedily optimize submodular utility function F(A), then add nodes to minimize communication cost C(A) On the example floorplan this fails: the most informative location (F(A) = 4) allows no communication (C(A) = 10); a relay node is not very informative (F(A) = 0.2, C(A) = 1); the second most informative location (F(A) = 3.5, C(A) = 2) allows efficient communication Communication cost = expected # of transmission trials (learned using Gaussian processes) Want to find optimal tradeoff between information and communication cost 128
• 129. The pSPIEL Algorithm [Krause, Guestrin, Gupta, Kleinberg IPSN 2006] pSPIEL: efficient nonmyopic algorithm (padded Sensor Placements at Informative and cost-Effective Locations) Decompose sensing region into small, well-separated clusters Solve cardinality-constrained problem per cluster (greedy) Combine solutions using k-MST algorithm 129

Editor's notes

  1. Get p+1 approximation for C = intersection of p matroids; 1-1/e whp for any matroid using continuous greedy + pipage rounding
  2. Random placement: 53%; Uniform: 73% Optimized: 81%