MTH702 Optimization
Nonlinear Optimization
Convex Set
Convex Functions
Convex Sets
A set C is convex if the line segment between any two points of C lies in C, i.e., if for any
x, y ∈ C and any λ with 0 ≤ λ ≤ 1, we have
λx + (1 − λ)y ∈ C.
Figure 2.2 (from S. Boyd, L. Vandenberghe): Some simple convex and nonconvex sets. Left: the hexagon, which includes its boundary, is convex. Middle: the kidney-shaped set is not convex.
Q: Which sets are convex?
Left: Convex.
Middle: Not convex, since the line segment between two of its points is not contained in the set.
Right: Not convex, since some, but not all, boundary points are contained in the set.
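The definition can be probed numerically. The sketch below is only an illustration, not part of the original slides: the sets (a disk and a ring) and the helper names `in_disk`, `in_annulus`, `find_violation`, and `sample_square` are assumptions chosen for the example. It samples pairs of points in a set and looks for a convex combination that leaves the set.

```python
import random

def in_disk(p, radius=1.0):
    """Membership test for the closed disk of the given radius (a convex set)."""
    return p[0] ** 2 + p[1] ** 2 <= radius ** 2

def in_annulus(p, inner=0.5, outer=1.0):
    """Membership test for a ring-shaped set (not convex: it has a hole)."""
    r2 = p[0] ** 2 + p[1] ** 2
    return inner ** 2 <= r2 <= outer ** 2

def sample_square(rng):
    """Draw a point uniformly from the square [-1, 1]^2."""
    return (rng.uniform(-1, 1), rng.uniform(-1, 1))

def find_violation(contains, sampler, trials=10000, seed=0):
    """Search for x, y in the set and lam in [0, 1] with lam*x + (1-lam)*y outside.

    Returns a witness (x, y, lam) or None. Finding no witness is only
    evidence of convexity, not a proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = sampler(rng), sampler(rng)
        if not (contains(x) and contains(y)):
            continue
        lam = rng.random()
        z = (lam * x[0] + (1 - lam) * y[0], lam * x[1] + (1 - lam) * y[1])
        if not contains(z):
            return x, y, lam
    return None

disk_witness = find_violation(in_disk, sample_square)        # expected: no witness
ring_witness = find_violation(in_annulus, sample_square)     # expected: a witness
```

A returned witness certifies non-convexity, while `None` only means that no counterexample was found among the sampled points.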
Properties of Convex Sets
Intersections of convex sets are convex
Observation 1.2. Let $C_i$, $i \in I$, be convex sets, where $I$ is a (possibly infinite) index set. Then $C = \bigcap_{i \in I} C_i$ is a convex set.
(later) Projections onto convex sets are unique, and often efficient to compute:
$$P_C(x') := \operatorname{argmin}_{y \in C} \|y - x'\|$$
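For some simple convex sets the projection has a closed form. The following sketch is an illustration under assumptions not made in the slides: the sets are a Euclidean ball centered at the origin and an axis-aligned box, and the function names `project_ball` and `project_box` are hypothetical.

```python
import math

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball {y : ||y|| <= radius}.

    Points inside the ball are returned unchanged; points outside are
    rescaled onto the boundary.
    """
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [radius * v / norm for v in x]

def project_box(x, lo, hi):
    """Euclidean projection onto the box {y : lo[i] <= y[i] <= hi[i]}.

    The box is a product of intervals, so each coordinate is clipped
    independently.
    """
    return [min(max(v, l), h) for v, l, h in zip(x, lo, hi)]

p_ball = project_ball([3.0, 4.0])                                   # rescaled to norm 1
p_box = project_box([2.0, -3.0, 0.5], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

Both projections cost time linear in the dimension, which is one sense in which projections can be "efficient to compute".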
Q: Is "convexity" required to have a unique projection?
Convex Functions
Definition
A function $f : \mathbb{R}^d \to \mathbb{R}$ is convex if (i) dom(f) is a convex set and (ii) for all x, y ∈ dom(f) and λ with 0 ≤ λ ≤ 1, we have
f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y).
Figure 3.1 (from S. Boyd, L. Vandenberghe): Graph of a convex function. The chord (i.e., line segment) between any two points on the graph lies above the graph.
Geometrically: The line segment between (x, f(x)) and (y, f(y)) lies above the graph of f.
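The chord condition can be checked on a grid. This is a minimal sketch, not from the slides: the test functions (x² and sin on [0, π]) and the helper names `convexity_gap` and `min_gap_on_grid` are assumptions made for illustration.

```python
import math

def convexity_gap(f, x, y, lam):
    """Return lam*f(x) + (1-lam)*f(y) - f(lam*x + (1-lam)*y).

    For a convex f this chord-minus-graph gap is nonnegative.
    """
    return lam * f(x) + (1 - lam) * f(y) - f(lam * x + (1 - lam) * y)

def min_gap_on_grid(f, lo, hi, n=41, m=11):
    """Smallest gap over a grid of points x, y in [lo, hi] and lam in [0, 1]."""
    pts = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    lams = [j / (m - 1) for j in range(m)]
    return min(convexity_gap(f, x, y, lam)
               for x in pts for y in pts for lam in lams)

# x^2 is convex: the gap is never negative (up to rounding error).
square_gap = min_gap_on_grid(lambda x: x * x, -2.0, 2.0)
# sin is concave on [0, pi]: the chord lies below the graph, so the gap is negative.
sine_gap = min_gap_on_grid(math.sin, 0.0, math.pi)
```

A negative gap disproves convexity on the interval; a nonnegative gap on a finite grid is consistent with convexity but does not prove it.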
Convex Functions & Sets
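The graph of a function $f : \mathbb{R}^d \to \mathbb{R}$ is defined as
{(x, f(x)) | x ∈ dom(f)}.
The epigraph of a function $f : \mathbb{R}^d \to \mathbb{R}$ is defined as
$\operatorname{epi}(f) := \{(x, \alpha) \in \mathbb{R}^{d+1} \mid x \in \operatorname{dom}(f),\ \alpha \ge f(x)\}$.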
Observation 1.4. A function is convex iff its epigraph is a convex set.
[Figure: the graph of f and its epigraph epi(f).]
Sublevel Sets
Let $f : \mathbb{R}^d \to \mathbb{R}$.
Define: $f^{\le \alpha} = \{x \in \operatorname{dom}(f) \mid f(x) \le \alpha\}$
Lemma
If f is convex, then $f^{\le \alpha}$ is convex for any α.
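Q: Does the converse (⇐) hold as well?
A: No! Consider $f(x) = \|x\|^p$ for p ∈ (0, 1): every nonempty sublevel set is a norm ball and hence convex, but f is not convex (for example $f(x) = \sqrt{|x|}$ on $\mathbb{R}$).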
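The counterexample can be made concrete in one dimension. The sketch below assumes the reading f(x) = |x|^p with p = 1/2, which is an illustrative choice; the helper name `sublevel_is_interval` and the grid are likewise hypothetical.

```python
import math

def f(x):
    """f(x) = |x|**0.5, the p = 1/2 case in one dimension."""
    return math.sqrt(abs(x))

def sublevel_is_interval(alpha, grid):
    """Check on a sorted grid that {x : f(x) <= alpha} has no gaps."""
    inside = [f(x) <= alpha for x in grid]
    if True not in inside:
        return True  # the empty set is convex
    first = inside.index(True)
    last = len(inside) - 1 - inside[::-1].index(True)
    return all(inside[first:last + 1])

grid = [i / 100 for i in range(-400, 401)]
all_sublevels_convex = all(sublevel_is_interval(a, grid) for a in (0.5, 1.0, 1.5))

# Convexity fails at x = 0, y = 1, lam = 1/2:
lhs = f(0.5 * 0.0 + 0.5 * 1.0)      # f(1/2) = sqrt(1/2), about 0.707
rhs = 0.5 * f(0.0) + 0.5 * f(1.0)   # 1/2
violates_convexity = lhs > rhs
```

The sublevel sets are the intervals [−α², α²], yet the midpoint of the chord from (0, 0) to (1, 1) lies below the graph.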
Quasiconvex Functions
Definition
Function f is quasiconvex (or unimodal) if its domain and all its sublevel sets are convex.
Source: Convex Optimization, S. Boyd and L. Vandenberghe
Convex Functions
Examples of convex functions
Linear functions: $f(x) = a^\top x$
Affine functions: $f(x) = a^\top x + b$
Exponential: $f(x) = e^{\alpha x}$
Norms. Every norm on $\mathbb{R}^d$ is convex.
Convexity of a norm f(x). Q: How can we show it?
By the triangle inequality f(x + y) ≤ f(x) + f(y) and homogeneity of a norm, f(ax) = |a| f(x) for any scalar a:
f(λx + (1 − λ)y) ≤ f(λx) + f((1 − λ)y) = λf(x) + (1 − λ)f(y).
We used the triangle inequality for the inequality and homogeneity for the equality (note that λ ≥ 0 and 1 − λ ≥ 0, so the absolute values can be dropped).
Jensen’s inequality
Lemma (Jensen’s inequality)
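Let f be convex, $x_1, \ldots, x_m \in \operatorname{dom}(f)$, and $\lambda_1, \ldots, \lambda_m \in \mathbb{R}_+$ such that $\sum_{i=1}^m \lambda_i = 1$. Then
$$f\left(\sum_{i=1}^m \lambda_i x_i\right) \le \sum_{i=1}^m \lambda_i f(x_i).$$
For m = 2, this is convexity.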
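Jensen's inequality is easy to test numerically for m > 2. In this sketch the functions (exp and the absolute value), the random points, and the helper name `jensen_gap` are assumptions chosen for illustration only.

```python
import math
import random

def jensen_gap(f, xs, weights):
    """Return sum_i w_i f(x_i) - f(sum_i w_i x_i); nonnegative for convex f."""
    mean = sum(w * x for w, x in zip(weights, xs))
    return sum(w * f(x) for w, x in zip(weights, xs)) - f(mean)

rng = random.Random(0)
xs = [rng.uniform(-3, 3) for _ in range(5)]
raw = [rng.random() for _ in range(5)]
weights = [r / sum(raw) for r in raw]   # nonnegative weights summing to 1

gap_exp = jensen_gap(math.exp, xs, weights)   # exp is convex
gap_abs = jensen_gap(abs, xs, weights)        # |x| is convex (it is a norm)
```

Both gaps come out nonnegative up to rounding error, as the lemma predicts for any choice of points and weights.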
Convex Functions are Continuous
Lemma 1.6.: Let f be convex and suppose that dom(f) is convex and open. Then f is
continuous.
Not entirely obvious (try to think about the proof - feel free to send it to me).
Differentiable Functions
The graph of the affine function $y \mapsto f(x) + \nabla f(x)^\top (y - x)$ is a tangent hyperplane to the graph of f at (x, f(x)).
[Figure: graph of f(y) together with the tangent hyperplane $f(x) + \nabla f(x)^\top (y - x)$.]
First-order characterization of convexity
Lemma ([BV04, 3.1.3])
Suppose that dom(f) is open and that f is differentiable; in particular, the gradient (vector of partial derivatives)
$$\nabla f(x) := \left(\frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_d}\right)$$
exists at every point x ∈ dom(f). Then f is convex if and only if dom(f) is convex and
$$f(y) \ge f(x) + \nabla f(x)^\top (y - x) \qquad (1)$$
holds for all x, y ∈ dom(f).
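Inequality (1) can be checked on sample points for a concrete function. The quadratic below, its hand-computed gradient, and the helper name `first_order_gap` are assumptions made for this sketch; they do not come from the slides.

```python
import random

def f(x):
    """A convex quadratic on R^2, chosen for illustration."""
    return x[0] ** 2 + x[0] * x[1] + x[1] ** 2

def grad_f(x):
    """Gradient of f, computed by hand."""
    return (2 * x[0] + x[1], x[0] + 2 * x[1])

def first_order_gap(x, y):
    """f(y) - [f(x) + grad_f(x)^T (y - x)]; nonnegative for convex f."""
    g = grad_f(x)
    linear = f(x) + g[0] * (y[0] - x[0]) + g[1] * (y[1] - x[1])
    return f(y) - linear

rng = random.Random(1)
points = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(50)]
smallest_gap = min(first_order_gap(x, y) for x in points for y in points)
```

For this quadratic the gap equals f(y − x), so it is nonnegative and vanishes only when y = x, which matches the picture of the graph lying above each tangent hyperplane.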
First-order Characterization of Convexity
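$$f(y) \ge f(x) + \nabla f(x)^\top (y - x), \quad x, y \in \operatorname{dom}(f).$$
Graph of f is above all its tangent hyperplanes.
[Figure: graph of f(y) lying above the tangent hyperplane $f(x) + \nabla f(x)^\top (y - x)$.]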
Nondifferentiable Functions...
are also relevant in practice.
[Figure: graph of f(x) = |x|, with a kink at x = 0.]
More generally, $f(x) = \|x\|$ (Euclidean norm).
For d = 2, the graph is the ice cream cone.
[Figure: graph of the Euclidean norm for d = 2.]
Second-order characterization of convexity
Lemma ([BV04, 3.1.4])
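Suppose that dom(f) is open and that f is twice differentiable; in particular, the Hessian (matrix of second partial derivatives)
$$\nabla^2 f(x) = \begin{pmatrix}
\frac{\partial^2 f(x)}{\partial x_1 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_d} \\
\frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_2 \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_2 \partial x_d} \\
\vdots & \vdots & \cdots & \vdots \\
\frac{\partial^2 f(x)}{\partial x_d \partial x_1} & \frac{\partial^2 f(x)}{\partial x_d \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_d \partial x_d}
\end{pmatrix}$$
exists at every point x ∈ dom(f) and is symmetric. Then f is convex if and only if dom(f) is convex, and for all x ∈ dom(f), we have
$$\nabla^2 f(x) \succeq 0 \quad \text{(i.e. } \nabla^2 f(x) \text{ is positive semidefinite).}$$
(A symmetric matrix M is positive semidefinite if $x^\top M x \ge 0$ for all x, and positive definite if $x^\top M x > 0$ for all $x \ne 0$.)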
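For d = 2 the positive semidefiniteness test reduces to a closed-form eigenvalue computation. The sketch below is illustrative only: the two example Hessians and the helper names `eigenvalues_sym_2x2` and `is_psd_2x2` are assumptions, and a general d would call for a numerical eigenvalue routine instead.

```python
import math

def eigenvalues_sym_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]] in closed form."""
    mean = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return mean - radius, mean + radius

def is_psd_2x2(a, b, c, tol=1e-12):
    """Positive semidefinite iff the smallest eigenvalue is >= 0 (up to tol)."""
    return eigenvalues_sym_2x2(a, b, c)[0] >= -tol

# Hessian of x1^2 + x1*x2 + x2^2 is constant: [[2, 1], [1, 2]].
convex_case = is_psd_2x2(2.0, 1.0, 2.0)     # eigenvalues 1 and 3
# Hessian of x1^2 - x2^2 is [[2, 0], [0, -2]]: a saddle, not convex.
saddle_case = is_psd_2x2(2.0, 0.0, -2.0)    # eigenvalues -2 and 2
```

Because both example Hessians are constant, a single check settles convexity on all of R²; for a general f the condition has to hold at every point of the domain.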
Operations that preserve convexity
Lemma (Exercise)
Let $f_1, f_2, \ldots, f_m$ be convex functions and $\lambda_1, \lambda_2, \ldots, \lambda_m \in \mathbb{R}_+$. Then $f := \sum_{i=1}^m \lambda_i f_i$ is convex on $\operatorname{dom}(f) := \bigcap_{i=1}^m \operatorname{dom}(f_i)$.
Lemma (Exercise)
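Let f be a convex function with $\operatorname{dom}(f) \subseteq \mathbb{R}^d$, and let $g : \mathbb{R}^m \to \mathbb{R}^d$ be an affine function, meaning that g(x) = Ax + b for some matrix $A \in \mathbb{R}^{d \times m}$ and some vector $b \in \mathbb{R}^d$. Then the function f ∘ g (that maps x to f(Ax + b)) is convex on $\operatorname{dom}(f \circ g) := \{x \in \mathbb{R}^m : g(x) \in \operatorname{dom}(f)\}$.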
Optimization
Motivation: Convex Optimization
Convex Optimization Problems are of the form
min f(x) s.t. x ∈ C
where both
f is a convex function
C is a convex set (note: $\mathbb{R}^d$ is convex)
Properties of Convex Optimization Problems
Every local minimum is a global minimum, see next...
Motivation: Solving Convex Optimization - Provably
For convex optimization problems, all of the algorithms
Coordinate Descent, Gradient Descent, Stochastic Gradient Descent, Projected and Proximal Gradient Descent
do converge to the global optimum! (assuming f differentiable)
What is the convergence rate for smooth convex problems (with Lipschitz continuous gradient)?
Example Theorem: The convergence rate is proportional to 1/t, i.e.
$$f(x_t) - f(x^\star) \le \frac{c}{t}$$
(where $x^\star$ is some optimal solution to the problem.)
Meaning: Approximation error converges to 0 over time.
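A c/t bound of this kind can be observed on a small example. The sketch below makes several assumptions that are not in the slides: the objective is a diagonal convex quadratic with minimizer x⋆ = 0, the algorithm is plain gradient descent with step size 1/L, and the constant is taken as c = L‖x₀ − x⋆‖²/2, the standard bound for that step size. All names are hypothetical.

```python
def f(x, a):
    """Convex quadratic f(x) = 0.5 * sum_i a[i] * x[i]^2 with minimizer x* = 0."""
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))

def grad_f(x, a):
    """Gradient of f: componentwise a[i] * x[i]."""
    return [ai * xi for ai, xi in zip(a, x)]

def gradient_descent(x0, a, steps):
    """Gradient descent with step size 1/L, where L = max(a) is the
    Lipschitz constant of the gradient. Returns the iterates x_0..x_steps."""
    L = max(a)
    xs = [list(x0)]
    for _ in range(steps):
        g = grad_f(xs[-1], a)
        xs.append([xi - gi / L for xi, gi in zip(xs[-1], g)])
    return xs

a = [10.0, 1.0, 0.01]        # curvatures (assumed for the example)
x0 = [1.0, -2.0, 3.0]
iterates = gradient_descent(x0, a, steps=200)

L = max(a)
c = 0.5 * L * sum(xi * xi for xi in x0)     # c = L * ||x0 - x*||^2 / 2, with x* = 0
errors = [f(x, a) for x in iterates]         # f(x*) = 0, so this is f(x_t) - f(x*)
bound_holds = all(errors[t] <= c / t for t in range(1, len(errors)))
```

On this example the error actually decays faster than c/t, since a quadratic with positive curvatures is strongly convex; the 1/t rate is a worst-case guarantee over all smooth convex functions.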
Motivation: Convergence Theory
f | Algorithm | Rate | # Iter | Cost/iter
non-smooth | center of gravity | exp(−t/n) | n log(1/ε) | 1 ∇, 1 n-dim ∫
non-smooth | ellipsoid method | (R/r) exp(−t/n²) | n² log(R/(rε)) | 1 ∇, mat-vec ×
non-smooth | Vaidya | (Rn/r) exp(−t/n) | n log(Rn/(rε)) | 1 ∇, mat-mat ×
quadratic | CG | exact; exp(−t/κ) | n; κ log(1/ε) | 1 ∇
non-smooth, Lipschitz | PGD | RL/√t | R²L²/ε² | 1 ∇, 1 proj.
smooth | PGD | βR²/t | βR²/ε | 1 ∇, 1 proj.
smooth | AGD | βR²/t² | R√(β/ε) | 1 ∇
smooth (any norm) | FW | βR²/t | βR²/ε | 1 ∇, 1 LP
strong. conv., Lipschitz | PGD | L²/(αt) | L²/(αε) | 1 ∇, 1 proj.
strong. conv., smooth | PGD | R² exp(−t/κ) | κ log(R²/ε) | 1 ∇, 1 proj.
strong. conv., smooth | AGD | R² exp(−t/√κ) | √κ log(R²/ε) | 1 ∇
f + g, f smooth, g simple | FISTA | βR²/t² | R√(β/ε) | 1 ∇ of f, Prox of g
max_{y∈Y} φ(x, y), φ smooth | SP-MP | βR²/t | βR²/ε | MD on X, MD on Y
linear, X with F ν-self-conc. | IPM | ν exp(−t/√ν) | √ν log(ν/ε) | Newton step on F
non-smooth | SGD | BL/√t | B²L²/ε² | 1 stoch. ∇, 1 proj.
non-smooth, strong. conv. | SGD | B²/(αt) | B²/(αε) | 1 stoch. ∇, 1 proj.
f = (1/m) Σ f_i, f_i smooth, strong. conv. | SVRG | - | (m + κ) log(1/ε) | 1 stoch. ∇
Table 1.1: Summary of the results proved in Chapter 2 to Chapter 5 and some of the results in Chapter 6. (Bubeck) [Bub15]
Local Minima are Global Minima
Definition
A local minimum of f : dom(f) → R is a point x such that there exists $\varepsilon > 0$ with
$$f(x) \le f(y) \quad \forall y \in \operatorname{dom}(f) \text{ satisfying } \|y - x\| < \varepsilon.$$
Lemma
Let $x^\star$ be a local minimum of a convex function f : dom(f) → R. Then $x^\star$ is a global minimum, meaning that
$$f(x^\star) \le f(y) \quad \forall y \in \operatorname{dom}(f).$$
Proof.
Q: How could we prove it?
Suppose there exists y ∈ dom(f) such that $f(y) < f(x^\star)$ and define $y' := \lambda x^\star + (1 - \lambda) y$ for λ ∈ (0, 1). From convexity, we get that $f(y') \le \lambda f(x^\star) + (1 - \lambda) f(y) < f(x^\star)$. Choosing λ so close to 1 that $\|y' - x^\star\| < \varepsilon$ yields a contradiction to $x^\star$ being a local minimum.
Critical Points are Global Minima
Lemma
Suppose that f is convex and differentiable over an open domain dom(f). Let x ∈ dom(f).
If ∇f(x) = 0 (critical point), then x is a global minimum.
Proof.
Q: How could we prove it?
Suppose that ∇f(x) = 0. According to our Lemma on the first-order characterization of convexity, we have
$$f(y) \ge f(x) + \nabla f(x)^\top (y - x) = f(x)$$
for all y ∈ dom(f), so x is a global minimum.
Geometrically, the tangent hyperplane is horizontal at x.
Strictly Convex Functions
Definition ([BV04, 3.1.1])
A function f : dom(f) → R is strictly convex if (i) dom(f) is convex and (ii) for all $x \ne y \in \operatorname{dom}(f)$ and all λ ∈ (0, 1), we have
$$f(\lambda x + (1 - \lambda) y) < \lambda f(x) + (1 - \lambda) f(y). \qquad (2)$$
[Figure: left, a function that is convex but not strictly convex; right, a strictly convex function.]
Lemma
Let f : dom(f) → R be strictly convex. Then f has at most one global minimum.
Q: Why?
Constrained Minimization
Definition
Let f : dom(f) → R be convex and let X ⊆ dom(f) be a convex set. A point x ∈ X is a
minimizer of f over X if
f(x) ≤ f(y) ∀y ∈ X.
Lemma
Suppose that f : dom(f) → R is convex and differentiable over an open domain $\operatorname{dom}(f) \subseteq \mathbb{R}^d$, and let X ⊆ dom(f) be a convex set. A point $x^\star \in X$ is a minimizer of f over X if and only if
$$\nabla f(x^\star)^\top (x - x^\star) \ge 0 \quad \forall x \in X.$$
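The optimality condition can be checked on a small constrained problem. Everything concrete in this sketch is an assumption made for illustration: the objective f(x) = ‖x − c‖², the feasible set X = [0, 1]², the point c, and the helper names.

```python
def grad_f(x, c):
    """Gradient of f(x) = ||x - c||^2."""
    return [2 * (xi - ci) for xi, ci in zip(x, c)]

def clip_to_unit_box(x):
    """Projection onto X = [0, 1]^2, which minimizes ||x - c||^2 over X."""
    return [min(max(xi, 0.0), 1.0) for xi in x]

def min_directional_value(x_ref, c, n=21):
    """Smallest value of grad_f(x_ref)^T (x - x_ref) over a grid of x in X."""
    g = grad_f(x_ref, c)
    grid = [i / (n - 1) for i in range(n)]
    return min(g[0] * (u - x_ref[0]) + g[1] * (v - x_ref[1])
               for u in grid for v in grid)

c = [2.0, 0.5]                    # unconstrained minimizer, outside the box
x_star = clip_to_unit_box(c)      # [1.0, 0.5], on the boundary of X
at_minimizer = min_directional_value(x_star, c)         # zero: condition holds
at_other_point = min_directional_value([0.5, 0.5], c)   # negative: not optimal
```

At the minimizer no feasible direction decreases f to first order, while at the interior point (0.5, 0.5) moving toward c does, so the condition fails there.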
[Figure: the set X, the minimizer $x^\star$, the gradient $\nabla f(x^\star)$, and a point x ∈ X with $\nabla f(x^\star)^\top (x - x^\star) \ge 0$.]
Existence of a minimizer
How do we know that a global minimum exists?
Not necessarily the case, even if f is bounded from below. Q: Why? (Example: $f(x) = e^x$ is bounded below by 0 but never attains its infimum.)
Definition
$f : \mathbb{R}^d \to \mathbb{R}$, $\alpha \in \mathbb{R}$. The set $f^{\le \alpha} := \{x \in \mathbb{R}^d : f(x) \le \alpha\}$ is the α-sublevel set of f.
[Figure: a function f, a level α, and the corresponding sublevel set $f^{\le \alpha}$.]
The Weierstrass Theorem
Theorem
Let $f : \mathbb{R}^d \to \mathbb{R}$ be a convex function, and suppose there is a nonempty and bounded sublevel set $f^{\le \alpha}$. Then f has a global minimum.
Proof:
We know that f, as a continuous function, attains a minimum over the closed and bounded (= compact) set $f^{\le \alpha}$ at some $x^\star$; the sublevel set is closed because f is continuous. This $x^\star$ is also a global minimum as it has value $f(x^\star) \le \alpha$, while any $x \notin f^{\le \alpha}$ has value $f(x) > \alpha \ge f(x^\star)$.
Generalizes to suitable domains $\operatorname{dom}(f) \ne \mathbb{R}^d$.
Bibliography
[Bub15] Sébastien Bubeck. Convex Optimization: Algorithms and Complexity. Foundations and Trends in Machine Learning, 8(3-4):231–357, 2015.
[BV04] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, New York, NY, USA, 2004.
Thanks also to Prof. Martin Jaggi and Prof. Mark Schmidt for their slides and lectures.
mbzuai.ac.ae
Mohamed bin Zayed
University of Artificial Intelligence
Masdar City
Abu Dhabi
United Arab Emirates

Contenu connexe

Similaire à Optimization introduction

Basics of Integration and Derivatives
Basics of Integration and DerivativesBasics of Integration and Derivatives
Basics of Integration and DerivativesFaisal Waqar
 
Maximizing Submodular Function over the Integer Lattice
Maximizing Submodular Function over the Integer LatticeMaximizing Submodular Function over the Integer Lattice
Maximizing Submodular Function over the Integer LatticeTasuku Soma
 
Lesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusLesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusMatthew Leingang
 
Lesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusLesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusMatthew Leingang
 
Amirim Project - Threshold Functions in Random Simplicial Complexes - Avichai...
Amirim Project - Threshold Functions in Random Simplicial Complexes - Avichai...Amirim Project - Threshold Functions in Random Simplicial Complexes - Avichai...
Amirim Project - Threshold Functions in Random Simplicial Complexes - Avichai...Avichai Cohen
 
Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...
Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...
Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...Katsuya Ito
 
Engg. mathematics iii
Engg. mathematics iiiEngg. mathematics iii
Engg. mathematics iiimanoj302009
 
Calculus- Basics
Calculus- BasicsCalculus- Basics
Calculus- BasicsRabin BK
 
Functions of several variables.pdf
Functions of several variables.pdfFunctions of several variables.pdf
Functions of several variables.pdfFaisalMehmood887349
 
Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...Alexander Litvinenko
 
Another possibility
Another possibilityAnother possibility
Another possibilityCss Founder
 
Notes on Intersection theory
Notes on Intersection theoryNotes on Intersection theory
Notes on Intersection theoryHeinrich Hartmann
 

Similaire à Optimization introduction (20)

Cse41
Cse41Cse41
Cse41
 
Contour
ContourContour
Contour
 
Basics of Integration and Derivatives
Basics of Integration and DerivativesBasics of Integration and Derivatives
Basics of Integration and Derivatives
 
Maximizing Submodular Function over the Integer Lattice
Maximizing Submodular Function over the Integer LatticeMaximizing Submodular Function over the Integer Lattice
Maximizing Submodular Function over the Integer Lattice
 
cswiercz-general-presentation
cswiercz-general-presentationcswiercz-general-presentation
cswiercz-general-presentation
 
Lesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusLesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of Calculus
 
Lesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusLesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of Calculus
 
Thesis
ThesisThesis
Thesis
 
Amirim Project - Threshold Functions in Random Simplicial Complexes - Avichai...
Amirim Project - Threshold Functions in Random Simplicial Complexes - Avichai...Amirim Project - Threshold Functions in Random Simplicial Complexes - Avichai...
Amirim Project - Threshold Functions in Random Simplicial Complexes - Avichai...
 
Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...
Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...
Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...
 
Unit1
Unit1Unit1
Unit1
 
Engg. mathematics iii
Engg. mathematics iiiEngg. mathematics iii
Engg. mathematics iii
 
math camp
math campmath camp
math camp
 
Calculus- Basics
Calculus- BasicsCalculus- Basics
Calculus- Basics
 
Functions of several variables.pdf
Functions of several variables.pdfFunctions of several variables.pdf
Functions of several variables.pdf
 
The integral
The integralThe integral
The integral
 
R lecture co4_math 21-1
R lecture co4_math 21-1R lecture co4_math 21-1
R lecture co4_math 21-1
 
Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...
 
Another possibility
Another possibilityAnother possibility
Another possibility
 
Notes on Intersection theory
Notes on Intersection theoryNotes on Intersection theory
Notes on Intersection theory
 

Dernier

Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 

Dernier (20)

Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 

Optimization introduction

  • 3. Convex Sets A set C is convex if the line segment between any two points of C lies in C, i.e., if for any x, y ∈ C and any λ with 0 ≤ λ ≤ 1, we have λx + (1 − λ)y ∈ C. 2 Convex Figure 2.2 Some simple convex and nonconvex sets. Left. The hexagon, which includes its boundary (shown darker), is convex. Middle. The kidney shaped set is not convex, since the line segment between the two points in *Figure 2.2 from S. Boyd, L. Vandenberghe Q: Which sets are convex? 2/29
  • 4. Convex Sets A set C is convex if the line segment between any two points of C lies in C, i.e., if for any x, y ∈ C and any λ with 0 ≤ λ ≤ 1, we have λx + (1 − λ)y ∈ C. 2 Convex Figure 2.2 Some simple convex and nonconvex sets. Left. The hexagon, which includes its boundary (shown darker), is convex. Middle. The kidney shaped set is not convex, since the line segment between the two points in *Figure 2.2 from S. Boyd, L. Vandenberghe Q: Which sets are convex? Left Convex. Middle Not convex, since line segment not in set. Right Not convex, since some, but not all boundary points are contained in the set. 2/29
  • 5. Properties of Convex Sets Intersections of convex sets are convex Observation 1.2. Let Ci, i ∈ I be convex sets, where I is a (possibly infinite) index set. Then C = T i∈I Ci is a convex set. (later) Projections onto convex sets are unique, and often efficient to compute PC(x0 ) := argminy∈C ky − x0 k 3/29
  • 6. Properties of Convex Sets Intersections of convex sets are convex Observation 1.2. Let Ci, i ∈ I be convex sets, where I is a (possibly infinite) index set. Then C = T i∈I Ci is a convex set. (later) Projections onto convex sets are unique, and often efficient to compute PC(x0 ) := argminy∈C ky − x0 k Q: Is ”convexity” required to have unique projection? 3/29
  • 7. Properties of Convex Sets. Intersections of convex sets are convex. Observation 1.2. Let $C_i$, $i \in I$, be convex sets, where $I$ is a (possibly infinite) index set. Then $C = \bigcap_{i \in I} C_i$ is a convex set. (Later) Projections onto convex sets are unique, and often efficient to compute: $P_C(x') := \operatorname{argmin}_{y \in C} \|y - x'\|$. Q: Is convexity required to have a unique projection?
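For two simple convex sets the projection has a closed form. The sketch below is an illustration under assumptions: the function names are hypothetical, and the sets (a Euclidean ball centered at the origin and an axis-aligned box) are examples that do not appear on the slide.

```python
import math


def project_onto_ball(x, radius=1.0):
    """Euclidean projection onto {y : ||y|| <= radius}: rescale if outside."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [radius * v / norm for v in x]


def project_onto_box(x, lo, hi):
    """Euclidean projection onto [lo, hi]^d: clip each coordinate."""
    return [min(max(v, lo), hi) for v in x]


# A nonconvex set can lose uniqueness: on the unit circle {y : ||y|| = 1},
# every point is at distance 1 from the origin, so the origin has no
# unique nearest point.
dist_east = math.hypot(1.0 - 0.0, 0.0 - 0.0)
dist_west = math.hypot(-1.0 - 0.0, 0.0 - 0.0)
```

The last lines only illustrate that nonconvex sets can fail to have unique projections; they do not settle whether some particular nonconvex set might still have them.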
  • 8. Convex Functions. Definition. A function $f : \mathbb{R}^d \to \mathbb{R}$ is convex if (i) $\mathrm{dom}(f)$ is a convex set and (ii) for all $x, y \in \mathrm{dom}(f)$ and all $\lambda$ with $0 \le \lambda \le 1$, we have $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$. [Figure 3.1 from S. Boyd, L. Vandenberghe: graph of a convex function. The chord (i.e., line segment) between any two points on the graph lies above the graph.] Geometrically: the line segment between $(x, f(x))$ and $(y, f(y))$ lies above the graph of $f$.
  • 10. Convex Functions & Sets. The graph of a function $f : \mathbb{R}^d \to \mathbb{R}$ is defined as $\{(x, f(x)) \mid x \in \mathrm{dom}(f)\}$. The epigraph of $f$ is defined as $\mathrm{epi}(f) := \{(x, \alpha) \in \mathbb{R}^{d+1} \mid x \in \mathrm{dom}(f),\ \alpha \ge f(x)\}$. Observation 1.4. A function is convex iff its epigraph is a convex set. [Figure: the graph of $f$ and the region $\mathrm{epi}(f)$ above it.]
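  • 13. Sublevel Sets. Let $f : \mathbb{R}^d \to \mathbb{R}$. Define $f^{\le \alpha} = \{x \in \mathrm{dom}(f) \mid f(x) \le \alpha\}$. Lemma. If $f$ is convex, then $f^{\le \alpha}$ is convex for any $\alpha$. Q: Does the converse ($\Leftarrow$) hold as well? A: No! Take $f(x) = \|x\|^p$ for $p \in (0, 1)$: its sublevel sets are balls, hence convex, but $f$ is not convex.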
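The counterexample can be checked in one dimension. This sketch assumes the intended function is the power $f(x) = |x|^p$ with $p = 1/2$ (an interpretation of the flattened formula on the slide); the helper `sublevel_is_interval` is a hypothetical name and only tests a finite grid.

```python
def f(x, p=0.5):
    """f(x) = |x|^p with 0 < p < 1: convex sublevel sets, but not convex."""
    return abs(x) ** p


# Convexity would require f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y).
x, y, lam = 0.0, 1.0, 0.5
lhs = f(lam * x + (1 - lam) * y)     # f(0.5), about 0.707
rhs = lam * f(x) + (1 - lam) * f(y)  # 0.5
violates_convexity = lhs > rhs


def sublevel_is_interval(alpha, grid):
    """On a sorted grid, check that {x : f(x) <= alpha} has no gaps."""
    inside = [f(x) <= alpha for x in grid]
    if True not in inside:
        return True  # the empty set is convex
    first = inside.index(True)
    last = len(inside) - 1 - inside[::-1].index(True)
    return all(inside[first:last + 1])


grid = [i / 100 for i in range(-300, 301)]
all_intervals = all(sublevel_is_interval(a, grid) for a in (0.5, 1.0, 1.5))
```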
  • 15. Quasiconvex Functions. Definition. A function $f$ is quasiconvex (or unimodal) if its domain and all its sublevel sets are convex. Source: Convex Optimization, S. Boyd and L. Vandenberghe.
  • 17. Convex Functions. Examples of convex functions: linear functions $f(x) = a^\top x$; affine functions $f(x) = a^\top x + b$; the exponential $f(x) = e^{\alpha x}$; norms (every norm on $\mathbb{R}^d$ is convex). Convexity of a norm $f(x)$. Q: How can we show it? By the triangle inequality $f(x + y) \le f(x) + f(y)$ and homogeneity of a norm, $f(ax) = |a| f(x)$ for a scalar $a$: $f(\lambda x + (1 - \lambda) y) \le f(\lambda x) + f((1 - \lambda) y) = \lambda f(x) + (1 - \lambda) f(y)$. We used the triangle inequality for the inequality and homogeneity for the equality.
  • 18. Jensen's inequality. Lemma (Jensen's inequality). Let $f$ be convex, $x_1, \ldots, x_m \in \mathrm{dom}(f)$, and $\lambda_1, \ldots, \lambda_m \in \mathbb{R}_+$ such that $\sum_{i=1}^m \lambda_i = 1$. Then $f\left(\sum_{i=1}^m \lambda_i x_i\right) \le \sum_{i=1}^m \lambda_i f(x_i)$. For $m = 2$, this is convexity.
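Jensen's inequality is easy to sanity-check on a concrete convex function. A minimal sketch, assuming $f = \exp$ and arbitrarily chosen points and weights; `jensen_gap` is a hypothetical helper returning the right-hand side minus the left-hand side, which should be nonnegative.

```python
import math


def jensen_gap(f, xs, weights):
    """Return sum_i w_i f(x_i) - f(sum_i w_i x_i) for weights summing to 1."""
    if abs(sum(weights) - 1.0) > 1e-12 or min(weights) < 0:
        raise ValueError("weights must be nonnegative and sum to 1")
    mean_x = sum(w * x for w, x in zip(weights, xs))
    mean_f = sum(w * f(x) for w, x in zip(weights, xs))
    return mean_f - f(mean_x)


# Three points with convex-combination weights (m = 3).
gap = jensen_gap(math.exp, [-1.0, 0.0, 2.0], [0.2, 0.3, 0.5])

# m = 2 recovers the definition of convexity with lam = 0.25.
gap_two_points = jensen_gap(math.exp, [0.0, 1.0], [0.25, 0.75])
```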
  • 20. Convex Functions are Continuous. Lemma 1.6. Let $f$ be convex and suppose that $\mathrm{dom}(f)$ is convex and open. Then $f$ is continuous. Not entirely obvious (try to think about the proof, and feel free to send it to me).
  • 21. Differentiable Functions. The graph of the affine function $y \mapsto f(x) + \nabla f(x)^\top (y - x)$ is a tangent hyperplane to the graph of $f$ at $(x, f(x))$. [Figure: $f(y)$ together with the tangent $f(x) + \nabla f(x)^\top (y - x)$.]
  • 22. First-order characterization of convexity. Lemma ([BV04, 3.1.3]). Suppose that $\mathrm{dom}(f)$ is open and that $f$ is differentiable; in particular, the gradient (vector of partial derivatives) $\nabla f(x) := \left(\frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_d}\right)$ exists at every point $x \in \mathrm{dom}(f)$. Then $f$ is convex if and only if $\mathrm{dom}(f)$ is convex and $f(y) \ge f(x) + \nabla f(x)^\top (y - x)$ (1) holds for all $x, y \in \mathrm{dom}(f)$.
  • 23. First-order Characterization of Convexity. $f(y) \ge f(x) + \nabla f(x)^\top (y - x)$ for all $x, y \in \mathrm{dom}(f)$. The graph of $f$ is above all its tangent hyperplanes. [Figure: $f(y)$ and the tangent $f(x) + \nabla f(x)^\top (y - x)$.]
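The tangent inequality can be tested on a grid in one dimension. This is a sketch under assumptions: the softplus function $\log(1 + e^x)$ is picked as a convex example and $\sin$ as a nonconvex one, neither of which comes from the slides, and `first_order_holds` is a hypothetical helper that checks only finitely many pairs.

```python
import math


def softplus(x):
    """log(1 + e^x), a smooth convex function on the real line."""
    return math.log1p(math.exp(x))


def softplus_grad(x):
    """Derivative of softplus: the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def first_order_holds(f, grad, points, tol=1e-12):
    """Check f(y) >= f(x) + grad(x) * (y - x) for all pairs of grid points."""
    return all(
        f(y) >= f(x) + grad(x) * (y - x) - tol
        for x in points
        for y in points
    )


points = [i * 0.5 for i in range(-6, 7)]  # grid from -3 to 3
convex_ok = first_order_holds(softplus, softplus_grad, points)
sine_ok = first_order_holds(math.sin, math.cos, points)
```

Here `convex_ok` comes out true and `sine_ok` false: for example, the tangent to $\sin$ at $x = 1.5$ lies above the graph at $y = 0$.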
  • 26. Nondifferentiable Functions ... are also relevant in practice. [Figure: graph of $f(x) = |x|$, with a kink at $0$.] More generally, $f(x) = \|x\|$ (Euclidean norm). For $d = 2$, the graph is the ice cream cone.
  • 27. Second-order characterization of convexity. Lemma ([BV04, 3.1.4]). Suppose that $\mathrm{dom}(f)$ is open and that $f$ is twice differentiable; in particular, the Hessian (matrix of second partial derivatives) $\nabla^2 f(x) = \left(\frac{\partial^2 f(x)}{\partial x_i \partial x_j}\right)_{i,j=1}^{d}$ exists at every point $x \in \mathrm{dom}(f)$ and is symmetric. Then $f$ is convex if and only if $\mathrm{dom}(f)$ is convex, and for all $x \in \mathrm{dom}(f)$, we have $\nabla^2 f(x) \succeq 0$ (i.e., $\nabla^2 f(x)$ is positive semidefinite). (A symmetric matrix $M$ is positive semidefinite if $x^\top M x \ge 0$ for all $x$, and positive definite if $x^\top M x > 0$ for all $x \ne 0$.)
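For $d = 2$ the positive semidefiniteness test reduces to a closed-form eigenvalue computation. The sketch below assumes two example quadratics that are not on the slide; `min_eigenvalue_2x2` is a hypothetical helper for a symmetric matrix with rows $(a, b)$ and $(b, c)$.

```python
import math


def min_eigenvalue_2x2(a, b, c):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    return mean - radius


def is_psd_2x2(a, b, c, tol=1e-12):
    """Positive semidefinite iff the smallest eigenvalue is nonnegative."""
    return min_eigenvalue_2x2(a, b, c) >= -tol


# f(x1, x2) = x1^2 + x1*x2 + x2^2 has constant Hessian [[2, 1], [1, 2]].
convex_quadratic = is_psd_2x2(2.0, 1.0, 2.0)

# f(x1, x2) = x1^2 - x2^2 (a saddle) has Hessian [[2, 0], [0, -2]].
saddle = is_psd_2x2(2.0, 0.0, -2.0)
```

Because both Hessians are constant, a single check covers every point of the domain; for a general $f$ the condition has to hold at all $x \in \mathrm{dom}(f)$.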
  • 28. Operations that preserve convexity. Lemma (Exercise). Let $f_1, f_2, \ldots, f_m$ be convex functions and $\lambda_1, \lambda_2, \ldots, \lambda_m \in \mathbb{R}_+$. Then $f := \sum_{i=1}^m \lambda_i f_i$ is convex on $\mathrm{dom}(f) := \bigcap_{i=1}^m \mathrm{dom}(f_i)$.
  • 29. Operations that preserve convexity. Lemma (Exercise). Let $f$ be a convex function with $\mathrm{dom}(f) \subseteq \mathbb{R}^d$, and let $g : \mathbb{R}^m \to \mathbb{R}^d$ be an affine function, meaning that $g(x) = Ax + b$ for some matrix $A \in \mathbb{R}^{d \times m}$ and some vector $b \in \mathbb{R}^d$. Then the function $f \circ g$ (that maps $x$ to $f(Ax + b)$) is convex on $\mathrm{dom}(f \circ g) := \{x \in \mathbb{R}^m : g(x) \in \mathrm{dom}(f)\}$.
  • 32. Motivation: Convex Optimization. Convex optimization problems are of the form $\min f(x)$ s.t. $x \in C$, where both $f$ is a convex function and $C$ is a convex set (note: $\mathbb{R}^d$ is convex). Properties of convex optimization problems: every local minimum is a global minimum, see next...
  • 34. Motivation: Solving Convex Optimization - Provably. For convex optimization problems, all of the algorithms Coordinate Descent, Gradient Descent, Stochastic Gradient Descent, and Projected and Proximal Gradient Descent do converge to the global optimum (assuming $f$ is differentiable and the step sizes are chosen suitably). What is the convergence rate for a smooth convex problem (with Lipschitz continuous gradient)? Example Theorem: the convergence rate is proportional to $1/t$, i.e., $f(x_t) - f(x^\star) \le \frac{c}{t}$ (where $x^\star$ is some optimal solution to the problem). Meaning: the approximation error converges to 0 over time.
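The $1/t$ rate can be observed on a toy problem. This sketch rests on assumptions not stated on the slide: the objective is a hypothetical quadratic $f(x) = \tfrac{1}{2}(x_1^2 + 10 x_2^2)$ whose gradient is Lipschitz with constant $L = 10$, the step size is $1/L$, and the constant is taken as $c = L \|x_0 - x^\star\|^2 / 2$, the value from the standard analysis of gradient descent on smooth convex functions.

```python
def f(x):
    """Convex quadratic with minimizer x* = (0, 0) and f(x*) = 0."""
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad_f(x):
    return (x[0], 10.0 * x[1])


def gradient_descent(x0, smoothness, steps):
    """Run x_{t+1} = x_t - (1/L) * grad f(x_t); return f(x_1), ..., f(x_steps)."""
    x = tuple(x0)
    values = []
    for _ in range(steps):
        g = grad_f(x)
        x = (x[0] - g[0] / smoothness, x[1] - g[1] / smoothness)
        values.append(f(x))
    return values


L = 10.0
x0 = (1.0, 1.0)
c = L * (x0[0] ** 2 + x0[1] ** 2) / 2.0  # assumed constant in the bound c / t
values = gradient_descent(x0, L, steps=50)

# Check f(x_t) - f(x*) <= c / t for every iteration t = 1, ..., 50.
bound_holds = all(v <= c / t for t, v in enumerate(values, start=1))
```

On this particular quadratic the error actually decays geometrically, so the $c/t$ bound is loose; it is a worst-case guarantee over all smooth convex problems.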
  • 35. Motivation: Convergence Theory. Table 1.1 from Bubeck [Bub15] (Section 1.6, "Overview of the results and disclaimer"): summary of the results proved in Chapter 2 to Chapter 5 and some of the results in Chapter 6.
      f | Algorithm | Rate | # Iter | Cost/iter
      non-smooth | center of gravity | $\exp(-t/n)$ | $n \log(1/\varepsilon)$ | 1 $\nabla$, 1 $n$-dim $\int$
      non-smooth | ellipsoid method | $\frac{R}{r} \exp(-t/n^2)$ | $n^2 \log(R/(r\varepsilon))$ | 1 $\nabla$, mat-vec $\times$
      non-smooth | Vaidya | $\frac{Rn}{r} \exp(-t/n)$ | $n \log(Rn/(r\varepsilon))$ | 1 $\nabla$, mat-mat $\times$
      quadratic | CG | exact; $\exp(-t/\kappa)$ | $n$; $\kappa \log(1/\varepsilon)$ | 1 $\nabla$
      non-smooth, Lipschitz | PGD | $RL/\sqrt{t}$ | $R^2 L^2/\varepsilon^2$ | 1 $\nabla$, 1 proj.
      smooth | PGD | $\beta R^2/t$ | $\beta R^2/\varepsilon$ | 1 $\nabla$, 1 proj.
      smooth | AGD | $\beta R^2/t^2$ | $R\sqrt{\beta/\varepsilon}$ | 1 $\nabla$
      smooth (any norm) | FW | $\beta R^2/t$ | $\beta R^2/\varepsilon$ | 1 $\nabla$, 1 LP
      strong. conv., Lipschitz | PGD | $L^2/(\alpha t)$ | $L^2/(\alpha\varepsilon)$ | 1 $\nabla$, 1 proj.
      strong. conv., smooth | PGD | $R^2 \exp(-t/\kappa)$ | $\kappa \log(R^2/\varepsilon)$ | 1 $\nabla$, 1 proj.
      strong. conv., smooth | AGD | $R^2 \exp(-t/\sqrt{\kappa})$ | $\sqrt{\kappa} \log(R^2/\varepsilon)$ | 1 $\nabla$
      $f + g$, $f$ smooth, $g$ simple | FISTA | $\beta R^2/t^2$ | $R\sqrt{\beta/\varepsilon}$ | 1 $\nabla$ of $f$, Prox of $g$
      $\max_{y \in Y} \varphi(x, y)$, $\varphi$ smooth | SP-MP | $\beta R^2/t$ | $\beta R^2/\varepsilon$ | MD on $X$, MD on $Y$
      linear, $X$ with $F$ $\nu$-self-conc. | IPM | $\nu \exp(-t/\sqrt{\nu})$ | $\sqrt{\nu} \log(\nu/\varepsilon)$ | Newton step on $F$
      non-smooth | SGD | $BL/\sqrt{t}$ | $B^2 L^2/\varepsilon^2$ | 1 stoch. $\nabla$, 1 proj.
      non-smooth, strong. conv. | SGD | $B^2/(\alpha t)$ | $B^2/(\alpha\varepsilon)$ | 1 stoch. $\nabla$, 1 proj.
      $f = \frac{1}{m} \sum f_i$, $f_i$ smooth, strong. conv. | SVRG | – | $(m + \kappa) \log(1/\varepsilon)$ | 1 stoch. $\nabla$
  • 38. Local Minima are Global Minima. Definition. A local minimum of $f : \mathrm{dom}(f) \to \mathbb{R}$ is a point $x$ such that there exists $\varepsilon > 0$ with $f(x) \le f(y)$ for all $y \in \mathrm{dom}(f)$ satisfying $\|y - x\| < \varepsilon$. Lemma. Let $x^\star$ be a local minimum of a convex function $f : \mathrm{dom}(f) \to \mathbb{R}$. Then $x^\star$ is a global minimum, meaning that $f(x^\star) \le f(y)$ for all $y \in \mathrm{dom}(f)$. Proof. Q: How could we prove it? Suppose there exists $y \in \mathrm{dom}(f)$ such that $f(y) < f(x^\star)$, and define $y' := \lambda x^\star + (1 - \lambda) y$ for $\lambda \in (0, 1)$. From convexity, we get that $f(y') < f(x^\star)$. Choosing $\lambda$ so close to 1 that $\|y' - x^\star\| < \varepsilon$ yields a contradiction to $x^\star$ being a local minimum.
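The two steps of the proof can be written out explicitly. The following is a sketch of the standard computation, using $y'$ for the point defined in the proof and $\varepsilon$ for the radius from the definition of a local minimum; the explicit threshold on $\lambda$ is spelled out here and is not on the slide.

```latex
% Step 1: convexity, then the assumption f(y) < f(x^\star) with 1 - \lambda > 0.
f(y') = f\bigl(\lambda x^\star + (1 - \lambda) y\bigr)
      \le \lambda f(x^\star) + (1 - \lambda) f(y)
      <   \lambda f(x^\star) + (1 - \lambda) f(x^\star)
      =   f(x^\star).

% Step 2: y' approaches x^\star as \lambda \to 1.
\|y' - x^\star\| = \|(1 - \lambda)(y - x^\star)\|
                 = (1 - \lambda)\,\|y - x^\star\|
                 < \varepsilon
\quad\text{whenever}\quad
\lambda > 1 - \frac{\varepsilon}{\|y - x^\star\|}.
```

Note that $y \ne x^\star$ because $f(y) < f(x^\star)$, so the division is well defined, and $y' \in \mathrm{dom}(f)$ because the domain is convex.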
  • 41. Critical Points are Global Minima. Lemma. Suppose that $f$ is convex and differentiable over an open domain $\mathrm{dom}(f)$. Let $x \in \mathrm{dom}(f)$. If $\nabla f(x) = 0$ (critical point), then $x$ is a global minimum. Proof. Q: How could we prove it? Suppose that $\nabla f(x) = 0$. According to our Lemma on the first-order characterization of convexity, we have $f(y) \ge f(x) + \nabla f(x)^\top (y - x) = f(x)$ for all $y \in \mathrm{dom}(f)$, so $x$ is a global minimum. Geometrically, the tangent hyperplane is horizontal at $x$.
  • 44. Strictly Convex Functions. Definition ([BV04, 3.1.1]). A function $f : \mathrm{dom}(f) \to \mathbb{R}$ is strictly convex if (i) $\mathrm{dom}(f)$ is convex and (ii) for all $x \ne y \in \mathrm{dom}(f)$ and all $\lambda \in (0, 1)$, we have $f(\lambda x + (1 - \lambda) y) < \lambda f(x) + (1 - \lambda) f(y)$ (2). [Figure: one function that is convex but not strictly convex, and one that is strictly convex.] Lemma. Let $f : \mathrm{dom}(f) \to \mathbb{R}$ be strictly convex. Then $f$ has at most one global minimum. Q: Why?
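One standard way to answer the question is a proof by contradiction using the midpoint. The slides leave this as an exercise, so the argument below is a sketch of a common proof rather than the lecturer's own; $f^\star$ denotes the assumed common minimum value.

```latex
% Suppose x \ne y are both global minima, so f(x) = f(y) = f^\star.
% Apply the strict inequality (2) with \lambda = 1/2:
f\!\left(\tfrac{1}{2} x + \tfrac{1}{2} y\right)
  < \tfrac{1}{2} f(x) + \tfrac{1}{2} f(y)
  = f^\star.
% The midpoint lies in dom(f) (the domain is convex) and has a value
% strictly below the minimum value f^\star, which is a contradiction.
```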
  • 46. Constrained Minimization. Definition. Let $f : \mathrm{dom}(f) \to \mathbb{R}$ be convex and let $X \subseteq \mathrm{dom}(f)$ be a convex set. A point $x \in X$ is a minimizer of $f$ over $X$ if $f(x) \le f(y)$ for all $y \in X$. Lemma. Suppose that $f : \mathrm{dom}(f) \to \mathbb{R}$ is convex and differentiable over an open domain $\mathrm{dom}(f) \subseteq \mathbb{R}^d$, and let $X \subseteq \mathrm{dom}(f)$ be a convex set. A point $x^\star \in X$ is a minimizer of $f$ over $X$ if and only if $\nabla f(x^\star)^\top (x - x^\star) \ge 0$ for all $x \in X$.
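The optimality condition is easiest to see when the minimizer sits on the boundary of $X$, where the gradient does not vanish. A minimal sketch under assumptions: the example $f(x) = (x - 2)^2$ over $X = [0, 1]$ is hypothetical, and `satisfies_condition` checks the inequality only on a finite grid of points of $X$.

```python
def grad_f(x):
    """Derivative of f(x) = (x - 2)^2, whose unconstrained minimizer is 2."""
    return 2.0 * (x - 2.0)


def satisfies_condition(x_star, feasible_points, tol=1e-12):
    """Check grad f(x*) * (x - x*) >= 0 for every sampled feasible x."""
    return all(grad_f(x_star) * (x - x_star) >= -tol for x in feasible_points)


X = [i / 100 for i in range(101)]  # grid on the interval [0, 1]

# Over [0, 1] the minimizer is the endpoint x* = 1, where grad f = -2 != 0.
boundary_point_ok = satisfies_condition(1.0, X)

# An interior point such as 0.5 fails: moving toward 1 still decreases f.
interior_point_ok = satisfies_condition(0.5, X)
```

At $x^\star = 1$ the condition reads $-2\,(x - 1) \ge 0$, which holds for every $x \le 1$: no feasible direction is a descent direction.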
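  • 52. Existence of a minimizer. How do we know that a global minimum exists? This is not necessarily the case, even if $f$ is bounded from below. Q: Why? (Consider $f(x) = e^x$.) Definition. Let $f : \mathbb{R}^d \to \mathbb{R}$ and $\alpha \in \mathbb{R}$. The set $f^{\le \alpha} := \{x \in \mathbb{R}^d : f(x) \le \alpha\}$ is the $\alpha$-sublevel set of $f$. [Figure: a function, a level $\alpha$, and the resulting sublevel set $f^{\le \alpha}$.]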
  • 55. The Weierstrass Theorem. Theorem. Let $f : \mathbb{R}^d \to \mathbb{R}$ be a convex function, and suppose there is a nonempty and bounded sublevel set $f^{\le \alpha}$. Then $f$ has a global minimum. Proof: We know that $f$, as a continuous function, attains a minimum over the closed and bounded (= compact) set $f^{\le \alpha}$ at some $x^\star$. This $x^\star$ is also a global minimum, as it has value $f(x^\star) \le \alpha$, while any $x \notin f^{\le \alpha}$ has value $f(x) > \alpha \ge f(x^\star)$. Generalizes to suitable domains $\mathrm{dom}(f) \ne \mathbb{R}^d$.
  • 56. Bibliography. [Bub15] Sébastien Bubeck. Convex Optimization: Algorithms and Complexity. Foundations and Trends in Machine Learning, 8(3-4):231–357, 2015. [BV04] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, New York, NY, USA, 2004. Thanks also to Prof. Martin Jaggi and Prof. Mark Schmidt for their slides and lectures.
  • 57. mbzuai.ac.ae Mohamed bin Zayed University of Artificial Intelligence Masdar City Abu Dhabi United Arab Emirates