Computer Vision: Least Squares Minimization


                     Computer Science and Engineering,
                     Indian Institute of Technology Kharagpur.




Solution of Linear equations
Consider a system of equations of the form Ax = b. Let A be an m × n
matrix.
    If m < n there are more unknowns than equations. In this case
    there will not be a unique solution, but rather a vector space of
    solutions.
    If m = n there will be a unique solution as long as A is invertible.
    If m > n there will be more equations than unknowns. In general
    the system will not have a solution.




Least-squares solution: full-rank case
Consider the case m > n and assume that A is of rank n. We seek a
vector x that is closest to providing a solution to the system Ax = b.
    We seek x such that ||Ax − b|| is minimized. Such an x is known as
    the least squares solution to the over-determined system.
    We seek x that minimizes ||Ax − b|| = ||UDV T x − b||
    Because of the norm preserving property of orthogonal
    transforms,
                     ||UDV T x − b|| = ||DV T x − U T b||
    Writing y = V^T x and b′ = U^T b, the problem becomes one of
    minimizing ||Dy − b′|| where D is a diagonal matrix.




The system Dy = b′ has the form

    | d1              |            | b′1   |
    |    d2           |   | y1 |   | b′2   |
    |       .         |   | y2 |   |  :    |
    |         .       |   |  : |   |  :    |
    |           dn    |   | yn | = | b′n   |
    |                 |            | b′n+1 |
    |        0        |            |  :    |
    |                 |            | b′m   |

The nearest Dy can approach to b′ is the vector (b′1 , b′2 , . . . , b′n , 0, . . . , 0)^T
This is achieved by setting yi = b′i /di for i = 1, . . . , n
The assumption rank A = n ensures that di ≠ 0
Finally x is retrieved from x = Vy.


Algorithm: Least Squares

 Objective:

 Find the least-squares solution to the m × n set of equations Ax = b,
 where m > n and rank A= n.

 Algorithm:

  (i) Find the SVD A = UDV^T
  (ii) Set b′ = U^T b
  (iii) Find the vector y defined by yi = b′i /di , where di is the i-th diagonal
        entry of D
  (iv) The solution is x = Vy (a numerical sketch follows below)
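
A minimal numerical sketch of this algorithm using NumPy; the matrix A and vector b below are made-up illustrative data, not taken from the slides.

```python
import numpy as np

# Illustrative over-determined system (m = 4 > n = 2), made-up data
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 9.0]])
b = np.array([1.0, 2.0, 3.0, 5.0])

# (i) SVD: A = U D V^T  (d holds the diagonal entries of D)
U, d, Vt = np.linalg.svd(A, full_matrices=False)

# (ii) b' = U^T b
b_prime = U.T @ b

# (iii) y_i = b'_i / d_i  (rank A = n, so every d_i is non-zero)
y = b_prime / d

# (iv) x = V y
x = Vt.T @ y

# Cross-check against NumPy's own least-squares routine
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x, x_ref)   # the two answers agree to rounding error
```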




Pseudo Inverse
    Given a square diagonal matrix D, we define its pseudo-inverse to
    be the diagonal matrix D⁺ such that

        D⁺ii = { 0        if Dii = 0
               { 1/Dii    otherwise

    For an m × n matrix A with m ≥ n, let the SVD of A be A = UDV^T. The
    pseudo-inverse of matrix A is

        A⁺ = VD⁺U^T


The least-squares solution to an m × n system of equations Ax = b of
rank n is given by x = A+ b. In the case of a deficient-rank system,
x = A+ b is the solution that minimizes ||x||.
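
A sketch of forming D⁺ explicitly and assembling A⁺ = VD⁺U^T, assuming NumPy; the tolerance `tol` and the rank-deficient test matrix are arbitrary illustrative choices.

```python
import numpy as np

def pseudo_inverse(A, tol=1e-12):
    """A+ = V D+ U^T, with D+_ii = 1/d_i when d_i > tol and 0 otherwise."""
    U, d, Vt = np.linalg.svd(A, full_matrices=False)
    d_plus = np.array([1.0 / s if s > tol else 0.0 for s in d])
    return Vt.T @ np.diag(d_plus) @ U.T

# Rank-deficient example: the second column is twice the first
A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])

x = pseudo_inverse(A) @ b                      # minimum-norm least-squares solution
print(np.allclose(x, np.linalg.pinv(A) @ b))   # True: matches NumPy's pinv
```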



Linear least-squares using normal equations
Consider a system of equations of the form Ax = b. Let A be an m × n
matrix with m > n.
    In general, no solution x will exist for this set of equations.
    Consequently, the task is to find the vector x that minimizes the
    norm ||Ax − b||.
    As the vector x varies over all values, the product Ax varies over
    the complete column space of A, i.e. the subspace of Rm spanned
    by the columns of A.
    The task is to find the closest vector to b that lies in the column
    space of A.




Linear least-squares using normal equations
    Let x be the solution to this problem.
    Thus Ax is the closest point to b. In this case, the difference
    Ax − b must be orthogonal to the column space of A.
    This means that Ax − b is perpendicular to each of the columns of
    A, hence

        A^T (Ax − b) = 0    giving    (A^T A) x = A^T b

    The solution is given as:

        x = (A^T A)⁻¹ A^T b,   i.e.   x = A⁺ b   with   A⁺ = (A^T A)⁻¹ A^T

The pseudo-inverse of matrix A using SVD is given as

        A⁺ = VD⁺U^T
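
A sketch comparing the normal-equation route with the pseudo-inverse route on made-up data, assuming NumPy. As a practical note, forming A^T A squares the condition number of A, so the SVD-based route is usually preferred numerically.

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])   # made-up data
b = np.array([6.0, 5.0, 7.0, 10.0])

# Normal equations: (A^T A) x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Pseudo-inverse: x = A+ b
x_pinv = np.linalg.pinv(A) @ b

print(x_normal, x_pinv)   # identical (to rounding) for this well-conditioned A
```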

Least-squares solution of
homogeneous equations
Solving a set of equations of the form Ax = 0.
    x has a homogeneous representation: if x is a solution, then kx is
    also a solution for any non-zero scalar k.
    A reasonable constraint would be to seek a solution for which
    ||x|| = 1
    In general, such a set of equations will not have an exact solution.
    The problem is to find x that minimizes ||Ax|| subject to ||x|| = 1




Least-squares solution of
homogeneous equations
 Let A = UDV T
 We need to minimize ||UDV T x||.
 Note that ||UDV T x|| = ||DV T x|| so we need to minimize ||DV T x||
 Note that ||x|| = ||V T x|| so we have the condition that ||V T x|| = 1
 Let y = V^T x, so we minimize ||Dy|| subject to ||y|| = 1.
 Since D is a diagonal matrix with its diagonal entries in descending
 order, it follows that the solution to this problem is

        y = (0, 0, . . . , 0, 1)^T

 Since y = V^T x, the solution x = Vy is simply the last column of V.
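
A sketch of the homogeneous case, assuming NumPy; A is random illustrative data. The minimizer is the right singular vector for the smallest singular value, i.e. the last row of the V^T returned by the SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 4))   # made-up over-determined homogeneous system

_, _, Vt = np.linalg.svd(A)
x = Vt[-1]                         # last column of V = last row of V^T

print(np.linalg.norm(x))           # 1.0: the constraint ||x|| = 1 holds by construction
print(np.linalg.norm(A @ x))       # the minimal achievable value of ||Ax||
```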



Iterative estimation techniques
                                  X = f(P)

  X is a measurement vector in R^N
  P is a parameter vector in R^M .
  We wish to seek the vector P satisfying

                                  X = f(P) − ε

  for which ||ε|| is minimized.


  The linear least squares problem is exactly of this type with the
  function f being defined as a linear function f(P) = AP



Iterative estimation methods
                     If the function f is not a linear function we use
                     iterative estimation techniques.




Iterative estimation methods
We start with an initial estimated value P0 , and proceed to refine
the estimate under the assumption that the function f is locally
linear.

                          Let ε0 = f(P0) − X
We assume that the function is approximated at P0 by

                       f(P0 + ∆) = f(P0 ) + J∆

J   is the linear mapping represented by the Jacobian matrix

                                J   = ∂f/∂P




Iterative estimation methods
We seek a point f(P1 ), with P1 = P0 + ∆, which minimizes

                       f(P1) − X = f(P0) + J∆ − X
                                 = ε0 + J∆

Thus it is required to minimize ||ε0 + J∆|| over ∆, which is a linear
minimization problem.
The vector ∆ is obtained by solving the normal equations

                   J^T J ∆ = −J^T ε0      giving      ∆ = −J⁺ ε0




Iterative estimation methods
The solution vector P is obtained by starting with an estimate P0
and computing successive approximations according to the
formula
                          Pi+1 = Pi + ∆i
where ∆i is the solution to the linear least-squares problem

                              J ∆i = −εi

Matrix J is the Jacobian ∂f/∂P evaluated at Pi and εi = f(Pi) − X.

The algorithm converges to a least squares solution P.
Convergence can take place to a local minimum, or there may be
no convergence at all.
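
A minimal sketch of this iteration, assuming NumPy; the exponential-decay model f, the synthetic measurements, and the forward-difference Jacobian are illustrative choices, not part of the slides.

```python
import numpy as np

def f(P, t):
    """Illustrative nonlinear model: a * exp(-b t) with parameters P = (a, b)."""
    return P[0] * np.exp(-P[1] * t)

def jacobian(P, t, h=1e-6):
    """Forward-difference approximation of J = df/dP, one column per parameter."""
    f0 = f(P, t)
    return np.column_stack([(f(P + h * e, t) - f0) / h for e in np.eye(P.size)])

# Synthetic measurement vector X generated from known parameters plus noise
t = np.linspace(0.0, 2.0, 30)
rng = np.random.default_rng(1)
X = f(np.array([2.0, 1.5]), t) + 0.01 * rng.standard_normal(t.size)

P = np.array([1.0, 1.0])                 # initial estimate P0
for _ in range(20):
    eps = f(P, t) - X                    # epsilon_i = f(P_i) - X
    J = jacobian(P, t)
    delta, *_ = np.linalg.lstsq(J, -eps, rcond=None)   # solve J delta = -epsilon_i
    P = P + delta
    if np.linalg.norm(delta) < 1e-10:    # stop once the update is negligible
        break

print(P)   # close to the generating parameters (2.0, 1.5)
```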



Newton’s method
 We consider finding minima of functions of many variables.
 Consider an arbitrary scalar-valued function g(P) where P is a
 vector.
 The optimization problem is simply to minimize g(P) over all
 values of P.
 Expand g(P) about P0 in a Taylor series to get

                    g(P0 + ∆) = g + gP ∆ + ∆T gPP ∆/2 + . . .

  where gP denotes the derivative of g(P) with respect to P, and gPP
  denotes the derivative of gP with respect to P.




Newton’s method
 Expand g(P) about P0 in a Taylor series to get

                    g(P0 + ∆) = g + gP ∆ + ∆T gPP ∆/2 + . . .

  Differentiating the Taylor series with respect to ∆ and setting the
  derivative to zero, we get

                     gP + gPP ∆ = 0    giving    ∆ = −gPP⁻¹ gP

  Hessian matrix: gPP is the matrix of second derivatives, the
  Hessian of g. Its (i, j)-th entry is ∂²g/∂pi ∂pj , where pi and pj are the
  i-th and j-th parameters. Vector gP is the gradient of g.

 The method of Newton iteration consists in starting with an initial
 value of the parameters, P0 and iteratively computing parameter
 increments ∆ until convergence occurs.
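
A sketch of Newton iteration on a simple two-parameter example with an analytic gradient and Hessian, assuming NumPy; the function g(P) = (p1 − 1)⁴ + (p2 + 2)² is made up for illustration.

```python
import numpy as np

def grad(P):       # g_P for g(P) = (p1 - 1)^4 + (p2 + 2)^2
    return np.array([4.0 * (P[0] - 1.0) ** 3, 2.0 * (P[1] + 2.0)])

def hessian(P):    # g_PP for the same function
    return np.array([[12.0 * (P[0] - 1.0) ** 2, 0.0],
                     [0.0,                      2.0]])

P = np.array([3.0, 3.0])    # initial value P0
for _ in range(50):
    delta = np.linalg.solve(hessian(P), -grad(P))   # g_PP delta = -g_P
    P = P + delta
    if np.linalg.norm(delta) < 1e-12:
        break

print(P)   # converges to the minimum at (1, -2)
```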


Gauss Newton Method
  Consider the special case where g(P) is half the squared norm of an
  error function:

          g(P) = (1/2) ||ε(P)||² = ε(P)^T ε(P) / 2,    ε(P) = f(P) − X

  ε(P) is the error function ε(P) = f(P) − X, a vector-valued function of
  the parameter vector P.

          The gradient   gP = ∂g(P)/∂P = εP^T ε

          where   εP = ∂ε(P)/∂P = ∂f(P)/∂P = fP

  We know that fP = J, ∴ εP = J, hence we have gP = J^T ε


Gauss Newton Method
  Consider the second derivative gPP :

          gP = εP^T ε    therefore    gPP = εP^T εP + εPP^T ε

  Since εP = fP , and assuming that f(P) is linear, εPP vanishes:

          gPP = εP^T εP = J^T J

  We have obtained an approximation of the second derivative gPP .
  Now using Newton's equation

          gPP ∆ = −gP    we get    J^T J ∆ = −J^T ε

  This is the Gauss-Newton method, in which we use the approximation
  gPP ≈ J^T J to the Hessian of the function g(P).


Gradient Descent
  The gradient of g(P) is given as gP = εP^T ε
  The negative gradient vector −gP = −εP^T ε defines the direction of
  most rapid decrease of the cost function.
  Gradient descent is a strategy for minimizing g where we move
  iteratively in the negative gradient direction.
  We take small steps in the direction of descent:

          ∆ = −gP / λ     where λ controls the length of the step

  Recall that in Newton's method the step is given by ∆ = −gPP⁻¹ gP ;
  gradient descent amounts to approximating the Hessian by the scalar
  matrix λI.
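
A sketch of the plain descent step ∆ = −gP/λ on a quadratic cost g(P) = ||AP − b||²/2 (so gP = A^T(AP − b)), assuming NumPy; the data and the fixed value of λ are illustrative.

```python
import numpy as np

A = np.array([[2.0, 0.0], [1.0, 3.0], [0.0, 1.0]])   # made-up data
b = np.array([1.0, 2.0, 3.0])

P = np.zeros(2)
lam = 20.0                      # larger lambda means smaller, safer steps
for _ in range(500):
    g_P = A.T @ (A @ P - b)     # gradient of the cost at the current P
    P = P - g_P / lam           # delta = -g_P / lambda

print(P)
print(np.linalg.lstsq(A, b, rcond=None)[0])   # the descent converges (slowly) to this
```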



Gradient Descent
  Gradient descent by itself is not a very good minimization strategy,
  typically characterized by slow convergence due to zig-zagging.
  However, gradient descent can be quite useful in conjunction with
  Gauss-Newton iteration as a way of getting out of tight corners.
  The Levenberg-Marquardt method is essentially a Gauss-Newton
  method that transitions smoothly to gradient descent when the
  Gauss-Newton updates fail.




Summary
g(P) is an arbitrary scalar valued function. g(P) = ε(P)^T ε(P)/2

  Newton's Method:    gPP ∆ = −gP , where gPP = εP^T εP + εPP^T ε and
                      gP = εP^T ε. The cost function is approximated as
                      quadratic near the minimum.

  Gauss-Newton:       εP^T εP ∆ = −εP^T ε. The Hessian is approximated
                      as εP^T εP .

  Gradient Descent:   λ∆ = −εP^T ε = −gP . The Hessian is replaced by λI.




Levenberg-Marquardt iteration LM
  This is a slight variation of the Gauss-Newton iteration method.
  We have the augmented normal equations:

          J^T J ∆ = −J^T ε      −→      (J^T J + λI) ∆ = −J^T ε

  The value of λ varies from iteration to iteration.
  A typical initial value of λ is 10⁻³ times the average of the diagonal
  elements of J^T J




Levenberg-Marquardt iteration LM
  If the value of ∆ obtained by solving the augmented normal equations
  leads to a reduction of error, then the increment is accepted and λ is
  divided by a factor (typically 10) before the next iteration.
  If the value of ∆ leads to an increased error, then λ is multiplied by
  the same factor and the augmented normal equations are solved again.
  This process continues until a value of ∆ is found that gives rise to a
  decreased error.
 The process of repeatedly solving the augmented
 normal equations for different values of λ until an
 acceptable ∆ is found constitutes one iteration of
 the LM algorithm.
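
A compact sketch of an LM loop following the λ schedule described above, assuming NumPy; the model and synthetic data mirror the earlier Gauss-Newton sketch, and the iteration caps and stopping tolerance are arbitrary.

```python
import numpy as np

def f(P, t):                      # same illustrative model as before: a * exp(-b t)
    return P[0] * np.exp(-P[1] * t)

def jacobian(P, t, h=1e-6):
    f0 = f(P, t)
    return np.column_stack([(f(P + h * e, t) - f0) / h for e in np.eye(P.size)])

t = np.linspace(0.0, 2.0, 30)
rng = np.random.default_rng(2)
X = f(np.array([2.0, 1.5]), t) + 0.01 * rng.standard_normal(t.size)

P = np.array([1.0, 1.0])
eps = f(P, t) - X
J = jacobian(P, t)
lam = 1e-3 * np.mean(np.diag(J.T @ J))   # typical initial value of lambda

for _ in range(100):
    accepted = False
    for _ in range(50):   # one LM iteration: re-solve until the error drops
        delta = np.linalg.solve(J.T @ J + lam * np.eye(P.size), -J.T @ eps)
        eps_new = f(P + delta, t) - X
        if eps_new @ eps_new < eps @ eps:          # error reduced: accept, shrink lambda
            P, eps, lam = P + delta, eps_new, lam / 10.0
            J = jacobian(P, t)
            accepted = True
            break
        lam *= 10.0                                # error increased: reject, grow lambda
    if not accepted or np.linalg.norm(delta) < 1e-10:
        break

print(P)   # close to the generating parameters (2.0, 1.5)
```

The inner cap of 50 retries is just a safeguard so the sketch cannot loop forever once the gradient is numerically zero.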

Robust cost functions

[Figures: for each cost function the slides plot the cost curve, the
corresponding PDF, and the attenuation function.]

    Squared error (convex)
    Blake-Zisserman (non-convex)
    Corrupted Gaussian (non-convex)
    Cauchy (non-convex)
    L1 cost (convex)
    Huber (convex)
    Pseudo-Huber (convex)
Square Error cost function
                   C(δ) = δ²           PDF = exp(−C(δ))

   Its main drawback is that it is not robust to outliers in the
   measurements.
   Because of the rapid growth of the quadratic curve, distant outliers
   exert an excessive influence, and can draw the cost minimum well
   away from the desired value.

   The squared-error cost function is generally very susceptible to
   outliers, and may be regarded as unusable as long as outliers are
   present.
   If outliers have been thoroughly eradicated, using for instance
   RANSAC, then it may be used.



Non-convex cost functions
  The Blake-Zisserman, corrupted Gaussian and Cauchy cost
  functions seek to mitigate the deleterious effect of outliers by
  giving them diminished weight.
  As is seen in the plot of the first two of these, once the error
  exceeds a certain threshold, it is classified as an outlier, and the
  cost remains substantially constant.
  The Cauchy cost function also seeks to deemphasize the cost of
  outliers, but this is done more gradually.




Asymptotically Linear cost functions
   The L1 cost function measures the absolute value of the error.
   The main effect of this is to give outliers less weight compared
   with the squared error.
   This cost function acts to find the median of a set of data.
    Consider a set of real-valued data {ai } and a cost function defined
    by C(x) = Σi |x − ai |. The minimum of this function is at the median
    of the set {ai }.
    For higher-dimensional data ai ∈ Rⁿ, the minimum of the cost
    function C(x) = Σi ||x − ai || has similar stability properties with
    regard to outliers.
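
A quick numerical check of the median property, assuming NumPy; the data set (with one gross outlier) is made up.

```python
import numpy as np

a = np.array([1.0, 2.0, 2.5, 3.0, 100.0])        # made-up data with one outlier

xs = np.linspace(0.0, 30.0, 30001)
l1_cost = np.abs(xs[:, None] - a).sum(axis=1)    # C(x) = sum_i |x - a_i|
l2_cost = ((xs[:, None] - a) ** 2).sum(axis=1)   # squared-error cost, for comparison

print(xs[np.argmin(l1_cost)], np.median(a))      # 2.5 : the L1 minimum sits at the median
print(xs[np.argmin(l2_cost)], np.mean(a))        # 21.7: the mean is dragged by the outlier
```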




Huber Cost function
   The Huber cost function takes the form of a quadratic for small
   values of the error, δ, and becomes linear for values of δ beyond a
   given threshold.
   It retains the outlier stability of the L1 cost function, while for inliers
   it reflects the property that the squared-error cost function gives
   the Maximum Likelihood estimate.
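
A sketch of the Huber cost with threshold t (quadratic inside, linear outside, with matching value and slope at the crossover), assuming NumPy; the threshold and the test errors are arbitrary.

```python
import numpy as np

def huber(delta, t=1.0):
    """C(delta) = delta^2 for |delta| <= t, and 2*t*|delta| - t^2 beyond the threshold."""
    d = np.abs(delta)
    return np.where(d <= t, d ** 2, 2.0 * t * d - t ** 2)

delta = np.array([-5.0, -1.0, -0.5, 0.0, 0.5, 1.0, 5.0])
print(huber(delta))   # the outlier at 5 costs 9 rather than 25: linear, not quadratic growth
```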




Non-convex Cost functions
  The non-convex cost functions, though generally having a stable
  minimum that is not much affected by outliers, have the significant
  disadvantage of possessing local minima, which can make
  convergence to the global minimum chancy.
  The estimate is not strongly attracted to the minimum from outside
  of its immediate neighbourhood.
  Thus, they are not useful, unless (or until) the estimate is close to
  the final correct value.




Maximum Likelihood method
 Maximum likelihood is the procedure of finding the value of one or
 more parameters for a given statistic which makes the known
 likelihood distribution a maximum.
  The maximum likelihood estimate for a parameter µ is denoted µ̂.

      f(x1 , x2 , . . . , xn | µ, σ) = ∏i 1/(σ√(2π)) exp(−(xi − µ)²/(2σ²))

                                     = ((2π)^(−n/2)/σⁿ) exp(−Σi (xi − µ)²/(2σ²))

  Taking the logarithm

      log f = −(n/2) log(2π) − n log σ − Σi (xi − µ)²/(2σ²)


To maximize the log likelihood

        ∂(log f)/∂µ = Σi (xi − µ)/σ² = 0    giving    µ̂ = Σi xi /n

    Similarly

        ∂(log f)/∂σ = −n/σ + Σi (xi − µ)²/σ³ = 0    giving    σ̂ = √(Σi (xi − µ̂)²/n)


    Minimizing the least-squares cost function gives a result which is
    equivalent to the maximum likelihood estimate under an assumed
    Gaussian noise distribution.
In general, the maximum likelihood estimate of the parameter vector θ
is given as

                        θ̂ML = arg maxθ p(x|θ)
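
A quick check of these closed-form estimates on synthetic Gaussian data, assuming NumPy; it also confirms that the least-squares estimate of µ (a constant fitted to the data) coincides with µ̂.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=4.0, scale=2.0, size=10000)    # synthetic samples: mu = 4, sigma = 2

mu_hat = np.sum(x) / x.size                       # mu_hat = sum_i x_i / n
sigma_hat = np.sqrt(np.sum((x - mu_hat) ** 2) / x.size)

# Least-squares estimate of mu: fit a constant c minimizing sum_i (x_i - c)^2
A = np.ones((x.size, 1))
mu_ls = np.linalg.lstsq(A, x, rcond=None)[0][0]

print(mu_hat, sigma_hat, mu_ls)                   # approximately 4.0, 2.0, 4.0
```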


