Robust parametric classification and variable
     selection with minimum distance estimation

              Eric Chi (a,b,1) with David W. Scott (a,2)
                  (a) Department of Statistics,
                        Rice University
                  (b) Baylor College of Medicine


                          June 17, 2010


(1) DOE DE-FG02-97ER25308
(2) NSF DMS-09-07491
Outline


   The binary regression problem

   The L2 E Method

   Estimation

   Variable Selection

   Simulations

   Conclusion
Logistic Regression




   Suppose we wish to predict y ∈ {0,1}^n using X ∈ R^(n×p).
   The number of features p could be very large.
Univariate Logistic Regression: MLE



   [Figure: fitted logistic curve of Pr(Y = 1) against X ∈ (−6, 6), with the
   observed binary responses plotted at 0 and 1.]
MLE is sensitive to outliers



   [Figure: the same data after adding a cluster of outlying ones near
   X ∈ (−6, −4); the fitted MLE curve changes noticeably.]
MLE is sensitive to outliers




   Likelihood-based choice
       Outlier or not, the MLE puts mass wherever data lie.
       Cost: the MLE also puts mass over regions where there are no data.
MLE is sensitive to outliers
   [Figure: the fitted logistic curve with the outlying ones included,
   Pr(Y = 1) against X ∈ (−6, 6).]

   There are no 'ones' between −4 and −2,

                                      yet P(Y = 1 | X ∈ (−4, −2)) increases.

   There are no 'zeros' between 4 and 6,

                                      yet P(Y = 0 | X ∈ (4, 6)) increases.
The L2 distance as an alternative to the deviance loss.




       g : unknown true density.
       fθ : putative parametric density.
       Find θ that minimizes the ISE:

               θ̂ = argmin_θ ∫ (fθ(x) − g(x))² dx.
The L2 E Method



      The equivalent empirical criterion:

              θ̂ = argmin_θ [ ∫ fθ(x)² dx − (2/n) Σ_{i=1}^n fθ(X_i) ],

      where X_i ∈ R^p is the covariate vector of the i-th observation.
      The L2 Estimator, or L2 E [Scott, 2001].
      Familiar quantity: smoothing parameter selection in
      non-parametric density estimation.
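As a concrete illustration (not from the slides), the empirical criterion can be minimized directly for a univariate normal model, where ∫ fθ(x)² dx has the closed form 1/(2σ√π). Everything here is a sketch with my own variable names:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def l2e_gaussian(params, x):
    """Empirical L2E criterion for f = N(mu, sigma).

    The integral of f^2 is 1/(2*sigma*sqrt(pi)); the second term is twice
    the sample average of the density evaluated at the observations.
    """
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # parameterize on the log scale so sigma > 0
    return 1.0 / (2.0 * sigma * np.sqrt(np.pi)) - 2.0 * np.mean(
        norm.pdf(x, loc=mu, scale=sigma)
    )

# 95% of the data from N(0, 1), plus 5% gross outliers near 8
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 95), rng.normal(8.0, 0.5, 5)])
res = minimize(l2e_gaussian, x0=[np.median(x), 0.0], args=(x,),
               method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
```

The fitted (mu_hat, sigma_hat) stay near the main mode despite the outliers, which is the robustness the slides are after.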
Density-power divergence


   The L2 E and MLE are the empirical minimizers of two different
   points in a spectrum of divergence measures [Basu et al., 1998]:

       d_γ(g, fθ) = ∫ [ fθ^{1+γ}(z) − (1 + 1/γ) g(z) fθ^γ(z) + (1/γ) g^{1+γ}(z) ] dz,

       γ > 0 trades off efficiency for robustness.
       γ = 1 gives the L2 loss.
       γ → 0 gives the Kullback–Leibler divergence.
Robustness of the L2 distance


               θ̂ = argmin_θ ∫ (fθ(x) − g(x))² dx.

       The L2 distance is zero-forcing:

               g(x) = 0 forces fθ(x) = 0.

       It puts a premium on avoiding "false positives".
       L2 E balances

                       mass where data are present
                       vs.
                       no mass where data are absent.
Partial Densities: An extra degree of freedom


       Expand the search space [Scott, 2001]:

               ∫ (w fθ(x) − g(x))² dx.

       Fit a parametric model to only a fraction, w, of the data
       (hopefully the fraction that is well described by the
       parametric model!):

               (θ̂, ŵ) = argmin_{θ,w} [ w² ∫ fθ(x)² dx − (2w/n) Σ_{i=1}^n fθ(X_i) ].
Logistic L2 E loss



   Let F(u) = 1/(1 + exp(−u)) be the logistic function; then

       (β̂, ŵ) = argmin_{β, w∈[0,1]}  (w²/n) Σ_{i=1}^n [ F(x_iᵀβ)² + (1 − F(x_iᵀβ))² ]
                                      − (2w/n) Σ_{i=1}^n [ y_i F(x_iᵀβ) + (1 − y_i)(1 − F(x_iᵀβ)) ].
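In code, the criterion above is a direct translation (a sketch; the function and variable names are mine):

```python
import numpy as np

def logistic_l2e_loss(beta, w, X, y):
    """Logistic L2E loss:
    (w^2/n) * sum_i [F_i^2 + (1-F_i)^2] - (2w/n) * sum_i [y_i F_i + (1-y_i)(1-F_i)],
    where F_i = F(x_i' beta) is the logistic function of the linear predictor."""
    F = 1.0 / (1.0 + np.exp(-X @ beta))
    return w**2 * np.mean(F**2 + (1.0 - F) ** 2) - 2.0 * w * np.mean(
        y * F + (1.0 - y) * (1.0 - F)
    )
```

As a sanity check, with w = 1 a perfectly separating β drives the loss toward −1, while β = 0 gives −1/2, so better fits score lower.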
Two dimensional example


   [Figure: scatter of the data in the (X1, X2) plane, showing three
   clusters.]

       n = 300 and p = 2.
       Three clusters, each of size 100:
           Two are labelled 0.
           One is labelled 1.
   [Figure: decision boundaries in the (X1, X2) plane for four fits.
   (a) MLE; (b) L2E A: ŵ = 1.026; (c) L2E B: ŵ = 0.666; (d) L2E C: ŵ = 0.668.]
The optimization problem



   Challenges
       The L2 E loss is not convex.
              The Hessian of the L2 E loss is indefinite.
              Standard Newton-Raphson fails.
       Scalability and stability as p increases?

   Solution
       Majorization-Minimization
Majorization-Minimization




   Strategy
   Minimize a surrogate function, the majorization.
   Choose the surrogate such that
       ↓ surrogate =⇒ ↓ objective.
       the surrogate is easier to minimize than the objective.
Majorization-Minimization




   Definition
   Given f and g , real-valued functions on Rp , g majorizes f at x if
    1. g (x) = f (x) and
    2. g (u) ≥ f (u) for all u.
[Animated figure, repeated over several frames: a spectrum of logistic
models ordered by lack of fit (from more to less), labelled "very bad",
"optimal", and "less bad".]
Quadratic majorization of the logistic L2 E loss

   The loss has bounded curvature with respect to β. Fix w.
   Majorize the exact second-order Taylor expansion:

                  β^(m+1) = β^(m) − (1/K) (XᵀX)⁻¹ Xᵀ Z^(m),

   where

                  K ≥ (1/4) max_{z∈[−1,1]} [ (3/2) w z⁴ − z³ − 2wz² + z + w/2 ].

        K controls the step size; its lower bound is related to the
        maximum curvature of the loss.
        Z^(m) is a working response that depends on Y and Xβ^(m).
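A runnable sketch of this update follows. The slides do not spell out Z^(m) or K in full, so here I take Z_i to be the derivative of the per-observation loss with respect to the linear predictor (up to a constant) and use a crude curvature bound of my own; both are assumptions, not the slides' exact choices:

```python
import numpy as np

def mm_logistic_l2e(X, y, w=1.0, n_iter=1000):
    """MM iterations beta <- beta - (1/K) (X'X)^{-1} X' Z for the
    logistic L2E loss with w held fixed.

    Z_i = F_i(1-F_i) [w^2 (2F_i - 1) - w (2y_i - 1)] is d(loss_i)/du_i
    up to a constant factor, and K = 0.5*(1.5*w^2 + w) is a conservative
    upper bound on the per-observation curvature (my assumption), so each
    step decreases the surrogate and hence the loss.
    """
    n, p = X.shape
    beta = np.zeros(p)
    A = np.linalg.solve(X.T @ X, X.T)  # (X'X)^{-1} X', computed once
    K = 0.5 * (1.5 * w**2 + w)
    for _ in range(n_iter):
        F = 1.0 / (1.0 + np.exp(-X @ beta))
        Z = F * (1.0 - F) * (w**2 * (2.0 * F - 1.0) - w * (2.0 * y - 1.0))
        beta = beta - (1.0 / K) * (A @ Z)
    return beta
```

Because the objective is non-convex, this only guarantees monotone descent to a stationary point, which is exactly the role of MM here.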
Continuous variable selection with the LASSO



   Minimize

                  "L2 E loss" + λ Σ_{i=1}^p |β_i|.

   The penalized majorization of the loss majorizes the penalized loss.
   Minimize

                  "majorization of L2 E loss" + λ Σ_{i=1}^p |β_i|.
Coordinate Descent



   Suppose X is standardized; then

                  β_k^(m+1) = S( β_k^(m) − (1/K) x_(k)ᵀ Z^(m), λ ),

   where S is the soft-thresholding function

                  S(x, λ) = sign(x) max(|x| − λ, 0).

   Extension to the elastic net is straightforward.
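The soft-thresholding operator is one line, and a single coordinate update built on it looks like the following (a sketch with my own names; it assumes the columns of X are standardized, as the slide requires):

```python
import numpy as np

def soft_threshold(x, lam):
    """S(x, lambda) = sign(x) * max(|x| - lambda, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def coordinate_update(beta, k, X, Z, K, lam):
    """One pass of the slide's update for coordinate k:
    beta_k <- S(beta_k - (1/K) x_(k)' Z, lambda)."""
    b = beta.copy()
    b[k] = soft_threshold(beta[k] - (X[:, k] @ Z) / K, lam)
    return b
```

Cycling `coordinate_update` over k = 1, ..., p, refreshing the working response Z between sweeps, gives the coordinate-descent scheme.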
Heuristic Model Selection



   Regularization Path
   Calculate the penalized regression coefficients over a range of λ values.

   Information Criterion
       For each λ, calculate the deviance loss at the L2 E coefficients
       and add a correction term (AIC or BIC).
       Select the model with the lowest AIC/BIC value.
       Use the number of non-zero penalized regression coefficients as
       the degrees of freedom [Zou et al., 2007].
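A sketch of the BIC score just described: the binomial deviance at the fitted coefficients plus a log(n) penalty per active coefficient, with the non-zero count as the degrees-of-freedom proxy. Function names are mine:

```python
import numpy as np

def binomial_deviance(beta, X, y, eps=1e-12):
    """Deviance loss (-2 * log-likelihood) at the fitted coefficients."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    p = np.clip(p, eps, 1.0 - eps)  # guard the logs at p near 0 or 1
    return -2.0 * np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def bic_score(beta, X, y):
    """Deviance plus log(n) times the number of non-zero coefficients,
    the degrees-of-freedom proxy of [Zou et al., 2007]."""
    return binomial_deviance(beta, X, y) + np.log(len(y)) * np.count_nonzero(beta)
```

Along the regularization path one evaluates `bic_score` at each λ's coefficient vector and keeps the minimizer.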
Heuristic Model Selection


   [Figure: left, the regularization path of the coefficients β_j against
   log10(λ); right, the L2E BIC against log10(λ), with the number of
   non-zero coefficients (151 down to 0) along the top axis.]
Simulations: Estimation




       n = 200, p = 4
       X_i | Group 1 ∼ i.i.d. N(µ, σ)
       X_i | Group 2 ∼ i.i.d. N(−µ, σ)
       β = (1, 1/2, 1, 2)
       Y_i | X_i ∼ independently Bern(F(X_iᵀβ))
       1,000 replicates.
Case 1




   Vary the position of a single outlier.
Distributions of fitted coefficients


   [Figure: boxplots of the fitted value of each coefficient (panels 1-4)
   against the outlier position (−0.25, 1.5, 3, 6, 12, 24), comparing
   three methods: MLE, L2E with w = 1, and L2E with w = w_opt.]
Estimation




   The MLE regression coefficients are driven to zero (implosion breakdown).
Case 2




   Vary the number of outliers at a fixed position.
Distributions of fitted coefficients


   [Figure: boxplots of the fitted value of each coefficient against the
   number of outliers at a fixed position (panels 1 and 2 shown).]
                                                                                                                                                                        q q
                          q q
                          q q
                        q q q     q   q
                                      q       q   q           q
                                                              q    q       q
                                                                           q    q       q q
                                                                                        q q             q q       q q         q q             q q          q q          q
                        q q q
                        q q q     q
                                  q   q       q
                                              q   q           q    q       q    q
                                                                                q       q q             q q       q q         q q             q            q            q
                    2   q q q
                        q
                        q q q
                                  q
                                q q
                                  q
                                  q
                                      q
                                      q
                                      q
                                      q
                                              q
                                              q
                                              q
                                              q
                                                  q
                                                  q
                                                  q
                                                  q
                                                              q
                                                              q
                                                              q
                                                              q
                                                                   q
                                                                   q
                                                                   q       q
                                                                           q
                                                                           q
                                                                           q
                                                                                q       q
                                                                                        q
                                                                                        q
                                                                                        q         2           q
                        q q q   q q
                                q             q               q            q            q
                                q
                                q
                                q
                                          q
                                          q
                                          q               q
                                                          q            q
                                                                       q            q
                                                                                    q             1                       q
                                                                                                                                          q            q            q
                    1                     q
                                          q
                                          q
                                          q
                                                          q
                                                          q
                                                          q
                                                                       q
                                                                       q
                                                                       q
                                                                       q
                                                                                    q
                                                                                    q
                                                                                    q
                                                                                    q
                                                          q            q
                                                                       q            q
                                                                                    q
                        q                   q
                                            q q             q q          q            q q
                                                                                                  0
                    0   q q q
                        q q q
                        q
                        q q q   q q q
                                q q q
                                q q q       q q
                                            q q             q q
                                                            q q          q
                                                                         q
                                                                                q
                                                                                q
                                                                                q     q
                                                                                      q q
                        q q q
                        q       q q q
                                q
                                q   q       q q             q q
                                                            q q          q
                                                                         q      q
                                                                                q     q q
                                                                                      q q
                                                                                        q
                        q q q   q q q     q q
                                          q                 q            q            q          −1   q q q   q q q           q q             q            q            q
                                          q q q           q q q                 q                                                                                                Method
    Fitted Value




                          q q     q q     q   q           q            q q
                                                                       q            q q q                                                          q            q
                                          q                   q                     q                                                                                        q
                   −1                     q               q
                                                          q
                                                          q
                                                                       q
                                                                       q
                                                                       q
                                                                       q
                                                                                q   q
                                                                                    q
                                                                                    q
                                                                                    q
                                                                                        q                                 q
                                                                                                                          q               q
                                                                                                                                          q            q            q
                                                                                                                                                                                               MLE
                                                      3                                                                               4
                                                                                                                                                                                          L2E: w = 1
                                                                                             q                                                                               q
                                                                                                                                                                q            q
                                                                                             q    8                                                q                                  L2E: w = w.opt
                                                                                             q                                    q
                    6                                              q
                                                                                q
                                                                                q                                     q                                                      q
                                                  q                                                     q q       q           q               q                 q            q
                                                                   q
                          q q     q q         q q                                                 6                                                q       q            q
                                                              q            q q          q q                                       q
                    4     q q     q q         q               q              q
                                                                           q q          q q
                                                                q
                        q q q     q q         q q             q            q            q
                                                                                                  4
                    2
                                                                                                  2
                    0
                                                                                                  0

                         0       1            5               10           15           20             0          1           5               10           15           20
                                                                                    Number of outliers
Simulations: Variable Selection


       n = 200, p = 1000
       Xi | Group 1 ∼ i.i.d. N(µ, σ)
       Xi | Group 2 ∼ i.i.d. N(−µ, σ)
       β = (1, 1, 1, 1, 0, . . . , 0)
       Yi | Xi ∼ i.d. Bern(F(XiT β))
       1,000 replicates.

   Single Outlier
   A single outlier is moved along the ray that starts at the centroid of
   one group and extends in the direction (1, 1, 1, 1, 0, . . . , 0).
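The design above can be sketched in code. The snippet below is a minimal simulation of one replicate; the values µ = 1 and σ = 1, the equal group sizes, and all function names are our assumptions, since the slide does not specify them:

```python
import numpy as np

def simulate_replicate(n=200, p=1000, mu=1.0, sigma=1.0, rng=None):
    """One replicate of the variable-selection design: two groups with
    mean +/-mu on every coordinate, and a Bernoulli response driven by
    the first four coefficients only."""
    rng = np.random.default_rng(rng)
    # Half the rows drawn from N(mu, sigma), half from N(-mu, sigma).
    signs = np.repeat([1.0, -1.0], n // 2)
    X = rng.normal(loc=signs[:, None] * mu, scale=sigma, size=(n, p))
    # True coefficient vector: beta = (1, 1, 1, 1, 0, ..., 0).
    beta = np.zeros(p)
    beta[:4] = 1.0
    # Y_i | X_i ~ Bernoulli(F(X_i^T beta)), F the logistic function.
    prob = 1.0 / (1.0 + np.exp(-X @ beta))
    y = rng.binomial(1, prob)
    return X, y, beta
```

The single-outlier contamination would then replace one row of X with a point moved a chosen distance from the group-1 centroid (µ, …, µ) along (1, 1, 1, 1, 0, …, 0).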
Average number of correct variables selected

[Figure: expected number of correctly selected variables (0–4) under AIC (left panel) and BIC (right panel) versus outlier relative position (0 to 9.5). Methods compared: MLE, L2 E with w = 1, and L2 E with w = wopt.]
Average number of incorrect variables selected

[Figure: expected number of incorrectly selected variables (0–140) under AIC (left panel) and BIC (right panel) versus outlier relative position (0 to 9.5). Methods compared: MLE, L2 E with w = 1, and L2 E with w = wopt.]
Variable Selection




   Implosion breakdown ⇒ reduced SNR ⇒ missed detections
Outline


   The binary regression problem

   The L2 E Method

   Estimation

   Variable Selection

   Simulations

   Conclusion
Summary




      MLE logistic regression is sensitive to implosion breakdown.
      Both estimation and variable selection are affected: contaminants
      reduce the SNR.
      L2 E is robust because it is zero-forcing.
      Majorization-Minimization plus coordinate descent enables fast,
      stable optimization.
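The zero-forcing behavior comes from the logistic L2 E criterion itself. As a minimal numerical sketch of the weighted criterion from the deck (function and variable names are ours, not from the slides):

```python
import numpy as np

def logistic_l2e_loss(beta, X, y, w=1.0):
    """Logistic L2E criterion:
    (w^2/n) * sum[F^2 + (1-F)^2] - (2w/n) * sum[y*F + (1-y)*(1-F)],
    where F = F(x_i^T beta) and F is the logistic function."""
    F = 1.0 / (1.0 + np.exp(-X @ beta))
    n = len(y)
    quad = (w**2 / n) * np.sum(F**2 + (1.0 - F)**2)
    lin = (2.0 * w / n) * np.sum(y * F + (1.0 - y) * (1.0 - F))
    return quad - lin
```

On well-separated data the loss rewards fitting the labels, while the quadratic term penalizes placing mass where the data put none; at β = 0 (all F = 1/2) the loss is exactly −0.5 when w = 1.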
Future work




      Is w worth optimizing over?
      What is the correct AIC or BIC formulation?
      What are the degrees of freedom in the L2 E loss model?
References


      D.W. Scott.
      Parametric statistical modeling by minimum integrated square
      error.
      Technometrics, 43(3):274–285, 2001.
      A. Basu et al.
      Robust and efficient estimation by minimising a density power
      divergence.
      Biometrika, 85(3):549–559, 1998.
      H. Zou et al.
      On the “degrees of freedom” of the lasso.
      Annals of Statistics, 35(5):2173–2192, 2007.

Robust parametric classification and variable selection with minimum distance estimation

  • 1. Robust parametric classification and variable selection with minimum distance estimation Eric Chia,b,1 with David W. Scotta,2 a Department of Statistics, Rice University b Baylor College of Medicine June 17, 2010 1 DOE DE-FG02-97ER25308 2 NSF DMS-09-07491
  • 2. Outline The binary regression problem The L2 E Method Estimation Variable Selection Simulations Conclusion
  • 3. Outline The binary regression problem The L2 E Method Estimation Variable Selection Simulations Conclusion
  • 4. Logistic Regression Suppose we wish to predict y ∈ {0, 1}n using X ∈ Rn×p . The number of features p could be very large.
  • 5. Univariate Logistic Regression: MLE 1.0 qq q q q q q qq q q qq q q q qq qq qq qqqq q qq q qq q q qq qq q qq q 0.8 0.6 Pr(Y=1) 0.4 0.2 0.0 qq q q q q q qq q q q q q q q q q q q q q qq qq q q q qq q q q q q q q qq q q q q q −6 −4 −2 0 2 4 6 X
  • 6. MLE is sensitive to outliers 1.0 q qq q q qq q q q q q qq q q qq q q q qq qq qq qqqq q qq q qq q q qq qq q qq q 0.8 0.6 Pr(Y=1) 0.4 0.2 0.0 qq q q q q q qq q q q q q q q q q q q q q qq qq q q q qq q q q q q q q qq q q q q q −6 −4 −2 0 2 4 6 X
  • 7. MLE is sensitive to outliers Likelihood based choice Outlier or not, MLE puts mass wherever data lies. Cost: MLE puts mass over regions where there is no data.
  • 8. MLE is sensitive to outliers 1.0 q qq q q qq q q q q q qq q q qq q q q qq qq qq qqqq q qq q qq q q qq qq q qq q 0.8 0.6 Pr(Y=1) 0.4 0.2 0.0 qq q q q q q qq q q q q q q q q q q q q q qq qq q q q qq q q q q q q q qq q q q q q −6 −4 −2 0 2 4 6 X There are no ’ones’ between -4 and -2. But P(Y = 1|X ∈ (−4, −2)) ↑. There are no ’zeros’ between 4 and 6. But P(Y = 0|X ∈ (4, 6)) ↑.
  • 9. Outline The binary regression problem The L2 E Method Estimation Variable Selection Simulations Conclusion
  • 10. The L2 distance as an alternative to the deviance loss. g : unknown true density. fθ : putative parametric density. Find θ that minimizes the ISE ˆ θ = argmin (fθ (x) − g (x))2 dx. θ
  • 11. The L2 E Method The equivalent empirical criterion: n ˆ 2 θ = argmin fθ (x)2 dx − fθ (Xi ) , θ n i=1 where Xi ∈ Rp is the covariate vector of the i th observation. The L2 Estimator or L2 E [Scott, 2001]. Familar quantity: Smoothing parameter selection in non-parametric density estimation.
  • 12. Density-power divergence The L2 E and MLE are empirical minimizers of two different points in a spectrum divergence measures [Basu et al, 1998]. 1 1 1+γ dγ (g , fθ ) = fθ1+γ (z) − 1 + g (z)fθγ (z) + g (z) dz, γ γ γ > 0 trades off efficiency for robustness. γ = 1 =⇒ L2 loss. γ → 0 =⇒ Kullback - Leibler divergence.
  • 13. Robustness of the L2 distance ˆ θ = argmin (fθ (x) − g (x))2 dx. θ The L2 distance is zero-forcing: g (x) = 0 forces fθ (x) = 0. Puts premium on avoiding “false positives”. L2 E balances: mass where data is v.s. no mass where data is absent.
  • 14. Partial Densities: An extra degree of freedom Expand the search space [Scott, 2001]: (wfθ (x) − g (x))2 dx. Fit a parametric model to only a fraction, w , of the data (Hopefully the fraction described well by the parametric model!) n ˆ ˆ 2 2 2w (θ, w ) = argmin w fθ (x) dx − fθ (Xi ) . θ,w n i=1
  • 15. Logistic L2 E loss Let F (u) = 1/(1 + exp(−u)), logistic function, then n ˆ ˆ w2 (β, w ) = argmin F (xiT β)2 + (1 − F (xiT β))2 β,w ∈[0,1] n i=1 n w −2 yi F (xiT β) + (1 − yi )(1 − F (xiT β)) . n i=1
  • 16. Two dimensional example 4 2 0 X2 −2 −4 −5 0 5 10 X1 n = 300 and p = 2. Three clusters each of size 100 Two are labelled 0 One is labelled 1
  • 17. 5 5 0 0 X2 X2 −5 −5 0 5 10 0 5 10 X1 X1 (a) MLE (b) L2 E A :w = 1.026 ˆ
  • 18. 5 5 0 0 X2 X2 −5 −5 0 5 10 0 5 10 X1 X1 (c) L2 E B: w = 0.666 ˆ (d) L2 E C: w = 0.668 ˆ
  • 19. Outline The binary regression problem The L2 E Method Estimation Variable Selection Simulations Conclusion
  • 20. The optimization problem Challenges L2 E loss is not convex. Hessian of the L2 E loss is non-definite. Standard Newton-Raphson fails. Scalability and stability as p increases? Solution Majorization-Minimization
  • 21. Majorization-Minimization Strategy Minimize a surrogate function, majorization. Choose surrogate such that ↓ surrogate =⇒ ↓ objective. surrogate is easier to minimize than objective.
  • 22. Majorization-Minimization Definition Given f and g , real-valued functions on Rp , g majorizes f at x if 1. g (x) = f (x) and 2. g (u) ≥ f (u) for all u.
  • 23. More Lack of fit Less very bad optimal less bad The spectrum of logistic models
  • 24. More Lack of fit Less very bad optimal less bad The spectrum of logistic models
  • 25. More Lack of fit Less very bad optimal less bad The spectrum of logistic models
  • 26. More Lack of fit Less very bad optimal less bad The spectrum of logistic models
  • 27. More Lack of fit Less very bad optimal less bad The spectrum of logistic models
  • 28. More Lack of fit Less very bad optimal less bad The spectrum of logistic models
  • 29. More Lack of fit Less very bad optimal less bad The spectrum of logistic models
  • 30. More Lack of fit Less very bad optimal less bad The spectrum of logistic models
  • 31. More Lack of fit Less very bad optimal less bad The spectrum of logistic models
  • 32. More Lack of fit Less very bad optimal less bad The spectrum of logistic models
  • 33. Quadratic majorization of the logistic L2 E loss The loss has bounded curvature with respect to β. Fix w . Majorize the exact second order Taylor expansion. 1 T −1 T (m) β (m+1) = β (m) − (X X ) X Z , K where 1 3 4 w K≥ max wz − z 3 − 2wz 2 + z + . 4 z∈[−1,1] 2 2 K controls the step size. Its lower bound is related to the maximum curvature of the loss. Z (m) is a working response that depends on Y and X β (m) .
  • 34. Outline The binary regression problem The L2 E Method Estimation Variable Selection Simulations Conclusion
  • 35. Continuous variable selection with the LASSO Minimize p “L2 E loss ”+λ |βi | i=1 Penalized majorization of loss majorizes the penalized loss. Minimize p “majorization of L2 E loss ”+λ |βi | i=1
  • 36. Coordinate Descent Suppose X is standardized, then (m+1) (m) 1 T (m) βk = S βk − X Z ,λ , K (k) where S is the soft threshold function S(x, λ) = sign(x) max(|x| − λ, 0). Extension to elastic net is straightforward.
  • 37. Heuristic Model Selection Regularization Path Calculate penalized regression coefficients for range of λ values. Information Criterion For each λ, calculate deviance loss using L2 E coefficients and add correction term (AIC and BIC). Select model with lowest AIC/BIC value. Use number of non-zero penalized regression coefficients for degrees of freedom [Zou et al, 2007].
  • 38. Heuristic Model Selection 151 127 111 97 69 38 7 4 4 4 4 3 2 0 800 1.5 700 1.0 600 0.5 L2E BIC βj 500 0.0 400 -0.5 300 -1.0 200 -3.0 -2.5 -2.0 -1.5 -3.0 -2.5 -2.0 -1.5 log10(λ) log10(λ)
  • 39. Outline The binary regression problem The L2 E Method Estimation Variable Selection Simulations Conclusion
  • 40. Simulations: Estimation n = 200, p = 4 Xi | Group 1 ∼ i.i.d. N(µ, σ) Xi | Group 2 ∼ i.i.d. N(−µ, σ) β = (1, 1/2, 1, 2) Yi |Xi ∼ i.d. Bern(F (XiT β)) 1,000 replicates.
  • 41. Case 1 Vary position of 1 outlier.
  • 42. Distributions of fitted coefficients 1 2 6 4 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 2 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 0 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Method Fitted Value q q q q q q q q q q q q q q q q q q q q q q q q q q MLE 3 4 L2E: w = 1 q q q q q q q q q q L2E: w = wopt 6 q q q q q q q q q q q q 4 q q q q q q q q q q q q q q q q q q q q q q q q q 2 0 −0.25 1.5 3 6 12 24 −0.25 1.5 3 6 12 24 Outlier position
• 43. Estimation The MLE regression coefficients are driven to zero (implosion breakdown).
  • 44. Case 2 Vary number of outliers at a fixed position.
• 45. Distributions of fitted coefficients [Figure: boxplots of the fitted coefficients β1–β4 for MLE, L2E with w = 1, and L2E with w = wopt, as the number of outliers at a fixed position varies from 0 to 20]
• 46. Simulations: Variable Selection n = 200, p = 1000. Xi | Group 1 ∼ i.i.d. N(µ, σ); Xi | Group 2 ∼ i.i.d. N(−µ, σ). β = (1, 1, 1, 1, 0, . . . , 0). Yi | Xi ∼ ind. Bern(F(XiT β)). 1,000 replicates. Single outlier: moved along the ray starting at the centroid of one group and extending in the direction (1, 1, 1, 1, 0, . . . , 0).
• 47. Average number of correct variables selected [Figure: mean number of correctly selected variables (out of 4) versus outlier relative position (0 to 9.5), under AIC and BIC, for MLE, L2E with w = 1, and L2E with w = wopt]
• 48. Average number of incorrect variables selected [Figure: mean number of incorrectly selected variables versus outlier relative position (0 to 9.5), under AIC and BIC, for MLE, L2E with w = 1, and L2E with w = wopt]
  • 49. Variable Selection Implosion breakdown =⇒ reduced SNR =⇒ missed detections
  • 50. Outline The binary regression problem The L2 E Method Estimation Variable Selection Simulations Conclusion
  • 51. Summary MLE logistic regression is sensitive to implosion breakdown. Estimation and variable selection are affected: contaminants reduce SNR. L2 E is robust because it is zero forcing. Majorization-Minimization + Coordinate Descent facilitate fast and stable optimization.
  • 52. Future work Is w worth optimizing over? What is the correct AIC or BIC formulation? What are the degrees of freedom in the L2 E loss model?
• 53. References D.W. Scott. Parametric statistical modeling by minimum integrated square error. Technometrics, 43(3):274–285, 2001. A. Basu et al. Robust and efficient estimation by minimising a density power divergence. Biometrika, 85(3):549–559, 1998. H. Zou et al. On the “degrees of freedom” of the lasso. Annals of Statistics, 35(5):2173–2192, 2007.