Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Julien Chiquet¹, Yves Grandvalet², Camille Charbonnier¹

    ¹ Statistique et Génome, CNRS & Université d'Évry Val d'Essonne
    ² Heudiasyc, CNRS & Université de Technologie de Compiègne

SSB – 29 March 2011

    arXiv preprint.
    http://arxiv.org/abs/1103.2697

    R-package scoop.
    http://stat.genopole.cnrs.fr/logiciels/scoop
Notations

Let
    Y be the output random variable,
    X = (X^1, …, X^p) be the input random variables, where X^j is the jth predictor.

The data
Given a sample {(y_i, x_i), i = 1, …, n} of i.i.d. realizations of (Y, X), denote
    y = (y_1, …, y_n) the response vector,
    x^j = (x^j_1, …, x^j_n) the vector of data for the jth predictor,
    X the n × p design matrix whose jth column is x^j,
    D = {i : (y_i, x_i) ∈ training set},
    T = {i : (y_i, x_i) ∈ test set}.
Generalized linear models

Suppose Y depends linearly on X through a function g:

    E(Y) = g(Xβ*).

We predict a response y_i by ŷ_i = g(x_i β̂) for any i ∈ T, by solving

    β̂ = arg max_β ℓ_D(β) = arg min_β Σ_{i∈D} L_g(y_i, x_i β),

where L_g is a loss function depending on the function g. Typically,
    if Y is Gaussian and g = Id (OLS),

        L_g(y, xβ) = (y − xβ)²,

    if Y is binary and g : t ↦ (1 + e^{−t})^{−1} (logistic regression),

        L_g(y, xβ) = −y · xβ + log(1 + e^{xβ}),

or any negative log-likelihood of an exponential family distribution.
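As a minimal R sketch (illustrative, not from the slides; eta stands for the linear predictor x_i β), the two losses read:

    ## Illustrative sketch of the two losses above; 'eta' is the linear predictor.
    loss_gaussian <- function(y, eta) (y - eta)^2                 # squared error (OLS)
    loss_logistic <- function(y, eta) -y * eta + log1p(exp(eta))  # negative Bernoulli log-likelihood

    ## Empirical risk over a training set D:
    ## sum(loss_logistic(y[D], X[D, ] %*% beta))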
Estimation and selection at the group level

  1. Structure: the set I = {1, …, p} splits into a known partition

         I = ∪_{k=1}^K G_k,  with G_k ∩ G_ℓ = ∅ for k ≠ ℓ.

  2. Sparsity: the support S of β* has few entries,

         S = {i : β*_i ≠ 0},  with |S| ≪ p.

The group-Lasso estimator
Grandvalet and Canu '98, Bakin '99, Yuan and Lin '06

    β̂^group = arg min_{β ∈ R^p} −ℓ_D(β) + λ Σ_{k=1}^K w_k ‖β_{G_k}‖,

where
    λ ≥ 0 controls the overall amount of penalty,
    w_k > 0 adapts the penalty between groups (dropped hereafter).
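A short R sketch of the group penalty (illustrative; it assumes the partition is encoded as a factor grp of length p):

    ## Illustrative sketch: the group-Lasso penalty with weights w.
    group_penalty <- function(beta, grp, w = rep(1, nlevels(grp))) {
      norms <- tapply(beta, grp, function(b) sqrt(sum(b^2)))  # ||beta_Gk|| for each group
      sum(w * norms)
    }

    ## Example with two groups {1,2} and {3,4}:
    ## group_penalty(c(1, -1, 2, 3), factor(c(1, 1, 2, 2)))  # = sqrt(2) + sqrt(13)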
Toy example: the prostate dataset

Examines the correlation between the prostate specific antigen and 8 clinical measures for 97 patients:
    lcavol   log(cancer volume)
    lweight  log(prostate weight)
    age      age
    lbph     log(benign prostatic hyperplasia amount)
    svi      seminal vesicle invasion
    lcp      log(capsular penetration)
    gleason  Gleason score
    pgg45    percentage Gleason scores 4 or 5

[Figure: Lasso — coefficient paths against lambda (log scale).]
Toy example: the prostate dataset

Examines the correlation between the prostate specific antigen and 8 clinical measures for 97 patients.

[Figure: hierarchical clustering — dendrogram of the 8 clinical measures, suggesting candidate groups.]
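As an illustrative R sketch (not the authors' code; it assumes X holds the 8 standardized clinical measures), candidate groups can be formed by clustering the predictors:

    ## Illustrative sketch: candidate groups from a clustering of the predictors.
    ## Assumes 'X' is the 97 x 8 matrix of standardized clinical measures.
    d   <- as.dist(1 - cor(X))        # dissimilarity from pairwise correlations
    hc  <- hclust(d)                  # dendrogram as in the figure
    grp <- factor(cutree(hc, k = 4))  # cut into k candidate groups (k = 4 is arbitrary)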
Toy example: the prostate dataset

Examines the correlation between the prostate specific antigen and 8 clinical measures for 97 patients.

[Figure: group-Lasso — coefficient paths against lambda (log scale).]
Application to splice site detection

Predict splice site status (0/1) from a sequence of 7 bases and their interactions:
    order 0: 7 factors with 4 levels,
    order 1: C(7,2) = 21 factors with 4² levels,
    order 2: C(7,3) = 35 factors with 4³ levels;
using dummy coding for each factor, we form groups.

[Figure: information content at each position of the sequence.]

    L. Meier, S. van de Geer, P. Bühlmann, 2008.
    The group-Lasso for logistic regression, JRSS Series B.
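A hedged R sketch of the group construction by dummy coding (illustrative names; it assumes bases is a data.frame with 7 factor columns, one per position, levels A/C/G/T):

    ## Illustrative sketch: dummy-code the base factors and their pairwise
    ## interactions; each term (factor or interaction) defines one group.
    mm  <- model.matrix(~ .^2, data = bases)  # main effects + order-1 interactions
    Xd  <- mm[, -1]                           # drop the intercept column
    grp <- factor(attr(mm, "assign")[-1])     # column-to-term map defines the groups
    ## (use ~ .^3 to also include the order-2 interactions)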
Application to splice site detection

Predict splice site status (0/1) from a sequence of 7 bases and their interactions.

[Figure: selected groups and their coefficients, colored by interaction order (0, 1, 2).]

    L. Meier, S. van de Geer, P. Bühlmann, 2008.
    The group-Lasso for logistic regression, JRSS Series B.
Group-Lasso limitations

  1. Not a single zero should belong to a group with non-zeros.
         Strong group sparsity (Huang and Zhang, '10, arXiv) establishes the
         conditions under which the group-Lasso outperforms the Lasso, and conversely.
  2. No sign-coherence within groups.
         Sign-coherence is required when groups gather consonant variables,
         e.g., groups defined by clusters of positively correlated variables.

The cooperative-Lasso
A penalty which assumes a sign-coherent group structure, that is, groups which gather either
    non-positive,
    non-negative,
    or null parameters.
Motivation: multiple network inference

[Figure: three experiments, each leading to an inferred network over the same nodes.]

A group is a set of corresponding edges across tasks (e.g., the red or blue ones): sign-coherence matters!

    J. Chiquet, Y. Grandvalet, C. Ambroise, 2010.
    Inferring multiple graphical structures, Statistics and Computing.
Motivation: joint segmentation of aCGH profiles

For a single profile,

    minimize_{β ∈ R^p} ‖β − y‖²,  s.t. Σ_{i=1}^p |β_i − β_{i−1}| < s,

where
    y is a vector in R^p (one profile),
    β is a vector in R^p.

[Figure: log-ratio (CNVs) against position on the chromosome, with a piecewise-constant fit.]

    K. Bleakley and J.-P. Vert, 2010.
    Joint segmentation of many aCGH profiles using fast group LARS, NIPS.
Motivation: joint segmentation of aCGH profiles

For n profiles jointly,

    minimize_{β ∈ R^{n×p}} ‖β − Y‖²,  s.t. Σ_{i=1}^p ‖β_i − β_{i−1}‖ < s,

where
    Y is an n × p matrix gathering the n profiles of size p,
    β_i is a size-n vector with the ith probes of the n profiles,
    a group gathers every position i across profiles.

Sign-coherence may avoid inconsistent variations across profiles.

[Figure: log-ratio (CNVs) against position on the chromosome for the n profiles.]

    K. Bleakley and J.-P. Vert, 2010.
    Joint segmentation of many aCGH profiles using fast group LARS, NIPS.
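A small R sketch of the joint penalty above (illustrative; B is an n × p matrix of profiles):

    ## Illustrative sketch: the groupwise total-variation penalty
    ## sum_i || beta_i - beta_{i-1} || for an n x p matrix of profiles.
    joint_tv <- function(B) {
      jumps <- B[, -1, drop = FALSE] - B[, -ncol(B), drop = FALSE]  # beta_i - beta_{i-1}
      sum(sqrt(colSums(jumps^2)))                                   # l2 norm at each position
    }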
Outline


  Definition

  Resolution

  Consistency

  Model selection

  Simulation studies

  Sibling probe sets and gene selection




The cooperative-Lasso estimator

Definition

    β̂^coop = arg min_{β ∈ R^p} J(β),  with J(β) = −ℓ_D(β) + λ‖β‖_coop,

where, for any v ∈ R^p,

    ‖v‖_coop = ‖v⁺‖_group + ‖v⁻‖_group = Σ_{k=1}^K ( ‖v⁺_{G_k}‖ + ‖v⁻_{G_k}‖ ),

and
    v⁺ = (v⁺_1, …, v⁺_p), with v⁺_j = max(0, v_j),
    v⁻ = (v⁻_1, …, v⁻_p), with v⁻_j = max(0, −v_j).
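A minimal R sketch of the coop-norm (illustrative; it assumes the partition is encoded as a factor grp of length p):

    ## Illustrative sketch: the coop-norm of a vector v.
    coop_norm <- function(v, grp) {
      vplus  <- pmax(v, 0)   # v+
      vminus <- pmax(-v, 0)  # v-
      gnorm  <- function(u) sum(tapply(u, grp, function(b) sqrt(sum(b^2))))
      gnorm(vplus) + gnorm(vminus)
    }

    ## Example with two groups {1,2} and {3,4}:
    ## coop_norm(c(1, -1, 2, 3), factor(c(1, 1, 2, 2)))  # = 1 + 1 + sqrt(13)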
A geometric view of sparsity

    minimize_{β_1, β_2} −ℓ(β_1, β_2) + λ Ω(β_1, β_2)

is equivalent to

    maximize_{β_1, β_2} ℓ(β_1, β_2)  s.t. Ω(β_1, β_2) ≤ c.

[Figure: level sets of ℓ and the admissible set {Ω ≤ c} in the (β_1, β_2) plane.]
Ball crafting: group-Lasso

Admissible set
    β = (β_1, β_2, β_3, β_4), with G_1 = {1, 2}, G_2 = {3, 4}.

Unit ball
    ‖β‖_group ≤ 1.

[Figure: cross-sections of the unit ball in the (β_1, β_3) plane, for β_2 ∈ {0, 0.3} and β_4 ∈ {0, 0.3}.]
Ball crafting: cooperative-Lasso

Admissible set
    β = (β_1, β_2, β_3, β_4), with G_1 = {1, 2}, G_2 = {3, 4}.

Unit ball
    ‖β‖_coop ≤ 1.

[Figure: cross-sections of the unit ball in the (β_1, β_3) plane, for β_2 ∈ {0, 0.3} and β_4 ∈ {0, 0.3}.]
Convex analysis
Supporting hyperplanes

A hyperplane supports a set iff
    the set is contained in one of the half-spaces it defines,
    the set has at least one point on the hyperplane.

[Figure: three sets in the (β_1, β_2) plane, each with a supporting hyperplane.]

There are supporting hyperplanes at all points of a convex set: they generalize tangents.
Convex analysis
Dual cone and subgradient

Subgradients generalize normals.

[Figure: the same three sets, with normals / subgradients at selected points.]

g is a subgradient at x when the vector (g, −1) is normal to the supporting hyperplane at this point.

The subdifferential at x is the set of all subgradients at x.
Optimality conditions

Theorem
A necessary and sufficient condition for the optimality of β is that the null vector 0 belongs to the subdifferential of the convex function J:

    0 ∈ ∂_β J(β) = {v ∈ R^p : v = −∇_β ℓ_D(β) + λθ},

where θ ∈ R^p belongs to the subdifferential of the coop-norm. Define

    ϕ_j(v) = ‖(sign(v_j) v)⁺‖;

then θ is such that

    ∀k ∈ {1, …, K}, ∀j ∈ S_k(β),    θ_j = β_j / ϕ_j(β_{G_k}),
    ∀k ∈ {1, …, K}, ∀j ∈ S_k^c(β),  ϕ_j(θ_{G_k}) ≤ 1.

We derive a subset algorithm to solve this problem (which you can enjoy in the paper and the package).
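A minimal R sketch of the sign-restricted norm ϕ_j used above (illustrative; v is the restriction of a vector to one group):

    ## Illustrative sketch: phi_j(v) = || (sign(v_j) * v)+ ||.
    phi <- function(v, j) {
      s <- sign(v[j])
      sqrt(sum(pmax(s * v, 0)^2))  # l2 norm of the positive part of sign(v_j) * v
    }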
Linear regression with orthonormal design

Consider

    β̂ = arg min_β (1/2)‖y − Xβ‖² + λ Ω(β),

with XᵀX = I. Hence (x^j)ᵀ(Xβ − y) = β_j − β̂^ols_j, and

    β̂ = arg min_β (1/2)‖β − β̂^ols‖² + λ Ω(β).

We may find a closed form of β̂ for, e.g.,
  1. Ω(β) = ‖β‖_lasso,
  2. Ω(β) = ‖β‖_group,
  3. Ω(β) = ‖β‖_coop.
Closed form for the Lasso: ∀j ∈ {1, …, p},

    β̂^lasso_j = (1 − λ/|β̂^ols_j|)⁺ β̂^ols_j,   i.e.   |β̂^lasso_j| = (|β̂^ols_j| − λ)⁺.

[Figure: Lasso as a function of the OLS coefficients (soft-thresholding).]
Closed form for the group-Lasso: ∀k ∈ {1, …, K}, ∀j ∈ G_k,

    β̂^group_j = (1 − λ/‖β̂^ols_{G_k}‖)⁺ β̂^ols_j,   i.e.   ‖β̂^group_{G_k}‖ = (‖β̂^ols_{G_k}‖ − λ)⁺.

[Figure: group-Lasso as a function of the OLS coefficients (groupwise soft-thresholding).]
Closed form for the coop-Lasso: ∀k ∈ {1, …, K}, ∀j ∈ G_k,

    β̂^coop_j = (1 − λ/ϕ_j(β̂^ols_{G_k}))⁺ β̂^ols_j,   i.e.   ϕ_j(β̂^coop_{G_k}) = (ϕ_j(β̂^ols_{G_k}) − λ)⁺.

[Figure: coop-Lasso as a function of the OLS coefficients (signwise soft-thresholding).]
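The three closed forms translate into one-line thresholding rules; a hedged R sketch follows (illustrative, not the scoop implementation; b is the OLS coefficient vector, grp a factor giving the partition):

    ## Illustrative sketches of the three thresholding operators above.
    soft_lasso <- function(b, lambda) pmax(1 - lambda / abs(b), 0) * b

    soft_group <- function(b, grp, lambda) {
      shrink <- tapply(b, grp, function(u) max(1 - lambda / sqrt(sum(u^2)), 0))
      as.vector(shrink[grp]) * b  # groupwise shrinkage factor applied to each coefficient
    }

    soft_coop <- function(b, grp, lambda) {
      unsplit(lapply(split(b, grp), function(u) {
        phi <- function(j) sqrt(sum(pmax(sign(u[j]) * u, 0)^2))  # phi_j(u)
        sapply(seq_along(u), function(j) max(1 - lambda / phi(j), 0)) * u
      }), grp)
    }

    ## Example:
    ## b <- c(0.5, -0.2, 1.0, 0.8); grp <- factor(c(1, 1, 2, 2))
    ## soft_lasso(b, 0.3); soft_group(b, grp, 0.3); soft_coop(b, grp, 0.3)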
Linear regression setup
Technical assumptions

(A1) X and Y have finite fourth-order moments:

        E‖X‖⁴ < ∞,   E|Y|⁴ < ∞;

(A2) the covariance matrix Ψ = E[XXᵀ] ∈ R^{p×p} is invertible;

(A3) for every k = 1, …, K: if ‖(β*_{G_k})⁺‖ > 0 and ‖(β*_{G_k})⁻‖ > 0, then β*_j ≠ 0 for every j ∈ G_k
     (all sign-coherent groups are either included in or excluded from the true support).
Irrepresentability condition
  Define Sk = S ∩ Gk the support within group k and, for j ∈ Gk,

  \[
  [D(\beta)]_{jj} = \big\|(\operatorname{sign}(\beta_j)\,\beta_{G_k})_+\big\|^{-1}.
  \]

  Assume there exists η > 0 such that
(A4) for every group Gk including at least one null coefficient:

  \[
  \max\Big( \big\|\big(\Psi_{S_k^c S}\,\Psi_{SS}^{-1}\,D(\beta_S)\,\beta_S\big)_+\big\|,\;
            \big\|\big(\Psi_{S_k^c S}\,\Psi_{SS}^{-1}\,D(\beta_S)\,\beta_S\big)_-\big\| \Big) \le 1 - \eta,
  \]

(A5) for every group Gk intersecting the support and including either
     positive or negative coefficients, let νk be the sign of these
     coefficients (νk = 1 if (β Gk )+ > 0 and νk = −1 if (β Gk )− > 0):

  \[
  \nu_k\,\Psi_{S_k^c S}\,\Psi_{SS}^{-1}\,D(\beta_S)\,\beta_S \succeq 0,
  \]

         where ⪰ denotes componentwise inequality.

cooperative-Lasso                                                               23
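As a rough numerical companion (my own helper, not part of the scoop package, and only as reliable as the reconstructed display above), the left-hand side of (A4) can be evaluated for a given Ψ, true β and grouping (groups given as lists of 0-based indices):

import numpy as np

def coop_irrepresentability_lhs(Psi, beta, groups):
    """For each group with at least one null coefficient, the larger of the norms
    of the positive and negative parts of Psi[Sk^c, S] inv(Psi[S, S]) D(beta_S) beta_S."""
    S = np.flatnonzero(beta)
    Psi_SS_inv = np.linalg.inv(Psi[np.ix_(S, S)])
    d = np.empty(len(S))                 # entries of D(beta_S) beta_S
    for i, j in enumerate(S):
        gk = next(g for g in groups if j in g)
        same_sign = np.maximum(np.sign(beta[j]) * beta[gk], 0.0)
        d[i] = beta[j] / np.linalg.norm(same_sign)
    lhs = {}
    for k, g in enumerate(groups):
        Skc = np.setdiff1d(g, S)         # null coefficients of group k
        if Skc.size == 0:
            continue
        v = Psi[np.ix_(Skc, S)] @ Psi_SS_inv @ d
        lhs[k] = max(np.linalg.norm(np.maximum(v, 0.0)),
                     np.linalg.norm(np.minimum(v, 0.0)))
    return lhs                           # (A4) requires each value to stay below 1 - eta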
Consistency results



  Theorem
  If assumptions (A1–5) are satisfied, then for every sequence λn such that
  λn = λ0 n^{−γ}, γ ∈ ]0, 1/2[,

  \[
  \hat\beta^{\mathrm{coop}} \xrightarrow{\;P\;} \beta
  \qquad\text{and}\qquad
  P\big(S(\hat\beta^{\mathrm{coop}}) = S\big) \to 1.
  \]




  Asymptotically, the cooperative-Lasso is unbiased and enjoys exact
  support recovery (even when there are irrelevant variables within a
  group).




cooperative-Lasso                                                          24
Sketch of the proof

    1. Construct an artificial estimator β̃S restricted to the true support S
       and extend it with 0 coefficients on S^c.
    2. Consider the event En on which β̃ satisfies the original optimality
       conditions. On En, β̃ coincides with β̂coop on S and β̂coop vanishes
       on S^c, by uniqueness.
    3. We need to prove that lim_{n→∞} P(En) = 1.
    4. Derive the asymptotic distribution of the derivative of the loss
       function Xᵀ(y − Xβ̃) from
                     the CLT applied to second-order moments,
                     the optimality conditions on β̃S;
       the right choice of λn then provides convergence in probability.
    5. Assumptions (A4–5) ensure that the limits in probability satisfy the
       optimality constraints with strict inequalities.
    6. As a result, the optimality conditions are satisfied (with non-strict
       inequalities) with probability tending to 1.

cooperative-Lasso                                                             25
Illustration
  Generate data y = Xβ + σε, with
         β = (1, 1, −1, −1, 0, 0, 0, 0),
         G = {{1, 2}, {3, 4}, {5, 6}, {7, 8}},
         σ = 0.1, R² ≈ 0.99, n = 20;
  the irrepresentability condition
         holds for the coop-Lasso,
         does not hold for the group-Lasso;
  average over 100 simulations.

       Fig.: coefficient paths as functions of log10(λ) — 50% coverage intervals
       (upper/lower quartiles) over the 100 simulations, for the group-Lasso and
       the coop-Lasso.
cooperative-Lasso                                                                            26
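A minimal sketch of this simulation (my own reimplementation; the slide does not specify the law of X, so a standard Gaussian design is assumed):

import numpy as np

rng = np.random.default_rng(42)
n, sigma = 20, 0.1
beta = np.array([1.0, 1.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0])
groups = [[0, 1], [2, 3], [4, 5], [6, 7]]      # 0-based version of G
X = rng.standard_normal((n, beta.size))        # assumed design, unspecified above
y = X @ beta + sigma * rng.standard_normal(n)  # y = X beta + sigma * eps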
Outline


  Definition

  Resolution

  Consistency

  Model selection

  Simulation studies

  Sibling probe sets and gene selection




cooperative-Lasso                         27
Optimism of the training error
         The training error:
  \[ \mathrm{err} = \frac{1}{|D|} \sum_{i\in D} L(y_i, x_i\hat\beta). \]

         The test error (“extra-sample” error):
  \[ \mathrm{Err}_{\mathrm{ex}} = E_{X,Y}\big[L(Y, X\hat\beta)\,\big|\,D\big]. \]

         The “in-sample” error:
  \[ \mathrm{Err}_{\mathrm{in}} = \frac{1}{|D|}\sum_{i\in D} E_{Y}\big[L(Y_i, x_i\hat\beta)\,\big|\,D\big]. \]

  Definition (Optimism)
  \[ \mathrm{Err}_{\mathrm{in}} = \mathrm{err} + \text{“optimism”}. \]

cooperative-Lasso                                                     28
Cp statistics
  For squared-error loss (and some other losses),
  \[ \mathrm{Err}_{\mathrm{in}} = \mathrm{err} + \frac{2}{|D|}\sum_{i\in D} \mathrm{cov}(\hat y_i, y_i). \]

         The amount by which err underestimates the true error depends
         on how strongly yi affects its own prediction. The harder we fit
         the data, the greater the covariance will be, thereby increasing
         the optimism (ESL, 2nd ed., 5th printing).

  Mallows’ Cp statistic
  For a linear regression fit ŷ with p inputs, Σ_{i∈D} cov(ŷi, yi) = p σ², hence
  \[ C_p = \mathrm{err} + 2\,\frac{\mathrm{df}}{|D|}\,\hat\sigma^2, \qquad \text{with } \mathrm{df} = p. \]

cooperative-Lasso                                                                    29
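As a minimal sketch (the name is mine), the statistic is a one-liner given a df value and a noise-variance estimate:

import numpy as np

def mallows_cp(y, y_hat, df, sigma2):
    """Cp = err + 2 * (df / |D|) * sigma^2, with err the training MSE."""
    err = np.mean((y - y_hat) ** 2)
    return err + 2.0 * df / len(y) * sigma2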
Generalized degrees of freedom
  Let ŷ(λ) = Xβ̂(λ) be the predicted values for a penalized estimator.

  Proposition (Efron (’04) + Stein’s lemma (’81))
  \[ \mathrm{df}(\lambda) \;\dot{=}\; \frac{1}{\sigma^2}\sum_{i\in D}\mathrm{cov}(\hat y_i(\lambda), y_i) \;=\; E_{y}\,\mathrm{tr}\Big(\frac{\partial \hat y_\lambda}{\partial y}\Big). \]

  For the Lasso, Zou et al. (’07) show that
  \[ \widehat{\mathrm{df}}{}^{\,\mathrm{lasso}}(\lambda) = \big\|\hat\beta^{\mathrm{lasso}}(\lambda)\big\|_0. \]

  Assuming XᵀX = I, Yuan and Lin (’06) show for the group-Lasso that the
  trace term equals
  \[ \widehat{\mathrm{df}}{}^{\,\mathrm{group}}(\lambda) = \sum_{k=1}^{K} \mathbf{1}\big\{\|\hat\beta^{\mathrm{group}}_{G_k}(\lambda)\| > 0\big\}\Big(1 + \frac{\|\hat\beta^{\mathrm{group}}_{G_k}(\lambda)\|}{\|\hat\beta^{\mathrm{ols}}_{G_k}\|}\,(p_k - 1)\Big). \]

cooperative-Lasso                                                         30
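Both expressions translate directly into code; a hedged sketch (function names are mine), with the group-Lasso formula valid under the orthonormal-design assumption above:

import numpy as np

def df_lasso(b_lasso):
    """Zou et al. ('07): number of non-zero coefficients."""
    return np.count_nonzero(b_lasso)

def df_group(b_group, b_ols, groups):
    """Yuan and Lin ('06), assuming X'X = I; groups are lists of 0-based indices."""
    df = 0.0
    for g in groups:
        norm = np.linalg.norm(b_group[g])
        if norm > 0:
            df += 1.0 + norm / np.linalg.norm(b_ols[g]) * (len(g) - 1)
    return df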
Approximated degrees of freedom for the coop-Lasso
  Proposition
  Assuming that data are generated according to a linear regression model
  and that X is orthonormal, the following expression of df coop (λ) is an
  unbiased estimate of df(λ)

  \[
  \widehat{\mathrm{df}}{}^{\,\mathrm{coop}}(\lambda) = \sum_{k=1}^{K}
  \mathbf{1}\big\{\|(\hat\beta^{\mathrm{coop}}_{G_k}(\lambda))_+\| > 0\big\}
  \Big(1 + (p_k^+ - 1)\,\frac{\|(\hat\beta^{\mathrm{coop}}_{G_k}(\lambda))_+\|}{\|(\hat\beta^{\mathrm{ols}}_{G_k})_+\|}\Big)
  + \mathbf{1}\big\{\|(\hat\beta^{\mathrm{coop}}_{G_k}(\lambda))_-\| > 0\big\}
  \Big(1 + (p_k^- - 1)\,\frac{\|(\hat\beta^{\mathrm{coop}}_{G_k}(\lambda))_-\|}{\|(\hat\beta^{\mathrm{ols}}_{G_k})_-\|}\Big),
  \]

  where p_k^+ and p_k^- are respectively the number of positive and negative
  entries in β̂_Gk^ols.
cooperative-Lasso                                                                                  31
Approximated degrees of freedom for the coop-Lasso
  In practice, the OLS reference is replaced by a ridge estimate β̂ridge(γ),
  leading to

  \[
  \widehat{\mathrm{df}}{}^{\,\mathrm{coop}}(\lambda) = \sum_{k=1}^{K}
  \mathbf{1}\big\{\|(\hat\beta^{\mathrm{coop}}_{G_k}(\lambda))_+\| > 0\big\}
  \Big(1 + \frac{p_k^+ - 1}{1+\gamma}\,\frac{\|(\hat\beta^{\mathrm{coop}}_{G_k}(\lambda))_+\|}{\|(\hat\beta^{\mathrm{ridge}}_{G_k}(\gamma))_+\|}\Big)
  + \mathbf{1}\big\{\|(\hat\beta^{\mathrm{coop}}_{G_k}(\lambda))_-\| > 0\big\}
  \Big(1 + \frac{p_k^- - 1}{1+\gamma}\,\frac{\|(\hat\beta^{\mathrm{coop}}_{G_k}(\lambda))_-\|}{\|(\hat\beta^{\mathrm{ridge}}_{G_k}(\gamma))_-\|}\Big),
  \]

  where p_k^+ and p_k^- are respectively the number of positive and negative
  entries in β̂_Gk^ridge(γ).
cooperative-Lasso                                                                         31
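A sketch of the first (OLS-referenced) expression (my code, not the scoop implementation); positive and negative parts are handled symmetrically, and a part that is inactive in the coop estimate contributes nothing, which also avoids dividing by a zero reference norm:

import numpy as np

def df_coop(b_coop, b_ols, groups):
    """Unbiased df estimate under orthonormal design (proposition above)."""
    df = 0.0
    for g in groups:
        for part in (np.maximum, np.minimum):      # positive, then negative part
            coop = part(b_coop[g], 0.0)
            ols = part(b_ols[g], 0.0)
            n_coop = np.linalg.norm(coop)
            if n_coop > 0:                          # this part is active
                pk = np.count_nonzero(ols)          # p_k^+ (resp. p_k^-)
                df += 1.0 + (pk - 1) * n_coop / np.linalg.norm(ols)
    return df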
Approximated information criteria

  Following Zou et al., we extend the Cp statistic to an “approximated” AIC,
  \[ \mathrm{AIC}(\lambda) = \frac{\|y - \hat y(\lambda)\|^2}{\sigma^2} + 2\,\widetilde{\mathrm{df}}(\lambda), \]
  and from the AIC there is a (small) step to the BIC:
  \[ \mathrm{BIC}(\lambda) = \frac{\|y - \hat y(\lambda)\|^2}{\sigma^2} + \log(n)\,\widetilde{\mathrm{df}}(\lambda). \]

         K-fold cross-validation works well but is computationally intensive.
         It is required when we do not meet the linear regression setup. . .



cooperative-Lasso                                                              32
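In code (a sketch; σ² and the df estimate come from the previous slides):

import numpy as np

def approx_aic_bic(y, y_hat, df, sigma2):
    """Approximated AIC and BIC for a penalized fit y_hat with df degrees of freedom."""
    rss = np.sum((y - y_hat) ** 2)
    return rss / sigma2 + 2.0 * df, rss / sigma2 + np.log(len(y)) * df

Evaluated along the λ path, the minimizer of either criterion selects the model.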
Outline


  Definition

  Resolution

  Consistency

  Model selection

  Simulation studies

  Sibling probe sets and gene selection




cooperative-Lasso                         33
Revisiting Elastic-Net experiments (1)



  Generate data y = Xβ + σε, with
         β = (0, . . . , 0, 2, . . . , 2, 0, . . . , 0, 2, . . . , 2), in four blocks of 10,
         G1 = {1, . . . , 10}, G2 = {11, . . . , 20}, G3 = {21, . . . , 30}, G4 = {31, . . . , 40},
         σ = 15, corr(xi , xj ) = 0.5,
         training/validation/test = 100/100/400,
         average over 100 simulations.

       Fig.: boxplots of the test MSE for the lasso, elastic-net, group-Lasso and coop-Lasso.
cooperative-Lasso                                                                                   34
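A sketch of one replicate, reading corr(xi, xj) = 0.5 as an equicorrelated Gaussian design (an assumption; the slide does not spell out the joint law):

import numpy as np

rng = np.random.default_rng(0)
p, n, sigma = 40, 100, 15.0
beta = np.r_[np.zeros(10), 2 * np.ones(10), np.zeros(10), 2 * np.ones(10)]
Sigma = np.full((p, p), 0.5)              # corr(x_i, x_j) = 0.5 for i != j
np.fill_diagonal(Sigma, 1.0)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)   # training design
y = X @ beta + sigma * rng.standard_normal(n)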
Revisiting Elastic-Net experiments (2)


  Generate data y = Xβ + σε, with
         β = (3, . . . , 3, 0, . . . , 0), 15 threes followed by 25 zeros,
         σ = 15,
         G1 = {1, . . . , 5}, G2 = {6, . . . , 10}, G3 = {11, . . . , 15}, G4 = {16, . . . , 40},
         xj = Z1 + ε, Z1 ∼ N (0, 1), ∀j ∈ G1,
         xj = Z2 + ε, Z2 ∼ N (0, 1), ∀j ∈ G2,
         xj = Z3 + ε, Z3 ∼ N (0, 1), ∀j ∈ G3,
         xj ∼ N (0, 1), ∀j ∈ G4,
         training/validation/test = 50/50/400,
         average over 100 simulations.

       Fig.: boxplots of the test MSE for the lasso, elastic-net, group-Lasso and coop-Lasso.
cooperative-Lasso                                                                  35
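A sketch of one replicate; the within-group noise level on the xj (0.1 here) is an assumption, the slide leaving ε unspecified:

import numpy as np

rng = np.random.default_rng(1)
n, sigma = 50, 15.0
beta = np.r_[3 * np.ones(15), np.zeros(25)]
Z = rng.standard_normal((n, 3))                    # one latent factor per group
X = np.empty((n, 40))
for k in range(3):                                 # G1, G2, G3: factor + small noise
    X[:, 5 * k:5 * (k + 1)] = Z[:, [k]] + 0.1 * rng.standard_normal((n, 5))
X[:, 15:] = rng.standard_normal((n, 25))           # G4: independent noise variables
y = X @ beta + sigma * rng.standard_normal(n)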
Breiman’s setup
 Simulations setting

  A wave-like vector of parameters β
         p = 90 variables partitioned into K = 10 groups of size pk = 9,
         3 (partially) active groups, 6 groups of zeros,
         in active groups, βj ∝ (h − |5 − j|)+ for j = 1, . . . , 9, with h = 1, . . . , 5
         (see the sketch after this slide).

      Figure: β for h = 1, . . . , 5; each active group then has |Sk| = 2h − 1
      (i.e. 1, 3, 5, 7, 9) non-zero coefficients.
cooperative-Lasso                                                                 36
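The construction sketch referenced above (my reading of the figures: the positive part and the placement of the active groups first are assumptions; the proportionality constant is set later through the R² calibration):

import numpy as np

def wave_beta(h, K=10, pk=9, n_active=3):
    """Wave-like coefficients: (h - |5 - j|)_+ within each active group, j = 1..pk."""
    wave = np.maximum(h - np.abs(5 - np.arange(1, pk + 1)), 0).astype(float)
    beta = np.zeros(K * pk)
    for k in range(n_active):
        beta[k * pk:(k + 1) * pk] = wave
    return beta

For instance, wave_beta(3) has 5 non-zero entries in each of the first three groups, matching |Sk| = 2h − 1.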
Breiman’s setup
 Simulations setting

  A wave-like vector of parameters β
         p = 90 variables partitioned into K = 10 groups of size pk = 9,
         3 (partially) active groups, 6 groups of zeros,
         in active groups, β j ∝ (h − |5 − j|) with h = 1, . . . , 5.




                             0     20     40    60     80


      Figure: β with h = 2, |Sk | = 3 non-zero coefficients in each active group.
cooperative-Lasso                                                                 36
Breiman’s setup
 Simulations setting

  A wave-like vector of parameters β
         p = 90 variables partitioned into K = 10 groups of size pk = 9,
         3 (partially) active groups, 6 groups of zeros,
         in active groups, β j ∝ (h − |5 − j|) with h = 1, . . . , 5.




                             0     20     40    60     80


      Figure: β with h = 3, |Sk | = 5 non-zero coefficients in each active group.
cooperative-Lasso                                                                 36
Breiman’s setup
 Simulations setting

  A wave-like vector of parameters β
         p = 90 variables partitioned into K = 10 groups of size pk = 9,
         3 (partially) active groups, 6 groups of zeros,
         in active groups, β j ∝ (h − |5 − j|) with h = 1, . . . , 5.




                             0     20     40    60     80


      Figure: β with h = 4, |Sk | = 7 non-zero coefficients in each active group.
cooperative-Lasso                                                                 36
Breiman’s setup
 Simulations setting

  A wave-like vector of parameters β
         p = 90 variables partitioned into K = 10 groups of size pk = 9,
         3 (partially) active groups, 6 groups of zeros,
         in active groups, β j ∝ (h − |5 − j|) with h = 1, . . . , 5.




                             0     20     40    60     80


      Figure: β with h = 5, |Sk | = 9 non-zero coefficients in each active group.
cooperative-Lasso                                                                 36
Breiman’s setup
 Example of solution paths and signal recovery with BIC choice

  The signal is generated as
         y = Xβ + σε, with σ = 1, n = 30 to 500,
         X ∼ N (0, Ψ) with Ψij = ρ^|i−j| (ρ = 0.4 in the example),
         the magnitude of β chosen so that R² ≈ 0.75 (see the sketch below).

  Remark
         The covariance structure is purposely disconnected from the group structure.
         None of the support recovery conditions are fulfilled.




cooperative-Lasso                                                                   37
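The R² calibration has a closed form: with X ∼ N(0, Ψ), Var(Xβ) = βᵀΨβ, so scaling β by c with c²βᵀΨβ / (c²βᵀΨβ + σ²) = R² hits the target. A sketch (my derivation, not the paper's code):

import numpy as np

def breiman_sample(beta, n, rho=0.4, sigma=1.0, r2=0.75, rng=None):
    """Draw (X, y) with an AR(1) design and beta rescaled so that R^2 ~ r2."""
    rng = rng if rng is not None else np.random.default_rng()
    p = beta.size
    idx = np.arange(p)
    Psi = rho ** np.abs(np.subtract.outer(idx, idx))   # Psi_ij = rho^|i-j|
    c = np.sqrt(r2 * sigma**2 / ((1 - r2) * (beta @ Psi @ beta)))
    X = rng.multivariate_normal(np.zeros(p), Psi, size=n)
    y = X @ (c * beta) + sigma * rng.standard_normal(n)
    return X, y, c * beta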
Breiman’s setup
 Example of path of solution and signal recovery with BIC choice

  The signal strength is generated so as
         y = Xβ + σ , with σ = 1, n = 30 to 500,
         X ∼ N (0, Ψ) with Ψij = ρ|i−j| (ρ = 0.4 in the example),
         magnitude in β chosen so as R2 ≈ 0.75.




                         One shot sample with n = 120




cooperative-Lasso                                                   37
Breiman’s setup
 Example of path of solution and signal recovery with BIC choice

  The signal strength is generated so as
                   y = Xβ + σ , with σ = 1, n = 30 to 500,
                   X ∼ N (0, Ψ) with Ψij = ρ|i−j| (ρ = 0.4 in the example),
                   magnitude in β chosen so as R2 ≈ 0.75.
          0.6




                                                                             0.5
          0.4




                                                                             0.4
                                                                             0.3
 ˆlasso




                                                                    ˆlasso
          0.2




                                                                                                                True signal
                                                                             0.2
 β




                                                                    β



                                                                                                                Estimated signal
                                                                             0.1
          0.0




                                                                             0.0
          -0.2




                                                                             -0.1




                 -0.4   -0.2   0.0    0.2   0.4   0.6   0.8   1.0                   0   20   40       60   80
                                     log10 (λ)                                                    i
                                                                               Figure: Lasso
cooperative-Lasso                                                                                                             37
Breiman’s setup
 Example of path of solution and signal recovery with BIC choice

  The signal strength is generated so as
                   y = Xβ + σ , with σ = 1, n = 30 to 500,
                   X ∼ N (0, Ψ) with Ψij = ρ|i−j| (ρ = 0.4 in the example),
                   magnitude in β chosen so as R2 ≈ 0.75.
          0.5




                                                                   0.5
          0.4




                                                                   0.4
          0.3




                                                                   0.3
 ˆgroup




                                                          ˆgroup
          0.2




                                                                                                      True signal
                                                                   0.2
 β




                                                          β
          0.1




                                                                                                      Estimated signal
                                                                   0.1
          0.0




                                                                   0.0
          -0.1




                                                                   -0.1




                 -0.4   -0.2   0.0   0.2    0.4   0.6   0.8               0   20   40       60   80
                                log10 (λ)                                               i
                                                              Figure: Group-Lasso
cooperative-Lasso                                                                                                   37
Breiman’s setup
 Example of path of solution and signal recovery with BIC choice

  The signal strength is generated so as
                  y = Xβ + σ , with σ = 1, n = 30 to 500,
                  X ∼ N (0, Ψ) with Ψij = ρ|i−j| (ρ = 0.4 in the example),
                  magnitude in β chosen so as R2 ≈ 0.75.
         0.5




                                                                 0.5
         0.4




                                                                 0.4
         0.3




                                                                 0.3
 ˆcoop




                                                         ˆcoop


                                                                                                    True signal
         0.2




                                                                 0.2
 β




                                                         β



                                                                                                    Estimated signal
         0.1




                                                                 0.1
         0.0




                                                                 0.0
         -0.1




                                                                 -0.1




                -0.4   -0.2   0.0   0.2    0.4   0.6   0.8              0   20   40       60   80
                               log10 (λ)                                              i
                                                             Figure: Coop-Lasso
cooperative-Lasso                                                                                                 37
Breiman’s setup
 Errors as a function of the sample size n




       Figure: prediction error (left) and sign error (right) as functions of n,
       for h = 3, |Sk| = 5 (favoring the Lasso); curves for the lasso, group-Lasso
       and coop-Lasso.
cooperative-Lasso                                                                                                        38
Breiman’s setup
 Errors as a function of the sample size n




       Figure: prediction error (left) and sign error (right) as functions of n,
       for h = 4, |Sk| = 7 (intermediate); curves for the lasso, group-Lasso and
       coop-Lasso.
cooperative-Lasso                                                                                                         38

Contenu connexe

Tendances

Chapter 3 projection
Chapter 3 projectionChapter 3 projection
Chapter 3 projectionNBER
 
Jump-growth model for predator-prey dynamics
Jump-growth model for predator-prey dynamicsJump-growth model for predator-prey dynamics
Jump-growth model for predator-prey dynamicsgustavdelius
 
Lecture on solving1
Lecture on solving1Lecture on solving1
Lecture on solving1NBER
 
Lesson 16 The Spectral Theorem and Applications
Lesson 16  The Spectral Theorem and ApplicationsLesson 16  The Spectral Theorem and Applications
Lesson 16 The Spectral Theorem and ApplicationsMatthew Leingang
 
Bayesian regression models and treed Gaussian process models
Bayesian regression models and treed Gaussian process modelsBayesian regression models and treed Gaussian process models
Bayesian regression models and treed Gaussian process modelsTommaso Rigon
 
Chapter 4 likelihood
Chapter 4 likelihoodChapter 4 likelihood
Chapter 4 likelihoodNBER
 
Optimalpolicyhandout
OptimalpolicyhandoutOptimalpolicyhandout
OptimalpolicyhandoutNBER
 
CMA-ES with local meta-models
CMA-ES with local meta-modelsCMA-ES with local meta-models
CMA-ES with local meta-modelszyedb
 
Alternating direction
Alternating directionAlternating direction
Alternating directionDerek Pang
 
CVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future Trend
CVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future TrendCVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future Trend
CVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future Trendzukun
 
Model For Estimating Diversity Presentation
Model For Estimating Diversity PresentationModel For Estimating Diversity Presentation
Model For Estimating Diversity PresentationDavid Torres
 
M. Visinescu - Higher Order First Integrals, Killing Tensors, Killing-Maxwell...
M. Visinescu - Higher Order First Integrals, Killing Tensors, Killing-Maxwell...M. Visinescu - Higher Order First Integrals, Killing Tensors, Killing-Maxwell...
M. Visinescu - Higher Order First Integrals, Killing Tensors, Killing-Maxwell...SEENET-MTP
 
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...Beniamino Murgante
 
Logit stick-breaking priors for partially exchangeable count data
Logit stick-breaking priors for partially exchangeable count dataLogit stick-breaking priors for partially exchangeable count data
Logit stick-breaking priors for partially exchangeable count dataTommaso Rigon
 
Asymptotics for discrete random measures
Asymptotics for discrete random measuresAsymptotics for discrete random measures
Asymptotics for discrete random measuresJulyan Arbel
 
B. Sazdovic - Noncommutativity and T-duality
B. Sazdovic - Noncommutativity and T-dualityB. Sazdovic - Noncommutativity and T-duality
B. Sazdovic - Noncommutativity and T-dualitySEENET-MTP
 
CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...
CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...
CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...zukun
 
PAWL - GPU meeting @ Warwick
PAWL - GPU meeting @ WarwickPAWL - GPU meeting @ Warwick
PAWL - GPU meeting @ WarwickPierre Jacob
 
Particle filter
Particle filterParticle filter
Particle filterbugway
 
Ml mle_bayes
Ml  mle_bayesMl  mle_bayes
Ml mle_bayesPhong Vo
 

Tendances (20)

Chapter 3 projection
Chapter 3 projectionChapter 3 projection
Chapter 3 projection
 
Jump-growth model for predator-prey dynamics
Jump-growth model for predator-prey dynamicsJump-growth model for predator-prey dynamics
Jump-growth model for predator-prey dynamics
 
Lecture on solving1
Lecture on solving1Lecture on solving1
Lecture on solving1
 
Lesson 16 The Spectral Theorem and Applications
Lesson 16  The Spectral Theorem and ApplicationsLesson 16  The Spectral Theorem and Applications
Lesson 16 The Spectral Theorem and Applications
 
Bayesian regression models and treed Gaussian process models
Bayesian regression models and treed Gaussian process modelsBayesian regression models and treed Gaussian process models
Bayesian regression models and treed Gaussian process models
 
Chapter 4 likelihood
Chapter 4 likelihoodChapter 4 likelihood
Chapter 4 likelihood
 
Optimalpolicyhandout
OptimalpolicyhandoutOptimalpolicyhandout
Optimalpolicyhandout
 
CMA-ES with local meta-models
CMA-ES with local meta-modelsCMA-ES with local meta-models
CMA-ES with local meta-models
 
Alternating direction
Alternating directionAlternating direction
Alternating direction
 
CVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future Trend
CVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future TrendCVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future Trend
CVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future Trend
 
Model For Estimating Diversity Presentation
Model For Estimating Diversity PresentationModel For Estimating Diversity Presentation
Model For Estimating Diversity Presentation
 
M. Visinescu - Higher Order First Integrals, Killing Tensors, Killing-Maxwell...
M. Visinescu - Higher Order First Integrals, Killing Tensors, Killing-Maxwell...M. Visinescu - Higher Order First Integrals, Killing Tensors, Killing-Maxwell...
M. Visinescu - Higher Order First Integrals, Killing Tensors, Killing-Maxwell...
 
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
 
Logit stick-breaking priors for partially exchangeable count data
Logit stick-breaking priors for partially exchangeable count dataLogit stick-breaking priors for partially exchangeable count data
Logit stick-breaking priors for partially exchangeable count data
 
Asymptotics for discrete random measures
Asymptotics for discrete random measuresAsymptotics for discrete random measures
Asymptotics for discrete random measures
 
B. Sazdovic - Noncommutativity and T-duality
B. Sazdovic - Noncommutativity and T-dualityB. Sazdovic - Noncommutativity and T-duality
B. Sazdovic - Noncommutativity and T-duality
 
CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...
CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...
CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...
 
PAWL - GPU meeting @ Warwick
PAWL - GPU meeting @ WarwickPAWL - GPU meeting @ Warwick
PAWL - GPU meeting @ Warwick
 
Particle filter
Particle filterParticle filter
Particle filter
 
Ml mle_bayes
Ml  mle_bayesMl  mle_bayes
Ml mle_bayes
 

Similaire à Cooperative-Lasso for sparse groups

Physics of Algorithms Talk
Physics of Algorithms TalkPhysics of Algorithms Talk
Physics of Algorithms Talkjasonj383
 
Talk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniquesTalk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniquesPierre Jacob
 
Improved Trainings of Wasserstein GANs (WGAN-GP)
Improved Trainings of Wasserstein GANs (WGAN-GP)Improved Trainings of Wasserstein GANs (WGAN-GP)
Improved Trainings of Wasserstein GANs (WGAN-GP)Sangwoo Mo
 
Uncertainty in deep learning
Uncertainty in deep learningUncertainty in deep learning
Uncertainty in deep learningYujiro Katagiri
 
Integration of biological annotations using hierarchical modeling
Integration of biological annotations using hierarchical modelingIntegration of biological annotations using hierarchical modeling
Integration of biological annotations using hierarchical modelingUSC
 
Aggressive Sampling for Multi-class to Binary Reduction with Applications to ...
Aggressive Sampling for Multi-class to Binary Reduction with Applications to ...Aggressive Sampling for Multi-class to Binary Reduction with Applications to ...
Aggressive Sampling for Multi-class to Binary Reduction with Applications to ...Ioannis Partalas
 
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7Ono Shigeru
 
Machine learning (2)
Machine learning (2)Machine learning (2)
Machine learning (2)NYversity
 
Basics of probability in statistical simulation and stochastic programming
Basics of probability in statistical simulation and stochastic programmingBasics of probability in statistical simulation and stochastic programming
Basics of probability in statistical simulation and stochastic programmingSSA KPI
 
Regularization and variable selection via elastic net
Regularization and variable selection via elastic netRegularization and variable selection via elastic net
Regularization and variable selection via elastic netKyusonLim
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Valentin De Bortoli
 
A discussion on sampling graphs to approximate network classification functions
A discussion on sampling graphs to approximate network classification functionsA discussion on sampling graphs to approximate network classification functions
A discussion on sampling graphs to approximate network classification functionsLARCA UPC
 
1 hofstad
1 hofstad1 hofstad
1 hofstadYandex
 

Similaire à Cooperative-Lasso for sparse groups (20)

Physics of Algorithms Talk
Physics of Algorithms TalkPhysics of Algorithms Talk
Physics of Algorithms Talk
 
Talk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniquesTalk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniques
 
JISA_Paper
JISA_PaperJISA_Paper
JISA_Paper
 
Improved Trainings of Wasserstein GANs (WGAN-GP)
Improved Trainings of Wasserstein GANs (WGAN-GP)Improved Trainings of Wasserstein GANs (WGAN-GP)
Improved Trainings of Wasserstein GANs (WGAN-GP)
 
Uncertainty in deep learning
Uncertainty in deep learningUncertainty in deep learning
Uncertainty in deep learning
 
Multitask learning for GGM
Multitask learning for GGMMultitask learning for GGM
Multitask learning for GGM
 
Integration of biological annotations using hierarchical modeling
Integration of biological annotations using hierarchical modelingIntegration of biological annotations using hierarchical modeling
Integration of biological annotations using hierarchical modeling
 
Aggressive Sampling for Multi-class to Binary Reduction with Applications to ...
Aggressive Sampling for Multi-class to Binary Reduction with Applications to ...Aggressive Sampling for Multi-class to Binary Reduction with Applications to ...
Aggressive Sampling for Multi-class to Binary Reduction with Applications to ...
 
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7
 
Lecture11
Lecture11Lecture11
Lecture11
 
YSC 2013
YSC 2013YSC 2013
YSC 2013
 
NTU_paper
NTU_paperNTU_paper
NTU_paper
 
MUMS Opening Workshop - Panel Discussion: Facts About Some Statisitcal Models...
MUMS Opening Workshop - Panel Discussion: Facts About Some Statisitcal Models...MUMS Opening Workshop - Panel Discussion: Facts About Some Statisitcal Models...
MUMS Opening Workshop - Panel Discussion: Facts About Some Statisitcal Models...
 
Machine learning (2)
Machine learning (2)Machine learning (2)
Machine learning (2)
 
Basics of probability in statistical simulation and stochastic programming
Basics of probability in statistical simulation and stochastic programmingBasics of probability in statistical simulation and stochastic programming
Basics of probability in statistical simulation and stochastic programming
 
Regularization and variable selection via elastic net
Regularization and variable selection via elastic netRegularization and variable selection via elastic net
Regularization and variable selection via elastic net
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...
 
A discussion on sampling graphs to approximate network classification functions
A discussion on sampling graphs to approximate network classification functionsA discussion on sampling graphs to approximate network classification functions
A discussion on sampling graphs to approximate network classification functions
 
1 hofstad
1 hofstad1 hofstad
1 hofstad
 
E212126
E212126E212126
E212126
 

Cooperative-Lasso for sparse groups

  • 1. Sparsity with sign-coherent groups of variables via the cooperative-Lasso Julien Chiquet1 , Yves Grandvalet2 , Camille Charbonnier1 1 e ´ Statistique et G´nome, CNRS & Universit´ d’Evry Val d’Essonne e 2 Heudiasyc, CNRS & Universit´ de Technologie de Compi`gne e e SSB – 29 mars 2011 arXiv preprint. http://arxiv.org/abs/1103.2697 R-package scoop. http://stat.genopole.cnrs.fr/logiciels/scoop cooperative-Lasso 1
  • 2. Notations Let Y be the output random variable, X = (X 1 , . . . , X p ) be the input random variables, where X j is the jth predictor. The data Given a sample {(yi , xi ), i = 1, . . . , n} of i.id. realizations of (Y, X), denote y = (y1 , . . . , yn ) the response vector, xj = (xj , . . . , xj ) the vector of data for the jth predictor, 1 n X the n × p design matrix of data whose jth column is xj , D = {i : (yi , xi ) ∈ training set}, T = {i : (yi , xi ) ∈ test set}. cooperative-Lasso 2
  • 3. Generalized linear models Suppose Y depends linearly on X through a function g: E(Y ) = g(Xβ ). ˆ We predict a response yi by yi = g(xi β) for any i ∈ T by solving ˆ ˆ β = arg max D (β) = arg min Lg (yi , xi β), β β i∈D where Lg is a loss function depending on the function g. Typically, if Y is Gaussian and g = Id (OLS), Lg (y, xβ) = (y − xβ)2 if Y is binary and g : t → g(t) = (1 + e−t )−1 (logistic regression) Lg (y, xβ) = − y · xβ − log 1 + exβ or any negative log-likelihood of an exponential family distribution. cooperative-Lasso 3
  • 4. Generalized linear models Suppose Y depends linearly on X through a function g: E(Y ) = g(Xβ ). ˆ We predict a response yi by yi = g(xi β) for any i ∈ T by solving ˆ ˆ β = arg max D (β) = arg min Lg (yi , xi β), β β i∈D where Lg is a loss function depending on the function g. Typically, if Y is Gaussian and g = Id (OLS), Lg (y, xβ) = (y − xβ)2 if Y is binary and g : t → g(t) = (1 + e−t )−1 (logistic regression) Lg (y, xβ) = − y · xβ − log 1 + exβ or any negative log-likelihood of an exponential family distribution. cooperative-Lasso 3
  • 5. Estimation and selection at the group level 1. Structure: the set I = {1, . . . , p} splits into a known partition. K I= Gk , with Gk ∩ G = ∅, k = . k=1 2. Sparsity: the support S of β has few entries. S = {i : βi = 0}, such as |S| p. The group-Lasso estimator Grandvalet and Canu ’98, Bakin ’99, Yuan and Lin ’06 K ˆgroup = arg min − β D (β) +λ wk β Gk . β∈Rp k=1 λ ≥ 0 controls the overall amount of penalty, wk > 0 adapts the penalty between groups (dropped hereafter). cooperative-Lasso 4
  • 6. Estimation and selection at the group level 1. Structure: the set I = {1, . . . , p} splits into a known partition. K I= Gk , with Gk ∩ G = ∅, k = . k=1 2. Sparsity: the support S of β has few entries. S = {i : βi = 0}, such as |S| p. The group-Lasso estimator Grandvalet and Canu ’98, Bakin ’99, Yuan and Lin ’06 K ˆgroup = arg min − β D (β) +λ wk β Gk . β∈Rp k=1 λ ≥ 0 controls the overall amount of penalty, wk > 0 adapts the penalty between groups (dropped hereafter). cooperative-Lasso 4
  • 7. Toy example: the prostate dataset Examines the correlation between the prostate specific antigen and 8 clinical measures for 97 patients. svi lweight lcavol lcavol log(cancer volume) lweight log(prostate weight) age age coefficients lbph log(benign prostatic hyperplasia amount) svi seminal vesicle invasion lcp log(capsular penetration) lbph gleason gleason Gleason score pgg45 age pgg45 percentage Gleason scores 4 or 5 lcp -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 lambda (log scale) Figure: Lasso cooperative-Lasso 5
  • 8. Toy example: the prostate dataset Examines the correlation between the prostate specific antigen and 8 clinical measures for 97 patients. 600 age 500 lcavol log(cancer volume) 400 lweight log(prostate weight) age age 300 Height lbph log(benign prostatic pgg45 200 hyperplasia amount) svi seminal vesicle invasion 100 lcp log(capsular penetration) 0 gleason Gleason score lweight gleason pgg45 percentage Gleason scores 4 lbph lcavol svi lcp or 5 Figure: hierarchical clustering cooperative-Lasso 5
• 9. Toy example: the prostate dataset (continued). Figure: group-Lasso — coefficient paths as a function of λ (log scale).
• 11. Application to splice site detection. Predict splice site status (0/1) from a sequence of 7 bases and their interactions:
order 0: 7 factors with 4 levels,
order 1: $\binom{7}{2}$ factors with $4^2$ levels,
order 2: $\binom{7}{3}$ factors with $4^3$ levels;
using dummy coding for each factor, we form groups. Figure: information content per position (sequence logo).
L. Meier, S. van de Geer, P. Bühlmann, 2008. The group-Lasso for logistic regression, JRSS series B.
• 12. Application to splice site detection (continued). Figure: estimated group coefficients for a selection of groups (g4, g5, g18, g42, g44, g45, g49, g54, g61) of orders 0, 1 and 2.
L. Meier, S. van de Geer, P. Bühlmann, 2008. The group-Lasso for logistic regression, JRSS series B.
• 13. Group-Lasso limitations
1. Not a single zero should belong to a group with non-zeros: strong group sparsity (Huang and Zhang, '10, arXiv) establishes the conditions under which the group-Lasso outperforms the Lasso, and conversely.
2. No sign-coherence within groups: required if groups gather consonant variables, e.g., groups defined by clusters of positively correlated variables.
The cooperative-Lasso: a penalty which assumes a sign-coherent group structure, that is, groups which gather either non-positive, non-negative, or null parameters.
• 15. Motivation: multiple network inference. Several experiments, each leading to its own network inference; a group is a set of corresponding edges across tasks (e.g., the red or the blue ones): sign-coherence matters!
J. Chiquet, Y. Grandvalet, C. Ambroise, 2010. Inferring multiple graphical structures, Statistics and Computing.
• 16. Motivation: joint segmentation of aCGH profiles. For a single profile y ∈ R^p of log-ratios (CNVs), segmentation solves
\[ \min_{\beta \in \mathbb{R}^p} \|\beta - y\|^2 \quad \text{s.t.} \quad \sum_{i=1}^p |\beta_i - \beta_{i-1}| < s. \]
Figure: log-ratio (CNVs) along positions on the chromosome.
• 17. Motivation: joint segmentation of aCGH profiles. For n profiles jointly,
\[ \min_{\beta \in \mathbb{R}^{n \times p}} \|\beta - Y\|^2 \quad \text{s.t.} \quad \sum_{i=1}^p \|\beta_i - \beta_{i-1}\| < s, \]
where Y is an n × p matrix gathering the n profiles of size p, and β_i is the size-n vector of the ith probes across the n profiles. A group gathers every position i across profiles: sign-coherence may avoid inconsistent variations across profiles. Figure: log-ratio (CNVs) along positions on the chromosome.
K. Bleakley and J.-P. Vert, 2010. Joint segmentation of many aCGH profiles using fast group LARS, NIPS.
• 23. Outline: Definition · Resolution · Consistency · Model selection · Simulation studies · Sibling probe sets and gene selection.
• 24. Outline (current section: Definition).
• 25. The cooperative-Lasso estimator
Definition:
\[ \hat{\beta}^{\text{coop}} = \arg\min_{\beta \in \mathbb{R}^p} J(\beta), \quad J(\beta) = -\ell_D(\beta) + \lambda \|\beta\|_{\text{coop}}, \]
where, for any v ∈ R^p,
\[ \|v\|_{\text{coop}} = \|v^+\|_{\text{group}} + \|v^-\|_{\text{group}} = \sum_{k=1}^K \left( \|v^+_{G_k}\| + \|v^-_{G_k}\| \right), \]
with v^+ = (v_1^+, …, v_p^+), v_j^+ = max(0, v_j), and v^- = (v_1^-, …, v_p^-), v_j^- = max(0, -v_j).
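As a concrete illustration, a minimal R sketch (our own code, not the scoop package API) computing the coop-norm of a vector for a given partition into groups:

## Minimal sketch (not the scoop API): coop-norm of v, groups labelled 1..K
coop_norm <- function(v, groups) {
  vplus  <- pmax(v, 0)   # positive part v+
  vminus <- pmax(-v, 0)  # negative part v-
  ## sum over groups of the Euclidean norms of each sign-part
  sum(tapply(vplus,  groups, function(x) sqrt(sum(x^2)))) +
    sum(tapply(vminus, groups, function(x) sqrt(sum(x^2))))
}
## Example with two groups of two coefficients:
coop_norm(c(1, -0.5, 2, 2), groups = c(1, 1, 2, 2))
## = ||(1,0)|| + ||(0,0.5)|| + ||(2,2)|| + ||(0,0)|| = 1 + 0.5 + 2*sqrt(2)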
• 26. A geometric view of sparsity. The penalized and constrained formulations are equivalent:
\[ \min_{\beta_1, \beta_2} -\ell(\beta_1, \beta_2) + \lambda \Omega(\beta_1, \beta_2) \quad \Longleftrightarrow \quad \max_{\beta_1, \beta_2} \ell(\beta_1, \beta_2) \ \text{ s.t. } \ \Omega(\beta_1, \beta_2) \le c. \]
Figure: level sets of the likelihood and admissible set {Ω ≤ c} in the (β1, β2) plane.
• 28. Ball crafting: group-Lasso. Admissible set for β = (β1, β2, β3, β4), with G1 = {1, 2}, G2 = {3, 4}: the unit ball ‖β‖_group ≤ 1. Figure: cross-sections of the unit ball in the (β1, β3) plane, for β2 ∈ {0, 0.3} and β4 ∈ {0, 0.3}.
• 32. Ball crafting: cooperative-Lasso. Admissible set for β = (β1, β2, β3, β4), with G1 = {1, 2}, G2 = {3, 4}: the unit ball ‖β‖_coop ≤ 1. Figure: cross-sections of the unit ball in the (β1, β3) plane, for β2 ∈ {0, 0.3} and β4 ∈ {0, 0.3}.
• 36. Outline (current section: Resolution).
• 37. Convex analysis: supporting hyperplane. A hyperplane supports a set iff (i) the set is contained in one of the half-spaces it defines and (ii) the set has at least one point on the hyperplane. Supporting hyperplanes exist at all points of a convex set: they generalize tangents. Figures: supporting hyperplanes of a convex set in the (β1, β2) plane.
• 42. Convex analysis: dual cone and subgradient, generalizing normals. g is a subgradient at x iff the vector (g, −1) is normal to the supporting hyperplane at this point. The subdifferential at x is the set of all subgradients at x. Figures: subgradients at smooth and non-smooth points in the (β1, β2) plane.
• 46. Optimality conditions
Theorem: a necessary and sufficient condition for the optimality of β is that the null vector 0 belongs to the subdifferential of the convex function J:
\[ 0 \in \partial_\beta J(\beta) = \{ v \in \mathbb{R}^p : v = -\nabla_\beta \ell_D(\beta) + \lambda \theta \}, \]
where θ ∈ R^p belongs to the subdifferential of the coop-norm. Define φ_j(v) = ‖(sign(v_j) v)^+‖; then θ is such that
\[ \forall k \in \{1, \dots, K\}, \ \forall j \in S_k(\beta): \quad \theta_j = \frac{\beta_j}{\varphi_j(\beta_{G_k})}, \]
\[ \forall k \in \{1, \dots, K\}, \ \forall j \in S_k^c(\beta): \quad \varphi_j(\theta_{G_k}) \le 1. \]
We derive a subset algorithm to solve this problem (detailed in the paper and the package).
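To make these conditions concrete, a minimal R sketch of a numeric check (our own code; we take S_k(β) to be the within-group support {j ∈ G_k : β_j ≠ 0}, and grad the gradient ∇ℓ_D(β) of the log-likelihood at β):

## Minimal KKT check (sketch): theta = grad / lambda must equal
## beta_j / phi_j(beta_Gk) on active coefficients, phi_j(theta_Gk) <= 1 elsewhere
phi <- function(v, j) sqrt(sum(pmax(sign(v[j]) * v, 0)^2))
check_kkt <- function(beta, grad, groups, lambda, tol = 1e-6) {
  theta <- grad / lambda
  all(sapply(seq_along(beta), function(j) {
    g <- which(groups == groups[j])   # indices of j's group
    jj <- which(g == j)               # position of j within the group
    if (beta[j] != 0) abs(theta[j] - beta[j] / phi(beta[g], jj)) < tol
    else phi(theta[g], jj) <= 1 + tol
  }))
}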
• 49. Linear regression with orthonormal design. Consider
\[ \hat{\beta} = \arg\min_\beta \frac{1}{2}\|y - X\beta\|^2 + \lambda \Omega(\beta), \]
with X^⊤X = I. Hence (x^j)^⊤(Xβ − y) = β_j − β̂_j^ols, and
\[ \hat{\beta} = \arg\min_\beta \frac{1}{2}\|\beta - \hat{\beta}^{\text{ols}}\|^2 + \lambda \Omega(\beta). \]
We may find a closed form of β̂ for, e.g., 1. Ω(β) = ‖β‖_lasso, 2. Ω(β) = ‖β‖_group, 3. Ω(β) = ‖β‖_coop.
• 51. Linear regression with orthonormal design: Lasso. For all j ∈ {1, …, p},
\[ \hat{\beta}_j^{\text{lasso}} = \left(1 - \frac{\lambda}{|\hat{\beta}_j^{\text{ols}}|}\right)_+ \hat{\beta}_j^{\text{ols}}, \quad \text{i.e.} \quad |\hat{\beta}_j^{\text{lasso}}| = \left(|\hat{\beta}_j^{\text{ols}}| - \lambda\right)_+. \]
Figure: Lasso as a function of the OLS coefficients.
• 52. Linear regression with orthonormal design: group-Lasso. For all k ∈ {1, …, K} and all j ∈ G_k,
\[ \hat{\beta}_j^{\text{group}} = \left(1 - \frac{\lambda}{\|\hat{\beta}_{G_k}^{\text{ols}}\|}\right)_+ \hat{\beta}_j^{\text{ols}}, \quad \text{i.e.} \quad \|\hat{\beta}_{G_k}^{\text{group}}\| = \left(\|\hat{\beta}_{G_k}^{\text{ols}}\| - \lambda\right)_+. \]
Figure: group-Lasso as a function of the OLS coefficients.
• 53. Linear regression with orthonormal design: coop-Lasso. For all k ∈ {1, …, K} and all j ∈ G_k,
\[ \hat{\beta}_j^{\text{coop}} = \left(1 - \frac{\lambda}{\varphi_j(\hat{\beta}_{G_k}^{\text{ols}})}\right)_+ \hat{\beta}_j^{\text{ols}}, \quad \text{i.e.} \quad \varphi_j(\hat{\beta}_{G_k}^{\text{coop}}) = \left(\varphi_j(\hat{\beta}_{G_k}^{\text{ols}}) - \lambda\right)_+. \]
Figure: coop-Lasso as a function of the OLS coefficients.
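These three closed forms translate directly into code; a minimal R sketch under the orthonormal-design assumption (our own helper names, not the scoop API):

## Minimal sketch assuming X'X = I: the three estimators as thresholdings
## of the OLS coefficients; 'groups' labels each coefficient
soft_lasso <- function(b_ols, lambda)
  pmax(1 - lambda / abs(b_ols), 0) * b_ols
soft_group <- function(b_ols, groups, lambda) {
  nrm <- ave(b_ols, groups, FUN = function(x) sqrt(sum(x^2)))  # ||b_Gk||
  pmax(1 - lambda / nrm, 0) * b_ols
}
soft_coop <- function(b_ols, groups, lambda) {
  phi <- function(x) {  # phi_j: norm of the sign-part matching coefficient j
    out <- ifelse(x > 0, sqrt(sum(pmax(x, 0)^2)), sqrt(sum(pmax(-x, 0)^2)))
    out[x == 0] <- Inf  # zero OLS coefficients stay at zero
    out
  }
  pmax(1 - lambda / ave(b_ols, groups, FUN = phi), 0) * b_ols
}
## Example: the lone negative coefficient is dropped, the positives cooperate
soft_coop(c(1.2, -0.3, 0.8, 0.7), groups = c(1, 1, 2, 2), lambda = 0.5)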
• 54. Outline (current section: Consistency).
• 55. Linear regression setup: technical assumptions
(A1) X and Y have finite fourth-order moments: E‖X‖⁴ < ∞, E|Y|⁴ < ∞.
(A2) The covariance matrix Ψ = E[XX^⊤] ∈ R^{p×p} is invertible.
(A3) For every k = 1, …, K, if ‖(β_{G_k})⁺‖ > 0 and ‖(β_{G_k})⁻‖ > 0, then β_j ≠ 0 for every j ∈ G_k. (All sign-coherent groups are either included in or excluded from the true support.)
• 56. Irrepresentability condition. Define S_k = S ∩ G_k, the support within a group, and
\[ [D(\beta)]_{jj} = \|[\mathrm{sign}(\beta_j)\,\beta_{G_k}]^+\|^{-1}. \]
Assume there exists η > 0 such that:
(A4) for every group G_k including at least one null coefficient,
\[ \max\left( \|(\Psi_{S_k^c S} \Psi_{SS}^{-1} D(\beta_S)\beta_S)^+\|, \ \|(\Psi_{S_k^c S} \Psi_{SS}^{-1} D(\beta_S)\beta_S)^-\| \right) \le 1 - \eta; \]
(A5) for every group G_k intersecting the support and including either positive or negative coefficients, let ν_k be the sign of these coefficients (ν_k = 1 if ‖(β_{G_k})⁺‖ > 0 and ν_k = −1 if ‖(β_{G_k})⁻‖ > 0):
\[ \nu_k \, \Psi_{S_k^c S} \Psi_{SS}^{-1} D(\beta_S)\beta_S \preceq 0, \]
where ⪯ denotes componentwise inequality.
• 57. Consistency results
Theorem: if assumptions (A1)–(A5) are satisfied with η > 0, then for every sequence λ_n = λ_0 n^{-γ}, γ ∈ (0, 1/2),
\[ \hat{\beta}^{\text{coop}} \xrightarrow{P} \beta \quad \text{and} \quad P\big(S(\hat{\beta}^{\text{coop}}) = S\big) \to 1. \]
Asymptotically, the cooperative-Lasso is unbiased and enjoys exact support recovery (even when there are irrelevant variables within a group).
• 58. Sketch of the proof
1. Construct an artificial estimator β̃_S restricted to the true support S and extend it with zero coefficients on S^c.
2. Consider the event E_n on which β̃ satisfies the original optimality conditions. On E_n, β̂^coop_S = β̃_S and β̂^coop_{S^c} = 0, by uniqueness.
3. We need to prove that lim_{n→∞} P(E_n) = 1.
4. Derive the asymptotic distribution of the derivative of the loss function, X^⊤(y − Xβ̃), from the CLT on second-order moments and the optimality conditions on β̃_S; the right choice of λ_n provides convergence in probability.
5. Assumptions (A4)–(A5) ensure that the limits in probability satisfy the optimality constraints with strict inequalities.
6. As a result, the optimality conditions are satisfied (with weak inequalities) with probability tending to 1.
• 62. Illustration. Generate data y = Xβ + σε with coefficients β = (1, 1, −1, −1, 0, 0, 0, 0), groups G = {{1, 2}, {3, 4}, {5, 6}, {7, 8}}, σ = 0.1, R² ≈ 0.99, n = 20. The irrepresentability conditions hold for the coop-Lasso but do not hold for the group-Lasso. Results are averaged over 100 simulations. Figures: solution paths as a function of log10(λ), with 50% coverage intervals (upper/lower quartiles), for the group-Lasso and the coop-Lasso.
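A minimal R sketch of this data generation (our own code; the slide does not specify the design covariance that makes (A4)–(A5) hold for the coop-Lasso and fail for the group-Lasso, so we default to a standard normal design as a placeholder):

set.seed(1)
n <- 20; p <- 8
beta_star <- c(1, 1, -1, -1, 0, 0, 0, 0)
groups    <- rep(1:4, each = 2)          # G = {{1,2},{3,4},{5,6},{7,8}}
X <- matrix(rnorm(n * p), n, p)          # placeholder design (see caveat above)
y <- X %*% beta_star + 0.1 * rnorm(n)    # sigma = 0.1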
• 65. Outline (current section: Model selection).
• 66. Optimism of the training error
The training error:
\[ \overline{\text{err}} = \frac{1}{|D|} \sum_{i \in D} L(y_i, x_i \hat{\beta}). \]
The test error ("extra-sample" error):
\[ \text{Err}_{\text{ex}} = E_{X,Y}\big[L(Y, X\hat{\beta}) \mid D\big]. \]
The "in-sample" error:
\[ \text{Err}_{\text{in}} = \frac{1}{|D|} \sum_{i \in D} E_{Y}\big[L(Y_i, x_i \hat{\beta}) \mid D\big]. \]
Definition (optimism): Err_in = err + "optimism".
• 68. Cp statistics. For squared-error loss (and some other losses),
\[ \text{Err}_{\text{in}} = \overline{\text{err}} + \frac{2}{|D|} \sum_{i \in D} \mathrm{cov}(\hat{y}_i, y_i). \]
The amount by which err underestimates the true error depends on how strongly y_i affects its own prediction: the harder we fit the data, the greater the covariance, thereby increasing the optimism (ESLII, 5th printing).
Mallows' Cp statistic: for a linear regression fit ŷ with p inputs, Σ_{i∈D} cov(ŷ_i, y_i) = pσ², so
\[ C_p = \overline{\text{err}} + 2 \, \frac{\text{df}}{|D|} \, \hat{\sigma}^2, \quad \text{with df} = p. \]
• 70. Generalized degrees of freedom. Let ŷ(λ) = Xβ̂(λ) be the predicted values for a penalized estimator.
Proposition (Efron ('04) + Stein's lemma ('81)):
\[ \text{df}(\lambda) = \frac{1}{\sigma^2} \sum_{i \in D} \mathrm{cov}(\hat{y}_i(\lambda), y_i) = E_y\left[\mathrm{tr}\left(\frac{\partial \hat{y}_\lambda}{\partial y}\right)\right]. \]
For the Lasso, Zou et al. ('07) show that df_lasso(λ) = ‖β̂^lasso(λ)‖₀. Assuming X^⊤X = I, Yuan and Lin ('06) show for the group-Lasso that the trace term equals
\[ \text{df}_{\text{group}}(\lambda) = \sum_{k=1}^K 1\big\{\|\hat{\beta}^{\text{group}}_{G_k}(\lambda)\| > 0\big\} \left(1 + (p_k - 1)\frac{\|\hat{\beta}^{\text{group}}_{G_k}(\lambda)\|}{\|\hat{\beta}^{\text{ols}}_{G_k}\|}\right). \]
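Under X^⊤X = I, Yuan and Lin's formula is straightforward to evaluate; a minimal R sketch (our own code):

## Group-Lasso df estimate (sketch): needs the fit and the OLS solution
df_group <- function(b_grp, b_ols, groups) {
  nrm_grp <- tapply(b_grp, groups, function(x) sqrt(sum(x^2)))  # ||b^group_Gk||
  nrm_ols <- tapply(b_ols, groups, function(x) sqrt(sum(x^2)))  # ||b^ols_Gk||
  pk <- as.vector(table(groups))                                # group sizes
  sum(ifelse(nrm_grp > 0, 1 + (pk - 1) * nrm_grp / nrm_ols, 0))
}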
• 73. Approximated degrees of freedom for the coop-Lasso
Proposition: assuming that data are generated according to a linear regression model and that X is orthonormal, the following expression of df_coop(λ) is an unbiased estimate of df(λ):
\[ \text{df}_{\text{coop}}(\lambda) = \sum_{k=1}^K \Bigg[ 1\big\{\|(\hat{\beta}^{\text{coop}}_{G_k}(\lambda))^+\| > 0\big\} \left(1 + (p_k^+ - 1)\frac{\|(\hat{\beta}^{\text{coop}}_{G_k}(\lambda))^+\|}{\|(\hat{\beta}^{\text{ols}}_{G_k})^+\|}\right) \]
\[ \qquad + 1\big\{\|(\hat{\beta}^{\text{coop}}_{G_k}(\lambda))^-\| > 0\big\} \left(1 + (p_k^- - 1)\frac{\|(\hat{\beta}^{\text{coop}}_{G_k}(\lambda))^-\|}{\|(\hat{\beta}^{\text{ols}}_{G_k})^-\|}\right) \Bigg], \]
where p_k^+ and p_k^- are respectively the numbers of positive and negative entries in β̂^ols_{G_k}.
• 74. Approximated degrees of freedom for the coop-Lasso (ridge-regularized variant). The same expression with the OLS reference replaced by a ridge estimate with tuning parameter γ:
\[ \text{df}_{\text{coop}}(\lambda) = \sum_{k=1}^K \Bigg[ 1\big\{\|(\hat{\beta}^{\text{coop}}_{G_k}(\lambda))^+\| > 0\big\} \left(1 + \frac{p_k^+ - 1}{1+\gamma}\,\frac{\|(\hat{\beta}^{\text{coop}}_{G_k}(\lambda))^+\|}{\|(\hat{\beta}^{\text{ridge}}_{G_k}(\gamma))^+\|}\right) \]
\[ \qquad + 1\big\{\|(\hat{\beta}^{\text{coop}}_{G_k}(\lambda))^-\| > 0\big\} \left(1 + \frac{p_k^- - 1}{1+\gamma}\,\frac{\|(\hat{\beta}^{\text{coop}}_{G_k}(\lambda))^-\|}{\|(\hat{\beta}^{\text{ridge}}_{G_k}(\gamma))^-\|}\right) \Bigg], \]
where p_k^+ and p_k^- are respectively the numbers of positive and negative entries in β̂^ridge_{G_k}(γ).
• 75. Approximated information criteria. Following Zou et al., we extend the Cp statistic to an "approximated" AIC,
\[ \text{AIC}(\lambda) = \frac{\|y - \hat{y}(\lambda)\|^2}{\sigma^2} + 2\,\widetilde{\text{df}}(\lambda), \]
and from the AIC it is a small step to the BIC:
\[ \text{BIC}(\lambda) = \frac{\|y - \hat{y}(\lambda)\|^2}{\sigma^2} + \log(n)\,\widetilde{\text{df}}(\lambda). \]
K-fold cross-validation works well but is computationally intensive; it is required when we do not meet the linear regression setup.
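Given fitted values along a path and the df estimates above, the criteria are one-liners; a minimal R sketch (our own helper, not the scoop API):

## Approximated AIC/BIC along a path of fits (sketch)
ic_path <- function(y, yhat, df, sigma2) {
  ## yhat: n x nlambda matrix of fits; df: df estimate for each lambda
  rss <- colSums((y - yhat)^2)
  list(AIC = rss / sigma2 + 2 * df,
       BIC = rss / sigma2 + log(length(y)) * df)
}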
• 76. Outline (current section: Simulation studies).
• 77. Revisiting Elastic-Net experiments (1). Generate data y = Xβ + σε with
β = (0, …, 0, 2, …, 2, 0, …, 0, 2, …, 2) (four blocks of 10),
G1 = {1, …, 10}, G2 = {11, …, 20}, G3 = {21, …, 30}, G4 = {31, …, 40},
σ = 15, corr(x_i, x_j) = 0.5, training/validation/test = 100/100/400, averaged over 100 simulations.
Figure: boxplots of the test MSE for the lasso, elastic-net, group-Lasso and coop-Lasso.
• 78. Revisiting Elastic-Net experiments (2). Generate data y = Xβ + σε with β = (3, …, 3, 0, …, 0) (15 threes, 25 zeros), σ = 15, groups G1 = {1, …, 5}, G2 = {6, …, 10}, G3 = {11, …, 15}, G4 = {16, …, 40}, and
x^j = Z1 + ε, Z1 ~ N(0, 1), ∀j ∈ G1,
x^j = Z2 + ε, Z2 ~ N(0, 1), ∀j ∈ G2,
x^j = Z3 + ε, Z3 ~ N(0, 1), ∀j ∈ G3,
x^j ~ N(0, 1), ∀j ∈ G4.
Training/validation/test = 50/50/400, averaged over 100 simulations.
Figure: boxplots of the test MSE for the lasso, elastic-net, group-Lasso and coop-Lasso.
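A minimal R sketch of this second design (our own code; the slide does not give the scale of the within-block noise ε, so 0.01 below is a placeholder):

set.seed(1)
n <- 50
beta_star <- c(rep(3, 15), rep(0, 25))
Z <- matrix(rnorm(n * 3), n, 3)                             # one latent factor per block
X <- cbind(Z[, rep(1:3, each = 5)] + 0.01 * rnorm(n * 15),  # G1, G2, G3
           matrix(rnorm(n * 25), n, 25))                    # G4: pure noise
y <- X %*% beta_star + 15 * rnorm(n)                        # sigma = 15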
• 79. Breiman's setup. Simulation setting: a wave-like vector of parameters β, with p = 90 variables partitioned into K = 10 groups of size p_k = 9; 3 (partially) active groups and 7 groups of zeros; in active groups, β_j ∝ (h − |5 − j|)_+ with h = 1, …, 5. Figures: β for h = 1, …, 5, giving |S_k| = 2h − 1 non-zero coefficients in each active group.
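A minimal R sketch of the wave-like coefficient vector (our own code; the positive part matches the |S_k| = 2h − 1 counts in the figure captions):

## Breiman's wave-like beta: K groups of size pk, the first n_active are active
make_beta <- function(h, K = 10, pk = 9, n_active = 3) {
  wave <- pmax(h - abs(5 - seq_len(pk)), 0)  # triangular bump peaking at j = 5
  c(rep(wave, n_active), rep(0, (K - n_active) * pk))
}
beta_star <- make_beta(h = 3)  # |S_k| = 5 non-zeros in each active group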
• 84. Breiman's setup: example of solution paths and signal recovery with the BIC choice. The signal is generated as y = Xβ + σε with σ = 1, n = 30 to 500, rows of X drawn from N(0, Ψ) with Ψ_ij = ρ^{|i−j|} (ρ = 0.4 in the example), and the magnitude of β chosen so that R² ≈ 0.75.
Remark: the covariance structure is purposely disconnected from the group structure; none of the support recovery conditions are fulfilled.
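A minimal R sketch of this design (our own code, using beta_star from the sketch above; the rescaling of β to reach R² ≈ 0.75 is omitted):

set.seed(1)
n <- 120; p <- 90; rho <- 0.4
Psi <- rho^abs(outer(1:p, 1:p, "-"))           # Psi_ij = rho^|i-j|
X <- matrix(rnorm(n * p), n, p) %*% chol(Psi)  # rows ~ N(0, Psi)
y <- X %*% beta_star + rnorm(n)                # sigma = 1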
• 85–88. Breiman's setup (continued): a one-shot sample with n = 120. Figures: solution paths as a function of log10(λ), and true vs. estimated signal at the BIC choice, for the Lasso, the group-Lasso and the coop-Lasso.
• 89. Breiman's setup: errors as a function of the sample size n. Figure: prediction error and sign error vs. n, for h = 3, |S_k| = 5 (favoring the Lasso); methods: lasso, group-Lasso, coop-Lasso.
• 90. Breiman's setup: errors as a function of the sample size n. Figure: prediction error and sign error vs. n, for h = 4, |S_k| = 7 (intermediate); methods: lasso, group-Lasso, coop-Lasso.