This document summarizes a presentation on social sparsity, a method for brain decoding from fMRI data that is substantially faster than other spatial sparsity methods. Social sparsity is a heuristic that forgoes the couplings between neighboring voxels during soft-thresholding, allowing it to run about 10× faster than total-variation regularization and 3× faster than graph-net while maintaining or improving prediction accuracy. Evaluations on visual recognition tasks show that social sparsity produces brain maps that segment the relevant regions well, at a fraction of the run time of the other methods.
2. Brain decoding with linear models
[Figure: design matrix X × coefficients w = target y; the coefficients form brain maps]
Minimize the error: l(y − Xw)
G Varoquaux 2
4. Brain decoder maps and prediction accuracy
Face vs house visual recognition [Haxby et al. 2001]
SVM: error 26%
Ridge: error 15%
Sparse model: error 19%
Which decoder predicts best? How to get good decoder maps?
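The comparison above can be sketched with scikit-learn. The data here is a random stand-in for the face-vs-house responses; the shapes, noise level, and default hyperparameters are assumptions for illustration, not the study's actual setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Synthetic stand-in for a two-condition fMRI dataset:
# a few informative "voxels" buried in many uninformative ones.
rng = np.random.RandomState(0)
n_samples, n_voxels = 200, 1000
X = rng.randn(n_samples, n_voxels)
w_true = np.zeros(n_voxels)
w_true[:20] = 1.0
y = (X @ w_true + rng.randn(n_samples) > 0).astype(int)

decoders = {
    "SVM": LinearSVC(),
    "Ridge": RidgeClassifier(),
    "Sparse (l1 logistic)": LogisticRegression(penalty="l1", solver="liblinear"),
}
for name, clf in decoders.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: {1 - scores.mean():.0%} error")
```

On real fMRI data the ranking depends on the task and preprocessing, which is exactly the question the slide raises.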
7. Sparse models
Ill-posed inverse problem ⇒ regularization:
min_w l(y − Xw) + λ ||w||_1
with a sparsity-inducing penalty: the l1 norm, or the elastic net.
A priori: a small fraction of the voxels are predictive.
Sparse models to select relevant regions? [Yamashita et al. 2008, Carroll et al. 2009]
But the elastic net can only select a subset of the relevant voxels. [Varoquaux et al. 2012]
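A toy illustration of the a priori above: when only a small fraction of features drive the target, an l1-penalized model recovers them. The shapes, noise level, and regularization strength below are arbitrary choices for the sketch:

```python
import numpy as np
from sklearn.linear_model import Lasso

# 500 "voxels", of which only the first 10 drive the target.
rng = np.random.RandomState(0)
n_samples, n_voxels = 100, 500
X = rng.randn(n_samples, n_voxels)
w_true = np.zeros(n_voxels)
w_true[:10] = 1.0
y = X @ w_true + 0.1 * rng.randn(n_samples)

lasso = Lasso(alpha=0.1).fit(X, y)
support = np.flatnonzero(lasso.coef_)
print(f"{support.size} voxels selected out of {n_voxels}")
```

With spatially correlated designs, as in fMRI, this clean recovery degrades: that is the limitation [Varoquaux et al. 2012] points to.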
10. Spatial sparse penalties
Spatial regularization, total variation: penalize the image gradient
min_w l(y − Xw) + λ Σ_i ||(∇w)_i||_2
An l21 norm: the l1 norm of the gradient magnitude; shrinks the x, y, and z gradients jointly. [Gramfort et al. 2013]
More generally, analysis sparsity [Eickenberg et al. 2015]: sparse in a transformation K of the weights,
min_w l(y − Xw) + λ ||Kw||_21
For instance, overlapping blocks: (Kw)_1 ↔ G1, (Kw)_2 ↔ G2, ...
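The TV penalty above can be written down directly with finite differences. A minimal sketch of the penalty only, not the solver; the `tv_penalty` helper and the cube-shaped test map are made up for illustration:

```python
import numpy as np

def tv_penalty(w):
    """Isotropic total variation: sum over voxels i of ||(grad w)_i||_2."""
    grads = np.gradient(w)                 # finite differences along each axis
    magnitude = np.sqrt(sum(g ** 2 for g in grads))
    return magnitude.sum()

# A piecewise-constant map (what TV favors) has a much smaller penalty
# than the same map corrupted by noise.
rng = np.random.RandomState(0)
flat = np.zeros((10, 10, 10))
flat[2:5, 2:5, 2:5] = 1.0
noisy = flat + 0.5 * rng.randn(10, 10, 10)
print(tv_penalty(flat), tv_penalty(noisy))
```

This makes concrete why TV produces segmented maps: constant regions with sharp borders cost little, while voxel-to-voxel noise is heavily penalized.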
12. Good convergence of solvers is important
Spatial regularization, total variation:
min_w l(y − Xw) + λ Σ_i ||(∇w)_i||_2
[Figure: decoder maps (x=17, z=−17) with stopping criterion ΔE < 10^−1 vs ΔE < 10^−5]
[Dohmatob et al. 2014]
13. Sparse solvers
Iterative Shrinkage-Thresholding Algorithm (ISTA):
min_w l(y − Xw) + λ ||w||_1
Setting: min l + p, with l smooth and p non-smooth.
Minimize successively: (quadratic approximation of l) + p.
FISTA loop:
1. Gradient descent on the smooth term
2. Proximal operator: prox_p(x) = argmin_y ½ ||x − y||_2^2 + p(y)
For the l1 penalty, the proximal operator is "soft thresholding":
prox_l1: ∀i, w_i ← w_i (1 − λ/|w_i|)_+
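The two steps of the loop above fit in a few lines of NumPy. A minimal sketch for the lasso case, with the smooth term l taken to be squared error; the step size uses the spectral norm of X as the Lipschitz constant, and there is no momentum term, so this is plain ISTA rather than FISTA:

```python
import numpy as np

def prox_l1(w, lam):
    """Soft thresholding: w_i <- w_i * (1 - lam / |w_i|)_+."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def ista(X, y, lam, n_iter=200):
    """Minimize 0.5 * ||y - Xw||_2^2 + lam * ||w||_1 by ISTA."""
    L = np.linalg.norm(X, ord=2) ** 2       # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)            # 1. gradient step on the smooth term
        w = prox_l1(w - grad / L, lam / L)  # 2. proximal (soft-threshold) step
    return w
```

With X = I the solution is exactly the soft threshold of y, which makes the behavior easy to check by hand.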
15. Sparse solvers: proximals and co.
l1 penalty, "soft thresholding": ∀i, w_i ← w_i (1 − λ/|w_i|)_+
Group sparsity, prox_l21 on group G: ∀i ∈ G, w_i ← w_i (1 − λ/√(Σ_{j∈G} w_j²))_+
Overlapping groups, TV: the proximal operator requires an inner-loop iterative solver.
Social sparsity shrinkage: ∀i, w_i ← w_i (1 − λ/√(Σ_{j∈N(i)} w_j²))_+, where N(i) is a neighborhood of voxel i.
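The social-sparsity shrinkage above is separable, so it can be applied to a whole volume in one pass. A sketch using a cubic neighborhood; the `social_shrink` helper name and the 3×3×3 neighborhood choice are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def social_shrink(w, lam, size=3):
    """Social sparsity: soft-threshold each voxel against the l2 energy
    of its neighborhood N(i), here a size x size x size cube."""
    # sum_{j in N(i)} w_j^2 for every voxel i, via a box filter
    energy = uniform_filter(w ** 2, size=size) * size ** w.ndim
    scale = 1.0 - lam / np.sqrt(np.maximum(energy, 1e-12))
    return w * np.maximum(scale, 0.0)

# A coefficient supported by strong neighbors survives shrinkage better
# than an isolated coefficient of the same magnitude.
lone = np.zeros((7, 7, 7))
lone[3, 3, 3] = 1.0
grouped = lone.copy()
grouped[3, 3, 2] = grouped[3, 3, 4] = 1.0
print(social_shrink(lone, 0.5)[3, 3, 3], social_shrink(grouped, 0.5)[3, 3, 3])
```

Unlike the overlapping-group proximal, this needs no inner solver: one box filter and one elementwise shrinkage, which is where the speed-up comes from.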
18. Social sparsity: "soft-threshold" neighboring voxels
Sparsity must be combined with spatial structure, but convex solvers for non-local sparsity are expensive: the penalty is not separable.
Social sparsity: forget the coupling between voxels and soft-threshold each one against its neighborhood independently. [Kowalski et al. 2013]
19. Empirical evaluation for decoding
[Figure: prediction accuracy (−25% to +10%) and relative run time (1/20× to 5×) for TV-l1, graph-net, social sparsity, and SVM + ANOVA, across the Haxby object-recognition pairs (bottle/scramble, bottle/shoe, cat/bottle, cat/chair, cat/face, cat/house, cat/scramble, cat/shoe, chair/scramble, chair/shoe, face/house, face/scissors, scissors/scramble, shoe/scramble) and OASIS VBM male vs female]
21. Social sparsity maps
[Figure: face vs house decoder maps (z=16, y=34) for TV-l1, graph-net, and social sparsity]
23. @GaelVaroquaux
Social-sparsity brain decoders: faster spatial sparsity
Spatial sparsity improves prediction and denoises maps.
TV-l1 ("space-net") is very successful, but slow.
Social sparsity is a heuristic that forgoes the couplings:
10× faster than TV-l1, almost as accurate;
3× faster than graph-net, more accurate.
The maps segment relevant regions well.
24. References I
M. K. Carroll, G. A. Cecchi, I. Rish, R. Garg, and A. R. Rao. Prediction and interpretation of distributed neural activity with sparse models. NeuroImage, 44(1):112–122, 2009.
E. Dohmatob, A. Gramfort, B. Thirion, and G. Varoquaux. Benchmarking solvers for TV-l1 least-squares and logistic regression in brain imaging. PRNI, 2014.
M. Eickenberg, E. Dohmatob, B. Thirion, and G. Varoquaux. Total variation meets sparsity: statistical learning with segmenting penalties. MICCAI, 2015.
A. Gramfort, B. Thirion, and G. Varoquaux. Identifying predictive regions from fMRI with TV-L1 prior. In PRNI, pages 17–20, 2013.
J. Haxby, I. Gobbini, M. Furey, et al. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293:2425, 2001.
25. References II
M. Kowalski, K. Siedenburg, and M. Dörfler. Social sparsity! Neighborhood systems enrich structured shrinkage operators. IEEE Transactions on Signal Processing, 61:2498, 2013.
G. Varoquaux, A. Gramfort, and B. Thirion. Small-sample brain mapping: sparse recovery on spatially correlated designs with randomization and clustering. In ICML, page 1375, 2012.
O. Yamashita, M.-a. Sato, T. Yoshioka, F. Tong, and Y. Kamitani. Sparse estimation automatically selects voxels relevant for the decoding of fMRI activity patterns. NeuroImage, 42(4):1414–1429, 2008.