Medical imaging involves high-dimensional data, yet these data are acquired for a limited number of samples. Multivariate predictive models have become popular in recent decades for predicting external variables from imaging data, and standard algorithms yield point estimates of the model parameters. It is however challenging to attribute confidence to these parameter estimates, which makes the solutions hard to trust.
In this talk, I will present a new algorithm that assesses the statistical significance of parameters and that scales even when the number of predictors p ≥ 10^5 is much higher than the number of samples n ≤ 10^3, by leveraging structure among features. Our algorithm combines three main ingredients: a powerful inference procedure for linear models (the so-called Desparsified Lasso), feature clustering, and an ensembling step. We first establish that Desparsified Lasso alone cannot handle n ≪ p regimes; then we demonstrate that the combination of clustering and ensembling provides an accurate solution whose specificity is controlled. We also demonstrate stability improvements on two brain imaging datasets.
Statistical inference in high-dimension & application to brain imaging
1. 03/04/2019 Bertrand Thirion 1
Statistical inference in high-dimension & application to brain imaging
Imaging and machine learning workshop
Bertrand Thirion, bertrand.thirion@inria.fr
2. 03/04/2019 Bertrand Thirion 2
Cognitive neuroscience
How are cognitive activities affected or controlled by neural circuits in the brain?
3. 03/04/2019 Bertrand Thirion 3
The brain, the mind and the scanner
[Diagram labels: cognitive theories; experimental paradigm; stimuli; brain; scanner; fMRI data; brain mapping]
4. 03/04/2019 Bertrand Thirion 4
The brain, the mind and the scanner
[Diagram labels: cognitive theories; experimental paradigm; stimuli; brain; scanner; fMRI data; brain mapping; encoding; decoding]
5. 03/04/2019 Bertrand Thirion 5
Encoding: mapping cognitive functions to brain activity
[Brain maps for the following contrasts]
Right hand - left hand
Listen - read
Button press - reading
Computation - instructions
Expression - intention
Guess the gender
Face trustworthiness
False belief - mechanistic (auditory)
False belief - mechanistic (visual)
Grasping - orientation judgement
Hand - side judgement
Intention - random
Sentence - checkerboards
Saccades - fixation
Speech - non-speech
6. 03/04/2019 Bertrand Thirion 6
Resolution increases
2007: 3 mm, p = 50,000
2014: 1.5 mm, p = 400,000
2021: 0.5 mm?, p = 10^7
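The voxel counts on this slide can be checked with a short calculation. The sketch below assumes a fixed field of view and isotropic voxels, so that the number of voxels p scales as the inverse cube of the voxel size; this assumption and the function name are illustrative and not taken from the slides.

```python
# Assumption: fixed field of view and isotropic voxels, so the voxel count
# scales as (old voxel size / new voxel size) ** 3.
def scaled_voxel_count(p_ref, size_ref_mm, size_new_mm):
    """Voxel count after changing the voxel size from size_ref_mm to size_new_mm."""
    return p_ref * (size_ref_mm / size_new_mm) ** 3


p_2007 = 50_000                                    # 3 mm voxels
p_2014 = scaled_voxel_count(p_2007, 3.0, 1.5)      # 1.5 mm voxels
p_2021 = scaled_voxel_count(p_2014, 1.5, 0.5)      # 0.5 mm voxels

print(f"2014: p = {p_2014:,.0f}")
print(f"2021: p = {p_2021:,.0f}")
```

Under that assumption, halving the voxel size multiplies p by 8, and going from 1.5 mm to 0.5 mm multiplies it by 27, which lands on the order of 10^7.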
7. 03/04/2019 Bertrand Thirion 7
Better estimators for large-scale brain imaging
● A causal framework for brain activity decoding
● Dimension reduction for images
● Fast regularized ensembles of models
● Statistical inference for high-dimensional models
19. 03/04/2019 Bertrand Thirion 19
Outline
● A causal framework for brain activity decoding
● Dimension reduction for images
● Fast regularized ensembles of models
● Statistical inference for high-dimensional models
20. 03/04/2019 Bertrand Thirion 20
Compression in the image domain
● Reduce the complexity of learning algorithms: p → k ≪ p
● Random projections = fast generic solution, but
– Sub-optimal for structured signals
– Not invertible when p and k are large
● Local redundancy → feature grouping strategies / clustering: “super-pixels”
– Fast clustering procedures needed (large-k regime)
22. 03/04/2019 Bertrand Thirion 22
Crafting good image compression
● Key assumption: signal of interest L-Lipschitz
● Feature grouping matrix
● almost trivially:
● Worst case
Need a fast method to learn balanced clusters
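The formulas of this slide did not survive extraction, so the following is only a minimal sketch of the idea, not the slide's own definitions. It assumes that the feature grouping matrix (called `phi` here, a hypothetical name) averages the features inside each cluster, and it uses a toy 1D signal that is L-Lipschitz along the feature axis. With these assumptions, every feature lies within L times the cluster diameter of its cluster mean, which is why large, unbalanced clusters are the worst case.

```python
import numpy as np


def grouping_matrix(labels, k):
    """Return a (k, p) matrix whose row q averages the features of cluster q.

    Assumes every cluster 0..k-1 contains at least one feature.
    """
    p = labels.shape[0]
    phi = np.zeros((k, p))
    phi[labels, np.arange(p)] = 1.0          # membership indicators
    return phi / phi.sum(axis=1, keepdims=True)


# Toy 1D "image": p features on a line, split into k balanced contiguous clusters.
p, k = 1000, 50
labels = np.repeat(np.arange(k), p // k)
phi = grouping_matrix(labels, k)

# A signal that changes by at most L between neighboring features.
L = 0.01
x = L * np.abs(np.arange(p) - p / 2)

x_compressed = phi @ x                 # p values -> k cluster means
x_approx = x_compressed[labels]        # back to p values, piecewise constant

# Each value is within L * (cluster size - 1) of its cluster mean.
max_error = np.max(np.abs(x - x_approx))
bound = L * (p // k - 1)
print(max_error, bound)
```

Replacing the balanced `labels` with a few very large clusters increases the bound, which illustrates why the slide asks for balanced clusters.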
24. 03/04/2019 Bertrand Thirion 24
Recursive neighbor agglomeration (ReNA)
Based on local decisions = fast (linear time) – avoids percolation
[Thirion et al. STAMLINS 2015, Hoyos Idrobo PAMI 2018]
25. 03/04/2019 Bertrand Thirion 25
Effect on data analysis tasks
Impressive speed-up and increased accuracy with respect to the non-compressed representation
– Clustering has a denoising effect
[Hoyos Idrobo IEEE PAMI 2018]
26. 03/04/2019 Bertrand Thirion 26
Outline
● A causal framework for brain activity decoding
● Dimension reduction for images
● Fast regularized ensembles of models
● Statistical inference for high-dimensional models
27. 03/04/2019 Bertrand Thirion 27
Bagging of clustered models
[Diagram: X → clustering (create contiguous regions) → solve regression of y on the cluster-based representation → repeat for various clusterings → average]
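The pipeline in this diagram can be sketched on toy data as follows. This is an illustration under explicit assumptions, not the implementation behind the talk: random contiguous segments of a 1D feature axis stand in for a data-driven spatial clustering, ridge regression stands in for the regression step, and all function names are hypothetical.

```python
import numpy as np


def random_contiguous_labels(p, k, rng):
    """Split range(p) into k non-empty contiguous segments with random boundaries.

    Stand-in for a real spatial clustering (assumption).
    """
    cuts = np.sort(rng.choice(np.arange(1, p), size=k - 1, replace=False))
    return np.searchsorted(cuts, np.arange(p), side="right")


def ridge(X, y, alpha):
    """Closed-form ridge regression weights."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)


def bagged_clustered_ridge(X, y, k, n_clusterings, alpha, rng):
    """Average, over several clusterings, the weights fitted on cluster means."""
    p = X.shape[1]
    w_sum = np.zeros(p)
    for _ in range(n_clusterings):
        labels = random_contiguous_labels(p, k, rng)
        counts = np.bincount(labels, minlength=k)
        # Cluster-based representation: mean of the features in each cluster.
        X_red = np.stack([X[:, labels == q].mean(axis=1) for q in range(k)], axis=1)
        w_red = ridge(X_red, y, alpha)
        # Map back to feature space so that X @ w equals X_red @ w_red.
        w_sum += w_red[labels] / counts[labels]
    return w_sum / n_clusterings


rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[80:120] = 1.0                      # one contiguous block of active features
y = X @ w_true + rng.standard_normal(n)

w_hat = bagged_clustered_ridge(X, y, k=20, n_clusterings=20, alpha=0.1, rng=rng)
support = w_true > 0
print(w_hat[support].mean(), w_hat[~support].mean())
```

Each individual clustering gives a piecewise-constant weight map; averaging over clusterings smooths out the arbitrary segment boundaries.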
28. 03/04/2019 Bertrand Thirion 28
Computationally efficient structure
State-of-the-art solution: not very stable, but cheap
“Fast regularized ensembles of models”
37. 03/04/2019 Bertrand Thirion 37
Large p → need dimension reduction
Large p kills statistical power
CDL (Clustered Desparsified Lasso) tames variance
p = 2000, n = 100
[Chevalier et al. subm. to MICCAI]
38. 03/04/2019 Bertrand Thirion 38
Adaptation to brain imaging
Step 1: compression by clustering
Step 2: inference on compressed representations
Step 3: ensembling: iterate with different parcellations → aggregate p-values (see also FReM)
Steps 1 and 2: Clustered Desparsified Lasso (CDL)
Steps 1 to 3: Ensemble of Clustered Desparsified Lasso (ECDL)
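The slide does not say how the p-values obtained with different parcellations are aggregated. As one possible illustration, the sketch below uses the fixed-quantile rule of Meinshausen, Meier and Bühlmann (2009) with γ = 0.5, that is, twice the median capped at 1. Whether the talk uses this exact rule is an assumption, and the function name is hypothetical.

```python
from statistics import median


def aggregate_pvalues(pvals_per_run):
    """Aggregate p-values across runs, feature by feature.

    pvals_per_run: list of B lists, each holding one p-value per feature.
    Returns min(1, 2 * median over runs) for every feature
    (quantile aggregation with gamma = 0.5; an assumed choice).
    """
    n_features = len(pvals_per_run[0])
    return [
        min(1.0, 2.0 * median(run[j] for run in pvals_per_run))
        for j in range(n_features)
    ]


# Three parcellations (runs), two features.
pvals = [
    [0.001, 0.6],
    [0.004, 0.9],
    [0.200, 0.3],
]
aggregated = aggregate_pvalues(pvals)
print(aggregated)
```

A feature that is significant in most runs keeps a small aggregated p-value even if one parcellation misses it, while a feature with large p-values in most runs is capped at 1.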
46. 03/04/2019 Bertrand Thirion 46
Experiments: PR and FWER control
Better precision-recall (PR) with ECDL + more accurate family-wise error rate (FWER) control
[Chevalier et al. MICCAI 2018]
47. 03/04/2019 Bertrand Thirion 47
Effects on real data
[Nguyen et al. IPMI 2019, Chevalier et al. MICCAI 2018]
Social cognition
Visual feature discrimination
Language vs maths
HCP dataset, n=900
48. 03/04/2019 Bertrand Thirion 48
Conclusion
● Causal reasoning → conditional association analysis
● Large-p data bring challenges:
– Computation cost
– Difficulty of statistical inference
● Solutions: ensembling, subsampling, compression
● Efficient stochastic regularizers
● Ongoing comparison with knockoff
WIP
● Classification setting
● Use of bootstrap
[Nguyen et al. IPMI 2019] [Aydore et al. subm]
49. 03/04/2019 Bertrand Thirion 49
From good ideas to good practices: software
● Machine learning in Python
● Machine learning for neuroimaging: http://nilearn.github.io
● BSD, Python, OSS
– Classification of (neuroimaging) data
– Network analysis
50. 03/04/2019 Bertrand Thirion 50
Acknowledgements
Parietal: G. Varoquaux, A. Gramfort, P. Ciuciu, D. Wassermann, D. Engemann, B. Nguyen, A.L. Grilo Pinho, E. Dohmatob, A. Mensch, J.A. Chevalier, A. Hoyos Idrobo, D. Bzdok, J. Dockès, P. Cerda, C. Lazarus, D. La Rocca, G. Lemaitre, L. El Gueddari, O. Grisel, M. Massias, P. Ablin, H. Janati, J. Massich, K. Dadi, H. Richard, C. Petitot
Other collaborators: R. Poldrack, J. Haxby, C. F. Gorgolevski, J. Salmon, S. Arlot, M. Lerasle