Preprint: https://hal.inria.fr/hal-03720273
Presented at the SPLC 2022 research track
Linux kernels are used in a wide variety of appliances, many of them having strong requirements on kernel size due to constraints such as limited memory or instant boot. With more than nine thousand configuration options to choose from, developers and users of Linux spend significant effort documenting, understanding, and eventually tuning (combinations of) options to meet a target kernel size. In this paper, we describe a large-scale endeavour to automate this task and predict the binary size of a Linux kernel from unmeasured configurations. We first show experimentally that state-of-the-art solutions specifically made for configurable systems, such as performance-influence models, cannot cope with that number of options, suggesting that software product line techniques may need to be adapted to such huge configuration spaces. We then show that tree-based feature selection can learn a model achieving low prediction errors over a reduced set of options. The resulting model, trained on 95,854 kernel configurations, is fast to compute, simple to interpret, and even outperforms the accuracy of learning without feature selection.
5. Dimensionality reduction with feature selection
Huge configuration space: ≈10^6000 configurations
Large option/feature* set: 9K+ options for x86_64
Hypothesis: only a subset of options matters when predicting properties of variants.
Very few studies exist at this scale.
From p options to p' options, with p' << p (over n configurations)
*options (~Linux features) are encoded as features (~predictive variables in learning problems)
6. Hypothesis: only a subset of options matters when predicting properties of variants. Key results:
● Some state-of-the-art solutions do not scale due to “too many feature interactions” (think of the combinatorics with thousands of features!)
● Only ~300 features* (instead of 9K+) are sufficient for efficient prediction, and even outperform the accuracy of “learning over all features/options”
● Training time can be decreased
● Identification of influential options is consistent with, and can even improve, the expert knowledge about Linux kernel configuration.
*options (~Linux features) are encoded as features (~predictive variables in learning problems)
11. Challenge: you cannot build ≈10^6000 configurations; sampling and learning to the rescue, but…
Is it accurate? Is it effective with p' features and feature selection? How many features*? Which options* matter?
[Figure: two example kernel binaries of 7.1Mb and 176.8Mb; the size of a third, unmeasured configuration is unknown]
From p options to p' options, with p' << p
*options (~Linux features) are encoded as features (~predictive variables in learning problems)
12. A challenging case
● Targeted non-functional, quantitative property: binary size
○ of interest for maintainers/users of the Linux kernel (embedded systems, cloud, etc.)
○ challenging to predict (cross-cutting options, interplay with compilers/build systems, etc.)
● Dataset: version 4.13.3, x86_64 arch, measurements of 95K+ random configurations
○ paranoid about deep variability since 2017: Docker to control the build environment and scale
○ build: 8 minutes on average
○ diversity: from 7Mb to 1.9Gb
13. TUXML: Sampling, Measuring, Learning
Most existing work considers a relatively low number of options (<50); Linux has 9K+ options for x86_64.
Feature subset selection vs recursive feature elimination: do they scale? How accurate are they?
(Legend of the related-work table: EX = execution, SI = simulation, SA = static analysis, UF = user feedback, SM = synthetic measurements.)
15. TUXML: Sampling, Measuring, Learning
Docker for a reproducible environment, with the needed tools/packages and Python procedures inside.
Easy to launch a campaign: ”python3 kernel_generator.py 10” builds and measures 10 random configurations (information is sent to a database).
https://github.com/TuxML/
16. Data: version 4.13.3 (x86_64)
95K+ configurations for Linux 4.13.3 (and 15K hours of computation on a computing grid)
17. RQ1: How do state-of-the-art techniques perform on huge configuration spaces?
● Linear-based algorithms: high error rate (kernel size is not additive!)
● Polynomial regression & performance-influence models: out of memory (too many interactions; not designed for 9K+ options)
● Tree-based algorithms & neural networks: low error rate
Mean Absolute Percentage Error (MAPE): the lower, the better. N: percentage of the dataset used for training.
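To make the metric concrete, here is a minimal MAPE sketch in Python (the example sizes are illustrative, not from the dataset):

```python
import numpy as np

def mape(y_true, y_pred):
    # Mean Absolute Percentage Error: the lower, the better
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# e.g., measured vs predicted kernel sizes in Mb
print(mape([7.1, 176.8, 35.0], [7.5, 170.0, 33.0]))  # ≈ 5.1
```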
18. Dimensionality reduction with feature selection
Huge configuration space: ≈10^6000 configurations
Large option/feature set: 9K+ options for x86_64
Only a subset of options matters when predicting properties of variants.
RQ2: How accurate is the prediction model with and without feature selection?
From p options to p' options, with p' << p (over n configurations)
19. Dimensionality reduction with tree-based feature selection
Pipeline:
● Learn a tree-based algorithm (Random Forest) on the full dataset (p = 8,743 options)
● Derive a feature ranking list based on feature importance, e.g., DEBUG_INFO (0.33), active_options (0.19), group_129 (0.14), DEBUG_INFO_REDUCED (0.11), DEBUG_INFO_SPLIT (0.08)
● Filter: keep a reduced dataset with p' <<<<< p options
● Learn with any learning algorithm on the reduced dataset
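A minimal sketch of this pipeline with scikit-learn, on synthetic data (the matrix shape, the 300 cutoff, and all names are illustrative; this is not the TUXML code):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 2000)).astype(float)  # configurations x options (p = 8,743 in the paper)
y = 500 * X[:, 0] + 120 * X[:, 1] + rng.normal(0, 5, size=500)  # toy binary sizes (Mb)

# 1) Learn a Random Forest on the full dataset
rf = RandomForestRegressor(n_estimators=50, n_jobs=-1, random_state=0).fit(X, y)

# 2) Feature ranking list based on feature importance
ranking = np.argsort(rf.feature_importances_)[::-1]

# 3) Filter: reduced dataset with p' << p options
X_reduced = X[:, ranking[:300]]

# 4) Learn with any algorithm on the reduced dataset
model = GradientBoostingRegressor(random_state=0).fit(X_reduced, y)
```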
20. RQ2: Tree-based feature selection pays off!
● Tree-based algorithms & neural networks:
○ Lower error rate
○ Lower training time
■ Random Forest: 18x
■ Gradient Boosting Tree: 5x
● Simpler models, easier to train, and improved accuracy
● Bonus: interpretable and consistent with domain knowledge
21. RQ2: Optimal number of features/options when performing feature selection
● Depends on the algorithm
○ Gradient Boosting Trees & neural networks: 1,500
○ Random Forest: 250 options
● Depends on the training set size
Sweet spot: only ~300 features are sufficient to efficiently train a Random Forest or a Gradient Boosting Tree into a prediction model that outperforms other baselines operating over the full set of features (6% prediction error for 40K configurations).
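Empirically, such a sweet spot can be found by sweeping the cutoff k; a hedged sketch reusing ranking, X, y from the previous snippet (the cutoffs and CV settings are illustrative):

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

for k in (50, 250, 300, 1500, X.shape[1]):
    score = cross_val_score(
        GradientBoostingRegressor(random_state=0),
        X[:, ranking[:k]], y, cv=3,
        scoring="neg_mean_absolute_percentage_error",
    ).mean()
    # sklearn returns MAPE as a negative fraction; convert to a percentage
    print(f"k={k:>4}: MAPE = {-100 * score:.1f}%")
```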
22. RQ3+RQ4: Stability of influential options and training time reduction
Using an ensemble of Random Forests allows the creation of a far more stable list, with more than 95% of features in common in the top 300 across multiple lists.
Tree-based feature selection speeds up model training by at least 5x and up to 48x (since p' <<<< p).
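The stability claim can be checked by measuring the overlap between the top-300 lists of independently trained forests; a minimal sketch (again reusing X, y from the sketch above; the seeds are arbitrary):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def top_k_overlap(model_a, model_b, k=300):
    # fraction of options shared by the top-k of two feature rankings
    top_a = set(np.argsort(model_a.feature_importances_)[::-1][:k])
    top_b = set(np.argsort(model_b.feature_importances_)[::-1][:k])
    return len(top_a & top_b) / k

rf_a = RandomForestRegressor(n_estimators=50, random_state=1).fit(X, y)
rf_b = RandomForestRegressor(n_estimators=50, random_state=2).fit(X, y)
print(top_k_overlap(rf_a, rf_b))  # >0.95 is reported on the real dataset
```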
23. RQ5: How do feature ranking lists, as computed by tree-based feature selection, relate to Linux knowledge?
Where the 147 options documented in Kconfig fall in the feature ranking list (by rank range):
● ranks 0–50: 7
● ranks 50–250: 6
● ranks 250–500: 6
● ranks 500–1500: 28
● ranks 1500+: 69
The top 50 options in the feature ranking list represent 95% of the feature importance; beware of collinearity when interpreting!
Incompleteness of the Linux documentation:
● The vast majority of influential options are either not documented or not referring to size: only 7 options of the top 50 are documented as having a clear influence on size
● Leveraging all 147 options in the Linux documentation (and only them) leads to a prediction error of 23.6% (instead of <6% for our feature ranking list)
Relevance: investigations and exchanges with domain experts confirm the relevance of the top 50, yielding 6 categories of options.
Effective identification of important features:
● consistent with Linux knowledge (Kconfig documentation and expert insight)
● can be used to refine or augment the incomplete documentation of the Linux kernel.
24. Kaggle competition using our dataset
https://www.kaggle.com/competitions/linux-kernel-size/overview
We can benefit from contributions of the machine learning community… and our dataset/problems are attracting interest.
25. Conclusion
Feature subset selection is effective over the huge configuration space of Linux:
● only ~300 features out of 9K+
● accuracy is better with tree-based feature selection than without
● training time is decreased
● interpretability: identification of influential options is consistent with, and can even improve, the expert knowledge about Linux kernel configuration
Future work:
● Replication on different versions of Linux
● Does the feature ranking list transfer to other versions?
https://www.kaggle.com/competitions/linux-kernel-size/overview
28. Decision Tree
● Ability to handle interactions between features
● Low impact of combinatorial explosion
● Competitive accuracy
● Interpretability (see the sketch after this list)
○ Decision rules
○ Feature importance
● Ensembles: Random Forests, Gradient Boosting Trees…
○ More accurate, less interpretable
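Both forms of interpretability are directly available in scikit-learn; a minimal sketch on toy data (the option names and sizes are illustrative only):

```python
from sklearn.tree import DecisionTreeRegressor, export_text

names = ["DEBUG_INFO", "DEBUG_INFO_REDUCED", "KASAN", "UBSAN"]
X_toy = [[0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1]]
y_toy = [35, 350, 120, 500, 60]  # toy kernel sizes in Mb

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_toy, y_toy)
print(export_text(tree, feature_names=names))       # decision rules
print(dict(zip(names, tree.feature_importances_)))  # feature importance
```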
29. Kpredict
Python module for Python 3.8+ (https://github.com/HugoJPMartin/kpredict)
Works for many kernel versions and any x86_64 configuration
Error: ≈ 6.3%; 97% of the predictions are below 20% error
H. Martin, M. Acher, J. A. Pereira, L. Lesoil, J.-M. Jézéquel, and D. E. Khelladi, “Transfer learning across variants and versions: The case of Linux kernel size”, IEEE Transactions on Software Engineering (TSE), 2021
30. Published in IEEE Transactions on Software Engineering (TSE) in 2021
Preprint: https://hal.inria.fr/hal-03358817
33. Transfer learning
“Inductive transfer refers to any algorithmic process by which structure or knowledge derived from a learning problem is used to enhance learning on a related problem.” — Jeremy West, in A theoretical foundation for inductive transfer
● 100,000 configuration measurements, 15,000 hours of computation
● Mission Impossible: Saving Private Model 4.13
○ Budget: 5,000 configuration measurements (one night’s worth of ISTIC computing power)
56. Incremental Model Shifting
Simple model shifting (Source + Shifting Model = Full Model), always starting from Model 4.13:
● Model 4.13 + Shifting Model 4.15 = Model 4.15
● Model 4.13 + Shifting Model 4.20 = Model 4.20
● Model 4.13 + Shifting Model 5.0 = Model 5.0
● Model 4.13 + Shifting Model 5.4 = Model 5.4
● Model 4.13 + Shifting Model 5.7 = Model 5.7
● Model 4.13 + Shifting Model 5.8 = Model 5.8
Incremental model shifting, chaining from the previous version’s model:
● Model 4.13 + Shifting Model 4.15 = Model 4.15
● Model 4.15 + Shifting Model 4.20 = Model 4.20
● Model 4.20 + Shifting Model 5.0 = Model 5.0
● Model 5.0 + Shifting Model 5.4 = Model 5.4
● Model 5.4 + Shifting Model 5.7 = Model 5.7
● Model 5.7 + Shifting Model 5.8 = Model 5.8
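A hedged sketch of the shifting idea (not the paper’s exact method or code): a shifting regressor is trained on a small budget of target-version measurements, taking the configuration plus the source model’s prediction as input; incremental shifting chains the resulting models from version to version. All names are illustrative:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def shift(source_predict, X_budget, y_budget):
    # learn a shifting model on a small budget of target-version measurements
    feats = np.column_stack([X_budget, source_predict(X_budget)])
    shifting = GradientBoostingRegressor(random_state=0).fit(feats, y_budget)
    return lambda X: shifting.predict(np.column_stack([X, source_predict(X)]))

# Simple shifting: every target model starts from Model 4.13, e.g.
#   predict_4_15 = shift(predict_4_13, X_415_budget, y_415_budget)
#   predict_4_20 = shift(predict_4_13, X_420_budget, y_420_budget)
# Incremental shifting: chain from the previous version's model, e.g.
#   predict_4_20 = shift(predict_4_15, X_420_budget, y_420_budget)
```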
65. Results
Source Model 4.13 trained with a budget of 85,000 configurations:
● Transfer budget of 1,000 configurations:
○ Model shifting: from 6.7% to 10.6% error rate
○ Scratch: from 14.9% to 16.7% error rate
○ Incremental shifting: from 6.7% to 13.3% error rate
● Transfer budget of 5,000 configurations:
○ Model shifting: from 5.6% to 7.1% error rate
○ Scratch: from 8.2% to 9.2% error rate
○ Incremental shifting: from 5.6% to 7.5% error rate
● Transfer budget of 10,000 configurations:
○ Model shifting: from 5.2% to 6.1% error rate
○ Scratch: from 7.1% to 7.7% error rate
○ Incremental shifting: from 5.2% to 6.5% error rate
Source Model 4.13 trained with a budget of 20,000 configurations:
● Transfer budget of 1,000 configurations:
○ Model shifting: from 8.5% to 11.6% error rate
○ Scratch: from 14.9% to 16.7% error rate
○ Incremental shifting: from 8.5% to 13.8% error rate
● Transfer budget of 5,000 configurations:
○ Model shifting: from 6.7% to 7.9% error rate
○ Scratch: from 8.2% to 9.2% error rate
○ Incremental shifting: from 6.7% to 7.9% error rate
● Transfer budget of 10,000 configurations:
○ Model shifting: from 6.2% to 6.7% error rate
○ Scratch: from 7.1% to 7.7% error rate
○ Incremental shifting: from 6.1% to 6.7% error rate
66. Summary
● Model 4.13 is saved
○ An old model can be positively reused on a new version at lower cost
○ Better than learning from scratch, even years of versions later
● Incremental shifting
○ More sensitive to errors of the previous models
○ Makes better use of a larger transfer budget