This document discusses extracting variability from C code and lifting it to the mbeddr platform. It analyzes several large C codebases to understand how variability is implemented and which constructs can be lifted. The analysis finds that typical preprocessor constructs use identifiers, integers, and logical/comparison operations. Over 90% of symbols behave like constants that could be modeled as configuration parameters. While #error and #warning directives are present, they only make up a small portion of preprocessor statements. The results inform how to model variability in mbeddr's higher-level configuration language. A case study of the ChibiOS real-time OS module shows its use of preprocessor definitions and conditions.
Extracting Variability from C and Lifting it to Mbeddr
1. Federico Tomassetti, Daniel Ratiu
Contribution to Mbeddr
2. 1. Variability in C
2. Variability in mbeddr
3. Analysis
4. Results
5. Case study
4. The C preprocessor is evil
• It lets you obfuscate everything, even keywords
• Everything is in global scope
• What a module does depends on the context where it is included
• It operates at token level
• It makes the code very difficult to analyze
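The token-level and keyword points above can be illustrated with a minimal sketch. All macro and function names here are hypothetical and chosen only for illustration; they do not come from the analyzed projects.

```c
#include <assert.h>

/* Token-level substitution: arguments are pasted as tokens, not evaluated. */
#define SQUARE_UNSAFE(x) x * x
#define SQUARE_SAFE(x) ((x) * (x))

/* Expands to 1 + 2 * 1 + 2, which C evaluates as 1 + (2 * 1) + 2. */
int unsafe_square_of_sum(void) { return SQUARE_UNSAFE(1 + 2); }

/* Expands to ((1 + 2) * (1 + 2)). */
int safe_square_of_sum(void) { return SQUARE_SAFE(1 + 2); }

/* Keyword obfuscation: from here on, every `if (c)` means `if (!(c))`.
   The macro is defined after all #include lines, because standard headers
   must not be included while a keyword is defined as a macro. */
#define if(cond) if (!(cond))

int is_positive_obfuscated(int v) {
    if (v > 0) return 1; /* silently inverted: taken when v <= 0 */
    return 0;
}

#undef if /* restore the keyword for the rest of the file */

/* Small self-check of the behaviour described in the comments above. */
void preprocessor_pitfalls_check(void) {
    assert(unsafe_square_of_sum() != safe_square_of_sum());
    assert(is_positive_obfuscated(5) == 0);
}
```

Nothing in the body of is_positive_obfuscated hints that its condition is inverted, which is the kind of situation that makes preprocessor-heavy code hard to analyze.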
6. The C preprocessor is evil
What a module does depends on the context where it is included:
// Including file 1
#define A
#include "foo.h"
// Including file 2
#define B 50
#include "foo.h"
// foo.h
#ifdef A
struct SomeStruct {
…
};
#else
int b = B;
void foo();
#endif
What foo.h declares depends on where it is included.
9. mbeddr is an extensible variant of C built on top of a projectional editor.
Existing extensions include:
• interfaces with pre- and postconditions,
• components,
• state machines,
• physical units,
• requirements tracing,
• product line variability.
16. Analysis
We analyzed:
• Linux
• Apache OpenOffice
• Quake
• VLC
• Mozilla
For a total of about 73K files and 30M lines of code.
We analyzed these projects to understand how variability is used in C and what can be done to lift it to mbeddr.
23. RQ1 Which are the typical building blocks in
presence conditions?
This is important in order to understand which kinds of expressions we need to support in the higher-level configuration language.
24. RQ1 Which are the typical building blocks in presence conditions?
Kind of expression       Presence conditions containing it
Identifier references    85-98 %
Logical operators        21-66 %
Number literals          6-16 %
Comparison operators     0-6 %
Others                   < 2 %
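As a minimal sketch of the four building blocks in the table, the following presence conditions combine identifier references, logical operators, number literals, and comparison operators. The configuration symbols (HAVE_THREADS, HAVE_LEGACY_IO, API_VERSION) are hypothetical and not taken from the analyzed projects.

```c
#include <assert.h>

/* Hypothetical configuration symbols, for illustration only. */
#define HAVE_THREADS 1
#define API_VERSION 3
/* HAVE_LEGACY_IO is intentionally left undefined. */

/* Returns a bit mask with one bit per code block that survives preprocessing. */
int enabled_blocks(void) {
    int mask = 0;
#ifdef HAVE_THREADS /* identifier reference only */
    mask |= 1;
#endif
#if defined(HAVE_THREADS) && !defined(HAVE_LEGACY_IO) /* logical operators */
    mask |= 2;
#endif
#if API_VERSION >= 2 /* comparison operator and number literal */
    mask |= 4;
#endif
#if defined(HAVE_LEGACY_IO) || API_VERSION == 1 /* false in this configuration */
    mask |= 8;
#endif
    return mask;
}

/* With the definitions above, the first three blocks are kept. */
void presence_condition_check(void) {
    assert(enabled_blocks() == (1 | 2 | 4));
}
```

A higher-level configuration language that supports these four kinds of expressions would cover all the conditions in this sketch.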
25. RQ2 Which changes (re- #defines and #undefs) are
operated on a defined symbol?
Depending on how a defined symbol is changed after its definition, its #define can (or cannot) be lifted to a constant configuration value.
26. RQ2 Which changes (re- #defines and #undefs) are operated on a defined symbol?
We want constants, to avoid this situation:
#define A 1
#if A>1
foo1();
#endif
#define A 2
#if A>1
foo2();
#endif
Same condition: one block is included, one is not.
28. RQ2 Which changes (re- #defines and #undefs) are
operated on a defined symbol?
#if VERS <= 2
#define A 1
#elif VERS == 3
#define A 2
#else
#define A 3
#endif
Definitions under different conditions
29. RQ2 Which changes (re- #defines and #undefs) are operated on a defined symbol?
Cases                                      Range
Single definition                          69-90 %
Multiple definitions to the same value     2-24 %
Definitions under different conditions     2-9 %
Total                                      95-99 %
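The three cases in the table can be sketched side by side. In each case the symbol has exactly one value within a given configuration, which is why it behaves like a constant. MAX_CLIENTS and TIMEOUT_MS are hypothetical names; VERS and A mirror the earlier slide example.

```c
#include <assert.h>

#define VERS 3 /* assumed configuration input */

/* Case 1: single definition. */
#define MAX_CLIENTS 8

/* Case 2: multiple definitions to the same value. C allows redefining an
   object-like macro when the replacement list is identical. */
#define TIMEOUT_MS 100
#define TIMEOUT_MS 100

/* Case 3: definitions under different conditions. Only one branch is active
   in any given configuration, so A is still defined exactly once. */
#if VERS <= 2
#define A 1
#elif VERS == 3
#define A 2
#else
#define A 3
#endif

int max_clients(void) { return MAX_CLIENTS; }
int timeout_ms(void) { return TIMEOUT_MS; }
int value_of_a(void) { return A; }

/* With VERS set to 3, the #elif branch is the one that defines A. */
void constant_like_symbols_check(void) {
    assert(value_of_a() == 2);
}
```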
30. RQ3 Are #error and #warning used in practice?
If they are, it could be possible to extract feature
model constraints from them.
31. RQ3 Are #error and #warning used in practice?
They are present in 4 out of 5 projects, but they represent between 0 and 0.26% of the preprocessor statements.
Linux contains more than 800 #error/#warning directives, Mozilla more than 700.
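A minimal sketch of how an #error directive can encode a feature model constraint, here a "requires" and a "mutually exclusive" relation. The feature names (USE_TLS, USE_SOCKETS, BACKEND_A, BACKEND_B) are hypothetical.

```c
#include <assert.h>

/* Hypothetical feature selection, for illustration only. */
#define USE_TLS 1
#define USE_SOCKETS 1
#define BACKEND_A 1
/* BACKEND_B is left undefined. */

/* Constraint "USE_TLS requires USE_SOCKETS". */
#if defined(USE_TLS) && !defined(USE_SOCKETS)
#error "USE_TLS requires USE_SOCKETS"
#endif

/* Constraint "BACKEND_A excludes BACKEND_B". */
#if defined(BACKEND_A) && defined(BACKEND_B)
#error "BACKEND_A and BACKEND_B are mutually exclusive"
#endif

/* Reached only when the selection above satisfies both constraints. */
int tls_over_sockets_enabled(void) {
#if defined(USE_TLS) && defined(USE_SOCKETS)
    return 1;
#else
    return 0;
#endif
}

void constraint_check(void) {
    assert(tls_over_sockets_enabled() == 1);
}
```

The condition guarding each #error is the negation of the constraint, so an extraction tool could in principle recover "USE_TLS implies USE_SOCKETS" from the first directive.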
35. Results
RQ1) Which are the typical building blocks in presence
conditions?
Identifiers, integers, logical and comparison operations
RQ2) Which changes (re- #defines and #undefs) are
operated on a defined symbol?
More than 90% of symbols behave like constants
RQ3) Are #error and #warning used in practice?
Depends on the project
36. ChibiOS
ChibiOS is a real-time OS supporting 14 core architectures, different compilers and platforms.
OS Kernel module:
• 41 files
• 246 presence conditions
• 233 definitions
• 54 symbols in presence conditions
• 2 symbols used in definitions of presence-condition symbols
• 53 symbols not defined in the module (features)
• 3 symbols defined in the module (derived features)
Demos/ARMCM3-STM32F103ZG-FATFS module:
• Definitions for 31 of the 53 features
• 28 defined to TRUE/FALSE
• 1 has no value
• 1 has value 0
• 1 has value 20
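The split between features and derived features can be sketched as follows. This is an assumed layout, not ChibiOS code: every APP_ and KERNEL_ name is hypothetical, and only the kinds of values (TRUE/FALSE, no value, 0, 20) follow the counts above.

```c
#include <assert.h>

#ifndef TRUE
#define TRUE 1
#endif
#ifndef FALSE
#define FALSE 0
#endif

/* Application side: features, defined outside the kernel module. */
#define APP_USE_MUTEXES TRUE /* boolean feature */
#define APP_USE_EVENTS FALSE /* boolean feature */
#define APP_OPTIMIZE_SPEED   /* no value: only its presence matters */
#define APP_HEAP_SIZE 0      /* numeric feature */
#define APP_TIME_QUANTUM 20  /* numeric feature */

/* Kernel side: a derived feature, defined inside the module from features. */
#if APP_USE_MUTEXES && (APP_TIME_QUANTUM > 0)
#define KERNEL_NEEDS_PRIORITY_QUEUE TRUE
#else
#define KERNEL_NEEDS_PRIORITY_QUEUE FALSE
#endif

int needs_priority_queue(void) { return KERNEL_NEEDS_PRIORITY_QUEUE; }

int optimized_for_speed(void) {
#ifdef APP_OPTIMIZE_SPEED /* a valueless symbol can only be tested for presence */
    return 1;
#else
    return 0;
#endif
}

void derived_feature_check(void) {
    assert(needs_priority_queue() == TRUE);
}
```

In a lifted model, the five application symbols would become configuration parameters, while KERNEL_NEEDS_PRIORITY_QUEUE would be computed from them rather than set by the user.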