poster draft 2
1. A Statistical Analysis of the Nutritional Intakes of Secondary School Children
An assessment of the impact of revised school food standards
Adverse outcomes of obesity include cardiovascular disease,
many cancers, type II diabetes, strokes, high blood pressure,
osteoarthritis, fertility problems, reduced life expectancy,
depression, anxiety and low self-esteem.
In 2002, 21.8% of boys and 27.5% of girls aged 2-15 years
were overweight or obese. Furthermore, the direct cost of
obesity to the NHS was estimated at £46-49 million per year.
In response to Jamie Oliver’s Feed Me Better campaign in
2005, the Department for Education and Skills revised the
national school food standards.
513 schoolchildren at two time points (2000 and 2009) completed ‘food diaries’. From these, nutritionists derived
each child’s mean daily intake and mean lunchtime intake for each nutrient (energy, protein, fat, etc.).
Aim: assess the impact of the revised standards.
Variables that affect food/nutrient intake are:
• YEAR: 2000 or 2009: since changes were made to school
food regulations in this time period.
• LUNCH TYPE: School lunch (SL) or packed lunch (PL): since
regulations applied to school lunches only.
• SEX: Male or Female: since boys eat more than girls.
However, the difference between sexes does not depend
on the new standards and so this effect is not of interest.
The mean lunchtime energy intake (kcal):
Lunch type | 2000 | 2009 | Difference (2009 - 2000)
SL | 711.9 | 495.9 | -216.0
PL | 612.3 | 574.2 | -38.2
Inference: Average energy intake decreased substantially for school lunches, but only slightly for packed lunches.
Problem: The four groups do not contain equal proportions of boys and girls. Since sex affects energy intake, the
year and lunch-type effects are confounded with the sex effect, which is not of interest. Therefore the groups are
not directly comparable.
Solution: Adjusted means.
Method:
1. Fit a linear regression model to the data:
$$Y_{ijk\ell} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \gamma_k + \epsilon_{ijk\ell},$$
where
• $Y_{ijk\ell}$ is the response for the $\ell$th subject, who was from the $i$th year, the $j$th lunch type and the $k$th sex;
• $\mu$ is the overall mean;
• $\alpha_i$ is the effect of the $i$th year;
• $\beta_j$ is the effect of the $j$th lunch type;
• $(\alpha\beta)_{ij}$ is the effect of the $(ij)$th combination of year and lunch type*;
• $\gamma_k$ is the effect of the $k$th sex, which is to be corrected for;
• $\epsilon_{ijk\ell}$ is the error of the $\ell$th individual.
* If the p-value for the interaction is significant, year affects intake differently for each lunch type, so a two-way table is needed
to present the means. If it is not significant, the interaction complicates the presentation without adding anything worthwhile,
so the model is re-fitted without the interaction and one-way tables are used.
† The choice of the fixed Sex value is arbitrary because it does not affect the differences, but a value that produces plausible means
is preferable, so that practitioners without a statistics background are not disconcerted.
2. Estimate the regression coefficients and obtain the equation for the fitted mean of each group:
$$\hat{Y}_{ij} = 734.5 - 219.7\,I[\text{Year} = 2009] - 106.4\,I[\text{Lunch} = \text{PL}] - 39.9\,\text{Sex}_{ij} + 186.1\,I[\text{Year} = 2009 \text{ and } \text{Lunch} = \text{PL}],$$
where $I[A]$ is an indicator variable that equals 1 if event $A$ is true and 0 otherwise,
and $\text{Sex}_{ij}$ is the proportion of females in group $(i, j)$.
3. Fix the sex variable at a constant, arbitrary† value, for example the mean sex value of the sample (the overall proportion of females):
$$\overline{\text{Sex}} = 0.5185$$
4. Compute the mean for each group at this common sex value, instead of using $\text{Sex}_{ij}$:
2000 school lunch: $\hat{Y}_{0,0} = 734.5 - 219.7 \times 0 - 106.4 \times 0 - 39.9\,\overline{\text{Sex}} + 186.1 \times 0 \times 0 = 713.8$
2000 packed lunch: $\hat{Y}_{0,1} = 734.5 - 219.7 \times 0 - 106.4 \times 1 - 39.9\,\overline{\text{Sex}} + 186.1 \times 0 \times 1 = 607.5$
2009 school lunch: $\hat{Y}_{1,0} = 734.5 - 219.7 \times 1 - 106.4 \times 0 - 39.9\,\overline{\text{Sex}} + 186.1 \times 1 \times 0 = 494.1$
2009 packed lunch: $\hat{Y}_{1,1} = 734.5 - 219.7 \times 1 - 106.4 \times 1 - 39.9\,\overline{\text{Sex}} + 186.1 \times 1 \times 1 = 573.9$
5. The group means have now been adjusted for the sex imbalance, so they are comparable, and inference can be made
on the differences (estimable quantities). The differences do not depend on the choice of $\overline{\text{Sex}}$: when one
mean is subtracted from another, the $\overline{\text{Sex}}$ term cancels, so the differences are unique.
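Steps 2 to 5 can be checked with a few lines of Python. This sketch simply evaluates the fitted equation with the coefficients used in the step 4 calculations (lunch indicator coded 1 for packed lunch, sex as the proportion of females); the function names are illustrative. Because the printed coefficients are rounded to one decimal place, some means may differ from the step 4 values in the last digit (for example 607.4 rather than 607.5), but the 2009 − 2000 differences are −219.7 kcal for school lunches and −33.6 kcal for packed lunches whether the sex value is 0.5185 or 0.5.

```python
# Coefficients used in step 4 (rounded to 1 d.p.); lunch indicator = 1 for packed lunch.
B0, B_YEAR, B_PL, B_SEX, B_INT = 734.5, -219.7, -106.4, -39.9, 186.1


def adjusted_mean(year2009: int, packed: int, sex: float) -> float:
    """Fitted mean lunchtime energy (kcal) for one year/lunch group at a common sex value.

    year2009 and packed are 0/1 indicators; sex is the value of the sex
    covariate (proportion female) used for every group.
    """
    return B0 + B_YEAR * year2009 + B_PL * packed + B_SEX * sex + B_INT * year2009 * packed


def year_differences(sex: float) -> dict:
    """2009 minus 2000 adjusted means, separately for each lunch type."""
    return {
        "SL": adjusted_mean(1, 0, sex) - adjusted_mean(0, 0, sex),
        "PL": adjusted_mean(1, 1, sex) - adjusted_mean(0, 1, sex),
    }


if __name__ == "__main__":
    for sex in (0.5185, 0.5):
        means = {(year, lunch): round(adjusted_mean(int(year == 2009), int(lunch == "PL"), sex), 1)
                 for year in (2000, 2009) for lunch in ("SL", "PL")}
        diffs = {lunch: round(d, 1) for lunch, d in year_differences(sex).items()}
        # The means move with `sex`, the differences do not.
        print(sex, means, diffs)
```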
The R package lsmeans computes adjusted group means (least-squares means) efficiently, for lunchtime and daily
intakes of all nutrients. By default, it uses 0.5 as the arbitrary fixed value of Sex.
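lsmeans itself is an R package and is not reproduced here. As a language-neutral illustration of the idea, the Python sketch below fits the model of step 1 (year, lunch type, their interaction and sex) by ordinary least squares, solving the normal equations with Gauss-Jordan elimination, on simulated data with deliberately unequal proportions of girls per group. It then evaluates each group's fitted mean with the sex indicator held at 0.5. The data, the true coefficients and all function names are hypothetical.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def design_row(year2009, packed, female):
    """Columns: intercept, 2009 indicator, packed-lunch indicator, sex, year x lunch interaction."""
    return [1.0, year2009, packed, female, year2009 * packed]


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def adjusted_means(coef, sex=0.5):
    """Fitted mean of each year/lunch group with the sex covariate held at `sex`."""
    means = {}
    for year in (2000, 2009):
        for lunch in ("SL", "PL"):
            x = design_row(int(year == 2009), int(lunch == "PL"), sex)
            means[(year, lunch)] = sum(c * xi for c, xi in zip(coef, x))
    return means


def simulate(seed=2009, n_per_group=100):
    """Hypothetical data with deliberately unequal proportions of girls per group."""
    rng = random.Random(seed)
    rows, y = [], []
    for year2009, packed, p_female in [(0, 0, 0.3), (0, 1, 0.7), (1, 0, 0.6), (1, 1, 0.4)]:
        for _ in range(n_per_group):
            female = 1 if rng.random() < p_female else 0
            mu = 700 - 200 * year2009 - 100 * packed - 40 * female + 180 * year2009 * packed
            rows.append(design_row(year2009, packed, female))
            y.append(mu + rng.gauss(0, 50))
    return rows, y


if __name__ == "__main__":
    rows, y = simulate()
    coef = fit_ols(rows, y)
    for group, mean in adjusted_means(coef).items():
        print(group, round(mean, 1))
```

The normal equations are used only to keep the sketch free of dependencies; a QR-based least-squares solver is numerically preferable for real analyses.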
Diagnostic checks must be performed on the estimated residuals of each model: residual plots to check
homoscedasticity (constant variance) and Normal probability plots to check Normality.
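The coordinates of a Normal probability plot are straightforward to compute. The Python sketch below (standard library only, Python 3.8+ for `statistics.NormalDist`; function names are hypothetical) pairs each ordered residual with a standard Normal quantile, assuming the plotting positions (i − 0.5)/n, which is one of several conventions in use. As an extra numeric summary that the poster does not use, it also computes the correlation of the plotted points, which falls below 1 as the plot curves.

```python
import math
from statistics import NormalDist


def normal_plot_points(residuals):
    """Pairs (theoretical Normal quantile, ordered residual) for a Normal probability plot."""
    n = len(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    ordered = sorted(residuals)
    # Plotting position (i - 0.5) / n for the i-th smallest residual (one common convention).
    return [(std_normal.inv_cdf((i - 0.5) / n), r) for i, r in enumerate(ordered, start=1)]


def plot_correlation(points):
    """Pearson correlation of the plotted points; near 1 when the plot is a straight line."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    grid = [NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
    symmetric = grid                      # residuals that look exactly Normal
    skewed = [math.exp(v) for v in grid]  # right-skewed residuals give a curved plot
    print(round(plot_correlation(normal_plot_points(symmetric)), 3))
    print(round(plot_correlation(normal_plot_points(skewed)), 3))
```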
For most models, the plots are satisfactory. However, the Normal probability plots for lunchtime and daily vitamin C
intake are concerning: their obvious curvature means that Normality cannot be assumed.
[Figure: Normal probability plots for lunchtime and daily vitamin C intake]
Problem: Significance tests and confidence intervals are invalidated.
Solution: Data transformation. A transformation must not change the order of the values, but it can alter the
distances between successive points to modify the overall shape of the distribution and achieve a ‘bell curve’.
The Box-Cox power transformation (1964) is the most commonly used tool to remedy the
breakdown of the Normality assumption. For positive data $Y_1, \ldots, Y_n$, it is given by
$$Y_i^{(\lambda)} = \begin{cases} \dfrac{Y_i^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0,\\[4pt] \log Y_i, & \text{if } \lambda = 0, \end{cases}$$
where the transformation parameter $\lambda$ requires estimation.
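For data that are not strictly positive (for example, data containing zeros), there is a two-parameter version,
which allows for a shift before transformation, given by
$$Y_i^{(\lambda)} = \begin{cases} \dfrac{(Y_i + \lambda_2)^{\lambda_1} - 1}{\lambda_1}, & \text{if } \lambda_1 \neq 0,\\[4pt] \log(Y_i + \lambda_2), & \text{if } \lambda_1 = 0, \end{cases}$$
where the transformation parameter $\lambda_1$ and the shift parameter $\lambda_2$ (with $Y_i + \lambda_2 > 0$) both require estimation.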
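Both definitions can be expressed as a single function in which a zero shift gives the original one-parameter form. This is a minimal Python sketch; the function name and the choice to raise an error when $Y_i + \lambda_2 \le 0$ are assumptions for illustration.

```python
import math


def box_cox(y: float, lam1: float, lam2: float = 0.0) -> float:
    """Box-Cox transform of one observation y.

    lam1 is the power parameter; lam2 is the shift of the two-parameter
    version (lam2 = 0 gives the original one-parameter transformation).
    """
    z = y + lam2
    if z <= 0:
        raise ValueError("Box-Cox requires y + lam2 > 0")
    if lam1 == 0:
        return math.log(z)
    return (z ** lam1 - 1) / lam1


if __name__ == "__main__":
    # A zero intake is allowed once it is shifted by lam2 > 0.
    print(box_cox(0.0, 0.4022, 0.0015))
    # The transformation is increasing in y, so the order of the values is preserved.
    print([round(box_cox(y, 0.5), 3) for y in (1.0, 4.0, 9.0)])
```

Because $(z^{\lambda} - 1)/\lambda \to \log z$ as $\lambda \to 0$, the log case is the continuous limit of the power case, which is why both belong to one family.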
The Box-Cox parameters are usually estimated by maximum likelihood and then rounded to
resemble a practical transformation (e.g. square root, cube root, inverse).
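Maximum likelihood estimation of $\lambda$ can be sketched as a grid search over the Box-Cox profile log-likelihood, $\ell(\lambda) = -\tfrac{n}{2}\log\hat\sigma^2(\lambda) + (\lambda - 1)\sum_i \log Y_i$, where $\hat\sigma^2(\lambda)$ is the ML variance estimate of the transformed data and the second term is the log-Jacobian of the transformation. For brevity this Python sketch assumes a mean-only model (on the poster, $\hat\sigma^2$ would come from the residuals of the fitted regression), strictly positive data, a grid from −2 to 2 in steps of 0.01, and simulated log-Normal data; all names are hypothetical.

```python
import math
import random


def box_cox(y, lam):
    """One-parameter Box-Cox transform of a positive value."""
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam


def profile_loglik(y, lam, sum_log_y):
    """Box-Cox profile log-likelihood (up to a constant) for an i.i.d. Normal mean-only model."""
    n = len(y)
    z = [box_cox(v, lam) for v in y]
    mean_z = sum(z) / n
    sigma2 = sum((v - mean_z) ** 2 for v in z) / n  # ML estimate of the variance
    # The second term is the log-Jacobian of the transformation.
    return -0.5 * n * math.log(sigma2) + (lam - 1) * sum_log_y


def estimate_lambda(y, lo=-2.0, hi=2.0, step=0.01):
    """Grid-search ML estimate of lambda; rounding keeps lambda = 0 exactly on the grid."""
    sum_log_y = sum(math.log(v) for v in y)
    n_steps = int(round((hi - lo) / step))
    grid = [round(lo + k * step, 10) for k in range(n_steps + 1)]
    return max(grid, key=lambda lam: profile_loglik(y, lam, sum_log_y))


if __name__ == "__main__":
    rng = random.Random(1964)
    # Simulated log-Normal data, for which the log transformation (lambda = 0) is ideal.
    data = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(1000)]
    print(estimate_lambda(data))
```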
The lunchtime vitamin C data are not strictly positive (they contain some zeros), so the two-parameter version is
used, giving $\lambda_1 = 0.4022$ and $\lambda_2 = 0.0015$ to 4 d.p. Rounding gives a square-root transformation (with the
shift rounded to zero). The daily vitamin C data are strictly positive, so the standard version is used, giving
$\lambda = 0.1997$. Rounding gives a log transformation.
Problem: Transformed data will typically be on a scale that is unfamiliar to practitioners.
Solution: Use the inverse transformation to back-transform the results, so that they are put on
the original scale and made accessible to practitioners.
Fitting the model to the square-rooted data gives an acceptable fit to Normality. The adjusted
means for the square-rooted data are calculated, then squared to convert them back to the
original scale:
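Year | 2000 | 2009 | Difference | 95% CI of difference
Square-root scale | $\bar{Y}_{2000}$ | $\bar{Y}_{2009}$ | $d = \bar{Y}_{2009} - \bar{Y}_{2000}$ | $d \pm t_{0.975}\,\mathrm{s.e.}(d)$
Original scale | $\bar{Y}_{2000}^2$ | $\bar{Y}_{2009}^2$ | $\bar{Y}_{2009}^2 - \bar{Y}_{2000}^2$ | not available
Lunch type | SL | PL | Difference | 95% CI of difference
Square-root scale | $\bar{Y}_{SL}$ | $\bar{Y}_{PL}$ | $d = \bar{Y}_{PL} - \bar{Y}_{SL}$ | $d \pm t_{0.975}\,\mathrm{s.e.}(d)$
Original scale | $\bar{Y}_{SL}^2$ | $\bar{Y}_{PL}^2$ | $\bar{Y}_{PL}^2 - \bar{Y}_{SL}^2$ | not available
Problem: It has already been established that data on the original scale violate the Normality assumption; this is
the reason a transformation was sought in the first place. A confidence interval for the difference on the original
scale therefore cannot be found (squaring the limits of the square-root-scale interval does not give an interval for the
difference of the squared means). Valid conclusions can only be drawn on the square-root scale, which makes the
back-transformation redundant.
Solution: Use a log transformation.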
A useful quality of the log transformation is that an intuitive interpretation is possible upon back-transformation.
This is owing to the relationship between the geometric and arithmetic means of general data $Y_1, Y_2, \ldots, Y_n$:
$$GM(Y_i) = \left(\prod_{i=1}^{n} Y_i\right)^{1/n} = \exp\left(\frac{1}{n}\sum_{i=1}^{n} \log Y_i\right) = \exp\big(AM(\log Y_i)\big),$$
where $GM(\cdot)$ and $AM(\cdot)$ denote the geometric and arithmetic means respectively. Therefore,
$$AM(\log Y_i) = \log GM(Y_i).$$
Hence, the difference between the (arithmetic) means of the logged data for two groups is given by
$$\log GM(Y_{\text{GROUP }A}) - \log GM(Y_{\text{GROUP }B}) = \log \frac{GM(Y_{\text{GROUP }A})}{GM(Y_{\text{GROUP }B})}.$$
Upon exponentiation, the ‘difference’ becomes the ratio of the geometric means. Because exponentiation is a
monotonic increasing function, the confidence interval for this ratio can be found directly by anti-logging the limits
of the confidence interval for the difference; the resulting interval is not symmetric about the ratio.
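The identity and the back-transformation can be checked numerically. In the Python sketch below, the two small groups are made-up values, while the log-scale difference and interval are the PL − SL figures reported for lunchtime vitamin C; the function names are illustrative.

```python
import math


def geometric_mean(values):
    """Geometric mean computed as exp(arithmetic mean of the logs)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def mean_log(values):
    """Arithmetic mean of the logged values."""
    return sum(math.log(v) for v in values) / len(values)


def back_transform(diff, lower, upper):
    """Anti-log a log-scale difference and its CI limits into a ratio and its CI."""
    return math.exp(diff), (math.exp(lower), math.exp(upper))


if __name__ == "__main__":
    group_a = [1.0, 2.0, 4.0, 8.0]  # made-up values
    group_b = [1.0, 3.0, 9.0]
    # Direct definition (product ** (1/n)) agrees with exp(AM(log)).
    print(geometric_mean(group_a), math.prod(group_a) ** (1 / len(group_a)))
    # exp(difference of mean logs) equals the ratio of geometric means.
    print(math.exp(mean_log(group_a) - mean_log(group_b)),
          geometric_mean(group_a) / geometric_mean(group_b))
    # PL - SL on the log scale: 0.0664 (-0.0299, 0.1626).
    ratio, (lo, hi) = back_transform(0.0664, -0.0299, 0.1626)
    print(round(ratio, 2), round(lo, 2), round(hi, 2))  # prints 1.07 0.97 1.18
```

The anti-logged interval (0.97, 1.18) sits slightly further above 1.07 than below it, illustrating the asymmetry on the ratio scale.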
Unlike the square-root transformation, however, a log transformation cannot be applied to zero observations,
because log 0 is undefined. This is resolved by shifting the data by a constant before transforming, but a sensible
constant must be determined.
The constant that minimizes the residual skewness is a logical choice, since a Normal distribution has zero
skewness. Minimal residual skewness is achieved with a shift of approximately 15, and the log transformation is
then performed on the shifted intakes.
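The search for the shift can be sketched as a one-dimensional grid search. This Python sketch rests on stated simplifications: residuals are taken about each year/lunch group mean rather than from the full year, lunch and sex regression; skewness is the moment estimator $m_3/m_2^{3/2}$; the candidate shifts 0.5, 1.0, ..., 50 are arbitrary; and the intakes are simulated (zero-inflated gamma), so the selected shift will not reproduce the poster's value of about 15. All names are hypothetical.

```python
import math
import random


def skewness(x):
    """Moment estimator of skewness, g1 = m3 / m2 ** 1.5."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5


def residual_skewness(groups, shift):
    """Skewness of log(y + shift) residuals taken about each group's mean."""
    residuals = []
    for intakes in groups.values():
        logs = [math.log(y + shift) for y in intakes]
        centre = sum(logs) / len(logs)
        residuals.extend(v - centre for v in logs)
    return skewness(residuals)


def best_shift(groups, shifts):
    """Candidate shift whose residual skewness is closest to zero."""
    return min(shifts, key=lambda c: abs(residual_skewness(groups, c)))


def simulate(seed=58, n_per_group=120):
    """Hypothetical right-skewed intakes with about 10% exact zeros."""
    rng = random.Random(seed)
    return {
        group: [0.0 if rng.random() < 0.1 else rng.gammavariate(2.0, 20.0)
                for _ in range(n_per_group)]
        for group in ("2000 SL", "2000 PL", "2009 SL", "2009 PL")
    }


if __name__ == "__main__":
    groups = simulate()
    shifts = [0.5 * k for k in range(1, 101)]  # 0.5, 1.0, ..., 50.0
    c = best_shift(groups, shifts)
    print(c, residual_skewness(groups, c))
```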
After log transformation, there is still an acceptable fit to Normality. So although the Box-Cox method indicated a
square root, the log transformation also Normalizes the data quite well.
Inference: Lunchtime vitamin C intake (geometric mean) in 2009 was 1.12 times that in 2000, and packed lunches on
average contained 1.07 times as much as school lunches, although the 95% CI for this second ratio, (0.97, 1.18), includes 1.
Year | 2000 | 2009 | Difference/Ratio | 95% CI of difference/ratio
Log scale | 3.638 | 3.756 | 2009 - 2000: 0.118 | (0.023, 0.212)
Original scale | 38.0 | 42.8 | 2009 / 2000: 1.12 | (1.02, 1.24)
Lunch type | SL | PL | Difference/Ratio | 95% CI of difference/ratio
Log scale | 3.664 | 3.730 | PL - SL: 0.0664 | (-0.0299, 0.1626)
Original scale | 39.0 | 41.7 | PL / SL: 1.07 | (0.97, 1.18)
Lunchtime intakes: consumption of energy, sodium and saturated fat declined significantly among children taking
school lunches, but not among those taking packed lunches. Vitamin C intake increased moderately over the years,
with no evidence that the change differed between lunch types.
Daily intakes: daily consumption of all nutrients did not differ between school-lunch and packed-lunch children.
Consumption of energy and sodium fell significantly, but there was no evidence of a similar fall for saturated fat.
Vitamin C intake increased appreciably.
Problem: Energy intake is a proxy for the amount eaten. Energy intake decreased over the years, meaning that
children ate less in 2009 than in 2000. Could the decrease in sodium simply be due to children eating less food
overall?
Solution: Investigate sodium density.
To investigate how heavily sodium depends on energy, energy is included as an explanatory variable:
$$Na = \alpha + \beta E + \epsilon,$$
where $Na$ = daily sodium intake, $E$ = daily energy intake and $\epsilon$ = error, with $\epsilon \sim N(0, \sigma^2)$, and $\alpha$
incorporates the effects of all other covariates as well as the general mean.
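The effect of including energy can be illustrated with the classical one-way analysis-of-covariance adjusted mean, $\bar{Na}_g - \hat\beta(\bar{E}_g - E^*)$, where $\hat\beta$ is the pooled within-group slope and $E^*$ is a common energy value. This Python sketch is a simplification: it adjusts for a single grouping factor (year) and energy only, whereas the model above also includes sex and other covariates; the toy data and the function name are hypothetical.

```python
def ancova_adjusted_means(groups, e_ref=None):
    """One-way ANCOVA (common slope): group means of sodium adjusted to energy e_ref.

    groups maps a group label to a list of (energy_kcal, sodium_mg) pairs.
    Returns (pooled within-group slope, {label: adjusted mean}).
    """
    summaries, sxy, sxx = {}, 0.0, 0.0
    for label, pairs in groups.items():
        n = len(pairs)
        e_bar = sum(e for e, _ in pairs) / n
        na_bar = sum(na for _, na in pairs) / n
        sxy += sum((e - e_bar) * (na - na_bar) for e, na in pairs)
        sxx += sum((e - e_bar) ** 2 for e, _ in pairs)
        summaries[label] = (e_bar, na_bar)
    beta = sxy / sxx  # extra mg of sodium per extra kcal, pooled within groups
    if e_ref is None:  # default: grand mean energy intake over all children
        energies = [e for pairs in groups.values() for e, _ in pairs]
        e_ref = sum(energies) / len(energies)
    adjusted = {label: na_bar - beta * (e_bar - e_ref)
                for label, (e_bar, na_bar) in summaries.items()}
    return beta, adjusted


if __name__ == "__main__":
    # Hypothetical toy data (energy in kcal, sodium in mg), not the study data.
    toy = {
        2000: [(1800, 2600), (2000, 2800), (2200, 3000)],
        2009: [(1600, 2250), (1800, 2450)],
    }
    beta, adj = ancova_adjusted_means(toy)
    print(beta, adj[2009] - adj[2000])  # prints 1.0 -150.0
```

In this toy example the raw difference in mean sodium is −450 mg, but only −150 mg remains once energy is held fixed. Changing $E^*$ shifts every adjusted mean by the same amount, so the adjusted difference is unchanged, mirroring the argument made for the fixed Sex value.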
This time, means are adjusted not only for sex but also for energy intake:
Energy-adjusted mean | 2000 | 2009 | Difference | 95% CI of difference
Daily Na intake (mg) | 2497.5 | 2323.8 | -173.7 | (-254.2, -93.2)
Hence, even if a child’s energy intake were the same in each year, their average daily sodium intake would still
have decreased by over 170 mg. This is a relatively large amount, suggesting that a substantial part of the Na
reduction is not attributable to reduced energy intake. So there has been a reduction in sodium density.
Overall conclusions: the standards have had a positive impact on school children’s diets, particularly in terms of
energy, sodium and sodium density.
Poster sections:
1. The childhood obesity crisis
2. Revised school food standards: a response to the rising obesity levels
3. Project objective: were the standards successful?
4. Factors affecting food intake
5. Simple analysis of lunchtime energy intake
6. Adjusted means
7. lsmeans
8. Diagnostic checks
9. Box-Cox power transformation
10. Square-root transformation of lunchtime Vit C intake
11. Shifting the lunchtime Vit C intakes
12. Unique property of log transformation
13. Log transformation of shifted Vit C intake
14. Summary of results