The Analysis of Variance
Tokelo Khalema
University of the Free State
Bloemfontein
November 02, 2012
CHAPTER 1
ANALYSIS OF VARIANCE
1.1 Introduction
The analysis of variance (commonly abbreviated as ANOVA or AOV) is a
method of investigating the variability of means between subsamples resulting
from some experiment. In its most basic form it is a multi-sample generaliza-
tion of the t-test but more complex ANOVA models depart greatly from the
two-sample t-test. Analysis of variance was first introduced in the context of
agricultural investigations by Sir Ronald A. Fisher (1890–1962), but is now
commonly used in almost all areas involving scientific research.
1.2 One-way Classification
1.2.1 Normal Theory
Suppose we carry out an experiment on N homogeneous experimental units and
observe the following measurements, y1, y2, . . . , yN . Suppose also that of the N
observations, J were randomly selected to be taken under the same experimen-
tal conditions and that overall, we had I different experimental conditions. We
shall refer to these experimental conditions as treatments. The treatments
could be any categorical or quantitative variable — species, racial group, level
of caloric intake, dietary regime, blood group, genotype, etc. We therefore see
that we could subdivide the N variates into I groups (or treatments), under each
of which there are J observations. Such an experimental design in which the
number of observations or measurements per treatment are the same is termed
a balanced design. A design need not be balanced.
Denote the jth observation under the ith treatment by yij where i = 1, . . . , I
and j = 1, . . . , J. Further, assume that Yij ∼ iid N(µi, σ²) for all i and j. It
might be helpful to visualize the experimental points as forming an array whose
ith column represents the ith treatment and jth row represents the jth ob-
servations under all the treatments. Ordinarily, measurements taken on several
homogeneous experimental units under the same experimental conditions should
differ slightly due to some unexplained measurement errors. We assume these
measurement errors to be independent and normally distributed with mean zero
and constant but unknown variance, σ² < ∞. That is, εij ∼ iid N(0, σ²) for all i
and j. The assumption of zero mean is natural rather than arbitrary because,
on average, any deviation from the mean in any population should average out
to zero. In the analysis of variance we are interested in the overall variability
of the µi about the grand population mean µ. This implies a fixed differential
effect αi = µi − µ (or deviation from the grand mean), due to treatment i. The
above arguments and assumptions lead us to the following linear model,
Yij = µ + αi + εij = µi + εij   (1.1)
for i = 1, . . . , I and j = 1, . . . , J, which describes the underlying data-generating
process. It is easy to show that if αi, as in equation 1.1, is to be interpreted as
the differential effect of the ith treatment, then we have the following constraint,
Σ_{i=1}^{I} αi = 0 .   (1.2)
The constraint in equation 1.2 above is termed a model identification con-
dition. Without it the model we just formulated is said to be unidentifiable¹.
Different interpretations of the αi lead to different constraints and different
model parametrizations. In the sequel we shall stick to the parametrization
above. Equation 1.1 is usually referred to as the one-way fixed effects model,
or Model I. It is called one-way because the data are classified according to a single
factor, viz., treatment, and "fixed" because the αi are assumed to be fixed rather
than random; had they been random we would have had a random
effects model, or Model II. Later we introduce Model II and demonstrate how
it can be used in practice.
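Returning to the constraint in equation 1.2: if, as is implicit above, the grand mean is defined as the unweighted average of the treatment means, µ = Σ_{i=1}^{I} µi / I, then the constraint is a one-line check,
Σ_{i=1}^{I} αi = Σ_{i=1}^{I} (µi − µ) = Σ_{i=1}^{I} µi − Iµ = Iµ − Iµ = 0.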
The null hypothesis in the analysis of variance model given in equation 1.1
is that the treatment means are all equal; the alternative is that at least one
pair of means is different. That is,
H0 : µi = µj , ∀ i ≠ j   (1.3)
HA : µk ≠ µl for at least one combination of values k ≠ l.
But since µi = µ + αi, we see that µi = µj for all i ≠ j implies that αi = 0 for
all i. This gives an equivalent form of the hypotheses given above, namely
H0 : αi = 0, ∀ i ∈ {1, . . . , I}   (1.4)
HA : αi ≠ 0 for at least one i ∈ {1, . . . , I}.
¹Identifiability is a desirable property of models. A model is called identifiable if all its
parameters can be uniquely estimated and inferences can be drawn from it.
This formulation is more commonly met with and is arguably more intuitive
—in words the null hypothesis says that there are no differential effects due to
treatments. Or simply, that there are no treatment effects. So any apparent
differences in sample means are not attributable to the treatments but to random
selection. The alternative hypothesis says that there is at least one treatment
with a differential effect—the negation of the null hypothesis.
Before we present the mathematical derivations of the analysis of variance, let
us consider one practical example of an experiment which should be recognized
as a numerical example of the more general design outlined above. This ex-
ample was taken from a classic reference by Sokal and Rohlf (1968) [1]. Sokal
tested 25 females of each of three lines of Drosophila for significant differences
in fecundity among the three lines. The first of these lines was selected for
resistance against DDT, the second for susceptibility to DDT, and the third was
a nonselected control strain. This is a balanced design with I = 3 treatments,
J = 25 observations per treatment, and should also be recognized as Model I
since the exact nature of the treatments was determined by the experimenter.
The data are summarized in table 1.1 in which the response is the number of
eggs laid per female per day for the first 14 days of life.
We might want to compute the treatment sample means as a preliminary
check on the heterogeneity among group means. Dataset drosophila contains
the data presented in table 1.1. In R we issue the following commands:
> library(khalema)
> data(drosophila)
> attach(drosophila)
> tapply(fecundity,line,mean)
1 2 3
25.256 23.628 33.372
The first three commands should be old news by now. The first loads pack-
age khalema, the second accesses dataset drosophila and, the third makes the
variables in drosophila available on the search path. The final command com-
putes the sample mean under each of the 3 treatments.
Note that the mean under the nonselected treatment is appreciably higher than
those under the other treatments. Of interest in the analysis of variance is
whether this difference is statistically significant or just a result of noise in the
data.
In deriving a test to investigate the significance of group sample mean dif-
ferences we will need some statistics and their corresponding sampling distri-
butions. Among these are the overall average and the average under the ith
treatment denoted,
Ȳ.. = Σ_{i=1}^{I} Σ_{j=1}^{J} Yij / N,
Resistant Susceptible Nonselected
12.8 38.4 35.4
21.6 32.9 27.4
14.8 48.5 19.3
23.1 20.9 41.8
34.6 11.6 20.3
19.7 22.3 37.6
22.6 30.2 36.9
29.6 33.4 37.3
16.4 26.7 28.2
20.3 39.0 23.4
29.3 12.8 33.7
14.9 14.6 29.2
27.3 12.2 41.7
22.4 23.1 22.6
27.5 29.4 40.4
20.3 16.0 34.4
38.7 20.1 30.4
26.4 23.3 14.9
23.7 22.9 51.8
26.1 22.5 33.8
29.5 15.1 37.9
38.6 31.0 29.5
44.4 16.9 42.4
23.2 16.1 36.6
23.6 10.8 47.4
Table 1.1: Number of eggs laid per female per day for the first 14 days of life.
and
Ȳi. = Σ_{j=1}^{J} Yij / J,
respectively. Recall that N = IJ is the total number of observations. We define
the following statistic which should be interpreted as summarizing the total
variability in the sample,
SST = Σ_{i=1}^{I} Σ_{j=1}^{J} (Yij − Ȳ..)² .
This is called the total sum of squares. But the total variability in a sample
can be partitioned into variability within treatments and variability between
treatments. In fact, it can easily be shown that
SST = SSB + SSW (1.5)
where
SSB = J Σ_{i=1}^{I} (Ȳi. − Ȳ..)²   and   SSW = Σ_{i=1}^{I} Σ_{j=1}^{J} (Yij − Ȳi.)²
denote the sum of squares between and the sum of squares within treatments
respectively. The statistic SSB summarizes variation in the sample attributable
to treatment; SSW summarizes variation attributable to error and is sometimes
written SSE. Note that under the assumption of homoscedastic variance, each
of the I terms,
Σ_{j=1}^{J} (Yij − Ȳi.)² / (J − 1),
furnishes an estimate of the error variance, σ². It is thus reasonable to estimate
σ² by pooling these terms together to obtain the pooled estimate of the common
variance,
s²p = [1 / I(J − 1)] Σ_{i=1}^{I} Σ_{j=1}^{J} (Yij − Ȳi.)² = SSW / I(J − 1).
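As a quick numerical check of equation 1.5 and of s²p, the quantities can be computed directly in R (a sketch, assuming the drosophila data from the khalema package are attached as in the example above):
> ybar   <- mean(fecundity)                        # grand mean
> ybar.i <- tapply(fecundity, line, mean)          # treatment means
> SST <- sum((fecundity - ybar)^2)
> SSB <- 25 * sum((ybar.i - ybar)^2)               # J = 25 flies per line
> SSW <- sum((fecundity - ybar.i[as.character(line)])^2)
> c(SST, SSB + SSW)                                # the two agree, as equation 1.5 asserts
> SSW / (3 * (25 - 1))                             # pooled estimate of sigma^2, i.e. MSW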
The reader will recall that if Yi ∼ iid N(µ, σ²) for i = 1, . . . , n, then
(n − 1)S² / σ² ∼ χ²_{n−1},   (1.6)
where
S² = Σ_{i=1}^{n} (Yi − Ȳ)² / (n − 1)
denotes the sample variance and
Ȳ = Σ_{i=1}^{n} Yi / n
the sample mean. This now familiar result will be an important template in
proving the following theorem.
Theorem 1. Under the assumption that the random errors, εij ∼ iid N(0, σ²),
for i = 1, . . . , I and j = 1, . . . , J, we have the following results:
1. SST/σ² = Σ_{i=1}^{I} Σ_{j=1}^{J} (Yij − Ȳ..)² / σ² ∼ χ²_{N−1}, if H0 : αi = 0 ∀i is true,
2. SSW/σ² = Σ_{i=1}^{I} Σ_{j=1}^{J} (Yij − Ȳi.)² / σ² ∼ χ²_{I(J−1)}, whether or not H0 is true,
3. SSB/σ² = J Σ_{i=1}^{I} (Ȳi. − Ȳ..)² / σ² ∼ χ²_{I−1}, if H0 : αi = 0 ∀i is true, and
4. SSW/σ² and SSB/σ² are independently distributed.
Proof. To prove the first part of the theorem we note that if H0 is true, then
we have a common mean µ under each treatment and thus Yij ∼ iid N(µ, σ²) for
i = 1, . . . , I and j = 1, . . . , J. Accordingly,
Σ_{i=1}^{I} Σ_{j=1}^{J} (Yij − Ȳ..)² / (N − 1)
denotes the sample variance of a sample of size N = IJ from a N(µ, σ²) popu-
lation, hence using the result given in equation 1.6 concludes the proof.
For the second part we note that,
Σ_{j=1}^{J} (Yij − Ȳi.)² / (J − 1)
denotes the sample variance of the ith treatment, hence, whether or not H0 is
true,
Σ_{j=1}^{J} (Yij − Ȳi.)² / σ² ∼ χ²_{J−1} independently for all i = 1, . . . , I.
Summing all I of these terms and using the property of the sum of independent
Chi-square random variables yields the stated result.
Further, if H0 is true, the third part results from the subtraction property of
the Chi-square distribution. Lastly, to prove the independence of...
In addition to the statistics we have defined thus far, it is customary to
define the mean square due to treatment and the mean square due to error as,
MSB = SSB/(I − 1)   and   MSW = SSW/I(J − 1),
respectively. We are now in a position to derive a test for the hypotheses
H0 : αi = 0, ∀i ∈ {1, . . . , I}
versus
HA : αi ≠ 0 for at least one i ∈ {1, . . . , I}.
In the following theorem we use the statistics defined above and their sampling
distributions to derive the generalized likelihood ratio test for H0 and HA.
Theorem 2. The generalized likelihood ratio test statistic for testing the null
hypothesis of no treatment effects as in equation 1.4 is given by:
F = MSB / MSW ,
and H0 is rejected at significance level α if F > F^{1−α}_{I−1, I(J−1)}.
Proof. Recall from our earlier discussion that in addition to some distributional
assumptions we assumed the following:
Yij = µ + αi + εij,
where the restriction
Σ_{i=1}^{I} αi = 0
is imposed on the αi. It follows then that, for i = 1, . . . , I and j = 1, . . . , J,
f(yij) = [1 / (σ√(2π))] exp{ −(1/2) [(yij − µ − αi)/σ]² }.
From independence of the yij we have the following likelihood,
L(µ, αi, σ² | y) = (2πσ²)^{−IJ/2} exp{ −[1/(2σ²)] Σ_{i=1}^{I} Σ_{j=1}^{J} (Yij − µ − αi)² }   (1.7)
and log-likelihood
l = log L = −(IJ/2) log(2πσ²) − [1/(2σ²)] Σ_{i=1}^{I} Σ_{j=1}^{J} (Yij − µ − αi)².
Under the alternative hypothesis we have the following parameter space,
Ω = {(µ, αi, σ²) | −∞ < µ, αi < ∞, σ² > 0}.
Differentiating the log-likelihood with respect to µ and equating the derivative
to zero gives,
∂l/∂µ = (1/σ²) Σ_{i=1}^{I} Σ_{j=1}^{J} (Yij − µ − αi) = 0,
which implies that
µ̂_Ω = Ȳ.. .
Once again we differentiate with respect to αi to obtain,
∂l/∂αi = (1/σ²) Σ_{j=1}^{J} (Yij − µ − αi) = 0.
This yields
α̂i,Ω = Ȳi. − Ȳ.. .
Finally we differentiate with respect to σ² and proceed just as we did above.
We have,
∂l/∂σ² = −IJ/(2σ²) + [1/(2σ⁴)] Σ_{i=1}^{I} Σ_{j=1}^{J} (Yij − µ − αi)² = 0,
which gives the following MLE,
σ̂²_Ω = N⁻¹ Σ_{i=1}^{I} Σ_{j=1}^{J} (Yij − Ȳi.)².
Substituting these estimates into equation 1.7 we have the following likelihood
supremum under HA,
sup_Ω L(µ, αi, σ² | y) = exp(−IJ/2) · [ (2π/IJ) Σ_{i=1}^{I} Σ_{j=1}^{J} (Yij − Ȳi.)² ]^{−IJ/2}.
Under the null hypothesis we have one less parameter since the αi are hypoth-
esised to be zero. The parameter space is,
ω = {(µ, σ²) | −∞ < µ < ∞, σ² > 0}.
In this case we maximize the following log-likelihood,
l = log L = −(IJ/2) log(2πσ²) − [1/(2σ²)] Σ_{i=1}^{I} Σ_{j=1}^{J} (Yij − µ)².
It is left to the reader to show that the parameter estimates in this case are,
µ̂_ω = Ȳ..
and
σ̂²_ω = N⁻¹ Σ_{i=1}^{I} Σ_{j=1}^{J} (Yij − Ȳ..)².
The likelihood supremum is then given by,
sup_ω L(µ, σ² | y) = exp(−IJ/2) · [ (2π/IJ) Σ_{i=1}^{I} Σ_{j=1}^{J} (Yij − Ȳ..)² ]^{−IJ/2}.
After some cancellation and the use of the identity we established earlier, the
generalized likelihood ratio test statistic takes the following form,
Λ = (sup_ω L) / (sup_Ω L)
  = [ Σ_{i=1}^{I} Σ_{j=1}^{J} (Yij − Ȳ..)² / Σ_{i=1}^{I} Σ_{j=1}^{J} (Yij − Ȳi.)² ]^{−N/2}
  = [ ( Σ_{i=1}^{I} Σ_{j=1}^{J} (Yij − Ȳi.)² + J Σ_{i=1}^{I} (Ȳi. − Ȳ..)² ) / Σ_{i=1}^{I} Σ_{j=1}^{J} (Yij − Ȳi.)² ]^{−N/2}.
The generalized likelihood ratio test rejects H0 for small values of Λ and we
see that small values of Λ correspond to large values of SSB/SSW . That is we
reject H0 if
SSB / SSW > k,
or if
F = [SSB/(I − 1)] / [SSW/I(J − 1)] = MSB / MSW > k · I(J − 1)/(I − 1) = c,
where c is chosen such that Pr(F > c|H0) = α, the desired type I error. But we
have already derived the null distribution of F from which we have,
c = F^{1−α}_{I−1, I(J−1)},
the 100(1 − α)th percentile of the F-distribution with I − 1 and I(J − 1) degrees
of freedom. This completes the proof.
The reader who closely followed the foregoing proof should have been aware that
the likelihood ratio test statistic would not have been arrived at had the iden-
tification condition not been taken into account. We see then that inferences
cannot be drawn from an unidentifiable model. In fact, this is what unidentifi-
able means in statistical literature. Casella & Berger (1992) [2] touch lightly
on model identification.
For obvious reasons, the test just derived is called the F-test. We will proceed
to demonstrate how it can be applied in practice.
Example 1. Consider the data presented earlier in table 1.1. It is vital to test
for any significant violations of model assumptions before we draw inferences.
First let us test the validity of the constant variance assumption. Figure 1.1
affords a visual check on the group variances. There is not much reason to
believe that the constant variance assumption could be unduly flawed. The
distributions also look reasonably symmetrical, hence normal theory could be
applied safely.
Figure 1.1: Side-by-side boxplots for the Drosophila fecundity data.
We proceed with the analysis and calculate the sum of squares, mean squares,
and the F-statistic. In R the command to fit the linear model is:
> lm(fecundity~line,drosophila)
And the command,
> anova(lm(fecundity~line,drosophila))
Source of variation df SS MS F p-value
Between 2 1362.2 681.11 8.6657 0.0004
Within 72 5659.0 78.60
Total 74 7021.2
Table 1.2: Anova table for the Drosophila fecundity data.
gives the anova table. An anova table compactly summarizes the results of an
F-test.
From the table above, the F-statistic is significant at a level of 5%. Say the
p-value was not reported, as would be the case if one were not using a computer.
Then we would refer to the F table in the appendix, approximate F2,72(.95) by
F2,62(.95) and report
p-value = Pr(F ≥ 8.6657) < Pr(F ≥ 3.15) = 5%.
But before we jump to conclusions we test the validity of the distributional
assumption of the random errors. To estimate these, we plug in the MLE’s of
µ and αi into equation 1.1 to obtain,
ε̂ij = Yij − Ȳ.. − (Ȳi. − Ȳ..) = Yij − Ȳi.
for i = 1, . . . , I and j = 1, . . . , J. These are termed model residuals. By virtue
of the invariance property of maximum likelihood estimates, ε̂ij furnishes a
maximum likelihood estimate of εij. We are interested in testing whether these
residuals can be considered as Gaussian white noise. But recall that maximum
likelihood estimates are asymptotically normal. To obtain the residuals in R we
issue the command below:
> Residuals <- lm(fecundity~line,drosophila)$residuals
but this is only one of several ways to obtain model residuals in R. A look at
figure 1.2 shows that the residuals are not far from normal. In particular, the
histogram is roughly symmetric about zero. Hence we can safely read the
anova table and conclude that the F-test conclusively rejects the null hypothesis
of no treatment effects. In ordinary parlance this means that of the I = 3 lines,
at least one was much more or much less fecund than the rest. Figure 1.1 reveals
that the nonselected line was considerably more fecund than the resistant and the
susceptible lines.
At this point we find it worthwhile to interpolate some comments on the
assumptions underlying the analysis of variance which should always be borne
in mind each time an analysis of variance is carried out. We assume that in the
model given in equation 1.1, we have,
1. normally distributed random errors εij,
2. constant (or homoscedastic) error variance σ², and
3. independent random errors.

Figure 1.2: A histogram and a normal quantile-quantile plot of the model residuals.
The assumption of normality is not a particularly stringent one. The F-test
has been shown to be robust against mild to moderate departures from normal-
ity, especially if the distribution is not saliently skewed. Several good tests of
normality exist in the literature. The Shapiro-Wilk test is one of those most
commonly used in practice. Its R directive is shapiro.test() and its null
hypothesis is that the sample comes from a normal parent distribution. Apply-
ing this test on the residuals from our previous example we obtain a p-value of
0.45. So the Shapiro-Wilk test gives no evidence against the hypothesis of normally
distributed random errors. You will recall from example 1 that we were quite
content with the validity of the normality assumption from the qq-plot and the
histogram created therein. In examples to follow, we shall stick to the same
diagnostic procedure with the hope that any undue departures from normality
will be noticed by the naked eye and not bother ourselves with carrying out the
normality test.
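For reference, that p-value can be reproduced as follows (assuming, as before, the drosophila data from the khalema package):
> m1 <- lm(fecundity ~ line, drosophila)
> shapiro.test(residuals(m1))   # null hypothesis: the residuals come from a normal distribution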
The problem of heteroscedasticity (or nonconstant variance) has slightly dif-
ferent implications depending on whether a design is balanced or otherwise. In
the former case, slightly lower p-values than actual ones will be reported; in the
latter, higher or lower p-values than actual ones will be reported according as
large σ²i are associated with large ni, or large σ²i are associated with small ni
(see Miller (1997) [3], pp. 89–91).
While there will usually be remedies to non-normality and heteroscedastic vari-
ance, dependence of errors will usually not be amenable to any alternative
method available to the investigator, at least if it is in the form of serial corre-
lation. Dependence due to blocking, on the other hand, can easily be handled
by adding an extra parameter to the model to represent the presence of block-
ing. We will see later how blocking can purposely be introduced to optimize an
experimental plan. It has been shown (see...) that if there is serial correlation
within (rather than across) samples, then the significance level of the F-test
will be smaller or larger than desired according as the correlation is negative or
positive. The presence of serial correlation of lag 1 can be detected by visually
inspecting plots of the lagged pairs (yij, yi,j+1) within each treatment. One hopes
to see no apparent linear relationship between the lagged pairs if the F-test is to
be employed.
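A minimal sketch of such a diagnostic in R, assuming a response vector y and a grouping factor g (hypothetical names) whose observations are recorded in time order within each group:
> lagged <- lapply(split(y, g), function(v) cbind(v[-length(v)], v[-1]))
> lagged <- do.call(rbind, lagged)                     # all within-group (y[i,j], y[i,j+1]) pairs
> plot(lagged, xlab = "y(i, j)", ylab = "y(i, j+1)")   # no linear trend should be visible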
Outliers can also be a nuisance in applying the F-test. Since the sample mean
and variance are not robust against outliers, such outlying observations can
greatly augment the within-group mean square which in turn would render the
F-test conservative². Usually no transformation will remedy the situation of
outlying observations. One option to deal with outliers would be to use the
trimmed mean in the calculation of the sum of squares. Another is the use of
nonparametric methods. We discuss nonparametric methods in section 1.2.3.
Usually for a design to yield observations that have all three of the charac-
teristics enumerated above, the experimenter should ensure random allocation
of treatments. That is, experimental units must be allocated at random to the
treatments. Randomization is very critical in all of experimental design. It also
makes possible the calculation of unbiased estimates of the treatment effects.
One important concept that has thus far only received brief mention is that
of unbalanced designs. If instead of the same number J of replicates under
each treatment we suppose that we have ni observations under treatment i,
where the ni need not be equal, then it can easily be shown that the identity in
equation 1.5 becomes
Σ_{i=1}^{I} Σ_{j=1}^{ni} (Yij − Ȳ..)² = Σ_{i=1}^{I} ni (Ȳi. − Ȳ..)² + Σ_{i=1}^{I} Σ_{j=1}^{ni} (Yij − Ȳi.)² .
Otherwise the analysis remains the same as in the balanced design and an
analogous F-test can be derived. The next example, adapted from Snedecor
& Cochran (1980) [4], illustrates points we made in the last few paragraphs
including the possibility of an unbalanced design.
Example 2. For five regions in the United States in 1977, public school ex-
penditures per pupil per state were recorded. The data are shown in table 1.3.
²A conservative test is "reluctant" to reject, i.e. it has a smaller type I error than desired.
Northeast   Southeast   South Central   North Central   Mountain Pacific
1.33 1.66 1.16 1.74 1.76
1.26 1.37 1.07 1.78 1.75
2.33 1.21 1.25 1.39 1.60
2.10 1.21 1.11 1.28 1.69
1.44 1.19 1.15 1.88 1.42
1.55 1.48 1.15 1.27 1.60
1.89 1.19 1.16 1.67 1.56
1.88 1.26 1.40 1.24
1.86 1.30 1.51 1.45
1.99 1.74 1.35
1.53 1.16
Table 1.3: Public school expenditures per pupil per state (in $1 000).
For R users, the relevant data-frame is named pupil. The question
of interest is the same old one, namely, are the region to region expenditure
differences statistically significant or are they due to chance alone?
Figure 1.3 shows that the distribution cannot be judged to be very symmet-
rical, nor can we be overly optimistic about constant variance. Since overall,
there is not too much skewness, it is about the latter that we should be most
worried. No outliers are visible so there really is not much that calls normal
theory into question. The R command for creating the plot in figure 1.3 is
plot(expenditure~region,pupil).
We now seek an appropriate variance-stabilizing transformation. Since all
the values are nonnegative, we could try the log-transformation, or even the
square-root transformation. A plot of the log-transformed data is shown in
figure 1.4.
The log-transformed distribution does not look any more symmetrical.
After a few trials, we finally take the reciprocal of the square of the observations,
which yields the plot depicted in figure 1.5.
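Figures 1.4 and 1.5 can be drawn with commands analogous to the one used for figure 1.3 (assuming the pupil data-frame from the book's package):
> plot(log(expenditure) ~ region, pupil)   # log-transformed response (figure 1.4)
> plot(expenditure^-2 ~ region, pupil)     # reciprocal-of-square transform (figure 1.5)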
This time the variance looks reasonably constant across treatments. A little
question mark over symmetry remains though. But there is not strong enough
skewness to warrant too much concern. To investigate this further, we create a
normal qqplot and a histogram of residuals. These are shown in figure 1.6 from
which we see a slight deviation from normality.
But earlier we pointed out that the F-test is not too sensitive to moderate
departures from normality. The anova table on the transformed response is ob-
tained by issuing the command, anova(lm(expenditure^-2~region,pupil))
in R, and is shown in table 1.4.
From table 1.4 we see a highly significant F-statistic. That is, strong evidence
suggests that expenditures vary from region to region.
Figure 1.3: Side-by-side boxplots for the public school expenditures data.
Source of variation df SS MS F p-value
Between 4 0.78114 0.195285 11.62 0.0000
Within 43 0.72263 0.016805
Total 47 1.50377
Table 1.4: Anova table for the expenditures per pupil per state data.
1.2.2 Multiple Comparisons
Despite all its merits, the omnibus F-test is not without deficiencies of its own.
From the previous example we concluded that expenditures varied from region to
region. For all we know, such a conclusion could have been reached because only
one of the regions had a sample mean much greater or less than the rest. Usually
we would be interested in knowing which pair of groups differ significantly. The
current section addresses this problem by introducing commonly used methods
of multiple comparisons that can be used in lieu of the omnibus F-test, or
after the F-test has rejected the null hypothesis. It was shown earlier that two
treatment means, µi and µi′, can be concluded to be different at level α if the
100(1 − α)% confidence interval for their difference,
Ȳi. − Ȳi′. ± t_{ν, 1−α/2} sp √(1/ni + 1/ni′),   (1.8)
does not contain zero, or equivalently, if
|Ȳi. − Ȳi′.| > t_{ν, 1−α/2} sp √(1/ni + 1/ni′).

Figure 1.4: Side-by-side boxplots for the log-transformed data.
If all k = I(I − 1)/2 pairwise intervals are to be considered as a family, the statement given by
equation 1.8 above does not hold with probability 1 − α; the coverage proba-
bility of the family will be lower, or equivalently, the family-wise error rate (FWER) will exceed α. For
the special case of ni = ni′ = J, one commonly used remedial measure was
developed by John Tukey. He showed that the variate,
max_{i, i′} |(Ȳi. − µi) − (Ȳi′. − µi′)| / (sp/√J),
follows the so-called Tukey studentized range distribution with parameters I and
I(J − 1), where the pooled sample variance s²p equals the mean square of error.
If we denote the 100(1 − α) percentile of this distribution by qI,I(J−1)(α), then
we have the following probability statement,
Pr( max_{i, i′} |(Ȳi. − µi) − (Ȳi′. − µi′)| ≤ q_{I, I(J−1)}(α) sp/√J ) = 1 − α,   (1.9)
from which we obtain the following family of confidence intervals for the differ-
ences µi − µi′,
Ȳi. − Ȳi′. ± q_{I, I(J−1)}(α) sp/√J,
with family-wise error rate exactly equal to α. Accordingly, any pair of treat-
ment sample means will be significantly different at level α if
|Ȳi. − Ȳi′.| > q_{I, I(J−1)}(α) sp/√J.

Figure 1.5: Side-by-side boxplots for the reciprocal-of-square-transformed data.
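In R the required percentile is available from qtukey(), and the whole procedure is packaged in TukeyHSD(); a sketch for a balanced layout such as the Drosophila data (khalema package assumed, as before):
> fit <- aov(fecundity ~ line, drosophila)
> qtukey(0.95, nmeans = 3, df = 3 * (25 - 1))   # q_{I, I(J-1)}(.05) with I = 3, J = 25
> TukeyHSD(fit, conf.level = 0.95)              # all pairwise intervals with family-wise level 0.05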
Methods to deal with unbalanced designs have also been devised. One
method that gives very good results despite its crudity is due to Bonferroni.
From the Bonferroni inequality, it can be shown that to ensure a family-wise er-
ror rate of at most α, each of the k tests of µi = µi′ should be carried
out at significance level α/k. Where N = Σ_{i=1}^{I} ni denotes the total number of
observations, we then have the following family of confidence intervals,
Ȳi. − Ȳi′. ± t^{α/2k}_{N−I} sp √(1/ni + 1/ni′),   where k = I(I − 1)/2,
which should have coverage probability of at least 1 − α. We call these the
Bonferroni confidence intervals. Let us consider an example.

Figure 1.6: A histogram and a normal quantile-quantile plot of the model residuals.
Example 3. Since the previous example dealt with unequal sample sizes, we
employ the Bonferroni method to carry out multiple comparisons. We calculated
the pooled sample variance to be s²p = MSE = .017. We also have k = 10
comparisons. According to Bonferroni's method, a pair of sample means (of
sizes ni and ni′) that differ by an absolute amount greater than
.1296 × t43(.9975) × √(1/ni + 1/ni′),
will be considered significantly different at level α = .05. Following are R
commands to compute and compactly display absolute differences of all possible
combinations of sample means in a 5 by 5 array.
> data(pupil)
> attach(pupil)
> X <- Y <- tapply(expenditure^-2,region,mean)
> diff <- abs(outer(X,Y,"-"));diff
Consider for instance, the Mountain Pacific and North Central regions. The
absolute difference of their sample means is 0.033, which is far less than
the critical value of .164. In fact, the 99.75% confidence interval of the difference
of means can be shown to be (−0.130, 0.197) or (−0.197, 0.130), depending on
how the difference is taken. This interval obviously contains zero. So the two
regions’ levels of expenditure cannot be considered to be statistically different.
Next, let us consider the Northeast and South Central regions. Their sample
means differ by an absolute amount of 0.366, which exceeds the critical value
of .176. The corresponding confidence interval is (.190, .543) or (−.543, −.190).
The last 8 comparisons can be made similarly. The reader will find that, overall,
4 pairs are significantly different, viz., Northeast and South Central, Northeast
and Southeast, South Central and Mountain Pacific, and South Central and
North Central.
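The same family of Bonferroni-adjusted comparisons can also be obtained directly in R (a sketch; it reports adjusted p-values rather than confidence intervals):
> pairwise.t.test(expenditure^-2, region, p.adjust.method = "bonferroni")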
Other commonly used multiple comparison methods for unbalanced designs
include that due to Scheffé and a variant of Tukey's method which we discussed
earlier, called the Tukey-Kramer method. Both give conservative results, as does
the Bonferroni method. Because of this "conservatism", one should consider
using Tukey's method whenever a balanced design is dealt with, which should
give shorter confidence intervals. Scheffé's confidence intervals for the difference
µi − µi′ are given by,
Ȳi. − Ȳi′. ± sp √[(I − 1) F^{α}_{I−1, N−I}] √(1/ni + 1/ni′),
where F^{α}_{I−1, N−I} denotes the 100(1 − α) percentile of the F-distribution with I − 1
and N − I degrees of freedom. On applying Scheffé's method to the expenditure
data, it is striking that we reach conclusions similar to those reached in
example 3 above under Bonferroni's method. But Scheffé's intervals are signifi-
cantly broader.
It is still not too clear whether the Tukey-Kramer method gives intervals with
coverage probability of at least 1−α or approximately 1−α. But it too gives re-
sults good enough to merit its mention. Confidence intervals under this method
are given by,
Ȳi. − Ȳi′. ± q_{I, N−I}(α) sp √[ (1/2)(1/ni + 1/ni′) ].
An abundance of other multiple comparison procedures have been proposed but
not all are good enough to enter the fray.
1.2.3 Nonparametric Methods
If the assumptions underlying the analysis of variance do not hold and no trans-
formation is available to make the F-test more applicable, nonparametric meth-
ods are often used instead. The Kruskal-Wallis test is by far the most com-
monly used nonparametric analog of the one way analysis of variance. Unlike
the F-test, it makes no distributional assumptions about the observations; for
it to be applicable, the observations need only be independent.
In this method, we denote by Rij, the rank of yij in the combined sample of all
N = Σ_{i=1}^{I} ni observations. Then define
R̄i. = Σ_{j=1}^{ni} Rij / ni,
and
R̄.. = Σ_{i=1}^{I} Σ_{j=1}^{ni} Rij / N = (N + 1)/2,
as the average rank score of the ith sample and the grand rank score, respec-
tively. Finally we compute the following statistic,
K = [12 / N(N + 1)] Σ_{i=1}^{I} ni (R̄i. − R̄..)² = [12 / N(N + 1)] Σ_{i=1}^{I} ni R̄²i. − 3(N + 1),
which has been shown to have a limiting χ² distribution with I − 1 degrees of
freedom under the null hypothesis of equal location parameters under each of
the I groups. The null hypothesis is rejected for large values of K.
Just as in the two sample case, tied observations will be assigned average ranks.
The K-statistic defined above should perform reasonably well if there are not
too many ties. Otherwise some correction factor will have to be applied.
Example 4. Table 1.5 presents ranks of the expenditure data from example 2.
From these data we calculate a highly significant value of K = 21.83. The R
command to compute the p-value is, 1-pchisq(21.83,4).
It is helpful to recognize the sum of squares occurring in the expression for the K-
statistic as the between-groups sum of squares in an analysis of variance on the
ranks. The value of K can then easily be calculated by performing the usual
analysis of variance on the ranks and multiplying the resulting between-groups
sum of squares by 12/[N(N + 1)].
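A short sketch of this route in R, assuming the pupil data-frame is attached as in example 3:
> r <- rank(expenditure)                      # ranks in the combined sample (ties get average ranks)
> ssb <- anova(lm(r ~ region))["region", "Sum Sq"]
> N <- length(r)
> 12 * ssb / (N * (N + 1))                    # the K-statistic (without a correction for ties)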
The Kruskal-Wallis test has an implementation in R. However, it will usually
give a different value for K than that obtained from using the foregoing ex-
pression. This is because in calculating the statistic, R applies a correction for
ties, which brings the null distribution of the K-statistic closer to χ². Here are the R
commands and output for the previous example.
> kruskal.test(expenditure,region,data=pupil)
Kruskal-Wallis rank sum test
data: expenditure and region
Kruskal-Wallis chi-squared = 24.0387, df = 4, p-value = 7.846e-05
Northeast   Southeast   South Central   North Central   Mountain Pacific
18 33 6 36.5 39
13.5 20 1 40 38
47 9.5 12 21 31.5
46 9.5 2 16 35
24 8 3.5 42.5 23
29 26 3.5 15 31.5
44 8 6 34 30
42.5 13.5 22 11
41 17 27 25
45 36.5 19
28 6
Table 1.5: Ranks of the public school expenditures data.
Since the Kruskal-Wallis test works with ranks rather than actual numerical
values of the observations, it greatly reduces the effect of outliers. In
practice, one will usually resort to this test if there are too many outliers in the
data, if normal theory is not applicable, or if the data are already in the form
of ranks.
1.3 Two-way Classification
1.3.1 Introduction
Up to this point we have assumed, at least tacitly, that the experiments we
deal with yield observations that can only be grouped according to one factor.
This need not be the case; several factors can be considered simultaneously.
For example, consider an experiment in which the amount of milk produced by
a hundred cows is studied. It is natural to consider breed and age-group as
possible factors in such a study. There could also be a third, and even a fourth
factor, etc., all of which are considered simultaneously. We introduce herein
methods of analyzing such experimental designs. We will only treat the case of
two factors, in which case the design is called two-way analysis of variance, but
the reader should, however, be aware that the order of classification is arbitrary.
In the general case we speak of N-way analysis of variance.
For ease of reference we shall call the factors with which we deal factor
A and factor B. It is also common in the literature to call these the row and column
factors. It is natural then to speak of a row or a column effect according
as the effect due to factor A or that due to factor B is referred to. Row and
column effects are also referred to as main effects to distinguish them from the
so-called interaction effect. We explain what interaction means shortly.
1.3.2 Normal Theory
The analysis in the two-way classification departs slightly from that in the one-
way classification as more variables come into play. In particular, the extra factor
occasions the need to extend our notation from the previous sections. If we assume that
factor A has I levels and factor B has J levels and that in the cell determined
by level i of factor A and level j of factor B there are K observations (or repli-
cations), then we use yijk to symbolize the kth observation in such a cell.
If each of factors A and B contributes to the response variable an amount inde-
pendent of that contributed by the other, the model is termed an additive
model and is formulated,
Yijk = µ + αi + βj + εijk,   (1.10)
with identification conditions,
Σ_{i=1}^{I} αi = 0   and   Σ_{j=1}^{J} βj = 0,
where i = 1, . . . , I and j = 1, . . . , J. Just as before, the random errors, εijk, are
assumed to be independently and identically normally distributed about zero
mean with constant variance σ².
If the contribution to the response variable by factor A depends on the
level of factor B, or conversely, then the simple additive model is not totally
representative of the design and a phenomenon called interaction is said to
exist. We introduce another term, (αβ)ij, that will represent this interaction
effect. Hence, for example, (αβ)23 will be negative or positive according as factors
A and B have opposing or synergistic effects under level 2 of factor A and level
3 of factor B. The full model which takes interaction into account is given by,
Yijk = µ + αi + βj + (αβ)ij + εijk,   (1.11)
with identification conditions,
Σ_{i=1}^{I} αi = 0,   Σ_{j=1}^{J} βj = 0,   and   Σ_{i=1}^{I} (αβ)ij = Σ_{j=1}^{J} (αβ)ij = 0,
where i = 1, . . . , I and j = 1, . . . , J.
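In R such a model is fitted with a crossed formula; a minimal sketch, with hypothetical variables y, A, and B (the latter two coded as factors):
> m <- lm(y ~ A * B)   # expands to A + B + A:B, i.e. both main effects plus the interaction
> anova(m)             # F-tests for A, B, and A:B, as derived in theorem 3 below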
In addition to testing the significance of the main effects in two-way analysis
of variance (or any factorial anova for that matter), there is need to also test
for interaction effects. We thus have a total of three null hypotheses to test.
In dealing with several null hypotheses we will have reason to vary our usual
notation. Specifically, we superscript each null hypothesis with a naught, so that,
for instance, the null hypothesis concerning factor A is not confused with an
alternative hypothesis. That is, the no main effects null hypotheses are denoted,
H⁰_A : αi = 0 ∀ i ∈ {1, . . . , I},
and
H⁰_B : βj = 0 ∀ j ∈ {1, . . . , J},
and the no interaction effect null hypothesis is written,
H⁰_I : (αβ)ij = 0 for all combinations of i and j.
In anticipation of their need ahead, we give expressions for the sums of
squares, which are a little more involved than those in the one-way layout.
Also, some identities and statistics other than the sums of squares, which will
provide tests of the hypotheses stated above, will be derived just as we did in
the one-way layout.
The next theorem constructs generalized likelihood ratio tests for H⁰_A, H⁰_B,
and H⁰_I.
Theorem 3. The generalized likelihood ratio test statistics for testing the null
hypotheses of no main and interaction effects are given by:
1. FA = MSA / MSE,
   where H⁰_A is rejected at significance level α if FA > F^{1−α}_{I−1, IJ(K−1)},
2. FB = MSB / MSE,
   where H⁰_B is rejected at significance level α if FB > F^{1−α}_{J−1, IJ(K−1)}, and
3. FI = MSI / MSE,
   where H⁰_I is rejected at significance level α if FI > F^{1−α}_{(I−1)(J−1), IJ(K−1)}.
Proof. Since a complete proof of each part of the theorem can easily span two
and a half pages, we will prove the first part and leave the last two to the reader.
We have for i = 1, . . . , I, j = 1, . . . , J, and k = 1, . . . , K,
f(yijk) = [1 / (σ√(2π))] exp{ −(1/2) [(yijk − µ − αi − βj − (αβ)ij)/σ]² }.
Thus the likelihood is given by
L(µ, αi, βj, (αβ)ij, σ² | y) = (2πσ²)^{−IJK/2} ×
exp{ −(1/2) Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} [(Yijk − µ − αi − βj − (αβ)ij)/σ]² },
from the assumption of independence. For ease of maximization we use the
log-likelihood,
l = log L = −(IJK/2) log(2πσ²) − [1/(2σ²)] Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} (Yijk − µ − αi − βj − (αβ)ij)².
The parameter space under the unrestricted model, in which all effects are free
to be non-zero, is denoted,
Ω = { (µ, αi, βj, (αβ)ij, σ²) | −∞ < µ, αi, βj, (αβ)ij < ∞, σ² > 0 }.
Proceeding to find the ML estimates under Ω we have,
∂l/∂µ = (1/σ²) Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} (Yijk − µ − αi − βj − (αβ)ij) = 0,
which implies that µ̂_Ω = Ȳ... . Similarly, it is easily verified that
∂l/∂αi = (1/σ²) Σ_{j=1}^{J} Σ_{k=1}^{K} (Yijk − µ − αi − βj − (αβ)ij) = 0
implies α̂i,Ω = Ȳi.. − Ȳ... ;
∂l/∂βj = (1/σ²) Σ_{i=1}^{I} Σ_{k=1}^{K} (Yijk − µ − αi − βj − (αβ)ij) = 0
yields β̂j,Ω = Ȳ.j. − Ȳ... . Likewise,
∂l/∂(αβ)ij = (1/σ²) Σ_{k=1}^{K} (Yijk − µ − αi − βj − (αβ)ij) = 0
implies that the estimate of (αβ)ij under Ω is Ȳij. − Ȳi.. − Ȳ.j. + Ȳ... . Finally
∂l/∂σ² = −IJK/(2σ²) + [1/(2σ⁴)] Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} (Yijk − µ − αi − βj − (αβ)ij)² = 0
yields
σ̂²_Ω = N⁻¹ Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} (Yijk − Ȳij.)².
These give an expression for the supremum of the likelihood under Ω, namely
sup_Ω L(µ, αi, βj, (αβ)ij, σ²) = exp(−IJK/2) · [ (2π/IJK) Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} (Yijk − Ȳij.)² ]^{−IJK/2}.
Under H⁰_A, the parameter space is given by
ωA = { (µ, βj, σ²) | −∞ < µ, βj < ∞, σ² > 0 }.
Similar arguments give the following expression for the supremum of the likeli-
hood,
sup_{ωA} L(µ, βj, σ² | y) = exp(−IJK/2) · [ (2π/IJK) Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} (Yijk − Ȳ.j.)² ]^{−IJK/2}.
Hence the generalized likelihood ratio is given by,
ΛA = (sup_{ωA} L) / (sup_Ω L)
   = [ Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} (Yijk − Ȳ.j.)² / Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} (Yijk − Ȳij.)² ]^{−IJK/2}
   = [ ( Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} (Yijk − Ȳij.)² + JK Σ_{i=1}^{I} (Ȳi.. − Ȳ...)² ) / Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} (Yijk − Ȳij.)² ]^{−IJK/2}.
The generalized likelihood ratio test then rejects H⁰_A for large values of SSA/SSE.
That is, we reject H⁰_A if
SSA / SSE > k,
or equivalently, if
FA = [SSA/(I − 1)] / [SSE/IJ(K − 1)] = MSA / MSE > k · IJ(K − 1)/(I − 1) = c,
where c is chosen such that Pr(FA > c | H⁰_A) = α. From the distribution of
this F-statistic it is immediately evident that c = F^{1−α}_{I−1, IJ(K−1)}.
This completes the proof of the first part of the theorem. By
noting that similar restrictions have been imposed on the βj as on the αi,
one will see that it is not necessary to construct the proof of part 2 ab initio.
One need only permute some subscripts and use the appropriate degrees
of freedom to complete the proof. The reader who feels unsated with the
proposed logic should convince himself by going through all the steps. The proof
of the last part can be completed similarly to the proof just presented and is
left as an exercise.
Example 5. In an experiment to test 3 types of adhesive, 45 glass-to-glass
specimens were set up in 3 different types of assemblies and tested for tensile
strength. The types of adhesive were 047, 00T, and 001, and the types of assem-
blies were cross-lap, square-center, and round-center. Each of the 45 entries of
table 1.6 represents the recorded tensile strength of the glass-to-glass assemblies
[data from Johnson and Leone [5]]. These data can be found in dataset
glass in this book's package.
Glass-Glass Assembly
Adhesive   Cross-Lap   Square-Center   Round-Center
047 16 17 13
14 23 19
19 20 14
18 16 17
19 14 21
00T 23 24 24
18 20 21
21 12 25
20 21 29
21 17 24
001 27 14 17
28 26 18
14 14 13
26 28 16
17 27 18
Table 1.6: Bond strength of the glass-glass assemblies.
Figure 1.7 shows slight symmetry, no outliers, and not enough violation of the
constant variance assumption to warrant suspicion. The same cannot quite be
said of figure 1.8, which calls the constant variance assumption into question.
At least by now we know the risk entailed by blatantly ignoring such a clear
indication of heteroscedasticity. R commands to view both figures at the same
time follow.
> data(glass)
> par(mfrow=c(1,2))
> plot(strength~adhesive+assembly,glass)
Figure 1.7: Boxplots for the glass data plotted according to assembly type.
28 Chapter 1. Analysis of Variance
047 00T 001
12
14
16
18
20
22
24
26
28
Response
Figure 1.8: Boxplots for the glass data plotted according to adhesive type.
Fitting a model to the raw (i.e. untransformed) data gives significant results for
adhesives and interactions. But we might want to think twice before concluding
that these factors are indeed significant. To this end, we seek a transformation to
stabilize the variance. The square-root transformation seems to work reasonably
well for us, but it is seen to greatly upset normality. Boxplots for the transformed
data are not shown for reasons of space. A histogram and qqplot of residuals
are shown in figure 1.9.
The histogram shows a slightly ragged character, a long and fat left tail, and
lack of symmetry. The qqplot also shows gross departure from linearity. The
following set of commands will create figure 1.9.
> m2 <- lm(strength^.5~adhesive*assembly,glass)
> r <- m2$resid
> par(mfrow=c(1,2))
> qqnorm(r,ylab="Ordered Residuals",main="");qqline(r,col=2)
> hist(r,xlab="Residuals",main="")
The Shapiro-Wilk test of normality shows that we have not lost much; it gives
a p-value of 0.084, while the untransformed variable has a (slightly higher)
p-value of 0.158. We also know that the F-test is robust against departures
from normality. We therefore accept the square-root transformation as a good
compromise. Table 1.7 summarizes the results of fitting a linear model to the
transformed variable. The p-values change slightly, but the adhesives are still
significant. The interactions, on the other hand, fall slightly short of the 5%
significance level. We conclude that the type of adhesive influences bond
strength while the type of assembly does not.

Figure 1.9: A histogram and a normal quantile-quantile plot of the model residuals.
Source of variation df SS MS F p-value
Adhesive 2 1.5682 0.78409 3.4548 0.0424
Assembly 2 0.0749 0.03745 0.1650 0.84854
Interaction 4 2.3003 0.57509 2.5339 0.05699
Within 36 8.1704 0.22696
Total 44 12.1138
Table 1.7: Anova table for the glass to glass assembly data.
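Table 1.7 can be reproduced from the model fitted above (assuming m2 is still in the workspace):
> anova(m2)   # anova table for lm(strength^.5 ~ adhesive*assembly, glass)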
1.3.3 Multiple Comparisons
1.3.4 Randomized Complete Blocks
Randomized blocks are a form of unreplicated two-way analysis of variance in
which the two factors forming the design are the treatment and another factor
known to have an effect on the response under investigation. This second factor
is called a block. Each block is assigned all treatments at random in such a
way that within each block, each treatment appears once and only once. A
block effect is rarely tested in practice; of primary interest is the treatment
effect since the blocks are, by assumption, already expected to have an effect.
Randomized blocks were first developed for agricultural experiments and much
of the terminology has remained unchanged. The term “block” was traditionally
understood to refer to a block of land, but with the wide appreciation and
popularity of randomized complete blocks over the years, it is now used to refer
to any factor that plays an analogous role in more recent adaptations of such
experiments.
In a study to compare the effects of I fertilizers (or treatments in the more
general case) on the yield, J blocks of land are subdivided into I homogeneous
plots and the fertilizers are allocated at random to these plots. This is a classical
problem for which the method of randomized complete blocks was developed.
Other uses of this design can be found in several other fields.
The statistical model for the randomized complete block design is,
Yij = µ + αi + βj + εij,
where
Σ_{i=1}^{I} αi = Σ_{j=1}^{J} βj = 0.
The sums of squares are the same as those under the two-way additive model
but with K = 1. The null hypotheses of no treatment and no block effects
are,
H⁰_A : αi = 0 ∀ i ∈ {1, . . . , I},
and
H⁰_B : βj = 0 ∀ j ∈ {1, . . . , J},
respectively. But remember that only the former is of interest. In the fertilizer
experiment presented above, an experimenter will hardly be as interested in
whether block A was the most productive as he would be in whether fertilizer
II yielded the most crop.
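A minimal sketch of the analysis in R, with hypothetical vectors yield, fertilizer, and block (the last two coded as factors):
> m <- lm(yield ~ fertilizer + block)   # additive model; one observation per fertilizer-block cell
> anova(m)                              # the fertilizer row gives the F-test of the treatment effect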
Theorem 4. The generalized likelihood ratio test statistics for testing the null
hypotheses of no treatment and block effects are given by:
1. FA = MSA / MSI,
   where H⁰_A is rejected at significance level α if FA > F^{1−α}_{I−1, (I−1)(J−1)}, and
2. FB = MSB / MSI,
   where H⁰_B is rejected at significance level α if FB > F^{1−α}_{J−1, (I−1)(J−1)}.
Proof. The details of this proof are left to the reader.
1.4 Latin Squares
Latin squares arise as natural extensions of randomized complete blocks—they
are a form of three-way analysis of variance without replication. If heterogeneity
is known to be two-dimensional in some investigation, then two blocking factors
can be incorporated in an unreplicated design, effectively forming a square with
N row blocks and N column blocks. We then speak of a row effect, a column
effect, and a treatment effect. But as in randomized blocks, it is only the latter
that will be of concern to the investigator. These designs have found wide
application in industry because of their optimality and impressive performance.
A prototype of a Latin square design is an experiment in which a fertilizer
(i.e. the treatment) is to be tested at N levels on a field that is known to vary
in intrinsic fertility, say, in a north-south direction and in soil depth, say, in
an east-west direction. The field is then subdivided to form an N × N array
of subplots and the fertilizers are randomly allocated to the subplots in both
directions in such a manner that all N levels of the treatment occur once and
only once in either direction.
Let τi denote the differential effect of the ith row block, βj the differential effect
of the jth column block, and γk the differential effect of the kth treatment.
Then the statistical model is
Yijk = µ + τi + βj + γk + εijk,   (1.12)
where
Σ_{i=1}^{N} τi = Σ_{j=1}^{N} βj = Σ_{k=1}^{N} γk = 0.
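A Latin square is analysed in R like any other unreplicated additive layout; a minimal sketch with hypothetical factors row, column, and treatment (each with N levels, one observation per cell of the square):
> m <- lm(y ~ row + column + treatment)   # additive model of equation 1.12
> anova(m)                                # residual degrees of freedom (N-1)(N-2), as in theorem 5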
Theorem 5. If we assume that the random errors, εijk ∼ iid N(0, σ²), for i =
1, . . . , N, j = 1, . . . , N, and k = 1, . . . , N, then we have the following results:
1. SST/σ² = Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{k=1}^{N} (Yijk − Ȳ...)² / σ² ∼ χ²_{N²−1},
2. SSA/σ² = N² Σ_{i=1}^{N} (Ȳi.. − Ȳ...)² / σ² ∼ χ²_{N−1},
3. SSB/σ² = N² Σ_{j=1}^{N} (Ȳ.j. − Ȳ...)² / σ² ∼ χ²_{N−1},
4. SSC/σ² = N² Σ_{k=1}^{N} (Ȳ..k − Ȳ...)² / σ² ∼ χ²_{N−1},
5. SSE/σ² = Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{k=1}^{N} (Yijk − Ȳi.. − Ȳ.j. − Ȳ..k + 2Ȳ...)² / σ² ∼ χ²_{(N−1)(N−2)}, and
6. the above variates are mutually independent.
Proof. We present the proof shortly....
Theorem 6. The generalized likelihood ratio test statistics for testing the null
hypotheses of no row, no column and no treatment effects are given by:
1. FA = MSA / MSE,
   where H⁰_A is rejected at significance level α if FA > F^{1−α}_{N−1, (N−1)(N−2)},
2. FB = MSB / MSE,
   where H⁰_B is rejected at significance level α if FB > F^{1−α}_{N−1, (N−1)(N−2)}, and
3. FC = MSC / MSE,
   where H⁰_C is rejected at significance level α if FC > F^{1−α}_{N−1, (N−1)(N−2)}.
Proof. From the statistical model given in equation 1.12 we have,
f(yijk) = [1 / (σ√(2π))] exp{ −(1/2) [(yijk − µ − τi − βj − γk)/σ]² }.
The likelihood function takes the form,
L(µ, τi, βj, γk, σ² | y) = (2πσ²)^{−N³/2} exp{ −(1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{k=1}^{N} [(Yijk − µ − τi − βj − γk)/σ]² }.
Then we have,
l = log L = −(N³/2) log(2πσ²) − [1/(2σ²)] Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{k=1}^{N} (Yijk − µ − τi − βj − γk)².
Under the unrestricted model we have the following parameter space,
Ω = {(µ, τi, βj, γk, σ²) | −∞ < µ, τi, βj, γk < ∞, σ² > 0}.
The maximum likelihood estimates are obtained in the usual way. From
∂l/∂µ = (1/σ²) Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{k=1}^{N} (Yijk − µ − τi − βj − γk) = 0,
we have µ̂_Ω = Ȳ... . Similarly,
∂l/∂τi = (1/σ²) Σ_{j=1}^{N} Σ_{k=1}^{N} (Yijk − µ − τi − βj − γk) = 0,
∂l/∂βj = (1/σ²) Σ_{i=1}^{N} Σ_{k=1}^{N} (Yijk − µ − τi − βj − γk) = 0,
∂l/∂γk = (1/σ²) Σ_{i=1}^{N} Σ_{j=1}^{N} (Yijk − µ − τi − βj − γk) = 0,
which yield τ̂i,Ω = Ȳi.. − Ȳ..., β̂j,Ω = Ȳ.j. − Ȳ..., and γ̂k,Ω = Ȳ..k − Ȳ..., respectively.
Finally,
∂l/∂σ² = −N³/(2σ²) + [1/(2σ⁴)] Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{k=1}^{N} (Yijk − µ − τi − βj − γk)² = 0,
gives,
σ̂²_Ω = N⁻³ Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{k=1}^{N} (Yijk − Ȳi.. − Ȳ.j. − Ȳ..k + 2Ȳ...)².
Putting everything together we obtain,
sup_Ω L = exp(−N³/2) · [ (2π/N³) Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{k=1}^{N} (Yijk − Ȳi.. − Ȳ.j. − Ȳ..k + 2Ȳ...)² ]^{−N³/2}.
The parameter space under the first null hypothesis, H⁰_A, is
ωA = {(µ, βj, γk, σ²) | −∞ < µ, βj, γk < ∞, σ² > 0}.
Similar arguments to those above give the following supremum under ωA,
sup_{ωA} L = exp(−N³/2) · [ (2π/N³) Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{k=1}^{N} (Yijk − Ȳ.j. − Ȳ..k + Ȳ...)² ]^{−N³/2}.
ΛA = (sup_{ωA} L) / (sup_Ω L)
   = [ Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{k=1}^{N} (Yijk − Ȳ.j. − Ȳ..k + Ȳ...)² / Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{k=1}^{N} (Yijk − Ȳi.. − Ȳ.j. − Ȳ..k + 2Ȳ...)² ]^{−N³/2}.
It can be shown that the numerator sum of squares can be decomposed to give,
[ ( Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{k=1}^{N} (Ȳi.. − Ȳ...)² + Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{k=1}^{N} (Yijk − Ȳi.. − Ȳ.j. − Ȳ..k + 2Ȳ...)² ) / Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{k=1}^{N} (Yijk − Ȳi.. − Ȳ.j. − Ȳ..k + 2Ȳ...)² ]^{−N³/2}.
The term in brackets simplifies to 1+SSA/SSE, hence the generalized likelihood
ratio test rejects H⁰_A for large values of SSA/SSE, or equivalently, if
FA = [SSA/(N − 1)] / [SSE/(N − 1)(N − 2)] = MSA / MSE > c,
where it is easily verified that
c = F^{1−α}_{N−1, (N−1)(N−2)}.
Example 6.
1.5 Summary and Addenda
Source of variation df SS MS F p-value
Carbon Grade 4 1787.4 446.8 2.3894 0.10888
pH 4 14165.4 3541.3 18.9370 0.00004
Quantity 4 3194.6 798.6 4.2706 0.02233
Residuals 12 2244.1 187.0
Total 24 21391.5
Table 1.8: Anova table for the purification process data.
1.6 Exercises
1. Show that...Just a template
2. Given that
Yijk = µ + αi + βj + (αβ)ij + εijk,
show that...
BIBLIOGRAPHY
[1] Sokal, R. R., and Rohlf, F. J. (1968). Biometry: The principles and practice
of statistics in biological research. Freeman
[2] Casella, G. and Berger, R. L. (1992). Statistical Inference. Duxbury
[3] Miller, R. G., Jr. (1997). Beyond Anova: Basics of applied statistics. Chap-
man & Hall
[4] Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods. Iowa State
[5] Johnson, N. L. and Leone, F. C. (1964). Statistics and Experimental Design:
in Engineering and the Physical Sciences. volume II. Wiley
[6] R Development Core Team (2012). R: A language and environment for statis-
tical computing. R Foundation for Statistical Computing, Vienna, Austria.
ISBN 3-900051-07-0, URL http://www.R-project.org/.
ANOVA Lec 1 (alternate).pptxANOVA Lec 1 (alternate).pptx
ANOVA Lec 1 (alternate).pptxMohsinIqbalQazi
 
Descriptive Statistics Formula Sheet Sample Populatio.docx
Descriptive Statistics Formula Sheet    Sample Populatio.docxDescriptive Statistics Formula Sheet    Sample Populatio.docx
Descriptive Statistics Formula Sheet Sample Populatio.docxsimonithomas47935
 
Introduction and crd
Introduction and crdIntroduction and crd
Introduction and crdRione Drevale
 
11 T(EA) FOR TWO TESTS BETWEEN THE MEANS OF DIFFERENT GROUPS11 .docx
11 T(EA) FOR TWO TESTS BETWEEN THE MEANS OF DIFFERENT GROUPS11 .docx11 T(EA) FOR TWO TESTS BETWEEN THE MEANS OF DIFFERENT GROUPS11 .docx
11 T(EA) FOR TWO TESTS BETWEEN THE MEANS OF DIFFERENT GROUPS11 .docxnovabroom
 
11 T(EA) FOR TWO TESTS BETWEEN THE MEANS OF DIFFERENT GROUPS11 .docx
11 T(EA) FOR TWO TESTS BETWEEN THE MEANS OF DIFFERENT GROUPS11 .docx11 T(EA) FOR TWO TESTS BETWEEN THE MEANS OF DIFFERENT GROUPS11 .docx
11 T(EA) FOR TWO TESTS BETWEEN THE MEANS OF DIFFERENT GROUPS11 .docxhyacinthshackley2629
 

Similar to AnalysisOfVariance (20)

Statistical analysis by iswar
Statistical analysis by iswarStatistical analysis by iswar
Statistical analysis by iswar
 
One Way ANOVA.pdf
One Way ANOVA.pdfOne Way ANOVA.pdf
One Way ANOVA.pdf
 
95720357 a-design-of-experiments
95720357 a-design-of-experiments95720357 a-design-of-experiments
95720357 a-design-of-experiments
 
Chapter 6 simple regression and correlation
Chapter 6 simple regression and correlationChapter 6 simple regression and correlation
Chapter 6 simple regression and correlation
 
A Moment Inequality for Overall Decreasing Life Class of Life Distributions w...
A Moment Inequality for Overall Decreasing Life Class of Life Distributions w...A Moment Inequality for Overall Decreasing Life Class of Life Distributions w...
A Moment Inequality for Overall Decreasing Life Class of Life Distributions w...
 
slides Testing of hypothesis.pptx
slides Testing  of  hypothesis.pptxslides Testing  of  hypothesis.pptx
slides Testing of hypothesis.pptx
 
Lecture-6 (t-test and one way ANOVA.ppt
Lecture-6 (t-test and one way ANOVA.pptLecture-6 (t-test and one way ANOVA.ppt
Lecture-6 (t-test and one way ANOVA.ppt
 
2.AA.anova sesion applied biostat III (2).ppt
2.AA.anova sesion applied biostat III (2).ppt2.AA.anova sesion applied biostat III (2).ppt
2.AA.anova sesion applied biostat III (2).ppt
 
ANOVA.pdf
ANOVA.pdfANOVA.pdf
ANOVA.pdf
 
Hmisiri nonparametrics book
Hmisiri nonparametrics bookHmisiri nonparametrics book
Hmisiri nonparametrics book
 
Accuracy Study On Numerical Solutions Of Initial Value Problems (IVP) In Ordi...
Accuracy Study On Numerical Solutions Of Initial Value Problems (IVP) In Ordi...Accuracy Study On Numerical Solutions Of Initial Value Problems (IVP) In Ordi...
Accuracy Study On Numerical Solutions Of Initial Value Problems (IVP) In Ordi...
 
Chapter 11 Chi-Square Tests and ANOVA 359 Chapter .docx
Chapter 11 Chi-Square Tests and ANOVA  359 Chapter .docxChapter 11 Chi-Square Tests and ANOVA  359 Chapter .docx
Chapter 11 Chi-Square Tests and ANOVA 359 Chapter .docx
 
Stat sample test ch 12
Stat sample test ch 12Stat sample test ch 12
Stat sample test ch 12
 
Design of experiments(
Design of experiments(Design of experiments(
Design of experiments(
 
ANOVA Lec 1 (alternate).pptx
ANOVA Lec 1 (alternate).pptxANOVA Lec 1 (alternate).pptx
ANOVA Lec 1 (alternate).pptx
 
Chi-Square test.pptx
Chi-Square test.pptxChi-Square test.pptx
Chi-Square test.pptx
 
Descriptive Statistics Formula Sheet Sample Populatio.docx
Descriptive Statistics Formula Sheet    Sample Populatio.docxDescriptive Statistics Formula Sheet    Sample Populatio.docx
Descriptive Statistics Formula Sheet Sample Populatio.docx
 
Introduction and crd
Introduction and crdIntroduction and crd
Introduction and crd
 
11 T(EA) FOR TWO TESTS BETWEEN THE MEANS OF DIFFERENT GROUPS11 .docx
11 T(EA) FOR TWO TESTS BETWEEN THE MEANS OF DIFFERENT GROUPS11 .docx11 T(EA) FOR TWO TESTS BETWEEN THE MEANS OF DIFFERENT GROUPS11 .docx
11 T(EA) FOR TWO TESTS BETWEEN THE MEANS OF DIFFERENT GROUPS11 .docx
 
11 T(EA) FOR TWO TESTS BETWEEN THE MEANS OF DIFFERENT GROUPS11 .docx
11 T(EA) FOR TWO TESTS BETWEEN THE MEANS OF DIFFERENT GROUPS11 .docx11 T(EA) FOR TWO TESTS BETWEEN THE MEANS OF DIFFERENT GROUPS11 .docx
11 T(EA) FOR TWO TESTS BETWEEN THE MEANS OF DIFFERENT GROUPS11 .docx
 

AnalysisOfVariance

  • 1. The Analysis of Variance Tokelo Khalema University of the Free State Bloemfontein November 02, 2012
  • 2. 2
  • 3. CHAPTER 1 ANALYSIS OF VARIANCE 1.1 Introduction The analysis of variance (commonly abbreviated as ANOVA or AOV) is a method of investigating the variability of means between subsamples resulting from some experiment. In its most basic form it is a multi-sample generaliza- tion of the t-test but more complex ANOVA models depart greatly from the two-sample t-test. Analysis of variance was first introduced in the context of agricultural investigations by Sir Ronald A. Fisher (1890–1962), but is now commonly used in almost all areas involving scientific research. 1.2 One-way Classification 1.2.1 Normal Theory Suppose we carry out an experiment on N homogeneous experimental units and observe the following measurements, y1, y2, . . . , yN . Suppose also that of the N observations, J were randomly selected to be taken under the same experimen- tal conditions and that overall, we had I different experimental conditions. We shall refer to these experimental conditions as treatments. The treatments could be any categorical or quantitative variable — species, racial group, level of caloric intake, dietary regime, blood group, genotype, etc. We therefore see that we could subdivide the N variates into I groups (or treatments), under each of which there are J observations. Such an experimental design in which the number of observations or measurements per treatment are the same is termed a balanced design. A design need not be balanced. Denote the jth observation under the ith treatment by yij where i = 1, . . . , I and j = 1, . . . , J. Further, assume that Yij ∼ iidN(µi, σ2 ) for all i and j. It 1
  • 4. 2 Chapter 1. Analysis of Variance might be helpful to visualize the experimental points as forming an array whose ith column represents the ith treatment and jth row represents the jth ob- servations under all the treatments. Ordinarily, measurements taken on several homogeneous experimental units under the same experimental conditions should differ slightly due to some unexplained measurement errors. We assume these measurement errors to be independent and normally distributed with mean zero and constant but unknown variance, σ2 < ∞. That is ij ∼ iidN(0, σ2 ) for all i and j. The assumption of zero mean is natural rather than arbitrary because, on average, any deviation from the mean in any population should average out to zero. In the analysis of variance we are interested in the overall variability of the µi about the grand population mean µ. This implies a fixed differential effect αi = µi − µ (or deviation from the grand mean), due to treatment i. The above arguments and assumptions lead us to the following linear model, Yij = µ + αi + ij = µi + ij (1.1) for i = 1, . . . , I and j = 1, . . . , J, which describes the underlying data-generating process. It is easy to show that if αi, as in equation 1.1, is to be interpreted as the differential effect of the ith treatment, then we have the following constraint, I i=1 αi = 0 . (1.2) The constraint in equation 1.2 above is termed a model identification con- dition. Without it the model we just formulated is said to be unidentifiable 1 . Different interpretations of the αi lead to different constraints and different model parametrizations. In the sequel we shall stick to the parametrization above. Equation 1.1 is usually referred to as the one-way fixed effects model, or Model I. One-way because the data are classified according to one factor, viz., treatment and the term “fixed” arises from the fact that we have assumed the αi to be fixed instead of random, in which case we would have had a random effects model or Model II. Later we introduce Model II and demonstrate how it can be used in practice. The null hypothesis in the analysis of variance model given in equation 1.1 is that the treatment means are all equal; the alternative is that at least one pair of means is different. That is H0 : µi = µj , ∀i = j (1.3) HA : µk = µl for at least one combination of values k = l. But since µi = µ + αi, we see that µi = µj where i = j implies that αi = 0 for all i. This gives an equivalent form of the hypotheses given above, namely H0 : αi = 0, ∀i ∈ {1, . . . , I} (1.4) HA : αi = 0 for at least one i ∈ {1, . . . , I}. 1Identifiability is a desirable property of models. A model is called identifiable if all its parameters can be uniquely estimated and inferences can be drawn from it.
  • 5. 1.2. ONE-WAY CLASSIFICATION 3 This formulation is more commonly met with and is arguably more intuitive —in words the null hypothesis says that there are no differential effects due to treatments. Or simply, that there are no treatment effects. So any apparent differences in sample means is not attributable to the treatments but to random selection. The alternative hypothesis says that there is at least one treatment with a differential effect—the negation of the null hypothesis. Before we present the mathematical derivations of the analysis of variance, let us consider one practical example of an experiment which should be recognized as a numerical example of the more general design outlined above. This ex- ample was taken from a classic reference by Sokal and Rohlf (1968) [1]. Sokal tested 25 females of each of three lines of Drosophila for significant differences in fecundity among the three lines. The first of these lines was was selected for resistance against DDT, the second for susceptibility to DDT, and the third was a nonselected control strain. This is a balanced design with I = 3 treatments, J = 25 observations per treatment, and should also be recognized as Model I since the exact nature of the treatments was determined by the experimenter. The data are summarized in table 1.1 in which the response is the number of eggs laid per female per day for the first 14 days of life. We might want to compute the treatment sample means as a preliminary check on the heterogeneity among group means. Dataset drosophila contains the data presented in table 1.1. In R we issue the following commands: > library(khalema) > data(drosophila) > attach(drosophila) > tapply(fecundity,line,mean) 1 2 3 25.256 23.628 33.372 The first three commands should be old news by now. The first loads pack- age khalema, the second accesses dataset drosophila and, the third makes the variables in drosophila available on the search path. The final command com- putes the sample mean under each of the 3 treatments. Note that the mean under the nonselected treatment is appreciably higher than those under the other treatments. Of interest in the analysis of variance is whether this difference is statistically significant or just a result of noise in the data. In deriving a test to investigate the significance of group sample mean dif- ferences we will need some statistics and their corresponding sampling distri- butions. Among these are the overall average and the average under the ith treatment denoted, ¯Y.. = I i=1 J j=1 Yij/N,
  • 6. 4 Chapter 1. Analysis of Variance Resistant Susceptible Nonselected 12.8 38.4 35.4 21.6 32.9 27.4 14.8 48.5 19.3 23.1 20.9 41.8 34.6 11.6 20.3 19.7 22.3 37.6 22.6 30.2 36.9 29.6 33.4 37.3 16.4 26.7 28.2 20.3 39.0 23.4 29.3 12.8 33.7 14.9 14.6 29.2 27.3 12.2 41.7 22.4 23.1 22.6 27.5 29.4 40.4 20.3 16.0 34.4 38.7 20.1 30.4 26.4 23.3 14.9 23.7 22.9 51.8 26.1 22.5 33.8 29.5 15.1 37.9 38.6 31.0 29.5 44.4 16.9 42.4 23.2 16.1 36.6 23.6 10.8 47.4 Table 1.1: Number of eggs laid per female per day for the 1st 14 days of life. and ¯Yi. = J j=1 Yij/J, respectively. Recall that N = IJ is the total number of observations. We define the following statistic which should be interpreted as summarizing the total variability in the sample, SST = I i=1 J j=1 (Yij − ¯Y..)2 . This is called the total sum of squares. But the total variability in a sample can be partitioned into variability within treatments and variability between treatments. In fact, it can easily be shown that SST = SSB + SSW (1.5)
  • 7. 1.2. ONE-WAY CLASSIFICATION 5 where SSB = J I i=1 ( ¯Yi. − ¯Y..)2 and SSW = I i=1 J j=1 (Yij − ¯Yi.)2 denote the sum of squares between and the sum of squares within treatments respectively. The statistic SSB summarizes variation in the sample attributable to treatment; SSW summarizes variation attributable to error and is sometimes written SSE. Note that under the assumption of homoscedastic variance, each of the I terms, J j=1 (Yij − ¯Yi.)2 /(J − 1), furnishes an estimate of the error variance, σ2 . It is thus reasonable to estimate σ2 by pooling these terms together to obtain the pooled estimate of the common variance, s2 p = 1 I(J − 1) I i=1 J j=1 (Yij − ¯Yi.)2 = SSW I(J − 1) . The reader will recall that if Yi ∼ iidN(µ, σ2 ) for i = 1, . . . , n then, (n − 1)S2 /σ2 ∼ χ2 n−1, (1.6) where S2 = n i=1 (Yi − ¯Y )2 /(n − 1) denotes the sample variance and ¯Y = n i=1 Yi/n the sample mean. This now familiar result will be an important template in proving the following theorem.
  • 8. 6 Chapter 1. Analysis of Variance Theorem 1. Under the assumption that the random errors, ij ∼ iidN(0, σ2 ), for i = 1, . . . , I and j = 1, . . . , J, we have the following results: 1. SST/σ2 = I i=1 J j=1 (Yij − ¯Y..)2 /σ2 ∼ χ2 N−1, if H0 : αi = 0 ∀i is true, 2. SSW/σ2 = I i=1 J j=1 (Yij − ¯Yi.)2 /σ2 ∼ χ2 I(J−1), whether or not H0 is true, 3. SSB/σ2 = I i=1 J j=1 ( ¯Yi. − ¯Y..)2 /σ2 ∼ χ2 I−1, if H0 : αi = 0 ∀i is true, and 4. SSW/σ2 and SSB/σ2 are independently distributed. Proof. To prove the first part of the theorem we note that if H0 is true, then we have a common mean µ under each treatment and thus Yij ∼ iidN(µ, σ2 ) for i = 1, . . . , I and j = 1, . . . , J. Accordingly, I i=1 J j=1 (Yij − ¯Y..)2 /(N − 1) denotes the sample variance of a sample of size N = IJ from a N(µ, σ2 ) popu- lation, hence using the result given in equation 1.6 concludes the proof. For the second part we note that, J j=1 (Yij − ¯Yi.)2 /(J − 1) denotes the sample variance of the ith treatment, hence, whether or not H0 is true, J j=1 (Yij − ¯Yi.)2 /σ2 ∼ χ2 J−1 independently for all i = 1, . . . , I. Summing all I of these terms and using the property of the sum of independent Chi-square random variables yields the stated result. Further, if H0 is true, the third part results from the subtraction property of the Chi-square distribution. Lastly, to proof the independence of... In addition to the statistics we have defined thus far, it is customary to define the mean square due to treatment and the mean square due to error as, MSB = SSB/(I − 1) and, MSW = SSW/I(J − 1),
  • 9. 1.2. ONE-WAY CLASSIFICATION 7 respectively. We are now in a position to derive a test for the hypotheses H0 : αi = 0, ∀i ∈ {1, . . . , I} versus HA : αi = 0 for at least one i ∈ {1, . . . , I}. In the following theorem we use the statistics defined above and their sampling distributions to derive the generalized likelihood ratio test for H0 and HA. Theorem 2. The generalized likelihood ratio test statistic for testing the null hypothesis of no treatment effects as in equation 1.4 is given by: F = MSB MSW , and H0 is rejected at 100(1 − α)% if F > F1−α I−1,I(J−1). Proof. Recall from our earlier discussion that in addition to some distributional assumptions we assumed the following: Yij = µ + αi + ij, where the restriction I i=1 αi = 0 is imposed on the αi. It follows then that, for i = 1, . . . , I and j = 1, . . . , J, f(yij) = 1 σ √ 2π exp − 1 2 yij − µ − αi σ 2 From independence of the yij we have the following likelihood, L(µ, αi, σ2 |y) = (2πσ2 )−IJ/2 exp    − 1 2σ2 I i=1 J j=1 (Yij − µ − αi)2    (1.7) and log-likelihood l = log L = − IJ 2 log(2πσ2 ) − 1 2σ2 I i=1 J j=1 (Yij − µ − αi)2 Under the alternative hypothesis we have the following parameter space, Ω = {(µ, αi, σ2 )| − ∞ < µ, αi < ∞, σ2 > 0}.
  • 10. 8 Chapter 1. Analysis of Variance Differentiating the log-likelihood with respect to µ and equating the derivative to zero gives, ∂l ∂µ = 1 σ2 I i=1 J j=1 (Yij − µ − αi) = 0, which implies that ˆµΩ = ¯Y.. Once again we differentiate with respect to αi to obtain, ∂l ∂αi = 1 σ2 J j=1 (Yij − µ − αi) = 0. This yields ˆαiΩ = ¯Yi. − ¯Y.. Finally we differentiate with respect to σ2 and proceed just as we did above. We have, ∂l ∂σ2 = − IJ 2σ2 + 1 2σ4 I i=1 J j=1 (Yij − µ − αi)2 = 0, which gives the following MLE, ˆσ2 Ω = N−1 I i=1 J j=1 (Yij − ¯Yi.)2 Substituting these estimates into equation 1.7 we have the following likelihood supremum under H1, sup Ω L(µ, αi, σ2 |y) = exp − IJ 2 ·    2π IJ I i=1 J j=1 (Yij − ¯Yi.)2    −IJ/2 . Under the null hypothesis we have one less parameter since the αi are hypoth- esised to be zero. The parameter space is, ω = {(µ, σ2 )| − ∞ < µ < ∞, σ2 > 0}. In this case we maximize the following log-likelihood, l = log L = − IJ 2 log(2πσ2 ) − 1 2σ2 I i=1 J j=1 (Yij − µ)2 . It is left to the reader to show that the parameter estimates in this case are, ˆµω = ¯Y..
  • 11. 1.2. ONE-WAY CLASSIFICATION 9 and ˆσ2 ω = N−1 I i=1 J j=1 (Yij − ¯Y..)2 The likelihood supremum is then given by, sup ω L(µ, σ2 |y) = exp − IJ 2 ·    2π IJ I i=1 J j=1 (Yij − ¯Y..)2    −IJ/2 . After some cancellation and the use of the identity we established earlier, the generalized likelihood ratio test statistic takes the following form, Λ = sup ω L sup Ω L =        I i=1 J j=1 (Yij − ¯Y..)2 I i=1 J j=1 (Yij − ¯Yi.)2        −N/2 =        I i=1 J j=1 (Yij − ¯Yi.)2 + J I i=1 ( ¯Yi. − ¯Y..)2 I i=1 J j=1 (Yij − ¯Yi.)2        −N/2 . The generalized likelihood ratio test rejects H0 for small values of Λ and we see that small values of Λ correspond to large values of SSB/SSW . That is we reject H0 if SSB SSW > k or if F = SSB/(I − 1) SSW/I(J − 1) = MSB MSW > k I(J − 1) I − 1 = c where c is chosen such that Pr(F > c|H0) = α, the desired type I error. But we have already derived the null distribution of F from which we have, c = F1−α I−1,I(J−1) or the 100(1−α) percentile of the F-distribution with I −1 and I(J −1) degrees of freedom. This completes of the proof. The reader who closely followed the foregoing proof should have been aware that the likelihood ratio test statistic would not have been arrived at had the iden- tification condition not been taken into account. We see then that inferences
  • 12. 10 Chapter 1. Analysis of Variance cannot be drawn from an unidentifiable model. In fact, this is what unidentifi- able means in statistical literature. Cassella & Berger (1992) [2] touch lightly on model identification. For obvious reasons, the test just derived is called the F-test. We will proceed to demonstrate how it can be applied in practice. Example 1. Consider the data presented earlier in table 1.1. It is vital to test for any significant violations of model assumptions before we draw inferences. First let us test the validity of the constant variance assumption. Figure 1.1 affords a visual check on the group variances. There is not much reason to believe that the constant variance assumption could be unduly flawed. The distributions also look reasonably symmetrical, hence normal theory could be applied safely. Resistant Susceptible Nonselected 10 15 20 25 30 35 40 45 50 Response Figure 1.1: Side-by-side boxplots for the Drosophila fecundity data. We proceed with the analysis and calculate the sum of squares, mean squares, and the F-statistic. In R the command to fit the linear model is: > lm(fecundity~line,drosophila) And the command, > anova(lm(fecundity~line,drosophila))
  • 13. 1.2. ONE-WAY CLASSIFICATION 11 Source of variation df SS MS F p-value Between 2 1362.2 681.11 8.6657 0.0004 Within 72 5659.0 78.60 Total 74 7021.2 Table 1.2: Anova table for the Drosophila fecundity data. gives the anova table. An anova table compactly summarizes the results of an F-test. From the table above, the F-statistic is significant at a level of 5%. Say the p-value was not reported, as would be the case if one were not using a computer. Then we would refer to the F table in the appendix, approximate F2,72(.97) by F2,62(.97) and report p-value = Pr(F ≥ 8.6657) < Pr(F ≥ 3.15) = 5%. But before we run into conclusions we test the validity of the distributional assumption of the random errors. To estimate these, we plug in the MLE’s of µ and αi into equation 1.1 to obtain, ˆij = Yij − ¯Y.. − ¯Yi. + ¯Y.. = Yij − ¯Yi. for i = 1, . . . , I and j = 1, . . . , J. These are termed model residuals. By virtue of the invariance property of maximum likelihood estimates, ˆij furnishes a maximum likelihood estimate of ij. We are interested in testing whether these residuals can be considered as Gaussian white noise. But recall that maximum likelihood estimates are asymptotically normal. To obtain the residuals in R we issue the command below: > Residuals <- lm(fecundity~line,drosophila)$residuals but this is only one of several ways to obtain model residuals in R. A look at figure 1.2 shows that the residuals are not far from normal. In particular, the histogram shows a sense of symmetry about zero. Hence we can safely read the anova table and conclude that the F-test conclusively rejects the null hypothesis of no treatment effects. In ordinary parlance this means that of the I = 3 lines, at least one was much more or much less fecund than the rest. Figure 1.1 reveals that the nonselected line had much more fecundity than the resistant and the susceptible lines. At this point we find it worthwhile to interpolate some comments on the assumptions underlying the analysis of variance which should always be borne in mind each time an analysis of variance is carried out. We assume that in the model given in equation 1.1, we have, 1. normally distributed random errors ij,
  • 14. 12 Chapter 1. Analysis of Variance q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q qqq q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q −2 −1 0 1 2 −20−1001020 Theoretical Quantiles OrderedResiduals Residuals Frequency −20 0 10 20 051015 Figure 1.2: A histogram and a normal quantile-quantile plot of the model resid- uals. 2. constant (or homoscedastic) error variance σ2 , and 3. independent random errors. The assumption of normality is not a particularly stringent one. The F-test has been shown to be robust against mild to moderate departures from normal- ity, especially if the distribution is not saliently skewed. Several good tests of normality exist in the literature. The Shapiro-Wilk test is one of those most commonly used in practice. Its R directive is shapiro.test() and its null hypothesis is that the sample comes from a normal parent distribution. Apply- ing this test on the residuals from our previous example we obtain a p-value of 0.45. So the Shapiro-Wilk test conclusively accepts the hypothesis of normally distributed random errors. You will recall from example 1 that we were quite content with the validity of the normality assumption from the qq-plot and the histogram created therein. In examples to follow, we shall stick to the same diagnostic procedure with the hope that any undue departures from normality will be noticed by the naked eye and not bother ourselves with carrying out the normality test. The problem of heteroscedasticity (or nonconstant variance) has slightly dif- ferent implications depending on whether a design is balanced or otherwise. In the former case, slightly lower p-values than actual ones will be reported; in the
  • 15. 1.2. ONE-WAY CLASSIFICATION 13 latter, higher or lower p-values than actual ones will be reported according as large σ2 i are associated with large ni, or large σ2 i are associated with small ni (see Miller (1997) [3] pp. 89-91). While there will usually be remedies to non-normality and heteroscedastic vari- ance, dependence of errors will usually not be amenable to any alternative method available to the investigator, at least if it is in the form of serial corre- lation. Dependence due to blocking, on the other hand, can easily be handled by adding an extra parameter to the model to represent the presence of block- ing. We will see later how blocking can purposely be introduced to optimize an experimental plan. It has been shown (see...) that if there is serial correlation within (rather than across) samples, then the significance level of the F-test will be smaller or larger than desired according as the correlation is negative or positive. The presence of serial correlation of lag 1 can be detected by visually inspecting plots of variate pairs (yij, yi,j+1). The hope should be not to spot any apparent linear relationship between the lagged pairs if the F-test is to be employed. Outliers can also be a nuisance in applying the F-test. Since the sample mean and variance are not robust against outliers, such outlying observations can greatly augment the within-group mean square which in turn would render the F−test conservative2 . Usually no transformation will remedy the situation of outlying observations. One option to deal with outliers would be to use the trimmed mean in the calculation of the sum of squares. Another is the use of nonparametric methods. We discuss nonparametric methods in section 1.2.3. Usually for a design to yield observations that have all three of the charac- teristics enumerated above, the experimenter should ensure random allocation of treatments. That is, experimental units must be allocated at random to the treatments. Randomization is very critical in all of experimental design. It also makes possible the calculation of unbiased estimates of the treatment effects. One important concept that has thus far only received brief mention is that of unbalanced designs. If in stead of the same number J of replicates under each treatment we suppose that we have ni observations under treatment i, where the ni need not be equal, then it can easily be shown that the identity in equation 1.5 becomes I i=1 ni j=1 (Yij − ¯Y..)2 = I i=1 ni( ¯Yi. − ¯Y..)2 + I i=1 ni j=1 (Yij − ¯Yi.)2 . Otherwise the analysis remains the same as in the balanced design and an analogous F-test can be derived. The next example, adapted from Snedecor & Cochran (1980) [4], illustrates points we made in the last few paragraphs including the possibility of an unbalanced design. Example 2. For five regions in the United States in 1977, public school ex- penditures per pupil per state were recorded. The data are shown in table 1.3. 2A conservative test is “reluctant” to reject—i.e. it has a smaller type I error than desired.
  • 16. 14 Chapter 1. Analysis of Variance South North Mountain Northeast Southeast Central Central Pacific 1.33 1.66 1.16 1.74 1.76 1.26 1.37 1.07 1.78 1.75 2.33 1.21 1.25 1.39 1.60 2.10 1.21 1.11 1.28 1.69 1.44 1.19 1.15 1.88 1.42 1.55 1.48 1.15 1.27 1.60 1.89 1.19 1.16 1.67 1.56 1.88 1.26 1.40 1.24 1.86 1.30 1.51 1.45 1.99 1.74 1.35 1.53 1.16 Table 1.3: Public school expenditures per pupil per state (in $1 000). Otherwise for R users the relevant data-frame is named pupil. The question of interest is the same old one, namely, are the region to region expenditure differences statistically significant or are they due to chance alone? Figure 1.3 shows that the distribution cannot be judged to be very symmet- rical, nor can we be overly optimistic about constant variance. Since overall, there is not too much skewness, it is about the latter that we should be most worried. No outliers are visible so there really is not much that calls normal theory into question. The R command for creating the plot in figure 1.3 is plot(expenditure~region,pupil). We seek now for an appropriate variance stabilizing transformation. Since all the values are nonnegative, we could try the log-transformation, or even the square-root transformation. A plot of the log-transformed data is shown in figure 1.4. The log-transformed distribution does not look vaguely more symmetrical. After a few trials, we finally take the reciprocal of the square of the observations, which yields the plot depicted in figure 1.5. This time the variance looks reasonably constant across treatments. A little question mark over symmetry remains though. But there is not strong enough skewness to warrant too much concern. To investigate this further, we create a normal qqplot and a histogram of residuals. These are shown in figure 1.6 from which we see a slight deviation from normality. But earlier we pointed out that the F-test is not too sensitive to moderate departures from normality. The anova table on the transformed response is ob- tained by issuing the command, anova(lm(expenditure^-2~region,pupil)) in R, and is shown in table 1.4. From table 1.4 we see a highly significant F-statistic. That is, strong evidence suggests that expenditures vary from region to region.
  • 17. 1.2. ONE-WAY CLASSIFICATION 15 Northeast Southeast S. Central N. Central M. Pacific 1.2 1.4 1.6 1.8 2 2.2 Response Figure 1.3: Side-by-side boxplots for the public school expenditures data. Source of variation df SS MS F p-value Between 4 0.78114 0.195285 11.62 0.0000 Within 43 0.72263 0.016805 Total 47 1.50377 Table 1.4: Anova table for the expenditures per pupil per state data. 1.2.2 Multiple Comparisons Despite all its merits, the omnibus F-test is not without deficiencies of its own. From the previous example we concluded that expenditures varied from region to region. For all we know, such a conclusion could have been reached because only one of the regions had a sample mean much greater or less than the rest. Usually we would be interested in knowing which pair of groups differ significantly. The current section addresses this problem by introducing commonly used methods of multiple comparisons that can be used in lieu of the omnibus F-test, or after the F-test has rejected the null hypothesis. It was shown earlier that two treatment means, µi and µi , can be concluded to be different at level α if the
  • 18. 16 Chapter 1. Analysis of Variance Northeast Southeast S. Central N. Central M. Pacific 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Log−transformedResponse Figure 1.4: Side-by-side boxplots for the log-transformed data. 100(1 − α)% confidence interval for their difference, ¯Yi. − ¯Yi . ± tν,1−α/2sp 1 ni + 1 ni , (1.8) does not contain zero, or equivalently, if | ¯Yi. − ¯Yi .| > tν,1−α/2sp 1 ni + 1 ni . If all k = I 2 intervals are to be considered as a family, the statement given by equation 1.8 above does not hold with probability 1 − α; the coverage proba- bility, or as commonly called, the family-wise rate (FWR), will be lower. For the special case of ni = ni = J, one commonly used remedial measure was developed by John Tukey. He showed that the variate, max i,i |( ¯Yi. − µi) − ( ¯Yi . − µi )| sp/ √ J , follows the so-called Tukey studentized range distribution with parameters I and I(J − 1), where the pooled sample variance s2 p equals the mean square of error. If we denote the 100(1 − α) percentile of this distribution by qI,I(J−1)(α), then we have the following probability statement,
  • 19. 1.2. ONE-WAY CLASSIFICATION 17 Northeast Southeast S. Central N. Central M. Pacific 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Reciprocal−of−square−transfomedResponse Figure 1.5: Side-by-side boxplots for the reciprocal-of-square-transformed data. Pr max i,i |( ¯Yi. − µi) − ( ¯Yi . − µi )| ≤ qI,I(J−1)(α)sp/ √ J = 1 − α, (1.9) from which we obtain the following family of confidence intervals of the differ- ences µi − µi , ¯Yi. − ¯Yi . ± qI,I(J−1)(α)sp/ √ J, with family-wise error rate exactly equal to α. Accordingly, any pair of treat- ment sample means will be significantly different at level α if | ¯Yi. − ¯Yi .| > qI,I(J−1)(α)sp/ √ J. Methods to deal with unbalanced designs have also been devised. One method that gives very good results despite its crudity is due to Bonferroni. From the Bonferroni equality, it can be shown that to ensure a family-wise er- ror rate of at most α, then each of the k tests of µi = µi should be carried out at significance level α/k. Where N = I i=1 ni denotes the total number of observations, we then have the following family of confidence intervals, ¯Yi. − ¯Yi . ± t α/2k N−I sp 1 ni + 1 ni , where k = I 2 ,
  • 20. 18 Chapter 1. Analysis of Variance q q q q q q qqq q q q qq q q q q q q q qq q q q q q q q q q q q q q q qq q q q q q q q q q −2 −1 0 1 2 −0.2−0.10.00.10.2 Theoretical Quantiles OrderedResiduals Residuals Frequency −0.3 −0.1 0.1 0.3 02468101214 Figure 1.6: A histogram and a normal quantile-quantile plot of the model resid- uals. which should have coverage probability of at least 1 − α. We call these the Bonferroni confidence intervals. Let us consider an example. Example 3. Since the previous example dealt with unequal sample sizes, we employ the Bonferroni method to carry out multiple comparisons. We calculated the pooled sample variance to be s2 p = MSE = .017. We also have k = 10 comparisons. According to Bonferroni’s method, a pair of sample means (of sizes ni and ni ) that differ by an absolute amount greater than .1296 × t43(.9975) × 1 ni + 1 ni , will be considered significantly different at level α = .05. Following are R commands to compute and compactly display absolute differences of all possible combinations of sample means in a 5 by 5 array. > data(pupil) > attach(pupil) > X <- Y <- tapply(expenditure^-2,region,mean) > diff <- abs(outer(X,Y,"-"));diff Consider for instance, the Mountain Pacific and North Central regions. The absolute value of their sample means difference is 0.033, which is far less than
  • 21. 1.2. ONE-WAY CLASSIFICATION 19 the critical value of .164. In fact, the 99.75% confidence interval of the difference of means can be shown to be (−0.130, 0.197) or (−0.197, 0.130), depending on how the difference is taken. This interval obviously contains zero. So the two regions’ levels of expenditure cannot be considered to be statistically different. Next, let us consider the Northeast and South Central regions. Their sample means differ by an absolute amount of 0.366, which exceeds the critical value of .176. The corresponding confidence interval is (.190, .543) or (−.543, −.190). The last 8 comparisons can be made similarly. The reader will find that, overall, 4 pairs are significantly different, viz., Northeast and South Central, Northeast and Southeast, South Central and Mountain Pacific, and South Central and North Central. Other commonly used multiple comparison methods for unbalanced designs include that due to Sheff´e and a variant of Tukey’s method which we discussed earlier, called the Tukey-Kramer method. Both give conservative results, as does the Bonferroni method. Because of their “conservatism”, one should consider using Tukey’s method whenever a balanced design is dealt with, which should give shorter confidence intervals. Sheff´e’s confidence intervals for the difference µi − µi are given by, ¯Yi. − ¯Yi . ± sp (I − 1)Fα I−1,N−I 1 ni + 1 ni , where Fα I−1,N−I denotes the 100(1 − α) percentile of the F-distribution with I and N − I degrees of freedom. On applying Sheff´e’s method to the expenditure data, it is striking to see that we reach similar conclusions as those reached in example 3 above under Bonferroni’s method. But Sheff´e’s intervals are signifi- cantly broader. It is still not too clear whether the Tukey-Kramer method gives intervals with coverage probability of at least 1−α or approximately 1−α. But it too gives re- sults good enough to merit its mention. Confidence intervals under this method are given by, ¯Yi. − ¯Yi . ± qI,N−I(α)sp 1 2 1 ni + 1 ni . An abundance of other multiple comparison procedures have been proposed but not all are good enough to enter the fray. 1.2.3 Nonparametric Methods If the assumptions underlying the analysis of variance do not hold and no trans- formation is available to make the F-test more applicable, nonparametric meth- ods are often used in stead. The Kruskal-Wallis test is by far the most com- monly used nonparametric analog of the one way analysis of variance. Unlike the F-test, it makes no distributional assumptions about the observations; for it to be applicable, the observations need only be independent.
  • 22. 20 Chapter 1. Analysis of Variance In this method, we denote by Rij, the rank of yij in the combined sample of all N = I i=1 ni observations. Then define ¯Ri. = I i=1 Rij/ni, and ¯R.. = I i=1 ¯Ri./N, as the average rank score of the ith sample and the grand rank score, respec- tively. Finally we compute the following statistic, K = 12 N(N + 1) I i=1 ni( ¯Ri. − ¯R..)2 = 12 N(N + 1) I i=1 ni ¯R2 i. − 3(N + 1), which has been shown to have a limiting χ2 distribution with I − 1 degrees of freedom under the null hypothesis of equal location parameters under each of the I groups. The null hypothesis is rejected for large values of K. Just as in the two sample case, tied observations will be assigned average ranks. The K-statistic defined above should perform reasonably well if there are not too many ties. Otherwise some correction factor will have to be applied. Example 4. Table 1.5 presents ranks of the expenditure data from example 2. From these data we calculate a highly significant value of K = 21.83. The R command to compute the p-value is, 1-pchisq(21.83,4). It is well to realize the sum of squares occurring in the expression for the K- statistic as the between-groups sum of squares in the analysis of variance. Then the value of K can easily be calculated by performing the usual analysis of vari- ance on the ranks and then multiplying the between-groups sum of squares by 12/N(N + 1). The Kruskal-Wallis test has an implementation in R. However, it will usually give a different value for K than that obtained from using the foregoing ex- pression. This is because in calculating the statistic, R uses some weights that will make the distribution of the K-statistic as χ2 as possible. Here are the R commands and output for the previous example. > kruskal.test(expenditure,region,data=pupil) Kruskal-Wallis rank sum test data: expenditure and region Kruskal-Wallis chi-squared = 24.0387, df = 4, p-value = 7.846e-05
  • 23. 1.3. TWO-WAY CLASSIFICATION 21 South North Mountain Northeast Southeast Central Central Pacific 18 33 6 36.5 39 13.5 20 1 40 38 47 9.5 12 21 31.5 46 9.5 2 16 35 24 8 3.5 42.5 23 29 26 3.5 15 31.5 44 8 6 34 30 42.5 13.5 22 11 41 17 27 25 45 36.5 19 28 6 Table 1.5: Ranks of the Public school expenditures data. Since the Kruskal-Wallis test works with ranks rather than actual numerical values of the observations, it will greatly eliminate the effect of outliers. In practice, one will usually resort to this test if there are too many outliers in the data, if normal theory is not applicable, or if the data are already in the form of ranks. 1.3 Two-way Classification 1.3.1 Introduction Up to this point we have assumed, at least tacitly, that the experiments we deal with yield observations that can only be grouped according to one factor. This need not be the case; several factors can be considered simultaneously. For example, consider an experiment in which the amount of milk produced by a hundred cows is studied. It is natural to consider breed and age-group as possible factors in such a study. There could also be a third, and even a fourth factor, etc., all of which are considered simultaneously. We introduce herein methods of analyzing such experimental designs. We will only treat the case of two factors in which case the design is called two-way analysis of variance , but the reader should, however, be aware that the order of classification is abitrary. In the general case we speak of N-way analysis of variance. For the ease of reference we shall call the factors with which we deal, factor A and factor B. It is also common in the literature to call these row and column factors. It is natural then to speak of a treatment/column or row effect according as the effect due to factor A or that due to factor B is referred to. Treatment and row effects are also referred to as main efects to distinguish them from the so-called interaction effect. We explain what interaction means shortly.
  • 24. 22 Chapter 1. Analysis of Variance 1.3.2 Normal Theory The analysis in the two-way classification departs slightly from that in the one- way classification as more variables come into play. In particular, the...occassions the need to extend our notation from the previous sections. If we assume that factor A has I levels and factor B has J levels and that in the cell determined by level i of factor A and level j of factor B there are k observations (or repli- cations), then we use yijk to symbolize the kth observation under such a cell. If each of factors A and B contributes to the response variable an amount inde- pendent of that contributed by the other, the model is termed as an additive model and is formulated, Yijk = µ + αi + βj + ijk, (1.10) with identification conditions, I i=1 αi = 0, and J j=1 βj = 0, where i = 1, . . . , I and j = 1, . . . , J. Just as before, the random errors, ijk, are assumed to be independently and identically normally distributed about zero mean with constant variance σ2 . If the contribution to the response variable by factor A depends on the level of factor B, or conversely, then the simple additive model is not totally representative of the design and a phenomenon called interaction is said to exist. We introduce another variable, ij, that will represent this interaction effect. Hence for example, 23 will be negative or positive according as factors A and B have opposing or synergistic effects under level 2 of factor A and level 3 of factor B. This full model which takes interaction into account is given by, Yijk = µ + αi + βj + ij + ijk, (1.11) with identification conditions, I i=1 αi = 0, J j=1 βj = 0, and I i=1 ij = J j=1 ij = 0,
  • 25. 1.3. TWO-WAY CLASSIFICATION 23 where i = 1, . . . , I and j = 1, . . . , J. In addition to testing the significance of the main effects in two-way analysis of variance (or any factorial anova for that matter), there is need to also test for interaction effects. We thus have a total of three null hypotheses to test. In dealing with many null hypotheses we will have reason to vary our usual notation. Specifically, we superscript each null hypothesis with a naught to avoid confusing HA for an alternative hypothesis, for instance. That is the no main effects null hypotheses are denoted, H0 A : αi = 0 ∀i ∈ {1, . . . , I}, and H0 B : βj = 0 ∀j ∈ {1, . . . , J}, and the no interaction effect null hypothesis is written, H0 I : ij = 0 for all combinations of i and j. In anticipation of their need ahead, we give expressions for the sums of squares, which are a little more involved than those in the one-way layout. Also, some identities and statistics other than the sum of sqaures which will provide tests of the hypotheses stated above will be derived just as we did in the one-way layout. The next theorem constructs a generalized likelihood ratio test for H0 A, H0 B, and H0 I . Theorem 3. The generalized likelihood ratio test statistics for testing the null hypotheses of no main and interaction effects are given by: 1. FA = MSA MSE , where H0 A is rejected at 100(1 − α)% if FA > F1−α I−1,IJ(K−1), 2. FB = MSB MSE , where H0 B is rejected at 100(1 − α)% if FB > F1−α J−1,IJ(K−1), and 3. FI = MSI MSE , where H0 I is rejected at 100(1 − α)% if FI > F1−α (I−1)(J−1),IJ(K−1). Proof. Since a complete proof to each part of the theorem can easily span two and half pages, we will proof the first part and leave the last two to the reader. We have for i = 1, . . . , I, j = 1, . . . , J, and k = 1, . . . , K, f(yijk) = 1 σ √ 2π exp − 1 2 yijk − µ − αi − βj − ij σ 2 .
  • 26. 24 Chapter 1. Analysis of Variance Thus the likelihood is given by L(µ, αi, βj, ij, σ2 |y) =(2πσ2 )−IJK/2 × exp    − 1 2 I i=1 J j=1 K k=1 Yijk − µ − αi − βj − ij σ 2    , from the assumption of independence. For ease of maximization we use the log-likelihood, l = log L = − IJK 2 log (2πσ2 ) − 1 2σ2 I i=1 J j=1 K k=1 (Yijk − µ − αi − βj − ij)2 . The parameter space under the general alternative hypothesis which states that all effects are non-zero is denoted, Ω = { (µ, αi, βj, ij, σ2 )| − ∞ < µ, αi, βj, ij < ∞, σ2 > 0 }. Proceeding to find the ML estimates under Ω we have, ∂l ∂µ = 1 σ2 I i=1 J j=1 K k=1 (Yijk − µ − αi − βj − ij) = 0, which implies that ˆµΩ = ¯Y.... Similarly, it is easily verified that ∂l ∂αi = 1 σ2 J j=1 K k=1 (Yijk − µ − αi − βj − ij) = 0 implies ˆαiΩ = ¯Yi.. − ¯Y.... ∂l ∂βj = 1 σ2 I i=1 K k=1 (Yijk − µ − αi − βj − ij) = 0, yields ˆβiΩ = ¯Y.j. − ¯Y.... Likewise, ∂l ∂ ij = 1 σ2 K k=1 (Yijk − µ − αi − βj − ij) = 0 implies ˆijΩ = ¯Yij. − ¯Yi.. − ¯Y.j. + ¯Y.... Finally ∂l ∂σ2 = − IJK 2σ2 + 1 2σ4 I i=1 J j=1 K k=1 (Yijk − µ − αi − βj − ij)2 = 0 yields ˆσ2 Ω = N−1 I i=1 J j=1 K k=1 (Yijk − ¯Yij.)2 .
  • 27. 1.3. TWO-WAY CLASSIFICATION 25 These give an expression for the supremum of the likelihood under Ω, namely sup Ω L(µ, αi, βj, ij, σ2 ) = exp − IJK 2 ·    2π IJK I i=1 J j=1 K k=1 (Yijk − ¯Yij.)2    −IJK/2 . Under HA, the parameter space is given by ωA = { (µ, βj, σ2 )| − ∞ < µ, βj < ∞, σ2 > 0 }. Similar arguments give the following expression for the supremum of the likeli- hood, sup ωA L(µ, βj, ij, σ2 |y) = exp − IJK 2 ·    2π IJK I i=1 J j=1 K k=1 (Yijk − ¯Y.j.)2    −IJK/2 Hence the generalized likelihood ratio is given by, ΛA = sup ωA L sup Ω L =        I i=1 J j=1 K k=1 (Yijk − ¯Y.j.)2 I i=1 J j=1 K k=1 (Yijk − ¯Yij.)2        −IJK/2 =        I i=1 J j=1 K k=1 (Yijk − ¯Yij.)2 + JK I i=1 ( ¯Yi.. − ¯Y...)2 I i=1 J j=1 K k=1 (Yijk − ¯Yij.)2        −IJK/2 The generalized likelihood ratio test then rejects H0 A for large values of SSA/SSW. That is we reject H0 A if SSA SSW > k, or equivalently, if FA = SSA/(I − 1) SSW/IJ(K − 1) = MSA MSW > k IJ(K − 1) I − 1 = c where c is chosen such that Pr(F > c|H0 A) = α. From the distribution of this F-statistic, which we derived earlier, it is immediately evident that c = F1−α I−1,IJ(K−1). This completes the proof to the first part of the theorem. By noting that similar restrictions have been imposed on the βj as on the αi, one will note that it is not necessary to construct the proof to part 2 ab initio.
  • 28. 26 Chapter 1. Analysis of Variance However, he need only permute some subscripts and use the appropriate degrees of freedom to complete the proof. But the reader who feels unsated with the proposed logic should convince himself by going through all the steps. The proof to the last part can be completed similarly to the proof just presented and is left as an exercise. Example 5. In an experiment to test 3 types of adhesive, 45 glass to glass specimens were set up in 3 different types of assemblies and tested for tensile strength. The types of adhesive were, 047, 00T, and 001 and the types of assem- blies were cross-lap, square-center, and round-center. Each of the 45 entries of table 1.6 represents the recorded tensile strength of the glass to glass assemblies [data from Johnson and Leone [5]]. These data can be found under dataset glass under this book’s package. Glass-Glass Assembly Adhesive Cross-Lap Square-Centre Round-Center 047 16 17 13 14 23 19 19 20 14 18 16 17 19 14 21 00T 23 24 24 18 20 21 21 12 25 20 21 29 21 17 24 001 27 14 17 28 26 18 14 14 13 26 28 16 17 27 18 Table 1.6: Table of bond strength of glass-glass assembly. Figure 1.7 shows slight symmetry, no outliers, and not enough violation of the constant variance assumption to warrant suspicion. Not the exact same can be said about figure 1.8 which calls the constant variance assumption into question. At least by now we know the risk entailed by blatantly ignoring such a clear indication of heteroscedasticity. R commands to view both figures at the same time follow. > data(glass) > par(mfrow=c(1,2)) > plot(strength~adhesive+assembly)
  • 29. 1.3. TWO-WAY CLASSIFICATION 27 Cross−lap Square−center Round−center 12 14 16 18 20 22 24 26 28 Response Figure 1.7: Boxplots for the glass data plotted according to assembly type.
  • 30. 28 Chapter 1. Analysis of Variance 047 00T 001 12 14 16 18 20 22 24 26 28 Response Figure 1.8: Boxplots for the glass data plotted according to adhesive type. Fitting a model to the raw (i.e. untransformed) data gives significant results for adhesives and interactions. But we might want to think twice before concluding that these factors are indeed significant. To this end, we seek a transformation to stabilize the variance. The square-root transformation seems to work reasonably well for us, but it is seen to greatly upset normality. Boxplots for the transformed data are not shown for purposes of space. A histogram and qqplot of residuals are shown in figure 1.9. The histogram shows a slightly ragged character, a long and fat left tail, and lack of symmetry. The qqplot also shows gross departure from linearity. The following set of commands will create figure 1.9. > m2 <- lm(strength^.5~adhesive*assembly,glass) > r <- m2$resid > par(mfrow=c(1,2)) > qqnorm(r,ylab="Ordered Residuals",main="");qqline(r,col=2) > hist(r,xlab="Residuals",main="") The Shapiro-Wilk test of normality shows that we have not lost much; it gives a p-value of 0.084, while the untransformed variable has a (slightly higher) p-value of 0.158. We also know that the F-test is robust against departures from normality. We therefore accept the square-root transformation as a good compromise. Table 1.7 summarizes the results of fitting a linear model to the
  • 31. 1.3. TWO-WAY CLASSIFICATION 29 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q −2 −1 0 1 2 −1.0−0.50.00.5 Theoretical Quantiles OrderedResiduals Residuals Frequency −1.0 0.0 0.5 02468 Figure 1.9: A histogram and a normal quantile-quantile plot of the model resid- uals. transformed variable. As you should have expected, the p-values are slightly lower but the adhesives are still significant. The interactions on the other hand are slightly short of the 5% significance level. We conclude that the type of adhesive influences bond strength while the type of assembly does not. Source of variation df SS MS F p-value Adhesive 2 1.5682 0.78409 3.4548 0.0424 assembly 2 0.0749 0.03745 0.1650 0.84854 Interaction 4 2.3003 0.57509 2.5339 0.05699 Within 36 8.1704 0.22696 Total 44 12.1138 Table 1.7: Anova table for the glass to glass assembly data.
  • 32. 30 Chapter 1. Analysis of Variance 1.3.3 Multiple Comparisons 1.3.4 Randomized Complete Blocks Randomized blocks are a form of unreplicated two-way analysis of variance in which the two factors forming the design are the treatment and another factor known to have an effect on the response under investigation. This second factor is called a block. Each block is assigned all treatments at random in such a way that within each block, each treatment appears once and only once. A block effect is rarely tested in practice; of primary interest is the treatment effect since the blocks are, by assumption, already expected to have an effect. Randomized blocks were first developed for agricultural experiments and much of the terminology has remained unchanged. The term “block” was traditionally understood to refer to a block of land, but with the wide appreciation and popularity of randomized complete blocks over the years, it is now used to refer to any factor that plays an analogous role in more recent adaptations of such experiments. In a study to compare the effects of I fertilizers (or treatments in the more general case) on the yield, J blocks of land are subdivided into I homogeneous plots and the fertilizers are allocated at random to these plots. This is a classical problem for which the method of randomized complete blocks was developed. Other uses of this design can be found in several other fields. The statistical model for randomized complete block design is, Yij = µ + αi + βj + ij, where I i=1 αi = J j=1 βj = 0. The sums of squares are the same as those under the two-way additive model but with K = 1. The null hypotheses of the no treatment and no block effects are, H0 A : αi = 0 ∀i ∈ {1, . . . , I}, and H0 B : βj = 0 ∀j ∈ {1, . . . , J}, respectively. But remember that only the former is of interest. In the fertilizer experiment presented above, an experimenter will hardly be as interested in whether block A was the most productive as he would be in whether fertilizer II yielded the most crop.
  • 33. 1.4. LATIN SQUARES 31 Theorem 4. The generalized likelihood ratio test statistics for testing the null hypotheses of no treatment and block effects are given by: 1. FA = MSA MSI , where H0 A is rejected at 100(1 − α)% if FA > F1−α I−1,(I−1)(J−1), and 2. FB = MSB MSI , where H0 B is rejected at 100(1 − α)% if FB > F1−α J−1,(I−1)(J−1). Proof. The details of this proof are left to the reader. 1.4 Latin Squares Latin squares arise as natural extensions of randomized complete blocks—they are a form of three-way analysis of variance without replication. If heterogeneity is known to be two-dimensional in some investigation, then two blocking factors can be incorporated in an unreplicated design, effectively forming a square with N row blocks and N column blocks. We then speak of a row effect, a column effect, and a treatment effect. But as in randomized blocks, it is only the latter that will be of concern to the investigator. These designs have found wide application in industry because of their optimality and impressive performance. A prototype of a Latin square design is an experiment in which a fertilizer (i.e. the treatment) is to be tested at N levels on a field that is known to vary in intrinsic fertility, say, in a north-south direction and in soil depth, say, in an east-west direction. The field is then subdivided to form an N × N array of subplots and the fertilizers are randomly allocated to the subplots in both directions in such a manner that all N levels of the treatment occur once and only once in either direction. Let τi denote the differential effect of the ith row block, βj the differential effect of the jth column block and, γk the differential effect of the kth treatment. Then the statistical model is Yijk = µ + τi + βj + γk + ijk, (1.12) where N i=1 τi = N j=1 βj = N k=1 γk = 0.
Theorem 5. If the random errors $\epsilon_{ijk} \sim \mathrm{iid}\; N(0, \sigma^2)$ over the $N^2$ cells of the square, then we have the following results:

1. $\mathrm{SST}/\sigma^2 = \sum_{i,j,k} (Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot})^2 / \sigma^2 \sim \chi^2_{N^2-1}$,
2. $\mathrm{SSA}/\sigma^2 = N \sum_{i=1}^{N} (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 / \sigma^2 \sim \chi^2_{N-1}$,
3. $\mathrm{SSB}/\sigma^2 = N \sum_{j=1}^{N} (\bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 / \sigma^2 \sim \chi^2_{N-1}$,
4. $\mathrm{SSC}/\sigma^2 = N \sum_{k=1}^{N} (\bar{Y}_{\cdot\cdot k} - \bar{Y}_{\cdot\cdot\cdot})^2 / \sigma^2 \sim \chi^2_{N-1}$,
5. $\mathrm{SSE}/\sigma^2 = \sum_{i,j,k} (Y_{ijk} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot k} + 2\bar{Y}_{\cdot\cdot\cdot})^2 / \sigma^2 \sim \chi^2_{(N-1)(N-2)}$, and
6. the above variates are mutually independent.

The stated $\chi^2$ distributions of SST, SSA, SSB, and SSC hold under the corresponding hypotheses of no effects; that of SSE holds in general.

Proof. We present the proof shortly.

Theorem 6. The generalized likelihood ratio test statistics for testing the null hypotheses of no row, no column, and no treatment effects are given by:

1. $F_A = \mathrm{MSA}/\mathrm{MSE}$, where $H_{0A}$ is rejected at significance level $\alpha$ if $F_A > F^{1-\alpha}_{N-1,\,(N-1)(N-2)}$,
2. $F_B = \mathrm{MSB}/\mathrm{MSE}$, where $H_{0B}$ is rejected at significance level $\alpha$ if $F_B > F^{1-\alpha}_{N-1,\,(N-1)(N-2)}$, and
3. $F_C = \mathrm{MSC}/\mathrm{MSE}$, where $H_{0C}$ is rejected at significance level $\alpha$ if $F_C > F^{1-\alpha}_{N-1,\,(N-1)(N-2)}$.

Proof. From the statistical model given in equation (1.12), each observation has density
$$f(y_{ijk}) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{y_{ijk} - \mu - \tau_i - \beta_j - \gamma_k}{\sigma}\right)^{2}\right\}.$$
The likelihood function takes the form
$$L(\mu, \tau_i, \beta_j, \gamma_k, \sigma^2 \mid y) = (2\pi\sigma^2)^{-N^2/2} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i,j,k}\left(Y_{ijk} - \mu - \tau_i - \beta_j - \gamma_k\right)^{2}\right\},$$
the sum running, as before, over the $N^2$ observed cells. The log-likelihood is then
$$l = \log L = -\frac{N^2}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i,j,k}(Y_{ijk} - \mu - \tau_i - \beta_j - \gamma_k)^2.$$
The unrestricted parameter space, subject to the zero-sum constraints of equation (1.12), is
$$\Omega = \{(\mu, \tau_i, \beta_j, \gamma_k, \sigma^2) \mid -\infty < \mu, \tau_i, \beta_j, \gamma_k < \infty,\ \sigma^2 > 0\}.$$
The maximum likelihood estimates are obtained in the usual way. From
$$\frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i,j,k}(Y_{ijk} - \mu - \tau_i - \beta_j - \gamma_k) = 0,$$
we have $\hat{\mu}_{\Omega} = \bar{Y}_{\cdot\cdot\cdot}$. Similarly,
$$\frac{\partial l}{\partial \tau_i} = \frac{1}{\sigma^2}\sum_{j,k}(Y_{ijk} - \mu - \tau_i - \beta_j - \gamma_k) = 0,$$
$$\frac{\partial l}{\partial \beta_j} = \frac{1}{\sigma^2}\sum_{i,k}(Y_{ijk} - \mu - \tau_i - \beta_j - \gamma_k) = 0,$$
$$\frac{\partial l}{\partial \gamma_k} = \frac{1}{\sigma^2}\sum_{i,j}(Y_{ijk} - \mu - \tau_i - \beta_j - \gamma_k) = 0,$$
give $\hat{\tau}_{i,\Omega} = \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot}$, $\hat{\beta}_{j,\Omega} = \bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot}$, and $\hat{\gamma}_{k,\Omega} = \bar{Y}_{\cdot\cdot k} - \bar{Y}_{\cdot\cdot\cdot}$. Finally,
$$\frac{\partial l}{\partial \sigma^2} = -\frac{N^2}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i,j,k}(Y_{ijk} - \mu - \tau_i - \beta_j - \gamma_k)^2 = 0$$
gives
$$\hat{\sigma}^2_{\Omega} = N^{-2}\sum_{i,j,k}(Y_{ijk} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot k} + 2\bar{Y}_{\cdot\cdot\cdot})^2.$$
Putting everything together we obtain
$$\sup_{\Omega} L = \exp\left(-\frac{N^2}{2}\right) \cdot \left\{\frac{2\pi}{N^2}\sum_{i,j,k}(Y_{ijk} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot k} + 2\bar{Y}_{\cdot\cdot\cdot})^2\right\}^{-N^2/2}.$$
The parameter space under the first null hypothesis, $H_{0A}$, is
$$\omega_A = \{(\mu, \beta_j, \gamma_k, \sigma^2) \mid -\infty < \mu, \beta_j, \gamma_k < \infty,\ \sigma^2 > 0\}.$$
Arguments similar to those above give the following supremum under $\omega_A$:
$$\sup_{\omega_A} L = \exp\left(-\frac{N^2}{2}\right) \cdot \left\{\frac{2\pi}{N^2}\sum_{i,j,k}(Y_{ijk} - \bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot k} + \bar{Y}_{\cdot\cdot\cdot})^2\right\}^{-N^2/2}.$$
Hence
$$\Lambda_A = \frac{\sup_{\omega_A} L}{\sup_{\Omega} L} = \left\{\frac{\sum_{i,j,k}(Y_{ijk} - \bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot k} + \bar{Y}_{\cdot\cdot\cdot})^2}{\sum_{i,j,k}(Y_{ijk} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot k} + 2\bar{Y}_{\cdot\cdot\cdot})^2}\right\}^{-N^2/2}.$$
Writing $Y_{ijk} - \bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot k} + \bar{Y}_{\cdot\cdot\cdot} = (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot}) + (Y_{ijk} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot k} + 2\bar{Y}_{\cdot\cdot\cdot})$ and noting that the cross-product term vanishes on summation, the numerator sum of squares decomposes as $\mathrm{SSA} + \mathrm{SSE}$, so that
$$\Lambda_A = \left\{1 + \frac{\mathrm{SSA}}{\mathrm{SSE}}\right\}^{-N^2/2}.$$
The generalized likelihood ratio test therefore rejects $H_{0A}$ for large values of $\mathrm{SSA}/\mathrm{SSE}$, or equivalently, if
$$F_A = \frac{\mathrm{SSA}/(N-1)}{\mathrm{SSE}/\{(N-1)(N-2)\}} = \frac{\mathrm{MSA}}{\mathrm{MSE}} > c,$$
where $c = F^{1-\alpha}_{N-1,\,(N-1)(N-2)}$, since by Theorem 5 the statistic $F_A$ has an $F_{N-1,\,(N-1)(N-2)}$ distribution under $H_{0A}$. The tests of $H_{0B}$ and $H_{0C}$ are obtained in exactly the same way.

Example 6.

1.5 Summary and Addenda
Source of variation    df        SS        MS        F    p-value
Carbon Grade            4    1787.4     446.8   2.3894    0.10888
pH                      4   14165.4    3541.3  18.9370    0.00004
Quantity                4    3194.6     798.6   4.2706    0.02233
Residuals              12    2244.1     187.0
Total                  24   21391.5

Table 1.8: ANOVA table for the purification process data. The degrees of freedom (4, 4, 4, and 12) are those of a $5 \times 5$ Latin square; the arithmetic of the table is checked in the short R sketch following the exercises below.

1.6 Exercises

1. Show that ... (just a template).

2. Given that $Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$, show that ...
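The mean squares, F statistics, and p-values in Table 1.8 can be recovered from its SS and df columns alone; a minimal R check follows (only the numbers shown in the table are assumed):

# Verify the arithmetic of Table 1.8
SS    <- c(CarbonGrade = 1787.4, pH = 14165.4, Quantity = 3194.6, Residuals = 2244.1)
degf  <- c(4, 4, 4, 12)
MS    <- SS / degf
Fstat <- MS[1:3] / MS["Residuals"]
pval  <- pf(Fstat, df1 = degf[1:3], df2 = degf[4], lower.tail = FALSE)
round(cbind(MS = MS[1:3], F = Fstat, p = pval), 5)  # reproduces the MS, F and p-value columns
qf(0.95, 4, 12)                                     # 5% critical value, about 3.26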