AnalysisOfVariance

The Analysis of Variance
Tokelo Khalema
University of the Free State
Bloemfontein
November 02, 2012

CHAPTER 1
ANALYSIS OF VARIANCE
1.1 Introduction
The analysis of variance (commonly abbreviated as ANOVA or AOV) is a
method of investigating the variability of means between subsamples resulting
from some experiment. In its most basic form it is a multi-sample generaliza-
tion of the t-test but more complex ANOVA models depart greatly from the
two-sample t-test. Analysis of variance was first introduced in the context of
agricultural investigations by Sir Ronald A. Fisher (1890–1962), but is now
commonly used in almost all areas involving scientific research.
1.2 One-way Classification
1.2.1 Normal Theory
Suppose we carry out an experiment on N homogeneous experimental units and
observe the following measurements, y1, y2, . . . , yN . Suppose also that of the N
observations, J were randomly selected to be taken under the same experimen-
tal conditions and that overall, we had I different experimental conditions. We
shall refer to these experimental conditions as treatments. The treatments
could be any categorical or quantitative variable — species, racial group, level
of caloric intake, dietary regime, blood group, genotype, etc. We therefore see
that we could subdivide the N variates into I groups (or treatments), under each
of which there are J observations. Such an experimental design in which the
number of observations or measurements per treatment are the same is termed
a balanced design. A design need not be balanced.
Denote the jth observation under the ith treatment by yij where i = 1, . . . , I
and j = 1, . . . , J. Further, assume that Yij ∼ iidN(µi, σ2
) for all i and j. It
1

2 Chapter 1. Analysis of Variance
might be helpful to visualize the experimental points as forming an array whose
ith column represents the ith treatment and jth row represents the jth ob-
servations under all the treatments. Ordinarily, measurements taken on several
homogeneous experimental units under the same experimental conditions should
differ slightly due to some unexplained measurement errors. We assume these
measurement errors to be independent and normally distributed with mean zero
and constant but unknown variance, σ2
< ∞. That is ij ∼ iidN(0, σ2
) for all i
and j. The assumption of zero mean is natural rather than arbitrary because,
on average, any deviation from the mean in any population should average out
to zero. In the analysis of variance we are interested in the overall variability
of the µi about the grand population mean µ. This implies a fixed differential
effect αi = µi − µ (or deviation from the grand mean), due to treatment i. The
above arguments and assumptions lead us to the following linear model,
Yij = µ + αi + ij = µi + ij (1.1)
for i = 1, . . . , I and j = 1, . . . , J, which describes the underlying data-generating
process. It is easy to show that if αi, as in equation 1.1, is to be interpreted as
the differential effect of the ith treatment, then we have the following constraint,
I
i=1
αi = 0 . (1.2)
The constraint in equation 1.2 above is termed a model identification con-
dition. Without it the model we just formulated is said to be unidentifiable
1
. Different interpretations of the αi lead to different constraints and different
model parametrizations. In the sequel we shall stick to the parametrization
above. Equation 1.1 is usually referred to as the one-way fixed effects model,
or Model I. One-way because the data are classified according to one factor, viz.,
treatment and the term “fixed” arises from the fact that we have assumed the
αi to be fixed instead of random, in which case we would have had a random
effects model or Model II. Later we introduce Model II and demonstrate how
it can be used in practice.
The null hypothesis in the analysis of variance model given in equation 1.1
is that the treatment means are all equal; the alternative is that at least one
pair of means is different. That is
H0 : µi = µj , ∀i = j (1.3)
HA : µk = µl for at least one combination of values k = l.
But since µi = µ + αi, we see that µi = µj where i = j implies that αi = 0 for
all i. This gives an equivalent form of the hypotheses given above, namely
H0 : αi = 0, ∀i ∈ {1, . . . , I} (1.4)
HA : αi = 0 for at least one i ∈ {1, . . . , I}.
1Identifiability is a desirable property of models. A model is called identifiable if all its
parameters can be uniquely estimated and inferences can be drawn from it.

1.2. ONE-WAY CLASSIFICATION 3
This formulation is more commonly met with and is arguably more intuitive
—in words the null hypothesis says that there are no differential effects due to
treatments. Or simply, that there are no treatment effects. So any apparent
differences in sample means is not attributable to the treatments but to random
selection. The alternative hypothesis says that there is at least one treatment
with a differential effect—the negation of the null hypothesis.
Before we present the mathematical derivations of the analysis of variance, let
us consider one practical example of an experiment which should be recognized
as a numerical example of the more general design outlined above. This ex-
ample was taken from a classic reference by Sokal and Rohlf (1968) [1]. Sokal
tested 25 females of each of three lines of Drosophila for significant differences
in fecundity among the three lines. The first of these lines was was selected for
resistance against DDT, the second for susceptibility to DDT, and the third was
a nonselected control strain. This is a balanced design with I = 3 treatments,
J = 25 observations per treatment, and should also be recognized as Model I
since the exact nature of the treatments was determined by the experimenter.
The data are summarized in table 1.1 in which the response is the number of
eggs laid per female per day for the first 14 days of life.
We might want to compute the treatment sample means as a preliminary
check on the heterogeneity among group means. Dataset drosophila contains
the data presented in table 1.1. In R we issue the following commands:
> library(khalema)
> data(drosophila)
> attach(drosophila)
> tapply(fecundity,line,mean)
1 2 3
25.256 23.628 33.372
The first three commands should be old news by now. The first loads pack-
age khalema, the second accesses dataset drosophila and, the third makes the
variables in drosophila available on the search path. The final command com-
putes the sample mean under each of the 3 treatments.
Note that the mean under the nonselected treatment is appreciably higher than
those under the other treatments. Of interest in the analysis of variance is
whether this difference is statistically significant or just a result of noise in the
data.
In deriving a test to investigate the significance of group sample mean dif-
ferences we will need some statistics and their corresponding sampling distri-
butions. Among these are the overall average and the average under the ith
treatment denoted,
¯Y.. =
I
i=1
J
j=1
Yij/N,

Resistant Susceptible Nonselected
12.8 38.4 35.4
21.6 32.9 27.4
14.8 48.5 19.3
23.1 20.9 41.8
34.6 11.6 20.3
19.7 22.3 37.6
22.6 30.2 36.9
29.6 33.4 37.3
16.4 26.7 28.2
20.3 39.0 23.4
29.3 12.8 33.7
14.9 14.6 29.2
27.3 12.2 41.7
22.4 23.1 22.6
27.5 29.4 40.4
20.3 16.0 34.4
38.7 20.1 30.4
26.4 23.3 14.9
23.7 22.9 51.8
26.1 22.5 33.8
29.5 15.1 37.9
38.6 31.0 29.5
44.4 16.9 42.4
23.2 16.1 36.6
23.6 10.8 47.4
Table 1.1: Number of eggs laid per female per day for the 1st
14 days of life.
and
¯Yi. =
J
j=1
Yij/J,
respectively. Recall that N = IJ is the total number of observations. We deﬁne
the following statistic which should be interpreted as summarizing the total
variability in the sample,
SST =
I
i=1
J
j=1
(Yij − ¯Y..)2
.
This is called the total sum of squares. But the total variability in a sample
can be partitioned into variability within treatments and variability between
treatments. In fact, it can easily be shown that
SST = SSB + SSW (1.5)

where
SSB = J
I
i=1
( ¯Yi. − ¯Y..)2
and
SSW =
I
i=1
J
j=1
(Yij − ¯Yi.)2
denote the sum of squares between and the sum of squares within treatments
respectively. The statistic SSB summarizes variation in the sample attributable
to treatment; SSW summarizes variation attributable to error and is sometimes
written SSE. Note that under the assumption of homoscedastic variance, each
of the I terms,
J
j=1
(Yij − ¯Yi.)2
/(J − 1),
furnishes an estimate of the error variance, σ2
. It is thus reasonable to estimate
σ2
by pooling these terms together to obtain the pooled estimate of the common
variance,
s2
p =
1
I(J − 1)
I
i=1
J
j=1
(Yij − ¯Yi.)2
=
SSW
I(J − 1)
.
The reader will recall that if Yi ∼ iidN(µ, σ2
) for i = 1, . . . , n then,
(n − 1)S2
/σ2
∼ χ2
n−1, (1.6)
where
S2
=
n
i=1
(Yi − ¯Y )2
/(n − 1)
denotes the sample variance and
¯Y =
n
i=1
Yi/n
the sample mean. This now familiar result will be an important template in
proving the following theorem.

Theorem 1. Under the assumption that the random errors, ij ∼ iidN(0, σ2
),
for i = 1, . . . , I and j = 1, . . . , J, we have the following results:
1. SST/σ2
=
I
i=1
J
j=1
(Yij − ¯Y..)2
/σ2
∼ χ2
N−1, if H0 : αi = 0 ∀i is true,
2. SSW/σ2
=
I
i=1
J
j=1
(Yij − ¯Yi.)2
/σ2
∼ χ2
I(J−1), whether or not H0 is true,
3. SSB/σ2
=
I
i=1
J
j=1
( ¯Yi. − ¯Y..)2
/σ2
∼ χ2
I−1, if H0 : αi = 0 ∀i is true, and
4. SSW/σ2
and SSB/σ2
are independently distributed.
Proof. To prove the first part of the theorem we note that if H0 is true, then
we have a common mean µ under each treatment and thus Yij ∼ iidN(µ, σ2
) for
i = 1, . . . , I and j = 1, . . . , J. Accordingly,
I
i=1
J
j=1
(Yij − ¯Y..)2
/(N − 1)
denotes the sample variance of a sample of size N = IJ from a N(µ, σ2
) popu-
lation, hence using the result given in equation 1.6 concludes the proof.
For the second part we note that,
J
j=1
(Yij − ¯Yi.)2
/(J − 1)
denotes the sample variance of the ith treatment, hence, whether or not H0 is
true,
J
j=1
(Yij − ¯Yi.)2
/σ2
∼ χ2
J−1 independently for all i = 1, . . . , I.
Summing all I of these terms and using the property of the sum of independent
Chi-square random variables yields the stated result.
Further, if H0 is true, the third part results from the subtraction property of
the Chi-square distribution. Lastly, to proof the independence of...
In addition to the statistics we have defined thus far, it is customary to
define the mean square due to treatment and the mean square due to error as,
MSB = SSB/(I − 1) and,
MSW = SSW/I(J − 1),

respectively. We are now in a position to derive a test for the hypotheses
H0 : αi = 0, ∀i ∈ {1, . . . , I}
versus
HA : αi = 0 for at least one i ∈ {1, . . . , I}.
In the following theorem we use the statistics deﬁned above and their sampling
distributions to derive the generalized likelihood ratio test for H0 and HA.
Theorem 2. The generalized likelihood ratio test statistic for testing the null
hypothesis of no treatment eﬀects as in equation 1.4 is given by:
F =
MSB
MSW
,
and H0 is rejected at 100(1 − α)% if F > F1−α
I−1,I(J−1).
Proof. Recall from our earlier discussion that in addition to some distributional
assumptions we assumed the following:
Yij = µ + αi + ij,
where the restriction
I
i=1
αi = 0
is imposed on the αi. It follows then that, for i = 1, . . . , I and j = 1, . . . , J,
f(yij) =
1
σ
√
2π
exp −
1
2
yij − µ − αi
σ
2
From independence of the yij we have the following likelihood,
L(µ, αi, σ2
|y) = (2πσ2
)−IJ/2
exp



−
1
2σ2
I
i=1
J
j=1
(Yij − µ − αi)2



(1.7)
and log-likelihood
l = log L = −
IJ
2
log(2πσ2
) −
1
2σ2
I
i=1
J
j=1
Under the alternative hypothesis we have the following parameter space,
Ω = {(µ, αi, σ2
)| − ∞ < µ, αi < ∞, σ2
> 0}.

Differentiating the log-likelihood with respect to µ and equating the derivative
to zero gives,
∂l
∂µ
=
1
σ2
I
i=1
J
j=1
(Yij − µ − αi) = 0,
which implies that
ˆµΩ = ¯Y..
Once again we differentiate with respect to αi to obtain,
∂l
∂αi
=
1
σ2
J
j=1
(Yij − µ − αi) = 0.
This yields
ˆαiΩ
= ¯Yi. − ¯Y..
Finally we differentiate with respect to σ2
and proceed just as we did above.
We have,
∂l
∂σ2
= −
IJ
2σ2
+
1
2σ4
I
i=1
J
j=1
= 0,
which gives the following MLE,
ˆσ2
Ω = N−1
I
i=1
J
j=1
(Yij − ¯Yi.)2
Substituting these estimates into equation 1.7 we have the following likelihood
supremum under H1,
sup
Ω
L(µ, αi, σ2
|y) = exp −
IJ
2
·



2π
IJ
I
i=1
J
j=1
(Yij − ¯Yi.)2



−IJ/2
.
Under the null hypothesis we have one less parameter since the αi are hypoth-
esised to be zero. The parameter space is,
ω = {(µ, σ2
)| − ∞ < µ < ∞, σ2
> 0}.
In this case we maximize the following log-likelihood,
l = log L = −
IJ
2
log(2πσ2
) −
1
2σ2
I
i=1
J
j=1
(Yij − µ)2
.
It is left to the reader to show that the parameter estimates in this case are,
ˆµω = ¯Y..

and
ˆσ2
ω = N−1
I
i=1
J
j=1
(Yij − ¯Y..)2
The likelihood supremum is then given by,
sup
ω
L(µ, σ2
|y) = exp −
IJ
2
·



2π
IJ
I
i=1
J
j=1
(Yij − ¯Y..)2



−IJ/2
.
After some cancellation and the use of the identity we established earlier, the
generalized likelihood ratio test statistic takes the following form,
Λ =
sup
ω
L
sup
Ω
L
=







I
i=1
J
j=1
(Yij − ¯Y..)2
I
i=1
J
j=1
(Yij − ¯Yi.)2







−N/2
=







I
i=1
J
j=1
(Yij − ¯Yi.)2
+ J
I
i=1
( ¯Yi. − ¯Y..)2
I
i=1
J
j=1
(Yij − ¯Yi.)2







−N/2
.
The generalized likelihood ratio test rejects H0 for small values of Λ and we
see that small values of Λ correspond to large values of SSB/SSW . That is we
reject H0 if
SSB
SSW
> k
or if
F =
SSB/(I − 1)
SSW/I(J − 1)
=
MSB
MSW
> k
I(J − 1)
I − 1
= c
where c is chosen such that Pr(F > c|H0) = α, the desired type I error. But we
have already derived the null distribution of F from which we have,
c = F1−α
I−1,I(J−1)
or the 100(1−α) percentile of the F-distribution with I −1 and I(J −1) degrees
of freedom. This completes of the proof.
The reader who closely followed the foregoing proof should have been aware that
the likelihood ratio test statistic would not have been arrived at had the iden-
tiﬁcation condition not been taken into account. We see then that inferences

cannot be drawn from an unidentifiable model. In fact, this is what unidentifi-
able means in statistical literature. Cassella & Berger (1992) [2] touch lightly
on model identification.
For obvious reasons, the test just derived is called the F-test. We will proceed
to demonstrate how it can be applied in practice.
Example 1. Consider the data presented earlier in table 1.1. It is vital to test
for any significant violations of model assumptions before we draw inferences.
First let us test the validity of the constant variance assumption. Figure 1.1
affords a visual check on the group variances. There is not much reason to
believe that the constant variance assumption could be unduly flawed. The
distributions also look reasonably symmetrical, hence normal theory could be
applied safely.
Resistant Susceptible Nonselected
10
15
20
25
30
35
40
45
50
Response
Figure 1.1: Side-by-side boxplots for the Drosophila fecundity data.
We proceed with the analysis and calculate the sum of squares, mean squares,
and the F-statistic. In R the command to fit the linear model is:
> lm(fecundity~line,drosophila)
And the command,
> anova(lm(fecundity~line,drosophila))

Source of variation df SS MS F p-value
Between 2 1362.2 681.11 8.6657 0.0004
Within 72 5659.0 78.60
Total 74 7021.2
Table 1.2: Anova table for the Drosophila fecundity data.
gives the anova table. An anova table compactly summarizes the results of an
F-test.
From the table above, the F-statistic is significant at a level of 5%. Say the
p-value was not reported, as would be the case if one were not using a computer.
Then we would refer to the F table in the appendix, approximate F2,72(.97) by
F2,62(.97) and report
p-value = Pr(F ≥ 8.6657) < Pr(F ≥ 3.15) = 5%.
But before we run into conclusions we test the validity of the distributional
assumption of the random errors. To estimate these, we plug in the MLE’s of
µ and αi into equation 1.1 to obtain,
îj = Yij − ¯Y.. − ¯Yi. + ¯Y.. = Yij − ¯Yi.
for i = 1, . . . , I and j = 1, . . . , J. These are termed model residuals. By virtue
of the invariance property of maximum likelihood estimates, îj furnishes a
maximum likelihood estimate of ij. We are interested in testing whether these
residuals can be considered as Gaussian white noise. But recall that maximum
likelihood estimates are asymptotically normal. To obtain the residuals in R we
issue the command below:
> Residuals <- lm(fecundity~line,drosophila)$residuals
but this is only one of several ways to obtain model residuals in R. A look at
figure 1.2 shows that the residuals are not far from normal. In particular, the
histogram shows a sense of symmetry about zero. Hence we can safely read the
anova table and conclude that the F-test conclusively rejects the null hypothesis
of no treatment effects. In ordinary parlance this means that of the I = 3 lines,
at least one was much more or much less fecund than the rest. Figure 1.1 reveals
that the nonselected line had much more fecundity than the resistant and the
susceptible lines.
At this point we find it worthwhile to interpolate some comments on the
assumptions underlying the analysis of variance which should always be borne
in mind each time an analysis of variance is carried out. We assume that in the
model given in equation 1.1, we have,
1. normally distributed random errors ij,

q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qqq
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
−2 −1 0 1 2
−20−1001020
Theoretical Quantiles
OrderedResiduals
Residuals
Frequency
−20 0 10 20
051015
Figure 1.2: A histogram and a normal quantile-quantile plot of the model resid-
uals.
2. constant (or homoscedastic) error variance σ2
, and
3. independent random errors.
The assumption of normality is not a particularly stringent one. The F-test
has been shown to be robust against mild to moderate departures from normal-
ity, especially if the distribution is not saliently skewed. Several good tests of
normality exist in the literature. The Shapiro-Wilk test is one of those most
commonly used in practice. Its R directive is shapiro.test() and its null
hypothesis is that the sample comes from a normal parent distribution. Apply-
ing this test on the residuals from our previous example we obtain a p-value of
0.45. So the Shapiro-Wilk test conclusively accepts the hypothesis of normally
distributed random errors. You will recall from example 1 that we were quite
content with the validity of the normality assumption from the qq-plot and the
histogram created therein. In examples to follow, we shall stick to the same
diagnostic procedure with the hope that any undue departures from normality
will be noticed by the naked eye and not bother ourselves with carrying out the
normality test.
The problem of heteroscedasticity (or nonconstant variance) has slightly dif-
ferent implications depending on whether a design is balanced or otherwise. In
the former case, slightly lower p-values than actual ones will be reported; in the

latter, higher or lower p-values than actual ones will be reported according as
large σ2
i are associated with large ni, or large σ2
i are associated with small ni
(see Miller (1997) [3] pp. 89-91).
While there will usually be remedies to non-normality and heteroscedastic vari-
ance, dependence of errors will usually not be amenable to any alternative
method available to the investigator, at least if it is in the form of serial corre-
lation. Dependence due to blocking, on the other hand, can easily be handled
by adding an extra parameter to the model to represent the presence of block-
ing. We will see later how blocking can purposely be introduced to optimize an
experimental plan. It has been shown (see...) that if there is serial correlation
within (rather than across) samples, then the significance level of the F-test
will be smaller or larger than desired according as the correlation is negative or
positive. The presence of serial correlation of lag 1 can be detected by visually
inspecting plots of variate pairs (yij, yi,j+1). The hope should be not to spot
any apparent linear relationship between the lagged pairs if the F-test is to be
employed.
Outliers can also be a nuisance in applying the F-test. Since the sample mean
and variance are not robust against outliers, such outlying observations can
greatly augment the within-group mean square which in turn would render the
F−test conservative2
. Usually no transformation will remedy the situation of
outlying observations. One option to deal with outliers would be to use the
trimmed mean in the calculation of the sum of squares. Another is the use of
nonparametric methods. We discuss nonparametric methods in section 1.2.3.
Usually for a design to yield observations that have all three of the charac-
teristics enumerated above, the experimenter should ensure random allocation
of treatments. That is, experimental units must be allocated at random to the
treatments. Randomization is very critical in all of experimental design. It also
makes possible the calculation of unbiased estimates of the treatment effects.
One important concept that has thus far only received brief mention is that
of unbalanced designs. If in stead of the same number J of replicates under
each treatment we suppose that we have ni observations under treatment i,
where the ni need not be equal, then it can easily be shown that the identity in
equation 1.5 becomes
I
i=1
ni
j=1
(Yij − ¯Y..)2
=
I
i=1
ni( ¯Yi. − ¯Y..)2
+
I
i=1
ni
j=1
(Yij − ¯Yi.)2
.
Otherwise the analysis remains the same as in the balanced design and an
analogous F-test can be derived. The next example, adapted from Snedecor
& Cochran (1980) [4], illustrates points we made in the last few paragraphs
including the possibility of an unbalanced design.
Example 2. For five regions in the United States in 1977, public school ex-
penditures per pupil per state were recorded. The data are shown in table 1.3.
2A conservative test is “reluctant” to reject—i.e. it has a smaller type I error than desired.

South North Mountain
Northeast Southeast Central Central Pacific
1.33 1.66 1.16 1.74 1.76
1.26 1.37 1.07 1.78 1.75
2.33 1.21 1.25 1.39 1.60
2.10 1.21 1.11 1.28 1.69
1.44 1.19 1.15 1.88 1.42
1.55 1.48 1.15 1.27 1.60
1.89 1.19 1.16 1.67 1.56
1.88 1.26 1.40 1.24
1.86 1.30 1.51 1.45
1.99 1.74 1.35
1.53 1.16
Table 1.3: Public school expenditures per pupil per state (in $1 000).
Otherwise for R users the relevant data-frame is named pupil. The question
of interest is the same old one, namely, are the region to region expenditure
differences statistically significant or are they due to chance alone?
Figure 1.3 shows that the distribution cannot be judged to be very symmet-
rical, nor can we be overly optimistic about constant variance. Since overall,
there is not too much skewness, it is about the latter that we should be most
worried. No outliers are visible so there really is not much that calls normal
theory into question. The R command for creating the plot in figure 1.3 is
plot(expenditure~region,pupil).
We seek now for an appropriate variance stabilizing transformation. Since all
the values are nonnegative, we could try the log-transformation, or even the
square-root transformation. A plot of the log-transformed data is shown in
figure 1.4.
The log-transformed distribution does not look vaguely more symmetrical.
After a few trials, we finally take the reciprocal of the square of the observations,
which yields the plot depicted in figure 1.5.
This time the variance looks reasonably constant across treatments. A little
question mark over symmetry remains though. But there is not strong enough
skewness to warrant too much concern. To investigate this further, we create a
normal qqplot and a histogram of residuals. These are shown in figure 1.6 from
which we see a slight deviation from normality.
But earlier we pointed out that the F-test is not too sensitive to moderate
departures from normality. The anova table on the transformed response is ob-
tained by issuing the command, anova(lm(expenditure^-2~region,pupil))
in R, and is shown in table 1.4.
From table 1.4 we see a highly significant F-statistic. That is, strong evidence
suggests that expenditures vary from region to region.

Northeast Southeast S. Central N. Central M. Pacific
1.2
1.4
1.6
1.8
2
2.2
Response
Figure 1.3: Side-by-side boxplots for the public school expenditures data.
Between 4 0.78114 0.195285 11.62 0.0000
Within 43 0.72263 0.016805
Total 47 1.50377
Table 1.4: Anova table for the expenditures per pupil per state data.
1.2.2 Multiple Comparisons
Despite all its merits, the omnibus F-test is not without deficiencies of its own.
From the previous example we concluded that expenditures varied from region to
region. For all we know, such a conclusion could have been reached because only
one of the regions had a sample mean much greater or less than the rest. Usually
we would be interested in knowing which pair of groups differ significantly. The
current section addresses this problem by introducing commonly used methods
of multiple comparisons that can be used in lieu of the omnibus F-test, or
after the F-test has rejected the null hypothesis. It was shown earlier that two
treatment means, µi and µi , can be concluded to be different at level α if the

0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Log−transformedResponse
Figure 1.4: Side-by-side boxplots for the log-transformed data.
100(1 − α)% conﬁdence interval for their diﬀerence,
¯Yi. − ¯Yi . ± tν,1−α/2sp
1
ni
+
1
ni
, (1.8)
does not contain zero, or equivalently, if
| ¯Yi. − ¯Yi .| > tν,1−α/2sp
1
ni
+
1
ni
.
If all k = I
2 intervals are to be considered as a family, the statement given by
equation 1.8 above does not hold with probability 1 − α; the coverage proba-
bility, or as commonly called, the family-wise rate (FWR), will be lower. For
the special case of ni = ni = J, one commonly used remedial measure was
developed by John Tukey. He showed that the variate,
max
i,i
|( ¯Yi. − µi) − ( ¯Yi . − µi )|
sp/
√
J
,
follows the so-called Tukey studentized range distribution with parameters I and
I(J − 1), where the pooled sample variance s2
p equals the mean square of error.
If we denote the 100(1 − α) percentile of this distribution by qI,I(J−1)(α), then
we have the following probability statement,

0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Reciprocal−of−square−transfomedResponse
Figure 1.5: Side-by-side boxplots for the reciprocal-of-square-transformed data.
Pr max
i,i
|( ¯Yi. − µi) − ( ¯Yi . − µi )| ≤ qI,I(J−1)(α)sp/
√
J = 1 − α, (1.9)
from which we obtain the following family of confidence intervals of the differ-
ences µi − µi ,
¯Yi. − ¯Yi . ± qI,I(J−1)(α)sp/
√
J,
with family-wise error rate exactly equal to α. Accordingly, any pair of treat-
ment sample means will be significantly different at level α if
| ¯Yi. − ¯Yi .| > qI,I(J−1)(α)sp/
√
J.
Methods to deal with unbalanced designs have also been devised. One
method that gives very good results despite its crudity is due to Bonferroni.
From the Bonferroni equality, it can be shown that to ensure a family-wise er-
ror rate of at most α, then each of the k tests of µi = µi should be carried
out at significance level α/k. Where N =
I
i=1 ni denotes the total number of
observations, we then have the following family of confidence intervals,
¯Yi. − ¯Yi . ± t
α/2k
N−I sp
1
ni
+
1
ni
, where k =
I
2
,

q
q
q
q
q
q
qqq
q
q
q
qq
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
q
−2 −1 0 1 2
−0.2−0.10.00.10.2
OrderedResiduals
Residuals
Frequency
−0.3 −0.1 0.1 0.3
02468101214
uals.
which should have coverage probability of at least 1 − α. We call these the
Bonferroni confidence intervals. Let us consider an example.
Example 3. Since the previous example dealt with unequal sample sizes, we
employ the Bonferroni method to carry out multiple comparisons. We calculated
the pooled sample variance to be s2
p = MSE = .017. We also have k = 10
comparisons. According to Bonferroni’s method, a pair of sample means (of
sizes ni and ni ) that differ by an absolute amount greater than
.1296 × t43(.9975) ×
1
ni
+
1
ni
,
will be considered significantly different at level α = .05. Following are R
commands to compute and compactly display absolute differences of all possible
combinations of sample means in a 5 by 5 array.
> data(pupil)
> attach(pupil)
> X <- Y <- tapply(expenditure^-2,region,mean)
> diff <- abs(outer(X,Y,"-"));diff
Consider for instance, the Mountain Pacific and North Central regions. The
absolute value of their sample means difference is 0.033, which is far less than

the critical value of .164. In fact, the 99.75% confidence interval of the difference
of means can be shown to be (−0.130, 0.197) or (−0.197, 0.130), depending on
how the difference is taken. This interval obviously contains zero. So the two
regions’ levels of expenditure cannot be considered to be statistically different.
Next, let us consider the Northeast and South Central regions. Their sample
means differ by an absolute amount of 0.366, which exceeds the critical value
of .176. The corresponding confidence interval is (.190, .543) or (−.543, −.190).
The last 8 comparisons can be made similarly. The reader will find that, overall,
4 pairs are significantly different, viz., Northeast and South Central, Northeast
and Southeast, South Central and Mountain Pacific, and South Central and
North Central.
Other commonly used multiple comparison methods for unbalanced designs
include that due to Sheffé and a variant of Tukey’s method which we discussed
earlier, called the Tukey-Kramer method. Both give conservative results, as does
the Bonferroni method. Because of their “conservatism”, one should consider
using Tukey’s method whenever a balanced design is dealt with, which should
give shorter confidence intervals. Sheffé’s confidence intervals for the difference
µi − µi are given by,
¯Yi. − ¯Yi . ± sp (I − 1)Fα
I−1,N−I
1
ni
+
1
ni
,
where Fα
I−1,N−I denotes the 100(1 − α) percentile of the F-distribution with I
and N − I degrees of freedom. On applying Sheffé’s method to the expenditure
data, it is striking to see that we reach similar conclusions as those reached in
example 3 above under Bonferroni’s method. But Sheffé’s intervals are signifi-
cantly broader.
It is still not too clear whether the Tukey-Kramer method gives intervals with
coverage probability of at least 1−α or approximately 1−α. But it too gives re-
sults good enough to merit its mention. Confidence intervals under this method
are given by,
¯Yi. − ¯Yi . ± qI,N−I(α)sp
1
2
1
ni
+
1
ni
.
An abundance of other multiple comparison procedures have been proposed but
not all are good enough to enter the fray.
1.2.3 Nonparametric Methods
If the assumptions underlying the analysis of variance do not hold and no trans-
formation is available to make the F-test more applicable, nonparametric meth-
ods are often used in stead. The Kruskal-Wallis test is by far the most com-
monly used nonparametric analog of the one way analysis of variance. Unlike
the F-test, it makes no distributional assumptions about the observations; for
it to be applicable, the observations need only be independent.

In this method, we denote by Rij, the rank of yij in the combined sample of all
N =
I
i=1 ni observations. Then define
¯Ri. =
I
i=1
Rij/ni,
and
¯R.. =
I
i=1
¯Ri./N,
as the average rank score of the ith sample and the grand rank score, respec-
tively. Finally we compute the following statistic,
K =
12
N(N + 1)
I
i=1
ni( ¯Ri. − ¯R..)2
=
12
N(N + 1)
I
i=1
ni
¯R2
i. − 3(N + 1),
which has been shown to have a limiting χ2
distribution with I − 1 degrees of
freedom under the null hypothesis of equal location parameters under each of
the I groups. The null hypothesis is rejected for large values of K.
Just as in the two sample case, tied observations will be assigned average ranks.
The K-statistic defined above should perform reasonably well if there are not
too many ties. Otherwise some correction factor will have to be applied.
Example 4. Table 1.5 presents ranks of the expenditure data from example 2.
From these data we calculate a highly significant value of K = 21.83. The R
command to compute the p-value is, 1-pchisq(21.83,4).
It is well to realize the sum of squares occurring in the expression for the K-
statistic as the between-groups sum of squares in the analysis of variance. Then
the value of K can easily be calculated by performing the usual analysis of vari-
ance on the ranks and then multiplying the between-groups sum of squares by
12/N(N + 1).
The Kruskal-Wallis test has an implementation in R. However, it will usually
give a different value for K than that obtained from using the foregoing ex-
pression. This is because in calculating the statistic, R uses some weights that
will make the distribution of the K-statistic as χ2
as possible. Here are the R
commands and output for the previous example.
> kruskal.test(expenditure,region,data=pupil)
Kruskal-Wallis rank sum test
data: expenditure and region
Kruskal-Wallis chi-squared = 24.0387, df = 4, p-value = 7.846e-05

1.3. TWO-WAY CLASSIFICATION 21
South North Mountain
Northeast Southeast Central Central Pacific
18 33 6 36.5 39
13.5 20 1 40 38
47 9.5 12 21 31.5
46 9.5 2 16 35
24 8 3.5 42.5 23
29 26 3.5 15 31.5
44 8 6 34 30
42.5 13.5 22 11
41 17 27 25
45 36.5 19
28 6
Table 1.5: Ranks of the Public school expenditures data.
Since the Kruskal-Wallis test works with ranks rather than actual numerical
values of the observations, it will greatly eliminate the effect of outliers. In
practice, one will usually resort to this test if there are too many outliers in the
data, if normal theory is not applicable, or if the data are already in the form
of ranks.
1.3 Two-way Classification
1.3.1 Introduction
Up to this point we have assumed, at least tacitly, that the experiments we
deal with yield observations that can only be grouped according to one factor.
This need not be the case; several factors can be considered simultaneously.
For example, consider an experiment in which the amount of milk produced by
a hundred cows is studied. It is natural to consider breed and age-group as
possible factors in such a study. There could also be a third, and even a fourth
factor, etc., all of which are considered simultaneously. We introduce herein
methods of analyzing such experimental designs. We will only treat the case of
two factors in which case the design is called two-way analysis of variance , but
the reader should, however, be aware that the order of classification is abitrary.
In the general case we speak of N-way analysis of variance.
For the ease of reference we shall call the factors with which we deal, factor
A and factor B. It is also common in the literature to call these row and column
factors. It is natural then to speak of a treatment/column or row effect according
as the effect due to factor A or that due to factor B is referred to. Treatment
and row effects are also referred to as main efects to distinguish them from the
so-called interaction effect. We explain what interaction means shortly.

1.3.2 Normal Theory
The analysis in the two-way classification departs slightly from that in the one-
way classification as more variables come into play. In particular, the...occassions
the need to extend our notation from the previous sections. If we assume that
factor A has I levels and factor B has J levels and that in the cell determined
by level i of factor A and level j of factor B there are k observations (or repli-
cations), then we use yijk to symbolize the kth observation under such a cell.
If each of factors A and B contributes to the response variable an amount inde-
pendent of that contributed by the other, the model is termed as an additive
model and is formulated,
Yijk = µ + αi + βj + ijk, (1.10)
with identification conditions,
I
i=1
αi = 0,
and
J
j=1
βj = 0,
where i = 1, . . . , I and j = 1, . . . , J. Just as before, the random errors, ijk, are
assumed to be independently and identically normally distributed about zero
mean with constant variance σ2
.
If the contribution to the response variable by factor A depends on the
level of factor B, or conversely, then the simple additive model is not totally
representative of the design and a phenomenon called interaction is said to
exist. We introduce another variable, ij, that will represent this interaction
effect. Hence for example, 23 will be negative or positive according as factors
A and B have opposing or synergistic effects under level 2 of factor A and level
3 of factor B. This full model which takes interaction into account is given by,
Yijk = µ + αi + βj + ij + ijk, (1.11)
with identification conditions,
I
i=1
αi = 0,
J
j=1
βj = 0,
and
I
i=1
ij =
J
j=1
ij = 0,

where i = 1, . . . , I and j = 1, . . . , J.
In addition to testing the significance of the main effects in two-way analysis
of variance (or any factorial anova for that matter), there is need to also test
for interaction effects. We thus have a total of three null hypotheses to test.
In dealing with many null hypotheses we will have reason to vary our usual
notation. Specifically, we superscript each null hypothesis with a naught to
avoid confusing HA for an alternative hypothesis, for instance. That is the no
main effects null hypotheses are denoted,
H0
A : αi = 0 ∀i ∈ {1, . . . , I},
and
H0
B : βj = 0 ∀j ∈ {1, . . . , J},
and the no interaction effect null hypothesis is written,
H0
I : ij = 0 for all combinations of i and j.
In anticipation of their need ahead, we give expressions for the sums of
squares, which are a little more involved than those in the one-way layout.
Also, some identities and statistics other than the sum of sqaures which will
provide tests of the hypotheses stated above will be derived just as we did in
the one-way layout.
The next theorem constructs a generalized likelihood ratio test for H0
A, H0
B,
and H0
I .
Theorem 3. The generalized likelihood ratio test statistics for testing the null
hypotheses of no main and interaction effects are given by:
1.
FA =
MSA
MSE
,
where H0
A is rejected at 100(1 − α)% if FA > F1−α
I−1,IJ(K−1),
2.
FB =
MSB
MSE
,
where H0
B is rejected at 100(1 − α)% if FB > F1−α
J−1,IJ(K−1), and
3.
FI =
MSI
MSE
,
where H0
I is rejected at 100(1 − α)% if FI > F1−α
(I−1)(J−1),IJ(K−1).
Proof. Since a complete proof to each part of the theorem can easily span two
and half pages, we will proof the first part and leave the last two to the reader.
We have for i = 1, . . . , I, j = 1, . . . , J, and k = 1, . . . , K,
f(yijk) =
1
σ
√
2π
exp −
1
2
yijk − µ − αi − βj − ij
σ
2
.

Thus the likelihood is given by
L(µ, αi, βj, ij, σ2
|y) =(2πσ2
)−IJK/2
×
exp



−
1
2
I
i=1
J
j=1
K
k=1
Yijk − µ − αi − βj − ij
σ
2



,
from the assumption of independence. For ease of maximization we use the
log-likelihood,
l = log L = −
IJK
2
log (2πσ2
) −
1
2σ2
I
i=1
J
j=1
K
k=1
(Yijk − µ − αi − βj − ij)2
.
The parameter space under the general alternative hypothesis which states that
all effects are non-zero is denoted,
Ω = { (µ, αi, βj, ij, σ2
)| − ∞ < µ, αi, βj, ij < ∞, σ2
> 0 }.
Proceeding to find the ML estimates under Ω we have,
∂l
∂µ
=
1
σ2
I
i=1
J
j=1
K
k=1
(Yijk − µ − αi − βj − ij) = 0,
which implies that ˆµΩ = ¯Y.... Similarly, it is easily verified that
∂l
∂αi
=
1
σ2
J
j=1
K
k=1
(Yijk − µ − αi − βj − ij) = 0
implies ˆαiΩ = ¯Yi.. − ¯Y....
∂l
∂βj
=
1
σ2
I
i=1
K
k=1
(Yijk − µ − αi − βj − ij) = 0,
yields ˆβiΩ
= ¯Y.j. − ¯Y.... Likewise,
∂l
∂ ij
=
1
σ2
K
k=1
(Yijk − µ − αi − βj − ij) = 0
implies îjΩ
= ¯Yij. − ¯Yi.. − ¯Y.j. + ¯Y.... Finally
∂l
∂σ2
= −
IJK
2σ2
+
1
2σ4
I
i=1
J
j=1
K
k=1
(Yijk − µ − αi − βj − ij)2
= 0
yields
ˆσ2
Ω = N−1
I
i=1
J
j=1
K
k=1
(Yijk − ¯Yij.)2
.

These give an expression for the supremum of the likelihood under Ω, namely
sup
Ω
L(µ, αi, βj, ij, σ2
) = exp −
IJK
2
·



2π
IJK
I
i=1
J
j=1
K
k=1
(Yijk − ¯Yij.)2



−IJK/2
.
Under HA, the parameter space is given by
ωA = { (µ, βj, σ2
)| − ∞ < µ, βj < ∞, σ2
> 0 }.
Similar arguments give the following expression for the supremum of the likeli-
hood,
sup
ωA
L(µ, βj, ij, σ2
|y) = exp −
IJK
2
·



2π
IJK
I
i=1
J
j=1
K
k=1
(Yijk − ¯Y.j.)2



−IJK/2
Hence the generalized likelihood ratio is given by,
ΛA =
sup
ωA
L
sup
Ω
L
=







I
i=1
J
j=1
K
k=1
(Yijk − ¯Y.j.)2
I
i=1
J
j=1
K
k=1
(Yijk − ¯Yij.)2







−IJK/2
=







I
i=1
J
j=1
K
k=1
(Yijk − ¯Yij.)2
+ JK
I
i=1
( ¯Yi.. − ¯Y...)2
I
i=1
J
j=1
K
k=1
(Yijk − ¯Yij.)2







−IJK/2
The generalized likelihood ratio test then rejects H0
A for large values of SSA/SSW.
That is we reject H0
A if
SSA
SSW
> k,
or equivalently, if
FA =
SSA/(I − 1)
SSW/IJ(K − 1)
=
MSA
MSW
> k
IJ(K − 1)
I − 1
= c
where c is chosen such that Pr(F > c|H0
A) = α. From the distribution of
this F-statistic, which we derived earlier, it is immediately evident that c =
F1−α
I−1,IJ(K−1). This completes the proof to the ﬁrst part of the theorem. By
noting that similar restrictions have been imposed on the βj as on the αi,
one will note that it is not necessary to construct the proof to part 2 ab initio.

However, he need only permute some subscripts and use the appropriate degrees
of freedom to complete the proof. But the reader who feels unsated with the
proposed logic should convince himself by going through all the steps. The proof
to the last part can be completed similarly to the proof just presented and is
left as an exercise.
Example 5. In an experiment to test 3 types of adhesive, 45 glass to glass
specimens were set up in 3 different types of assemblies and tested for tensile
strength. The types of adhesive were, 047, 00T, and 001 and the types of assem-
blies were cross-lap, square-center, and round-center. Each of the 45 entries of
table 1.6 represents the recorded tensile strength of the glass to glass assemblies
[data from Johnson and Leone [5]]. These data can be found under dataset
glass under this book’s package.
Glass-Glass Assembly
Adhesive Cross-Lap Square-Centre Round-Center
047 16 17 13
14 23 19
19 20 14
18 16 17
19 14 21
00T 23 24 24
18 20 21
21 12 25
20 21 29
21 17 24
001 27 14 17
28 26 18
14 14 13
26 28 16
17 27 18
Table 1.6: Table of bond strength of glass-glass assembly.
Figure 1.7 shows slight symmetry, no outliers, and not enough violation of the
constant variance assumption to warrant suspicion. Not the exact same can be
said about figure 1.8 which calls the constant variance assumption into question.
At least by now we know the risk entailed by blatantly ignoring such a clear
indication of heteroscedasticity. R commands to view both figures at the same
time follow.
> data(glass)
> par(mfrow=c(1,2))
> plot(strength~adhesive+assembly)

Cross−lap Square−center Round−center
12
14
16
18
20
22
24
26
28
Response
Figure 1.7: Boxplots for the glass data plotted according to assembly type.

047 00T 001
12
14
16
18
20
22
24
26
28
Response
Figure 1.8: Boxplots for the glass data plotted according to adhesive type.
Fitting a model to the raw (i.e. untransformed) data gives significant results for
adhesives and interactions. But we might want to think twice before concluding
that these factors are indeed significant. To this end, we seek a transformation to
stabilize the variance. The square-root transformation seems to work reasonably
well for us, but it is seen to greatly upset normality. Boxplots for the transformed
data are not shown for purposes of space. A histogram and qqplot of residuals
are shown in figure 1.9.
The histogram shows a slightly ragged character, a long and fat left tail, and
lack of symmetry. The qqplot also shows gross departure from linearity. The
following set of commands will create figure 1.9.
> m2 <- lm(strength^.5~adhesive*assembly,glass)
> r <- m2$resid
> par(mfrow=c(1,2))
> qqnorm(r,ylab="Ordered Residuals",main="");qqline(r,col=2)
> hist(r,xlab="Residuals",main="")
The Shapiro-Wilk test of normality shows that we have not lost much; it gives
a p-value of 0.084, while the untransformed variable has a (slightly higher)
p-value of 0.158. We also know that the F-test is robust against departures
from normality. We therefore accept the square-root transformation as a good
compromise. Table 1.7 summarizes the results of fitting a linear model to the

q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
−2 −1 0 1 2
−1.0−0.50.00.5
OrderedResiduals
Residuals
Frequency
−1.0 0.0 0.5
02468
uals.
transformed variable. As you should have expected, the p-values are slightly
lower but the adhesives are still significant. The interactions on the other hand
are slightly short of the 5% significance level. We conclude that the type of
adhesive influences bond strength while the type of assembly does not.
Adhesive 2 1.5682 0.78409 3.4548 0.0424
assembly 2 0.0749 0.03745 0.1650 0.84854
Interaction 4 2.3003 0.57509 2.5339 0.05699
Within 36 8.1704 0.22696
Total 44 12.1138
Table 1.7: Anova table for the glass to glass assembly data.

1.3.3 Multiple Comparisons
1.3.4 Randomized Complete Blocks
Randomized blocks are a form of unreplicated two-way analysis of variance in
which the two factors forming the design are the treatment and another factor
known to have an effect on the response under investigation. This second factor
is called a block. Each block is assigned all treatments at random in such a
way that within each block, each treatment appears once and only once. A
block effect is rarely tested in practice; of primary interest is the treatment
effect since the blocks are, by assumption, already expected to have an effect.
Randomized blocks were first developed for agricultural experiments and much
of the terminology has remained unchanged. The term “block” was traditionally
understood to refer to a block of land, but with the wide appreciation and
popularity of randomized complete blocks over the years, it is now used to refer
to any factor that plays an analogous role in more recent adaptations of such
experiments.
In a study to compare the effects of I fertilizers (or treatments in the more
general case) on the yield, J blocks of land are subdivided into I homogeneous
plots and the fertilizers are allocated at random to these plots. This is a classical
problem for which the method of randomized complete blocks was developed.
Other uses of this design can be found in several other fields.
The statistical model for randomized complete block design is,
Yij = µ + αi + βj + ij,
where
I
i=1
αi =
J
j=1
βj = 0.
The sums of squares are the same as those under the two-way additive model
but with K = 1. The null hypotheses of the no treatment and no block effects
are,
H0
A : αi = 0 ∀i ∈ {1, . . . , I},
and
H0
B : βj = 0 ∀j ∈ {1, . . . , J},
respectively. But remember that only the former is of interest. In the fertilizer
experiment presented above, an experimenter will hardly be as interested in
whether block A was the most productive as he would be in whether fertilizer
II yielded the most crop.

1.4. LATIN SQUARES 31
hypotheses of no treatment and block effects are given by:
1.
FA =
MSA
MSI
,
where H0
I−1,(I−1)(J−1), and
2.
FB =
MSB
MSI
,
where H0
J−1,(I−1)(J−1).
Proof. The details of this proof are left to the reader.
1.4 Latin Squares
Latin squares arise as natural extensions of randomized complete blocks—they
are a form of three-way analysis of variance without replication. If heterogeneity
is known to be two-dimensional in some investigation, then two blocking factors
can be incorporated in an unreplicated design, effectively forming a square with
N row blocks and N column blocks. We then speak of a row effect, a column
effect, and a treatment effect. But as in randomized blocks, it is only the latter
that will be of concern to the investigator. These designs have found wide
application in industry because of their optimality and impressive performance.
A prototype of a Latin square design is an experiment in which a fertilizer
(i.e. the treatment) is to be tested at N levels on a field that is known to vary
in intrinsic fertility, say, in a north-south direction and in soil depth, say, in
an east-west direction. The field is then subdivided to form an N × N array
of subplots and the fertilizers are randomly allocated to the subplots in both
directions in such a manner that all N levels of the treatment occur once and
only once in either direction.
Let τi denote the differential effect of the ith row block, βj the differential effect
of the jth column block and, γk the differential effect of the kth treatment.
Then the statistical model is
Yijk = µ + τi + βj + γk + ijk, (1.12)
where
N
i=1
τi =
N
j=1
βj =
N
k=1
γk = 0.

Theorem 5. If we assume that the random errors, ijk ∼ iidN(0, σ2
), for i =
1, . . . , N , j = 1, . . . , N, and k = 1, . . . , N, then we have the following results:
1. SST/σ2
=
N
i=1
N
j=1
N
k=1
(Yijk − ¯Y...)2
∼ χ2
N2−1
2. SSA/σ2
= N2
N
i=1
( ¯Yi.. − ¯Y...)2
/σ2
∼ χ2
N−1
3. SSB/σ2
= N2
N
j=1
( ¯Y.j. − ¯Y...)2
/σ2
∼ χ2
N−1
4. SSC/σ2
= N2
N
j=1
( ¯Y..k − ¯Y...)2
/σ2
∼ χ2
N−1
5. SSE/σ2
=
N
i=1
N
j=1
N
k=1
(Yijk − ¯Yi.. − ¯Y.j. − ¯Y..k + 2 ¯Y...)2
∼ χ2
(N−1)(N−2)
6. The above variates are mutually independent.
Proof. We present the proof shortly....
hypotheses of no row, no column and no treatment eﬀects are given by:
1.
FA =
MSA
MSE
,
where H0
N−1,(N−1)(N−2),
2.
FB =
MSB
MSE
,
where H0
N−1,(N−1)(N−2), and
3.
FC =
MSB
MSE
,
where H0
C is rejected at 100(1 − α)% if FC > F1−α
N−1,(N−1)(N−2).
Proof. From the statistical model given in equation 1.12 we have,
f(yijk) =
1
σ
√
2π
exp −
1
2
yijk − µ − τi − βj − γk
σ
2
.

1.4. LATIN SQUARES 33
The likelihood function takes the form,
L(µ, τi, βj, γk, σ2
|y) = (2πσ2
)−N3
/2
exp



−
1
2
N
i=1
N
j=1
N
k=1
Yijk − µ − τi − βj − γk
σ
2



.
Then we have,
l = log L = −
N3
2
log(2πσ2
) −
1
2σ2
N
i=1
N
j=1
N
k=1
(Yijk − µ − τi − βj − γk)2
.
Under the hypothesis of no eﬀects, we have the following parameter space.
Ω = {(µ, τi, βj, γk, σ2
)| − ∞ < µ, τi, βj, γk < ∞, σ2
> 0}
The maximum likelihood estimates are obtained in the usual way. From
∂l
∂µ
=
1
σ2
N
i=1
N
j=1
N
k=1
(Yijk − µ − τi − βj − γk) = 0,
we have ˆµΩ = ¯Y....
∂l
∂τi
=
1
σ2
N
j=1
N
k=1
(Yijk − µ − τi − βj − γk) = 0,
∂l
∂βj
=
1
σ2
N
i=1
N
k=1
(Yijk − µ − τi − βj − γk) = 0,
∂l
∂γk
=
1
σ2
N
i=1
N
j=1
(Yijk − µ − τi − βj − γk) = 0,
Finally,
∂l
∂σ2
= −
N3
2σ2
+
1
2σ4
N
i=1
N
j=1
N
k=1
(Yijk − µ − τi − βj − γk)2
= 0,
gives,
ˆσ2
Ω = N−3
N
i=1
N
j=1
N
k=1
(Yijk − ¯Yi.. − ¯Y.j. − ¯Y..k + 2 ¯Y...)2
.
Putting everything together we obtain,
sup
Ω
L = exp −
N3
2
·



2π
N3
N
i=1
N
j=1
N
k=1
(Yijk − ¯Yi.. − ¯Y.j. − ¯Y..k + 2 ¯Y...)2



−N3
/2
.

The parameter space under the first null hypothesis HA is,
ωA = {(µ, βj, γk, σ2
)| − ∞ < µ, βj, γk < ∞, σ2
> 0}.
Similar arguments to those above give the following supremum under ωA,
sup
ωA
L = exp −
N3
2
·



2π
N3
N
i=1
N
j=1
N
k=1
(Yijk − ¯Y.j. − ¯Y..k + ¯Y...)2



−N3
/2
.
ΛA =
sup
ωA
L
sup
Ω
L
=







N
i=1
N
j=1
N
k=1
(Yijk − ¯Y.j. − ¯Y..k + ¯Y...)2
N
i=1
N
j=1
N
k=1
(Yijk − ¯Yi.. − ¯Y.j. − ¯Y..k + 2 ¯Y...)2







−N3
/2
It can be shown that the numerator sum of squares can be decomposed to give,







N
i=1
N
j=1
N
k=1
( ¯Yi.. − ¯Y...)2
+
N
i=1
N
j=1
N
k=1
(Yijk − ¯Y.j. − ¯Y..k + ¯Y...)2
N
i=1
N
j=1
N
k=1
(Yijk − ¯Yi.. − ¯Y.j. − ¯Y..k + 2 ¯Y...)2







−N3
/2
.
The term in brackets simplifies to 1+SSA/SSE, hence the generalized likelihood
ratio test rejects HA for large values of SSA/SSE, or equivalently, if
FA =
SSA/(N − 1)
SSE/(N − 1)(N − 2)
=
MSA
MSE
> c ,
where it is easily verified that
c = F1−α
N−1,(N−1)(N−2).
Example 6.
1.5 Summary and Addenda

1.6. EXERCISES 35
Carbon Grade 4 1787.4 446.8 2.3894 0.10888
pH 4 14165.4 3541.3 18.9370 0.00004
Quantity 4 3194.6 798.6 4.2706 0.02233
Residuals 12 2244.1 187.0
Total 24 21391.5
Table 1.8: Anova table for the puriﬁcation process data.
1.6 Exercises
1. Show that...Just a template
2. Given that
Yijk = µ + αi + βj + ij + ijk,
show that...

BIBLIOGRAPHY
[1] Sokal, R. R., and Rohlf, F. J. (1968). Biometry: The principles and practice
of statistics in biological research. Freeman
[2] Cassella, B. and Berger, R. L. (1992). Statistical Inference. Duxbury
[3] Miller, R. G., Jr. (1997). Beyond Anova: Basics of applied statistics. Chap-
man & Hall
[4] Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods. Iowa State
[5] Johnson, N. L. and Leone, F. C. (1964). Statistics and Experimental Design:
in Engineering and the Physical Sciences. volume II. Wiley
[6] Development Core Team (2012). R: A language and environment for statis-
tical computing. R Foundation for Statistical Computing, Vienna, Austria.
ISBN 3-900051-07-0, URLhttp://www.R-project.org/.
37

AnalysisOfVariance

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to AnalysisOfVariance

Similar to AnalysisOfVariance (20)

AnalysisOfVariance