MTH 201: Biometry
Lecture Notes
October 2013
Department of Biometry and Mathematics
Faculty of Science
Sokoine University of Agriculture
Kassile, T.
Office Room # 9, KEPA, SMC, Mazimbu
Table of Contents
Preface
0.1 Preface
0.1.1 Course Objective
0.1.2 Course Description
0.1.3 Pre-requisite
0.1.4 Course requirement
0.1.5 Computing
0.1.6 References
1 Terminologies in Experimental Designs
1.1 Introduction
1.1.1 Definitions
2 Principles of experimental designs
2.1 Introduction
2.2 Randomization principle
2.3 Replication principle
2.4 Local control principle
3 Analysis of Variance
3.1 Introduction
3.2 Assumptions in the analysis of variance
3.3 Analysis of variance for one-way classification
3.3.1 Analysis of variance for one-way classification with unequal replication (unbalanced data)
3.3.2 Linear additive model for one-way classification
3.3.3 Fixed vs. random effects
3.3.4 Calculation of sums of squares
3.3.5 ANOVA for one-way classification with equal replication (balanced data)
3.4 ANOVA for two-way classification (Without Replication)
3.4.1 Linear additive model for two-way classification
3.5 The least significant difference (LSD)
4 Introduction to SPSS
4.1 Introduction
4.2 Starting SPSS
4.3 Data entry
4.4 Keying data into SPSS
4.4.1 Osteopathic manipulation data set
4.5 Opening an existing dataset
4.6 Importing data
4.7 Exporting data
4.8 ANOVA for one-way classification in SPSS
5 Completely Randomized Design
5.1 Introduction
5.2 Layout
5.3 Statistical analysis
5.3.1 Statistical hypotheses
5.3.2 Test procedure
5.4 Advantages and disadvantages of CRD
5.4.1 Advantages
5.4.2 Disadvantages
5.5 Example
6 Randomised Block Design
6.1 Introduction
6.2 Layout
6.3 Statistical analysis
6.3.1 Statistical hypotheses
6.3.2 Test procedure
6.4 Advantages and disadvantages of RBD
6.4.1 Advantages
6.4.2 Disadvantages
6.5 Example
6.6 Reasons for blocking in RBD
7 Latin Square Design
7.1 Introduction
7.2 Layout
7.2.1 Linear additive model
7.3 Statistical analysis
7.3.1 Calculation of sums of squares
7.4 Advantages and disadvantages of LSD
7.4.1 Advantages
7.4.2 Disadvantages
7.5 Example
8 Factorial Experiments
8.1 Introduction
8.2 Main effects and interaction effects
8.3 The $2^2$ factorial experiments
8.4 The $2^3$ factorial experiments
8.5 Sum of squares due to factorial effects
8.6 Tests of significance of factorial effects
8.7 Yates’ method of computing factorial effect totals
9 Multiple Comparisons
9.1 Introduction
9.2 Multiple comparisons procedures
9.2.1 Duncan’s new multiple range-test
10 Simple Linear Regression and Correlation
10.1 Simple linear regression
10.1.1 Fitting a simple linear regression model: the method of least squares
10.1.2 Assessing the fitted regression
10.1.3 Confidence intervals for regression parameters
10.2 Correlation analysis
10.2.1 Karl Pearson’s correlation coefficient (r) (ref: MTH 106)
10.2.2 Spearman’s coefficient of Rank correlation
11 Data Transformation
11.1 Introduction
11.2 Parameters of normal distribution
11.2.1 Shape of the normal distribution
11.3 Reasons for data transformation
11.4 Testing for normality
11.5 Common data transformations
12 Analysis of Frequency Data
12.1 Introduction
12.2 Objective of two-way classification
12.3 The Chi-square test of independence
13 Review Exercises
13.1 Exercise I
13.2 Exercise II
13.3 Exercise III
13.4 Exercise IV
13.5 Exercise V
13.6 Exercise VI
13.7 Exercise VII
13.8 Exercise VIII
13.9 Exercise IX
0.1 Preface
0.1.1 Course Objective
The course focuses on the application of statistical and mathematical techniques to
problems in agricultural, environmental, and biological sciences. It is concerned with
the design of experiments and the analysis and interpretation of results.
0.1.2 Course Description
Principles of experimental designs, analysis of variance (ANOVA): one way classifica-
tion, e.g., completely randomised design (balanced and unbalanced data), multiway
classification, e.g., randomised complete block design, Latin square design; factorial
experiments. Multiple comparisons, data transformation; simple linear regression
and correlation, analysis of frequency data e.g., contingency tables.
0.1.3 Pre-requisite
MTH 106-Introductory Statistics.
0.1.4 Course requirement
I: Coursework: 2 quizzes and 2 tests, contributing 40% of the total credits allotted
to this course.
II: Final (End of Semester) Exam: contributes 60%.
0.1.5 Computing
Where necessary, for illustration purposes, the Statistical Package for the Social
Sciences (SPSS) and the SAS software packages will be frequently used. However,
use of SPSS or SAS in the course is considered optional.
0.1.6 References
Cody, R.P. and Smith, J.K. (1997). Applied Statistics and the SAS Programming
Language. Fourth Ed. Prentice Hall.
Der, G. and Everitt, B.S. (2002). A Handbook of Statistical Analyses using SAS.
Second Ed. Chapman & Hall/CRC.
Montgomery, D. (2001). Introduction to Linear Regression Analysis. Wiley and
Sons, Inc.
Montgomery, D. (2001). Design and Analysis of Experiments. Wiley and Sons, Inc.
Neter, J., Kutner, M., Nachtsheim, C. J. and Wasserman, W. (1996). Applied
Linear Statistical Models. Irwin, Chicago.
Chapter 1
Terminologies in Experimental
Designs
1.1 Introduction
Real-life scientific investigations often involve comparisons between several sets of
data collected from basically similar conditions, e.g., groups of plants of the same
type which have been grown under conditions alike except that different fertilizers
were used for each group, different doses of a drug administered to the same or
different groups of patients, different rations given to a group of homogeneous
animals, the same instructor teaching students with different backgrounds in
the subject being taught, etc.
Many types of biological data are collected through planned (well designed) experi-
ments. Designing an experiment requires adherence to some rules or principles and
procedures if valid conclusions are to be drawn. For example, the data from an ex-
periment set up according to a particular design should be analysed according to the
appropriate procedure for that design. No type of statistical method, no matter how
sophisticated, can compensate for a poorly designed study or improve the quality
of results obtained from an improperly designed experiment. Thus, an important
aspect in this respect is that design of experiment determines the quality of the
results! Before we embark on the contents of the course, let us first, briefly distin-
guish between biometry and some related specializations/fields within the statistics
domain.
1.1.1 Definitions
Biometry: As alluded to above, biometry is a subject concerned with the
application of statistics and mathematics to problems in the agricultural,
environmental, and biological sciences. Hence, biometrics is the application of
statistics and mathematics to problems with a biological component, including
problems in agricultural, environmental, and biological sciences as well as medical
science. These include statistical methods, computational biology, applied
mathematics, and mathematical modelling.
Biostatistics: a field of study concerned with the application of statistics to the
biological sciences, especially those relating to medical sciences. Medical
colleges/universities (for example, Muhimbili University of Health and Allied
Sciences-MUHAS, International Medical and Technological University-IMTU,
Kilimanjaro Christian Medical College-KCMC, Catholic University of Health and Allied
Sciences-CUHAS, and so on) often offer biostatistics as one of the core courses for
students enrolled in various degree programmes with a major in medical sciences.
Described below are key terminologies in the notion of experimental designs.
Experiment: an investigation set up to provide answers to a question or questions
of interest. For example, we may wish to conduct an experiment to test the
efficacy of a newly developed drug for curing a certain skin condition in humans
or animals. We may also conduct an experiment to investigate whether or not three
varieties of feed differ in terms of the amount of milk produced per day.
In this context, an experiment is more likely to involve comparison of treatments
(defined below) e.g., drugs, rations, methods, varieties, fertilizers, etc. However, in
some cases experiments do not involve comparison of one treatment with another
treatment. Hence, experiments can be absolute or comparative. If we conduct
an experiment to examine the usefulness of a newly developed drug for curing a
certain skin condition in animals without comparing its effect with other drugs, the
experiment will be an absolute experiment. On the other hand, if we conduct an
experiment to assess the effectiveness of one drug as compared to the effects of other
drugs, the experiment is said to be a comparative experiment.
Experimental design or designing of an experiment: a design is a plan for
obtaining relevant information to answer the research question of interest. In other
words, we define designing of an experiment as the complete sequence of steps laid
down in advance to ensure that the maximum amount of information relevant to the
problem under investigation will be collected.
Treatment or treatment combination: a procedure whose effect is to be measured
and compared with other procedures. For example, in a dietary or medical
experiment, the different diets or medicines are the treatments; in an agricultural
experiment, the different varieties of a crop or the different fertilizers are the
treatments.
Experimental unit: the unit of experimental material to which one application of
the treatment is made and on which the variable under study is measured. Equivalently,
an experimental unit is that unit to which a single treatment (which may be a
combination of many factors, as in factorial experiments) is applied in one
replication of the basic experiment.
Examples
In an agricultural experiment, the plot of land is the experimental unit; in a
dietary experiment, the whole animal is the experimental unit; in medical
experiments for which treatments (or medications) are assigned to individuals and
effects measured, the individual is the experimental unit.
Response (yield/outcome): is a result observed for a particular experimental
unit.
Examples
One may be interested to know the amount of a crop (in kg) produced when different
types of fertilizers are applied to a piece of land, or number of students who pass
MTH 201 when different instructors are used for each degree programme taking the
course, or the amount of milk (in litres) that will be produced when different types of
feeds are given to a group of supposedly homogeneous cows, or the number of customers
who will visit a particular supermarket in Dar es Salaam when different marketing
strategies are used by the company operating the supermarket.
Exercise
In the agricultural field experiment of assessing the effects of different varieties of
fertilizers on crop production described above to illustrate the notion of response
identify:
i. the experimental unit;
ii. the treatments; and
iii. the response or yield or outcome.
Factor: a variable that is believed to affect the outcome of an experiment, e.g.
humidity, pressure, time, concentration, etc.
Level: the various values or classifications of a factor are known as the levels of
the factor. For example, suppose we wish to compare the efficacy of three
medications (M1, M2, and M3) for lowering blood pressure among middle-aged women;
here there are three levels of the factor medication. Assume also that a
researcher is interested in comparing four different doses (D1, D2, D3 and D4) of a
drug administered to rats of the same type; here there are four levels of the factor drug.
Experimental error: is a measure of the variation among experimental units that
measures mainly inherent variation among them. Thus, experimental error is a
technical term and does not mean a mistake, but includes all types of extraneous
variation due to:
i. inherent variability in the experimental units;
ii. error associated with the measurements made; and
iii. lack of representativeness of the sample to the population under study.
Therefore, for the above reasons, particularly the first one, we cannot completely
control experimental error, but we can always think of how to reduce it. Some
variation among experimental units cannot be avoided in practice: some sources of
variation are controllable, and some are beyond the control of the experimenter. If
we can control the magnitude of experimental error, we will be in a better position
to detect differences among treatments if they really exist.
Exercises
1 Suppose the following experiment is conducted, with the aim of comparing three
feeds (I, II, III) in cows. Three cows are obtained. One cow is given feed I, another
feed II and the last cow feed III. 300 observations are taken on each cow.
i. What is the experimental unit?
ii. What are the treatments?
iii. How many replicates per treatment? (to be answered later)
2 An experiment is to be undertaken to compare growth patterns obtained in mice
given three different types of drug. The drug may be administered orally or by
injection. 72 identical mice are available for study. Two different experimental plans
are proposed:
(i) The 72 mice are to be allocated to 12 cages, 6 mice per cage. Each cage is assigned
at random to one of the three drugs, 4 cages per drug. For each cage, the drug is
administered to the animals within the cage by mixing it into the daily shared food
supply for the 6 mice.
(ii) The 72 mice are to be allocated to 12 cages, 6 mice per cage. Within each cage,
each mouse is assigned to receive one of the drugs by daily injection, 2 mice per drug
in each cage.
i. What are the treatments under investigation?
ii. In each of plans (i) and (ii), identify the experimental units.
Chapter 2
Principles of experimental
designs
2.1 Introduction
Designing an experiment to obtain relevant data in a way that permits objective
analysis leading to valid inferences/conclusions with respect to the problem(s)
under investigation is often a challenging step in practice. Correctly identifying the
relevant experimental units, their size or number, and the way the treatments are
assigned to the experimental units are some of the most important aspects of design
of experiments. In this section we describe the principles that, depending on the
design chosen, must be adhered to when planning an experiment to answer a specific
problem. There are three main principles of experimental designs, namely:
i. Randomisation;
ii. Replication; and
iii. Error/local control
2.2 Randomization principle
Randomisation is an essential principle in experimental design. It involves the
assignment of treatments to the experimental units, based on the chosen design, by
some chance mechanism or probabilistic procedure, e.g., random numbers, so that each
experimental unit has the same chance of receiving any one of the treatments under
study. Conscious allocation of the treatments to the experimental units has been
criticised by many researchers; in fact, results from studies that did not allocate
treatments to the experimental units at random are often regarded as unreliable and
contribute little to the literature. Briefly speaking, randomisation is the use of a
known, understood probabilistic procedure for the assignment of treatments to
experimental units.
As we will discuss later in the course, randomisation, being an important principle
of experimental designs, is utilised in all the designs covered in this course.
Question
Why do we really need to adhere to this principle?
Recap: as we explained in Chapter 1, a treatment is a procedure whose effect is to be
measured and compared with other procedures.
Goal: Since our intention is to measure and compare the effect of one treatment with
that of other treatment(s), one obvious goal of randomisation is to ensure that no
treatment is somehow favoured or handicapped. Randomisation ensures that observations
represent random samples (independence of observations) from the population of
interest. This ensures the validity of the statistical methods, leading to valid
conclusions/inferences.
Illustration
The following example illustrates the importance of randomisation. A study is to
be conducted to compare the efficacies of two drugs (I and II) for treating a certain
skin condition. It is decided that patients will be assigned to drug I if they have
had previous outbreaks of the condition and to drug II if the current outbreak of the
condition is the first for the patient. Comment on this experimental design. If you
feel that the design has drawbacks, state how you would improve it.
From our discussion above, clearly this design lacks an important ingredient:
randomisation. The design allocates the drugs depending on whether the patient has
had a previous outbreak of the condition. This is not a proper way of assigning
treatments to experimental units. It may be, for example, that patients with
first-time outbreaks are more or less difficult to treat than those with repeat
outbreaks. This may put one drug or the other at a disadvantage in the evaluation of
efficacy.
How to improve?
A better design would be one that assigns patients at random to the drugs regardless
of outbreak status. An even better design would be one which assigns patients with
first-time outbreaks to each drug randomly, and similarly for patients with repeat
outbreaks, so that each drug is seen on patients of each type.
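The two improved designs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patient identifiers, the function names and the seed are hypothetical and are not part of the course material. The first function assigns patients to drugs completely at random; the second randomises separately within each outbreak group so that each drug is seen on patients of each type.

```python
import random

def assign_at_random(units, treatments, seed=None):
    """Completely random assignment: shuffle the units, then deal the
    treatments out in turn so that group sizes are as equal as possible."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)]
            for i, unit in enumerate(shuffled)}

def assign_within_strata(strata, treatments, seed=None):
    """Randomise separately within each stratum (here: outbreak status),
    so that every treatment appears in every stratum."""
    rng = random.Random(seed)
    allocation = {}
    for units in strata.values():
        shuffled = list(units)
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            allocation[unit] = treatments[i % len(treatments)]
    return allocation

# Hypothetical patient identifiers, for illustration only.
strata = {
    "first_time": ["P01", "P02", "P03", "P04"],
    "repeat": ["P05", "P06", "P07", "P08"],
}
drugs = ["I", "II"]

allocation = assign_within_strata(strata, drugs, seed=2013)
for patient in sorted(allocation):
    print(patient, "-> drug", allocation[patient])
```

With four patients in each group and two drugs, every run places two first-time and two repeat patients on each drug; only which patients they are is left to chance.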
2.3 Replication principle
The term replication refers to the number of experimental units on each treatment.
A treatment is said to be replicated if it is applied to more than one experimental
unit. Literally speaking, replication means the number of times a treatment appears
on experimental units.
Question
What do we replicate and why? The first part of this question is answered above:
we replicate treatments on experimental units. Perhaps of most interest here, at
least in my view, is the question of why we need to replicate the treatments.
We repeat a treatment a number of times in order to obtain a more reliable estimate
than is possible from a single observation. If you recall our discussion of
statistical inference in MTH 106, we mentioned that the sample size n is a key factor
in determining precision and power. This is the case because if we increase the
sample size, we decrease the magnitude of $s_{\bar{D}}$, which is a measure of how
precisely we can estimate the difference and which determines the size of our test
statistic (and thus the power of the test). In the context of experimental design,
the number of replicates per treatment is also a key factor in determining precision
and power.
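The effect of replication on precision can be illustrated numerically. The sketch below assumes that $s_{\bar{D}}$ denotes the standard error of the difference between two treatment means, that both treatments share a common standard deviation s, and that each has n replicates, so that $s_{\bar{D}} = s\sqrt{2/n}$. The value of s is hypothetical.

```python
import math

def se_difference(s, n):
    """Standard error of the difference between two treatment means,
    assuming a common standard deviation s and n replicates per treatment."""
    return s * math.sqrt(2.0 / n)

s = 4.0  # hypothetical common standard deviation
for n in (2, 4, 8, 16, 32):
    print(f"n = {n:2d}   s_D = {se_difference(s, n):.3f}")
```

Under these assumptions, quadrupling the number of replicates halves the standard error, which is why too few replicates leave real treatment differences hard to detect.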
Example
Suppose an experiment is conducted with the goal of comparing the effects of two
diets on weight in sheep. Two sheep are available for experimentation. One sheep is
given diet A, the other diet B. 400 observations are taken on each sheep.
In this example very little can be learned about how the treatments compare in the
population of such sheep. We have only one sheep on each treatment (diet); thus,
we do not know if observed differences we might see are due to a real difference
in the treatments we are comparing or just the fact that these two sheep are quite
different. This is perhaps a contrived example, but it illustrates a general point of
why replication is an important principle in experimental designs.
Practical advice
If we have a fixed number of experimental units available for experimentation, an
obvious challenge is how to make the best use of them for detecting treatment dif-
ferences. In this situation of limited resources we would be better off with a few
treatments with lots of replicates on each treatment rather than many treatments
with fewer replicates on each.
Thus, if limited resources are available, it is better to reduce the number of treat-
ments to be considered or postpone the experiment rather than to proceed with too
few replicates. So randomisation plus replication will be necessary for the validity
of the experiment.
Exercise
In your own words explain why you think replication is an important concept to
keep in mind when designing an experiment.
2.4 Local control principle
Experimental design is founded on the principle of reducing what we regard as
experimental error by meaningful grouping of experimental units into small
non-overlapping groups. As we discussed above, we cannot eliminate inherent
variability completely, but if we are careful about what we consider to be inherent
variability, we should be in a position to separate systematic variation among
experimental units from inherent variation and hence achieve the stated goal(s) of
the experiment.
Local control refers to techniques for reducing the error variance. One such measure
is to make experimental units homogeneous, i.e. to form units into several
homogeneous groups called blocks. This is done particularly in situations where the
experimental units are assumed to be non-homogeneous. Thus, to reduce the magnitude
of experimental error one needs to group the experimental units.
It should be understood that in order to detect treatment differences if they really
exist, we must strive to control the effects of experimental error, so that any
variation we observe can be attributed mainly to the effects of the treatments we are
comparing rather than to differences among the experimental units to which the
treatments are applied.
Summary: From what we have discussed so far, it is clear that a good experimental
design attempts to:
i. ensure sufficient replication of treatments to experimental units; and
ii. reduce the effects of experimental error by meaningful grouping of experimental
units (application of local/error control).
Chapter 3
Analysis of Variance
3.1 Introduction
Among the most useful statistical procedures in the fields of agriculture,
economics, psychology, education, sociology, business/industry and research in
several other disciplines is the analysis of variance. This technique is particularly
used when multiple samples are involved.
Recap: The significance of the difference between the means of two samples, as
discussed in MTH 106, can be judged through either the z-test (based on the standard
normal distribution) or Student's t-test. As a reminder, one popular t-test is the
two-sample pooled t-test, used when the two unknown population variances are assumed
to be equal.
Problem: When there are more than two samples, performing all possible pairwise
comparisons becomes a wearying exercise, especially if the number of samples is
large. The analysis of variance technique enables us to compare all the means in a
single simultaneous test. Using this technique one can draw inferences about whether
the samples have been drawn from populations having the same mean.
Example
Comparison of yields of a certain crop from several varieties of seeds, the smoking
habits of six groups of SUA students and so on.
If we were to use either the z- or t-test, we would need to consider all possible
combinations of two varieties of seeds at a time, and likewise of two groups of
students. This would take some time before one arrives at a decision. In such
circumstances, one quite often uses the analysis of variance technique and through it
investigates the differences among the means of all the populations simultaneously.
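To see how quickly the pairwise approach grows, the short sketch below counts the number of two-group comparisons among k groups, which is k(k - 1)/2. It also shows, under the simplifying assumption (not made in these notes) that the separate tests are independent and each is run at the 5% level, the chance of at least one false rejection. The function names are illustrative only.

```python
import math

def pairwise_comparisons(k):
    """Number of distinct pairs among k groups: k(k - 1)/2."""
    return math.comb(k, 2)

def chance_of_false_rejection(k, alpha=0.05):
    """Chance of at least one false rejection across all pairwise tests,
    under the simplifying assumption that the tests are independent."""
    return 1.0 - (1.0 - alpha) ** pairwise_comparisons(k)

for k in (3, 6, 10):
    print(k, pairwise_comparisons(k), round(chance_of_false_rejection(k), 3))
```

For the six groups of students in the example, this gives 15 separate two-sample tests, compared with a single F-test under ANOVA.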
Acronym: The popular acronym for ANalysis Of VAriance is ANOVA.
Definition: Montgomery (2001) defines ANOVA as a procedure for testing the
difference among different groups of data for homogeneity.
Target: To partition the total amount of variation in a set of data into two compo-
nents:
i. The amount which can be attributed to chance; and
ii. The amount, which can be attributed to specified causes
If we take only one factor and investigate differences amongst its various categories
having numerous possible values, we are said to use one-way ANOVA and in case
we investigate two factors at the same time, then we use two-way ANOVA.
3.2 Assumptions in the analysis of variance
Anyone who employs the ANOVA technique has to be satisfied that the basic
assumptions underlying the technique are fulfilled if he/she is to draw valid
inferences. There are three basic assumptions underlying this approach:
i. The observations, and hence the errors, are normally distributed
ii. All observations both across and within samples, are unrelated (independent)
iii. The observations have the same variance $\sigma^2$
Important: The assumptions stated above are not necessarily true for any given
situation. In fact, they are probably never exactly true. For many data sets in
practice, they may be a reasonable approximation in which case the results will be
fairly reliable. In other cases, they may be badly violated; in this case, the resulting
conclusions may not be valid or may be misleading.
If the data really are not normal, hypothesis tests may be imperfect, leading to
invalid inferences.
Strategy: For some data it is possible to get around these issues somewhat. One
of the most commonly used approaches to deal with the problem of non-normality
of data is the so-called data transformation, an aspect which will be dealt with
later in the course.
Important: For the remainder of our discussion of analysis of variance in this and
subsequent chapters, we will assume that the above assumptions are reasonable either
on the original scale of measurement of the data or on a transformed scale. Keep in
mind at all times that these are just assumptions, and must be confirmed before the
methods may be considered suitable.
3.3 Analysis of variance for one-way classification
Under the one-way (or single factor) ANOVA, we randomly obtain the experimental
units for the experiment and randomly assign them to the treatments so that
each experimental unit is observed under one of the treatments. In this situation,
the only way in which experimental units may be classified is with respect to which
treatment they received. Basically, the experimental units are viewed as alike in
this experiment. Thus, when experimental units are thought to be alike and are
expected to exhibit only a small amount of variation from unit to unit, grouping them
would be pointless in the sense that doing so would not add much precision to the
experiment.
It can be shown that the total variation in the observed responses can be subdivided
into two components:
i. Variation due to differences among the levels of the factor (say A)
ii. The residual variation (error term)
3.3.1 Analysis of variance for one-way classification with unequal
replication (unbalanced data)
We will first consider the case where the treatments are not all replicated the same
number of times, that is, the numbers of experimental units per treatment are unequal.
Notation: To facilitate the development of methods that we will require in our
discussion, we will change slightly our notation of sample mean we discussed in
MTH 106. As we will see shortly, we will be dealing with several different types of
means for the data.
Let t denote treatment and k the number of treatments (levels). Let also $Y_{ij}$ be the
response of the jth experimental unit receiving the ith treatment level.
We will also denote the sample mean for treatment i (the mean of all plots receiving
treatment i) by
$$\bar{Y}_{i.} = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}$$
We also define
$$\bar{Y}_{..} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}}{\sum_{i=1}^{k} n_i} \quad \text{or} \quad \bar{Y}_{..} = \frac{G}{N}$$
as the grand mean yield (the sample mean of all the data) in the whole experiment,
where G is the grand total of all the observations. Note that because we have unequal
replications, the total number of observations is
$$\sum_{i=1}^{k} n_i = N$$
3.3.2 Linear additive model for one-way classification
For a one-way classification with unequal replication, we may write an individual
observation on the jth experimental unit in the ith treatment level as:
$$Y_{ij} = \mu + t_i + e_{ij}, \quad i = 1, 2, \ldots, k, \quad j = 1, 2, \ldots, n_i$$
Where:
$\mu$ = the general mean effect
$t_i$ = the effect of level i, or the ith treatment effect
$e_{ij}$ = the error term
$Y_{ij}$ is as defined above
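To make the model concrete, the following Python sketch splits each observation of a small unbalanced data set into an estimated general mean, an estimated treatment effect and a residual. The data are hypothetical, and the estimates used (grand mean for µ, treatment mean minus grand mean for the treatment effect) are the usual least-squares choices under the constraint that the replicate-weighted effects sum to zero; that constraint is an assumption of this sketch, not something stated in the notes.

```python
# Hypothetical unbalanced data: treatment label -> observed responses.
data = {
    "T1": [12.0, 14.0, 13.0],
    "T2": [18.0, 20.0],
    "T3": [9.0, 11.0, 10.0, 10.0],
}

all_obs = [y for ys in data.values() for y in ys]
N = len(all_obs)
mu_hat = sum(all_obs) / N  # estimate of the general mean

# Estimated treatment effect: treatment mean minus grand mean.
t_hat = {trt: sum(ys) / len(ys) - mu_hat for trt, ys in data.items()}

# Residual: what remains of each observation after removing mu_hat + t_hat.
e_hat = {trt: [y - (mu_hat + t_hat[trt]) for y in ys] for trt, ys in data.items()}

for trt, ys in data.items():
    for y, e in zip(ys, e_hat[trt]):
        print(f"{trt}: {y:5.1f} = {mu_hat:.3f} + ({t_hat[trt]:+.3f}) + ({e:+.3f})")

# Under the constraint used here, the weighted effects sum to zero.
weighted_effect_sum = sum(len(data[trt]) * t_hat[trt] for trt in data)
```

Each printed line is one instance of the model equation with the unknown quantities replaced by their estimates.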
Remark: In our discussion we will consider only cases where a single observation
of response is made on each experimental unit; however, it is common practice to
take more than one observation on an experimental unit.
3.3.3 Fixed vs. random effects
In the above model for one-way classification with unequal replication, $t_i$ represents
the ith treatment effect. However, the interpretation of $t_i$ may differ depending on the
situation.
To better understand the notions of fixed and random effects, consider the following
examples.
Example 1
Suppose there are three varieties of wheat for which mean yields are to be compared.
Here, we are interested in comparing 3 specific treatments. If we repeated the
experiment, these 3 varieties of wheat would always constitute the treatments of
interest.
Example 2
Suppose a factory operates a large number of machines to produce a product and
wishes to determine whether the mean yield of these machines differs. It is not
feasible for the company to keep track of all of the many machines it operates, so a
random sample of 5 such machines is selected, and observations on yield are made
on these 5 machines. The hope is that the results for the 5 machines involved in the
experiment may be generalized to gain insight into the behaviour of all the machines.
In the first example, there is a particular set of treatments of interest. If we started
the experiment next week instead of this week, we would still be interested in this
same particular set of treatments. It would not vary across other possible experiments
we might do.
In the second example, the treatments are the 5 machines from all machines at the
company, chosen at random. If we started the experiment next week instead of this
week, we might end up with a different set of 5 machines with which to do the
experiment. Here interest focuses on the population of all machines operated by
the company. The question of interest is not about the particular treatments in the
experiment, but the population of all such treatments.
We thus make the following distinction in our model:
In a case like Example 1, the $t_i$ are best regarded as fixed quantities, as they
describe a particular set of conditions. Thus, the $t_i$ are referred to as fixed effects.
In a case like Example 2, the $t_i$ are best regarded as random variables. Here the
particular treatments in the experiment may be thought of as drawn from a population
of all such treatments, so there is an element of chance involved. In this situation, the $t_i$ are
referred to as random effects.
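The distinction can be mimicked in a small simulation. In the Python sketch below, all numbers are hypothetical: the three variety effects, the population of 200 machines and the seed are choices made only for illustration. Repeating the "fixed" experiment always returns the same treatments, whereas repeating the "random" experiment draws a fresh sample of 5 machines from the population each time.

```python
import random

rng = random.Random(2013)  # fixed seed so the sketch is repeatable

# Fixed effects: the same three wheat varieties, with the same (hypothetical)
# effects, are the treatments in every repetition of the experiment.
FIXED_EFFECTS = {"variety 1": -2.0, "variety 2": 0.5, "variety 3": 1.5}

def fixed_effects_experiment():
    """Return the treatments (and their effects) used in one repetition."""
    return dict(FIXED_EFFECTS)

# Random effects: a hypothetical population of 200 machines, each with its
# own effect; every repetition draws a new random sample of 5 machines.
MACHINE_POPULATION = {f"machine {m}": rng.gauss(0.0, 1.0) for m in range(1, 201)}

def random_effects_experiment():
    """Sample 5 machines at random and return their effects."""
    chosen = rng.sample(sorted(MACHINE_POPULATION), 5)
    return {name: MACHINE_POPULATION[name] for name in chosen}

this_week, next_week = fixed_effects_experiment(), fixed_effects_experiment()
sample_1, sample_2 = random_effects_experiment(), random_effects_experiment()

print("fixed: same treatments both times:", this_week.keys() == next_week.keys())
print("random: machines in sample 1:", sorted(sample_1))
print("random: machines in sample 2:", sorted(sample_2))
```

In the random-effects case the quantity of interest is a property of the whole machine population (for example, the variance of the machine effects), not the effects of the 5 machines that happened to be drawn.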
3.3.4 Calculation of sums of squares
As we described above, the fundamental nature of the ANOVA is that the total
amount of variation in a set of data is broken down into two components, that
amount which can be attributed to chance and that amount which can be attributed
to specified causes.
Thus, based on the above linear additive model we partition the total variation in
the data as:
Total variation = Variation due to factor A (treatment) + Residual/Error term or
Total sum of squares = Sum of squares due to factor A + Sum of squares due to error
In short we have,
SST = SSA + SSE
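Algebraic facts show that the total sum of squares (SST) can be partitioned as:
$$\sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{..})^2 = \sum_{i=1}^{k} n_i (\bar{Y}_{i.} - \bar{Y}_{..})^2 + \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i.})^2$$
with $\bar{Y}_{i.}$ and $\bar{Y}_{..}$ as defined in Section 3.3.1.
Thus,
$$\text{SST} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{..})^2$$
$$\text{SSA} = \sum_{i=1}^{k} n_i (\bar{Y}_{i.} - \bar{Y}_{..})^2$$
$$\text{SSE} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i.})^2$$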
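The partition can be checked numerically. The Python sketch below computes SST, SSA and SSE directly from their definitions for a small hypothetical data set (three treatments with 3, 2 and 4 replicates; the numbers are made up) and confirms that the two components add up to the total.

```python
# Hypothetical unbalanced data: treatment label -> observed responses.
data = {
    "A": [5.0, 7.0, 6.0],
    "B": [9.0, 11.0],
    "C": [4.0, 6.0, 5.0, 5.0],
}

all_obs = [y for ys in data.values() for y in ys]
grand_mean = sum(all_obs) / len(all_obs)
means = {trt: sum(ys) / len(ys) for trt, ys in data.items()}

# Total variation about the grand mean.
sst = sum((y - grand_mean) ** 2 for y in all_obs)
# Variation of the treatment means about the grand mean, weighted by n_i.
ssa = sum(len(ys) * (means[trt] - grand_mean) ** 2 for trt, ys in data.items())
# Variation of the observations about their own treatment mean.
sse = sum((y - means[trt]) ** 2 for trt, ys in data.items() for y in ys)

print(f"SST = {sst:.4f}")
print(f"SSA + SSE = {ssa:.4f} + {sse:.4f} = {ssa + sse:.4f}")
```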
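For calculation we express the SSs as follows:
Define the correction factor (C.F) as
$$\text{C.F} = \frac{\left( \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij} \right)^2}{\sum_{i=1}^{k} n_i} = \frac{G^2}{N}$$
Here, G is the grand total, $G = \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}$, and N is as defined in Section 3.3.1.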
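It can be shown that:
$$\text{SST} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}^2 - \frac{\left( \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij} \right)^2}{\sum_{i=1}^{k} n_i} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}^2 - \frac{G^2}{N} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}^2 - \text{C.F}$$
Treatment SS or Factor A SS:
$$\text{SSA} = \sum_{i=1}^{k} \frac{Y_i^2}{n_i} - \text{C.F}$$
where $Y_i = \sum_{j=1}^{n_i} Y_{ij}$ is the total yield of all the $n_i$ plots which carried treatment i.
Error SS (SSE) = SST − SSA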
Since we have k levels of factor (A) or treatment then SSA will have k-1 independent
comparisons possible (degrees of freedom). Similarly SST will have N-1 independent
comparisons (degrees of freedom), and SSE will have (N-1)-(k-1) =N−k independent
comparisons (degrees of freedom).
We summarize the computations in a table known as the ANOVA table.
Table 3.1: One way ANOVA table with unequal replication
Source of variation (S.V)   Degrees of freedom (D.F)   Sum of squares (S.S)   Mean square (M.S)    F-ratio
Between treatments          k − 1                      SSA                    MSA = SSA/(k − 1)    MSA/MSE
Error (within treat.)       N − k                      SSE                    MSE = SSE/(N − k)
Total                       N − 1                      SST
The calculated F-value, MSA/MSE, is compared with the tabulated F-value
$F_{\alpha, [(k-1), (N-k)]}$ at the α level of significance for k − 1 and N − k degrees of freedom.
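The computational formulas and the layout of the ANOVA table translate directly into a few lines of code. The Python function below is a minimal sketch (the function name, the dictionary keys and the data are our own choices, not part of the notes); it accepts samples of unequal size and returns the entries of the table.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA quantities for a list of samples.

    `groups` is a list of lists, one list of responses per treatment;
    the samples may have different sizes.
    """
    k = len(groups)
    N = sum(len(g) for g in groups)
    G = sum(sum(g) for g in groups)                       # grand total
    cf = G ** 2 / N                                       # correction factor
    sst = sum(y ** 2 for g in groups for y in g) - cf     # total SS
    ssa = sum(sum(g) ** 2 / len(g) for g in groups) - cf  # treatment SS
    sse = sst - ssa                                       # error SS
    msa, mse = ssa / (k - 1), sse / (N - k)
    return {"df_a": k - 1, "df_e": N - k, "SSA": ssa, "SSE": sse,
            "SST": sst, "MSA": msa, "MSE": mse, "F": msa / mse}

# Hypothetical data: three treatments with 3, 2 and 4 replicates.
table = one_way_anova([[5.0, 7.0, 6.0], [9.0, 11.0], [4.0, 6.0, 5.0, 5.0]])
for name, value in table.items():
    print(f"{name:5s} {value:10.4f}")
```

The returned F value is then compared with the tabulated value for `df_a` and `df_e` degrees of freedom, which still has to be looked up in an F table.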
Statistical Hypotheses
The question of interest in this setting is to determine if the means of the different
treatment populations are different.
Mathematically we write:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$, that is, the $\mu_i$ are all equal
$H_1: \mu_i \neq \mu_j$ for at least one pair $i \neq j$, that is, the $\mu_i$ are not all equal
Or simply
Ho :There is no variation among the treatments
H1 :Variation exists
Test procedure
At level of significance α, if F > $F_{\alpha, [(k-1), (N-k)]}$ we reject the null hypothesis: there is
evidence of significant variation among the treatments. Otherwise, we do not reject the null hypothesis.
Note that the alternative hypothesis stated above does not specify the way in
which the treatment means (or deviation) differ. The best we can say based on our
statistic is that they differ somehow.
Example
Four machines are used for filling plastic bottles with a net volume of 16.0 cm³.
The quality-engineering department suspects that all four machines fill to the same net
volume, whether or not this volume is 16.0 cm³. A random sample is taken from the
output of each machine.
Table 3.2: Machine data set
                  Machines
          A        B        C        D
        16.03    16.01    16.02    16.03
        16.04    15.99    15.97    16.04
        15.96    16.03    15.96    16.00
        16.05      -      15.05    16.02
          -        -      16.04      -
Total   64.08    48.03    79.04    64.09
Assume that the measurements are approximately normally distributed, with
approximately constant variance σ². Do you think the quality-engineering department
is correct? Use α = 0.05
Statistical hypotheses:
Ho :There is no significant variation among the levels of machines
H1 :Variation exists
or
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ (all means are equal)
$H_1: \mu_i \neq \mu_j$ for at least one pair $i \neq j$ (the means are not all equal)
Calculation
Here we have 4 treatment levels (A, B, C, D).
$$N = \sum_{i=1}^{k} n_i = n_1 + n_2 + n_3 + n_4 = 4 + 3 + 5 + 4 = 16$$
Grand total:
$$G = \sum_{i=1}^{4} \sum_{j=1}^{n_i} Y_{ij} = 16.03 + 16.04 + \cdots + 16.00 + 16.02 = 255.24$$
Thus,
$$\text{C.F} = \frac{G^2}{N} = \frac{(255.24)^2}{16} = 4071.7161$$
Uncorrected total SS:
$$\sum_{i=1}^{4} \sum_{j=1}^{n_i} Y_{ij}^2 = (16.03)^2 + (16.04)^2 + \cdots + (16.00)^2 + (16.02)^2 = 4072.5976$$
Total sum of squares (SST):
$$\text{SST} = \sum_{i=1}^{4} \sum_{j=1}^{n_i} Y_{ij}^2 - \text{C.F} = 4072.5976 - 4071.7161 = 0.8815$$
Totals: A = 64.08, B = 48.03, C = 79.04, D = 64.09
Machine (treatment) sum of squares (SSM):
$$\text{SSM} = \sum_{i=1}^{k} \frac{Y_i^2}{n_i} - \text{C.F} = \frac{(64.08)^2}{4} + \frac{(48.03)^2}{3} + \frac{(79.04)^2}{5} + \frac{(64.09)^2}{4} - 4071.7161$$
$$= 4071.868245 - 4071.7161 = 0.152145$$
Error sum of squares (SSE) = Total SS − Treatment (Machine) SS:
SSE = SST − SSM = 0.8815 − 0.152145 = 0.729355
We also have k − 1 = 4 − 1 = 3 and N − k = 16 − 4 = 12, so that:
Treatment (machine) MS = 0.152145/3 = 0.051, Error MS = 0.729355/12 = 0.061,
F = 0.051/0.061 = 0.83
We summarize the computation in an analysis of variance table:
Table 3.3: ANOVA Table
Source of variation DF SS MS F- ratio
Between treatments (machines) 3 0.152145 0.051 0.83
Error(within treatments) 12 0.729355 0.061
Total 15 0.881500
To perform the hypothesis test for differences among the means of the machines, we
compare the calculated F-value (0.83) with the appropriate value from the F table.
For level of significance α = 0.05, we have $F_{0.05; 3, 12}$ = 3.49. Since 0.83 does not
exceed 3.49, we do not reject Ho and conclude that these data are consistent with the
view of the quality-engineering department: there is no significant variation among
the machines. In other words, there is no evidence that the machines fill to different net volumes.
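The hand computations for the machine data can be cross-checked with a short Python sketch. Python is not used elsewhere in these notes and the variable names are our own; the data are those of Table 3.2, and the critical value 3.49 is the tabulated value quoted above rather than something the code computes.

```python
# Machine data from Table 3.2 (unequal replication).
machines = {
    "A": [16.03, 16.04, 15.96, 16.05],
    "B": [16.01, 15.99, 16.03],
    "C": [16.02, 15.97, 15.96, 15.05, 16.04],
    "D": [16.03, 16.04, 16.00, 16.02],
}

k = len(machines)
N = sum(len(v) for v in machines.values())
G = sum(sum(v) for v in machines.values())              # grand total
cf = G ** 2 / N                                         # correction factor
sst = sum(y ** 2 for v in machines.values() for y in v) - cf
ssm = sum(sum(v) ** 2 / len(v) for v in machines.values()) - cf
sse = sst - ssm
f_ratio = (ssm / (k - 1)) / (sse / (N - k))

F_CRITICAL = 3.49  # F(0.05; 3, 12), taken from the F table as quoted in the text

print(f"G = {G:.2f}, C.F = {cf:.4f}")
print(f"SST = {sst:.4f}, SSM = {ssm:.6f}, SSE = {sse:.6f}")
print(f"F = {f_ratio:.2f}; reject Ho: {f_ratio > F_CRITICAL}")
```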
In SAS
SAS Program
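data notes;
  /* the trailing @@ holds the line so that several Machine/Volume pairs are read per line */
  input Machines $ Volume @@;
  cards;
A 16.03 A 16.04 A 15.96 A 16.05
B 16.01 B 15.99 B 16.03 C 16.02
C 15.97 C 15.96 C 15.05 C 16.04
D 16.03 D 16.04 D 16.00 D 16.02
;
run;
/* list the data as read */
proc print data=notes; run;
/* one-way ANOVA of Volume on Machines */
proc anova data=notes;
  class Machines;
  model Volume=Machines;
run; quit;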
Selected SAS outputs
Obs Machines Volume
1 A 16.03
2 A 16.04
3 A 15.96
4 A 16.05
5 B 16.01
6 B 15.99
7 B 16.03
8 C 16.02
9 C 15.97
10 C 15.96
11 C 15.05
12 C 16.04
13 D 16.03
14 D 16.04
15 D 16.00
16 D 16.02
The ANOVA Procedure
Dependent Variable: Volume
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 3 0.15214500 0.05071500 0.83 0.5004
Error 12 0.72935500 0.06077958
Corrected Total 15 0.88150000
R-Square Coeff Var Root MSE Volume Mean
0.172598 1.545433 0.246535 15.95250
Source DF Anova SS Mean Square F Value Pr > F
Machines 3 0.15214500 0.05071500 0.83 0.5004
Exercise
The following data come from an experiment conducted to investigate the effect
of 4 diets on weight gain in pigs. Nineteen pigs were randomly selected and each was
assigned at random to one of the 4 diet regimes. The data are the body weights of the pigs, in
pounds, after having been raised on the diets.
Diet 1   Diet 2   Diet 3   Diet 4
133.8    151.2    225.8    193.4
125.3    149.0    224.6    185.3
143.1    162.7    220.4    182.8
128.9    145.8    212.3    188.5
135.7    153.5      -      198.6
Assume that the measurements are approximately normally distributed, with con-
stant variance: Is there any evidence in these data to suggest that the mean weights
are different under the different diets? Use α = 0.05. Compare your ANOVA table
with the one below from SAS.
The ANOVA Procedure
Dependent Variable: BWeight
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 3 20461.40576 6820.46859 164.38 <.0001
Error 15 622.39950 41.49330
Corrected Total 18 21083.80526
R-Square Coeff Var Root MSE BWeight Mean
0.970480 3.753460 6.441529 171.6158
Source DF Anova SS Mean Square F Value Pr > F
Diets 3 20461.40576 6820.46859 164.38 <.0001
3.3.5 ANOVA for one-way classification with equal replication (balanced data)
In the above exercise diets 1, 2, and 4 are each replicated 5 times while diet 3 is
replicated 4 times. In that case, as we have discussed above, the sample mean for
treatment i is $\bar{Y}_{i.} = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}$. We now discuss the case where $n_i = n$ for all i,
i = 1, 2, ..., k.
Since each treatment is replicated the same number of times (say n), the total
number of observations is N = nk.
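Thus, with this new notation, we define the quantities $\bar{Y}_{i.}$, $\bar{Y}_{..}$, Total SS, Treatment
SS, and Error SS and their degrees of freedom as follows:
$$\bar{Y}_{i.} = \frac{1}{n} \sum_{j=1}^{n} Y_{ij}, \qquad \bar{Y}_{..} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n} Y_{ij}}{nk} \quad \text{or} \quad \bar{Y}_{..} = \frac{G}{N}$$
Total SS:
$$\text{SST} = \sum_{i=1}^{k} \sum_{j=1}^{n} Y_{ij}^2 - \frac{G^2}{N}$$
Treatment SS or Factor A SS:
$$\text{SSA} = \sum_{i=1}^{k} \frac{Y_i^2}{n} - \text{C.F}, \quad \text{where} \quad \text{C.F} = \frac{\left( \sum_{i=1}^{k} \sum_{j=1}^{n} Y_{ij} \right)^2}{nk} = \frac{G^2}{N}$$
and G is the grand total, $G = \sum_{i=1}^{k} \sum_{j=1}^{n} Y_{ij}$.
As before, Error SS (SSE) = SST − SSA.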
The degrees of freedom for Treatment SS, Error SS and Total SS, are respectively
(k-1), (N − k) or (nk-k) or k(n-1) and (N-1) or (nk-1).
Table 3.4: One way ANOVA table with equal replication
Source of variation     DF         SS     MS                     F-ratio
Between treat.          k − 1      SSA    MSA = SSA/(k − 1)      MSA/MSE
Error (within treat.)   k(n − 1)   SSE    MSE = SSE/[k(n − 1)]
Total                   N − 1      SST
The calculated F-value, MSA/MSE, is compared with the tabulated F-value
$F_{\alpha, [(k-1), k(n-1)]}$ at the α level of significance for k − 1 and k(n − 1) degrees of freedom.
Statistical Hypotheses as given above
Test procedure
At level of significance α, if F > $F_{\alpha, [(k-1), k(n-1)]}$ we reject the null hypothesis: there is
evidence of significant variation among the treatments. Otherwise, we do not reject the null hypothesis.
Example
The following data record the length of pea sections, in ocular units (×0.114 mm),
grown in tissue culture with auxin present. The purpose of the experiment was to
test the effects of the addition of various sugars on growth as measured by length.
Pea plants were randomly assigned to one of 5 treatment groups: control (no sugar
added), 2% glucose added, 2% fructose added, 1% glucose + 1% fructose added, and
2% sucrose added. 10 observations were obtained for each group of plants.
Replicate   Control   2% glucose   2% fructose   1% glucose + 1% fructose   2% sucrose
1           75        57           58            58                         62
2           67        58           61            59                         66
3           70        60           56            58                         65
4           75        59           58            61                         63
5           65        62           57            57                         64
6           71        60           56            56                         62
7           67        60           61            58                         65
8           67        57           60            57                         65
9           76        59           57            57                         62
10          68        61           58            59                         67
Total       701       593          582           580                        641
We assume that the measurements are approximately normally distributed, with the
same variance σ². Use α = 0.05 and perform the relevant hypothesis test on these
data.
Calculations show that (check):
$$\text{C.F} = \frac{\left( \sum_{i=1}^{k} \sum_{j=1}^{n} Y_{ij} \right)^2}{nk} = \frac{(701 + \cdots + 641)^2}{10 \times 5} = \frac{(3097)^2}{50} = 191828.18$$
$$\sum_{i=1}^{k} \sum_{j=1}^{n} Y_{ij}^2 = 75^2 + 67^2 + \cdots + 67^2 = 193151.00$$
Thus,
$$\text{Total SS} = \sum_{i=1}^{k} \sum_{j=1}^{n} Y_{ij}^2 - \frac{G^2}{N} = 193151.00 - 191828.18 = 1322.82$$
$$\text{Treatment SS} = \sum_{i=1}^{k} \frac{Y_i^2}{n} - \text{C.F} = \frac{701^2 + \cdots + 641^2}{10} - 191828.18 = 192905.50 - 191828.18 = 1077.32$$
Error SS = Total SS − Treatment SS = 1322.82 − 1077.32 = 245.50
We also have k − 1 = 5 − 1 = 4 and k(n − 1) = 5(10 − 1) = 45, so that
Treatment MS = 1077.32/4 = 269.33, Error MS = 245.50/45 = 5.46, F = 269.33/5.46 = 49.33
We summarize the computations in an analysis of variance table:
Table 3.5: ANOVA table-Pea section data
Source of variation DF SS MS F- ratio
Between treatments 4 1077.32 269.33 49.33
Error (within treatments) 45 245.50 5.46
Total 49 1322.82
F0.05; 4, 45 = 2.61
Comparing the calculated F value (49.33) with the F value from F table (2.61) at
0.05 level of significance we see that 49.33 >F0.05; 4, 45=2.61. We thus reject H0.
There is evidence in these data to suggest that the mean lengths of pea sections are
different depending upon which sugar was added.
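The pea section computations can be reproduced with a short Python sketch using the balanced-data formulas (N = nk). The variable names are our own and the critical value 2.61 is the tabulated value quoted in the text. Note that the code carries the mean squares at full precision, so its F ratio (about 49.37) differs slightly from the 49.33 obtained by hand from the rounded mean squares; the conclusion is unaffected.

```python
# Pea section data: one list of n = 10 lengths per treatment group.
pea = {
    "control":                  [75, 67, 70, 75, 65, 71, 67, 67, 76, 68],
    "2% glucose":               [57, 58, 60, 59, 62, 60, 60, 57, 59, 61],
    "2% fructose":              [58, 61, 56, 58, 57, 56, 61, 60, 57, 58],
    "1% glucose + 1% fructose": [58, 59, 58, 61, 57, 56, 58, 57, 57, 59],
    "2% sucrose":               [62, 66, 65, 63, 64, 62, 65, 65, 62, 67],
}

k = len(pea)                      # number of treatments
n = len(pea["control"])           # common number of replicates
G = sum(sum(v) for v in pea.values())
cf = G ** 2 / (n * k)             # correction factor
sst = sum(y ** 2 for v in pea.values() for y in v) - cf
ssa = sum(sum(v) ** 2 for v in pea.values()) / n - cf
sse = sst - ssa
msa, mse = ssa / (k - 1), sse / (k * (n - 1))
f_ratio = msa / mse

F_CRITICAL = 2.61  # tabulated value quoted in the text

print(f"C.F = {cf:.2f}, SST = {sst:.2f}, SSA = {ssa:.2f}, SSE = {sse:.2f}")
print(f"MSA = {msa:.2f}, MSE = {mse:.2f}, F = {f_ratio:.2f}")
print("reject Ho:", f_ratio > F_CRITICAL)
```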
3.4 ANOVA for two-way classification (Without Replication)
As the name suggests, two-way classification means that the data are classified on the
basis of two factors, and the two-way ANOVA technique is used to analyse such data.
Suppose the two factors are A and B which have h and g levels respectively in an
experiment without replication. Using the ANOVA technique we can partition the
variation of the response about their mean into three different components.
3.4.1 Linear additive model for two-way classification
For two-way classification without replication, let $Y_{ij}$ be the response for the ith
level of factor A and the jth level of factor B. Thus, the model can be written as:
$$Y_{ij} = \mu + t_i + b_j + e_{ij}, \quad i = 1, 2, \ldots, h; \quad j = 1, 2, \ldots, g$$
Where:
$\mu$ is the overall mean
$t_i$ is the effect of level i for factor A
$b_j$ is the effect of level j for factor B and $e_{ij}$ is the residual (error term).
The ANOVA technique allows us to partition the total SS as:
Total SS = Factor A SS + Factor B SS + Residual SS
or simply,
SST= SSA+SSB + SSE
As in one-way classification, the short methods of computing sums of squares are
given as follows:
Let N (= hg) be the total number of experimental observations.
Let G be the sum of yields over all the N (= hg) plots, so that
$$G = \sum_{i=1}^{h} \sum_{j=1}^{g} Y_{ij}$$
Correction factor (C.F):
$$\text{C.F} = \frac{G^2}{N} = \frac{\left( \sum_{i=1}^{h} \sum_{j=1}^{g} Y_{ij} \right)^2}{hg}$$
Total SS (SST):
$$\text{SST} = \sum_{i=1}^{h} \sum_{j=1}^{g} Y_{ij}^2 - \text{C.F}$$
Factor A SS (SSA):
$$\text{SSA} = \frac{1}{g} \sum_{i=1}^{h} Y_i^2 - \text{C.F}$$
where $Y_i = \sum_{j=1}^{g} Y_{ij}$ is the total yield of all the g plots which carried treatment i.
Factor B SS (SSB):
$$\text{SSB} = \frac{1}{h} \sum_{j=1}^{g} Y_j^2 - \text{C.F}$$
where $Y_j = \sum_{i=1}^{h} Y_{ij}$ is the total yield of all the h plots which carried treatment j.
Error SS (SSE) = SST − (SSA + SSB) = SST − SSA − SSB
Table 3.6: ANOVA table for two-way classification
Source of variation   DF               SS     MS                          F-ratio
Factor A              h − 1            SSA    MSA = SSA/(h − 1)           MSA/MSE
Factor B              g − 1            SSB    MSB = SSB/(g − 1)           MSB/MSE
Residual              (h − 1)(g − 1)   SSE    MSE = SSE/[(h − 1)(g − 1)]
Total                 N − 1            SST
Statistical hypotheses:
Factor A:
$H_0: t_1 = t_2 = \cdots = t_h$
$H_1: t_i \neq t_j$ for at least one pair $i \neq j$
Factor B:
$H_0: b_1 = b_2 = \cdots = b_g$
$H_1: b_i \neq b_j$ for at least one pair $i \neq j$
Test procedure
Reject Ho for factor A if the calculated F-value, MSA/MSE, exceeds the tabulated F-value
$F_{\alpha, [(h-1), (h-1)(g-1)]}$ at the α level of significance. Otherwise, we do not reject Ho.
Similarly, reject Ho for factor B if the calculated F-value, MSB/MSE, exceeds the tabulated
F-value $F_{\alpha, [(g-1), (h-1)(g-1)]}$ at the α level of significance. Otherwise, we do not
reject Ho.
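The two-way computations follow the same pattern as the one-way case, with one extra sum of squares. The Python function below is a minimal sketch (function name, dictionary keys and data are our own choices): rows of the input table are the h levels of factor A, columns are the g levels of factor B, and there is one observation per cell.

```python
def two_way_anova(table):
    """ANOVA quantities for a two-way layout without replication.

    `table[i][j]` is the response at level i of factor A (rows)
    and level j of factor B (columns).
    """
    h, g = len(table), len(table[0])
    N = h * g
    G = sum(sum(row) for row in table)                        # grand total
    cf = G ** 2 / N                                           # correction factor
    sst = sum(y ** 2 for row in table for y in row) - cf
    ssa = sum(sum(row) ** 2 for row in table) / g - cf        # from row totals
    ssb = sum(sum(col) ** 2 for col in zip(*table)) / h - cf  # from column totals
    sse = sst - ssa - ssb
    mse = sse / ((h - 1) * (g - 1))
    return {"SSA": ssa, "SSB": ssb, "SSE": sse, "SST": sst, "MSE": mse,
            "F_A": (ssa / (h - 1)) / mse, "F_B": (ssb / (g - 1)) / mse}

# Hypothetical 3 x 2 layout (h = 3 levels of A, g = 2 levels of B).
result = two_way_anova([[10.0, 12.0], [14.0, 17.0], [20.0, 23.0]])
for name, value in result.items():
    print(f"{name:4s} {value:10.4f}")
```

`F_A` is compared with the tabulated value for (h − 1) and (h − 1)(g − 1) degrees of freedom, and `F_B` with the value for (g − 1) and (h − 1)(g − 1) degrees of freedom.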
Example
Three different methods of analysis M1, M2, and M3 are used to determine in parts
per million the amount of a certain constituent in a sample. Each method is used
by five analysts and the results are given below.
                        Analyst
Method     1      2      3      4      5      Total
M1        7.0    6.9    6.8    7.1    6.9     34.7
M2        6.5    6.7    6.5    6.7    6.6     33.0
M3        6.6    6.2    6.4    6.3    6.4     31.9
Total    20.1   19.8   19.7   20.1   19.9     99.6
Do these results indicate a significant variation either between the methods or be-
tween analysts? Use α = 0.01
Statistical hypotheses:
For analyst
Ho: analysts do not differ
H1:Analysts differ
For method
Ho: methods do not differ
H1:methods differ
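Calculation
Here the analysts play the role of factor A (h = 5 levels) and the methods the role of
factor B (g = 3 levels).
$$\text{C.F} = \frac{G^2}{N} = \frac{\left( \sum_{i=1}^{h} \sum_{j=1}^{g} Y_{ij} \right)^2}{hg} = \frac{(99.6)^2}{15} = 661.344$$
Total SS (SST):
$$\text{SST} = \sum_{i=1}^{h} \sum_{j=1}^{g} Y_{ij}^2 - \text{C.F} = 662.32 - 661.344 = 0.976$$
Analyst SS (SSA):
$$\text{SSA} = \frac{1}{g} \sum_{i=1}^{h} Y_i^2 - \text{C.F} = \frac{1}{3} \left[ (20.1)^2 + (19.8)^2 + (19.7)^2 + (20.1)^2 + (19.9)^2 \right] - 661.344$$
$$= 661.3866667 - 661.344 = 0.0426667$$
Method SS (SSM):
$$\text{SSM} = \frac{1}{h} \sum_{j=1}^{g} Y_j^2 - \text{C.F} = \frac{1}{5} \left[ (34.7)^2 + (33.0)^2 + (31.9)^2 \right] - 661.344 = 662.14 - 661.344 = 0.796$$
Error SS (SSE):
SSE = SST − SSA − SSM = 0.976 − 0.0426667 − 0.796 = 0.1373333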
Table 3.7: ANOVA table
Source of variation DF SS MS F-ratio
Analyst 4 0.04267 0.01067 0.620
Method 2 0.79600 0.39800 23.18
Error 8 0.13733 0.01717
Total 14 0.9760
Comparing the calculated F values, 0.62 for analyst and 23.18 for method, with
the critical F values $F_{0.01; 4, 8}$ = 7.01 and $F_{0.01; 2, 8}$ = 8.65 respectively, we do not
reject the null hypothesis for analyst, while for method the null hypothesis is rejected. Hence, we
conclude that there is not enough evidence in these data to suspect that the analysts
differ. On the other hand, the data indicate significant differences among the methods at
the 1% level of significance.
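As a cross-check of the hand computations, the analyst and method example can be reproduced in a few lines of Python. The variable names are our own; the data are those of the example, and the critical values 7.01 and 8.65 are the tabulated values quoted in the text.

```python
# Rows are methods M1..M3, columns are analysts 1..5 (data from the example).
ppm = [
    [7.0, 6.9, 6.8, 7.1, 6.9],
    [6.5, 6.7, 6.5, 6.7, 6.6],
    [6.6, 6.2, 6.4, 6.3, 6.4],
]

n_methods, n_analysts = len(ppm), len(ppm[0])
N = n_methods * n_analysts
G = sum(sum(row) for row in ppm)                               # grand total
cf = G ** 2 / N                                                # correction factor
sst = sum(y ** 2 for row in ppm for y in row) - cf
ss_method = sum(sum(row) ** 2 for row in ppm) / n_analysts - cf
ss_analyst = sum(sum(col) ** 2 for col in zip(*ppm)) / n_methods - cf
sse = sst - ss_analyst - ss_method
mse = sse / ((n_methods - 1) * (n_analysts - 1))
f_analyst = (ss_analyst / (n_analysts - 1)) / mse
f_method = (ss_method / (n_methods - 1)) / mse

# Tabulated critical values at alpha = 0.01, as quoted in the text.
F_CRIT_ANALYST, F_CRIT_METHOD = 7.01, 8.65

print(f"SST = {sst:.4f}, SSA = {ss_analyst:.5f}, SSM = {ss_method:.4f}, SSE = {sse:.5f}")
print(f"F (analyst) = {f_analyst:.2f}, reject Ho: {f_analyst > F_CRIT_ANALYST}")
print(f"F (method)  = {f_method:.2f}, reject Ho: {f_method > F_CRIT_METHOD}")
```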
In SAS
SAS Program
data twoway;
input Analyst $ Method $ ppm @@;
cards;
A1 M1 7.0
A1 M2 6.5
A1 M3 6.6
A2 M1 6.9
A2 M2 6.7
A2 M3 6.2
A3 M1 6.8
A3 M2 6.5
A3 M3 6.4
A4 M1 7.1
A4 M2 6.7
A4 M3 6.3
A5 M1 6.9
A5 M2 6.6
A5 M3 6.4
;
run; proc print;run;quit;
proc anova;
class Analyst Method;
model ppm=Analyst Method;
run;quit;
The SAS System
Obs Analyst Method ppm
1 A1 M1 7.0
2 A1 M2 6.5
3 A1 M3 6.6
4 A2 M1 6.9
5 A2 M2 6.7
6 A2 M3 6.2
7 A3 M1 6.8
. . . .
. . . .
The ANOVA Procedure
Dependent Variable: ppm
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 6 0.83866667 0.13977778 8.14 0.0046
Error 8 0.13733333 0.01716667
Corrected Total 14 0.97600000
R-Square Coeff Var Root MSE ppm Mean
0.859290 1.973217 0.131022 6.640000
Source DF Anova SS Mean Square F Value Pr > F
Analyst 4 0.04266667 0.01066667 0.62 0.6601
Method 2 0.79600000 0.39800000 23.18 0.0005
3.5 The least significant difference (LSD)
If we reject the null hypothesis by the use of the F-test, we can carry out further
analyses, i.e., carry out pairwise comparisons of the levels of the factor (s) by the
use of t-test. We consider the situation where we have planned in advance of the
experiment to make certain comparisons among treatment means. In this case, each
comparison is important in its own right, and thus is to be viewed as separate, i.e.,
cannot be combined.
Suppose we have t treatments in the experiment, and we are interested in comparing
two treatments 1 and 2, with means µ1 and µ2 respectively. That is, we wish to test
the hypotheses:
$H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$
Test statistic: As our test statistic for H0 vs. H1, we use:
$$\frac{|\bar{Y}_{1.} - \bar{Y}_{2.}|}{s_{\bar{Y}_{1.} - \bar{Y}_{2.}}}, \qquad s_{\bar{Y}_{1.} - \bar{Y}_{2.}} = s \sqrt{\frac{1}{r_1} + \frac{1}{r_2}}, \qquad s = \sqrt{\text{MSE}}$$
That is, instead of basing the estimate of σ² on only the two treatments in question,
we use the estimate from all t treatments in the experiment. Here, $r_1$ and $r_2$ are
respectively the numbers of replicates of treatments 1 and 2.
Test procedure: Reject H0 in favour of H1 if
$$\frac{|\bar{Y}_{1.} - \bar{Y}_{2.}|}{s_{\bar{Y}_{1.} - \bar{Y}_{2.}}} > t_{N-t, \alpha/2}$$
Here, N − t is the number of degrees of freedom for estimating σ² (experimental error).
Note that the test procedure above for testing H0 against H1 may be rewritten as
follows:
Reject H0 if:
$$|\bar{Y}_{1.} - \bar{Y}_{2.}| > s_{\bar{Y}_{1.} - \bar{Y}_{2.}} \times t_{N-t, \alpha/2}, \qquad s = \sqrt{\text{MSE}}$$
Terminology: In comparing two treatment means from large experiments involving
t treatments, the value
$$s_{\bar{Y}_{1.} - \bar{Y}_{2.}} \times t_{N-t, \alpha/2} = s \sqrt{\frac{1}{r_1} + \frac{1}{r_2}} \times t_{N-t, \alpha/2}, \qquad s = \sqrt{\text{MSE}}$$
is called the least significant difference (LSD) for the test of H0 vs. H1 based
on the entire experiment. Thus, from the above expression, we reject H0 in favour
of H1 at level α if
$$|\bar{Y}_{1.} - \bar{Y}_{2.}| > s_{\bar{Y}_{1.} - \bar{Y}_{2.}} \times t_{N-t, \alpha/2}$$
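The case of equal replication: If all treatments are replicated equally, that is,
$r_i = r$, the value of the LSD is the same for every pair of treatments and is given by:
$$s_{\bar{Y}_{1.} - \bar{Y}_{2.}} = s \sqrt{\frac{2}{r}}, \qquad s = \sqrt{\text{MSE}}, \qquad \text{LSD} = s \sqrt{\frac{2}{r}} \times t_{N-t, \alpha/2}$$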
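The LSD formula is easy to wrap in a small helper. The Python sketch below is illustrative only: the function name is ours, the MSE and replicate numbers are hypothetical, and the critical value $t_{20, 0.025}$ = 2.086 is read from a t table rather than computed.

```python
import math

def lsd(mse, r1, r2, t_critical):
    """Least significant difference for two treatment means.

    mse        -- error mean square from the ANOVA of the whole experiment
    r1, r2     -- numbers of replicates of the two treatments
    t_critical -- tabulated t value for the error df at alpha/2
    """
    s = math.sqrt(mse)
    return s * math.sqrt(1 / r1 + 1 / r2) * t_critical

# Hypothetical values: MSE = 4.0 on 20 error degrees of freedom.
T_20_0025 = 2.086  # from a t table

unequal = lsd(4.0, 5, 8, T_20_0025)  # treatments with 5 and 8 replicates
equal = lsd(4.0, 6, 6, T_20_0025)    # equal replication, r = 6

print(f"LSD with r1 = 5, r2 = 8: {unequal:.3f}")
print(f"LSD with r1 = r2 = 6:    {equal:.3f}")
```

With equal replication the second call gives the single LSD value against which every pairwise difference of interest can be compared; with unequal replication the LSD has to be recomputed for each pair.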
Thus, in case of equal replications, all pairwise comparisons of interest require only
a single calculation.
Example
Consider the pea section data we discussed in Section 3.3.5. In these data we had
equal replications (r = 10) and 5 treatments (t = 5). Suppose it was decided in advance
that one investigator was interested in the particular question of whether 2% glucose
(treatment 2) differs from the control.
Let $\mu_1$ denote the mean for the control and $\mu_2, \mu_3, \mu_4, \mu_5$ denote the means for the
sugar treatments 2% glucose, 2% fructose, 1% glucose + 1% fructose and 2% sucrose,
respectively.
In this situation, we want to test the hypotheses:
$H_{02}: \mu_1 = \mu_2$ vs. $H_{12}: \mu_1 \neq \mu_2$
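From the information given, we have
$\bar{Y}_{1.} = 70.1$, $\bar{Y}_{2.} = 59.3$, $s = \sqrt{\text{MSE}} = 2.3357$, $N - t = 45$, $t_{N-t, \alpha/2} = t_{45, 0.025} = 2.01$.
Thus
$$\text{LSD} = s \sqrt{\frac{2}{r}} \times t_{N-t, \alpha/2} = (2.3357) \sqrt{2/10} \, (2.01) = 2.10$$
$$|\bar{Y}_{1.} - \bar{Y}_{2.}| = 10.8 > 2.10$$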
Conclusion
Since $|\bar{Y}_{2.} - \bar{Y}_{1.}|$ = 10.8 > LSD (= 2.10), we reject $H_{02}$ at level of significance α
= 0.05; there is sufficient evidence to suggest that the glucose treatment yields mean
pea section lengths different from the control.
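The numbers in this example can be verified with a few lines of Python. The sketch below takes the error sum of squares, the treatment totals and the tabulated t value from the text; the variable names are our own.

```python
import math

mse = 245.50 / 45       # error mean square from the pea section ANOVA
r = 10                  # replicates per treatment
t_critical = 2.01       # t(45, 0.025) as quoted in the text

mean_control = 701 / r  # treatment totals taken from the data table
mean_glucose = 593 / r

s = math.sqrt(mse)
lsd_value = s * math.sqrt(2 / r) * t_critical
difference = abs(mean_control - mean_glucose)
reject_h0 = difference > lsd_value

print(f"s = {s:.4f}, LSD = {lsd_value:.2f}")
print(f"|difference| = {difference:.1f}, reject H02: {reject_h0}")
```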
Exercise
1. Suppose that another investigator was interested in the specific question of
whether the 2% fructose (treatment 3) differs from the control. That is, test for:
$H_{03}: \mu_1 = \mu_3$ vs. $H_{13}: \mu_1 \neq \mu_3$. Use α = 0.05
2. Test whether the means of the 2% glucose and 2% fructose differ significantly at
the 5% level of significance.
3. Calculate 99% confidence limits for the mean of treatment 4 (1% glucose + 1% fructose)
Chapter 4
Introduction to SPSS
4.1 Introduction
SPSS is a widely used statistical software package. It provides full statistical
analysis capabilities, including data management, and offers both simple and
sophisticated statistical techniques that are easy to learn and useful in the
analysis of real-life data. SPSS has historically been applied extensively in the
social sciences; however, these days it is also widely used in other fields of study.
The current version of SPSS is 21.
As mentioned, this text on SPSS is not part of the MTH 201 course coverage (requirement),
as you have seen in Section 1.1, but is meant to make you understand that all
computations for the different theoretical aspects that we have discussed and those
still to be discussed, though some have been illustrated using SAS, can also be done
in other software packages, SPSS being one of them. Other software packages in
which statistical analyses may be carried out include STATA and S-Plus/R. However,
S-Plus/R requires good programming knowledge.
Unlike many software packages, SPSS is user-friendly (easy to use), widely available
and well documented, so that one can quickly consult readily accessible
reference material. These are among the reasons why I have chosen to give
you this text. Do not forget, however, that there is no free lunch: like SAS, S-Plus, and
STATA, SPSS is not free, and you have to pay to get it.
It is important to note that the SPSS statistical analyses presented in this text are
specific. That is, the text does not cover all features available in SPSS but focuses on only
a few of the many analysis tools that SPSS can offer to the analyst. Consequently, to
sharpen your competency in using SPSS, especially in carrying out more advanced
statistical analyses, you are urged to refer to any SPSS manual.
In SPSS, unlike with the other software packages, getting output is relatively easy;
however, one needs to be cautious: remember that "there is no free lunch".
Steven (1996) points out that because it is easy to get output, it is also easy to
get "garbage." Hence, knowing exactly what to focus on in the printout, so as to be
able to give a practical interpretation of the problem at hand, is an important aspect
one needs to bear in mind when selecting the output to concentrate on.
Throughout our illustrations in this text it is assumed that the reader will be
accessing the data from disk or CD-ROM, already saved as an SPSS file. This means that
the data have already undergone important treatments such as editing and coding.
This is not always the case in practice. Often analysts receive raw
data and carry out the required treatments themselves. Data management (e.g., merging,
interleaving or matching files) in SPSS is outside the scope of this text. For those
who are interested, however, SAS is nicely set up for ease of file manipulation. In
this text I will, however, briefly describe how data entry is done.
It is worth mentioning that coming up with a valid conclusion or answer to a spe-
cific scientific question of interest requires not only one’s competency in the software
package of analysis but also an understanding of several other facets such as knowing
what assumptions are important to check for a given analysis, adequate sample size,
and careful selection of variables.
SPSS does a wide range of analyses, from simple descriptive statistics to various analysis
of variance designs and to all kinds of complex multivariate analyses (multivariate
analysis of variance (MANOVA), factor analysis, multiple regression, discriminant
analysis, etc.). The multivariate analyses listed above are complete areas that I do not
wish to enter into in this text. I limit myself to only those aspects expected to be
covered by the target group(s), i.e., some important SPSS environments or analysis
tools. However, I refer any reader interested in both the theoretical and practical
treatment of the complex multivariate analyses to the books by Johnson, R.A. and
Wichern, D.W. (1998) and Steven, J. (1996).
4.2 Starting SPSS
You can start SPSS in two different ways depending on how it is set up on your
computer. You can either double-click on the SPSS icon on the desktop of your
computer, or click on the Start button, normally located at the lower left corner of
your screen, and then on Programs, etc., as indicated in the path below:
Start>Programs>SPSS for Windows>SPSS 11.0 for Windows
When you click on the last option (SPSS 11.0 for Windows) of the above path you
will see the "Data Editor Window".
In general SPSS has four different types of windows namely:
i. Data Editor;
ii. An output Window;
iii. A syntax Window; and
iv. A Chart Editor
We briefly describe each of these windows in turn.
The Data Editor Window
The Data Editor Window is where data can be entered and edited. The Data Editor
is further divided into a data view and a variable view.
At the top of the Data Editor you can see a menu line consisting of the following
options: File, Edit, View, Data, Transform, Analyze, Graphs, Utilities, Window and
Help.
Figure 1. Data Editor menu
For more details on how you can use each one of these options I refer you to any
SPSS manual. I focus my attention on the "Analyze" menu, which is where all the
statistical analyses are carried out.
The Output Window
Through this window you can read the results of your analysis. Depending on the
analysis you are carrying out, you can also see graphs, and you can copy your results
into another document (e.g., Word) for further description.
The Syntax Window
The syntax window is used for coding your analysis manually. Through this window
the user can code more advanced analyses, which may not be available in the stan-
dard menu. To open the syntax window select File>New>Syntax. In the window
you can enter the program code you want SPSS to perform. This requires a little
more programming. However, when the code is ready to be run you mark it (with
your mouse) and select Run> Selection.
The Chart Editor
The chart editor is used when editing a graph in SPSS. To be able to edit your graph
you need first to double-click your graph.
4.3 Data entry
There are basically two ways to enter data into SPSS. One is to enter the data manually,
directly into the Data Editor; the other is to import data from a different program
or a text file, for example from Excel or SAS. I will illustrate both options here.
For importing data, I will restrict myself to importing data from Excel.
4.4 Keying data into SPSS
As we have seen above, when SPSS is opened, by default the Data Editor is opened
and this is where you can enter your data. Alternatively, to enter data go to
File>New>Data.
Before you start entering your data it is always a good idea to first give names to your
variables. This is done by selecting the variable view in the Data Editor window.
4.4.1 Osteopathic manipulation data set
The following is part of the data collected from a clinical trial1 whose prime objective
was to compare the effect of an osteopathic manipulation on blood flow with that in a
control group at different time points. This effect was
assessed in 80 healthy volunteer subjects aged between 17 and 69 years. Blood flow
(in ml/min) was measured from the right superficial femoral artery using Duplex-
Doppler while subjects were lying down on a research table at baseline (minute-0: M1),
one minute after manipulation (M2) and four minutes after manipulation (M3).
The variable Patid in the table below represents patient’s identity number. The
variables initials, age, weight, height and gender carry the usual meaning. M1, M2
and M3 are as described above. Use this simple data set to practice data entry in
SPSS.
Footnote 1: A clinical trial is a study that investigates the efficacy of drug(s).
Table 4.1: Osteopathic Manipulation data set
Patid Initials Age Weight Height Gender M1 M2 M3
1 SJ 31 75 178 M 109.5 262.1 136.4
2 TG 30 69 178 M 103.2 145.7 121.3
3 SF 38 73 176 M 221.2 231.7 111.9
4 WD 24 78 179 M 230.0 281.3 196.2
5 VF 54 73 162 F 112.4 120.4 139.7
6 SM 64 75 168 M 226.8 369.4 247.1
7 DWM 61 65 160 F 103.7 84.9 109.5
8 GM 34 60 166 F 178.5 139.6 154.6
9 VM 38 64 165 F 103.7 132.4 107.1
10 DWF 47 63 167 M 150.6 158.8 110.4
11 BE 49 85 172 M 149.1 72.0 96.2
12 CV 55 91 177 M 193.5 286.1 245.5
13 CG 25 69 170 F 183.4 270.6 183.7
Exercise
Enter these data in your SPSS Data Editor window without labelling the variables.
If you enter the data in SPSS without first giving names to the variables, SPSS labels
the variables as var00001, var00002, etc. Do you see this?
Next try to give names to the variables. As described above, you can give names to
your variables via the variable view in the Data Editor window. Alternatively you
can double click the variable.
Now click on the first cell of the first column “Name” and type the name of the first
variable as indicated in Table 4.1, that is, Patid. Then move on to the second cell
and type the name of the second variable, and so on.
Under Type you define which type your variable is (numeric, string etc.). If you
place the marker in the Type cell, a button like the one in Figure 1 below appears.
Figure 1: Defining variables
This button indicates that you can click it and a window like the one below in Figure
2 will show:
Figure 2: Variable Type
Numeric is selected if your variable consists of numbers. String is selected if your
variable is text (e.g. Male/Female). In the same way you can specify Values and Missing.
Label lets you explain the respective variable further in a sentence or so. This is
often a very good idea since the variable name is restricted to only 8 characters.
Missing is used to define the codes that represent missing values among the
observations of a variable.
In Values you can enter a label for each possible response value of a discrete variable
(e.g. 1 = Male and 2 = Female).
When entering a variable name, the following rules must be obeyed for SPSS to accept
it:
i. The name has to start with a letter and must not end with a full stop (.).
ii. No more than 8 characters can be entered.
iii. Do not enter spaces or special characters such as !, ?, ' and *.
iv. No two variables may have the same name.
When all data are entered and variable names are given, you can save your data by
selecting File>Save As... in the menu.
4.5 Opening an existing dataset
If the dataset already exists as an SPSS file you can easily open it. Select File>Open...
and the dataset will automatically open in the Data Editor.
4.6 Importing data
Sometimes the data are available in a format other than an SPSS data file. For example,
the data might be available as an Excel, SAS, or text file. As already mentioned, we
describe how to import data from Excel.
Importing data from Excel
If you want to use data from an Excel file in SPSS there are two ways to import the
data.
i. One is to simply mark all the data in the Excel window (excluding the variable
names) that you want to enter into SPSS, then copy and paste them into the SPSS
data window. The disadvantage of this method is that the variable
names cannot be included, so you will have to enter them manually
after pasting the data.
ii. The other option (where the variable names are automatically entered) is to
do the following:
• Open SPSS, select File> Open>Data. Choose the drive where the data are
stored and then double click on the file you want to open or mark the file and
click on the open icon on the open file menu.
Under Files of type you select Excel, press ‘Open’, and the data now appear
in the Data Editor in SPSS.
4.7 Exporting data
Exporting data from SPSS to a different program is done by selecting File>Save
As... Under Save as type you select the format you want the data to be available in,
e.g. Excel.
4.8 ANOVA for one-way classification in SPSS
Let us now see how we can use SPSS to perform analysis of variance for one way
classification. I will use the machine data set discussed in Section 5.3.4 to illustrate
the construction of the analysis of variance table. As described above, getting
outputs in SPSS is simple. Assuming that you have already entered the data, what you
need to do next is to analyze the data by following the menu path below:
Analyze>Compare Means>One-Way ANOVA
If you click on the last option (One-Way ANOVA) you will see a window like the
one below:
The only dependent variable in this example is “volume” and the factor is “machine”.
You can also include descriptive statistics in your output by clicking on “Options”
and then selecting Descriptive.
Below is the SPSS ANOVA table for the machine data set.
SPSS ANOVA table: Machine data
Source of Variation   Sum of Squares   df   Mean Square   F       Sig.
Between Groups        0.152            3    5.072E-02     0.834   0.500
Within Groups         0.729            12   6.078E-02
Total                 0.882            15
From the above ANOVA table we see that the results presented in SPSS are rounded
to three decimal places. By default SPSS, like most statistical packages,
gives the p-value(s) of the test(s), indicated as Sig. in the last column of the table.
The p-value indicates how much evidence against the null hypothesis has been
observed in the outcomes of the experiment: the smaller the p-value, the stronger
the evidence. Since the p-value (0.500) is greater than 0.05, we do not reject the
null hypothesis (remember we are testing the hypotheses at the 0.05 level of
significance). This is the same conclusion we reached by comparing the F calculated
value (0.83) with the critical F-value (3.49) from the table. To compare the
two ANOVA tables, the one we obtained before through mathematical calculations
and the one from SPSS above, I reproduce below the ANOVA table obtained by
mathematical computations. Are they similar?
Source of variation DF SS MS F- ratio
Between treatment 3 0.152145 0.051 0.83
Within treatment 12 0.729355 0.061
Total 15 0.881500
Exercise (optional)
Use the Pea section data set to perform analysis of variance for one way classification
with equal replication. Compare your ANOVA table with the one obtained through
mathematical computations.
Note: The above two examples, the machine and pea section data sets, illustrate
respectively what are termed unbalanced and balanced data. The first is unbalanced
in the sense that the machines are replicated unequal numbers of times,
and the second is balanced in the sense that the various sugar types are all
replicated an equal number of times (10 times).
Chapter 5
Completely Randomized Design
5.1 Introduction
When the experimental units are assumed to be fairly uniform or homogeneous,
that is, no sources of variations other than the treatments are expected, grouping
them (applying error/local control principle) will be pointless in the sense that very
little (in terms of precision) may be gained. Thus, the simplest experimental design,
which incorporates only the first two principles (randomisation and replication) of
experimental designs, is the completely randomized design or CRD.
CRD is a design in which the treatments are assigned completely at random to the
experimental units, or vice-versa. Since we assume that there are no other sources of
variations in the experiment except the treatments under investigation, then CRD
imposes no restrictions, such as blocking on the allocation of the treatments to the
experimental units.
5.2 Layout
Suppose that we have t treatments under investigation and that the ith treatment is
to be replicated $r_i$ times, i = 1, 2, ..., t. For an experiment with t treatments, each
one replicated $r_i$ times, the total number of experimental units is
$N = \sum_{i=1}^{t} r_i$. When $r_i = r$ for all i, that is, the case of equal
replication, N = rt.
Definition: layout refers to the placement of the treatments on the experimental units
subject to the conditions of the design. Randomisation in a CRD can be carried out by
using a random number table or any other probabilistic procedure.
Example
Suppose there are three treatments to be compared in a CRD. Suppose further
that the treatments are replicated 4, 3 and 5 times respectively. Thus, a total of
$N = \sum_{i=1}^{t} r_i = 4 + 3 + 5 = 12$ experimental units are needed. One possible
layout of this experiment is as follows:
Unit       1   2   3   4   5   6   7   8   9   10  11  12
Treatment  T2  T1  T2  T3  T3  T1  T3  T1  T1  T3  T2  T3
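The randomisation step can also be carried out by computer instead of a random number table. The following is a minimal sketch in Python, assuming the standard library random module as the probabilistic procedure; the function name crd_layout and the seed value are illustrative choices, not part of the course material.

```python
import random


def crd_layout(replications, seed=None):
    """Randomly allocate treatments to units 1..N in a CRD.

    replications maps each treatment label to its number of replicates r_i.
    """
    rng = random.Random(seed)
    # One label per replicate, so the list has N = sum of the r_i entries.
    labels = [trt for trt, r in replications.items() for _ in range(r)]
    rng.shuffle(labels)  # every arrangement of the labels is equally likely
    return {unit: trt for unit, trt in enumerate(labels, start=1)}


# Three treatments replicated 4, 3 and 5 times, as in the example above.
layout = crd_layout({"T1": 4, "T2": 3, "T3": 5}, seed=2013)
for unit, trt in layout.items():
    print(unit, trt)
```

A different seed (or no seed) gives a different, equally valid layout; fixing the seed only makes the allocation reproducible.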
5.3 Statistical analysis
The analysis of CRD is the same as that of one way classification. Let $Y_{ij}$ be the
yield on the jth plot receiving treatment i. Thus, the model is:
$Y_{ij} = \mu + t_i + e_{ij}$, i = 1, 2, ..., t; j = 1, 2, ..., $r_i$
where:
µ is the grand mean (average) yield over all the N plots,
$t_i$ is the ith treatment effect,
$e_{ij}$ is the experimental error.
Sums of squares are computed in the same way we discussed in one way classification.
5.3.1 Statistical hypotheses
The statistical hypotheses of interest as we stated before are:
$H_0: \mu_1 = \mu_2 = \dots = \mu_t$. That is, the $\mu_i$ are all equal.
$H_1: \mu_i \neq \mu_j$ for at least one pair $i \neq j$. That is, the $\mu_i$ are not all equal.
Or simply
Ho :There is no variation among the treatments
H1 :Variation exists
Table 5.1: ANOVA table
Source of Variation   DF     SS     MS                  F-ratio
Treatments            t-1    SSA    MSA = SSA/(t-1)     MSA/MSE
Error                 N-t    SSE    MSE = SSE/(N-t)
Total                 N-1    SST
5.3.2 Test procedure
At level of significance α, if $F = \mathrm{MSA}/\mathrm{MSE} > F_{\alpha,[(t-1),(N-t)]}$ then
there is evidence of significant variation among the treatments, i.e. we reject the
null hypothesis. Otherwise, we do not reject.
5.4 Advantages and disadvantages of CRD
5.4.1 Advantages
• Useful in small preliminary experiments and also in certain types of animal or
laboratory experiments where the experimental units are homogeneous.
• Flexibility in the number of treatments and the number of their replications.
• Provides the maximum number of d.f. for the estimation of experimental error.
The precision of a small experiment increases with the error d.f.
5.4.2 Disadvantages
• Its use is restricted to those cases in which homogeneous experimental units
are available, since local control is not utilised. Thus, all the variation among
the units enters the experimental error and may inflate it.
• Rarely used in field experiments because the plots are not homogeneous.
5.5 Example
A sample of plant material is thoroughly mixed and 15 aliquots are taken from it for
determination of potassium content. Three laboratory methods (I, II, and III) are
employed, method I being the one generally used. Five aliquots are analysed by each
method, giving the following results (µg/ml).
Method I     1.83 1.81 1.84 1.83 1.79
Method II    1.85 1.82 1.88 1.86 1.84
Method III   1.80 1.84 1.80 1.82 1.79
Examine whether methods II and III give results comparable to those of method I.
Use α = 0.05.
Calculations
The statistical model for this problem is $Y_{ij} = \mu + t_i + e_{ij}$. Here, i = 1, 2, 3;
j = 1, 2, 3, 4, 5.
Grand total: $G = \sum_{i=1}^{3}\sum_{j=1}^{5} Y_{ij} = 1.83 + 1.81 + \dots + 1.79 = 27.4$
Total number of observations: $N = \sum_{i=1}^{k} n_i = \sum_{i=1}^{3} n_i = n_1 + n_2 + n_3 = 5 + 5 + 5 = 15$.
In the particular situation at hand (equal replication), $N = rt = 5 \times 3 = 15$.
Correction factor: $\mathrm{C.F} = \frac{\left(\sum_{i=1}^{3}\sum_{j=1}^{5} Y_{ij}\right)^2}{rt} = \frac{(27.4)^2}{15} = \frac{750.76}{15} = 50.0507$
Total sum of squared observations (uncorrected total sum of squares):
$\sum_{i=1}^{3}\sum_{j=1}^{5} Y_{ij}^2 = (1.83)^2 + (1.81)^2 + \dots + (1.79)^2 = 50.0602$
Total SS: $\mathrm{SST} = \sum_{i=1}^{3}\sum_{j=1}^{5} Y_{ij}^2 - \frac{G^2}{rt} = 50.0602 - 50.0507 = 0.0095$
Treatment (method) totals: I = 9.10, II = 9.25, III = 9.05
Treatment SS: $\mathrm{SSTr} = \frac{1}{r}\sum_{i=1}^{k} Y_i^2 - \frac{G^2}{rt}$, where $Y_i = \sum_{j=1}^{r} Y_{ij}$
$= \frac{1}{5}\left[(9.10)^2 + (9.25)^2 + (9.05)^2\right] - 50.0507 = 50.055 - 50.0507 = 0.0043$
Error SS: $\mathrm{SSE} = \mathrm{SST} - \mathrm{SSTr} = 0.0095 - 0.0043 = 0.0052$
Table 5.2: ANOVA table
Source of Variation DF SS MS F-ratio
Between treatments 2 0.0043 0.00215 4.9654
Error (within treatments) 12 0.0052 0.00043
Total 14 0.0095
F0.05, 2, 12 = 3.89
Statistical hypothesis
$H_0: \mu_1 = \mu_2 = \mu_3$. That is, the $\mu_i$ are all equal.
$H_1: \mu_i \neq \mu_j$ for at least one pair $i \neq j$. That is, the $\mu_i$ are not all equal.
Or
Ho: Methods do not differ
H1: Methods differ
Decision
Since the F calculated value (4.9654) exceeds the critical F-value (3.89) at the 0.05
level of significance, we reject the null hypothesis and conclude that the laboratory
results depend on the method of analysis. That is, there is significant variation
among the three laboratory methods.
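The hand calculations for this example can be checked with a few lines of code. The sketch below is one possible way to do it in plain Python; the function name one_way_anova is an illustrative choice, and the critical value 3.89 is simply the tabulated value quoted in the text. Because the code carries full precision, it gives F of about 5.00 rather than the 4.9654 obtained from the rounded mean squares; the decision is the same.

```python
def one_way_anova(groups):
    """Return (SSTr, SSE, SST, F) for a one-way classification.

    groups maps each treatment label to its list of observations.
    """
    all_obs = [y for obs in groups.values() for y in obs]
    n_total = len(all_obs)
    t = len(groups)
    cf = sum(all_obs) ** 2 / n_total  # correction factor G^2 / N
    sst = sum(y * y for y in all_obs) - cf
    # Sum of (treatment total)^2 / replicates, minus the correction factor.
    sstr = sum(sum(obs) ** 2 / len(obs) for obs in groups.values()) - cf
    sse = sst - sstr
    f_ratio = (sstr / (t - 1)) / (sse / (n_total - t))
    return sstr, sse, sst, f_ratio


potassium = {
    "I": [1.83, 1.81, 1.84, 1.83, 1.79],
    "II": [1.85, 1.82, 1.88, 1.86, 1.84],
    "III": [1.80, 1.84, 1.80, 1.82, 1.79],
}
sstr, sse, sst, f_ratio = one_way_anova(potassium)
print(f"SSTr={sstr:.4f}  SSE={sse:.4f}  SST={sst:.4f}  F={f_ratio:.2f}")

F_CRIT = 3.89  # tabulated F(0.05; 2, 12), as quoted in the text
print("reject H0" if f_ratio > F_CRIT else "do not reject H0")
```

The same function works for unequal replication, since each treatment total is divided by its own number of replicates.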
To examine whether methods II and III give results comparable to those of method
I, we need to carry out further analysis using the t-test (LSD) as follows:
Let the mean of method I be denoted by $\bar{Y}_{1.}$, that of method II by $\bar{Y}_{2.}$ and
that of method III by $\bar{Y}_{3.}$.
Thus, $\bar{Y}_{1.} = 9.10/5 = 1.82$, $\bar{Y}_{2.} = 9.25/5 = 1.85$, $\bar{Y}_{3.} = 9.05/5 = 1.81$
Statistical Hypotheses
Here we need to test the hypotheses:
$H_{02}: \mu_1 = \mu_2$ vs. $H_{12}: \mu_1 \neq \mu_2$
$H_{03}: \mu_1 = \mu_3$ vs. $H_{13}: \mu_1 \neq \mu_3$
Test procedure
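Reject $H_{02}$ if
$|\bar{Y}_{2.} - \bar{Y}_{1.}| > \mathrm{LSD} = s\sqrt{2/r} \times t_{N-t,\,\alpha/2}$
and $H_{03}$ if $|\bar{Y}_{3.} - \bar{Y}_{1.}| > \mathrm{LSD} = s\sqrt{2/r} \times t_{N-t,\,\alpha/2}$,
where $s = \sqrt{\mathrm{MSE}}$ is the estimated standard deviation per experimental unit.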
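As an aid to the exercise that follows, the LSD itself can be computed as in the sketch below. It assumes that s is the square root of the error mean square from Table 5.2 (0.00043) and that the two-sided 5% point of t with 12 d.f. is 2.179, a value taken from a standard t table rather than from these notes; the function name is illustrative. The comparison of the mean differences with the LSD is left to the reader.

```python
import math


def least_significant_difference(mse, r, t_crit):
    """LSD = s * sqrt(2 / r) * t_crit, with s = sqrt(MSE)."""
    return math.sqrt(mse) * math.sqrt(2.0 / r) * t_crit


T_CRIT = 2.179  # assumed: t(12 d.f., alpha/2 = 0.025) from a standard t table
lsd = least_significant_difference(mse=0.00043, r=5, t_crit=T_CRIT)
print(f"LSD = {lsd:.4f}")
```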
Exercises
1. Complete the test.
2. Eight varieties, A−H, of black currant cuttings are planted in square plots in a
nursery, each plot containing the same number of cuttings. Four plots of each variety
are planted, and the shoot length made in the first growing season is measured.
The plot totals are:
A: 46 29 39 35 E: 16 37 24 30
B: 37 31 28 44 F: 41 28 38 29
C: 38 50 32 36 G: 56 48 44 44
D: 34 19 29 41 H: 23 31 29 37
B and C are standard varieties; assess the remaining six for vigour in comparison
with B and C. Use α = 0.05
Chapter 6
Randomised Block Design
6.1 Introduction
The CRD discussed in Chapter 5 will seldom be used if the experimental units are not
alike. When the experimental units can be meaningfully grouped, e.g. by
area of field, device, hospital, salesmen, etc., a completely randomised design
(CRD) will clearly be insufficient. In this situation an alternative strategy for assigning
treatments to the experimental units, which takes advantage of the grouping, may
be used. The alternative strategy that we are going to discuss is what we call the
randomised block design or Randomised Complete Block Design (RCBD).
In the randomised block design:
• The groups are called blocks
• Each treatment appears the same number of times in each block; hence the
term complete block design
• The simplest case is that where each treatment appears exactly once in each
block. Here, because the number of replicates=number of experimental
units for each treatment,
we therefore have: number of replicates=number of blocks=r
• Blocks are often called replicates for this reason
• To set up such randomised block design the following steps are involved:
(i) Divide the units into r more homogeneous groups commonly known as blocks.
(ii) Assign the treatments at random to the experimental units within each block.
This randomisation has to be done afresh for each block.
Hence, the term randomised block design
Motivation: experimental units within a block are as alike as possible, so observed
differences among them should be mainly attributable to the treatments. To ensure
this interpretation holds, in the conduct of the experiment all experimental units
within a block should be treated as uniformly as possible.
Intuitively speaking, randomised block design is an improvement over the CRD.
In the RBD the principle of local control can be applied along with the other two
principles of experimental design (randomisation and replication).
Number of experimental units (N )
Suppose we want to compare the effects of t treatments, each treatment being repli-
cated an equal number of times, say r times. Then we need N =rt experimental
units.
6.2 Layout
To illustrate the layout of an RBD, consider 4 treatments, each replicated 3 times.
So we need N = rt = 3×4 = 12 experimental units, which are grouped into 3 blocks
of 4 units.
Suppose the blocks formed after grouping the experimental units are labelled as 1,
2, and 3. To ensure randomness in every process involved in the experiment we
select the block to start with in allocating the treatments to the experimental units
at random. Assume the blocks are selected in the order 3, 1, 2. Thus, we start with
the third block and assign the 4 treatments at random to it. As we have discussed,
to assign the treatments, we may use any probabilistic procedure.
Random permutation is one of the probabilistic procedures that may be used to allocate
treatments to experimental units. Suppose that a random permutation of the digits 1
to 4 for the treatments is 4, 1, 3, 2. Therefore we allocate treatment 4 to the first
unit of block 3, treatment 1 to the second unit of block 3, and so on up to treatment 2
in the fourth unit of block 3. That is, we have the following layout for block 3 (the
first selected block).
T4 T1 T3 T2
Repeating the same procedure, suppose we select the permutations 3, 4, 2, 1 for
block 1 and 2, 3, 4, 1 for block 2. We finally get the following complete layout.
Block 1 T3 T4 T2 T1
Block 2 T2 T3 T4 T1
Block 3 T4 T1 T3 T2
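The permutation procedure above is easy to mimic by computer. The sketch below, in plain Python, draws a fresh random permutation of the treatments for each block; the function name rbd_layout and the seed are illustrative assumptions, and the layout it prints will generally differ from the one shown above.

```python
import random


def rbd_layout(treatments, n_blocks, seed=None):
    """Return {block number: random order of the treatments within that block}."""
    rng = random.Random(seed)
    layout = {}
    for block in range(1, n_blocks + 1):
        order = list(treatments)
        rng.shuffle(order)  # randomisation is done afresh for each block
        layout[block] = order
    return layout


rbd = rbd_layout(["T1", "T2", "T3", "T4"], n_blocks=3, seed=7)
for block, order in rbd.items():
    print("Block", block, *order)
```

Each treatment appears exactly once in every block, which is what makes the design a complete block design.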
6.3 Statistical analysis
The analysis of the design is the same as that of two-way classified data with one
observation per cell (experimental unit), that is, without replication, which we
discussed in Section 5.5.
We use the same model we have discussed,
$Y_{ij} = \mu + t_i + b_j + e_{ij}$, i = 1, 2, ..., t; j = 1, 2, ..., r
In words:
Observation of the ith treatment from the jth block =general mean +ith treatment
effect + jth block effect + experimental error component
RECAP: we partition the total sum of squares into different components:
Total SS=Treatment SS + Block SS + Error SS
6.3.1 Statistical hypotheses
The hypotheses of interest are:
$H_{01}: t_1 = t_2 = \dots = t_t$
$H_{02}: b_1 = b_2 = \dots = b_r$
against the alternatives that the $t_i$ (respectively the $b_j$) are not all equal.
ANOVA table
Source of variation   DF           SS     MS                         F-ratio
Blocks                r-1          SSB    MSB = SSB/(r-1)            FB = MSB/MSE
Treatment             t-1          SSTr   MSTr = SSTr/(t-1)          FTr = MSTr/MSE
Error                 (r-1)(t-1)   SSE    MSE = SSE/[(r-1)(t-1)]
Total                 N-1          SST
6.3.2 Test procedure
The calculated F-values for treatments and blocks (FTr and FB) are compared with
the tabulated (critical) F-values at [(t-1), (r-1)(t-1)] and [(r-1), (r-1)(t-1)]
degrees of freedom respectively.
In symbols
$F_{\alpha,[(t-1),(r-1)(t-1)]}$ and $F_{\alpha,[(r-1),(r-1)(t-1)]}$
Thus, if $F_B > F_{\alpha,[(r-1),(r-1)(t-1)]}$ we reject the null hypothesis of no block
differences; otherwise we do not reject. Also, if $F_{Tr} > F_{\alpha,[(t-1),(r-1)(t-1)]}$ we
reject the null hypothesis of no treatment differences; otherwise we do not reject.
6.4 Advantages and disadvantages of RBD
6.4.1 Advantages
• Greater precision
• Increased scope of inference is possible because more experimental conditions
may be included
6.4.2 Disadvantages
• A large number of treatments increases the block size; as a result the blocks may
lose homogeneity, leading to a large experimental error.
• Any missing observation in a unit in a block requires either:
(i) discarding the whole block, or
(ii) estimating the missing value by a special missing plot technique.
6.5 Example
The following data are yields in bushels/acre from an agricultural experiment set out
in a randomised complete block design. The experiment was designed to investigate
the differences in yield for seven hybrid varieties of wheat, labelled A-G here. A field
was divided into 5 blocks, each containing 7 plots. In each block, the seven plots were
assigned at random to the seven varieties, one plot for each variety.
A yield was recorded for each plot. Examine whether varieties affect the yield. Use
α = 0.05.
Variety
Block A B C D E F G Total
I 10 9 11 15 10 12 11 78
II 11 10 12 12 10 11 12 78
III 12 13 10 14 15 13 13 90
IV 14 15 13 17 14 16 15 104
V 13 14 16 19 17 15 18 112
Total 60 61 62 77 66 67 69 G=462
We assume that the measurements are approximately normally distributed, with the
same variance $\sigma^2$.
Calculations
Total number of experimental observations: N = r × t = 5 × 7 = 35
Grand total: $G = \sum_{i=1}^{t}\sum_{j=1}^{r} y_{ij} = 10 + 9 + \dots + 15 + 18 = 462$
Correction factor: $\mathrm{C.F} = \frac{G^2}{N} = \frac{(462)^2}{35} = 6098.4$
Uncorrected total sum of squares: $\sum_{i=1}^{t}\sum_{j=1}^{r} y_{ij}^2 = 10^2 + 9^2 + \dots + 15^2 + 18^2 = 6314.0$
Total sum of squares: $\mathrm{SST} = \sum_{i=1}^{t}\sum_{j=1}^{r} y_{ij}^2 - \mathrm{C.F} = 6314.0 - 6098.4 = 215.6$
Treatment (variety) sum of squares, where $Y_i$ is the total for variety i:
$\mathrm{SSTr} = \frac{1}{r}\sum_{i=1}^{t} Y_i^2 - \mathrm{C.F} = \frac{1}{5}(60^2 + \dots + 69^2) - \mathrm{C.F} = 6140.0 - 6098.4 = 41.6$
Block sum of squares, where $Y_j$ is the total for block j:
$\mathrm{SSB} = \frac{1}{t}\sum_{j=1}^{r} Y_j^2 - \mathrm{C.F} = \frac{1}{7}(78^2 + \dots + 112^2) - \mathrm{C.F} = 6232.6 - 6098.4 = 134.2$
Error sum of squares: SSE = Total SS − Treatment SS − Block SS = 215.6 − 41.6 − 134.2 = 39.8
Treatment (variety) mean square: $\mathrm{MSTr} = \frac{\mathrm{SSTr}}{t-1} = \frac{41.6}{6} = 6.93$
Block mean square: $\mathrm{MSB} = \frac{\mathrm{SSB}}{r-1} = \frac{134.2}{4} = 33.54$
Error mean square: $\mathrm{MSE} = \frac{\mathrm{SSE}}{(t-1)(r-1)} = \frac{39.8}{24} = 1.66$
Finally, we compute the F-values. For blocks the F-calculated value is
$F_B = \frac{\mathrm{MSB}}{\mathrm{MSE}} = \frac{33.54}{1.66} = 20.21$, and
for treatments the F-calculated value is $F_{Tr} = \frac{\mathrm{MSTr}}{\mathrm{MSE}} = \frac{6.93}{1.66} = 4.18$
We summarize the calculations in an ANOVA table as follows:
ANOVA Table
Source of variation D.F SS MS F-ratio
Blocks 4 134.2 33.54 20.21
Treatments 6 41.6 6.93 4.18
Error 24 39.8 1.66
Total 34 215.6
To perform the hypothesis test for differences among the treatment means, we compare
the F calculated value to the appropriate value from the F table. For level of
significance α = 0.05, we have
$F_{0.05;6,24} = 2.51$, and F-calculated (= 4.18) > $F_{0.05;6,24} = 2.51$.
Therefore, we reject H0. There is evidence in these data to suggest that there are
differences in mean yields among the varieties.
To test the hypothesis on block differences, we find $F_{0.05;4,24} = 2.78$. We have
20.21 > 2.78; thus, we also reject H0. There is strong evidence in these data to suggest
differences in mean yield across blocks at the 5% level of significance.
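The sums of squares and F-ratios above can be verified with a short script. The following sketch in plain Python follows the same correction-factor formulas; the function name rbd_anova and the layout of the data as one list per block are illustrative choices.

```python
def rbd_anova(table):
    """Two-way ANOVA without replication; table holds one row per block,
    with one entry per treatment (same treatment order in every row)."""
    r = len(table)      # number of blocks
    t = len(table[0])   # number of treatments
    grand = sum(sum(row) for row in table)
    cf = grand ** 2 / (r * t)                        # correction factor G^2 / N
    sst = sum(y * y for row in table for y in row) - cf
    ssb = sum(sum(row) ** 2 for row in table) / t - cf        # block totals
    sstr = sum(sum(col) ** 2 for col in zip(*table)) / r - cf  # treatment totals
    sse = sst - ssb - sstr
    mse = sse / ((r - 1) * (t - 1))
    return {
        "SSB": ssb, "SSTr": sstr, "SSE": sse, "SST": sst,
        "FB": (ssb / (r - 1)) / mse,
        "FTr": (sstr / (t - 1)) / mse,
    }


# Wheat yields: rows are blocks I-V, columns are varieties A-G.
wheat = [
    [10, 9, 11, 15, 10, 12, 11],
    [11, 10, 12, 12, 10, 11, 12],
    [12, 13, 10, 14, 15, 13, 13],
    [14, 15, 13, 17, 14, 16, 15],
    [13, 14, 16, 19, 17, 15, 18],
]
for name, value in rbd_anova(wheat).items():
    print(f"{name} = {value:.2f}")
```

The critical values still have to be read from an F table, as in the text, since the script only computes the test statistics.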
In SAS
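/* Read the wheat yields: one record per plot (block, variety, yield). */
data hybrid;
input Block $ Variety $ Yield @@;  /* @@ allows several records per line */
cards;
I A 10 I B 9 I C 11 I D 15 I E 10 I F 12 I G 11
II A 11 II B 10 II C 12 II D 12 II E 10 II F 11 II G 12
III A 12 III B 13 III C 10 III D 14 III E 15 III F 13 III G 13
IV A 14 IV B 15 IV C 13 IV D 17 IV E 14 IV F 16 IV G 15
V A 13 V B 14 V C 16 V D 19 V E 17 V F 15 V G 18
;
run;
/* List the data to check that they were read correctly. */
proc print data=hybrid; run;
/* Two-way ANOVA for the randomised block design (balanced data). */
proc anova data=hybrid;
class Block Variety;
model Yield = Block Variety;
run; quit;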
The SAS System
Obs Block Variety Yield
1 I A 10
2 I B 9
3 I C 11
4 I D 15
5 I E 10
6 I F 12
7 I G 11
8 II A 11
9 II B 10
10 II C 12
11 II D 12
12 II E 10
13 II F 11
14 II G 12
15 III A 12
16 III B 13
17 III C 10
18 III D 14
19 III E 15
20 III F 13
21 III G 13
22 IV A 14
23 IV B 15
24 IV C 13
25 IV D 17
26 IV E 14
27 IV F 16
28 IV G 15
29 V A 13
30 V B 14
31 V C 16
32 V D 19
33 V E 17
34 V F 15
35 V G 18
Class Level Information
Class Levels Values
Block 5 I II III IV V
Variety 7 A B C D E F G
Number of observations 35
Dependent Variable: Yield
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 10 175.7714286 17.5771429 10.59 <.0001
Error 24 39.8285714 1.6595238
Corrected Total 34 215.6000000
R-Square Coeff Var Root MSE Yield Mean
0.815266 9.759281 1.288225 13.20000
Source DF Anova SS Mean Square F Value Pr > F
Block 4 134.1714286 33.5428571 20.21 <.0001
Variety 6 41.6000000 6.9333333 4.18 0.0052
6.6 Reasons for blocking in RBD
Note from these results that the blocking served to explain much of the overall vari-
ation. To appreciate this further, suppose that we had not blocked the experiment,
but instead had just conducted the experiment according to a completely random-
ized design. Suppose that we ended up with the same data as in the experiment
above.
Under these conditions variety is the only classification factor for the plots, and we
would construct the following analysis of variance table for one way-classification as
discussed in the previous sections.
ANOVA Table
Source of variation DF SS MS F-ratio
Between treatments (varieties) 6 41.6 6.93 1.12
Within treatments (error) 28 174.0 6.21
Total 34 215.6
The test for differences in mean yield for the varieties (treatments) would be to
compare F=1.12 to F0.05,6,28=2.45. Note that we would thus not reject H0 of no
treatment differences at the 5% level of significance.
Concluding Remark
From the above example, it is immediately clear that if the different sources of
variation are not properly identified (e.g., due to erroneously accounting for the
experimental design), then invalid conclusions will be drawn.
The example discussed above clearly demonstrates the aspect of wrongly identify-
ing the experimental design. In the one-way classification experiment and analysis
presented above, there is no accounting for the variation in the data that is actually
attributable to a systematic source, position in the field (the factor used to block
the experiment). The one-way analysis has no choice but to attribute this variation
to experimental error; that is, it regards this variation as just part of the inherent
variation among experimental units that we cannot explain. The result is that the
Error SS in the one-way analysis contains both variation due to position in the field
(which is actually systematic variation) and inherent variation.
Here, note that 134.2 + 39.8 = 174.0 and 4 + 24 = 28.
That is, the Error SS for the one-way classification analysis is the sum of the Block SS
and the Error SS of the randomised block analysis, and its degrees of freedom are the
sum of the corresponding degrees of freedom. The one-way analysis amounts to
ignoring the blocks (what we really did was to pretend that the blocks did not exist),
which resulted in a large MSE, so that we could not reject H0. By blocking the
experiment, and explicitly acknowledging position in the field as a potential source
of variation, MSE was sufficiently reduced so that we could identify variety differences.
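The effect of ignoring the blocks can be reproduced numerically from the rounded sums of squares given in the two ANOVA tables. The short sketch below (plain Python, with illustrative variable names) pools the block and error lines of the randomised block analysis and recomputes the treatment F-ratio under each analysis.

```python
# Sums of squares and d.f. from the randomised block ANOVA table.
ssb, df_b = 134.2, 4        # blocks
sse_rbd, df_e_rbd = 39.8, 24  # error
sstr, df_tr = 41.6, 6       # treatments (varieties)

# Ignoring the blocks pools the block line into the error line.
sse_crd = ssb + sse_rbd
df_e_crd = df_b + df_e_rbd

mse_rbd = sse_rbd / df_e_rbd
mse_crd = sse_crd / df_e_crd
f_rbd = (sstr / df_tr) / mse_rbd
f_crd = (sstr / df_tr) / mse_crd

print(f"RBD: MSE = {mse_rbd:.2f}, F = {f_rbd:.2f}")
print(f"CRD: MSE = {mse_crd:.2f}, F = {f_crd:.2f}")
```

The treatment mean square is the same in both analyses; only the error mean square in the denominator changes, which is why the F-ratio falls from about 4.18 to about 1.12.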
It can be learned from this example that:
• Blocking may be an effective means of explaining variation (increasing preci-
sion) so that differences among treatments that may really exist are more likely
to be detected.
• The data from an experiment set up according to a particular design should
be analysed according to the appropriate procedure for that design. The
above shows that if we set up the experiment according to a randomised complete
block design, but then analyse it as if it had been set up according to
a completely randomised design, erroneous inferences result; in this case,
failure to identify real treatment differences. Remember, the design of an
experiment dictates the analysis!
Exercise
Four different plant densities A-D are included in an experiment on the growth of
lettuce. The experiment is laid out as a randomised block, and the same number of
plants is harvested from each plot, giving the weights (recorded) below. Examine
whether density appears to affect the yield. Use α = 0.01
Block
Density I II III IV V VI
A 2.7 2.6 3.1 3.0 2.5 3.0
B 3.0 2.8 3.1 3.2 2.8 3.1
C 3.3 3.3 3.5 3.4 3.0 3.2
D 3.2 3.0 3.3 3.2 3.0 3.1
Chapter 7
Latin Square Design
7.1 Introduction
There are often situations where it may be necessary to account for two sources
of variation by blocking. If the number of treatments and levels of each blocking
factor is large, the size of the experiment may become unwieldy or resources may be
limited. Thus, in agricultural field experiments (and other situations), a particular
setup is often used that allows differences among treatments to be assessed with fewer
resources.
The principle of local control was used in the RBD by grouping the units in one
way; i.e. according to blocks. The grouping can be carried one step forward and
we can group the units in two ways, each way corresponding to a known source of
variation among the units, and get the Latin Square Design (LSD). This design is
used with advantage in agricultural experiments where the fertility contours are not
always known. It has also been used successfully in industry and in the laboratory.
Latin square design is a design, which uses the principle of local control twice.
RBD removes one systematic source of variation in addition to treatments, but LSD
removes two such sources. Hence, LSD is a three-way classification.
In a field experiment if two probable fertility trends can be thought of, in directions
at right angles, both need to be made the basis of blocking. Thus when there is a
slope of the land being used, and also a climatic trend (e.g. effects of wind, rain) at
right angles to this, a randomised block cannot take out all the known variation.
7.2 Layout
In field experiments, the physical layout is that of a square with rows and columns of
plots. The layout is such that every letter (A, B, C, ...) in the set of treatments
occurs exactly once in each row and once in each column. For five letters A, B, C, D
and E, one possible layout is shown below.
         Column
         1  2  3  4  5
Row 1    E  B  A  D  C
Row 2    C  A  D  E  B
Row 3    B  E  C  A  D
Row 4    A  D  B  C  E
Row 5    D  C  E  B  A
This type of setup would be useful when for example variability due to soil differences,
etc, arises in two directions. Each plot would constitute a single experimental unit.
This particular kind of setup with two blocking variables (rows and columns), in
which the numbers of rows, columns and treatments are the same, is known as a
Latin square.
Notation
Because the number of treatments, rows and columns are the same, the number
of replicates on each treatment is equal to the number of treatments, rows, and
columns. We will denote this as t. For a given value of t, there may be several ways
to construct a Latin square. This is actually a mathematical exercise. Extensive
listings of ways to construct Latin squares for different values of t are often given in
texts on experimental design (see for example, Montgomery, 2001).
Selection of LSD
The totality of LSDs obtained from a single LSD by permuting rows, columns and
treatments (letters) is called a transformation set. For example, keeping the first
row fixed and permuting the remaining rows of the first square below gives the
second square:
A B C D (fixed)     A B C D
B C D A             C D A B
C D A B             D A B C
D A B C             B C D A
A k×k Latin square with the k letters A, B, C, ... occurring in the natural order in
the first row and in the first column is called a standard square (square in canonical
form), e.g.
A B C D
B C D A
C D A B
D A B C
From a standard k × k Latin square, we may obtain k!(k-1)! different LSDs by
permuting all the k columns and the (k-1) rows other than the first row. Hence there
are in all k!(k-1)! different LSDs with the same standard square. Thus the total
number of different LSDs in a transformation set is k!(k-1)! times the number of
standard LSDs in the set. In order to give all k × k LSDs equal probability of being
selected, we select one LSD from all k × k LSDs and then randomise the columns
and rows, excluding the first row (if it is the fixed one).
Randomisation consists of choosing one of the possible designs for given t at random.
Then randomly assign the letters A, B, C, etc. to the treatments of interest.
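The row and column permutation described above can be sketched in code. The example below, in plain Python, starts from the cyclic standard square (an assumption made for convenience: it is only one of the possible standard squares, so this sketch does not sample from all k × k Latin squares), permutes all the columns and all the rows except the first, and checks the Latin property. The function names and the seed are illustrative.

```python
import random


def standard_square(k):
    """Cyclic k x k square with first row and first column in natural order."""
    letters = [chr(ord("A") + i) for i in range(k)]
    return [[letters[(i + j) % k] for j in range(k)] for i in range(k)]


def randomise_square(square, seed=None):
    """Permute all columns, and all rows except the first."""
    rng = random.Random(seed)
    k = len(square)
    cols = list(range(k))
    rng.shuffle(cols)
    rows = list(range(1, k))
    rng.shuffle(rows)
    rows = [0] + rows  # the first row stays in place
    return [[square[i][j] for j in cols] for i in rows]


def is_latin(square):
    """True if every symbol occurs exactly once in each row and column."""
    k = len(square)
    symbols = set(square[0])
    return (len(symbols) == k
            and all(set(row) == symbols for row in square)
            and all(set(col) == symbols for col in zip(*square)))


design = randomise_square(standard_square(4), seed=11)
for row in design:
    print(" ".join(row))
print("Latin square:", is_latin(design))
```

Permuting rows and columns never breaks the Latin property, so the check at the end is a safeguard rather than a necessity.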
7.2.1 Linear additive model
To write down a model, we need to be a bit careful with the notation. The key is
that, although we have three classifications (row, column and treatment), we do not
have $t \times t \times t = t^3$ observations; rather we only have $t \times t = t^2$.
The mathematical model will now be,
$y_{ijk} = \mu + t_i + r_j + c_k + e_{ijk}$
where:
$y_{ijk}$ is the observation for the ith treatment appearing in row j, column k
µ is an overall mean
$r_j$, $c_k$ represent the effects of the jth row and kth column
$t_i$ represents the effect of the treatment appearing at position (j, k)
$e_{ijk}$ is the error associated with the experimental unit appearing at position (j, k)
7.3 Statistical analysis
The analysis proceeds along the same lines as for RBD, but instead of the one sum
of squares for blocks; systematic variation is now taken out by two sums of squares,
which are always called Rows (SSR) and columns (SSC).
7.3.1 Calculation of sums of squares
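To set up the analysis of variance we define $R_j$ and $C_k$ as the totals of all plots in the
jth row and the kth column of the layout respectively, $T_i$ as the total of all plots receiving
the ith treatment, and $G$ as the grand total of all $t^2$ plots. The sums of squares required are:
Total SS:
\[ SST = \sum_{i=1}^{t} \sum_{j=1}^{t} \sum_{k=1}^{t} y_{ijk}^2 - \frac{G^2}{t^2} \]
where the sum runs over the $t^2$ combinations (i, j, k) that occur in the design.
Row SS:
\[ SSR = \frac{1}{t} \sum_{j=1}^{t} R_j^2 - \frac{G^2}{t^2} \]
Column SS:
\[ SSC = \frac{1}{t} \sum_{k=1}^{t} C_k^2 - \frac{G^2}{t^2} \]
Treatment SS:
\[ SSTr = \frac{1}{t} \sum_{i=1}^{t} T_i^2 - \frac{G^2}{t^2} \]
Error SS:
\[ SSE = SST - (SSR + SSC + SSTr) = SST - SSR - SSC - SSTr \]
The degrees of freedom for SSR, SSC and SSTr are each $(t-1)$; for SST, $(t^2-1)$; and so
for SSE, $(t^2-1) - 3(t-1)$, which reduces to $(t-1)(t-2)$.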
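As a quick check of the degrees-of-freedom partition, consider a 4 × 4 square (the value t = 4 is chosen here only for illustration):

```latex
\begin{align*}
\text{Total df} &= t^2 - 1 = 16 - 1 = 15, \\
\text{Rows, columns, treatments} &: \quad t - 1 = 3 \text{ each}, \\
\text{Error df} &= 15 - 3 \times 3 = 6 = (t-1)(t-2) = 3 \times 2 .
\end{align*}
```

The same formula gives 2 error degrees of freedom for a 3 × 3 square and 12 for a 5 × 5 square.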
Table 7.1: Three way ANOVA table
Source of variation   DF           SS     MS                         F-ratio
Rows                  t-1          SSR    MSR = SSR/(t-1)            FR = MSR/MSE
Columns               t-1          SSC    MSC = SSC/(t-1)            FC = MSC/MSE
Treatments            t-1          SSTr   MSTr = SSTr/(t-1)          FTr = MSTr/MSE
Error                 (t-1)(t-2)   SSE    MSE = SSE/((t-1)(t-2))
Total                 t^2-1        SST
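The calculations behind the ANOVA table can be sketched in Python as follows. The function name `latin_square_anova` and the 3 × 3 data set are hypothetical and serve only to illustrate the arithmetic; the sketch assumes a complete square with no missing observations.

```python
def latin_square_anova(layout, y):
    """layout[j][k] is the treatment label in row j, column k;
    y[j][k] is the observation on that plot."""
    t = len(layout)
    G = sum(sum(row) for row in y)                            # grand total
    cf = G ** 2 / t ** 2                                      # correction factor G^2 / t^2
    R = [sum(y[j]) for j in range(t)]                         # row totals
    C = [sum(y[j][k] for j in range(t)) for k in range(t)]    # column totals
    T = {}                                                    # treatment totals
    for j in range(t):
        for k in range(t):
            T[layout[j][k]] = T.get(layout[j][k], 0.0) + y[j][k]

    sst = sum(v ** 2 for row in y for v in row) - cf
    ssr = sum(r ** 2 for r in R) / t - cf
    ssc = sum(c ** 2 for c in C) / t - cf
    sstr = sum(v ** 2 for v in T.values()) / t - cf
    sse = sst - ssr - ssc - sstr

    df_main, df_err = t - 1, (t - 1) * (t - 2)
    mse = sse / df_err
    table = {}
    for name, ss in (("Rows", ssr), ("Columns", ssc), ("Treatments", sstr)):
        ms = ss / df_main
        table[name] = {"df": df_main, "SS": ss, "MS": ms, "F": ms / mse}
    table["Error"] = {"df": df_err, "SS": sse, "MS": mse}
    table["Total"] = {"df": t * t - 1, "SS": sst}
    return table

# Made-up 3 x 3 example (values are not from any real experiment).
layout = [list("ABC"), list("BCA"), list("CAB")]
y = [[10, 12, 15],
     [11, 16, 9],
     [17, 8, 13]]

anova = latin_square_anova(layout, y)
for source, entries in anova.items():
    print(source, entries)
# For these data: SST = 80, SSTr = 74, SSR = SSC = 2/3, SSE = 14/3
```

Each F-ratio would then be compared with the tabulated F value on (t-1) and (t-1)(t-2) degrees of freedom.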
7.4 Advantages and disadvantages of LSD
7.4.1 Advantages
• The LSD eliminates two major sources of variation from the error. Hence the LSD is an
improvement over the RBD in controlling error by planned grouping, just as the
RBD is an improvement over the CRD.
• The LSD is a three-way incomplete layout. Treatments, rows and columns each have
the same number of levels t, so a complete three-way layout would need $t^3$
experimental units; because the LSD uses only $t^2$ experimental units, it is said to
be a three-way incomplete layout.
7.4.2 Disadvantages
• A serious limitation of the LSD is that the number of replicates must be the
same as the number of treatments: the larger the square, the more replicates are
needed and hence the bigger the blocks (rows and columns). Larger squares
(over 12 × 12) are therefore seldom used, because such squares do not remain
homogeneous. On the other hand, small squares provide only a few degrees of
freedom for the error. Preferable LSDs are from 5 × 5 to 8 × 8.
• The analysis depends heavily on the assumption that there are no interactions
present.
• The analysis becomes very difficult when there are missing observations.
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 

Mth201 COMPLETE BOOK

  • 1. MTH 201: Biometry Lecture Notes October 2013
  • 2. Department of Biometry and Mathematics, Faculty of Science, Sokoine University of Agriculture. MTH 201: Biometry Lecture Notes. Kassile, T. Office Room # 9, KEPA, SMC, Mazimbu
  • 3. Table of Contents
    Preface
      0.1 Preface
        0.1.1 Course Objective
        0.1.2 Course Description
        0.1.3 Pre-requisite
        0.1.4 Course requirement
        0.1.5 Computing
        0.1.6 References
    1 Terminologies in Experimental Designs
      1.1 Introduction
        1.1.1 Definitions
    2 Principles of experimental designs
      2.1 Introduction
      2.2 Randomization principle
      2.3 Replication principle
      2.4 Local control principle
    3 Analysis of Variance
      3.1 Introduction
      3.2 Assumptions in the analysis of variance
      3.3 Analysis of variance for one-way classification
        3.3.1 Analysis of variance for one-way classification with unequal replication (unbalanced data)
        3.3.2 Linear additive model for one-way classification
        3.3.3 Fixed vs. random effects
  • 4. Table of Contents (continued)
        3.3.4 Calculation of sums of squares
        3.3.5 ANOVA for one-way classification with equal replication (balanced data)
      3.4 ANOVA for two-way classification (Without Replication)
        3.4.1 Linear additive model for two-way classification
      3.5 The least significance difference (LSD)
    4 Introduction to SPSS
      4.1 Introduction
      4.2 Starting SPSS
      4.3 Data entry
      4.4 Keying data into SPSS
        4.4.1 Osteopathic manipulation data set
      4.5 Opening an existing dataset
      4.6 Importing data
      4.7 Exporting data
      4.8 ANOVA for one-way classification in SPSS
    5 Completely Randomized Design
      5.1 Introduction
      5.2 Layout
      5.3 Statistical analysis
        5.3.1 Statistical hypotheses
        5.3.2 Test procedure
      5.4 Advantages and disadvantages of CRD
        5.4.1 Advantages
        5.4.2 Disadvantages
      5.5 Example
    6 Randomised Block Design
      6.1 Introduction
      6.2 Layout
      6.3 Statistical analysis
        6.3.1 Statistical hypotheses
        6.3.2 Test procedure
  • 5. Table of Contents (continued)
      6.4 Advantages and disadvantages of RBD
        6.4.1 Advantages
        6.4.2 Disadvantages
      6.5 Example
      6.6 Reasons for blocking in RBD
    7 Latin Square Design
      7.1 Introduction
      7.2 Layout
        7.2.1 Linear additive model
      7.3 Statistical analysis
        7.3.1 Calculation of sums of squares
      7.4 Advantages and disadvantages of LSD
        7.4.1 Advantages
        7.4.2 Disadvantages
      7.5 Example
    8 Factorial Experiments
      8.1 Introduction
      8.2 Main effects and interaction effects
      8.3 The $2^2$ factorial experiments
      8.4 The $2^3$ factorial experiments
      8.5 Sum of squares due to factorial effects
      8.6 Tests of significance of factorial effects
      8.7 Yates' method of computing factorial effect totals
    9 Multiple Comparisons
      9.1 Introduction
      9.2 Multiple comparisons procedures
        9.2.1 Duncan's new multiple range test
    10 Simple Linear Regression and Correlation
      10.1 Simple linear regression
        10.1.1 Fitting a simple linear regression model: the method of least squares
  • 6. Table of Contents (continued)
        10.1.2 Assessing the fitted regression
        10.1.3 Confidence intervals for regression parameters
      10.2 Correlation analysis
        10.2.1 Karl Pearson's correlation coefficient (r) (ref: MTH 106)
        10.2.2 Spearman's coefficient of rank correlation
    11 Data Transformation
      11.1 Introduction
      11.2 Parameters of normal distribution
        11.2.1 Shape of the normal distribution
      11.3 Reasons for data transformation
      11.4 Testing for normality
      11.5 Common data transformations
    12 Analysis of Frequency Data
      12.1 Introduction
      12.2 Objective of two-way classification
      12.3 The Chi-square test of independence
    13 Review Exercises
      13.1 Exercise I
      13.2 Exercise II
      13.3 Exercise III
      13.4 Exercise IV
      13.5 Exercise V
      13.6 Exercise VI
      13.7 Exercise VII
      13.8 Exercise VIII
      13.9 Exercise IX
  • 7. 0.1 Preface

    0.1.1 Course Objective
    The course focuses on the application of statistical and mathematical techniques to problems in the agricultural, environmental, and biological sciences. It is concerned with the design of experiments and with the analysis and interpretation of results.

    0.1.2 Course Description
    Principles of experimental designs; analysis of variance (ANOVA): one-way classification, e.g., completely randomised design (balanced and unbalanced data); multiway classification, e.g., randomised complete block design and Latin square design; factorial experiments; multiple comparisons; data transformation; simple linear regression and correlation; analysis of frequency data, e.g., contingency tables.

    0.1.3 Pre-requisite
    MTH 106: Introductory Statistics.

    0.1.4 Course requirement
    I. Coursework: 2 quizzes and 2 tests, contributing 40% of the total credits allotted to this course.
    II. Final (end-of-semester) exam, contributing 60%.

    0.1.5 Computing
    Where necessary, for illustration purposes, the Statistical Package for the Social Sciences (SPSS) and the SAS software packages will be used. However, use of SPSS or SAS in the course is considered optional.

    0.1.6 References
    Cody, R.P. and Smith, J.K. (1997). Applied Statistics and the SAS Programming Language. Fourth Ed. Prentice Hall.
    Der, G. and Everitt, B.S. (2002). A Handbook of Statistical Analyses using SAS. Second Ed. Chapman & Hall/CRC.
    Montgomery, D. (2001). Introduction to Linear Regression Analysis. Wiley and Sons, Inc.
    Montgomery, D. (2001). Design and Analysis of Experiments. Wiley and Sons, Inc.
    Neter, J., Kutner, M., Nachtsheim, C. J. and Wasserman, W. (1996). Applied Linear Statistical Models. Irwin, Chicago.
  • 8. Chapter 1: Terminologies in Experimental Designs

    1.1 Introduction
    Real-life scientific investigations often involve comparisons between several sets of data collected under basically similar conditions. Examples include groups of plants of the same type grown under conditions that are alike except that a different fertilizer was used for each group; different doses of a drug administered to the same or different groups of patients; different rations given to a group of homogeneous animals; and the same instructor teaching students with different backgrounds in the subject being taught.

    Many types of biological data are collected through planned (well designed) experiments. Designing an experiment requires adherence to certain rules, principles and procedures if valid conclusions are to be drawn. For example, the data from an experiment set up according to a particular design should be analysed according to the appropriate procedure for that design. No statistical method, no matter how sophisticated, can compensate for a poorly designed study or improve the quality of results obtained from an improperly designed experiment. The design of an experiment therefore determines the quality of the results. Before we embark on the contents of the course, let us first briefly distinguish between biometry and some related fields within the statistics domain.

    1.1.1 Definitions
    Biometry: as alluded to above, biometry is a subject concerned with the application of statistics and mathematics to problems in the agricultural, environmental, and biological sciences. Hence, biometrics is the application of statistics and mathematics to problems with a biological component, including problems in the agricultural, environmental, and biological sciences as well as medical science.
  • 9. These include statistical methods, computational biology, applied mathematics, and mathematical modeling.

    Biostatistics: a field of study concerned with the application of statistics to the biological sciences, especially those relating to medical sciences. Medical colleges and universities (for example, Muhimbili University of Health and Allied Sciences (MUHAS), International Medical and Technological University (IMTU), Kilimanjaro Christian Medical College (KCMC), Catholic University of Health and Allied Sciences (CUHAS), and so on) often have biostatistics as one of the core courses for students enrolled in degree programmes with a major in medical sciences.

    Described below are key terminologies in experimental design.

    Experiment: an investigation set up to provide answers to a question or questions of interest. For example, we may wish to conduct an experiment to test the efficacy of a newly developed drug for curing a certain skin condition in humans or animals. We may also conduct an experiment to investigate whether three varieties of feed differ in the amount of milk produced per day. In this context, an experiment is likely to involve comparison of treatments (defined below), e.g., drugs, rations, methods, varieties, fertilizers, etc. However, in some cases experiments do not involve comparison of one treatment with another. Hence, experiments can be absolute or comparative. If we conduct an experiment to examine the usefulness of a newly developed drug for curing a certain skin condition in animals without comparing its effect with that of other drugs, the experiment is an absolute experiment. On the other hand, if we conduct an experiment to assess the effectiveness of one drug compared with other drugs, the experiment is a comparative experiment.

    Experimental design (designing of an experiment): a design is a plan for obtaining relevant information to answer the research question of interest. In other words, designing an experiment is the complete sequence of steps laid down in advance to ensure that the maximum amount of information relevant to the problem under investigation will be collected.

    Treatment or treatment combination: a procedure whose effect is to be measured and compared with other procedures. For example, in a dietary or medical experiment, the different diets or medicines are the treatments; in an agricultural experiment, the different varieties of a crop or the different fertilizers are the treatments.

    Experimental unit: the unit of experimental material to which one application of the treatment is applied and on which the variable under study is measured. Equivalently, an experimental unit is the unit to which a single treatment (which may be a combination of many factors, as in factorial experiments) is applied in one replication of the basic experiment.

  • 10. Examples: In an agricultural experiment, the plot of land is the experimental unit; in a dietary experiment, the whole animal is the experimental unit; in medical experiments in which treatments (or medications) are assigned to individuals and effects are measured, the individual is the experimental unit.

    Response (yield/outcome): a result observed for a particular experimental unit.

    Examples: One may be interested in the amount of a crop (in kg) produced when different types of fertilizers are applied to a piece of land; the number of students who pass MTH 201 when a different instructor is used for each degree programme taking the course; the amount of milk (in litres) produced when different types of feeds are given to a group of supposedly homogeneous cows; or the number of customers who visit a particular supermarket in Dar es Salaam when different marketing strategies are used by the company operating the supermarket.

    Exercise: In the agricultural field experiment described above, which assesses the effects of different fertilizers on crop production, identify:
    i. the experimental unit;
    ii. the treatments; and
    iii. the response (yield or outcome).

    Factor: a variable that is believed to affect the outcome of an experiment, e.g., humidity, pressure, time, concentration, etc.

    Level: the various values or classifications of a factor are known as the levels of the factor. For example, suppose we wish to compare the efficacy of three medications (M1, M2, and M3) for lowering blood pressure among middle-aged women; here there are three levels of the factor medication.
    Assume also that a researcher is interested in comparing four different doses (D1, D2, D3 and D4) of a drug administered to rats of the same type; here there are four levels of the factor drug.

    Experimental error: a measure of the variation among experimental units, reflecting mainly the inherent variation among them. Thus, experimental error is a technical term and does not mean a mistake; it includes all types of extraneous variation due to:
  • 11. i. inherent variability in the experimental units;
    ii. error associated with the measurements made; and
    iii. lack of representativeness of the sample with respect to the population under study.

    Therefore, for the above reasons, particularly the first one, we cannot completely control experimental error, but we can always think of how to reduce it. Variation among experimental units sometimes cannot be avoided in practice: some variation is controllable, and some is beyond the control of the experimenter. If we can control the magnitude of experimental error, we are in a better position to detect differences among treatments if they really exist.

    Exercises
    1. Suppose the following experiment is conducted with the aim of comparing three feeds (I, II, III) in cows. Three cows are obtained. One cow is given feed I, another feed II and the last cow feed III. 300 observations are taken on each cow.
       i. What is the experimental unit?
       ii. What are the treatments?
       iii. How many replicates are there per treatment? (to be answered later)
    2. An experiment is to be undertaken to compare growth patterns obtained in mice given three different types of drug. The drug may be administered orally or by injection. 72 identical mice are available for study. Two different experimental plans are proposed:
       (i) The 72 mice are to be allocated to 12 cages, 6 mice per cage. Each cage is assigned at random to one of the three drugs, 4 cages per drug. For each cage, the drug is administered to the animals within the cage by mixing it into the daily shared food supply for the 6 mice.
       (ii) The 72 mice are to be allocated to 12 cages, 6 mice per cage. Within each cage, each mouse is assigned to receive one of the drugs by daily injection, 2 mice per drug in each cage.
       i. What are the treatments under investigation?
       ii. In each of plans (i) and (ii), identify the experimental units.
  • 12. Chapter 2: Principles of experimental designs

    2.1 Introduction
    Designing an experiment to obtain relevant data in a way that permits objective analysis leading to valid inferences about the problem(s) under investigation is often a challenging step in practice. Correctly identifying the relevant experimental units, their size or number, and the way the treatments are assigned to the experimental units are some of the most important aspects of the design of experiments. In this chapter we describe the principles that, depending on the design chosen, must be adhered to when planning an experiment to answer a specific problem. There are three main principles of experimental design, namely:
    i. Randomisation;
    ii. Replication; and
    iii. Error/local control.

    2.2 Randomization principle
    Randomisation is an essential principle of experimental design. It involves the assignment of treatments to the experimental units, based on the chosen design, by some chance mechanism or probabilistic procedure, e.g., random numbers, so that each experimental unit has the same chance of receiving any one of the treatments under study. Conscious (non-random) allocation of treatments to experimental units has been criticised by many researchers; in fact, results from studies that did not allocate treatments to the experimental units at random have been rendered useless and have thus contributed nothing to the literature available to date. Briefly, randomization is the use of a known, understood probabilistic procedure for the assignment of treatments to experimental units.
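As a minimal sketch of such a chance mechanism, the Python fragment below assigns three treatments to twelve units completely at random, with equal replication. The function name `randomise`, the plot labels, the treatment labels and the seed are all hypothetical choices made for illustration; they do not come from the notes, and SPSS or SAS could be used for the same purpose.

```python
import random
from collections import Counter

def randomise(units, treatments, seed=None):
    """Assign treatments to units completely at random with equal replication.

    Hypothetical helper for illustration only.
    """
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    reps = len(units) // len(treatments)
    # Build one label per unit, then shuffle the labels by a chance mechanism.
    labels = [t for t in treatments for _ in range(reps)]
    rng = random.Random(seed)
    rng.shuffle(labels)
    return dict(zip(units, labels))

plots = [f"plot{i}" for i in range(1, 13)]        # 12 hypothetical plots
layout = randomise(plots, ["A", "B", "C"], seed=2013)
replication = Counter(layout.values())            # units per treatment

for plot, treatment in layout.items():
    print(plot, treatment)
```

Because every ordering of the labels is equally likely, each plot has the same chance of receiving any one of the treatments, and no treatment is chosen for a plot by the experimenter's judgement.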
  • 13. As we will see later in the course, randomisation, being an important principle of experimental design, is used in all the designs that we will discuss.

    Question: Why do we really need to adhere to this principle?

    Recap: as we explained in Chapter 1, a treatment is a procedure whose effect is to be measured and compared with other procedures.

    Goal: Since our intention is to measure and compare the effect of one treatment with that of another treatment (or treatments), one obvious goal of randomisation is to ensure that no treatment is somehow favoured or handicapped. Randomisation ensures that observations represent random samples (independence of observations) from the population of interest. This ensures the validity of the statistical methods and hence leads to valid conclusions/inferences.

    Illustration: The following example illustrates the importance of randomisation. A study is to be conducted to compare the efficacies of two drugs (I and II) for treating a certain skin condition. It is decided that patients will be assigned to drug I if they have had previous outbreaks of the condition and to drug II if the current outbreak is the first for the patient. Comment on this experimental design. If you feel that the design has drawbacks, state how you would improve it.

    From our discussion above, this design clearly lacks an important ingredient: randomisation. The design allocates the drugs depending on whether the patient has had a previous outbreak of the condition. This is not a proper way of assigning treatments to experimental units. It may be, for example, that patients with first-time outbreaks are more or less difficult to treat than those with repeat outbreaks. This may put one drug or the other at a disadvantage in the evaluation of efficacy.

    How to improve? A better design would be one that assigns patients at random to the drugs regardless of outbreak status.
    An even better design would be one which assigns patients with first-time outbreaks to each drug randomly, and similarly for patients with repeat outbreaks, so that each drug is seen on patients of each type.

    2.3 Replication principle
    The term replication refers to the number of experimental units on each treatment. A treatment is said to be replicated if it is applied to more than one experimental unit. In other words, replication is the number of times a treatment appears on experimental units.
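The stratified assignment suggested at the end of Section 2.2 also makes the replication count concrete. The following is a minimal sketch only: the patient identifiers, group sizes, seed and the function name `stratified_randomise` are hypothetical and are not taken from the notes.

```python
import random
from collections import Counter

def stratified_randomise(strata, treatments, seed=None):
    """Randomise treatments separately within each stratum (hypothetical helper).

    strata maps a stratum name to the list of units in that stratum.
    Returns a dict: unit -> (stratum, treatment).
    """
    rng = random.Random(seed)
    assignment = {}
    for stratum, units in strata.items():
        shuffled = list(units)
        rng.shuffle(shuffled)                      # random order within the stratum
        for position, unit in enumerate(shuffled):
            # Deal the treatments out in rotation over the shuffled units.
            assignment[unit] = (stratum, treatments[position % len(treatments)])
    return assignment

patients = {
    "first-time": [f"F{i}" for i in range(1, 9)],  # 8 hypothetical patients
    "repeat": [f"R{i}" for i in range(1, 7)],      # 6 hypothetical patients
}
allocation = stratified_randomise(patients, ["drug I", "drug II"], seed=1)

# Replication: number of experimental units on each treatment, per patient type.
replicates = Counter(allocation.values())
for (stratum, drug), count in sorted(replicates.items()):
    print(stratum, drug, count)
```

Each drug is seen on patients of each type, and the counter reports how many times each drug appears within each type, which is exactly the replication defined above.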
  • 14. Question: What do we replicate and why?

    The first part of this question is answered above: we replicate treatments on experimental units. Perhaps of most interest, at least in my view, is the question of why we need to replicate the treatments. We repeat a treatment a number of times in order to obtain a more reliable estimate than is possible from a single observation. If you recall our discussion of statistical inference in MTH 106, we mentioned that the sample size n is a key factor in determining precision and power. This is because, if we increase the sample size, we decrease the magnitude of $s_{\bar{D}}$, which measures how precisely we can estimate the difference and determines the size of our test statistic (and thus the power of the test). In the context of experimental design, the number of replicates per treatment is likewise a key factor in determining precision and power.

    Example: Suppose an experiment is conducted with the goal of comparing two diets with respect to weight in sheep. Two sheep are available for experimentation. One sheep is given diet A, the other diet B. 400 observations are taken on each sheep. In this example very little can be learned about how the treatments compare in the population of such sheep. We have only one sheep on each treatment (diet); thus, we do not know whether any observed difference is due to a real difference between the treatments we are comparing or just to the fact that these two sheep are quite different. This is perhaps a contrived example, but it illustrates why replication is an important principle in experimental design.

    Practical advice: If we have a fixed number of experimental units available for experimentation, an obvious challenge is how to make the best use of them for detecting treatment differences.
    In this situation of limited resources we would be better off with a few treatments with many replicates on each treatment than with many treatments with fewer replicates on each. Thus, if limited resources are available, it is better to reduce the number of treatments to be considered, or to postpone the experiment, than to proceed with too few replicates. Randomisation plus replication is therefore necessary for the validity of the experiment.

    Exercise: In your own words, explain why you think replication is an important concept to keep in mind when designing an experiment.

    2.4 Local control principle
    Experimental design is founded on the principle of reducing what we regard as experimental error by meaningful grouping of experimental units into small non-overlapping units.

  • 15. As we discussed above, we cannot eliminate inherent variability completely, but if we are careful enough about what we consider to be inherent variability we should be in a position to separate systematic variation among experimental units from inherent variation and hence arrive at the stated goal(s) of the experiment.

    Local control refers to techniques for reducing the error variance. One such measure is to make experimental units homogeneous, i.e., to form units into several homogeneous groups called blocks. This is done particularly in situations where the experimental units are assumed to be non-homogeneous. Thus, to reduce the magnitude of experimental error one needs to group the experimental units.

    It should be understood that, in order to detect treatment differences if they really exist, we must strive to control the effects of experimental error, so that any variation we observe can be attributed mainly to the effects of the treatments we are comparing rather than to differences among the experimental units to which the treatments are applied.

    Summary: From what we have discussed so far, it is clear that a good experimental design attempts to:
    i. ensure sufficient replication of treatments on experimental units; and
    ii. reduce the effects of experimental error by meaningful grouping of experimental units (application of local/error control).
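To illustrate the first point of the summary, together with the practical advice of Section 2.3, the sketch below shows how the precision of a treatment comparison changes when a fixed number of units is shared among more treatments. It rests on assumptions that are not stated in the notes: independent units with a common standard deviation `sigma` (the value 5.0 and the total of 24 units are made up), for which the standard error of the difference between two treatment means, each based on r replicates, is sigma * sqrt(2 / r).

```python
import math

def se_difference(sigma, r):
    """Standard error of the difference between two treatment means,
    each based on r independent units with common standard deviation sigma."""
    return sigma * math.sqrt(2.0 / r)

total_units = 24     # hypothetical number of available experimental units
sigma = 5.0          # assumed unit-to-unit standard deviation

print("treatments  replicates  SE of difference")
for n_treatments in (2, 3, 4, 6, 12):
    r = total_units // n_treatments          # replicates per treatment
    print(f"{n_treatments:10d}  {r:10d}  {se_difference(sigma, r):16.3f}")
```

Under these assumptions the standard error grows steadily as the same 24 units are spread over more treatments, which is the numerical counterpart of preferring a few treatments with many replicates.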
  • 16. Chapter 3: Analysis of Variance

    3.1 Introduction
    The analysis of variance is among the most useful statistical procedures in agriculture, economics, psychology, education, sociology, business/industry and research in several other disciplines. The technique is used particularly when multiple samples are involved.

    Recap: In MTH 106, the significance of the difference between the means of two samples was judged using either the z-test (based on the standard normal distribution) or Student's t-test. As a reminder, one popular t-test is the two-sample pooled t-test, used when the two unknown population variances are assumed to be equal.

    Problem: When there are more than two samples, performing all possible pairwise comparisons becomes a wearying exercise, especially if the number of samples is large. The analysis of variance technique enables us to perform a single simultaneous test. Using this technique one can draw inferences about whether the samples have been drawn from populations having the same mean.

    Example: Consider a comparison of the yields of a certain crop from several varieties of seeds, or of the smoking habits of six groups of SUA students, and so on. If we were to use either the z-test or the t-test, we would need to consider all possible combinations of two varieties of seeds at a time, and likewise all pairs of groups of students. This would take some time before one arrives at a decision. In such circumstances, one quite often uses the analysis of variance technique and through it investigates the differences among the means of all the populations simultaneously.

    Acronym: The popular acronym for ANalysis Of VAriance is ANOVA.
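To see how quickly the pairwise approach grows, the sketch below enumerates every pair of groups for the six groups of students mentioned in the example. The group labels are hypothetical placeholders; the count itself follows from the standard formula k(k - 1)/2 for k groups.

```python
from itertools import combinations
from math import comb

groups = [f"group{i}" for i in range(1, 7)]   # six hypothetical student groups

# Every unordered pair of groups would need its own two-sample test.
pairs = list(combinations(groups, 2))
n_pairs = len(pairs)

for first, second in pairs:
    print(first, "vs", second)
print("number of pairwise tests:", n_pairs)   # 15 for six groups

# The same count from the closed form k(k - 1)/2, for a few values of k.
counts = {k: comb(k, 2) for k in (3, 6, 10, 20)}
print(counts)
```

Six groups already require 15 separate two-sample tests, whereas the analysis of variance addresses the equality of all six means in a single test.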
  • 17. Definition: Montgomery (2001) defines ANOVA as a procedure for testing the differences among groups of data for homogeneity.

Target: To partition the total amount of variation in a set of data into two components:
i. the amount that can be attributed to chance; and
ii. the amount that can be attributed to specified causes.

If we take only one factor and investigate differences among its categories (levels), we use one-way ANOVA; if we investigate two factors at the same time, we use two-way ANOVA.

3.2 Assumptions in the analysis of variance

Anyone who employs the ANOVA technique must be satisfied that the basic assumptions underlying it are fulfilled if the inferences are to be valid. There are three basic assumptions underlying this approach:
i. The observations, and hence the errors, are normally distributed.
ii. All observations, both across and within samples, are unrelated (independent).
iii. The observations have the same variance $\sigma^2$.

Important: The assumptions stated above are not necessarily true for any given situation. In fact, they are probably never exactly true. For many data sets in practice they are a reasonable approximation, in which case the results will be fairly reliable. In other cases they may be badly violated, and the resulting conclusions may then be invalid or misleading. If the data really are not normal, hypothesis tests may be imperfect, leading to invalid inferences.

Strategy: For some data it is possible to get around these issues to some extent. One of the most commonly used approaches to the problem of non-normality is data transformation, which will be dealt with later in the course.

Important: For the remainder of our discussion of analysis of variance in this and subsequent chapters, we will assume that the above assumptions are reasonable either
  • 18. on the original scale of measurement of the data or on a transformed scale. Keep in mind at all times that these are just assumptions, and they must be checked before the methods can be considered suitable.

3.3 Analysis of variance for one-way classification

Under the one-way (or single-factor) ANOVA, we randomly obtain the experimental units for the experiment and randomly assign them to the treatments, so that each experimental unit is observed under one of the treatments. In this situation, the only way in which experimental units may be classified is with respect to which treatment they received. The experimental units are otherwise viewed as alike. When experimental units are thought to be alike, and are thus expected to exhibit only a small amount of variation from unit to unit, grouping them would be pointless because it would not add much precision to the experiment.

It can be shown that the total variation in the observed responses can be subdivided into two components:
i. variation due to the differences among the levels of the factor (say A); and
ii. the residual variation (error term).

3.3.1 Analysis of variance for one-way classification with unequal replication (unbalanced data)

We first consider the case where the treatments are not replicated the same number of times.

Notation: To facilitate the development of the methods we require, we change slightly the notation for the sample mean used in MTH 106. As we will see shortly, we will be dealing with several different types of means for the data.

Let t denote treatment and k the number of treatments (levels). Let $Y_{ij}$ be the response of the jth experimental unit receiving the ith treatment level, and let $n_i$ be the number of units receiving treatment i. We denote the sample mean for treatment i (the mean of all plots receiving treatment i) by

\[ \bar{Y}_{i.} = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij} \]

Also we define

\[ \bar{Y}_{..} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}}{\sum_{i=1}^{k} n_i} \quad \text{or} \quad \bar{Y}_{..} = \frac{G}{N} \]

(with G the grand total and N the total number of observations) as the grand mean yield (sample mean
  • 19. of all the data) in the whole experiment. Note that because we have unequal replication, the total number of observations is $\sum_{i=1}^{k} n_i = N$.

3.3.2 Linear additive model for one-way classification

For a one-way classification with unequal replication, an individual observation on the jth experimental unit in the ith treatment level can be written as:

\[ Y_{ij} = \mu + t_i + e_{ij}, \quad i = 1, 2, \ldots, k; \quad j = 1, 2, \ldots, n_i \]

where:
$\mu$ = the general mean effect
$t_i$ = the effect of level i (the ith treatment effect)
$e_{ij}$ = the error term
$Y_{ij}$ = as defined above

Remark: In our discussion we consider only cases where a single observation of the response is made on each experimental unit; however, it is common practice to take more than one observation on an experimental unit.

3.3.3 Fixed vs. random effects

In the above model for one-way classification with unequal replication, $t_i$ represents the ith treatment effect. However, the interpretation of $t_i$ may differ depending on the situation. To better understand the notions of fixed and random effects, consider the following examples.

Example 1
Suppose there are three varieties of wheat for which mean yields are to be compared. Here, we are interested in comparing 3 specific treatments. If we repeated the experiment, these 3 varieties of wheat would always constitute the treatments of interest.

Example 2
Suppose a factory operates a large number of machines to produce a product and wishes to determine whether the mean yield of these machines differs. It is infeasible for the company to keep track of all of the many machines it operates, so a random sample of 5 machines is selected and observations on yield are made on these 5 machines. The hope is that the results for the 5 machines in the experiment can be generalized to gain insight into the behaviour of all the machines.

In the first example, there is a particular set of treatments of interest.
If we started the experiment next week instead of this week, we would still be interested in this
  • 20. same particular set of treatments. It would not vary across other possible experiments we might do.

In the second example, the treatments are the 5 machines chosen at random from all machines at the company. If we started the experiment next week instead of this week, we might end up with a different set of 5 machines. Here interest focuses on the population of all machines operated by the company. The question of interest is not about the particular treatments in the experiment, but about the population of all such treatments.

We thus make the following distinction in our model:

In a case like Example 1, the $t_i$ are best regarded as fixed quantities, as they describe a particular set of conditions. The $t_i$ are then referred to as fixed effects.

In a case like Example 2, the $t_i$ are best regarded as random variables. Here the particular treatments in the experiment may be thought of as drawn from a population of all such treatments, so chance is involved. In this situation, the $t_i$ are referred to as random effects.

3.3.4 Calculation of sums of squares

As described above, the fundamental idea of ANOVA is that the total amount of variation in a set of data is broken down into two components: the amount that can be attributed to chance and the amount that can be attributed to specified causes. Based on the linear additive model above, we partition the total variation in the data as:

Total variation = Variation due to factor A (treatment) + Residual/Error term

or

Total sum of squares = Sum of squares due to factor A + Sum of squares due to error

In short, SST = SSA + SSE.

Algebraically, the total sum of squares (SST) can be partitioned as:

\[ \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( Y_{ij} - \bar{Y}_{..} \right)^2 = \sum_{i=1}^{k} n_i \left( \bar{Y}_{i.} - \bar{Y}_{..} \right)^2 + \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( Y_{ij} - \bar{Y}_{i.} \right)^2 \]
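One way to see why this partition holds (a sketch in the notation above, not taken from the original notes) is to write $Y_{ij} - \bar{Y}_{..} = (Y_{ij} - \bar{Y}_{i.}) + (\bar{Y}_{i.} - \bar{Y}_{..})$, square both sides and sum over i and j. The cross-product term vanishes because the deviations within each treatment sum to zero:

```latex
\begin{align*}
\sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(Y_{ij}-\bar{Y}_{..}\right)^2
  &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(Y_{ij}-\bar{Y}_{i.}\right)^2
   + \sum_{i=1}^{k} n_i \left(\bar{Y}_{i.}-\bar{Y}_{..}\right)^2
   + 2\sum_{i=1}^{k} \left(\bar{Y}_{i.}-\bar{Y}_{..}\right)
      \sum_{j=1}^{n_i} \left(Y_{ij}-\bar{Y}_{i.}\right), \\
\sum_{j=1}^{n_i} \left(Y_{ij}-\bar{Y}_{i.}\right)
  &= \sum_{j=1}^{n_i} Y_{ij} - n_i \bar{Y}_{i.} = 0 .
\end{align*}
```

The first remaining term is SSE and the second is SSA, so SST = SSA + SSE.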
  • 21. with $\bar{Y}_{i.}$ and $\bar{Y}_{..}$ as defined in Section 3.3.1. Thus,

\[ SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( Y_{ij} - \bar{Y}_{..} \right)^2, \quad SSA = \sum_{i=1}^{k} n_i \left( \bar{Y}_{i.} - \bar{Y}_{..} \right)^2, \quad SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( Y_{ij} - \bar{Y}_{i.} \right)^2 \]

For calculation we express the sums of squares as follows. Define the correction factor

\[ C.F = \frac{\left( \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij} \right)^2}{\sum_{i=1}^{k} n_i} = \frac{G^2}{N} \]

Here G is the grand total, $G = \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}$, and N is as defined in Section 3.3.1.

It can be shown that:

\[ SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}^2 - \frac{G^2}{N} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}^2 - C.F \]

\[ \text{Treatment SS (Factor A SS)} = \sum_{i=1}^{k} \frac{Y_i^2}{n_i} - C.F \]

where $Y_i = \sum_{j=1}^{n_i} Y_{ij}$ is the total yield of all the $n_i$ plots which carried treatment i.

Error SS (SSE) = SST − SSA

Since we have k levels of factor A (treatment), SSA has k − 1 independent comparisons (degrees of freedom). Similarly, SST has N − 1 degrees of freedom, and SSE has (N − 1) − (k − 1) = N − k degrees of freedom.

We summarize the computations in a table known as the ANOVA table.

Table 3.1: One-way ANOVA table with unequal replication

Source of variation (S.V)    Degrees of freedom (D.F)    Sum of squares (S.S)    Mean square (M.S)      F-ratio
Between treatments           k − 1                       SSA                     MSA = SSA/(k − 1)      MSA/MSE
Error (within treatments)    N − k                       SSE                     MSE = SSE/(N − k)
Total                        N − 1                       SST

The calculated F-value, MSA/MSE, is compared with the tabulated F-value $F_{\alpha,[(k-1),(N-k)]}$ at the α level of significance for k − 1 and N − k degrees of freedom.
  • 22. Statistical Hypotheses

The question of interest in this setting is whether the means of the different treatment populations differ. Mathematically we write:

$H_0: \mu_1 = \mu_2 = \ldots = \mu_k$ (that is, the $\mu_i$ are all equal)
$H_1: \mu_i \neq \mu_j$ for at least one $i \neq j$ (that is, the $\mu_i$ are not all equal)

Or simply:
$H_0$: There is no variation among the treatments
$H_1$: Variation exists

Test procedure
At level of significance α, if $F > F_{\alpha,[(k-1),(N-k)]}$ we reject the null hypothesis: there is evidence of significant variation among the treatments. Otherwise we do not reject $H_0$.

Note that the alternative hypothesis stated above does not specify the way in which the treatment means differ. The best we can say based on our statistic is that they differ somehow.

Example
Four machines are used for filling plastic bottles with a net volume of 16.0 cm³. The quality-engineering department suspects that all four machines fill to the same net volume, whether or not this volume is 16.0 cm³. A random sample is taken from the output of each machine.

Table 3.2: Machine data set

                  Machines
          A        B        C        D
        16.03    16.01    16.02    16.03
        16.04    15.99    15.97    16.04
        15.96    16.03    15.96    16.00
        16.05             15.05    16.02
                          16.04
Total   64.08    48.03    79.04    64.09

Assume that the measurements are approximately normally distributed, with approximately constant variance $\sigma^2$. Do you think the quality-engineering department is correct? Use α = 0.05.

Statistical hypotheses:
$H_0$: There is no significant variation among the machines
$H_1$: Variation exists
or
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ (all means are equal)
  • 23. $H_1: \mu_i \neq \mu_j$ for at least one $i \neq j$ (the means are not all equal)

Calculation
Here we have 4 treatment levels (A, B, C, D), so k = 4.

$N = \sum_{i=1}^{k} n_i = n_1 + n_2 + n_3 + n_4 = 4 + 3 + 5 + 4 = 16$

Grand total: $G = \sum_{i=1}^{4} \sum_{j=1}^{n_i} Y_{ij} = 16.03 + 16.04 + \ldots + 16.00 + 16.02 = 255.24$

Thus, $C.F = \frac{G^2}{N} = \frac{(255.24)^2}{16} = 4071.7161$

Uncorrected total SS: $\sum_{i=1}^{4} \sum_{j=1}^{n_i} Y_{ij}^2 = (16.03)^2 + (16.04)^2 + \ldots + (16.00)^2 + (16.02)^2 = 4072.5976$

Total sum of squares: $SST = \sum_{i=1}^{4} \sum_{j=1}^{n_i} Y_{ij}^2 - C.F = 4072.5976 - 4071.7161 = 0.8815$

Totals: A = 64.08, B = 48.03, C = 79.04, D = 64.09

Machine (treatment) sum of squares (SSM):

\[ SSM = \sum_{i=1}^{k} \frac{Y_i^2}{n_i} - C.F = \frac{(64.08)^2}{4} + \frac{(48.03)^2}{3} + \frac{(79.04)^2}{5} + \frac{(64.09)^2}{4} - 4071.7161 = 4071.868245 - 4071.7161 = 0.152145 \]

Error sum of squares: SSE = Total SS − Treatment (Machine) SS = SST − SSM = 0.8815 − 0.152145 = 0.729355

We also have k − 1 = 4 − 1 = 3 and N − k = 16 − 4 = 12, so that:

Treatment (machine) MS = 0.152145/3 = 0.051, Error MS = 0.729355/12 = 0.061, F = 0.051/0.061 = 0.83
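The hand calculation above can be cross-checked with a few lines of Python. This is only a sketch: the function name `one_way_anova` and the dictionary layout of the data are illustrative choices, not part of the course material, and the tabulated F value must still be looked up separately.

```python
def one_way_anova(groups):
    """Return (SST, SSA, SSE, F) using the correction-factor formulas.

    groups maps each treatment label to its list of responses;
    the lists may have different lengths (unequal replication).
    """
    all_values = [y for g in groups.values() for y in g]
    N = len(all_values)          # total number of observations
    k = len(groups)              # number of treatments
    G = sum(all_values)          # grand total
    cf = G ** 2 / N              # correction factor G^2 / N
    sst = sum(y ** 2 for y in all_values) - cf
    ssa = sum(sum(g) ** 2 / len(g) for g in groups.values()) - cf
    sse = sst - ssa
    f_ratio = (ssa / (k - 1)) / (sse / (N - k))
    return sst, ssa, sse, f_ratio


# Machine data from Table 3.2
machines = {
    "A": [16.03, 16.04, 15.96, 16.05],
    "B": [16.01, 15.99, 16.03],
    "C": [16.02, 15.97, 15.96, 15.05, 16.04],
    "D": [16.03, 16.04, 16.00, 16.02],
}

sst, ssa, sse, f_ratio = one_way_anova(machines)
print(f"SST={sst:.4f}  SSA={ssa:.6f}  SSE={sse:.6f}  F={f_ratio:.2f}")
```

Up to rounding, the printed values should agree with the sums of squares and the F-ratio obtained by hand.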
  • 24. We summarize the computation in an analysis of variance table:

Table 3.3: ANOVA table

Source of variation               DF    SS          MS       F-ratio
Between treatments (machines)      3    0.152145    0.051    0.83
Error (within treatments)         12    0.729355    0.061
Total                             15    0.881500

To perform the hypothesis test for differences among the machine means, we compare the calculated F-value (0.83) with the appropriate value from the F table. For level of significance α = 0.05, we have $F_{0.05;\,3,\,12} = 3.49$. Since 0.83 does not exceed 3.49, we do not reject $H_0$. The data give no evidence of significant variation among the machines, which is consistent with the quality-engineering department's suspicion that all machines fill to the same net volume.

In SAS

SAS Program

data notes;
  input Machines $ Volume @@;   /* @@ reads several observations per data line */
  cards;
A 16.03 A 16.04 A 15.96 A 16.05
B 16.01 B 15.99 B 16.03
C 16.02 C 15.97 C 15.96 C 15.05 C 16.04
D 16.03 D 16.04 D 16.00 D 16.02
;
run;
proc print; run; quit;          /* list the data as read */
proc anova;                     /* one-way ANOVA of Volume on Machines */
  class Machines;
  model Volume = Machines;
run; quit;

Selected SAS outputs

Obs    Machines    Volume
  1    A           16.03
  2    A           16.04
  3    A           15.96
  4    A           16.05
  5    B           16.01
  6    B           15.99
  7    B           16.03
  8    C           16.02
  9    C           15.97
 10    C           15.96
 11    C           15.05
 12    C           16.04
 13    D           16.03
 14    D           16.04
 15    D           16.00
  • 25.  16    D           16.02

The ANOVA Procedure
Dependent Variable: Volume

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3        0.15214500     0.05071500       0.83    0.5004
Error              12        0.72935500     0.06077958
Corrected Total    15        0.88150000

R-Square    Coeff Var    Root MSE    Volume Mean
0.172598     1.545433    0.246535       15.95250

Source      DF      Anova SS    Mean Square    F Value    Pr > F
Machines     3    0.15214500     0.05071500       0.83    0.5004

Exercise
The following data come from an experiment conducted to investigate the effect of 4 diets on weight gain in pigs. 19 pigs were randomly selected and assigned at random to one of the 4 diet regimes. The data are the body weights of the pigs, in pounds, after having been raised on the diets.

Diet 1    Diet 2    Diet 3    Diet 4
133.8     151.2     225.8     193.4
125.3     149.0     224.6     185.3
143.1     162.7     220.4     182.8
128.9     145.8     212.3     188.5
135.7     153.5               198.6

Assume that the measurements are approximately normally distributed, with constant variance. Is there any evidence in these data to suggest that the mean weights are different under the different diets? Use α = 0.05. Compare your ANOVA table with the one below from SAS.

The ANOVA Procedure
Dependent Variable: BWeight

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3       20461.40576     6820.46859     164.38    <.0001
Error              15         622.39950       41.49330
Corrected Total    18       21083.80526

R-Square    Coeff Var    Root MSE    BWeight Mean
0.970480     3.753460    6.441529        171.6158

Source    DF       Anova SS    Mean Square    F Value    Pr > F
Diets      3    20461.40576     6820.46859     164.38    <.0001
  • 26. 3.3.5 ANOVA for one-way classification with equal replication (balanced data)

In the above exercise, diets 1, 2 and 4 are each replicated 5 times while diet 3 is replicated 4 times. In that case, as discussed above, the sample mean for treatment i is $\bar{Y}_{i.} = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}$.

We now discuss the case where $n_i = n$ for all i, i = 1, 2, ..., k. Since each treatment is replicated the same number of times (say n), the total number of observations is N = nk. With this notation, we define the quantities $\bar{Y}_{i.}$, $\bar{Y}_{..}$, Total SS, Treatment SS and Error SS, and their degrees of freedom, as follows:

\[ \bar{Y}_{i.} = \frac{1}{n} \sum_{j=1}^{n} Y_{ij}, \qquad \bar{Y}_{..} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n} Y_{ij}}{nk} \quad \text{or} \quad \bar{Y}_{..} = \frac{G}{N} \]

\[ \text{Total SS (SST)} = \sum_{i=1}^{k} \sum_{j=1}^{n} Y_{ij}^2 - \frac{G^2}{N}, \qquad \text{Treatment SS (Factor A SS)} = \sum_{i=1}^{k} \frac{Y_i^2}{n} - C.F \]

where

\[ C.F = \frac{\left( \sum_{i=1}^{k} \sum_{j=1}^{n} Y_{ij} \right)^2}{nk} = \frac{G^2}{N}, \qquad G = \text{the grand total} = \sum_{i=1}^{k} \sum_{j=1}^{n} Y_{ij} \]

and Error SS (SSE) = SST − SSA as before. The degrees of freedom for Treatment SS, Error SS and Total SS are, respectively, (k − 1); (N − k) = (nk − k) = k(n − 1); and (N − 1) = (nk − 1).

Table 3.4: One-way ANOVA table with equal replication

Source of variation          DF          SS     MS                       F-ratio
Between treatments           k − 1       SSA    MSA = SSA/(k − 1)        MSA/MSE
Error (within treatments)    k(n − 1)    SSE    MSE = SSE/[k(n − 1)]
Total                        N − 1       SST

The calculated F-value, MSA/MSE, is compared with the tabulated F-value $F_{\alpha,[(k-1),(k(n-1))]}$ at the α level of significance for k − 1 and k(n − 1) degrees of freedom.

Statistical Hypotheses: as given above.

Test procedure
At level of significance α, if $F > F_{\alpha,[(k-1),(k(n-1))]}$ we reject the null hypothesis: there is evidence of significant variation among the treatments. Otherwise we do not reject $H_0$.

Example
The following data record the length of pea sections, in ocular units (×0.114 mm),
  • 27. grown in tissue culture with auxin present. The purpose of the experiment was to test the effects of the addition of various sugars on growth as measured by length. Pea plants were randomly assigned to one of 5 treatment groups: control (no sugar added), 2% glucose added, 2% fructose added, 1% glucose + 1% fructose added, and 2% sucrose added. 10 observations were obtained for each group of plants.

         Control    2% glucose    2% fructose    1% glucose + 1% fructose    2% sucrose
 1          75          57            58                    58                   62
 2          67          58            61                    59                   66
 3          70          60            56                    58                   65
 4          75          59            58                    61                   63
 5          65          62            57                    57                   64
 6          71          60            56                    56                   62
 7          67          60            61                    58                   65
 8          67          57            60                    57                   65
 9          76          59            57                    57                   62
10          68          61            58                    59                   67
Total      701         593           582                   580                  641

We assume that the measurements are approximately normally distributed, with the same variance $\sigma^2$. Use α = 0.05 and perform the relevant hypothesis test on these data.

Calculations show that (check):

\[ C.F = \frac{\left( \sum_{i=1}^{k} \sum_{j=1}^{n} Y_{ij} \right)^2}{nk} = \frac{(701 + \ldots + 641)^2}{10 \times 5} = \frac{(3097)^2}{50} = 191828.18 \]

\[ \sum_{i=1}^{k} \sum_{j=1}^{n} Y_{ij}^2 = 75^2 + 67^2 + \ldots + 67^2 = 193151.00 \]

Thus,

\[ \text{Total SS} = \sum_{i=1}^{k} \sum_{j=1}^{n} Y_{ij}^2 - \frac{G^2}{N} = 193151.00 - 191828.18 = 1322.82 \]

\[ \text{Treatment SS} = \sum_{i=1}^{k} \frac{Y_i^2}{n} - C.F = \frac{701^2 + \ldots + 641^2}{10} - 191828.18 = 192905.50 - 191828.18 = 1077.32 \]

Error SS = Total SS − Treatment SS = 1322.82 − 1077.32 = 245.50

We also have (k − 1) = 5 − 1 = 4 and k(n − 1) = 5(10 − 1) = 45, so that

Treatment MS = 1077.32/4 = 269.33, Error MS = 245.50/45 = 5.46, F = 269.33/5.46 = 49.33

We summarize the computations in an analysis of variance table:
  • 28. Table 3.5: ANOVA table for the pea section data

Source of variation          DF    SS         MS        F-ratio
Between treatments            4    1077.32    269.33    49.33
Error (within treatments)    45     245.50      5.46
Total                        49    1322.82

$F_{0.05;\,4,\,45} = 2.61$

Comparing the calculated F value (49.33) with the value from the F table (2.61) at the 0.05 level of significance, we see that 49.33 > $F_{0.05;\,4,\,45}$ = 2.61. We thus reject $H_0$. There is evidence in these data to suggest that the mean lengths of pea sections differ depending on which sugar was added.

3.4 ANOVA for two-way classification (without replication)

As the name suggests, two-way classification means the data are classified on the basis of two factors, and the two-way ANOVA technique is used for such data. Suppose the two factors are A and B, with h and g levels respectively, in an experiment without replication. Using the ANOVA technique we can partition the variation of the responses about their mean into three different components.

3.4.1 Linear additive model for two-way classification

For two-way classification without replication, let $Y_{ij}$ be the response for the ith level of factor A and the jth level of factor B. The model can be written as:

\[ Y_{ij} = \mu + t_i + b_j + e_{ij}, \quad i = 1, 2, \ldots, h; \quad j = 1, 2, \ldots, g \]

where:
$\mu$ is the overall mean
$t_i$ is the effect of level i of factor A
$b_j$ is the effect of level j of factor B
$e_{ij}$ is the residual (error term).

The ANOVA technique allows us to partition the total SS as:

Total SS = Factor A SS + Factor B SS + Residual SS

or simply, SST = SSA + SSB + SSE.

As in one-way classification, the short methods of computing the sums of squares are given as follows:
  • 29. Let N (= hg) be the total number of experimental observations, and let G be the sum of yields over all the N (= hg) plots, so that

\[ G = \sum_{i=1}^{h} \sum_{j=1}^{g} Y_{ij} \]

\[ \text{Correction factor (C.F)} = \frac{G^2}{N} = \frac{\left( \sum_{i=1}^{h} \sum_{j=1}^{g} Y_{ij} \right)^2}{hg} \]

\[ \text{Total SS (SST)} = \sum_{i=1}^{h} \sum_{j=1}^{g} Y_{ij}^2 - C.F \]

\[ \text{Factor A SS (SSA)} = \frac{1}{g} \sum_{i=1}^{h} Y_i^2 - C.F \]

where $Y_i = \sum_{j=1}^{g} Y_{ij}$ is the total yield of all the g plots at level i of factor A.

\[ \text{Factor B SS (SSB)} = \frac{1}{h} \sum_{j=1}^{g} Y_j^2 - C.F \]

where $Y_j = \sum_{i=1}^{h} Y_{ij}$ is the total yield of all the h plots at level j of factor B.

Error SS (SSE) = SST − (SSA + SSB) = SST − SSA − SSB

Table 3.6: ANOVA table for two-way classification

Source of variation    DF                SS     MS                             F-ratio
Factor A               h − 1             SSA    MSA = SSA/(h − 1)              MSA/MSE
Factor B               g − 1             SSB    MSB = SSB/(g − 1)              MSB/MSE
Residual               (h − 1)(g − 1)    SSE    MSE = SSE/[(h − 1)(g − 1)]
Total                  N − 1             SST

Statistical hypotheses:

Factor A:
$H_0: t_1 = t_2 = \ldots = t_h$
$H_1: t_i \neq t_j$ for at least one $i \neq j$

Factor B:
$H_0: b_1 = b_2 = \ldots = b_g$
$H_1: b_i \neq b_j$ for at least one $i \neq j$

Test procedure
Reject $H_0$ for factor A if the calculated F-value MSA/MSE exceeds the tabulated F-value $F_{\alpha,[(h-1),(h-1)(g-1)]}$ at the α level of significance. Otherwise, do not reject $H_0$.

Similarly, reject $H_0$ for factor B if the calculated F-value MSB/MSE exceeds the tabulated F-value $F_{\alpha,[(g-1),(h-1)(g-1)]}$ at the α level of significance. Otherwise, do not reject $H_0$.
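As a rough illustration of the short-cut formulas above, here is a minimal Python sketch. The function name `two_way_anova` and the layout of `table` (rows are levels of factor A, columns are levels of factor B) are assumptions made for this example. It is applied to the analyst-by-method data of the example that follows, with analysts as factor A (h = 5) and methods as factor B (g = 3).

```python
def two_way_anova(table):
    """Two-way ANOVA without replication via the correction factor.

    table[i][j] is the response at level i of factor A (rows)
    and level j of factor B (columns).
    """
    h = len(table)               # levels of factor A
    g = len(table[0])            # levels of factor B
    N = h * g
    G = sum(sum(row) for row in table)
    cf = G ** 2 / N
    sst = sum(y ** 2 for row in table for y in row) - cf
    # Row totals squared, divided by the g observations in each row
    ssa = sum(sum(row) ** 2 for row in table) / g - cf
    # Column totals squared, divided by the h observations in each column
    ssb = sum(sum(row[j] for row in table) ** 2 for j in range(g)) / h - cf
    sse = sst - ssa - ssb
    mse = sse / ((h - 1) * (g - 1))
    return {
        "SST": sst, "SSA": ssa, "SSB": ssb, "SSE": sse,
        "F_A": (ssa / (h - 1)) / mse,
        "F_B": (ssb / (g - 1)) / mse,
    }


# Rows: analysts 1-5; columns: methods M1, M2, M3 (ppm)
ppm = [
    [7.0, 6.5, 6.6],
    [6.9, 6.7, 6.2],
    [6.8, 6.5, 6.4],
    [7.1, 6.7, 6.3],
    [6.9, 6.6, 6.4],
]

result = two_way_anova(ppm)
for name, value in result.items():
    print(f"{name} = {value:.5f}")
```

The two F-ratios are then compared with the tabulated values for (h − 1, (h − 1)(g − 1)) and (g − 1, (h − 1)(g − 1)) degrees of freedom, exactly as in the test procedure.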
  • 30. Example
Three different methods of analysis, M1, M2 and M3, are used to determine, in parts per million, the amount of a certain constituent in a sample. Each method is used by five analysts and the results are given below.

                        Analyst
Method       1       2       3       4       5      Total
M1          7.0     6.9     6.8     7.1     6.9     34.7
M2          6.5     6.7     6.5     6.7     6.6     33.0
M3          6.6     6.2     6.4     6.3     6.4     31.9
Total      20.1    19.8    19.7    20.1    19.9     99.6

Do these results indicate a significant variation either between the methods or between the analysts? Use α = 0.01.

Statistical hypotheses:

For analyst
$H_0$: analysts do not differ
$H_1$: analysts differ

For method
$H_0$: methods do not differ
$H_1$: methods differ

Calculation
Here the analysts are factor A (h = 5) and the methods are factor B (g = 3).

\[ C.F = \frac{G^2}{N} = \frac{\left( \sum_{i=1}^{h} \sum_{j=1}^{g} Y_{ij} \right)^2}{hg} = \frac{(99.6)^2}{15} = 661.344 \]

\[ \text{Total SS (SST)} = \sum_{i=1}^{h} \sum_{j=1}^{g} Y_{ij}^2 - C.F = 662.32 - 661.344 = 0.976 \]

\[ \text{Analyst SS (SSA)} = \frac{1}{g} \sum_{i=1}^{h} Y_i^2 - C.F = \frac{1}{3} \left[ (20.1)^2 + (19.8)^2 + (19.7)^2 + (20.1)^2 + (19.9)^2 \right] - 661.344 = 661.3866667 - 661.344 = 0.0426667 \]

\[ \text{Method SS (SSM)} = \frac{1}{h} \sum_{j=1}^{g} Y_j^2 - C.F = \frac{1}{5} \left[ (34.7)^2 + (33.0)^2 + (31.9)^2 \right] - 661.344 = 662.14 - 661.344 = 0.796 \]
  • 31. Error SS (SSE) = SST − SSA − SSM = 0.976 − 0.0426667 − 0.796 = 0.1373333

Table 3.7: ANOVA table

Source of variation    DF    SS         MS         F-ratio
Analyst                 4    0.04267    0.01067     0.62
Method                  2    0.79600    0.39800    23.18
Error                   8    0.13733    0.01717
Total                  14    0.97600

Comparing the calculated F values for analyst (0.62) and method (23.18) with the critical F values $F_{0.01;\,4,\,8} = 7.01$ and $F_{0.01;\,2,\,8} = 8.65$ respectively, we do not reject the null hypothesis for analyst, while for method the null hypothesis is rejected. Hence, there is not enough evidence in these data to conclude that the analysts differ. On the other hand, the data indicate significant differences among the methods at the 1% level of significance.

In SAS

SAS Program

data twoway;
  input Analyst $ Method $ ppm @@;   /* @@ reads several observations per data line */
  cards;
A1 M1 7.0 A1 M2 6.5 A1 M3 6.6
A2 M1 6.9 A2 M2 6.7 A2 M3 6.2
A3 M1 6.8 A3 M2 6.5 A3 M3 6.4
A4 M1 7.1 A4 M2 6.7 A4 M3 6.3
A5 M1 6.9 A5 M2 6.6 A5 M3 6.4
;
run;
proc print; run; quit;
proc anova;                          /* two-way ANOVA without replication */
  class Analyst Method;
  model ppm = Analyst Method;
run; quit;

The SAS System

Obs    Analyst    Method    ppm
  1    A1         M1        7.0
  2    A1         M2        6.5
  3    A1         M3        6.6
  4    A2         M1        6.9
  • 32.   5    A2         M2        6.7
  6    A2         M3        6.2
  7    A3         M1        6.8
  .    .          .         .
  .    .          .         .

The ANOVA Procedure
Dependent Variable: ppm

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               6        0.83866667     0.13977778       8.14    0.0046
Error               8        0.13733333     0.01716667
Corrected Total    14        0.97600000

R-Square    Coeff Var    Root MSE    ppm Mean
0.859290     1.973217    0.131022    6.640000

Source     DF      Anova SS    Mean Square    F Value    Pr > F
Analyst     4    0.04266667     0.01066667       0.62    0.6601
Method      2    0.79600000     0.39800000      23.18    0.0005

3.5 The least significant difference (LSD)

If we reject the null hypothesis using the F-test, we can carry out further analyses, i.e., pairwise comparisons of the levels of the factor(s) using the t-test. We consider the situation where we have planned, in advance of the experiment, to make certain comparisons among treatment means. In this case each comparison is important in its own right and is therefore viewed separately, i.e., the comparisons cannot be combined.

Suppose we have t treatments in the experiment, and we are interested in comparing two treatments, 1 and 2, with means $\mu_1$ and $\mu_2$ respectively. That is, we wish to test the hypotheses:

$H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$

Test statistic: As our test statistic for $H_0$ vs. $H_1$, we use:

\[ \frac{|\bar{Y}_{1.} - \bar{Y}_{2.}|}{s_{\bar{Y}_{1.} - \bar{Y}_{2.}}}, \qquad s_{\bar{Y}_{1.} - \bar{Y}_{2.}} = s \sqrt{\frac{1}{r_1} + \frac{1}{r_2}}, \qquad s = \sqrt{MSE} \]

That is, instead of basing the estimate of $\sigma^2$ on only the two treatments in question, we use the estimate from all t treatments in the experiment. Here, $r_1$ and $r_2$ are the numbers of replicates of treatments 1 and 2 respectively.
  • 33. Test procedure: Reject $H_0$ in favour of $H_1$ if

\[ \frac{|\bar{Y}_{1.} - \bar{Y}_{2.}|}{s_{\bar{Y}_{1.} - \bar{Y}_{2.}}} > t_{N-t,\,\alpha/2} \]

Here, N − t is the number of degrees of freedom for estimating $\sigma^2$ (experimental error).

Note that the test procedure above may be rewritten as follows. Reject $H_0$ if:

\[ |\bar{Y}_{1.} - \bar{Y}_{2.}| > s_{\bar{Y}_{1.} - \bar{Y}_{2.}} \times t_{N-t,\,\alpha/2}, \qquad s = \sqrt{MSE} \]

Terminology: In comparing two treatment means from large experiments involving t treatments, the value

\[ s_{\bar{Y}_{1.} - \bar{Y}_{2.}} \times t_{N-t,\,\alpha/2} = s \sqrt{\frac{1}{r_1} + \frac{1}{r_2}} \times t_{N-t,\,\alpha/2}, \qquad s = \sqrt{MSE} \]

is called the least significant difference (LSD) for the test of $H_0$ vs. $H_1$ based on the entire experiment. Thus, from the above expression, we reject $H_0$ in favour of $H_1$ at level α if

\[ |\bar{Y}_{1.} - \bar{Y}_{2.}| > s_{\bar{Y}_{1.} - \bar{Y}_{2.}} \times t_{N-t,\,\alpha/2} \]

The case of equal replication: If all treatments are replicated equally, that is, $r_i = r$, the value of the LSD is the same for every pair and is given by:

\[ s_{\bar{Y}_{1.} - \bar{Y}_{2.}} = s \sqrt{\frac{2}{r}}, \qquad s = \sqrt{MSE}, \qquad LSD = s \sqrt{\frac{2}{r}} \times t_{N-t,\,\alpha/2} \]

Thus, in the case of equal replication, all pairwise comparisons of interest require only a single calculation.

Example
Consider the pea section data discussed in Section 3.3.5. In these data we had equal replication (r = 10) and 5 treatments (t = 5). Suppose it was decided in advance that one investigator was interested in the particular question of whether 2% glucose (treatment 2) differs from the control. Let $\mu_1$ denote the mean for the control and $\mu_2, \mu_3, \mu_4, \mu_5$ denote the means for the sugar treatments 2% glucose, 2% fructose, 1% glucose + 1% fructose and 2% sucrose respectively.
  • 34. In this situation, we want to test the hypotheses:

$H_{02}: \mu_1 = \mu_2$ vs. $H_{12}: \mu_1 \neq \mu_2$

From the information given, we have $\bar{Y}_{1.} = 70.1$, $\bar{Y}_{2.} = 59.3$, $s = \sqrt{MSE} = 2.3357$, N − t = 45 and $t_{N-t,\,\alpha/2} = t_{45,\,0.025} = 2.01$.

Thus

\[ LSD = s \sqrt{\frac{2}{r}} \times t_{N-t,\,\alpha/2} = (2.3357) \sqrt{2/10}\,(2.01) = 2.10 \]

and $|\bar{Y}_{1.} - \bar{Y}_{2.}| = 10.8 > 2.10$.

Conclusion
Since $|\bar{Y}_{1.} - \bar{Y}_{2.}| = 10.8$ > LSD (= 2.10), we reject $H_{02}$ at level of significance α = 0.05; there is sufficient evidence to suggest that the glucose treatment yields mean pea section lengths different from the control.

Exercise
1. Suppose that another investigator was interested in the specific question of whether 2% fructose (treatment 3) differs from the control. That is, test $H_{03}: \mu_1 = \mu_3$ vs. $H_{13}: \mu_1 \neq \mu_3$. Use α = 0.05.
2. Test whether the means of the 2% glucose and 2% fructose treatments differ significantly at the 5% level of significance.
3. Calculate 99% confidence limits for the mean of treatment 4 (1% glucose + 1% fructose).
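The worked LSD example above (control vs. 2% glucose) can be reproduced with a short Python sketch. The function name `lsd` is illustrative only, and the critical value 2.01 is taken from the t table as in the text rather than computed.

```python
from math import sqrt


def lsd(mse, r1, r2, t_crit):
    """Least significant difference for two treatment means.

    mse    : error mean square from the full ANOVA
    r1, r2 : numbers of replicates of the two treatments
    t_crit : tabulated t value for N - t degrees of freedom at alpha/2
    """
    return sqrt(mse) * sqrt(1 / r1 + 1 / r2) * t_crit


# Pea section data: Error SS = 245.50 on 45 degrees of freedom
mse = 245.50 / 45
t_crit = 2.01                      # t(45, 0.025), read from the t table
threshold = lsd(mse, 10, 10, t_crit)

# Control mean 70.1 vs. 2% glucose mean 59.3
difference = abs(70.1 - 59.3)
print(f"LSD = {threshold:.2f}, |difference| = {difference:.1f}")
print("Reject H0" if difference > threshold else "Do not reject H0")
```

Because replication is equal, the same `threshold` applies to any other pair of treatment means at the same significance level.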
  • 36. Chapter 4 Introduction to SPSS

4.1 Introduction

SPSS is an extremely useful statistical software package. It provides full statistical analysis capabilities, including data management, and covers both simple and sophisticated statistical techniques that are easy to learn and that one cannot afford to ignore in the analysis of real-life data. SPSS has historically been applied extensively in the social sciences; these days, however, it is also widely used in other fields of study. The current version of SPSS is 21.

As mentioned, this text on SPSS is not part of the MTH 201 course coverage (requirement), as you have seen in Section 1.1. It is meant to make you understand that all computations for the theoretical aspects we have discussed, and those still to be discussed, can also be done in other software packages, even though some have been illustrated using SAS. SPSS is one such package. Other software packages in which statistical analyses may be carried out include STATA and S-Plus/R. However, S-Plus/R requires good programming knowledge.

Unlike many software packages, SPSS is user-friendly (easy to use), widely available and well documented, so that one can quickly refer to available and easily accessible documentation. These are among the reasons why I have chosen to give you this text. Do not forget that there is no free lunch: like SAS, S-Plus and STATA, SPSS is not free. You have to pay to get it.

It is important to note that the SPSS analyses presented in this text are specific. That is, the text does not cover all features available in SPSS but focuses on only a few of the many analysis tools that SPSS offers. Consequently, to sharpen your competency in using SPSS, especially for more advanced statistical analyses, you are urged to refer to any SPSS manual.
In SPSS, unlike with the other software packages, getting output is relatively easy; however, one needs to be cautious. Remember that “there is no free lunch”.
  • 37. Steven (1996) points out that because it is easy to get output, it is also easy to get “garbage.” Hence, knowing exactly what to focus on in the printout, so as to give a practical interpretation of the problem at hand, is an important consideration when selecting the output to concentrate on.

Throughout the illustrations in this text it is assumed that the reader will be accessing data from disk or CD-ROM already saved as an SPSS file. This means that the data have already undergone important treatments such as editing and coding. This is not always the case in practice. Often analysts receive raw data and carry out the required treatments themselves. Data management (e.g., merging, interleaving or matching files) in SPSS is outside the scope of this text. For those who are interested, however, SAS is nicely set up for ease of file manipulation. In this text I will, however, briefly describe how data entry is done.

It is worth mentioning that reaching a valid conclusion or answer to a specific scientific question requires not only competency in the analysis software but also an understanding of several other facets, such as knowing which assumptions are important to check for a given analysis, adequate sample size, and careful selection of variables.

SPSS does a wide range of analyses, from simple descriptive statistics to various analysis of variance designs and all kinds of complex multivariate analyses (multivariate analysis of variance (MANOVA), factor analysis, multiple regression, discriminant analysis, etc.). The multivariate analyses listed above are complete arenas that I do not wish to enter in this text. I limit myself to those aspects expected to be covered by the target group(s), i.e., some important SPSS environments or analysis tools.
However, I refer any reader interested in both the theoretical and practical treatment of the complex multivariate analyses to the books by Johnson, R.A. and Wichern, D.W. (1998) and Steven, J. (1996).

4.2 Starting SPSS

You can start SPSS in two different ways, depending on how it is set up on your computer. You can either double-click the SPSS icon on the desktop, or click the Start button (normally located at the lower left corner of the screen), then Programs, and so on, as indicated in the path below:

Start>Programs>SPSS for Windows>SPSS 11.0 for Windows

When you click the last option in the above path (SPSS 11.0 for Windows) you will see the “Data Editor Window”. In general SPSS has four different types of windows, namely:
  • 38. i. Data Editor;
ii. An Output Window;
iii. A Syntax Window; and
iv. A Chart Editor.

We briefly describe each of these windows in turn.

The Data Editor Window
The Data Editor Window is where data can be entered and edited. The Data Editor is further divided into a data view and a variable view. At the top of the Data Editor you can see a menu line consisting of the following options: File, Edit, View, Data, Transform, Analyze, Graphs, Utilities, Window and Help.

Figure 1. Data Editor menu

For more details on how to use each of these options I refer you to any SPSS manual. I focus my attention on the “Analyze” menu, which is where all the statistical analyses are carried out.

The Output Window
Through this window you can read the results of your analysis. Depending on the analysis you are carrying out, you can also see graphs, and you can copy your results into another document (e.g., Word) for further description.

The Syntax Window
The syntax window is used for coding your analysis manually. Through this window the user can code more advanced analyses, which may not be available in the standard menu. To open the syntax window select File>New>Syntax. In the window you can enter the program code you want SPSS to execute. This requires a little more programming. When the code is ready to be run, mark it (with your mouse) and select Run>Selection.

The Chart Editor
The chart editor is used when editing a graph in SPSS. To be able to edit a graph you first need to double-click it.
  • 39. 4.3 Data entry

There are basically two ways to enter data into SPSS. One is to enter the data manually, directly into the Data Editor; the other is to import data from a different program or a text file, for example from Excel or SAS. I will illustrate both options here. For importing data, I will restrict myself to importing from Excel.

4.4 Keying data into SPSS

As we have seen above, when SPSS is opened the Data Editor is opened by default, and this is where you can enter your data. Alternatively, to enter data go to File>New>Data. Before you start entering your data it is always a good idea to first give names to your variables. This is done by selecting the variable view in the Data Editor window.

4.4.1 Osteopathic manipulation data set

The following is part of the data collected from a clinical trial(1) whose prime objective was to compare the effect of an osteopathic manipulation on blood flow with that in a control group at two different time points. This effect was assessed in 80 healthy volunteer subjects aged between 17 and 69 years. Blood flow (in ml/min) was measured from the right superficial femoral artery using Duplex-Doppler while the subjects were lying down on a research table, at baseline (minute 0: M1), one minute after manipulation (M2) and four minutes after manipulation (M3).

The variable Patid in the table below represents the patient's identity number. The variables Initials, Age, Weight, Height and Gender carry the usual meaning. M1, M2 and M3 are as described above. Use this simple data set to practice data entry in SPSS.

(1) A clinical trial is a study that investigates the efficacy of drug(s).
  • 40. Table 4.1: Osteopathic Manipulation data set

Patid  Initials  Age  Weight  Height  Gender  M1     M2     M3
1      SJ        31   75      178     M       109.5  262.1  136.4
2      TG        30   69      178     M       103.2  145.7  121.3
3      SF        38   73      176     M       221.2  231.7  111.9
4      WD        24   78      179     M       230.0  281.3  196.2
5      VF        54   73      162     F       112.4  120.4  139.7
6      SM        64   75      168     M       226.8  369.4  247.1
7      DWM       61   65      160     F       103.7   84.9  109.5
8      GM        34   60      166     F       178.5  139.6  154.6
9      VM        38   64      165     F       103.7  132.4  107.1
10     DWF       47   63      167     M       150.6  158.8  110.4
11     BE        49   85      172     M       149.1   72.0   96.2
12     CV        55   91      177     M       193.5  286.1  245.5
13     CG        25   69      170     F       183.4  270.6  183.7

Exercise
Enter these data in your SPSS Data Editor window without labelling the variables. If you enter data in SPSS without first giving names to the variables, SPSS labels the variables as var00001, var00002, etc. Do you see this? Next, try to give names to the variables.

As described above, you can give names to your variables via the variable view in the Data Editor window. Alternatively, you can double-click the variable. Now click on the first cell of the first column, “Name”, and type the name of the first variable as indicated in Table 4.1, that is, Patid. Then move on to the second cell and type the name of the second variable, and so on.

Under Type you define which type your variable is (numeric, string, etc.). If you place the marker in the Type cell, a button like the one in Figure 1 below appears.

Figure 1: Defining variables

This button indicates that you can click it, and a window like the one below in Figure 2 will show:
  • 41. Figure 2: Variable Type

Numeric is selected if your variable consists of numbers. String is selected if your variable is text (e.g. Male/Female). In the same way you can specify Values and Missing. By selecting Label you can further explain the respective variable in a sentence or so. This is often a very good idea, since the variable name is restricted to only 8 characters. Missing is used to define how missing values among the observations of a variable are coded. In Values you can enter a label for each possible response value of a discrete variable (e.g. 1 = Male and 2 = Female).

When entering a variable name, the following rules must be obeyed in SPSS for it to work:
i. The name has to start with a letter and must not end with a full stop (.).
ii. No more than 8 characters can be entered.
iii. Do not enter spaces or other characters such as !, ?, ‘ and *.
iv. No two variable names may be the same.

When all data are entered and variable names are given, you can save your data by selecting File>Save As... in the menu.

4.5 Opening an existing dataset

If the dataset already exists as an SPSS file you can easily open it. Select File>Open... and the dataset will open in the Data Editor.
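The naming rules listed in Section 4.4 can be checked before any typing is done. The following is only a minimal sketch in Python (not SPSS syntax): the function name `check_names` is hypothetical, and two points are my own assumptions rather than statements from the text, namely that only letters, digits, underscores and interior full stops are accepted, and that names differing only in case count as duplicates.

```python
import re

MAX_LEN = 8  # rule ii: at most 8 characters
# Rule i (starts with a letter) and rule iii (no spaces or symbols).
# Assumption: letters, digits, underscores and full stops are the allowed set.
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_.]*$")


def check_names(names):
    """Return a dict mapping each offending name to a list of broken rules."""
    problems = {}
    seen = set()
    for name in names:
        issues = []
        if not NAME_PATTERN.match(name):
            issues.append("must start with a letter and avoid spaces or symbols")
        if name.endswith("."):
            issues.append("must not end with a full stop")
        if len(name) > MAX_LEN:
            issues.append("longer than 8 characters")
        # Assumption: names are compared without regard to case (rule iv).
        if name.lower() in seen:
            issues.append("duplicate name")
        seen.add(name.lower())
        if issues:
            problems[name] = issues
    return problems


table_4_1 = ["Patid", "Initials", "Age", "Weight", "Height",
             "Gender", "M1", "M2", "M3"]
print(check_names(table_4_1))                      # no problems expected
print(check_names(["2nd", "blood flow", "x."]))    # each name breaks a rule
```

All nine variable names of Table 4.1 pass this check; note that Initials is exactly 8 characters long, the maximum allowed.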
  • 42. 4.6 Importing data

Sometimes the data are available in a format other than an SPSS data file, e.g. as an Excel, SAS, or text file. As already mentioned, we describe how to import data from Excel.

Importing data from Excel
If you want to use data from an Excel file in SPSS there are two ways to import the data.
i. One is simply to mark all the data in the Excel window (excluding the variable names) that you want to enter into SPSS, then copy and paste them into the SPSS data window. The disadvantage of this method is that the variable names cannot be included, meaning that you will have to enter them manually after pasting the data.
ii. The other option (where the variable names are entered automatically) is to do the following:
• Open SPSS and select File>Open>Data. Choose the drive where the data are stored. Under Files of type select Excel, then double-click the file you want to open, or mark the file and press ‘Open’. The data now appear in the Data Editor in SPSS.

4.7 Exporting data

Exporting data from SPSS to a different program is done by selecting File>Save As... Under Save as type you select the format you want the data to be available in, e.g. Excel.

4.8 ANOVA for one-way classification in SPSS

Let us now see how we can use SPSS to perform analysis of variance for one-way classification. I will use the machine data set discussed in Section 5.3.4 to illustrate the construction of the analysis of variance table. As described above, getting output in SPSS is simple. Assuming that you have already entered the data, what you need to do next is to analyse the data by following the path below:

Analyze>Compare Means>One-Way ANOVA

If you click on the last option (One-Way ANOVA) you will see a window like the one below:
  • 43. The only dependent variable in this example is “volume” and the factor is “machine”. You can also include descriptive statistics in your output by clicking on “Options” and then selecting Descriptive. Below is the SPSS ANOVA table for the machine data set.

SPSS ANOVA table: Machine data

Source of Variation   Sum of Squares   df   Mean Square   F       Sig.
Between Groups        0.152            3    5.072E-02     0.834   0.500
Within Groups         0.729            12   6.078E-02
Total                 0.882            15

From the above ANOVA table we see that the results presented by SPSS are rounded to three decimal places. By default SPSS, like most statistical software packages, gives the p-value(s) of the test(s), indicated as Sig. in the last column of the table. The p-value indicates how much evidence against the null hypothesis has been observed in the outcomes of the experiment. Since the p-value (0.500) is greater than 0.05, we do not reject the null hypothesis (remember we are testing the hypotheses at the 0.05 level of significance). This is the same conclusion we reached by comparing the calculated F-value (0.83) with the critical F-value (3.49) from the table.

To compare the two ANOVA tables, the one obtained before through mathematical calculations and the one from SPSS above, I reproduce below the ANOVA table obtained by mathematical computations. Are they similar?

Source of variation   DF   SS         MS      F-ratio
Between treatment     3    0.152145   0.051   0.83
Within treatment      12   0.729355   0.061
Total                 15   0.881500

Exercise (optional)
Use the Pea section data set to perform analysis of variance for one way classification
  • 44. with equal replication. Compare your ANOVA table with the one obtained through mathematical computations.

Note: The above two examples, the machine and pea section data sets, illustrate respectively what are termed unbalanced and balanced data. The first is unbalanced in the sense that the machines have unequal numbers of replications; the second is balanced in the sense that the various sugar types are all replicated an equal number of times (10 times).
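To see how the entries of the two machine-data ANOVA tables relate, the mean squares and the F-ratio can be recomputed from the sums of squares and degrees of freedom alone. The sketch below is in Python rather than SPSS; the variable names are my own, and the critical value 3.49 is the tabulated value quoted in the text, not something the code derives.

```python
# Sums of squares and degrees of freedom copied from the hand-computed
# ANOVA table for the machine data.
ss_between, df_between = 0.152145, 3
ss_within, df_within = 0.729355, 12

ms_between = ss_between / df_between   # mean square = SS / df
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within       # F = MS between / MS within
ss_total = ss_between + ss_within      # the two components add up to Total SS

F_CRITICAL = 3.49  # tabulated F(0.05; 3, 12) as quoted in the text

print(f"MS between = {ms_between:.5f}")
print(f"MS within  = {ms_within:.5f}")
print(f"F-ratio    = {f_ratio:.3f}")
print(f"Total SS   = {ss_total:.4f}")
print("reject H0" if f_ratio > F_CRITICAL else "do not reject H0")
```

The mean squares agree with the SPSS values 5.072E-02 and 6.078E-02 once rounded, which shows that the two tables differ only in the number of decimal places reported.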
  • 46. Chapter 5 Completely Randomized Design

5.1 Introduction

When the experimental units are assumed to be fairly uniform or homogeneous, that is, no sources of variation other than the treatments are expected, grouping them (applying the error/local control principle) is pointless in the sense that very little (in terms of precision) may be gained. Thus the simplest experimental design, which incorporates only the first two principles of experimental design (randomisation and replication), is the completely randomized design or CRD.

CRD is a design in which the treatments are assigned completely at random to the experimental units, or vice versa. Since we assume that there are no sources of variation in the experiment other than the treatments under investigation, CRD imposes no restrictions, such as blocking, on the allocation of the treatments to the experimental units.

5.2 Layout

Suppose that we have $t$ treatments under investigation and that the $i$th treatment is to be replicated $r_i$ times, $i = 1, 2, \ldots, t$. For an experiment with $t$ treatments, each one replicated $r_i$ times, the total number of experimental units is
\[ N = \sum_{i=1}^{t} r_i . \]
When $r_i = r$ for all $i$, that is, in the case of equal replication, $N = rt$.

Definition: layout refers to the placement of treatments on the experimental units subject to the conditions of the design.

Randomisation in CRD can be carried out by using a random number table or any other probabilistic procedure.

Example
Suppose there are three treatments to be compared in a CRD. Suppose further that the treatments are replicated 4, 3 and 5 times respectively. Thus, a total of
  • 47. $N = \sum_{i=1}^{t} r_i = 4 + 3 + 5 = 12$ experimental units are needed. One possible layout of this experiment is as follows (plot number, with the treatment assigned to it):

Plot       1   2   3   4   5   6   7   8   9   10  11  12
Treatment  T2  T1  T2  T3  T3  T1  T3  T1  T1  T3  T2  T3

5.3 Statistical analysis

The analysis of CRD is the same as that of one-way classification. Let $Y_{ij}$ be the yield on the $j$th plot receiving treatment $i$. Thus, the model is:
\[ Y_{ij} = \mu + t_i + e_{ij}, \qquad i = 1, 2, \ldots, t; \quad j = 1, 2, \ldots, r_i \]
where:
$\mu$ is the grand mean (average) yield over all the $N$ plots,
$t_i$ is the $i$th treatment effect,
$e_{ij}$ is the experimental error.

Sums of squares are computed in the same way as discussed for one-way classification.

5.3.1 Statistical hypotheses

The statistical hypotheses of interest, as stated before, are:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_t$, that is, the $\mu_i$ are all equal;
$H_1: \mu_i \neq \mu_j$ for at least one pair $i \neq j$, that is, the $\mu_i$ are not all equal.
Or simply
$H_0$: There is no variation among the treatments
$H_1$: Variation exists
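The randomisation step of Section 5.2 can also be done by computer instead of a random number table. The following is a minimal sketch, assuming Python's standard `random` module as the "probabilistic procedure"; the function name `crd_layout` and the seed value are hypothetical choices made for illustration only.

```python
import random
from collections import Counter


def crd_layout(replications, seed=None):
    """Assign treatments to plots 1..N completely at random.

    replications maps a treatment label to its number of replicates r_i.
    """
    rng = random.Random(seed)  # a seed makes the layout reproducible
    labels = [trt for trt, r in replications.items() for _ in range(r)]
    rng.shuffle(labels)        # every ordering of the N labels is equally likely
    return dict(enumerate(labels, start=1))


# Three treatments replicated 4, 3 and 5 times, as in the example above.
layout = crd_layout({"T1": 4, "T2": 3, "T3": 5}, seed=2013)
for plot, trt in layout.items():
    print(f"plot {plot:2d}: {trt}")
print(Counter(layout.values()))
```

Because no blocking restriction is imposed, a single shuffle of all N = 12 treatment labels is enough; the layout printed will generally differ from the one shown in the text, which is just one of the possible outcomes.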
  • 48. Table 5.1: ANOVA table

Source of Variation   DF      SS    MS                    F-ratio
Treatments            t − 1   SSA   MSA = SSA/(t − 1)     MSA/MSE
Error                 N − t   SSE   MSE = SSE/(N − t)
Total                 N − 1   SST

5.3.2 Test procedure

At level of significance $\alpha$, if
\[ F = \frac{MSA}{MSE} > F_{\alpha,\,[(t-1),\,(N-t)]} \]
then there is evidence of significant variation among the treatments, i.e. we reject the null hypothesis. Otherwise, we do not reject it.

5.4 Advantages and disadvantages of CRD

5.4.1 Advantages
• Useful in small preliminary experiments and also in certain types of animal or laboratory experiments where the experimental units are homogeneous.
• Flexibility in the number of treatments and the number of their replications.
• Provides the maximum number of d.f. for the estimation of experimental error. The precision of a small experiment increases with the error d.f.

5.4.2 Disadvantages
• Its use is restricted to those cases in which homogeneous experimental units are available, since local control is not utilised. Thus, all the variation among units other than that due to treatments enters the experimental error and may inflate it.
• Rarely used in field experiments because the plots are not homogeneous.

5.5 Example

A sample of plant material is thoroughly mixed and 15 aliquots are taken from it for determination of potassium content. Three laboratory methods (I, II, and III) are employed, “I” being the one generally used. Five aliquots are analysed by each method, giving the following results (µg/ml).

Method   Results
I        1.83  1.81  1.84  1.83  1.79
II       1.85  1.82  1.88  1.86  1.84
III      1.80  1.84  1.80  1.82  1.79

Examine whether methods II and III give results comparable to those of method I. Use α = 0.05
  • 49. Calculations

The statistical model for this problem is $Y_{ij} = \mu + t_i + e_{ij}$. Here $i = 1, 2, 3$ and $j = 1, 2, 3, 4, 5$.

Grand total:
\[ G = \sum_{i=1}^{3}\sum_{j=1}^{5} Y_{ij} = 1.83 + 1.81 + \cdots + 1.79 = 27.4 \]

Total number of observations:
\[ N = \sum_{i=1}^{k} n_i = \sum_{i=1}^{3} n_i = n_1 + n_2 + n_3 = 5 + 5 + 5 = 15 \]
In the particular situation at hand (equal replication), $N = rt = 5 \times 3 = 15$.

Correction factor:
\[ \text{C.F} = \frac{\left(\sum_{i=1}^{3}\sum_{j=1}^{5} Y_{ij}\right)^2}{rt} = \frac{(27.4)^2}{15} = \frac{750.76}{15} = 50.0507 \]

Total sum of squared observations, or uncorrected total sum of squares:
\[ \sum_{i=1}^{3}\sum_{j=1}^{5} Y_{ij}^2 = (1.83)^2 + (1.81)^2 + \cdots + (1.79)^2 = 50.0602 \]

Total SS:
\[ SST = \sum_{i=1}^{3}\sum_{j=1}^{5} Y_{ij}^2 - \frac{G^2}{rt} = 50.0602 - 50.0507 = 0.0095 \]

Treatment (method) totals: I = 9.10, II = 9.25, III = 9.05

Treatment SS:
\[ SSTr = \frac{1}{r}\sum_{i=1}^{k} Y_i^2 - \frac{G^2}{rt}, \qquad Y_i = \sum_{j=1}^{r} Y_{ij} \]
\[ SSTr = \frac{1}{5}\left[(9.10)^2 + (9.25)^2 + (9.05)^2\right] - 50.0507 = 50.055 - 50.0507 = 0.0043 \]

Error SS:
\[ SSE = SST - SSTr = 0.0095 - 0.0043 = 0.0052 \]

Table 5.2: ANOVA table

Source of Variation          DF   SS       MS        F-ratio
Between treatments           2    0.0043   0.00215   4.9654
Error (within treatments)    12   0.0052   0.00043
Total                        14   0.0095
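The hand calculations above can be cross-checked with a few lines of code. The following Python sketch follows the same steps (correction factor, total SS, treatment SS, error SS by subtraction); the function name `one_way_anova` and the dictionary layout are my own choices, not part of the course material. Since it works with unrounded sums of squares, it gives an F-ratio of about 5.0 rather than the 4.9654 obtained from the rounded table entries; the conclusion is unaffected.

```python
def one_way_anova(groups):
    """One-way (CRD) analysis of variance from raw observations.

    groups maps a treatment label to its list of observations; the
    replication may differ between treatments.
    """
    observations = [y for ys in groups.values() for y in ys]
    n_total = len(observations)
    n_treatments = len(groups)

    correction = sum(observations) ** 2 / n_total        # C.F = G^2 / N
    ss_total = sum(y * y for y in observations) - correction
    ss_treat = sum(sum(ys) ** 2 / len(ys) for ys in groups.values()) - correction
    ss_error = ss_total - ss_treat                       # by subtraction

    df_treat = n_treatments - 1
    df_error = n_total - n_treatments
    ms_treat = ss_treat / df_treat
    ms_error = ss_error / df_error
    return {
        "ss_total": ss_total, "ss_treat": ss_treat, "ss_error": ss_error,
        "df_treat": df_treat, "df_error": df_error,
        "ms_treat": ms_treat, "ms_error": ms_error,
        "f_ratio": ms_treat / ms_error,
    }


potassium = {
    "I":   [1.83, 1.81, 1.84, 1.83, 1.79],
    "II":  [1.85, 1.82, 1.88, 1.86, 1.84],
    "III": [1.80, 1.84, 1.80, 1.82, 1.79],
}
result = one_way_anova(potassium)
for name, value in result.items():
    print(f"{name:9s} {value:.5f}")
```

Because the treatment SS is computed group by group with each group's own size, the same function also covers the unequal-replication case of Section 5.2.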
  • 50. $F_{0.05;\,2,\,12} = 3.89$

Statistical hypothesis
$H_0: \mu_1 = \mu_2 = \mu_3$, that is, the $\mu_i$ are all equal;
$H_1: \mu_i \neq \mu_j$ for at least one pair $i \neq j$, that is, the $\mu_i$ are not all equal.
Or
$H_0$: Methods do not differ
$H_1$: Methods differ

Decision
Since the calculated F-value (4.9654) is greater than the critical F-value (3.89) at the 0.05 level of significance, we reject the null hypothesis and conclude that the laboratory results depend on the method of analysis. That is, there is significant variation among the three laboratory methods.

To examine whether methods II and III give results comparable to those of method I, we need to carry out further analysis using the t-test (least significant difference, LSD) as follows. Let the mean of method I be denoted by $\bar{Y}_{1.}$, that of method II by $\bar{Y}_{2.}$ and that of method III by $\bar{Y}_{3.}$. Thus,
\[ \bar{Y}_{1.} = \frac{9.10}{5} = 1.82, \qquad \bar{Y}_{2.} = \frac{9.25}{5} = 1.85, \qquad \bar{Y}_{3.} = \frac{9.05}{5} = 1.81 \]

Statistical Hypotheses
Here we need to test the hypotheses:
$H_{02}: \mu_1 = \mu_2$ vs. $H_{12}: \mu_1 \neq \mu_2$
$H_{03}: \mu_1 = \mu_3$ vs. $H_{13}: \mu_1 \neq \mu_3$

Test procedure
Reject $H_{02}$ if
\[ |\bar{Y}_{2.} - \bar{Y}_{1.}| > LSD = s\sqrt{\frac{2}{r}} \times t_{N-t,\,\alpha/2} \]
and reject $H_{03}$ if
\[ |\bar{Y}_{3.} - \bar{Y}_{1.}| > LSD = s\sqrt{\frac{2}{r}} \times t_{N-t,\,\alpha/2} \]
where $s^2$ is the error mean square (MSE) from the ANOVA table.

Exercises
1. Complete the test.
2. Eight varieties, A to H, of black currant cuttings are planted in square plots in a nursery, each plot containing the same number of cuttings. Four plots of each variety are planted, and the shoot length made in the first growing season is measured.
  • 51. The plot totals are:

A:  46  29  39  35        E:  16  37  24  30
B:  37  31  28  44        F:  41  28  38  29
C:  38  50  32  36        G:  56  48  44  44
D:  34  19  29  41        H:  23  31  29  37

B and C are standard varieties; assess the remaining six for vigour in comparison with B and C. Use α = 0.05
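The LSD rule stated in the test procedure before the exercises can be sketched in Python as below, using the potassium example. This is only an illustration under stated assumptions: the critical value 2.179 is the two-sided 5% point of the t distribution with 12 d.f. read from a standard t table (it is not given in the text above), the error SS 0.0052 is the rounded value from Table 5.2, and the variable names are my own.

```python
import math

T_CRITICAL = 2.179   # t(12 d.f., alpha/2 = 0.025), from a standard t table
SS_ERROR, DF_ERROR = 0.0052, 12
r = 5                # replicates per method

mse = SS_ERROR / DF_ERROR                     # s^2 = error mean square
lsd = T_CRITICAL * math.sqrt(2 * mse / r)     # LSD = t * s * sqrt(2 / r)

means = {"I": 1.82, "II": 1.85, "III": 1.81}
differs_from_I = {}
for method in ("II", "III"):
    difference = abs(means[method] - means["I"])
    differs_from_I[method] = difference > lsd
    print(f"|mean {method} - mean I| = {difference:.3f}  (LSD = {lsd:.4f})")
print(differs_from_I)
```

The same three lines for `mse`, `lsd` and the comparison can be reused for Exercise 2 by substituting the error mean square, the replication and the tabulated t value appropriate to that experiment.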
  • 52. Chapter 6 Randomised Block Design

6.1 Introduction

The CRD discussed in Chapter 5 is seldom appropriate when the experimental units are not alike. Hence, when experimental units can be meaningfully grouped, e.g., by area of field, device, hospital, salesmen, etc., a completely randomised design (CRD) will clearly be insufficient. In this situation an alternative strategy for assigning treatments to the experimental units, which takes advantage of the grouping, may be used. The alternative strategy that we are going to discuss is called the randomised block design (RBD) or Randomised Complete Block Design (RCBD).

In the randomised block design:
• The groups are called blocks.
• Each treatment appears the same number of times in each block; hence the term complete block design.
• The simplest case is that where each treatment appears exactly once in each block. Here, because each treatment has one experimental unit in every block, we have: number of replicates = number of blocks = r.
• Blocks are often called replicates for this reason.
• To set up such a randomised block design the following steps are involved:
(i) Divide the units into r homogeneous groups, commonly known as blocks.
(ii) Assign the treatments at random to the experimental units within each block. This randomisation has to be done afresh for each block; hence the term randomised block design.
  • 53. Motivation: experimental units within blocks are as alike as possible, so observed differences among them should be mainly attributable to the treatments. To ensure that this interpretation holds, all experimental units within a block should be treated as uniformly as possible in the conduct of the experiment.

Intuitively speaking, the randomised block design is an improvement over the CRD. In the RBD the principle of local control can be applied along with the other two principles of experimental design (randomisation and replication).

Number of experimental units (N)
Suppose we want to compare the effects of t treatments, each treatment being replicated an equal number of times, say r times. Then we need N = rt experimental units.

6.2 Layout

To illustrate the layout of an RBD, consider 4 treatments, each replicated 3 times. We therefore need N = rt = 3 × 4 = 12 experimental units, which are grouped into 3 blocks of 4 units. Suppose the blocks formed after grouping the experimental units are labelled 1, 2, and 3. To ensure randomness in every process involved in the experiment, we select at random the block to start with in allocating the treatments to the experimental units. Assume the blocks are selected in the order 3, 1, 2. Thus, we start with the third block and assign the 4 treatments at random to it.

As we have discussed, any probabilistic procedure may be used to assign the treatments. A random permutation is one such procedure. Suppose one of the permutations of the digits 1 to 4 for the treatments is 4, 1, 3, 2. We then allocate treatment 4 to the first unit of block 3, treatment 1 to the second unit of block 3, and so on up to treatment 2 in the fourth unit of block 3. That is, we have the following layout for block 3 (the first selected block).
T4   T1   T3   T2

Repeating the same procedure, suppose we select the permutation 3, 4, 2, 1 for block 1 and 2, 3, 4, 1 for block 2. We finally get the following complete layout.
  • 54. Block 1   T3   T4   T2   T1
Block 2   T2   T3   T4   T1
Block 3   T4   T1   T3   T2

6.3 Statistical analysis

The analysis of the design is the same as that of two-way classified data with one observation per cell (experimental unit), i.e. without replication, which we discussed in Section 5.5. We use the same model as before:
\[ Y_{ij} = \mu + t_i + b_j + e_{ij}, \qquad i = 1, 2, \ldots, t; \quad j = 1, 2, \ldots, r \]
In words: observation on the $i$th treatment in the $j$th block = general mean + $i$th treatment effect + $j$th block effect + experimental error component.

RECAP: we partition the total sum of squares into different components:
Total SS = Treatment SS + Block SS + Error SS

6.3.1 Statistical hypotheses

The hypotheses of interest are:
$H_{01}: t_1 = t_2 = \cdots = t_t$
$H_{02}: b_1 = b_2 = \cdots = b_r$
against the alternatives that the $t_i$, respectively the $b_j$, are not all equal.

ANOVA table

Source of variation   DF               SS     MS                            F-ratio
Blocks                r − 1            SSB    MSB = SSB/(r − 1)             F_B = MSB/MSE
Treatment             t − 1            SSTr   MSTr = SSTr/(t − 1)           F_Tr = MSTr/MSE
Error                 (r − 1)(t − 1)   SSE    MSE = SSE/[(r − 1)(t − 1)]
Total                 N − 1            SST

6.3.2 Test procedure

The calculated F-value for treatments ($F_{Tr}$) is compared with the tabulated (critical) F-value at (t − 1) and (r − 1)(t − 1) degrees of freedom, and the calculated F-value for blocks ($F_B$) with the tabulated F-value at (r − 1) and (r − 1)(t − 1) degrees of freedom.
  • 55. In symbols, the critical values are $F_{\alpha,\,[(t-1),\,(r-1)(t-1)]}$ for treatments and $F_{\alpha,\,[(r-1),\,(r-1)(t-1)]}$ for blocks. Thus, if $F_B > F_{\alpha,\,[(r-1),\,(r-1)(t-1)]}$ we reject the null hypothesis of no block differences; otherwise we do not reject it. Also, if $F_{Tr} > F_{\alpha,\,[(t-1),\,(r-1)(t-1)]}$ we reject the null hypothesis of no treatment differences; otherwise we do not reject it.

6.4 Advantages and disadvantages of RBD

6.4.1 Advantages
• Greater precision.
• Increased scope of inference is possible because more experimental conditions may be included.

6.4.2 Disadvantages
• A large number of treatments increases the block size; as a result the blocks may lose homogeneity, leading to large experimental error.
• Any missing observation in a unit in a block forces us either to:
(i) discard the whole block, or
(ii) estimate the missing value for the unit by a special missing plot technique.

6.5 Example

The following data are yields in bushels/acre from an agricultural experiment set out in a randomised complete block design. The experiment was designed to investigate the differences in yield for seven hybrid varieties of wheat, labelled A to G here. A field was divided into 5 blocks, each containing 7 plots. In each block, the seven plots were assigned at random to be planted with the seven varieties, one plot for each variety. A yield was recorded for each plot. Examine whether varieties affect the yield. Use α = 0.05.

                     Variety
Block    A    B    C    D    E    F    G    Total
I        10   9    11   15   10   12   11   78
II       11   10   12   12   10   11   12   78
III      12   13   10   14   15   13   13   90
IV       14   15   13   17   14   16   15   104
V        13   14   16   19   17   15   18   112
Total    60   61   62   77   66   67   69   G = 462
  • 56. We assume that the measurements are approximately normally distributed, with the same variance $\sigma^2$.

Calculations

Total number of experimental observations: $N = r \times t = 5 \times 7 = 35$

Grand total:
\[ G = \sum_{i=1}^{t}\sum_{j=1}^{r} y_{ij} = 10 + 9 + \cdots + 15 + 18 = 462 \]

Correction factor:
\[ \text{C.F} = \frac{G^2}{N} = \frac{(462)^2}{35} = 6098.4 \]

Uncorrected total sum of squares:
\[ \sum_{i=1}^{t}\sum_{j=1}^{r} y_{ij}^2 = 10^2 + 9^2 + \cdots + 15^2 + 18^2 = 6314.0 \]

Total sum of squares:
\[ SST = \sum_{i=1}^{t}\sum_{j=1}^{r} y_{ij}^2 - \text{C.F} = 6314.0 - 6098.4 = 215.6 \]

Treatment (variety) sum of squares, where $Y_{i\cdot}$ is the total for variety $i$:
\[ SSTr = \frac{1}{r}\sum_{i=1}^{t} Y_{i\cdot}^2 - \text{C.F} = \frac{1}{5}\left(60^2 + \cdots + 69^2\right) - \text{C.F} = 6140.0 - 6098.4 = 41.6 \]

Block sum of squares, where $Y_{\cdot j}$ is the total for block $j$:
\[ SSB = \frac{1}{t}\sum_{j=1}^{r} Y_{\cdot j}^2 - \text{C.F} = \frac{1}{7}\left(78^2 + \cdots + 112^2\right) - \text{C.F} = 6232.6 - 6098.4 = 134.2 \]

Error sum of squares:
\[ SSE = \text{Total SS} - \text{Block SS} - \text{Treatment SS} = 215.6 - 134.2 - 41.6 = 39.8 \]

Treatment (variety) mean square: $MSTr = \dfrac{SSTr}{t-1} = \dfrac{41.6}{6} = 6.93$

Block mean square: $MSB = \dfrac{SSB}{r-1} = \dfrac{134.2}{4} = 33.54$

Error mean square: $MSE = \dfrac{SSE}{(t-1)(r-1)} = \dfrac{39.8}{24} = 1.66$

Finally, we compute the F-values. For blocks the calculated F-value is
  • 57. $F_B = \dfrac{MSB}{MSE} = \dfrac{33.54}{1.66} = 20.21$ and for treatments the calculated F-value is $F_{Tr} = \dfrac{MSTr}{MSE} = \dfrac{6.93}{1.66} = 4.18$.

We summarize the calculations in an ANOVA table as follows:

ANOVA Table

Source of variation   D.F   SS      MS      F-ratio
Blocks                4     134.2   33.54   20.21
Treatments            6     41.6    6.93    4.18
Error                 24    39.8    1.66
Total                 34    215.6

To perform the hypothesis test for differences among the treatment means, we compare the calculated F-value with the appropriate value from the F table. For level of significance α = 0.05, we have $F_{0.05;\,6,\,24} = 2.51$. The calculated F (4.18) exceeds $F_{0.05;\,6,\,24} = 2.51$; therefore, we reject $H_0$. There is evidence in these data to suggest that there are differences in mean yield among the varieties.

To test the hypothesis on block differences, we find $F_{0.05;\,4,\,24} = 2.78$. We have 20.21 > 2.78; thus, we also reject $H_0$. There is strong evidence in these data to suggest differences in mean yield across blocks at the 5% level of significance.

In SAS

    data hybrid;
    input Block $ Variety $ Yield @@;
    cards;
    I A 10 I B 9 I C 11 I D 15 I E 10 I F 12 I G 11
    II A 11 II B 10 II C 12 II D 12 II E 10 II F 11 II G 12
    III A 12 III B 13 III C 10 III D 14 III E 15 III F 13 III G 13
    IV A 14 IV B 15 IV C 13 IV D 17 IV E 14 IV F 16 IV G 15
    V A 13 V B 14 V C 16 V D 19 V E 17 V F 15 V G 18
    ;
    run;
    proc print; run;
    proc anova;
    class Block Variety;
    model Yield = Block Variety;
    run; quit;

The SAS System

Obs   Block   Variety   Yield
1     I       A         10
  • 58. 2     I       B         9
3     I       C         11
4     I       D         15
5     I       E         10
6     I       F         12
7     I       G         11
8     II      A         11
9     II      B         10
10    II      C         12
11    II      D         12
12    II      E         10
13    II      F         11
14    II      G         12
15    III     A         12
16    III     B         13
17    III     C         10
18    III     D         14
19    III     E         15
20    III     F         13
21    III     G         13
22    IV      A         14
23    IV      B         15
24    IV      C         13
25    IV      D         17
26    IV      E         14
27    IV      F         16
28    IV      G         15
29    V       A         13
30    V       B         14
31    V       C         16
32    V       D         19
33    V       E         17
34    V       F         15
35    V       G         18

Class Level Information

Class     Levels   Values
Block     5        I II III IV V
Variety   7        A B C D E F G

Number of observations   35

Dependent Variable: Yield

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             10   175.7714286      17.5771429    10.59     <.0001
Error             24   39.8285714       1.6595238
Corrected Total   34   215.6000000

R-Square   Coeff Var   Root MSE   Yield Mean
0.815266   9.759281    1.288225   13.20000
  • 59. Source    DF   Anova SS      Mean Square   F Value   Pr > F
Block     4    134.1714286   33.5428571    20.21     <.0001
Variety   6    41.6000000    6.9333333     4.18      0.0052

6.6 Reasons for blocking in RBD

Note from these results that the blocking served to explain much of the overall variation. To appreciate this further, suppose that we had not blocked the experiment, but instead had conducted it according to a completely randomized design. Suppose that we ended up with the same data as in the experiment above. Under these conditions variety is the only classification factor for the plots, and we would construct the following analysis of variance table for one-way classification as discussed in the previous sections.

ANOVA Table

Source of variation               DF   SS      MS     F-ratio
Between treatments (varieties)    6    41.6    6.93   1.12
Within treatments (error)         28   174.0   6.21
Total                             34   215.6

The test for differences in mean yield among the varieties (treatments) would compare F = 1.12 with $F_{0.05;\,6,\,28} = 2.45$. Note that we would thus not reject $H_0$ of no treatment differences at the 5% level of significance.

Concluding Remark
From the above example, it is immediately clear that if the different sources of variation are not properly identified (e.g., because the experimental design is accounted for incorrectly), then invalid conclusions may be drawn.

The example discussed above clearly demonstrates the consequence of wrongly identifying the experimental design. In the one-way classification analysis presented above, there is no accounting for the variation in the data that is actually attributable to a systematic source, namely position in the field (the factor used to block the experiment). The one-way analysis has no choice but to attribute this variation to experimental error; that is, it regards this variation as just part of the inherent variation among experimental units that we cannot explain.
The result is that the Error SS in the one-way analysis contains both variation due to position in the field (which is actually systematic variation) and inherent variation. Here, note that 134.2 + 39.8 = 174.0 and 4 + 24 = 28; that is, the Block SS and Error SS of the RBD analysis (and their degrees of freedom) add up to the Error SS for the one-way classification analysis, which may be regarded as ignoring the blocks (because what we really did was to pretend that the
  • 60. blocks didn’t exist), and this resulted in a large MSE with which we could not reject $H_0$. By blocking the experiment, and explicitly acknowledging position in the field as a potential source of variation, the MSE was sufficiently reduced so that we could identify variety differences.

It can be learned from this example that:
• Blocking may be an effective means of explaining variation (increasing precision), so that differences among treatments that really exist are more likely to be detected.
• The data from an experiment set up according to a particular design should be analysed according to the appropriate procedure for that design.

The above shows that if we set up the experiment according to a randomised complete block design, but then analyse it as if it had been set up according to a completely randomised design, erroneous inferences result; in this case, failure to identify real treatment differences. Remember, the design of an experiment dictates the analysis!!

Exercise
Four different plant densities, A to D, are included in an experiment on the growth of lettuce. The experiment is laid out as a randomised block design, and the same number of plants is harvested from each plot, giving the weights recorded below. Examine whether density appears to affect the yield. Use α = 0.01
  • 61.                      Block
Density   I     II    III   IV    V     VI
A         2.7   2.6   3.1   3.0   2.5   3.0
B         3.0   2.8   3.1   3.2   2.8   3.1
C         3.3   3.3   3.5   3.4   3.0   3.2
D         3.2   3.0   3.3   3.2   3.0   3.1
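As a cross-check on the worked wheat example of Section 6.5 and on the blocked versus unblocked comparison of Section 6.6, the calculations can be sketched in Python as follows. This is an illustration only, separate from the SAS run in the text; the function name `rbd_anova` and the dictionary keys are my own. The input is keyed by block, with the treatments listed in the same order within every block.

```python
def rbd_anova(table):
    """ANOVA for a randomised block design with one observation per cell.

    table maps a block label to its list of yields, with the treatments
    in the same order in every block.
    """
    rows = list(table.values())
    r, t = len(rows), len(rows[0])                  # blocks, treatments
    grand_total = sum(sum(row) for row in rows)
    correction = grand_total ** 2 / (r * t)         # C.F = G^2 / N

    ss_total = sum(y * y for row in rows for y in row) - correction
    ss_block = sum(sum(row) ** 2 for row in rows) / t - correction
    ss_treat = sum(sum(col) ** 2 for col in zip(*rows)) / r - correction
    ss_error = ss_total - ss_block - ss_treat

    ms_block = ss_block / (r - 1)
    ms_treat = ss_treat / (t - 1)
    ms_error = ss_error / ((r - 1) * (t - 1))

    # Same data analysed as a CRD: block variation stays in the error term.
    ss_error_crd = ss_total - ss_treat
    ms_error_crd = ss_error_crd / (r * t - t)
    return {
        "ss_total": ss_total, "ss_block": ss_block,
        "ss_treat": ss_treat, "ss_error": ss_error,
        "f_block": ms_block / ms_error,
        "f_treat": ms_treat / ms_error,
        "ss_error_crd": ss_error_crd,
        "f_treat_crd": ms_treat / ms_error_crd,
    }


wheat = {  # varieties A to G within each block
    "I":   [10, 9, 11, 15, 10, 12, 11],
    "II":  [11, 10, 12, 12, 10, 11, 12],
    "III": [12, 13, 10, 14, 15, 13, 13],
    "IV":  [14, 15, 13, 17, 14, 16, 15],
    "V":   [13, 14, 16, 19, 17, 15, 18],
}
result = rbd_anova(wheat)
for name, value in result.items():
    print(f"{name:13s} {value:.4f}")
```

The two treatment F-ratios printed side by side show the point of Section 6.6: the treatment SS is identical in both analyses, and only the error mean square in the denominator changes when the blocks are ignored. For the lettuce exercise the table above would first have to be rearranged so that each list holds the four densities of one block.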
  • 62. Chapter 7 Latin Square Design

7.1 Introduction

There are often situations where it may be necessary to account for two sources of variation by blocking. If the number of treatments and the number of levels of each blocking factor are large, the size of the experiment may become unwieldy, or resources may be limited. Thus, in agricultural field experiments (and other situations), a particular setup is often used that allows differences among treatments to be assessed with fewer resources.

The principle of local control was used in the RBD by grouping the units in one way, i.e. according to blocks. The grouping can be carried one step further: we can group the units in two ways, each way corresponding to a known source of variation among the units, and get the Latin Square Design (LSD). This design is used with advantage in agricultural experiments where the fertility contours are not always known. It has also been used successfully in industry and in the laboratory.

The Latin square design is a design which uses the principle of local control twice. RBD removes one systematic source of variation in addition to treatments, but LSD removes two such sources. Hence, LSD is a three-way classification.

In a field experiment, if two probable fertility trends can be thought of, in directions at right angles, both need to be made the basis of blocking. Thus when there is a slope in the land being used, and also a climatic trend (e.g. effects of wind or rain) at right angles to this, a randomised block design cannot take out all the known variation.

7.2 Layout

In field experiments, the physical layout is that of a square with rows and columns of plots. In this setup the layout is such that every letter (A, B, C, ...) of the set of treatments occurs exactly once in each row and in each column. For five letters A, B, C, D and E the layout could be as shown below.
  • 63.
                 Column
             1   2   3   4   5
        1    E   B   A   D   C
        2    C   A   D   E   B
  Row   3    B   E   C   A   D
        4    A   D   B   C   E
        5    D   C   E   B   A

This type of setup is useful when, for example, variability due to soil differences arises in two directions. Each plot constitutes a single experimental unit. This particular kind of setup with two blocking variables (rows and columns), in which the numbers of rows, columns and treatments are the same, is known as a Latin square.

Notation

Because the numbers of treatments, rows and columns are the same, the number of replicates of each treatment is equal to the number of treatments, rows and columns. We will denote this number by t. For a given value of t, there may be several ways to construct a Latin square. This is actually a mathematical exercise. Extensive listings of ways to construct Latin squares for different values of t are often given in texts on experimental design (see, for example, Montgomery, 2001).

Selection of LSD

The totality of LSDs obtained from a single LSD by permuting rows, columns and treatments (letters) is called a transformation set. For example, keeping the first row fixed and permuting the remaining rows:

    A B C D  (fixed)        A B C D
    B C D A                 C D A B
    C D A B                 D A B C
    D A B C                 B C D A

A k × k Latin square in which the k letters A, B, C, ... occur in natural order in the first row and the first column is called a standard square (a square in canonical form), e.g.

    A B C D
    B C D A
    C D A B
    D A B C

From a standard k × k Latin square, we may obtain k!(k-1)! different LSDs by permuting all k columns and the (k-1) rows other than the first row. Hence there
  • 64. are in all k!(k-1)! different LSDs with the same standard square. Thus the total number of different LSDs in a transformation set is k!(k-1)! times the number of standard LSDs in the set.

In order to give all k × k LSDs an equal probability of being selected, we select one LSD from all k × k LSDs and then randomise the columns and the rows, excluding the first row (if it is the fixed one). Randomisation thus consists of choosing one of the possible designs for the given t at random. The letters A, B, C, etc. are then randomly assigned to the treatments of interest.

7.2.1 Linear additive model

To write down a model, we need to be a bit careful with the notation. The key point is that, although we have three classifications (row, column and treatment), we do not have $t \times t \times t = t^3$ observations; rather, we have only $t \times t = t^2$. The mathematical model is

\[ y_{ijk} = \mu + t_i + r_j + c_k + e_{ijk} \]

where:
  $y_{ijk}$ is the observation for the ith treatment, appearing in row j and column k
  $\mu$ is the overall mean
  $r_j$, $c_k$ represent the effects of the jth row and the kth column
  $t_i$ represents the effect of the treatment appearing at position (j, k) (not to be confused with t, the number of treatments)
  $e_{ijk}$ is the error associated with the experimental unit at position (j, k)

7.3 Statistical analysis

The analysis proceeds along the same lines as for the RBD, but instead of a single sum of squares for blocks, systematic variation is now taken out by two sums of squares, called rows (SSR) and columns (SSC).

7.3.1 Calculation of sums of squares

To set up the analysis of variance, we define $R_j$ and $C_k$ as the totals of all plots in the jth row and the kth column of the layout, respectively, and G as the grand total of all $t^2$ observations. The sums of squares required are:

Total SS:
\[ SST = \sum_{i=1}^{t} \sum_{j=1}^{t} \sum_{k=1}^{t} y_{ijk}^2 - \frac{G^2}{t^2} \]
where the sum runs over the $t^2$ combinations (i, j, k) that actually occur in the square.

Row SS:
\[ SSR = \frac{1}{t} \sum_{j=1}^{t} R_j^2 - \frac{G^2}{t^2} \]

Column SS:
\[ SSC = \frac{1}{t} \sum_{k=1}^{t} C_k^2 - \frac{G^2}{t^2} \]
  • 65. Treatment SS:
\[ SSTr = \frac{1}{t} \sum_{i=1}^{t} T_i^2 - \frac{G^2}{t^2} \]
where $T_i$ is the total of all plots receiving the ith treatment.

Error SS:
\[ SSE = SST - (SSR + SSC + SSTr) = SST - SSR - SSC - SSTr \]

The degrees of freedom for SSR, SSC and SSTr are each $(t-1)$; for SST they are $(t^2-1)$. The degrees of freedom for SSE are therefore $(t^2-1) - 3(t-1) = (t-1)(t+1) - 3(t-1) = (t-1)(t-2)$.

Table 7.1: Three-way ANOVA table

  Source of variation   DF           SS     MS                        F-ratio
  Rows                  t-1          SSR    MSR = SSR/(t-1)           F_R = MSR/MSE
  Columns               t-1          SSC    MSC = SSC/(t-1)           F_C = MSC/MSE
  Treatments            t-1          SSTr   MSTr = SSTr/(t-1)         F_Tr = MSTr/MSE
  Error                 (t-1)(t-2)   SSE    MSE = SSE/((t-1)(t-2))
  Total                 t^2-1        SST

7.4 Advantages and disadvantages of LSD

7.4.1 Advantages

• It eliminates two major sources of variation from the error. Hence the LSD is an improvement over the RBD in controlling error by planned grouping, just as the RBD is an improvement over the CRD.
• The LSD is a three-way incomplete layout. Because treatments, rows and columns each have the same number of levels t, a complete three-way layout would need $t^3$ experimental units. The LSD uses only $t^2$ experimental units, and is therefore said to be a three-way incomplete layout.

7.4.2 Disadvantages

• A serious limitation of the LSD is that the number of replicates must be the same as the number of treatments. The larger the square, the more replicates are needed, and hence the bigger the blocks (columns and rows). Large squares (over 12×12) are therefore seldom used, because the rows and columns no longer remain homogeneous. On the other hand, small squares provide only a few degrees of freedom for the error. Preferable LSDs are from 5×5 to 8×8.
• The analysis depends heavily on the assumption that there are no interactions present.
• The analysis becomes very difficult when there are missing observations.
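The randomisation step of Section 7.2 and the sums of squares of Section 7.3.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a procedure taken from these notes: the function names (is_latin, randomise, latin_square_anova) are hypothetical, the layout is the 5×5 square shown in Section 7.2, and the responses in YIELDS are invented numbers used purely to exercise the formulas. The random relabelling of letters to treatments is left out for brevity.

```python
import random

def is_latin(square):
    """True if every letter occurs exactly once in each row and each column."""
    t = len(square)
    letters = sorted(square[0])
    if len(set(letters)) != t:
        return False
    rows_ok = all(sorted(row) == letters for row in square)
    cols_ok = all(sorted(row[k] for row in square) == letters for k in range(t))
    return rows_ok and cols_ok

def randomise(square, rng):
    """Randomly permute all columns, and all rows except the first."""
    t = len(square)
    cols = list(range(t))
    rng.shuffle(cols)
    rows = list(range(1, t))
    rng.shuffle(rows)
    return ["".join(square[j][k] for k in cols) for j in [0] + rows]

def latin_square_anova(layout, y):
    """ANOVA table (df, SS, MS, F) for a t x t Latin square, t >= 3."""
    t = len(layout)
    if t < 3 or not is_latin(layout):
        raise ValueError("need a Latin square with at least 3 treatments")
    R = [0.0] * t            # row totals R_j
    C = [0.0] * t            # column totals C_k
    T = {}                   # treatment totals T_i, keyed by letter
    sum_sq = 0.0
    for j in range(t):
        for k in range(t):
            v = y[j][k]
            R[j] += v
            C[k] += v
            T[layout[j][k]] = T.get(layout[j][k], 0.0) + v
            sum_sq += v * v
    G = sum(R)               # grand total
    cf = G ** 2 / t ** 2     # correction factor G^2 / t^2
    ss = {
        "Rows": sum(v * v for v in R) / t - cf,
        "Columns": sum(v * v for v in C) / t - cf,
        "Treatments": sum(v * v for v in T.values()) / t - cf,
        "Total": sum_sq - cf,
    }
    ss["Error"] = ss["Total"] - ss["Rows"] - ss["Columns"] - ss["Treatments"]
    df = {"Rows": t - 1, "Columns": t - 1, "Treatments": t - 1,
          "Error": (t - 1) * (t - 2), "Total": t * t - 1}
    mse = ss["Error"] / df["Error"]
    table = {}
    for source in ("Rows", "Columns", "Treatments", "Error", "Total"):
        ms = None if source == "Total" else ss[source] / df[source]
        tested = source in ("Rows", "Columns", "Treatments")
        f = ms / mse if tested and mse > 0 else None
        table[source] = {"df": df[source], "ss": ss[source], "ms": ms, "f": f}
    return table

# Layout from Section 7.2 (one string per row, one letter per column).
LAYOUT = ["EBADC", "CADEB", "BECAD", "ADBCE", "DCEBA"]

# Hypothetical responses, invented for illustration only.
YIELDS = [
    [20.1, 22.3, 19.8, 24.0, 21.5],
    [21.7, 18.9, 23.2, 20.4, 22.8],
    [22.5, 20.0, 21.1, 19.2, 23.9],
    [18.7, 23.5, 22.0, 21.4, 20.6],
    [23.8, 21.2, 20.3, 22.6, 19.0],
]

# A standard 5 x 5 square, randomised with a fixed seed for repeatability.
STANDARD = ["ABCDE", "BCDEA", "CDEAB", "DEABC", "EABCD"]
print(randomise(STANDARD, random.Random(2013)))

for source, row in latin_square_anova(LAYOUT, YIELDS).items():
    ms = "" if row["ms"] is None else f"{row['ms']:.3f}"
    f = "" if row["f"] is None else f"{row['f']:.2f}"
    print(f"{source:<11}{row['df']:>3}{row['ss']:>10.3f}{ms:>10}{f:>8}")
```

The F-ratios would then be compared with tabulated F values on (t-1) and (t-1)(t-2) degrees of freedom. The code stops at the ratios because computing p-values would need a statistics library.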