2. Introduction
• Biological studies deal with organisms
which show variety
• We cannot rely on a single measurement
and so we must take a sample
• This sample of data must be summarised
and analysed to find out if it is reliable
3. Summarising data
• MEAN Sum of the sample values ÷ sample size: $\bar{x} = \frac{\sum x}{n}$
• MEDIAN Middle number in a list when
arranged in rank order: 2, 5, 7, 7, 8, 23, 31 (median = 7)
• MODE The measurement which occurs
most frequently: 2, 5, 7, 7, 8, 23, 31 (mode = 7)
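As a minimal sketch, the three summaries can be checked in Python using the example list from this slide. The variable names are illustrative only.

```python
import statistics

values = [2, 5, 7, 7, 8, 23, 31]  # the example list from the slide

mean = sum(values) / len(values)    # sum of samples ÷ sample size
median = statistics.median(values)  # middle value once the list is in rank order
mode = statistics.mode(values)      # value that occurs most frequently

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

For this list the median and mode are both 7, while the mean is about 11.9 because the two large values pull it upwards.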
4. Distribution curves
• A visual summary of data
• They can be produced as follows:
1. Collect data
2. Split results into equal size classes
3. Make a tally chart
4. Plot a histogram of frequency against size class
• Data can show normal distribution or
skewed distribution
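Steps 2 to 4 can be sketched in Python as follows. The measurements and the class width of 10 are hypothetical, chosen only to illustrate the tally, and the printed bars are a rough text stand-in for a histogram.

```python
from collections import Counter

CLASS_WIDTH = 10  # assumed class size, for illustration only


def tally(values, class_width):
    """Count how many values fall into each equal-size class.

    Each class is labelled by its lower bound, e.g. 10 means 10-19.
    """
    counts = Counter((v // class_width) * class_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical measurements (e.g. leaf lengths in mm)
lengths = [12, 15, 17, 21, 22, 23, 25, 28, 31, 34]

frequencies = tally(lengths, CLASS_WIDTH)
for lower, count in frequencies.items():
    # One '|' per observation gives a simple text histogram
    print(f"{lower}-{lower + CLASS_WIDTH - 1}: {'|' * count}")
```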
5. Distribution curves
• Normal distribution
• Symmetrical bell-shaped curve around the mean
• Use parametric tests to
analyse data
[Figure: histogram showing a symmetrical, bell-shaped normal distribution]
6. Distribution curves
• Skewed data
• Asymmetrical curve
around the mode
• Use non-parametric
tests to analyse data
[Figure: histogram showing an asymmetrical, skewed distribution]
8. Standard deviation
• A high SD indicates data which shows great
variation from the mean
• A low SD indicates data which shows little
variation from the mean value
• For normally distributed data, about 68% of all
values lie within the range mean ± 1SD
• About 95% of all values lie within mean ± 2SD
10. Calculating SD
• Can only be used for normally distributed
data
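• Calculate as follows:
– Sum the values for x², i.e. $\sum x^2$
– Sum the values for x, then square the total, i.e. $(\sum x)^2$
– Divide $(\sum x)^2$ by n
– Subtract this from $\sum x^2$ and divide the result by n
– Take the square root of this
• In one line: $SD = \sqrt{\frac{\sum x^2 - (\sum x)^2 / n}{n}}$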
(see hand-out)
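The calculation steps on this slide can be sketched in Python as below. The function name is illustrative, and the data is the example list from the "Summarising data" slide. Following the slide, the final division is by n, not n - 1.

```python
import math


def standard_deviation(values):
    """Standard deviation following the slide's steps (divides by n)."""
    n = len(values)
    sum_x2 = sum(x * x for x in values)  # sum of the squared values
    sum_x_squared = sum(values) ** 2     # square of the summed values
    correction = sum_x_squared / n       # (sum of x) squared, divided by n
    variance = (sum_x2 - correction) / n # take one from the other, divide by n
    return math.sqrt(variance)           # square root of the result


values = [2, 5, 7, 7, 8, 23, 31]
sd = standard_deviation(values)
print(f"SD = {sd:.2f}")
```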
12. Confidence limits
• 95% of all values lie within 2SD of the
mean
• Any value which lies outside this range is
said to be significantly different from the
others
• We say that we are working to 95%
confidence limits or to a 5% significance
level.
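A minimal sketch of this rule, assuming hypothetical measurements: any value outside mean ± 2SD is flagged as significantly different from the others. The SD here divides by n (Python's `statistics.pstdev`).

```python
import statistics

# Hypothetical measurements; the last value is deliberately extreme
values = [10, 11, 9, 10, 12, 10, 11, 9, 10, 30]

mean = statistics.mean(values)
sd = statistics.pstdev(values)  # standard deviation, dividing by n

lower = mean - 2 * sd  # lower 95% confidence limit
upper = mean + 2 * sd  # upper 95% confidence limit

# Values outside the limits are treated as significantly different
outside = [v for v in values if v < lower or v > upper]
print(f"limits: {lower:.2f} to {upper:.2f}, outside: {outside}")
```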
13. Comparison tests
• To compare two samples of data we look at
the overlap between the two distribution
curves.
• This depends on:
– The distance between the two mean values
– The spread of each sample (standard deviation)
• The greater the overlap, the more similar
the two samples are.
15. Comparison tests
When the SD is small, the overlap is less.
[Figure: two distribution curves, Sample 1 and Sample 2, with the overlap between them marked]
16. The null hypothesis
• In order to compare two sets of data we
must first assume that there is no difference
between them.
• This is called the null hypothesis
• We must also produce an alternative
hypothesis which states that there is a
difference.
17. The t-test
• Used to compare the overlap of two sets of
data
• Samples must show normal distribution
• Sample size (n) should be greater than 30
• This tests for differences between two sets
of data
18. The t-test
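• To calculate t:
– Check data is normally distributed by drawing a
tally chart
– Work out the difference in means $|\bar{x}_1 - \bar{x}_2|$
– Calculate the variance term for each set of data (this is $s^2 \div n$)
– Put these into the equation for t:
$t = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$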
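The steps above can be sketched in Python, assuming the equation for t is the difference in means divided by the square root of the summed s² ÷ n terms. The function name and both samples are hypothetical, and the samples are kept far smaller than the recommended n > 30 so the arithmetic is easy to follow. The sketch uses `statistics.variance`, which divides by n - 1. Swap in `statistics.pvariance` to match a divide-by-n hand calculation.

```python
import math
import statistics


def t_statistic(sample_1, sample_2):
    """Return (t, degrees of freedom) for two samples."""
    n1, n2 = len(sample_1), len(sample_2)
    # Difference in means, ignoring the sign
    diff = abs(statistics.mean(sample_1) - statistics.mean(sample_2))
    # Variance term (s squared ÷ n) for each set of data
    term_1 = statistics.variance(sample_1) / n1
    term_2 = statistics.variance(sample_2) / n2
    t = diff / math.sqrt(term_1 + term_2)
    return t, n1 + n2 - 2


# Hypothetical samples, for illustration only
t, df = t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(f"t = {t:.2f} at {df} degrees of freedom")
```

The resulting t is then compared with the critical value from a t table at the stated degrees of freedom.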
20. The t-test
• Compare the value of t with the critical value
at n1 + n2 – 2 degrees of freedom
• Use a probability value of 5%
• If t is greater than the critical value we can
reject the null hypothesis…
• … there is a significant difference between the
two sets of data
• … there is less than a 5% probability that a
difference this large arose by chance
21. Mann-Whitney U-test
• Compares two sets of data
• Data can be skewed
• Sample size can be small: 5 < n < 30
• For details refer to stats book
22. Chi squared
• Some data is categoric
• This means that it belongs to one or more
categories
• Examples include
– eye colour
– presence or absence data
– texture of seeds
• For these we use a chi squared ($\chi^2$) test
• This tests for an association between two or more
variables
23. Chi squared
• Draw a contingency table
• These are the observed values
                Blue eyes   Green eyes   Row totals
Fair hair       a           b            a+b
Ginger hair     c           d            c+d
Column totals   a+c         b+d          a+b+c+d
24. Chi squared
• Now work out the expected values, where:
$E = \frac{(\text{Row total}) \times (\text{Column total})}{(\text{Grand total})}$
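As a minimal sketch, the expected value for every box of a contingency table can be computed as (row total) x (column total) ÷ (grand total). The observed counts below are hypothetical and the function name is illustrative.

```python
def expected_values(observed):
    """Expected count for each box: row total x column total ÷ grand total."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    return [
        [row * column / grand_total for column in column_totals]
        for row in row_totals
    ]


# Hypothetical observed counts:
#               Blue eyes  Green eyes
# Fair hair        20         10
# Ginger hair       5         15
observed = [[20, 10], [5, 15]]

expected = expected_values(observed)
print(expected)
```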
25. Chi squared
                Blue eyes                Green eyes               Row totals
Fair hair       (a+b)(a+c) ÷ (a+b+c+d)   (a+b)(b+d) ÷ (a+b+c+d)   a+b
Ginger hair     (c+d)(a+c) ÷ (a+b+c+d)   (c+d)(b+d) ÷ (a+b+c+d)   c+d
Column totals   a+c                      b+d                      a+b+c+d
26. Chi squared
• For each box work out $(O - E)^2 \div E$
• Find the sum of these to get $\chi^2$:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
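The whole calculation can be sketched in Python as below: expected values from the row and column totals, then the sum of (O - E)² ÷ E over every box. The observed counts are hypothetical and the function name is illustrative. The degrees of freedom are (no. rows - 1) x (no. columns - 1).

```python
def chi_squared(observed):
    """Return (chi squared, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)

    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected value for this box
            e = row_totals[i] * column_totals[j] / grand_total
            total += (o - e) ** 2 / e

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return total, dof


# Hypothetical observed counts (rows: hair colour, columns: eye colour)
chi2, dof = chi_squared([[20, 10], [5, 15]])
print(f"chi squared = {chi2:.2f} at {dof} degree(s) of freedom")
```

For a table with 1 degree of freedom the critical value at the 5% level is 3.84, so a result above that would lead us to reject the null hypothesis.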
27. Chi squared
• Compare $\chi^2$ with the critical value at the 5% significance
level
• There will be
(no. rows – 1) x (no. columns – 1)
degrees of freedom
• If $\chi^2$ is greater than the critical value we can say
that the variables are associated with one
another in some way
• We reject the null hypothesis
28. Spearman Rank
• Two sets of data may show a correlation
• The data can be plotted on a scatter graph:
[Figure: three scatter graphs showing negative correlation, positive correlation and no correlation]
29. Spearman Rank
• We calculate the correlation by assigning a
rank to the values in each set of data:
– The lowest value is rank 1
– The 2nd lowest value is rank 2
– Tied values share the average of the ranks they would
have taken: the two 18s should be rank 3 & 4, so each
is given (3+4)/2 = 3.5
– Rank the second set of data in the same way: the two
29s share the average of 2 & 3, which is 2.5

Data 1   Rank   Data 2   Rank
12       1      24       1
14       2      29       2.5
18       3.5    29       2.5
18       3.5    38       4
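The ranking procedure, including the averaged rank for tied values, can be sketched in Python as below. The function name is illustrative, and the data is the worked example from these slides.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1), giving ties their average rank."""
    ordered = sorted(values)
    ranks = []
    for value in values:
        first = ordered.index(value) + 1  # rank of the first tied position
        count = ordered.count(value)      # how many values share this rank
        # Average of the ranks first, first + 1, ..., first + count - 1
        ranks.append(first + (count - 1) / 2)
    return ranks


data_1 = [12, 14, 18, 18]
data_2 = [24, 29, 29, 38]

print(average_ranks(data_1))
print(average_ranks(data_2))
```

The two 18s both receive rank 3.5 and the two 29s both receive rank 2.5, matching the worked example.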
38. Spearman Rank
• Find the difference D between each pair of ranks
• Square this difference
• Sum the D² values
• Calculate the Spearman Rank Correlation
Coefficient $r_s$:
$r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$
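As a minimal sketch, the coefficient can be computed from two lists of ranks: sum the squared rank differences, multiply by 6, divide by n(n² - 1) and subtract the result from 1. The function name is illustrative, and the ranks are those from the worked example on the preceding slides.

```python
def spearman_rs(ranks_1, ranks_2):
    """Spearman Rank Correlation Coefficient from two lists of ranks."""
    n = len(ranks_1)
    # Sum of the squared differences D between paired ranks
    sum_d2 = sum((r1 - r2) ** 2 for r1, r2 in zip(ranks_1, ranks_2))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


ranks_1 = [1, 2, 3.5, 3.5]  # ranks for Data 1
ranks_2 = [1, 2.5, 2.5, 4]  # ranks for Data 2

rs = spearman_rs(ranks_1, ranks_2)
print(f"rs = {rs:.2f}")
```

Here the D² values are 0, 0.25, 1 and 0.25, which sum to 1.5, giving a coefficient of 0.85. Four pairs is far too few for a real test and is used only to keep the arithmetic visible.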
39. Spearman Rank
• Compare rs with the critical value at the 5% level
• If it is greater than the critical value (ignoring the
sign) then we reject the null hypothesis
• … there is a significant correlation between the
two sets of data
• If the value is positive there is a positive
correlation
• If it is negative then there is a negative correlation
40. Quick guide
Is your data interval data or is it
categoric data (it can only be placed in
a number of categories)?
– Interval
– Categoric
41. Quick guide
Are you looking for a correlation
between two sets of data, e.g. the rate
of photosynthesis and light intensity?
– Yes
– No