2. Probability
• Definition: The chance of an event occurring.
• Probability Density Function (PDF): Describes the relative likelihood of a continuous random variable taking a value within a given range of values.
• Cumulative Distribution Function (CDF): The probability that a random value is less than or equal to a certain value within a given sample space – here, the CDF is for the value of the t-statistic.
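As a concrete illustration, the CDF of the t-distribution can be evaluated directly with SciPy (a minimal sketch; the t-value of 2.0 and 10 degrees of freedom are arbitrary example numbers):

```python
from scipy import stats

# P(T <= 2.0) for a t-distribution with 10 degrees of freedom
p_less = stats.t.cdf(2.0, df=10)

# Two-tailed p-value for an observed t-statistic of 2.0
p_two_tailed = 2 * (1 - p_less)

print(p_less)        # about 0.963
print(p_two_tailed)  # about 0.073
```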
3. Normal Distribution
• Standard Normal Distribution – Based on infinite samples. Unimodal with symmetrical tails – Mean = 0, SD = 1 – A hypothetical distribution that does not exist in real data – Used as the PDF for the Z-test.
• Normal Distribution – Unimodal with symmetrical tails, with Mean = Median = Mode; skewness within -3 to +3 and kurtosis within -1 to +1. Can be converted to the standard normal by computing Z-scores and plotting the Z-score frequency distribution: Z = (x − µ)/SD.
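The conversion is mechanical; a minimal sketch with made-up weights:

```python
import numpy as np

x = np.array([55.0, 60.0, 65.0, 70.0, 75.0])  # hypothetical weights (kg)
z = (x - x.mean()) / x.std()                  # Z = (x - mu) / SD

# Standardised values have mean 0 and SD 1
print(z.mean(), z.std())
```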
4. Property of Normal Distribution
• About 68% of values will lie within 1 SD of the mean; about 95% within 2 SD of the mean; about 99.7% within 3 SD of the mean.
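These percentages follow directly from the standard normal CDF, as a quick numerical check shows:

```python
from scipy import stats

# Probability of falling within k standard deviations of the mean
within = {k: stats.norm.cdf(k) - stats.norm.cdf(-k) for k in (1, 2, 3)}
print(within)  # roughly {1: 0.683, 2: 0.954, 3: 0.997}
```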
5. T-test
• Single sample T-test – Testing sample mean against
a known value.
• Independent samples T-test – Testing sample 1
mean against sample 2 mean.
• Paired T-test – Testing sample 1 mean before
against sample 1 mean after.
• Based on T-distribution – Similar to Normal
distribution but with lower peak and fatter tails.
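The three variants map onto three SciPy functions; a sketch with simulated data (the group means, SDs, and sample sizes are made-up assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample1 = rng.normal(70, 8, 30)             # hypothetical weights, group 1
sample2 = rng.normal(65, 8, 30)             # hypothetical weights, group 2
after = sample1 - rng.normal(2, 1, 30)      # group 1 re-measured after an intervention

t1, p1 = stats.ttest_1samp(sample1, 65)     # single sample vs a known value
t2, p2 = stats.ttest_ind(sample1, sample2)  # independent samples
t3, p3 = stats.ttest_rel(sample1, after)    # paired (before vs after)
```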
6. Hypothesis testing using Normal/t-
Distributions
• Using Confidence Intervals – Qualitative.
• Using the CDF – gives the actual probability – Quantitative.
7. Hypothesis testing using CDF
• First we try to summarize the “effect” – that is, the actual effect in our sample relative to the random error that might have crept in. Here, that summary is the t-statistic.
• The CDF for the t-test tells us the probability of the t-statistic of our study being less than a certain value of t, given a specific number of degrees of freedom.
• For the t-distribution, the PDF changes with increasing sample size (increasing degrees of freedom); thus the CDF also changes.
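This dependence on the degrees of freedom is easy to see numerically (an illustrative sketch; t = 2.0 is an arbitrary choice):

```python
from scipy import stats

# P(T <= 2.0) grows toward the normal value as df increases
for df in (5, 30, 1000):
    print(df, stats.t.cdf(2.0, df))
print("normal:", stats.norm.cdf(2.0))  # the limiting value, about 0.977
```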
8. Central Limit theorem
• The central limit theorem states that, as the sample size increases, the shape of the sampling distribution of the mean approaches the normal shape. For n = 30, that distribution is 'almost' normal.
• Some researchers hold that parametric methods may therefore be used even on non-normal data if the sample size is large enough.
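A small simulation illustrates the theorem (a sketch; the exponential population and the sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)

# A strongly skewed (non-normal) population
population = rng.exponential(scale=10, size=100_000)

# Sampling distribution of the mean for n = 30
sample_means = np.array(
    [rng.choice(population, size=30).mean() for _ in range(2000)]
)

# The means cluster symmetrically around the population mean,
# with spread close to sigma / sqrt(n)
print(population.mean(), sample_means.mean(), sample_means.std())
```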
9. T-test
Assumptions:
1. Data are on a numerical scale
2. The distribution of the underlying population is normal – Shapiro-Wilk/Kolmogorov-Smirnov tests
3. The samples have the same variance ('homogeneity of variances') – Levene’s test – if the variances are not similar, Welch’s t-test is used to accommodate this.
4. Observations within a group are independent
5. The samples are randomly drawn from the population
• Null hypothesis – that there is no difference between the two means.
• Developed by W.S. Gosset, and published under the pseudonym Student.
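The assumption checks above can be run in sequence; a sketch with simulated groups (Shapiro-Wilk for normality, Levene's test for equal variances, falling back to Welch's t-test when variances differ):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group1 = rng.normal(70, 8, 30)   # hypothetical data
group2 = rng.normal(65, 12, 30)  # deliberately wider spread

# Assumption 2: normality within each group (Shapiro-Wilk)
_, p_norm1 = stats.shapiro(group1)
_, p_norm2 = stats.shapiro(group2)

# Assumption 3: homogeneity of variances (Levene's test)
_, p_levene = stats.levene(group1, group2)

# If Levene's test rejects equal variances, use Welch's t-test
t, p = stats.ttest_ind(group1, group2, equal_var=p_levene > 0.05)
```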
11. T-test
• Involves calculating the t-statistic from the difference of means and the SE – it encapsulates the difference relative to the SE.
• Look up the t-statistic in a probability distribution table based on the degrees of freedom (the sample space for the CDF).
• Essentially, this gives the probability of one sample mean belonging to the population of the other mean.
• The fatter tails of the t-distribution at lower df/sample sizes push the rejection region farther from the sample mean and thus make the probability testing stricter, to account for the larger SDs of smaller samples.
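Worked through by hand, the t-statistic and its CDF look-up go as follows (the data are made-up numbers; the pooled-variance form assumes equal variances):

```python
import numpy as np
from scipy import stats

a = np.array([72.0, 68.0, 75.0, 71.0, 69.0, 74.0])  # hypothetical group 1
b = np.array([65.0, 70.0, 63.0, 66.0, 68.0, 64.0])  # hypothetical group 2
n1, n2 = len(a), len(b)

# Pooled variance and the SE of the difference of means
sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))

# t-statistic: difference of means relative to its SE
t_stat = (a.mean() - b.mean()) / se
df = n1 + n2 - 2

# The CDF replaces the printed probability table
p_two_tailed = 2 * (1 - stats.t.cdf(abs(t_stat), df))
```

The result matches `scipy.stats.ttest_ind(a, b)`, which performs the same calculation.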
12. Parametric and Non-parametric Tests
• Parametric tests are based on the assumption of almost normal
distribution of data within the groups – The probability
distribution tables for estimation of p-values are based on this
assumption.
• Parametric tests are based on estimation of statistic based on
actual values of variables – mean, SD.
• Thus if not normally distributed, erroneous p-values may be
computed.
• Non-parametric tests are based on ranks of the data within the set – hence they are not affected by extreme values, non-normality of the data distribution, or ordinal-scale data.
• Parametric tests are usually more powerful than non-parametric tests if the normality assumption is maintained – in the sense that the beta error is lower.
• If normality is not maintained, Non-parametric tests become
more powerful.
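The effect of an extreme value on the two families of tests shows up in a toy example (all values invented): the outlier inflates one group's mean and SD, washing out the t-test, while the rank-based test merely assigns it the top rank.

```python
import numpy as np
from scipy import stats

a = np.array([5.0, 6.0, 7.0, 8.0, 9.0, 50.0])   # one extreme value
b = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])

_, p_t = stats.ttest_ind(a, b)                  # parametric: uses means and SDs
_, p_u = stats.mannwhitneyu(a, b, alternative="two-sided")  # rank-based

print(p_t, p_u)  # the rank test yields the much smaller p-value here
```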
14. Independent Samples T-test on SPSS
• Necessities:
1. Your grouping variable should be coded numerically, 1/0; 1/2
etc. You may label the values appropriately in the “Variable
view”
2. Your dependent variable of interest should be in a separate
column.
15. Checking for Normality
• Qualitative: Histogram; QQ Plot
• Quantitative: Shapiro-Wilk test.
• Here you need to see normality within each group, so you need to conduct separate tests of normality for each group – so split the file.
• Go to Data → Split File.
• Put the grouping variable in “Organize output by groups” → click OK.
16. • Next go to Analyze → Descriptive Statistics → Explore.
• Put the variables of interest in “Dependent List” → click the “Plots” tab → check “Normality plots with tests” and “Histogram” → Continue → OK.
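The split-file procedure corresponds to a per-group loop; a sketch of the same check in Python (simulated data, hypothetical column names):

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(2)
data = pd.DataFrame({
    "group": [1] * 30 + [2] * 30,  # numerically coded grouping variable
    "weight": np.concatenate([rng.normal(70, 8, 30), rng.normal(65, 8, 30)]),
})

# "Split file": run Shapiro-Wilk separately within each group
results = {}
for g, sub in data.groupby("group"):
    w, p = stats.shapiro(sub["weight"])
    results[g] = p
    print(f"group {g}: W = {w:.3f}, p = {p:.3f}")
```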
17. Normality Output
• The output first tells you about any missing cases.
• The normality assumption should be satisfied in both groups.
• Statistical test of normality: if the p-value is > 0.05 (or 0.01), the data are normal; otherwise not normal.
• Here: Weight – Normal; Height – Not Normal.
18. Independent Samples T-test
• Although the normality assumption was violated, just as an example we’ll conduct both parametric and non-parametric tests on this data.
• First, unsplit the file: go to Data → Split File → check “Analyze all cases..”.
• Go to Analyze → Compare Means → Independent Samples T-test.
• Select the variables of interest and transfer them to the “Test Variables” window. Transfer the grouping variable and specify the groups – here 1/2. Click Continue → OK.
19. Output
• Descriptives – self-explanatory.
• If Levene’s test p > 0.05, go for “equal variances assumed”; else “equal variances not assumed”.
• The same row gives the t-statistic, the degrees of freedom, and the p-value of the t-test.
20. Mann Whitney U-Test
• If you want to go for non parametric test instead,
• Go to Analyze → Nonparametric Tests → Independent Samples.
• Same procedure as t-test – Place test variable and
grouping variable.
P.S. It is also called Wilcoxon Rank Sum test.
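The same comparison in SciPy, reported with medians and IQRs rather than means and SDs (a sketch with simulated data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group1 = rng.normal(70, 8, 30)  # hypothetical values
group2 = rng.normal(64, 8, 30)

u, p = stats.mannwhitneyu(group1, group2, alternative="two-sided")

# Rank tests are summarised with medians and interquartile ranges
for name, g in (("group1", group1), ("group2", group2)):
    q1, med, q3 = np.percentile(g, [25, 50, 75])
    print(f"{name}: median {med:.1f}, IQR {q1:.1f}-{q3:.1f}")
```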
21. Output
• The first table just gives the mean rank and sum of ranks – not important for us.
• The key figure is the p-value for the difference between groups.
• Note this does not provide the descriptives. Take the descriptives using the procedure described before – the median and interquartile range are what matter for non-parametric tests.
22. T-test on Graphpad
• Keep the data ready in Excel – it needs to be copy-pasted into Graphpad.
• Open Graphpad.
• Select “Columns” from the tabs on the left.
• Click the “Enter replicate values…” option as shown in the picture → Create.
23. • Create separate columns for the group variables as shown in the picture and paste the values from Excel.
• Click Analyze button.
• Click “Column statistics”
• Select the two columns for comparison and click Ok.
24. • Select all the descriptives you want.
• Select Shapiro-Wilk test.
• Click Ok.
26. • Now we know the normality of the group variables, and the descriptives.
• Click Analyze button.
• Click “t-tests (and…”
• Select the two columns for comparison and click Ok.
27. • Click the appropriate test – parametric or non-parametric.
• If using the t-test, it is safer to go for Welch’s correction.
• Click OK.
• Output: the t-test results, the mean-difference statistics, and Levene’s test.
28. Paired t-test on SPSS
• Used to test difference of means for a
variable in matched groups or same
samples at different time points.
• Data of the variable should be in 2
columns.
• The normality assumption has to be satisfied for both variables – since the same samples are being used, no splitting is required: directly run Shapiro-Wilk on the two variables.
• Take out the descriptives of the two
variables as described before.
• Then Analyze → Compare Means → Paired Samples T-test.
29. • Insert the before and after variables as pairs as shown → click OK.
• Output: the table reports the mean and SD of the difference and the p-value.
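The equivalent paired analysis in SciPy (a sketch; the before/after data are simulated):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
before = rng.normal(75, 8, 25)           # same subjects, first time point
after = before - rng.normal(2, 1.5, 25)  # second time point

d = before - after
_, p_norm = stats.shapiro(d)             # normality check on the paired data

t, p = stats.ttest_rel(before, after)
print(f"mean diff {d.mean():.2f} (SD {d.std(ddof=1):.2f}), t = {t:.2f}, p = {p:.4g}")
```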
30. Wilcoxon Signed Rank Test
• Non-parametric equivalent of the paired t-test.
• Analyze → Nonparametric Tests → 2 Related Samples.
• Fill in the test pairs the same as for the paired t-test → OK.
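The same pairs fed to SciPy's implementation (a sketch with simulated data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
before = rng.normal(75, 8, 25)
after = before - rng.normal(2, 1.5, 25)  # nearly every subject decreases

w, p = stats.wilcoxon(before, after)     # signed-rank test on the paired differences
print(f"W = {w}, p = {p:.4g}")
```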
31. Output
• The first table just gives the mean rank and sum of ranks – not important for us.
• The key figure is the p-value for the difference between the variables.
• Note this does not provide the descriptives. Take the descriptives using the procedure described before – the median and interquartile range are what matter for non-parametric tests.
32. On Graphpad
• Enter the column data as previously described.
• Analyze → T-test → check the required variables → OK → click “Paired” and parametric/non-parametric as required.
• Output: the paired t-test results, the mean-difference statistics, and the correlation statistics.
33. Single Sample T-test on SPSS
• Used to test the difference of a sample mean from another known mean. In Data View, the variable goes in a single column.
• Test for normality – to decide between parametric and non-parametric.
• For parametric, go to Analyze → Compare Means → One-Sample T-test.
• Suppose we want to see whether the sample mean is different from a population average of 65 kg → insert the test variable and enter the “Test Value” as 65 → OK.
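The same one-sample test in SciPy (a sketch; the weights are invented):

```python
import numpy as np
from scipy import stats

weights = np.array([63, 68, 70, 72, 74, 75, 77, 80, 82, 85], dtype=float)  # kg

# Is the sample mean different from a population average of 65 kg?
t, p = stats.ttest_1samp(weights, 65)
print(f"mean = {weights.mean():.1f}, t = {t:.2f}, p = {p:.4f}")
```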
35. On Graphpad
• Create a separate column for the variable by pasting the values from Excel.
• Click Analyze button.
• Click “Column statistics”
• Select the column for comparison
and click Ok.
• Click the required descriptives,
Normality test and both the one-
sample tests under “Inferences”.
• Enter Hypothetical value and click
OK.
36. Output
Descriptives:
Number of values 30; Sum 2257
Minimum 63.00; 25% percentile 69.50; Median 73.50; 75% percentile 81.50; Maximum 90.00
Mean 75.23; Std. deviation 7.899; Std. error of mean 1.442
95% CI of mean 72.28 to 78.18
Normality test (Shapiro-Wilk):
W 0.9550; P value 0.2290 (ns)
Passed normality test (alpha = 0.05)? Yes
One-sample t-test:
Theoretical mean 65.00; Actual mean 75.23; Discrepancy -10.23
95% CI of discrepancy 7.284 to 13.18
t = 7.096, df = 29; P value (two-tailed) < 0.0001
Significant (alpha = 0.05)? Yes
One-sample Wilcoxon signed rank test:
Theoretical median 65.00; Actual median 73.50; Discrepancy -8.500
Sum of signed ranks (W) 421.0; Sum of positive ranks 428.0; Sum of negative ranks -7.000
P value (two-tailed) < 0.0001 (exact)
Significant (alpha = 0.05)? Yes