ANOVA analysis of cancer survival times by organ affected

Chapter 15 The Analysis of Variance

A Problem 1 Cameron, E. and Pauling, L. (1978) Supplemental ascorbate in the supportive treatment of cancer: re-evaluation of prolongation of survival time in terminal human cancer. Proceedings of the National Academy of Sciences, USA, 75, 4538-4542.

A Problem The hypotheses used to answer the question of interest are The question is similar to ones encountered in Chapter 11, where we looked at tests for the difference of means of two different variables. In this case we are interested in looking at more than two variables.

Single-factor Analysis of Variance (ANOVA)

Single-factor Analysis of Variance (ANOVA) Notice that in the above comparative dotplot, the differences in the treatment means are large relative to the variability within the samples.

Single-factor Analysis of Variance (ANOVA) Notice that in the above comparative dotplot, the differences in the treatment means are not easily judged relative to the sample variability. ANOVA techniques will allow us to determine whether those differences are significant.

ANOVA Notation
k = number of populations or treatments being compared
Population or treatment            1     2     …   k
Population or treatment mean       µ1    µ2    …   µk
Sample size                        n1    n2    …   nk
Sample mean                        x̄1    x̄2    …   x̄k
Population or treatment variance   σ1²   σ2²   …   σk²
Sample variance                    s1²   s2²   …   sk²

ANOVA Notation
N = n1 + n2 + … + nk (total number of observations in the data set)
T = grand total = sum of all N observations

Assumptions for ANOVA

Definitions A measure of disparity among the sample means is the treatment sum of squares, denoted by SSTr. A measure of variation within the k samples is the error sum of squares, denoted by SSE.
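In the usual textbook form, the two sums of squares can be written as below. The symbols are assumptions made here to match the notation slides: $\bar{x}_i$ and $s_i^2$ are the mean and variance of sample $i$, and $\bar{\bar{x}} = T/N$ is the grand mean.

```latex
\begin{align*}
\text{SSTr} &= n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \dots + n_k(\bar{x}_k - \bar{\bar{x}})^2, & \text{df} &= k - 1 \\
\text{SSE}  &= (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_k - 1)s_k^2, & \text{df} &= N - k
\end{align*}
```

SSTr grows when the sample means spread out around the grand mean, while SSE grows when observations spread out around their own sample mean.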

Definitions A mean square is a sum of squares divided by its df. In particular,
mean square for treatments = MSTr = SSTr/(k - 1)
mean square for error = MSE = SSE/(N - k)
The error df comes from adding the df's associated with each of the sample variances:
(n1 - 1) + (n2 - 1) + … + (nk - 1) = n1 + n2 + … + nk - 1 - 1 - … - 1 = N - k

Example Three filling machines are used by a bottler to fill 12 oz cans of soda. To determine whether the three machines are filling the cans to the same (mean) level, independent samples of cans filled by each were selected and the amounts of soda in the cans measured. The samples are given below.
Machine 1: 12.033 11.985 12.009 12.009 12.033 12.025 12.054 12.050
Machine 2: 12.031 11.985 11.998 11.992 11.985 12.027 11.987
Machine 3: 12.034 12.021 12.038 12.058 12.001 12.020 12.029 12.011 12.021

Example
mean square for treatments = MSTr = SSTr/(k - 1) = 0.003016/2 = 0.001508
mean square for error = MSE = SSE/(N - k) = 0.008256/21 = 0.000393
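The sums of squares and mean squares for the filling-machine data can be recomputed directly from the three samples. The following is a minimal sketch using only the Python standard library; the function name `one_way_anova` and the dictionary keys are illustrative choices, not part of the original slides.

```python
# Fill amounts (oz) for the three machines, copied from the example data.
machines = {
    "Machine 1": [12.033, 11.985, 12.009, 12.009, 12.033, 12.025, 12.054, 12.050],
    "Machine 2": [12.031, 11.985, 11.998, 11.992, 11.985, 12.027, 11.987],
    "Machine 3": [12.034, 12.021, 12.038, 12.058, 12.001, 12.020, 12.029, 12.011, 12.021],
}


def one_way_anova(samples):
    """Return SSTr, SSE, MSTr, MSE and F for a list of samples (hypothetical helper)."""
    k = len(samples)                              # number of treatments
    n_total = sum(len(s) for s in samples)        # N
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]
    # Treatment sum of squares: spread of the sample means around the grand mean.
    sstr = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Error sum of squares: spread of observations around their own sample mean.
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    mstr = sstr / (k - 1)
    mse = sse / (n_total - k)
    return {"SSTr": sstr, "SSE": sse, "MSTr": mstr, "MSE": mse, "F": mstr / mse}


result = one_way_anova(list(machines.values()))
for name, value in result.items():
    print(f"{name:5s} = {value:.6f}")
```

The printed values can be compared with the Minitab table for the same data (SS 0.003016 and 0.008256, F 3.84).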

Comments More specifically, when H0 is true, µMSTr = µMSE. However, when H0 is false, µMSTr > µMSE, and the greater the differences among the µ's, the larger µMSTr will be relative to µMSE.

The Single-Factor ANOVA F Test
Null hypothesis: H0: µ1 = µ2 = µ3 = … = µk
Alternate hypothesis: Ha: At least two of the µ's are different
Test statistic: F = MSTr/MSE

The Single-Factor ANOVA F Test When H 0 is true and the ANOVA assumptions are reasonable, F has an F distribution with df 1 = k - 1 and df 2 = N - k. Values of F more contradictory to H 0 than what was calculated are values even farther out in the upper tail, so the P-value is the area captured in the upper tail of the corresponding F curve.

Example Consider the earlier example involving the three filling machines.
Machine 1: 12.033 11.985 12.009 12.009 12.033 12.025 12.054 12.050
Machine 2: 12.031 11.985 11.998 11.992 11.985 12.027 11.987
Machine 3: 12.034 12.021 12.038 12.058 12.001 12.020 12.029 12.011 12.021

Example *When the sample sizes are large, we can make judgments about both the equality of the standard deviations and the normality of the underlying populations with a comparative boxplot.

Example From the F table with numerator df1 = 2 and denominator df2 = 21, the calculated F = 3.835 gives 0.025 < P-value < 0.05 (Minitab reports this value to be 0.038).
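The upper-tail area can also be checked without a table in this particular case. When the numerator df is exactly 2, the F upper-tail probability has the closed form (1 + 2f/df2)^(-df2/2); this shortcut is an addition made here, it does not apply for other numerator df, and the function name below is hypothetical.

```python
def f_upper_tail_df1_2(f_value, df2):
    """Upper-tail area P(F > f_value) for an F distribution with df1 = 2.

    Closed form valid ONLY when the numerator df equals 2:
        P(F > f) = (1 + 2 * f / df2) ** (-df2 / 2)
    """
    if f_value < 0 or df2 <= 0:
        raise ValueError("f_value must be >= 0 and df2 must be > 0")
    return (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)


# Filling-machine example: F = 3.835 with df1 = 2 and df2 = 21.
p_value = f_upper_tail_df1_2(3.835, 21)
print(round(p_value, 3))
```

For a general numerator df, a statistical package or an F table is still needed.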

Total Sum of Squares The total sum of squares, denoted by SSTo, measures the variation of all N observations about the grand mean, with associated df = N - 1. The relationship between the three sums of squares is SSTo = SSTr + SSE, which is often called the fundamental identity for single-factor ANOVA. Informally this relation is expressed as Total variation = Explained variation + Unexplained variation.
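Written out, and assuming $x_{ij}$ denotes the j-th observation in sample $i$ and $\bar{\bar{x}}$ the grand mean (symbols introduced here for illustration), the total sum of squares and the fundamental identity take the form:

```latex
\begin{align*}
\text{SSTo} &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(x_{ij} - \bar{\bar{x}}\right)^2, & \text{df} &= N - 1 \\
\text{SSTo} &= \text{SSTr} + \text{SSE}, & N - 1 &= (k - 1) + (N - k)
\end{align*}
```

For the filling-machine data, 0.003016 + 0.008256 = 0.011272, which agrees with the Minitab total of 0.011271 up to rounding.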

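Single-factor ANOVA Table The following is a fairly standard way of presenting the important calculations from a single-factor ANOVA. The output from most statistical packages will contain an additional column giving the P-value.
Source of variation   df      Sum of squares   Mean square           F
Treatments            k - 1   SSTr             MSTr = SSTr/(k - 1)   F = MSTr/MSE
Error                 N - k   SSE              MSE = SSE/(N - k)
Total                 N - 1   SSTo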

Single-factor ANOVA Table The ANOVA table supplied by Minitab
One-way ANOVA: Fills versus Machine
Analysis of Variance for Fills
Source    DF        SS        MS     F      P
Machine    2  0.003016  0.001508  3.84  0.038
Error     21  0.008256  0.000393
Total     23  0.011271

Another Example A food company produces 4 different brands of salsa. In order to determine if the four brands had the same sodium levels, 10 bottles of each Brand were randomly (and independently) obtained and the sodium content in milligrams (mg) per tablespoon serving was measured. The sample data are given on the next slide. Use the data to perform an appropriate hypothesis test at the 0.05 level of significance.

Example Treatment df = k - 1 = 4 - 1 = 3

Example Error df = N - k = 40 - 4 = 36

Example Using denominator df = 30 in the F table, the calculated F = 7.96 gives P-value < 0.001.

Example We need to learn how to interpret the results and will spend some time on developing techniques to describe the differences among the µ's.

Multiple Comparisons Specifically, if k populations or treatments are studied, we would create k(k - 1)/2 differences (e.g., with 3 treatments one would generate confidence intervals for µ1 - µ2, µ1 - µ3 and µ2 - µ3). Notice that it is only necessary to look at a confidence interval for µ1 - µ2 to see if µ1 and µ2 differ.
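The count k(k - 1)/2 can be checked by listing the pairs explicitly. This is a small sketch using the standard library; the labels A to D stand for the four salsa brands, and the helper name `pairwise_differences` is a hypothetical choice.

```python
from itertools import combinations


def pairwise_differences(labels):
    """Return every unordered pair of treatment labels (hypothetical helper)."""
    return list(combinations(labels, 2))


pairs = pairwise_differences(["A", "B", "C", "D"])
for first, second in pairs:
    print(f"mu_{first} - mu_{second}")

k = 4
print(len(pairs), k * (k - 1) // 2)  # both counts agree
```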

The Tukey-Kramer Multiple Comparison Procedure When there are k populations or treatments being compared, k(k - 1)/2 confidence intervals must be computed. If we denote the relevant Studentized range critical value by q, the intervals are as follows:
For µi - µj: $(\bar{x}_i - \bar{x}_j) \pm q\sqrt{\frac{\text{MSE}}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$
Two means are judged to differ significantly if the corresponding interval does not include zero.

The Tukey-Kramer Multiple Comparison Procedure When all of the sample sizes are the same, we denote the common sample size by n = n1 = n2 = n3 = … = nk, and the confidence intervals (for µi - µj) simplify to
$(\bar{x}_i - \bar{x}_j) \pm q\sqrt{\frac{\text{MSE}}{n}}$

Example (continued) Continuing with the example dealing with the sodium content for the four Brands of salsa, we shall compute the 95% Tukey-Kramer confidence intervals for µA - µB, µA - µC, µA - µD, µB - µC, µB - µD and µC - µD.

Example (continued) Notice that the confidence intervals for µA - µC, µB - µC and µC - µD do not contain 0, so we can infer that the mean sodium content for Brand C is different from that of Brands A, B and D.

Example (continued) We also illustrate the differences with the following listing of the sample means in increasing order, with lines underneath those blocks of means that are indistinguishable.
Brand B   Brand A   Brand D   Brand C
44.591    44.900    45.180    47.056
---------------------------
Notice that the confidence intervals for µA - µC, µB - µC, and µC - µD do not contain 0, so we can infer that the mean sodium content for Brand C and all others differ.

Minitab Output for Example
One-way ANOVA: Sodium versus Brand
Analysis of Variance for Sodium
Source  DF     SS     MS     F      P
Brand    3  36.91  12.30  7.96  0.000
Error   36  55.63   1.55
Total   39  92.54

Individual 95% CIs For Mean Based on Pooled StDev
Level     N    Mean  StDev  ------+---------+---------+---------+
Brand A  10  44.900  1.180      (-----*------)
Brand B  10  44.591  1.148   (------*-----)
Brand C  10  47.056  1.331                       (------*------)
Brand D  10  45.180  1.304       (------*-----)
                            ------+---------+---------+---------+
Pooled StDev = 1.243            44.4      45.6      46.8      48.0
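Because the raw salsa measurements are not reproduced here, the ANOVA table can only be rebuilt from the per-brand summary statistics (N, Mean, StDev) in the output above. The sketch below does that with SSTr computed from the means and SSE from the standard deviations; the function name `anova_from_summary` is hypothetical, and small discrepancies from Minitab are expected because the summary values are rounded.

```python
from math import sqrt

# (sample size, sample mean, sample standard deviation) from the Minitab output.
brands = {
    "Brand A": (10, 44.900, 1.180),
    "Brand B": (10, 44.591, 1.148),
    "Brand C": (10, 47.056, 1.331),
    "Brand D": (10, 45.180, 1.304),
}


def anova_from_summary(groups):
    """Single-factor ANOVA quantities from (n, mean, sd) triples (hypothetical helper)."""
    k = len(groups)
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total
    sstr = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    sse = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    mstr = sstr / (k - 1)
    mse = sse / (n_total - k)
    return {
        "SSTr": sstr,
        "SSE": sse,
        "MSTr": mstr,
        "MSE": mse,
        "F": mstr / mse,
        "pooled_sd": sqrt(mse),
    }


summary = anova_from_summary(list(brands.values()))
for name, value in summary.items():
    print(f"{name:9s} = {value:.3f}")
```

The results should land close to the reported SS of 36.91 and 55.63, F of 7.96 and pooled standard deviation of 1.243.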

Minitab Output for Example
Tukey's pairwise comparisons
Family error rate = 0.0500
Individual error rate = 0.0107
Critical value = 3.81
Intervals for (column level mean) - (row level mean)
           Brand A   Brand B   Brand C
Brand B     -1.189
             1.807
Brand C     -3.654    -3.963
            -0.658    -0.967
Brand D     -1.778    -2.087     0.378
             1.218     0.909     3.374
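The six intervals in this output can be reproduced from the brand means, the error mean square (55.63/36) and the critical value 3.81, all taken from the Minitab output. The sketch below assumes the usual Tukey-Kramer half-width q * sqrt((MSE/2)(1/ni + 1/nj)); the function name `tukey_kramer_intervals` is hypothetical.

```python
from itertools import combinations
from math import sqrt

means = {"Brand A": 44.900, "Brand B": 44.591, "Brand C": 47.056, "Brand D": 45.180}
sizes = {brand: 10 for brand in means}
mse = 55.63 / 36   # error SS / error df from the ANOVA table
q = 3.81           # Studentized range critical value reported by Minitab


def tukey_kramer_intervals(means, sizes, mse, q):
    """Return {(a, b): (lower, upper)} for mu_a - mu_b over all pairs."""
    intervals = {}
    for a, b in combinations(means, 2):
        half_width = q * sqrt(mse / 2 * (1 / sizes[a] + 1 / sizes[b]))
        diff = means[a] - means[b]
        intervals[(a, b)] = (diff - half_width, diff + half_width)
    return intervals


intervals = tukey_kramer_intervals(means, sizes, mse, q)
# Pairs whose interval excludes zero are judged significantly different.
significant = [pair for pair, (lo, hi) in intervals.items() if lo > 0 or hi < 0]

for (a, b), (lo, hi) in intervals.items():
    print(f"{a} - {b}: ({lo:.3f}, {hi:.3f})")
print("Significant:", significant)
```

Only the pairs involving Brand C should be flagged, which matches the conclusion drawn from the intervals.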

Simultaneous Confidence Level The Tukey-Kramer intervals are created in a manner that controls the simultaneous confidence level. For example, at the 95% level, if the procedure is used repeatedly on many different data sets, in the long run only about 5% of the time would at least one of the intervals fail to include the value it is estimating. We then speak of a family error rate of 5%, which is the maximum probability that one or more of the confidence intervals for the differences of means does not contain the true difference.
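The long-run interpretation can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not part of the original material: it draws k = 4 samples of n = 10 from the same normal population (so every true difference is 0), builds the equal-sample-size Tukey-Kramer intervals with q = 3.81 (the critical value Minitab reports for 4 treatments and 36 error df), and counts how often at least one interval misses 0.

```python
import random
from math import sqrt


def family_error_rate(k=4, n=10, q=3.81, trials=2000, seed=1):
    """Estimate how often at least one Tukey-Kramer interval excludes 0
    when all population means are equal (hypothetical simulation)."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        samples = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
        means = [sum(s) / n for s in samples]
        sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
        mse = sse / (k * n - k)
        half_width = q * sqrt(mse / n)
        # An interval excludes 0 when |difference of means| > half_width,
        # so it is enough to check the largest gap between sample means.
        if max(means) - min(means) > half_width:
            misses += 1
    return misses / trials


rate = family_error_rate()
print(f"Estimated family error rate: {rate:.3f}")
```

The estimate should fall near 0.05, with some simulation noise that shrinks as `trials` grows.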

Randomized Block Experiment

Assumptions and Hypotheses

Summary of the Randomized Block F Test Notation: Let
k = number of treatments
l = number of blocks
$\bar{b}_i$ = average of all observations in block i
$\bar{x}_i$ = average of all observations for treatment i
$\bar{\bar{x}}$ = average of all kl observations in the experiment (the grand mean)

Summary of the Randomized Block F Test Sums of squares and associated df’s are as follows.

Summary of the Randomized Block F Test SSE is obtained by subtraction through the use of the fundamental identity SSTo = SSTr + SSBl + SSE. The test is based on df1 = k - 1 and df2 = (k - 1)(l - 1).
Test statistic: F = MSTr/MSE, where MSTr = SSTr/(k - 1) and MSE = SSE/((k - 1)(l - 1))
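The randomized block calculation can be sketched in a few lines. The code below assumes the usual definitions (SSTr from the treatment averages, SSBl from the block averages, SSTo from all kl observations, SSE by subtraction). The function name and the price table are hypothetical illustration data, not the data from the slides.

```python
def randomized_block_anova(table):
    """table[i][j] is the observation for treatment i in block j."""
    k = len(table)              # number of treatments
    n_blocks = len(table[0])    # number of blocks (l in the notation)
    grand = sum(sum(row) for row in table) / (k * n_blocks)
    treatment_means = [sum(row) / n_blocks for row in table]
    block_means = [sum(row[j] for row in table) / k for j in range(n_blocks)]

    sstr = n_blocks * sum((m - grand) ** 2 for m in treatment_means)
    ssbl = k * sum((b - grand) ** 2 for b in block_means)
    ssto = sum((x - grand) ** 2 for row in table for x in row)
    sse = ssto - sstr - ssbl    # fundamental identity solved for SSE

    df1, df2 = k - 1, (k - 1) * (n_blocks - 1)
    f_stat = (sstr / df1) / (sse / df2)
    return {"SSTr": sstr, "SSBl": ssbl, "SSE": sse, "SSTo": ssto,
            "df1": df1, "df2": df2, "F": f_stat}


# Hypothetical prices: 3 treatments (rows) observed in 4 blocks (columns).
prices = [
    [3.0, 2.5, 4.0, 1.5],
    [3.4, 2.8, 4.5, 1.7],
    [2.7, 2.4, 3.6, 1.4],
]
block_result = randomized_block_anova(prices)
for name, value in block_result.items():
    print(name, round(value, 4))
```

Blocking removes the block-to-block variation (SSBl) from the error term, which is why SSE here is smaller than it would be in a single-factor analysis of the same numbers.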

The ANOVA Table for a Randomized Block Experiment

Multiple Comparisons As before, in single-factor ANOVA, once H0 has been rejected, declare that treatments i and j differ significantly if the interval
$(\bar{x}_i - \bar{x}_j) \pm q\sqrt{\frac{\text{MSE}}{l}}$
does not include zero, where q is based on a comparison of k treatments and error df = (k - 1)(l - 1).

Example (Food Prices)
H0: µA = µB = µC
Ha: At least two among µA, µB and µC are different

Conclusions We therefore conclude that Store A is cheaper on the average than Store B and Store C.
Store C   Store B   Store A
$2.24     $2.81     $3.20

Two-Factor ANOVA Notation:
k = number of levels of factor A
l = number of levels of factor B
kl = number of treatments (each one a combination of a factor A level and a factor B level)
m = number of observations on each treatment

Two-Factor ANOVA Example A grocery store has two stocking supervisors, Fred and Wilma. The store is open 24 hours a day and would like to schedule these two individuals in the most effective manner. To help determine how to schedule them, a sample of their work was obtained: each of them was scheduled 5 times on each of the three shifts, and the number of cases of groceries that were emptied and stacked during the shift was tracked. The data follow on the next slide.

Interactions There is said to be an interaction between the factors if the change in true average response when the level of one factor changes depends on the level of the other factor. One can look for a possible interaction between two factors by drawing an interaction plot, which is a graph of the means of the response for one factor plotted against the levels of the other factor.

Two-Factor ANOVA Example A table of the sample means for the 30 observations.

Two-Factor ANOVA Example Typically, only one of these interaction plots will be constructed. As you can see from these diagrams, there is a suggestion that Fred does better during the day and Wilma is better at night or during the swing shift. The question to ask is “Are these differences significant?” Specifically, is there an interaction between the supervisor and the shift?

Basic Assumptions for Two-Factor ANOVA The observations on any particular treatment are independently selected from a normal distribution with variance σ² (the same variance for each treatment), and samples from different treatments are independent of one another.

Two-Factor ANOVA Table The following is a fairly standard way of presenting the important calculations for a two-factor ANOVA. The fundamental identity is SSTo = SSA + SSB + SSAB + SSE

Two-Factor ANOVA Example Minitab output for the Two-Factor ANOVA
Two-way ANOVA: Cases versus Shift, Supervisor
Analysis of Variance for Cases
Source        DF     SS    MS     F      P
Shift          2   5437  2719  1.82  0.184
Supervis       1   7584  7584  5.07  0.034
Interaction    2  14365  7183  4.80  0.018
Error         24  35878  1495
Total         29  63265
1. Test of H0: no interaction between Supervisor and Shift. There is evidence of an interaction (F = 4.80, P-value = 0.018).
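Each F ratio in this table is the source's mean square divided by the error mean square, and the four sums of squares add up to the total. The short check below uses the DF and SS columns exactly as reported; the dictionary layout and variable names are illustrative choices.

```python
# (df, sum of squares) for each source, copied from the Minitab output.
table = {
    "Shift": (2, 5437),
    "Supervisor": (1, 7584),
    "Interaction": (2, 14365),
    "Error": (24, 35878),
}

error_df, error_ss = table["Error"]
mse = error_ss / error_df

f_ratios = {}
for source, (df, ss) in table.items():
    if source == "Error":
        continue
    mean_square = ss / df
    f_ratios[source] = mean_square / mse   # F = MS(source) / MSE
    print(f"{source:12s} MS = {mean_square:8.1f}  F = {f_ratios[source]:.2f}")

# Fundamental identity: SSTo = SSA + SSB + SSAB + SSE (up to rounding in the output).
total_ss = sum(ss for _, ss in table.values())
total_df = sum(df for df, _ in table.values())
print("Total SS:", total_ss, " Total df:", total_df)
```

The reported total of 63265 differs from the sum of the rounded components by 1, which is rounding in the printed output.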
