Stats chapter 14

Chapter 14 Inference for Distributions of Categorical Variables: Chi-Square Procedures

The problem
Suppose we open a bag of M&M’s and count the number of M&M’s of each color. How would we know if our color counts are at normal levels? How would we know if our color counts are abnormal?

Chi-Square Distribution
When we want to test the proportions of many counts (i.e. a two-way table or an array), we need to use a new distribution: the chi-square distribution (chi = χ, pronounced “kai”). As you might suspect, this is another (the last of the year) PHANTOMS procedure. The χ² distribution is found in Table D and in the [2nd] -> [VARS] (DIST) menu on your calculator.

The χ² distribution
Like the t distribution, the χ² distribution is a family of curves: its shape depends on the degrees of freedom (df). It is single-peaked and right-skewed. As df increases, the peak gets lower and moves to the right, and the curve becomes more symmetric (closer to Normal). As df increases, the χ² statistic needed for a statistically significant result also increases.
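The last point can be checked numerically. The sketch below approximates the χ² value that cuts off the upper 5% of the distribution for several df. It is a minimal pure-Python illustration rather than the Table D lookup itself; the helper names `chi2_sf` and `chi2_critical` are hypothetical, and the tail area is computed with a standard series for the incomplete gamma function.

```python
import math

def chi2_sf(x, df):
    """Upper-tail area P(chi-square(df) > x), using a series for the incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower_tail = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower_tail

def chi2_critical(alpha, df):
    """Find x with P(chi-square(df) > x) = alpha by bisection (hypothetical helper)."""
    lo, hi = 0.0, 200.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid   # tail area still too large: move right
        else:
            hi = mid   # tail area too small: move left
    return (lo + hi) / 2.0

# Critical values at the 5% level grow as df grows.
for df in (1, 2, 5, 10):
    print(df, round(chi2_critical(0.05, df), 2))
```

The printed values should sit close to the 0.05 column of a χ² table (about 3.84 for df = 1 and about 11.07 for df = 5).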

Chi-Square Goodness of Fit
When we want to check whether a distribution fits a hypothesized distribution, we use the “χ² goodness of fit test”. This procedure is frequently used to see if a distribution is not in equal proportions. No, this will not be much different from what we have already been doing for the last 3 chapters.

χ² GOF Test: Parameter
Unlike previous tests, you will not need to state a μ or a p. You need to state where the distribution comes from.
EX: We are investigating the proportions of colors in all 15 oz. bags of chocolate M&M’s.

χ² GOF Test: Hypotheses
There are two styles for stating hypotheses.
Style 1: In this style, you refer to a written table, or state that all proportions are “equal”.
H0: the proportions of M&M’s colors are the same as in the table provided
Ha: at least one color proportion is different from the table
H0: the proportions of accidents for each day are equal
Ha: at least one day has a proportion that is not equal

χ² GOF Test: Hypotheses (cont.)
Style 2: In this style, you write out the expected proportions.
H0: p_red = p_blue = p_yel = p_brn = p_grn = p_org = 1/6
Ha: at least one proportion is different from the value stated above.

χ² GOF Test: Hypotheses (cont.)
Notice that the alternative hypothesis in each case is that at least one proportion is different from what was hypothesized.

χ² GOF Test: Assumptions
1. All expected cell counts are greater than 1
2. No more than 20% of the expected cell counts are less than 5
(that’s a whole lot easier, yeah?)
Name of the Test: “χ² Goodness of Fit Test”
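The two conditions can be checked mechanically once the expected counts are known. The following is a minimal sketch; the function name `check_expected_counts` is hypothetical, and the example assumes a bag of 31 candies with six equally likely colors.

```python
def check_expected_counts(expected):
    """Return (all counts > 1, no more than 20% of counts < 5) for a list of expected counts."""
    all_above_one = all(e > 1 for e in expected)
    share_below_five = sum(1 for e in expected if e < 5) / len(expected)
    return all_above_one, share_below_five <= 0.20

# Six equally likely colors in a bag of 31 candies: each expected count is 31/6.
expected = [31 / 6] * 6
print(check_expected_counts(expected))
```

Both entries come back `True` here because every expected count is about 5.17.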

χ² GOF Test: Test Statistic
Observed Count (O) is the count that we observed for each cell. The sum of the observed counts is n.
Expected Count (E) is the hypothesized proportion for each cell times the sample size n.

χ² GOF Test: Test Statistic (cont.)
If we opened up a bag of M&M’s and found the following counts:

      Red    Blue   Brwn   Yel    Grn    Orng
O:    5      3      10     6      4      3        n = 31
E:    5.17   5.17   5.17   5.17   5.17   5.17

Note: expected counts are all equal to 31/6.
We are testing to see if M&M’s come in equal proportions.

χ² GOF Test: Test Statistic (cont.)
The test statistic is χ² (“kai squared”): χ² = Σ (O − E)² / E, summed over all cells.
Degrees of freedom (df) = # of classes − 1

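χ² GOF Test: Test Statistic (cont.)
For the bag above, with E = 31/6 ≈ 5.17 in every cell:
χ² = (5 − 5.17)²/5.17 + (3 − 5.17)²/5.17 + (10 − 5.17)²/5.17 + (6 − 5.17)²/5.17 + (4 − 5.17)²/5.17 + (3 − 5.17)²/5.17 ≈ 6.74
df = 6 − 1 = 5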
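The calculation for the M&M’s bag can be sketched in a few lines of Python. This is an illustration using the observed counts from the slides and unrounded expected counts of 31/6, so the result may differ in the third decimal place from a hand calculation that uses 5.17.

```python
observed = [5, 3, 10, 6, 4, 3]            # red, blue, brown, yellow, green, orange
hypothesized = [1 / 6] * 6                # H0: all six colors are equally likely

n = sum(observed)                          # sample size
expected = [n * p for p in hypothesized]   # each expected count is n/6

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                     # number of classes minus 1

print(round(chi_square, 2), df)
```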

χ² GOF Test: P Value
p-value = P(χ²(df) > test statistic)
On the calculator: [2nd] -> [VARS] (DIST) -> χ²cdf
Usage: χ²cdf(lower, upper, df)
p-value = P(χ²(5) > 6.739) = 0.2409
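For readers without the calculator at hand, the same upper-tail area can be approximated in pure Python. This is a sketch, not the calculator's algorithm: `chi2_sf` is a hypothetical helper that evaluates the tail area with a standard series for the incomplete gamma function.

```python
import math

def chi2_sf(x, df):
    """Upper-tail area P(chi-square(df) > x), using a series for the incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower_tail = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower_tail

# Test statistic and df from the M&M's example.
p_value = chi2_sf(6.739, 5)
print(round(p_value, 3))
```

The result should be about 0.24, in line with the calculator value quoted above.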

χ² GOF Test: Decision
As with the other tests, reject the null hypothesis when the p-value is below the accepted alpha level.
Summary
Use the same 3-part summary:
1) Interpret the p-value with respect to the sampling distribution
2) Make a decision with reference to an alpha level
3) Summarize the results in the context of the problem

χ² GOF Test: Summary (cont.)
“If the colors really were in equal proportions, counts that differ from the expected counts at least this much would appear in approximately 24% of all random samples of 31.”
“Because this p-value is greater than any acceptable alpha level, we fail to reject the null hypothesis.”
“We do not have sufficient evidence to conclude that the colors of M&M’s are not equally distributed.”

Calculator methods: TI83/84
Begin by storing the observed counts in L1.
Store the expected counts in L2.
From the Home Screen evaluate: sum((L1 − L2)²/L2). This is the value of χ².
Use χ²cdf from the DIST menu to find the p-value: χ²cdf(lower, upper, df)

14.2 Inference For Two-Way Tables

Comparing two groups

Note that the information is presented in a two-way table with marginal distributions.

Is there a relationship between these two categorical variables?

Expected cell count for 2-way tables
Start from the % of the population that is in each column.
The expected count of a cell is the count it would have if the rows “obeyed” the column percentages.
Even for a small table, these calculations get cumbersome.

Expected Counts
Expected = (Row total × Column total) / Total
For the cell with observed count 30, row total 99, column total 84 and table total 243:
Expected = (99 × 84) / 243 = 34.22
Let’s start with the PHANTOMS procedure
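Because these calculations get cumbersome by hand, a short sketch may help. The first function applies the row total × column total / total rule to the single cell shown in the slides; the second applies it to a whole table. The function names and the small 2 × 2 table are hypothetical and are used only for illustration.

```python
def expected_count(row_total, column_total, table_total):
    """Expected count for one cell of a two-way table."""
    return row_total * column_total / table_total

def expected_table(observed):
    """Expected counts for every cell, given a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    table_total = sum(row_totals)
    return [[expected_count(r, c, table_total) for c in column_totals]
            for r in row_totals]

# The cell from the slides: row total 99, column total 84, table total 243.
print(round(expected_count(99, 84, 243), 2))

# A made-up 2 x 2 table, only to show the whole-table version.
print(expected_table([[10, 20], [30, 40]]))
```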

χ² Test for Homogeneity: Parameter
State where each proportion comes from and what each count represents.
“We are investigating the proportions of customers in the store who purchase French, Italian or other wine while listening to French, Italian or other music.”

χ² Test for Homogeneity: Hypotheses
The null hypothesis is always “the distributions of (group A) are the same in all populations of (group B)”.
The alternative hypothesis is always “the distributions of (group A) are not all the same”.
“H0: the distributions of wine types are the same in all populations of music types
Ha: the distributions of wine types are not all the same”

χ² Test for Homogeneity: Assumptions
(1) No more than 20% of the expected cell counts are less than 5
(2) All expected cell counts are > 1
(3) In a 2 x 2 table, all expected counts are greater than 5

χ² Test for Homogeneity
“All expected cell counts are greater than 5”

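χ² Test for Homogeneity: Test Statistic
χ² = Σ (O − E)² / E, summed over every cell of the two-way table, with df = (rows − 1) × (columns − 1)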

χ² Test for Homogeneity: P Value, Decision
Reject the null hypothesis

χ² Test for Homogeneity: Summary
If the distributions of wine types really were the same for all music types, a random sample of 243 would produce a table at least this far from the expected counts only about 0.1% of the time. Because the p-value is less than an α of 0.05, we reject the null hypothesis. We have sufficient evidence at the 5% significance level to conclude that the distribution of wine types purchased is not the same for all music types.
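The whole two-way procedure (expected counts, test statistic, degrees of freedom, p-value) can be sketched end to end. The table below is made up for illustration and is not the wine and music data; `chi2_sf` and `chi_square_two_way` are hypothetical helper names, and df = (rows − 1) × (columns − 1) is the usual rule for two-way tables.

```python
import math

def chi2_sf(x, df):
    """Upper-tail area P(chi-square(df) > x), using a series for the incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return 1.0 - total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi_square_two_way(observed):
    """Return (statistic, df, p-value) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    table_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * column_totals[j] / table_total  # expected count
            statistic += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df, chi2_sf(statistic, df)

# Hypothetical 2 x 3 table (two groups, three categories).
stat, df, p_value = chi_square_two_way([[20, 30, 50], [30, 30, 40]])
print(round(stat, 3), df, round(p_value, 3))
```

With a p-value this large the made-up table would not lead to rejecting H0, which contrasts with the wine and music example above.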

Calculator Methods: Methods on the TI84
Before you begin the test, you must enter the “observed counts” into MATRIX [A]:
[2ND] -> [x⁻¹] (MATRIX) -> “EDIT” -> [1]
Input the correct matrix size and cell counts. (Use [ENTER] or the cursor keys to switch between fields.)

Calculator Methods: Methods on the TI84 (cont.)
IMPORTANT: after inputting the observed matrix, quit and go to the home screen.
[STAT] -> “TESTS” -> “χ²-Test”
Ensure that “Observed” is set to [A] and “Expected” is set to [B], then select “Calculate”.
The expected cell counts will be calculated and stored in Matrix [B] (go back to the Matrix menu to see the expected counts).

χ² Tests
Occasionally, you will be asked to find the cell that “contributed the most to the χ² statistic.” When this is asked, you must calculate the χ² statistic by hand and find the largest value of (O − E)² / E. This is usually the cell that differs the most from the expected count. Because each term is divided by its expected count, the cell with the largest raw difference is not always the largest contributor.
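A short sketch of finding the largest contributor follows. The table is hypothetical and chosen so that every cell differs from its expected count by the same amount, which shows why the cell with the smallest expected count can contribute the most; the function name `contributions` is also an assumption.

```python
def contributions(observed):
    """Return a table of (O - E)^2 / E values for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    table_total = sum(row_totals)
    result = []
    for i, row in enumerate(observed):
        result_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * column_totals[j] / table_total  # expected count
            result_row.append((o - e) ** 2 / e)
        result.append(result_row)
    return result

# Hypothetical 2 x 2 table: every cell is off by the same amount from its expected count.
observed = [[60, 40], [5, 15]]
contrib = contributions(observed)

# Locate the (row, column) index of the largest contribution.
cells = [(i, j) for i in range(len(contrib)) for j in range(len(contrib[0]))]
largest = max(cells, key=lambda ij: contrib[ij[0]][ij[1]])
print(largest, round(contrib[largest[0]][largest[1]], 2))
```

Here the largest contribution comes from the bottom-right cell, the one with the smallest expected count.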

χ² Test for Independence
A similar test for two-way tables is the “χ² Test for Independence”, sometimes called the “χ² Test for Association”. This test asks the question, “do the two variables influence each other?” When there is no association, the observed two-way table is close to the expected table.

χ² Test for Independence
This test really only differs from the test for homogeneity in the hypotheses and the conclusion.
Hypotheses
The null hypothesis is “there is no association between (group 1) and (group 2)”.
The alternative hypothesis is “there is an association between (group 1) and (group 2)”.

χ² Test for Independence: Conclusion
Phrase your conclusions similarly to the ones we have been constructing.
When failing to reject H0: after interpreting the p-value and comparing it to alpha, state that there is “no evidence to conclude that an association exists between (group 1) and (group 2)”.
Likewise, when rejecting H0, state that “there is sufficient evidence to conclude that an association exists between (group 1) and (group 2)”.

Assignment 14.2 Page 877 #29, 31, 32, 33
