2. Focus Fox
1. Using six words, describe your spring break:
2. How will you select the individuals in your sample for your
significance test?
3. In general, what is the claim you wish to test?
3. Inference -Dist. by Categories
By the end of this chapter, we will be able to answer questions
like the following:
- Are births evenly distributed across the days of the week?
- Does background music influence customer purchases?
- Is there an association between anger level and heart disease?
- Is the distribution of Skittles colors in each package true to
the expected distribution produced at the factory?
4. What’s in Your Package??
Assuming the company’s claim is true, we would expect 24% of
the M&M’s to be blue.
If you had 60 M&M’s in your package, how many should be
blue??
Compute the expected counts for your bag and record your
results in the “Expected” column of your table.
Check that the sum of the expected counts equals the number of
M&M’s in your package.
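The expected-count step can be sketched in a few lines of Python. This is only an illustration: the 24% figure for blue and the bag size of 60 come from this slide, the other five proportions are the ones listed with the hypotheses later in the deck, and the variable names are made up for the example.

```python
# Claimed color proportions: blue is stated on this slide; the rest are
# taken from the hypotheses listed later in the deck.
claimed = {
    "blue": 0.24, "orange": 0.20, "green": 0.16,
    "yellow": 0.14, "red": 0.13, "brown": 0.13,
}

n = 60  # number of M&M's in the bag (the example size used on this slide)

# Expected count for each color = bag size * claimed proportion.
expected = {color: n * p for color, p in claimed.items()}

for color, count in expected.items():
    print(f"{color:>6}: {count:.1f}")

# The expected counts should add back up to the bag size.
print(f" total: {sum(expected.values()):.1f}")
```

Note that expected counts do not have to be whole numbers; they are long-run averages, not counts from an actual bag.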
5. What’s in Your Package??
How close are your observed counts to the expected counts?
Calculate the difference for each color and record in your table:
Observed – Expected
*What do you notice about ∑ (Observed – Expected)?
Since the sum of the differences is always 0, it does not help us
determine how far off your package is from the claim.
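A quick check of why the differences cancel: the observed counts and the expected counts both add up to the bag size, so their differences must add up to zero. The observed counts below are a hypothetical bag invented for this sketch, not data from the lesson; the proportions are the company's claimed ones listed with the hypotheses later in the deck.

```python
claimed = {
    "blue": 0.24, "orange": 0.20, "green": 0.16,
    "yellow": 0.14, "red": 0.13, "brown": 0.13,
}

# Hypothetical observed counts for one bag (made up for illustration).
observed = {
    "blue": 9, "orange": 8, "green": 12,
    "yellow": 15, "red": 10, "brown": 6,
}

n = sum(observed.values())  # bag size

# Observed - Expected for each color.
differences = {c: observed[c] - n * claimed[c] for c in claimed}

for color, diff in differences.items():
    print(f"{color:>6}: {diff:+.1f}")

# Positive and negative differences cancel (up to floating-point rounding).
total = sum(differences.values())
print("sum is zero:", abs(total) < 1e-9)
```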
6. What’s in Your Package??
So square all the differences (remember variance and standard deviation).
Compute the squared differences between the observed and
expected counts and find the sum.
Compare your results with your peers.
7. What’s in Your Package??
The last column has you divide the squared difference by the expected
count for each color. This measures how far each observed count is
from its expected count, relative to the expected count.
The sum of this column is called a chi-square statistic, denoted by
χ² (similar to our “z-score”).
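The table columns built up over the last few slides amount to the sum of (Observed – Expected)² / Expected over all colors. A minimal sketch of that calculation follows; the function name is made up for the example, the observed counts are a hypothetical bag, and the proportions are the company's claimed ones listed with the hypotheses later in the deck.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Order: blue, orange, green, yellow, red, brown.
claimed = [0.24, 0.20, 0.16, 0.14, 0.13, 0.13]
observed = [9, 8, 12, 15, 10, 6]  # hypothetical bag, made up for illustration

n = sum(observed)
expected = [n * p for p in claimed]

# Each color's contribution corresponds to one entry in the last column.
components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = chi_square_statistic(observed, expected)

for name, part in zip(["blue", "orange", "green", "yellow", "red", "brown"], components):
    print(f"{name:>6}: {part:.3f}")
print(f"chi-square: {chi2:.3f}")
```

Printing the individual components makes it easy to see which color contributes most to the statistic.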
8. What’s in Your Package??
If your sample reflects the claim
- Your observed should be close to your expected
- Your values making up χ² should be very small
Are the entries in the last column all similar or does one stand out as
much larger or much smaller than the others? Did you get way more
or way less than expected of one color?
Compare your χ² with your peers’ χ² answers.
Does anyone have a χ² that provides convincing evidence against the
company’s claim?
9. Inference -Dist. by Categories
We could run a one-sample test on each color, but…
- That is inefficient, and the separate tests could give conflicting results
- That wouldn’t tell us how likely it is to get a random sample of
60 candies whose color distribution differs as much from the one
claimed by the company as ours does, taking all colors at one time
Need a new test:
chi-square goodness-of-fit test
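One informal way to explore the "how likely" question in the second bullet is by simulation: draw many random bags of 60 candies assuming the company's claim is true, compute the chi-square statistic for each, and see how often it is at least as large as the one from our bag. This is a rough sketch, not the formal test; the observed counts, function names, trial count, and seed are all assumptions made for the example.

```python
import random


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate_statistics(proportions, n, trials, seed=0):
    """Chi-square statistics for `trials` random bags drawn under the claim."""
    rng = random.Random(seed)
    categories = list(range(len(proportions)))
    expected = [n * p for p in proportions]
    stats = []
    for _ in range(trials):
        # Draw n candies, each color chosen with its claimed probability.
        draws = rng.choices(categories, weights=proportions, k=n)
        counts = [draws.count(c) for c in categories]
        stats.append(chi_square_statistic(counts, expected))
    return stats


# Order: blue, orange, green, yellow, red, brown (the company's claim).
claimed = [0.24, 0.20, 0.16, 0.14, 0.13, 0.13]
observed = [9, 8, 12, 15, 10, 6]  # hypothetical bag, made up for illustration

expected = [sum(observed) * p for p in claimed]
observed_stat = chi_square_statistic(observed, expected)

stats = simulate_statistics(claimed, n=60, trials=2000)

# Share of simulated bags at least as far from the claim as ours.
share = sum(s >= observed_stat for s in stats) / len(stats)
print(f"observed chi-square: {observed_stat:.2f}")
print(f"share of simulated bags at least this extreme: {share:.3f}")
```

A small share would suggest that a bag like ours is unusual if the claim is true; the chi-square goodness-of-fit test answers the same question without simulation.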
10. Inference -Dist. by Categories
Null hypothesis in chi-square goodness-of-fit test:
- States the claim about the distribution of a single categorical
variable in the population of interest
H0: The company’s stated color distribution for M&M’s
Milk Chocolate Candies is correct
Alternative hypothesis in a chi-square goodness-of-fit test:
- States the categorical variable does not have the specified
distribution
Ha: The company’s stated color distribution for M&M’s
Milk Chocolate Candies is not correct
11. Inference -Dist. by Categories
H0: The company’s stated color distribution for M&M’s
Milk Chocolate Candies is correct
Ha: The company’s stated color distribution for M&M’s
Milk Chocolate Candies is not correct
Hypotheses can also be written as:
H0: p_blue = 0.24, p_orange = 0.20, p_green = 0.16, p_yellow = 0.14,
p_red = 0.13, p_brown = 0.13
Ha: at least one of the p_i’s is incorrect
12. Inference -Dist. by Categories
Caution:
DON’T state the alternative in a way that suggests that all the
proportions in the hypothesized distribution are wrong:
Ha: p_blue ≠ 0.24, p_orange ≠ 0.20, p_green ≠ 0.16, p_yellow ≠ 0.14,
p_red ≠ 0.13, p_brown ≠ 0.13
Goal: to compare observed counts with expected counts IF null is
true
The more the observed counts differ from the expected counts, the
more evidence we have against the null