BUSINESS STATISTICS I
Exercises, Weeks 36 – 50
Antonio Rivero Ostoic
School of Business and Social Sciences
September – December
AARHUS UNIVERSITY
Example
Using X
A company’s vending machines consume on average 460 kwh of
electricity with a standard deviation of 5 kwh
• What is the probability that a vending machine in a given
location consumes less than 470 kwh?
$$P(X < 470) = P\left(\frac{X-\mu}{\sigma} < \frac{470-460}{5}\right) = P(Z < 2) = 0.9772 \approx 98\%$$
• ...and the probability for using more than 470 kwh?
$$P(X > 470) = P\left(\frac{X-\mu}{\sigma} > \frac{470-460}{5}\right) = P(Z > 2) = 1 - P(Z < 2) = 1 - .9772 = 0.0228 \approx 2\%$$
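The two normal probabilities above can be checked numerically. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative and not part of the slides.

```python
from statistics import NormalDist

# Consumption model from the slide: X ~ N(mu = 460, sigma = 5), in kwh.
consumption = NormalDist(mu=460, sigma=5)

z = (470 - 460) / 5              # standardized value of 470 kwh
p_less = consumption.cdf(470)    # P(X < 470)
p_more = 1 - p_less              # P(X > 470), by the complement rule

print(f"z = {z:.2f}")
print(f"P(X < 470) = {p_less:.4f}")
print(f"P(X > 470) = {p_more:.4f}")
```

Working with the unstandardized distribution and with the z-score gives the same probability; the z-score is only needed when reading a printed table.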
Example II
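Using X̄
• What is the probability that 3 vending machines consume on average less than 465 kwh?
→ i.e. $P(\bar{X} < 465)$
Since we assume that X is normally distributed, the standard error of the mean must take the sample size into account
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{3}} = 2.89$$
$$P(\bar{X} < 465) = P\left(\frac{\bar{X}-\mu_{\bar{x}}}{\sigma_{\bar{x}}} < \frac{465-460}{2.89}\right) = P(Z < 1.73) = .9582 \approx 96\%$$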
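A short sketch of the same calculation for the sample mean, assuming the standard-library `NormalDist` class; the only change from the single-machine case is that the standard deviation is replaced by the standard error. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 460, 5, 3

# Standard error of the mean: sigma / sqrt(n)
std_error = sigma / sqrt(n)

# Sampling distribution of the mean: N(mu, sigma / sqrt(n))
sample_mean = NormalDist(mu, std_error)
p_mean_below_465 = sample_mean.cdf(465)   # P(sample mean < 465)

print(f"standard error = {std_error:.2f}")
print(f"P(mean < 465)  = {p_mean_below_465:.4f}")
```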
Inference of the sample mean
Example
A sample distribution with n = 3 tells us that a vending machine
consumes on average 470 kwh with σ = 5 kwh
Then we can compute the 95% probability that the mean is located
in a certain range from the sample mean
Since $z_{.025} = 1.96$, then $P(-1.96 < Z < 1.96) = .95$
By multiplying all terms in the probability statement by $\sigma/\sqrt{n}$ and then adding $\mu$ we get:
$$P\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = .95$$
$$P\left(470 - 1.96\frac{5}{\sqrt{3}} < \bar{X} < 470 + 1.96\frac{5}{\sqrt{3}}\right) = .95$$
$$P(464.3 < \bar{X} < 475.7) = .95$$
→ hence the sample mean will fall between 464.3 and 475.7 with 95% probability, which means that the computed sample mean is supported by the sample statistic
Example
finding P̂
Last year 30% of the schools in town installed our vending machine cooler, and we want to see whether or not a given proportion of schools will continue using our machine next year
• If we take a random sample of 25 schools, what is the probability that more than 35% of the sampled schools will choose our machine?
Since each outcome is just a success or a failure, we have a binomial experiment with p = .30 and n = 25
We want to find $P(\hat{P} > .35)$
$$\sigma_{\hat{p}} = \sqrt{p(1-p)/n} = \sqrt{(.30)(.70)/25} = .0917$$
$$P(\hat{P} > .35) = P\left(\frac{\hat{P}-p}{\sqrt{p(1-p)/n}} > \frac{.35-.30}{.0917}\right) = P(Z > .545) = 1 - P(Z < .545) \approx 1 - .705 = .295 \approx 30\%$$
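The normal approximation for the sample proportion can be sketched as follows. This is only an illustration under the slide's assumptions (p = .30, n = 25); the names are hypothetical and the small difference from the table value comes from not rounding z.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.30, 25

# Standard error of the sample proportion: sqrt(p(1 - p) / n)
se_p_hat = sqrt(p * (1 - p) / n)

z = (0.35 - p) / se_p_hat
p_above_35 = 1 - NormalDist().cdf(z)   # upper-tail probability P(P-hat > .35)

print(f"standard error  = {se_p_hat:.4f}")
print(f"z               = {z:.3f}")
print(f"P(P-hat > 0.35) = {p_above_35:.3f}")
```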
Example X̄1 − X̄2
The electricity consumption of our company's vending machines is normally distributed with a mean of 460 kwh and a standard deviation of 5 kwh.
A rival company produces vending machine coolers with normally distributed electricity consumption, with 455 kwh on average and a standard deviation of 10 kwh.
• What is the probability that the average electricity consumption of our company's machines exceeds that of the rival machines if we take random samples of size 30 and 10 respectively?
i.e. $P(\bar{X}_1 - \bar{X}_2 > 0)$ with $\mu_1 - \mu_2 = 460 - 455 = 5$ and
$$\sigma_{\bar{x}_1-\bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{5^2}{30} + \frac{10^2}{10}} = \sqrt{10.833} = 3.29$$
$$P(\bar{X}_1 - \bar{X}_2 > 0) = P\left(\frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} > \frac{0-5}{3.29}\right) = P(Z > -1.52) = 1 - P(Z < -1.52) = 1 - .0643 = .9357$$
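As a rough check of the difference-of-means calculation, the sketch below recomputes the standard error and the probability with the standard library. The variable names are illustrative; index 1 is our company and index 2 the rival, as on the slide.

```python
from math import sqrt
from statistics import NormalDist

mu1, sigma1, n1 = 460, 5, 30    # our machines
mu2, sigma2, n2 = 455, 10, 10   # rival machines

# Standard error of the difference between two independent sample means
se_diff = sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# P(mean1 - mean2 > 0), standardizing with the true difference mu1 - mu2
z = (0 - (mu1 - mu2)) / se_diff
p_ours_higher = 1 - NormalDist().cdf(z)

print(f"standard error = {se_diff:.2f}")
print(f"z = {z:.2f}, P(mean1 - mean2 > 0) = {p_ours_higher:.4f}")
```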
Example finding µ with known σ
Our vending machines deliver a soft drink can a few seconds after the customer presses the button, but the competition is about to launch a new vending machine model that we suspect is faster than our product
• We need a 95% confidence interval estimate of the mean from a sample of 15 machines, and we know from technical specifications that the standard deviation in our machines is .38 seconds.
Response time in seconds to deliver the product:
2.37 1.95 1.49 2.05 2.27 2.53 1.87 2.42 1.43 2.24 2.69 2.16 1.88 1.71 2.82
→ CI estimator for µ with known σ:
$\bar{x} = 2.125 \approx 2.13$. 95% confidence level means $\alpha = .05$; $z_{\alpha/2} = z_{.025} = 1.96$
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \;\Rightarrow\; 2.125 \pm 1.96\frac{.38}{\sqrt{15}} \;\Rightarrow\; \text{'error'}: \pm 0.19$$
Thus LCL = 1.93 and UCL = 2.32, i.e. the interval [1.93; 2.32]
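The interval above can be reproduced from the raw response times. A minimal sketch with the standard library, assuming σ = .38 is known as stated on the slide; `times` and the other names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

# Response times in seconds, copied from the slide
times = [2.37, 1.95, 1.49, 2.05, 2.27, 2.53, 1.87, 2.42,
         1.43, 2.24, 2.69, 2.16, 1.88, 1.71, 2.82]
sigma = 0.38          # known population standard deviation
confidence = 0.95

x_bar = mean(times)
z_half = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}, about 1.96
margin = z_half * sigma / sqrt(len(times))                # the 'error' term

lcl, ucl = x_bar - margin, x_bar + margin
print(f"mean = {x_bar:.3f}, margin = {margin:.2f}")
print(f"95% CI: [{lcl:.2f}; {ucl:.2f}]")
```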
Hypothesis Testing about Parameters
Examples
We have seen in the examples that the mean electricity consumption of our company's vending machines is 460 kwh, or
$H_0: \mu = 460$
• Is there enough evidence that the mean parameter is not equal to this value?
$H_1: \mu \neq 460$
• What if we want to test whether there is evidence of an increase or a decrease in the average electricity consumption of the machines:
$H_1: \mu > 460$
$H_1: \mu < 460$
→ Similar statements are made for µ less/more than or equal to a certain value
Example
Testing µ when σ is known
We have µ = 460 as the average electricity consumption in kwh
for our machines with σ = 5
• Compute the rejection region for a sample mean of 465 with a random sample of size 3 and a 5% significance level
$$\frac{\bar{x}_L-\mu}{\sigma/\sqrt{n}} = z_\alpha \;\Rightarrow\; \frac{\bar{x}_L-460}{5/\sqrt{3}} = 1.645 \;\Rightarrow\; \bar{x}_L = 464.7$$
Which means that the rejection region is:
$$\bar{x} > 464.7$$
Since the sample mean of 465 is in the rejection region we reject the null hypothesis
→ We conclude that there is sufficient evidence that the average electricity consumption for our machines exceeds 460 kwh
Example
Standardized test
With a standardized test statistic we check whether
$$z > z_\alpha$$
$$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} = \frac{465-460}{5/\sqrt{3}} = 1.73$$
Because the value of z (= 1.73) is greater than the critical z-score for the chosen significance level ($z_{.05} = 1.645$), we reject the null hypothesis
→ ...and conclude once more that there is sufficient evidence that the average electricity consumption for our machines exceeds 460 kwh
→ The tests based on the statistic x̄ and on the standardized statistic z lead to identical conclusions, and hence the standardized test statistic is typically used and is simply called the test statistic
Computing the p-value
In order to compute the p-value for the example we calculate the probability of observing a sample mean at least as large as 465 given that µ = 460
That is,
$$\text{p-value} = P(\bar{X} > 465) = P\left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} > \frac{465-460}{5/\sqrt{3}}\right) = P(Z > 1.73) = 1 - P(Z < 1.73) = 1 - .9582 = 0.0418$$
As a result, the probability of observing a sample mean at least as large as 465 given that µ = 460 is about 4%
→ which is relatively small (below the 5% significance level), and we reject H0 in favor of the alternative hypothesis
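The three equivalent routes for this upper-tail z test (rejection region for the sample mean, standardized statistic, p-value) can be put side by side in a short sketch. It uses only the standard library; the names are illustrative, and the p-value differs slightly from the table-based figure because z is not rounded to 1.73.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 460, 5, 3     # null value, known sigma, sample size
x_bar, alpha = 465, 0.05      # observed sample mean, significance level

std_normal = NormalDist()
se = sigma / sqrt(n)
z_alpha = std_normal.inv_cdf(1 - alpha)    # about 1.645

# 1) Rejection region expressed in units of the sample mean
x_crit = mu0 + z_alpha * se

# 2) Standardized test statistic
z = (x_bar - mu0) / se

# 3) p-value, computed under H0 (mu = 460)
p_value = 1 - std_normal.cdf(z)

print(f"reject if mean > {x_crit:.1f}")
print(f"z = {z:.2f} vs z_alpha = {z_alpha:.3f}")
print(f"p-value = {p_value:.4f}")
print("reject H0:", x_bar > x_crit, z > z_alpha, p_value < alpha)
```

All three comparisons necessarily agree, since each is a monotone transformation of the same quantity.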
Example
computing β
In case the true population mean is 470 kwh, with σ = 5 and a sample size of 3, the value of β for a 5% significance level is:
$$\beta = P(\bar{X} < 464.7 \mid \mu = 470) = P\left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < \frac{464.7-470}{5/\sqrt{3}}\right) = P(Z < -1.84) = 0.03$$
Thus when the mean is 470, the probability of incorrectly not rejecting a false H0 is 3%
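A sketch of the Type II error calculation, assuming the same upper-tail test of H0: µ = 460 and an alternative mean of 470. Names are illustrative; the result is close to, but not exactly, the slide's rounded figure because the critical value is not rounded to 464.7.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu_alt = 460, 470        # mean under H0 and under the alternative
sigma, n, alpha = 5, 3, 0.05

se = sigma / sqrt(n)

# Boundary of the rejection region, computed under H0
x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se

# beta = P(sample mean falls in the non-rejection region | mu = mu_alt)
beta = NormalDist(mu_alt, se).cdf(x_crit)
power = 1 - beta

print(f"critical mean = {x_crit:.1f}")
print(f"beta = {beta:.3f}, power = {power:.3f}")
```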
Example t-statistic
Vending machines
A random sample of 15 of our vending machines shows that the
response time in seconds to deliver a soft drink is 2.125
seconds on average. Data cf. lecture week 41(43)
But the competition is about to launch a new vending machine model that might deliver the product faster than ours; according to their specifications it is going to take about 2 seconds.
→ Do we have enough evidence to conclude that our vending machines are still competitive?
• In this case the research hypothesis is stated in relation to the competition
$H_1: \mu > 2$
whereas the null hypothesis is
$H_0: \mu = 2$
Example t-statistic II
Since we do not have the population standard deviation, we apply the t test statistic with the usual 5% α level and check whether
$$t > t_{\alpha,\nu}$$
→ In this case we need the value of s, which is .411
• The test statistic is computed next
$$t = \frac{2.125-2}{.411/\sqrt{15}} = 1.18$$
and we determine the rejection region for n − 1 df from
$$t_{.05,14} = 1.761$$
→ Because the score of the test statistic is smaller than the critical value we do not reject the null hypothesis in favor of H1
→ There is not sufficient statistical evidence that the response time for our machines exceeds 2 seconds
Example t-statistic revisited
Vending machines
In order to be ‘competitive’ our machines should be faster
than the machines of the competition, and the research
hypothesis can be formulated as
$H_1: \mu > 1.9$
In this case the outcome of the test statistic becomes
$$t = \frac{2.125-1.9}{.411/\sqrt{15}} = 2.12$$
→ Now the test statistic score is greater than the critical value at the .05 α level, which means that we reject the null hypothesis in favor of the alternative
→ As a result there is enough statistical evidence that the response time for our machines exceeds 1.9 seconds
Example t-statistic revisited II
Vending machines
For a t score of 2.12 the p-value is bracketed with the table values $t_{.05,14} = 1.761$ and $t_{.025,14} = 2.145$
$$1.761 < 2.12 < 2.145$$
which means that
$$.05 > P(t > 2.12) > .025$$
→ Thus the p-value lies between 2.5% and 5%, which is under the significance level, and this means that H0 is rejected in favor of the alternative
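The two t tests above can be recomputed from the raw data. The sketch below is an illustration only: the Python standard library has no t distribution, so the critical values are hard-coded from the t table as quoted on the slides, and the function name `t_statistic` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Response times in seconds (sample of 15 machines, from the slides)
times = [2.37, 1.95, 1.49, 2.05, 2.27, 2.53, 1.87, 2.42,
         1.43, 2.24, 2.69, 2.16, 1.88, 1.71, 2.82]
n = len(times)
x_bar, s = mean(times), stdev(times)   # stdev uses the n - 1 denominator

T_05_14 = 1.761    # t table, alpha = .05, 14 df (quoted on the slide)
T_025_14 = 2.145   # t table, alpha = .025, 14 df (quoted on the slide)

def t_statistic(mu0: float) -> float:
    """One-sample t statistic for H0: mu = mu0."""
    return (x_bar - mu0) / (s / sqrt(n))

for mu0 in (2.0, 1.9):
    t = t_statistic(mu0)
    print(f"H0: mu = {mu0}: t = {t:.2f}, reject at 5%: {t > T_05_14}")

# Bracketing the p-value for the second test with the two table values
t = t_statistic(1.9)
print("p-value between 2.5% and 5%:", T_05_14 < t < T_025_14)
```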
Example χ² statistic
Vending machines
We have seen previously from a random sample of n = 15 that the response time of our vending machines to deliver the product is on average 2.125 seconds.
From technical specifications the standard deviation parameter is .38 seconds. cf. lecture week 41(43)
• Thus $\bar{x} = 2.125$, whereas $\sigma^2 = (\sigma)^2 = .38^2 = .1444$
→ Is there sufficient statistical evidence from the sample data to claim that the variability in response time is smaller than this theoretical value?
• Now the research hypothesis is related to the population variance
$H_1: \sigma^2 < .1444$
and the null hypothesis is (or should be)
$H_0: \sigma^2 = .1444$ ($H_0: \sigma^2 \geq .1444$)
Example χ² statistic II
test statistic
For the variance parameter we apply the χ² statistic with the standard significance level of 5%
$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$
→ And the computation requires the sample variance
$$s^2 = (s)^2 = .411^2 = .169$$
So the test statistic is
$$\chi^2 = \frac{14 \times .169}{.1444} = 16.39$$
With the lower-tail critical value of $\chi^2_{.95,14} = 6.57$
• Because the test statistic (16.39) is larger than the critical value (6.57) we are unable to reject the null hypothesis
→ There is insufficient evidence to claim that the variability in response time is smaller than the value given in the technical specifications
Example χ² statistic III
upper test
However, most of the time we want to test whether or not the lack of precision in a certain process or product exceeds a particular value specified in the standards.
In such a case the claim we want to test is whether or not the standard deviation of the vending machines exceeds .38 seconds, which means that the research hypothesis becomes
$H_1: \sigma^2 > .1444$
• Here the comparison of the test statistic with the critical value is
$$\chi^2 > \chi^2_{\alpha,\nu}$$
• And since the test statistic value 16.39 is smaller than $\chi^2_{.05,14} = 23.7$, we are unable to reject the null hypothesis
→ We do not have sufficient evidence either to claim that the variability in response time is greater than the value given in the specifications
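Both variance tests use the same statistic and differ only in the tail. A minimal sketch, with the critical values hard-coded from the χ² table as quoted on the slides (the standard library has no χ² distribution); names are illustrative. The statistic comes out marginally below 16.39 because s² is not rounded to .169.

```python
n = 15
s = 0.411          # sample standard deviation
sigma0 = 0.38      # standard deviation from the technical specifications

# Test statistic: (n - 1) s^2 / sigma0^2, with n - 1 = 14 degrees of freedom
chi2 = (n - 1) * s**2 / sigma0**2

CHI2_LOWER = 6.57   # chi-square table, .95 upper-tail area, 14 df
CHI2_UPPER = 23.7   # chi-square table, .05 upper-tail area, 14 df

reject_lower = chi2 < CHI2_LOWER   # H1: sigma^2 < .1444
reject_upper = chi2 > CHI2_UPPER   # H1: sigma^2 > .1444

print(f"chi-square = {chi2:.2f}")
print(f"lower-tail test rejects H0: {reject_lower}")
print(f"upper-tail test rejects H0: {reject_upper}")
```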
Example testing proportions
Vending machines
• The defective vending machines in our sample are the units having a response time greater than the upper 95% confidence limit
→ The value of the UCL was calculated as 2.32 seconds (cf. lecture week 41 (43)) and the defective units in the sample are marked with an asterisk
2.37* 1.95 1.49 2.05 2.27 2.53* 1.87 2.42* 1.43 2.24 2.69* 2.16 1.88 1.71 2.82*
→ This means that, out of 15 machines, 5 have been shown to be defective. Hence $\hat{p} = \frac{5}{15} = \frac{1}{3}$, or equivalently 0.333
Example testing proportions
p0
• The specifications for the vending machines, however, establish that a defective machine is a unit having a response time greater than 2.50 seconds
→ The defective machines in the sample according to the prescribed limit are marked with an asterisk
2.37 1.95 1.49 2.05 2.27 2.53* 1.87 2.42 1.43 2.24 2.69* 2.16 1.88 1.71 2.82*
And the proportion in this case is $\frac{3}{15} = 0.2$, which will represent the parameter $p_0$
→ but be aware that the proportion parameter can come from another source
Example testing proportions
hypothesis formulation
The hypotheses are formulated in order to answer a question
like:
• Should we invest in a new product delivery system in the vending
machines to fulfill the technical specifications?
Hence we want to see whether there is enough statistical evidence to claim that the proportion of defective units found in the sample data is larger than the population proportion.
→ And in this case we perform a one-sided test where the null hypothesis is that $p = p_0$, whereas the alternative hypothesis is that $p > p_0$
Example testing proportions
test statistic
The test statistic divides the departure of the sample proportion from the prescribed limit by the standard error of the proportion
$$z = \frac{\hat{P}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{0.333-0.2}{\sqrt{\frac{0.2(0.8)}{15}}} = 1.29$$
→ This outcome has to be compared with the critical value $z_{.05} = 1.645$ (cumulative probability .95 in the Z probabilities table)
• Since the value of this test statistic (1.29) is less than the critical value (1.645) we are not able to reject H0
→ There is not sufficient evidence at the 5% significance level to claim that the proportion of defective machines exceeds the population proportion, and hence no new product delivery system is needed.
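The counting of defective units and the one-sided proportion test can be sketched as follows. This is an illustration under the slides' definitions (sample proportion from the 2.32 s limit, reference proportion from the 2.50 s specification); the names are hypothetical, and the printed statistic can be compared with the hand calculation.

```python
from math import sqrt
from statistics import NormalDist

times = [2.37, 1.95, 1.49, 2.05, 2.27, 2.53, 1.87, 2.42,
         1.43, 2.24, 2.69, 2.16, 1.88, 1.71, 2.82]
n = len(times)

UCL = 2.32          # upper 95% confidence limit from the earlier slide
SPEC_LIMIT = 2.50   # limit given in the technical specifications

p_hat = sum(t > UCL for t in times) / n        # sample proportion
p0 = sum(t > SPEC_LIMIT for t in times) / n    # proportion used as parameter

# z statistic; the standard error is computed under H0 (with p0, not p_hat)
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
z_crit = NormalDist().inv_cdf(0.95)            # upper-tail test at alpha = .05

print(f"p_hat = {p_hat:.3f}, p0 = {p0:.1f}")
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {z > z_crit}")
```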
Example CI for Proportions
Vending machines
• We wish to estimate the proportion of defective vending machines, i.e. the units in our sample having a response time greater than the upper 95% confidence limit.
→ Since the UCL equals 2.32 seconds (cf. lecture week 41 (43)) the defective units in the sample are marked with an asterisk
2.37* 1.95 1.49 2.05 2.27 2.53* 1.87 2.42* 1.43 2.24 2.69* 2.16 1.88 1.71 2.82*
→ This means that, out of 15 machines, 5 have been shown to be defective.
• As a result, $\hat{p} = 5/15 = 0.333$, and the maximum error of the estimate is
$$1.96 \times \sqrt{0.333(1 - 0.333)/15} = 0.239$$
→ $0.333 \pm 0.239 \;\Rightarrow\;$ LCL: 0.094; UCL: 0.572
Example Sample size
method 2
• The sample size for estimating a proportion within a 3% error margin at the 95% confidence level is
$$n = \left(\frac{1.96 \cdot \sqrt{\hat{p}(1-\hat{p})}}{.03}\right)^2$$
→ In the case of the vending machines the point estimate is $\hat{p} = 0.333$, which means that
$$n = \left(\frac{1.96 \times \sqrt{0.333(0.667)}}{.03}\right)^2 = (30.79)^2 \approx 948$$
→ Thus we need a sample size of at least 948 units to get a 3% error estimate with the desired confidence level
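The margin of error for the proportion and the sample size needed for a 3% margin follow from the same formula, solved once for the error and once for n. A minimal sketch with illustrative names; it rounds n up to the next integer and uses unrounded inputs, so the result may differ by a unit from the hand calculation.

```python
from math import ceil, sqrt
from statistics import NormalDist

p_hat, n = 5 / 15, 15
z_half = NormalDist().inv_cdf(0.975)   # z_{alpha/2} for 95% confidence

# Margin of error of the current sample: z * sqrt(p(1 - p) / n)
margin = z_half * sqrt(p_hat * (1 - p_hat) / n)
lcl, ucl = p_hat - margin, p_hat + margin

# Sample size for a target margin: n = (z * sqrt(p(1 - p)) / B)^2
target_margin = 0.03
n_required = ceil((z_half * sqrt(p_hat * (1 - p_hat)) / target_margin) ** 2)

print(f"margin = {margin:.3f}, CI = [{lcl:.3f}; {ucl:.3f}]")
print(f"sample size for a {target_margin:.0%} margin: {n_required}")
```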
Example F-Test
→ The response time in seconds for the vending machines has been measured in a random sample of size 15 with the following statistics: mean = 2.125, SD = 0.411.
A random sample of 20 machines from the competition presumably gives a mean of 2.089 and an SD of .5
• The question is whether or not our machines are more accurate than the machines from the competition, i.e. have smaller variability
→ The parameter of interest is the ratio of two variances and we perform an F-test of equality of variances.
Since the SD (and hence the variance) is smaller in our machines than in the competition's, $\sigma_2^2$ corresponds to our machines and $\sigma_1^2$ to the competition
Stem-and-Leaf display and Box Plot of the data
OWN               COMPETITION
                  0 | 9
1 | 4             1 | 4
1 | 5799          1 | 56899
2 | 0122344       2 | 00112244
2 | 578           2 | 55889
[Box plot: response time (seconds), axis from 1 to 3, by group (own, comp)]
Example F-Test
test statistic
• The research hypothesis is $H_1: \sigma_1^2/\sigma_2^2 > 1$, i.e. that the true ratio of the variances is greater than one
• The test statistic is computed next
$$F = s_1^2/s_2^2 = \left(\frac{0.5}{0.411}\right)^2 = 1.48$$
• And for an upper one-sided test with the standard significance level the critical value is $F_{.05,19,14} = 2.4$
→ Because the test statistic (1.48) is not greater than 2.4 we fail to reject the null hypothesis
→ There is not enough statistical evidence at a 5% significance level to claim that our machines are more accurate than the machines from the competition
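The F statistic is simply the ratio of the two sample variances, with the larger one in the numerator. A short sketch, with the critical value hard-coded from the F table as quoted on the slide (the standard library has no F distribution); names are illustrative.

```python
s1, n1 = 0.5, 20      # competition: sample SD and size
s2, n2 = 0.411, 15    # our machines: sample SD and size

# Ratio of sample variances; numerator df = n1 - 1, denominator df = n2 - 1
f_stat = s1**2 / s2**2
df_num, df_den = n1 - 1, n2 - 1

F_CRIT = 2.4          # F table, alpha = .05, (19, 14) df, quoted on the slide
reject = f_stat > F_CRIT

print(f"F = {f_stat:.2f} with ({df_num}, {df_den}) df, reject H0: {reject}")
```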
Example µ1 − µ2
vending machines
→ In the F-test performed before we were unable to reject the null hypothesis that the ratio of the two variances equals 1.
• Accordingly, to test the difference between the two means, for our machines and for the competition, we assume equal variances in the populations
→ Recall that the sample statistics are $s_1 = .5$, $\bar{x}_1 = 2.089$, $n_1 = 20$; $s_2 = .411$, $\bar{x}_2 = 2.125$, $n_2 = 15$
• Now the question is whether the mean µ2 for our machines is greater than the mean µ1 of the competition
→ This is a one-sided (lower-tail) test for the difference between two means where $H_1: (\mu_1 - \mu_2) < 0$.
Example test µ1 − µ2
→ For equal variances we first compute the pooled variance estimator
$$s_p^2 = \frac{(20-1)(.5^2) + (15-1)(.411^2)}{20 + 15 - 2} = \frac{7.115}{33} = .216$$
• And the test statistic is calculated using this estimate
$$t = \frac{(2.089-2.125) - 0}{\sqrt{.216 \times \left(\frac{1}{20} + \frac{1}{15}\right)}} = -.227$$
→ To determine the critical value we compute the number of degrees of freedom $\nu = n_1 + n_2 - 2 = 33$
• The rejection region at the standard alpha value is
$$t < t_{1-\alpha,\nu} = t_{1-.05,33} = -t_{.05,33} \approx -t_{.05,35} = -1.69$$
• Because the t score is greater than the critical value we are unable to reject H0.
→ There is not sufficient evidence at the 5% α level to claim that the average response time of our machines is greater than that of the competition.
Example estimation µ1 − µ2
The estimation of the difference between two means with equal variance parameters is based on the two-sided 95% confidence interval
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\nu} \cdot \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$$(2.089 - 2.125) \pm 2.03 \cdot \sqrt{.216 \times \left(\frac{1}{20} + \frac{1}{15}\right)}$$
$$-.036 \pm .322$$
→ $(\mu_1 - \mu_2) \in [-0.36; 0.29]$
→ As 0 is within the interval, we cannot conclude that there is a significant difference between the mean response time in our machines and the competition.
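The pooled-variance test and the matching confidence interval can be recomputed from the summary statistics. A minimal sketch; the two t values are hard-coded from the t table for 35 df as quoted on the slides, and the variable names are illustrative.

```python
from math import sqrt

n1, x1, s1 = 20, 2.089, 0.5     # competition
n2, x2, s2 = 15, 2.125, 0.411   # our machines

# Pooled variance: df-weighted average of the two sample variances
df = n1 + n2 - 2
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df

# Standard error of the difference and the t statistic for H0: mu1 - mu2 = 0
se = sqrt(sp2 * (1 / n1 + 1 / n2))
t = (x1 - x2) / se

T_05 = 1.69     # t table, alpha = .05, 35 df (closest row to 33 df)
T_025 = 2.03    # t table, alpha = .025, 35 df
reject = t < -T_05              # lower-tail test, H1: mu1 - mu2 < 0

margin = T_025 * se
lcl, ucl = (x1 - x2) - margin, (x1 - x2) + margin

print(f"pooled variance = {sp2:.3f}, t = {t:.3f}, reject H0: {reject}")
print(f"95% CI for mu1 - mu2: [{lcl:.2f}; {ucl:.2f}]")
```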
p-value
Although we did not reject the null hypotheses in the examples, we can bracket the p-values from the test statistics.
• For µ1 − µ2, we find from the t table for ν ≈ 35 that the closest critical value to the t score is $t_{.10,35} = 1.306$, and
$$-1.306 < -.227$$
which means that
$$.10 < P(t < -.227)$$
→ and hence we can reasonably say that p-value > 10%
• For $\sigma_1^2/\sigma_2^2$ the F score is 1.48, which is smaller than (≈) $F_{.100,20,15}$
$$1.92 > 1.48$$
→ So the p-value in this case is more than 10% as well, because $.100 < P(F > 1.48)$ (or > 5% with $F_{.050,20,15} = 2.33 > 1.48$).
Example Test of µD
• Recall that the response time in our vending machines was measured in a random sample with n = 15, and we obtained the response time from the competition from a random sample with n = 20.
Suppose that the first 15 measurements from the competitor's sample match the observations in our sample, e.g. because the machines use the same version of the main control board.
→ The paired data with CPU v., and the differences between the samples, are
CPU v.           1      2      3      4      5      6      7      8      9     10     11     12     13     14     15
Own           2.37   1.95   1.49   2.05   2.27   2.53   1.87   2.42   1.43   2.24   2.69   2.16   1.88   1.71   2.82
Competition   1.55   2.09   2.79   1.43   1.96   2.07   2.35   1.88   2.75   1.93   2.21   2.49   1.80   1.48   2.89
Difference    0.82  -0.14  -1.30   0.62   0.31   0.46  -0.48   0.54  -1.32   0.31   0.48  -0.33   0.08   0.23  -0.07
→ The sample statistics for our machines are $\bar{x}_2 = 2.125$, $s_2 = .411$; and for the paired competitors are $\bar{x}_1 = 2.111$, $s_1 = .468$
• Due to the experimental design the parameter of interest in this case is µD, the mean of the population differences for dependent samples.
Example Test of µD II
• The question is whether there is a difference between the true mean for our machines and the mean for the competition
→ Because the sample mean in our machines is still higher than the competition's, the alternative hypothesis is formulated as $H_1: \mu_D < 0$
• Next we compute the statistics of the paired differences, $\bar{x}_D = .014$ and $s_D = .646$, and these outcomes are used in the t test statistic
$$t = \frac{0.014 - 0}{0.646/\sqrt{15}} = .084$$
• The rejection region at the standard α value is calculated as
$$t < t_{1-\alpha,\nu} = t_{1-.05,14} = -t_{.05,14} = -1.76$$
• Because the t score (.084) is greater than the critical value we are unable to reject the null hypothesis.
→ There is not sufficient evidence at the 5% α level to infer that there is a difference between the response time in our machines and in those of the competition using the same version of the control board.
Example Estimation of µD
• The estimation of the mean difference for paired data is based on a two-sided confidence interval at the standard level
$$\bar{x}_D \pm t_{\alpha/2,\nu}\frac{s_D}{\sqrt{n_D}}$$
$$0.014 \pm 2.145 \times \frac{0.646}{\sqrt{15}}$$
→ Hence the maximum error of the estimate is .358
→ LCL = −.34 and UCL = .37
→ We cannot conclude either that there is a significant difference between the mean response time in our machines and the competition using the same version of the control board.
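The paired-sample statistics can be recomputed directly from the two rows of measurements. A minimal sketch with the standard library; the differences are taken as own minus competition, as in the table, and the t value for 14 df is hard-coded from the t table quoted on the slide. Names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

own = [2.37, 1.95, 1.49, 2.05, 2.27, 2.53, 1.87, 2.42,
       1.43, 2.24, 2.69, 2.16, 1.88, 1.71, 2.82]
comp = [1.55, 2.09, 2.79, 1.43, 1.96, 2.07, 2.35, 1.88,
        2.75, 1.93, 2.21, 2.49, 1.80, 1.48, 2.89]

# Paired differences (own - competition), one per control-board pairing
diffs = [o - c for o, c in zip(own, comp)]
n = len(diffs)

d_bar, s_d = mean(diffs), stdev(diffs)
se = s_d / sqrt(n)
t = d_bar / se                      # t statistic for H0: mu_D = 0

T_025_14 = 2.145                    # t table, alpha/2 = .025, 14 df
margin = T_025_14 * se
lcl, ucl = d_bar - margin, d_bar + margin

print(f"mean diff = {d_bar:.3f}, s_D = {s_d:.3f}, t = {t:.3f}")
print(f"95% CI for mu_D: [{lcl:.2f}; {ucl:.2f}]")
```

Working on the differences reduces the two-sample problem to a one-sample t procedure, which is why only n − 1 = 14 degrees of freedom are available.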
Example test of p1 − p2
vending machines
By testing the difference between two population proportions we can compare e.g. the market for vending machines in a given town during the introductory year, for machines belonging either to our company or to the competition.
→ Assuming that 12 machines in each sample were placed in this market, the question is whether or not there is enough evidence to conclude that our machines are more popular than the competition in this town
• To be consistent with the coding of the populations the research hypothesis is that $(p_1 - p_2) < 0$, and hence the null hypothesis assumes equal proportions in the testing.
• For the test statistic we require both sample proportions and also the pooled estimate
→ $\hat{p}_1 = \frac{12}{20} = .6$, $\hat{p}_2 = \frac{12}{15} = .8$ and $\hat{p} = \frac{12 + 12}{20 + 15} = .686$
Example test of p1 − p2
test statistic
• The test statistic is computed next
$$z = \frac{(.6 - .8)}{\sqrt{(.686)(1-.686) \times \left(\frac{1}{20} + \frac{1}{15}\right)}} = -1.26$$
→ And the standard rejection region for a cumulative Z statistic is
$$z < -z_\alpha = -z_{.05} = -1.645$$
• Since the outcome of the z test statistic is greater than the critical value we do not reject H0
→ At the 5% significance level there is not sufficient evidence to conclude that our vending machines were more popular than the competition in town during the introductory year.
→ Here $P(Z < z) = .1038$ constitutes the associated p-value.
Example test of p1 − p2 Revisited
• However, if we consider only the paired sample data, in which only 7 of the 15 machines from the competition were placed in town during the introductory year ($\hat{p}_1 = 7/15 = .467$, pooled $\hat{p} = 19/30 = .633$), then for the same hypothesis testing design the test statistic becomes
$$z = \frac{(.467 - .8)}{\sqrt{(.633)(1-.633) \times \left(\frac{1}{15} + \frac{1}{15}\right)}} = -1.89$$
• Which means that $z_{obs} < -z_\alpha$ and hence we reject the null hypothesis in favor of the alternative.
→ There is enough statistical evidence at the standard significance level to conclude that our machines were more popular in town during the introductory year than the competition if we consider that they were using the same main control board (cf. test of µD)
→ The p-value is then computed as $P(Z < z) = .0294$, and since it is less than α the result is statistically significant.
→ It is also possible to test proportion parameters with a hypothesized difference other than zero, e.g. $H_0: (p_1 - p_2) = .05$
Example estimation of p1 − p2
→ To estimate the difference between the two proportions we use the standard confidence interval with a 95% level.
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
• For the example with all observations:
$$(.6 - .8) \pm 1.96 \times \sqrt{\frac{.6(1-.6)}{20} + \frac{.8(1-.8)}{15}}$$
$$-.2 \pm .3 \;\Rightarrow\; [-.5; .1]$$
• And for the revisited version with paired data:
$$(.467 - .8) \pm 1.96 \times \sqrt{\frac{.467(1-.467)}{15} + \frac{.8(1-.8)}{15}}$$
$$-.333 \pm .32 \;\Rightarrow\; [-.66; -.01]$$
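Both versions of the two-proportion example can be run through the same pair of helper functions. This is a sketch only: `two_prop_test` and `two_prop_margin` are hypothetical names, population 1 is the competition and population 2 our machines as coded on the slides, and the counts (12 of 20, 7 of 15, 12 of 15) are those stated in the text. The p-values differ slightly from the table-based figures because z is not rounded.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()   # standard normal distribution

def two_prop_test(x1, n1, x2, n2):
    """Lower-tail z test of H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, Z.cdf(z)             # statistic and lower-tail p-value

def two_prop_margin(x1, n1, x2, n2, level=0.95):
    """Margin of error of the (unpooled) interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    z_half = Z.inv_cdf(1 - (1 - level) / 2)
    return z_half * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# All observations: 12 of 20 (competition) vs 12 of 15 (ours)
z_all, p_all = two_prop_test(12, 20, 12, 15)
m_all = two_prop_margin(12, 20, 12, 15)

# Revisited, paired data only: 7 of 15 (competition) vs 12 of 15 (ours)
z_rev, p_rev = two_prop_test(7, 15, 12, 15)
m_rev = two_prop_margin(7, 15, 12, 15)

print(f"all data: z = {z_all:.2f}, p-value = {p_all:.4f}, margin = {m_all:.2f}")
print(f"revisited: z = {z_rev:.2f}, p-value = {p_rev:.4f}, margin = {m_rev:.2f}")
```

Note that the test pools the two samples for the standard error (because H0 assumes equal proportions), while the interval does not.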
Summary
Descriptive Statistics
Probability
Distributions
Statistical Inference
One-sample estimation and testing of µ, σ², p
Two-sample estimation and testing of µ1 − µ2, µD, σ1²/σ2², p1 − p2
Mythology Quiz-4th April 2024, Quiz Club NITWQuiz Club NITW
 

Dernier (20)

Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of EngineeringFaculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 
week 1 cookery 8 fourth - quarter .pptx
week 1 cookery 8  fourth  -  quarter .pptxweek 1 cookery 8  fourth  -  quarter .pptx
week 1 cookery 8 fourth - quarter .pptx
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 

Business Statistics I - Exercises

  • 1. BUSINESS STATISTICS I. Exercises, Weeks 36 – 50. Antonio Rivero Ostoic, School of Business and Social Sciences, Aarhus University. September − December.
  • 2. Example: using $X$. A company's vending machines consume on average 460 kWh of electricity with a standard deviation of 5 kWh. What is the probability that a vending machine in a given location consumes less than 470 kWh? $P(X < 470) = P\left(\frac{X-\mu}{\sigma} < \frac{470-460}{5}\right) = P(Z < 2) = 0.9772 \approx 98\%$. And the probability of using more than 470 kWh? $P(X > 470) = P\left(\frac{X-\mu}{\sigma} > \frac{470-460}{5}\right) = P(Z > 2) = 1 - P(Z < 2) = 1 - 0.9772 = 0.0228 \approx 2\%$.
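  • 3. Example II: using $\bar{X}$. What is the probability that 3 vending machines consume on average less than 465 kWh, i.e. $P(\bar{X} < 465)$? Since we assume that $X$ is normally distributed, the standard error of the mean must take the sample size into account: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{3}} = 2.89$. Then $P(\bar{X} < 465) = P\left(\frac{\bar{X}-\mu_{\bar{x}}}{\sigma_{\bar{x}}} < \frac{465-460}{2.89}\right) = P(Z < 1.73) = 0.9582 \approx 96\%$.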
  • 4. Inference of the sample mean: example. A sample with $n = 3$ tells us that a vending machine consumes on average 470 kWh, with $\sigma = 5$ kWh. We can then compute the range around this value in which the sample mean is located with 95% probability. Since $z_{.025} = 1.96$, $P(-1.96 < Z < 1.96) = .95$. Multiplying all terms in the probability statement by $\sigma/\sqrt{n}$ and adding $\mu$, we get $P\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = .95$, that is $P\left(470 - 1.96\frac{5}{\sqrt{3}} < \bar{X} < 470 + 1.96\frac{5}{\sqrt{3}}\right) = .95$, or $P(464.3 < \bar{X} < 475.7) = .95$. Hence the sample mean will fall between 464.3 and 475.7 with 95% probability, which means that the computed sample mean is supported by the sample statistic.
  • 5. Example: finding $\hat{P}$. Last year 30% of the schools in town installed our vending machine cooler, and we want to see what proportion of schools will continue using our machine next year. If we take a random sample of 25 schools, what is the probability that more than 35% of the sampled schools will choose our machine? Since each outcome is either a success or a failure, we have a binomial experiment with $p = .30$ and $n = 25$. We want to find $P(\hat{P} > .35)$. The standard error is $\sigma_{\hat{p}} = \sqrt{p(1-p)/n} = \sqrt{(.30)(.70)/25} = .0917$, so $P(\hat{P} > .35) = P\left(\frac{\hat{P}-p}{\sqrt{p(1-p)/n}} > \frac{.35-.30}{.0917}\right) = P(Z > .545) = 1 - P(Z < .545) \approx 1 - .705 = .295 \approx 30\%$.
  • 6. Example: $\bar{X}_1 - \bar{X}_2$. Our company's vending machines' electricity consumption is normally distributed with a mean of 460 kWh and a standard deviation of 5 kWh. A rival company produces vending machine coolers with normally distributed electricity consumption, 455 kWh on average with a standard deviation of 10 kWh. What is the probability that the average electricity consumption of our company's machines exceeds that of the rival machines if we take random samples of size 30 and 10 respectively? That is, $P(\bar{X}_1 - \bar{X}_2 > 0)$ with $\mu_1 - \mu_2 = 460 - 455 = 5$ and $\sigma_{\bar{x}_1-\bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{5^2}{30} + \frac{10^2}{10}} = \sqrt{10.833} = 3.29$. Then $P(\bar{X}_1 - \bar{X}_2 > 0) = P\left(\frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} > \frac{0-5}{3.29}\right) = P(Z > -1.52) = 1 - P(Z < -1.52) = 1 - .0643 = .9357$.
  • 7. Example: finding $\mu$ with known $\sigma$. Our vending machines deliver a soft drink can a few seconds after the customer presses the button, but the competition is about to launch a new vending machine model that we suspect is faster than our product. We need a 95% confidence interval estimate of the mean from a sample of 15 machines, and we know from the technical specifications that the standard deviation in our machines is .38 seconds. Response time in seconds to deliver the product: 2.37 1.95 1.49 2.05 2.27 2.53 1.87 2.42 1.43 2.24 2.69 2.16 1.88 1.71 2.82. CI estimator for $\mu$ with known $\sigma$: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$. Here $\bar{x} = 2.13$, and a 95% confidence level means $\alpha = .05$, so $z_{\alpha/2} = z_{.025} = 1.96$. This gives $2.13 \pm 1.96\frac{.38}{\sqrt{15}}$, with an 'error' of $\pm 0.19$. Thus LCL = 1.93 and UCL = 2.32, or $[1.93; 2.32]$.
  • 8. Hypothesis Testing about Parameters: examples. We have seen in the examples that the mean electricity consumption of our company's vending machines is 460 kWh, or $H_0: \mu = 460$. Is there enough evidence that the mean parameter is not equal to this value? $H_1: \mu \neq 460$. What if we want to test whether there is evidence of an increase or a decrease in the average electricity consumption of the machines? $H_1: \mu > 460$ or $H_1: \mu < 460$. Similar statements are made for $\mu$ less/more than or equal to a certain value.
  • 9. Example: testing $\mu$ when $\sigma$ is known. We have $\mu = 460$ as the average electricity consumption in kWh for our machines, with $\sigma = 5$. Compute the rejection region for a sample mean of 465 with a random sample of size 3 and a 5% significance level: $\frac{\bar{x}_L-\mu}{\sigma/\sqrt{n}} = z_\alpha$, so $\frac{\bar{x}_L-460}{5/\sqrt{3}} = 1.645$ and $\bar{x}_L = 464.7$. This means that the rejection region is $\bar{x} > 464.7$. Since the sample mean of 465 is in the rejection region, we reject the null hypothesis. We conclude that there is sufficient evidence that the average electricity consumption of our machines exceeds 460 kWh.
  • 10. Example: standardized test. With a standardized test statistic we check that $z > z_\alpha$: $z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} = \frac{465-460}{5/\sqrt{3}} = 1.73$. Because the value of $z$ (= 1.73) is greater than the z-score of the chosen significance level ($z_{.05} = 1.65$), we reject the null hypothesis and conclude once more that there is sufficient evidence that the average electricity consumption of our machines exceeds 460 kWh. The conclusions from the test statistic $\bar{x}$ and the standardized test statistic $z$ are identical; hence the standardized test statistic is typically used and is simply called the test statistic.
  • 11. Computing the p-value. To compute the p-value in the example, we calculate the probability of observing a sample mean at least as large as 465 given that $\mu = 460$. That is, p-value $= P(\bar{X} > 465) = P\left(\frac{\bar{x}-\mu}{\sigma/\sqrt{n}} > \frac{465-460}{5/\sqrt{3}}\right) = P(Z > 1.73) = 1 - P(Z < 1.73) = 1 - .9582 = 0.0418$. As a result, the probability of observing a sample mean at least as large as 465 given that $\mu = 460$ is about 4%, which is relatively small, and we reject $H_0$ in favor of the alternative hypothesis.
  • 12. Example: computing $\beta$. If the true population mean is 470 kWh with $\sigma = 5$ and a sample size of 3, the value of $\beta$ for a 5% significance level is $\beta = P\left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < \frac{464.7-470}{5/\sqrt{3}}\right) = P(Z < -1.84) = 0.03$. Thus when the mean is 470, the probability of incorrectly not rejecting a false $H_0$ is 3%.
  • 13. Example: t-statistic (vending machines). A random sample of 15 of our vending machines shows that the response time to deliver a soft drink is 2.125 seconds on average (data: cf. lecture week 41 (43)). But the competition is about to launch a new vending machine model that might deliver the product faster than ours; according to their specifications it is going to take about 2 seconds. Do we have enough evidence to conclude that our vending machines are still competitive? In this case the research hypothesis is stated in relation to the competition, $H_1: \mu > 2$, whereas the null hypothesis is $H_0: \mu = 2$.
  • 14. Example: t-statistic II. Since we do not have the population standard deviation, we apply the t test statistic with the usual 5% $\alpha$ level and check that $t > t_{\alpha,\nu}$. In this case we need the value of $s$, which is .411. The test statistic is $t = \frac{2.125-2}{.411/\sqrt{15}} = 1.18$, and the critical value for $n - 1$ df is $t_{.05,14} = 1.761$. Because the test statistic is smaller than the critical value, we do not reject the null hypothesis in favor of $H_1$. There is not sufficient statistical evidence that the response time of our machines exceeds 2 seconds.
  • 15. Example: t-statistic revisited (vending machines). In order to be 'competitive' our machines should be faster than the machines of the competition, and the research hypothesis can be formulated as $H_1: \mu > 1.9$. In this case the test statistic becomes $t = \frac{2.125-1.9}{.411/\sqrt{15}} = 2.12$. Now the test statistic is greater than the critical value at the .05 $\alpha$ level, which means that we reject the null hypothesis in favor of the alternative. As a result there is enough statistical evidence that the response time of our machines exceeds 1.9 seconds.
  • 16. Example: t-statistic revisited II (vending machines). For a t score of 2.12 the p-value is bracketed as follows: $1.761 < 2.12 < 2.145$, which means that $.05 > P(t > 2.12) > .025$. Thus the p-value lies between 2.5% and 5%, which is under the significance level, so $H_0$ is rejected in favor of the alternative.
  • 17. Example: $\chi^2$ statistic (vending machines). We have seen previously from a random sample of $n = 15$ that the response time of our vending machines to deliver the product is on average 2.125 seconds. From the technical specifications the standard deviation parameter is .38 seconds (cf. lecture week 41 (43)). Thus $\bar{x} = 2.125$, whereas $\sigma^2 = (\sigma)^2 = .38^2 = .1444$. Is there sufficient statistical evidence from the sample data to claim that the variability in response time is not larger than this theoretical value? Now the research hypothesis is related to the population variance, $H_1: \sigma^2 < .1444$, and the null hypothesis is (or should be) $H_0: \sigma^2 = .1444$ ($H_0: \sigma^2 \geq .1444$).
  • 18. Example: $\chi^2$ statistic II (test statistic). For the variance parameter we apply the $\chi^2$ statistic with the standard significance level of 5%: $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$. The computation requires the sample variance $s^2 = (s)^2 = .411^2 = .169$, so the test statistic is $\chi^2 = \frac{14 \times .169}{.1444} = 16.39$, with the lower-tail critical value $\chi^2_{.95,14} = 6.57$. Because the test statistic (16.39) is larger than the critical value (6.57), we are unable to reject the null hypothesis. There is insufficient evidence to claim that the variability in response time is not greater than the value given in the technical specifications.
  • 19. Example: $\chi^2$ statistic III (upper-tail test). However, most of the time we want to test whether the lack of precision in a certain process or product exceeds a particular value specified in the standards. In this case the claim we want to test is whether the standard deviation of the vending machines exceeds .38 seconds, which means that the research hypothesis becomes $H_1: \sigma^2 > .1444$. Here the comparison of the test statistic with the critical value is $\chi^2 > \chi^2_{\alpha,\nu}$. Since the test statistic value 16.39 is smaller than $\chi^2_{.05,14} = 23.7$, we are unable to reject the null hypothesis. We do not have sufficient evidence either to claim that the variability in response time is greater than the value given in the specifications.
  • 20. Example: testing proportions (vending machines). The defective vending machines in our sample are the units having a response time greater than the upper 95% confidence limit. The value of the UCL was calculated as 2.32 seconds (cf. lecture week 41 (43)), and the defective units in the sample (2.37, 2.53, 2.42, 2.69 and 2.82) were emphasized in the data: 2.37 1.95 1.49 2.05 2.27 2.53 1.87 2.42 1.43 2.24 2.69 2.16 1.88 1.71 2.82. This means that, out of 15 machines, 5 have been shown to be defective. Hence $\hat{p} = \frac{5}{15} = \frac{1}{3}$, or equivalently 0.333.
  • 21. Example: testing proportions ($p_0$). The specifications for the vending machines, however, establish that a defective machine is a unit having a response time greater than 2.50 seconds. The defective machines in the sample according to the prescribed limit (2.53, 2.69 and 2.82) were emphasized in bold in the data: 2.37 1.95 1.49 2.05 2.27 2.53 1.87 2.42 1.43 2.24 2.69 2.16 1.88 1.71 2.82. The proportion in this case is $\frac{3}{15} = 0.2$, which will represent the parameter $p_0$, but be aware that the proportion parameter can come from another source.
  • 22. Example: testing proportions (hypothesis formulation). The hypotheses are formulated in order to answer a question like: should we invest in a new product delivery system for the vending machines to fulfill the technical specifications? Hence we want to see whether there is enough statistical evidence to claim that the proportion of defective units found in the sample data is larger than the population proportion. In this case we perform a one-sided test where the null hypothesis is $p = p_0$, whereas the alternative hypothesis is $p > p_0$.
  • 23. Example: testing proportions (test statistic). The test statistic takes the departure of the sample proportion from the prescribed limit, divided by the standard error of the proportion: $z = \frac{\hat{P}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{0.333-0.2}{\sqrt{\frac{0.2(0.8)}{15}}} = 1.29$. This outcome has to be compared with the critical value $z_{.95} = 1.645$ from the cumulative Z probabilities table. Since the value of the test statistic (1.29) is less than the critical value (1.645), we are not able to reject $H_0$. There is not sufficient evidence at the 5% significance level to claim that the sample proportion of defective machines exceeds the population proportion, and hence no new product delivery system is needed.
  • 24. Example: CI for proportions (vending machines). We wish to find the proportion of defective vending machines in our sample, that is, the units having a response time greater than the upper 95% confidence limit. Since the UCL equals 2.32 seconds (cf. lecture week 41 (43)), the defective units in the sample are 2.37, 2.53, 2.42, 2.69 and 2.82 among: 2.37 1.95 1.49 2.05 2.27 2.53 1.87 2.42 1.43 2.24 2.69 2.16 1.88 1.71 2.82. This means that, out of 15 machines, 5 have been shown to be defective. As a result, $\hat{p} = 5/15 = 0.333$, and the maximum error of the estimate is $1.96 \times \sqrt{0.333(1-0.333)/15} = 0.239$. The interval is $0.333 \pm 0.239$, so LCL = 0.094 and UCL = 0.572.
  • 25. Example: sample size (method 2). The sample size for proportions within a 3% error margin at a 95% confidence level is $n = \left(\frac{1.96\sqrt{\hat{p}(1-\hat{p})}}{.03}\right)^2$. In the case of the vending machines the point estimate is $\hat{p} = 0.333$, which means that $n = \left(\frac{1.96 \times \sqrt{0.333(0.666)}}{.03}\right)^2 = (30.79)^2 \approx 948$. Thus we need a sample size of at least 948 units to get a 3% error estimate with the desired confidence level.
  • 26. Example: F-test. The response time in seconds of the vending machines has been measured in a random sample of size 15 with the following statistics: mean = 2.125, SD = 0.411. A random sample of 20 machines from the competition presumably gives a mean of 2.089 and an SD of .5. The question is whether our machines are more accurate than the machines from the competition, i.e. have smaller variability. The parameter of interest is the ratio of two variances, and we perform an F-test of equality of variances. Since the SD (and hence the variance) is smaller in our machines than in the competition's, $\sigma_2^2$ corresponds to our machines and $\sigma_1^2$ to the competition.
  • 27. Stem-and-Leaf display and Box Plot of the data. OWN: 1 | 4; 1 | 5799; 2 | 0122344; 2 | 578. COMPETITION: 0 | 9; 1 | 4; 1 | 56899; 2 | 00112244; 2 | 55889. [Box plot: response time (seconds), from 1 to 3, by group (comp, own).]
  • 28. Example: F-test (test statistic). The research hypothesis is $H_1: \sigma_1^2/\sigma_2^2 > 1$, i.e. that the true ratio of the variances is greater than one. The test statistic is $F = s_1^2/s_2^2 = \left(\frac{0.5}{0.411}\right)^2 = 1.48$, and for an upper one-sided test at the standard significance level the critical value is $F_{.05,19,14} = 2.4$. Because the test statistic (1.48) is not greater than 2.4, we fail to reject the null hypothesis. There is not enough statistical evidence at the 5% significance level to claim that our machines are more accurate than the machines from the competition.
  • 29. Example: $\mu_1 - \mu_2$ (vending machines). In the F-test performed before we were unable to reject the null hypothesis that the ratio of the two variances equals 1. Accordingly, to test the difference between the two means, for our machines and for the competition, we assume equal variances in the populations. Recall that the sample statistics are $s_1 = .5$, $\bar{x}_1 = 2.089$, $n_1 = 20$; $s_2 = .411$, $\bar{x}_2 = 2.125$, $n_2 = 15$. Now the question is whether the mean $\mu_2$ for our machines is greater than the mean $\mu_1$ for the competition. This is a one-sided (lower-tail) test for the difference between two means where $H_1: (\mu_1 - \mu_2) < 0$.
  • 30. Example: test of $\mu_1 - \mu_2$. For equal variances we first compute the pooled variance estimator $s_p^2 = \frac{(20-1)(.5^2) + (15-1)(.411^2)}{20 + 15 - 2} = \frac{7.13}{33} = .216$. The test statistic is calculated using this estimate: $t = \frac{(2.089-2.125) - 0}{\sqrt{.216 \times \left(\frac{1}{20} + \frac{1}{15}\right)}} = -.227$. To determine the critical value we compute the number of degrees of freedom, $\nu = n_1 + n_2 - 2 = 33$. The rejection region at the standard alpha value is $t < t_{1-\alpha,\nu} = t_{1-.05,33} = -t_{.05,33} \approx -t_{.05,35} = -1.69$. Because the t score is greater than the critical value, we are unable to reject $H_0$. There is not sufficient evidence at the 5% $\alpha$ level to claim that the average response time of our machines exceeds that of the competition.
  • 31. Example: estimation of $\mu_1 - \mu_2$. The estimation of the difference between two means with equal variance parameters is based on the two-sided 95% confidence interval $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\nu} \cdot \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, which gives $(2.089 - 2.125) \pm 2.03 \cdot \sqrt{.216 \times \left(\frac{1}{20} + \frac{1}{15}\right)}$, or $-.036 \pm .322$, so $(\mu_1 - \mu_2) \in [-0.36; 0.29]$. As 0 is within the interval, we cannot conclude that there is a significant difference between the mean response time of our machines and that of the competition.
  • 32. p-value. Although we did not reject the null hypotheses in the examples, we can compute a range for the p-values from the test statistics. For $\mu_1 - \mu_2$, we find from the t table for $\nu \approx 35$ that the closest critical value to the t score satisfies $-1.306 < -.227$, which means that $.10 < P(t < -.227)$, and hence we can reasonably say that the p-value is greater than 10%. For $\sigma_1^2/\sigma_2^2$ the F score is 1.48, which is smaller than $F_{.100,20,15} \approx 1.92$, that is $1.92 > 1.48$. So the p-value in this case is more than 10% as well, because $.100 < P(F > 1.48)$ (or more than 5% with $F_{.050,20,15} = 2.33 > 1.48$).
  • 33. Example: test of $\mu_D$. Recall that the response time of our vending machines was measured in a random sample with $n = 15$, and we obtained the response time of the competition from a random sample with $n = 20$. Suppose that the first 15 measurements from the competitor's sample match the observations in our sample, e.g. because the machines use the same version of the main control board. The paired data by CPU version (1 to 15), and the differences between samples, are as follows. Own: 2.37 1.95 1.49 2.05 2.27 2.53 1.87 2.42 1.43 2.24 2.69 2.16 1.88 1.71 2.82. Competition: 1.55 2.09 2.79 1.43 1.96 2.07 2.35 1.88 2.75 1.93 2.21 2.49 1.80 1.48 2.89. Difference: 0.82 -0.14 -1.30 0.62 0.31 0.46 -0.48 0.54 -1.32 0.31 0.48 -0.33 0.08 0.23 -0.07. The sample statistics for our machines are $\bar{x}_2 = 2.125$, $s_2 = .411$; and for the paired competitors they are $\bar{x}_1 = 2.111$, $s_1 = .468$. Due to the experiment design, the parameter of interest in this case is $\mu_D$, the mean of the population differences for dependent samples.
  • 34. Example: test of $\mu_D$ II. The question is whether there is a difference between the true mean for our machines and the mean for the competition. Because the sample mean of our machines is still higher than the competition's, the alternative hypothesis is formulated as $H_1: \mu_D < 0$. Next we compute the statistics of the paired differences, $\bar{x}_D = .014$ and $s_D = .646$, and these are used in the t test statistic $t = \frac{0.014 - 0}{0.646/\sqrt{15}} = .084$. The rejection region at the standard $\alpha$ value is $t < t_{1-\alpha,\nu} = t_{1-.05,14} = -t_{.05,14} = -1.76$. Because the t score (.084) is greater than the critical value, we are unable to reject the null hypothesis. There is not sufficient evidence at the 5% $\alpha$ level to infer that there is a difference between the response time of our machines and that of the competition using the same version of the control board.
  • 35. Example: estimation of $\mu_D$. The estimation of the mean difference for paired data is based on a two-sided standard confidence level: $\bar{x}_D \pm t_{\alpha/2,\nu}\frac{s_D}{\sqrt{n_D}}$, that is $0.014 \pm 2.145 \times \frac{0.646}{\sqrt{15}}$. Hence the maximum error of the estimate is .358, with LCL = −.34 and UCL = .37. We cannot conclude either that there is a significant difference between the mean response time of our machines and that of the competition using the same version of the control board.
  • 36. Example: test of $p_1 - p_2$ (vending machines). By testing the difference between two population proportions we can compare, e.g., the introductory year's market for vending machines in a given town, belonging either to our company or to the competition. Assuming that 12 machines in each sample are placed in this market, the question is whether there is enough evidence to conclude that our machines are more popular than the competition's in this town. To be consistent with the coding of the populations, the research hypothesis is $(p_1 - p_2) < 0$, and hence the null hypothesis adopts equal proportions in the testing. For the test statistic we require both sample proportions and also the pooled estimate: $\hat{p}_1 = \frac{12}{20} = .6$, $\hat{p}_2 = \frac{12}{15} = .8$ and $\hat{p} = \frac{12 + 12}{20 + 15} = .686$.
  • 37. Example: test of $p_1 - p_2$ (test statistic). The test statistic is $z = \frac{(.6 - .8)}{\sqrt{(.686)(1-.686) \times \left(\frac{1}{20} + \frac{1}{15}\right)}} = -1.26$, and the standard rejection region for a cumulative Z statistic is $z < -z_\alpha = -z_{.05} = -1.645$. Since the z test statistic is greater than the critical value, we do not reject $H_0$. At the 5% significance level there is not sufficient evidence to conclude that our vending machines were more popular than the competition's in town during the introductory year. Here $P(Z < z) = .1038$ is the associated p-value.
  • 38. Example: test of $p_1 - p_2$ revisited. However, if we consider only the paired sample data, and only 7 machines from the competition were placed in town during the introductory year, then for the same hypothesis testing design the test statistic becomes $z = \frac{(.467 - .8)}{\sqrt{(.633)(1-.633) \times \left(\frac{1}{15} + \frac{1}{15}\right)}} = -1.89$. This means that $z_{obs} < -z_\alpha$, and hence we reject the null hypothesis in favor of the alternative. There is enough statistical evidence at the standard significance level to conclude that our machines were more popular in town during the introductory year than the competition's if we consider that they were using the same main control board (cf. test of $\mu_D$). The p-value is $P(Z < z) = .0294$, and since it is less than $\alpha$, the result is statistically significant. It is also possible to test proportion parameters with a hypothesized difference other than zero, e.g. $H_0: (p_1 - p_2) = .05$.
  • 39. Example: estimation of $p_1 - p_2$. To estimate the difference between the two proportions we use the standard confidence interval at the 95% level: $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$. For the example with all observations: $(.6 - .8) \pm 1.96 \times \sqrt{\frac{.6(1-.6)}{20} + \frac{.8(1-.8)}{15}}$, or $-.2 \pm .3$, giving $[-.5; .1]$. And for the revisited version with paired data: $(.467 - .8) \pm 1.96 \times \sqrt{\frac{.467(1-.467)}{15} + \frac{.8(1-.8)}{15}}$, or $-.333 \pm .32$, giving $[-.66; -.01]$.
  • 40. Summary. Descriptive Statistics; Probability Distributions; Statistical Inference: one-sample estimation and testing of $\mu$, $\sigma^2$, $p$; two-sample estimation and testing of $\mu_1 - \mu_2$, $\mu_D$, $\sigma_1^2/\sigma_2^2$, $p_1 - p_2$.
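
Several of the worked examples in the slides can be checked numerically. The following is a minimal sketch, not part of the original slides, that assumes Python 3.8+ and uses only the standard library (`statistics.NormalDist`). The variable names and the helper `two_prop_z` are hypothetical and introduced only for illustration. It recomputes the normal probabilities from slides 2 and 3, the known-sigma confidence interval from slide 7, the t statistic from slide 14, and the pooled two-proportion z test from slides 36 and 37. Results may differ from the slides in the third or fourth decimal because the slides round intermediate values.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

Z = NormalDist()  # standard normal distribution

# Slide 2: P(X < 470) for X ~ N(460, 5)
p_single = NormalDist(mu=460, sigma=5).cdf(470)

# Slide 3: P(mean of 3 machines < 465); the standard error is sigma / sqrt(n)
se_mean = 5 / sqrt(3)
p_mean = NormalDist(mu=460, sigma=se_mean).cdf(465)

# Slide 7: 95% confidence interval for mu with known sigma = .38
times = [2.37, 1.95, 1.49, 2.05, 2.27, 2.53, 1.87, 2.42,
         1.43, 2.24, 2.69, 2.16, 1.88, 1.71, 2.82]
n = len(times)
x_bar = mean(times)
z_crit = Z.inv_cdf(0.975)            # z_{.025}, about 1.96
margin = z_crit * 0.38 / sqrt(n)
lcl, ucl = x_bar - margin, x_bar + margin

# Slide 14: t statistic for H0: mu = 2, using the sample standard deviation.
# The critical value t_{.05,14} = 1.761 is taken from the slides (t table),
# since the standard library has no t distribution.
s = stdev(times)
t_stat = (x_bar - 2) / (s / sqrt(n))
reject_h0 = t_stat > 1.761

# Slides 36-37: pooled two-proportion z test (hypothetical helper)
def two_prop_z(x1, n1, x2, n2):
    """Return the z statistic for H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z_prop = two_prop_z(12, 20, 12, 15)
p_value = Z.cdf(z_prop)              # lower-tail p-value, P(Z < z)

print(f"P(X < 470)        = {p_single:.4f}")
print(f"P(mean < 465)     = {p_mean:.4f}")
print(f"95% CI for mu     = [{lcl:.2f}; {ucl:.2f}]")
print(f"t statistic       = {t_stat:.2f}, reject H0: {reject_h0}")
print(f"z for p1 - p2     = {z_prop:.2f}, p-value = {p_value:.4f}")
```

Working from the raw response times rather than the rounded summary figures keeps the rounding in one place, at the point of printing.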