Poisson statistics

Measures of Distribution Shape,
Relative Location, and Detecting Outliers

Distribution Shape
z-Scores
Empirical Rule
Detecting Outliers

2

Distribution Shape: Skewness

An important measure of the shape of a distribution
is called skewness.
The formula for the skewness of sample data is
$$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum \left( \frac{x_i - \bar{x}}{s} \right)^3$$
Skewness can be easily computed using statistical
software.

3

Symmetric (not skewed)
• Skewness is zero.
• Mean and median are equal.
[Figure: relative frequency histogram, symmetric; Skewness = 0]

4

Moderately Skewed Left
• Skewness is negative.
• Mean will usually be less than the median.
[Figure: relative frequency histogram, moderately skewed left; Skewness = −.31]

5

Moderately Skewed Right
• Skewness is positive.
• Mean will usually be more than the median.
[Figure: relative frequency histogram, moderately skewed right; Skewness = .31]

6

Highly Skewed Right
• Skewness is positive (often above 1.0).
• Mean will usually be more than the median.
[Figure: relative frequency histogram, highly skewed right; Skewness = 1.25]

7


Example: Apartment Rents
Seventy efficiency apartments were randomly
sampled in a college town. The monthly rent prices
for the apartments are listed below in ascending order.
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

8


Example: Apartment Rents

[Figure: relative frequency histogram of the apartment rents; Skewness = .92]
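As a rough check of the skewness reported for the apartment rents, the sample skewness formula can be applied to the 70 rents listed in the example. This is only a sketch: the function name `sample_skewness` is an illustrative choice, and statistical software may use a slightly different skewness definition.

```python
import math

# Apartment rents from the example (70 values, ascending order).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def sample_skewness(data):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(data)
    mean = sum(data) / n
    # Sample standard deviation (divides by n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


rent_skewness = sample_skewness(rents)
print(round(rent_skewness, 2))
```

A positive result is consistent with the right-skewed histogram: the long tail of high rents pulls the mean above the median.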

9

z-Scores

The z-score is often called the standardized value.

It denotes the number of standard deviations a data
value x_i is from the mean.

$$z_i = \frac{x_i - \bar{x}}{s}$$

Excel’s STANDARDIZE function can be used to
compute the z-score.

10

z-Scores

• An observation’s z-score is a measure of the relative
location of the observation in a data set.
• A data value less than the sample mean will have a
z-score less than zero.
• A data value greater than the sample mean will have
a z-score greater than zero.
• A data value equal to the sample mean will have a
z-score of zero.

11

z-Scores

Example: Apartment Rents
• z-Score of Smallest Value (425)
$$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$$

Standardized Values for Apartment Rents
-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
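The standardized values above can be reproduced with a few lines of Python. This is a minimal sketch using the standard library; the variable names are illustrative, and `statistics.stdev` is used because it computes the sample standard deviation (dividing by n - 1), which matches s in the z-score formula.

```python
import statistics

# Apartment rents from the example (70 values, ascending order).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = statistics.mean(rents)   # sample mean, x-bar
s = statistics.stdev(rents)     # sample standard deviation, s

# z-score of each rent: number of standard deviations from the mean.
z_scores = [(x - mean) / s for x in rents]

print(round(mean, 2), round(s, 2))
print(round(z_scores[0], 2), round(z_scores[-1], 2))
```

The first and last entries of `z_scores` correspond to the smallest (425) and largest (615) rents.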

12

Empirical Rule

When the data are believed to approximate a
bell-shaped distribution with moderate skew …

The empirical rule can be used to determine the
percentage of data values that must be within a
specified number of standard deviations of the
mean.

The empirical rule is based on the normal
distribution, which we will discuss later.

13

Empirical Rule

For data having a bell-shaped distribution, approximately
• 68.26% of the values are within +/- 1 standard deviation of its mean.
• 95.44% of the values are within +/- 2 standard deviations of its mean.
• 99.72% of the values are within +/- 3 standard deviations of its mean.
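Because the empirical rule comes from the normal distribution, its percentages can be checked numerically. The sketch below uses the identity P(|Z| < k) = erf(k / sqrt(2)) for a standard normal variable; the function name `normal_coverage` is an illustrative choice. The computed values can differ from the slide figures in the last digit, since the slide figures are derived from rounded table entries.

```python
from math import erf, sqrt


def normal_coverage(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    return erf(k / sqrt(2))


# Coverage for 1, 2, and 3 standard deviations.
coverage = {k: normal_coverage(k) for k in (1, 2, 3)}

for k, prob in coverage.items():
    print(f"within +/- {k} standard deviation(s): {prob:.4%}")
```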

14

Empirical Rule

[Figure: bell-shaped curve over x centered at µ, marking 68.26% within µ ± 1σ, 95.44% within µ ± 2σ, and 99.72% within µ ± 3σ]

15

Detecting Outliers

• An outlier is an unusually small or unusually large
value in a data set.
• A data value with a z-score less than -3 or greater
than +3 might be considered an outlier.

• It might be:
   • an incorrectly recorded data value
   • a data value that was incorrectly included in the
     data set
   • a data value that has occurred by chance

16

Detecting Outliers

• The most extreme z-scores are -1.20 and 2.27
• Using |z| > 3 as the criterion for an outlier, there
are no outliers in this data set.

Standardized Values for Apartment Rents
-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
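The |z| > 3 criterion can be applied mechanically. The following sketch flags any value whose z-score exceeds a threshold in absolute value; the function name `z_score_outliers` and the `threshold` parameter are illustrative, not from the slides.

```python
import statistics

# Apartment rents from the example (70 values, ascending order).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def z_score_outliers(data, threshold=3.0):
    """Return the values whose z-score is beyond +/- threshold."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    return [x for x in data if abs((x - mean) / s) > threshold]


outliers = z_score_outliers(rents)
print(outliers)  # an empty list means no value has |z| > 3
```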

17

Exploratory Data Analysis

Exploratory data analysis consists of methods
for summarizing data.

For now we simply sort the data values into ascending
order, identify the five-number summary, and then
construct a box plot.

18

Five-Number Summary

1 Smallest Value

2 First Quartile

3 Median

4 Third Quartile

5 Largest Value

19

Five-Number Summary
Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
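The five-number summary can be computed directly from the sorted data. This is a sketch under one assumption worth stating: quartile conventions vary between textbooks and software, and here Python's `statistics.quantiles` is used with its default "exclusive" method, which happens to agree with the quartiles shown for this data set. The function name `five_number_summary` is illustrative.

```python
import statistics

# Apartment rents from the example (70 values, ascending order).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest)."""
    ordered = sorted(data)
    # n=4 splits the data into quartiles; the default method is 'exclusive'.
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return ordered[0], q1, median, q3, ordered[-1]


summary = five_number_summary(rents)
print(summary)
```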

20

Box Plot

A box plot is a graphical summary of data that is
based on a five-number summary.

A key to the development of a box plot is the
computation of the median and the quartiles Q1 and
Q3.

Box plots provide another way to identify outliers.

They also tell us whether the data are skewed.

21

Box Plot

• A box is drawn with its ends located at the first and
third quartiles (Q1 & Q3).
• A vertical line is drawn in the box at the location of
the median (second quartile).

[Box plot figure: axis from 400 to 625, with Q1 = 445, Q2 = 475, Q3 = 525]

22

Box Plot

• Limits are located (not drawn) using the interquartile
range (IQR = Q3 - Q1): they are 1.5 IQR below Q1 and
1.5 IQR above Q3.

• Data outside these limits are considered outliers.

• The location of each outlier is shown with the
symbol * .

23

Box Plot

• The lower limit is located 1.5(IQR) below Q1.
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325

• The upper limit is located 1.5(IQR) above Q3.
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645

• There are no outliers (values less than 325 or
greater than 645) in the apartment rent data.
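The limit calculation is simple enough to sketch in code. The function name `box_plot_limits` and the multiplier parameter `k` are illustrative; the quartiles and extreme values are the ones given for the apartment rent data.

```python
def box_plot_limits(q1, q3, k=1.5):
    """Return (lower limit, upper limit) located k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles and extremes from the apartment rent example.
lower, upper = box_plot_limits(445, 525)
smallest, largest = 425, 615

# An outlier exists only if some value falls outside the limits.
has_outliers = smallest < lower or largest > upper
print(lower, upper, has_outliers)
```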

24

Box Plot

• Whiskers (dashed lines) are drawn from the ends
of the box to the smallest and largest data values
inside the limits.

[Box plot figure: axis from 400 to 625; smallest value inside limits = 425, largest value inside limits = 615]

25

Box Plot

Box plots are an excellent graphical technique for making
comparisons among two or more groups.

26

Measures of Association
Between Two Variables
Thus far we have examined numerical methods used
to summarize the data for one variable at a time.

Often a manager or decision maker is interested in
the relationship between two variables.

Two descriptive measures of the relationship
between two variables are covariance and correlation
coefficient.

27

Covariance

The covariance is a measure of the linear association
between two variables.

Positive values indicate a positive relationship.

Negative values indicate a negative relationship.

28

Covariance

The covariance is computed as follows:

For populations:
$$\sigma_{xy} = \frac{(x_1 - \mu_x)(y_1 - \mu_y) + \cdots + (x_N - \mu_x)(y_N - \mu_y)}{N}$$

For samples:
$$s_{xy} = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + \cdots + (x_n - \bar{x})(y_n - \bar{y})}{n - 1}$$

29

Correlation Coefficient

Correlation is a measure of linear association.

There are also other types of associations not captured
by correlation.

30


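The correlation coefficient is computed as follows:

For samples:
$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$

For populations:
$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

where the population variances are
$$\sigma_x^2 = \frac{(x_1 - \mu_x)^2 + \cdots + (x_N - \mu_x)^2}{N}, \qquad \sigma_y^2 = \frac{(y_1 - \mu_y)^2 + \cdots + (y_N - \mu_y)^2}{N}$$

and the sample variances are
$$s_x^2 = \frac{(x_1 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}, \qquad s_y^2 = \frac{(y_1 - \bar{y})^2 + \cdots + (y_n - \bar{y})^2}{n - 1}$$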
31


The coefficient can take on values between -1 and +1.

Values near -1 indicate a strong negative linear
relationship.

Values near +1 indicate a strong positive linear
relationship.

The closer the correlation is to zero, the weaker the
relationship.

32

Correlation
A Positive Relationship: correlation close to 1
[Figure: scatter plot of y against x]

33

Correlation
A Negative Relationship: correlation close to -1
[Figure: scatter plot of y against x]

34

Correlation

No Apparent Relationship: Correlation near 0
[Figure: scatter plot of y against x]

35

Covariance and Correlation Coefficient

• Example: Golfing Study
A golfer is interested in investigating the
relationship, if any, between driving distance and
18-hole score.

Average Driving Distance (yds.)   Average 18-Hole Score
277.6                             69
259.5                             71
269.1                             70
267.0                             70
255.6                             71
272.9                             69

36



              x        y     (xi − x̄)   (yi − ȳ)   (xi − x̄)(yi − ȳ)
            277.6     69      10.65       -1.0         -10.65
            259.5     71      -7.45        1.0          -7.45
            269.1     70       2.15        0             0
            267.0     70       0.05        0             0
            255.6     71     -11.35        1.0         -11.35
            272.9     69       5.95       -1.0          -5.95
Average     267.0    70.0                 Total        -35.40
Std. Dev.  8.2192   .8944

37


• Sample Covariance
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08$$
• Sample Correlation Coefficient
$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(.8944)} = -.9631$$

So, longer driving distance is associated with a lower score, and the linear
relationship is very strong.
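The golfing calculation can be reproduced from the raw data. This is a minimal sketch; the function names `sample_covariance` and `sample_correlation` are illustrative. It relies on the fact that the sample variance of a variable is its sample covariance with itself.

```python
import math

# Golfing study data from the example.
distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]  # average driving distance (yds.)
score = [69, 71, 70, 70, 71, 69]                        # average 18-hole score


def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Sample covariance divided by the product of sample standard deviations."""
    s_x = math.sqrt(sample_covariance(xs, xs))
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


s_xy = sample_covariance(distance, score)
r_xy = sample_correlation(distance, score)
print(round(s_xy, 2), round(r_xy, 4))
```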

38

Random Variables

A random variable is a numerical description of the
outcome of an experiment.

A discrete random variable may assume either a
finite number of values or an infinite sequence of
values.

A continuous random variable may assume any
numerical value in an interval or collection of
intervals.

39

Random Variables

Question                      Random Variable x                      Type
Family size                   x = Number of dependents reported      Discrete
                              on tax return
Distance from home to store   x = Distance in miles from home to     Continuous
                              the store site
Own dog or cat                x = 1 if own no pet;                   Discrete
                                = 2 if own dog(s) only;
                                = 3 if own cat(s) only;
                                = 4 if own dog(s) and cat(s)

40

Discrete Probability Distributions

The probability distribution for a random variable
describes how probabilities are distributed over
the values of the random variable.

41

Discrete Probability Distributions

The probability distribution is defined by a
probability function, denoted by f(x), which provides
the probability for each value of the random variable.

The required conditions for a discrete probability
function are:
f(x) ≥ 0 (probabilities are not negative)

Σ f(x) = 1 (sum of all probabilities = 1)

Remember that any probability is a number between 0 and 1.

42

Expected Value

The expected value, or mean, of a random variable
is a measure of its central location.
E(x) = µ = Σxf(x)

The expected value does not have to be a value the
random variable can assume.

43

Variance and Standard Deviation

The variance summarizes the variability in the
values of a random variable.
$$\mathrm{Var}(x) = \sigma^2 = \sum (x - \mu)^2 f(x)$$

The standard deviation, σ, is defined as the positive
square root of the variance.

44

Binomial Probability Distribution

Four Properties of a Binomial Experiment
1. The experiment consists of a sequence of n
identical trials.

2. Two outcomes, success and failure, are possible
on each trial.

3. The probability of a success, denoted by p, does
not change from trial to trial.

4. The trials are independent.

45


Our interest is in the number of successes
occurring in the n trials.

46


Binomial Probability Function

n x
f ( x ) =   p (1 − p )( n − x )
x
where:
x = the number of successes
p = the probability of a success on one trial
n = the number of trials
f(x) = the probability of x successes in n trials

n n!
=
( 1 × 2 × 3L × n )
 ÷=
 x  (n − x )! x ! ( 1 × 2 × 3L × (n − x ) ) ( 1 × 2 × 3L × x )
= `n choose x’ = number of ways x people can be chosen out of n
47


Binomial Probability Function
$$f(x) = \binom{n}{x} p^x (1 - p)^{(n - x)}$$

• The first factor, n choose x, is the number of experimental
outcomes providing exactly x successes in n trials.
• The second factor, p^x (1 − p)^(n − x), is the probability of a
particular sequence of trial outcomes with x successes in n trials.

These values are available in Table 5 of our textbook.

48


Example: IIT Entrance
It is known that about 10% of the examinees taking
the IIT entrance qualify.
Thus, for any examinee chosen at random, there is a
probability of 0.1 that the person will qualify.

Choosing 3 examinees at random, what is
the probability that exactly 1 of them will qualify?

49


Example: IIT Entrance
Using the probability function with p = .10, n = 3, x = 1:
$$f(x) = \frac{n!}{x!(n - x)!} p^x (1 - p)^{(n - x)}$$
$$f(1) = \frac{3!}{1!(3 - 1)!} (0.1)^1 (0.9)^2 = 3(.1)(.81) = .243$$

You can just check the binomial probability table in the textbook for
n = 3, p = 0.1, x = 1.
Or, in Excel, use ‘=BINOMDIST(1,3,0.1,FALSE)’. With FALSE it returns just f(1);
with TRUE it returns f(0)+f(1).

50
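The same probability can be computed with a short Python sketch of the binomial probability function. The function names `binomial_pmf` and `binomial_cdf` are illustrative; they mirror the FALSE (single probability) and TRUE (cumulative) behaviour described for the Excel function.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """Probability of at most x successes: f(0) + f(1) + ... + f(x)."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


# IIT entrance example: n = 3 examinees, p = 0.1, exactly 1 qualifies.
f1 = binomial_pmf(1, 3, 0.1)
cum1 = binomial_cdf(1, 3, 0.1)
print(round(f1, 3), round(cum1, 3))
```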

Expected Value

E(x) = µ = np

Variance

Var(x) = σ 2 = np(1 − p)

Standard Deviation

$$\sigma = \sqrt{np(1 - p)}$$

51

Example: Evans Electronics

• Expected Value
E(x) = np = 3(.1) = .3 employees out of 3

• Variance
Var(x) = np(1 – p) = 3(.1)(.9) = .27

• Standard Deviation

$$\sigma = \sqrt{3(.1)(.9)} = .52 \text{ employees}$$
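The shortcut formulas np and np(1 − p) can be checked against the general definitions E(x) = Σ x f(x) and Var(x) = Σ (x − µ)² f(x). The sketch below does this for n = 3 and p = 0.1; the variable names are illustrative.

```python
from math import comb, sqrt

n, p = 3, 0.1

# Binomial probability function f(x) for x = 0, 1, ..., n.
pmf = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

# Expected value and variance from the general definitions.
expected = sum(x * f for x, f in pmf.items())
variance = sum((x - expected) ** 2 * f for x, f in pmf.items())
std_dev = sqrt(variance)

print(round(expected, 2), round(variance, 2), round(std_dev, 2))
```

Both results agree with the shortcut formulas, which is why the binomial mean and variance never need the full summation in practice.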

52

Poisson Probability Distribution

A Poisson distributed random variable is often
useful in estimating the number of occurrences
over a specified interval of time or space.

It is a discrete random variable that may assume
an infinite sequence of values (x = 0, 1, 2, ...).

53


Examples of a Poisson distributed random variable:

• the number of defects in 14 pages of a book

• the number of customers arriving at the post
office in one hour

Bell Labs used the Poisson distribution to model the
arrival of phone calls.

54


Two Properties of a Poisson Experiment

1. The probability of an occurrence is the same
for any two time intervals of equal length.

2. The occurrence or nonoccurrence in any time
interval is independent of the occurrence or
nonoccurrence in any other time interval.

55


Poisson Probability Function

$$f(x) = \frac{\mu^x e^{-\mu}}{x!}$$
where:
x = the number of occurrences in an interval
f(x) = the probability of x occurrences in an interval
µ = mean number of occurrences in an interval
e = 2.71828


56


Poisson Probability Function

Since there is no stated upper limit for the number
of occurrences, the probability function f(x) is
applicable for values x = 0, 1, 2, … without limit.

In practical applications, x will eventually become
large enough so that f(x) is very small and negligible.

57


Example: Mercy Hospital
Patients arrive at the emergency room of Mercy
Hospital at the average rate of 6 per hour on
weekend evenings.
What is the probability of 4 arrivals in 30 minutes
on a weekend evening?

58

Using the probability function with µ = 6/hour = 3/half-hour, x = 4:
$$f(4) = \frac{3^4 (2.71828)^{-3}}{4!} = .1680$$

Or, simply check the table for Poisson probabilities in the book
for μ = 3, x = 4.

In Excel, use ‘=POISSON(4,3,FALSE)’. With FALSE it returns just f(4);
with TRUE it returns f(0)+f(1)+...+f(4).
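The Mercy Hospital probability can also be computed with a short sketch of the Poisson probability function. The function names `poisson_pmf` and `poisson_cdf` are illustrative; they correspond to the single-value and cumulative results described for the Excel function.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """Probability of exactly x occurrences when the mean is mu."""
    return mu**x * exp(-mu) / factorial(x)


def poisson_cdf(x, mu):
    """Probability of at most x occurrences: f(0) + f(1) + ... + f(x)."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))


# 6 arrivals per hour means mu = 3 per half-hour; we want exactly 4 arrivals.
f4 = poisson_pmf(4, 3)
cum4 = poisson_cdf(4, 3)
print(round(f4, 4), round(cum4, 4))
```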
59



[Figure: Poisson Probabilities. Bar chart of Probability (0.00 to 0.25) against Number of Arrivals in 30 Minutes (0 to 10). Note on the chart: actually, the sequence continues: 11, 12, …]

60


A property of the Poisson distribution is that
the mean and variance are equal.
$$\mu = \sigma^2$$
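This property can be checked numerically for µ = 3 by summing over the probability function. Since x has no upper limit, the sketch truncates the sum at x = 59, an arbitrary cutoff chosen because f(x) is negligible well before that point.

```python
from math import exp, factorial

mu = 3

# Poisson probabilities f(0), f(1), ..., f(59); the tail beyond is negligible.
probs = [mu**x * exp(-mu) / factorial(x) for x in range(60)]

# Mean and variance from the general definitions for a discrete random variable.
mean = sum(x * f for x, f in enumerate(probs))
variance = sum((x - mean) ** 2 * f for x, f in enumerate(probs))

print(round(mean, 6), round(variance, 6))
```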

61

Continuous Probability Distributions

A continuous random variable can assume any value
in an interval on the real line or in a collection of
intervals.
It is not possible to talk about the probability of the
random variable assuming a particular value.
Instead, we talk about the probability of the random
variable assuming a value within a given interval.

62

We denote the ‘density function’ by f(x). Also,
$$f(x) \ge 0; \qquad \int f(x)\,dx = 1$$
$$E(X) = \int x f(x)\,dx$$
$$\mathrm{Var}(X) = \int \left( x - E(X) \right)^2 f(x)\,dx$$

63

Area as a Measure of Probability

The area under the graph of f(x) and probability are
identical.
This is valid for all continuous random variables.
The probability that x takes on a value between some
lower value x1 and some higher value x2 can be found
by computing the area under the graph of f(x) over
the interval from x1 to x2.

64

Normal Probability Distribution

The normal probability distribution is the most
important distribution for describing a continuous
random variable.

It is used in a wide variety of applications
including:

• Heights of people • Test scores
• Rainfall amounts • Scientific measurements

For a large number of similar variables that are
unrelated, sum and average are approximately normal.

65


Normal Probability Density Function

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-(x - \mu)^2 / 2\sigma^2}$$

where:
µ = mean
σ = standard deviation
π = 3.14159
e = 2.71828

66


Characteristics

The distribution is symmetric; its skewness
measure is zero.


67


Characteristics

The highest point on the normal curve is at the
mean, the middle point.


68


Characteristics

The mean can be any numerical value: negative,
zero, or positive.

[Figure: normal curves over x with means of -10, 0, and 25]

69


Characteristics

The standard deviation determines the width of the
curve: larger values result in wider, flatter curves.

[Figure: two normal curves over x, one with σ = 15 and one with σ = 25]

70


Characteristics

Probabilities for the normal random variable are
given by areas under the curve. The total area
under the curve is 1 (.5 to the left of the mean and
.5 to the right).

[Figure: normal curve over x with area .5 on each side of the mean]

71


Characteristics (basis for the empirical rule)

68.26% of values of a normal random variable
are within +/- 1 standard deviation of its mean.

95.44% of values of a normal random variable
are within +/- 2 standard deviations of its mean.

99.72% of values of a normal random variable
are within +/- 3 standard deviations of its mean.

72


Characteristics (basis for the empirical rule)
[Figure: normal curve over x centered at µ, marking 68.26% within µ ± 1σ, 95.44% within µ ± 2σ, and 99.72% within µ ± 3σ]

73

Standard Normal Probability Distribution

Characteristics

A random variable having a normal distribution
with a mean of 0 and a standard deviation of 1 is
said to have a standard normal probability
distribution.

74


Characteristics

The letter z is used to designate the standard
normal random variable.

[Figure: standard normal curve over z, centered at 0, with σ = 1]

75


Converting to the Standard Normal Distribution

$$z = \frac{x - \mu}{\sigma}$$

76

Example: Demand
The daily demand for the new iPad in a store seems
to follow a normal distribution with an average of
15 and a standard deviation of 6.

The manager, who does not want to keep more than
20 iPads in his store at a time, would like to know
the probability of a stockout, i.e. that the demand in
a day will exceed 20.

P(x > 20) = ?

77

Solving for the Stockout Probability

Step 1: Convert x to the standard normal distribution.

z = (x - µ)/σ
= (20 - 15)/6
= .83

78

Step 2: Find the area under the standard normal
curve to the left of z = .83.

Cumulative Probability Table for Standard Normal Distribution

z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
. . . . . . . . . . .
.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
. . . . . . . . . . .

P(z < .83) = .7967


79

In Excel, use ‘=NORMDIST(0.83,0,1,TRUE)’. With TRUE it returns the area
up to 0.83; with FALSE it returns just f(0.83).

In fact, you can straightaway use ‘=NORMDIST(20,15,6,TRUE)’, which
gives P(X ≤ 20) with μ = 15, σ = 6.

80



Step 3: Compute the area under the standard normal
curve to the right of z = .83.

P(z > .83) = 1 – P(z < .83)
= 1 – .7967
= .2033

This is the probability of a stockout, P(x > 20).
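The three steps can be sketched with Python's standard-library `statistics.NormalDist`. Two versions are shown: one that rounds z to .83 as the table lookup does, and one that uses the unrounded z, which gives a slightly different answer. The variable names are illustrative.

```python
from statistics import NormalDist

# Daily demand: normal with mean 15 and standard deviation 6.
demand = NormalDist(mu=15, sigma=6)

# Table-style calculation: round z to two decimals, then use the standard normal.
z_rounded = round((20 - 15) / 6, 2)
p_stockout_table = 1 - NormalDist().cdf(z_rounded)

# Direct calculation without rounding z.
p_stockout = 1 - demand.cdf(20)

print(z_rounded, round(p_stockout_table, 4), round(p_stockout, 4))
```

The small gap between the two results comes only from rounding z before the lookup.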

81



[Figure: standard normal curve over z; the area to the left of z = .83 is .7967 and the area to the right is 1 - .7967 = .2033]

82


If the manager wants the probability of a stockout
during replenishment lead-time to be no more than .05,
what should the reorder point be?
---------------------------------------------------------------
(Hint: Given a probability, we can use the standard
normal table in an inverse fashion to find the
corresponding z value. Give it a try.)
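One way to check an answer to this exercise is to invert the normal distribution in code. The sketch below assumes, as a simplification not stated on the slide, that demand during the lead-time follows the same normal distribution as daily demand (mean 15, standard deviation 6). It uses `NormalDist.inv_cdf` from Python's standard library.

```python
from statistics import NormalDist

# Assumption: lead-time demand is normal with mean 15 and standard deviation 6.
demand = NormalDist(mu=15, sigma=6)

# A stockout probability of at most .05 means 95% of the area lies to the left.
z = NormalDist().inv_cdf(0.95)

# Convert back from z to units of demand: x = mu + z * sigma.
reorder_point = demand.inv_cdf(0.95)

print(round(z, 3), round(reorder_point, 2))
```

Since stock is held in whole units, the value would be rounded up in practice.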

83
