Measures of Distribution Shape,
Relative Location, and Detecting Outliers

Distribution Shape
  z-Scores
  Empirical Rule
  Detecting Outliers




Distribution Shape: Skewness

  An important measure of the shape of a distribution
  is called skewness.
  The formula for the skewness of sample data is
  $$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$$
  Skewness can be easily computed using statistical
  software.
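
  As a rough illustration of the skewness formula above, here is a minimal Python sketch. The function name `sample_skewness` and the three small data sets are made up for this example; they do not come from the slides.

```python
import math

def sample_skewness(data):
    """Skewness = n / ((n-1)(n-2)) * sum(((x_i - mean) / s) ** 3)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(data) / n
    # Sample standard deviation (divides by n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    if s == 0:
        raise ValueError("all values are equal; skewness is undefined")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

# Hypothetical data sets, chosen only to show the sign of the measure.
symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 2, 2, 10]
left_tail = [-x for x in right_tail]

for name, values in [("symmetric", symmetric),
                     ("right tail", right_tail),
                     ("left tail", left_tail)]:
    print(name, round(sample_skewness(values), 3))
```

  The symmetric data set gives a skewness of zero, the one with a long right tail gives a positive value, and its mirror image gives a negative value.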




Distribution Shape: Skewness
  Symmetric (not skewed)
  • Skewness is zero.
  • Mean and median are equal.
  [Figure: relative-frequency histogram of a symmetric distribution; Skewness = 0]

Distribution Shape: Skewness
 Moderately Skewed Left
  • Skewness is negative.
  • Mean will usually be less than the median.
  [Figure: relative-frequency histogram of a distribution moderately skewed left; Skewness = −.31]

Distribution Shape: Skewness
 Moderately Skewed Right
  • Skewness is positive.
  • Mean will usually be more than the median.
  [Figure: relative-frequency histogram of a distribution moderately skewed right; Skewness = .31]

Distribution Shape: Skewness
          Highly Skewed Right
                 • Skewness is positive (often above 1.0).
                 • Mean will usually be more than the median.
  [Figure: relative-frequency histogram of a distribution highly skewed right; Skewness = 1.25]

Distribution Shape: Skewness

  Example: Apartment Rents
  Seventy efficiency apartments were randomly
  sampled in a college town. The monthly rent prices
  for the apartments are listed below in ascending order.
 425   430   430   435   435   435   435   435   440   440
 440   440   440   445   445   445   445   445   450   450
 450   450   450   450   450   460   460   460   465   465
 465   470   470   472   475   475   475   480   480   480
 480   485   490   490   490   500   500   500   500   510
 510   515   525   525   525   535   549   550   570   570
 575   575   580   590   600   600   600   600   615   615




Distribution Shape: Skewness

      Example: Apartment Rents

  [Figure: relative-frequency histogram of the apartment rents; Skewness = .92]

z-Scores

 The z-score is often called the standardized value.

 It denotes the number of standard deviations a data
 value x_i is from the mean.

 $$z_i = \frac{x_i - \bar{x}}{s}$$

 Excel’s STANDARDIZE function can be used to
 compute the z-score.

z-Scores

 An observation’s z-score is a measure of the relative
  location of the observation in a data set.
 A data value less than the sample mean will have a
  z-score less than zero.
 A data value greater than the sample mean will have
  a z-score greater than zero.
 A data value equal to the sample mean will have a
  z-score of zero.




z-Scores

 Example: Apartment Rents
  • z-Score of Smallest Value (425)
    $$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$$

        Standardized Values for Apartment Rents
-1.20   -1.11   -1.11   -1.02   -1.02   -1.02   -1.02   -1.02   -0.93   -0.93
-0.93   -0.93   -0.93   -0.84   -0.84   -0.84   -0.84   -0.84   -0.75   -0.75
-0.75   -0.75   -0.75   -0.75   -0.75   -0.56   -0.56   -0.56   -0.47   -0.47
-0.47   -0.38   -0.38   -0.34   -0.29   -0.29   -0.29   -0.20   -0.20   -0.20
-0.20   -0.11   -0.01   -0.01   -0.01   0.17     0.17   0.17     0.17    0.35
0.35     0.44    0.62    0.62    0.62   0.81     1.06   1.08     1.45    1.45
1.54     1.54    1.63    1.81    1.99   1.99     1.99   1.99     2.27    2.27
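
  The standardized values in the table can be reproduced with a short Python sketch. It uses the 70 rents from the Apartment Rents example and the standard library's `statistics` module; the variable names are chosen here for illustration only.

```python
import statistics

# Monthly rents from the Apartment Rents example (70 values, ascending).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = statistics.mean(rents)   # sample mean
s = statistics.stdev(rents)     # sample standard deviation (n - 1 divisor)

# z_i = (x_i - mean) / s for every observation.
z_scores = [(x - mean) / s for x in rents]

# Flag observations with |z| > 3 as possible outliers.
possible_outliers = [x for x, z in zip(rents, z_scores) if abs(z) > 3]

print(round(mean, 2), round(s, 2))
print(round(min(z_scores), 2), round(max(z_scores), 2))
print(possible_outliers)
```

  The smallest and largest z-scores come out at about -1.20 and 2.27, so no rent is flagged by the |z| > 3 criterion.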


Empirical Rule

  When the data are believed to approximate a
  bell-shaped distribution with moderate skew …

   The empirical rule can be used to determine the
   percentage of data values that must be within a
   specified number of standard deviations of the
   mean.

   The empirical rule is based on the normal
   distribution, which we will discuss later.

Empirical Rule

For data having a bell-shaped distribution, approximately
     68.26% of the values are within
     +/- 1 standard deviation of the mean.

     95.44% of the values are within
     +/- 2 standard deviations of the mean.

     99.72% of the values are within
     +/- 3 standard deviations of the mean.
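
The empirical-rule percentages can be checked against the normal distribution with a small Python sketch. It relies on the identity P(|Z| ≤ k) = erf(k/√2) for a standard normal variable Z; the helper name `prob_within` is an assumption made for this example.

```python
import math

def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within +/- {k} standard deviation(s): {prob_within(k):.4%}")
```

The computed values are roughly 68.27%, 95.45%, and 99.73%. The figures quoted on the slide differ in the last digit, which is what one would expect if they were read from a probability table rounded to four decimals.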



Empirical Rule

  [Figure: normal curve over x centered at µ, with 68.26% of the area between µ – 1σ and µ + 1σ, 95.44% between µ – 2σ and µ + 2σ, and 99.72% between µ – 3σ and µ + 3σ]

Detecting Outliers

 An outlier is an unusually small or unusually large
  value in a data set.
 A data value with a z-score less than -3 or greater
  than +3 might be considered an outlier.

 It might be:
  • an incorrectly recorded data value
  • a data value that was incorrectly included in the
     data set
  • a data value that has occurred by chance



Detecting Outliers

 Example: Apartment Rents
 • The most extreme z-scores are -1.20 and 2.27
  • Using |z| > 3 as the criterion for an outlier, there
    are no outliers in this data set.

         Standardized Values for Apartment Rents
-1.20   -1.11   -1.11   -1.02   -1.02   -1.02   -1.02   -1.02   -0.93   -0.93
-0.93   -0.93   -0.93   -0.84   -0.84   -0.84   -0.84   -0.84   -0.75   -0.75
-0.75   -0.75   -0.75   -0.75   -0.75   -0.56   -0.56   -0.56   -0.47   -0.47
-0.47   -0.38   -0.38   -0.34   -0.29   -0.29   -0.29   -0.20   -0.20   -0.20
-0.20   -0.11   -0.01   -0.01   -0.01    0.17   0.17    0.17    0.17     0.35
 0.35    0.44   0.62    0.62     0.62    0.81   1.06    1.08    1.45     1.45
 1.54    1.54   1.63    1.81     1.99    1.99   1.99    1.99    2.27     2.27


Exploratory Data Analysis

 Exploratory data analysis looks at simple methods
 to summarize data.

 For now we simply sort the data values into ascending
 order, identify the five-number summary, and then
 construct a box plot.

Five-Number Summary

1   Smallest Value

2   First Quartile

3   Median

4   Third Quartile

5   Largest Value




Five-Number Summary
 Example: Apartment Rents
      Lowest Value = 425    First Quartile = 445
                    Median = 475
      Third Quartile = 525 Largest Value = 615
425   430   430   435   435   435   435   435   440   440
440   440   440   445   445   445   445   445   450   450
450   450   450   450   450   460   460   460   465   465
465   470   470   472   475   475   475   480   480   480
480   485   490   490   490   500   500   500   500   510
510   515   525   525   525   535   549   550   570   570
575   575   580   590   600   600   600   600   615   615




Box Plot

 A box plot is a graphical summary of data that is
 based on a five-number summary.

 A key to the development of a box plot is the
 computation of the median and the quartiles Q1 and
 Q3.

 Box plots provide another way to identify outliers.

 They also tell us whether the data are skewed.

Box Plot

 Example: Apartment Rents
  • A box is drawn with its ends located at the first and
    third quartiles (Q1 & Q3).
  • A vertical line is drawn in the box at the location of
    the median (second quartile).




  [Figure: box plot on an axis from 400 to 625, with Q1 = 445, Q2 = 475, and Q3 = 525 marked]

Box Plot

 Limits are located (not drawn) using the interquartile
  range (IQR = Q3-Q1): they are 1.5IQR below Q1 and
  1.5 IQR above Q3.

 Data outside these limits are considered outliers.

 The location of each outlier is shown with the
  symbol * .

Box Plot

 Example: Apartment Rents
  • The lower limit is located 1.5(IQR) below Q1.
   Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325

  • The upper limit is located 1.5(IQR) above Q3.
   Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645

  • There are no outliers (values less than 325 or
     greater than 645) in the apartment rent data.
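
  The five-number summary and the box plot limits can be reproduced with the following Python sketch. The slides do not say which quartile convention is used, so the `percentile` helper below is an assumption: it uses the position i = (p/100)n, rounding up when i is not an integer and averaging the values at positions i and i + 1 when it is. This convention happens to reproduce the quartiles quoted in the example; other software may use a different rule and give slightly different quartiles.

```python
import math

# Monthly rents from the Apartment Rents example (70 values).
rents = sorted([
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
])

def percentile(sorted_data, p):
    """p-th percentile (0 < p < 100) using the position i = (p / 100) * n (assumed convention)."""
    n = len(sorted_data)
    i = p * n / 100
    if i != int(i):
        # Not an integer: round up and take the value at that 1-based position.
        return sorted_data[math.ceil(i) - 1]
    i = int(i)
    # Integer: average the values at 1-based positions i and i + 1.
    return (sorted_data[i - 1] + sorted_data[i]) / 2

q1 = percentile(rents, 25)
median = percentile(rents, 50)
q3 = percentile(rents, 75)
five_number_summary = (rents[0], q1, median, q3, rents[-1])

iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = [x for x in rents if x < lower_limit or x > upper_limit]

# Whiskers end at the most extreme values that are still inside the limits.
inside = [x for x in rents if lower_limit <= x <= upper_limit]
whisker_low, whisker_high = min(inside), max(inside)

print(five_number_summary)
print(iqr, lower_limit, upper_limit, outliers)
print(whisker_low, whisker_high)
```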




Box Plot

 Example: Apartment Rents
  • Whiskers (dashed lines) are drawn from the ends
     of the box to the smallest and largest data values
     inside the limits.




  [Figure: box plot with whiskers on an axis from 400 to 625; smallest value inside limits = 425, largest value inside limits = 615]

Box Plot




      A box plot is an excellent graphical technique for making
       comparisons among two or more groups.

Measures of Association
Between Two Variables
 Thus far we have examined numerical methods used
 to summarize the data for one variable at a time.

 Often a manager or decision maker is interested in
 the relationship between two variables.

 Two descriptive measures of the relationship
 between two variables are covariance and the correlation
 coefficient.

Covariance

 The covariance is a measure of the linear association
 between two variables.

 Positive values indicate a positive relationship.

 Negative values indicate a negative relationship.

Covariance

The covariance is computed as follows:

   $$\sigma_{xy} = \frac{(x_1 - \mu_x)(y_1 - \mu_y) + \cdots + (x_N - \mu_x)(y_N - \mu_y)}{N}$$   for populations

   $$s_{xy} = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + \cdots + (x_n - \bar{x})(y_n - \bar{y})}{n - 1}$$   for samples

Correlation Coefficient


 Correlation is a measure of linear association.

 There are also other types of associations not captured
 by correlation.

Correlation Coefficient

       The correlation coefficient is computed as follows:

   $$r_{xy} = \frac{s_{xy}}{s_x s_y}$$   for samples

   $$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$   for populations

   where

   $$\sigma_x^2 = \frac{(x_1 - \mu_x)^2 + \cdots + (x_N - \mu_x)^2}{N}, \qquad \sigma_y^2 = \frac{(y_1 - \mu_y)^2 + \cdots + (y_N - \mu_y)^2}{N}$$

   $$s_x^2 = \frac{(x_1 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}, \qquad s_y^2 = \frac{(y_1 - \bar{y})^2 + \cdots + (y_n - \bar{y})^2}{n - 1}$$

Correlation Coefficient

 The coefficient can take on values between -1 and +1.

 Values near -1 indicate a strong negative linear
 relationship.

 Values near +1 indicate a strong positive linear
 relationship.

 The closer the correlation is to zero, the weaker the
 relationship.

Correlation
  A Positive Relationship: correlation close to 1
  [Figure: scatter plot of y against x]

Correlation
 A Negative Relationship: correlation close to -1
  [Figure: scatter plot of y against x]

Correlation

No Apparent Relationship: Correlation near 0
  [Figure: scatter plot of y against x]

Covariance and Correlation Coefficient

 Example: Golfing Study
     A golfer is interested in investigating the
  relationship, if any, between driving distance and
     18-hole score.
                     Average Driving Average
                     Distance (yds.) 18-Hole Score
                           277.6               69
                           259.5               71
                           269.1               70
                           267.0               70
                           255.6               71
                           272.9               69


Covariance and Correlation Coefficient

 Example: Golfing Study

            x       y     x_i − x̄    y_i − ȳ    (x_i − x̄)(y_i − ȳ)
          277.6    69      10.65      -1.0          -10.65
          259.5    71      -7.45       1.0           -7.45
          269.1    70       2.15       0              0
          267.0    70       0.05       0              0
          255.6    71     -11.35       1.0          -11.35
          272.9    69       5.95      -1.0           -5.95
Average   267.0   70.0                     Total    -35.40
Std. Dev. 8.2192  .8944

  (The deviations x_i − x̄ are computed from the unrounded mean, 266.95.)

Covariance and Correlation Coefficient

 Example: Golfing Study
  • Sample Covariance
    $$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08$$
  • Sample Correlation Coefficient
    $$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(.8944)} = -.9631$$


  So, longer driving distances are associated with lower scores, and the
  linear relationship is very strong.
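
  The golfing calculation can be reproduced with a short Python sketch using the six data pairs from the example. The helper names `sample_covariance` and `sample_std` are chosen here for illustration.

```python
import math

# Data from the Golfing Study example.
distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]  # average driving distance (yds.)
score = [69, 71, 70, 70, 71, 69]                       # average 18-hole score

def sample_covariance(xs, ys):
    """s_xy = sum((x_i - x_bar)(y_i - y_bar)) / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_std(xs):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(xs)
    x_bar = sum(xs) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))

s_xy = sample_covariance(distance, score)
r_xy = s_xy / (sample_std(distance) * sample_std(score))

print(round(s_xy, 2))   # sample covariance
print(round(r_xy, 4))   # sample correlation coefficient
```

  A correlation this close to -1 describes the strength of the linear association in these six observations; on its own it does not show that hitting the ball farther causes lower scores.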

Random Variables

  A random variable is a numerical description of the
  outcome of an experiment.

  A discrete random variable may assume either a
  finite number of values or an infinite sequence of
  values.

  A continuous random variable may assume any
  numerical value in an interval or collection of
  intervals.

Random Variables

Question                      Random Variable x                  Type

Family size                   x = Number of dependents           Discrete
                                  reported on tax return

Distance from home to store   x = Distance in miles from         Continuous
                                  home to the store site

Own dog or cat                x = 1 if own no pet;               Discrete
                                = 2 if own dog(s) only;
                                = 3 if own cat(s) only;
                                = 4 if own dog(s) and cat(s)

Discrete Probability Distributions



 The probability distribution for a random variable
 describes how probabilities are distributed over
 the values of the random variable.

Discrete Probability Distributions

            The probability distribution is defined by a
            probability function, denoted by f(x), which provides
            the probability for each value of the random variable.

            The required conditions for a discrete probability
            function are:
              f(x) ≥ 0       (probabilities are not negative)
              Σ f(x) = 1     (sum of all probabilities = 1)

          Remember that any probability is a number between 0 and 1.

Expected Value

  The expected value, or mean, of a random variable
  is a measure of its central location.

  $$E(x) = \mu = \sum x f(x)$$

  The expected value does not have to be a value the
  random variable can assume.

Variance and Standard Deviation

  The variance summarizes the variability in the
  values of a random variable.

  $$\mathrm{Var}(x) = \sigma^2 = \sum (x - \mu)^2 f(x)$$

  The standard deviation, σ, is defined as the positive
  square root of the variance.
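
  To make the expected value, variance, and standard deviation formulas concrete, here is a minimal Python sketch. The probability function `pmf` below is entirely hypothetical (it is not taken from the slides); it only needs to satisfy f(x) ≥ 0 and Σ f(x) = 1.

```python
import math

# Hypothetical discrete probability function: value -> probability.
pmf = {0: 0.5, 1: 0.3, 2: 0.2}

# Check the two required conditions for a discrete probability function.
assert all(p >= 0 for p in pmf.values())
assert abs(sum(pmf.values()) - 1.0) < 1e-12

# E(x) = sum of x * f(x)
mean = sum(x * p for x, p in pmf.items())

# Var(x) = sum of (x - mu)^2 * f(x)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Standard deviation is the positive square root of the variance.
std_dev = math.sqrt(variance)

print(round(mean, 4), round(variance, 4), round(std_dev, 4))
```

  Here the expected value is 0.7, which is not one of the values 0, 1, or 2 that the random variable can assume.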




Binomial Probability Distribution

  Four Properties of a Binomial Experiment
   1. The experiment consists of a sequence of n
      identical trials.

   2. Two outcomes, success and failure, are possible
      on each trial.

   3. The probability of a success, denoted by p, does
      not change from trial to trial.

   4. The trials are independent.

Binomial Probability Distribution

   Our interest is in the number of successes
   occurring in the n trials.




Binomial Probability Distribution

   Binomial Probability Function

                             n x
                   f ( x ) =   p (1 − p )( n − x )
                             x
   where:
        x = the number of successes
        p = the probability of a success on one trial
        n = the number of trials
     f(x) = the probability of x successes in n trials

   $$\binom{n}{x} = \frac{n!}{(n - x)!\, x!} = \frac{1 \times 2 \times 3 \cdots \times n}{\left(1 \times 2 \times 3 \cdots \times (n - x)\right)\left(1 \times 2 \times 3 \cdots \times x\right)}$$
   = ‘n choose x’ = number of ways x people can be chosen out of n
Binomial Probability Distribution

   Binomial Probability Function
                             n x
                  f ( x ) =   p (1 − p)( n − x )
                            x


                                    Probability of a particular
Number of experimental
                                    sequence of trial outcomes
 outcomes providing exactly
                                    with x successes in n trials
x successes in n trials


   These values are available in Table 5 of our textbook.



Binomial Probability Distribution

  Example: IIT Entrance
   It is known that about 10% of the examinees taking
   the IIT entrance qualify.
  Thus, for any examinee chosen at random, there is a
   probability of 0.1 that the person will qualify.

  Choosing 3 examinees at random, what is
  the probability that exactly 1 of them will qualify?




Binomial Probability Distribution

  Example: IIT Entrance
  Using the probability function with p = .10, n = 3, x = 1:

     $$f(x) = \frac{n!}{x!\,(n - x)!}\, p^x (1 - p)^{n - x}$$

     $$f(1) = \frac{3!}{1!\,(3 - 1)!}\,(0.1)^1 (0.9)^2 = 3(.1)(.81) = .243$$

  You can just check the binomial probability table in the textbook for
  n = 3, p = 0.1, x = 1.

  Or, in Excel, use ‘=BINOMDIST(1,3,0.1,FALSE)’.
  The last argument gives just f(1) if FALSE, and f(0)+f(1) if TRUE.
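
  The same probabilities can be computed with a small Python sketch. The function names `binomial_pmf` and `binomial_cdf` are chosen for this example; the second one mirrors what the cumulative (TRUE) option of the Excel function returns.

```python
import math

def binomial_pmf(x, n, p):
    """f(x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """f(0) + f(1) + ... + f(x)."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

n, p = 3, 0.1   # three examinees, each qualifying with probability 0.1

exactly_one = binomial_pmf(1, n, p)   # probability that exactly 1 qualifies
at_most_one = binomial_cdf(1, n, p)   # probability that 0 or 1 qualify

print(round(exactly_one, 3))
print(round(at_most_one, 3))
```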
Binomial Probability Distribution
  Expected Value

             E(x) = µ = np

  Variance

               Var(x) = σ 2 = np(1 − p)

  Standard Deviation

                   $$\sigma = \sqrt{np(1 - p)}$$




Binomial Probability Distribution
  Example: Evans Electronics

  • Expected Value
      E(x) = np = 3(.1) = .3 employees out of 3

  • Variance
      Var(x) = np(1 – p) = 3(.1)(.9) = .27

  • Standard Deviation

            $$\sigma = \sqrt{3(.1)(.9)} = .52 \text{ employees}$$
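
  As a cross-check, the shortcut formulas np and np(1 − p) can be compared with the general definitions Σ x f(x) and Σ (x − µ)² f(x). The Python sketch below does this for n = 3 and p = .1; the function and variable names are assumptions made for the example.

```python
import math

def binomial_pmf(x, n, p):
    """f(x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 3, 0.1
pmf = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

# General definitions for any discrete random variable.
mean_from_pmf = sum(x * f for x, f in pmf.items())
var_from_pmf = sum((x - mean_from_pmf) ** 2 * f for x, f in pmf.items())

# Binomial shortcut formulas.
mean_shortcut = n * p
var_shortcut = n * p * (1 - p)
std_dev = math.sqrt(var_shortcut)

print(round(mean_from_pmf, 4), round(mean_shortcut, 4))
print(round(var_from_pmf, 4), round(var_shortcut, 4))
print(round(std_dev, 2))
```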


Poisson Probability Distribution

  A Poisson distributed random variable is often
  useful in estimating the number of occurrences
  over a specified interval of time or space.

  It is a discrete random variable that may assume
  an infinite sequence of values (x = 0, 1, 2, . . . ).

Poisson Probability Distribution

  Examples of a Poisson distributed random variable:

      the number of defects in 14 pages of a book

      the number of customers arriving at the post
      office in one hour

  Bell Labs used the Poisson distribution to model the
  arrival of phone calls.

Poisson Probability Distribution

  Two Properties of a Poisson Experiment

 1. The probability of an occurrence is the same
    for any two time intervals of equal length.

 2. The occurrence or nonoccurrence in any time
    interval is independent of the occurrence or
    nonoccurrence in any other time interval.

Poisson Probability Distribution

Poisson Probability Function

   $$f(x) = \frac{\mu^x e^{-\mu}}{x!}$$
  where:
       x = the number of occurrences in an interval
    f(x) = the probability of x occurrences in an interval
         µ = mean number of occurrences in an interval
          e = 2.71828


 These values are available in Table 7 of our textbook.



Poisson Probability Distribution

Poisson Probability Function

   Since there is no stated upper limit for the number
   of occurrences, the probability function f(x) is
   applicable for values x = 0, 1, 2, … without limit.


In practical applications, x will eventually become
large enough so that f(x) is very small and negligible.




Poisson Probability Distribution

   Example: Mercy Hospital
   Patients arrive at the emergency room of Mercy
   Hospital at the average rate of 6 per hour on
   weekend evenings.
   What is the probability of 4 arrivals in 30 minutes
   on a weekend evening?




Poisson Probability Distribution
  Example: Mercy Hospital

  Using the probability function with µ = 6/hour = 3/half-hour, x = 4:

     $$f(4) = \frac{3^4 (2.71828)^{-3}}{4!} = .1680$$

  Or, simply check the table for Poisson probabilities in the book
  for μ = 3, x = 4.

  In Excel, use ‘=POISSON(4,3,FALSE)’.
  The last argument gives just f(4) if FALSE, and f(0)+f(1)+...+f(4) if TRUE.
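
  The Mercy Hospital calculation can be sketched in Python as follows. The function names `poisson_pmf` and `poisson_cdf` are chosen for this example; the key step is rescaling the hourly rate to the 30-minute interval before applying the probability function.

```python
import math

def poisson_pmf(x, mu):
    """f(x) = mu^x * e^(-mu) / x!"""
    return mu ** x * math.exp(-mu) / math.factorial(x)

def poisson_cdf(x, mu):
    """f(0) + f(1) + ... + f(x)."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))

rate_per_hour = 6
interval_hours = 0.5
mu = rate_per_hour * interval_hours   # mean arrivals in 30 minutes = 3

exactly_four = poisson_pmf(4, mu)     # probability of exactly 4 arrivals
at_most_four = poisson_cdf(4, mu)     # probability of 4 or fewer arrivals

print(round(exactly_four, 4))
print(round(at_most_four, 4))
```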
Poisson Probability Distribution

  Example: Mercy Hospital

  [Figure: bar chart titled "Poisson Probabilities"; vertical axis: Probability (0.00 to 0.25); horizontal axis: Number of Arrivals in 30 Minutes (0 to 10). Note on the chart: actually, the sequence continues: 11, 12, …]

Poisson Probability Distribution

  A property of the Poisson distribution is that
  the mean and variance are equal.

  $$\mu = \sigma^2$$

Continuous Probability Distributions

  A continuous random variable can assume any value
  in an interval on the real line or in a collection of
  intervals.
  It is not possible to talk about the probability of the
  random variable assuming a particular value.
  Instead, we talk about the probability of the random
  variable assuming a value within a given interval.




We denote the ‘density function’ by f(x). Also

   $$f(x) \ge 0; \qquad \int f(x)\,dx = 1$$

   $$E(X) = \int x f(x)\,dx$$

   $$\mathrm{Var}(X) = \int \left(x - E(X)\right)^2 f(x)\,dx$$

Area as a Measure of Probability

  The area under the graph of f(x) and probability are
  identical.
  This is valid for all continuous random variables.
  The probability that x takes on a value between some
  lower value x1 and some higher value x2 can be found
  by computing the area under the graph of f(x) over
  the interval from x1 to x2.




Normal Probability Distribution

  The normal probability distribution is the most
  important distribution for describing a continuous
  random variable.

  It is used in a wide variety of applications
  including:

     • Heights of people   • Test scores
     • Rainfall amounts    • Scientific measurements

  For a large number of similar variables that are
  unrelated (independent), their sum and their average are
  approximately normally distributed.

Normal Probability Distribution

Normal Probability Density Function

          $$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-(x-\mu)^2 / (2\sigma^2)}$$

           where:
                    µ   =   mean
                    σ   =   standard deviation
                    π   =   3.14159
                    e   =   2.71828
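The density formula translates almost directly into code. This is a minimal sketch; the function name `normal_pdf` and the sample values µ = 15, σ = 6 are illustrative assumptions.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density: 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Height of the curve at the mean, and at equal distances on either side of it.
peak = normal_pdf(15.0, mu=15.0, sigma=6.0)
left = normal_pdf(9.0, mu=15.0, sigma=6.0)
right = normal_pdf(21.0, mu=15.0, sigma=6.0)
print(peak, left, right)
```

Note that the function returns the height of the curve, not a probability; probabilities come from areas under the curve.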




                                                    66
Normal Probability Distribution

  Characteristics

   The distribution is symmetric; its skewness
   measure is zero.




   [Figure: bell-shaped normal curve, symmetric about its center; horizontal axis x]




                                                 67
Normal Probability Distribution

  Characteristics

   The highest point on the normal curve is at the
   mean, the middle point.




   [Figure: normal curve with its peak at the mean; horizontal axis x]




                                                     68
Normal Probability Distribution

  Characteristics

   The mean can be any numerical value: negative,
   zero, or positive.




   [Figure: normal curves centered at means of -10, 0, and 25; horizontal axis x]



                                                        69
Normal Probability Distribution

  Characteristics

  The standard deviation determines the width of the
  curve: larger values result in wider, flatter curves.

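   [Figure: two normal curves with the same mean, a narrower, taller one
   with σ = 15 and a wider, flatter one with σ = 25; horizontal axis x]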


                                                          70
Normal Probability Distribution

  Characteristics

   Probabilities for the normal random variable are
   given by areas under the curve. The total area
   under the curve is 1 (.5 to the left of the mean and
   .5 to the right).




   [Figure: normal curve with area .5 on each side of the mean; horizontal axis x]


                                                          71
Normal Probability Distribution

  Characteristics (basis for the empirical rule)

   68.26% of values of a normal random variable
   are within +/- 1 standard deviation of its mean.

   95.44% of values of a normal random variable
   are within +/- 2 standard deviations of its mean.

   99.72% of values of a normal random variable
   are within +/- 3 standard deviations of its mean.
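These percentages can be recomputed from the standard normal distribution. The sketch below uses the identity P(|z| ≤ k) = erf(k/√2); the helper name `share_within` is an assumption. Computed at full precision the shares come out as roughly 68.27%, 95.45%, and 99.73%; the slightly smaller figures on the slide are what one gets from four-decimal table values.

```python
import math

def share_within(k):
    """P(|z| <= k) for a standard normal z, via the error function."""
    return math.erf(k / math.sqrt(2.0))

# Share of values within 1, 2, and 3 standard deviations of the mean.
shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within +/- {k} standard deviation(s): {share:.4%}")
```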




                                                       72
Normal Probability Distribution

  Characteristics (basis for the empirical rule)
   [Figure: normal curve centered at µ, with 68.26% of the area within
   µ ± 1σ, 95.44% within µ ± 2σ, and 99.72% within µ ± 3σ; horizontal axis x]

                                                            73
Standard Normal Probability Distribution

  Characteristics

   A random variable having a normal distribution
   with a mean of 0 and a standard deviation of 1 is
   said to have a standard normal probability
   distribution.




                                                       74
Standard Normal Probability Distribution

  Characteristics

   The letter z is used to designate the standard
   normal random variable.


   [Figure: standard normal curve centered at 0 with σ = 1; horizontal axis z]



                                                    75
Standard Normal Probability Distribution

  Converting to the Standard Normal Distribution

                     $$z = \frac{x - \mu}{\sigma}$$
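A small sketch of the conversion and its inverse. The function names and the sample values (x = 130, µ = 100, σ = 15) are hypothetical, picked only so the arithmetic is easy to follow.

```python
def standardize(x, mu, sigma):
    """Convert x to a z value: the number of standard deviations x lies from mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

def unstandardize(z, mu, sigma):
    """Inverse conversion: recover x from a z value."""
    return mu + z * sigma

# Hypothetical values: 130 is two standard deviations above a mean of 100.
z = standardize(130.0, mu=100.0, sigma=15.0)
x_back = unstandardize(z, mu=100.0, sigma=15.0)
print(z, x_back)
```

The inverse form, x = µ + zσ, is the one needed when a z value is read from the table and has to be turned back into the original units.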




                                                   76
Example: Demand
The daily demand for the new iPad in a store seems
to follow a normal distribution with a mean of
15 and a standard deviation of 6.

The manager, who does not want to keep more than
20 iPads in the store at a time, would like to know
the probability of a stockout, i.e. that the demand in
a day will exceed 20.


                    P(x > 20) = ?


                                                         77
Solving for the Stockout Probability

Step 1: Convert x to the standard normal distribution.

                   z = (x - µ)/σ
                     = (20 - 15)/6
                     = .83




                                                         78
Step 2: Find the area under the standard normal
            curve to the left of z = .83.

Cumulative Probability Table for Standard Normal Distribution

    z    .00   .01   .02   .03   .04   .05    .06   .07   .08     .09
    .     .     .     .     .     .      .     .     .        .    .
    .5   .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
    .6   .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
    .7   .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
    .8   .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
    .9   .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
     .     .     .     .     .     .     .     .     .     .     .

                                 P(z < .83)

     These values are available in Table 1 of our textbook.
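For readers without the textbook at hand, one row of the cumulative table can be regenerated with Python's standard library. This is a sketch under the assumption that the table simply lists P(z < value) rounded to four places; the helper name `table_row` is made up.

```python
from statistics import NormalDist

def table_row(row_label):
    """Cumulative probabilities P(z < row_label + .00) ... P(z < row_label + .09)."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [round(standard_normal.cdf(row_label + j / 100), 4) for j in range(10)]

row = table_row(0.8)
p_below_083 = row[3]  # row .8, column .03 gives P(z < .83)
print(row)
print(p_below_083)
```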

                                                                        79
In Excel, use ‘=NORMDIST(0.83,0,1,TRUE)’. The last argument selects what is
returned: FALSE gives just the density f(0.83); TRUE gives the area up to 0.83.

In fact, you can use ‘=NORMDIST(20,15,6,TRUE)’ directly. It returns
P(X ≤ 20) with μ = 15, σ = 6.
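For readers working outside Excel, the same two quantities can be obtained in Python. The wrapper `normdist` below is a hypothetical stand-in that only mimics the argument order of the spreadsheet call; it is built on `statistics.NormalDist` from the standard library and is not Excel's implementation.

```python
from statistics import NormalDist

def normdist(x, mean, standard_dev, cumulative):
    """Density at x if cumulative is False; area to the left of x if True."""
    dist = NormalDist(mean, standard_dev)
    return dist.cdf(x) if cumulative else dist.pdf(x)

area_standard = normdist(0.83, 0, 1, True)      # area up to z = 0.83
density_standard = normdist(0.83, 0, 1, False)  # height of the curve at 0.83
area_direct = normdist(20, 15, 6, True)         # P(X <= 20), no rounding of z
print(area_standard, density_standard, area_direct)
```

The direct call differs slightly from the table value because it uses z = 5/6 without rounding it to .83 first.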



                                                                    80
Standard Normal Probability Distribution

  Solving for the Stockout Probability

  Step 3: Compute the area under the standard normal
          curve to the right of z = .83.

                P(z > .83) = 1 – P(z < .83)
                           = 1 – .7967
                           = .2033

  Probability of a stockout: P(x > 20) = .2033
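The three steps can be sketched end to end as follows. The constant names are assumptions; the numbers are those of the demand example. The code reports both the table-style answer (z rounded to two places, area rounded to four) and the answer without intermediate rounding, which is slightly smaller.

```python
from statistics import NormalDist

MEAN_DEMAND = 15.0   # mean daily demand from the example
SD_DEMAND = 6.0      # standard deviation of daily demand
STOCK_LEVEL = 20.0   # units kept in the store

# Step 1: convert the stock level to a z value.
z_exact = (STOCK_LEVEL - MEAN_DEMAND) / SD_DEMAND
z_table = round(z_exact, 2)  # .83, as used with the printed table

# Step 2: area to the left of z. Step 3: complement gives the right tail.
standard_normal = NormalDist()
stockout_table = 1.0 - round(standard_normal.cdf(z_table), 4)
stockout_exact = 1.0 - standard_normal.cdf(z_exact)
print(stockout_table, stockout_exact)
```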



                                                       81
Standard Normal Probability Distribution

  Solving for the Stockout Probability


   [Figure: standard normal curve with area .7967 to the left of z = .83
   and area 1 - .7967 = .2033 to the right; horizontal axis z]

  These values are available in Table 1 of our textbook.
                                                           82
Standard Normal Probability Distribution



  If the manager wants the probability of a stockout
  during replenishment lead-time to be no more than
  .05, what should the reorder point be?

  (Hint: Given a probability, we can use the standard
  normal table in an inverse fashion to find the
  corresponding z value. Give it a try.)
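One way to check an answer to this exercise is to let software do the inverse lookup. The sketch below assumes, since the slide does not say otherwise, that demand during the lead-time follows the same normal distribution as in the earlier example (µ = 15, σ = 6); that assumption and the constant names are not from the lecture.

```python
from statistics import NormalDist

MAX_STOCKOUT_PROBABILITY = 0.05
MEAN_DEMAND = 15.0  # assumed: same distribution as the daily-demand example
SD_DEMAND = 6.0     # assumed: same distribution as the daily-demand example

# Inverse lookup: the z value with area .95 to its left (so .05 to its right).
z = NormalDist().inv_cdf(1.0 - MAX_STOCKOUT_PROBABILITY)

# Convert z back to units of demand: x = mu + z * sigma.
reorder_point = MEAN_DEMAND + z * SD_DEMAND

# Sanity check: the right-tail probability at the reorder point.
check = 1.0 - NormalDist(MEAN_DEMAND, SD_DEMAND).cdf(reorder_point)
print(z, reorder_point, check)
```

With a printed table, the same lookup means finding the entry closest to .9500 and reading off the corresponding z.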




                                                                        83
End of Lecture




                 84

Poisson statistics

  • 1.
  • 2. Measures of Distribution Shape, Relative Location, and Detecting Outliers Distribution Shape z-Scores Empirical Rule Detecting Outliers 2
  • 3. Distribution Shape: Skewness An important measure of the shape of a distribution is called skewness. The formula for the skewness of sample data is 3 n  xi − x  Skewness = ∑ s  (n − 1)( n − 2)   Skewness can be easily computed using statistical software. 3
  • 4. Distribution Shape: Skewness Symmetric (not skewed) • Skewness is zero. • Mean and median are equal. .35 Skewness = 0 .30 Relative Frequency .25 .20 .15 .10 .05 0 4
  • 5. Distribution Shape: Skewness Moderately Skewed Left • Skewness is negative. • Mean will usually be less than the median. .35 Skewness = − .31 .30 Relative Frequency .25 .20 .15 .10 .05 0 5
  • 6. Distribution Shape: Skewness Moderately Skewed Right • Skewness is positive. • Mean will usually be more than the median. .35 Skewness = .31 .30 Relative Frequency .25 .20 .15 .10 .05 0 6
  • 7. Distribution Shape: Skewness Highly Skewed Right • Skewness is positive (often above 1.0). • Mean will usually be more than the median. .35 Skewness = 1.25 .30 Relative Frequency .25 .20 .15 .10 .05 0 7
  • 8. Distribution Shape: Skewness Example: Apartment Rents Seventy efficiency apartments were randomly sampled in a college town. The monthly rent prices for the apartments are listed below in ascending order. 425 430 430 435 435 435 435 435 440 440 440 440 440 445 445 445 445 445 450 450 450 450 450 450 450 460 460 460 465 465 465 470 470 472 475 475 475 480 480 480 480 485 490 490 490 500 500 500 500 510 510 515 525 525 525 535 549 550 570 570 575 575 580 590 600 600 600 600 615 615 8
  • 9. Distribution Shape: Skewness Example: Apartment Rents .35 Skewness = .92 .30 Relative Frequency .25 .20 .15 .10 .05 0 9
  • 10. z-Scores The z-score is often called the standardized value. The z-score is often called the standardized value. It denotes the number of standard deviations a data It denotes the number of standard deviations a data value xii is from the mean. value x is from the mean. xi − x zi = s Excel’s STANDARDIZE function can be used to Excel’s STANDARDIZE function can be used to compute the z-score. compute the z-score. 10
  • 11. z-Scores  An observation’s z-score is a measure of the relative location of the observation in a data set.  A data value less than the sample mean will have a z-score less than zero.  A data value greater than the sample mean will have a z-score greater than zero.  A data value equal to the sample mean will have a z-score of zero. 11
  • 12. z-Scores  Example: Apartment Rents • z-Score of Smallest Value (425) xi − x 425 − 490.80 z= = = − 1.20 s 54.74 Standardized Values for Apartment Rents -1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93 -0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75 -0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47 -0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20 -0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35 0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45 1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27 12
  • 13. Empirical Rule When the data are believed to approximate a bell-shaped distribution with moderate skew … The empirical rule can be used to determine the The empirical rule can be used to determine the percentage of data values that must be within a percentage of data values that must be within a specified number of standard deviations of the specified number of standard deviations of the mean. mean. The empirical rule is based on the normal The empirical rule is based on the normal distribution, which we will discuss later. distribution, which we will discuss later. 13
  • 14. Empirical Rule For data having a bell-shaped distribution, approximately 68.26% of the values are within of the values are within +/- 1 standard deviation of its mean. of its mean. 95.44% values are within of the values are within of the +/- 2 standard deviations of its mean. of its mean. 99.72% values are within of the values are within of the +/- 3 standard deviations its mean. of its mean. of 14
  • 15. Empirical Rule 99.72% 95.44% 68.26% µ x µ – 3σ µ – 1σ µ + 1σ µ + 3σ µ – 2σ µ + 2σ 15
  • 16. Detecting Outliers  An outlier is an unusually small or unusually large value in a data set.  A data value with a z-score less than -3 or greater than +3 might be considered an outlier.  It might be: • an incorrectly recorded data value • a data value that was incorrectly included in the data set • a data value that has occurred by chance 16
  • 17. Detecting Outliers  Example: Apartment Rents • The most extreme z-scores are -1.20 and 2.27 • Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set. Standardized Values for Apartment Rents -1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93 -0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75 -0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47 -0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20 -0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35 0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45 1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27 17
  • 18. Exploratory Data Analysis Exploratory data analysis is looking at methods Exploratory data analysis is looking at methods to summarize data. to summarize data. For now we simply sort the data values into ascending For now we simply sort the data values into ascending order and identify the five-number summary and then order and identify the five-number summary and then construct a box plot.. construct a box plot 18
  • 19. Five-Number Summary 1 Smallest Value 2 First Quartile 3 Median 4 Third Quartile 5 Largest Value 19
  • 20. Five-Number Summary  Example: Apartment Rents Lowest Value = 425 First Quartile = 445 Median = 475 Third Quartile = 525 Largest Value = 615 425 430 430 435 435 435 435 435 440 440 440 440 440 445 445 445 445 445 450 450 450 450 450 450 450 460 460 460 465 465 465 470 470 472 475 475 475 480 480 480 480 485 490 490 490 500 500 500 500 510 510 515 525 525 525 535 549 550 570 570 575 575 580 590 600 600 600 600 615 615 20
  • 21. Box Plot A box plot is a graphical summary of data that is A box plot is a graphical summary of data that is based on a five-number summary. based on a five-number summary. A key to the development of a box plot is the A key to the development of a box plot is the computation of the median and the quartiles Q11 and computation of the median and the quartiles Q and Q33.. Q Box plots provide another way to identify outliers. Box plots provide another way to identify outliers. They also tell us whether the data are skewed. They also tell us whether the data are skewed. 21
  • 22. Box Plot  Example: Apartment Rents • A box is drawn with its ends located at the first and third quartiles (Q1 & Q3). • A vertical line is drawn in the box at the location of the median (second quartile). 400 425 450 475 500 525 550 575 600 625 Q1 = 445 Q3 = 525 Q2 = 475 22
  • 23. Box Plot  Limits are located (not drawn) using the interquartile range (IQR = Q3-Q1): they are 1.5IQR below Q1 and 1.5 IQR above Q3.  Data outside these limits are considered outliers.  The locations of each outlier is shown with the symbol * . 23
  • 24. Box Plot  Example: Apartment Rents • The lower limit is located 1.5(IQR) below Q1. Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325 • The upper limit is located 1.5(IQR) above Q3. Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645 • There are no outliers (values less than 325 or greater than 645) in the apartment rent data. 24
  • 25. Box Plot  Example: Apartment Rents • Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits. 400 425 450 475 500 525 550 575 600 625 Smallest value Largest value inside limits = 425 inside limits = 615 25
  • 26. Box Plot An excellent graphical technique for making comparisons among two or more groups. 26
  • 27. Measures of Association Between Two Variables Thus far we have examined numerical methods used Thus far we have examined numerical methods used to summarize the data for one variable at a time. to summarize the data for one variable at a time. Often a manager or decision maker is interested in Often a manager or decision maker is interested in the relationship between two variables.. the relationship between two variables Two descriptive measures of the relationship Two descriptive measures of the relationship between two variables are covariance and correlation between two variables are covariance and correlation coefficient.. coefficient 27
  • 28. Covariance The covariance is a measure of the linear association The covariance is a measure of the linear association between two variables. between two variables. Positive values indicate a positive relationship. Positive values indicate a positive relationship. Negative values indicate a negative relationship. Negative values indicate a negative relationship. 28
  • 29. Covariance The covariance is computed as follows: The covariance is computed as follows: (x1 − µ x )(y1 − µ y ) + L + (x N − µ x )(y N − µ y ) σ xy = for N populations (x1 − x)(y1 − y) + L + (x n − x)(y n − y) s xy = for n−1 samples 29
  • 30. Correlation Coefficient Correlation is a measure of linear association. Correlation is a measure of linear association. There are also other types of associations not captured There are also other types of associations not captured by correlation. by correlation. 30
  • 31. Correlation Coefficient The correlation coefficient is computed as follows: The correlation coefficient is computed as follows: sxy σ xy rxy = ρ xy = sx s y σ xσ y for for samples populations (x1 − µ x ) + L + (x N − µ x ) 2 2 (y1 − µ y )2 + L + (y N − µ y )2 σ = 2 σ2= x N y N (x1 − x)2 + L + (x n − x) 2 (y1 − y) + L + (y n − y) 2 2 s2 = x s = 2 n−1 y n−1 31
  • 32. Correlation Coefficient The coefficient can take on values between -1 and +1. The coefficient can take on values between -1 and +1. Values near -1 indicate a strong negative linear Values near -1 indicate a strong negative linear relationship.. relationship Values near +1 indicate a strong positive linear Values near +1 indicate a strong positive linear relationship.. relationship The closer the correlation is to zero, the weaker the The closer the correlation is to zero, the weaker the relationship. relationship. 32
  • 33. Correlation A Positive Relationship: correlation close to 1 y x 33
  • 34. Correlation A Negative Relationship: correlation close to -1 y x 34
  • 35. Correlation No Apparent Relationship: Correlation near 0 y x 35
  • 36. Covariance and Correlation Coefficient  Example: Golfing Study A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score. Average Driving Average Distance (yds.) 18-Hole Score 277.6 69 259.5 71 269.1 70 267.0 70 255.6 71 272.9 69 36
  • 37. Covariance and Correlation Coefficient  Example: Golfing Study x y ( xi − x ) ( yi − y ) ( xi − x )( yi − y ) 277.6 69 10.65 -1.0 -10.65 259.5 71 -7.45 1.0 -7.45 269.1 70 2.15 0 0 267.0 70 0.05 0 0 255.6 71 -11.35 1.0 -11.35 272.9 69 5.95 -1.0 -5.95 Average 267.0 70.0 Total -35.40 Std. Dev. 8.2192 .8944 37
  • 38. Covariance and Correlation Coefficient  Example: Golfing Study • Sample Covariance sxy = ∑ (x − x )(y − y ) = − 35.40 = i i − 7.08 n− 1 6−1 • Sample Correlation Coefficient sxy −7.08 rxy = = = -.9631 sx sy (8.2192)(.8944) So, increasing driving distance decreases score, and the relation is really strong. 38
  • 39. Random Variables A random variable is a numerical description of the A random variable is a numerical description of the outcome of an experiment. outcome of an experiment. A discrete random variable may assume either a A discrete random variable may assume either a finite number of values or an infinite sequence of finite number of values or an infinite sequence of values. values. A continuous random variable may assume any A continuous random variable may assume any numerical value in an interval or collection of numerical value in an interval or collection of intervals. intervals. 39
  • 40. Random Variables Question Random Variable x Type Family x = Number of dependents Discrete size reported on tax return Distance from x = Distance in miles from Continuous home to store home to the store site Own dog x = 1 if own no pet; Discrete or cat = 2 if own dog(s) only; = 3 if own cat(s) only; = 4 if own dog(s) and cat(s) 40
  • 41. Discrete Probability Distributions The probability distribution for a random variable The probability distribution for a random variable describes how probabilities are distributed over describes how probabilities are distributed over the values of the random variable. the values of the random variable. 41
  • 42. Discrete Probability Distributions The probability distribution is defined by a The probability distribution is defined by a probability function,, denoted by ff((x), which provides probability function denoted by x), which provides the probability for each value of the random variable. the probability for each value of the random variable. The required conditions for a discrete probability The required conditions for a discrete probability function are: function are: >0 (probabilities are not negative) (x) = 1 (sum of all probabilities =1) Remember that any probability is a number between 0 and 1. 42
  • 43. Expected Value The expected value,, or mean, of a random variable The expected value or mean, of a random variable is a measure of its central location. is a measure of its central location. E(x) = µ = Σxf(x) The expected value does not have to be a value the The expected value does not have to be a value the random variable can assume. random variable can assume. 43
  • 44. Variance and Standard Deviation The variance summarizes the variability in the The variance summarizes the variability in the values of a random variable. values of a random variable. Var(x) = σ 2 = Σ(x - µ)2f(x) The standard deviation,, σ,, is defined as the positive The standard deviation σ is defined as the positive square root of the variance. square root of the variance. 44
  • 45. Binomial Probability Distribution Four Properties of a Binomial Experiment 1. The experiment consists of a sequence of n identical trials. 2. Two outcomes, success and failure, are possible on each trial. 3. The probability of a success, denoted by p, does not change from trial to trial. 4. The trials are independent. 45
  • 46. Binomial Probability Distribution Our interest is in the number of successes occurring in the n trials. 46
  • 47. Binomial Probability Distribution Binomial Probability Function n x f ( x ) =   p (1 − p )( n − x ) x where: x = the number of successes p = the probability of a success on one trial n = the number of trials f(x) = the probability of x successes in n trials n n! = ( 1 × 2 × 3L × n )  ÷=  x  (n − x )! x ! ( 1 × 2 × 3L × (n − x ) ) ( 1 × 2 × 3L × x ) = `n choose x’ = number of ways x people can be chosen out of n 47
  • 48. Binomial Probability Distribution Binomial Probability Function  n x f ( x ) =   p (1 − p)( n − x ) x Probability of a particular Number of experimental sequence of trial outcomes outcomes providing exactly with x successes in n trials x successes in n trials These values are available in Table 5 of our textbook. 48
  • 49. Binomial Probability Distribution Example: IIT Entrance It is known that about 10% of the examinees taking the IIT entrance qualify. Thus, for any examinee chosen at random, there is a probability of 0.1 that the person will qualify. Choosing 3 examinees at random, what is the probability that exactly 1 of them will qualify? 49
  • 50. Binomial Probability Distribution Example: IIT Entrance Using the p = .10, n = 3, x = 1 probability function n! f ( x) = p x (1 − p ) (n − x ) x !( n − x )! 3! f (1) = (0.1)1 (0.9)2 = 3(.1)(.81) = .243 1!(3 − 1)! You can just check the binomial probability table in textbook for n= 3, p = 0.1, x = 1. Just f(1) if Or, in Excel, use ‘=BINOMDIST(1,3,0.1,FALSE)’ FALSE, f(0)+f(1) if TRUE 50
  • 51. Binomial Probability Distribution Expected Value E(x) = µ = np Variance Var(x) = σ 2 = np(1 − p) Standard Deviation σ = np(1 − p ) 51
  • 52. Binomial Probability Distribution Example: Evans Electronics • Expected Value E(x) = np = 3(.1) = .3 employees out of 3 • Variance Var(x) = np(1 – p) = 3(.1)(.9) = .27 • Standard Deviation σ = 3(.1)(.9) = .52 employees 52
  • 53. Poisson Probability Distribution A Poisson distributed random variable is often useful in estimating the number of occurrences over a specified interval of time or space. It is a discrete random variable that may assume an infinite sequence of values (x = 0, 1, 2, . . . ).
  • 54. Poisson Probability Distribution Examples of a Poisson distributed random variable: the number of defects in 14 pages of a book; the number of customers arriving at the post office in one hour. Bell Labs used the Poisson distribution to model the arrival of phone calls.
  • 55. Poisson Probability Distribution Two Properties of a Poisson Experiment 1. The probability of an occurrence is the same for any two time intervals of equal length. 2. The occurrence or nonoccurrence in any time interval is independent of the occurrence or nonoccurrence in any other time interval.
  • 56. Poisson Probability Distribution Poisson Probability Function $f(x) = \frac{\mu^x e^{-\mu}}{x!}$ where: x = the number of occurrences in an interval, f(x) = the probability of x occurrences in an interval, µ = mean number of occurrences in an interval, e = 2.71828. These values are available in Table 7 of our textbook.
  • 57. Poisson Probability Distribution Poisson Probability Function Since there is no stated upper limit for the number of occurrences, the probability function f(x) is applicable for values x = 0, 1, 2, … without limit. In practical applications, x will eventually become large enough so that f(x) is very small and negligible.
  • 58. Poisson Probability Distribution Example: Mercy Hospital Patients arrive at the emergency room of Mercy Hospital at the average rate of 6 per hour on weekend evenings. What is the probability of 4 arrivals in 30 minutes on a weekend evening?
  • 59. Poisson Probability Distribution Example: Mercy Hospital Using the probability function with µ = 6/hour = 3/half-hour, x = 4: $f(4) = \frac{3^4 (2.71828)^{-3}}{4!} = .1680$. Or, simply check the table for Poisson probabilities in the book for μ = 3, x = 4. In Excel, use ‘=POISSON(4,3,FALSE)’; the last argument returns just f(4) if FALSE, and f(0)+f(1)+...+f(4) if TRUE.
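The same Mercy Hospital calculation can be sketched in Python with the standard library. The names `poisson_pmf` and `poisson_cdf` are illustrative; they correspond to the FALSE and TRUE settings of the Excel call.

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """f(x): probability of exactly x occurrences when the mean is mu."""
    return mu**x * exp(-mu) / factorial(x)

def poisson_cdf(x: int, mu: float) -> float:
    """f(0) + f(1) + ... + f(x): probability of at most x occurrences."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))

# 6 arrivals per hour means mu = 3 per half-hour; we want x = 4 arrivals
print(round(poisson_pmf(4, 3), 4))  # 0.168
print(poisson_cdf(4, 3))            # cumulative probability of 0 to 4 arrivals
```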
  • 60. Poisson Probability Distribution Example: Mercy Hospital [Figure: bar chart of Poisson probabilities for μ = 3; x-axis: Number of Arrivals in 30 Minutes (0 to 10); y-axis: Probability (0.00 to 0.25). Actually, the sequence continues: 11, 12, …]
  • 61. Poisson Probability Distribution A property of the Poisson distribution is that the mean and variance are equal: $\mu = \sigma^2$.
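The equality of mean and variance can be checked numerically. The sketch below assumes μ = 3 (the Mercy Hospital half-hour rate) and truncates the infinite sum at x = 59, where the remaining probabilities are negligible; both choices are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """f(x): probability of exactly x occurrences when the mean is mu."""
    return mu**x * exp(-mu) / factorial(x)

mu = 3.0
xs = range(60)  # truncation point is an assumption; the tail beyond it is negligible

# Mean and variance computed directly from the definition for a discrete variable
mean = sum(x * poisson_pmf(x, mu) for x in xs)
variance = sum((x - mean) ** 2 * poisson_pmf(x, mu) for x in xs)

print(round(mean, 6), round(variance, 6))  # both are 3.0
```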
  • 62. Continuous Probability Distributions A continuous random variable can assume any value in an interval on the real line or in a collection of intervals. It is not possible to talk about the probability of the random variable assuming a particular value. Instead, we talk about the probability of the random variable assuming a value within a given interval.
  • 63. We denote the ‘density function’ by f(x). Also $f(x) \ge 0$; $\int f(x)\,dx = 1$; $E(X) = \int x f(x)\,dx$; $Var(X) = \int (x - E(X))^2 f(x)\,dx$.
  • 64. Area as a Measure of Probability The area under the graph of f(x) and probability are identical. This is valid for all continuous random variables. The probability that x takes on a value between some lower value x1 and some higher value x2 can be found by computing the area under the graph of f(x) over the interval from x1 to x2.
  • 65. Normal Probability Distribution The normal probability distribution is the most important distribution for describing a continuous random variable. It is used in a wide variety of applications including: • Heights of people • Test scores • Rainfall amounts • Scientific measurements For a large number of similar variables that are unrelated, sum and average are approximately normal.
  • 66. Normal Probability Distribution Normal Probability Density Function $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$ where: µ = mean, σ = standard deviation, π = 3.14159, e = 2.71828.
  • 67. Normal Probability Distribution Characteristics The distribution is symmetric; its skewness measure is zero.
  • 68. Normal Probability Distribution Characteristics The highest point on the normal curve is at the mean, the middle point.
  • 69. Normal Probability Distribution Characteristics The mean can be any numerical value: negative, zero, or positive. [Figure: normal curves centered at −10, 0, and 25.]
  • 70. Normal Probability Distribution Characteristics The standard deviation determines the width of the curve: larger values result in wider, flatter curves. [Figure: normal curves with σ = 15 and σ = 25.]
  • 71. Normal Probability Distribution Characteristics Probabilities for the normal random variable are given by areas under the curve. The total area under the curve is 1 (.5 to the left of the mean and .5 to the right).
  • 72. Normal Probability Distribution Characteristics (basis for the empirical rule) 68.26% of values of a normal random variable are within +/- 1 standard deviation of its mean. 95.44% of values of a normal random variable are within +/- 2 standard deviations of its mean. 99.72% of values of a normal random variable are within +/- 3 standard deviations of its mean.
  • 73. Normal Probability Distribution Characteristics (basis for the empirical rule) [Figure: normal curve centered at µ, with 68.26% of the area within µ ± 1σ, 95.44% within µ ± 2σ, and 99.72% within µ ± 3σ.]
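These percentages can be checked with the error function in Python's standard library, since for any normal variable the probability of falling within k standard deviations of the mean equals erf(k/√2). The sketch below uses an illustrative function name, `prob_within`. The exact values (about 68.27%, 95.45%, and 99.73%) differ in the last digit from the slide's figures, which appear to be derived from a table rounded to four decimals.

```python
from math import erf, sqrt

def prob_within(k: float) -> float:
    """P(mu - k*sigma < X < mu + k*sigma) for any normal random variable X."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within +/- {k} standard deviation(s): {prob_within(k):.4f}")
```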
  • 74. Standard Normal Probability Distribution Characteristics A random variable having a normal distribution with a mean of 0 and a standard deviation of 1 is said to have a standard normal probability distribution.
  • 75. Standard Normal Probability Distribution Characteristics The letter z is used to designate the standard normal random variable. [Figure: standard normal curve centered at z = 0 with σ = 1.]
  • 76. Standard Normal Probability Distribution Converting to the Standard Normal Distribution $z = \frac{x - \mu}{\sigma}$
  • 77. Example: Demand The daily demand for the new iPad in a store seems to follow a normal distribution with an average of 15 and a standard deviation of 6. The manager, who does not want to keep more than 20 iPads in his store at a time, would like to know the probability of a stockout, i.e., that the demand in a day will exceed 20. P(x > 20) = ?
  • 78. Solving for the Stockout Probability Step 1: Convert x to the standard normal distribution. z = (x - µ)/σ = (20 - 15)/6 = .83
  • 79. Step 2: Find the area under the standard normal curve to the left of z = .83. Cumulative Probability Table for Standard Normal Distribution
       z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
       .      .      .      .      .      .      .      .      .      .      .
      .5   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
      .6   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
      .7   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
      .8   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
      .9   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
       .      .      .      .      .      .      .      .      .      .      .
    P(z < .83) = .7967 (row .8, column .03). These values are available in Table 1 of our textbook.
  • 80. In Excel, use ‘=NORMDIST(0.83,0,1,TRUE)’; the last argument returns just the density f(0.83) if FALSE, and the area up to 0.83 if TRUE. In fact, you can directly use ‘=NORMDIST(20,15,6,TRUE)’ to get P(X ≤ 20) with μ = 15, σ = 6.
  • 81. Standard Normal Probability Distribution Solving for the Stockout Probability Step 3: Compute the area under the standard normal curve to the right of z = .83. P(z > .83) = 1 – P(z < .83) = 1 – .7967 = .2033. This is the probability of a stockout, P(x > 20).
  • 82. Standard Normal Probability Distribution Solving for the Stockout Probability [Figure: standard normal curve; the area to the left of z = .83 is .7967 and the area to the right is 1 – .7967 = .2033.] These values are available in Table 1 of our textbook.
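The three steps above can be reproduced with `statistics.NormalDist` from Python's standard library (Python 3.8 or later). This is a sketch, not the textbook's method: it computes the stockout probability both with z rounded to .83, as in the table lookup, and with the unrounded z = 5/6, which gives a slightly smaller value (about .2023).

```python
from statistics import NormalDist

mu, sigma, stock = 15, 6, 20

# Table-style calculation: round z to two decimals before the lookup
z = round((stock - mu) / sigma, 2)          # 0.83
p_table = 1 - NormalDist().cdf(z)           # area to the right of z = .83

# Direct calculation on the original scale, with no rounding of z
p_exact = 1 - NormalDist(mu, sigma).cdf(stock)

print(round(p_table, 4))  # 0.2033
print(round(p_exact, 4))  # 0.2023
```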
  • 83. Standard Normal Probability Distribution If the manager wants the probability of a stockout during replenishment lead-time to be no more than .05, what should the reorder point be? (Hint: Given a probability, we can use the standard normal table in an inverse fashion to find the corresponding z value. Give it a try.)
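One way to check an answer to this exercise is the inverse cumulative distribution function. The sketch below assumes that demand during the lead-time follows the same normal distribution as the daily demand in the earlier example (µ = 15, σ = 6); the slide does not state this, so treat these parameters as an assumption.

```python
from statistics import NormalDist

mu, sigma = 15, 6          # assumed: same demand distribution as the earlier example
max_stockout_prob = 0.05

# Find z such that the area to the LEFT of z is 1 - .05 = .95
z = NormalDist().inv_cdf(1 - max_stockout_prob)

# Convert back to the original scale: x = mu + z * sigma
reorder_point = mu + z * sigma

print(round(z, 3))              # 1.645
print(round(reorder_point, 2))  # 24.87
```

Under that assumption, a reorder point of about 25 units keeps the stockout probability at or below .05.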