1. Measures of Dispersion
• Concept and objective
• Range, inter-quartile range (IQR = Q3 − Q1)
• Variance and standard deviation (sd)
• Computation
• Chebyshev’s Inequality
• Relative dispersion: the coefficient of variation
$$\mathrm{C.V.} = \frac{\sigma}{\mu} \times 100\%$$

Variance and the standard deviation
Variance is the average squared deviation from the mean:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2$$
[Figure: data points X1, Xi, X2, XN marked with x on a number line around the mean µ]
σ, the standard deviation, is the positive square root of the variance.
In the sample variance calculation, use the denominator (n − 1):
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
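The variance, standard deviation and coefficient of variation formulas can be checked numerically. What follows is a minimal Python sketch, not part of the original slides: the data values are made up for illustration and the function names are hypothetical.

```python
from math import sqrt

def population_variance(xs):
    """sigma^2: average squared deviation from the mean (divide by N)."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """S^2: the same sum of squares, divided by n - 1 instead of n."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def coefficient_of_variation(xs):
    """C.V. = (sigma / mu) * 100, treating xs as a whole population."""
    mu = sum(xs) / len(xs)
    return sqrt(population_variance(xs)) / mu * 100

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only; their mean is 5

print("population variance:", population_variance(data))
print("population s.d.:    ", sqrt(population_variance(data)))
print("sample variance:    ", sample_variance(data))
print("C.V. in percent:    ", coefficient_of_variation(data))
```

For this data set the squared deviations sum to 32, so the population variance is 32/8 = 4 while the sample variance is 32/7, slightly larger because of the (n − 1) denominator.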
Chebyshev’s Inequality
• At least (1 − 1/k²) proportion of the data must be within k standard deviations of the mean.
• Here k is a number (not necessarily an integer) greater than 1.
• The statement is valid for ANY distribution, discrete or continuous, symmetric or otherwise.
• If a r.v. X has mean µ and standard deviation σ, then an equivalent probability statement is:
$$P[\,|X - \mu| > k\sigma\,] \le \frac{1}{k^2}$$

Illustration
[Figure: number line marked µ−3σ, µ−2σ, µ−σ, µ, µ+σ, µ+2σ, µ+3σ]
• At least 75% of the data lie within µ ± 2σ (k = 2)
• At least 88.89% of the data lie within µ ± 3σ (k = 3)
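To see that the bound holds even for a badly skewed data set, the following Python sketch compares the Chebyshev lower bound with the observed proportion. The data values and function names are assumptions made for illustration; the standard deviation is the population version (divide by N).

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 1 - 1 / k**2

def proportion_within(xs, k):
    """Observed proportion of xs within k population s.d. of their mean."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = (sum((x - mu) ** 2 for x in xs) / n) ** 0.5
    return sum(abs(x - mu) <= k * sigma for x in xs) / n

data = [1, 1, 2, 2, 3, 3, 4, 5, 9, 30]  # deliberately skewed, made-up values

for k in (1.5, 2, 3):
    bound = chebyshev_lower_bound(k)
    observed = proportion_within(data, k)
    print(f"k={k}: at least {bound:.4f}, observed {observed:.4f}")
```

The observed proportion is never below the bound, but it is often well above it: Chebyshev's result is a guarantee for every distribution, not a tight estimate for any particular one.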
The Coefficient of Variation:
A measure of relative dispersion
$$\frac{\sigma}{\mu} \times 100\% \quad \text{or} \quad \frac{S}{\bar{X}} \times 100\%$$
• Unit free
• Amenable to comparison
• Often expressed in terms of percentages

Understanding Standard deviation
The 294 MLAs of WB have an average wealth of 68 Lakh and s.d. = 10 Lakh. What does that tell you?
In particular, what can you say about the % of MLAs having wealth
• between 58L and 78L?
• between 53L and 83L?
• less than 53L or more than 83L?
• between 50L and 1 crore?
• more than 1 crore?
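One way to approach the MLA wealth questions is to apply Chebyshev's inequality with mean 68 lakh and s.d. 10 lakh. The Python sketch below is an illustration under that assumption only (nothing is known about the shape of the distribution); the helper names are hypothetical. For an interval that is not symmetric about the mean, it uses the largest symmetric interval that fits inside it, which gives a valid but conservative bound.

```python
MEAN, SD = 68.0, 10.0  # lakh; both figures are taken from the slide

def k_for_interval(low, high):
    """Largest k for which mean +/- k*sd still fits inside [low, high]."""
    return min(MEAN - low, high - MEAN) / SD

def at_least_inside(low, high):
    """Chebyshev lower bound on the share of values inside [low, high]."""
    k = k_for_interval(low, high)
    return max(0.0, 1 - 1 / k**2)  # the bound says nothing when k <= 1

def at_most_outside(low, high):
    """Chebyshev upper bound on the share of values outside [low, high]."""
    return 1 - at_least_inside(low, high)

print(f"58L to 78L:   at least {at_least_inside(58, 78):.2%}")
print(f"53L to 83L:   at least {at_least_inside(53, 83):.2%}")
print(f"<53L or >83L: at most  {at_most_outside(53, 83):.2%}")
print(f"50L to 100L:  at least {at_least_inside(50, 100):.2%}")
# Wealth above 1 crore (100L) is more than 3.2 s.d. away from the mean,
# so it is covered by the two-sided bound for the interval 36L to 100L.
print(f">100L:        at most  {at_most_outside(36, 100):.2%}")
```

Under these assumptions the bounds are roughly 0%, 55.6%, 44.4%, 69.1% and 9.8%. The first is uninformative because k = 1, which is why the inequality is stated for k greater than 1.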
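2. Box Plot
Elements of a Box Plot
• Box: drawn from Q1 to Q3 with the median marked inside; its length is the interquartile range (IQR)
• Inner fences: Q1 − 1.5(IQR) and Q3 + 1.5(IQR)
• Outer fences: Q1 − 3(IQR) and Q3 + 3(IQR)
• Whiskers: extend to the smallest data point not below the inner fence and to the largest data point not exceeding the inner fence
• Suspected outlier: a point between an inner fence and an outer fence
• Outlier: a point beyond an outer fence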
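The fence rules can be turned into a short computation. The Python sketch below is illustrative only: the data are made up, the function name is hypothetical, and the quartiles use the "inclusive" method of the standard library's `statistics.quantiles`, which is one of several quartile conventions and may differ from the one used in class.

```python
import statistics

def box_plot_summary(xs):
    """Quartiles, fences, whisker ends and flagged points for a box plot."""
    xs = sorted(xs)
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    # Points on or inside the inner fences decide where the whiskers end.
    inside = [x for x in xs if inner[0] <= x <= inner[1]]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "inner_fences": inner,
        "outer_fences": outer,
        "whiskers": (inside[0], inside[-1]),
        # Beyond an inner fence but not beyond the outer fence.
        "suspected_outliers": [x for x in xs
                               if outer[0] <= x <= outer[1]
                               and not inner[0] <= x <= inner[1]],
        # Beyond an outer fence.
        "outliers": [x for x in xs if x < outer[0] or x > outer[1]],
    }

data = [2, 9, 10, 11, 12, 13, 13, 14, 15, 15, 18, 22, 40]  # made-up values

for name, value in box_plot_summary(data).items():
    print(f"{name}: {value}")
```

With these values Q1 = 11 and Q3 = 15, so the inner fences are 5 and 21 and the outer fences are −1 and 27. The points 2 and 22 are flagged as suspected outliers and 40 as an outlier.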
Review Descriptive Statistics
• Graphical representations, frequency distribution
• Measures of central tendency and dispersion
– Computation and interpretation
– Chebyshev’s result
• Skewness and Kurtosis
• Outliers
– What are they? How to detect?
– What to do if there are outliers?

Poll Forecasting – Exit polls
• Exit Polls in US election 2000 in the critical state of Florida
• Indian Election 2004, 2009
• UP 2012
• Karnataka 2008
A simple example: Overview
Population = all projects undertaken by a company.
An unknown proportion π of them took longer than scheduled.
[Figure: the population of projects shown as points, with a region labelled "Delay"]
A random sample of n projects is selected. Y of them are found to be delayed.
(Sample outcome) Y is random, but the randomness depends on π.
Given the value of Y, one can make an objective inference about π.

Some of the questions to be answered
• To what degree can the randomness in Y be attributed to sampling fluctuations?
• How close is p = Y/n to π?
• If we want p to be within ±0.05 of π, how many projects do we need to look at?
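To get a feel for how the sample proportion p = Y/n behaves around π, one can compute the chance that p falls within ±0.05 of π for several sample sizes. The Python sketch below rests on assumptions that are not in the slides: Y is modelled as Binomial(n, π), which is reasonable when the sample is small relative to the population, and π = 0.3 is a hypothetical value chosen only for illustration.

```python
from math import comb

def prob_within(n, pi, margin=0.05):
    """P(|Y/n - pi| <= margin) when Y ~ Binomial(n, pi) (assumed model)."""
    total = 0.0
    for y in range(n + 1):
        # Small tolerance so that boundary values survive float rounding.
        if abs(y / n - pi) <= margin + 1e-12:
            total += comb(n, y) * pi**y * (1 - pi) ** (n - y)
    return total

PI = 0.3  # hypothetical true proportion of delayed projects

for n in (25, 100, 400):
    print(f"n={n}: P(p within 0.05 of pi) = {prob_within(n, PI):.3f}")
```

The probability rises with n, which is the sense in which a larger sample reduces sampling fluctuation. Since π is unknown in practice, a calculation like this would have to be repeated over a range of plausible values of π.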
3. Myth and Mystery of Probability
• What is the chance of getting any rain today on the campus?
• What is the probability that India will win WC2015?
• What is the chance that India’s space mission will send a human being to the moon by 2020?

Overview of Probability
• Approaches for defining probability
• Basic Probability rules
• Conditional probability and notion of independence
• Bayes’ rule
Approaches for defining probability
• Classical approach
• (Asymptotic) Relative frequency approach
• Subjective probability

Probability Laws
• 0 ≤ P[A] ≤ 1
• P[impossible event] = 0; P[sure event] = 1
• P[A or B] = P[A] + P[B] − P[AB], where AB denotes "A and B"
• In particular, P[not A] = 1 − P[A]
• Look at the Venn diagram and write down other formulae like
– P[A] = P[A and B] + P[A and (not B)]
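The probability laws can be verified on a small sample space using the classical approach (equally likely outcomes). The Python sketch below uses a fair six-sided die as an assumed example; the events A and B and the helper `prob` are hypothetical names chosen for illustration.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # outcomes of one fair die (assumed example)

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event & OMEGA), len(OMEGA))

A = {2, 4, 6}        # "the roll is even"
B = {4, 5, 6}        # "the roll is greater than 3"
not_A = OMEGA - A
not_B = OMEGA - B

# Each check mirrors one of the probability laws.
assert 0 <= prob(A) <= 1
assert prob(set()) == 0 and prob(OMEGA) == 1
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
assert prob(not_A) == 1 - prob(A)
assert prob(A) == prob(A & B) + prob(A & not_B)

print("P[A or B] =", prob(A | B))
```

Using `Fraction` keeps the arithmetic exact, so the equalities can be tested directly instead of up to rounding error.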