Statistics - Probability theory 1

Outline
Topics
Probability, sample space, random variable
Probability distribution
Expected value
Variance
Moments
Linear transformations of random variables
Joint distributions

Applied Statistics for Economics
2. Introduction to Probability Theory

SFC - juliohuato@gmail.com

Spring 2012

Topics

The main topics in this chapter are:
random variables and probability distributions,
expected values: mean and variance, and
two random variables considered jointly.


Probability
The world in motion is viewed as a set of random processes or
random experiments.
Randomness means that, no matter how far our understanding of the
world advances, some ignorance or uncertainty always remains. Given
specific causes, we do not fully know which states of the world will
result; given specific states of the world, we do not fully know what
caused them.
In short, we are uncertain (more plainly, ignorant) about the specific
causes or, alternatively, the effects involved in these processes.

Examples of random processes: Your meeting the next person, SFC
students commuting to school, residents of the U.S. producing new
goods in a given year, etc.
Why are they random? Because we are uncertain about the gender
or the age of the next person you’ll meet, the commuting time of
SFC students or the means of transportation they will use, the
annual gross domestic product in the U.S. or its composition, etc.


The mutually exclusive possible results of these experiments are
called the outcomes. E.g. the next person you’ll meet could be
female or male, young or old; SFC students may take a few or
many minutes to commute to school; U.S. annual GDP may go up
or down by some amount compared to the previous year, etc.


Probability: the degree of belief that the outcome of an
experiment will be a particular one.
How do we decide which probability to assign to a particular outcome of an
experiment (e.g. that the next person you meet will be female)? How do we
make this decision in a well-informed, disciplined, scientific way? [1]
One can only use experience, individual or collective, i.e. history. We
may keep a record of the gender of the people we meet over time and use
the data compiled to inform our belief, or look at records on the gender
composition of the local population, etc.

[1] In a sense, the whole purpose of statistics is to determine probabilities or,
alternatively, expectations based on probabilities.

Sample space, event

Sample space or population: the set of all the possible outcomes of
the experiment. E.g. the sample space of the experiment of
flipping a coin once is: S = {H, T}. [2]
Event: a subset of the sample space, i.e. a set of one or more
outcomes. E.g. the event (M ≤ 1) that your car will “need one
repair at most” includes “no repairs” (M = 0) and “one repair”
(M = 1).

[2] We rule out ‘freak’ possibilities, like the coin landing on its edge.

Random variables

Random variable (r.v.): a numerical summary of a random
outcome. For example, G = g , where (e.g.) g is 0 if ‘male’ and 1
if ‘female’. The number of times your car needs repair during a
given month: M = m, where m = 0, 1, 2, 3, . . .. The time it takes
for SFC students to commute to school: T = t, where t is time in
minutes.


There are discrete and continuous random variables. Gender,
summarized as 0 if ‘male’ and 1 if ‘female’, and the number of
car repairs in a month are discrete random variables. The
commuting time, if recorded in fractions of an hour (or even
fractions of minutes, seconds, etc.), can be regarded as a
continuous r.v.


Probability distribution of a discrete r.v.

The probability distribution of a discrete r.v. is a list of all the values of
the r.v. and the probability associated with each value. By
convention, each probability is a number between 0 and 1, where 0
means impossibility and 1 means full certainty; the probabilities over the
sample space must add up to 1. E.g. let G, with values 0 and 1, be the r.v.
‘gender of the next person you’ll meet’. Then:

g    Pr(G = g)
0    0.45
1    0.55



Using the information in the probability distribution, you can compute
the probability of a given event. E.g. the probability that you’ll meet ‘a
male or a female’:

Pr(G = 0 or G = 1) = Pr(G = 0)+Pr(G = 1) = 0.45+0.55 = 1 = 100%.

In words, we are completely certain that you’ll meet either a male or a
female the next time you meet a person.


Admittedly, the previous example is trivial. But consider the probability
distribution of your car needing repair(s) in a given month. The r.v.
‘number of repairs’ needed is denoted as M:

M Pr(M = m)
0 0.80
1 0.10
2 0.06
3 0.03
4 0.01

What’s the probability that the car will need one or two repairs in a
month? Answer:
Pr(M = 1 or M = 2) = Pr(M = 1) + Pr(M = 2) = 0.10 + 0.06 = 0.16 = 16%.



The cumulative probability distribution (also known as a
‘cumulative distribution function’ or c.d.f.) is the probability that
the random variable is less than or equal to a particular value. The
first two columns of the following table are the same as in the
previous table. The last column gives the c.d.f.:

m    Pr(M = m)    Pr(M ≤ m)
0    0.80         0.80
1    0.10         0.90
2    0.06         0.96
3    0.03         0.99
4    0.01         1.00
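
The c.d.f. column is a running sum of the probability column. A minimal Python sketch of that calculation follows; it assumes the distribution of M is stored in a dictionary, and the names `pmf`, `cdf`, and `prob_event` are illustrative only, not part of the course material.

```python
from itertools import accumulate

# Probability distribution of M (number of car repairs in a month),
# copied from the table above.
pmf = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}

# A running sum of the probabilities gives Pr(M <= m) for each m.
values = sorted(pmf)
cdf = dict(zip(values, accumulate(pmf[m] for m in values)))


def prob_event(event):
    """Probability of an event, given as a collection of values of M."""
    return sum(pmf[m] for m in event)


for m in values:
    print(m, pmf[m], round(cdf[m], 2))

# Probability of one or two repairs: Pr(M = 1) + Pr(M = 2).
print(round(prob_event({1, 2}), 2))
```

The same `prob_event` helper reproduces the 16% answer for "one or two repairs", because the outcomes in an event are mutually exclusive and their probabilities simply add.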



A binary discrete r.v. (e.g. G = 0, 1) is called a Bernoulli r.v. The
Bernoulli distribution is:
\[
G = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}
\]

where p is the probability of the next person being ‘female’.


Probability distribution of a continuous r.v.

The cumulative probability distribution of a continuous r.v. is also
the probability that the random variable is less than or equal to a
particular value.
The probability density function (p.d.f.) of a continuous random
variable summarizes how probability is spread over the values of the
random variable: the area under the p.d.f. between two values is the
probability that the variable falls between them.
The mathematical description of the p.d.f. of a continuous variable
requires familiarity with calculus, so we’ll skip it for now.
NB: Strictly speaking, the probability that a continuous random variable takes any particular value is zero. We can only
speak of the probability of the random variable falling in a range (between two given values).
NB2: The p.d.f. and the c.d.f. show the same information in different formats.


Characteristics of a r.v. distribution

In the practice of statistics, two basic measures are used
extensively to characterize the distribution of a r.v.:
the expected value or mean (or average) and
the variance.


Expected value
The expected value of a r.v. X, written E(X), is the long-run average value of the
variable over many repeated trials.
It is computed as a weighted average of the possible outcomes,
where the weights are the probabilities of the outcomes. It is also
called the mean of X and denoted by µX. For a discrete r.v. that takes
the values x1, ..., xk with probabilities p1, ..., pk:
\[
E(X) = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k = \sum_{i=1}^{k} x_i p_i
\]

E.g.: You loan $100 to your friend for a year at 10% interest.
There’s a 99% chance he’ll repay the loan and 1% he won’t.
What’s the expected value of your loan at maturity?

Answer:
($110 × 0.99) + ($0 × .01) = $108.90
E.g.: What’s the expected value or average of the number of car
repairs per month (M)? See the table above.
Answer:

E(M) = (0 × 0.80) + (1 × 0.10) + (2 × 0.06) + (3 × 0.03) + (4 × 0.01) = 0.35
What does that mean?
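
One way to read the 0.35: no single month has 0.35 repairs, but over many months the average number of repairs per month would be close to 0.35. The weighted-average calculation can be sketched in Python as follows; the function name `expected_value` and the dictionary layout are assumptions made for illustration.

```python
# Probability distribution of M (number of car repairs in a month).
pmf = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}


def expected_value(pmf):
    """Weighted average of the values, with the probabilities as weights."""
    return sum(value * prob for value, prob in pmf.items())


mean_m = expected_value(pmf)
print(round(mean_m, 2))  # 0.35

# The loan example: $110 with probability 0.99, $0 with probability 0.01.
loan = {110: 0.99, 0: 0.01}
print(round(expected_value(loan), 2))  # 108.9
```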
E.g.: In general, what’s the expected value of a Bernoulli r.v. with
Pr(G = 1) = p and Pr(G = 0) = 1 − p?


Answer:
E (G ) = (1 × p) + (0 × (1 − p)) = p
Note 1: Think of the operator E (.) as a function that transforms
data on a variable by multiplying each value of the variable by its
probability and then adding up all the products.
Note 2: The formula for the expected value of a continuous r.v.
requires calculus. So we’ll skip it for now.


Variance and standard deviation
The variance of a r.v. Y is:

\[
\operatorname{var}(Y) = \sigma_Y^2 = E[(Y - \mu_Y)^2]
\]
The standard deviation is the positive square root of the variance σY²:
\[
\operatorname{s.d.}(Y) = \sigma_Y = +\sqrt{\sigma_Y^2}
\]
Basically, the s.d. gives the same information as the variance, but
in units that are easier to understand. The units of the standard
deviation are the same units as Y and µY .
What is the intuition behind the variance and/or the standard
deviation?


For a discrete r.v.:
\[
\operatorname{var}(Y) = \sigma_Y^2 = E[(Y - \mu_Y)^2] = \sum_{i=1}^{k} (y_i - \mu_Y)^2 p_i
\]
\[
\operatorname{s.d.}(Y) = \sigma_Y = \sqrt{\sum_{i=1}^{k} (y_i - \mu_Y)^2 p_i}
\]

E.g.: What are the var. and s.d. of the number of car repairs per
month (M)?


Answer:

\[
\operatorname{var}(M) = (0 - 0.35)^2 \times 0.80 + (1 - 0.35)^2 \times 0.10 + (2 - 0.35)^2 \times 0.06
+ (3 - 0.35)^2 \times 0.03 + (4 - 0.35)^2 \times 0.01 = 0.6475
\]
\[
\operatorname{s.d.}(M) = \sqrt{0.6475} \approx 0.80
\]
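
The same arithmetic can be checked with a short Python sketch. The helper names `mean` and `variance` are illustrative assumptions; the numbers are the car-repair distribution from the earlier table.

```python
import math

# Probability distribution of M (number of car repairs in a month).
pmf = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}


def mean(pmf):
    """Probability-weighted average of the values."""
    return sum(y * p for y, p in pmf.items())


def variance(pmf):
    """Probability-weighted average of squared deviations from the mean."""
    mu = mean(pmf)
    return sum((y - mu) ** 2 * p for y, p in pmf.items())


var_m = variance(pmf)
sd_m = math.sqrt(var_m)
print(round(var_m, 4))  # 0.6475
print(round(sd_m, 2))   # 0.8
```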
E.g.: What are the var. and s.d. of a Bernoulli r.v.?


Answer:
\[
\operatorname{var}(G) = (0 - p)^2 (1 - p) + (1 - p)^2 p = p(1 - p)
\]
\[
\operatorname{s.d.}(G) = \sqrt{p(1 - p)}
\]
Note 1: Think of the operator var(.) as a function that transforms
data on a variable by taking the distance or difference between
each value of the variable and its mean, squaring that difference,
multiplying it by the respective probability, and then adding up all
the products.
Note 2: Think of the operator s.d.(.) as a function that transforms
data on a variable by taking the distance or difference between
each value of the variable and its mean, squaring that difference,
multiplying it by the respective probability, adding up all the
products, and then taking the positive square root of the sum.
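
The simplification of the Bernoulli variance to p(1 − p) skips a factoring step. Written out, using µG = p from the expected-value example:

```latex
\begin{align*}
\operatorname{var}(G) &= (0 - p)^2 (1 - p) + (1 - p)^2 p \\
                      &= p^2 (1 - p) + p (1 - p)^2 \\
                      &= p (1 - p) \, [\, p + (1 - p) \,] \\
                      &= p (1 - p)
\end{align*}
```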

Moments

More formally, in statistics, the characteristics of the distribution of
a r.v. are called moments.
E(Y) is the first moment, E(Y²) is the second moment, and
E(Y^r) is the r-th moment of Y. The first moment is the mean and
it is a measure of the center of the distribution, the second
moment is a measure of its dispersion or spread, and r-th moments
for r > 2 measure other aspects of the distribution’s shape.
Clearly, the second moment of the distribution is intimately related
to the variance. How?


Two other measures of the shape of a distribution, based on higher
moments, are:
Skewness:
\[
\text{Skewness} = \frac{E[(Y - \mu_Y)^3]}{\sigma_Y^3}
\]
For a symmetric distribution, the skewness is zero. If the distribution has
a long left tail, the skewness is negative. If the distribution has a long
right tail, the skewness is positive.
Kurtosis:
\[
\text{Kurtosis} = \frac{E[(Y - \mu_Y)^4]}{\sigma_Y^4}
\]
For a distribution with heavy tails (outliers are likely), the kurtosis is large.
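
Both measures are ratios of a central moment to a power of the standard deviation, so one helper can serve for both. The sketch below applies the formulas to the car-repair distribution of M used earlier; the function name `central_moment` is an assumption made for illustration.

```python
# Probability distribution of M (number of car repairs in a month).
pmf = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}


def central_moment(pmf, r):
    """E[(Y - mu)^r] for a discrete distribution."""
    mu = sum(y * p for y, p in pmf.items())
    return sum((y - mu) ** r * p for y, p in pmf.items())


sigma = central_moment(pmf, 2) ** 0.5  # standard deviation
skewness = central_moment(pmf, 3) / sigma ** 3
kurtosis = central_moment(pmf, 4) / sigma ** 4

print(round(skewness, 2), round(kurtosis, 2))
```

For this distribution the skewness comes out positive, which is consistent with its long right tail: most months have no repairs, and a few have several.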

Mean of a linear function of a r.v.
Consider the income tax schedule:
\[
Y = 2{,}000 + 0.8X
\]
where X is pre-tax earnings and Y is after-tax earnings. What is the
marginal tax rate?
Suppose an individual’s pre-tax earnings next year are a r.v. with mean
µX and variance σX². Since her pre-tax earnings are random, her after-tax
earnings are random as well, with the following mean:
\[
E(Y) = \mu_Y = 2{,}000 + 0.8\mu_X
\]
Why? Remember that the operator E(Y) means “multiply each value of
Y by its probability and add up the results.” Each value of Y is
2,000 + 0.8x and the probabilities add up to 1, so the constant 2,000
passes through unchanged and the factor 0.8 multiplies the mean of X.

Variance of a linear function of a r.v.
In turn, the variance of Y is:
\[
\operatorname{var}(Y) = \sigma_Y^2 = E[(Y - \mu_Y)^2].
\]
Since Y = 2,000 + 0.8X, then
\[
Y - \mu_Y = (2{,}000 + 0.8X) - (2{,}000 + 0.8\mu_X) = 0.8(X - \mu_X).
\]
Therefore:
\[
E[(Y - \mu_Y)^2] = E\{[0.8(X - \mu_X)]^2\} = 0.64\,E[(X - \mu_X)^2].
\]
That is: σY² = 0.64σX².
And taking the positive square root:
\[
\sigma_Y = 0.8\sigma_X
\]


Mean and var. of a linear function of a r.v.

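More generally, if X and Y are r.v.’s related by Y = a + bX, then:
\[
\mu_Y = a + b\mu_X
\]
\[
\sigma_Y^2 = b^2 \sigma_X^2
\]
\[
\sigma_Y = |b|\,\sigma_X
\]
(The absolute value keeps the standard deviation positive when b is negative.)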
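
These rules can be checked numerically by transforming each value of X and recomputing the mean and variance directly. In the sketch below, the three-point distribution of pre-tax earnings is a made-up illustration (it is not from the course); only the schedule Y = 2,000 + 0.8X comes from the text.

```python
# Hypothetical distribution of pre-tax earnings X (values in dollars).
pmf_x = {20_000: 0.25, 40_000: 0.50, 60_000: 0.25}
a, b = 2_000, 0.8  # the tax schedule Y = 2,000 + 0.8 X from the text


def mean(pmf):
    return sum(v * p for v, p in pmf.items())


def variance(pmf):
    mu = mean(pmf)
    return sum((v - mu) ** 2 * p for v, p in pmf.items())


# Distribution of Y: each value of X is transformed, probabilities unchanged.
pmf_y = {a + b * x: p for x, p in pmf_x.items()}

# Direct calculation on Y versus the shortcut formulas.
print(mean(pmf_y), a + b * mean(pmf_x))
print(variance(pmf_y), b ** 2 * variance(pmf_x))
```

Each printed pair should agree up to floating-point rounding: the left value is computed from the distribution of Y, the right one from the formulas µY = a + bµX and σY² = b²σX².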


Two random variables

We now deal with the distribution of two random variables
considered together.
The joint probability distribution of two random variables X and Y
gives the probability that the random variables simultaneously take
certain values, Pr(X = x, Y = y).
The marginal probability distribution of a random variable Y is just
its own probability distribution. The term “marginal” distinguishes the
distribution of Y alone from the joint distribution of Y and
(an)other variable(s).


Multi-variate distributions
The following table shows relative frequencies (probabilities):

Joint distribution of weather conditions and commuting times
                         Rain (X = 0)   No rain (X = 1)   Total
Long commute (Y = 0)     0.15           0.07              0.22
Short commute (Y = 1)    0.15           0.63              0.78
Total                    0.30           0.70              1.00

The cells show the joint probabilities. The marginal probabilities
(the marginal distribution) of Y can be calculated from the joint
distribution of X and Y. If X can take l different values x1, ..., xl,
then:
\[
\Pr(Y = y) = \sum_{i=1}^{l} \Pr(X = x_i, Y = y)
\]
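
The marginal probabilities are the "Total" row and column of the table: each is a sum of joint probabilities over the other variable. A minimal Python sketch, assuming the joint distribution is stored as a dictionary keyed by (x, y); the function names are illustrative.

```python
# Joint distribution Pr(X = x, Y = y), keyed by (x, y).
# X: 0 = rain, 1 = no rain.  Y: 0 = long commute, 1 = short commute.
joint = {
    (0, 0): 0.15, (1, 0): 0.07,
    (0, 1): 0.15, (1, 1): 0.63,
}


def marginal_y(joint):
    """Pr(Y = y): sum the joint probabilities over all values of X."""
    result = {}
    for (x, y), p in joint.items():
        result[y] = result.get(y, 0.0) + p
    return result


def marginal_x(joint):
    """Pr(X = x): sum the joint probabilities over all values of Y."""
    result = {}
    for (x, y), p in joint.items():
        result[x] = result.get(x, 0.0) + p
    return result


print({y: round(p, 2) for y, p in marginal_y(joint).items()})
print({x: round(p, 2) for x, p in marginal_x(joint).items()})
```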


Conditional distribution

The conditional probability that Y takes the value y when X is
known to take the value x is written Pr (Y = y |X = x).
The conditional distribution of Y given X = x is:
\[
\Pr(Y = y \mid X = x) = \frac{\Pr(X = x, Y = y)}{\Pr(X = x)}
\]
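
Applied to the weather and commuting table, the formula divides each joint probability in a column by that column's total. A sketch under the same assumed dictionary layout (the function name `conditional_y_given_x` is illustrative):

```python
# Joint distribution Pr(X = x, Y = y), keyed by (x, y).
# X: 0 = rain, 1 = no rain.  Y: 0 = long commute, 1 = short commute.
joint = {
    (0, 0): 0.15, (1, 0): 0.07,
    (0, 1): 0.15, (1, 1): 0.63,
}


def conditional_y_given_x(joint, x):
    """Pr(Y = y | X = x) = Pr(X = x, Y = y) / Pr(X = x)."""
    pr_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / pr_x for (xi, y), p in joint.items() if xi == x}


given_rain = conditional_y_given_x(joint, 0)
given_no_rain = conditional_y_given_x(joint, 1)

print({y: round(p, 2) for y, p in given_rain.items()})
print({y: round(p, 2) for y, p in given_no_rain.items()})
```

Given rain, a long commute has probability 0.15 / 0.30 = 0.5; given no rain, it has probability 0.07 / 0.70 = 0.1.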


Conditional mean
Consider the following table:

Joint and conditional distribution of M and A
                     M = 0   M = 1   M = 2   M = 3   M = 4   Total
Joint distribution
Old car (A = 0)      0.35    0.065   0.05    0.025   0.01    0.50
New car (A = 1)      0.45    0.035   0.01    0.005   0.00    0.50
Total                0.80    0.10    0.06    0.03    0.01    1.00
Conditional distribution
Pr(M | A = 0)        0.70    0.13    0.10    0.05    0.02    1.00
Pr(M | A = 1)        0.90    0.07    0.02    0.01    0.00    1.00

The conditional expectation of Y given X (or conditional mean of
Y given X) is the mean of the conditional distribution of Y given
X:
\[
E(Y \mid X = x) = \sum_{i=1}^{k} y_i \Pr(Y = y_i \mid X = x).
\]
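
As an illustration, the conditional mean of the number of repairs M given the age of the car A can be computed from the conditional rows of the table. The dictionary layout and the name `conditional_mean` are assumptions for this sketch.

```python
# Conditional distributions Pr(M = m | A = a) from the table above.
cond = {
    0: {0: 0.70, 1: 0.13, 2: 0.10, 3: 0.05, 4: 0.02},  # old car
    1: {0: 0.90, 1: 0.07, 2: 0.02, 3: 0.01, 4: 0.00},  # new car
}


def conditional_mean(cond, a):
    """E(M | A = a): weighted average using the conditional probabilities."""
    return sum(m * p for m, p in cond[a].items())


print(round(conditional_mean(cond, 0), 2))  # 0.56
print(round(conditional_mean(cond, 1), 2))  # 0.14
```

An old car is expected to need 0.56 repairs per month, against 0.14 for a new car.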

Law of iterated expectations

The mean height of adults is the weighted average of the mean
height of men and the mean height of women, weighted by the
proportions of men and women. More generally:
\[
E(Y) = \sum_{i=1}^{l} E(Y \mid X = x_i) \Pr(X = x_i).
\]

In other terms:
E (Y ) = E [E (Y |X )].
This is called the law of iterated expectations. If E (Y |X ) = 0 then
E (Y ) = 0.
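
The height example has no numbers attached, so the sketch below checks the law on the car-repair data instead: the conditional means of M given the car's age A, weighted by Pr(A = a), should reproduce the overall mean of M. Variable names are illustrative assumptions.

```python
# Conditional distributions Pr(M = m | A = a) and the marginal Pr(A = a),
# taken from the joint and conditional distribution table of M and A.
cond = {
    0: {0: 0.70, 1: 0.13, 2: 0.10, 3: 0.05, 4: 0.02},  # old car
    1: {0: 0.90, 1: 0.07, 2: 0.02, 3: 0.01, 4: 0.00},  # new car
}
pr_a = {0: 0.50, 1: 0.50}

# Marginal distribution of M (the "Total" row of the joint table).
pmf_m = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}

# Inner expectation: E(M | A = a) for each a.
cond_mean = {a: sum(m * p for m, p in dist.items()) for a, dist in cond.items()}

# Outer expectation: weight the conditional means by Pr(A = a).
mean_iterated = sum(cond_mean[a] * pr_a[a] for a in pr_a)

# Direct calculation from the marginal distribution of M.
mean_direct = sum(m * p for m, p in pmf_m.items())

print(round(mean_iterated, 2), round(mean_direct, 2))  # 0.35 0.35
```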


Conditional variance

The variance of Y conditional on X is the variance of the
conditional distribution of Y given X :
\[
\operatorname{var}(Y \mid X = x) = \sum_{i=1}^{k} [y_i - E(Y \mid X = x)]^2 \Pr(Y = y_i \mid X = x).
\]

Example.
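
One possible example, not taken from the slides themselves: the variance of the number of repairs M conditional on the car's age A, using the conditional distributions in the table of M and A. The function name `conditional_variance` is an assumption.

```python
# Conditional distributions Pr(M = m | A = a) from the table of M and A.
cond = {
    0: {0: 0.70, 1: 0.13, 2: 0.10, 3: 0.05, 4: 0.02},  # old car
    1: {0: 0.90, 1: 0.07, 2: 0.02, 3: 0.01, 4: 0.00},  # new car
}


def conditional_variance(cond, a):
    """var(M | A = a): squared deviations from the conditional mean,
    weighted by the conditional probabilities."""
    mu = sum(m * p for m, p in cond[a].items())
    return sum((m - mu) ** 2 * p for m, p in cond[a].items())


print(round(conditional_variance(cond, 0), 4))  # 0.9864
print(round(conditional_variance(cond, 1), 4))  # 0.2204
```

The number of repairs is more dispersed for old cars than for new ones.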


Independence

Two r.v.’s X and Y are independently distributed (i.e.
independent) if knowing the value of one of them gives no
information about the other, that is, if the conditional distribution
of Y given X equals the marginal distribution of Y . Formally, X
and Y are independent if, for all values x and y ,

Pr(Y = y |X = x) = Pr(Y = y ) or
Pr(X = x, Y = y ) = Pr(X = x) Pr(Y = y )
In other words, the joint distribution of X and Y is the product of their
marginal distributions.
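
The product condition gives a direct test: compare every joint probability with the product of the corresponding marginals. The sketch below applies it to the weather and commuting table; the function names and the tolerance `tol` are assumptions made for illustration.

```python
# Joint distribution of weather (X) and commuting time (Y), keyed by (x, y).
weather = {
    (0, 0): 0.15, (1, 0): 0.07,
    (0, 1): 0.15, (1, 1): 0.63,
}


def marginals(joint):
    """Return the marginal distributions of X and Y."""
    pr_x, pr_y = {}, {}
    for (x, y), p in joint.items():
        pr_x[x] = pr_x.get(x, 0.0) + p
        pr_y[y] = pr_y.get(y, 0.0) + p
    return pr_x, pr_y


def is_independent(joint, tol=1e-9):
    """True if every joint probability equals the product of the marginals.

    Assumes every (x, y) combination appears in `joint`, including
    combinations with probability zero.
    """
    pr_x, pr_y = marginals(joint)
    return all(abs(p - pr_x[x] * pr_y[y]) <= tol for (x, y), p in joint.items())


# A table built as the product of the same marginals is independent by design.
pr_x, pr_y = marginals(weather)
product = {(x, y): pr_x[x] * pr_y[y] for x in pr_x for y in pr_y}

print(is_independent(weather))  # False
print(is_independent(product))  # True
```

Weather and commuting time are not independent here: Pr(X = 0, Y = 0) = 0.15, while Pr(X = 0) Pr(Y = 0) = 0.30 × 0.22 = 0.066.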


Covariance
The covariance between two r.v.’s X and Y measures the extent to
which they move together. The covariance is the expected value of
the product of the deviations of the variables from their expected
values. The first equation below is the general formula of the
covariance. The second equation is specific to discrete r.v.’s and it
assumes that X can take on l values and Y can take on k values:
\[
\operatorname{cov}(X, Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)]
\]
\[
\operatorname{cov}(X, Y) = \sum_{i=1}^{k} \sum_{j=1}^{l} (x_j - \mu_X)(y_i - \mu_Y) \Pr(X = x_j, Y = y_i).
\]
Note that −∞ < σXY < +∞. How do you interpret the covariance
formula?
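
As an illustration of the discrete formula, the covariance between weather (X) and commuting time (Y) can be computed from the joint table shown earlier. The dictionary layout and the function name `covariance` are assumptions for this sketch.

```python
# Joint distribution Pr(X = x, Y = y), keyed by (x, y).
# X: 0 = rain, 1 = no rain.  Y: 0 = long commute, 1 = short commute.
joint = {
    (0, 0): 0.15, (1, 0): 0.07,
    (0, 1): 0.15, (1, 1): 0.63,
}


def covariance(joint):
    """Sum of (x - mu_X)(y - mu_Y) weighted by the joint probabilities."""
    mu_x = sum(x * p for (x, _), p in joint.items())
    mu_y = sum(y * p for (_, y), p in joint.items())
    return sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())


cov_xy = covariance(joint)
print(round(cov_xy, 3))  # 0.084
```

The covariance is positive: X and Y tend to be above their means together (no rain with a short commute) or below them together (rain with a long commute).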

Correlation

The problem with the covariance is that it is not bounded. Its size
depends on the units of X and Y and is, thus, hard to interpret.
The correlation between X and Y is another measure of their
covariation. But, unlike the covariance, the correlation eliminates
the ‘units’ problem. Its formula is:
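\[
\operatorname{corr}(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{var}(X)\operatorname{var}(Y)}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}
\]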

Note that −1 ≤ corr(X , Y ) ≤ 1.
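
A sketch of the calculation for the weather and commuting table, assuming the joint distribution is stored as a dictionary keyed by (x, y); the function name `correlation` is illustrative.

```python
import math

# Joint distribution Pr(X = x, Y = y), keyed by (x, y).
# X: 0 = rain, 1 = no rain.  Y: 0 = long commute, 1 = short commute.
joint = {
    (0, 0): 0.15, (1, 0): 0.07,
    (0, 1): 0.15, (1, 1): 0.63,
}


def correlation(joint):
    """corr(X, Y) = cov(X, Y) / sqrt(var(X) var(Y))."""
    mu_x = sum(x * p for (x, _), p in joint.items())
    mu_y = sum(y * p for (_, y), p in joint.items())
    var_x = sum((x - mu_x) ** 2 * p for (x, _), p in joint.items())
    var_y = sum((y - mu_y) ** 2 * p for (_, y), p in joint.items())
    cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
    return cov / math.sqrt(var_x * var_y)


corr_xy = correlation(joint)
print(round(corr_xy, 2))
```

Because the units of X and Y cancel in the ratio, the result always lies between −1 and 1, whatever the scale of the two variables.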


Correlation and conditional mean

If E (Y |X = x) = E (Y ) = µY , then X and Y are uncorrelated.
That is,

cov(X, Y) = 0 and corr(X, Y) = 0.
This follows from the law of iterated expectations.
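
A sketch of the argument, assuming E(Y | X) = µY for every value of X:

```latex
\begin{align*}
\operatorname{cov}(X, Y)
  &= E[(X - \mu_X)(Y - \mu_Y)] \\
  &= E\{\, E[(X - \mu_X)(Y - \mu_Y) \mid X] \,\}
     && \text{(law of iterated expectations)} \\
  &= E\{\, (X - \mu_X)\,[E(Y \mid X) - \mu_Y] \,\}
     && \text{($X - \mu_X$ is fixed once $X$ is given)} \\
  &= E\{\, (X - \mu_X) \cdot 0 \,\} = 0 .
\end{align*}
```

Since corr(X, Y) = σXY / (σX σY), a zero covariance implies a zero correlation. The converse does not hold in general: two variables can be uncorrelated even though the conditional mean of Y depends on X.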


Mean and variance of sums of r.v.’s
The mean of the sum of two r.v.’s, X and Y, is the sum of their means:
\[
E(X + Y) = E(X) + E(Y) = \mu_X + \mu_Y
\]
The variance of the sum of X and Y is the sum of their variances plus
twice their covariance:
\[
\operatorname{var}(X + Y) = \operatorname{var}(X) + \operatorname{var}(Y) + 2\operatorname{cov}(X, Y) = \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY}
\]
If X and Y are independent, the covariance is zero and the variance of
their sum is the sum of their variances:
\[
\operatorname{var}(X + Y) = \operatorname{var}(X) + \operatorname{var}(Y) = \sigma_X^2 + \sigma_Y^2
\]
Why?
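
One way to see it is to expand the square inside the definition of the variance, using the fact that the mean of X + Y is µX + µY:

```latex
\begin{align*}
\operatorname{var}(X + Y)
  &= E\{[(X + Y) - (\mu_X + \mu_Y)]^2\} \\
  &= E\{[(X - \mu_X) + (Y - \mu_Y)]^2\} \\
  &= E[(X - \mu_X)^2] + E[(Y - \mu_Y)^2] + 2E[(X - \mu_X)(Y - \mu_Y)] \\
  &= \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY}
\end{align*}
```

If X and Y are independent, the conditional distribution of Y given X equals its marginal distribution, so E(Y | X) = µY and the covariance term is zero.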

Sums of r.v.’s
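Let X, Y, and V be r.v.’s and a, b, and c be constants. These
facts follow from the definitions of mean, variance, covariance, and
correlation:
\[
\begin{aligned}
E(a + bX + cY) &= a + b\mu_X + c\mu_Y \\
\operatorname{var}(a + bY) &= b^2 \sigma_Y^2 \\
\operatorname{var}(aX + bY) &= a^2 \sigma_X^2 + 2ab\,\sigma_{XY} + b^2 \sigma_Y^2 \\
E(Y^2) &= \sigma_Y^2 + \mu_Y^2 \\
\operatorname{cov}(a + bX + cV, Y) &= b\,\sigma_{XY} + c\,\sigma_{VY} \\
E(XY) &= \sigma_{XY} + \mu_X \mu_Y \\
|\sigma_{XY}| &\le \sqrt{\sigma_X^2 \sigma_Y^2}
\end{aligned}
\]
Can you prove them?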
