This document discusses key concepts in estimation theory, including:
- Point estimators are statistics used to estimate unknown parameters based on sample data. Common point estimators include the sample mean and sample proportion.
- Estimator properties like unbiasedness, efficiency, and consistency are used to evaluate which estimators perform best. The minimum variance unbiased estimator (MVUE) has the lowest variance among all unbiased estimators.
- Asymptotic properties like consistency ensure an estimator converges in probability to the true parameter value as the sample size increases. The sample mean is a consistent estimator of the population mean.
2. Estimation theory
Estimation theory deals with estimating the values of parameters based on measured empirical data that has a random component.
e.g., it is desired to estimate the proportion of a population of voters who will vote for a particular candidate. That proportion is the parameter sought; the estimate is based on a small random sample of voters.
Alternatively, it is desired to estimate the probability of a voter voting for a particular candidate, based on some demographic features, such as age.
3. An estimate is a single value that is calculated from samples and used to estimate a population value.
An estimator is a function that maps the sample space to a set of estimates.
The entire purpose of estimation theory is to arrive at an estimator that takes the sample as input and produces an estimate of the parameters together with a corresponding measure of accuracy.
4. Two Types of Estimator
Point estimator
Interval estimator
5. Point Estimator
A point estimator is a statistic (that is, a function of the data) that is used to infer the value of an unknown parameter in a statistical model.
A point estimate is one of the possible values a point estimator can assume.
Mathematically, suppose there is a fixed parameter θ that needs to be estimated and X is a random variable corresponding to the observed data. Then an estimator of θ, denoted θ̂, is a function of the random variable X, and hence itself a random variable θ̂(X).
A point estimate for a particular observed dataset (i.e. for X = x) is then θ̂(x), which is a fixed value.
6. Let there be n random variables arising from a random sample X1, X2, …, Xn.
The corresponding observed values of a specific random sample are x1, x2, …, xn.
Parameter Space
The range of possible values of the parameter θ is called the parameter space Ω (the Greek letter "omega").
e.g., if μ denotes the mean grade point average of all college students, then the parameter space (assuming a 4-point grading scale) is:
Ω = {μ : 0 ≤ μ ≤ 4}
And, if p denotes the proportion of students who smoke cigarettes, then the parameter space is:
Ω = {p : 0 ≤ p ≤ 1}
7. Point Estimator
The function of X1, X2, ⋯, Xn, that is, the statistic u(X1, X2, ⋯, Xn), used to estimate θ is called a point estimator of θ.
For example, the function:
X̄ = (1/n) ∑_{i=1}^{n} Xi
is a point estimator of the population mean μ.
The function:
p̂ = (1/n) ∑_{i=1}^{n} Xi (where Xi = 0 or 1)
is a point estimator of the population proportion p.
8. And, the function:
S² = (1/(n − 1)) ∑_{i=1}^{n} (Xi − X̄)²
is a point estimator of the population variance σ².
Point Estimate
The function u(x1, x2, ⋯, xn) computed from a set of data is an observed point estimate of θ.
For example, if xi are the observed grade point averages of a sample of 88 students, then:
x̄ = (1/88) ∑_{i=1}^{88} xi
is a point estimate of μ, the mean grade point average of all the students in the population.
9. And, if xi = 0 if a student has no tattoo, and xi = 1 if a student has a tattoo, then:
p̂ = (1/n) ∑_{i=1}^{n} xi
is a point estimate of p, the proportion of all students in the population who have a tattoo.
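As a tiny illustration of the two point estimates above, both can be computed directly from data; the GPA values and tattoo indicators below are made-up sample values for the sketch, not from the slides.

```python
# Illustrative sketch with made-up data (not from the slides):
# compute the sample mean and sample proportion as point estimates.

gpas = [3.1, 2.8, 3.6, 3.9, 2.5, 3.3]   # hypothetical observed GPAs
tattoo = [0, 1, 0, 0, 1, 0]             # 1 = student has a tattoo

# Point estimate of the population mean mu: the sample mean x-bar
x_bar = sum(gpas) / len(gpas)

# Point estimate of the population proportion p: the sample proportion p-hat
p_hat = sum(tattoo) / len(tattoo)

print(round(x_bar, 3))   # -> 3.2
print(round(p_hat, 3))   # -> 0.333
```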
10. A point estimator can be evaluated based on:
Unbiasedness (mean): whether the mean of this estimator equals the actual parameter
Efficiency (variance): whether the variance of this estimator is small compared with that of other unbiased estimators
Consistency (size): whether the probability distribution of the estimator becomes concentrated on the parameter as the sample size increases
11. Sampling Error, Bias and Mean Squared Error
Sampling Error: The error of the estimator θ̂(X) for the parameter θ is defined as:
e(X) = θ̂(X) − θ
The bias of the estimator is defined as the expected value of the error:
B(θ̂) = E[e(X)] = E[θ̂(X)] − θ
The mean squared error of θ̂(X) is defined as the expected value (probability-weighted average, over all samples) of the squared errors; namely,
MSE(θ̂) = E[(θ̂(X) − θ)²]
The MSE, variance, and bias are related:
MSE(θ̂) = Var(θ̂) + B(θ̂)²
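The error, bias, and MSE definitions above can be checked empirically. The sketch below assumes a normal population with a known mean purely for illustration, and verifies the identity MSE = Var + Bias² on simulated samples.

```python
# Monte Carlo sketch (assumed normal population, chosen only for the demo):
# estimate bias, variance, and MSE of the sample mean, and check that
# MSE = Var + Bias^2 holds for the empirical moments.
import random

random.seed(0)
theta = 5.0          # true population mean (assumption for the demo)
n, reps = 20, 20000  # sample size and number of simulated samples

estimates = []
for _ in range(reps):
    sample = [random.gauss(theta, 2.0) for _ in range(n)]
    estimates.append(sum(sample) / n)          # theta_hat(x) for this sample

mean_est = sum(estimates) / reps
bias = mean_est - theta                        # B = E[theta_hat] - theta
var = sum((e - mean_est) ** 2 for e in estimates) / reps
mse = sum((e - theta) ** 2 for e in estimates) / reps

# The decomposition MSE = Var + Bias^2 holds exactly for empirical moments.
print(abs(mse - (var + bias ** 2)) < 1e-9)     # -> True
```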
12. Unbiasedness
If an estimator is unbiased, this implies that the estimation error is on average zero:
E[e(X)] = E[θ̂(X)] − θ = 0
The sample mean X̄ is an unbiased point estimator for the population mean μ; namely
E[X̄] = μ
The sample variance S² is an unbiased point estimator for the population variance σ²; namely
E[S²] = σ²
13. Unbiasedness
Let Y1, Y2, …, YN be a random sample from a Binomial distribution with a success probability p. An unbiased estimator of p is
p̂ = (1/N) ∑_{i=1}^{N} Yi
Proof: Since the Yi are i.i.d. with E(Yi) = p, then we have:
E(p̂) = (1/N) ∑_{i=1}^{N} E(Yi) = Np/N = p
14. Unbiasedness
Let Y1, Y2, …, YN be a random sample from a uniform distribution U[0, θ]. An unbiased estimator of θ is
θ̂ = (2/N) ∑_{i=1}^{N} Yi
Since the Yi are i.i.d. with E(Yi) = (θ + 0)/2 = θ/2, then we have
E(θ̂) = (2/N) ∑_{i=1}^{N} E(Yi) = (2/N)(Nθ/2) = θ
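A quick simulation can support the claim that θ̂ = (2/N) ∑ Yi is unbiased under U[0, θ]; the parameter value and sample size below are arbitrary choices for the demo.

```python
# Simulation sketch (parameter values are assumptions for the demo):
# check that theta_hat = (2/N) * sum(Y_i) is unbiased when Y_i ~ U[0, theta].
import random

random.seed(1)
theta = 10.0        # true parameter (assumed for the demo)
N, reps = 50, 40000

est = []
for _ in range(reps):
    ys = [random.uniform(0, theta) for _ in range(N)]
    est.append(2 * sum(ys) / N)   # theta_hat = 2 * sample mean

avg = sum(est) / reps
print(round(avg, 1))  # should be close to theta = 10
```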
15. Unbiasedness is desirable per se, but it is not decisive:
- Absence of bias is not a sufficient criterion to discriminate among competing estimators
- There may exist many unbiased estimators for the same parameter of interest
16. Assume that Y1, Y2, …, YN are i.i.d. with E(Yi) = m. The statistics
m̂1 = (1/N) ∑_{i=1}^{N} Yi and m̂2 = Y1
are unbiased estimators of m.
Since the Yi are i.i.d. with E(Yi) = m, then we have
E(m̂1) = (1/N) ∑_{i=1}^{N} E(Yi) = Nm/N = m
E(m̂2) = E(Y1) = m
Both estimators m̂1 and m̂2 of the parameter m are unbiased.
18. Efficiency
Efficiency is a measure of the quality of an estimator: a more efficient estimator needs fewer observations than a less efficient one to achieve a given performance.
For the same parameter θ, an unbiased point estimator θ̂1 is more efficient than another unbiased point estimator θ̂2 if
Var(θ̂1) < Var(θ̂2)
The minimum-variance unbiased estimator (MVUE) is an unbiased estimator that has lower variance than any other unbiased estimator for all possible values of the parameter.
19. For a normal distribution with unknown mean and variance:
The sample mean X̄ is the MVUE for the population mean μ.
The sample variance S² is the MVUE for the population variance σ².
For other distributions the sample mean and sample variance are not in general MVUEs.
20. Let T be an estimator for the parameter θ. The mean squared error of T is the value
MSE(T) = E[(T − θ)²]
Therefore, an estimator T1 performs better than an estimator T2 if
MSE(T1) < MSE(T2)
For a more specific case, if T1 and T2 are two unbiased estimators for the same parameter θ, then the variance can be compared to determine performance. T2 is more efficient than T1 if the variance of T2 is smaller than the variance of T1, i.e.
Var(T2) < Var(T1) for all values of θ.
21. Assume that Y1, Y2, …, YN are i.i.d. with E(Yi) = m and V(Yi) = σ². The estimator
m̂1 = (1/N) ∑_{i=1}^{N} Yi dominates the estimator m̂2 = Y1.
The two estimators m̂1 and m̂2 are unbiased, so they can be compared in terms of variance (precision):
V(m̂1) = (1/N²) ∑_{i=1}^{N} V(Yi) = Nσ²/N² = σ²/N
V(m̂2) = V(Y1) = σ²
Since V(m̂1) ≤ V(m̂2), m̂1 is preferred to m̂2.
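The variance comparison can be checked by simulation. The sketch below assumes normally distributed Yi (any distribution with finite variance would do) and confirms that the sample mean m̂1 has a much smaller variance than the single observation m̂2.

```python
# Simulation sketch (assumed Gaussian data, chosen only for the demo):
# compare Var(m1_hat) for the sample mean against Var(m2_hat) for a
# single observation. Theory: Var(m1_hat) = sigma^2/N, Var(m2_hat) = sigma^2.
import random

random.seed(2)
m, sigma, N, reps = 0.0, 3.0, 25, 30000

m1, m2 = [], []
for _ in range(reps):
    ys = [random.gauss(m, sigma) for _ in range(N)]
    m1.append(sum(ys) / N)   # m1_hat: average of all N observations
    m2.append(ys[0])         # m2_hat: the first observation only

def var(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

print(var(m1) < var(m2))  # -> True: m1_hat has the smaller variance
```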
22. Cramér–Rao Bound
Is there a bound for the variance of the unbiased estimators?
Let X1, …, XN be an i.i.d. sample with pdf fX(θ; x). Let θ̂ be an unbiased estimator of θ; i.e.
E(θ̂) = θ0
If fX(θ; x) is regular, then
V(θ̂) ≥ [N I(θ0)]⁻¹
where I(θ0) denotes the Fisher information matrix for the sample evaluated at the true value θ0.
23. Efficient Estimator
An estimator is efficient if its variance attains the Cramér–Rao bound:
V(θ̂) = [N I(θ0)]⁻¹
where I(θ0) denotes the Fisher information matrix associated with the sample evaluated at the true value θ0.
24. BLUE Estimator
An estimator is the minimum variance linear unbiased estimator, or best linear unbiased estimator (BLUE), if it is a linear function of the data and has minimum variance among linear unbiased estimators.
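As a worked instance of the bound (an added example, not from the slides): for a normal model N(μ, σ²) with known σ, the Fisher information per observation is I(μ) = 1/σ², so the Cramér–Rao bound for unbiased estimators is σ²/N; the sample mean has exactly this variance and is therefore efficient. The numbers below are arbitrary.

```python
# Worked check (assumed Gaussian model with known sigma): the sample mean
# attains the Cramer-Rao bound, so it is an efficient estimator of mu.
sigma = 2.0
N = 16

fisher_per_obs = 1.0 / sigma**2       # I(mu) for one N(mu, sigma^2) draw
crb = 1.0 / (N * fisher_per_obs)      # Cramer-Rao lower bound for the sample
var_sample_mean = sigma**2 / N        # known variance of the sample mean

print(crb == var_sample_mean)         # -> True: the bound is attained
```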
25. Consistency
An estimator is constructed as a function of an available sample of size n; suppose data are collected continuously so that the sample expands ad infinitum. A sequence of estimates indexed by n is then obtained, and consistency is a property of what occurs as the sample size "grows to infinity".
If the sequence of estimates can be mathematically shown to converge in probability to the true value θ0, the estimator is called consistent; otherwise the estimator is said to be inconsistent.
An unbiased estimator is said to be consistent if the difference between the estimator and the target population parameter becomes smaller as we increase the sample size.
26. Consistency
A consistent estimator has insignificant errors (variations) as sample sizes grow larger: the probability that those errors exceed any given amount approaches zero as the sample size increases.
As more data are collected, a consistent estimator comes closer to the real population parameter.
An estimator of a given parameter is said to be consistent if it converges in
probability to the true value of the parameter as the sample size tends to
infinity.
This means that the distributions of the estimates become more and more
concentrated near the true value of the parameter being estimated, so that
the probability of the estimator being arbitrarily close to θ0 converges to
one.
27. An estimator Tn of parameter θ is said to be consistent if it converges in probability to the true value of the parameter, i.e. if, for all ε > 0,
lim_{n→∞} P(|Tn − θ| > ε) = 0
Assume that θ̂n is unbiased for θ, for all n ∈ N. Then θ̂n is consistent in squared mean for θ if and only if
lim_{n→∞} Var(θ̂n) = 0
Consistency in squared mean implies consistency in probability.
28. Let there be a sequence of observations {X1, X2, ...} from a Normal N (µ, σ2)
distribution
To estimate μ based on the first n observations, sample mean:
Tn = (X1 + ... + Xn)/n.
This defines a sequence of estimators, indexed by the sample size n.
From the properties of the normal distribution, we know the sampling
distribution of this statistic:
Tn is itself normally distributed, with mean μ and variance σ2/n.
29. Equivalently, √n (Tn − μ)/σ has a standard normal distribution, and
P(|Tn − μ| ≥ ε) → 0
as n tends to infinity, for any fixed ε > 0.
Therefore, the sequence Tn of sample means is consistent for the population mean μ.
The variance of the sample mean is σ²/n, which decreases to zero as we increase the sample size n. Hence, the sample mean X̄ is a consistent estimator for μ.
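The consistency argument can be illustrated numerically: simulate Tn for a small and a large n and compare the empirical probability that |Tn − μ| exceeds a fixed ε. The distribution and constants below are assumptions made for the demo.

```python
# Simulation sketch (assumed N(mu, sigma^2) data and arbitrary constants):
# the empirical probability P(|T_n - mu| >= eps) shrinks as n grows,
# illustrating convergence in probability of the sample mean.
import random

random.seed(3)
mu, sigma, reps = 1.0, 2.0, 5000

def exceed_prob(n):
    """Empirical probability that |T_n - mu| >= eps for eps = 0.5."""
    eps, hits = 0.5, 0
    for _ in range(reps):
        t = sum(random.gauss(mu, sigma) for _ in range(n)) / n
        if abs(t - mu) >= eps:
            hits += 1
    return hits / reps

# The exceedance probability should decrease toward 0 as n increases.
print(exceed_prob(4) > exceed_prob(64))  # -> True
```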
30. Let X1,…,Xn be a s.r.s. of a r.v. with mean μ and variance σ2. Consider the
following estimators of μ:
Which one is unbiased? Which one is consistent in probability for μ?
Their expectations are respectively given by
so all of them are unbiased.
To check whether they are consistent in probability, we check whether their variances tend to zero.
31. Comparing their respective variances, only the variance of μ̂3 converges to zero, which means that only μ̂3 is consistent in probability for μ.
32. Sufficiency
A statistic is sufficient with respect to a parameter if "no other statistic that can be calculated from the same sample provides any additional information as to the value of the parameter".
Let X1, X2, …, Xn be a random sample from a probability distribution with unknown parameter θ. Then, the statistic:
T = u(X1, X2, …, Xn)
is said to be sufficient for θ if the conditional distribution of X1, X2, …, Xn, given the statistic T, does not depend on the parameter θ.
33. Let X1,X2,…,Xn be a random sample of n Bernoulli trials in which:
Xi = 1 if the ith subject likes Pepsi
Xi = 0 if the ith subject does not like Pepsi
If p is the probability that subject i likes Pepsi, for i=1,2,…,n, then:
Xi = 1 with probability p
Xi = 0 with probability q = 1−p
Suppose, in a random sample of n=40 people, that
Y = ∑Xi = 22 people like Pepsi
If the value of Y, the number of successes in n trials, is known, can we gain any further information about the parameter p by considering other functions of the data X1, X2, …, Xn? That is, is Y sufficient for p?
34. Neyman’s Factorization Theorem
Let X1,X2,…,Xn denote random variables with joint probability density function
or joint probability mass function f(x1,x2,…,xn; θ), which depends on the
parameter θ.
Then, the statistic T = u(X1, X2, …, Xn) is sufficient for θ if and only if the p.d.f. (or p.m.f.) can be factored into two components, that is:
f(x1, x2, …, xn; θ) = ϕ[u(x1, x2, …, xn); θ] · h(x1, x2, …, xn)
where:
ϕ is a function that depends on the data x1, x2, …, xn only through the function u(x1, x2, …, xn), and
the function h(x1, x2, …, xn) does not depend on the parameter θ.
35. If the conditional distribution of X1, X2, …, Xn, given the statistic Y, does not depend on p, then Y is a sufficient statistic for p.
The conditional distribution of X1, X2, …, Xn, given Y, is by definition:
P(X1 = x1, …, Xn = xn | Y = y) = P(X1 = x1, …, Xn = xn, Y = y) / P(Y = y)   (1)
Suppose, for a random sample of size n = 3, that x1 = 1, x2 = 0, and x3 = 1. In this case:
P(X1 = 1, X2 = 0, X3 = 1, Y = 1) = 0
because the sum of the data values, ∑xi, is 1 + 0 + 1 = 2, but Y, which is defined to be the sum of the Xi's, is 1.
That is, because 2 ≠ 1, the event in the numerator of (1) is an impossible event and therefore its probability is 0.
Consider an event that is possible, namely (X1 = 1, X2 = 0, X3 = 1, Y = 2). By independence:
P(X1 = 1, X2 = 0, X3 = 1, Y = 2) = p(1 − p)p = p²(1 − p)
36. So, in general:
P(X1 = x1, …, Xn = xn, Y = y) = p^y (1 − p)^(n−y) if ∑xi = y
and:
P(X1 = x1, …, Xn = xn, Y = y) = 0 if ∑xi ≠ y
The denominator in (1) above is the binomial probability of getting exactly y successes in n trials with a probability of success p. That is, the denominator is:
P(Y = y) = C(n, y) p^y (1 − p)^(n−y)
for y = 0, 1, 2, …, n.
Putting the numerator and denominator together, we get, if y = 0, 1, 2, …, n, that the conditional probability is:
P(X1 = x1, …, Xn = xn | Y = y) = 1 / C(n, y) if ∑xi = y
and
P(X1 = x1, …, Xn = xn | Y = y) = 0 if ∑xi ≠ y
37. Thus the conditional distribution of X1,X2,…,Xn given Y does not depend on p.
Therefore, Y is sufficient for p.
That is, once the value of Y is known, no other function of X1,X2,…,Xn will provide any
additional information about the possible value of p.
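The argument above can be verified numerically for a small case. The sketch below assumes n = 3 (matching the earlier worked example) and checks that the conditional probability of each data sequence given Y = y equals 1/C(n, y) for every candidate value of p.

```python
# Numeric sketch of the sufficiency argument (n = 3 assumed, as in the
# example above): the conditional distribution of (X1, X2, X3) given
# Y = sum(X) is 1/C(n, y) regardless of p, so Y is sufficient for p.
from itertools import product
from math import comb, isclose

n, y = 3, 2
for p in (0.2, 0.5, 0.8):                      # several candidate values of p
    for xs in product((0, 1), repeat=n):
        if sum(xs) != y:
            continue                           # impossible event given Y = y
        # joint probability of this exact 0/1 sequence, by independence
        joint = 1.0
        for x in xs:
            joint *= p if x == 1 else (1 - p)
        # divide by the binomial probability P(Y = y)
        cond = joint / (comb(n, y) * p**y * (1 - p)**(n - y))
        assert isclose(cond, 1 / comb(n, y))   # 1/3, independent of p
print("conditional probabilities do not depend on p")
```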
38. Let X1, X2, …, Xn denote a random sample from a Poisson distribution with parameter λ > 0. Find a sufficient statistic for the parameter λ.
Because X1, X2, …, Xn is a random sample, the joint probability mass function of X1, X2, …, Xn is, by independence:
f(x1, x2, …, xn; λ) = ∏_{i=1}^{n} (e^(−λ) λ^(xi) / xi!) = e^(−nλ) λ^(∑xi) · (1 / ∏_{i=1}^{n} xi!)
The joint p.m.f. is factored into two functions, one (ϕ) being only a function of the statistic Y = ∑Xi and the other (h) not depending on the parameter λ:
ϕ(∑xi; λ) = e^(−nλ) λ^(∑xi),  h(x1, …, xn) = 1 / ∏_{i=1}^{n} xi!
39. Therefore, the Factorization Theorem tells us that Y = ∑Xi is a sufficient statistic for λ.
The joint p.m.f. can also be written as
f(x1, x2, …, xn; λ) = e^(−nλ) λ^(n x̄) · (1 / ∏_{i=1}^{n} xi!)
Therefore, the Factorization Theorem tells us that Y = X̄ is also a sufficient statistic for λ.