PHIL 6334 - Probability/Statistics Lecture Notes 3:
Estimation (Point and Interval)
Aris Spanos [Spring 2014]
1 Introduction

In this lecture we will consider point estimation in its simplest form by focusing the discussion on simple statistical models, whose generic form is given in table 1.
Table 1 — Simple (generic) Statistical Model
[i] Probability model: $\Phi=\{f(x;\theta),\ \theta\in\Theta,\ x\in\mathbb{R}_X\}$,
[ii] Sampling model: $\mathbf{X}:=(X_1,\dots,X_n)$ is a random (IID) sample.
What makes this type of statistical model ‘simple’ is the notion of random (IID) sample.
1.1 Random sample (IID)
The notion of a random sample is defined in terms of the joint distribution of the sample $\mathbf{X}:=(X_1,X_2,\dots,X_n)$, say $f(x_1,x_2,\dots,x_n;\boldsymbol{\theta})$ for all $\mathbf{x}:=(x_1,x_2,\dots,x_n)\in\mathbb{R}_X^n$, by imposing two probabilistic assumptions:
(I) Independence: the sample $\mathbf{X}$ is said to be Independent (I) if, for all $\mathbf{x}\in\mathbb{R}_X^n$, the joint distribution splits up into a product of marginal distributions:
$$f(\mathbf{x};\boldsymbol{\theta})=f_1(x_1;\theta_1)\cdot f_2(x_2;\theta_2)\cdots f_n(x_n;\theta_n):=\prod_{k=1}^{n}f_k(x_k;\theta_k).$$
(ID) Identically Distributed: the sample $\mathbf{X}$ is said to be Identically Distributed (ID) if the marginal distributions are identical:
$$f_k(x_k;\theta_k)=f(x_k;\theta)\ \text{ for all } k=1,2,\dots,n.$$
Note that this means two things: the density functions have the same form and the unknown parameters are common to all of them.
For a better understanding of these two crucial probabilistic assumptions we need to simplify the discussion by focusing first on the two random variable case, denoted by $X$ and $Y$ to avoid subscripts.
First, let us revisit the notion of a random variable in order to motivate the notions of marginal and joint distributions.
Example 5. Toss a coin twice and note the outcome. In this case $S=\{(HH),(HT),(TH),(TT)\}$, and let us assume that the events of interest are $A=\{(HH),(HT),(TH)\}$ and $B=\{(TT),(HT),(TH)\}$. Using these two events we can generate the event space of interest $\mathcal{F}$ by applying the set-theoretic operations of union ($\cup$), intersection ($\cap$), and complementation ($-$). That is, $\mathcal{F}=\{S,\ \emptyset,\ A,\ B,\ \bar A,\ \bar B,\ A\cap B,\ \overline{A\cap B},\dots\}$; convince yourself that this gives rise to a field of subsets of $S$. Let us define the real-valued functions $X(\cdot)$ and $Y(\cdot)$ on $S$ as follows:
$$X(HH)=X(HT)=X(TH)=1,\quad X(TT)=0,$$
$$Y(TH)=Y(HT)=Y(TT)=1,\quad Y(HH)=0.$$
Do these two functions define proper r.v.'s with respect to $\mathcal{F}$? To check that, we define all possible events generated by these functions and check whether they belong to $\mathcal{F}$:
$$\{s:X(s)=0\}=\{(TT)\}=\bar A\in\mathcal{F},\qquad \{s:X(s)=1\}=A\in\mathcal{F},$$
$$\{s:Y(s)=0\}=\{(HH)\}=\bar B\in\mathcal{F},\qquad \{s:Y(s)=1\}=B\in\mathcal{F}.$$
Hence, both functions define proper r.v.'s with respect to $\mathcal{F}$. To derive their distributions we assume that we have a fair coin, i.e. each elementary outcome in $S$ has probability .25 of occurring. Hence:
$$\{s:X(s)=0\}=\bar A\ \Rightarrow\ \mathbb{P}(X=0)=.25,\qquad \{s:Y(s)=0\}=\bar B\ \Rightarrow\ \mathbb{P}(Y=0)=.25,$$
$$\{s:X(s)=1\}=A\ \Rightarrow\ \mathbb{P}(X=1)=.75,\qquad \{s:Y(s)=1\}=B\ \Rightarrow\ \mathbb{P}(Y=1)=.75.$$
Hence, their 'marginal' density functions take the form:
$$\begin{array}{c|cc} x & 0 & 1 \\ \hline f_X(x) & .25 & .75 \end{array}\qquad\qquad \begin{array}{c|cc} y & 0 & 1 \\ \hline f_Y(y) & .25 & .75 \end{array}\qquad (1)$$
How can one define the joint distribution of these two r.v.'s? To define the joint density function we need to specify all the events:
$$(X=x,\ Y=y),\quad x\in\mathbb{R}_X,\ y\in\mathbb{R}_Y,$$
denoting 'their joint occurrence', and then attach probabilities to these events. These events belong to $\mathcal{F}$ by definition, because $\mathcal{F}$, as a field, is closed under the set-theoretic operations $\cup,\ \cap,\ -$, so that:
$$(X=0\cap Y=0)=\emptyset,\quad (X=0\cap Y=1)=\{(TT)\},\quad (X=1\cap Y=0)=\{(HH)\},\quad (X=1\cap Y=1)=\{(HT),(TH)\},$$
$$f(0,0)=0,\quad f(0,1)=.25,\quad f(1,0)=.25,\quad f(1,1)=.50.$$
Hence, the joint density is defined by:
$$\begin{array}{c|cc} x\backslash y & 0 & 1 \\ \hline 0 & 0 & .25 \\ 1 & .25 & .50 \end{array}\qquad (2)$$
How is the joint density (2) connected to the individual (marginal) densities given in (1)? It turns out that if we sum across each row of the above table (i.e. over $y$ for each value of $x$), using $\sum_{y\in\mathbb{R}_Y}f(x,y)=f_X(x)$, we get the marginal distribution of $X$, and if we sum down each column (i.e. over $x$ for each value of $y$), using $\sum_{x\in\mathbb{R}_X}f(x,y)=f_Y(y)$, we get the marginal distribution of $Y$:
$$\begin{array}{c|cc|c} x\backslash y & 0 & 1 & f_X(x)\\ \hline 0 & 0 & .25 & .25 \\ 1 & .25 & .50 & .75 \\ \hline f_Y(y) & .25 & .75 & 1 \end{array}\qquad (3)$$
Note: $E(X)=0(.25)+1(.75)=.75=E(Y)$,
$Var(X)=(0-.75)^2(.25)+(1-.75)^2(.75)=.1875=Var(Y)$.
Armed with the joint distribution we can proceed to define the notions of Independence and Identically Distributed for the r.v.'s $X$ and $Y$.
Independence. Two r.v.'s $X$ and $Y$ are said to be Independent iff:
$$f(x,y)=f_X(x)\cdot f_Y(y)\ \text{ for all } (x,y)\in\mathbb{R}_X\times\mathbb{R}_Y.\qquad (4)$$
That is, to verify that these two r.v.'s are independent we need to confirm that the probability of all possible pairs of values $(x,y)$ satisfies (4).
Example. In the case of the joint distribution in (3) we can show that the r.v.'s are not independent because for $(x,y)=(0,0)$:
$$f(0,0)=0\neq f_X(0)\cdot f_Y(0)=(.25)(.25).$$
It is important to emphasize that the above condition of Independence is not equivalent to the two random variables being uncorrelated:
$$Corr(X,Y)=0\ \nRightarrow\ f(x,y)=f_X(x)\cdot f_Y(y)\ \text{ for all } (x,y)\in\mathbb{R}_X\times\mathbb{R}_Y,$$
where '$\nRightarrow$' denotes 'does not imply'. This is because $Corr(X,Y)$ is a measure of linear dependence between $X$ and $Y$, since it is based on the covariance defined by:
$$Cov(X,Y)=E[(X-E(X))(Y-E(Y))]=0(0-.75)(0-.75)+(.25)(0-.75)(1-.75)+(.25)(1-.75)(0-.75)+(.50)(1-.75)(1-.75)=-.0625.$$
A standardized covariance yields the correlation:
$$Corr(X,Y)=\frac{Cov(X,Y)}{\sqrt{Var(X)\cdot Var(Y)}}=\frac{-.0625}{.1875}=-\frac{1}{3}.$$
The intuition underlying this result is that the correlation involves only the first two moments [mean, variance, covariance] of $X$ and $Y$, but independence is defined in terms of the density functions; the latter, in principle, involves all moments, not just the first two!
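The same joint table can be used to check (4) and to reproduce the covariance and correlation numerically; the sketch below is illustrative only (it re-declares the hypothetical f_xy array from the previous snippet so that it runs on its own).

```python
import numpy as np

f_xy = np.array([[0.00, 0.25],
                 [0.25, 0.50]])          # joint density of (X, Y); rows = x, cols = y
x_vals, y_vals = np.array([0, 1]), np.array([0, 1])
f_x, f_y = f_xy.sum(axis=1), f_xy.sum(axis=0)

# Independence check: f(x, y) should equal f_X(x) * f_Y(y) for every cell.
independent = np.allclose(f_xy, np.outer(f_x, f_y))
print("independent:", independent)       # False, e.g. f(0,0)=0 but .25*.25=.0625

# Covariance and correlation computed from the joint table.
E_x, E_y = (x_vals * f_x).sum(), (y_vals * f_y).sum()
cov = sum(f_xy[i, j] * (x_vals[i] - E_x) * (y_vals[j] - E_y)
          for i in range(2) for j in range(2))          # -0.0625
var_x = ((x_vals - E_x) ** 2 * f_x).sum()
var_y = ((y_vals - E_y) ** 2 * f_y).sum()
print(cov, cov / np.sqrt(var_x * var_y))                # -0.0625, -1/3
```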
Identically Distributed. Two r.v.'s $X$ and $Y$ are said to be Identically Distributed iff:
$$f_X(u;\theta)=f_Y(u;\theta)\ \text{ for all } u\in\mathbb{R}_X=\mathbb{R}_Y.\qquad (5)$$
Example. In the case of the joint distribution in (3) we can show that the r.v.'s are identically distributed because (5) holds. In particular, both r.v.'s $X$ and $Y$ take the same values with the same probabilities.
To shed further light on the notion of IID, consider the three bivariate distributions given below.
$$\text{(A)}\ \begin{array}{c|cc|c} x\backslash y & 1 & 2 & f_X(x)\\ \hline 0 & .18 & .42 & .6 \\ 2 & .12 & .28 & .4 \\ \hline f_Y(y) & .3 & .7 & 1\end{array}\qquad \text{(B)}\ \begin{array}{c|cc|c} x\backslash y & 0 & 1 & f_X(x)\\ \hline 0 & .18 & .42 & .6 \\ 1 & .12 & .28 & .4 \\ \hline f_Y(y) & .3 & .7 & 1\end{array}\qquad \text{(C)}\ \begin{array}{c|cc|c} x\backslash y & 0 & 1 & f_X(x)\\ \hline 0 & .36 & .24 & .6 \\ 1 & .24 & .16 & .4 \\ \hline f_Y(y) & .6 & .4 & 1\end{array}$$
(I) $X$ and $Y$ are Independent iff:
$$f(x,y)=f_X(x)\cdot f_Y(y)\ \text{ for all } (x,y)\in\mathbb{R}_X\times\mathbb{R}_Y.\qquad (6)$$
(ID) $X$ and $Y$ are Identically Distributed iff:
$$f_X(u)=f_Y(u)\ \text{ for all } u\in\mathbb{R}_X=\mathbb{R}_Y.$$
The random variables $X$ and $Y$ are independent in all three cases since they satisfy (4) (verify!).
The random variables in (A) are not Identically Distributed because $\mathbb{R}_X\neq\mathbb{R}_Y$ and $f_X(u)\neq f_Y(u)$ for some values $u$.
The random variables in (B) are not Identically Distributed because, even though $\mathbb{R}_X=\mathbb{R}_Y$, $f_X(u)\neq f_Y(u)$ for some $u\in\mathbb{R}_X$.
Finally, the random variables in (C) are Identically Distributed because $\mathbb{R}_X=\mathbb{R}_Y$ and $f_X(u)=f_Y(u)$ for all $u\in\mathbb{R}_X$.
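The two conditions can also be checked mechanically for tables (A)-(C); the short sketch below is my own illustration (the dictionary of supports and the helper name are assumptions, not part of the notes).

```python
import numpy as np

def check_iid(joint, x_vals, y_vals):
    """Return (independent?, identically_distributed?) for a joint density table."""
    f_x, f_y = joint.sum(axis=1), joint.sum(axis=0)
    independent = np.allclose(joint, np.outer(f_x, f_y))
    same_support = list(x_vals) == list(y_vals)
    identically_distributed = same_support and np.allclose(f_x, f_y)
    return independent, identically_distributed

tables = {
    "A": (np.array([[.18, .42], [.12, .28]]), [0, 2], [1, 2]),
    "B": (np.array([[.18, .42], [.12, .28]]), [0, 1], [0, 1]),
    "C": (np.array([[.36, .24], [.24, .16]]), [0, 1], [0, 1]),
}
for name, (joint, xv, yv) in tables.items():
    print(name, check_iid(joint, xv, yv))
# A: (True, False)   B: (True, False)   C: (True, True)
```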
2 Point Estimation: an overview

It turns out that all forms of frequentist inference, which include point and interval estimation, hypothesis testing and prediction, are defined in terms of two sets:
$\mathcal{X}$ — sample space: the set of all possible values of the sample $\mathbf{X}$;
$\Theta$ — parameter space: the set of all possible values of $\theta$.
Note that the sample space $\mathcal{X}$ is always a subset of $\mathbb{R}^n$, denoted by $\mathbb{R}_X^n$.
In estimation the objective is to use the statistical information to infer the 'true' value $\theta^*$ of the unknown parameter, whatever that happens to be, as long as it belongs to $\Theta$.
In general, an estimator $\widehat\theta$ of $\theta$ is a mapping (function) from the sample space to the parameter space:
$$\widehat\theta(\cdot):\ \mathcal{X}\to\Theta.\qquad (7)$$

Example 1. Let the statistical model of interest be the simple Bernoulli model (table 2) and consider the question of estimating the unknown parameter $\theta$, whose parameter space is $\Theta:=[0,1]$. Note that the sample space is $\mathcal{X}:=\{0,1\}^n$.
Table 2 - Simple Bernoulli Model
Statistical GM: $X_k=\theta+u_k,\ k\in\mathbb{N}$.
[1] Bernoulli: $X_k\sim\mathrm{Ber}(\theta),\ x_k=0,1$,
[2] constant mean: $E(X_k)=\theta,\ k\in\mathbb{N}$,
[3] constant variance: $Var(X_k)=\theta(1-\theta),\ k\in\mathbb{N}$,
[4] Independence: $\{X_k,\ k\in\mathbb{N}\}$ is an independent process.
The notation $\widehat\theta(\mathbf{X})$ is used to denote an estimator in order to bring out the fact that it is a function of the sample $\mathbf{X}$, and for different values $\mathbf{x}\in\mathcal{X}$ it generates the sampling distribution $f(\widehat\theta(\mathbf{x});\theta)$. Post-data, $\widehat\theta(\mathbf{X})$ yields an estimate $\widehat\theta(\mathbf{x}_0)$, which constitutes the particular value of $\widehat\theta(\mathbf{X})$ corresponding to data $\mathbf{x}_0$. Crucial distinction: $\widehat\theta(\mathbf{X})$ is the estimator (Plato's world), $\widehat\theta(\mathbf{x}_0)$ is the estimate (real world), and $\theta$ is an unknown constant (Plato's world); Fisher (1922).
In light of the definition in (7), which of the following mappings constitute potential estimators of $\theta$?
Table 3: Estimators of $\theta$?
[a] $\widehat\theta_1(\mathbf{X})=X_n$
[b] $\widehat\theta_2(\mathbf{X})=X_1-X_n$
[c] $\widehat\theta_3(\mathbf{X})=(X_1+X_n)/2$
[d] $\widehat\theta_n(\mathbf{X})=\frac{1}{n}\sum_{k=1}^{n}X_k$, for some $n>3$
[e] $\widehat\theta_{n+1}(\mathbf{X})=\frac{1}{n+1}\sum_{k=1}^{n}X_k$
Do the mappings [a]-[e] in table 3 constitute estimators of $\theta$? All five functions [a]-[e] have $\mathcal{X}$ as their domain, but is the range of each mapping a subset of $\Theta:=[0,1]$? Mappings [a], [c]-[e] can be possible estimators of $\theta$ because their ranges are subsets of $[0,1]$, but [b] cannot, because it can take the value $-1$ [ensure you understand why!] which lies outside the parameter space of $\theta$.
One can easily think of many more functions from $\mathcal{X}$ to $\Theta$ that will qualify as possible estimators of $\theta$. Given the plethora of such possible estimators, how does one decide which one is the most appropriate?
To answer that question, let us think about the possibility of an ideal estimator, $\theta^*(\cdot):\ \mathcal{X}\to\theta^*$, i.e., $\theta^*(\mathbf{x})=\theta^*$ for all values $\mathbf{x}\in\mathcal{X}$. That is, $\theta^*(\mathbf{X})$ pinpoints the true value $\theta^*$ of $\theta$ whatever the data. A moment's reflection reveals that no such estimator could exist because $\mathbf{X}$ is a random vector with its own distribution $f(\mathbf{x};\theta)$ for all $\mathbf{x}\in\mathcal{X}$. Moreover, in view of the randomness of $\mathbf{X}$, any mapping of the form (7) will be a random variable with its own sampling distribution, $f(\widehat\theta(\mathbf{x});\theta)$, which is directly derivable from $f(\mathbf{x};\theta)$. Let us take stock of these distributions and where they come from. The distribution of the sample, $f(\mathbf{x};\theta)$ for all $\mathbf{x}\in\mathcal{X}$, is given by the assumptions of the statistical model in question.
In the above case of the simple Bernoulli model, we can combine assumptions [2]-[4] to give us:
$$f(\mathbf{x};\theta)\overset{[2]\text{-}[4]}{=}\prod_{k=1}^{n}f(x_k;\theta),$$
and then use [1]: $f(x_k;\theta)=\theta^{x_k}(1-\theta)^{1-x_k},\ k=1,2,\dots,n$, to determine $f(\mathbf{x};\theta)$:
$$f(\mathbf{x};\theta)\overset{[2]\text{-}[4]}{=}\prod_{k=1}^{n}f(x_k;\theta)\overset{[1]\text{-}[4]}{=}\theta^{\sum_{k=1}^{n}x_k}(1-\theta)^{n-\sum_{k=1}^{n}x_k}=\theta^{y}(1-\theta)^{n-y},$$
where $y=\sum_{k=1}^{n}x_k$, and one can show that:
$$Y=\sum_{k=1}^{n}X_k\sim\mathrm{Bin}\big(n\theta,\ n\theta(1-\theta)\big),\qquad (8)$$
i.e. $Y$ is Binomially distributed. Note that the means and variances are derived using the two formulae:
$$\text{(i)}\ E(a_1X_1+a_2X_2+\cdots+a_nX_n)=a_1E(X_1)+a_2E(X_2)+\cdots+a_nE(X_n),$$
$$\text{(ii)}\ Var(a_1X_1+a_2X_2+\cdots+a_nX_n)=a_1^2Var(X_1)+a_2^2Var(X_2)+\cdots+a_n^2Var(X_n),\qquad (9)$$
the latter holding when $X_1,X_2,\dots,X_n$ are independent.
To derive the mean and variance of $Y$:
$$\text{(i)}\quad E(Y)=E\Big(\sum_{k=1}^{n}X_k\Big)=\sum_{k=1}^{n}E(X_k)=\sum_{k=1}^{n}\theta=n\theta,$$
$$\text{(ii)}\quad Var(Y)=Var\Big(\sum_{k=1}^{n}X_k\Big)=\sum_{k=1}^{n}Var(X_k)=\sum_{k=1}^{n}\theta(1-\theta)=n\theta(1-\theta).$$

The result in (8) is a special case of a general result.
¥ The sampling distribution of any (well-behaved) function of the sample, say $Y_n=h(X_1,X_2,\dots,X_n)$, can be derived from $f(\mathbf{x};\theta),\ \mathbf{x}\in\mathcal{X}$, using the formula:
$$F_{Y_n}(y)=\mathbb{P}(Y_n\le y)=\int\cdots\int_{\{\mathbf{x}:\,h(\mathbf{x})\le y\}}f(\mathbf{x};\boldsymbol\theta)\,d\mathbf{x},\quad y\in\mathbb{R}.\qquad (10)$$
In the Bernoulli case, all the estimators [a], [c]-[e] are linear functions of $(X_1,X_2,\dots,X_n)$ and thus, by (8), their distribution is Binomial. In particular,
Table 4: Estimators and their sampling distributions
[a] $\widehat\theta_1(\mathbf{X})=X_n\sim\mathrm{Ber}\big(\theta,\ \theta(1-\theta)\big)$
[c] $\widehat\theta_3(\mathbf{X})=(X_1+X_n)/2\sim\mathrm{Bin}\big(\theta,\ \tfrac{\theta(1-\theta)}{2}\big)$
[d] $\widehat\theta_n(\mathbf{X})=\frac{1}{n}\sum_{k=1}^{n}X_k\sim\mathrm{Bin}\big(\theta,\ \tfrac{\theta(1-\theta)}{n}\big)$, for some $n>3$
[e] $\widehat\theta_{n+1}(\mathbf{X})=\frac{1}{n+1}\sum_{k=1}^{n}X_k\sim\mathrm{Bin}\big(\tfrac{n\theta}{n+1},\ \tfrac{n\theta(1-\theta)}{(n+1)^2}\big)$
(11)
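The entries of (8) and (11) are easy to check by brute-force simulation. The sketch below is my own illustration (the values of n, θ and the number of replications are arbitrary choices): it draws many Bernoulli samples and compares the empirical means and variances of $Y$ and of the estimators [a], [c], [d], [e] with the formulas above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta, reps = 25, 0.3, 100_000
x = rng.binomial(1, theta, size=(reps, n))     # 'reps' Bernoulli(theta) samples of size n

# Check (8): Y = sum of the X_k has mean n*theta and variance n*theta*(1-theta).
y = x.sum(axis=1)
print("Y:", y.mean(), n * theta, y.var(), n * theta * (1 - theta))

# Empirical sampling distributions of the estimators in table 4.
estimators = {
    "[a] X_n":         x[:, -1],
    "[c] (X_1+X_n)/2": (x[:, 0] + x[:, -1]) / 2,
    "[d] sum/n":       x.mean(axis=1),
    "[e] sum/(n+1)":   x.sum(axis=1) / (n + 1),
}
for name, est in estimators.items():
    print(name, round(est.mean(), 3), round(est.var(), 4))
# means ~ theta, theta, theta, n*theta/(n+1);
# variances ~ theta(1-theta) times 1, 1/2, 1/n, n/(n+1)^2.
```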

It is important to emphasize at the outset that the sampling distributions [a]-[e] are evaluated under $\theta=\theta^*$, where $\theta^*$ is the true value of $\theta$.
It is clear that none of the sampling distributions of the estimators in table 4 resembles that of the ideal estimator, $\theta^*(\mathbf{X})$, whose sampling distribution, if it existed, would be of the form:
$$\text{[i]}\quad \mathbb{P}(\theta^*(\mathbf{X})=\theta^*)=1.\qquad (12)$$
In terms of its first two moments, the ideal estimator satisfies [ii] $E(\theta^*(\mathbf{X}))=\theta^*$ and [iii] $Var(\theta^*(\mathbf{X}))=0$. In contrast to the (infeasible) ideal estimator in (12), when the estimators in table 4 infer $\theta$ using an outcome $\mathbf{x}$, the inference is always subject to some error because the variance is not zero. The sampling distributions of these estimators provide the basis for evaluating such errors.
In the statistics literature the evaluation of inferential errors in estimation is accomplished in two interconnected stages.
The objective of the first stage is to narrow down the set of all possible estimators of $\theta$ to an optimal subset, where optimality is assessed by how closely the sampling distribution of an estimator approximates that of the ideal estimator in (12); this is the subject matter of section 3.
The second stage is concerned with using optimal estimators to construct the shortest Confidence Intervals (CIs) for the unknown parameter $\theta$, based on prespecifying the error of covering (encompassing) $\theta^*$ within a random interval of the form $(L(\mathbf{X}),\ U(\mathbf{X}))$; this is the subject matter of section 4.
3 Properties of point estimators

As mentioned above, the notion of an optimal estimator can be motivated by how well the sampling distribution of an estimator $\widehat\theta(\mathbf{X})$ approximates that of the ideal estimator in (12). In particular, the three features of the ideal estimator [i]-[iii] motivate the following optimal properties of feasible estimators.
Condition [ii] motivates the property known as:
[I] Unbiasedness: An estimator $\widehat\theta(\mathbf{X})$ is said to be an unbiased estimator of $\theta$ if:
$$E\big(\widehat\theta(\mathbf{X})\big)=\theta^*.\qquad (13)$$
That is, the mean of the sampling distribution of $\widehat\theta(\mathbf{X})$ coincides with the true value of the unknown parameter $\theta$.
Example. In the case of the simple Bernoulli model, we can see from table 4 that the estimators $\widehat\theta_1(\mathbf{X})$, $\widehat\theta_3(\mathbf{X})$ and $\widehat\theta_n(\mathbf{X})$ are unbiased, since in all three cases (13) is satisfied. In contrast, the estimator $\widehat\theta_{n+1}(\mathbf{X})$ is not unbiased because
$$E\big(\widehat\theta_{n+1}(\mathbf{X})\big)=\tfrac{n\theta}{n+1}\neq\theta.$$
Condition [iii] motivates the property known as:
[II] Full Efficiency: An unbiased estimator $\widehat\theta(\mathbf{X})$ is said to be a fully efficient estimator of $\theta$ if its variance is as small as it can be, where the latter is expressed by:
$$Var\big(\widehat\theta(\mathbf{X})\big)=CR(\theta):=\Big[E\Big(-\frac{\partial^2\ln f(\mathbf{x};\theta)}{\partial\theta^2}\Big)\Big]^{-1},$$
where '$CR(\theta)$' stands for the Cramer-Rao lower bound; note that $f(\mathbf{x};\theta)$ is given by the assumed model.
Example (the derivations are not important!). In the case of the simple Bernoulli model:
$$\ln f(\mathbf{x};\theta)=y\ln\theta+(n-y)\ln(1-\theta),\ \text{ where } y=\sum_{k=1}^{n}x_k,$$
$$\frac{\partial\ln f(\mathbf{x};\theta)}{\partial\theta}=\frac{y}{\theta}-\frac{n-y}{1-\theta},\qquad \frac{\partial^2\ln f(\mathbf{x};\theta)}{\partial\theta^2}=-\frac{y}{\theta^2}-\frac{n-y}{(1-\theta)^2},$$
$$E\Big(-\frac{\partial^2\ln f(\mathbf{x};\theta)}{\partial\theta^2}\Big)=\frac{E(Y)}{\theta^2}+\frac{n-E(Y)}{(1-\theta)^2}=\frac{n}{\theta}+\frac{n}{1-\theta}=\frac{n}{\theta(1-\theta)},$$
and thus the Cramer-Rao lower bound is:
$$CR(\theta):=\frac{\theta(1-\theta)}{n}.$$
Looking at the estimators of $\theta$ in table 4, it is clear that only one unbiased estimator achieves that bound, $\widehat\theta_n(\mathbf{X})$. Hence, $\widehat\theta_n(\mathbf{X})$ is the only estimator of $\theta$ which is both unbiased and fully efficient.
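As an informal numerical check (not part of the notes), one can estimate the Fisher information by Monte Carlo from the score of the Bernoulli log-likelihood and compare its inverse with the variance of the sample mean; the choices of n and θ below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, theta, reps = 50, 0.3, 200_000

x = rng.binomial(1, theta, size=(reps, n))
y = x.sum(axis=1)

# Score of the Bernoulli log-likelihood: d ln f / d theta = y/theta - (n - y)/(1 - theta).
score = y / theta - (n - y) / (1 - theta)
fisher_info = score.var()            # ~ n / (theta * (1 - theta)) = 238.1
cr_bound = 1 / fisher_info           # ~ theta * (1 - theta) / n = 0.0042

print(cr_bound, x.mean(axis=1).var())  # the sample mean's variance attains the bound
```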
Comparisons between unbiased estimators can be made in terms of relative efficiency:
$$Var\big(\widehat\theta_3(\mathbf{X})\big)<Var\big(\widehat\theta_1(\mathbf{X})\big)\ \text{ for } n\ge 2,$$
asserting that $\widehat\theta_3(\mathbf{X})$ is relatively more efficient than $\widehat\theta_1(\mathbf{X})$. One needs to be careful with such comparisons, however, because they can be very misleading when both estimators are bad, as in the case above; the fact that $\widehat\theta_3(\mathbf{X})$ is relatively more efficient than $\widehat\theta_1(\mathbf{X})$ does not mean that the former is even an adequate estimator. Hence, relative efficiency is not something to write home about!
What renders these two estimators practically useless? An asymptotic property motivated by condition [i] of the ideal estimator, known as consistency.
Intuitively, an estimator $\widehat\theta_n(\mathbf{X})$ is consistent when its precision (how close to $\theta^*$ it is) improves as the sample size increases.
Condition [i] of the ideal estimator motivates the property known as:
[III] Consistency: an estimator $\widehat\theta_n(\mathbf{X})$ is consistent if:
$$\text{Strong:}\quad \mathbb{P}\big(\lim_{n\to\infty}\widehat\theta_n(\mathbf{X})=\theta^*\big)=1,$$
$$\text{Weak:}\quad \lim_{n\to\infty}\mathbb{P}\big(\big|\widehat\theta_n(\mathbf{X})-\theta^*\big|\le\varepsilon\big)=1,\ \text{ for any } \varepsilon>0.\qquad (14)$$

That is, an estimator $\widehat\theta_n(\mathbf{X})$ is consistent if it approximates (probabilistically) the sampling distribution of the ideal estimator asymptotically, as $n\to\infty$. The difference between strong and weak consistency stems from the form of probabilistic convergence they involve, with the former being stronger than the latter. Both of these properties constitute an extension of the Strong and Weak Law of Large Numbers (LLN), which hold for the sample mean $\bar X_n=\frac{1}{n}\sum_{k=1}^{n}X_k$ of a process $\{X_k,\ k=1,2,\dots\}$ under certain probabilistic assumptions, the most restrictive being that the process is IID; see Spanos (1999), ch. 8.
[Fig. 1: t-plot of the sample average $\bar X_n$ for a Bernoulli IID realization with $n=200$. Fig. 2: t-plot of the sample average $\bar X_n$ for a Bernoulli IID realization with $n=1000$.]
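The two t-plots can be reproduced with a few lines of simulation; a minimal sketch is given below, assuming for illustration a fair-coin value θ=0.5 (the notes do not state the value used), with the plotting library and seed being my own choices.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
theta = 0.5

fig, axes = plt.subplots(1, 2, figsize=(10, 3))
for ax, n in zip(axes, (200, 1000)):
    x = rng.binomial(1, theta, size=n)
    running_mean = np.cumsum(x) / np.arange(1, n + 1)   # sample average after each draw
    ax.plot(running_mean)
    ax.axhline(theta, linestyle="--")
    ax.set_title(f"t-plot of the sample average, n={n}")
plt.show()
```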

In practice, it is non-trivial to prove that a particular estimator is or is not consistent by verifying directly the conditions in (14). However, there is often a short-cut for verifying consistency in the case of unbiased estimators, using the sufficient condition:
$$\lim_{n\to\infty}Var\big(\widehat\theta_n(\mathbf{X})\big)=0.\qquad (15)$$

Example. In the case of the simple Bernoulli model, one can verify that the estimators $\widehat\theta_1(\mathbf{X})$ and $\widehat\theta_3(\mathbf{X})$ are inconsistent because:
$$\lim_{n\to\infty}Var\big(\widehat\theta_1(\mathbf{X})\big)=\theta(1-\theta)\neq 0,\qquad \lim_{n\to\infty}Var\big(\widehat\theta_3(\mathbf{X})\big)=\frac{\theta(1-\theta)}{2}\neq 0,$$
i.e. their variances do not decrease to zero as the sample size $n$ goes to infinity.
In contrast, the estimators $\widehat\theta_n(\mathbf{X})$ and $\widehat\theta_{n+1}(\mathbf{X})$ are consistent because:
$$\lim_{n\to\infty}Var\big(\widehat\theta_n(\mathbf{X})\big)=\lim_{n\to\infty}\frac{\theta(1-\theta)}{n}=0,\qquad \lim_{n\to\infty}MSE\big(\widehat\theta_{n+1}(\mathbf{X})\big)=0.$$
Note that 'MSE' denotes the Mean Square Error, defined by:
$$MSE(\widehat\theta;\theta^*)=Var(\widehat\theta)+\big[B(\widehat\theta;\theta^*)\big]^2,$$
where $B(\widehat\theta;\theta^*)=E(\widehat\theta)-\theta^*$ is the bias. Hence:
$$\lim_{n\to\infty}MSE(\widehat\theta;\theta^*)=0\ \text{ if (a) } \lim_{n\to\infty}Var(\widehat\theta)=0\ \text{ and (b) } \lim_{n\to\infty}E(\widehat\theta)=\theta^*,$$
where (b) is equivalent to $\lim_{n\to\infty}B(\widehat\theta;\theta^*)=0$.
Let us take stock of the above properties and how they can be used by the practitioner in deciding which estimator is optimal. The property which defines minimal reliability for an estimator is that of consistency. Intuitively, consistency indicates that as the sample size increases [as $n\to\infty$] the estimator $\widehat\theta_n(\mathbf{X})$ approaches $\theta^*$, the true value of $\theta$, in some probabilistic sense: convergence almost surely or convergence in probability. Hence, if an estimator $\widehat\theta_n(\mathbf{X})$ is not consistent, it is automatically excluded from the subset of potentially optimal estimators, irrespective of any other properties this estimator might enjoy. In particular, an unbiased estimator which is inconsistent is practically useless. On the other hand, just because an estimator $\widehat\theta_n(\mathbf{X})$ is consistent does not imply that it's a 'good' estimator; it only implies that it's minimally acceptable.
¥ It is important to emphasize that the properties of unbiasedness and full efficiency hold for any sample size $n\ge 1$, and thus we call them finite sample properties, but consistency is an asymptotic property because it holds as $n\to\infty$.
Example. In the case of the simple Bernoulli model, if the choice between estimators is confined (artificially) to the estimators $\widehat\theta_1(\mathbf{X})$, $\widehat\theta_3(\mathbf{X})$ and $\widehat\theta_{n+1}(\mathbf{X})$, the latter estimator should be chosen, despite being biased, because it is a consistent estimator of $\theta$. On the other hand, among the estimators given in table 4, $\widehat\theta_n(\mathbf{X})$ is clearly the best (most optimal) because it satisfies all three properties. In particular, $\widehat\theta_n(\mathbf{X})$ not only satisfies the minimal property of consistency, but also has the smallest variance possible, which means that it comes closer to the ideal estimator than any of the others, for any sample size $n>2$. The sampling distribution of $\widehat\theta_n(\mathbf{X})$, when evaluated under $\theta=\theta^*$, takes the form:
$$\widehat\theta_n(\mathbf{X})=\frac{1}{n}\sum_{k=1}^{n}X_k\sim\mathrm{Bin}\Big(\theta^*,\ \frac{\theta^*(1-\theta^*)}{n}\Big),\qquad (16)$$
whatever the 'true' value $\theta^*$ happens to be.

Additional asymptotic properties
In addition to the properties of estimators mentioned above, there are certain other properties which are often used in practice to decide on the optimality of an estimator. The most important is given below for completeness.
[V] Asymptotic Normality: an estimator $\widehat\theta_n(\mathbf{X})$ is said to be asymptotically Normal if:
$$\sqrt{n}\big(\widehat\theta_n(\mathbf{X})-\theta\big)\ \overset{a}{\sim}\ \mathrm{N}\big(0,\ V_\infty(\theta)\big),\quad V_\infty(\theta)\neq 0,\qquad (17)$$
where '$\overset{a}{\sim}$' stands for 'can be asymptotically approximated by'.
This property is an extension of a well-known result in probability theory: the Central Limit Theorem (CLT). The CLT asserts that, under certain probabilistic assumptions on the process $\{X_k,\ k=1,2,\dots\}$, the most restrictive being that the process is IID, the sampling distribution of $\bar X_n=\frac{1}{n}\sum_{k=1}^{n}X_k$ for a 'large enough' $n$ can be approximated by the Normal distribution (Spanos, 1999, ch. 8):
$$\frac{\bar X_n-E(\bar X_n)}{\sqrt{Var(\bar X_n)}}\ \overset{a}{\sim}\ \mathrm{N}(0,1).\qquad (18)$$
Note that the important difference between (17) and (18) is that $\widehat\theta_n(\mathbf{X})$ in the former does not have to coincide with $\bar X_n$; it can be any well-behaved function $h(\mathbf{X})$ of the sample $\mathbf{X}$.
Example. In the case of the simple Bernoulli model, the sampling distribution of $\widehat\theta_n(\mathbf{X})$, which we know is Binomial (see (16)), can also be approximated using (18). In the graphs below we compare the Normal approximation to the Binomial for $n=10$ and $n=20$ in the case where $\theta=.5$, and the improvement is clearly noticeable.
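A minimal sketch of this comparison (my own, using scipy; the plotting choices are illustrative): it overlays the $\mathrm{Bin}(n,.5)$ pmf with the Normal density having the same mean and variance, for $n=10$ and $n=20$.

```python
import numpy as np
from scipy.stats import binom, norm
import matplotlib.pyplot as plt

theta = 0.5
fig, axes = plt.subplots(1, 2, figsize=(10, 3))
for ax, n in zip(axes, (10, 20)):
    y = np.arange(n + 1)
    ax.bar(y, binom.pmf(y, n, theta), alpha=0.5, label=f"Bin({n}, {theta})")
    mean, sd = n * theta, np.sqrt(n * theta * (1 - theta))
    grid = np.linspace(0, n, 200)
    ax.plot(grid, norm.pdf(grid, mean, sd), label="Normal approx.")
    ax.legend()
plt.show()
```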
[Figures: Normal approximation of the Binomial, $f(y;\theta=.5,\ n=10)$ and $f(y;\theta=.5,\ n=20)$, each overlaid with the Normal density having the corresponding mean and standard deviation.]
4 Confidence Intervals (CIs): an overview

4.1 An optimal CI begins with an optimal point estimator

Example 2. Let us summarize the discussion concerning point estimation by briefly discussing the simple (one parameter) Normal model, where $\sigma^2=1$ (table 5).
Table 5 - Simple Normal Model (one unknown parameter)
Statistical GM: $X_k=\mu+u_k,\ k\in\mathbb{N}=\{1,2,\dots\}$,
[1] Normality: $X_k\sim\mathrm{N}(\cdot,\cdot),\ x_k\in\mathbb{R}$,
[2] Constant mean: $E(X_k)=\mu,\ k\in\mathbb{N}$,
[3] Constant variance: $Var(X_k)=\sigma^2$ (known), $k\in\mathbb{N}$,
[4] Independence: $\{X_k,\ k\in\mathbb{N}\}$ is an independent process.
In section 3 we discussed the question of choosing among numerous possible estimators of $\mu$, such as [a]-[e] (table 6), using their sampling distributions. These results stem from the following theorem. If $\mathbf{X}:=(X_1,X_2,\dots,X_n)$ is a random (IID) sample from the Normal distribution, i.e.
$$X_k\sim\mathrm{NIID}(\mu,\sigma^2),\ k\in\mathbb{N}:=(1,2,\dots,n,\dots),$$
then the sampling distribution of $\sum_{k=1}^{n}X_k$ is:
$$\sum_{k=1}^{n}X_k\sim\mathrm{N}\big(n\mu,\ n\sigma^2\big).\qquad (19)$$
Among the above estimators the sample mean, for $\sigma^2=1$:
$$\widehat\mu_n(\mathbf{X}):=\bar X_n=\frac{1}{n}\sum_{k=1}^{n}X_k\sim\mathrm{N}\big(\mu,\ \tfrac{1}{n}\big),$$
constitutes the optimal point estimator of $\mu$ because it is:
[U] Unbiased ($E(\bar X_n)=\mu^*$),
[FE] Fully Efficient ($Var(\bar X_n)=CR(\mu)$), and
[SC] Strongly Consistent ($\mathbb{P}(\lim_{n\to\infty}\bar X_n=\mu^*)=1$).
Table 6: Estimators (UN = unbiased, FE = fully efficient, SC = strongly consistent)
[a] $\widehat\mu_1(\mathbf{X})=X_n\sim\mathrm{N}(\mu,\ 1)$: UN ✓, FE ×, SC ×
[b] $\widehat\mu_2(\mathbf{X})=X_1-X_n\sim\mathrm{N}(0,\ 2)$: UN ×, FE ×, SC ×
[c] $\widehat\mu_3(\mathbf{X})=(X_1+X_n)/2\sim\mathrm{N}\big(\mu,\ \tfrac{1}{2}\big)$: UN ✓, FE ×, SC ×
[d] $\widehat\mu_n(\mathbf{X})=\frac{1}{n}\sum_{k=1}^{n}X_k\sim\mathrm{N}\big(\mu,\ \tfrac{1}{n}\big)$: UN ✓, FE ✓, SC ✓
[e] $\widehat\mu_{n+1}(\mathbf{X})=\frac{1}{n+1}\sum_{k=1}^{n}X_k\sim\mathrm{N}\big(\tfrac{n\mu}{n+1},\ \tfrac{n}{(n+1)^2}\big)$: UN ×, FE ×, SC ✓

Given that any 'decent' estimator $\widehat\mu(\mathbf{X})$ of $\mu$ is likely to yield any value in the interval $(-\infty,\infty)$, can one say something more about its reliability than just that "on average" its values $\widehat\mu(\mathbf{x})$, for $\mathbf{x}\in\mathcal{X}$, are more likely to occur around $\mu^*$ (the true value) than those further away?
4.2 What is a Confidence Interval?

This is what a Confidence Interval (CI) proposes to address. In general, a $(1-\alpha)$ CI for $\mu$ takes the generic form:
$$\mathbb{P}\big(L(\mathbf{X})\le\mu^*\le U(\mathbf{X})\big)=1-\alpha,$$
where $L(\mathbf{X})$ and $U(\mathbf{X})$ denote the lower and upper (random) bounds of this CI. The $1-\alpha$ is referred to as the confidence level and represents the coverage probability of the CI:
$$CI(\mathbf{X};\alpha)=\big(L(\mathbf{X}),\ U(\mathbf{X})\big),$$
in the sense that the probability that the random interval $CI(\mathbf{X};\alpha)$ covers (overlays) the true $\mu^*$ is equal to $(1-\alpha)$.
This is often envisioned in terms of a long-run metaphor of repeating the experiment underlying the statistical model in question in order to get a sequence of outcomes (realizations of $\mathbf{X}$) $\mathbf{x}_i$, each of which will yield an observed CI $(\mathbf{x}_i;\alpha),\ i=1,2,\dots,N$. In the context of this metaphor, $(1-\alpha)$ denotes the relative frequency of the observed CIs that will include (overlay) $\mu^*$.
Example 2 (continued). In the case of the simple (one parameter) Normal model (table 5), let us consider the question of constructing .95 CIs using the different unbiased estimators of $\mu$ in table 6:
$$\text{[a]}\ \mathbb{P}\big(\widehat\mu_1(\mathbf{X})-1.96\le\mu^*\le\widehat\mu_1(\mathbf{X})+1.96\big)=.95,$$
$$\text{[c]}\ \mathbb{P}\big(\widehat\mu_3(\mathbf{X})-1.96(\tfrac{1}{\sqrt2})\le\mu^*\le\widehat\mu_3(\mathbf{X})+1.96(\tfrac{1}{\sqrt2})\big)=.95,$$
$$\text{[d]}\ \mathbb{P}\big(\bar X_n-1.96(\tfrac{1}{\sqrt n})\le\mu^*\le\bar X_n+1.96(\tfrac{1}{\sqrt n})\big)=.95.\qquad (20)$$

How do these CIs differ? The answer is in terms of their precision (accuracy). One way to measure precision for CIs is to evaluate their length:
$$\text{[a]}:\ 2(1.96)=3.92,\qquad \text{[c]}:\ 2\big(1.96(\tfrac{1}{\sqrt2})\big)=2.772,\qquad \text{[d]}:\ 2\big(1.96(\tfrac{1}{\sqrt n})\big)=\tfrac{3.92}{\sqrt n}.$$
It is clear from this evaluation that the CI associated with $\bar X_n=\frac{1}{n}\sum_{k=1}^{n}X_k$ is the shortest for any $n>2$; e.g. for $n=100$ the length of this CI is $\frac{3.92}{\sqrt{100}}=.392$.
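For completeness, a small sketch (my own, using scipy for the Normal quantile) that computes these three lengths for a given n:

```python
import numpy as np
from scipy.stats import norm

alpha, n = 0.05, 100
z = norm.ppf(1 - alpha / 2)            # 1.96 for a .95 CI

lengths = {
    "[a] X_n":         2 * z * 1.0,            # sd of X_n is 1
    "[c] (X_1+X_n)/2": 2 * z / np.sqrt(2),     # sd is 1/sqrt(2)
    "[d] sample mean": 2 * z / np.sqrt(n),     # sd is 1/sqrt(n)
}
print(lengths)    # ~ {'[a]': 3.92, '[c]': 2.77, '[d]': 0.392}
```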
∗
1.
` − − − −  − − − − a
2.
` − − − − − − − − −− a
3.
` − − − −  − − − − a
4.
` − − − −  − − − − a
5.
` − − − −  − − − − a
6.
` − − − −  − − − − a
7.
` − − − −  − − − − a
8.
` − − − −  − − − − a
9.
` − − − −  − − − − a
10.
` − − − −  − − − − a
11.
` − − − −  − − − − a
12.
` − − − −  − − − − a
13.I ` − − −−− − −− a
14.
` − − − −  − − − − a
15.
` − − − −  − − − − a
16.
` − − − −  − − − − a
17.
` − − − − − − − − −− a
18.
` − − − −  − − − − a
19.
` − − − −  − − − − a
20.
` − − − −  − − − − a
21
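The long-run coverage metaphor is easy to simulate; the sketch below (with illustrative choices of μ*, n and number of replications, all my own) counts how often the interval $\bar X_n\pm 1.96/\sqrt{n}$ covers the true mean.

```python
import numpy as np

rng = np.random.default_rng(4)
mu_true, n, reps = 0.0, 100, 10_000

x = rng.normal(mu_true, 1.0, size=(reps, n))
xbar = x.mean(axis=1)
half_width = 1.96 / np.sqrt(n)

covered = (xbar - half_width <= mu_true) & (mu_true <= xbar + half_width)
print(covered.mean())      # ~ 0.95: the relative frequency of CIs that overlay mu*
```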
4.3 Constructing Confidence Intervals (CIs)

More generally, the sampling distribution of the optimal estimator $\bar X_n$ gives rise to a pivot (a function of the sample and $\mu$ whose distribution is known):
$$\sqrt{n}\big(\bar X_n-\mu\big)\ \overset{\mu=\mu^*}{\sim}\ \mathrm{N}(0,1),\qquad (21)$$
which can be used to construct the shortest CI among all $(1-\alpha)$ CIs for $\mu$:
$$\mathbb{P}\big(\bar X_n-c_{\frac{\alpha}{2}}(\tfrac{1}{\sqrt n})\le\mu^*\le\bar X_n+c_{\frac{\alpha}{2}}(\tfrac{1}{\sqrt n})\big)=1-\alpha,\qquad (22)$$
where $\mathbb{P}(|Z|\le c_{\frac{\alpha}{2}})=1-\alpha$ for $Z\sim\mathrm{N}(0,1)$ (figures 1-2).
2
Example 3. In the case where $\sigma^2$ is unknown, and we use $s^2=\frac{1}{n-1}\sum_{k=1}^{n}(X_k-\bar X_n)^2$ to estimate it, the pivot in (21) takes the form:
$$\frac{\sqrt{n}\big(\bar X_n-\mu\big)}{s}\ \overset{\mu=\mu^*}{\sim}\ \mathrm{St}(n-1),\qquad (23)$$
where $\mathrm{St}(n-1)$ denotes the Student's t distribution with $(n-1)$ degrees of freedom.
Step 1. Attach a $(1-\alpha)$ coverage probability using (23):
$$\mathbb{P}\Big(-c_{\frac{\alpha}{2}}\le\frac{\sqrt{n}(\bar X_n-\mu)}{s}\le c_{\frac{\alpha}{2}}\Big)=1-\alpha,$$
where $\mathbb{P}(|T|\le c_{\frac{\alpha}{2}})=1-\alpha$ for $T\sim\mathrm{St}(n-1)$.
Step 2. Re-arrange $\frac{\sqrt{n}(\bar X_n-\mu)}{s}$ to isolate $\mu$ and derive the CI:
$$\mathbb{P}\Big(-c_{\frac{\alpha}{2}}\le\frac{\sqrt{n}(\bar X_n-\mu)}{s}\le c_{\frac{\alpha}{2}}\Big)=\mathbb{P}\big(-c_{\frac{\alpha}{2}}(\tfrac{s}{\sqrt n})\le\bar X_n-\mu\le c_{\frac{\alpha}{2}}(\tfrac{s}{\sqrt n})\big)=$$
$$=\mathbb{P}\big(-\bar X_n-c_{\frac{\alpha}{2}}(\tfrac{s}{\sqrt n})\le-\mu\le-\bar X_n+c_{\frac{\alpha}{2}}(\tfrac{s}{\sqrt n})\big)=\mathbb{P}\big(\bar X_n-c_{\frac{\alpha}{2}}(\tfrac{s}{\sqrt n})\le\mu^*\le\bar X_n+c_{\frac{\alpha}{2}}(\tfrac{s}{\sqrt n})\big)=1-\alpha.$$
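A minimal sketch of this two-step construction in code (the data are simulated and the names are mine; scipy supplies the Student's t quantile):

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(5)
x = rng.normal(loc=2.0, scale=1.5, size=20)   # a sample with unknown mean and variance
n, alpha = len(x), 0.05

xbar = x.mean()
s = x.std(ddof=1)                              # s^2 = sum (x_k - xbar)^2 / (n - 1)
c = t.ppf(1 - alpha / 2, df=n - 1)             # ~ 2.09 for n = 20

ci = (xbar - c * s / np.sqrt(n), xbar + c * s / np.sqrt(n))
print(ci)                                      # observed .95 CI for mu
```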

In figures 1-2 the underlying distribution is Normal and in figures 3-4 it is Student's t with 19 degrees of freedom. One can see that, while the tail areas are the same for each $\alpha$, the threshold values $c_{\frac{\alpha}{2}}$ for the Normal are smaller than the corresponding values $c^*_{\frac{\alpha}{2}}$ for the Student's t, because the latter has heavier tails due to the randomness of $s^2$.
[Fig. 1: $\mathbb{P}(|Z|\le 1.96)=.95$ for $Z\sim\mathrm{N}(0,1)$. Fig. 2: $\mathbb{P}(|Z|\le 1.64)=.90$ for $Z\sim\mathrm{N}(0,1)$. Fig. 3: $\mathbb{P}(|T|\le 2.09)=.95$ for $T\sim\mathrm{St}(19)$. Fig. 4: $\mathbb{P}(|T|\le 1.72)=.90$ for $T\sim\mathrm{St}(19)$.]

5 Summary and conclusions

The primary objective in frequentist estimation is to learn about $\theta^*$, the true value of the unknown parameter $\theta$ of interest, using its sampling distribution $f(\widehat\theta_n;\theta^*)$ associated with a particular sample size $n$. The finite sample properties are defined directly in terms of $f(\widehat\theta_n;\theta^*)$, and the asymptotic properties are defined in terms of the asymptotic sampling distribution $f_\infty(\widehat\theta_n;\theta^*)$, aiming to approximate $f(\widehat\theta_n;\theta^*)$ at the limit as $n\to\infty$.
The question that needs to be considered at this stage is: what combination of the above mentioned properties specifies an 'optimal' estimator?
A necessary but minimal property for an estimator is consistency (preferably strong). By itself, however, consistency does not secure learning from data for a given $n$; it is a promissory note for potential learning. Hence, for actual learning one needs to supplement consistency with certain finite sample properties, like unbiasedness and efficiency, to ensure that learning can take place with the particular data $\mathbf{x}_0:=(x_1,x_2,\dots,x_n)$ of sample size $n$.
Among finite sample properties, full efficiency is clearly the most important because it secures the highest degree of learning for a given $n$, since it offers the best possible precision. Relative efficiency, although desirable, needs to be investigated further to find out how large the class of estimators being compared is before passing judgement. Being the best econometrician in my family, although worth something, does not make me a good econometrician!
Unbiasedness, although desirable, is not considered indispensable by itself. Indeed, as shown above, an unbiased but inconsistent estimator is practically useless, and a consistent but biased estimator is always preferable.
Hence, a consistent, unbiased and fully efficient estimator sets the gold standard in estimation.
In conclusion, it is important to emphasize that point estimation is often considered inadequate for the purposes of scientific inquiry because a 'good' point estimator $\widehat\theta(\mathbf{X})$, by itself, does not provide any measure of the reliability and precision associated with the estimate $\widehat\theta(\mathbf{x}_0)$; one would be wrong to assume that $\widehat\theta(\mathbf{x}_0)\simeq\theta^*$. This is the reason why $\widehat\theta(\mathbf{x}_0)$ is often accompanied by its standard error [the estimated standard deviation $\sqrt{Var(\widehat\theta(\mathbf{X}))}$] or the p-value of some test of significance associated with the generic hypothesis $\theta=\theta_0$.
Interval estimation rectifies this weakness of point estimation by providing the relevant error probabilities associated with inferences pertaining to 'covering' the true value $\theta^*$ of $\theta$.

Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 

Spanos lecture+3-6334-estimation

  • 3. {: ()=0}={()}=∈F {: ()=1}=∈F Hence, both functions do define proper r.v’s with respect to F To derive their distributions we assume that we have a fair coin, i.e. each event in  has probability .25 of occurring. Hence, both functions do define proper r.v’s with respect to F To derive their distributions we assume that we have a fair coin, i.e. each event in  has probability .25 of occurring. {:()=0}=  (=0)=25 {: ()=0}=  (=0)=25 {:()=1}=  (=1)=75 {: ()=1}=  (=1)=75 Hence, their ‘marginal’ density functions take the form:  0 1  () 25 75  0 1  () 25 75 (1) How can one define the joint distribution of these two r.v.s? To define the joint density function we need to specify all the events: (=  =) ∈R  ∈R  denoting ‘their joint occurrence’, and then attach probabilities to these events. These events belong to F by definition because as a field is closed under the set theoretic operations ∪ ∩  so that: (=0  =0)= (=0  =1)= (=1  =0)= (=1  =1)= {}=∅ {(  )} {()} {( ) ( )} 3  (=0 =0)=  (=0 =1)=  (=1 =0)=  (=1 =1)= 0 25 25 50
Hence, the joint density is defined by:

      y\x      0      1
       0       0     .25
       1      .25    .50                                        (2)

How is the joint density (2) connected to the individual (marginal) densities given in (1)? It turns out that if we sum the joint density over the values of $y$ for each value of $x$, i.e. use $\sum_{y\in\mathbb{R}_Y} f(x,y)=f_x(x)$, we get the marginal distribution of $X$, $f_x(x)$, $x\in\mathbb{R}_X$; and if we sum over the values of $x$ for each value of $y$, i.e. use $\sum_{x\in\mathbb{R}_X} f(x,y)=f_y(y)$, we get the marginal distribution of $Y$, $f_y(y)$, $y\in\mathbb{R}_Y$:

      y\x      0      1    f_y(y)
       0       0     .25     .25
       1      .25    .50     .75
     f_x(x)   .25    .75      1                                 (3)

Note: $E(X)=0(.25)+1(.75)=.75=E(Y)$ and $Var(X)=(0-.75)^2(.25)+(1-.75)^2(.75)=.1875=Var(Y)$.

Armed with the joint distribution we can proceed to define the notions of Independence and Identically Distributed for the r.v.'s $X$ and $Y$.

Independence. Two r.v.'s $X$ and $Y$ are said to be Independent iff:

  $f(x,y)=f_x(x)\cdot f_y(y)$ for all values $(x,y)\in\mathbb{R}_X\times\mathbb{R}_Y$.   (4)

That is, to verify that these two r.v.'s are independent we need to confirm that the probability of all possible pairs of values $(x,y)$ satisfies (4).
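To make the passage from the joint table (3) to the marginals and the independence check (4) concrete, here is a minimal sketch in Python; the array layout and variable names are illustrative assumptions, not part of the original notes.

```python
import numpy as np

# Joint density (3): rows index y in {0,1}, columns index x in {0,1}
joint = np.array([[0.00, 0.25],
                  [0.25, 0.50]])

# Marginals: sum over y for each x, and over x for each y
f_x = joint.sum(axis=0)   # [0.25, 0.75]
f_y = joint.sum(axis=1)   # [0.25, 0.75]
print("f_x:", f_x, " f_y:", f_y)

# Check the independence condition (4): f(x,y) = f_x(x) * f_y(y) for all pairs
product = np.outer(f_y, f_x)          # table of f_y(y) * f_x(x)
independent = np.allclose(joint, product)
print("independent:", independent)    # False: e.g. f(0,0)=0 but f_x(0)*f_y(0)=0.0625
```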
Example. In the case of the joint distribution in (3) we can show that the r.v.'s are not independent because for $(x,y)=(0,0)$:

  $f(0,0)=0 \neq f_x(0)\cdot f_y(0)=(.25)(.25)$.

It is important to emphasize that the above condition of Independence is not equivalent to the two random variables being uncorrelated:

  $Corr(X,Y)=0$ does not imply $f(x,y)=f_x(x)\cdot f_y(y)$ for all $(x,y)\in\mathbb{R}_X\times\mathbb{R}_Y$.

This is because $Corr(X,Y)$ is a measure of linear dependence between $X$ and $Y$, since it is based on the covariance defined by:

  $Cov(X,Y)=E[(X-E(X))(Y-E(Y))]$
  $\;= (0-.75)(0-.75)(0)+(0-.75)(1-.75)(.25)+(1-.75)(0-.75)(.25)+(1-.75)(1-.75)(.50) = -.0625$.

A standardized covariance yields the correlation:

  $Corr(X,Y)=\dfrac{Cov(X,Y)}{\sqrt{Var(X)\cdot Var(Y)}}=\dfrac{-.0625}{.1875}=-\dfrac{1}{3}$.

The intuition underlying this result is that the correlation involves only the first two moments [mean, variance, covariance] of $X$ and $Y$, but independence is defined in terms of the density functions; the latter, in principle, involve all moments, not just the first two!

Identically Distributed. Two r.v.'s $X$ and $Y$ are said to be Identically Distributed iff:

  $f_x(u;\theta_1)=f_y(u;\theta_2)$ for all values $u\in\mathbb{R}_X=\mathbb{R}_Y$.   (5)

Example. In the case of the joint distribution in (3) we can show that the r.v.'s are identically distributed because (5) holds. In particular, both r.v.'s $X$ and $Y$ take the same values with the same probabilities.

To shed further light on the notion of IID, consider the three bivariate distributions given below.

      (A)                          (B)                          (C)
   y\x    1     2   f_y(y)      y\x    0     1   f_y(y)      y\x    0     1   f_y(y)
    0    .18   .42    .6         0    .18   .42    .6         0    .36   .24    .6
    2    .12   .28    .4         1    .12   .28    .4         1    .24   .16    .4
  f_x(x)  .3    .7     1       f_x(x)  .3    .7     1       f_x(x)  .6    .4     1

(I) $X$ and $Y$ are Independent iff:

  $f(x,y)=f_x(x)\cdot f_y(y)$ for all $(x,y)\in\mathbb{R}_X\times\mathbb{R}_Y$.   (6)

(ID) $X$ and $Y$ are Identically Distributed iff:

  $f_x(u)=f_y(u)$ for all $u$, i.e. $\theta_1=\theta_2$ and $\mathbb{R}_X=\mathbb{R}_Y$.

The random variables $X$ and $Y$ are independent in all three cases since they satisfy (4) (verify!). The random variables in (A) are not Identically Distributed because $\mathbb{R}_X\neq\mathbb{R}_Y$ and $f_x(u)\neq f_y(u)$ for some $u$. The random variables in (B) are not Identically Distributed because, even though $\mathbb{R}_X=\mathbb{R}_Y$, $f_x(u)\neq f_y(u)$ for some $u$. Finally, the random variables in (C) are Identically Distributed because $\mathbb{R}_X=\mathbb{R}_Y$ and $f_x(u)=f_y(u)$ for all $u$.
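As a complement to the (verify!) exercise, the following sketch checks conditions (I) and (ID) for the three bivariate distributions (A)-(C) and reproduces the covariance and correlation computed above; the helper function and value labels are assumptions made purely for illustration.

```python
import numpy as np

def check_I_and_ID(joint, x_vals, y_vals):
    """Check Independence (6) and Identical Distribution for a joint table.
    Rows of `joint` index y_vals, columns index x_vals."""
    f_x, f_y = joint.sum(axis=0), joint.sum(axis=1)
    indep = np.allclose(joint, np.outer(f_y, f_x))
    ident = (list(x_vals) == list(y_vals)) and np.allclose(f_x, f_y)
    return indep, ident

A = (np.array([[.18, .42], [.12, .28]]), [1, 2], [0, 2])
B = (np.array([[.18, .42], [.12, .28]]), [0, 1], [0, 1])
C = (np.array([[.36, .24], [.24, .16]]), [0, 1], [0, 1])
for name, (tbl, xv, yv) in zip("ABC", [A, B, C]):
    print(name, check_I_and_ID(tbl, xv, yv))
# Expected: A (True, False), B (True, False), C (True, True)

# Covariance and correlation for the joint distribution in (3)
joint = np.array([[0.00, 0.25], [0.25, 0.50]])
x = np.array([0, 1]); y = np.array([0, 1])
f_x, f_y = joint.sum(axis=0), joint.sum(axis=1)
EX, EY = (x * f_x).sum(), (y * f_y).sum()
cov = sum(joint[i, j] * (x[j] - EX) * (y[i] - EY)
          for i in range(2) for j in range(2))
corr = cov / np.sqrt(((x - EX)**2 * f_x).sum() * ((y - EY)**2 * f_y).sum())
print(cov, corr)   # -0.0625, -0.333...
```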
2  Point Estimation: an overview

It turns out that all forms of frequentist inference, which include point and interval estimation, hypothesis testing and prediction, are defined in terms of two sets:

  X — sample space: the set of all possible values of the sample X;
  Θ — parameter space: the set of all possible values of θ.

Note that the sample space X is always a subset of $\mathbb{R}^n$, denoted by $\mathbb{R}_X^n$.

In estimation the objective is to use the statistical information to infer the 'true' value $\theta^*$ of the unknown parameter, whatever that happens to be, as long as it belongs to Θ. In general, an estimator $\hat{\theta}(\cdot)$ of θ is a mapping (function) from the sample space to the parameter space:

  $\hat{\theta}(\cdot):\ \mathbb{X} \rightarrow \Theta$.   (7)

Example 1. Let the statistical model of interest be the simple Bernoulli model (table 2) and consider the question of estimating the unknown parameter θ, whose parameter space is Θ:=[0,1]. Note that the sample space is X:={0,1}^n.

  Table 2 - Simple Bernoulli Model
  Statistical GM:        $X_k = \theta + u_k$, $k\in\mathbb{N}$
  [1] Bernoulli:         $X_k \sim \mathrm{Ber}(\theta)$, $x_k=0,1$
  [2] constant mean:     $E(X_k)=\theta$, $k\in\mathbb{N}$
  [3] constant variance: $Var(X_k)=\theta(1-\theta)$
  [4] Independence:      $\{X_k,\ k\in\mathbb{N}\}$ is an independent process
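For readers who want to see the model's probabilistic assumptions in action, here is a minimal simulation sketch of the simple Bernoulli model; the 'true' value θ*=0.75, the sample size and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star, n = 0.75, 20          # assumed 'true' value and sample size

# A random (IID) sample X := (X_1, ..., X_n) from Ber(theta_star)
x = rng.binomial(1, theta_star, size=n)
print(x)

# Assumptions [2]-[3]: E(X_k) = theta, Var(X_k) = theta*(1 - theta)
print(x.mean(), theta_star)                           # sample mean vs true mean
print(x.var(ddof=0), theta_star * (1 - theta_star))   # sample vs true variance
```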
The notation $\hat{\theta}(X)$ is used to denote an estimator in order to bring out the fact that it is a function of the sample X; across different sample realizations it generates the sampling distribution $f(\hat{\theta}(x);\theta)$, $x\in\mathbb{X}$. Post-data, $\hat{\theta}(X)$ yields an estimate $\hat{\theta}(x_0)$, which constitutes the particular value of $\hat{\theta}(X)$ corresponding to data $x_0$.

Crucial distinction: $\hat{\theta}(X)$ — estimator (Plato's world), $\hat{\theta}(x_0)$ — estimate (real world), and θ — unknown constant (Plato's world); Fisher (1922).

In light of the definition in (7), which of the following mappings constitute potential estimators of θ?

  Table 3: Estimators of θ?
  [a] $\hat{\theta}_1(X)=X_n$
  [b] $\hat{\theta}_2(X)=X_1 - X_n$
  [c] $\hat{\theta}_3(X)=(X_1 + X_n)/2$
  [d] $\hat{\theta}_n(X)=\frac{1}{n}\sum_{k=1}^{n} X_k$, for some $n>3$
  [e] $\hat{\theta}_{n+1}(X)=\frac{1}{n+1}\sum_{k=1}^{n} X_k$

Do the mappings [a]-[e] in table 3 constitute estimators of θ? All five functions [a]-[e] have X as their domain, but is the range of each mapping a subset of Θ:=[0,1]? Mappings [a], [c]-[e] can be possible estimators of θ because their ranges are subsets of [0,1], but [b] cannot, because it can take the value −1 [ensure you understand why!], which lies outside the parameter space of θ. One can easily think of many more functions from X to Θ that will qualify as possible estimators of θ. Given the plethora of such possible estimators, how does one decide which one is the most appropriate?
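A quick sketch evaluating the five candidate mappings [a]-[e] on a simulated Bernoulli sample; it also shows that [b] can indeed fall outside Θ=[0,1]. The seed and the value of θ are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_star, n = 0.3, 10
x = rng.binomial(1, theta_star, size=n)

estimates = {
    "[a] X_n":               x[-1],
    "[b] X_1 - X_n":         x[0] - x[-1],            # can equal -1, outside [0,1]
    "[c] (X_1 + X_n)/2":     (x[0] + x[-1]) / 2,
    "[d] mean of X_1..X_n":  x.mean(),
    "[e] sum/(n+1)":         x.sum() / (n + 1),
}
for name, val in estimates.items():
    print(f"{name:22s} {val:5.2f}  in [0,1]: {0 <= val <= 1}")
```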
To answer that question let us think about the possibility of an ideal estimator, $\theta^*(\cdot):\ \mathbb{X}\rightarrow\theta^*$, i.e. $\theta^*(x)=\theta^*$ for all values $x\in\mathbb{X}$. That is, $\theta^*(X)$ pinpoints the true value $\theta^*$ of θ whatever the data. A moment's reflection reveals that no such estimator could exist, because X is a random vector with its own distribution $f(x;\theta)$, $x\in\mathbb{X}$. Moreover, in view of the randomness of X, any mapping of the form (7) will be a random variable with its own sampling distribution, $f(\hat{\theta}(x);\theta)$, which is directly derivable from $f(x;\theta)$. Let us keep track of these distributions and where they come from.

The distribution of the sample $f(x;\theta)$, $x\in\mathbb{X}$, is given by the assumptions of the statistical model in question. In the case of the simple Bernoulli model, we can combine assumptions [2]-[4] to give us:

  $f(x;\theta) \overset{[2]\text{-}[4]}{=} \prod_{k=1}^{n} f(x_k;\theta)$,

and then use [1], $f(x_k;\theta)=\theta^{x_k}(1-\theta)^{1-x_k}$, $x_k=0,1$, $k=1,2,\ldots,n$, to determine $f(x;\theta)$:

  $f(x;\theta) \overset{[2]\text{-}[4]}{=} \prod_{k=1}^{n} f(x_k;\theta) \overset{[1]\text{-}[4]}{=} \prod_{k=1}^{n}\theta^{x_k}(1-\theta)^{1-x_k} = \theta^{y}(1-\theta)^{n-y}$,

where $y=\sum_{k=1}^{n}x_k$, and one can show that:

  $Y=\sum_{k=1}^{n} X_k \sim \mathrm{Bin}\big(n\theta,\; n\theta(1-\theta)\big)$,   (8)

i.e. Y is Binomially distributed. Note that the means and variances below are derived using the two formulae:

  (i)  $E(aX_1 + bX_2 + c) = aE(X_1) + bE(X_2) + c$,
  (ii) $Var(aX_1 + bX_2 + c) = a^2 Var(X_1) + b^2 Var(X_2)$, for independent $X_1,X_2$.   (9)
To derive the mean and variance of Y:

  (i)  $E(Y)=E\!\left(\sum_{k=1}^{n} X_k\right)=\sum_{k=1}^{n}E(X_k)=\sum_{k=1}^{n}\theta=n\theta$,
  (ii) $Var(Y)=Var\!\left(\sum_{k=1}^{n} X_k\right)=\sum_{k=1}^{n}Var(X_k)=\sum_{k=1}^{n}\theta(1-\theta)=n\theta(1-\theta)$.

The result in (8) is a special case of a general result. The sampling distribution of any (well-behaved) function of the sample, say $Y=h(X_1,X_2,\ldots,X_n)$, can be derived from $f(x;\theta)$, $x\in\mathbb{X}$, using the formula:

  $F(y)=\mathbb{P}(Y\leq y)=\int\cdots\int_{\{x:\,h(x)\leq y\}} f(x;\theta)\,dx$, $y\in\mathbb{R}$.   (10)

In the Bernoulli case, all the estimators [a], [c]-[e] are linear functions of $(X_1,X_2,\ldots,X_n)$ and thus, by (8), their distributions are Binomial. In particular,

  Table 4: Estimators and their sampling distributions
  [a] $\hat{\theta}_1(X)=X_n \sim \mathrm{Ber}\big(\theta,\;\theta(1-\theta)\big)$
  [c] $\hat{\theta}_3(X)=(X_1+X_n)/2 \sim \mathrm{Bin}\big(\theta,\;\tfrac{\theta(1-\theta)}{2}\big)$
  [d] $\hat{\theta}_n(X)=\frac{1}{n}\sum_{k=1}^{n}X_k \sim \mathrm{Bin}\big(\theta,\;\tfrac{\theta(1-\theta)}{n}\big)$, for $n>3$
  [e] $\hat{\theta}_{n+1}(X)=\frac{1}{n+1}\sum_{k=1}^{n}X_k \sim \mathrm{Bin}\big(\tfrac{n\theta}{n+1},\;\tfrac{n\theta(1-\theta)}{(n+1)^2}\big)$   (11)

It is important to emphasize at the outset that the sampling distributions in table 4 are evaluated under $\theta=\theta^*$, where $\theta^*$ is the true value of θ. It is clear that none of the sampling distributions of the estimators in table 4 resembles that of the ideal estimator, $\theta^*(X)$, whose sampling distribution, if it existed, would be of the form:

  [i] $\mathbb{P}(\theta^*(X)=\theta^*)=1$.   (12)
In terms of its first two moments, the ideal estimator satisfies [ii] $E(\theta^*(X))=\theta^*$ and [iii] $Var(\theta^*(X))=0$.

In contrast to the (infeasible) ideal estimator in (12), when the estimators in table 4 infer θ using an outcome x, the inference is always subject to some error because their variances are not zero. The sampling distributions of these estimators provide the basis for evaluating such errors. In the statistics literature the evaluation of inferential errors in estimation is accomplished in two interconnected stages. The objective of the first stage is to narrow down the set of all possible estimators of θ to an optimal subset, where optimality is assessed by how closely the sampling distribution of an estimator approximates that of the ideal estimator in (12); this is the subject matter of section 3. The second stage is concerned with using optimal estimators to construct the shortest Confidence Intervals (CIs) for the unknown parameter θ, based on prespecifying the error of covering (encompassing) $\theta^*$ within a random interval of the form $(L(X),\,U(X))$; this is the subject matter of section 4.

3  Properties of point estimators

As mentioned above, the notion of an optimal estimator can be motivated by how well the sampling distribution of an estimator $\hat{\theta}(X)$ approximates that of the ideal estimator in (12). In particular, the three features [i]-[iii] of the ideal estimator motivate the following optimal properties of feasible estimators.
Condition [ii] motivates the property known as:

[I] Unbiasedness: An estimator $\hat{\theta}(X)$ is said to be an unbiased estimator of θ if:

  $E(\hat{\theta}(X))=\theta^*$.   (13)

That is, the mean of the sampling distribution of $\hat{\theta}(X)$ coincides with the true value of the unknown parameter θ.

Example. In the case of the simple Bernoulli model, we can see from table 4 that the estimators $\hat{\theta}_1(X)$, $\hat{\theta}_3(X)$ and $\hat{\theta}_n(X)$ are unbiased, since in all three cases (13) is satisfied. In contrast, the estimator $\hat{\theta}_{n+1}(X)$ is not unbiased because $E(\hat{\theta}_{n+1}(X))=\frac{n\theta}{n+1}\neq\theta$.

Condition [iii] motivates the property known as:

[II] Full Efficiency: An unbiased estimator $\hat{\theta}(X)$ is said to be a fully efficient estimator of θ if its variance is as small as it can be, where the latter is expressed by the Cramer-Rao lower bound $CR(\theta)$:

  $Var(\hat{\theta}(X))=CR(\theta):=\left[E\!\left(-\dfrac{\partial^2 \ln f(x;\theta)}{\partial\theta^2}\right)\right]^{-1}$,

where $f(x;\theta)$ is given by the assumed model.

Example (the derivations are not important!). In the case of the simple Bernoulli model:

  $\ln f(x;\theta)= y\ln\theta + (n-y)\ln(1-\theta)$, where $y=\sum_{k=1}^{n}x_k$ and $E(Y)=n\theta$,
  $\dfrac{\partial \ln f(x;\theta)}{\partial\theta}= \dfrac{y}{\theta} - \dfrac{n-y}{1-\theta}$,
  $\dfrac{\partial^2 \ln f(x;\theta)}{\partial\theta^2}= -\dfrac{y}{\theta^2} - \dfrac{n-y}{(1-\theta)^2}$,
  $E\!\left(-\dfrac{\partial^2 \ln f(x;\theta)}{\partial\theta^2}\right)= \dfrac{n\theta}{\theta^2} + \dfrac{n-n\theta}{(1-\theta)^2} = \dfrac{n}{\theta(1-\theta)}$,

and thus the Cramer-Rao lower bound is:

  $CR(\theta):=\dfrac{\theta(1-\theta)}{n}$.
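A Monte Carlo sketch checking table 4 and the Cramer-Rao bound: it estimates the mean and variance of each estimator by simulation and compares them with the unbiasedness condition (13) and with $CR(\theta)=\theta(1-\theta)/n$. The replication count, seed and the value of θ* are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
theta_star, n, reps = 0.3, 50, 100_000
X = rng.binomial(1, theta_star, size=(reps, n))

estimators = {
    "theta_1 = X_n":           X[:, -1],
    "theta_3 = (X_1+X_n)/2":   (X[:, 0] + X[:, -1]) / 2,
    "theta_n = sample mean":   X.mean(axis=1),
    "theta_{n+1} = sum/(n+1)": X.sum(axis=1) / (n + 1),
}
cr_bound = theta_star * (1 - theta_star) / n
print(f"CR bound: {cr_bound:.5f}")
for name, vals in estimators.items():
    print(f"{name:26s} mean={vals.mean():.4f}  var={vals.var():.5f}")
# theta_1, theta_3, theta_n have mean ~ 0.30 (unbiased); theta_{n+1} has mean n*theta/(n+1).
# Only theta_n has variance close to the CR bound; theta_1 and theta_3 are far above it.
```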
Looking at the estimators of θ in table 4, it is clear that only one unbiased estimator achieves that bound, $\hat{\theta}_n(X)$. Hence, $\hat{\theta}_n(X)$ is the only estimator of θ which is both unbiased and fully efficient.

Comparisons between unbiased estimators can be made in terms of relative efficiency:

  $Var(\hat{\theta}_1(X)) > Var(\hat{\theta}_3(X))$, for $n>2$,

asserting that $\hat{\theta}_3(X)$ is relatively more efficient than $\hat{\theta}_1(X)$. But one needs to be careful with such comparisons because they can be very misleading when both estimators are bad, as in the case above; the fact that $\hat{\theta}_3(X)$ is relatively more efficient than $\hat{\theta}_1(X)$ does not mean that the former is even an adequate estimator. Hence, relative efficiency is not something to write home about!

What renders these two estimators practically useless? An asymptotic property motivated by condition [i] of the ideal estimator, known as consistency. Intuitively, an estimator $\hat{\theta}_n(X)$ is consistent when its precision (how close it is to $\theta^*$) improves as the sample size increases.

Condition [i] of the ideal estimator motivates the property known as:

[III] Consistency: an estimator $\hat{\theta}_n(X)$ is consistent if:

  Strong: $\mathbb{P}\!\left(\lim_{n\to\infty}\hat{\theta}_n(X)=\theta^*\right)=1$;
  Weak:  $\lim_{n\to\infty}\mathbb{P}\!\left(\left|\hat{\theta}_n(X)-\theta^*\right|\leq\varepsilon\right)=1$, for any $\varepsilon>0$.   (14)

That is, an estimator $\hat{\theta}_n(X)$ is consistent if it approximates (probabilistically) the sampling distribution of the ideal estimator asymptotically, as $n\to\infty$. The difference between strong and weak consistency stems from the form of probabilistic convergence they involve, with the former being stronger than the latter. Both of these properties constitute an extension of the Strong and Weak Law of Large Numbers (LLN), which hold for the sample mean $\bar{X}_n=\frac{1}{n}\sum_{k=1}^{n}X_k$ of a process $\{X_k,\ k=1,2,\ldots\}$ under certain probabilistic assumptions, the most restrictive being that the process is IID; see Spanos (1999), ch. 8.

[Fig. 1: t-plot of the sample average $\bar{X}_n$ for a Bernoulli IID realization with n=200. Fig. 2: t-plot of the sample average for a Bernoulli IID realization with n=1000; the running average settles around the true θ as n grows (see the simulation sketch at the end of this passage).]

In practice, it is non-trivial to prove that a particular estimator is or is not consistent by verifying directly the conditions in (14). However, there is often a short-cut for verifying consistency in the case of unbiased estimators, using the sufficient condition:

  $\lim_{n\to\infty} Var(\hat{\theta}_n(X))=0$.   (15)

Example. In the case of the simple Bernoulli model, one can verify that the estimators $\hat{\theta}_1(X)$ and $\hat{\theta}_3(X)$ are inconsistent because:

  $\lim_{n\to\infty} Var(\hat{\theta}_1(X))=\theta(1-\theta)\neq 0$,  $\lim_{n\to\infty} Var(\hat{\theta}_3(X))=\dfrac{\theta(1-\theta)}{2}\neq 0$,

i.e. their variances do not decrease to zero as the sample size n goes to infinity. In contrast, the estimators $\hat{\theta}_n(X)$ and $\hat{\theta}_{n+1}(X)$ are consistent because:

  $\lim_{n\to\infty} Var(\hat{\theta}_n(X))=\lim_{n\to\infty}\dfrac{\theta(1-\theta)}{n}=0$,  $\lim_{n\to\infty} MSE(\hat{\theta}_{n+1}(X))=0$.

Note that 'MSE' denotes the 'Mean Square Error', defined by:

  $MSE(\hat{\theta};\theta^*)=Var(\hat{\theta}) + [B(\hat{\theta};\theta^*)]^2$, where $B(\hat{\theta};\theta^*)=E(\hat{\theta})-\theta^*$ is the bias.

Hence, $\lim_{n\to\infty} MSE(\hat{\theta})=0$ if (a) $\lim_{n\to\infty} Var(\hat{\theta})=0$ and (b) $\lim_{n\to\infty} E(\hat{\theta})=\theta^*$, where (b) is equivalent to $\lim_{n\to\infty} B(\hat{\theta};\theta^*)=0$.

Let us take stock of the above properties and how they can be used by the practitioner in deciding which estimator is optimal. The property which defines minimal reliability for an estimator is that of consistency. Intuitively, consistency indicates that as the sample size increases [as $n\to\infty$] the estimator $\hat{\theta}_n(X)$ approaches $\theta^*$, the true value of θ, in some probabilistic sense: convergence almost surely or convergence in probability. Hence, if an estimator $\hat{\theta}_n(X)$ is not consistent, it is automatically excluded from the subset of potentially optimal estimators, irrespective of any other properties this estimator might enjoy. In particular, an unbiased estimator which is inconsistent is practically useless. On the other hand, just because an estimator $\hat{\theta}_n(X)$ is consistent does not imply that it is a 'good' estimator; it only implies that it is minimally acceptable.
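The following sketch mimics figures 1-2: it prints the running sample average of a Bernoulli IID realization, illustrating the LLN behind consistency, and also checks the MSE decomposition for $\hat{\theta}_{n+1}$. The value of θ*, the seed and the grid of sample sizes are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
theta_star = 0.5
x = rng.binomial(1, theta_star, size=1000)

# Running sample average: theta_hat_n = (1/n) * sum_{k<=n} X_k  (cf. figs 1-2)
running_mean = np.cumsum(x) / np.arange(1, len(x) + 1)
for n in (10, 50, 200, 1000):
    print(f"n={n:5d}  theta_hat_n={running_mean[n - 1]:.3f}")

# MSE decomposition for theta_{n+1} = sum/(n+1): MSE = Var + bias^2
for n in (10, 100, 1000):
    mean_np1 = n * theta_star / (n + 1)
    var_np1 = n * theta_star * (1 - theta_star) / (n + 1) ** 2
    bias = mean_np1 - theta_star
    print(f"n={n:5d}  MSE={var_np1 + bias**2:.6f}")   # shrinks toward 0 as n grows
```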
It is important to emphasize that the properties of unbiasedness and full efficiency hold for any sample size $n>1$, and thus we call them finite sample properties, whereas consistency is an asymptotic property because it holds as $n\to\infty$.

Example. In the case of the simple Bernoulli model, if the choice between estimators is confined (artificially) to the estimators $\hat{\theta}_1(X)$, $\hat{\theta}_3(X)$ and $\hat{\theta}_{n+1}(X)$, the latter estimator should be chosen, despite being biased, because it is a consistent estimator of θ. On the other hand, among the estimators given in table 4, $\hat{\theta}_n(X)$ is clearly the best (most optimal), because it satisfies all three properties. In particular, $\hat{\theta}_n(X)$ not only satisfies the minimal property of consistency, but it also has the smallest variance possible, which means that it comes closer to the ideal estimator than any of the others, for any sample size $n>2$. The sampling distribution of $\hat{\theta}_n(X)$, when evaluated under $\theta=\theta^*$, takes the form:

  $\hat{\theta}_n(X)=\dfrac{1}{n}\sum_{k=1}^{n}X_k \overset{\theta=\theta^*}{\sim} \mathrm{Bin}\!\left(\theta^*,\;\dfrac{\theta^*(1-\theta^*)}{n}\right)$,   (16)

whatever the 'true' value $\theta^*$ happens to be.

Additional asymptotic properties

In addition to the properties of estimators mentioned above, there are certain other properties which are often used in practice to decide on the optimality of an estimator. The most important is given below for completeness.

[V] Asymptotic Normality: an estimator $\hat{\theta}_n(X)$ is said to be asymptotically Normal if:

  $\sqrt{n}\left(\hat{\theta}_n(X)-\theta\right) \overset{a}{\sim} \mathrm{N}(0,\,V_\infty(\theta))$, $V_\infty(\theta)\neq 0$,   (17)

where '$\overset{a}{\sim}$' stands for 'can be asymptotically approximated by'.
This property is an extension of a well-known result in probability theory: the Central Limit Theorem (CLT). The CLT asserts that, under certain probabilistic assumptions on the process $\{X_k,\ k=1,2,\ldots\}$, the most restrictive being that the process is IID, the sampling distribution of $\bar{X}_n=\frac{1}{n}\sum_{k=1}^{n}X_k$ for a 'large enough' n can be approximated by the Normal distribution (Spanos, 1999, ch. 8):

  $\dfrac{\bar{X}_n-E(\bar{X}_n)}{\sqrt{Var(\bar{X}_n)}} \overset{a}{\sim} \mathrm{N}(0,1)$.   (18)

Note that the important difference between (17) and (18) is that $\hat{\theta}_n(X)$ in the former does not have to coincide with $\bar{X}_n$; it can be any well-behaved function $h(X)$ of the sample X.

Example. In the case of the simple Bernoulli model, the sampling distribution of $\hat{\theta}_n(X)$, which we know is Binomial (see (16)), can also be approximated using (18). In the graphs below we compare the Normal approximation to the Binomial for n=10 and n=20 in the case where θ=.5, and the improvement is clearly noticeable.

[Figure: Normal approximation of the Binomial, $f(y;\theta=.5,n=10)$ and $f(y;\theta=.5,n=20)$: the Bin(n, .5) probabilities compared with Normal densities with matching means and standard deviations.]
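A sketch of the comparison shown in the graphs, overlaying the Bin(n, .5) pmf with the matching Normal density via scipy; the printed maximum discrepancy already shows the improvement from n=10 to n=20 without any plotting.

```python
import numpy as np
from scipy import stats

theta = 0.5
for n in (10, 20):
    y = np.arange(n + 1)
    binom_pmf = stats.binom.pmf(y, n, theta)
    # Normal approximation with matching mean and standard deviation
    normal_pdf = stats.norm.pdf(y, loc=n * theta,
                                scale=np.sqrt(n * theta * (1 - theta)))
    max_gap = np.max(np.abs(binom_pmf - normal_pdf))
    print(f"n={n}: max |Bin pmf - Normal pdf| = {max_gap:.4f}")
# The maximum discrepancy shrinks as n grows, as the figures suggest.
```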
4  Confidence Intervals (CIs): an overview

4.1  An optimal CI begins with an optimal point estimator

Example 2. Let us summarize the discussion concerning point estimation by briefly discussing the simple (one parameter) Normal model, where $\sigma^2=1$ (table 5).

  Table 5 - Simple Normal Model (one unknown parameter)
  Statistical GM:        $X_k = \mu + u_k$, $k\in\mathbb{N}=\{1,2,\ldots\}$
  [1] Normality:         $X_k \sim \mathrm{N}(\mu,\sigma^2)$, $x_k\in\mathbb{R}$
  [2] Constant mean:     $E(X_k)=\mu$, $k\in\mathbb{N}$
  [3] Constant variance: $Var(X_k)=\sigma^2$ (known)
  [4] Independence:      $\{X_k,\ k\in\mathbb{N}\}$ is an independent process

In section 3 we discussed the question of choosing among numerous possible estimators of μ, such as [a]-[e] (table 6), using their sampling distributions. These results stem from the following theorem. If $X:=(X_1,X_2,\ldots,X_n)$ is a random (IID) sample from the Normal distribution, i.e. $X_k \sim \mathrm{NIID}(\mu,\sigma^2)$, $k\in\mathbb{N}$, then the sampling distribution of $\sum_{k=1}^{n}X_k$ is:

  $\sum_{k=1}^{n}X_k \sim \mathrm{N}(n\mu,\; n\sigma^2)$.   (19)

Among these estimators, the sample mean for $\sigma^2=1$:

  $\hat{\mu}(X):=\bar{X}_n=\dfrac{1}{n}\sum_{k=1}^{n}X_k \sim \mathrm{N}\!\left(\mu,\dfrac{1}{n}\right)$,

constitutes the optimal point estimator of μ because it is:
  [U]  Unbiased ($E(\bar{X}_n)=\mu^*$),
  [FE] Fully Efficient ($Var(\bar{X}_n)=CR(\mu)$), and
  [SC] Strongly Consistent ($\mathbb{P}(\lim_{n\to\infty}\bar{X}_n=\mu^*)=1$).
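A brief simulation check of (19) and of the distribution of the sample mean under the simple Normal model; μ*, σ²=1, n and the seed are assumed values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
mu_star, sigma, n, reps = 2.0, 1.0, 25, 100_000

X = rng.normal(mu_star, sigma, size=(reps, n))
xbar = X.mean(axis=1)

# Theorem (19): sum ~ N(n*mu, n*sigma^2), hence xbar ~ N(mu, sigma^2/n)
print("mean of xbar:", xbar.mean(), " (theory:", mu_star, ")")
print("var  of xbar:", xbar.var(), " (theory:", sigma**2 / n, ")")
```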
  Table 6: Estimators of μ and their properties
                                                                                          UN   FE   SC
  [a] $\hat{\mu}_1(X)=X_n \sim \mathrm{N}(\mu,\,1)$                                        ✓    ✗    ✗
  [b] $\hat{\mu}_2(X)=X_1-X_n \sim \mathrm{N}(0,\,2)$                                      ✗    ✗    ✗
  [c] $\hat{\mu}_3(X)=(X_1+X_n)/2 \sim \mathrm{N}(\mu,\,\tfrac{1}{2})$                     ✓    ✗    ✗
  [d] $\hat{\mu}_n(X)=\frac{1}{n}\sum_{k=1}^{n}X_k \sim \mathrm{N}(\mu,\,\tfrac{1}{n})$    ✓    ✓    ✓
  [e] $\hat{\mu}_{n+1}(X)=\frac{1}{n+1}\sum_{k=1}^{n}X_k \sim \mathrm{N}\big(\tfrac{n\mu}{n+1},\,\tfrac{n}{(n+1)^2}\big)$   ✗    ✗    ✓

Given that any 'decent' estimator $\hat{\mu}(X)$ of μ is likely to yield any value in the interval $(-\infty,\infty)$, can one say something more about its reliability than just that "on average" its values $\hat{\mu}(x)$, $x\in\mathbb{X}$, are more likely to occur around $\mu^*$ (the true value) than those further away?

4.2  What is a Confidence Interval?

This is what a Confidence Interval (CI) proposes to address. In general, a $1-\alpha$ CI for μ takes the generic form:

  $\mathbb{P}\big(L(X) \leq \mu^* \leq U(X)\big)=1-\alpha$,

where $L(X)$ and $U(X)$ denote the lower and upper (random) bounds of this CI. The $1-\alpha$ is referred to as the confidence level and represents the coverage probability of the CI:

  $CI(X;\alpha)=\big(L(X),\,U(X)\big)$,

in the sense that the probability that the random interval $CI(X;\alpha)$ covers (overlays) the true $\mu^*$ is equal to $1-\alpha$.

This is often envisioned in terms of a long-run metaphor of repeating the experiment underlying the statistical model in question in order to get a sequence of outcomes (realizations of X) $x_i$, each of which will yield an observed $CI(x_i;\alpha)$, $i=1,2,\ldots,N$. In the context of this metaphor, $1-\alpha$ denotes the relative frequency of the observed CIs that will include (overlay) $\mu^*$.

Example 2 (continued). In the case of the simple (one parameter) Normal model (table 5), let us consider the question of constructing .95 CIs using the different unbiased estimators of μ in table 6:

  [a] $\mathbb{P}\big(\hat{\mu}_1(X)-1.96 \leq \mu^* \leq \hat{\mu}_1(X)+1.96\big)=.95$,
  [c] $\mathbb{P}\big(\hat{\mu}_3(X)-1.96(\tfrac{1}{\sqrt{2}}) \leq \mu^* \leq \hat{\mu}_3(X)+1.96(\tfrac{1}{\sqrt{2}})\big)=.95$,
  [d] $\mathbb{P}\big(\bar{X}_n-1.96(\tfrac{1}{\sqrt{n}}) \leq \mu^* \leq \bar{X}_n+1.96(\tfrac{1}{\sqrt{n}})\big)=.95$.   (20)

How do these CIs differ? The answer is in terms of their precision (accuracy). One way to measure precision for CIs is to evaluate their length:

  [a]: $2(1.96)=3.92$,  [c]: $2\big(1.96\cdot\tfrac{1}{\sqrt{2}}\big)=2.772$,  [d]: $2\big(1.96\cdot\tfrac{1}{\sqrt{n}}\big)=\tfrac{3.92}{\sqrt{n}}$.

It is clear from this evaluation that the CI associated with $\bar{X}_n=\frac{1}{n}\sum_{k=1}^{n}X_k$ is the shortest for any $n>2$; e.g. for $n=100$ the length of this CI is $\tfrac{3.92}{\sqrt{100}}=.392$.
  • 21. ∗ 1. ` − − − −  − − − − a 2. ` − − − − − − − − −− a 3. ` − − − −  − − − − a 4. ` − − − −  − − − − a 5. ` − − − −  − − − − a 6. ` − − − −  − − − − a 7. ` − − − −  − − − − a 8. ` − − − −  − − − − a 9. ` − − − −  − − − − a 10. ` − − − −  − − − − a 11. ` − − − −  − − − − a 12. ` − − − −  − − − − a 13.I ` − − −−− − −− a 14. ` − − − −  − − − − a 15. ` − − − −  − − − − a 16. ` − − − −  − − − − a 17. ` − − − − − − − − −− a 18. ` − − − −  − − − − a 19. ` − − − −  − − − − a 20. ` − − − −  − − − − a 21
4.3  Constructing Confidence Intervals (CIs)

More generally, the sampling distribution of the optimal estimator $\bar{X}_n$ gives rise to a pivot (a function of the sample and μ whose distribution is known):

  $\sqrt{n}\left(\bar{X}_n-\mu\right) \overset{\mu=\mu^*}{\sim} \mathrm{N}(0,1)$,   (21)

which can be used to construct the shortest CI among all $(1-\alpha)$ CIs for μ:

  $\mathbb{P}\Big(\bar{X}_n - c_{\frac{\alpha}{2}}\big(\tfrac{1}{\sqrt{n}}\big) \leq \mu^* \leq \bar{X}_n + c_{\frac{\alpha}{2}}\big(\tfrac{1}{\sqrt{n}}\big)\Big)=1-\alpha$,   (22)

where $c_{\frac{\alpha}{2}}$ is defined by $\mathbb{P}(|Z| \leq c_{\frac{\alpha}{2}})=1-\alpha$ for $Z\sim\mathrm{N}(0,1)$ (figures 1-2 below).

Example 3. In the case where $\sigma^2$ is unknown, and we use $s^2=\frac{1}{n-1}\sum_{k=1}^{n}(X_k-\bar{X}_n)^2$ to estimate it, the pivot in (21) takes the form:

  $\dfrac{\sqrt{n}\left(\bar{X}_n-\mu\right)}{s} \overset{\mu=\mu^*}{\sim} \mathrm{St}(n-1)$,   (23)

where $\mathrm{St}(n-1)$ denotes the Student's t distribution with $n-1$ degrees of freedom.

Step 1. Attach a $(1-\alpha)$ coverage probability using (23):

  $\mathbb{P}\Big(-c_{\frac{\alpha}{2}} \leq \dfrac{\sqrt{n}(\bar{X}_n-\mu)}{s} \leq c_{\frac{\alpha}{2}}\Big)=1-\alpha$,

where $\mathbb{P}(|\tau| \leq c_{\frac{\alpha}{2}})=1-\alpha$ for $\tau\sim\mathrm{St}(n-1)$.

Step 2. Re-arrange $\dfrac{\sqrt{n}(\bar{X}_n-\mu)}{s}$ to isolate μ and derive the CI:

  $\mathbb{P}\Big(-c_{\frac{\alpha}{2}} \leq \dfrac{\sqrt{n}(\bar{X}_n-\mu)}{s} \leq c_{\frac{\alpha}{2}}\Big)
   =\mathbb{P}\Big(-c_{\frac{\alpha}{2}}\big(\tfrac{s}{\sqrt{n}}\big) \leq \bar{X}_n-\mu \leq c_{\frac{\alpha}{2}}\big(\tfrac{s}{\sqrt{n}}\big)\Big)$
  $=\mathbb{P}\Big(-\bar{X}_n-c_{\frac{\alpha}{2}}\big(\tfrac{s}{\sqrt{n}}\big) \leq -\mu \leq -\bar{X}_n+c_{\frac{\alpha}{2}}\big(\tfrac{s}{\sqrt{n}}\big)\Big)$
  $=\mathbb{P}\Big(\bar{X}_n-c_{\frac{\alpha}{2}}\big(\tfrac{s}{\sqrt{n}}\big) \leq \mu^* \leq \bar{X}_n+c_{\frac{\alpha}{2}}\big(\tfrac{s}{\sqrt{n}}\big)\Big)=1-\alpha$.

In figures 1-2 the underlying distribution is Normal and in figures 3-4 it is Student's t with 19 degrees of freedom. One can see that, while the tail areas are the same for each α, the threshold values $c_{\frac{\alpha}{2}}$ for the Normal are smaller than the corresponding values $c^*_{\frac{\alpha}{2}}$ for the Student's t, because the latter has heavier tails due to the randomness of $s^2$.

[Fig. 1: $\mathbb{P}(|Z| \leq 1.96)=.95$ for $Z\sim\mathrm{N}(0,1)$.  Fig. 2: $\mathbb{P}(|Z| \leq 1.64)=.90$ for $Z\sim\mathrm{N}(0,1)$.]
[Fig. 3: $\mathbb{P}(|\tau| \leq 2.09)=.95$ for $\tau\sim\mathrm{St}(19)$.  Fig. 4: $\mathbb{P}(|\tau| \leq 1.72)=.90$ for $\tau\sim\mathrm{St}(19)$.]
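A sketch of Steps 1-2 in code: it compares the Normal and Student's t threshold values used in figures 1-4 and computes the Student's t interval for a simulated sample. The data-generating settings (mean, scale, n, seed) are assumptions for illustration.

```python
import numpy as np
from scipy import stats

# Normal vs Student's t thresholds c_{alpha/2} (cf. figures 1-4)
for alpha in (0.05, 0.10):
    c_norm = stats.norm.ppf(1 - alpha / 2)
    c_t19 = stats.t.ppf(1 - alpha / 2, df=19)
    print(f"alpha={alpha:.2f}: Normal {c_norm:.3f}  St(19) {c_t19:.3f}")
# The t thresholds (2.093, 1.729) exceed the Normal ones (1.960, 1.645).

# Student's t CI (Steps 1-2) for a simulated sample with sigma^2 unknown
rng = np.random.default_rng(6)
x = rng.normal(loc=1.0, scale=2.0, size=20)          # n = 20, so df = 19
n, alpha = len(x), 0.05
xbar, s = x.mean(), x.std(ddof=1)                    # s^2 = (1/(n-1)) sum (x_k - xbar)^2
c = stats.t.ppf(1 - alpha / 2, df=n - 1)
print("95% CI for mu:", (xbar - c * s / np.sqrt(n), xbar + c * s / np.sqrt(n)))
```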
5  Summary and conclusions

The primary objective in frequentist estimation is to learn about $\theta^*$, the true value of the unknown parameter θ of interest, using the sampling distribution $f(\hat{\theta};\theta^*)$ of an estimator, associated with a particular sample size n. The finite sample properties are defined directly in terms of $f(\hat{\theta};\theta^*)$, and the asymptotic properties are defined in terms of the asymptotic sampling distribution $f_\infty(\hat{\theta};\theta^*)$, which aims to approximate $f(\hat{\theta};\theta^*)$ at the limit as $n\to\infty$. The question that needs to be considered at this stage is: what combination of the above mentioned properties specifies an 'optimal' estimator?

A necessary but minimal property for an estimator is consistency (preferably strong). By itself, however, consistency does not secure learning from data for a given n; it is a promissory note for potential learning. Hence, for actual learning one needs to supplement consistency with certain finite sample properties, like unbiasedness and efficiency, to ensure that learning can take place with the particular data $x_0:=(x_1,x_2,\ldots,x_n)$ of sample size n.

Among finite sample properties, full efficiency is clearly the most important because it secures the highest degree of learning for a given n, since it offers the best possible precision. Relative efficiency, although desirable, needs to be investigated further to find out how large the class of estimators being compared is before passing judgement. Being the best econometrician in my family, although worthy of something, does not make me a good econometrician!!

Unbiasedness, although desirable, is not considered indispensable by itself. Indeed, as shown above, an unbiased but inconsistent estimator is practically useless, and a consistent but biased estimator is always preferable. Hence, a consistent, unbiased and fully efficient estimator sets the gold standard in estimation.

In conclusion, it is important to emphasize that point estimation is often considered inadequate for the purposes of scientific inquiry because a 'good' point estimator $\hat{\theta}(X)$, by itself, does not provide any measure of the reliability and precision associated with the estimate $\hat{\theta}(x_0)$; one would be wrong to assume that $\hat{\theta}(x_0)\simeq\theta^*$. This is the reason why $\hat{\theta}(x_0)$ is often accompanied by its standard error [the estimated standard deviation $\sqrt{\widehat{Var}(\hat{\theta}(X))}$] or the p-value of some test of significance associated with the generic hypothesis θ=0. Interval estimation rectifies this weakness of point estimation by providing the relevant error probabilities associated with inferences pertaining to 'covering' the true value $\theta^*$ of θ.