1. Fishing for Errors
Extending the Treatment of Errors in the Fisher Matrix Formalism
Moumita Aich
Mohammed El-Mufti
Eli Kasai
Brian Nord
Marina Seikel
Sahba Yahya
Alan Heavens
Bruce Bassett
12 April 2012
4. Fisher Power!
The Fisher Matrix forecasts astronomical constraints on model parameters: it can be used to predict confidence contours for cosmological parameters.
[Figure: confidence contours in the (θ_A, θ_B) plane; 68% confidence that the parameters lie within the blue (dashed) contour; the 99% contour is also shown.]
Ingredients required:
• a parametrized, physical model for an observable [ y = f(x; θ) ]
• a set of errors in the observable [ σ_y ]
Example: Baryon Acoustic Oscillations [e.g., BigBOSS]
• Model and parameters: d_A(z; H0, Ωk) and H(z; H0, Ωk, Ωm)
• a set of errors in the observable variables: σ_dA, σ_H
General Formalism:
F_AB = ⟨ −∂² ln L / (∂θ_A ∂θ_B) ⟩   [definition]
F_AB = (∂f/∂θ_A)ᵀ C⁻¹ (∂f/∂θ_B) + (1/2) Tr[ C⁻¹ (∂C/∂θ_A) C⁻¹ (∂C/∂θ_B) ]
where y = f(x; θ) is the model and C = diag(σ_y1², …, σ_yN²) is the covariance of the data-variable errors.
This work focuses on the covariance matrix, C.
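As a numerical illustration of the canonical formula, here is a minimal sketch (my own toy example, not from the talk): a straight-line model y = θ0 + θ1·x with a parameter-independent diagonal covariance, for which the trace term vanishes and F reduces to Jᵀ C⁻¹ J.

```python
import numpy as np

def fisher_matrix(J, C):
    """Canonical Fisher matrix F_AB = (df/dtheta_A)^T C^{-1} (df/dtheta_B).

    J[i, A] = d f(x_i) / d theta_A; C is the data covariance.
    The Tr[...] term is omitted because C is parameter-independent here.
    """
    Cinv = np.linalg.inv(C)
    return J.T @ Cinv @ J

# Toy setup: 10 data points with equal errors sigma_y = 0.05
x = np.linspace(0.1, 1.0, 10)
C = np.diag(np.full(10, 0.05**2))

# Jacobian of f(x; theta0, theta1) = theta0 + theta1 * x
J = np.column_stack([np.ones_like(x), x])

F = fisher_matrix(J, C)
# Forecast 1-sigma marginalized errors: sqrt(diag(F^{-1}))
errors = np.sqrt(np.diag(np.linalg.inv(F)))
```

The forecast errors shrink as more points are added or σ_y decreases, which is the basic "Fisher power" the slide describes.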
6. Primary Goals and Questions
• Will the errors in the independent variable [e.g., redshift] impact predicted constraints on model parameters?
• What is the impact of the dependent and independent variables being correlated?
• Can we account for multi-peaked distributions in the independent variable? E.g., double-peaked distributions for photometric redshifts. [Next time]
Covariance structures shown alongside the questions, from the standard case to the most general:
C = diag(σ_y1², …, σ_yN²)   [y errors only]
C = ( σ_xx²  0 ; 0  σ_yy² )   [independent x and y errors]
C = ( σ_xx²  σ_xy² ; σ_yx²  σ_yy² )   [correlated x and y errors]
7. Outline
• The motivation: enhance FM predictions with more comprehensive error accounting.
• General approach: derive the FM from scratch.
• Introducing covariances in observables.
11. Re-Derive FM From First Principles
Setup [Observables]
Measured observables: {X_i}, {Y_i}; i = 1, …, N
True values of observables: {x_i}, {y_i}; i = 1, …, N
Setup [Model]
Errors: X and Y are Gaussian-distributed about the true values, x and y, respectively.
Mean model: x and y are related by y = f(x).
Calculate Likelihood
L(θ) = p(X, Y | θ) ∝ p(θ | X, Y)   (likelihood of a parameter via Bayes' theorem)
L = ∫ p(X, Y | x, y, θ) p(y | x, θ) p(x | θ) d^N x d^N y   (unpack all the conditional probabilities)
Let y always be a function of x: p(y | x, θ) = δ(y − f(x))
Assume a uniform distribution for x: p(x_i | θ) = U, i.e., x_i ∼ U
⇒ L = ∫ p(X, Y | x, f, θ) d^N x d^N y
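The setup above can be checked numerically. A Monte Carlo sketch (my own illustration, assuming a linear model f(x) = a·x): draw true x uniformly, impose y = f(x) exactly (the delta function), scatter X and Y about the true values, and note that the residual Y − a·X picks up variance from both error sources.

```python
import numpy as np

rng = np.random.default_rng(0)
a, sigma_x, sigma_y = 2.0, 0.1, 0.05
n = 200_000

x = rng.uniform(0.0, 1.0, n)            # true independent variable, x ~ U
y = a * x                               # mean model: y = f(x) exactly (the delta)
X = x + rng.normal(0.0, sigma_x, n)     # measured observables, Gaussian about truth
Y = y + rng.normal(0.0, sigma_y, n)

# Residual about the model evaluated at the *measured* x:
resid_var = np.var(Y - a * X)
expected = sigma_y**2 + a**2 * sigma_x**2   # both error sources contribute
```

This previews the result of the next slides: x errors do not disappear; they propagate into an effective y-covariance through the slope of f.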
14. Calculate Likelihood (II.)
L = ∫ p(X, Y | x, f, θ) d^N x d^N y
The distributions in X and Y are normal, but the integral is not generally analytically soluble. Therefore, we Taylor-expand the model, assuming that it is linear across the width of the Gaussian distribution of x and retaining only linear terms:
f(x_i) → f*(x_i) = f(X_i) + (x_i − X_i) f′(X_i)
Let Z be a 2N-dimensional vector containing both measured and true observables:
{z_i, Z_i} = {x_i, X_i} for i ≤ N
{z_i, Z_i} = {f*, Y_i} for i > N
This provides the canonical form of the multivariate normal distribution:
⇒ L ∝ (1/√(det C)) ∫ exp[ −(1/2) (Z − z)ᵀ C⁻¹ (Z − z) ]
With {z, Z} as vectors, the covariance matrix can be written in block form:
C = ( C_XX  C_XY ; C_XYᵀ  C_YY )
Notice that this form natively contains covariance among the X's, among the Y's, and between the X's and Y's.
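The block form can be assembled directly. A sketch (my own illustration; the per-point correlation coefficient rho and the assumption that each X_i correlates only with its own Y_i are simplifications, not from the talk):

```python
import numpy as np

def block_covariance(sigma_x, sigma_y, rho=0.0):
    """Assemble C = [[C_XX, C_XY], [C_XY^T, C_YY]] for the stacked vector Z.

    Each X_i is taken to be correlated only with its own Y_i, with
    coefficient rho (a simplifying assumption for this sketch).
    """
    C_XX = np.diag(sigma_x**2)
    C_YY = np.diag(sigma_y**2)
    C_XY = np.diag(rho * sigma_x * sigma_y)
    return np.block([[C_XX, C_XY], [C_XY.T, C_YY]])

sigma_x = np.array([0.01, 0.02, 0.03])
sigma_y = np.array([0.05, 0.05, 0.05])
C = block_covariance(sigma_x, sigma_y, rho=0.5)
```

For |rho| < 1 the assembled matrix is symmetric and positive definite, as a covariance must be.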
16. Calculate Likelihood (III.)
L ∝ (1/√(det C)) ∫ exp[ −(1/2) (Z − z)ᵀ C⁻¹ (Z − z) ]
Evaluating the exponent and simplifying,
⇒ L ∝ (1/√(det R)) exp[ −(1/2) Ỹᵀ R⁻¹ Ỹ ]
where R is a function of C_ij and f*:
R = C_YY + C_XYᵀ T + T C_XY + T C_XX T
T = diag( df(x)/dx |_{x=X} )
Result: where the x data are irrelevant, i.e., when the derivatives of f are zero, or when C_XX (or σ_x) = 0, we recover the original form: R → C = C_YY.
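A small numerical sketch of the reduction (toy numbers and model, not from the talk), using f(x) = x² so that T = diag(2X):

```python
import numpy as np

def effective_covariance(C_XX, C_XY, C_YY, fprime):
    """R = C_YY + C_XY^T T + T C_XY + T C_XX T, with T = diag(f'(X))."""
    T = np.diag(fprime)
    return C_YY + C_XY.T @ T + T @ C_XY + T @ C_XX @ T

# Toy model f(x) = x^2, so f'(x) = 2x, evaluated at the measured points X
X = np.array([0.2, 0.5, 0.8])
fprime = 2.0 * X
C_YY = np.diag(np.full(3, 0.05**2))
C_XX = np.diag(np.full(3, 0.01**2))
C_XY = np.zeros((3, 3))                 # uncorrelated x and y errors

R = effective_covariance(C_XX, C_XY, C_YY, fprime)

# Limiting case: with no x errors (C_XX = 0), R reduces to C_YY
R_limit = effective_covariance(np.zeros((3, 3)), C_XY, C_YY, fprime)
```

Each diagonal entry of R is σ_y² inflated by (f′ σ_x)², i.e., the x error projected through the slope of the model.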
19. Main Result:
With the help of Tegmark, Taylor and Heavens (1997), R then takes the place of the covariance, C, in the canonical formulation:
F_AB = (∂f/∂θ_A)ᵀ R⁻¹ (∂f/∂θ_B) + (1/2) Tr[ R⁻¹ (∂R/∂θ_A) R⁻¹ (∂R/∂θ_B) ]
Even if C does not depend on the parameters, R does depend on them [via f]. The trace term is in general non-zero.
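A sketch of how the extended forecast might be evaluated numerically (my own illustration; the toy model f(x; θ) = θ·x², the helper fisher_extended, and the finite-difference derivatives of R are all assumptions, not from the paper):

```python
import numpy as np

def fisher_extended(f_grad, R_of_theta, theta, eps=1e-6):
    """F_AB = (df/dθ_A)^T R^{-1} (df/dθ_B) + 1/2 Tr[R^{-1} dR/dθ_A R^{-1} dR/dθ_B].

    f_grad[i, A] = ∂f(X_i)/∂θ_A; R_of_theta(θ) builds the effective covariance.
    ∂R/∂θ_A is taken by central finite differences (an implementation choice).
    """
    Rinv = np.linalg.inv(R_of_theta(theta))
    p = len(theta)
    dR = []
    for A in range(p):
        tp, tm = theta.copy(), theta.copy()
        tp[A] += eps
        tm[A] -= eps
        dR.append((R_of_theta(tp) - R_of_theta(tm)) / (2 * eps))
    F = f_grad.T @ Rinv @ f_grad
    for A in range(p):
        for B in range(p):
            F[A, B] += 0.5 * np.trace(Rinv @ dR[A] @ Rinv @ dR[B])
    return F

# Toy model: f(x; θ) = θ x², so f'(x) = 2θx enters R via T = diag(f'(X))
X = np.array([0.2, 0.5, 0.8])
sigma_x, sigma_y = 0.01, 0.05

def R_of_theta(theta):
    T = np.diag(2.0 * theta[0] * X)
    return np.diag(np.full(3, sigma_y**2)) + T @ np.diag(np.full(3, sigma_x**2)) @ T

theta = np.array([1.5])
f_grad = (X**2)[:, None]       # ∂f/∂θ = x²
F = fisher_extended(f_grad, R_of_theta, theta)
```

Because R depends on θ through f′(X), the trace term contributes even though the underlying data errors are parameter-independent, which is exactly the point of the slide.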
20. Summary: An Extended Formalism
• Development of a general process for evaluating arbitrary model functions to 1st order in the FM formalism.
• Incorporation of correlated errors among observables.
Next Steps
• Application: Double-peaked Error distributions
• Check: Compare to MCMC
• Cosmological Application
• Incorporate into Fisher4Cast
Editor's notes
• Define the Fisher matrix and its pieces. Start the motivation: (1) why it is used; (2) why we want to modify it (note where the errors enter). Given a model, it propagates errors from data onto model parameter estimates.
• Mention Trotta and previous works, and propagation of error.
• Step through this derivation, noting the key features.
• Key features: (1) the small variation over the interval allows for the Taylor expansion.
• Start generally and choose the cases that are of interest. What are the key elements in the derivation that we should mention? What are the applications for this? [This is a big question for us, since it still needs to be addressed and worked on for the paper.]
• Show some basic results from the propagation-of-error method: show the resulting equation for the FM and its behavior with varying sigma_x or sigma_y (plots!); this will be the naive version starting with the FM from slide 2, where people always start. Examples from the analytics for the linear case.
• The naive version also starting with the canonical form of the FM. Examples with the linear case.
• Did we ever nail down why this difference occurred? Does it simply come from the fact that the MOE does not have the 2nd [covariance] term in the canonical FM equation?
• The motivation for going deeper was simply that the methods disagreed?