The document provides an overview and review of topics related to tracking and filtering fundamentals, including:
- Linear algebra and linear systems, probability, hypothesis testing, and state estimation.
- Linear and non-linear filtering, multiple model filtering, track maintenance, data association techniques, and activity control.
- Mathematics topics like linear algebra, probability, estimation, vector/matrix properties, and state-space representations are reviewed for continuous and discrete time systems. Concepts include the Jacobian, gradient, Dirac delta function, and observability criteria.
3. Overview
Mathematics Overview
– Linear Algebra and Linear Systems
– Probability and Hypothesis Testing
– State Estimation
Filtering Fundamentals
– Linear and Non-linear Filtering
– Multiple Model Filtering
Tracking Basics
– Track Maintenance
– Data Association Techniques
– Activity Control
5. Mathematics Review
Linear Algebra and Linear Systems
– Definitions, Notations, Jacobians and Matrix Inversion Lemma
– State-Space Representation (Continuous and Discrete) and Observability
Probability Basics
– Probability, Conditional Probability, Bayes' Theorem and the Total Probability Theorem
– Random Variables, Gaussian Mixture, and Covariance Matrices
Bayesian Hypothesis Testing
– Neyman-Pearson Lemma and Wald’s Theorem
– Chi-Square Distribution
Estimation Basics
– Maximum Likelihood (ML) and Maximum A Posteriori (MAP) Estimators
– Least Squares (LS) and Minimum Mean Square Error (MMSE) Estimators
– Cramer-Rao Lower Bound, Fisher Information, Consistency and Efficiency
7. Definitions and Notations
A column vector and its transpose (a row vector) are written as:
a = [ a_i ] = [ a_1  a_2  …  a_n ]^T ,        a^T = [ a_1  a_2  …  a_n ]
A matrix and its transpose are written as:
A = [ a_ij ] = | a_11  a_12  …  a_1m |        A^T = [ a_ji ] = | a_11  a_21  …  a_n1 |
               | a_21  a_22  …  a_2m |                         | a_12  a_22  …  a_n2 |
               |  ⋮                ⋮  |                         |  ⋮                ⋮  |
               | a_n1  a_n2  …  a_nm |                         | a_1m  a_2m  …  a_nm |
8. Basic Matrix and Vector Properties
Symmetric and Skew-Symmetric Matrix:
A = A^T (symmetric)        A = −A^T (skew-symmetric)
Matrix Product ([N x S] = [N x M][M x S]):
C = [ c_ij ] = AB ,   c_ij = Σ_{k=1}^{M} a_ik b_kj
Transpose of a Matrix Product:
C^T = [ c_ji ] = (AB)^T = B^T A^T
Matrix Inverse:
A A^{-1} = A^{-1} A = I
9. Basic Matrix and Vector Properties
Inner Product (vectors must have equal length):
⟨a, b⟩ = a^T b = Σ_{i=1}^{n} a_i b_i
Outer Product ([N x M] = [N][M]):
a b^T = C = [ c_ij ] = [ a_i b_j ]
Matrix Trace:
Tr(A) = Σ_{i=1}^{n} a_ii = Tr(A^T)
Trace of a Matrix Product:
Tr(AB) = Tr(BA)
∂/∂A Tr(AB) = B^T        ∂/∂A Tr(A B A^T) = A (B + B^T)
10. Matrix Inversion Lemma
In Estimation Theory, the following complicated inverse appears:
( P^{-1} + H^T R^{-1} H )^{-1}
The Matrix Inversion Lemma yields an alternative expression which does
not depend on the inverses of the matrices in the above expression:
P − P H^T ( H P H^T + R )^{-1} H P
An alternative form of the Matrix Inversion Lemma is:
( A + B C B^T )^{-1} = A^{-1} − A^{-1} B ( B^T A^{-1} B + C^{-1} )^{-1} B^T A^{-1}
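The lemma is easy to spot-check numerically. A minimal sketch, where the dimensions and random matrices are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 4-state covariance P, 2 measurements.
n, m = 4, 2
A = rng.standard_normal((n, n))
P = A @ A.T + n * np.eye(n)          # symmetric positive definite
B = rng.standard_normal((m, m))
R = B @ B.T + m * np.eye(m)          # symmetric positive definite
H = rng.standard_normal((m, n))

# Left-hand side: (P^-1 + H^T R^-1 H)^-1
lhs = np.linalg.inv(np.linalg.inv(P) + H.T @ np.linalg.inv(R) @ H)

# Right-hand side: P - P H^T (H P H^T + R)^-1 H P
rhs = P - P @ H.T @ np.linalg.inv(H @ P @ H.T + R) @ H @ P

print(np.allclose(lhs, rhs))  # True
```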
11. The Gradient
The Gradient operator with respect to an n-dimensional vector “x” is:
∇_x = [ ∂/∂x_1  …  ∂/∂x_n ]^T
Thus the gradient of a scalar function “f” is:
∇_x f = [ ∂f/∂x_1  …  ∂f/∂x_n ]^T
The gradient of an m-dimensional vector-valued function f = [ f_1(x) … f_m(x) ]^T is the n x m matrix:
∇_x f^T = [ ∂f_j/∂x_i ]
12. The Jacobian Matrix
The Jacobian Matrix is a matrix of derivatives describing a linear mapping from
one set of coordinates to another. This is the transpose of the gradient of a
vector-valued function (p. 24):
J(x, x′) = ∂x/∂x′ = ∂(x_1, …, x_m)/∂(x′_1, …, x′_n) = | ∂x_1/∂x′_1  …  ∂x_1/∂x′_n |
                                                      |     ⋮                ⋮     |
                                                      | ∂x_m/∂x′_1  …  ∂x_m/∂x′_n |
This is typically used as part of a Vector Taylor Expansion for approximating a
transformation.
x = x(x′_o) + ∂x/∂x′ |_{x′_o} · ( x′ − x′_o ) + …
13. The Jacobian Matrix: An Example
The conversion from Spherical to Cartesian coordinates yields:
x = [ r sin(b) cos(e)   r cos(b) cos(e)   r sin(e) ]^T
x′ = [ r  b  e ]^T
           | sin(b) cos(e)    r cos(b) cos(e)   −r sin(b) sin(e) |
J(x, x′) = | cos(b) cos(e)   −r sin(b) cos(e)   −r cos(b) sin(e) |
           | sin(e)           0                  r cos(e)        |
J(x, x′) J(x′, x) = (∂x/∂x′)(∂x′/∂x) = I
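The Jacobian above can be checked against a central-difference approximation; the sample point (r, b, e) is an arbitrary assumption:

```python
import numpy as np

def spherical_to_cartesian(p):
    r, b, e = p
    return np.array([r*np.sin(b)*np.cos(e),
                     r*np.cos(b)*np.cos(e),
                     r*np.sin(e)])

def jacobian_analytic(p):
    r, b, e = p
    return np.array([
        [np.sin(b)*np.cos(e),  r*np.cos(b)*np.cos(e), -r*np.sin(b)*np.sin(e)],
        [np.cos(b)*np.cos(e), -r*np.sin(b)*np.cos(e), -r*np.cos(b)*np.sin(e)],
        [np.sin(e),            0.0,                    r*np.cos(e)],
    ])

p = np.array([1000.0, 0.3, 0.1])   # sample (range, bearing, elevation)

# Central-difference Jacobian for comparison
eps = 1e-6
J_num = np.zeros((3, 3))
for j in range(3):
    dp = np.zeros(3); dp[j] = eps
    J_num[:, j] = (spherical_to_cartesian(p + dp) - spherical_to_cartesian(p - dp)) / (2*eps)

J = jacobian_analytic(p)
print(np.allclose(J, J_num, atol=1e-3))               # True
print(np.allclose(J @ np.linalg.inv(J), np.eye(3)))   # True: J(x,x') J(x',x) = I
```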
15. Dirac Delta Function
The Dirac Delta Function is defined by:
δ (t − τ ) = 0 ∀t ≠ τ
This function is defined by its behavior under integration:
∫_a^b δ(t − τ) dt = 1 ,   τ ∈ [a, b]
In general, the Dirac Delta Function has the following “sifting” behavior:
∫_a^b f(t) δ(t − τ) dt = f(τ) ,   τ ∈ [a, b]
The discrete version of this is called the Kronecker Delta:
δ_ij = { 0   ∀ i ≠ j
       { 1   i = j
16. State-Space Representation (Continuous)
A Dynamic Equation is typically expressed in the standard form (p. 27):
ẋ(t) = A(t) x(t) + B(t) u(t)
x (t ) is the state vector of dimension “nx”
u (t ) is the control input vector of dimension “ny”
A(t ) is the system matrix of dimension “nx x nx”
B (t ) is the input gain matrix of dimension “nx x ny”
While the Measurement Equation is expressed in the standard form:
z (t ) = C (t ) x (t )
z (t ) is the measurement vector of dimension “nz”
C (t ) is the observation matrix of dimension “nz x nx”
17. Example State-Space System
A typical (simple) example is the constant velocity system:
ξ̈(t) = 0
Written in state-space form, this becomes:
| ξ̇ |   | 0  1 | | ξ |   | 0  0 | | u_1 |
| ξ̈ | = | 0  0 | | ξ̇ | + | 0  0 | | u_2 |
ẋ(t) = A(t) x(t) + B(t) u(t)
And suppose that we only have position measurements available:
ξ_meas = [ 1  0 ] | ξ |
                  | ξ̇ |
z(t) = C(t) x(t)
18. State-Space Representation (Discrete)
A continuous state-space system can also be written in discrete form (p. 29):
xk = Fk −1 xk −1 + Gk −1 u k −1
xk is the state vector of dimension “nx” at time “k”
uk is the control input vector of dimension “ny” at time “k”
Fk is the transition matrix of dimension “nx x nx” at time “k”
Gk is the input gain matrix of dimension “nx x ny” at time “k”
While the Measurement Equation is expressed in the discrete form:
z k = H k xk
zk is the measurement vector of dimension “nz” at time “k”
Hk is the observation matrix of dimension “nz x nx” at time “k”
19. Example Revisited in Discrete Time
The constant velocity discrete time model is given by:
| ξ_k |   | 1   t_k − t_{k−1} | | ξ_{k−1} |   | 0  0 | | u1_{k−1} |
| ξ̇_k | = | 0   1             | | ξ̇_{k−1} | + | 0  0 | | u2_{k−1} |
x_k = F_{k−1} x_{k−1} + G_{k−1} u_{k−1}
Since there is no time-dependence in the measurement equation, it is
a trivial extension of the continuous example:
ξ_meas,k = [ 1  0 ] | ξ_k |
                    | ξ̇_k |
z_k = H_k x_k
20. State Transition Matrix
We wish to be able to convert a continuous linear system to a discrete
time linear system. Most physical problems are easily expressible in the
continuous form while most measurements are discrete. Consider the
following time-invariant homogeneous linear system (pp. 180-182):
ẋ(t) = A(t) x(t)   where   A(t) = A for t ∈ [t_{k−1}, t_k]
We have the solution:
x(t) = F_{k−1}(t, t_{k−1}) x(t_{k−1}) = L^{-1}{ (sI − A)^{-1} } x(t_{k−1}) = e^{A(t − t_{k−1})} x(t_{k−1})
for t ∈ [t_{k−1}, t_k]
If we add a term, making an inhomogeneous linear system, we obtain:
ẋ(t) = A(t) x(t) + B(t) u(t)   where   B(t) = B for t ∈ [t_{k−1}, t_k]
21. Matrix Superposition Integral
Then, the state transition matrix is applied to the additive term and
integration is performed to obtain the generalized solution:
x(t) = F_{k−1}(t, t_{k−1}) x(t_{k−1}) + ∫_{t_{k−1}}^{t} F_{k−1}(t, τ) B(τ) u(τ) dτ   for t ∈ [t_{k−1}, t_k]
Consider the following example: white noise u(t) drives the transfer function
√(2σ²β)/(s + β), whose output x_2 is then integrated (1/s) to give x_1:
| ẋ_1 |   | 0   1  | | x_1 |   |    0     |
| ẋ_2 | = | 0  −β  | | x_2 | + | √(2σ²β)  | u(t)
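The transition matrix F = e^{AT} for the constant-velocity system can be computed with a truncated exponential series (the `expm_series` helper is a sketch written for this example; A is nilpotent here, so the series terminates exactly):

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])   # CV-model system matrix; A @ A == 0 (nilpotent)
T = 0.5

def expm_series(M, terms=20):
    """Truncated power series for the matrix exponential (sketch helper)."""
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

F = expm_series(A * T)   # equals I + A*T here, i.e. [[1, T], [0, 1]]
print(F)
```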
22. Observability Criteria
A system is categorized as observable if the state can be determined
from a finite number of observations, assuming that the state-space
model is correct.
For a time-invariant linear system, the observability matrix is given by:
    | H            |
Ω = | H F          |
    |  ⋮           |
    | H F^{n_x −1} |
Thus, the system is observable if this matrix has a rank equal to “nx” (pp.
25,28,30).
23. Observability Criteria: An Example
For the nearly constant velocity model described above, we have:
Ω = | H   | = | [1  0]   | = | 1   0  |
    | H F |   | [1  0] F |   | 1   Δt |
The rank of this matrix is “2” only if the delta time interval is non-zero.
Thus, we can only estimate position and velocity both (using only
position measurements) if these position measurements are separated
in time.
The actual calculation of rank is a subject for a linear algebra course and
leads to ideas such as linear independence and singularity (p. 25).
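The rank test is easy to reproduce; the `observability_matrix` helper below is a sketch of the stacked form on the slide:

```python
import numpy as np

def observability_matrix(F, H):
    """Stack H, HF, ..., HF^(nx-1) as on the slide."""
    nx = F.shape[0]
    rows = [H @ np.linalg.matrix_power(F, i) for i in range(nx)]
    return np.vstack(rows)

H = np.array([[1.0, 0.0]])           # position-only measurement

for dt in (0.0, 1.0):
    F = np.array([[1.0, dt],
                  [0.0, 1.0]])
    rank = np.linalg.matrix_rank(observability_matrix(F, H))
    print(dt, rank)   # dt=0 -> rank 1 (unobservable), dt=1 -> rank 2 (observable)
```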
25. Axioms of Probability
Suppose that “A” and “B” denote random events, then the
following axioms hold true for probabilities:
– Probabilities are non-negative:
P{ A} ≥ 0 ∀A
– The probability of a certain event is unity:
P{ S } = 1
– Additive for mutually exclusive events:
If P{ A ∩ B} = 0 then P{ A ∪ B} = P{ A} + P{ B}
26. Conditional Probability
The conditional probability of an event “A” given the event “B” is:
P{ A ∩ B}
P{ A | B} =
P{ B}
For example, we might ask the following tracking related questions:
– Probability of observing the current measurement given the
previous estimate of the track state
– Probability of observing a target detection within a certain
surveillance region given that a true target is present
Formulating these conditional probabilities is the foundation of
track initiation, deletion, data association, SNR detection
schemes…
27. Total Probability Theorem
Assume that we have a set of events “Bi” which are mutually
exclusive:
P{ Bi ∩ B j } = 0 ∀ i ≠ j
And exhaustive:
Σ_{i=1}^{n} P{B_i} = 1
Then the Total Probability Theorem states:
P{A} = Σ_{i=1}^{n} P{A ∩ B_i} = Σ_{i=1}^{n} P{A | B_i} P{B_i}
28. Bayes' Theorem
We can rework the conditional probability definition in order to obtain the
reverse conditional probability:
P{B_i | A} = P{B_i ∩ A} / P{A} = P{A | B_i} P{B_i} / P{A}
This conditional probability of “B_i” is called the Posterior Probability while the
unconditional probability of “B_i” is called the Prior Probability.
In the case of the “B_i” being mutually exclusive and exhaustive, we have (p. 47):
Posterior Probability:   P{B_i | A} = P{A | B_i} P{B_i} / Σ_{j=1}^{n} P{A | B_j} P{B_j}
where P{A | B_i} is the Likelihood Function and P{B_i} is the Prior Probability.
29. Gaussian (Normal) Random Variables
The Gaussian Random Variable is the most well-known, well-
investigated type because of its wide application in the real world
and its tractable mathematics.
A Gaussian Random Variable is one which has the following
probability density function (PDF) :
p(x) = N(x; µ, σ²) = ( 1/√(2πσ²) ) exp( −(x − µ)² / (2σ²) )
and is denoted:
x ~ N(µ, σ²)
30. Gaussian (Normal) Random Variables
The Expectation and Second Central Moment of this distribution
are:
E[x] = ∫_{−∞}^{∞} ( x/√(2πσ²) ) exp( −(x − µ)²/(2σ²) ) dx = µ        (Mean)
E[(x − E[x])²] = E[x²] − E[x]² = ∫_{−∞}^{∞} ( x²/√(2πσ²) ) exp( −(x − µ)²/(2σ²) ) dx − µ² = σ²        (Variance)
These are only with respect to scalar random variables…what about
vector random variables?
31. Vector Gaussian Random Variables
The vector generalization is straightforward:
p(x) = N(x; µ, P) = ( 1/√(|2πP|) ) exp( −(x − µ)^T P^{-1} (x − µ) / 2 )
The Expectation and Second Central Moment of this distribution
are:
E[ x ] = µ
E[( x − E[ x ])( x − E[ x ])T ] = P
Notice that the Variance is now replaced with a matrix called a
Covariance Matrix.
If the vector “x” is a zero-mean error vector, then the covariance
matrix is called the Mean Square Error.
32. Bayes' Theorem: Gaussian Case
The “noise” of a device, denoted “x”, is observed. Normal
functionality is denoted by event “B1” while a defective device is
denoted by event “B2”:
p(x | B1) = N(x; 0, σ1²)        p(x | B2) = N(x; 0, σ2²)
The conditional probability of defect is (using Bayes' Theorem):
P{B2 | x} = p(x | B2) P{B2} / ( p(x | B1) P{B1} + p(x | B2) P{B2} )
          = 1 / ( 1 + p(x | B1) P{B1} / ( p(x | B2) P{B2} ) )
Using the two distributions, we have:
P{B2 | x} = 1 / ( 1 + ( σ2 P{B1} / (σ1 P{B2}) ) exp( −x²/(2σ1²) + x²/(2σ2²) ) )
33. Bayes' Theorem: Gaussian Case
If we assume the diffuse prior, that the probability of each event is
equal, then we have a simplified formula:
P{B2 | x} = 1 / ( 1 + (σ2/σ1) exp( −x²/(2σ1²) + x²/(2σ2²) ) )
If we further assume that σ2 = 4σ1 and that x = 4σ1, then we have:
P{B2 | x} ≈ 0.998
Note that the likelihood ratio largely dominates the result of this
calculation. This quantity is crucial in inference and statistical
decision theory and is often called the “evidence from the data”.
Λ(B1, B2) = P{x | B1} / P{x | B2}
34. Gaussian Mixture
Suppose we have “n” possible events “Aj” which are mutually exclusive
and exhaustive. And further suppose that each event has a Gaussian
PDF as follows (pp. 55-56):
A_j = { x ~ N(x̄_j, P_j) }   and   P{A_j} ≜ p_j
Then, the total PDF is given by the Total Probability Theorem:
p(x) = Σ_{j=1}^{n} p(x | A_j) P{A_j}
This mixture can be approximated as another Gaussian once the mixed
moments are computed.
35. Gaussian Mixture
The first moment (mean) is easily derived as:
x̄ = E[x] = Σ_{j=1}^{n} E[x | A_j] p_j
x̄ = Σ_{j=1}^{n} p_j x̄_j
The covariance matrix is more complicated, but we simply apply the
definition:
P = E[ (x − x̄)(x − x̄)^T ] = Σ_{j=1}^{n} E[ (x − x̄)(x − x̄)^T | A_j ] p_j
  = Σ_{j=1}^{n} E[ (x − x̄_j + x̄_j − x̄)(x − x̄_j + x̄_j − x̄)^T | A_j ] p_j
36. Gaussian Mixture
Continuing the insanity:
P = Σ_{j=1}^{n} E[ (x − x̄_j)(x − x̄_j)^T | A_j ] p_j + Σ_{j=1}^{n} (x̄_j − x̄)(x̄_j − x̄)^T p_j
  = Σ_{j=1}^{n} P_j p_j + Σ_{j=1}^{n} (x̄_j − x̄)(x̄_j − x̄)^T p_j
(the second sum is the “Spread of the Means” term)
The spread of the means term inflates the covariance of the final mixed
random variable to account for the differences between each individual
mean and the mixed mean.
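The mixture-moment formulas can be exercised on a small assumed example:

```python
import numpy as np

# Hypothetical 2-component, 2-D Gaussian mixture.
p = np.array([0.6, 0.4])                         # mixture weights p_j
means = np.array([[0.0, 0.0], [4.0, 2.0]])       # component means x_j
covs = np.array([[[1.0, 0.0], [0.0, 1.0]],
                 [[2.0, 0.3], [0.3, 1.0]]])      # component covariances P_j

# Mixed mean: xbar = sum_j p_j x_j
xbar = (p[:, None] * means).sum(axis=0)

# Mixed covariance: sum_j p_j P_j plus the spread-of-the-means term
spread = sum(pj * np.outer(m - xbar, m - xbar) for pj, m in zip(p, means))
P = (p[:, None, None] * covs).sum(axis=0) + spread

print(xbar)   # [1.6 0.8]
print(P)
```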
38. Bayesian Hypothesis Testing
We consider two competing hypotheses about a parameter “θ”
defined as:
Null Hypothesis        H0 : θ = θ0
Alternate Hypothesis   H1 : θ = θ1
We also define standard definitions concerning the decision errors:
Type I Error (False Alarm)   P_eI ≜ P{ accept H1 | H0 true } = α
Type II Error (Miss)         P_eII ≜ P{ accept H0 | H1 true } = β
39. Neyman-Pearson Lemma
The power of the hypothesis test is defined as:
Test Power (Detection)   π ≜ P{ accept H1 | H1 true } = 1 − β
The Neyman-Pearson Lemma states that the optimal decision rule (most
powerful test) subject to a fixed Type I Error (α) is the Likelihood
Ratio Test (pp. 72-73):
Λ(H1, H0) = P{z | H1} / P{z | H0}   ⇒   decide H1 if Λ > Λ0 ;  decide H0 if Λ < Λ0
where P{z | H1} and P{z | H0} are the likelihood functions and the threshold Λ0 satisfies:
P{ Λ(H1, H0) > Λ0 | H0 } = P_eI = α
40. Sequential Probability Ratio Test
Suppose, we have a sequence of independent identically distributed (i.i.d.)
measurements “Z={zi}” and we wish to perform a hypothesis test. We can
formulate this in a recursive form as follows:
PR(H1, H0) = P{H1 ∩ Z} / P{H0 ∩ Z} = ( P{Z | H1} P_0{H1} ) / ( P{Z | H0} P_0{H0} )
(likelihood functions times the a priori probabilities)
PR_n(H1, H0) = ( P_0{H1} / P_0{H0} ) Π_{i=1}^{n} P{z_i | H1} / P{z_i | H0} = PR_0(H1, H0) Π_{i=1}^{n} Λ_i(H1, H0)
ln( PR_n(H1, H0) ) = ln( PR_0(H1, H0) ) + Σ_{i=1}^{n} ln( Λ_i(H1, H0) )
41. Sequential Probability Ratio Test
So, the recursive form of the SPRT is:
ln( PR_k(H1, H0) ) = ln( PR_{k−1}(H1, H0) ) + ln( Λ_k(H1, H0) )
Using Wald's Theorem, we continue to test this quantity against two
thresholds until a decision is made:
                        { H1         if > T2
ln( PR_k(H1, H0) )  ⇒   { continue   if > T1 and < T2
                        { H0         if < T1
T2 = ln( (1 − β)/α )   and   T1 = ln( β/(1 − α) )
Wald’s Theorem applies when the observations are an i.i.d. sequence.
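A minimal SPRT sketch for i.i.d. Gaussian data; the hypotheses, α, β, and the constant test sequences are assumptions for illustration:

```python
import math

# SPRT for the mean of Gaussian i.i.d. data: H0: mu=0 vs H1: mu=1, sigma=1.
# Thresholds from Wald's approximations with alpha = beta = 0.05.
alpha, beta = 0.05, 0.05
T2 = math.log((1 - beta) / alpha)       # upper threshold (accept H1)
T1 = math.log(beta / (1 - alpha))       # lower threshold (accept H0)

def log_likelihood_ratio(z, mu0=0.0, mu1=1.0, sigma=1.0):
    # ln Lambda_i = ln p(z|H1) - ln p(z|H0) for Gaussian densities
    return (-(z - mu1)**2 + (z - mu0)**2) / (2 * sigma**2)

def sprt(measurements):
    log_pr = 0.0                         # equal priors: ln PR_0 = 0
    for k, z in enumerate(measurements, start=1):
        log_pr += log_likelihood_ratio(z)
        if log_pr > T2:
            return "H1", k
        if log_pr < T1:
            return "H0", k
    return "continue", len(measurements)

print(sprt([1.0] * 10))  # ('H1', 6)
print(sprt([0.0] * 10))  # ('H0', 6)
```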
42. Chi-Square Distribution
The chi-square distribution with “n” degrees of freedom has the following
functional form:
χ²_n(x) = ( 1 / ( 2^{n/2} Γ(n/2) ) ) x^{(n−2)/2} e^{−x/2}
It is related to an “n”-dimensional vector Gaussian distribution as follows:
( x − x̄ )^T P^{-1} ( x − x̄ ) ~ χ²_n
More generally, the sum of squares of “n” independent zero-mean, unity
variance random variables is distributed as a chi-square with “n” degrees
of freedom (pp.58-60).
43. Chi-Square Distribution
The chi-square distribution with “n” degrees of freedom has the following
statistical moments:
E[x] = n        E[(x − E[x])²] = 2n
The sum of two independent chi-square random variables is also
chi-square:
q1 ~ χ²_{n1} ,   q2 ~ χ²_{n2}   ⇒   q1 + q2 ~ χ²_{n1+n2}
45. Parameter Estimator
A parameter estimator is a function of the observations (measurements)
that yields an estimate of a time-invariant quantity (parameter). This
estimator is typically denoted as:
x̂_k ≜ x̂[ k, Z^k ]   where   Z^k ≜ { z_j }_{j=1}^{k}
(estimate = estimator of the time index and the observations)
We also denote the error in the estimate as:
x̃_k ≜ x − x̂_k        (true value minus estimate)
46. Estimation Paradigms
Non-Bayesian (Non-Random):
– There is no prior PDF incorporated
– The Likelihood Function PDF is formed
– This Likelihood Function PDF is used to estimate the parameter
Λ_Z(x) ≜ p(Z | x)
Bayesian (Random):
– Start with a prior PDF of the parameter
– Use Bayes' Theorem to find the posterior PDF
– This posterior PDF is used to estimate the parameter
p(x | Z) = p(Z | x) p(x) / p(Z) = (1/c) p(Z | x) p(x)
(posterior = likelihood × prior, normalized)
47. Estimation Methods
Maximum Likelihood Estimator (Non-Random):
x̂_ML(Z) = arg max_x [ p(Z | x) ]
dp(Z | x)/dx |_{x̂_ML} = 0
Maximum A Posteriori Estimator (Random):
x̂_MAP(Z) = arg max_x [ p(Z | x) p(x) ]
48. Unbiased Estimators
Non-Bayesian (Non-Random):
E[ x̂_k(Z^k) ]  (expectation over p(Z^k | x = x_0))  =  x_0
Bayesian (Random):
E[ x̂_k(Z^k) ]  (expectation over p(x ∩ Z^k))  =  E[x]  (expectation over p(x))
General Case:
E[ x̃_k(Z^k) ] = 0
49. Estimation Comparison Example
Consider a single measurement of an unknown parameter “x” which is
susceptible to additive noise “w” that is zero-mean Gaussian:
z = x + w ,   w ~ N(0, σ²)
The ML approach yields:
Λ(x) = p(z | x) = N(z; x, σ²) = ( 1/√(2πσ²) ) exp( −(z − x)²/(2σ²) )
x̂_ML = arg max_x [ Λ(x) ] = z
Thus, the MLE is the measurement itself because there is no prior
knowledge.
50. Estimation Comparison Example
The MAP, with a Gaussian prior, approach yields:
p(x) = N(x; x̄, σ0²)
p(x | z) = p(z | x) p(x) / p(z)
         = exp( −(z − x)²/(2σ²) − (x − x̄)²/(2σ0²) ) / ( 2π σ σ0 p(z) )
         = ( 1/√(2πσ1²) ) exp( −(x − ξ(z))²/(2σ1²) )
where
ξ(z) = σ1² ( x̄/σ0² + z/σ² )   and   1/σ1² = 1/σ² + 1/σ0²
x̂_MAP = arg max_x [ p(x | z) ] = ξ(z)
Thus, the MAPE is a linear combination of the prior information (x̄) and the
observation (z), weighted based upon the variance of each.
NOTE: The MLE and MAPE are equivalent for a diffuse prior (σ0 → ∞)!
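The MAP fusion formula is easy to sketch; the numeric values below are assumptions:

```python
# MAP fusion of a Gaussian prior N(xbar, sigma0^2) with one measurement
# z = x + w, w ~ N(0, sigma^2).
def map_estimate(z, xbar, sigma0_sq, sigma_sq):
    sigma1_sq = 1.0 / (1.0 / sigma_sq + 1.0 / sigma0_sq)
    xi = sigma1_sq * (xbar / sigma0_sq + z / sigma_sq)
    return xi, sigma1_sq

# Equal variances: the estimate splits the difference between prior and data.
xi, s1 = map_estimate(z=10.0, xbar=8.0, sigma0_sq=1.0, sigma_sq=1.0)
print(xi, s1)          # 9.0 0.5

# Diffuse prior (sigma0 -> infinity): the MAP estimate tends to ML = z.
xi, _ = map_estimate(z=10.0, xbar=8.0, sigma0_sq=1e12, sigma_sq=1.0)
print(round(xi, 6))    # 10.0
```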
51. Batch Estimation Paradigms
Consider that we now have a set of observations available for
estimating a parameter and that in general these observations are
corrupted by measurement noise:
Z^k = { z_j = h_j(x) + w_j } ,   j = 1, …, k
Least Squares (Non-Random):
x̂_LS_k = arg min_x Σ_{j=1}^{k} [ z_j − h_j(x) ]²
Minimum Mean Square Error (Random):
x̂_MMSE_k = arg min_x̂ E[ (x − x̂)² | Z^k ]
x̂_MMSE_k ≜ E[ x | Z^k ] = ∫_{−∞}^{∞} x p(x | Z^k) dx
52. Unbiasedness of ML and MAP Estimators
Maximum Likelihood Estimate:
E[ x̂_ML_k ] = E[z] = E[x_0 + w] = x_0 + E[w] = x_0
Maximum A Posteriori Estimate:
E[ x̂_MAP_k ] = E[ ( σ²/(σ² + σ0²) ) x̄ + ( σ0²/(σ² + σ0²) ) z ]
             = ( σ²/(σ² + σ0²) ) x̄ + ( σ0²/(σ² + σ0²) ) E[x + w]
             = ( σ²/(σ² + σ0²) ) x̄ + ( σ0²/(σ² + σ0²) ) ( E[x] + E[w] )
             = ( σ²/(σ² + σ0²) ) x̄ + ( σ0²/(σ² + σ0²) ) x̄ = x̄ = E[x]
53. Estimation Errors
Non-Bayesian (Non-Random):
Var[ x̂_k(Z^k) ] = E[ { x̂_k(Z^k) − E[x̂_k(Z^k)] }² ] = E[ { x̂_k(Z^k) − x_0 }² ]
Bayesian (Random):
MSE[ x̂_k(Z^k) ] = E[ { x̂_k(Z^k) − x }² ] = E[ E[ { x̂_k(Z^k) − x }² | Z^k ] ]
General Case:
E[ x̃_k(Z^k)² ] = var( x̂_k(Z^k) )   if x̂ is unbiased and x is non-random
E[ x̃_k(Z^k)² ] = MSE( x̂_k(Z^k) )   in all cases
54. Variances of ML and MAP Estimators
Maximum Likelihood Estimate:
var[ x̂_ML_k ] = E[ (x̂_ML_k − x_0)² ] = E[ (z − x_0)² ] = σ²
Maximum A Posteriori Estimate:
var[ x̂_MAP_k ] = E[ (x̂_MAP_k − x)² ] = σ² σ0² / (σ² + σ0²) < σ² = var[ x̂_ML_k ]
The MAPE error is less than the MLE error since the MAPE incorporates
prior information.
55. Cramer-Rao Lower Bound
The Cramer-Rao Lower Bound places a limit on the ability to
estimate a parameter:
MSE[ x̂_k(Z^k) ] = E[ ( x̂_k(Z^k) − x )( x̂_k(Z^k) − x )^T ] ≥ J_k^{-1}
Not surprisingly, this lower limit is related to the likelihood function,
which we recall as the “evidence from the data”. The matrix J_k is called the
Fisher Information Matrix:
J_k ≜ −E[ ∇_x ∇_x^T ln p(Z^k | x) ]
When equality holds, the estimator is called efficient. An example of
this is the ML estimate we have been working with.
59. Kalman-Bucy Problem
A stochastic discrete-time linear dynamic system:
xk = Fk −1 xk −1 + Gk −1 uk −1 + Γk −1ν k −1
xk is the state vector of dimension “nx” at time “k”
Gk uk is the control input of dimension “nx” at time “k”
Fk is the transition matrix of dimension “nx x nx” at time “k”
Γkν k is the plant noise of dimension “nx” at time “k”
The measurement equation is expressed in the discrete form:
z k = H k xk + wk
zk is the measurement vector of dimension “nz” at time “k”
Hk is the observation matrix of dimension “nz x nx” at time “k”
wk is the measurement noise of dimension “nz” at time “k”
60. Kalman-Bucy Problem
The Linear Gaussian Assumptions are:
E[ν_k] = 0        E[ ν_k ν_j^T ] = Q_k δ_jk
E[w_k] = 0        E[ w_k w_j^T ] = R_k δ_jk
The measurement and plant noises are uncorrelated:
E[ w_k ν_j^T ] = 0
The conditional mean is:
x̂_j|k ≜ E[ x_j | Z^k ]        Z^k = { z_i , i ≤ k }
x̂_k|k   Filtered State Estimate        x̂_k|k−1   Extrapolated State Estimate
The estimation error is denoted by:
x̃_j|k ≜ x_j − x̂_j|k
61. Kalman-Bucy Problem
The estimate covariance is defined as:
P_j|k ≜ E[ x̃_j|k x̃_j|k^T | Z^k ]
P_k|k   Filtered Error Covariance        P_k|k−1   Extrapolated Error Covariance
The predicted measurement is given by:
ẑ_k|k−1 ≜ E[ z_k | Z^{k−1} ] = E[ H_k x_k + w_k | Z^{k−1} ] = H_k E[ x_k | Z^{k−1} ] + E[ w_k | Z^{k−1} ] = H_k x̂_k|k−1
The measurement residual or innovation is denoted by:
η_k ≜ z_k − ẑ_k|k−1 = z_k − H_k x̂_k|k−1
62. Kalman-Bucy Approach
Recall that the MMSE is equivalent to the MAPE in the Gaussian case.
Recall that the MAPE, with a Gaussian prior, is a linear combination of
the measurement and the prior information.
Recall that the prior information was, more specifically, the expectation
of the random variable prior to receiving the measurement.
If we consider the Kalman Filter to be a recursive process which applies
a static Bayesian estimation (MMSE) algorithm at each step, we are
compelled to consider the following linear combination.
x̂_k|k = K′_k x̂_k|k−1 + K_k z_k
(prior state information plus observation information)
63. Kalman Filter - Unbiasedness
We start with the proposed linear combination:
x̂_k|k = K′_k x̂_k|k−1 + K_k z_k
We wish to ensure that the estimate is unbiased, that is:
E[ x̃_k|k ] = 0
Given the proposed linear combination, we determine the error to be:
x̃_k|k = [ K′_k + K_k H_k − I ] x_k + K′_k x̃_k|k−1 + K_k w_k
Applying the unbiasedness constraint, we have:
E[ x̃_k|k ] = 0 = [ K′_k + K_k H_k − I ] E[x_k] + K′_k E[ x̃_k|k−1 ] + K_k E[w_k]
⇒   K′_k = I − K_k H_k
64. Kalman Filter – Kalman Gain
So, we have the following simplified linear combination:
x̂_k|k = x̂_k|k−1 + K_k η_k
We also desire the filtered error covariance, so that it can be minimized:
P_k|k = E[ x̃_k|k x̃_k|k^T ]
P_k|k = ( I − K_k H_k ) P_k|k−1 ( I − K_k H_k )^T + K_k R_k K_k^T
If we minimize the trace of this expression with respect to the gain:
K_k = P_k|k−1 H_k^T [ H_k P_k|k−1 H_k^T + R_k ]^{-1}
65. Kalman Filter - Recipe
Extrapolation:
x̂_k|k−1 = F_{k−1} x̂_{k−1|k−1} + G_{k−1} u_{k−1}
P_k|k−1 = F_{k−1} P_{k−1|k−1} F_{k−1}^T + Γ_{k−1} Q_{k−1} Γ_{k−1}^T
Update:
x̂_k|k = x̂_k|k−1 + K_k η_k
P_k|k = ( I − K_k H_k ) P_k|k−1 ( I − K_k H_k )^T + K_k R_k K_k^T
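The extrapolation/update recipe can be written directly from the Kalman equations. A minimal sketch (the CV-model numbers are assumptions), using the Joseph-form covariance update from the slide:

```python
import numpy as np

def kf_extrapolate(x, P, F, Q):
    """Extrapolation: x(k|k-1) = F x(k-1|k-1), P(k|k-1) = F P F^T + Q.
    (Control input omitted; Q here already includes Gamma Q Gamma^T.)"""
    return F @ x, F @ P @ F.T + Q

def kf_update(x_pred, P_pred, z, H, R):
    """Update step with the Joseph-form covariance from the slide."""
    S = H @ P_pred @ H.T + R                  # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain
    eta = z - H @ x_pred                      # innovation (residual)
    x = x_pred + K @ eta
    I_KH = np.eye(len(x_pred)) - K @ H
    P = I_KH @ P_pred @ I_KH.T + K @ R @ K.T  # Joseph form
    return x, P

# One predict/update cycle of the CV model (assumed numbers):
T, q, r = 1.0, 0.1, 1.0
F = np.array([[1.0, T], [0.0, 1.0]])
Q = q * np.array([[T**4/4, T**3/2], [T**3/2, T**2]])  # DWNA process noise
H = np.array([[1.0, 0.0]])                            # position-only measurement
R = np.array([[r]])

x, P = np.array([0.0, 1.0]), np.eye(2)
x, P = kf_extrapolate(x, P, F, Q)
x, P = kf_update(x, P, np.array([1.2]), H, R)
print(x)
```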
66. Kalman Filter – Innovations
The innovations are zero-mean, uncorrelated (p. 213) and have covariance:
S_k = E[ η_k η_k^T ] = H_k P_k|k−1 H_k^T + R_k
The normalized innovation squared or statistical distance is chi-square distributed:
d_k² = η_k^T S_k^{-1} η_k ~ χ²_{nz}
So, we expect that the innovations should have a mean and variance of:
E[ d_i² ] = nz        var[ d_i² ] = 2 nz
The Kalman Gain can now be written as:
K_k = P_k|k−1 H_k^T [ H_k P_k|k−1 H_k^T + R_k ]^{-1} = P_k|k−1 H_k^T S_k^{-1}
The state errors are correlated:
E[ x̃_k|k x̃_{k−1|k−1}^T ] = [ I − K_k H_k ] F_{k−1} P_{k−1|k−1}
67. Kalman Filter – Likelihood Function
We wish to compute the likelihood function given the dynamics model used:
p( z_k | Z^{k−1} ) = p( z_k | x̂_k|k−1 ) = N( z_k ; ẑ_k|k−1 , S_k ) = N( z_k − ẑ_k|k−1 ; 0, S_k ) = N( η_k ; 0, S_k )
Which has the explicit form:
Λ_k = p( z_k | Z^{k−1} ) = exp( −½ η_k^T S_k^{-1} η_k ) / √( det[ 2π S_k ] )
Alternatively, we can write:
Λ_k = p( z_k | Z^{k−1} ) = exp( −½ d_k² ) / √( det[ 2π S_k ] )   ⇒   ln Λ_k = −½ d_k² − ½ ln( det[ 2π S_k ] )
68. Kalman Filter – Measurement Validation
Suppose our Kalman filter has the following output at a given time step:
P_k+1|k = | 5   0  |        H_k+1 = | 1  0 |        x̂_k+1|k = | 10 |
          | 0   16 |                | 0  1 |                   | 15 |
Suppose that we now receive 3 measurements of unknown origin:
R^i_k+1 = | 4  0 |        z¹_k+1 = | 7  | ,   z²_k+1 = | 16 | ,   z³_k+1 = | 19 |
          | 0  9 |                 | 20 |               | 5  |              | 25 |
Evaluate the consistency of these measurements for this Kalman filter model.
This procedure is called gating and is the basis for data association.
d²_k+1(z¹_k+1) = 2        d²_k+1(z²_k+1) = 8        d²_k+1(z³_k+1) = 13
χ²₂(95%) = 6        χ²₂(99%) = 9.2
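The gating computation can be reproduced directly; the measurement values below are as reconstructed from the slide's example:

```python
import numpy as np

# Slide's example: predicted state/covariance, identity H, common R.
P_pred = np.array([[5.0, 0.0], [0.0, 16.0]])
H = np.eye(2)
x_pred = np.array([10.0, 15.0])
R = np.array([[4.0, 0.0], [0.0, 9.0]])

S = H @ P_pred @ H.T + R          # innovation covariance: [[9, 0], [0, 25]]
S_inv = np.linalg.inv(S)

def nis(z):
    """Normalized innovation squared d^2 = eta^T S^-1 eta."""
    eta = z - H @ x_pred
    return float(eta @ S_inv @ eta)

for z in ([7.0, 20.0], [16.0, 5.0], [19.0, 25.0]):
    d2 = nis(np.array(z))
    print(z, d2, d2 < 5.99, d2 < 9.21)   # gate at chi2_2 95% and 99% points
```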
69. Kalman Filter – Initialization
The true initial state is a random variable distributed as:
x_0 ~ N( x̂_0|0 , P_0|0 )
It is just as important that the initial covariance and estimate realistically
reflect the actual accuracy. Thus, the initial estimate should satisfy:
x̃_0|0^T P_0|0^{-1} x̃_0|0 ≤ χ²_{nx}(95%)
If the initial covariance is too small, then the Kalman gain will initially be
small and the filter will take a longer time to converge.
Ideally, the initial state estimate should be within one standard deviation
(indicated by the initial covariance) of the true value. This will lead to
optimal convergence time.
70. Kalman Filter – Initialization
In general, a batch weighted least-squares curve fit can be used (Chapter 3):
x̂_0|0 = [ H_init^T R_init^{-1} H_init ]^{-1} H_init^T R_init^{-1} z_init        P_0|0 = [ H_init^T R_init^{-1} H_init ]^{-1}
z_init = [ z_0 , … , z_{nx−1} ]^T
H_init = [ H_0 ; H_1 F_0 ; … ; H_{nx−1} ( F_{nx−2} ⋯ F_0 ) ]   (stacked)
R_init = blockdiag( R_0 , … , R_{nx−1} )
This initialization will always be statistically consistent so long as the
measurement errors are properly characterized.
71. Kalman Filter – Summary
The Kalman Gain:
– Proportional to the Predicted Error
– Inversely Proportional to the Innovation Error
The Covariance Matrix:
– Independent of measurements
– Indicates the error in the state estimate assuming that all of the
assumptions/models are correct
The Kalman Estimator:
– Optimal MMSE state estimator (Gaussian)
– Best Linear MMSE state estimator (Non-Gaussian)
– The state and covariance completely summarize the past
74. Kalman Filter:
Direct Discrete Time Example
Consider the simplest example of the nearly constant velocity (CV)
dynamics model:
x_k = | 1  T | x_{k−1} + | T²/2 | ν_{k−1}        x_k = | ξ_k |        E[ ν_k² ] = q
      | 0  1 |           | T    |                      | ξ̇_k |
(Discrete White Noise Acceleration)
z_k = [ 1  0 ] x_k + w_k        E[ w_k² ] = r
The recursive estimation process is given by the Kalman equations
derived above.
How do we select “q”?        q ≅ a_max²
75. Kalman Filter:
Other Direct Discrete Time Models
For nearly constant acceleration (CA) models, the Discrete Wiener
Process Acceleration (DWPA) model is commonly used:
x_k = | 1  T  T²/2 | x_{k−1} + | T²/2 | ν_{k−1}        x_k = | ξ_k |        E[ ν_k² ] = q
      | 0  1  T    |           | T    |                      | ξ̇_k |
      | 0  0  1    |           | 1    |                      | ξ̈_k |
Q_k = q Γ_k Γ_k^T = q | T⁴/4  T³/2  T²/2 |        q ≅ Δa_max²
                      | T³/2  T²    T    |
                      | T²/2  T     1    |
Notice the simple relationship between the “q-value” and the physical
parameter that is one derivative higher than that which is estimated.
76. Kalman Filter:
Discretized Continuous-Time Models
These models are derived from continuous time representations using the
matrix superposition integral. Ignoring the control input:
ẋ(t) = A x(t) + D ṽ(t)        E[ ṽ(t) ṽ(τ) ] = q̃ δ(t − τ)
x_k = F_{k−1} x_{k−1} + v_{k−1}
where
F = e^{AT}   and   v_k = ∫_0^T e^{A(T−τ)} D ṽ(τ) dτ
Thus, the process noise covariance is found by:
Q_k = E[ v_k v_k^T ] = ∫_0^T ∫_0^T F(T−τ_1) D E[ ṽ(τ_1) ṽ(τ_2) ] D^T F^T(T−τ_2) dτ_1 dτ_2
    = ∫_0^T F(T−τ_1) D q̃ D^T F^T(T−τ_1) dτ_1
77. Kalman Filter:
Discretized Continuous-Time Models
Continuous White Noise Acceleration (CWNA) for the CV Model:
ξ̈(t) = ṽ(t)        Q_k = q̃ | T³/3  T²/2 |        q̃ ≅ a_max² T
                            | T²/2  T    |
Continuous Wiener Process Acceleration (CWPA) for the CA Model:
ξ⃛(t) = ṽ(t)        Q_k = q̃ | T⁵/20  T⁴/8  T³/6 |        q̃ ≅ Δa_max² T
                            | T⁴/8   T³/3  T²/2 |
                            | T³/6   T²/2  T    |
Singer [IEEE-AES, 1970] developed the Exponentially Correlated
Acceleration (ECA) for the CA model (p. 187 & pp. 321-324):
ξ⃛(t) = −α ξ̈(t) + ṽ(t)
78. Kalman Filter:
Time Consistent Extrapolation
So, what is the difference between the Direct Discrete-Time and
Discretized Continuous-Time models for CV or CA dynamics? Which one
should be used?
[ F P F^T + Q_DWNA ]_{T=2}   ≠   [ F { [ F P F^T + Q_DWNA ]_{T=1} } F^T + Q_DWNA ]_{T=1}
[ F P F^T + Q_CWNA ]_{T=2}   =   [ F { [ F P F^T + Q_CWNA ]_{T=1} } F^T + Q_CWNA ]_{T=1}
Thus, for the Continuous-Time model, 2 extrapolations of 1 second yields
the same result as 1 extrapolation of 2 seconds.
In general, the Continuous-Time models have this time consistent
property.
This is because the process noise covariance is derived using the
transition matrix, while the Direct Discrete-Time process noise is specified directly at each step.
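The time-consistency claim is easy to check numerically (q = 1 and the starting covariance are arbitrary assumptions):

```python
import numpy as np

F = lambda T: np.array([[1.0, T], [0.0, 1.0]])
# Direct discrete-time (DWNA) and discretized continuous-time (CWNA) process noise:
Q_dwna = lambda T, q=1.0: q * np.array([[T**4/4, T**3/2], [T**3/2, T**2]])
Q_cwna = lambda T, q=1.0: q * np.array([[T**3/3, T**2/2], [T**2/2, T]])

P = np.diag([2.0, 1.0])          # arbitrary starting covariance

def extrapolate(P, T, Q_of_T):
    return F(T) @ P @ F(T).T + Q_of_T(T)

for name, Q in (("DWNA", Q_dwna), ("CWNA", Q_cwna)):
    once = extrapolate(P, 2.0, Q)                        # one 2-second step
    twice = extrapolate(extrapolate(P, 1.0, Q), 1.0, Q)  # two 1-second steps
    print(name, np.allclose(once, twice))
# DWNA False, CWNA True
```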
79. Kalman Filter:
Steady State Gains
If we iterate the Kalman equations for the covariance indefinitely, the
updated covariance (and thus the Kalman gain) will reach steady state.
This is only true for Kalman models that have constant coefficients.
In this case, the steady-state solution is found using the Algebraic Matrix
Riccati Equation (pp. 211 & 350):
P_ss = F [ P_ss − P_ss H^T ( H P_ss H^T + R )^{-1} H P_ss ] F^T + Q
The steady state Kalman gain becomes:
K_ss = P_ss H^T [ H P_ss H^T + R ]^{-1}
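The steady-state covariance can be found by simply iterating the predicted-covariance recursion until it converges, then checking it against the algebraic Riccati equation (the CV-model numbers are assumptions):

```python
import numpy as np

T, q, r = 1.0, 0.1, 1.0
F = np.array([[1.0, T], [0.0, 1.0]])
Q = q * np.array([[T**3/3, T**2/2], [T**2/2, T]])   # CWNA process noise
H = np.array([[1.0, 0.0]])
R = np.array([[r]])

P = np.eye(2)                     # P_k|k-1 iterate
for _ in range(500):
    S = H @ P @ H.T + R
    P = F @ (P - P @ H.T @ np.linalg.inv(S) @ H @ P) @ F.T + Q

# Check the algebraic Riccati equation and form the steady-state gain.
K_ss = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
residual = F @ (P - P @ H.T @ np.linalg.inv(H @ P @ H.T + R) @ H @ P) @ F.T + Q - P
print(np.abs(residual).max() < 1e-9)   # True: P satisfies the Riccati equation
print(K_ss.ravel())
```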
80. Kalman Filter:
Steady State Biases
If a Kalman filter has reached steady-state, then it is possible to predict
the filter’s bias resulting from un-modeled dynamics.
Consider the CV model with an un-modeled constant acceleration (p. 13):
x_k = F x_{k−1} + Γ ν_{k−1} + G λ        x_k = | ξ_k |        G = | T²/2 |        K_ss = | α   |
                                               | ξ̇_k |            | T    |                | β/T |
(λ is the un-modeled acceleration)
The steady-state error is found to be:
x̃_ss = ( I − K_ss H ) F x̃_ss + ( I − K_ss H ) G λ   ⇒   x̃_ss = | T² (1 − α) / β    | λ
                                                              | T (2α − β) / (2β) |
81. Kalman Filter – Summary #2
The Kalman Gain:
– Reaches steady-state for constant coefficient models
– Can determine steady-state errors for un-modeled dynamics
The Covariance Matrix:
– Is only consistent when the model matches the true dynamics
– Has no knowledge of the residuals
The Kalman Estimator:
– We need modifications for more general models
– What about non-linear dynamics?
83. Nonlinear Estimation Problems
Previously, all dynamics and measurement models were linear. Now, we
consider a broader scope of estimation problems:
ẋ = f(x, t) + D u(t) + ṽ(t)
z(t) = h(x, t) + w(t)
Nonlinear Dynamics:
– Ballistic Dynamics (TBM exo-atmospheric, Satellites, etc…)
– Drag/Thrust Dynamics (TBM re-entry, TBM Boost, etc…)
Nonlinear Measurements:
– Spherical Measurements
– Angular Measurements
– Doppler Measurements
84. EKF - Nonlinear Dynamics
The state propagation can be done using numerical integration or a
Taylor Series Expansion (linearization):
However, the linearization is necessary in order to propagate the
covariance:
x_k ≅ F_{k−1} x_{k−1} + G_{k−1} u_{k−1} + Γ_{k−1} v_{k−1}
F_{k−1} = e^{ [∂f/∂x] (t_k − t_{k−1}) } |_{x = x̂_{k−1|k−1}}
        ≅ I + (t_k − t_{k−1}) ∂f/∂x |_{x̂_{k−1|k−1}} + ( (t_k − t_{k−1})²/2 ) ∂²f/∂x² |_{x̂_{k−1|k−1}} + …
(the first-order term uses the Jacobian matrix; the second-order term uses the Hessian matrix)
The state and covariance propagation are precisely as before:
x̂_k|k−1 = F_{k−1} x̂_{k−1|k−1} + G_{k−1} u_{k−1}
P_k|k−1 = F_{k−1} P_{k−1|k−1} F_{k−1}^T + Γ_{k−1} Q_{k−1} Γ_{k−1}^T
85. EKF - Nonlinear Measurements
We compute the linearization of the observation function:
h(x_k) = h(x̂_k|k−1) + H_k ( x_k − x̂_k|k−1 ) + ½ ( x_k − x̂_k|k−1 )^T H⁽²⁾_k ( x_k − x̂_k|k−1 ) + …
H_k = ∂h/∂x |_{x̂_k|k−1}   (Jacobian matrix)        H⁽²⁾_k = ∂²h/∂x² |_{x̂_k|k−1}   (Hessian matrix)
The residual is thus:
η_k ≜ z_k − ẑ_k|k−1 = z_k − h(x̂_k|k−1) ≅ H_k x̃_k|k−1 + w_k
The covariance update and Kalman gain are precisely as before (pp. 381-386):
x̂_k|k = x̂_k|k−1 + K_k η_k
P_k|k = ( I − K_k H_k ) P_k|k−1 ( I − K_k H_k )^T + K_k R_k K_k^T
K_k = P_k|k−1 H_k^T S_k^{-1}
86. Polar Measurements
Previously, we dealt with unrealistic observation models that assumed
that measurements were Cartesian. Polar measurements are more
typical. In this case, the observation function is nonlinear:
x_k = [ x_k  ẋ_k  y_k  ẏ_k ]^T
z_k = h(x_k) + w_k        h(x_k) = | r | = | √(x² + y²)    |
                                   | b |   | tan^{-1}(x/y) |
The Kalman Gain and Covariance Update only require the Jacobian of
this observation function:
H = ∂h/∂x = | x/√(x² + y²)   0    y/√(x² + y²)   0 |
            | y/(x² + y²)    0   −x/(x² + y²)    0 |
This Jacobian is evaluated at the extrapolated estimate.
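A minimal sketch of this Jacobian in Python (the function name is illustrative; the state ordering [x, ẋ, y, ẏ] and the bearing convention b = tan⁻¹(x/y) follow the slide above):

```python
import numpy as np

def polar_jacobian(x):
    """Jacobian of h(x) = [range, bearing] for the state [x, xdot, y, ydot].
    Bearing is measured from the y-axis, i.e. b = atan(x/y)."""
    px, py = x[0], x[2]
    r2 = px**2 + py**2
    r = np.sqrt(r2)
    return np.array([[px / r,  0.0,  py / r,   0.0],
                     [py / r2, 0.0, -px / r2,  0.0]])
```

In an EKF this would be evaluated at the extrapolated estimate x̂_{k|k-1} before computing the gain and covariance update.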
87. Ballistic Dynamics
As a common example of nonlinear dynamics, consider the ballistic
propagation equations specified in ECEF coordinates:
d/dt [x  y  z  ẋ  ẏ  ż]^T = [ẋ  ẏ  ż  (2ωẏ + ω²x + G_x)  (−2ωẋ + ω²y + G_y)  G_z]^T + D̃v(t)
The gravitational acceleration components (to second order) are:
G_x = (−μx/R³) [1 + J_2 (3/2)(R_e/R)² (1 − 5(z/R)²)]
G_y = (−μy/R³) [1 + J_2 (3/2)(R_e/R)² (1 − 5(z/R)²)]
G_z = (−μz/R³) [1 + J_2 (3/2)(R_e/R)² (3 − 5(z/R)²)]
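The J2 gravity terms above can be computed directly. A minimal sketch (the function name is an assumption; the constants are standard WGS-84 / EGM values):

```python
import numpy as np

MU = 3.986004418e14   # Earth's gravitational parameter, m^3/s^2
RE = 6378137.0        # Earth equatorial radius, m
J2 = 1.08263e-3       # second zonal harmonic

def j2_gravity(pos):
    """Gravitational acceleration in ECEF, to second zonal order (J2)."""
    x, y, z = pos
    R = np.sqrt(x * x + y * y + z * z)
    k = 1.5 * J2 * (RE / R)**2        # J2 * (3/2) * (Re/R)^2
    zr2 = 5.0 * (z / R)**2            # 5 * (z/R)^2
    gx = -MU * x / R**3 * (1.0 + k * (1.0 - zr2))
    gy = -MU * y / R**3 * (1.0 + k * (1.0 - zr2))
    gz = -MU * z / R**3 * (1.0 + k * (3.0 - zr2))
    return np.array([gx, gy, gz])
```

At the equator (z = 0) this returns a purely radial acceleration of roughly 9.8 m/s² magnitude, as a sanity check.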
88. Thrust and Re-entry Dynamics
As a common example of nonlinear dynamics, consider the ballistic
propagation equations augmented with axial thrust/drag terms, specified
in ECEF coordinates:
d/dt [x  y  z  ẋ  ẏ  ż  a  β]^T = [ẋ  ẏ  ż  (ẍ_Ballistic + a ẋ/s)  (ÿ_Ballistic + a ẏ/s)  (z̈_Ballistic + a ż/s)  aβ  −β²]^T + D̃v(t)
where s = √(ẋ² + ẏ² + ż²) is the speed.
The new states are the relative axial acceleration "a" and the relative
mass depletion rate "β":
a(t) = a_thrust(t) − a_drag(t) = [T − (1/2) C_D A_C ρ (v · v)] / m(t),   β(t) = ṁ(t)/m(t)
The process noise matrix (if extended to second order) becomes a
function of the speed. Thus, a more rapidly accelerating target will
have more process noise injected into the filter.
89. Pseudo-Measurements
In the case of the TBM dynamics, the ECEF coordinates are the most
tractable coordinates.
However, typically the measurements are in spherical coordinates.
Furthermore, the Jacobian for the conversion from ECEF to RBE is
extremely complicated.
Instead, we can convert the measurements into ECEF as follows:
z′_k = I_{3×3} x_k + w′_k,   R′_k = J_meas R_k J_meas^T
However, since this is a linearization, we must be careful to make sure
that this approximation holds.
90. Pseudo-Measurements
The linearization is valid so long as (pp. 397-402):
r σ_b² / σ_r < 0.4
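A 2-D sketch of this measurement conversion (the function name is an assumption; the bearing convention b = tan⁻¹(x/y) matches the polar-measurement slide, and the validity test above is included as a guard):

```python
import numpy as np

def polar_to_cartesian_meas(r, b, sigma_r, sigma_b):
    """Convert a range/bearing measurement to a Cartesian pseudo-measurement
    with linearized covariance R' = J R J^T. Bearing from the y-axis."""
    # Validity of the linearization (pp. 397-402): r * sigma_b^2 / sigma_r < 0.4
    assert r * sigma_b**2 / sigma_r < 0.4, "linearization not valid"
    z = np.array([r * np.sin(b), r * np.cos(b)])
    # Jacobian of (r, b) -> (x, y) at the measured values
    J = np.array([[np.sin(b),  r * np.cos(b)],
                  [np.cos(b), -r * np.sin(b)]])
    R = np.diag([sigma_r**2, sigma_b**2])
    return z, J @ R @ J.T
```

The converted covariance lets a linear Kalman filter consume the measurement directly, as in z′_k = I x_k + w′_k above.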
91. Iterated EKF
The IEKF iteratively computes the state "n" times during a single update.
This recursion is based on re-linearization about the estimate:
H_k^i = (∂h/∂x)|_{x = x̂^i_{k|k}},   where H_k^0 = (∂h/∂x)|_{x = x̂_{k|k-1}},   i = 0, 1, …, n
The state is updated iteratively with a re-linearized residual and gain:
x̂^{i+1}_{k|k} = x̂_{k|k-1} + K_k^i η_k^i
η_k^i = z_k − h(x̂^i_{k|k}) − H_k^i [x̂_{k|k-1} − x̂^i_{k|k}]
K_k^i = P_{k|k-1} (H_k^i)^T [H_k^i P_{k|k-1} (H_k^i)^T + R_k]^{-1}
Finally, the covariance is computed based upon the values of the final
iteration:
P_{k|k} = [I − K_k^n H_k^n] P_{k|k-1} [I − K_k^n H_k^n]^T + K_k^n R_k (K_k^n)^T
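The IEKF recursion above can be sketched as follows (the function name and arguments are assumptions; `h` is the observation function and `H_jac` returns its Jacobian at a given state):

```python
import numpy as np

def iekf_update(x_pred, P_pred, z, h, H_jac, R, n_iter=3):
    """Iterated EKF measurement update: re-linearize h about the
    current iterate x_i, where x_0 is the predicted state."""
    x_i = x_pred.copy()
    for _ in range(n_iter + 1):
        H = H_jac(x_i)                       # H_k^i, re-linearized
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)  # K_k^i
        # residual re-linearized about x_i
        eta = z - h(x_i) - H @ (x_pred - x_i)
        x_i = x_pred + K @ eta
    # Joseph-form covariance from the final iteration
    I_KH = np.eye(x_pred.size) - K @ H
    P = I_KH @ P_pred @ I_KH.T + K @ R @ K.T
    return x_i, P
```

For a linear observation function the iterates converge after one step and the result equals the ordinary Kalman update.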
93. Why Multiple Models?
When the target dynamics differ from the modeled dynamics, the state
estimates are subject to:
– Biases (Lags) in the state estimate
– Inconsistent covariance output
– In a tracking environment, this increases the probability of mis-association and track loss
In most tracking applications, the true target dynamics have an unknown
time dependence.
To accommodate changing target dynamics, one can develop multiple
target dynamics models and perform hypothesis testing.
This approach is called hybrid state estimation.
94. Why Multiple Models?
[Figure: 6-State Velocity Error (Meters/Sec) (x=red, y=blue, z=green, True Error=black) — true state estimate error vs. the confidence interval given by the Kalman covariance, over time 1800-2800.]
Assuming Constant Velocity target dynamics, the estimation errors become inconsistent during an acceleration.
95. Why Multiple Models?
9-State Velocity Error (Meters/Sec) (x=red, y=blue, z=green, True Error=black)
200
A Constant True state estimate error
Acceleration
150
Confidence interval
model remains
100 Given by Kalman
consistent, Covariance
however the 50
steady-state
estimation 0
error is larger
-50
-100
-150
-200
1800 1900 2000 2100 2200 2300 2400 2500 2600 2700 2800
96. Adaptive Process Noise
Since the normalized innovation squared indicates the consistency of
the dynamics model, it can be monitored to detect deviations (pp. 424-426).
At each update, perform the following threshold test:
d_k² = η_k^T S_k^{-1} η_k ≥ ε_max,   where P{d_k² ≥ ε_max} = α
Then, the process noise value q is adjusted such that the statistical
distance is equal to this threshold value:
η_k^T S_k^{-1} η_k = ε_max
The disadvantage is that false alarms result in sudden increases in error.
We can use a sliding window average of these residuals; however, this can
delay the detection of a maneuver (pp. 424-426).
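The threshold test can be sketched as follows (the function name is an assumption; ε_max would be taken from the chi-square distribution with dim(z) degrees of freedom, e.g. about 9.21 for α = 0.01 and 2 degrees of freedom):

```python
import numpy as np

def maneuver_detected(eta, S, eps_max):
    """Threshold test on the normalized innovation squared,
    d_k^2 = eta^T S^-1 eta, against eps_max chosen so that
    P{d_k^2 >= eps_max} = alpha under a chi-square distribution
    with dim(z) degrees of freedom."""
    d2 = float(eta.T @ np.linalg.inv(S) @ eta)
    return d2 >= eps_max, d2
```

When the test fires, the process noise q would be increased until the statistical distance equals the threshold, per the slide above.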
97. State Estimation – Multiple Models
We can further assume that the true dynamics is one of "n" models:
x_k^r = F_{k-1}^r x_{k-1}^r + v_{k-1}^r,   r = 1, 2, …, n
z_k^r = H_k^r x_k^r + w_k^r,   r = 1, 2, …, n
Using Kalman filter outputs, each model likelihood function can be
computed:
Λ_k^r = exp[−(1/2) (η_k^r)^T (S_k^r)^{-1} η_k^r] / √(det[2π S_k^r]),   r = 1, 2, …, n
At each filter update, the posterior model probabilities μ_k^i are computed
recursively using Bayes' Theorem. The proper output can be selected
using these probabilities (pp. 453-457).
98. State Estimation – SMM
[Block diagram: the measurement feeds "n" independent hypothesis filters producing x̂_k^1, x̂_k^2, …, x̂_k^n; a hypothesis selection stage outputs the most probable state estimate x̂_k.]
The model probabilities are updated as:
μ_k^i = Λ_k^i μ_{k-1}^i / Σ_{j=1}^n Λ_k^j μ_{k-1}^j
Each Kalman filter is updated independently and has no knowledge about
the performance of any other filter.
This approach assumes that the target dynamics are time-independent.
99. State Estimation – IMM
[Block diagram: the measurement drives conditional probability updates and state estimate interaction; each hypothesis filter carries x̂_{k-1}^i forward to x̂_k^i, the estimates are mixed, and the combined IMM estimate x̂_k is output.]
Each Kalman filter interacts with others just prior to an update
This interaction allows for the possibility of a transition
This approach assumes that the target dynamics will change according to
a Markov process. pp. 453-457.
100. State Estimation – IMM
[Block diagram (two-model IMM): the prior estimates (x̂^1_{k-1|k-1}, P^1_{k-1|k-1}) and (x̂^2_{k-1|k-1}, P^2_{k-1|k-1}) are mixed in the interaction step using {μ^1_{k-1}, μ^2_{k-1}} to form (x̂^{0,1}_{k-1|k-1}, P^{0,1}_{k-1|k-1}) and (x̂^{0,2}_{k-1|k-1}, P^{0,2}_{k-1|k-1}); each Kalman filter processes z_k and outputs (x̂^r_{k|k}, P^r_{k|k}) and likelihood Λ^r_k; the probability updates and estimate mixing produce x̂_{k|k}, P_{k|k}.]
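The interaction (mixing) step of the IMM can be sketched as follows (a minimal sketch; the function name and the convention Pi[i, j] = P{model j at k | model i at k-1} are assumptions):

```python
import numpy as np

def imm_mix(xs, Ps, mu, Pi):
    """IMM interaction step: mix the model-conditioned estimates using
    the Markov transition matrix Pi and prior probabilities mu."""
    n = len(xs)
    c = Pi.T @ mu                          # predicted model probabilities
    mix = (Pi * mu[:, None]) / c[None, :]  # mixing weights w[i, j]
    x0, P0 = [], []
    for j in range(n):
        xj = sum(mix[i, j] * xs[i] for i in range(n))
        # mixed covariance includes the spread-of-the-means term
        Pj = sum(mix[i, j] * (Ps[i] + np.outer(xs[i] - xj, xs[i] - xj))
                 for i in range(n))
        x0.append(xj)
        P0.append(Pj)
    return x0, P0, c
```

Each mixed pair (x0[j], P0[j]) then initializes filter j for the next measurement update, allowing for the possibility of a model transition.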
101. State Estimation – Applied IMM
[Figure: 6-State Velocity Error (Meters/Sec) (x=red, y=blue, z=green, True Error=black) — true state estimate error stays within the confidence interval given by the Kalman covariance, over time 1800-2800.]
The IMM adapts to changes in target dynamics and provides a consistent covariance during these transitions.
102. State Estimation – IMM Markov Matrix
The particular choice of the Markov Matrix is somewhat of an art.
Just like any filter tuning process, one can choose a Markov Matrix
simply based upon observed performance.
Alternatively, this transition matrix has a physical relationship to the
Mean Sojourn Time of a given dynamics state.
E[N_i] = 1/(1 − p_ii)   ⇒   p_ii = 1 − 1/E[N_i] = 1 − T_scan/E[τ_i]
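This relationship is a one-liner; the function name is an assumption:

```python
def markov_diag_from_sojourn(t_scan, tau):
    """Diagonal Markov-matrix entry from the mean sojourn time:
    E[N_i] = 1/(1 - p_ii)  =>  p_ii = 1 - T_scan / E[tau_i]."""
    return 1.0 - t_scan / tau
```

For example, a 1-second scan time and a 10-second mean sojourn time give p_ii = 0.9.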
103. State Estimation – VSIMM
Air Targets: Adaptive Grid Coordinated Turning Model
TBM Targets: Constant Axial Thrust, Ballistic, Singer ECA
[Block diagram: the measurement feeds both an Air IMM (output x̂_k^Air) and a TBM IMM (output x̂_k^TBM); an SPRT hypothesis selection stage outputs x̂_k.]
The SPRT is performed as follows:
x̂_k = x̂_k^Air    if μ_k^Air / μ_k^TBM > T_2
x̂_k = mixed      if T_1 < μ_k^Air / μ_k^TBM < T_2
x̂_k = x̂_k^TBM    if μ_k^Air / μ_k^TBM < T_1
where T_2 = (1 − β)/α and T_1 = β/(1 − α).
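The SPRT decision rule above can be sketched directly (the function name is an assumption; α and β are the desired error rates of the test):

```python
def sprt_decision(mu_air, mu_tbm, alpha=0.01, beta=0.01):
    """SPRT hypothesis selection between the Air and TBM IMM banks.
    Thresholds: T2 = (1 - beta)/alpha, T1 = beta/(1 - alpha)."""
    T2 = (1.0 - beta) / alpha
    T1 = beta / (1.0 - alpha)
    ratio = mu_air / mu_tbm
    if ratio > T2:
        return "Air"
    if ratio < T1:
        return "TBM"
    return "mixed"   # continue testing; output a mixed estimate
```

With α = β = 0.01, the thresholds are T_2 = 99 and T_1 ≈ 0.0101, so the test only commits to one model bank once the probability ratio is decisive.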