Actuarial Science Reference Sheet
Author: Daniel Nolan
Email: daniel_nolan@msn.com
The purpose of this document is to provide entry-level
actuarial students with a sneak preview of the
mathematics used in the syllabi for SOA exams P and
FM. It can also serve as a refresher for students who
have already covered the material in some depth.
1 Probability
1.1 Preliminaries
• Indicator Function $I_A(\omega) = I(\omega \in A)$, where I(p) = 1 if p and I(p) = 0 if ¬p
• Delta “Function”
1. δ(x) = ∞ if x = 0, and δ(x) = 0 otherwise
2. $\int_{-\infty}^{\infty} \delta(x) f(x)\,dx = f(0)$ for any function f, and in particular $\int_{-\infty}^{\infty} \delta(x)\,dx = 1$
3. δ(x) = du/dx, where u(x) = I(x ≥ 0)
• Gamma and Beta Functions
– $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt$, in particular Γ(1) = 1 and $\Gamma(1/2) = \sqrt{\pi}$
– Γ(x + 1) = xΓ(x) and therefore Γ(n) = (n − 1)! for any positive integer n
– $\Gamma'(x) = \int_0^\infty t^{x-1} e^{-t} \log t\,dt$, in particular $\Gamma'(1) = -\gamma$, where $\gamma = \lim_{n\to\infty} \gamma_n$ and $\gamma_n = \sum_{i=1}^n 1/i - \log n$
– Incomplete Gamma Function
∗ $I_x(y) = \frac{1}{\Gamma(x)} \int_0^y t^{x-1} e^{-t}\,dt$
∗ $I_{x+1}(y) = I_x(y) - \frac{y^x e^{-y}}{\Gamma(x+1)}$
– B(x, y) = Γ(x)Γ(y)/Γ(x + y) = B(y, x)
– Incomplete Beta Function
∗ $I_x(r, s) = \frac{1}{B(r,s)} \int_0^x t^{r-1} (1 - t)^{s-1}\,dt$, 0 ≤ x ≤ 1
∗ $I_x(r, 1) = x^r$ and $I_x(1, s) = 1 - (1 - x)^s$
∗ $I_x(r, s) = \frac{\Gamma(r+s)\, x^r (1 - x)^{s-1}}{\Gamma(r+1)\Gamma(s)} + I_x(r + 1, s - 1)$
• Monotonic Sequences of Sets
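– $A_1 \subset A_2 \subset \cdots \implies A_n \to A = \bigcup_{i=1}^\infty A_i$
– $A_1 \supset A_2 \supset \cdots \implies A_n \to A = \bigcap_{i=1}^\infty A_i$
• DeMorgan’s Laws: $\left(\bigcup_{i\in I} A_i\right)^c = \bigcap_{i\in I} A_i^c$ and $\left(\bigcap_{i\in I} A_i\right)^c = \bigcup_{i\in I} A_i^c$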
1.2 Probability Spaces
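• Probability Space $(\Omega, \mathcal{A}, P)$
– sample space Ω = set of all possible outcomes ω
– events $A \in \mathcal{A} \subset \mathcal{P}(\Omega)$, where $\mathcal{A}$ is a σ-algebra, i.e.
1. $\emptyset \in \mathcal{A}$
2. $A \in \mathcal{A} \implies A^c \in \mathcal{A}$
3. $A_i \in \mathcal{A},\ i = 1, 2, \ldots \implies \bigcup_{i=1}^\infty A_i \in \mathcal{A}$
– probability measure $P : \mathcal{A} \to \mathbb{R}$ such that
1. $P(A) \ge 0$ for all $A \in \mathcal{A}$
2. P(Ω) = 1
3. $A_1, A_2, \ldots$ pairwise disjoint $\implies P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i)$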
• Misc. Properties
– P(∅) = 0
– A ⊂ B =⇒ P(A) ≤ P(B)
– 0 ≤ P(A) ≤ 1
– $P(A^c) = 1 - P(A)$
– A ∩ B = ∅ =⇒ P(A ∪ B) = P(A) + P(B)
– P(A ∪ B) = P(A) + P(B) − P(A ∩ B), and in general
$$P\left(\bigcup_{k=1}^n A_k\right) = \sum_{k=1}^n (-1)^{k-1} \sum_{\substack{I \subset \{1,\ldots,n\} \\ |I| = k}} P\left(\bigcap_{i\in I} A_i\right)$$
• Continuity of Probabilities An → A =⇒ P(An) → P(A)
• Independence of Events
– A and B are independent if P(A ∩ B) = P(A)P(B), and in this case we write A ⫫ B
– a collection {Ai : i ∈ I} is independent if $P\left(\bigcap_{j\in J} A_j\right) = \prod_{j\in J} P(A_j)$ for every finite subset J of I
– if {Ai : i ∈ I} is independent, then $P\left(\bigcup_{i\in I} A_i\right) = 1 - \prod_{i\in I} [1 - P(A_i)]$ by DeMorgan
– disjoint events with positive probabilities are not independent
• Conditional Probability
– if P(B) > 0, then the probability of A given that B has occurred is P(A|B) = P(A ∩ B)/P(B)
– P(·|B) satisfies the axioms of probability for a fixed event B
– A ⫫ B iff P(A|B) = P(A)
– P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)
• Law of Total Probability Let A1, . . . , An be a partition of Ω. Then for any event B, $P(B) = \sum_{i=1}^n P(B|A_i)P(A_i)$.
• Bayes’ Theorem Let A1, . . . , An be a partition of Ω such that P(Ai) > 0 for each 1 ≤ i ≤ n. If P(B) > 0, then
$$\underbrace{P(A_i|B)}_{\text{posterior}} = \frac{P(B|A_i)\,\overbrace{P(A_i)}^{\text{prior}}}{\sum_{j=1}^n P(B|A_j)P(A_j)}$$
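The Law of Total Probability and Bayes' Theorem can be checked numerically. The following is a minimal sketch, not part of the original sheet: the function name `posterior` and the three risk classes with their claim probabilities are hypothetical values chosen for illustration.

```python
def posterior(priors, likelihoods):
    """Return [P(A_i | B)] from priors P(A_i) and likelihoods P(B | A_i)."""
    # Law of Total Probability: P(B) = sum_i P(B | A_i) P(A_i)
    p_b = sum(lik * pri for lik, pri in zip(likelihoods, priors))
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    # Bayes' Theorem applied to each cell of the partition
    return [lik * pri / p_b for lik, pri in zip(likelihoods, priors)]


# Hypothetical portfolio: three risk classes and their claim probabilities
priors = [0.5, 0.3, 0.2]           # P(A_i), must sum to 1
likelihoods = [0.01, 0.05, 0.10]   # P(claim | A_i)
post = posterior(priors, likelihoods)
print(post)
```

Although the third class is the least common a priori, it carries the largest posterior weight once a claim is observed, because its likelihood is ten times that of the first class.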
1.3 Random Variables (RVs)
• Random Variable X : Ω → R such that {ω : X(ω) ≤ x} ∈ A for all x
• Cumulative Distribution Function (CDF) FX : R → [0, 1] defined by FX(x) = P(X ≤ x)
• Theorem Let X ∼ F and Y ∼ G. If F(x) = G(x) for all x, then P(X ∈ A) = P(Y ∈ A) for every measurable event A.
• Theorem F : R → [0, 1] is the CDF for some probability measure iff F satisfies the following:
1. x < y =⇒ F(x) ≤ F(y) (increasing)
2. F(x) → 0 as x → −∞ and F(x) → 1 as x → ∞ (normalized)
3. $F(x) = F(x^+)$ for all x, where $F(x^+)$ denotes $\lim_{y \to x,\ y > x} F(y)$ (right-continuous)
• X discrete if X : Ω → {x1, x2, . . .}, and in this case we define the probability mass function fX(x) = P(X = x)
• X continuous if there exists a function fX such that fX(x) ≥ 0 for all x, $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$, and for every a ≤ b,
$$P(a \le X \le b) = \int_a^b f_X(x)\,dx$$
and in this case fX is called the probability density function (PDF). Note that
$$F_X(x) = \int_{-\infty}^x f_X(s)\,ds$$
and $f_X(x) = F_X'(x)$ wherever $F_X'$ exists.
• Lemma Let X ∼ F. Then
1. $P(X = x) = F(x) - F(x^-)$, where $F(x^-)$ denotes $\lim_{y \to x,\ y < x} F(y)$
2. P(x < X ≤ y) = F(y) − F(x)
3. S(x) = P(X > x) = 1 − F(x)
4. X continuous =⇒ F(b) − F(a) = P(a < X < b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a ≤ X ≤ b)
• X mixed if X is neither discrete nor continuous
– let FX have jump discontinuities at a1, a2, . . .
– define $k = \sum_{i=1}^\infty k_i$, where $k_i = F_X(a_i) - F_X(a_i^-)$
– $F(x) = \left[F_X(x) - \sum_{i=1}^\infty k_i u(x - a_i)\right] / (1 - k)$ is a continuous CDF, hence $F = F_C$ for some continuous RV C
– $F_X(x) = (1 - k)F_C(x) + \sum_{i=1}^\infty k_i u(x - a_i) = (1 - k)F_C(x) + kF_D(x)$, where D is a discrete RV with P(D = ai) = ki/k
– FX is a weighted average of continuous and discrete CDFs
– $f_X(x) = (1 - k)f_C(x) + \sum_{i=1}^\infty k_i \delta(x - a_i)$
• Quantile Function $F^{-1} : [0, 1] \to \mathbb{R}$ defined by $F^{-1}(q) = \inf\{x : F(x) > q\}$
– $F^{-1}(1/4)$ = 1st quartile
– $F^{-1}(1/2)$ = median
– $F^{-1}(3/4)$ = 3rd quartile
• Survival and Hazard Functions
– SX(x) = P(X > x) = 1 − FX(x) has the following properties:
1. x < y =⇒ SX(x) ≥ SX(y) (decreasing)
2. SX(x) → 1 as x → −∞ and SX(x) → 0 as x → ∞ (normalized)
3. $S_X(x) = S_X(x^+)$ (right-continuous)
– $\lambda_X(x) = \lim_{h\to 0} P(x < X < x + h \mid X > x)/h = -S_X'(x)/S_X(x) \implies S_X(x) = \exp\left[-\int_{-\infty}^x \lambda_X(s)\,ds\right]$
• Bivariate Distributions
– F(x, y) = P(X ≤ x, Y ≤ y)
– f(x, y) is such that
1. f(x, y) ≥ 0 for all $(x, y) \in \mathbb{R}^2$
2. $\iint_{\mathbb{R}^2} f(x, y)\,dx\,dy = 1$
3. $P((X, Y) \in A) = \iint_A f(x, y)\,dx\,dy$
• Marginal Distributions
– $f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$
– $f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$
• Independent RVs
– X ⫫ Y if, for every pair of events A and B, P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B)
– Theorem Let X and Y have joint PDF $f_{X,Y}$. Then X ⫫ Y iff $f_{X,Y}(x, y) = f_X(x) f_Y(y)$ for all $(x, y) \in \mathbb{R}^2$.
– Theorem Suppose the set of possible values of (X, Y) is a (possibly infinite) rectangle. Then f(x, y) = g(x)h(y) =⇒ X ⫫ Y.
• Conditional Distributions
– $f_{X|Y}(x|y) = f_{X,Y}(x, y)/f_Y(y)$, assuming $f_Y(y) > 0$
– $P(X \in A \mid Y = y) = \int_A f_{X|Y}(x|y)\,dx$
• Multivariate Distributions and IID Samples
– random vector X = (X1, . . . , Xn) with PDF f(x1, . . . , xn), where X1, . . . , Xn are RVs
– X1, . . . , Xn independent ⇐⇒ $P(X_1 \in A_1, \ldots, X_n \in A_n) = \prod_{i=1}^n P(X_i \in A_i)$ ⇐⇒ $f(x_1, \ldots, x_n) = \prod_{i=1}^n f_{X_i}(x_i)$
– $X_1, \ldots, X_n \overset{\text{IID}}{\sim} F$ signifies that X1, . . . , Xn are independent and identically distributed, each with CDF F
• Transformations of RVs
– univariate: Y = r(X), e.g. $Y = X^2$ or $Y = e^X$
1. $A_y = \{x : r(x) \le y\}$
2. $F_Y(y) = P(Y \le y) = P(X \in A_y) = \int_{A_y} f_X(x)\,dx$
3. $f_Y(y) = F_Y'(y)$
– bivariate: Y = r(X1, X2), e.g. Y = X1 + X2 or Y = min{X1, X2}
1. $A_y = \{(x_1, x_2) : r(x_1, x_2) \le y\}$
2. $F_Y(y) = P(Y \le y) = P((X_1, X_2) \in A_y) = \iint_{A_y} f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2$
3. $f_Y(y) = F_Y'(y)$
1.4 Expectation
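• Expected Value, or Mean $\mu = EX = \int x f(x)\,dx$, assuming $\int |x| f(x)\,dx < \infty$; for non-negative X, also $EX = \int_0^\infty S(x)\,dx$
• $Y = r(X) \implies EY = \int r(x) f(x)\,dx$
• $E|X|^k < \infty$ and j < k =⇒ $E|X|^j < \infty$
• $E\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i EX_i$
• X1, . . . , Xn independent =⇒ $E\left(\prod_{i=1}^n X_i\right) = \prod_{i=1}^n EX_i$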
• Geometric Expectation
– for any positive RV X, Eg[X] = exp E[log X]
– X discrete =⇒ $E_g[X] = \prod_{i=1}^n x_i^{p_i}$, where pi = P(X = xi)
– (Arithmetic-Geometric Means Inequality) Eg[X] ≤ EX
– log Eg[X] = E(log X) and exp EX = Eg[exp X]
• kth Moment = $EX^k$ and kth central moment = $E(X - \mu)^k$
• Variance $\sigma^2 = VX = E(X - \mu)^2$
– $\sigma^2 = E(X^2) - \mu^2$
– $V(aX + b) = a^2\,VX$
– X1, . . . , Xn independent =⇒ $V\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i^2\,VX_i$
• Standard Deviation $sd(X) = \sqrt{VX}$
• Sample Mean $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$
• Sample Variance $s_n^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2$
• Theorem Let X1, . . . , Xn be IID. Then $E\bar{X}_n = \mu$, $V\bar{X}_n = \sigma^2/n$, and $Es_n^2 = \sigma^2$.
• Covariance Cov(X, Y ) = E[(X − µX)(Y − µY )] = E(XY ) − E(X)E(Y )
• Correlation $\rho(X, Y) = \mathrm{Cov}(X, Y)/(\sigma_X \sigma_Y)$
• Theorem The correlation satisfies −1 ≤ ρ ≤ 1. If Y = aX + b with a ≠ 0, then ρ = sgn(a). If X ⫫ Y, then Cov(X, Y ) = 0 and therefore ρ = 0 as well.
• $V\left(\sum_i a_i X_i\right) = \sum_i a_i^2\,VX_i + 2\sum_i \sum_{j<i} a_i a_j\,\mathrm{Cov}(X_i, X_j)$
• Skewness
– $\gamma = EZ^3$, where Z = (X − µ)/σ
– γ = 0 if X is symmetric, i.e. if f(µ + x) = f(µ − x) for all x
– Y = aX + b =⇒ γY = sgn(a)γX
– $Y = X_1 + X_2 \implies \gamma_Y = (\gamma_1\sigma_1^3 + \gamma_2\sigma_2^3)/\sigma_Y^3$ if X1 ⫫ X2
• Multivariate Expectation
– random vector X = (X1, . . . , Xn)
– $\mu_X = (\mu_1, \ldots, \mu_n)^T$, where µi = EXi
– variance-covariance matrix Σ defined by Σij = Cov(Xi, Xj), in particular Σii = VXi
– Lemma If a is a vector and X is a random vector with mean µ and variance Σ, then $E(a^T X) = a^T \mu$ and $V(a^T X) = a^T \Sigma a$. If A is a matrix, then E(AX) = Aµ and $V(AX) = A \Sigma A^T$.
• Conditional Expectation
– $E(X|y) = \int x f_{X|Y}(x|y)\,dx$ and $E[r(X, Y)|y] = \int r(x, y) f_{X|Y}(x|y)\,dx$
– whereas EX is a number, E(X|y) is a function of y
– Rule of Iterated Expectations For RVs X and Y , assuming the expectations exist, we have that E[E(X|Y )] = EX. More generally, for any function r(X, Y ),
$$E\{E[r(X, Y)|X]\} = E[r(X, Y)]$$
– $V(X|y) = \int [x - \mu(y)]^2 f_{X|Y}(x|y)\,dx$, where µ(y) = E(X|y)
– Theorem For any RVs X and Y , we have
$$VX = E\,V(X|Y) + V\,E(X|Y)$$
• Moment Generating Function
– $M_X(t) = E\left(e^{tX}\right) = E\left[1 + Xt + \frac{(Xt)^2}{2!} + \cdots\right]$
– $M_X^{(n)}(0) = EX^n$, n = 0, 1, 2, . . .
– $Y = aX + b \implies M_Y(t) = e^{bt} M_X(at)$
– $Y = X_1 + X_2 \implies M_Y(t) = M_1(t) M_2(t)$ if X1 ⫫ X2
• Cumulant Generating Function
– $\psi_X(t) = \log M_X(t)$
– $\psi_X^{(n)}(0)$ equals 0 for n = 0, µ for n = 1, $\sigma^2$ for n = 2, and $\sigma^3\gamma$ for n = 3
– $Y = aX + b \implies \psi_Y(t) = bt + \psi_X(at)$
– $Y = X_1 + X_2 \implies \psi_Y(t) = \psi_1(t) + \psi_2(t)$ if X1 ⫫ X2
1.5 Important Distributions
1.5.1 Discrete Distributions
• Point Mass Distribution X ∼ δa
– PMF: f(x) = I(x = a)
– CDF: F(x) = I(x ≥ a)
– EX = a and VX = 0
• Uniform Distribution X ∼ Uniform{x1, . . . , xn}
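– PMF: f(x) = 1/n if x = x1, . . . , xn, and f(x) = 0 otherwise
– CDF: $F(x) = \frac{1}{n} \sum_{i=1}^n I(x \ge x_i)$
– $EX = \frac{1}{n} \sum_{i=1}^n x_i$ and $VX = \frac{1}{n} \sum_{i=1}^n x_i^2 - \left(\frac{1}{n} \sum_{i=1}^n x_i\right)^2$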
• Bernoulli Distribution X ∼ Bernoulli(p)
– X represents outcome of single trial, where P(success) = p
– PMF: $f(x) = p^x (1 - p)^{1-x}$ if x = 0 or x = 1, and f(x) = 0 otherwise
– CDF: F(x) = (1 − p)I(x ≥ 0) + pI(x ≥ 1)
– MGF: $M_X(t) = pe^t + (1 - p)$
– EX = p and VX = p(1 − p)
• Binomial Distribution X ∼ Binomial(n, p)
– X represents number of successes in n independent Bernoulli trials, each with P(success) = p
– PMF: $f(x) = \binom{n}{x} p^x (1 - p)^{n-x}$, x = 0, 1, . . . , n
– MGF: $M_X(t) = [pe^t + (1 - p)]^n$
– EX = np and VX = np(1 − p)
– X ∼ Binomial(m, p) and Y ∼ Binomial(n, p) and X ⫫ Y =⇒ X + Y ∼ Binomial(m + n, p)
• Poisson Distribution X ∼ Poisson(λ)
– X represents the number of occurrences of a rare event during some fixed time period in which the expected number of occurrences is λ and individual occurrences are independent of each other
– Poisson RVs are used in the insurance industry to represent the number of claims in a large group of policies for which the expected number of claims is known and claims occur independently and infrequently
– PMF: $f(x) = e^{-\lambda} \lambda^x / x!$, x = 0, 1, 2, . . .
– MGF: $M_X(t) = \exp[\lambda(e^t - 1)]$
– EX = VX = λ and $\gamma = 1/\sqrt{\lambda}$
– X ∼ Poisson(λ) and Y ∼ Poisson(µ) and X ⫫ Y =⇒ X + Y ∼ Poisson(λ + µ)
• Negative Binomial Distribution X ∼ NB(r, p)
– X represents the number of failures that occur in a sequence of independent Bernoulli trials before the rth success,
where P(success) = p in each of the trials
– PMF: $f(x) = \frac{\Gamma(r+x)}{\Gamma(r)\Gamma(x+1)} p^r (1 - p)^x$, x = 0, 1, 2, . . .
– MGF: $M_X(t) = \left[\frac{p}{1 - (1-p)e^t}\right]^r$, $(1 - p)e^t < 1$
– EX = r(1 − p)/p and $VX = r(1 - p)/p^2$
– X ∼ NB(r, p) and Y ∼ NB(s, p) and X ⫫ Y =⇒ X + Y ∼ NB(r + s, p)
• Geometric Distribution X ∼ Geometric(p)
– X represents the number of failures that occur in a sequence of independent Bernoulli trials before the first success, where P(success) = p in each of the trials
– PMF: $f(x) = p(1 - p)^x$, x = 0, 1, 2, . . .
– MGF: $M_X(t) = p/[1 - (1 - p)e^t]$, $(1 - p)e^t < 1$
– EX = (1 − p)/p and $VX = (1 - p)/p^2$
– P(X ≥ s + t | X ≥ t) = P(X ≥ s) for all non-negative integers s and t (memoryless property; with this failure-count convention the inequalities are weak, since $P(X \ge k) = (1 - p)^k$)
– $X_1, \ldots, X_r \overset{\text{IID}}{\sim} \text{Geometric}(p) \implies \sum_{i=1}^r X_i \sim \text{NB}(r, p)$
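The memoryless property of the Geometric distribution can be verified directly from the PMF. The sketch below is illustrative only: it uses the failure-count convention (x = 0, 1, 2, . . .) from this section, states the property in the form P(X ≥ s + t | X ≥ t) = P(X ≥ s), and the values of p, s, and t are arbitrary assumptions.

```python
def geometric_pmf(x, p):
    """P(X = x) for the number of failures before the first success."""
    return p * (1 - p) ** x


def tail(k, p):
    """P(X >= k), computed as 1 minus the sum of the PMF over 0, ..., k - 1."""
    return 1.0 - sum(geometric_pmf(x, p) for x in range(k))


p, s, t = 0.3, 5, 3   # illustrative values, not from the source
conditional = tail(s + t, p) / tail(t, p)   # P(X >= s + t | X >= t)
unconditional = tail(s, p)                  # P(X >= s)
print(conditional, unconditional)
```

Both quantities equal (1 − p)^s up to floating-point error, which is what the closed form P(X ≥ k) = (1 − p)^k predicts.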
1.5.2 Continuous Distributions
• Exponential Distribution X ∼ Exponential(λ)
– X represents one of the following:
∗ time until first arrival when arrivals are such that the number of arrivals in [0, t] is Poisson(λt)
∗ lifetime of an item that does not age
– PDF: $f(x) = \lambda e^{-\lambda x}$ for x ≥ 0, and f(x) = 0 for x < 0
– MGF: MX(t) = λ/(λ − t), t < λ
– EX = 1/λ, $VX = 1/\lambda^2$, and γ = 2
– X ∼ Exponential(λ) =⇒ aX ∼ Exponential(λ/a), where a > 0
– $X, Y \overset{\text{IID}}{\sim} \text{Exponential}(\lambda) \implies X + Y \sim \text{Gamma}(2, \lambda)$
– X ∼ Exponential(λ) and Y ∼ Exponential(µ) and X ⫫ Y =⇒ min(X, Y ) ∼ Exponential(λ + µ)
• Gamma Distribution X ∼ Gamma(r, λ)
– X has the following interpretations when r is a positive integer:
∗ time until rth arrival when arrivals are such that the number of arrivals in [0, t] is Poisson(λt)
∗ $\sum_{i=1}^r X_i$, where $X_1, \ldots, X_r \overset{\text{IID}}{\sim} \text{Exponential}(\lambda)$
– PDF: $f(x) = \frac{\lambda^r}{\Gamma(r)} x^{r-1} e^{-\lambda x}$, x > 0
– MGF: $M_X(t) = [\lambda/(\lambda - t)]^r$, t < λ
– EX = r/λ, $VX = r/\lambda^2$, and $\gamma = 2/\sqrt{r}$
– X ∼ Gamma(r, λ) =⇒ aX ∼ Gamma(r, λ/a)
– X ∼ Gamma(r, λ) and Y ∼ Gamma(s, λ) and X ⫫ Y =⇒ X + Y ∼ Gamma(r + s, λ)
• Beta Distribution X ∼ Beta(α, β)
– typically used as prior distributions in Bayesian statistics
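– PDF: $f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1} (1 - x)^{\beta-1}$, 0 < x < 1
– MGF: $M_X(t) = 1 + \sum_{n=1}^\infty \left(\prod_{k=0}^{n-1} \frac{\alpha+k}{\alpha+\beta+k}\right) \frac{t^n}{n!}$
– EX = α/(α + β), $VX = \alpha\beta/[(\alpha+\beta)^2(\alpha+\beta+1)]$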
• Pareto Distribution X ∼ Pareto(α, β)
– similar to exponential, except with heavier tail
– PDF: $f(x) = \frac{\alpha}{\beta} (1 + x/\beta)^{-(\alpha+1)}$, x > 0
– $EX^k = \frac{\beta^k k!}{(\alpha-1)\cdots(\alpha-k)}$, k < α, in particular EX = β/(α − 1), α > 1 and $VX = \alpha\beta^2/[(\alpha-1)^2(\alpha-2)]$, α > 2
– X ∼ Pareto(α, β) =⇒ bX ∼ Pareto(α, bβ)
• Weibull Distribution X ∼ Weibull(α, β)
– X has the following interpretations:
∗ lifetime of an item whose instantaneous risk of failure is given by a power function, i.e. $\lambda_X(t) = kt^{\beta-1}$
∗ positive power of an exponential RV, in particular $\alpha Y^{1/\beta}$, where Y ∼ Exponential(1)
– PDF: $f(x) = \frac{\beta}{\alpha} (x/\alpha)^{\beta-1} \exp[-(x/\alpha)^\beta]$, x ≥ 0
– $EX^k = \alpha^k\,\Gamma(1 + k/\beta)$
– X ∼ Weibull(α, β) =⇒ aX ∼ Weibull(aα, β) and $X^r \sim \text{Weibull}(\alpha^r, \beta/r)$
• DeMoivre Distribution X ∼ DeMoivre(ω)
– X represents continuous quantities whose values we consider to be “equally likely” in the sense that all intervals
of the same length have equal probability, or lifetimes for which failures are uniformly distributed
– PDF: f(x) = 1/ω if 0 < x < ω, and f(x) = 0 otherwise
– MGF: $M_X(t) = (e^{t\omega} - 1)/(t\omega)$
– EX = ω/2 and $VX = \omega^2/12$
– X ∼ DeMoivre(ω) =⇒ aX ∼ DeMoivre(aω)
• Normal Distribution $X \sim \text{Normal}(\mu, \sigma^2)$
– X has the following interpretations:
∗ continuous analog of binomial RV with p = 1/2
∗ measurements of a continuous quantity in a scientific experiment
∗ limiting distribution of the standardized sum of IID RVs with finite variance (Central Limit Theorem)
– PDF: $\varphi(x) = (2\pi\sigma^2)^{-1/2} \exp[-(x - \mu)^2/(2\sigma^2)]$
– CDF: $\Phi(x) = \int_{-\infty}^x \varphi(t)\,dt$
– MGF: $M_X(t) = \exp(\mu t + \sigma^2 t^2/2)$
– EX = µ and $VX = \sigma^2$
– $X \sim \text{Normal}(\mu, \sigma^2) \implies aX + b \sim \text{Normal}(a\mu + b, a^2\sigma^2)$, in particular (X − µ)/σ ∼ Normal(0, 1)
– $X \sim \text{Normal}(\mu_1, \sigma_1^2)$ and $Y \sim \text{Normal}(\mu_2, \sigma_2^2)$ and X ⫫ Y =⇒ $X + Y \sim \text{Normal}(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$
• Student’s t-distribution $X \sim t_p$
– similar to Normal distribution, except with heavier tails (Normal corresponds to $t_p$ with p = ∞)
– PDF: $f(x) = \frac{\Gamma\left(\frac{p+1}{2}\right)}{\sqrt{p\pi}\,\Gamma(p/2)} (1 + x^2/p)^{-(p+1)/2}$
– EX = 0, p > 1 and VX = p/(p − 2), p > 2
• Log-normal Distribution $X \sim \text{Log-normal}(\mu, \sigma^2)$
– X represents the limiting distribution of the product of any collection of positive IID RVs
– PDF: $f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp[-(\log x - \mu)^2/(2\sigma^2)]$, x > 0
– $EX^k = \exp(\mu k + \sigma^2 k^2/2)$, in particular $EX = \exp(\mu + \sigma^2/2)$ and $VX = (\exp \sigma^2 - 1)\exp(2\mu + \sigma^2)$
– $X \sim \text{Log-normal}(\mu, \sigma^2) \implies aX^b \sim \text{Log-normal}(\log a + b\mu, b^2\sigma^2)$, where a > 0
– $X \sim \text{Log-normal}(\mu_1, \sigma_1^2)$ and $Y \sim \text{Log-normal}(\mu_2, \sigma_2^2)$ and X ⫫ Y =⇒ $XY \sim \text{Log-normal}(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$
• Chi-square $X \sim \chi_p^2$
– if $Z_1, \ldots, Z_p \overset{\text{IID}}{\sim} \text{Normal}(0, 1)$, then $\sum_{i=1}^p Z_i^2 \sim \chi_p^2$
– PDF: $f(x) = \frac{1}{\Gamma(p/2)\,2^{p/2}} x^{p/2-1} e^{-x/2}$, x > 0
– MGF: $M_X(t) = (1 - 2t)^{-p/2}$, t < 1/2
– EX = p and VX = 2p
– $X \sim \chi_p^2$ and $Y \sim \chi_q^2$ and X ⫫ Y =⇒ $X + Y \sim \chi_{p+q}^2$
1.5.3 Multivariate Distributions
• Multinomial X ∼ Multinomial(n, p), where X = (X1, . . . , Xk) and p = (p1, . . . , pk)
– X summarizes the results of a sequence of n identical random experiments with k outcomes each; specifically, if
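Y1, . . . , Yn are IID RVs with P(Yi = j) = pj, then $X_j = \sum_{i=1}^n I(Y_i = j) \sim \text{Binomial}(n, p_j)$
– $\sum_{i=1}^k X_i = n$ and $\sum_{i=1}^k p_i = 1$ and $\mathrm{Cov}(X_i, X_j) = -np_ip_j$ for i ≠ j
– PMF: $f(x) = \binom{n}{x_1 \ldots x_k} \prod_{i=1}^k p_i^{x_i}$, where $\binom{n}{x_1 \ldots x_k} = \frac{n!}{x_1! \cdots x_k!}$
– MGF: $M_X(t_1, \ldots, t_k) = \left(\sum_{i=1}^k p_i e^{t_i}\right)^n$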
• Multivariate Normal X ∼ Normal(µ, Σ)
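– PDF: $f(x) = \frac{1}{\sqrt{(2\pi)^k |\Sigma|}} \exp\left[-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right]$
– Theorem If Z ∼ Normal(0, I) and $X = \mu + \Sigma^{1/2} Z$, then X ∼ Normal(µ, Σ). Conversely, if X ∼ Normal(µ, Σ), then $\Sigma^{-1/2}(X - \mu) \sim \text{Normal}(0, I)$.
– Theorem Let X ∼ Normal(µ, Σ). Suppose we partition X = (Xa, Xb). We can partition µ = (µa, µb) as well as
$$\Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}$$
We have the following:
1. $X_a \sim \text{Normal}(\mu_a, \Sigma_{aa})$
2. $X_b \mid X_a = x_a \sim \text{Normal}\left(\mu_b + \Sigma_{ba}\Sigma_{aa}^{-1}(x_a - \mu_a),\ \Sigma_{bb} - \Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab}\right)$
3. $a^T X \sim \text{Normal}(a^T \mu, a^T \Sigma a)$
4. $(X - \mu)^T \Sigma^{-1} (X - \mu) \sim \chi_k^2$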
1.5.4 Overview
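Table 1: Overview of Important Distributions
| Distribution | f(x) | EX | VX | MX(t) |
|---|---|---|---|---|
| Bernoulli(p) | $p^x(1-p)^{1-x}$, x = 0, 1 | p | p(1 − p) | $pe^t + (1-p)$ |
| Binomial(n, p) | $\binom{n}{x} p^x(1-p)^{n-x}$, x = 0, 1, . . . , n | np | np(1 − p) | $[pe^t + (1-p)]^n$ |
| Poisson(λ) | $e^{-\lambda}\lambda^x/x!$, x = 0, 1, . . . | λ | λ | $\exp[\lambda(e^t - 1)]$ |
| Geometric(p) | $p(1-p)^x$, x = 0, 1, . . . | (1 − p)/p | $(1-p)/p^2$ | $p/[1 - (1-p)e^t]$, $t < \log\frac{1}{1-p}$ |
| Exponential(λ) | $\lambda e^{-\lambda x}$, x ≥ 0 | 1/λ | $1/\lambda^2$ | λ/(λ − t), t < λ |
| Gamma(r, λ) | $\frac{\lambda^r}{\Gamma(r)} x^{r-1} e^{-\lambda x}$, x > 0 | r/λ | $r/\lambda^2$ | $[\lambda/(\lambda - t)]^r$, t < λ |
| Normal(µ, σ²) | $(2\pi\sigma^2)^{-1/2} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$ | µ | σ² | $\exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)$ |
| $t_p$ | $\frac{\Gamma\left(\frac{p+1}{2}\right)}{\sqrt{p\pi}\,\Gamma(p/2)} (1 + x^2/p)^{-(p+1)/2}$ | 0 | p/(p − 2) | |
| $\chi_p^2$ | $\frac{1}{\Gamma(p/2)\,2^{p/2}} x^{p/2-1} e^{-x/2}$, x > 0 | p | 2p | $(1 - 2t)^{-p/2}$, t < 1/2 |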
[Figure: PMFs of the Binomial, Geometric, and Poisson distributions and PDFs of the Exponential, Gamma, Normal, Log-normal, and χ² distributions, each plotted against x for several parameter values.]
1.6 Inequalities
• Markov’s Inequality Let X be a non-negative RV such that EX exists. Then for any t > 0, P(X > t) ≤ EX/t.
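• Chebyshev’s Inequality $P(|X - \mu| \ge t) \le \sigma^2/t^2$, in particular $P(|Z| \ge t) \le 1/t^2$, where Z = (X − µ)/σ
• Hoeffding’s Inequality
– Let $X_1, \ldots, X_n \overset{\text{IID}}{\sim} \text{Bernoulli}(p)$. Then for any ε > 0, $P(|\bar{X}_n - p| > \varepsilon) \le 2\exp(-2n\varepsilon^2)$.
– $\varepsilon_n = \sqrt{\log(2/\alpha)/(2n)} \implies P(|\bar{X}_n - p| > \varepsilon_n) \le \alpha$
• Mill’s Inequality $P(|Z| > t) \le \sqrt{2/\pi}\,\exp(-t^2/2)/t$, where Z ∼ Normal(0, 1)
• Cauchy-Schwarz Inequality If X and Y both have finite variance, then $E|XY| \le \sqrt{EX^2\,EY^2}$.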
• Jensen’s Inequality Let Y = g(X). If g is convex, then Eg(X) ≥ g(EX). If g is concave, then Eg(X) ≤ g(EX).
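As a small illustration of Hoeffding's Inequality, the sketch below computes the bound 2 exp(−2nε²) and the half-width ε_n that makes the bound equal to a chosen α. The function names and the sample size are assumptions made for this example.

```python
import math


def hoeffding_bound(n, eps):
    """Upper bound on P(|sample mean - p| > eps) for n IID Bernoulli(p) draws."""
    return min(1.0, 2 * math.exp(-2 * n * eps ** 2))


def hoeffding_eps(n, alpha):
    """Half-width eps_n for which the Hoeffding bound equals alpha."""
    return math.sqrt(math.log(2 / alpha) / (2 * n))


n, alpha = 100, 0.05          # illustrative sample size and error level
eps_n = hoeffding_eps(n, alpha)
print(eps_n, hoeffding_bound(n, eps_n))
```

The interval (sample mean − ε_n, sample mean + ε_n) then covers p with probability at least 1 − α, regardless of the true value of p.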
2 Financial Mathematics
• Simple interest is more beneficial to the lender for fractions of a conversion period, i.e. for 0 < t < 1.
Proof. Let i and t both belong to (0, 1). Then
$$(1 + i)^t = \sum_{k=0}^\infty a_k = 1 + it + \sum_{k=1}^\infty b_k$$
where
$$a_k = \frac{t(t - 1)\cdots(t - k + 1)}{k!}\,i^k \quad\text{and}\quad b_k = a_{2k} + a_{2k+1}.$$
Note that $t\,i^{2k}/(2k)!$ is positive, while the product $(t - 1)\cdots(t - 2k + 1)$ has an odd number of negative factors and is therefore negative, so $a_{2k} < 0$. Note also that $t\,i^{2k+1}/(2k + 1)!$ is positive, and so is $(t - 1)\cdots(t - 2k)$, which has an even number of negative factors; hence $a_{2k+1} > 0$, and therefore
$$a_{2k+1} = |a_{2k}| \cdot \left|\frac{t - 2k}{2k + 1}\,i\right| = |a_{2k}| \cdot \frac{2k - t}{2k + 1} \cdot |i| < |a_{2k}|.$$
Finally,
$$b_k = -|a_{2k}| + a_{2k+1} < 0,$$
so we have
$$(1 + i)^t = 1 + it + \sum_{k=1}^\infty b_k < 1 + it,$$
which completes the proof.
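The inequality (1 + i)^t < 1 + it for 0 < t < 1 can also be spot-checked numerically. This is only a sanity check on a grid of assumed rates and fractions, not a substitute for the proof; the function names are hypothetical.

```python
def simple_growth(i, t):
    """Accumulated value of 1 after time t under simple interest."""
    return 1 + i * t


def compound_growth(i, t):
    """Accumulated value of 1 after time t under compound interest."""
    return (1 + i) ** t


rates = [0.01, 0.05, 0.10, 0.25]          # illustrative rates in (0, 1)
fractions = [0.1, 0.25, 0.5, 0.75, 0.9]   # fractions of a conversion period

# Simple interest accumulates more for every fractional period on the grid
simple_wins = all(
    compound_growth(i, t) < simple_growth(i, t)
    for i in rates
    for t in fractions
)
print(simple_wins)
```

For t > 1 the ordering reverses, which is why compound interest favors the lender over whole multiples of the conversion period.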
• Force of Interest $\delta_t = \frac{d}{dt} \log A(t) \implies A(t) = A(0) \exp\left(\int_0^t \delta_s\,ds\right)$
• Determination of Time (See Table 2)
– Exact Interest “actual/actual”
– Ordinary Interest “30/360” (∆t = 360∆y + 30∆m + ∆d)
– Banker’s Rule “actual/360” (always more beneficial to lender)
• Rule of 72: the amount of time it takes an investment to double at a given rate of interest is approximately 0.72/i.
(Most accurate for rates between 0.04 and 0.1.)
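• $a_{\overline{n}|i} = \sum_{t=1}^n \nu^t = \frac{1 - (1+i)^{-n}}{i}$, where ν = 1/(1 + i)
• $s_{\overline{n}|i} = a_{\overline{n}|i}(1 + i)^n = \frac{(1+i)^n - 1}{i}$
• $\ddot{a}_{\overline{n}|} = \sum_{t=0}^{n-1} \nu^t = (1 + i)\,a_{\overline{n}|}$
• $\ddot{s}_{\overline{n}|} = (1 + i)\,s_{\overline{n}|}$
• $\frac{1}{a_{\overline{n}|}} = \frac{1}{s_{\overline{n}|}} + i$ (relevant to sinking funds)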
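The annuity identities above translate directly into code. The sketch below uses hypothetical function names and an assumed term and rate; it compares the closed forms with a direct summation of discounted payments.

```python
def annuity_immediate_pv(n, i):
    """Present value of 1 paid at the end of each of n periods."""
    return (1 - (1 + i) ** -n) / i


def annuity_immediate_av(n, i):
    """Accumulated value at time n of 1 paid at the end of each period."""
    return ((1 + i) ** n - 1) / i


def annuity_due_pv(n, i):
    """Present value of 1 paid at the start of each of n periods."""
    return (1 + i) * annuity_immediate_pv(n, i)


def annuity_due_av(n, i):
    """Accumulated value at time n of 1 paid at the start of each period."""
    return (1 + i) * annuity_immediate_av(n, i)


n, i = 10, 0.05                  # illustrative term and rate
v = 1 / (1 + i)                  # discount factor
direct_pv = sum(v ** t for t in range(1, n + 1))
print(annuity_immediate_pv(n, i), direct_pv)
```

The sinking-fund identity 1/a = 1/s + i says that a level loan payment splits into a sinking-fund deposit of 1/s plus interest of i on the original loan.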
• Let $a_{\overline{n}|i} = g$. Then
$$i \approx \frac{2(n - g)}{g(n + 1)}$$
If instead $s_{\overline{n}|i} = g$, then
$$i \approx \frac{2(g - n)}{g(n - 1)}$$
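• Increasing Annuities
1. arithmetically increasing annuities: payments of P, P + Q, P + 2Q, . . . , P + (n − 2)Q, P + (n − 1)Q at times 1, 2, 3, . . . , n − 1, n
– present value at time 0: $P a_{\overline{n}|} + Q\,\frac{a_{\overline{n}|} - n\nu^n}{i}$
– accumulated value at time n: $P s_{\overline{n}|} + Q\,\frac{s_{\overline{n}|} - n}{i}$
(a) $(Ia)_{\overline{n}|} = \frac{\ddot{a}_{\overline{n}|} - n\nu^n}{i}$ and $(Is)_{\overline{n}|} = \frac{\ddot{s}_{\overline{n}|} - n}{i}$
(b) $(Da)_{\overline{n}|} = \frac{n - a_{\overline{n}|}}{i}$ and $(Ds)_{\overline{n}|} = \frac{n(1+i)^n - s_{\overline{n}|}}{i}$
2. geometrically increasing annuities: payments of P, P(1 + r), $P(1 + r)^2$, . . . , $P(1 + r)^{n-2}$, $P(1 + r)^{n-1}$ at times 1, 2, 3, . . . , n − 1, n
– present value at time 0: $P\,\frac{1 - \left(\frac{1+r}{1+i}\right)^n}{i - r}$
– accumulated value at time n: $P\,\frac{(1+i)^n - (1+r)^n}{i - r}$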
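The present-value formulas for increasing annuities can be checked against a direct sum of discounted payments. In this sketch the payments are assumed to fall at times 1, . . . , n; the function names and the numerical inputs are hypothetical.

```python
def pv_arithmetic(P, Q, n, i):
    """PV of payments P, P + Q, ..., P + (n - 1)Q at times 1, ..., n."""
    v = 1 / (1 + i)
    a_n = (1 - v ** n) / i
    return P * a_n + Q * (a_n - n * v ** n) / i


def pv_geometric(P, r, n, i):
    """PV of payments P, P(1 + r), ..., P(1 + r)^(n - 1) at times 1, ..., n (i != r)."""
    return P * (1 - ((1 + r) / (1 + i)) ** n) / (i - r)


P, Q, r, n, i = 100.0, 10.0, 0.03, 15, 0.06   # illustrative inputs
v = 1 / (1 + i)

# Direct summation of each discounted payment
direct_arithmetic = sum((P + (t - 1) * Q) * v ** t for t in range(1, n + 1))
direct_geometric = sum(P * (1 + r) ** (t - 1) * v ** t for t in range(1, n + 1))
print(pv_arithmetic(P, Q, n, i), direct_arithmetic)
print(pv_geometric(P, r, n, i), direct_geometric)
```

Multiplying either present value by (1 + i)^n gives the corresponding accumulated value at time n.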
• See Table 3
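Table 3: Amortization Schedule for a Loan of $a_{\overline{n}|}$ Repaid Over n Periods
| Period | Pmt amount | Interest paid | Principal repaid | Balance |
|---|---|---|---|---|
| 0 | | | | $a_{\overline{n}|}$ |
| 1 | 1 | $1 - \nu^n$ | $\nu^n$ | $a_{\overline{n-1}|}$ |
| 2 | 1 | $1 - \nu^{n-1}$ | $\nu^{n-1}$ | $a_{\overline{n-2}|}$ |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| t | 1 | $1 - \nu^{n-t+1}$ | $\nu^{n-t+1}$ | $a_{\overline{n-t}|}$ |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| n − 1 | 1 | $1 - \nu^2$ | $\nu^2$ | $a_{\overline{1}|}$ |
| n | 1 | $1 - \nu$ | $\nu$ | 0 |
| Total | n | $n - a_{\overline{n}|}$ | $a_{\overline{n}|}$ | |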
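The amortization schedule can be generated period by period and compared with the closed-form entries of the table. This is a minimal sketch for level payments of 1; the function name and the chosen term and rate are assumptions.

```python
def amortization_schedule(n, i):
    """Rows (period, payment, interest, principal, balance) for a loan of a_n."""
    v = 1 / (1 + i)
    balance = (1 - v ** n) / i           # initial loan amount a_n
    rows = []
    for t in range(1, n + 1):
        interest = i * balance           # interest on the outstanding balance
        principal = 1 - interest         # remainder of the level payment of 1
        balance -= principal
        rows.append((t, 1.0, interest, principal, balance))
    return rows


n, i = 5, 0.06                           # illustrative term and rate
v = 1 / (1 + i)
rows = amortization_schedule(n, i)
for row in rows:
    print(row)
```

Each computed interest entry matches 1 − ν^(n−t+1) and each principal entry matches ν^(n−t+1), so the principal repaid grows geometrically by a factor of 1 + i per period.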
• See Table 4
Table 4: Bond Terminology
| Symbol | Meaning |
|---|---|
| P | price of bond |
| F | par value |
| C | redemption value |
| r | coupon rate |
| Fr | amount of coupon |
| g | modified coupon rate, i.e. rate such that Fr = Cg |
| i | yield rate, often called yield to maturity (YTM) |
| n | no. of coupon payment periods remaining |
| K | present value, computed at yield rate, of redemption value |
| G | base amount of bond, i.e. amount such that Gi = Fr |
• It is customary to use semiannual compounding so that an 8% bond has r = 0.08/2 = 0.04.
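• $P = \underbrace{Fr\,a_{\overline{n}|} + C\nu^n}_{\text{basic}} = \underbrace{C + (Fr - Ci)\,a_{\overline{n}|}}_{\text{premium/discount}} = \underbrace{G + (C - G)\nu^n}_{\text{base amount}} = \underbrace{K + \frac{g}{i}(C - K)}_{\text{Makeham}}$
• Premium = $P - C = C(g - i)\,a_{\overline{n}|}$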
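The four bond price formulas should agree for any consistent set of inputs. The sketch below evaluates each one for a hypothetical bond; the function name and the numerical values are assumptions made for illustration.

```python
def bond_prices(F, C, r, i, n):
    """Return the bond price computed by the four equivalent formulas."""
    v = 1 / (1 + i)
    a_n = (1 - v ** n) / i
    g = F * r / C          # modified coupon rate, Fr = Cg
    G = F * r / i          # base amount, Gi = Fr
    K = C * v ** n         # PV of redemption value
    return {
        "basic": F * r * a_n + C * v ** n,
        "premium/discount": C + (F * r - C * i) * a_n,
        "base amount": G + (C - G) * v ** n,
        "Makeham": K + (g / i) * (C - K),
    }


# Hypothetical bond: par 1000 redeemed at par, 4% coupon per period,
# priced to yield 3% per period over 20 periods
prices = bond_prices(F=1000.0, C=1000.0, r=0.04, i=0.03, n=20)
for name, value in prices.items():
    print(name, round(value, 4))
```

Because the coupon rate exceeds the yield rate in this example, the bond sells at a premium, i.e. P > C.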
• See Table 5
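Table 5: Bond Amortization Schedule for a $1 n-period Bond with Coupons at g Bought to Yield i.
| Period | Coupon | Interest Earned | Principal Adjustment | Book Value |
|---|---|---|---|---|
| 0 | | | | $1 + (g - i)\,a_{\overline{n}|}$ |
| 1 | g | $i[1 + (g - i)\,a_{\overline{n}|}]$ | $(g - i)\nu^n$ | $1 + (g - i)\,a_{\overline{n-1}|}$ |
| 2 | g | $i[1 + (g - i)\,a_{\overline{n-1}|}]$ | $(g - i)\nu^{n-1}$ | $1 + (g - i)\,a_{\overline{n-2}|}$ |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| t | g | $i[1 + (g - i)\,a_{\overline{n-t+1}|}]$ | $(g - i)\nu^{n-t+1}$ | $1 + (g - i)\,a_{\overline{n-t}|}$ |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| n − 1 | g | $i[1 + (g - i)\,a_{\overline{2}|}]$ | $(g - i)\nu^2$ | $1 + (g - i)\,a_{\overline{1}|}$ |
| n | g | $i[1 + (g - i)\,a_{\overline{1}|}]$ | $(g - i)\nu$ | 1 |
| Total | ng | ng − p | $p = (g - i)\,a_{\overline{n}|}$ | |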
• See Table 6
Table 6: Valuation Between Coupon Payment Dates. $B^f_{t+k} = B^m_{t+k} + Fr_k$
| Method | Flat Price $B^f_{t+k}$ | Accrued Coupon $Fr_k$ | Market Price $B^m_{t+k}$ |
|---|---|---|---|
| Theoretical | $B_t(1 + i)^k$ | $Fr\,\frac{(1+i)^k - 1}{i}$ | $B_t(1 + i)^k - Fr\,\frac{(1+i)^k - 1}{i}$ |
| Practical | $B_t(1 + ik)$ | $kFr$ | $B_t(1 + ik) - kFr$ |
| Semi-theoretical | $B_t(1 + i)^k$ | $kFr$ | $B_t(1 + i)^k - kFr$ |
• Approximate YTM: $i \approx \frac{g - k/n}{1 + \frac{n+1}{2n}k} \approx \underbrace{\frac{g - k/n}{1 + k/2}}_{\text{Bond Salesman's Formula}}$, where $k = \frac{P - C}{C}$
• Callable Bonds
1. If i < g, i.e. if the bond sells at a premium, then assume the redemption date will be the earliest possible date.
2. If i > g, i.e. if the bond sells at a discount, then assume the redemption date will be the latest possible date.
• If 1. i > −1 exists such that NPV = 0, and 2. for such i, Bt > 0 for t = 0, 1, . . . , n − 1, then i is unique. (Bt denotes
the outstanding investment balance at time t.)
• See Figure 1
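(a) An investment of 1 for n periods at rate i. The interest is reinvested at rate j. Interest payments of i are received at times 1, 2, . . . , n, and the accumulated value at time n is $1 + i\,s_{\overline{n}|j}$.
(b) An investment of 1 at the end of each period for n periods, at rate i. The interest is reinvested at rate j. Interest payments of i, 2i, . . . , (n − 2)i, (n − 1)i are received at times 2, 3, . . . , n − 1, n, and the accumulated value at time n is $n + i\,(Is)_{\overline{n-1}|j}$.
Figure 1: Examples Involving Reinvestment Rates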
• See Table 7
Table 7: Interest Measurement Terminology
| Symbol | Meaning |
|---|---|
| A | beginning balance |
| B | ending balance |
| I | amount of interest earned during the period |
| $C_t$ | net amount of principal contributed at time t (0 ≤ t ≤ 1) |
| C | total amount of principal contributed during the period, i.e. $C = \sum_t C_t$ |
| ${}_a i_b$ | amount of interest earned by 1 invested at time b over the following period of length a, where a + b ≤ 1 |
• B = A + C + I
• $i^{DW} \approx \frac{I}{A + \sum_t C_t(1 - t)} \approx \frac{2I}{A + B - I}$, the latter assuming that on average net principal contributions occur at time t = 1/2
• $i^{TW} = \prod_{k=1}^m (1 + j_k) - 1$, where $1 + j_k = \frac{B_k}{B_{k-1} + C_{k-1}}$
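The dollar-weighted and time-weighted formulas can give different answers for the same fund. The sketch below uses a hypothetical fund with one mid-year contribution; the function names, the argument layout, and all balances are assumptions for illustration.

```python
def dollar_weighted(A, B, contributions):
    """Approximate dollar-weighted yield; contributions is a list of (t, C_t)."""
    C = sum(c for _, c in contributions)
    I = B - A - C                                    # from B = A + C + I
    exposure = A + sum(c * (1 - t) for t, c in contributions)
    return I / exposure


def time_weighted(balances, contributions):
    """Time-weighted yield.

    balances = [B_0, ..., B_m] are fund values just before each contribution;
    contributions = [C_0, ..., C_(m-1)] are made immediately after B_k is measured.
    """
    growth = 1.0
    for k in range(1, len(balances)):
        growth *= balances[k] / (balances[k - 1] + contributions[k - 1])
    return growth - 1


# Hypothetical fund: starts at 100, is worth 110 at t = 0.5 just before
# a contribution of 50, and ends the year at 170
i_dw = dollar_weighted(A=100.0, B=170.0, contributions=[(0.5, 50.0)])
i_tw = time_weighted(balances=[100.0, 110.0, 170.0], contributions=[0.0, 50.0])
print(i_dw, i_tw)
```

Here the time-weighted yield exceeds the dollar-weighted yield because the contribution arrived before the weaker second half of the year, which the dollar-weighted measure penalizes.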
• See Table 8
• $1 + i' = \frac{1+i}{1+r} \implies i' = \frac{i - r}{1 + r} \approx i - r$, where r denotes inflation, and i′ is called the real rate of interest
• PV of ordinary annuity for which payments are indexed to reflect inflation: $R(1 + r)\,\frac{1 - \left(\frac{1+r}{1+i}\right)^n}{i - r} = R\,a_{\overline{n}|i'}$
• Normal Yield Curve (Increasing)
– Expectations Theory
– Liquidity Preference Theory
– Inflation Premium Theory
• Inverted Yield Curve (Decreasing): Fed may set high short-term rates in order to fight inflation or to remove excess
liquidity from the economy. Long-term rates may be lower due to expectations of inflation or the possibility of a
recession.
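• $(1 + s_n)^n = (1 + s_{n-1})^{n-1}(1 + f_{n-1}) = \prod_{k=0}^{n-1} (1 + f_k)$, where $s_n$ is the n-period spot rate and $f_k$ is the one-period forward rate from time k to time k + 1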
Table 8: Illustration of Investment Year Method.
| Calendar year of original investment y | $i_1^y$ | $i_2^y$ | $i_3^y$ | $i_4^y$ | $i_5^y$ | Portfolio rate $i^{y+5}$ | Calendar year of portfolio rate y + 5 |
|---|---|---|---|---|---|---|---|
| z | 8.00 | 8.10 | 8.10 | 8.25 | 8.30 | 8.10 | z + 5 |
| z + 1 | 8.25 | 8.25 | 8.40 | 8.50 | 8.50 | 8.35 | z + 6 |
| z + 2 | 8.50 | 8.70 | 8.75 | 8.90 | 9.00 | 8.60 | z + 7 |
| z + 3 | 9.00 | 9.00 | 9.10 | 9.10 | 9.20 | 8.85 | z + 8 |
| z + 4 | 9.00 | 9.10 | 9.20 | 9.30 | 9.40 | 9.10 | z + 9 |
| z + 5 | 9.25 | 9.35 | 9.50 | 9.55 | 9.60 | 9.35 | z + 10 |
| z + 6 | 9.50 | 9.50 | 9.60 | 9.70 | 9.70 | | |
| z + 7 | 10.00 | 10.00 | 9.90 | 9.80 | | | |
| z + 8 | 10.00 | 9.80 | 9.70 | | | | |
| z + 9 | 9.50 | 9.50 | | | | | |
| z + 10 | 9.00 | | | | | | |
• Method of Equated Time $\bar{t} = \sum_{t=1}^n tR_t \Big/ \sum_{t=1}^n R_t$
• Macaulay Duration $\bar{d} = \sum_{t=1}^n t\nu^t R_t \Big/ \sum_{t=1}^n \nu^t R_t$
– i = 0 =⇒ ¯d = ¯t
– ∂ ¯d/∂i < 0
– If there is only one future cash flow, then ¯d is the time at which it occurs.
– See Figure 2
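– Duration of Level Annuity: $\bar{d} = \frac{R\,(Ia)_{\overline{n}|}}{R\,a_{\overline{n}|}} = \frac{(Ia)_{\overline{n}|}}{a_{\overline{n}|}}$
– Duration of a Coupon Bond: $\bar{d} = \frac{Fr\,(Ia)_{\overline{n}|} + nC\nu^n}{Fr\,a_{\overline{n}|} + C\nu^n}$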
Figure 2: Duration exhibits discontinuities on payment dates.
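• Volatility (Modified Duration) $\bar{v} = -P'(i)/P(i) = \bar{d}/(1 + i)$, where $P(i) = \sum_{t=1}^n (1 + i)^{-t} R_t$.
• continuous compounding =⇒ $\bar{v} = \bar{d}$
• Convexity $\bar{c} = P''(i)/P(i)$
• $P(i + h) \approx P(i)\left[1 - h\bar{v} + \frac{h^2}{2}\bar{c}\right]$
• $\frac{d\bar{v}}{di} = \bar{v}^2 - \bar{c}$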
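Duration and convexity are easiest to appreciate by comparing the first- and second-order price approximations with an exact repricing. The following sketch assumes a hypothetical 10-period bond with coupons of 5 and redemption of 100; the function names are illustrative.

```python
def price(cash_flows, i):
    """P(i) = sum of (1 + i)^(-t) R_t over (t, R_t) pairs."""
    return sum(R * (1 + i) ** -t for t, R in cash_flows)


def macaulay_duration(cash_flows, i):
    weighted = sum(t * R * (1 + i) ** -t for t, R in cash_flows)
    return weighted / price(cash_flows, i)


def modified_duration(cash_flows, i):
    """Volatility: -P'(i) / P(i) = Macaulay duration / (1 + i)."""
    return macaulay_duration(cash_flows, i) / (1 + i)


def convexity(cash_flows, i):
    """P''(i) / P(i), using P''(i) = sum of t (t + 1) (1 + i)^(-t - 2) R_t."""
    second = sum(t * (t + 1) * R * (1 + i) ** -(t + 2) for t, R in cash_flows)
    return second / price(cash_flows, i)


# Hypothetical bond: coupons of 5 for 10 periods, redemption of 100 at time 10
cash_flows = [(t, 5.0) for t in range(1, 10)] + [(10, 105.0)]
i, h = 0.04, 0.01
p0 = price(cash_flows, i)
v_bar = modified_duration(cash_flows, i)
c_bar = convexity(cash_flows, i)

exact = price(cash_flows, i + h)
first_order = p0 * (1 - h * v_bar)
second_order = p0 * (1 - h * v_bar + h ** 2 / 2 * c_bar)
print(exact, first_order, second_order)
```

The duration-only estimate understates the new price, and adding the convexity term closes most of the gap, which is why convexity matters for larger yield shifts.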
• Interest Sensitive Cash Flows
– Assume the following quantities are known:
P(i) = current price at yield rate i
P(i + h) = price if yield rate increases by h
P(i − h) = price if yield rate decreases by h
– Effective Duration $\bar{d}_e \approx \frac{P(i - h) - P(i + h)}{2hP(i)}$
– Effective Convexity $\bar{c}_e \approx \frac{P(i - h) - 2P(i) + P(i + h)}{h^2 P(i)}$
– $P(i \pm h) \approx P(i)\left[1 \mp h\bar{d}_e + \frac{h^2}{2}\bar{c}_e\right]$
• For a portfolio consisting of m securities:
– $P = \sum_{k=1}^m P_k$
– $\bar{v} = \sum_{k=1}^m \frac{P_k}{P}\,\bar{v}_k$
– $\bar{c} = \sum_{k=1}^m \frac{P_k}{P}\,\bar{c}_k$
• Redington Immunization
– yield curve assumed to be flat
– Rt = At − Lt for t = 1, 2, . . . , n
– P(i) = 0
– $P(i + h) = P(i) + hP'(i) + \frac{h^2}{2}P''(i + \xi)$, where 0 < |ξ| < |h|
– Choose asset portfolio such that $P'(i) = 0$ and $P''(i) > 0$, i.e. such that
◦ PV of assets equals PV of liabilities
◦ $\bar{v}_A = \bar{v}_L$
◦ $\bar{c}_A > \bar{c}_L$
– The value of the resulting portfolio increases under small changes in the interest rate.
• Full Immunization
– use force of interest δ equivalent to i
– liability Lk at time k
– hold two assets providing cash inflows of A at time k − a and B at time k + b
– solve the following equations simultaneously:
$$P(\delta) = Ae^{a\delta} + Be^{-b\delta} - L_k = 0$$
$$P'(\delta) = Aae^{a\delta} - Bbe^{-b\delta} = 0$$
– If the two known quantities are: (1) a, b; (2) B, b; (3) A, a; or (4) A, b; then a unique solution exists. However, for the cases (5) a, B and (6) A, B, a unique solution is not guaranteed: there may be several solutions or none.
– repeat the process for each liability Lk
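For case (1), where a and b are known, the two immunization equations are linear in A and B and can be solved in closed form. The sketch below does this for a single hypothetical liability; the function names and the inputs are assumptions chosen for illustration.

```python
import math


def full_immunization(L, a, b, delta):
    """Solve P(delta) = 0 and P'(delta) = 0 for the asset cash flows A and B."""
    # Let x = A e^(a delta) and y = B e^(-b delta); then x + y = L and a x = b y
    x = L * b / (a + b)
    y = L * a / (a + b)
    return x * math.exp(-a * delta), y * math.exp(b * delta)


def surplus(A, B, L, a, b, delta):
    """P(delta) = A e^(a delta) + B e^(-b delta) - L."""
    return A * math.exp(a * delta) + B * math.exp(-b * delta) - L


# Hypothetical liability of 1000, assets 2 periods before and 3 periods after,
# force of interest equivalent to i = 5%
L, a, b = 1000.0, 2.0, 3.0
delta = math.log(1.05)
A, B = full_immunization(L, a, b, delta)
print(A, B)
print(surplus(A, B, L, a, b, delta + 0.02), surplus(A, B, L, a, b, delta - 0.02))
```

The surplus is zero at the current force of interest and positive after a shift in either direction, which is the defining feature of full immunization.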