Deriving the solution to Merton's Portfolio Problem (Optimal Asset Allocation and Consumption) using the elegant formulation of the Hamilton-Jacobi-Bellman equation.
Ashwin Rao, Vice President, Data Science & Optimization at Target
1. HJB Equation and Merton’s Portfolio Problem
Ashwin Rao
September 20, 2018
Ashwin Rao HJB and Merton Portfolio September 20, 2018 1 / 14
2. Overview
1 Problem Statement
2 HJB Equation as Optimal Discounted Value Function PDE
3 Reducing the PDE to an ODE
4 Optimal Consumption and Allocation
3. Informal Problem Statement
You will live for (deterministic) T more years
Current Wealth + PV of Future Income (less Debt) is W0 > 0.
You can invest in (allocate to) n risky assets and a riskless asset
Each asset has a known normal distribution of returns
Allowed to long or short any fractional quantities of assets
Trading in continuous time 0 ≤ t < T, with no transaction costs
You can consume any fractional amount of wealth at any time
Dynamic Decision: Optimal Allocation and Consumption at each time
To maximize lifetime-aggregated utility of consumption
Consumption Utility assumed to have constant Relative Risk-Aversion
4. Problem Notation
For simplicity, we state and solve the problem for 1 risky asset, but the
solution generalizes easily to n risky assets.
Riskless asset: $dR_t = r \cdot R_t \cdot dt$
Risky asset: $dS_t = \mu \cdot S_t \cdot dt + \sigma \cdot S_t \cdot dz_t$ (i.e., Geometric Brownian Motion)
$\mu > r > 0$, $\sigma > 0$ (for n assets, we work with a covariance matrix)
Wealth at time t denoted by $W_t > 0$
Fraction of wealth allocated to risky asset denoted by $\pi(t, W_t)$
Fraction of wealth in riskless asset will then be $1 - \pi(t, W_t)$
Wealth consumption denoted by $c(t, W_t) \geq 0$
Utility of Consumption function $U(x) = \frac{x^{1-\gamma}}{1-\gamma}$ for $0 < \gamma \neq 1$
Utility of Consumption function $U(x) = \log(x)$ for $\gamma = 1$
$\gamma$ = (constant) Relative Risk-Aversion $= \frac{-x \cdot U''(x)}{U'(x)}$
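A minimal Python sketch of the CRRA utility above, using only the standard library. The helper `relative_risk_aversion` (a hypothetical name, not from the slides) estimates $\frac{-x \cdot U''(x)}{U'(x)}$ by central finite differences; it should recover $\gamma$ at any $x > 0$, including the $\log$ case $\gamma = 1$.

```python
import math


def crra_utility(x: float, gamma: float) -> float:
    """U(x) = x^(1-gamma) / (1-gamma) for gamma != 1, and log(x) for gamma == 1."""
    if x <= 0:
        raise ValueError("consumption must be positive")
    if gamma == 1.0:
        return math.log(x)
    return x ** (1.0 - gamma) / (1.0 - gamma)


def relative_risk_aversion(u, x: float, h: float = 1e-4) -> float:
    """Estimate -x * U''(x) / U'(x) with central finite differences."""
    u_prime = (u(x + h) - u(x - h)) / (2 * h)
    u_second = (u(x + h) - 2 * u(x) + u(x - h)) / (h * h)
    return -x * u_second / u_prime


for gamma in (0.5, 1.0, 2.0, 5.0):
    # Bind gamma as a default argument so each lambda keeps its own value.
    rra = relative_risk_aversion(lambda x, g=gamma: crra_utility(x, g), x=3.0)
    print(f"gamma = {gamma}: estimated relative risk-aversion = {rra:.6f}")
```

The estimate does not depend on the evaluation point $x$, which is exactly the "constant" in constant Relative Risk-Aversion.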
5. Problem Statement
We write $\pi_t, c_t$ instead of $\pi(t, W_t), c(t, W_t)$ to lighten notation
Balance constraint implies the following process for Wealth $W_t$
$$dW_t = ((\pi_t \cdot (\mu - r) + r) \cdot W_t - c_t) \cdot dt + \pi_t \cdot \sigma \cdot W_t \cdot dz_t$$
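At any time t, determine optimal $[\pi(t, W_t), c(t, W_t)]$ to maximize:
$$\mathbb{E}\left[\int_t^T \frac{e^{-\rho(s-t)} \cdot c_s^{1-\gamma}}{1-\gamma} \cdot ds + \frac{e^{-\rho(T-t)} \cdot B(T) \cdot W_T^{1-\gamma}}{1-\gamma} \;\middle|\; W_t\right]$$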
where $\rho \geq 0$ is the utility discount rate and $B(T)$ is the bequest function
We can solve this problem for arbitrary bequest $B(T)$ but, for
simplicity, will consider $B(T) = \epsilon^\gamma$ where $0 < \epsilon \ll 1$, meaning "no
bequest" (we need this $\epsilon$-formulation for technical reasons).
We will solve this problem for $\gamma \neq 1$ ($\gamma = 1$ is easier, hence omitted)
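To make the wealth process concrete, here is a hedged Python sketch that simulates $W_t$ for a fixed (not necessarily optimal) policy: a constant allocation $\pi$ and consumption proportional to wealth, $c_t = k(t) \cdot W_t$. The proportional-consumption rule, the name `simulate_wealth` and all parameter values are assumptions for illustration. With proportional consumption, Itô's Lemma gives $d(\log W_t) = (\pi(\mu - r) + r - k(t) - \frac{\pi^2\sigma^2}{2}) \cdot dt + \pi \sigma \cdot dz_t$, so stepping in log space keeps $W_t > 0$ and is exact when $k$ is constant.

```python
import math
import random


def simulate_wealth(w0, mu, r, sigma, pi, k, T, n_steps, rng):
    """Simulate one path of W_t under a constant allocation pi and the
    (assumed) proportional consumption rule c_t = k(t) * W_t.

    Steps log-wealth: d(log W) = (pi(mu - r) + r - k(t) - (pi sigma)^2 / 2) dt
                                 + pi sigma dz
    """
    dt = T / n_steps
    log_w = math.log(w0)
    path = [w0]
    for i in range(n_steps):
        drift = pi * (mu - r) + r - k(i * dt) - 0.5 * (pi * sigma) ** 2
        dz = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        log_w += drift * dt + pi * sigma * dz
        path.append(math.exp(log_w))
    return path


rng = random.Random(42)
path = simulate_wealth(w0=100.0, mu=0.08, r=0.03, sigma=0.2, pi=0.6,
                       k=lambda t: 0.04, T=10.0, n_steps=1000, rng=rng)
print(f"simulated W_T = {path[-1]:.2f}")
```

Setting `sigma=0.0` removes the randomness, and the path then grows deterministically at the rate $\pi(\mu - r) + r - k$.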
6. Continuous-Time Stochastic Control
Think of this as a continuous-time Stochastic Control problem
The State is $(t, W_t)$
The Action is $[\pi_t, c_t]$
The Reward per unit time is $U(c_t)$
The Return is the usual accumulated discounted Reward
Find Policy: $(t, W_t) \rightarrow [\pi_t, c_t]$ that maximizes the Expected Return
Note: $c_t \geq 0$, but $\pi_t$ is unconstrained
7. Optimal Discounted Value Function
Instead of the usual Value Function (Expected Return from a given
State), we consider the Discounted Value Function
Discounted Value Function is simply the Value Function further
discounted to time 0
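We focus on the Optimal Discounted Value Function $V^*(t, W_t)$
$$V^*(t, W_t) = \max_{\pi, c} \mathbb{E}\left[\int_t^T \frac{e^{-\rho s} \cdot c_s^{1-\gamma}}{1-\gamma} \cdot ds + \frac{e^{-\rho T} \cdot \epsilon^\gamma \cdot W_T^{1-\gamma}}{1-\gamma}\right]$$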
$V^*(t, W_t)$ satisfies a simple recursive formulation for $0 \leq t < t_1 < T$.
$$V^*(t, W_t) = \max_{\pi, c} \mathbb{E}\left[V^*(t_1, W_{t_1}) + \int_t^{t_1} \frac{e^{-\rho s} \cdot c_s^{1-\gamma}}{1-\gamma} \cdot ds\right]$$
8. HJB Equation for Optimal Discounted Value Function
Rewriting in stochastic differential form, we have the HJB formulation
$$\max_{\pi_t, c_t} \mathbb{E}\left[dV^*(t, W_t) + \frac{e^{-\rho t} \cdot c_t^{1-\gamma}}{1-\gamma} \cdot dt\right] = 0$$
Use Itô's Lemma on $dV^*$, remove the $dz_t$ term since it is a martingale increment (zero expectation), and
divide throughout by $dt$ to produce the HJB Equation in PDE form:
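$$\max_{\pi_t, c_t} \left[\frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial W} \cdot ((\pi_t (\mu - r) + r) W_t - c_t) + \frac{\partial^2 V^*}{\partial W^2} \cdot \frac{\pi_t^2 \sigma^2 W_t^2}{2} + \frac{e^{-\rho t} \cdot c_t^{1-\gamma}}{1-\gamma}\right] = 0$$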
9. Optimal Allocation and Consumption
Find optimal $\pi_t^*, c_t^*$ by taking partial derivatives of the above HJB expression
with respect to $\pi_t$ and $c_t$, and equating them to 0 (first-order conditions).
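With respect to $\pi_t$:
$$(\mu - r) \cdot \frac{\partial V^*}{\partial W} + \frac{\partial^2 V^*}{\partial W^2} \cdot \pi_t \cdot \sigma^2 \cdot W_t = 0$$
$$\Rightarrow \pi_t^* = \frac{-\frac{\partial V^*}{\partial W} \cdot (\mu - r)}{\frac{\partial^2 V^*}{\partial W^2} \cdot \sigma^2 \cdot W_t}$$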
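With respect to $c_t$:
$$-\frac{\partial V^*}{\partial W} + e^{-\rho t} \cdot (c_t^*)^{-\gamma} = 0$$
$$\Rightarrow c_t^* = \left(\frac{\partial V^*}{\partial W} \cdot e^{\rho t}\right)^{-\frac{1}{\gamma}}$$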
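The first-order conditions can be sanity-checked numerically. The Python sketch below evaluates the part of the HJB bracket that depends on the controls at a single hypothetical point, with made-up values for $\frac{\partial V^*}{\partial W} > 0$ and $\frac{\partial^2 V^*}{\partial W^2} < 0$, and compares the closed-form $\pi_t^*, c_t^*$ with a brute-force grid search. Function names and inputs are assumptions for illustration.

```python
import math


def hjb_control_terms(pi, c, t, w, v_w, v_ww, mu, r, sigma, gamma, rho):
    """Part of the HJB bracket that depends on the controls (pi, c).
    dV*/dt is dropped because it does not depend on (pi, c)."""
    return (v_w * ((pi * (mu - r) + r) * w - c)
            + v_ww * pi**2 * sigma**2 * w**2 / 2
            + math.exp(-rho * t) * c ** (1 - gamma) / (1 - gamma))


def closed_form_controls(t, w, v_w, v_ww, mu, r, sigma, gamma, rho):
    """First-order conditions (assume v_w > 0 and v_ww < 0)."""
    pi_star = -v_w * (mu - r) / (v_ww * sigma**2 * w)
    c_star = (v_w * math.exp(rho * t)) ** (-1.0 / gamma)
    return pi_star, c_star


# Hypothetical inputs at one point (t, w); any v_w > 0, v_ww < 0 will do.
params = dict(t=1.0, w=100.0, v_w=0.02, v_ww=-4e-4,
              mu=0.08, r=0.03, sigma=0.2, gamma=2.0, rho=0.02)
pi_star, c_star = closed_form_controls(**params)

# The bracket is a pi-only term plus a c-only term, so each control can be
# grid-searched on its own while the other is held fixed.
pi_grid = [-2.0 + 0.001 * i for i in range(5001)]
c_grid = [0.001 * i for i in range(1, 20001)]
pi_best = max(pi_grid, key=lambda p: hjb_control_terms(p, c_star, **params))
c_best = max(c_grid, key=lambda c: hjb_control_terms(pi_star, c, **params))
print(f"pi*: closed form {pi_star:.4f}, grid search {pi_best:.4f}")
print(f"c*:  closed form {c_star:.4f}, grid search {c_best:.4f}")
```

Both grid maxima should land within one grid step of the closed-form values.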
10. Optimal Discounted Value Function PDE
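Now substitute $\pi_t^*$ and $c_t^*$ in the maximizing expression of HJB to get the
Optimal Discounted Value Function PDE.
$$\frac{\partial V^*}{\partial t} - \frac{(\mu - r)^2}{2\sigma^2} \cdot \frac{\left(\frac{\partial V^*}{\partial W}\right)^2}{\frac{\partial^2 V^*}{\partial W^2}} + \frac{\partial V^*}{\partial W} \cdot r \cdot W_t + \frac{\gamma}{1-\gamma} \cdot e^{\frac{-\rho t}{\gamma}} \cdot \left(\frac{\partial V^*}{\partial W}\right)^{\frac{\gamma-1}{\gamma}} = 0$$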
The boundary condition is:
$$V^*(T, W_T) = e^{-\rho T} \cdot \epsilon^\gamma \cdot \frac{W_T^{1-\gamma}}{1-\gamma}$$
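Don't forget to check that the second-order conditions are satisfied: the maximization over $\pi_t$ requires $\frac{\partial^2 V^*}{\partial W^2} < 0$, and the maximand is concave in $c_t$ for $\gamma > 0$.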
11. Solving the PDE with a guess solution
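Take as a guess solution
$$V^*(t, W_t) = f(t)^\gamma \cdot e^{-\rho t} \cdot \frac{W_t^{1-\gamma}}{1-\gamma}$$
Then,
$$\frac{\partial V^*}{\partial t} = (\gamma \cdot f(t)^{\gamma-1} \cdot f'(t) - \rho \cdot f(t)^\gamma) \cdot e^{-\rho t} \cdot \frac{W_t^{1-\gamma}}{1-\gamma}$$
$$\frac{\partial V^*}{\partial W} = f(t)^\gamma \cdot e^{-\rho t} \cdot W_t^{-\gamma}$$
$$\frac{\partial^2 V^*}{\partial W^2} = -f(t)^\gamma \cdot e^{-\rho t} \cdot \gamma \cdot W_t^{-\gamma-1}$$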
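The three partial derivatives can be checked against finite differences. In this hedged Python sketch, `f` is an arbitrary smooth positive function chosen only for the test ($f(t) = 2 + \sin t$, an assumption): the derivative formulas hold for any differentiable $f$, so the ODE for $f$ is not needed here. Function names are hypothetical.

```python
import math


def v_guess(t, w, f, rho, gamma):
    """Guess solution V*(t, W) = f(t)^gamma * e^(-rho t) * W^(1-gamma) / (1-gamma)."""
    return f(t) ** gamma * math.exp(-rho * t) * w ** (1 - gamma) / (1 - gamma)


def guess_partials(t, w, f, f_prime, rho, gamma):
    """Analytical dV/dt, dV/dW and d2V/dW2 of the guess solution."""
    disc = math.exp(-rho * t)
    v_t = ((gamma * f(t) ** (gamma - 1) * f_prime(t) - rho * f(t) ** gamma)
           * disc * w ** (1 - gamma) / (1 - gamma))
    v_w = f(t) ** gamma * disc * w ** (-gamma)
    v_ww = -f(t) ** gamma * disc * gamma * w ** (-gamma - 1)
    return v_t, v_w, v_ww


def numeric_partials(t, w, f, rho, gamma, h_t=1e-5, h_w=1e-2):
    """Central finite-difference estimates of the same three partials."""
    v = lambda tt, ww: v_guess(tt, ww, f, rho, gamma)
    v_t = (v(t + h_t, w) - v(t - h_t, w)) / (2 * h_t)
    v_w = (v(t, w + h_w) - v(t, w - h_w)) / (2 * h_w)
    v_ww = (v(t, w + h_w) - 2 * v(t, w) + v(t, w - h_w)) / h_w**2
    return v_t, v_w, v_ww


# Hypothetical smooth, positive f(t) used only for this check.
f = lambda t: 2.0 + math.sin(t)
f_prime = math.cos

for gamma in (0.5, 2.0, 4.0):
    exact = guess_partials(1.3, 50.0, f, f_prime, 0.02, gamma)
    approx = numeric_partials(1.3, 50.0, f, 0.02, gamma)
    rel_errors = [abs(a - e) / abs(e) for a, e in zip(approx, exact)]
    print(f"gamma = {gamma}: relative errors {rel_errors}")
```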
12. PDE reduced to an ODE
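Substituting the guess solution in the PDE, we get the simple ODE:
$$f'(t) = \nu \cdot f(t) - 1$$
where
$$\nu = \frac{\rho - (1 - \gamma) \cdot \left(\frac{(\mu - r)^2}{2\sigma^2\gamma} + r\right)}{\gamma}$$
with boundary condition $f(T) = \epsilon$.
The solution to this ODE is:
$$f(t) = \begin{cases} \frac{1 + (\nu\epsilon - 1) \cdot e^{-\nu(T-t)}}{\nu} & \text{for } \nu \neq 0 \\ T - t + \epsilon & \text{for } \nu = 0 \end{cases}$$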
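A hedged Python sketch of $\nu$ and the closed-form $f(t)$. The parameter values are illustrative assumptions, not taken from the slides. It checks the boundary condition $f(T) = \epsilon$ and, by central differences, that $f'(t) - (\nu f(t) - 1) \approx 0$.

```python
import math


def merton_nu(mu, r, sigma, gamma, rho):
    """nu = (rho - (1 - gamma) * ((mu - r)^2 / (2 sigma^2 gamma) + r)) / gamma."""
    return (rho - (1 - gamma) * ((mu - r) ** 2 / (2 * sigma**2 * gamma) + r)) / gamma


def f_solution(t, T, nu, eps):
    """Closed-form solution of f'(t) = nu * f(t) - 1 with f(T) = eps."""
    if nu == 0.0:
        return T - t + eps
    return (1 + (nu * eps - 1) * math.exp(-nu * (T - t))) / nu


# Illustrative parameters (assumed, not from the slides).
mu, r, sigma, gamma, rho, T, eps = 0.08, 0.03, 0.2, 2.0, 0.02, 10.0, 1e-3
nu = merton_nu(mu, r, sigma, gamma, rho)


def ode_residual(t, h=1e-5):
    """Central-difference f'(t) minus (nu * f(t) - 1); should be close to 0."""
    df = (f_solution(t + h, T, nu, eps) - f_solution(t - h, T, nu, eps)) / (2 * h)
    return df - (nu * f_solution(t, T, nu, eps) - 1)


print(f"nu = {nu:.7f}")
for t in (0.0, 5.0, 9.9):
    print(f"t = {t}: f(t) = {f_solution(t, T, nu, eps):.6f}, residual = {ode_residual(t):.2e}")
```

As $\nu \to 0$ the $\nu \neq 0$ branch approaches $T - t + \epsilon$, so the two cases join continuously.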
13. Optimal Consumption and Allocation
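Putting it all together (substituting the solution for $f(t)$), we get:
$$\pi^*(t, W_t) = \frac{\mu - r}{\sigma^2\gamma}$$
$$c^*(t, W_t) = \frac{W_t}{f(t)} = \begin{cases} \frac{\nu \cdot W_t}{1 + (\nu\epsilon - 1) \cdot e^{-\nu(T-t)}} & \text{for } \nu \neq 0 \\ \frac{W_t}{T - t + \epsilon} & \text{for } \nu = 0 \end{cases}$$
$$V^*(t, W_t) = \begin{cases} \frac{e^{-\rho t} \cdot (1 + (\nu\epsilon - 1) \cdot e^{-\nu(T-t)})^\gamma}{\nu^\gamma} \cdot \frac{W_t^{1-\gamma}}{1-\gamma} & \text{for } \nu \neq 0 \\ \frac{e^{-\rho t} \cdot (T - t + \epsilon)^\gamma \cdot W_t^{1-\gamma}}{1-\gamma} & \text{for } \nu = 0 \end{cases}$$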
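The closed-form solution can be checked end to end. The Python sketch below codes $\pi^*$, $c^*$ and $V^*$ as written above (with $f(t)$ from the ODE solution), then evaluates the left-hand side of the Optimal Discounted Value Function PDE by finite differences; the result should be close to zero. Parameter values and function names are illustrative assumptions.

```python
import math


def merton_solution(t, w, mu, r, sigma, gamma, rho, T, eps):
    """Return (pi*, c*, V*) at (t, W_t) for gamma != 1."""
    nu = (rho - (1 - gamma) * ((mu - r) ** 2 / (2 * sigma**2 * gamma) + r)) / gamma
    if nu == 0.0:
        f = T - t + eps
    else:
        f = (1 + (nu * eps - 1) * math.exp(-nu * (T - t))) / nu
    pi_star = (mu - r) / (sigma**2 * gamma)
    c_star = w / f
    v_star = math.exp(-rho * t) * f ** gamma * w ** (1 - gamma) / (1 - gamma)
    return pi_star, c_star, v_star


def pde_residual(t, w, params, h_t=1e-5, h_w=1e-3):
    """Left-hand side of the value-function PDE, with V* differentiated numerically."""
    mu, r, sigma, gamma, rho, T, eps = params
    v = lambda tt, ww: merton_solution(tt, ww, *params)[2]
    v_t = (v(t + h_t, w) - v(t - h_t, w)) / (2 * h_t)
    v_w = (v(t, w + h_w) - v(t, w - h_w)) / (2 * h_w)
    v_ww = (v(t, w + h_w) - 2 * v(t, w) + v(t, w - h_w)) / h_w**2
    return (v_t
            - (mu - r) ** 2 / (2 * sigma**2) * v_w**2 / v_ww
            + v_w * r * w
            + gamma / (1 - gamma) * math.exp(-rho * t / gamma) * v_w ** ((gamma - 1) / gamma))


# Illustrative parameters (assumed): mu, r, sigma, gamma, rho, T, eps.
params = (0.08, 0.03, 0.2, 2.0, 0.02, 10.0, 1e-3)
for t, w in [(0.0, 50.0), (2.0, 100.0), (9.5, 10.0)]:
    pi_s, c_s, v_s = merton_solution(t, w, *params)
    print(f"t={t}, W={w}: pi*={pi_s:.4f}, c*={c_s:.4f}, residual={pde_residual(t, w, params):.2e}")
```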
14. Illuminating Observations
Optimal Allocation $\pi^*(t, W_t)$ is constant (independent of $t$ and $W_t$)
Optimal Fractional Consumption $\frac{c^*(t, W_t)}{W_t}$ depends only on $t$
Optimal Fractional Consumption as a function of time ($= \frac{1}{f(t)}$) depends on the key quantity $\nu$
Under Optimal Allocation, Expected Portfolio Return $= \frac{(\mu - r)^2}{\sigma^2\gamma} + r$
As $T \rightarrow \infty$, Optimal Fractional Consumption tends to the constant $\nu$ (when $\nu > 0$)
HJB Formulation was key and this solution approach provides a
template for similar continuous-time stochastic control problems
Analytical tractability was achieved due to assumptions of:
Normal distribution of asset returns
Constant Relative Risk-Aversion
Frictionless trading
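Two of these observations are easy to see numerically. The hedged Python sketch below (parameter values are illustrative assumptions) computes the expected portfolio return under $\pi^*$, and shows $\frac{c^*}{W_t} = \frac{1}{f(t)}$ at $t = 0$ approaching $\nu$ as the horizon $T$ grows. That limit requires $\nu > 0$; for $\nu \leq 0$ the same fraction instead decays toward 0.

```python
import math


def consumption_fraction(t, T, nu, eps):
    """Optimal fractional consumption c*(t, W_t) / W_t = 1 / f(t)."""
    if nu == 0.0:
        return 1.0 / (T - t + eps)
    return nu / (1 + (nu * eps - 1) * math.exp(-nu * (T - t)))


# Illustrative parameters (assumed, not from the slides).
mu, r, sigma, gamma, rho, eps = 0.08, 0.03, 0.2, 2.0, 0.02, 1e-3
nu = (rho - (1 - gamma) * ((mu - r) ** 2 / (2 * sigma**2 * gamma) + r)) / gamma

pi_star = (mu - r) / (sigma**2 * gamma)
expected_return = pi_star * (mu - r) + r  # equals (mu - r)^2 / (sigma^2 gamma) + r
print(f"pi* = {pi_star:.4f}, expected portfolio return = {expected_return:.5f}")

for T in (10.0, 50.0, 200.0, 1000.0):
    frac = consumption_fraction(0.0, T, nu, eps)
    print(f"T = {T:>6}: c*/W at t = 0 is {frac:.6f} (nu = {nu:.6f})")
```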