A. N. Shiryaev, "A survey of modern problems of optimal stopping"
1. Albert N. SHIRYAEV
Steklov Mathematical Institute
and
Lomonosov Moscow State University
Lectures on some specific topics of
STOCHASTIC OPTIMIZATION
on the filtered probability spaces
via OPTIMAL STOPPING
e-mail: albertsh@mi.ras.ru
i
2. INTRODUCTION: (θ, τ )- and (G, τ )-problems
TOPIC I: QUICKEST DETECTION PROBLEMS:
Discrete time. Infinite horizon
TOPIC II: QUICKEST DETECTION PROBLEMS:
Continuous time. Infinite horizon
TOPIC III: QUICKEST DETECTION PROBLEMS:
Filtered probability-statistical
experiments. Finite horizon
TOPIC IV: QUICKEST DETECTION PROBLEMS:
Case of expensive cost of observations
ii
3. TOPIC V: CLASSICAL AND RECENT RESULTS
on the stochastic differential equations
TOPIC VI: OPTIMAL STOPPING (OS) THEORY.
Basic formulations, concepts and
methods of solutions
TOPIC VII: LOCAL TIME APPROACH to the problems of
testing 3 and 2 statistical hypotheses
TOPIC VIII: OPTIMAL 1-TIME REBALANCING STRATEGY
and stochastic rule “Buy & Hold” for the
Black–Scholes model. Finite horizon
TOPIC IX: FINANCIAL STATISTICS, STOCHASTICS,
AND OPTIMIZATION
iii
4. TOPIC I: QUICKEST DETECTION PROBLEMS:
Discrete time. Infinite horizon
§ 1. W. Shewhart and E. Page’s works
§ 2. Definition of the θ-model and Bayesian G-model in the
quickest detection
§ 3. Four basic formulations (Variants A, B, C, D) of the quickest
detection problems for the general θ - and G-models
§ 4. The reduction of Variants A and B to the standard form.
Discrete-time case
§ 5. Variants C and D: lower estimates for the risk functions
§ 6. Recurrence equations for the statistics πn, ϕn, ψn, γn
§ 7. Variants A and B: solving the optimal stopping problem
§ 8. Variants C and D: around the optimal stopping times
iv
5. TOPIC II: QUICKEST DETECTION PROBLEMS:
Continuous time. Infinite horizon
§ 1. INTRODUCTION
§ 2. Four basic formulations (VARIANTS A, B, C, D) of
the quickest detection problems for the Brownian case
§ 3. VARIANT A
§ 4. VARIANT B
§ 5. VARIANT C
§ 6. VARIANT D
v
6. TOPIC III: QUICKEST DETECTION PROBLEMS:
Filtered probability-statistical experiments.
Finite horizon
§ 1. The general θ -model and Bayesian G-model
§ 2. Main statistics (πt, ϕt, ψt(G))
§ 3. Variant A for the finite horizon
§ 4. Optimal stopping problem supτ PG(|τ − θ| ≤ h)
vi
7. TOPIC IV: QUICKEST DETECTION PROBLEMS:
Case of expensive cost of observations
§ 1. Introduction
§ 2. Approaches to solving the problem V∗(π) = inf_{(h,τ)} Eπ C(h, τ)
and finding an optimal strategy (τ, h)
vii
8. TOPIC V: Classical and recent results on the
stochastic differential equations
§ 1. Introduction: Kolmogorov's and Itô's papers
§ 2. Weak and strong solutions. 1:
Examples of existence and nonexistence
§ 3. The Tsirelson example
§ 4. Girsanov’s change of measures
§ 5. Criteria of uniform integrability for stochastic exponentials
§ 6. Weak and strong solutions. 2:
General remarks on the existence and uniqueness
§ 7. Weak and strong solutions. 3:
Sufficient conditions for existence and uniqueness
viii
9. TOPIC VI: Optimal Stopping (OS) problems.
Basic formulations, concepts,
and methods of solutions
§ 1. Standard and nonstandard optimal stopping problems
§ 2. OS-lecture 1: Introduction
§ 3. OS-lecture 2-3: Theory of optimal stopping for
discrete time (finite and infinite horizons)
A) Martingale approach
B) Markovian approach
§ 4. OS-lecture 4-5: Theory of optimal stopping for
continuous time (finite and infinite horizons)
A) Martingale approach
B) Markovian approach
ix
10. INTRODUCTION
Some ideas about the contents of the present lectures can be
obtained from the following considerations.
Suppose that we observe a random process X = (Xt ) on an interval
[0, T ], T ≤ ∞. The objects θ and τ which are introduced and
considered below are essential throughout the lectures:
θ is a parameter or a random variable; it is a hidden,
nonobservable characteristic – for example, it can be a time
when the observed process X = (Xt ) changes its character of
behavior or its characteristics;
τ is a stopping (Markov) time which serves as the time of
“alarm”; it warns of the coming of the time θ.
Introduction-1
11. The following problem will play an essential role in our lectures.
Suppose that the observed process has the form
Xt = µ(t − θ)+ + σBt,  or equivalently  dXt = σ dBt for t < θ and dXt = µ dt + σ dBt for t ≥ θ,
where B = (Bt )t≥0 is a Brownian motion.
If θ is a random variable with the law G = G(t), t ≥ 0, then the
event {τ < θ} corresponds to the “false alarm” and the event {τ ≥ θ}
corresponds to the case that the alarm is raised in due time, i.e.,
after the time θ.
Introduction-2
12. VARIANT A of the quickest detection problem: To find
A(c) = inf_τ [ P^G(τ < θ) + c E^G(τ − θ)+ ],
where P^G is the distribution corresponding to the prior G.
If θ is a parameter, θ ∈ [0, ∞], the following minimax problem D will
be investigated.
VARIANT D: To find
D(T) = inf_{τ∈MT} sup_{θ≥1} ess sup_ω E^θ[(τ − θ)+ | Fθ−1](ω),
where MT = {τ : E∞τ ≥ T} and P^θ is the law under which a change point appears at time θ.
Introduction-3
13. Other typical problems, where θ is a time at which the process X = B, a Brownian motion, changes the character of its behavior, are the following: to find
inf_{0≤τ≤T} E|Bτ − Bθ|^p,
where θ is the time of the maximum of the Brownian motion (Bθ = max_{0≤t≤T} Bt);
or to find
inf_{0≤τ≤T} E|τ − θ|;
or to find
inf_{0≤τ≤T} E F(Bτ/Bθ − 1),
etc.
Introduction-4
14. We would like to underline the following features in our lectures:
• Finite and infinite horizons for θ and τ (usually θ ∈ [0, ∞] or θ ∈ {0, 1, . . . , ∞});
• Filtered probability-statistical experiment
(Ω, F, (Ft)t≥0, P^θ, θ ∈ Θ)
as a general model of quickest detection and optimal control (here (Ft)t≥0 is a flow of "information", usually Ft = F^X_t = σ(Xs, s ≤ t), where X is the observed process);
• Local time reformulations of the problems of testing the statistical
hypotheses (with details for 2 and 3 statistical hypotheses);
• Applications to finance.
Introduction-5
15. TOPIC I: QUICKEST DETECTION PROBLEMS:
Discrete time. Infinite horizon
§ 1. W. Shewhart and E. Page’s works
1. Let us describe the (chronologically) first approaches to the quickest detection (QD) problems, initiated in the 1920–30s by W. Shewhart, who proposed, for the control of industrial production, the so-called control charts (which are still in use today).
The next step in this direction was made in the 1950s by E. Page, who invented the so-called CUSUM method, which became very popular in statistical practice.
Neither of these approaches was supported by a deep stochastic analysis.
I-1-1
16. In the late 1950s A. N. Kolmogorov and the author gave precise mathematical formulations of two QD problems.
The basic problem was a multistage problem of quickest detection of a random target appearing in the steady (stationary) regime, under the assumption that the mean time between false alarms is large.
The second problem was a Bayesian problem whose solution became a crucial step in solving the first problem.
I-1-2
17. 2. The approach of W. Shewhart∗ assumes that x1, x2, . . . are observations of random variables X1, X2, . . . and θ is an unknown ("hidden") parameter which takes values in the set {1, 2, . . . , ∞}.
The case θ = ∞ is interpreted as a "normal" run of the inspected industrial process. In this case X1, X2, . . . are i.i.d. random variables with the density f ∞(x).
If θ = 0 or θ = 1, then X1, X2, . . . are again i.i.d., now with the density f 0(x).
If 1 < θ < ∞, then X1, . . . , Xθ−1 are i.i.d. with the density f ∞(x) and Xθ, Xθ+1, . . . with the density f 0(x); at time θ the density switches: f ∞(x) → f 0(x).
∗ W. A. Shewhart, "The application of statistics as an aid in maintaining quality of manufactured product", J. Amer. Statist. Assoc., 138 (1925), 546–548. W. A. Shewhart, Economic Control of Quality of Manufactured Product, Van Nostrand Reinhold, N.Y., 1931. (Republished in 1981 by the Amer. Soc. for Quality Control, Milwaukee.)
I-1-3
18. An alarm signal is a stopping time τ = τ(x), x = (x1, x2, . . .), of the form τ(x) = inf{n ≥ 1 : xn ∈ D}, where D is some set in the state space of the observations.
For f 0(x) ∼ N (µ0, σ), f ∞(x) ∼ N (µ∞, σ) Shewhart proposes to take
τ (x) = inf{n ≥ 1 : |xn − µ∞| ≥ 3σ}.
It is easy to find the probability of false alarm (on each step)
α ≡ P(µ∞,σ){|X1 − µ∞| ≥ 3σ} ≈ 0.0027.
For E(µ∞,σ)τ we find
E(µ∞,σ)τ = Σ_{k≥1} k α(1 − α)^{k−1} = 1/α ≈ 370.
Similarly we can find the probability of the correct alarm β =
P(µ0,σ){|X1 − µ∞| ≥ 3σ} and E(µ0,σ)τ .
I-1-4
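The 3σ numbers above are easy to reproduce. A minimal sketch (Python, standard library only; the generic Gaussian parameters are as on the slide):

```python
import math

def normal_cdf(x):
    # standard normal distribution function Phi(x), via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# false-alarm probability at each step for the 3-sigma rule:
# alpha = P(|X1 - mu_inf| >= 3 sigma) = 2 (1 - Phi(3)) under N(mu_inf, sigma)
alpha = 2.0 * (1.0 - normal_cdf(3.0))

# the run length to the first (false) alarm is geometric, so its mean is 1/alpha
mean_run_length = 1.0 / alpha
```

This reproduces α ≈ 0.0027 and E(µ∞,σ)τ = 1/α ≈ 370.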
19. W. Shewhart did not formulate optimization problems.
A possible approach can be as follows:
Let MT = {τ : E∞τ ≥ T }, where T is a fixed constant.
The stopping time τ∗_T is called minimax if
sup_θ E^θ(τ∗_T − θ | τ∗_T ≥ θ) = inf_{τ∈MT} sup_θ E^θ(τ − θ | τ ≥ θ)
(here P^θ is the distribution on (R∞, B) generated by X1, . . . , Xθ−1, Xθ, Xθ+1, . . .).
Another possible formulation is the following: θ is a random variable and τ∗_{α,h} is optimal if
inf_{τ∈M(α)} P((τ − θ)+ ≥ h) = P((τ∗_{α,h} − θ)+ ≥ h),
where M(α) = {τ : P(τ ≤ θ) ≤ α}.
I-1-5
20. By Chebyshev's inequality, P((τ − θ)+ ≥ h) ≤ (1/h) E(τ − θ)+, and
P((τ − θ)+ ≥ h) = P(e^{k(τ−θ)+} ≥ e^{kh}) ≤ E e^{k(τ−θ)+} / e^{kh},  k > 0.
So P((τ − θ)+ ≥ h) ≤ inf_{k>0} E e^{k(τ−θ)+} / e^{kh}, and we have the problems
inf_{τ∈M(α)} E(τ − θ)+ = E(τ∗ − θ)+,   inf_{τ∈M(α)} E e^{k(τ−θ)+} = E e^{k(τ∗−θ)+}.
It is also interesting to solve the problems
inf E|τ − θ|,   inf E e^{k|τ−θ|},
where the inf is taken over the class M of all stopping times τ.
Solutions of these problems will be discussed later. For now we only note that for all Bayesian problems we need to know the distribution of θ.
I-1-6
21. For the problem
sup_τ P(|τ − θ| ≤ h) = P(|τ∗_h − θ| ≤ h)
with h = 0, i.e., for the problem
sup_τ P(τ = θ) = P(τ∗_0 = θ),
under the assumption that θ has the geometric distribution, the optimal time τ∗_0 has the following simple structure:
τ∗_0 = inf{ n ≥ 1 : f 0(xn)/f ∞(xn) ∈ D∗_0 }.
I-1-7
22. ◮ For Gaussian densities f 0(x) ∼ N(µ0, σ) and f ∞(x) ∼ N(µ∞, σ) we find that
τ∗_0 = inf{n ≥ 1 : xn ∈ A∗_0}.
◮ If f 0(x) = (1/2) λ_0 e^{−λ_0|x|} and f ∞(x) = (1/2) λ_∞ e^{−λ_∞|x|}, then
τ∗_0 = inf{n ≥ 1 : xn ∈ B∗_0}.
Generally, if f 0 and f ∞ belong to an exponential family,
f a(x) = c̄_a g(x) exp{c_a ϕ_a(x)},  where c̄_a, c_a, g(x) ≥ 0,
then
f 0(x)/f ∞(x) = (c̄_0/c̄_∞) exp{c_0 ϕ_0(x) − c_∞ ϕ_∞(x)},
and thus
τ∗_0 = inf{n ≥ 1 : c_0 ϕ_0(xn) − c_∞ ϕ_∞(xn) ∈ C∗_0}.
I-1-8
23. 3. E. Page's approach.∗ Below in § 6 we will consider in detail the CUSUM method initiated by E. Page. For now we give only the definition of this procedure.
The SHEWHART method is based on the statistic Sn = f 0(xn)/f ∞(xn), which for Gaussian densities f 0(x) ∼ N(µ0, 1) and f ∞(x) ∼ N(0, 1) takes the form
Sn = exp{µ0(xn − µ0/2)} = exp{∆Zn},  where Zn = Σ_{k=1}^{n} µ0(xk − µ0/2).
The CUSUM method is based on the statistic (see details in § 5)
γn = max_{1≤k≤n} exp{ Σ_{i=n−k+1}^{n} µ0(xi − µ0/2) } = max_{1≤k≤n} exp{Zn − Z_{n−k}}.
∗ E. S. Page, "Continuous inspection schemes", Biometrika, 41 (1954), 100–114. E. S. Page, "Control charts with warning lines", Biometrika, 42 (1955), 243–257.
I-1-9
24. The CUSUM stopping time is τ ∗ = inf{n ≥ 1 : γn ≥ d}.
It is important to emphasize that to construct the CUSUM statistic γn we must know the densities f ∞(x) and f 0(x). Instead of Zn = Σ_{k=1}^{n} µ0(xk − µ0/2) we can use the following interesting statistic. Define
Zn = Σ_{k=1}^{n} |xk|(xk − |xk|/2)  and  Tn = Zn − min_{1≤k≤n} Zk.
Note that
|xk|(xk − |xk|/2) = x²_k/2 if xk ≥ 0, and −3x²_k/2 if xk < 0.
So, as long as the xk, k ≤ n, are negative, the statistic Tn stays close to 0. But if the xk become positive, the values Tn increase. Thus the statistic Tn helps us to detect the appearance of positive values xk.
I-1-10
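The behavior of the statistic Tn can be checked numerically. A small sketch (Python; the change point n = 50 and the unit-variance Gaussian observations are illustrative assumptions, not taken from the slide):

```python
import random

def g(x):
    # summand |x|(x - |x|/2): equals x^2/2 for x >= 0 and -3x^2/2 for x < 0
    return abs(x) * (x - abs(x) / 2.0)

random.seed(0)
# negative drift before the change at n = 50, positive drift after it
xs = [random.gauss(-1.0, 1.0) for _ in range(50)] + \
     [random.gauss(1.0, 1.0) for _ in range(50)]

Z, min_Z, T = 0.0, 0.0, []
for x in xs:
    Z += g(x)
    min_Z = min(min_Z, Z)
    T.append(Z - min_Z)   # T_n = Z_n - min_{k <= n} Z_k >= 0
```

While the xk are predominantly negative, Tn stays near 0; after the change it grows steadily.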
25. § 2. Definition of the θ -model and Bayesian G-model in the
quickest detection
1. We consider now the case of discrete time n = 0, 1, . . . and assume that (Ω, F, (Fn)n≥0, P^0, P^∞) is a binary statistical experiment. The measure P^∞ corresponds to the situation θ = ∞, the measure P^0 corresponds to the situation θ = 0.
Assume first that Ω = R∞ = {x = (x1, x2, . . .), xi ∈ R}.
Let f^0_n(x1, . . . , xn) and f^∞_n(x1, . . . , xn) be the densities of P^0_n and P^∞_n (w.r.t. the measure Pn = (1/2)(P^0_n + P^∞_n)).
We denote by f^0_n(xn | x1, . . . , xn−1) and f^∞_n(xn | x1, . . . , xn−1) the corresponding conditional densities.
How should one define the conditional density f^θ_n(xn | x1, . . . , xn−1) for a value 0 < θ < ∞?
I-2-1
26. The meaning of the value θ as a change point (disorder, disruption; in Russian, "razladka") suggests that it is reasonable to define
f^θ_n(xn | x1, . . . , xn−1) = f^∞_n(xn | x1, . . . , xn−1) for n < θ, and f^0_n(xn | x1, . . . , xn−1) for n ≥ θ,
or
f^θ_n(xn | x1, . . . , xn−1) = I(n < θ) f^∞_n(xn | x1, . . . , xn−1) + I(n ≥ θ) f^0_n(xn | x1, . . . , xn−1).   (∗)
Since we should have
f^θ_n(x1, . . . , xn) = f^θ_{n−1}(x1, . . . , xn−1) f^θ_n(xn | x1, . . . , xn−1),
we find that
f^θ_n(x1, . . . , xn) = I(n < θ) f^∞_n(x1, . . . , xn) + I(n ≥ θ) f^∞_{θ−1}(x1, . . . , xθ−1) f^0_n(x1, . . . , xn) / f^0_{θ−1}(x1, . . . , xθ−1).   (∗∗)
I-2-2
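Formula (∗∗) is easy to verify in the simplest i.i.d. Gaussian setting. A sketch (Python; the densities N(0,1), N(1,1), the sample and the change point θ = 3 are illustrative assumptions):

```python
import math

def phi(x, mu):
    # density of N(mu, 1)
    return math.exp(-(x - mu) ** 2 / 2.0) / math.sqrt(2.0 * math.pi)

def prod(vals):
    p = 1.0
    for v in vals:
        p *= v
    return p

xs = [0.3, -1.2, 0.8, 2.1, 1.5]   # observations x_1, ..., x_5
theta = 3                         # change point

# direct definition: f^infty before theta, f^0 from theta on
direct = prod(phi(x, 0.0) for x in xs[:theta - 1]) * \
         prod(phi(x, 1.0) for x in xs[theta - 1:])

# formula (**): f^infty_{theta-1} * f^0_n / f^0_{theta-1}
f_inf_part = prod(phi(x, 0.0) for x in xs[:theta - 1])
f0_n       = prod(phi(x, 1.0) for x in xs)
f0_part    = prod(phi(x, 1.0) for x in xs[:theta - 1])
via_formula = f_inf_part * f0_n / f0_part
```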
27. The formula (∗∗) can be taken as the definition of the density f^θ_n(x1, . . . , xn). It should be emphasized that, vice versa, from (∗∗) we obtain (∗).
Note that from (∗) we get the formula
f^θ_n(xn | x1, . . . , xn−1) − 1 = I(n < θ)[f^∞_n(xn | x1, . . . , xn−1) − 1] + I(n ≥ θ)[f^0_n(xn | x1, . . . , xn−1) − 1],
which motivates the following general definition of the measures P^θ_n, based on martingale reasoning.
I-2-3
28. 2. General stochastic θ-model. The previous considerations show how to define the measures P^θ in the case of a general binary filtered statistical experiment (Ω, F, (Fn)n≥0, P^0, P^∞), F0 = {∅, Ω}.
Introduce the notation:
P = (1/2)(P^0 + P^∞),  P^0_n = P^0|Fn,  P^∞_n = P^∞|Fn,  Pn = (1/2)(P^0_n + P^∞_n) = P|Fn,
L^0 = dP^0/dP,  L^∞ = dP^∞/dP,  L^0_n = dP^0_n/dPn,  L^∞_n = dP^∞_n/dPn
(dQ/dP denotes the Radon–Nikodým derivative).
I-2-4
29. Since for A ∈ Fn
∫_A E(L^0 | Fn) dP = ∫_A L^0 dP = P^0(A) = P^0_n(A) = ∫_A (dP^0_n/dPn) dPn = ∫_A L^0_n dP,
we have the martingale property
L^0_n = E(L^0 | Fn),  and similarly  L^∞_n = E(L^∞ | Fn).
Note that P(L^0_n = 0) = P(L^∞_n = 0) = 0.
Associate with the martingales L^0 = (L^0_n)n≥0 and L^∞ = (L^∞_n)n≥0 their stochastic logarithms
M^0_n = Σ_{k=1}^{n} (∆L^0_k / L^0_{k−1}) I(L^0_{k−1} > 0),   M^∞_n = Σ_{k=1}^{n} (∆L^∞_k / L^∞_{k−1}) I(L^∞_{k−1} > 0),
where ∆L^0_k = L^0_k − L^0_{k−1} and ∆L^∞_k = L^∞_k − L^∞_{k−1}.
I-2-5
30. The processes (M^0_n, Fn, P)n≥0 and (M^∞_n, Fn, P)n≥0 are P-local martingales, and
∆L^0_n = L^0_{n−1} ∆M^0_n,   ∆L^∞_n = L^∞_{n−1} ∆M^∞_n.
In the case of the coordinate space Ω = R∞ we find that (P-a.s.)
∆M^0_n = ∆L^0_n / L^0_{n−1} = L^0_n/L^0_{n−1} − 1 = f^0_n(x1, . . . , xn)/f^0_{n−1}(x1, . . . , xn−1) − 1 = f^0_n(xn | x1, . . . , xn−1) − 1,
and similarly
∆M^∞_n = f^∞_n(xn | x1, . . . , xn−1) − 1.   (•)
I-2-6
31. Above we defined f^θ_n(xn | x1, . . . , xn−1) as
f^θ_n(xn | x1, . . . , xn−1) = I(n < θ) f^∞_n(xn | x1, . . . , xn−1) + I(n ≥ θ) f^0_n(xn | x1, . . . , xn−1).
Thus, if we take (•) into account, then in the general case it is reasonable to define ∆M^θ_n as
∆M^θ_n = I(n < θ) ∆M^∞_n + I(n ≥ θ) ∆M^0_n.   (••)
We have
L^0_n = E(M^0)_n,   L^∞_n = E(M^∞)_n,
where E is the stochastic exponential:
E(M)_n = Π_{k=1}^{n} (1 + ∆M_k)   (• • •)
with ∆M_k = M_k − M_{k−1}. Thus it is reasonable to define L^θ_n by
L^θ_n = E(M^θ)_n.
I-2-7
32. From formulae (••) and (• • •) it follows that
E(M^θ)_n = E(M^∞)_n for n < θ,
E(M^θ)_n = E(M^∞)_{θ−1} E(M^0)_n / E(M^0)_{θ−1} for 1 ≤ θ ≤ n.
So, for L^θ_n we find that (P-a.s.)
L^θ_n = I(n < θ) L^∞_n + I(n ≥ θ) L^∞_{θ−1} L^0_n/L^0_{θ−1},
or  L^θ_n = L^∞_{(θ−1)∧n} L^0_n / L^0_{(θ−1)∧n}.
So we have
L^θ_n = L^∞_n if θ > n,  and  L^θ_n = L^∞_{θ−1} L^0_n/L^0_{θ−1} if θ ≤ n.
I-2-8
33. Define now for A ∈ Fn
P^θ_n(A) = E[I(A) E(M^θ)_n],  or  P^θ_n(A) = E[I(A) L^θ_n].
The family of measures {P^θ_n}n≥1 is consistent, and we can expect that there exists a measure P^θ on F∞ = ∨_n Fn such that P^θ|Fn = P^θ_n.
Without special assumptions on L^θ_n, n ≥ 1, we cannot guarantee the existence of such a measure.∗ It does exist if, for example, the martingale (L^θ_n)n≥0 is uniformly integrable. In this case there exists an F∞-measurable random variable L^θ such that
L^θ_n = E(L^θ | Fn)  and  P^θ(A) = E[I(A) L^θ].
∗ See, e.g., the corresponding example in: A. N. Shiryaev, Probability, Chapter II, § 3.
I-2-9
34. Another way to construct the measure P^θ is based on the famous Kolmogorov theorem on the extension of measures to (R∞, B(R∞)). This theorem states that
if P^θ_1, P^θ_2, . . . is a sequence of probability measures on (Rn, B(Rn)), n = 1, 2, . . ., which have the consistency property
P^θ_{n+1}(B × R) = P^θ_n(B),  B ∈ B(Rn),
then there is a unique probability measure P^θ on (R∞, B(R∞)) such that
P^θ(Jn(B)) = P^θ_n(B)
for B ∈ B(Rn), where Jn(B) is the cylinder in R∞ with base B ∈ B(Rn).
I-2-10
35. Note that, in the case of continuous time, the measure P^θ can be constructed from the measures P^θ_t, t ≥ 0, in a similar way.
The measures P^θ constructed for all 0 ≤ θ ≤ ∞ from the measures P^0 and P^∞ have the following characteristic property of the filtered model:
P^θ(A) = P^∞(A), if n < θ and A ∈ Fn.
The constructed filtered statistical (or probability-statistical) experiment
(Ω, F, (Fn)n≥0; P^θ, 0 ≤ θ ≤ ∞)
will be called a θ-model constructed via the measures P^0 ("change-point" ("disorder") time θ equal to 0) and P^∞ ("change-point" time θ equal to ∞).
I-2-11
36. 3. General stochastic G-models on filtered spaces. Let
(Ω, F, (Fn)n≥0; P^θ, 0 ≤ θ ≤ ∞)
be a θ-model. Now we shall consider θ as a random variable (given on some probability space (Ω′, F′, P′)) with distribution function G = G(h), h ≥ 0. Define
Ω̄ = Ω × Ω′,  F̄ = F∞ ⊗ F′
and put, for A ∈ F∞ and B′ ∈ F′,
P^G(A × B′) = Σ_{θ∈B′} P^θ(A) ∆G(θ),
where ∆G(θ) = G(θ) − G(θ − 1), ∆G(0) = G(0).
The extension of this set function from the sets A × B′ to F̄ = F∞ ⊗ F′ will be denoted by P^G.
I-2-12
37. It is clear that for P^G(A) = P^G(A × N′) with A ∈ Fn, where N′ = {0, 1, . . . , ∞}, we get
P^G_n(A) = Σ_{θ=0}^{n} P^θ_n(A) ∆G(θ) + (1 − G(n)) P^∞_n(A),
where we have used that P^θ_n(A) = P^∞_n(A) for A ∈ Fn and θ > n.
Denote
P^G_n = P^G|Fn  and  L^G_n = dP^G_n/dPn.
Then we see that
L^G_n = Σ_{θ=0}^{∞} L^θ_n ∆G(θ).
I-2-13
38. Taking into account that
L^θ_n = L^∞_n for θ > n, and L^θ_n = L^∞_{θ−1} L^0_n/L^0_{θ−1} for θ ≤ n, with L^0_{−1} = L^∞_{−1} = 1,
we find the following representation:
L^G_n = Σ_{θ=0}^{n} L^∞_{θ−1} (L^0_n/L^0_{θ−1}) ∆G(θ) + L^∞_n (1 − G(n)),
where L^∞_{−1} = L^0_{−1} = 1.
I-2-14
39. EXAMPLE. Geometric distribution:
G(0) = π,  ∆G(n) = (1 − π) q^{n−1} p,  n ≥ 1.
Here
L^G_n = π L^0_n + (1 − π) L^0_n Σ_{k=0}^{n−1} p q^k L^∞_k/L^0_k + (1 − π) q^n L^∞_n.
If f^0_n = f^0_n(x1, . . . , xn) and f^∞_n = f^∞_n(x1, . . . , xn) are the densities of P^0_n and P^∞_n w.r.t. Lebesgue measure, then we find
f^G_n(x1, . . . , xn) = π f^0_n(x1, . . . , xn) + (1 − π) f^0_n(x1, . . . , xn) Σ_{k=0}^{n−1} p q^k f^∞_k(x1, . . . , xk)/f^0_k(x1, . . . , xk) + (1 − π) q^n f^∞_n(x1, . . . , xn).
I-2-15
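The closed form for f^G_n can be checked against the direct mixture Σθ f^θ_n ∆G(θ) + (1 − G(n)) f^∞_n. A sketch (Python; the Gaussian densities N(0,1)/N(1,1) and the parameter values are illustrative assumptions):

```python
import math

def phi(x, mu):
    # density of N(mu, 1)
    return math.exp(-(x - mu) ** 2 / 2.0) / math.sqrt(2.0 * math.pi)

def prod(vals):
    p = 1.0
    for v in vals:
        p *= v
    return p

def f_theta(xs, theta):
    # joint density under change point theta: f^infty before theta, f^0 from theta on
    cut = max(theta - 1, 0)
    return prod(phi(x, 0.0) for x in xs[:cut]) * prod(phi(x, 1.0) for x in xs[cut:])

xs = [0.2, 1.1, -0.4, 1.7]
n = len(xs)
pi0, p = 0.1, 0.3
q = 1.0 - p

# direct mixture over the geometric prior:
#   G(0) = pi0, dG(theta) = (1 - pi0) q^{theta-1} p, 1 - G(n) = (1 - pi0) q^n
direct = pi0 * f_theta(xs, 0)
direct += sum((1 - pi0) * q ** (t - 1) * p * f_theta(xs, t) for t in range(1, n + 1))
direct += (1 - pi0) * q ** n * prod(phi(x, 0.0) for x in xs)

# closed form from the slide
f0_n = prod(phi(x, 1.0) for x in xs)
finf = lambda k: prod(phi(x, 0.0) for x in xs[:k])
f0   = lambda k: prod(phi(x, 1.0) for x in xs[:k])
closed = pi0 * f0_n \
       + (1 - pi0) * f0_n * sum(p * q ** k * finf(k) / f0(k) for k in range(n)) \
       + (1 - pi0) * q ** n * finf(n)
```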
41. § 3. Four basic formulations (VARIANTS A, B, C, D) of the quickest detection problems for the general θ- and G-models
1. VARIANT A. We assume that a G-model is given,
Mα = {τ : P^G(τ < θ) ≤ α}, where α ∈ (0, 1),
and M is the class of all finite stopping times.
• Conditionally extremal formulation: to find an optimal stopping time τ∗_α ∈ Mα for which
E^G(τ∗_α − θ)+ = inf_{τ∈Mα} E^G(τ − θ)+.
• Bayesian formulation: to find an optimal stopping time τ∗_{(c)} ∈ M for which
P^G(τ∗_{(c)} < θ) + c E^G(τ∗_{(c)} − θ)+ = inf_{τ∈M} [ P^G(τ < θ) + c E^G(τ − θ)+ ].
I-3-1
42. 2. VARIANT B (generalized Bayesian formulation). We assume that a θ-model is given and
MT = {τ ∈ M : E∞τ ≥ T} [the class of stopping times τ for which the mean time E∞τ, under the assumption that there was no change point (disorder) at all, is at least a given a priori constant T > 0].
The problem is to find the value
B(T) = inf_{τ∈MT} Σ_{θ≥1} E^θ(τ − θ)+
and the optimal stopping time τ∗_T for which
Σ_{θ≥1} E^θ(τ∗_T − θ)+ = B(T).
I-3-2
43. 3. VARIANT C (the first minimax formulation).
The problem is to find the value
C(T) = inf_{τ∈MT} sup_{θ≥1} E^θ(τ − θ | τ ≥ θ)
and the optimal stopping time τ_T for which
sup_{θ≥1} E^θ(τ_T − θ | τ_T ≥ θ) = C(T).
I-3-3
44. 4. VARIANT D (the second minimax formulation).
The problem is to find the value
D(T) = inf_{τ∈MT} sup_{θ≥1} ess sup_ω E^θ((τ − θ)+ | Fθ−1)
and the optimal stopping time τ̄_T for which
sup_{θ≥1} ess sup_ω E^θ((τ̄_T − θ)+ | Fθ−1)(ω) = D(T).
The essential supremum w.r.t. the measure P of a nonnegative function f(ω) (notation: ess sup f, or ‖f‖∞, or vrai sup f) is defined as follows:
ess sup_ω f(ω) = inf{0 ≤ c ≤ ∞ : P(|f| > c) = 0}.
I-3-4
45. 5. There are many works where, instead of the penalty functions described above, the following functions are investigated:
W(θ, τ) = W1(τ) for τ < θ, and W2(τ − θ) for τ ≥ θ;
W(θ, τ) = W1((τ − θ)+) + W2((τ − θ)+);
in particular, W(θ, τ) = E|τ − θ|, W(θ, τ) = P(|τ − θ| ≥ h), etc.
I-3-5
46. § 4. The reduction of VARIANTS A and B to the standard form. Discrete-time case
1. Denote
A1(c) = inf_{τ∈M} [ P^G(τ < θ) + c E^G(τ − θ)+ ],
A2(c) = inf_{τ∈M} [ P^G(τ < θ) + c E^G(τ − θ + 1)+ ].
THEOREM 1. Let πn = P^G(θ ≤ n | Fn). Then
A1(c) = inf_{τ∈M} E^G[ (1 − πτ) + c Σ_{k=0}^{τ−1} πk ],
A2(c) = inf_{τ∈M} E^G[ (1 − πτ) + c Σ_{k=0}^{τ} πk ].
I-4-1
47. PROOF follows from the formulae P^G(τ < θ) = E^G(1 − πτ) and
(τ − θ)+ = Σ_{k=0}^{τ−1} I(θ ≤ k),   (τ − θ + 1)+ = Σ_{k=0}^{τ} I(θ ≤ k).   (•)
These identities follow from the property ξ+ = Σ_{k≥1} I(ξ ≥ k):
(τ − θ)+ = Σ_{k≥1} I(τ − θ ≥ k) = Σ_{k≥1} I(θ ≤ τ − k) = Σ_{l=0}^{τ−1} I(θ ≤ l).
Hence
E^G(τ − θ)+ = E^G Σ_{k=0}^{τ−1} I(θ ≤ k) = E^G Σ_{k=0}^{∞} I(k ≤ τ − 1) I(θ ≤ k)
= Σ_{k=0}^{∞} E^G[ E^G( I(τ ≥ k + 1) I(θ ≤ k) | Fk ) ]
= E^G Σ_{k=0}^{∞} I(k ≤ τ − 1) E^G(I(θ ≤ k) | Fk) = E^G Σ_{k=0}^{τ−1} πk.
Representations (•) imply
E^G(τ − θ + 1)+ = E^G Σ_{k=0}^{τ} πk,   E^G(τ − θ)+ = E^G Σ_{k=0}^{τ−1} πk.
I-4-2
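The two indicator identities (•) behind the proof can be verified exhaustively for small values. A sketch (Python):

```python
def pos_part(x):
    # x^+ = max(x, 0)
    return max(x, 0)

def sum_ind(theta, upto):
    # sum_{k=0}^{upto} I(theta <= k); empty when upto < 0
    return sum(1 for k in range(upto + 1) if theta <= k)
```

For every pair (τ, θ) the first sum (up to τ − 1) reproduces (τ − θ)^+ and the second (up to τ) reproduces (τ − θ + 1)^+.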
49. THEOREM 3. Let G = G(n), n ≥ 0, be the geometric distribution:
∆G(n) = p q^{n−1}, n ≥ 1, G(0) = 0.
Then for A3 = A3(p) we have
A3(p) = (1/p) A1(p),
i.e.,
inf_τ E^G|τ − θ| = (1/p) inf_τ [ P^G(τ < θ) + p E^G(τ − θ)+ ].
I-4-4
50. 2. Consider now the criterion
A(W) = inf_τ E^G W(θ, τ)  with  W(θ, τ) = W1(τ) for τ < θ, and W2((τ − θ)+) for τ ≥ θ,
where W2(n) = Σ_{k=1}^{n} f(k), W2(0) = 0. Then
E^G W(θ, τ) = E^G (1 − πτ)[ W1(τ) + (Lτ/(1 − G(τ))) Σ_{k=0}^{τ−1} W2(τ − k) ∆G(k)/L_{k−1} ].
For example, for W1(n) ≡ 1, W2(n) = c n² we get
E^G W(θ, τ) = E^G (1 − πτ)[ 1 + (c Lτ/(1 − G(τ))) Σ_{k=0}^{τ−1} (τ − k)² ∆G(k)/L_{k−1} ].
I-4-5
51. 3. In Variant B:  B(T) = inf_{τ∈MT} Σ_{θ≥1} E^θ(τ − θ)+.
THEOREM 4. For any finite stopping time τ we have
Σ_{θ=1}^{∞} E^θ(τ − θ)+ = E^∞ Σ_{n=1}^{τ−1} ψn,   (∗)
where
ψn = Σ_{θ=1}^{n} Ln/L_{θ−1},  Ln = L^0_n/L^∞_n,  L^0_n = dP^0_n/dPn,  L^∞_n = dP^∞_n/dPn.
Therefore,
B(T) = inf_{τ∈MT} E^∞ Σ_{n=1}^{τ−1} ψn,  where MT = {τ ∈ M : E∞τ ≥ T}.
A generalization:
B_F(T) = inf_{τ∈MT} Σ_{θ≥1} E^θ F((τ − θ)+),  where F(n) = Σ_{k=1}^{n} f(k), F(0) = 0, f(k) ≥ 0.
I-4-6
52. PROOF of (∗). Since (τ − θ)+ = Σ_{k≥1} I(τ − θ ≥ k) = Σ_{k≥θ+1} I(τ ≥ k) and {τ ≥ k} ∈ Fk−1, we find that
E^θ(τ − θ)+ = Σ_{k≥θ+1} E^θ I(τ ≥ k)
= Σ_{k≥θ+1} E^∞[ I(τ ≥ k) d(P^θ|Fk−1)/d(P^∞|Fk−1) ]
= Σ_{k≥θ+1} E^∞[ I(τ ≥ k) L^θ_{k−1}/L^∞_{k−1} ]
= Σ_{k≥θ+1} E^∞[ I(τ ≥ k) (L^0_{k−1}/L^∞_{k−1}) (L^∞_{θ−1}/L^0_{θ−1}) ]
= Σ_{k≥θ+1} E^∞[ I(τ ≥ k) L_{k−1}/L_{θ−1} ],  where Ln = L^0_n/L^∞_n.
I-4-7
53. Therefore,
Σ_{θ=1}^{∞} E^θ(τ − θ)+ = E^∞ Σ_{θ=1}^{∞} Σ_{k=θ+1}^{∞} I(τ ≥ k) L_{k−1}/L_{θ−1}
= E^∞ Σ_{θ=1}^{∞} Σ_{k=2}^{∞} I(θ + 1 ≤ k ≤ τ) L_{k−1}/L_{θ−1}
= E^∞ Σ_{k=2}^{τ} Σ_{θ=1}^{k−1} L_{k−1}/L_{θ−1}
= E^∞ Σ_{k=2}^{τ} ψ_{k−1} = E^∞ Σ_{k=1}^{τ−1} ψk.
In the general case we find that
B_F(T) = inf_{τ∈MT} E^∞ Σ_{n=0}^{τ−1} Ψn(f),
where
Ψn(f) = Σ_{θ=0}^{n} f(n + 1 − θ) Ln/L_{θ−1}.
I-4-8
54. § 5. VARIANTS C and D for the case of discrete time: lower estimates for the risk functions
1. In Variant C the risk function is
C(T) = inf_{τ∈MT} sup_{θ≥1} E^θ(τ − θ | τ ≥ θ).
THEOREM 5. For any stopping time τ with E∞τ < ∞,
sup_{θ≥1} E^θ(τ − θ | τ ≥ θ) ≥ (1/E∞τ) E^∞ Σ_{n=1}^{τ−1} ψn,  where ψn = Σ_{θ=1}^{n} Ln/L_{θ−1}.
Thus, in the class M̄T = {τ ∈ MT : E∞τ = T},
C̄(T) = inf_{τ∈M̄T} sup_{θ≥1} E^θ(τ − θ | τ ≥ θ) ≥ (1/T) B(T),
where B(T) = inf_{τ∈MT} Σ_{θ≥1} E^θ(τ − θ)+ = inf_{τ∈MT} E^∞ Σ_{k=1}^{τ−1} ψk.
I-5-1
56. 2. Now we consider Variant D, where
D(T) = inf_{τ∈MT} sup_{θ≥1} ess sup_ω E^θ((τ − (θ−1))+ | Fθ−1).
THEOREM 6. For any stopping time τ with E∞τ < ∞,
sup_{θ≥1} ess sup_ω E^θ((τ − (θ−1))+ | Fθ−1) ≥ E^∞[Σ_{k=0}^{τ−1} γk] / E^∞[Σ_{k=0}^{τ−1} (1 − γk)+],   (∗)
where (γn)n≥0 is the CUSUM statistic:
γn = max_{1≤θ≤n} L^θ_n/L^∞_n   (L^θ_n = dP^θ_n/dPn, L^∞_n = dP^∞_n/dPn).
Thus,
D(T) ≥ inf_{τ∈MT} E^∞[Σ_{k=0}^{τ−1} γk] / E^∞[Σ_{k=0}^{τ−1} (1 − γk)+].
I-5-3
57. Recall, first of all, some useful facts about the CUSUM statistic (γn)n≥0:
γn = max_{1≤θ≤n} Ln/L_{θ−1} = max_{1≤k≤n} Ln/L_{n−k};
γn = (Ln/L_{n−1}) max(1, γ_{n−1});
by induction,
γn = Σ_{θ=1}^{n} (1 − γ_{θ−1})+ Ln/L_{θ−1},
since max(1, γ_{n−1}) = (1 − γ_{n−1})+ + γ_{n−1}, which equals 1 if γ_{n−1} < 1 and γ_{n−1} if γ_{n−1} ≥ 1
(cf. ψn = Σ_{θ=1}^{n} Ln/L_{θ−1} = (Ln/L_{n−1}) [1 + ψ_{n−1}]).
I-5-4
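The recursion γn = (Ln/Ln−1) max(1, γn−1) can be checked against the defining maximum on a random likelihood-ratio path. A sketch (Python; the lognormal one-step ratios are an illustrative assumption):

```python
import math
import random

random.seed(1)
r = [math.exp(random.gauss(0.0, 1.0)) for _ in range(30)]   # r_n = L_n / L_{n-1} > 0

L = [1.0]                    # L_0 = 1, L_n = r_1 ... r_n
for rn in r:
    L.append(L[-1] * rn)

def gamma_direct(n):
    # gamma_n = max_{1 <= theta <= n} L_n / L_{theta-1}
    return max(L[n] / L[t - 1] for t in range(1, n + 1))

gam = [0.0]                  # gamma_0 = 0 (empty maximum)
for n in range(1, len(L)):
    gam.append(r[n - 1] * max(1.0, gam[-1]))   # CUSUM recursion
```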
58. PROOF of the basic inequality (∗):
sup_{θ≥1} ess sup_ω E^θ((τ − (θ−1))+ | Fθ−1) ≥ E^∞[Σ_{k=0}^{τ−1} γk] / E^∞[Σ_{k=0}^{τ−1} (1 − γk)+].   (∗)
For τ ∈ MT,
dθ(τ) := E^θ((τ − (θ−1))+ | Fθ−1)(ω)
= E^θ( Σ_{k≥1} I(τ − (θ−1) ≥ k) | Fθ−1 ) = E^θ( Σ_{k≥θ} I(τ ≥ k) | Fθ−1 )
= Σ_{k≥θ} E^θ[I(τ ≥ k) | Fθ−1] = Σ_{k≥θ} E^∞[ I(τ ≥ k) L_{k−1}/L_{θ−1} | Fθ−1 ]   (P∞-a.s.).
Here we used the fact that if a random variable ξ ≥ 0 is Fk−1-measurable, then
E^θ(ξ | Fθ−1) = E^∞( ξ L_{k−1}/L_{θ−1} | Fθ−1 ),  Ln = L^0_n/L^∞_n.
I-5-5
59. Denote d(τ) = sup_{θ≥1} ess sup_ω dθ(τ). For each τ and each θ ≥ 1,
d(τ) ≥ dθ(τ)   (P^θ-a.s.),
and for any nonnegative Fθ−1-measurable function fθ−1 = fθ−1(ω),
fθ−1 I(τ ≥ θ) d(τ) ≥ fθ−1 I(τ ≥ θ) dθ(τ)   (P^θ-a.s. and P^∞-a.s.)   (∗∗)
(since P^θ(A) = P^∞(A) for every A ∈ Fθ−1).
Taking (∗∗) into account, we get, by the definition of the θ-model,
E^∞[fθ−1 I(τ ≥ θ)] d(τ) ≥ E^∞[fθ−1 I(τ ≥ θ) dθ(τ)]
= E^∞[ fθ−1 I(τ ≥ θ) E^∞( Σ_{k≥θ} I(τ ≥ k) L_{k−1}/L_{θ−1} | Fθ−1 ) ]
= E^∞[ I(τ ≥ θ) fθ−1 Σ_{k=θ}^{τ} L_{k−1}/L_{θ−1} ].
I-5-6
60. Summing over θ, we find that
d(τ) E^∞ Σ_{θ=1}^{∞} fθ−1 I(τ ≥ θ) ≥ E^∞ Σ_{θ=1}^{∞} I(τ ≥ θ) fθ−1 Σ_{k=θ}^{τ} L_{k−1}/L_{θ−1} = E^∞ Σ_{k=1}^{τ} Σ_{θ=1}^{k} fθ−1 L_{k−1}/L_{θ−1}.
From this we get
d(τ) ≥ E^∞[ Σ_{k=1}^{τ} Σ_{θ=1}^{k} fθ−1 L_{k−1}/L_{θ−1} ] / E^∞[ Σ_{θ=1}^{τ} fθ−1 ].
Take fθ = (1 − γθ)+. Then
Σ_{θ=1}^{k} (1 − γ_{θ−1})+ L_{k−1}/L_{θ−1} = (L_{k−1}/L_k) Σ_{θ=1}^{k} (1 − γ_{θ−1})+ L_k/L_{θ−1} = γk L_{k−1}/L_k.
Since γk = (L_k/L_{k−1}) max{1, γ_{k−1}}, this equals max{1, γ_{k−1}}, and hence
E^∞ Σ_{k=1}^{τ} Σ_{θ=1}^{k} fθ−1 L_{k−1}/L_{θ−1} = E^∞ Σ_{k=1}^{τ} max{1, γ_{k−1}} = E^∞ Σ_{k=0}^{τ−1} max{1, γk} ≥ E^∞ Σ_{k=0}^{τ−1} γk.
Since also E^∞ Σ_{θ=1}^{τ} fθ−1 = E^∞ Σ_{k=0}^{τ−1} (1 − γk)+, inequality (∗) of Theorem 6 is proved.
I-5-7
61. § 6. Recurrence equations for the statistics πn, ϕn, ψn, γn, n ≥ 0
1. We know from § 4 that in Variant A the value
A1(c) = inf_{τ∈M} [ P^G(τ < θ) + c E^G(τ − θ)+ ]
can be represented in the form
A1(c) = inf_{τ∈M} E^G[ (1 − πτ) + c Σ_{k=0}^{τ−1} πk ],
where πn = P^G(θ ≤ n | Fn), n ≥ 1, π0 = π ≡ G(0). (If we have observations X0, X1, . . ., then Fn = F^X_n = σ(X0, . . . , Xn).) Using the Bayes formula, we find
πn = Σ_{θ≤n} L^θ_n ∆G(θ) / L^G_n,  where L^θ_n = dP^θ_n/dPn, L^G_n = dP^G_n/dPn.
I-6-1
62. Introduce ϕn = πn/(1 − πn). For the statistic ϕn one finds that
ϕn = Σ_{θ≤n} L^θ_n ∆G(θ) / Σ_{θ>n} L^θ_n ∆G(θ) = Σ_{θ≤n} (L^∞_{θ−1}/L^0_{θ−1}) L^0_n ∆G(θ) / [(1 − G(n)) L^∞_n].
Since πn = ϕn/(1 + ϕn), we get
πn = Σ_{θ≤n} (Ln/L_{θ−1}) ∆G(θ) / [ Σ_{θ≤n} (Ln/L_{θ−1}) ∆G(θ) + (1 − G(n)) ],
and therefore
πn = (Ln/L_{n−1}) [ (1 − π_{n−1}) ∆G(n)/(1 − G(n−1)) + π_{n−1} ] / Dn,
1 − πn = (1 − π_{n−1}) (1 − G(n))/(1 − G(n−1)) / Dn,
where
Dn = (Ln/L_{n−1}) [ (1 − π_{n−1}) ∆G(n)/(1 − G(n−1)) + π_{n−1} ] + (1 − π_{n−1}) (1 − G(n))/(1 − G(n−1)).
I-6-2
64. EXAMPLE. If G is the geometric distribution, ∆G(0) = G(0) = π, ∆G(n) = (1 − π) q^{n−1} p, then
∆G(n)/(1 − G(n)) = p/q,  (1 − G(n−1))/(1 − G(n)) = 1/q,  and  ϕn = (Ln/(q L_{n−1})) (p + ϕ_{n−1}).
For ψn(p) := ϕn/p, p > 0:  ψn(p) = (Ln/(q L_{n−1})) (1 + ψ_{n−1}(p));
for ψn := lim_{p↓0} ψn(p):  ψn = (Ln/L_{n−1}) (1 + ψ_{n−1}).
If ϕ0 = 0, then ψ0 = 0 and ψn = Σ_{θ=1}^{n} Ln/L_{θ−1}.
I-6-4
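The limit statistic ψn (often called the Shiryaev–Roberts statistic) and its recursion can be cross-checked numerically. A sketch (Python; the lognormal one-step ratios r_n = Ln/Ln−1 are an illustrative assumption):

```python
import math
import random

random.seed(2)
r = [math.exp(random.gauss(0.0, 1.0)) for _ in range(25)]   # r_n = L_n / L_{n-1}

L = [1.0]                    # L_0 = 1, L_n = r_1 ... r_n
for rn in r:
    L.append(L[-1] * rn)

def psi_direct(n):
    # psi_n = sum_{theta=1}^{n} L_n / L_{theta-1}
    return sum(L[n] / L[t - 1] for t in range(1, n + 1))

psi = [0.0]                  # psi_0 = 0 (empty sum)
for n in range(1, len(L)):
    psi.append(r[n - 1] * (1.0 + psi[-1]))   # psi_n = (L_n/L_{n-1})(1 + psi_{n-1})
```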
65. We see that this ψn coincides with the statistic ψn = Σ_{θ=1}^{n} Ln/L_{θ−1} which appeared in Variant B:
Σ_{θ=1}^{∞} E^θ(τ − θ)+ = E^∞ Σ_{n=1}^{τ−1} ψn.
So we conclude that the statistic ψn (of Variant B) can be obtained from the statistic ϕn (which appeared in Variant A) via ψn(p) = ϕn/p as p ↓ 0.
I-6-5
66. 2. Consider the term Ln/L_{n−1} in the above formulae. Let the σ-algebras Fn be generated by observations x0, x1, . . . which are independent (w.r.t. both P^0 and P^∞), with densities f 0(x) and f ∞(x) for xn, n ≥ 1. Then
Ln/L_{n−1} = f 0(xn)/f ∞(xn).
So, in this case,
ϕn = (f 0(xn)/f ∞(xn)) [ ∆G(n)/(1 − G(n)) + ϕ_{n−1} (1 − G(n−1))/(1 − G(n)) ].
I-6-6
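The recursion for ϕn is directly implementable. A sketch (Python) for the geometric prior, on a stylized path where the mean of the N(·, 1) observations switches from 0 to 1 at n = 50 (the parameter p = 0.01 and the deterministic path are illustrative assumptions):

```python
import math

def dens(x, mu):
    # density of N(mu, 1)
    return math.exp(-(x - mu) ** 2 / 2.0) / math.sqrt(2.0 * math.pi)

p, q = 0.01, 0.99              # geometric prior parameter
xs = [0.0] * 50 + [1.0] * 50   # stylized path: mean changes at n = 50

phi_stat, pi = 0.0, []
for x in xs:
    lr = dens(x, 1.0) / dens(x, 0.0)   # f^0(x_n) / f^infty(x_n)
    phi_stat = lr * (p + phi_stat) / q # phi_n = (f0/finf)(p + phi_{n-1})/q
    pi.append(phi_stat / (1.0 + phi_stat))   # pi_n = phi_n / (1 + phi_n)
```

The posterior probability πn stays small before the change and tends to 1 after it.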
67. If x0, . . . , xn have the joint densities f 0(x0, . . . , xn) and f ∞(x0, . . . , xn), then
Ln/L_{n−1} = f 0(xn | x0, . . . , x_{n−1}) / f ∞(xn | x0, . . . , x_{n−1}),
and
ϕn = [ f 0(xn | x0, . . . , x_{n−1}) / f ∞(xn | x0, . . . , x_{n−1}) ] [ ∆G(n)/(1 − G(n)) + ϕ_{n−1} (1 − G(n−1))/(1 − G(n)) ].
In the case of Markov observations,
ϕn = [ f 0(xn | x_{n−1}) / f ∞(xn | x_{n−1}) ] [ ∆G(n)/(1 − G(n)) + ϕ_{n−1} (1 − G(n−1))/(1 − G(n)) ].
I-6-7
68. From the above representations we see that
◮ in the case of independent observations x0, x1, . . . (w.r.t. P^0 and P^∞) the statistics ϕn and πn form Markov sequences (w.r.t. P^G);
◮ in the case of Markov observations x0, x1, . . . (w.r.t. P^0 and P^∞) the PAIRS (ϕn, xn) and (πn, xn) form Markov sequences (w.r.t. P^G).
I-6-8
69. § 7. VARIANTS A and B for the case of discrete time: solving the optimal stopping problem
1. We know that A1(c) = inf_{τ∈M} [ P^G(τ < θ) + c E^G(τ − θ)+ ] can be represented in the form
A1(c) = inf_{τ∈M} Eπ[ (1 − πτ) + c Σ_{n=0}^{τ−1} πn ],
where Eπ is the expectation E^G under the assumption G(0) = π (∈ [0, 1]).
Denote, for fixed c > 0,
V∗(π) = inf_{τ∈M} Eπ[ (1 − πτ) + c Σ_{n=0}^{τ−1} πn ],
T g(π) = Eπ g(π1) for any nonnegative (or bounded) function g = g(π), π ∈ [0, 1],
Qg(π) = min{g(π), cπ + T g(π)}.
I-7-1
70. We assume that in our G-model
G(0) = π,  ∆G(n) = (1 − π) q^{n−1} p,  0 ≤ π < 1, 0 < p < 1.
In this case, for ϕn = πn/(1 − πn) we have
ϕn = (Ln/(q L_{n−1})) (p + ϕ_{n−1}).
In the case of P^0- and P^∞-i.i.d. observations x1, x2, . . . with densities f 0(x) and f ∞(x) we have
ϕn = (f 0(xn)/(q f ∞(xn))) (p + ϕ_{n−1}).
From here it follows that (ϕn) is a homogeneous Markov sequence (w.r.t. P^G). Since πn = ϕn/(1 + ϕn), we see that (πn) is also a P^G-Markov sequence. So, to solve the optimal stopping problem V∗(π), one can use the general Markovian optimal stopping theory.
I-7-2
71. From this theory it follows that
a) V∗(π) = lim_n Q^n g(π), where g(π) = 1 − π and Qg(π) = min(g(π), cπ + T g(π));
b) the optimal stopping time has the form τ∗ = inf{n ≥ 0 : V∗(πn) = 1 − πn}.
Note that τ∗ < ∞ (Pπ-a.s.) and V∗(π) is concave. So
τ∗ = inf{n : πn ≥ π∗},  where π∗ is the (unique) root of the equation V∗(π) = 1 − π.
We have V∗(π) = lim_n Q^n(1 − π) with
Q(1 − π) = min( (1 − π), cπ + Eπ(1 − π1) )  and  Eπ(1 − π1) = (1 − π)(1 − p).
At π∗ stopping is optimal, i.e., the stopping payoff does not exceed the continuation value:
1 − π∗ ≤ cπ∗ + (1 − π∗)(1 − p),  i.e.,  p(1 − π∗) ≤ cπ∗.
From here we obtain the LOWER ESTIMATE for π∗:
π∗ ≥ p/(c + p).
I-7-3
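Statements a), b) and the lower estimate can be explored numerically. A sketch (Python) of the value iteration V∗ = lim Q^n g for i.i.d. Gaussian observations f ∞ = N(0,1), f 0 = N(1,1); the parameters p, c and all grid/iteration sizes are illustrative assumptions:

```python
import math

p, c = 0.2, 1.0
q = 1.0 - p
NP, NX, DX = 200, 131, 0.1
PI = [i / NP for i in range(NP)]            # pi grid on [0, 1)
XS = [-6.0 + DX * j for j in range(NX)]     # x grid for the integral in Tg

def dens(x, mu):
    # density of N(mu, 1)
    return math.exp(-(x - mu) ** 2 / 2.0) / math.sqrt(2.0 * math.pi)

F0 = [dens(x, 1.0) for x in XS]             # post-change density values
FI = [dens(x, 0.0) for x in XS]             # pre-change density values
LR = [a / b for a, b in zip(F0, FI)]        # likelihood ratios f0(x)/finf(x)

def interp(V, u):
    # piecewise-linear interpolation of V on the pi grid, clipped to [0, PI[-1]]
    u = min(max(u, 0.0), PI[-1])
    i = min(int(u * NP), NP - 2)
    t = u * NP - i
    return (1 - t) * V[i] + t * V[i + 1]

def TV(V, pi):
    # (TV)(pi) = E_pi V(pi_1); x is drawn from the one-step mixture density
    pt = pi + (1 - pi) * p                  # P(theta <= next step | pi)
    phi0 = pi / (1 - pi)
    s = 0.0
    for j in range(NX):
        phi1 = LR[j] * (p + phi0) / q       # recursion for phi_n
        pi1 = phi1 / (1 + phi1)
        s += interp(V, pi1) * (pt * F0[j] + (1 - pt) * FI[j]) * DX
    return s

V = [1 - u for u in PI]                     # start from g(pi) = 1 - pi
for _ in range(60):
    V = [min(1 - u, c * u + TV(V, u)) for u in PI]

# smallest grid point where stopping (V = g) is optimal
pi_star = next(u for u, v in zip(PI, V) if v >= 1 - u - 1e-6)
```

The computed threshold respects the lower estimate π∗ ≥ p/(c + p) up to discretization error, and the identity Eπ(1 − π1) = (1 − π)(1 − p) can be used as a sanity check of the integration.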
72. In Topics II and III we shall consider Variant A for the continuous (diffusion) case; there we shall obtain an explicit formula for π∗.
2. In Variant B,
B(T) = inf_{τ∈MT} Σ_{θ≥1} E^θ(τ − θ)+,  where MT = {τ : E∞τ ≥ T}, T > 0.
We know that Σ_{θ≥1} E^θ(τ − θ)+ = E^∞ Σ_{n=1}^{τ−1} ψn, where ψn = Σ_{θ=1}^{n} Ln/L_{θ−1}. From here
ψn = (Ln/L_{n−1}) (1 + ψ_{n−1}),  ψ0 = 0,
with Ln/L_{n−1} = f 0(xn)/f ∞(xn) in the case of (P^0- and P^∞-) i.i.d. observations.
I-7-4
73. For the more general case
B_F(T) = inf_{τ∈MT} Σ_{θ≥1} E^θ F((τ − θ)+),  with F(n) = Σ_{k=1}^{n} f(k), F(0) = 0, f(k) ≥ 0,
we find that
Σ_{θ≥1} E^θ F((τ − θ)+) = E^∞ Σ_{n=0}^{τ−1} Ψn(f),
where
Ψn(f) = Σ_{θ=0}^{n} f(n + 1 − θ) Ln/L_{θ−1}.
I-7-5
74. ◮ If f(t) = Σ_{m=0}^{M} c_{m0} e^{λ_m t}, then Ψn(f) = c_{00} ψn + Σ_{m=1}^{M} c_{m0} ψ^{(m,0)}_n with
ψn = (Ln/L_{n−1}) (1 + ψ_{n−1}),  ψ_{−1} = 0,
ψ^{(m,0)}_n = e^{λ_m} (Ln/L_{n−1}) (1 + ψ^{(m,0)}_{n−1}),  ψ^{(m,0)}_{−1} = 0.
◮ If f(t) = Σ_{k=0}^{K} c_{0k} t^k, then Ψn(f) = c_{00} ψn + Σ_{k=1}^{K} c_{0k} ψ^{(0,k)}_n with
ψn = (Ln/L_{n−1}) (1 + ψ_{n−1}),  ψ_{−1} = 0,
ψ^{(0,k)}_n = (Ln/L_{n−1}) (1 + Σ_{i=0}^{k} c^k_i ψ^{(0,i)}_{n−1}),  ψ^{(0,k)}_{−1} = 0,
where the c^k_i are binomial coefficients.
I-7-6
75. ◮ For the general case f(t) = Σ_{m=0}^{M} Σ_{k=0}^{K} c_{mk} e^{λ_m t} t^k, λ_0 = 0, we have
Ψn(f) = Σ_{m=0}^{M} Σ_{k=0}^{K} c_{mk} ψ^{(m,k)}_n
with
ψ^{(m,k)}_n = Σ_{θ=0}^{n} e^{λ_m (n+1−θ)} (n + 1 − θ)^k Ln/L_{θ−1}.
The statistics ψ^{(m,k)}_n satisfy the system
ψ^{(m,k)}_n = e^{λ_m} (Ln/L_{n−1}) ( Σ_{i=0}^{k} c^k_i ψ^{(m,i)}_{n−1} + 1 ),  0 ≤ m ≤ M, 0 ≤ k ≤ K.
I-7-7
76. EXAMPLE 1. If F(n) = n, then f(n) ≡ 1 and Ψn(f) = ψn, where ψn = (Ln/L_{n−1})(1 + ψ_{n−1}), ψ_{−1} = 0.
EXAMPLE 2. If F(n) = n² + n, then f(n) = 2n. In this case
Ψn(f) = 2 ψ^{(0,1)}_n  with  ψ^{(0,1)}_n = (Ln/L_{n−1}) (1 + ψ_{n−1} + ψ^{(0,1)}_{n−1}).
Thus
Ψn(f) = (Ln/L_{n−1}) Ψ_{n−1}(f) + 2 ψn.
I-7-8
77. 3. We know that

    B(T) = inf_{τ∈M_T} Σ_{θ≥1} E_θ(τ − θ)^+ = inf_{τ∈M_T} E^∞ Σ_{n=1}^{τ−1} ψ_n,   ψ_n = Σ_{θ=1}^{n} L_n/L_{θ−1}.

By the Lagrange method, to find inf_{τ∈M_T} E^∞ Σ_{n=1}^{τ−1} ψ_n we need to solve
the problem

    inf_τ E^∞ Σ_{n=1}^{τ−1} (ψ_n + c)   (c is a Lagrange multiplier),   (∗)

where the infimum is taken over the class M of all stopping times.
For the i.i.d. case and the geometric distribution the statistics (ψ_n)
form a homogeneous Markov chain.
By the Markovian optimal stopping theory, the optimal stopping
time τ* = τ*(c) for the problem (∗) has the form

    τ*(c) = inf{n : ψ_n ≥ b*(c)}.                                      I-7-9
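The threshold rule τ*(c) is easy to run on data. Below is a small sketch in which the stream of one-step likelihood ratios, the change point, and the threshold b are all synthetic illustration values (not quantities from the lectures): before the change the ratios L_n/L_{n−1} are below 1 and ψ_n stays small; after the change they exceed 1 and ψ_n grows geometrically until it crosses the threshold.

```python
# Threshold stopping rule tau*(c) = inf{n : psi_n >= b} on a synthetic
# stream of one-step likelihood ratios r_n = L_n / L_{n-1}.
CHANGE = 50                             # hypothetical change point (illustration)
ratios = [0.8] * CHANGE + [1.5] * 50    # r_n < 1 before the change, > 1 after
b = 100.0                               # hypothetical threshold b*(c)

psi, tau = 0.0, None
for n, r in enumerate(ratios, start=1):
    psi = r * (1 + psi)                 # psi_n = (L_n/L_{n-1}) * (1 + psi_{n-1})
    if psi >= b:
        tau = n                         # alarm time
        break
```

Before the change ψ_n settles near the fixed point 0.8(1 + ψ) = ψ, i.e. ψ ≈ 4; after the change it grows by a factor of roughly 1.5 per step, so the alarm fires a few observations past n = 50.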
78. Suppose that for given T > 0 we can find c = c(T) such that

    E^∞τ*(c(T)) = T.

Then the stopping time

    τ*(c(T)) = inf{n : ψ_n ≥ b*(c(T))}

will be an optimal stopping time in the problem

    B(T) = inf_{τ∈M_T} Σ_{θ≥1} E_θ(τ − θ)^+.
                                                                       I-7-10
79. § 8. VARIANTS C and D: around the optimal stopping times

1. In Variant C the risk function is

    C(T) = inf_{τ∈M_T} sup_{θ≥1} E_θ(τ − θ | τ ≥ θ)
         ≥ inf_{τ∈M_T} (1/E^∞τ) E^∞ Σ_{n=1}^{τ−1} ψ_n,   where ψ_n = Σ_{θ=1}^{n} L_n/L_{θ−1}   (§ 6, Thm. 5)

and

    E^∞ Σ_{n=1}^{τ−1} ψ_n = Σ_{θ=1}^{∞} E_θ(τ − θ)^+   (Thm. 4).

So, in the class M̄_T = {τ ∈ M_T : E^∞τ = T} we have C(T) ≥ B(T)/T.
For any stopping time τ⁰ ∈ M̄_T

    sup_{θ≥1} E_θ(τ⁰ − θ | τ⁰ ≥ θ) ≥ C(T) ≥ (1/T) B(T).

Thus, if we take a “good” time τ⁰ and find B(T), then we can obtain
a “good” estimate for C(T). In Topic II we consider this              I-8-1
procedure for the case of continuous time (Brownian model).
80. 2. Finally, in § 5 (Theorem 6) it was demonstrated that for any
stopping time τ with E^∞τ < ∞

    sup_{θ≥1} ess sup_ω E_θ[(τ − (θ − 1))^+ | F_{θ−1}] ≥ (E^∞ Σ_{k=0}^{τ−1} γ_k) / (E^∞ Σ_{k=0}^{τ−1} (1 − γ_k)^+).

So

    D(T) ≥ inf_{τ∈M_T} (E^∞ Σ_{k=0}^{τ−1} γ_k) / (E^∞ Σ_{k=0}^{τ−1} (1 − γ_k)^+)
         ≥ (inf_{τ∈M_T} E^∞ Σ_{k=0}^{τ−1} γ_k) / (sup_{τ∈M_T} E^∞ Σ_{k=0}^{τ−1} (1 − γ_k)^+).
                                                                       I-8-2
81. G. Lorden* proved that the CUSUM stopping time

    σ*(T) = inf{n ≥ 0 : γ_n ≥ d*(T)}

is asymptotically optimal as T → ∞ (for the i.i.d. model).
In 1986 G. V. Moustakides** proved that the CUSUM statistic (γ_n) is
optimal for all T < ∞. We consider these problems in § 6 of Topic II
for the case of continuous time (Brownian model).
* “Procedures for reacting to a change in distribution”, Ann. Math. Statist.,
42:6 (1971), 1897–1908.
** “Optimal stopping times for detecting changes in distributions”, Ann.
Statist., 14:4 (1986), 1379–1387.
I-8-3
82. TOPIC II: QUICKEST DETECTION PROBLEMS:
Continuous time. Infinite horizon
§ 1. Introduction
1.1. In the talk we intend to present the basic (from our point of
view) aspects of the problems of the quickest detection of disorders
in the observed data, with emphasis on
the MINIMAX approaches.
As a model of the observed process X = (X_t)_{t≥0} with a disorder, we
consider the scheme of a Brownian motion with changing drift.
More exactly, we assume that

    X_t = µ(t − θ)^+ + σB_t,   or   dX_t = { σ dB_t,          t < θ,
                                           { µ dt + σ dB_t,   t ≥ θ,

where θ is a hidden parameter which can be a random
variable or some parameter with values in R_+ = [0, ∞].                II-1-1
83. 1.2. A discrete-time analogue of the process X = (Xt )t≥0 is a model
X = (X1, X2, . . . , Xθ−1, Xθ , Xθ+1, . . .),
where, for a given θ,
X1, X2, . . . , Xθ−1 are i.i.d. with the distribution F∞ and
Xθ , Xθ+1, . . . are i.i.d. with the distribution F0.
We recall that Walter A. Shewhart was the first to use this model,
in the 1920–30s, for the description of the quality of manufactured
products. The so-called ‘control charts’ proposed by him are still
widely used in industry.
II-1-2
84. The idea of his method of control can be illustrated by his own
EXAMPLE: Suppose that
for n < θ the r.v.’s Xn are N (µ∞, σ 2) and
for n ≥ θ the r.v.’s Xn are N (µ0, σ 2), where µ0 > µ∞.
Shewhart proposed to declare an alarm about the appearance of a
disorder (‘change-point’) at the time

    τ = inf{n ≥ 1 : X_n − µ_0 ≥ 3σ}.
II-1-3
85. He did not give explanations whether this (stopping, Markov) time
is optimal, and much later it was shown that:
If θ has a geometric distribution,

    P(θ = 0) = π,   P(θ = k | θ > 0) = q^{k−1}p,

then in the problem

    inf_τ P(τ ≠ θ)                                                     (∗)

the time τ* = inf{n ≥ 1 : X_n ≥ c*(µ_0, µ_∞, σ², p)}, where
c* = c*(µ_0, µ_∞, σ², p) = const, is optimal for criterion (∗).
Here the optimal decision about declaring an alarm at time n depends
only on X_n. However, for more complicated models the optimal
stopping time will depend not only on the last observation X_n but
on the whole past history (X_1, . . . , X_n).
                                                                       II-1-4
86. 1.3. This remark was used in the 1950s by E. S. Page, who proposed
new control charts, now well known as the CUSUM (CUmulative
SUMs) method. In view of the great importance of this method, we
recall its construction (for the discrete-time case).
NOTATION:
P^0_n and P^∞_n are the distributions of the sequences (X_1, . . . , X_n), n ≥ 1,
under the assumptions that θ = 0 and θ = ∞, resp.; P̄_n = ½(P^0_n + P^∞_n);

    L^0_n = dP^0_n/dP̄_n,   L^∞_n = dP^∞_n/dP̄_n,   L_n = L^0_n/L^∞_n   (the likelihood ratios);

    L^θ_n = I(n < θ) L^∞_n + I(n ≥ θ) L^0_n (L^∞_{θ−1}/L^0_{θ−1})

(L^0_{−1} = L^∞_{−1} = 1, L^0_0 = L^∞_0 = 1).
                                                                       II-1-5
87. For the GENERAL DISCRETE-TIME SCHEMES:

◮ the Shewhart method of control charts is based on the statistics

    S_n = L^n_n/L^∞_n,   n ≥ 0;

◮ the CUSUM method is based on the statistics

    γ_n = max_{θ≥0} L^θ_n/L^∞_n,   n ≥ 0.
                                                                       II-1-6
88. It is easy to find that since L^θ_n = L^∞_n for θ > n, we have

    γ_n = max( 1, max_{0≤θ≤n} L^θ_n/L^∞_n ),

or

    γ_n = max( 1, max_{0≤θ≤n} (L^0_n/L^∞_n)(L^∞_{θ−1}/L^0_{θ−1}) ) = max( 1, max_{0≤θ≤n} L_n/L_{θ−1} ),

where L_{−1} = 1 and L_n, n ≥ 0, is defined as follows:

◮ if P^0_n ≪ P^∞_n, then L_n is the likelihood:

    L_n := dP^0_n/dP^∞_n   (the Radon–Nikodým derivative);

◮ in the general case L_n is the Lebesgue derivative from the Lebesgue
decomposition

    P^0_n(A) = E^∞ L_n I(A) + P^0_n(A ∩ {L^∞_n = 0}),

and is denoted again by dP^0_n/dP^∞_n, i.e., dP^0_n/dP^∞_n := L_n.
                                                                       II-1-7
89. Let Z_n = log L_n and T_n = log γ_n. Then we see that

    T_n = max( 0, Z_n − min_{0≤θ≤n−1} Z_θ ),

whence we find that

    T_n = Z_n − min_{0≤θ≤n} Z_θ   and   T_n = max(0, T_{n−1} + ∆Z_n),

where T_0 = 0, ∆Z_n = Z_n − Z_{n−1} = log(L_n/L_{n−1}).
(In § 6, we shall discuss the corresponding formulae for the continuous-
time Brownian model. The question about optimality of the CUSUM
method will also be considered.)
                                                                       II-1-8
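The two expressions for T_n are easy to confirm numerically: the one-pass recursion T_n = max(0, T_{n−1} + ∆Z_n) reproduces the running-minimum form Z_n − min_{0≤θ≤n} Z_θ on any log-likelihood path (a minimal sketch; the increments below are synthetic Gaussian values, not tied to a particular model):

```python
# CUSUM identity check: T_n = max(0, T_{n-1} + dZ_n), T_0 = 0, equals
# T_n = Z_n - min_{0 <= theta <= n} Z_theta.
import random

random.seed(3)
Z = [0.0]                                 # synthetic log-likelihood path Z_0..Z_200
for _ in range(200):
    Z.append(Z[-1] + random.gauss(0.0, 1.0))

# One-pass CUSUM recursion.
T_rec, T = [0.0], 0.0
for n in range(1, len(Z)):
    T = max(0.0, T + (Z[n] - Z[n - 1]))
    T_rec.append(T)

# Direct running-minimum form.
T_dir, running_min = [], float("inf")
for z in Z:
    running_min = min(running_min, z)
    T_dir.append(z - running_min)

max_err = max(abs(a - b) for a, b in zip(T_rec, T_dir))
```

The recursion is what makes CUSUM practical: it needs only the previous value T_{n−1} and the new increment, not the whole path.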
90. § 2. Four basic formulations (VARIANTS A, B, C, D)
of the quickest detection problems for the
Brownian case

Recall our basic model

    dX_t = { σ dB_t,          t < θ,
           { µ dt + σ dB_t,   t ≥ θ,

where • µ ≠ 0, σ > 0 (µ and σ are known),
      • B = (B_t)_{t≥0} is a standard (EB_t = 0, EB_t² = t)
        Brownian motion, and
      • θ is the time of appearance of a disorder, θ ∈ [0, ∞].
                                                                       II-2-1
91. VARIANT A. Here
θ = θ(ω) is a random variable with values in R_+ = [0, ∞], and
τ = τ(ω) are stopping (Markov) times w.r.t. the filtration (F_t)_{t≥0},
where F_t = F^X_t = σ(X_s, s ≤ t).

• Conditionally variational formulation:
In the class M_α = {τ : P(τ ≤ θ) ≤ α}, where α is a given number
from (0, 1), to find an optimal stopping time τ*_α for which

    E(τ*_α − θ | τ*_α ≥ θ) = inf_{τ∈M_α} E(τ − θ | τ ≥ θ).

• Bayesian formulation: To find

    A*(c) = inf_τ [ P(τ < θ) + c E(τ − θ)^+ ]

and an optimal stopping time τ*_{(c)} (if it exists) for which

    P(τ*_{(c)} < θ) + c E(τ*_{(c)} − θ)^+ = A*(c).
                                                                       II-2-2
92. VARIANT B (Generalized Bayesian). Notation:
M_T = {τ : E^∞τ = T}  [the class of stopping times τ for which the
mean time E^∞τ of τ, under the assumption that
there was no change point (disorder) at all,
equals a given constant T].

The problem is to find a stopping time τ*_T in the class M_T for
which

    inf_{τ∈M_T} (1/T) ∫_0^∞ E_θ(τ − θ)^+ dθ = (1/T) ∫_0^∞ E_θ(τ*_T − θ)^+ dθ.

We call this variant of the quickest detection problem
generalized Bayesian,
because the integration w.r.t. dθ can be considered as the integration
w.r.t. the “generalized uniform” distribution on R_+.
                                                                       II-2-3
93. In § 4 we describe the structure of the optimal stopping time τ*_T and
calculate the value

    B(T) = (1/T) ∫_0^∞ E_θ(τ*_T − θ)^+ dθ.

These results will be very useful for the description of the asympto-
tically (T → ∞) optimal method for the minimax

VARIANT C. To find

    C(T) = inf_{τ∈M_T} sup_{θ≥0} E_θ(τ − θ | τ ≥ θ)

and an optimal stopping time τ̄_T if it exists.
Notice that the problem of finding C(T) and τ̄_T is not solved yet.
We do not even know whether τ̄_T exists. However, for large T it
is possible to find an asymptotically optimal stopping time and an
asymptotic expansion of C(T) up to terms which vanish as T → ∞.
                                                                       II-2-4
94. The following minimax criterion is well known as the Lorden criterion.

VARIANT D. To find

    D(T) = inf_{τ∈M_T} sup_{θ≥0} ess sup_ω E_θ[(τ − θ)^+ | F_θ](ω)

(this value can be interpreted as the “worst-case” mean detection delay).

Below in § 6 we give the sketch of the proof that the optimal
stopping time is of the form

    τ_T = inf{t ≥ 0 : γ_t ≥ d(T)},

where (γ_t)_{t≥0} is the corresponding CUSUM statistic.
                                                                       II-2-5
95. § 3. VARIANT A

3.1. Under the assumption that θ = θ(ω) is a random variable with
exponential distribution,

    P(θ = 0) = π,   P(θ > t | θ > 0) = e^{−λt}

(π ∈ [0, 1) and λ > 0 is known),
the problems of finding both the optimal stopping times τ*_{(c)}, τ*_α and
the values

    A*(c) = P(τ*_{(c)} < θ) + c E(τ*_{(c)} − θ)^+,   A*(α) = E(τ*_α − θ | τ*_α ≥ θ)

were solved by the author a long time ago.
We recall here the main steps of the solution, since they will be
useful also for solving the problem of Variant B.
                                                                       II-3-1
96. Introducing the a posteriori probability

    π_t = P(θ ≤ t | F_t),   π_0 = π,

we find that for any stopping time τ the following Markov representa-
tion holds:

    P(τ < θ) + c E(τ − θ)^+ = E_π[ (1 − π_τ) + c ∫_0^τ π_t dt ],

where E_π stands for the expectation w.r.t. the distribution P_π of
the process X with π_0 = π. The process (π_t)_{t≥0} has the stochastic
differential

    dπ_t = ( λ − (µ²/σ²) π_t² ) (1 − π_t) dt + (µ/σ²) π_t(1 − π_t) dX_t.
                                                                       II-3-2
97. The process (X_t)_{t≥0} admits the so-called innovation representation

    X_t = µ ∫_0^t π_s ds + σB̄_t,   i.e.,   dX_t = µπ_t dt + σ dB̄_t,

where

    B̄_t = B_t + (µ/σ) ∫_0^t ( I(θ ≤ s) − π_s ) ds

is a Brownian motion w.r.t. (F^X_t)_{t≥0}. So, one can find that

    dπ_t = λ(1 − π_t) dt + (µ/σ) π_t(1 − π_t) dB̄_t.

Consequently, the process (π_t, F^X_t)_{t≥0} is a diffusion Markov process
and the problem

    inf_τ [ P(τ < θ) + c E(τ − θ)^+ ] = inf_τ E_π[ (1 − π_τ) + c ∫_0^τ π_t dt ]   ( ≡ V*(π) )

is a problem of optimal stopping for a diffusion Markov             II-3-3
process (π_t, F^X_t)_{t≥0}.
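The filtering SDE for π_t is straightforward to simulate. Below is a rough Euler–Maruyama sketch driven by the innovation Brownian motion (all parameter values are illustrative; the clipping to [0, 1] is purely a numerical safeguard, since the exact solution stays in [0, 1] automatically):

```python
# Euler-Maruyama sketch for
#   d pi_t = lambda*(1 - pi_t) dt + (mu/sigma)*pi_t*(1 - pi_t) dB_t,
# with B the innovation Brownian motion.
import math
import random

random.seed(4)
lam, mu, sigma = 1.0, 1.0, 1.0       # illustrative parameters
pi0, T_end, N = 0.1, 5.0, 5000
dt = T_end / N

pi = pi0
path = [pi]
for _ in range(N):
    dB = random.gauss(0.0, math.sqrt(dt))
    pi += lam * (1 - pi) * dt + (mu / sigma) * pi * (1 - pi) * dB
    pi = min(1.0, max(0.0, pi))       # numerical safeguard: keep pi in [0, 1]
    path.append(pi)
```

Note the two restoring mechanisms visible in the coefficients: the drift λ(1 − π_t) pushes π_t toward 1 (the disorder eventually occurs), while the diffusion coefficient π_t(1 − π_t) vanishes at the endpoints, which is why the exact process never leaves [0, 1].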
98. To solve this problem, we consider (ad hoc) the corresponding
STEFAN (free-boundary) PROBLEM
(for unknown V(π) and A):

    V(π) = 1 − π,    π ≥ A,
    𝔸V(π) = −cπ,     π < A,

where 𝔸 is the infinitesimal operator of the process (π_t, F^X_t)_{t≥0}:

    𝔸 = λ(1 − π) d/dπ + ½ (µ/σ)² π²(1 − π)² d²/dπ².
                                                                       II-3-4
99. The general solution of the equation

    𝔸V(π) = −cπ

for π < A contains two undetermined constants (say, C₁ and C₂). So,
we have three unknown constants A, C₁, C₂ and only one additional
condition:

  (1)  V(π) = 1 − π for π ≥ A   (this condition is natural, since V(π),
       0 < π < 1, is continuous as a concave function of π).

It turned out that the two other conditions are:

  (2)  (smooth fit):  dV(π)/dπ |_{π↑A} = dV₀(π)/dπ |_{π↓A}   with V₀(π) = 1 − π,

  (3)  dV/dπ |_{π↓0} = 0.

These three conditions allow us to find a unique solution (V(π), A)
of the free-boundary problem.
                                                                       II-3-5
100. Solving the problem (1)–(2)–(3), we get

    V(π) = { (1 − A) − ∫_π^A y(x) dx,   π ∈ [0, A),
           { 1 − π,                     π ∈ [A, 1],

where

    y(x) = −C ∫_0^x e^{−Λ[G(x)−G(u)]} du/(u(1 − u)²),   G(u) = log(u/(1 − u)) − 1/u,

    Λ = λ/ρ,   C = c/ρ,   ρ = µ²/(2σ²).

The boundary point A = A(c) can be found from the equation

    C ∫_0^A e^{−Λ[G(A)−G(u)]} du/(u(1 − u)²) = 1.                      (⋆)
                                                                       II-3-6
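The equation for the boundary point, C ∫_0^A e^{−Λ[G(A)−G(u)]} du/(u(1 − u)²) = 1, is a one-dimensional root-finding problem. The sketch below uses illustrative parameter values, a midpoint quadrature, and bisection under the assumption (consistent with uniqueness of A(c)) that the left-hand side is increasing in A; it also lets one observe numerically the inequality A > λ/(c + λ) used on slide II-3-10:

```python
# Solve C * I(A) = 1 for the boundary point A, where
#   I(A) = integral_0^A exp(-Lam*(G(A) - G(u))) du / (u*(1-u)^2),
#   G(u) = log(u/(1-u)) - 1/u,  Lam = lambda/rho,  C = c/rho,  rho = mu^2/(2*sigma^2).
import math

lam, c, mu, sigma = 1.0, 1.0, math.sqrt(2.0), 1.0   # illustrative values; rho = 1
rho = mu * mu / (2 * sigma * sigma)
Lam, C = lam / rho, c / rho

def G(u):
    return math.log(u / (1 - u)) - 1 / u

def I(A, n=4000):
    # Midpoint rule; the integrand vanishes rapidly as u -> 0 (G(u) -> -inf).
    h = A / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        total += math.exp(-Lam * (G(A) - G(u))) / (u * (1 - u) ** 2)
    return total * h

# Bisection on F(A) = C*I(A) - 1, assumed increasing in A on (0, 1).
lo, hi = 1e-3, 1.0 - 1e-3
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if C * I(mid) < 1.0:
        lo = mid
    else:
        hi = mid
A = 0.5 * (lo + hi)
```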
101. Let us show that
– the solution V(π) coincides with the value function
  V*(π) = inf_τ [ P(τ < θ) + c E(τ − θ)^+ ], and
– the stopping time τ̄ = inf{t : π_t > A} coincides with the
  optimal stopping time τ*,
i.e.,

    V*(π) = P(τ̄ < θ) + c E(τ̄ − θ)^+.

To this end we shall use the ideas of the “Verification lemma”
(see Lemma 1 in § 2 of Topic IV).
                                                                       II-3-7
102. Denote Ȳ_t = V(π_t) + c ∫_0^t π_s ds. Let us show that

    (Ȳ_t) is a P_π-submartingale for every π ∈ [0, 1].

By the Itô–Meyer formula,

    Ȳ_t = V(π_t) + c ∫_0^t π_s ds = V(π) + ∫_0^t 𝔸V(π_s) ds + c ∫_0^t π_s ds + M_t,

where (M_t)_{t≥0} is a P_π-martingale for every π ∈ [0, 1]. Here

    𝔸V(π) = −cπ   (for π < A),
    𝔸V(π) = 𝔸(1 − π) = −λ(1 − π)   (for π > A).
                                                                       II-3-8
103. Since π_t = π + λ ∫_0^t (1 − π_s) ds + (µ/σ) ∫_0^t π_s(1 − π_s) dB̄_s, we get,
by the Itô–Meyer formula,

    Ȳ_t = V(π) + ∫_0^t 𝔸V(π_s) ds + c ∫_0^t π_s ds + M_t
        = V(π) + ∫_0^t [ I(π_s < A)(−cπ_s) + I(π_s > A)(−λ(1 − π_s)) + cπ_s ] ds + M′_t,

where (M′_t)_{t≥0} is a P_π-martingale for each π ∈ [0, 1].

REMARK. The function V(π) does not belong to the class C². That is
why instead of the Itô formula we use the Itô–Meyer formula, which
was obtained by P.-A. Meyer for functions which are a difference of
two concave (convex) functions.
Our function V(π) is concave. At the point π = A, instead of the usual
second derivative V″(A) in 𝔸V(π) we must take V″₋(A) (the left limit).
                                                                       II-3-9
104. Thus,

    Ȳ_t = V(π) + ∫_0^t I(π_s > A) [ cπ_s − λ(1 − π_s) ] ds + M′_t
        = V(π) + ∫_0^t I(π_s > A) [ (c + λ)π_s − λ ] ds + M′_t.        (∗∗)

From the solution of the free-boundary problem we can conclude
that the boundary point A (see the equation for A(c) on slide II-3-6)
satisfies the inequality

    A > λ/(c + λ).

From here and (∗∗) we see that

    ∫_0^t I(π_s > A) [ (c + λ)π_s − λ ] ds ≥ ∫_0^t I(π_s > A) [ (c + λ)A − λ ] ds ≥ 0.

Hence (Ȳ_t)_{t≥0} is a P_π-submartingale for all π ∈ [0, 1].
                                                                       II-3-10