Deep Kalman Filters
¤ Rahul G. Krishnan, Uri Shalit, David Sontag (arXiv, 2015/11/16)
¤ Deep Learning + Kalman Filter
¤ VAE
¤ Deep…
• We derive a bound on the log-likelihood of sequential data and an algorithm to learn a broad class of Kalman filters.
• We evaluate the efficacy of different recognition distributions for inference and learning.
• We consider this model for use in counterfactual inference with emphasis on the medical setting. To the best of our knowledge, the use of continuous state space models has not been considered for this goal. On a synthetic setting we empirically validate that our model is able to capture patterns within a very noisy setting and model the effect of external actions. On real patient data we show that our model can successfully perform counterfactual inference to show the effect of anti-diabetic drugs on diabetic patients.

2 Background

Kalman Filters   Assume we have a sequence of unobserved variables $z_1, \dots, z_T \in \mathbb{R}^s$. For each unobserved variable $z_t$ we have a corresponding observation $x_t \in \mathbb{R}^d$, and a corresponding action $u_t \in \mathbb{R}^c$, which is also observed. In the medical domain, the variables $z_t$ might denote the true state of a patient, the observations $x_t$ indicate known diagnoses and lab test results, and the actions $u_t$ correspond to prescribed medications and medical procedures which aim to change the state of the patient. The classical Kalman filter models the observed sequence $x_1, \dots, x_T$ as follows:

$$z_t = G_t z_{t-1} + B_t u_{t-1} + \epsilon_t \;\text{(action-transition)}, \qquad x_t = F_t z_t + \eta_t \;\text{(observation)},$$

where $\epsilon_t \sim \mathcal{N}(0, \Sigma_t)$, $\eta_t \sim \mathcal{N}(0, \Gamma_t)$ are zero-mean i.i.d. normal random variables, with covariance matrices which may vary with $t$. This model assumes that the latent space evolves linearly, transformed at time $t$ by the state-transition matrix $G_t \in \mathbb{R}^{s \times s}$. The effect of the control signal $u_t$ is an additive linear transformation of the latent state obtained by adding the vector $B_t u_{t-1}$, where $B_t \in \mathbb{R}^{s \times c}$ is known as the control-input model. Finally, the observations are generated linearly from the latent state via the observation matrix $F_t \in \mathbb{R}^{d \times s}$.

In the following sections, we show how to replace all the linear transformations with non-linear transformations parameterized by neural nets. The upshot is that the non-linearity makes learning much more challenging, as the posterior distribution $p(z_1, \dots, z_T \mid x_1, \dots, x_T, u_1, \dots, u_T)$ becomes intractable to compute.
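To make the linear-Gaussian dynamics above concrete, here is a minimal simulation sketch (Python/NumPy, not from the paper); the dimensions, matrices, and noise scales are illustrative assumptions.

```python
import numpy as np

def simulate_lgssm(T=20, s=4, c=2, d=6, seed=0):
    """Simulate z_t = G z_{t-1} + B u_{t-1} + eps_t,  x_t = F z_t + eta_t
    with time-invariant, arbitrarily chosen G, B, F (illustrative only)."""
    rng = np.random.default_rng(seed)
    G = 0.9 * np.eye(s)                      # state-transition matrix
    B = 0.1 * rng.standard_normal((s, c))    # control-input model
    F = rng.standard_normal((d, s))          # observation matrix
    u = rng.standard_normal((T, c))          # observed actions
    z = np.zeros((T, s))
    x = np.zeros((T, d))
    z[0] = rng.standard_normal(s)
    x[0] = F @ z[0] + 0.1 * rng.standard_normal(d)
    for t in range(1, T):
        z[t] = G @ z[t - 1] + B @ u[t - 1] + 0.1 * rng.standard_normal(s)  # eps_t
        x[t] = F @ z[t] + 0.1 * rng.standard_normal(d)                     # eta_t
    return z, x, u

z, x, u = simulate_lgssm()
print(z.shape, x.shape, u.shape)   # (20, 4) (20, 6) (20, 2)
```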
Stochastic Backpropagation
¤ SGVB (stochastic gradient variational Bayes)
¤ Variational autoencoder (VAE)
Stochastic Backpropagation   In order to overcome the intractability of posterior inference, we make use of recently introduced variational autoencoders (Rezende et al., 2014; Kingma & Welling, 2013) to optimize a variational lower bound on the model log-likelihood. The key technical innovation is the introduction of a recognition network, a neural network which approximates the intractable posterior.

Let $p(x, z) = p_0(z) p_\theta(x \mid z)$ be a generative model for the set of observations $x$, where $p_0(z)$ is the prior on $z$ and $p_\theta(x \mid z)$ is a generative model parameterized by $\theta$. In a model such as the one we posit, the posterior distribution $p_\theta(z \mid x)$ is typically intractable. Using the well-known variational principle, we posit an approximate posterior distribution $q_\phi(z \mid x)$, also called a recognition model; see Figure 1a. We then obtain the following lower bound on the marginal likelihood:

$$\log p_\theta(x) = \log \int_z \frac{q_\phi(z \mid x)}{q_\phi(z \mid x)} p_\theta(x \mid z) p_0(z) \, dz \ge \int_z q_\phi(z \mid x) \log \frac{p_\theta(x \mid z) p_0(z)}{q_\phi(z \mid x)} \, dz = \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)] - \mathrm{KL}(q_\phi(z \mid x) \,\|\, p_0(z)) = \mathcal{L}(x; (\theta, \phi)), \quad (1)$$

where the inequality is by Jensen's inequality. Variational autoencoders aim to maximize the lower bound using a parametric model $q_\phi$ conditioned on the input. Specifically, Rezende et al. (2014) and Kingma & Welling (2013) both suggest using a neural net to parameterize $q_\phi$, such that $\phi$ are the parameters of the neural net. The challenge in the resulting optimization problem is that the lower bound (1) includes an expectation w.r.t. $q_\phi$, which implicitly depends on the network parameters $\phi$. This difficulty is overcome by using stochastic backpropagation: assuming that the latent state is normally distributed $q_\phi(z \mid x) \sim \mathcal{N}(\mu_\phi(x), \Sigma_\phi(x))$, a simple transformation allows one to obtain Monte Carlo estimates of the gradients of $\mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)]$ with respect to $\phi$. The KL term in (1) can be estimated similarly since it is also an expectation. If we assume that the prior $p_0(z)$ is also normally distributed, the KL and its gradients may be obtained analytically.
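As a concrete illustration of stochastic backpropagation, below is a minimal single-sample ELBO estimate for a Gaussian recognition model $q_\phi(z \mid x)$ with a standard normal prior, written as a PyTorch sketch; the two-layer networks, sizes, and Bernoulli likelihood are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ToyVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16, h_dim=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.enc_mu = nn.Linear(h_dim, z_dim)
        self.enc_logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def elbo(self, x):
        h = self.enc(x)
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        # Reparameterization: z = mu + sigma * eps keeps the sample differentiable in phi.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        logits = self.dec(z)
        # One Monte Carlo sample of E_q[log p_theta(x|z)] with a Bernoulli likelihood.
        log_px = -nn.functional.binary_cross_entropy_with_logits(
            logits, x, reduction="none").sum(-1)
        # KL(q_phi(z|x) || N(0, I)) in closed form for diagonal Gaussians.
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        return (log_px - kl).mean()

x = torch.rand(8, 784)          # dummy batch
loss = -ToyVAE().elbo(x)        # maximize the ELBO = minimize its negative
loss.backward()
```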
Counterfactual Estimation
¤ Counterfactual estimation is the task of inferring what would have happened had a different action been taken
¤ Pearl (2009): the do-operator
¤ $p(y \mid x) \;\to\; p(y \mid do(x))$
¤ Conditioning on an observed $x$ vs. setting $x$ by intervention
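The difference between $p(y \mid x)$ and $p(y \mid do(x))$ can be seen on a toy structural model with a confounder. This Monte Carlo sketch (NumPy, purely illustrative and unrelated to the paper's data) shows that conditioning on $x = 1$ and intervening to set $x = 1$ give different means for $y$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
w = rng.standard_normal(n)                        # unobserved confounder
x = w + 0.5 * rng.standard_normal(n)              # x is caused by w
y = x + 2.0 * w + 0.1 * rng.standard_normal(n)    # y is caused by x and w

# p(y | x ~ 1): conditioning selects samples where the confounder w also tends to be large.
cond = y[np.abs(x - 1.0) < 0.05].mean()

# p(y | do(x = 1)): intervene by overriding the mechanism for x, keeping w as it is.
x_do = np.ones(n)
y_do = x_do + 2.0 * w + 0.1 * rng.standard_normal(n)
interv = y_do.mean()

print(f"E[y | x=1]     ~ {cond:.2f}")    # ~ 2.6 (confounding bias)
print(f"E[y | do(x=1)] ~ {interv:.2f}")  # ~ 1.0 (causal effect of x alone)
```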
Model

Our goal is to fit a generative model to a sequence of observations and actions, motivated by the nature of patient health record data. We assume that the observations come from a latent state which evolves over time. We assume the observations are a noisy, non-linear function of this latent state. Finally, we also assume that we can observe actions, which affect the latent state in a possibly non-linear manner.

Denote the sequence of observations $\vec{x} = (x_1, \dots, x_T)$ and actions $\vec{u} = (u_1, \dots, u_{T-1})$, with corresponding latent states $\vec{z} = (z_1, \dots, z_T)$. As previously, we assume that $x_t \in \mathbb{R}^d$, $u_t \in \mathbb{R}^c$, and $z_t \in \mathbb{R}^s$. The generative model for the deep Kalman filter is then given by:

$$z_1 \sim \mathcal{N}(\mu_0; \Sigma_0), \qquad z_t \sim \mathcal{N}(G_\alpha(z_{t-1}, u_{t-1}, \Delta_t), S_\beta(z_{t-1}, u_{t-1}, \Delta_t)), \qquad x_t \sim \Pi(F_\kappa(z_t)). \quad (2)$$

That is, we assume that the distribution of the latent states is Normal, with a mean and covariance which are nonlinear functions of the previous latent state, the previous actions, and the time difference $\Delta_t$.
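A minimal sketch of the generative model in Eq. (2) as PyTorch modules: $G_\alpha$ and $S_\beta$ are small MLPs over $(z_{t-1}, u_{t-1})$, and $F_\kappa$ emits Bernoulli probabilities. The hidden sizes, the choice of MLPs, and ignoring $\Delta_t$ (constant time step) are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class DeepKalmanFilter(nn.Module):
    """Generative sketch:  z_1 ~ N(mu_0, Sigma_0),
    z_t ~ N(G_alpha(z_{t-1}, u_{t-1}), S_beta(z_{t-1}, u_{t-1})),  x_t ~ Bernoulli(F_kappa(z_t))."""
    def __init__(self, z_dim=8, u_dim=2, x_dim=64, h_dim=64):
        super().__init__()
        self.mu0 = nn.Parameter(torch.zeros(z_dim))
        self.logvar0 = nn.Parameter(torch.zeros(z_dim))
        self.G_trunk = nn.Sequential(nn.Linear(z_dim + u_dim, h_dim), nn.Tanh())
        self.S_trunk = nn.Sequential(nn.Linear(z_dim + u_dim, h_dim), nn.Tanh())
        self.G = nn.Linear(h_dim, z_dim)             # transition mean G_alpha
        self.S = nn.Linear(h_dim, z_dim)             # log of the diagonal covariance S_beta
        self.F = nn.Sequential(nn.Linear(z_dim, h_dim), nn.Tanh(),
                               nn.Linear(h_dim, x_dim))   # emission F_kappa (logits)

    def sample(self, u):                             # u: (batch, T-1, u_dim)
        batch, Tm1, _ = u.shape
        z = self.mu0 + torch.exp(0.5 * self.logvar0) * torch.randn(batch, self.mu0.numel())
        zs, xs = [z], [torch.bernoulli(torch.sigmoid(self.F(z)))]
        for t in range(Tm1):
            zu = torch.cat([z, u[:, t]], dim=-1)
            mean = self.G(self.G_trunk(zu))
            logvar = self.S(self.S_trunk(zu))        # log-parameterized diagonal covariance
            z = mean + torch.exp(0.5 * logvar) * torch.randn_like(mean)
            zs.append(z)
            xs.append(torch.bernoulli(torch.sigmoid(self.F(z))))
        return torch.stack(zs, 1), torch.stack(xs, 1)   # (batch, T, z_dim), (batch, T, x_dim)

dkf = DeepKalmanFilter()
z, x = dkf.sample(torch.randn(4, 9, 2))   # 4 sequences of length T = 10
```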
$\mu_0$, $\Sigma_0$ and $\alpha$, $\beta$, $\kappa$ denote the parameters of the generative model. We use a diagonal covariance matrix $S_\beta(\cdot)$, and employ a log-parameterization, thus ensuring that the covariance matrix is positive-definite. The model is presented in Figure 1b, along with the recognition model $q_\phi$ which we outline in Section 5.

The key point here is that Eq. (2) subsumes a large family of linear and non-linear latent space models. By restricting the functional forms of $G_\alpha$, $S_\beta$, $F_\kappa$, we can train different kinds of Kalman filters within the framework we propose. For example, by setting $G_\alpha(z_{t-1}, u_{t-1}) = G_t z_{t-1} + B_t u_{t-1}$, $S_\beta = \Sigma_t$, $F_\kappa = F_t z_t$, where $G_t, B_t, \Sigma_t, F_t$ are matrices, we obtain classical Kalman filters. In the past, modifications to the Kalman filter typically introduced a new learning algorithm and heuristics to approximate the posterior more accurately. In contrast, within the framework we propose any parametric differentiable function can be substituted in for one of $G_\alpha$, $S_\beta$, $F_\kappa$. Learning any such model can be done using backpropagation as will be detailed in the next section.
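For instance, restricting $G_\alpha$, $S_\beta$, $F_\kappa$ to the linear/constant forms below recovers the classical Kalman filter generative process of Section 2. This is only an illustrative sketch with made-up dimensions, not the paper's code.

```python
import torch
import torch.nn as nn

class LinearDKF(nn.Module):
    """Linear special case: G_alpha(z, u) = G z + B u,  S_beta = const diag,  F_kappa(z) = F z."""
    def __init__(self, z_dim=4, u_dim=2, x_dim=6):
        super().__init__()
        self.G = nn.Linear(z_dim, z_dim, bias=False)    # state-transition matrix G_t
        self.B = nn.Linear(u_dim, z_dim, bias=False)    # control-input model B_t
        self.F = nn.Linear(z_dim, x_dim, bias=False)    # observation matrix F_t
        self.logvar = nn.Parameter(torch.zeros(z_dim))  # fixed log-diagonal covariance

    def transition(self, z_prev, u_prev):
        mean = self.G(z_prev) + self.B(u_prev)
        return mean, self.logvar.expand_as(mean)

    def emission_mean(self, z):
        return self.F(z)
```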
Figure 1: (a) Learning static generative models (variational autoencoder). Solid lines denote the generative model $p_0(z)p_\theta(x \mid z)$, dashed lines denote the variational approximation $q_\phi(z \mid x)$ to the intractable posterior $p(z \mid x)$. The variational parameters $\phi$ are learned jointly with the generative model parameters $\theta$. (b) Learning in a Deep Kalman Filter. A parametric approximation to $p_\theta(\vec{z} \mid \vec{x})$, denoted $q_\phi(\vec{z} \mid \vec{x}, \vec{u})$, is used to perform inference during learning.
Learning using Stochastic Backpropagation

Maximizing a Lower Bound

We aim to fit the generative model parameters $\theta$ which maximize the conditional likelihood of the observations given the external actions, i.e. we desire $\max_\theta \log p_\theta(x_1, \dots, x_T \mid u_1, \dots, u_{T-1})$. Using the variational principle, we apply the lower bound on the log-likelihood of the observations $\vec{x}$ derived in Eq. (1). We consider the extension of Eq. (1) to the temporal setting where we use the following factorization of the prior:

$$q_\phi(\vec{z} \mid \vec{x}, \vec{u}) = \prod_{t=1}^{T} q(z_t \mid z_{t-1}, x_t, \dots, x_T, \vec{u}) \quad (3)$$

We motivate this structured factorization of $q_\phi$ in Section 5.2. We condition the variational approximation not just on the inputs $\vec{x}$ but also on the actions $\vec{u}$.

Our goal is to derive a lower bound to the conditional log-likelihood in a form that will factorize and make learning more amenable. The lower bound in Eq. (1) has an analytic form of the KL term only for the simplest of transition models $G_\alpha$, $S_\beta$. Resorting to sampling for estimating the gradient of the KL term results in very high variance. Below we show another way to factorize the KL term which results in more stable gradients, by using the Markov property of our model.
Algorithm 1 Learning Deep Kalman Filters
while notConverged() do
    $\vec{x}$ ← sampleMiniBatch()
    Perform inference and estimate likelihood:
    1. $\hat{z} \sim q_\phi(\vec{z} \mid \vec{x}, \vec{u})$
    2. $\hat{x} \sim p_\theta(\vec{x} \mid \hat{z})$
    3. Compute $\nabla_\theta \mathcal{L}$ and $\nabla_\phi \mathcal{L}$ (differentiating (5))
    4. Update $\theta$, $\phi$ using ADAM
end while
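A sketch of Algorithm 1 as a plain training loop. The `elbo(x, u)` callable is assumed to return the bound of Eq. (5) for one mini-batch (sampling $\hat{z} \sim q_\phi$ and evaluating $p_\theta$ inside it via stochastic backpropagation); this interface and the toy stand-in below are assumptions, not the authors' code.

```python
import torch

def train_dkf(elbo, params, data_loader, epochs=10, lr=1e-3):
    """Algorithm 1 sketch: maximize the lower bound L(x; (theta, phi)) with ADAM."""
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):                 # "while notConverged()"
        for x, u in data_loader:            # sampleMiniBatch()
            loss = -elbo(x, u)              # negative bound from Eq. (5)
            opt.zero_grad()
            loss.backward()                 # backprop + stochastic backprop
            opt.step()                      # update theta, phi using ADAM
    return params

# Toy usage with a stand-in quadratic "bound" so the loop runs as-is:
theta = torch.zeros(3, requires_grad=True)
fake_elbo = lambda x, u: -((x - theta) ** 2).sum()
loader = [(torch.randn(3), torch.zeros(1)) for _ in range(32)]
train_dkf(fake_elbo, [theta], loader, epochs=5)
```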
We have for the conditional log-likelihood (see Supplemental Section A for a more detailed derivation):

$$\log p_\theta(\vec{x} \mid \vec{u}) \ge \int_{\vec{z}} q_\phi(\vec{z} \mid \vec{x}, \vec{u}) \log \frac{p_0(\vec{z} \mid \vec{u})\, p_\theta(\vec{x} \mid \vec{z}, \vec{u})}{q_\phi(\vec{z} \mid \vec{x}, \vec{u})} \, d\vec{z} = \mathbb{E}_{q_\phi(\vec{z} \mid \vec{x}, \vec{u})}[\log p_\theta(\vec{x} \mid \vec{z}, \vec{u})] - \mathrm{KL}(q_\phi(\vec{z} \mid \vec{x}, \vec{u}) \,\|\, p_0(\vec{z} \mid \vec{u}))$$

(using $x_t \perp\!\!\!\perp x_{\neg t} \mid \vec{z}$)

$$= \sum_{t=1}^{T} \mathbb{E}_{z_t \sim q_\phi(z_t \mid \vec{x}, \vec{u})}[\log p_\theta(x_t \mid z_t, u_{t-1})] - \mathrm{KL}(q_\phi(\vec{z} \mid \vec{x}, \vec{u}) \,\|\, p_0(\vec{z} \mid \vec{u})) = \mathcal{L}(x; (\theta, \phi)).$$

The KL divergence can be factorized as:

$$\mathrm{KL}(q_\phi(\vec{z} \mid \vec{x}, \vec{u}) \,\|\, p_0(\vec{z})) = \int_{z_1} \!\cdots\! \int_{z_T} q_\phi(z_1 \mid \vec{x}, \vec{u}) \cdots q_\phi(z_T \mid z_{T-1}, \vec{x}, \vec{u}) \log \frac{q_\phi(z_1 \mid \vec{x}, \vec{u}) \cdots q_\phi(z_T \mid z_{T-1}, \vec{x}, \vec{u})}{p_0(z_1, \dots, z_T)} \, d\vec{z} \quad (4)$$

(Factorization of $p(\vec{z})$)

$$= \mathrm{KL}(q_\phi(z_1 \mid \vec{x}, \vec{u}) \,\|\, p_0(z_1)) + \sum_{t=2}^{T} \mathbb{E}_{z_{t-1} \sim q_\phi(z_{t-1} \mid \vec{x}, \vec{u})}\left[\mathrm{KL}(q_\phi(z_t \mid z_{t-1}, \vec{x}, \vec{u}) \,\|\, p_0(z_t \mid z_{t-1}, u_{t-1}))\right].$$

This yields:

$$\log p_\theta(\vec{x} \mid \vec{u}) \ge \mathcal{L}(x; (\theta, \phi)) = \sum_{t=1}^{T} \mathbb{E}_{q_\phi(z_t \mid \vec{x}, \vec{u})}[\log p_\theta(x_t \mid z_t)] - \mathrm{KL}(q_\phi(z_1 \mid \vec{x}, \vec{u}) \,\|\, p_0(z_1)) - \sum_{t=2}^{T} \mathbb{E}_{q_\phi(z_{t-1} \mid \vec{x}, \vec{u})}\left[\mathrm{KL}(q_\phi(z_t \mid z_{t-1}, \vec{x}, \vec{u}) \,\|\, p_0(z_t \mid z_{t-1}, u_{t-1}))\right]. \quad (5)$$
Equation (5) is differentiable in the parameters of the model $(\theta, \phi)$, and we can apply backpropagation for updating $\theta$, and stochastic backpropagation for estimating the gradient w.r.t. $\phi$ of the expectation terms.
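To make Eq. (5) concrete, here is a sketch of how the bound could be accumulated over time steps with diagonal-Gaussian distributions, using torch.distributions. The per-step mean/variance tensors are assumed inputs (e.g. produced by the recognition and generative networks), with the prior $p_0(z_1)$ at $t=1$ and the transition evaluated at a sample $z_{t-1} \sim q_\phi$ for $t \ge 2$; one Monte Carlo sample is used for each expectation.

```python
import torch
from torch.distributions import Normal, kl_divergence

def dkf_bound(log_px_given_z, q_mu, q_logvar, p_mu, p_logvar):
    """Eq. (5) sketch: sum_t E_q[log p(x_t|z_t)] minus per-step Gaussian KL terms.
    log_px_given_z: (batch, T) reconstruction terms, one Monte Carlo sample each.
    q_mu, q_logvar, p_mu, p_logvar: (batch, T, z_dim); p_* at t=0 is the prior p_0(z_1),
    and at t>=1 it is the transition density evaluated at a sampled z_{t-1}."""
    q = Normal(q_mu, torch.exp(0.5 * q_logvar))
    p = Normal(p_mu, torch.exp(0.5 * p_logvar))
    kl_t = kl_divergence(q, p).sum(-1)        # (batch, T): analytic Gaussian KL per step
    return (log_px_given_z - kl_t).sum(-1).mean()

# Dummy shapes: batch of 4 sequences, T = 10, z_dim = 8.
B, T, Z = 4, 10, 8
bound = dkf_bound(torch.randn(B, T), torch.randn(B, T, Z), torch.zeros(B, T, Z),
                  torch.randn(B, T, Z), torch.zeros(B, T, Z))
```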
¤ Universality of normalizing flows [Rezende+ 2015]
• q-INDEP: $q(z_t \mid x_t, u_t)$ parameterized by an MLP
• q-LR: $q(z_t \mid x_{t-1}, x_t, x_{t+1}, u_{t-1}, u_t, u_{t+1})$ parameterized by an MLP
• q-RNN: $q(z_t \mid x_1, \dots, x_t, u_1, \dots, u_t)$ parameterized by an RNN
• q-BRNN: $q(z_t \mid x_1, \dots, x_T, u_1, \dots, u_T)$ parameterized by a bi-directional RNN

In the experimental section we compare the performance of these four models on a challenging sequence reconstruction task.

An interesting question is whether the Markov properties of the model can enable better design of approximations to the posterior.

Theorem 5.1. For the graphical model depicted in Figure 1b, the posterior factorizes as:

$$p(\vec{z} \mid \vec{x}, \vec{u}) = p(z_1 \mid \vec{x}, \vec{u}) \prod_{t=2}^{T} p(z_t \mid z_{t-1}, x_t, \dots, x_T, u_{t-1}, \dots, u_{T-1})$$

Proof. We use the independence statements implied by the generative model in Figure 1b to note that $p(\vec{z} \mid \vec{x}, \vec{u})$, the true posterior, factorizes as:

$$p(\vec{z} \mid \vec{x}, \vec{u}) = p(z_1 \mid \vec{x}, \vec{u}) \prod_{t=2}^{T} p(z_t \mid z_{t-1}, \vec{x}, \vec{u})$$

Now, we notice that $z_t \perp\!\!\!\perp x_1, \dots, x_{t-1} \mid z_{t-1}$ and $z_t \perp\!\!\!\perp u_1, \dots, u_{t-2} \mid z_{t-1}$, yielding:

$$p(\vec{z} \mid \vec{x}, \vec{u}) = p(z_1 \mid \vec{x}, \vec{u}) \prod_{t=2}^{T} p(z_t \mid z_{t-1}, x_t, \dots, x_T, u_{t-1}, \dots, u_{T-1})$$
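A sketch of the q-BRNN variant: a bi-directional RNN reads the full $(\vec{x}, \vec{u})$ sequence and outputs a mean and diagonal variance for each $z_t$. The GRU cell, hidden size, and the concatenation of actions to observations are assumptions for illustration, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class QBRNN(nn.Module):
    """q-BRNN: q_phi(z_t | x_1..x_T, u_1..u_T) from a bi-directional RNN over the sequence."""
    def __init__(self, x_dim=64, u_dim=2, z_dim=8, h_dim=64):
        super().__init__()
        self.rnn = nn.GRU(x_dim + u_dim, h_dim, batch_first=True, bidirectional=True)
        self.mu = nn.Linear(2 * h_dim, z_dim)
        self.logvar = nn.Linear(2 * h_dim, z_dim)

    def forward(self, x, u):
        h, _ = self.rnn(torch.cat([x, u], dim=-1))  # (batch, T, 2*h_dim): sees past and future
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterized sample
        return z, mu, logvar

q = QBRNN()
z, mu, logvar = q(torch.rand(4, 10, 64), torch.randn(4, 10, 2))  # each (4, 10, 8)
```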
Healing MNIST
¤ Synthetic benchmark built by rotating MNIST digits; the rotation plays the role of the action
¤ Bit-flip noise applied to 20% of the pixels
¤ Small Healing MNIST: 40,000 sequences of length 5
¤ Large Healing MNIST: 140,000 sequences of length 5
Small Healing MNIST
¤ Comparison of the four recognition models (q-INDEP, q-LR, q-RNN, q-BRNN)
[Figure: test log-likelihood over training epochs for q-BRNN, q-RNN, q-LR and q-INDEP]
Small Healing MNIST
¤ q-BRNN and q-RNN perform best
Figure 3: Small Healing MNIST. (a) Mean probabilities sampled under different $q_\phi$, with a constant, large rotation applied to the right. (b) Test log-likelihood under different recognition models.
Large Healing MNIST
¤ TS: training sequence, R: mean probabilities of the reconstruction
[Figure 2 panels: (a) reconstruction during training; (b) samples under different constant rotations (large rotation right / large rotation left / no rotation / small rotation right / small rotation left); (c) inference on unseen digits]
Figure 2: Large Healing MNIST. (a) Pairs of Training Sequences (TS) and Mean Probabilities of Reconstructions (R) shown above. (b) Mean probabilities sampled from the model under different, constant rotations. (c) Counterfactual Reasoning. We reconstruct variants of the digits 5, 1 not present in the training set, with (bottom) and without (top) bit-flip noise. We infer a sequence of 1 timestep and display the reconstructions from the posterior. We then keep the latent state and perform forward sampling and reconstruction from the generative model under a constant right rotation.
Disease progression modeling (diabetes)
¤ Electronic health records of 8,000 diabetic and pre-diabetic patients
¤ Observations include A1c and glucose lab values
¤ A lab indicator variable $x_t^{ind}$ records whether the corresponding lab value was actually measured
¤ For counterfactual inference, the lab indicator is set to 1 via the do-operator
Figure 5: (a) Generative model with lab indicator variable $x_t^{ind}$, focusing on time step t. (b) For counterfactual inference we set $x_t^{ind}$ to 1, implementing Pearl's do-operator.
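A sketch of how this kind of counterfactual query can be answered with the model: infer the latent state from a patient's observed prefix with the recognition network, then roll the generative model forward twice, once with and once without the drug in the action sequence, and compare the predicted observations. `dkf` and `q` refer to the hypothetical `DeepKalmanFilter` and `QBRNN` objects from the earlier sketches; the action encoding is an assumption, and the clamping of the lab indicator to 1 (Figure 5b) is left to the caller's construction of the emitted observation vector.

```python
import torch

@torch.no_grad()
def counterfactual_rollout(dkf, q, x_prefix, u_prefix, u_future_with, u_future_without):
    """Condition on an observed prefix, then forward-sample under two action sequences."""
    z_prefix, _, _ = q(x_prefix, u_prefix)            # q_phi(z | observed prefix)
    z_last = z_prefix[:, -1]                          # latent state at the intervention time
    outcomes = {}
    for name, u_future in [("with_drug", u_future_with), ("without_drug", u_future_without)]:
        z = z_last
        xs = []
        for t in range(u_future.shape[1]):            # forward sampling from the generative model
            zu = torch.cat([z, u_future[:, t]], dim=-1)
            mean = dkf.G(dkf.G_trunk(zu))
            logvar = dkf.S(dkf.S_trunk(zu))
            z = mean + torch.exp(0.5 * logvar) * torch.randn_like(mean)
            xs.append(torch.sigmoid(dkf.F(z)))        # predicted observation probabilities
        outcomes[name] = torch.stack(xs, 1)           # (batch, T_future, x_dim) per scenario
    return outcomes
```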
¤ q-BRNN recognition model, trained on the 8,000-patient cohort
¤ Model variants compared:
¤ L: linear relationship
¤ NL: non-linear (two-layer neural network)
¤ Em(ission): $F_\kappa$
¤ Tr(ansition): $G_\alpha$
[Figure 4(a): test log-likelihood over training epochs for the four Em/Tr variants (Em: L, Tr: NL / Em: NL, Tr: NL / Em: L, Tr: L / Em: NL, Tr: L)]
¤ Counterfactual analysis on 800 patients, starting from the time of first Metformin prescription
¤ Compare the proportion of patients inferred to have high glucose and high A1c with and without anti-diabetic drugs
[Figure 4(b), (c): proportion of patients with high glucose / high A1c over 12 time steps, with and without anti-diabetic drugs]
Figure 4: Results of disease progression modeling for 8000 diabetic and pre-diabetic patients. (a) Test log-likelihood under different model variants. Em(ission) denotes $F_\kappa$, Tr(ansition) denotes $G_\alpha$ under Linear (L) and Non-Linear (NL) functions. We learn a fixed diagonal $S_\beta$. (b) Proportion of patients inferred to have high (top quantile) glucose level with and without anti-diabetic drugs, starting from the time of first Metformin prescription. (c) Proportion of patients inferred to have high (above 8%) A1c level with and without anti-diabetic drugs, starting from the time of first Metformin prescription. Both (b) and (c) are created using counterfactual inference with the do-operator.
Related reading (in Japanese)
¤ http://www.slideshare.net/takehikoihayashi/ss-13441401
¤ Judea Pearl's do-operator: http://takehiko-i-hayashi.hatenablog.com/entry/20111222/1324487579
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

(DL Hacks reading group) Deep Kalman Filters

• 2. Rahul G. Krishnan, Uri Shalit, David Sontag. arXiv, 2015/11/16
¤ Deep Learning + Kalman Filter
¤ Learned within the VAE framework
¤ Deep…
  • 3.
• 4. Contributions and background (from the paper excerpt shown on the slide)
¤ Derive a bound on the log-likelihood of sequential data and an algorithm to learn a broad class of Kalman filters.
¤ Evaluate the efficacy of different recognition distributions for inference and learning.
¤ Consider the model for counterfactual inference with emphasis on the medical setting; to the best of the authors' knowledge, continuous state space models have not been considered for this goal. On a synthetic setting the model captures patterns within a very noisy setting and models the effect of external actions; on real patient data it performs counterfactual inference showing the effect of anti-diabetic drugs on diabetic patients.
Background, Kalman filters: assume a sequence of unobserved variables z_1, ..., z_T ∈ R^s. For each z_t there is a corresponding observation x_t ∈ R^d and a corresponding action u_t ∈ R^c, which is also observed. In the medical domain, z_t might denote the true state of a patient, x_t known diagnoses and lab test results, and u_t prescribed medications and medical procedures which aim to change the state of the patient. The classical Kalman filter models the observed sequence x_1, ..., x_T as
  z_t = G_t z_{t-1} + B_t u_{t-1} + ε_t (action-transition),  x_t = F_t z_t + η_t (observation),
where ε_t ∼ N(0, Σ_t) and η_t ∼ N(0, Γ_t) are zero-mean i.i.d. normal random variables whose covariance matrices may vary with t. The latent space evolves linearly, transformed at time t by the state-transition matrix G_t ∈ R^{s×s}; the control signal u_{t-1} acts as an additive linear transformation of the latent state through the control-input model B_t ∈ R^{s×c}; the observations are generated linearly from the latent state via the observation matrix F_t ∈ R^{d×s}. The following sections replace all the linear transformations with non-linear transformations parameterized by neural nets.
• 6. Recap: the classical Kalman filter
  z_t = G_t z_{t-1} + B_t u_{t-1} + ε_t (action-transition),  x_t = F_t z_t + η_t (observation),
with ε_t ∼ N(0, Σ_t), η_t ∼ N(0, Γ_t) and the linear maps G_t, B_t, F_t as on the previous slide.
¤ Replacing these linear transformations with non-linear transformations parameterized by neural nets makes learning much more challenging, because the posterior distribution p(z_1, ..., z_T | x_1, ..., x_T, u_1, ..., u_T) becomes intractable to compute.
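To make the linear model concrete, below is a minimal NumPy sketch that forward-samples the classical Kalman filter above. All dimensions and matrix values are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
s, d, c, T = 2, 3, 1, 10                      # latent, observation, action dims; sequence length

# Time-invariant parameters for simplicity (the paper allows them to vary with t).
G = np.array([[0.9, 0.1], [0.0, 0.95]])       # state-transition matrix  G_t
B = np.array([[0.5], [0.1]])                  # control-input model      B_t
F = rng.normal(size=(d, s))                   # observation matrix       F_t
Sigma = 0.01 * np.eye(s)                      # transition noise covariance
Gamma = 0.10 * np.eye(d)                      # observation noise covariance

z = np.zeros(s)
us = rng.normal(size=(T, c))                  # observed actions
xs = []
for t in range(T):
    # z_t = G z_{t-1} + B u_{t-1} + eps_t   (action-transition)
    z = G @ z + B @ us[t] + rng.multivariate_normal(np.zeros(s), Sigma)
    # x_t = F z_t + eta_t                   (observation)
    xs.append(F @ z + rng.multivariate_normal(np.zeros(d), Gamma))
print(np.stack(xs).shape)                     # (T, d)
```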
• 7. Stochastic Backpropagation ¤ SGVB ¤ Variational autoencoder (VAE)
To overcome the intractability of posterior inference, the model uses variational autoencoders (Rezende et al., 2014; Kingma & Welling, 2013) to optimize a variational lower bound on the model log-likelihood. The key technical innovation is the introduction of a recognition network, a neural network which approximates the intractable posterior.
Let p(x, z) = p_0(z) p_θ(x|z) be a generative model for the observations x, where p_0(z) is a prior on z and p_θ(x|z) is a generative model parameterized by θ. The posterior distribution p_θ(z|x) is typically intractable. Using the variational principle, we posit an approximate posterior q_φ(z|x), also called a recognition model (Figure 1a), and obtain the lower bound on the marginal likelihood
  log p_θ(x) = log ∫ (q_φ(z|x)/q_φ(z|x)) p_θ(x|z) p_0(z) dz ≥ ∫ q_φ(z|x) log [ p_θ(x|z) p_0(z) / q_φ(z|x) ] dz
             = E_{q_φ(z|x)}[log p_θ(x|z)] - KL(q_φ(z|x) || p_0(z)) = L(x; (θ, φ)),   (1)
where the inequality is by Jensen's inequality. Variational autoencoders maximize this lower bound using a parametric model q_φ conditioned on the input; Rezende et al. (2014) and Kingma & Welling (2013) both suggest using a neural net to parameterize q_φ, such that φ are the parameters of the neural net. The challenge is that the bound (1) includes an expectation w.r.t. q_φ, which implicitly depends on the network parameters φ. This difficulty is overcome by stochastic backpropagation: assuming the latent state is normally distributed, q_φ(z|x) ∼ N(μ_φ(x), Σ_φ(x)), a simple transformation allows one to obtain Monte Carlo estimates of the gradients of E_{q_φ(z|x)}[log p_θ(x|z)] with respect to φ. The KL term in (1) can be estimated similarly since it is also an expectation; if the prior p_0(z) is also normally distributed, the KL and its gradients may be obtained analytically.
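As a minimal sketch of stochastic backpropagation (the reparameterization trick), the PyTorch snippet below computes a single-datapoint ELBO with a Gaussian recognition network and a Gaussian decoder and backpropagates through the sampled z. The network sizes, the dummy data, and the unit-variance decoder are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

x_dim, z_dim, h = 5, 2, 16
enc = nn.Sequential(nn.Linear(x_dim, h), nn.Tanh(), nn.Linear(h, 2 * z_dim))  # outputs (mu, log sigma^2)
dec = nn.Sequential(nn.Linear(z_dim, h), nn.Tanh(), nn.Linear(h, x_dim))      # mean of p_theta(x|z)

x = torch.randn(1, x_dim)                     # a dummy observation

mu, logvar = enc(x).chunk(2, dim=-1)
eps = torch.randn_like(mu)
z = mu + torch.exp(0.5 * logvar) * eps        # z ~ q_phi(z|x), differentiable in phi

# Monte Carlo estimate of E_q[log p_theta(x|z)] for a unit-variance Gaussian decoder (up to a constant)
log_px_z = -0.5 * ((x - dec(z)) ** 2).sum()

# Analytic KL( N(mu, sigma^2) || N(0, I) )
kl = 0.5 * (mu ** 2 + logvar.exp() - 1.0 - logvar).sum()

elbo = log_px_z - kl
(-elbo).backward()                            # gradients w.r.t. both theta (dec) and phi (enc)
print(float(elbo))
```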
• 9. Counterfactual estimation: Pearl (2009)
¤ The do-operator replaces conditioning with intervention: p(y | x) → p(y | do(x))
¤ p(y | x): the distribution of y among cases where x was merely observed to take that value
¤ p(y | do(x)): the distribution of y when x is set by an external intervention
¤ Counterfactual questions are answered by computing quantities under do(·)
  • 10.
• 11. The Deep Kalman Filter: generative model
¤ The goal is to fit a generative model to a sequence of observations and actions, motivated by the structure of patient health record data.
¤ The observations come from a latent state which evolves over time; the observations are a noisy, non-linear function of this latent state.
¤ Observed actions affect the latent state in a possibly non-linear manner.
Denote the sequence of observations ~x = (x_1, ..., x_T) and actions ~u = (u_1, ..., u_{T-1}), with corresponding latent states ~z = (z_1, ..., z_T); as before, x_t ∈ R^d, u_t ∈ R^c, z_t ∈ R^s. The generative model for the deep Kalman filter is
  z_1 ∼ N(μ_0, Σ_0)
  z_t ∼ N(G_α(z_{t-1}, u_{t-1}, Δ_t), S_β(z_{t-1}, u_{t-1}, Δ_t))
  x_t ∼ Π(F_κ(z_t)).   (2)
That is, the distribution of the latent states is Normal, with a mean and covariance which are non-linear functions of the previous latent state, the previous actions, and the time difference Δ_t.
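A rough sketch of the generative process in Eq. (2): G_α and S_β are stand-in random tanh MLPs over (z_{t-1}, u_{t-1}, Δ_t), S_β outputs the log of a diagonal covariance, and the emission Π is taken to be Bernoulli. Everything here (sizes, weights, the choice of Bernoulli) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
s, d, c, T = 4, 8, 2, 5

def mlp(in_dim, out_dim, h=16):
    """A tiny random tanh MLP, standing in for a learned network."""
    W1, W2 = rng.normal(size=(h, in_dim)), rng.normal(size=(out_dim, h))
    return lambda v: W2 @ np.tanh(W1 @ v)

G_alpha = mlp(s + c + 1, s)        # transition mean: (z_{t-1}, u_{t-1}, Delta_t) -> R^s
S_beta  = mlp(s + c + 1, s)        # log of a diagonal covariance (keeps it positive-definite)
F_kappa = mlp(s, d)                # parameters of the emission distribution Pi

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

z = rng.normal(size=s)                                   # z_1 ~ N(0, I) for simplicity
xs = [rng.binomial(1, sigmoid(F_kappa(z)))]              # x_1 ~ Bernoulli(sigmoid(F_kappa(z_1)))
for t in range(2, T + 1):
    u, dt = rng.normal(size=c), 1.0                      # previous action and time gap Delta_t
    inp = np.concatenate([z, u, [dt]])
    z = G_alpha(inp) + np.exp(0.5 * S_beta(inp)) * rng.normal(size=s)   # z_t ~ N(G_alpha, S_beta)
    xs.append(rng.binomial(1, sigmoid(F_kappa(z))))
print(np.stack(xs).shape)                                # (T, d)
```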
• 12. From VAE to DKF
¤ A diagonal covariance matrix S_β(·) is used, with a log-parameterization, ensuring that the covariance matrix is positive-definite. The model is presented in Figure 1b, along with the recognition model q_φ outlined in Section 5.
¤ The key point is that Eq. (2) subsumes a large family of linear and non-linear latent space models: by restricting the functional forms of G_α, S_β, F_κ, different kinds of Kalman filters can be trained within the same framework. For example, setting G_α(z_{t-1}, u_{t-1}) = G_t z_{t-1} + B_t u_{t-1}, S_β = Σ_t, F_κ = F_t z_t, where G_t, B_t, Σ_t, F_t are matrices, recovers classical Kalman filters.
¤ In the past, modifications to the Kalman filter typically introduced a new learning algorithm and heuristics to approximate the posterior more accurately. In contrast, within this framework any parametric differentiable function can be substituted in for one of G_α, S_β, F_κ, and any such model can be learned using backpropagation (next section).
Figure 1: (a) Learning static generative models. Solid lines denote the generative model p_0(z) p_θ(x|z); dashed lines denote the variational approximation q_φ(z|x) to the intractable posterior p(z|x). The variational parameters φ are learned jointly with the generative model parameters θ. (b) Learning in a Deep Kalman Filter: a parametric approximation to p_θ(~z|~x), denoted q_φ(~z|~x, ~u), is used to perform inference during learning.
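Following the remark above, plugging linear functions in for G_α, S_β, F_κ recovers the classical Kalman filter. A tiny sketch with arbitrary placeholder matrices (the signatures here take explicit z, u, Δ_t arguments rather than the concatenated input of the previous sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
s, d, c = 4, 8, 2
G, B, F = rng.normal(size=(s, s)), rng.normal(size=(s, c)), rng.normal(size=(d, s))
log_Sigma_diag = np.log(1e-2) * np.ones(s)

G_alpha = lambda z, u, dt: G @ z + B @ u      # linear transition mean: G_t z_{t-1} + B_t u_{t-1}
S_beta  = lambda z, u, dt: log_Sigma_diag     # constant diagonal covariance (log-parameterized)
F_kappa = lambda z: F @ z                     # linear emission: F_t z_t

# One step of the resulting model is exactly the classical transition/observation pair:
z_prev, u_prev = rng.normal(size=s), rng.normal(size=c)
z = G_alpha(z_prev, u_prev, 1.0) + np.exp(0.5 * S_beta(z_prev, u_prev, 1.0)) * rng.normal(size=s)
x_mean = F_kappa(z)
print(x_mean.shape)                           # (d,)
```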
• 13. Learning using stochastic backpropagation: maximizing a lower bound
We fit the generative model parameters θ which maximize the conditional likelihood of the data given the external actions, i.e. max_θ log p_θ(x_1, ..., x_T | u_1, ..., u_{T-1}). Using the variational principle, the lower bound (1) is extended to the temporal setting with the factorization
  q_φ(~z | ~x, ~u) = ∏_{t=1}^T q(z_t | z_{t-1}, x_t, ..., x_T, ~u),   (3)
conditioning the variational approximation not just on the inputs ~x but also on the actions ~u (this structured factorization is motivated in Section 5.2). The goal is a lower bound on the conditional log-likelihood in a form that factorizes and makes learning amenable: the bound (1) has an analytic KL term only for the simplest transition models G_α, S_β, and estimating the gradient of the KL term by sampling results in very high variance, so the Markov property of the model is used to factorize the KL term (next slide), which gives more stable gradients.
Algorithm 1 (Learning Deep Kalman Filters): while not converged, (0) ~x ← sampleMiniBatch(); (1) ẑ ∼ q_φ(~z | ~x, ~u); (2) x̂ ∼ p_θ(~x | ẑ); (3) compute ∇_θ L and ∇_φ L by differentiating (5); (4) update θ, φ using ADAM.
For the conditional log-likelihood (see Supplemental Section A for a more detailed derivation):
  log p_θ(~x | ~u) ≥ E_{q_φ(~z|~x,~u)}[log p_θ(~x | ~z, ~u)] - KL(q_φ(~z|~x,~u) || p_0(~z|~u))
    = ∑_{t=1}^T E_{z_t ∼ q_φ(z_t|~x,~u)}[log p_θ(x_t | z_t, u_{t-1})] - KL(q_φ(~z|~x,~u) || p_0(~z|~u)) = L(x; (θ, φ)),   (4)
using x_t ⟂ x_{¬t} | ~z.
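A compact, runnable caricature of Algorithm 1 in PyTorch: a q-INDEP-style recognition network, a linear transition prior with unit variance, a Gaussian emission, and a single Monte Carlo sample per expectation in Eq. (5). All modules and the random minibatch are hypothetical stand-ins, not the paper's model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, c, s, T, batch = 8, 2, 4, 5, 32                   # obs dim, action dim, latent dim, steps, batch size

# Hypothetical stand-ins for the learned components:
recognition = nn.Linear(d + c, 2 * s)                # q-INDEP style: (x_t, u_t) -> (mu_q, logvar_q)
transition  = nn.Linear(s + c, s)                    # mean of p_0(z_t | z_{t-1}, u_{t-1}), unit variance
emission    = nn.Linear(s, d)                        # mean of p_theta(x_t | z_t), unit variance
params = (list(recognition.parameters()) + list(transition.parameters())
          + list(emission.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)

def elbo(x, u):
    """Eq. (5) for this toy model, one Monte Carlo sample per expectation."""
    bound, z_prev = 0.0, None
    for t in range(T):
        mu_q, logvar_q = recognition(torch.cat([x[:, t], u[:, t]], -1)).chunk(2, -1)
        z = mu_q + torch.exp(0.5 * logvar_q) * torch.randn_like(mu_q)         # z_t ~ q_phi
        bound = bound - 0.5 * ((x[:, t] - emission(z)) ** 2).sum(-1).mean()    # E_q[log p(x_t|z_t)] (up to const.)
        mu_p = torch.zeros_like(mu_q) if t == 0 else transition(torch.cat([z_prev, u[:, t - 1]], -1))
        kl = 0.5 * (logvar_q.exp() + (mu_q - mu_p) ** 2 - 1.0 - logvar_q).sum(-1).mean()
        bound, z_prev = bound - kl, z
    return bound

for step in range(200):                              # Algorithm 1: sample minibatch, infer, update with ADAM
    x, u = torch.randn(batch, T, d), torch.randn(batch, T, c)   # placeholder minibatch
    loss = -elbo(x, u)
    opt.zero_grad(); loss.backward(); opt.step()
```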
• 14. Factorizing the KL term
Using the factorization of the prior p_0(~z) over time,
  KL(q_φ(~z|~x,~u) || p_0(~z)) = KL(q_φ(z_1|~x,~u) || p_0(z_1)) + ∑_{t=2}^T E_{z_{t-1} ∼ q_φ(z_{t-1}|~x,~u)}[ KL(q_φ(z_t|z_{t-1},~x,~u) || p_0(z_t|z_{t-1},u_{t-1})) ].
This yields
  log p_θ(~x|~u) ≥ L(x; (θ, φ)) = ∑_{t=1}^T E_{q_φ(z_t|~x,~u)}[log p_θ(x_t|z_t)] - KL(q_φ(z_1|~x,~u) || p_0(z_1)) - ∑_{t=2}^T E_{q_φ(z_{t-1}|~x,~u)}[ KL(q_φ(z_t|z_{t-1},~x,~u) || p_0(z_t|z_{t-1},u_{t-1})) ].   (5)
Equation (5) is differentiable in the parameters of the model (θ, φ); backpropagation is applied for updating θ, and stochastic backpropagation for estimating the gradient w.r.t. φ.
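Each KL term in (5) is between two Gaussians, so with diagonal covariances it has a closed form; a small NumPy check against a Monte Carlo estimate (dimensions and parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
s = 4
mu_q, var_q = rng.normal(size=s), rng.uniform(0.5, 1.5, size=s)   # q_phi(z_t | z_{t-1}, x, u)
mu_p, var_p = rng.normal(size=s), rng.uniform(0.5, 1.5, size=s)   # p_0(z_t | z_{t-1}, u_{t-1})

def kl_diag_gauss(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, diag var_q) || N(mu_p, diag var_p) ), summed over dimensions."""
    return 0.5 * np.sum(np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

# Monte Carlo sanity check of the same quantity.
z = mu_q + np.sqrt(var_q) * rng.normal(size=(200000, s))
log_q = -0.5 * np.sum(np.log(2 * np.pi * var_q) + (z - mu_q) ** 2 / var_q, axis=1)
log_p = -0.5 * np.sum(np.log(2 * np.pi * var_p) + (z - mu_p) ** 2 / var_p, axis=1)
print(kl_diag_gauss(mu_q, var_q, mu_p, var_p), (log_q - log_p).mean())   # should be close
```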
• 15. Choosing the recognition network
¤ Four variational approximations are compared (related: universality of normalizing flows [Rezende+ 2015]):
¤ q-INDEP: q(z_t | x_t, u_t), parameterized by an MLP
¤ q-LR: q(z_t | x_{t-1}, x_t, x_{t+1}, u_{t-1}, u_t, u_{t+1}), parameterized by an MLP
¤ q-RNN: q(z_t | x_1, ..., x_t, u_1, ..., u_t), parameterized by an RNN
¤ q-BRNN: q(z_t | x_1, ..., x_T, u_1, ..., u_T), parameterized by a bi-directional RNN
These four models are compared experimentally on a challenging sequence reconstruction task. The Markov properties of the model suggest how to design approximations to the posterior:
Theorem 5.1. For the graphical model depicted in Figure 1b, the posterior factorizes as p(~z|~x,~u) = p(z_1|~x,~u) ∏_{t=2}^T p(z_t | z_{t-1}, x_t, ..., x_T, u_{t-1}, ..., u_{T-1}).
Proof: by the independence statements implied by the generative model, p(~z|~x,~u) = p(z_1|~x,~u) ∏_{t=2}^T p(z_t | z_{t-1}, ~x, ~u); since z_t ⟂ x_1, ..., x_{t-1} | z_{t-1} and z_t ⟂ u_1, ..., u_{t-2} | z_{t-1}, the claim follows.
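A minimal PyTorch sketch of a q-BRNN-style recognition network: a bi-directional GRU over (x_t, u_t) whose hidden state at each step parameterizes a diagonal Gaussian over z_t, sampled with the reparameterization trick. Layer sizes and the GRU choice are assumptions, and unlike Eq. (3) this simplified variant does not condition on z_{t-1}.

```python
import torch
import torch.nn as nn

class QBRNN(nn.Module):
    """q(z_t | x_1..x_T, u_1..u_T) as a diagonal Gaussian from a bi-directional GRU."""
    def __init__(self, x_dim, u_dim, z_dim, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(x_dim + u_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2 * z_dim)              # -> (mu_t, logvar_t)

    def forward(self, x, u):
        h, _ = self.rnn(torch.cat([x, u], dim=-1))                # (batch, T, 2*hidden)
        mu, logvar = self.head(h).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterized sample
        return z, mu, logvar

q = QBRNN(x_dim=8, u_dim=2, z_dim=4)
x, u = torch.randn(3, 5, 8), torch.randn(3, 5, 2)                 # 3 sequences of length 5
z, mu, logvar = q(x, u)
print(z.shape)                                                    # torch.Size([3, 5, 4])
```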
  • 16.
• 17. Healing MNIST: a synthetic benchmark
¤ Sequences built from MNIST digits: the digit is rotated from frame to frame (the rotation plays the role of the action u_t), and bit-flip noise corrupts 20% of the pixels.
¤ Small Healing MNIST: 40,000 sequences of length 5
¤ Large Healing MNIST: 140,000 sequences of length 5
[Figure 2(a): pairs of training sequences (TS) and their reconstructions.]
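A rough sketch of how such sequences could be generated: rotate a digit by a constant angle per frame and flip 20% of the pixels. The angle policy, sequence construction, and the random stand-in "digit" below are illustrative assumptions rather than the exact Healing MNIST recipe.

```python
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(4)

def make_sequence(digit, T=5, angle_per_step=15.0, flip_prob=0.2):
    """Rotate a (28, 28) binary digit by a constant angle per step and add bit-flip noise."""
    frames, angle = [], 0.0
    for t in range(T):
        frame = rotate(digit, angle, reshape=False, order=1) > 0.5   # rotated binary frame
        flips = rng.random(frame.shape) < flip_prob                  # 20% of pixels flipped
        frames.append(np.logical_xor(frame, flips).astype(np.float32))
        angle += angle_per_step                                      # the action: rotation per step
    return np.stack(frames)                                          # (T, 28, 28)

digit = (rng.random((28, 28)) > 0.8).astype(np.float32)              # placeholder for an MNIST digit
print(make_sequence(digit).shape)
```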
• 18. Small Healing MNIST: comparing the four recognition networks
The four variational approximations from slide 15 (q-INDEP, q-LR, q-RNN, q-BRNN) are trained on Small Healing MNIST. [Figure 3(b): test log-likelihood over training epochs for q-BRNN, q-RNN, q-LR and q-INDEP.] For counterfactual reconstruction, a sequence of one timestep is inferred, the reconstruction from the posterior is displayed, and then the latent state is kept while forward sampling and reconstruction are performed from the generative model under a constant right rotation.
• 19. Small Healing MNIST: samples and counterfactual reasoning
Figure 3: (a) mean probabilities sampled from models trained with the different recognition networks under a constant, large rotation applied to the right; (b) test log-likelihood under the different recognition networks.
Figure 2(c), counterfactual reasoning: variants of the digits 5 and 1 not present in the training set are reconstructed, with (bottom) and without (top) bit-flip noise. A sequence of one timestep is inferred and the reconstruction from the posterior is displayed; the latent state is then kept and forward sampling and reconstruction are performed from the generative model under a constant right rotation.
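The counterfactual procedure described in the caption can be sketched as: encode one observed frame to get z_1, then repeatedly apply the transition under a fixed "rotate right" action and decode each latent state. The model components are passed in as arguments; the trivial lambdas at the bottom exist only to make the sketch executable and stand in for trained networks.

```python
import numpy as np

rng = np.random.default_rng(5)

def counterfactual_rollout(x1, encode, transition_mean, transition_logvar, decode, action, T=5):
    """Infer z_1 from a single frame, then forward-sample under a constant action."""
    z = encode(x1)                                        # posterior mean for z_1 given x_1
    frames = [decode(z)]
    for t in range(1, T):
        mean, logvar = transition_mean(z, action), transition_logvar(z, action)
        z = mean + np.exp(0.5 * logvar) * rng.normal(size=z.shape)
        frames.append(decode(z))                          # mean probabilities of the reconstruction
    return np.stack(frames)

# Trivial stand-ins so the sketch executes; a trained DKF would supply these.
s, d = 4, 16
W_enc, W_dec, W_tr = rng.normal(size=(s, d)), rng.normal(size=(d, s)), rng.normal(size=(s, s + 1))
rollout = counterfactual_rollout(
    x1=rng.random(d),
    encode=lambda x: W_enc @ x,
    transition_mean=lambda z, a: W_tr @ np.concatenate([z, [a]]),
    transition_logvar=lambda z, a: np.full(s, -2.0),
    decode=lambda z: 1.0 / (1.0 + np.exp(-(W_dec @ z))),
    action=+15.0,                                         # constant rotation to the right
)
print(rollout.shape)                                      # (T, d)
```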
• 20. Large Healing MNIST: reconstructions during training
¤ TS: training sequence
¤ R: mean probabilities of the reconstruction
[Figure 2(a): pairs of training sequences (TS) and mean probabilities of reconstructions (R).]
• 21. Large Healing MNIST: samples under different actions
[Figure 2(b): mean probabilities sampled from the model under different constant rotations: large rotation right, large rotation left, no rotation, small rotation right, small rotation left.]
• 22. Large Healing MNIST: inference on unseen digits (counterfactual reasoning)
Figure 2: Large Healing MNIST. (a) Pairs of training sequences (TS) and mean probabilities of reconstructions (R). (b) Mean probabilities sampled from the model under different, constant rotations. (c) Counterfactual reasoning: variants of the digits 5 and 1 not present in the training set are reconstructed, with (bottom) and without (top) bit-flip noise; a sequence of one timestep is inferred and the reconstructions from the posterior are displayed; the latent state is then kept and forward sampling and reconstruction are performed from the generative model under a constant right rotation.
• 23. Real patient data: modeling disease progression in diabetes
¤ About 8000 diabetic and pre-diabetic patients
¤ Observations include lab results such as A1c (glycated hemoglobin)
¤ Actions are the prescribed anti-diabetic drugs
• 24. Handling missing labs: the lab indicator variable
¤ Lab values such as A1c are not measured at every time step.
¤ Each lab observation is paired with a lab indicator x_ind,t that records whether the lab was actually measured at time t.
¤ For counterfactual inference the lab indicator is set to 1, implementing Pearl's do-operator.
Figure 5: (a) Generative model with the lab indicator variable, focusing on time step t (graphical model during training). (b) For counterfactual inference we set x_ind,t to 1, implementing Pearl's do-operator (graphical model during counterfactual inference).
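A toy illustration of the do-operation in Figure 5(b): during training the lab indicator is generated (here, at random), while during counterfactual inference it is clamped to 1 so the model behaves as if the lab had been measured. The dictionary-valued observation and the 0.7 measurement rate are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def emit_observation(z, counterfactual=False):
    """Emit (A1c value, lab indicator) for one time step of a toy model."""
    a1c = float(z[0] + 0.1 * rng.normal())                # toy A1c read-out from the latent state
    if counterfactual:
        x_ind = 1                                          # do(x_ind = 1): force "lab was measured"
    else:
        x_ind = int(rng.random() < 0.7)                    # during training, measurement is random
    return {"a1c": a1c if x_ind else None, "x_ind": x_ind}

z = rng.normal(size=4)
print(emit_observation(z))                                 # training-time behaviour
print(emit_observation(z, counterfactual=True))            # counterfactual inference: x_ind set to 1
```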
• 25. Which components benefit from non-linearity?
¤ Using the q-BRNN recognition network on the 8000-patient data, four model variants are compared.
¤ L: a linear function; NL: a non-linear function (neural network)
¤ Em(ission): F_κ; Tr(ansition): G_α
[Figure 4(a): test log-likelihood over training epochs for Em: L, Tr: NL / Em: NL, Tr: NL / Em: L, Tr: L / Em: NL, Tr: L.]
• 26. Counterfactual inference: the effect of anti-diabetic drugs
¤ For 800 patients, inferred trajectories with and without anti-diabetic drugs are compared.
¤ The proportion of patients with high A1c (and high glucose) is tracked over time.
Figure 4: Results of disease progression modeling for 8000 diabetic and pre-diabetic patients. (a) Test log-likelihood under different model variants; Em(ission) denotes F_κ and Tr(ansition) denotes G_α, under Linear (L) and Non-Linear (NL) functions, with a fixed diagonal S_β learned. (b) Proportion of patients inferred to have a high (top-quantile) glucose level with and without anti-diabetic drugs, starting from the time of the first Metformin prescription. (c) Proportion of patients inferred to have a high (above 8%) A1c level with and without anti-diabetic drugs, starting from the time of the first Metformin prescription.
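The comparison in Figure 4(b, c) can be sketched as two interventional rollouts from the same inferred initial states, one with do(u_t = drug) and one with do(u_t = no drug), tracking the fraction of trajectories above a "high A1c" threshold. The transition matrix, the assumed drug-effect vector, and the read-out below are hypothetical placeholders, not quantities estimated in the paper.

```python
import numpy as np

rng = np.random.default_rng(6)
s, T, n_patients = 4, 12, 800

# Hypothetical stand-ins for a trained DKF's transition mean and an A1c-like read-out.
W = 0.9 * np.eye(s) + 0.05 * rng.normal(size=(s, s))
b_drug = np.array([-0.6, 0.0, 0.0, 0.0])                  # assumed effect direction of the drug
read_a1c = lambda z: z[..., 0]                             # first latent coordinate -> A1c-like score

def rollout(z1, give_drug):
    """Forward-sample latent trajectories under do(u_t = drug) or do(u_t = no drug)."""
    z, high = z1.copy(), []
    for t in range(T):
        u = b_drug if give_drug else np.zeros(s)           # the intervention on the action
        z = z @ W.T + u + 0.05 * rng.normal(size=z.shape)
        high.append((read_a1c(z) > 1.0).mean())            # proportion with "high A1c"
    return np.array(high)

z1 = rng.normal(size=(n_patients, s)) + 1.0                # placeholder posterior samples at first prescription
print(rollout(z1, give_drug=True))
print(rollout(z1, give_drug=False))
```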
• 28. References (in Japanese)
¤ Takehiko I. Hayashi, slides: http://www.slideshare.net/takehikoihayashi/ss-13441401
¤ Takehiko I. Hayashi, blog post on Judea Pearl's do-operator: http://takehiko-i-hayashi.hatenablog.com/entry/20111222/1324487579