Bayesian Core: A Practical Approach to Computational Bayesian Statistics
   Dynamic models




Dynamic models



       6    Dynamic models
              Dependent data
              The AR(p) model
              The MA(q) model
              Hidden Markov models



Dependent data



       A huge portion of real-life data involves dependent datapoints


       Example (Capture-recapture)
               capture histories
               capture sizes



Eurostoxx 50

       First four stocks of the financial index Eurostoxx 50
       [Figure: time series of ABN AMRO HOLDING, AEGON, AHOLD KON. and AIR LIQUIDE plotted against t]



Markov chain


       A stochastic process (xt )t∈T where the distribution of xt given the past
       values x0:(t−1) only depends on xt−1 .
       Homogeneity: the distribution of xt given the past is constant in t ∈ T .

       Corresponding likelihood
       $$\ell(\theta\,|\,x_{0:T}) = f_0(x_0\,|\,\theta)\prod_{t=1}^{T} f(x_t\,|\,x_{t-1},\theta)$$

       [Homogeneity means f independent of t]



Stationarity constraints




       Difference with the independent case: stationarity and causality
       constraints put restrictions on the parameter space



Stationary processes

       Definition (Stationary stochastic process)
       (xt )t∈T is stationary if the joint distributions of (x1 , . . . , xk ) and
       (x1+h , . . . , xk+h ) are the same for all h, k’s.
       It is second-order stationary if, given the autocovariance function
       $$\gamma_x(r,s) = \mathbb{E}\bigl[\{x_r-\mathbb{E}(x_r)\}\{x_s-\mathbb{E}(x_s)\}\bigr],\qquad r,s\in T,$$
       then
       $$\mathbb{E}(x_t) = \mu \qquad\text{and}\qquad \gamma_x(r,s) = \gamma_x(r+t,s+t) \equiv \gamma_x(r-s)$$
       for all r, s, t ∈ T .



Imposing or not imposing stationarity


       Bayesian inference on a non-stationary process can be [formally] conducted.
       Debate
       From a Bayesian point of view, imposing the stationarity condition is
       objectionable: the stationarity requirement is artificial on finite datasets,
       and/or the datasets themselves should indicate whether the model is stationary.
       Reasons for imposing stationarity: asymptotics (Bayes estimators are not
       necessarily convergent in non-stationary settings), causality,
       identifiability and ... common practice.



Unknown stationarity constraints




       Practical difficulty: for complex models, stationarity constraints get
       quite involved to the point of being unknown in some cases



The AR(1) model

       Case of linear Markovian dependence on the last value
       $$x_t = \mu + \varrho\,(x_{t-1}-\mu) + \epsilon_t\,,\qquad \epsilon_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,\sigma^2)$$

       If |̺| < 1, (xt )t∈Z can be written as
       $$x_t = \mu + \sum_{j=0}^{\infty} \varrho^{j}\,\epsilon_{t-j}$$
       and this is a stationary representation.
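
       To make the dynamics concrete, here is a minimal Python sketch (not from the
       slides; the function name simulate_ar1 and the use of numpy are assumptions)
       that simulates a path in the stationary regime |̺| < 1, drawing x_0 from the
       stationary N(µ, σ²/(1 − ̺²)) distribution.

    import numpy as np

    def simulate_ar1(T, mu=0.0, rho=0.5, sigma=1.0, rng=None):
        """Simulate x_0, ..., x_T from x_t = mu + rho*(x_{t-1}-mu) + eps_t, |rho| < 1 assumed."""
        rng = np.random.default_rng() if rng is None else rng
        x = np.empty(T + 1)
        # x_0 drawn from the stationary distribution N(mu, sigma^2 / (1 - rho^2))
        x[0] = mu + rng.normal(scale=sigma / np.sqrt(1 - rho**2))
        for t in range(1, T + 1):
            x[t] = mu + rho * (x[t - 1] - mu) + rng.normal(scale=sigma)
        return x

    path = simulate_ar1(1500, mu=20.0, rho=0.95, sigma=0.5)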



Stationary but...


       If |̺| > 1, alternative stationary representation
       $$x_t = \mu - \sum_{j=1}^{\infty} \varrho^{-j}\,\epsilon_{t+j}\,.$$

       This stationary solution is criticized as artificial because xt is
       correlated with future white noises (ǫs )s>t , unlike the case when
       |̺| < 1.
       Non-causal representation...



Standard constraint



       Customary to restrict AR(1) processes to the case |̺| < 1

       Thus use of a uniform prior on [−1, 1] for ̺

       Exclusion of the case |̺| = 1 because the process is then a random walk
       [no stationary solution]



The AR(p) model

       Conditional model
       $$x_t\,|\,x_{t-1},\ldots \sim \mathcal{N}\Bigl(\mu + \sum_{i=1}^{p}\varrho_i\,(x_{t-i}-\mu),\ \sigma^2\Bigr)$$


               Generalisation of AR(1)
               Among the most commonly used models in dynamic settings
               More challenging than the static models (stationarity
               constraints)
               Different models depending on the processing of the starting
               value x0



Stationarity+causality


       Stationarity constraints in the prior as a restriction on the values of
       θ.
       Theorem
       AR(p) model second-order stationary and causal iff the roots of the
       polynomial
       $$\mathcal{P}(x) = 1 - \sum_{i=1}^{p}\varrho_i\,x^{i}$$
       are all outside the unit circle
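
       In practice this condition is easy to check numerically; the sketch below
       (illustrative only, numpy assumed, function name hypothetical) computes the
       roots of P(x) and tests whether they all lie outside the unit circle.

    import numpy as np

    def is_stationary(rho):
        """True if all roots of 1 - rho_1 x - ... - rho_p x^p lie outside the unit circle."""
        rho = np.asarray(rho, dtype=float)
        # numpy.roots expects coefficients ordered from highest degree to constant term
        coefs = np.concatenate((-rho[::-1], [1.0]))   # [-rho_p, ..., -rho_1, 1]
        roots = np.roots(coefs)
        return bool(np.all(np.abs(roots) > 1.0))

    print(is_stationary([0.5, -0.3]))   # True: inside the AR(2) stationarity triangle
    print(is_stationary([1.2, 0.1]))    # False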



Initial conditions

       Unobserved initial values can be processed in various ways
          1    All x−i ’s (i > 0) set equal to µ, for computational
               convenience
          2    Under stationarity and causality constraints, (xt )t∈Z has a
               stationary distribution: Assume x−p:−1 distributed from
               stationary Np (µ1p , A) distribution
                Corresponding marginal likelihood
                $$\int \sigma^{-T}\prod_{t=0}^{T}\exp\left\{-\frac{1}{2\sigma^2}\Bigl(x_t-\mu-\sum_{i=1}^{p}\varrho_i\,(x_{t-i}-\mu)\Bigr)^{2}\right\}\, f(x_{-p:-1}\,|\,\mu,A)\,\mathrm{d}x_{-p:-1}\,,$$



Initial conditions (cont’d)



           3    Condition instead on the initial observed values x0:(p−1)
                (a code sketch of this conditional likelihood follows below)
                $$\ell^{c}(\mu,\varrho_1,\ldots,\varrho_p,\sigma\,|\,x_{p:T},x_{0:(p-1)}) \propto
                \sigma^{-T}\exp\left\{-\sum_{t=p}^{T}\Bigl(x_t-\mu-\sum_{i=1}^{p}\varrho_i\,(x_{t-i}-\mu)\Bigr)^{2}\Big/2\sigma^2\right\}\,.$$
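
       As an illustration, the following Python sketch evaluates the logarithm of
       this conditional likelihood for given (µ, ̺, σ); the function name and the
       inclusion of the Gaussian normalising constant are choices of the sketch,
       not of the slides.

    import numpy as np

    def ar_conditional_loglik(x, mu, rho, sigma):
        """log ell^c(mu, rho_1, ..., rho_p, sigma | x_{p:T}, x_{0:(p-1)})."""
        x = np.asarray(x, dtype=float)
        rho = np.asarray(rho, dtype=float)
        p = len(rho)
        resid = []
        for t in range(p, len(x)):
            # conditional mean: mu + sum_i rho_i * (x_{t-i} - mu)
            pred = mu + np.dot(rho, x[t - 1::-1][:p] - mu)
            resid.append(x[t] - pred)
        resid = np.asarray(resid)
        n = len(resid)
        return (-n * np.log(sigma) - 0.5 * np.sum(resid**2) / sigma**2
                - 0.5 * n * np.log(2 * np.pi))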



Prior selection

       For AR(1) model, Jeffreys' prior associated with the stationary
       representation is
       $$\pi^{J}_1(\mu,\sigma^2,\varrho) \propto \frac{1}{\sigma^2\sqrt{1-\varrho^2}}\,.$$

       Extension to higher orders quite complicated (̺ part)!

       Natural conjugate prior for θ = (µ, ̺1 , . . . , ̺p , σ 2 ) :
       normal distribution on (µ, ̺1 , . . . , ̺p ) and inverse gamma
       distribution on σ 2
       ... and for constrained ̺’s?



Stationarity constraints


       Under stationarity constraints, complex parameter space: each
       value of ̺ needs to be checked for whether the roots of the corresponding
       polynomial all have modulus larger than 1

       E.g., for an AR(2) process with autoregressive polynomial
       P(u) = 1 − ̺1 u − ̺2 u², the constraint is
       $$\varrho_1+\varrho_2 < 1,\qquad \varrho_2-\varrho_1 < 1\qquad\text{and}\qquad |\varrho_2| < 1\,.$$



A first useful reparameterisation

       Durbin–Levinson recursion proposes a reparameterisation from the
       parameters ̺i to the partial autocorrelations
       $$\psi_i \in [-1,1]$$
       which allow for a uniform prior on the hypercube.
       Partial autocorrelation defined as
       $$\psi_i = \mathrm{corr}\bigl(x_t - \mathbb{E}[x_t\,|\,x_{t+1},\ldots,x_{t+i-1}],\
       x_{t+i} - \mathbb{E}[x_{t+i}\,|\,x_{t+1},\ldots,x_{t+i-1}]\bigr)$$
       [see also Yule–Walker equations]



Durbin–Levinson recursion




       Transform (a code sketch follows after the two steps below)
          1    Define ϕii = ψi and ϕij = ϕ(i−1)j − ψi ϕ(i−1)(i−j) , for i > 1
               and j = 1, · · · , i − 1 .
          2    Take ̺i = ϕpi for i = 1, · · · , p.
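
       A direct transcription of this transform could look as follows (illustrative
       Python; the 1-based array phi mirrors the ϕij notation, and psi_to_rho is a
       hypothetical helper name).

    import numpy as np

    def psi_to_rho(psi):
        """Map partial autocorrelations psi_1, ..., psi_p to AR coefficients rho_1, ..., rho_p."""
        psi = np.asarray(psi, dtype=float)
        p = len(psi)
        phi = np.zeros((p + 1, p + 1))          # phi[i, j], 1-based indices
        for i in range(1, p + 1):
            phi[i, i] = psi[i - 1]
            for j in range(1, i):
                phi[i, j] = phi[i - 1, j] - psi[i - 1] * phi[i - 1, i - j]
        return phi[p, 1:p + 1]                  # rho_i = phi_{p,i}

    print(psi_to_rho([0.5]))          # AR(1): rho_1 = psi_1
    print(psi_to_rho([0.5, -0.2]))    # AR(2) coefficients inside the stationarity region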



Stationarity & priors

       For AR(1) model, Jeffreys' prior associated with the stationary
       representation is
       $$\pi^{J}_1(\mu,\sigma^2,\varrho) \propto \frac{1}{\sigma^2\sqrt{1-\varrho^2}}\,.$$

       Within the non-stationary region |̺| > 1, Jeffreys' prior is
       $$\pi^{J}_2(\mu,\sigma^2,\varrho) \propto \frac{1}{\sigma^2}\,\frac{1}{|1-\varrho^2|}\,
       \sqrt{1-\frac{1-\varrho^{2T}}{T(1-\varrho^2)}}\,.$$

             The dominant part of the prior is the non-stationary region!



Alternative prior
       The reference prior π1J is only defined when the stationarity
       constraint holds.
       Idea: symmetrise to the region |̺| > 1
       $$\pi^{B}(\mu,\sigma^2,\varrho) \propto \frac{1}{\sigma^2}\times
       \begin{cases} 1\big/\sqrt{1-\varrho^2} & \text{if } |\varrho|<1,\\[3pt]
       1\big/|\varrho|\sqrt{\varrho^2-1} & \text{if } |\varrho|>1,\end{cases}$$
       [Figure: plot of the symmetrised prior density over (−3, 3)]



MCMC consequences




       When devising an MCMC algorithm, use the Durbin-Levinson
       recursion to end up with single normal simulations of the ψi ’s since
       the ̺j ’s are linear functions of the ψi ’s



Root parameterisation
       Lag polynomial representation
       $$\Bigl(\mathrm{Id} - \sum_{i=1}^{p}\varrho_i B^{i}\Bigr)\,x_t = \epsilon_t$$
       with (inverse) roots
       $$\prod_{i=1}^{p}\bigl(\mathrm{Id} - \lambda_i B\bigr)\,x_t = \epsilon_t$$



       Closed form expression of the likelihood as a function of the
       (inverse) roots



Uniform prior under stationarity

       Stationarity The λi ’s are within the unit circle if in C [complex
       numbers] and within [−1, 1] if in R [real numbers]

       Naturally associated with a flat prior on either the unit circle or
       [−1, 1]
       $$\frac{1}{\lfloor k/2\rfloor+1}\ \prod_{\lambda_i\in\mathbb{R}}\frac{1}{2}\,\mathbb{I}_{|\lambda_i|<1}\
       \prod_{\lambda_i\notin\mathbb{R}}\frac{1}{\pi}\,\mathbb{I}_{|\lambda_i|<1}$$

       where ⌊k/2⌋ + 1 is the number of possible cases [numbers of conjugate complex pairs]


               Term ⌊k/2⌋ + 1 is important for reversible jump applications
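
       For illustration, the inverse-root parameterisation can be mapped back to the
       ̺i's by expanding the product of lag polynomials; the sketch below (numpy
       assumed, function name hypothetical) does exactly that.

    import numpy as np

    def roots_to_rho(lmbda):
        """Recover (rho_1, ..., rho_p) from inverse roots (real, or complex in conjugate pairs)."""
        poly = np.array([1.0 + 0j])
        for lam in lmbda:
            poly = np.convolve(poly, np.array([1.0, -lam]))   # multiply by (1 - lam B)
        # poly = (1, -rho_1, ..., -rho_p); imaginary parts cancel for conjugate pairs
        return -poly[1:].real

    print(roots_to_rho([0.9, 0.5]))                    # two real inverse roots
    print(roots_to_rho([0.6 + 0.3j, 0.6 - 0.3j]))      # one conjugate complex pair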



MCMC consequences


       In a Gibbs sampler, each λi∗ can be simulated conditionally on the
       others since
       $$\prod_{i=1}^{p}(\mathrm{Id}-\lambda_i B)\,x_t = y_t - \lambda_{i^{*}}\,y_{t-1} = \epsilon_t$$
       where
       $$y_t = \prod_{i\neq i^{*}}(\mathrm{Id}-\lambda_i B)\,x_t$$



Metropolis-Hastings implementation




          1    use the prior π itself as a proposal on the (inverse) roots of P,
               selecting one or several roots of P to be simulated from π;
          2    acceptance ratio is likelihood ratio
          3    need to watch out for real/complex dichotomy



A [paradoxical] reversible jump implementation


                Define “model” M2k (0 ≤ k ≤ ⌊p/2⌋) as corresponding to a
                number 2k of complex roots
                Moving from model M2k to model M2k+2 means that two real
                roots have been replaced by two conjugate complex roots.
                Propose jump from M2k to M2k+2 with probability 1/2 and from
                M2k to M2k−2 with probability 1/2 [boundary exceptions]
                Accept move from M2k to M2k±2 with probability
                $$\frac{\ell^{c}(\mu,\varrho^{\star}_1,\ldots,\varrho^{\star}_p,\sigma\,|\,x_{p:T},x_{0:(p-1)})}
                {\ell^{c}(\mu,\varrho_1,\ldots,\varrho_p,\sigma\,|\,x_{p:T},x_{0:(p-1)})}\wedge 1\,,$$



Checking your code
       Try with no data and recover the prior
       [Figure: histogram of k obtained by running the sampler with no data, recovering the prior]



Checking your code
       Try with no data and recover the prior
       [Figure: scatterplot of (λ1 , λ2 ) simulated with no data, recovering the prior]



Order estimation




       Typical setting for model choice: determine order p of AR(p)
       model
       Roots [may] change drastically from one p to the other.
       No difficulty from the previous perspective: recycle above
       reversible jump algorithm



AR(?) reversible jump algorithm



       Use (purely birth-and-death) proposals based on the uniform prior
               k → k+1             [Creation of real root]
               k → k+2             [Creation of complex root]
               k → k-1            [Deletion of real root]
               k → k-2            [Deletion of complex root]



Reversible jump output








       AR(3) simulated dataset of 530 points (upper left) with true parameters αi
       (−0.1, 0.3, −0.4) and σ = 1. First histogram associated with p, following
       histograms with the αi 's, for different values of p, and with σ². Final graph:
       scatterplot of the complex roots. Second-to-last: evolution of α1 , α2 , α3 .



The MA(q) model

       Alternative type of time series
       $$x_t = \mu + \epsilon_t - \sum_{j=1}^{q}\vartheta_j\,\epsilon_{t-j}\,,\qquad \epsilon_t \sim \mathcal{N}(0,\sigma^2)$$

       Stationary but, for identifiability considerations, the polynomial
       $$\mathcal{Q}(x) = 1 - \sum_{j=1}^{q}\vartheta_j\,x^{j}$$
       must have all its roots outside the unit circle.
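
       A minimal simulation sketch of this MA(q) model (illustrative Python, not
       from the slides; simulate_ma is a hypothetical helper name):

    import numpy as np

    def simulate_ma(T, mu=0.0, theta=(0.5,), sigma=1.0, rng=None):
        """Simulate x_1, ..., x_T from x_t = mu + eps_t - sum_j theta_j eps_{t-j}."""
        rng = np.random.default_rng() if rng is None else rng
        theta = np.asarray(theta, dtype=float)
        q = len(theta)
        eps = rng.normal(scale=sigma, size=T + q)       # eps_{1-q}, ..., eps_T
        x = np.empty(T)
        for t in range(T):
            # current noise is eps[t + q]; the previous q noises pair with theta_1, ..., theta_q
            x[t] = mu + eps[t + q] - np.dot(theta, eps[t + q - 1::-1][:q])
        return x

    series = simulate_ma(500, mu=1.0, theta=(0.4, -0.3), sigma=1.0)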



Identifiability

       Example
       For the MA(1) model, xt = µ + ǫt − ϑ1 ǫt−1 ,
       $$\mathrm{var}(x_t) = (1+\vartheta_1^{2})\,\sigma^2$$
       can also be written
       $$x_t = \mu + \tilde\epsilon_{t-1} - \frac{1}{\vartheta_1}\,\tilde\epsilon_t\,,\qquad
       \tilde\epsilon_t \sim \mathcal{N}(0,\vartheta_1^{2}\sigma^2)\,,$$
       Both pairs (ϑ1 , σ) & (1/ϑ1 , ϑ1 σ) lead to alternative
       representations of the same model.



Properties of MA models




               Non-Markovian model (but special case of hidden Markov)
               Autocovariance γx (s) is null for |s| > q



Representations

       x1:T is a normal random vector with constant mean µ and
       covariance matrix
       $$\Sigma = \begin{pmatrix}
       \sigma^2 & \gamma_1 & \gamma_2 & \cdots & \gamma_q & 0 & \cdots & 0 & 0\\
       \gamma_1 & \sigma^2 & \gamma_1 & \cdots & \gamma_{q-1} & \gamma_q & \cdots & 0 & 0\\
        & & & \ddots & & & & & \\
       0 & 0 & 0 & \cdots & 0 & 0 & \cdots & \gamma_1 & \sigma^2
       \end{pmatrix},$$
       with (|s| ≤ q, taking ϑ0 = −1)
       $$\gamma_s = \sigma^2\sum_{i=0}^{q-|s|}\vartheta_i\,\vartheta_{i+|s|}$$

       Not manageable in practice [large T’s]



Representations (contd.)

       Conditional on past (ǫ0 , . . . , ǫ−q+1 ),

       $$L(\mu,\vartheta_1,\ldots,\vartheta_q,\sigma\,|\,x_{1:T},\epsilon_0,\ldots,\epsilon_{-q+1}) \propto
       \sigma^{-T}\prod_{t=1}^{T}\exp\left\{-\Bigl(x_t-\mu+\sum_{j=1}^{q}\vartheta_j\,\hat\epsilon_{t-j}\Bigr)^{2}\Big/2\sigma^2\right\},$$

       where (t > 0)
       $$\hat\epsilon_t = x_t-\mu+\sum_{j=1}^{q}\vartheta_j\,\hat\epsilon_{t-j}\,,\qquad
       \hat\epsilon_0=\epsilon_0\,,\ \ldots,\ \hat\epsilon_{1-q}=\epsilon_{1-q}$$

       Recursive definition of the likelihood, still costly O(T × q)
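
       The recursion on the ε̂t 's translates directly into code; below is an
       illustrative Python sketch of this conditional log-likelihood, where
       eps_init stands for the assumed values of (ε0 , . . . , ε1−q ) and the
       function name is hypothetical.

    import numpy as np

    def ma_conditional_loglik(x, mu, theta, sigma, eps_init):
        """eps_init = (eps_0, eps_{-1}, ..., eps_{1-q}); len(eps_init) must equal len(theta)."""
        x = np.asarray(x, dtype=float)
        theta = np.asarray(theta, dtype=float)
        q = len(theta)
        eps_hat = list(eps_init[::-1])                 # chronological order: eps_{1-q}, ..., eps_0
        loglik = -len(x) * np.log(sigma)
        for t in range(len(x)):
            # hat eps_t = x_t - mu + sum_j theta_j * hat eps_{t-j}
            past = np.array(eps_hat[-1:-q - 1:-1])     # hat eps_{t-1}, ..., hat eps_{t-q}
            e = x[t] - mu + np.dot(theta, past)
            loglik += -0.5 * e**2 / sigma**2
            eps_hat.append(e)
        return loglik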



Recycling the AR algorithm


       Same algorithm as in the AR(p) case when modifying the likelihood

       Simulation of the past noises ǫ−i (i = 1, . . . , q) done via a
       Metropolis-Hastings step with target
       $$f(\epsilon_0,\ldots,\epsilon_{-q+1}\,|\,x_{1:T},\mu,\sigma,\vartheta) \propto
       \prod_{i=-q+1}^{0} e^{-\epsilon_i^{2}/2\sigma^2}\ \prod_{t=1}^{T} e^{-\hat\epsilon_t^{2}/2\sigma^2}\,,$$



Representations (contd.)

       Encompassing approach for general time series models

       State-space representation

                                                xt = Gyt + εt ,           (1)
                                             yt+1 = F yt + ξt ,           (2)

       (1) is the observation equation and (2) is the state equation


       Note
       As seen below, this is a special case of hidden Markov model



MA(q) state-space representation

       For the MA(q) model, take
       $$y_t = (\epsilon_{t-q},\ldots,\epsilon_{t-1},\epsilon_t)'$$
       and then
       $$y_{t+1} = \begin{pmatrix}
       0 & 1 & 0 & \cdots & 0\\
       0 & 0 & 1 & \cdots & 0\\
        & & & \ddots & \\
       0 & 0 & 0 & \cdots & 1\\
       0 & 0 & 0 & \cdots & 0
       \end{pmatrix} y_t + \epsilon_{t+1}\begin{pmatrix}0\\0\\\vdots\\0\\1\end{pmatrix}$$
       $$x_t = \mu - \begin{pmatrix}\vartheta_q & \vartheta_{q-1} & \cdots & \vartheta_1 & -1\end{pmatrix}\, y_t\,.$$



MA(q) state-space representation (cont’d)

       Example
       For the MA(1) model, observation equation
       $$x_t = (1\ \ 0)\,y_t$$
       with
       $$y_t = (y_{1t}\ \ y_{2t})'$$
       directed by the state equation
       $$y_{t+1} = \begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix} y_t + \epsilon_{t+1}\begin{pmatrix}1\\ \vartheta_1\end{pmatrix}\,.$$
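
       For completeness, a sketch building the matrices of the general MA(q)
       state-space form above (numpy assumed; F is the shift matrix, g the noise
       loading, G the observation row; names are illustrative, not from the slides).

    import numpy as np

    def ma_state_space(theta):
        """Return (F, g, G) with state y_t = (eps_{t-q}, ..., eps_{t-1}, eps_t)'."""
        theta = np.asarray(theta, dtype=float)
        q = len(theta)
        F = np.eye(q + 1, k=1)                      # ones on the superdiagonal, zeros elsewhere
        g = np.zeros(q + 1)
        g[-1] = 1.0                                 # noise enters the last state component
        # x_t = mu + G @ y_t  with  G = -(theta_q, ..., theta_1, -1)
        G = np.concatenate((-theta[::-1], [1.0]))
        return F, g, G

    F, g, G = ma_state_space([0.4, -0.3])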



ARMA extension



       ARMA(p, q) model
       $$x_t - \sum_{i=1}^{p}\varrho_i\,x_{t-i} = \mu + \epsilon_t - \sum_{j=1}^{q}\vartheta_j\,\epsilon_{t-j}\,,\qquad
       \epsilon_t \sim \mathcal{N}(0,\sigma^2)$$

       Identical stationarity and identifiability conditions for both groups
       (̺1 , . . . , ̺p ) and (ϑ1 , . . . , ϑq )



Reparameterisation
       Identical root representations
       $$\prod_{i=1}^{p}(\mathrm{Id}-\lambda_i B)\,x_t = \prod_{i=1}^{q}(\mathrm{Id}-\eta_i B)\,\epsilon_t$$

       State-space representation (with r = max(p, q + 1))
       $$x_t = \mu - \begin{pmatrix}\vartheta_{r-1} & \vartheta_{r-2} & \cdots & \vartheta_1 & -1\end{pmatrix}\, y_t$$
       and
       $$y_{t+1} = \begin{pmatrix}
       0 & 1 & 0 & \cdots & 0\\
       0 & 0 & 1 & \cdots & 0\\
        & & & \ddots & \\
       0 & 0 & 0 & \cdots & 1\\
       \varrho_r & \varrho_{r-1} & \varrho_{r-2} & \cdots & \varrho_1
       \end{pmatrix} y_t + \epsilon_{t+1}\begin{pmatrix}0\\0\\\vdots\\0\\1\end{pmatrix},$$
       under the convention that ̺m = 0 if m > p and ϑm = 0 if m > q.



Bayesian approximation


       Quasi-identical MCMC implementation:
              1     Simulate (̺1 , . . . , ̺p ) conditional on (ϑ1 , . . . , ϑq )
                    and µ
              2     Simulate (ϑ1 , . . . , ϑq ) conditional on (̺1 , . . . , ̺p )
                    and µ
              3     Simulate (µ, σ) conditional on (̺1 , . . . , ̺p ) and
                    (ϑ1 , . . . , ϑq )


        c Code can be recycled almost as is!



Hidden Markov models


       Generalisation both of a mixture and of a state space model.
       Example
       Extension of a mixture model with Markov dependence
       $$x_t\,|\,z,(x_j)_{j\neq t} \sim \mathcal{N}(\mu_{z_t},\sigma^{2}_{z_t})\,,\qquad
       \mathbb{P}(z_t=u\,|\,z_j,\,j<t) = p_{z_{t-1}u}\,,$$
       (u = 1, . . . , k)

               Label switching also strikes in this model!
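
       A small simulation sketch of this hidden Markov mixture (illustrative Python;
       the transition matrix P and the component parameters below are toy values):

    import numpy as np

    def simulate_hmm(T, P, mu, sigma, rng=None):
        """Simulate (z_t, x_t): z_t Markov with transition P, x_t | z_t ~ N(mu_{z_t}, sigma_{z_t}^2)."""
        rng = np.random.default_rng() if rng is None else rng
        P = np.asarray(P, dtype=float)
        k = P.shape[0]
        z = np.empty(T, dtype=int)
        x = np.empty(T)
        z[0] = rng.integers(k)                      # arbitrary initial state
        for t in range(T):
            if t > 0:
                z[t] = rng.choice(k, p=P[z[t - 1]])
            x[t] = rng.normal(mu[z[t]], sigma[z[t]])
        return z, x

    P = [[0.9, 0.1], [0.2, 0.8]]
    z, x = simulate_hmm(1000, P, mu=[0.0, 3.0], sigma=[1.0, 0.5])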



Generic dependence graph

                 x_t            x_{t+1}
                  ↑                ↑
       ··· ──→   y_t    ──→    y_{t+1}   ──→ ···

       $$(x_t, y_t)\,|\,x_{0:(t-1)}, y_{0:(t-1)} \sim f(y_t\,|\,y_{t-1})\,f(x_t\,|\,y_t)$$



Definition

       Observable series {xt }t≥1 associated with a second process
       {yt }t≥1 , taking values in a finite set of N possible values, such that
          1.   the indicators yt have a homogeneous Markov dynamic
               $$p(y_t\,|\,y_{1:t-1}) = p(y_t\,|\,y_{t-1}) = P_{y_{t-1}y_t}$$
               where y1:t−1 denotes the sequence {y1 , y2 , . . . , yt−1 }.
          2.   the observables xt are independent conditionally on the indicators yt
               $$p(x_{1:T}\,|\,y_{1:T}) = \prod_{t=1}^{T} p(x_t\,|\,y_t)$$



Dnadataset

       DNA sequence [made of A, C, G, and T’s] corresponding to a
       complete HIV genome where A, C, G, and T have been recoded as
       1, ..., 4.




       Possible modeling by a two-state hidden Markov model with

                                Y = {1, 2}             and       X = {1, 2, 3, 4}



Parameterization

               For the Markov bit, transition matrix
               $$\mathbb{P} = [p_{ij}]\qquad\text{where}\qquad \sum_{j=1}^{N} p_{ij} = 1$$
               and initial distribution ̺ = ̺P;
               for the observables,
               $$f_i(x_t) = p(x_t\,|\,y_t = i) = f(x_t\,|\,\theta_i)$$
               usually within the same parametrized class of distributions.



Finite case


       When both hidden and observed chains are finite, with
       Y = {1, . . . , κ} and X = {1, . . . , k}, parameter θ made up of κ
       probability vectors q¹ = (q¹₁ , . . . , q¹ₖ ), . . . , qᵏᵃᵖᵖᵃ = (qᵏᵃᵖᵖᵃ₁ , . . . , qᵏᵃᵖᵖᵃₖ )

       Joint distribution of (xt , yt )0≤t≤T
       $$\varrho_{y_0}\, q^{y_0}_{x_0}\prod_{t=1}^{T} p_{y_{t-1}y_t}\, q^{y_t}_{x_t}\,,$$



Bayesian inference in the finite case

       Posterior of (θ, P) given (xt , yt )t factorizes as
       $$\pi(\theta,\mathbb{P})\ \varrho_{y_0}\prod_{i=1}^{\kappa}\prod_{j=1}^{k}(q^{i}_{j})^{n_{ij}}
       \times\prod_{i=1}^{\kappa}\prod_{j=1}^{\kappa}p_{ij}^{m_{ij}}\,,$$
       where nij is the number of visits to state j by the xt 's when the corresponding
       yt 's are equal to i, and mij the number of transitions from state i to state j
       on the hidden chain (yt )t∈N

       Under a flat prior on the pij 's and q^i_j 's, the posterior distributions are
       [almost] Dirichlet [initial distribution side effect]



MCMC implementation


       Finite State HMM Gibbs Sampler
                Initialization:
                   1    Generate random values of the pij 's and of the q^i_j 's
                   2    Generate the hidden Markov chain (yt )0≤t≤T by (i = 1, 2)
                        $$\mathbb{P}(y_t = i) \propto \begin{cases} p_{ii}\, q^{i}_{x_0} & \text{if } t = 0\,,\\[3pt]
                        p_{y_{t-1}i}\, q^{i}_{x_t} & \text{if } t > 0\,,\end{cases}$$
                        and compute the corresponding sufficient statistics



MCMC implementation (cont’d)
       Finite State HMM Gibbs Sampler (a compact code sketch follows below)
                Iteration m (m ≥ 1):
                   1    Generate
                        $$(p_{i1},\ldots,p_{i\kappa}) \sim \mathcal{D}(1+m_{i1},\ldots,1+m_{i\kappa})$$
                        $$(q^{i}_{1},\ldots,q^{i}_{k}) \sim \mathcal{D}(1+n_{i1},\ldots,1+n_{ik})$$
                        and correct for the missing initial probability by a MH step with
                        acceptance probability ̺′_{y₀} /̺_{y₀}
                   2    Generate successively each yt (0 ≤ t ≤ T ) by
                        $$\mathbb{P}(y_t = i\,|\,x_t,y_{t-1},y_{t+1}) \propto \begin{cases}
                        p_{ii}\, q^{i}_{x_0}\, p_{iy_1} & \text{if } t = 0\,,\\[3pt]
                        p_{y_{t-1}i}\, q^{i}_{x_t}\, p_{iy_{t+1}} & \text{if } t > 0\,,\end{cases}$$
                        and compute corresponding sufficient statistics
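
       The steps above fit in a compact (and deliberately simplified) Python sketch:
       flat Dirichlet priors, single-site updates of the hidden chain, and the
       initial-distribution Metropolis–Hastings correction omitted; the function
       name hmm_gibbs is illustrative.

    import numpy as np

    def hmm_gibbs(x, kappa, k, n_iter=1000, rng=None):
        """x: observations in {0, ..., k-1}; kappa: number of hidden states."""
        rng = np.random.default_rng() if rng is None else rng
        T = len(x)
        y = rng.integers(kappa, size=T)                        # random initial hidden chain
        P = np.full((kappa, kappa), 1.0 / kappa)
        Q = np.full((kappa, k), 1.0 / k)
        for _ in range(n_iter):
            # transition counts m_ij and emission counts n_ij
            m = np.zeros((kappa, kappa)); n = np.zeros((kappa, k))
            for t in range(1, T):
                m[y[t - 1], y[t]] += 1
            for t in range(T):
                n[y[t], x[t]] += 1
            # conjugate Dirichlet updates
            for i in range(kappa):
                P[i] = rng.dirichlet(1.0 + m[i])
                Q[i] = rng.dirichlet(1.0 + n[i])
            # single-site updates of the hidden states
            for t in range(T):
                w = Q[:, x[t]].copy()
                if t > 0:
                    w *= P[y[t - 1], :]
                if t < T - 1:
                    w *= P[:, y[t + 1]]
                y[t] = rng.choice(kappa, p=w / w.sum())
        return P, Q, y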



Dnadataset




       [Figure: MCMC sequences over 1000 iterations of p11 , p22 and of the emission probabilities q^i_j for Dnadataset]



Forward-Backward formulae




       Existence of a (magical) recurrence relation that provides the
       observed likelihood function in manageable computing time
       Called forward-backward or Baum–Welch formulas



Observed likelihood computation

       Likelihood of the complete model simple:
       $$\ell^{c}(\theta\,|\,\mathbf{x},\mathbf{y}) = \prod_{t=2}^{T} p_{y_{t-1}y_t}\, f(x_t\,|\,\theta_{y_t})$$
       but likelihood of the observed model is not:
       $$\ell(\theta\,|\,\mathbf{x}) = \sum_{\mathbf{y}\in\{1,\ldots,\kappa\}^{T}} \ell^{c}(\theta\,|\,\mathbf{x},\mathbf{y})$$

       O(κ^T) complexity



Forward-Backward paradox



       It is possible to express the (observed) likelihood ℓ(θ|x) in
       $$O(T\times\kappa^{2})$$
       computations, based on the Markov property of the pair (xt , yt ).



Conditional distributions

       We have
       $$p(y_{1:t}\,|\,x_{1:t}) = \frac{f(x_t\,|\,y_t)\, p(y_{1:t}\,|\,x_{1:(t-1)})}{p(x_t\,|\,x_{1:(t-1)})}$$
                                                                      [Smoothing/Bayes]
       and
       $$p(y_{1:t}\,|\,x_{1:(t-1)}) = k(y_t\,|\,y_{t-1})\, p(y_{1:(t-1)}\,|\,x_{1:(t-1)})$$
                                                                      [Prediction]
       where k(yt |yt−1 ) = p_{yt−1 yt} associated with the matrix P and
       $$f(x_t\,|\,y_t) = f(x_t\,|\,\theta_{y_t})$$



Update of predictive


       Therefore
       $$p(y_{1:t}\,|\,x_{1:t}) = \frac{p(y_{1:t}\,|\,x_{1:(t-1)})\, f(x_t\,|\,y_t)}{p(x_t\,|\,x_{1:(t-1)})}
       = \frac{f(x_t\,|\,y_t)\, k(y_t\,|\,y_{t-1})}{p(x_t\,|\,x_{1:(t-1)})}\; p(y_{1:(t-1)}\,|\,x_{1:(t-1)})$$
       with the same order of complexity for p(y1:t |x1:t ) as for
       p(xt |x1:(t−1) )



Propagation and actualization equations



       $$p(y_t\,|\,x_{1:(t-1)}) = \sum_{y_{1:(t-1)}} p(y_{1:(t-1)}\,|\,x_{1:(t-1)})\, k(y_t\,|\,y_{t-1})$$
                                                                      [Propagation]
       and
       $$p(y_t\,|\,x_{1:t}) = \frac{p(y_t\,|\,x_{1:(t-1)})\, f(x_t\,|\,y_t)}{p(x_t\,|\,x_{1:(t-1)})}\,.$$
                                                                      [Actualization]



Forward–backward equations (1)


       Evaluation of
       $$p(y_t\,|\,x_{1:T})\,,\qquad t \le T$$
       by the forward-backward algorithm
       Denote (t ≤ T)
       $$\gamma_t(i) = \mathbb{P}(y_t = i\,|\,x_{1:T})\,,\qquad
       \alpha_t(i) = p(x_{1:t}, y_t = i)\,,\qquad
       \beta_t(i) = p(x_{t+1:T}\,|\,y_t = i)$$



Recurrence relations
       Then
       $$\alpha_1(i) = f(x_1\,|\,y_1 = i)\,\varrho_i\,,\qquad
       \alpha_{t+1}(j) = f(x_{t+1}\,|\,y_{t+1} = j)\sum_{i=1}^{\kappa}\alpha_t(i)\,p_{ij}$$
                                                                      [Forward]
       $$\beta_T(i) = 1\,,\qquad
       \beta_t(i) = \sum_{j=1}^{\kappa} p_{ij}\, f(x_{t+1}\,|\,y_{t+1} = j)\,\beta_{t+1}(j)$$
                                                                      [Backward]
       and
       $$\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{\kappa}\alpha_t(j)\,\beta_t(j)}$$
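
       These recursions, combined with the scaling discussed on a later slide, give
       the usual implementation; the sketch below (illustrative Python, not from the
       slides) returns the smoothing probabilities γt (i) and the log-likelihood,
       with f[t, i] standing for f(xt | yt = i) and rho for the initial distribution.

    import numpy as np

    def forward_backward(f, P, rho):
        """f: (T, kappa) emission densities, P: transition matrix, rho: initial distribution."""
        T, kappa = f.shape
        alpha = np.empty((T, kappa)); beta = np.empty((T, kappa))
        scale = np.empty(T)
        alpha[0] = rho * f[0]
        scale[0] = alpha[0].sum(); alpha[0] /= scale[0]
        for t in range(1, T):                                  # forward recursion
            alpha[t] = f[t] * (alpha[t - 1] @ P)
            scale[t] = alpha[t].sum(); alpha[t] /= scale[t]
        beta[T - 1] = 1.0
        for t in range(T - 2, -1, -1):                         # backward recursion
            beta[t] = P @ (f[t + 1] * beta[t + 1])
            beta[t] /= beta[t].sum()                           # scaling only; cancels in gamma
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        return gamma, np.log(scale).sum()                      # log p(x_{1:T}) = sum_t log c_t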



Extension of the recurrence relations


       For
       $$\xi_t(i,j) = \mathbb{P}(y_t = i, y_{t+1} = j\,|\,x_{1:T})\,,\qquad i,j = 1,\ldots,\kappa,$$
       we also have
       $$\xi_t(i,j) = \frac{\alpha_t(i)\,p_{ij}\, f(x_{t+1}\,|\,y_{t+1} = j)\,\beta_{t+1}(j)}
       {\sum_{i=1}^{\kappa}\sum_{j=1}^{\kappa}\alpha_t(i)\,p_{ij}\, f(x_{t+1}\,|\,y_{t+1} = j)\,\beta_{t+1}(j)}$$



Overflows and underflows



                On-line scalings of the αt (i)'s and βt (i)'s for each t by
                $$c_t = 1\Big/\sum_{i=1}^{\kappa}\alpha_t(i)\qquad\text{and}\qquad
                d_t = 1\Big/\sum_{i=1}^{\kappa}\beta_t(i)$$
                avoid overflows and/or underflows for large datasets



Backward smoothing

       Recursive derivation of conditionals
       We have
       $$p(y_s\,|\,y_{s-1}, x_{1:T}) = p(y_s\,|\,y_{s-1}, x_{s:T})$$
                                                                      [Markov property!]
       Therefore (s = T, T − 1, . . . , 1)
       $$p(y_s\,|\,y_{s-1}, x_{1:T}) \propto k(y_s\,|\,y_{s-1})\, f(x_s\,|\,y_s)\sum_{y_{s+1}} p(y_{s+1}\,|\,y_s, x_{1:T})$$
                                                                      [Backward equation]
       with
       $$p(y_T\,|\,y_{T-1}, x_{1:T}) \propto k(y_T\,|\,y_{T-1})\,f(x_T\,|\,y_T)\,.$$



End of the backward smoothing


       The first term is
       $$p(y_1\,|\,x_{1:T}) \propto \pi(y_1)\, f(x_1\,|\,y_1)\sum_{y_2} p(y_2\,|\,y_1, x_{1:T})\,,$$
       with π stationary distribution of P

       The conditional for ys needs to be defined for each of the κ values
       of ys−1
                    O(T × κ²) operations



Details


       Need to introduce unnormalized versions of the conditionals
       p(yt |yt−1 , x0:T ) such that
       $$p^{\star}_T(y_T\,|\,y_{T-1}, x_{0:T}) = p_{y_{T-1}y_T}\, f(x_T\,|\,y_T)$$
       $$p^{\star}_t(y_t\,|\,y_{t-1}, x_{0:T}) = p_{y_{t-1}y_t}\, f(x_t\,|\,y_t)\sum_{i=1}^{\kappa} p^{\star}_{t+1}(i\,|\,y_t, x_{0:T})$$
       $$p^{\star}_0(y_0\,|\,x_{0:T}) = \varrho_{y_0}\, f(x_0\,|\,y_0)\sum_{i=1}^{\kappa} p^{\star}_1(i\,|\,y_0, x_{0:T})$$



Likelihood computation

      Bayes formula
       $$p(x_{1:T}) = \frac{p(x_{1:T}\,|\,y_{1:T})\,p(y_{1:T})}{p(y_{1:T}\,|\,x_{1:T})}$$
       gives a representation of the likelihood based on the
       forward–backward formulae and an arbitrary sequence y°1:T (since
       the l.h.s. does not depend on y1:T ).

       Obtained through the p⋆ 's as
       $$p(x_{0:T}) = \sum_{i=1}^{\kappa} p^{\star}_0(i\,|\,x_{0:T})$$



Prediction filter

       If
       $$\varphi_t(i) = p(y_t = i\,|\,x_{1:t-1})$$
       Forward equations
       $$\varphi_1(j) = p(y_1 = j)\,,\qquad
       \varphi_{t+1}(j) = \frac{1}{c_t}\sum_{i=1}^{\kappa} f(x_t\,|\,y_t = i)\,\varphi_t(i)\,p_{ij}\quad (t \ge 1)$$
       where
       $$c_t = \sum_{k=1}^{\kappa} f(x_t\,|\,y_t = k)\,\varphi_t(k)\,,$$



Likelihood computation (2)


       Follows the same principle as the backward equations
       The (log-)likelihood is thus

               $$\log p(x_{1:t}) = \sum_{r=1}^{t} \log \sum_{i=1}^{\kappa} p(x_r, y_r = i \mid x_{1:(r-1)})$$

               $$\phantom{\log p(x_{1:t})} = \sum_{r=1}^{t} \log \sum_{i=1}^{\kappa} f(x_r \mid y_r = i)\, \varphi_r(i)$$
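
       Since the inner sum above is exactly the normalizing constant c_r of the prediction filter, the log-likelihood accumulates the log c_r’s; an illustrative use of the prediction_filter sketch above (P, init and emit as defined there).

phi, c = prediction_filter(P, init, emit)    # init is the distribution of y_1
loglik = np.log(c).sum()                     # log p(x_{1:T}) = sum_r log c_r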
