EM algorithm and its application in Probabilistic Latent Semantic Analysis (pLSA)

Duc-Hieu Tran
tdh.net [at] gmail.com
Nanyang Technological University

July 27, 2010






  Outline



   The parameter estimation problem


   EM algorithm


   Probabilistic Latent Semantic Analysis


   Reference




The parameter estimation problem


  Introduction



   Given the prior probabilities P(ωi ) and class-conditional densities p(x|ωi )
   =⇒ optimal classifier:
           P(ωj |x) ∝ p(x|ωj )P(ωj )
           decide ωi if P(ωi |x) > P(ωj |x), ∀j ≠ i
   In practice, p(x|ωi ) is unknown – it must be estimated from training samples
   (e.g., assume p(x|ωi ) ∼ N (µi , Σi )).
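   To make this concrete, here is a minimal sketch (not from the slides) of the resulting decision rule, assuming 1-D Gaussian class-conditional densities with illustrative means, standard deviations, and priors:

```python
# A minimal sketch (not from the slides): Bayes decision rule with assumed
# 1-D Gaussian class-conditional densities p(x|w_i) and known priors P(w_i).
import numpy as np

priors = np.array([0.4, 0.6])                          # P(w_i), assumed
means, stds = np.array([-1.0, 2.0]), np.array([1.0, 1.5])

def classify(x):
    # p(x | w_i) for each class under the Gaussian assumption
    likelihoods = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    posteriors = likelihoods * priors                  # proportional to P(w_i | x)
    return int(np.argmax(posteriors))                  # decide the most probable class

print(classify(0.2), classify(3.0))
```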






  Frequentist vs. Bayesian schools


   Frequentist
           parameters – quantities whose values are fixed but unknown.
           the best estimate of their values – the one that maximizes the
           probability of obtaining the observed samples.
   Bayesian
           parameters – random variables having some known prior distribution.
           observation of the samples converts this to a posterior density,
           revising our opinion about the true values of the parameters.






  Examples

           training samples: S = {(x^{(1)}, y^{(1)}), . . . , (x^{(m)}, y^{(m)})}
           frequentist: maximum likelihood

               \[ \max_{\theta} \prod_{i} p(y^{(i)} \mid x^{(i)}; \theta) \]

           bayesian: P(θ) – prior, e.g., P(θ) ∼ N (0, I)

               \[ P(\theta \mid S) \propto \Big[ \prod_{i=1}^{m} P(y^{(i)} \mid x^{(i)}, \theta) \Big] \, P(\theta) \]

               \[ \theta_{\mathrm{MAP}} = \arg\max_{\theta} P(\theta \mid S) \]
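   As an illustration (not part of the original slides), here is a minimal sketch of the two estimates for a Gaussian mean with known variance and an assumed N(0, 1) prior; the data and parameter names are made up for the example:

```python
# A minimal sketch (not from the slides): MLE vs. MAP for a Gaussian mean theta
# with known noise variance, assuming the prior P(theta) ~ N(0, tau2).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=20)   # observed samples
sigma2 = 1.0                                  # known noise variance (assumed)
tau2 = 1.0                                    # prior variance (assumed)

# MLE: maximize the likelihood -> sample mean
theta_mle = x.mean()

# MAP: maximize likelihood * prior -> posterior mode, shrunk toward the prior mean 0
n = len(x)
theta_map = (n / sigma2) / (n / sigma2 + 1.0 / tau2) * x.mean()

print(theta_mle, theta_map)   # MAP is pulled slightly toward 0
```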




EM algorithm




  An estimation problem

           training set of m independent samples: {x^{(1)}, x^{(2)}, . . . , x^{(m)}}
           goal: fit the parameters of a model p(x, z) to the data
           the log-likelihood:

               \[ \ell(\theta) = \sum_{i=1}^{m} \log p(x^{(i)}; \theta) = \sum_{i=1}^{m} \log \sum_{z} p(x^{(i)}, z; \theta) \]

           explicitly maximizing ℓ(θ) might be difficult.
           z – latent random variable
           if z^{(i)} were observed, then maximum likelihood estimation would be easy.
           strategy: repeatedly construct a lower-bound on ℓ (E-step) and
           optimize that lower-bound (M-step).




  EM algorithm (1)
           digression: Jensen's inequality.
           f – convex function: E[f (X )] ≥ f (E[X ])
           for each i, Qi – a distribution over z: \sum_{z} Q_i(z) = 1, Q_i(z) ≥ 0

               \[ \ell(\theta) = \sum_{i} \log p(x^{(i)}; \theta)
                               = \sum_{i} \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta) \]

               \[ \phantom{\ell(\theta)} = \sum_{i} \log \sum_{z^{(i)}} Q_i(z^{(i)}) \, \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \tag{1} \]

           applying Jensen's inequality to the concave function log:

               \[ \ell(\theta) \ge \sum_{i} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \tag{2} \]

      More detail . . .


  EM algorithm (2)
           for any set of distributions Qi , formula (2) gives a lower-bound on ℓ(θ)
           how to choose Qi ?
           strategy: make the inequality hold with equality at our particular
           value of θ.
           require:

               \[ \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} = c \]

           c – a constant that does not depend on z^{(i)}
           choose: Qi (z^{(i)}) ∝ p(x^{(i)}, z^{(i)}; θ)
           we know \sum_{z^{(i)}} Q_i(z^{(i)}) = 1, so

               \[ Q_i(z^{(i)}) = \frac{p(x^{(i)}, z^{(i)}; \theta)}{\sum_{z} p(x^{(i)}, z; \theta)}
                               = \frac{p(x^{(i)}, z^{(i)}; \theta)}{p(x^{(i)}; \theta)}
                               = p(z^{(i)} \mid x^{(i)}; \theta) \]



  EM algorithm (3)


           Qi – the posterior distribution of z^{(i)} given x^{(i)} and the parameters θ

   EM algorithm: repeat until convergence
           E-step: for each i:

               \[ Q_i(z^{(i)}) := p(z^{(i)} \mid x^{(i)}; \theta) \]

           M-step:

               \[ \theta := \arg\max_{\theta} \sum_{i} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \]

   The algorithm converges, since ℓ(θ^{(t)}) ≤ ℓ(θ^{(t+1)}).
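   As an illustration (not part of the original slides), here is a minimal NumPy sketch of these two steps for a two-component, unit-variance 1-D Gaussian mixture; the toy data and parameter names are assumptions for the example:

```python
# A minimal sketch (not from the slides): EM for a two-component 1-D Gaussian
# mixture with unit variances, so theta = (mixing weight pi, means mu0, mu1).
import numpy as np

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])

def normal_pdf(x, mu):
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

pi, mu = 0.5, np.array([-1.0, 1.0])           # initial parameters
for _ in range(50):
    # E-step: Q_i(z) = p(z | x_i; theta), the posterior responsibilities
    p0 = (1 - pi) * normal_pdf(x, mu[0])
    p1 = pi * normal_pdf(x, mu[1])
    q1 = p1 / (p0 + p1)                       # P(z = 1 | x_i)
    q0 = 1 - q1
    # M-step: maximize the lower bound -> responsibility-weighted means and weight
    mu = np.array([np.sum(q0 * x) / q0.sum(), np.sum(q1 * x) / q1.sum()])
    pi = q1.mean()

print(pi, mu)   # should approach 0.6 and roughly (-2, 3)
```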





  EM algorithm (4)
   Digression: coordinate ascent algorithm.

               \[ \max_{\alpha} W(\alpha_1, . . . , \alpha_m) \]

           loop until convergence:
               for i ∈ 1, . . . , m:

                   \[ \alpha_i := \arg\max_{\hat{\alpha}_i} W(\alpha_1, . . . , \hat{\alpha}_i, . . . , \alpha_m) \]

   EM algorithm as coordinate ascent:

               \[ J(Q, \theta) = \sum_{i} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \]

           ℓ(θ) ≥ J(Q, θ)
           EM algorithm can be viewed as coordinate ascent on J
           E-step: maximize w.r.t. Q
           M-step: maximize w.r.t. θ
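   For intuition, here is a minimal sketch (not from the slides) of coordinate ascent on an assumed concave toy objective W(a, b) = -(a-1)² - (b+2)² - ab, where each coordinate update has a closed form:

```python
# A minimal sketch (not from the slides): coordinate ascent on
# W(a, b) = -(a - 1)^2 - (b + 2)^2 - a*b, one coordinate at a time.
def coordinate_ascent(steps=100):
    a, b = 0.0, 0.0
    for _ in range(steps):
        a = (2.0 - b) / 2.0      # argmax over a with b fixed (dW/da = 0)
        b = -(4.0 + a) / 2.0     # argmax over b with a fixed (dW/db = 0)
    return a, b

print(coordinate_ascent())       # converges to the maximizer (8/3, -10/3)
```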
Probabilistic Latent Semantic Analysis




  Probabilistic Latent Semantic Analysis (1)

           set of documents D = {d1 , . . . , dN }
           set of words W = {w1 , . . . , wM }
           set of unobserved classes Z = {z1 , . . . , zK }
           conditional independence assumption:

               \[ P(d_i, w_j \mid z_k) = P(d_i \mid z_k) \, P(w_j \mid z_k) \tag{3} \]

           so,

               \[ P(w_j \mid d_i) = \sum_{k=1}^{K} P(z_k \mid d_i) \, P(w_j \mid z_k) \tag{4} \]

               \[ P(d_i, w_j) = P(d_i) \sum_{k=1}^{K} P(w_j \mid z_k) \, P(z_k \mid d_i) \]
      More detail . . .





  Probabilistic Latent Semantic Analysis (2)

           n(di , wj ) – number of occurrences of word wj in document di
           Likelihood:

               \[ L = \prod_{i=1}^{N} P(d_i) = \prod_{i=1}^{N} \prod_{j=1}^{M} [P(d_i, w_j)]^{n(d_i, w_j)}
                    = \prod_{i=1}^{N} \prod_{j=1}^{M} \Big[ P(d_i) \sum_{k=1}^{K} P(w_j \mid z_k) P(z_k \mid d_i) \Big]^{n(d_i, w_j)} \]

           log-likelihood ℓ = log(L):

               \[ \ell = \sum_{i=1}^{N} \sum_{j=1}^{M} \Big[ n(d_i, w_j) \log P(d_i)
                       + n(d_i, w_j) \log \sum_{k=1}^{K} P(w_j \mid z_k) P(z_k \mid d_i) \Big] \]
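   As a small illustration (not from the slides), this log-likelihood can be evaluated directly from a word-count matrix and the two parameter tables; taking P(d_i) proportional to document length is an assumption made here for the example:

```python
# A minimal sketch (not from the slides): evaluating the pLSA log-likelihood for
# a word-count matrix n (N x M), with P(w|z) stored as p_w_z (K x M) and
# P(z|d) as p_z_d (N x K); names are illustrative.
import numpy as np

def plsa_log_likelihood(n, p_w_z, p_z_d):
    p_w_d = p_z_d @ p_w_z                 # P(w_j|d_i) = sum_k P(z_k|d_i) P(w_j|z_k)
    p_d = n.sum(axis=1) / n.sum()         # P(d_i) from relative document lengths (assumed)
    return np.sum(n * (np.log(p_d)[:, None] + np.log(p_w_d)))
```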





  Probabilistic Latent Semantic Analysis (3)
           maximize w.r.t. P(wj |zk ), P(zk |di )
           ≈ maximize

               \[ \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log \sum_{k=1}^{K} P(w_j \mid z_k) P(z_k \mid d_i) \]

               \[ = \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log \sum_{k=1}^{K} Q_k(z_k) \, \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{Q_k(z_k)} \]

               \[ \ge \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} Q_k(z_k) \log \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{Q_k(z_k)} \]

           choose

               \[ Q_k(z_k) = \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{\sum_{l=1}^{K} P(w_j \mid z_l) P(z_l \mid d_i)} = P(z_k \mid d_i, w_j) \]
              More detail . . .



  Probabilistic Latent Semantic Analysis (4)


           ≈ maximize (w.r.t. P(wj |zk ), P(zk |di ))

               \[ \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k \mid d_i, w_j) \log \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{P(z_k \mid d_i, w_j)} \]

           ≈ maximize

               \[ \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k \mid d_i, w_j) \log [P(w_j \mid z_k) P(z_k \mid d_i)] \]






  Probabilistic Latent Semantic Analysis (5)
   EM algorithm
           E-step: update

               \[ P(z_k \mid d_i, w_j) = \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{\sum_{l=1}^{K} P(w_j \mid z_l) P(z_l \mid d_i)} \]

           M-step: maximize w.r.t. P(wj |zk ), P(zk |di )

               \[ \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k \mid d_i, w_j) \log [P(w_j \mid z_k) P(z_k \mid d_i)] \]

           subject to

               \[ \sum_{j=1}^{M} P(w_j \mid z_k) = 1, \quad k \in \{1, . . . , K\} \]

               \[ \sum_{k=1}^{K} P(z_k \mid d_i) = 1, \quad i \in \{1, . . . , N\} \]


  Probabilistic Latent Semantic Analysis (6)


   Solution of the maximization problem in the M-step:

               \[ P(w_j \mid z_k) = \frac{\sum_{i=1}^{N} n(d_i, w_j) P(z_k \mid d_i, w_j)}{\sum_{m=1}^{M} \sum_{n=1}^{N} n(d_n, w_m) P(z_k \mid d_n, w_m)} \]

               \[ P(z_k \mid d_i) = \frac{\sum_{j=1}^{M} n(d_i, w_j) P(z_k \mid d_i, w_j)}{n(d_i)} \]

   where n(d_i) = \sum_{j=1}^{M} n(d_i, w_j).
      More detail . . .






  Probabilistic Latent Semantic Analysis (7)

   All together
           E-step:

               \[ P(z_k \mid d_i, w_j) = \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{\sum_{l=1}^{K} P(w_j \mid z_l) P(z_l \mid d_i)} \]

           M-step:

               \[ P(w_j \mid z_k) = \frac{\sum_{i=1}^{N} n(d_i, w_j) P(z_k \mid d_i, w_j)}{\sum_{m=1}^{M} \sum_{n=1}^{N} n(d_n, w_m) P(z_k \mid d_n, w_m)} \]

               \[ P(z_k \mid d_i) = \frac{\sum_{j=1}^{M} n(d_i, w_j) P(z_k \mid d_i, w_j)}{n(d_i)} \]
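   As an illustration (not from the slides), here is a minimal NumPy sketch of exactly these updates; the array names (p_w_z, p_z_d) and random initialization are assumptions made for the example:

```python
# A minimal sketch (not from the slides): the pLSA EM updates above in NumPy.
# n is an N x M word-count matrix; K is the number of latent classes.
import numpy as np

def plsa_em(n, K, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    N, M = n.shape
    p_w_z = rng.random((K, M)); p_w_z /= p_w_z.sum(axis=1, keepdims=True)  # P(w|z)
    p_z_d = rng.random((N, K)); p_z_d /= p_z_d.sum(axis=1, keepdims=True)  # P(z|d)

    for _ in range(iters):
        # E-step: P(z_k | d_i, w_j) for all i, j, k  (shape N x M x K)
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]      # P(w|z) P(z|d)
        post = joint / joint.sum(axis=2, keepdims=True)

        # M-step: re-estimate P(w|z) and P(z|d) from expected counts
        expected = n[:, :, None] * post                      # n(d, w) P(z|d, w)
        p_w_z = expected.sum(axis=0).T                       # numerator, shape K x M
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)            # normalize over words
        p_z_d = expected.sum(axis=1)                         # numerator, shape N x K
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)            # divide by n(d_i)
    return p_w_z, p_z_d
```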




Reference




           R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, Wiley-Interscience, 2001.
           T. Hofmann, "Unsupervised learning by probabilistic latent semantic analysis," Machine Learning, vol. 42, 2001, pp. 177–196.
           A. Ng, "Machine Learning (CS229)," course notes, Stanford University.




Appendix

   Generative model for word/document co-occurrence (a sampling sketch follows the derivation below):
           select a document di with probability (w.p.) P(di )
           pick a latent class zk w.p. P(zk |di )
           generate a word wj w.p. P(wj |zk )

               \[ P(d_i, w_j) = \sum_{k=1}^{K} P(d_i, w_j \mid z_k) P(z_k) = \sum_{k=1}^{K} P(w_j \mid z_k) P(d_i \mid z_k) P(z_k) \]

               \[ \phantom{P(d_i, w_j)} = \sum_{k=1}^{K} P(w_j \mid z_k) P(z_k \mid d_i) P(d_i) = P(d_i) \sum_{k=1}^{K} P(w_j \mid z_k) P(z_k \mid d_i) \]

           since P(d_i, w_j) = P(w_j \mid d_i) P(d_i),

               \[ \Longrightarrow P(w_j \mid d_i) = \sum_{k=1}^{K} P(z_k \mid d_i) P(w_j \mid z_k) \]
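   A minimal sketch (not from the slides) of the three-step generative process above, assuming the probability tables p_d, p_z_d (N x K), and p_w_z (K x M) are given:

```python
# A minimal sketch (not from the slides): sampling one (document, word) pair
# from the pLSA generative model, given assumed probability tables.
import numpy as np

def sample_pair(p_d, p_z_d, p_w_z, rng=np.random.default_rng()):
    i = rng.choice(len(p_d), p=p_d)             # select a document d_i  w.p. P(d_i)
    k = rng.choice(p_z_d.shape[1], p=p_z_d[i])  # pick a latent class z_k w.p. P(z_k|d_i)
    j = rng.choice(p_w_z.shape[1], p=p_w_z[k])  # generate a word w_j    w.p. P(w_j|z_k)
    return i, j
```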





               \[ P(w_j \mid d_i) = \sum_{k=1}^{K} P(z_k \mid d_i) P(w_j \mid z_k) \]

           since \sum_{k=1}^{K} P(z_k \mid d_i) = 1, P(w_j \mid d_i) is a convex combination of the P(w_j \mid z_k)
           ≈ each document is modelled as a mixture of topics












               \[ P(z_k \mid d_i, w_j) = \frac{P(d_i, w_j \mid z_k) P(z_k)}{P(d_i, w_j)} \tag{5} \]

               \[ \phantom{P(z_k \mid d_i, w_j)} = \frac{P(w_j \mid z_k) P(d_i \mid z_k) P(z_k)}{P(d_i, w_j)} \tag{6} \]

               \[ \phantom{P(z_k \mid d_i, w_j)} = \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{P(w_j \mid d_i)} \tag{7} \]

               \[ \phantom{P(z_k \mid d_i, w_j)} = \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{\sum_{l=1}^{K} P(w_j \mid z_l) P(z_l \mid d_i)} \tag{8} \]

   From (5) to (6) by the conditional independence assumption (3). From (7) to (8) by (4).








   Introduce Lagrange multipliers τk , ρi for the normalization constraints:

               \[ H = \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k \mid d_i, w_j) \log [P(w_j \mid z_k) P(z_k \mid d_i)] \]

               \[ \phantom{H} + \sum_{k=1}^{K} \tau_k \Big( 1 - \sum_{j=1}^{M} P(w_j \mid z_k) \Big) + \sum_{i=1}^{N} \rho_i \Big( 1 - \sum_{k=1}^{K} P(z_k \mid d_i) \Big) \]

               \[ \frac{\partial H}{\partial P(w_j \mid z_k)} = \frac{\sum_{i=1}^{N} P(z_k \mid d_i, w_j) \, n(d_i, w_j)}{P(w_j \mid z_k)} - \tau_k = 0 \]

               \[ \frac{\partial H}{\partial P(z_k \mid d_i)} = \frac{\sum_{j=1}^{M} n(d_i, w_j) \, P(z_k \mid d_i, w_j)}{P(z_k \mid d_i)} - \rho_i = 0 \]








   from \sum_{j=1}^{M} P(w_j \mid z_k) = 1:

               \[ \tau_k = \sum_{j=1}^{M} \sum_{i=1}^{N} P(z_k \mid d_i, w_j) \, n(d_i, w_j) \]

   from \sum_{k=1}^{K} P(z_k \mid d_i, w_j) = 1:

               \[ \rho_i = n(d_i) \]

   Substituting τk and ρi back gives the M-step formulas for P(wj |zk ) and P(zk |di ).






  Applying Jensen's inequality




           f (x) = log(x) is a concave function, so

               \[ f\Big( E_{z^{(i)} \sim Q_i}\Big[ \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \Big] \Big)
                  \;\ge\; E_{z^{(i)} \sim Q_i}\Big[ f\Big( \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \Big) \Big] \]
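   A minimal numerical sanity check (not from the slides), using an arbitrary positive random variable as an assumption for the example:

```python
# A minimal sketch (not from the slides): checking Jensen's inequality for the
# concave function log, i.e. log(E[X]) >= E[log(X)].
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=100_000)   # any positive random variable
print(np.log(x.mean()), np.log(x).mean())      # left value is (weakly) larger
```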





