Graphical Models for the Internet
        Alexander Smola & Amr Ahmed

 Yahoo! Research & Australian National University
                Santa Clara, CA
              alex@smola.org blog.smola.org
Outline
• Part 1 - Motivation
  •   Automatic information extraction
  •   Application areas
• Part 2 - Basic Tools
  •   Density estimation / conjugate distributions
  •   Directed Graphical models and inference
• Part 3 - Topic Models (our workhorse)
  •   Statistical model
  •   Large scale inference (parallelization, particle filters)
• Part 4 - Advanced Modeling
  •   Temporal dependence
  •   Mixing clustering and topic models
  •   Social Networks
  •   Language models
Part 1 - Motivation
Data on the Internet
• Webpages (content, graph)
• Clicks (ad, page, social)
• Users (OpenID, FB Connect)
• e-mails (Hotmail, Y!Mail, Gmail)
• Photos, Movies (Flickr, YouTube, Vimeo ...)
• Cookies / tracking info (see Ghostery)
• Installed apps (Android market etc.)
• Location (Latitude, Loopt, Foursquared)
• User generated content (Wikipedia & co)
• Ads (display, text, DoubleClick, Yahoo)
• Comments (Disqus, Facebook)
• Reviews (Yelp, Y!Local)
• Third party features (e.g. Experian)
• Social connections (LinkedIn, Facebook)
• Purchase decisions (Netflix, Amazon)
• Instant Messages (YIM, Skype, Gtalk)
• Search terms (Google, Bing)
• Timestamp (everything)
• News articles (BBC, NYTimes, Y!News)
• Blog posts (Tumblr, Wordpress)
• Microblogs (Twitter, Jaiku, Meme)

Unlimited amounts of data, but finite resources:
• Editors are expensive
• Editors don’t know users
• Barrier to i18n
• Abuse (intrusions are novel)
• Implicit feedback

• Data analysis (find interesting stuff rather than find x)
• Integrating many systems
• Modular design for data integration
• Integrate with given prediction tasks

Invest in modeling and naming rather than data generation
Clustering documents

(figure: documents grouped into clusters labeled “airline”, “university”, “restaurant”)
Today’s mission


Find hidden structure in the data
         Human understandable
    Improved knowledge for estimation
Some applications
Hierarchical Clustering

Adams, Ghahramani, Jordan; NIPS 2010
Topics in text




Latent Dirichlet Allocation; Blei, Ng, Jordan, JMLR 2003
Word segmentation




Mochihashi, Yamada, Ueda, ACL 2009
Language model
automatically synthesized from Penn Treebank

Mochihashi, Yamada, Ueda, ACL 2009
User model over time

(figure: topic proportions per day for two users; user 1: Dating, Baseball, Celebrity, Health; user 2: Baseball, Finance, Jobs, Dating)

Dating: women, men, dating, singles, personals, seeking, match
Baseball: League, baseball, basketball, doublehead, Bergesen, Griffey, bullpen, Greinke
Celebrity: Snooki, Tom Cruise, Katie Holmes, Pinkett, Kudrow, Hollywood
Health: skin, body, fingers, cells, toes, wrinkle, layers
Jobs: job, career, business, assistant, hiring, part-time, receptionist
Finance: financial, Thomson, chart, real, Stock, Trading, currency

Ahmed et al., KDD 2011
Face recognition from captions




  Jain, Learned-Miller, McCallum, ICCV 2007
Storylines from news
                Ahmed et al,
                AISTATS 2011
Ideology detection




Ahmed et al, 2010; Bitterlemons collection
Hypertext topic extraction




   Gruber, Rosen-Zvi, Weiss; UAI 2008
Alternatives
Ontologies
• continuous maintenance
• no guarantee of coverage
• difficult categories
expensive, small
Face Classification
• 100-1000 people
• 10k faces
• curated (not realistic)
• expensive to generate
Topic Detection & Tracking
• editorially curated training data
• expensive to generate
• subjective in selection of threads
• language specific
Advertising Targeting




• Needs training data in every language
• Is it really relevant for better ads?
• Does it cover relevant areas?
Challenges
• Scale
  • Millions to billions of instances (documents, clicks, users, messages, ads)
  • Rich structure of data (ontology, categories, tags)
  • Model description typically larger than memory of single workstation
• Modeling
  • Usually clustering or topic models do not solve the problem
  • Temporal structure of data
  • Side information for variables
  • Solve the problem. Don’t simply apply a model!
• Inference
  • 10k-100k clusters for hierarchical model
  • 1M-100M words
  • Communication is an issue for large state space
Summary - Part 1
• Essentially infinite amount of data
• Labeling is prohibitively expensive
• Not scalable for i18n
• Even for supervised problems unlabeled data abounds. Use it.
• User-understandable structure for representation purposes
• Solutions are often customized to the problem.
  We can only cover building blocks in this tutorial.
Part 2 - Basic Tools
Statistics 101
Probability
• Space of events X
 • server status (working, slow, broken)
 • income of the user (e.g. $95,000)
 • search queries (e.g. “graphical models”)
• Probability axioms (Kolmogorov)
  Pr(X) ∈ [0, 1], Pr(𝒳) = 1 for the whole event space 𝒳
  Pr(∪_i X_i) = Σ_i Pr(X_i) if X_i ∩ X_j = ∅
• Example queries
  • P(server working) = 0.999
  • P(90,000 < income < 100,000) = 0.1
(In)dependence
• Independence    Pr(x, y) = Pr(x) · Pr(y)
 • Login behavior of two users (approximately)
 • Disk crash in different colos (approximately)
• Dependent events   Pr(x, y) ≠ Pr(x) · Pr(y)
  • Emails
  • Queries
  • News stream / Buzz / Tweets
  • IM communication
  • Russian Roulette
  Dependence is everywhere!
Independence (joint probability table)

        y=0    y=1
  x=0   0.3    0.2
  x=1   0.3    0.2

Dependence (joint probability table)

        y=0    y=1
  x=0   0.45   0.05
  x=1   0.05   0.45
A Graphical Model



  Spam                      Mail




p(spam, mail) = p(spam) p(mail|spam)
Bayes Rule

• Joint Probability
  Pr(X, Y) = Pr(X|Y) Pr(Y) = Pr(Y|X) Pr(X)
• Bayes Rule
  Pr(X|Y) = Pr(Y|X) · Pr(X) / Pr(Y)
• Hypothesis testing
• Reverse conditioning
AIDS test (Bayes rule)
• Data
  • Approximately 0.1% are infected
  • Test detects all infections
  • Test reports positive for 1% healthy people
• Probability of having AIDS if test is positive
  Pr(a=1|t) = Pr(t|a=1) · Pr(a=1) / Pr(t)
            = Pr(t|a=1) · Pr(a=1) / [Pr(t|a=1) · Pr(a=1) + Pr(t|a=0) · Pr(a=0)]
            = 1 · 0.001 / (1 · 0.001 + 0.01 · 0.999) = 0.091
Improving the diagnosis
• Use a follow-up test
  • Test 2 reports positive for 90% infections
  • Test 2 reports positive for 5% healthy people
• Probability of being healthy given both tests positive
  0.01 · 0.05 · 0.999 / (1 · 0.9 · 0.001 + 0.01 · 0.05 · 0.999) = 0.357
  so the probability of infection rises to 0.643.
• Why can’t we use Test 1 twice?
  Outcomes are not independent, but tests 1 and 2 are conditionally independent:
  p(t₁, t₂|a) = p(t₁|a) · p(t₂|a)
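The arithmetic above is easy to check numerically; a minimal sketch in Python, using the priors and test accuracies from the slides:

```python
p_a = 0.001                      # P(infected)
t1 = (1.0, 0.01)                 # test 1: P(+|infected), P(+|healthy)
t2 = (0.9, 0.05)                 # test 2: P(+|infected), P(+|healthy)

# single positive test 1
post1 = t1[0] * p_a / (t1[0] * p_a + t1[1] * (1 - p_a))
print(round(post1, 3))           # 0.091

# both tests positive: conditionally independent likelihoods multiply
num = t1[0] * t2[0] * p_a
den = num + t1[1] * t2[1] * (1 - p_a)
print(round(num / den, 3))       # 0.643 infected, i.e. 0.357 healthy
```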
Application: Naive Bayes
Naive Bayes Spam Filter
• Key assumption
  Words occur independently of each other given the label of the document
  p(w₁, ..., wₙ|spam) = ∏_{i=1}^n p(wᵢ|spam)
• Spam classification via Bayes Rule
  p(spam|w₁, ..., wₙ) ∝ p(spam) ∏_{i=1}^n p(wᵢ|spam)
• Parameter estimation
  Compute spam probability and word distributions for spam and ham
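As a concrete illustration, a minimal counting-based classifier along the lines above; the toy corpus is hypothetical, and Laplace smoothing (anticipating the conjugate-prior slides) keeps unseen words from zeroing out the product:

```python
import math
from collections import Counter

def train(docs, labels):
    """Count word occurrences per class and class frequencies."""
    word_counts = {y: Counter() for y in set(labels)}
    class_counts = Counter(labels)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)
    return word_counts, class_counts

def predict(doc, word_counts, class_counts, alpha=1.0):
    """argmax_y log p(y) + sum_i log p(w_i | y), Laplace-smoothed."""
    n = sum(class_counts.values())
    vocab = set()
    for wc in word_counts.values():
        vocab |= set(wc)
    scores = {}
    for y, wc in word_counts.items():
        total = sum(wc.values())
        s = math.log(class_counts[y] / n)
        for w in doc:
            s += math.log((wc[w] + alpha) / (total + alpha * len(vocab)))
        scores[y] = s
    return max(scores, key=scores.get)

docs = [["cheap", "viagra", "offer"], ["meeting", "agenda", "notes"]]
labels = ["spam", "ham"]
wc, cc = train(docs, labels)
print(predict(["cheap", "offer"], wc, cc))  # -> spam
```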
A Graphical Model

(figure: spam → w₁, w₂, ..., wₙ; equivalently in plate notation spam → wᵢ, i = 1..n)

p(w₁, ..., wₙ|spam) = ∏_{i=1}^n p(wᵢ|spam)

How to estimate p(w|spam)?
Naive Naive Bayes Classifier
• Two classes (spam/ham)
• Binary features (e.g. presence of $$$, viagra)
• Simplistic Algorithm
  • Count occurrences of feature for spam/ham
  • Count number of spam/ham mails
• Feature probability and spam probability
  p(xᵢ = TRUE|y) = n(i, y) / n(y)   and   p(y) = n(y) / n
  p(y|x) ∝ (n(y)/n) · ∏_{i: xᵢ=TRUE} n(i, y)/n(y) · ∏_{i: xᵢ=FALSE} [n(y) − n(i, y)]/n(y)
Naive Naive Bayes Classifier
• What if n(i, y) = 0? What if n(i, y) = n(y)?
  Then a single feature forces p(y|x) = 0; the estimates need smoothing.

  p(y|x) ∝ (n(y)/n) · ∏_{i: xᵢ=TRUE} n(i, y)/n(y) · ∏_{i: xᵢ=FALSE} [n(y) − n(i, y)]/n(y)
Estimating Probabilities
Two outcomes (binomial)
• Example: probability of ‘viagra’ in spam/ham
• Data likelihood
  p(X; π) = π^{n₁} (1 − π)^{n₀}
• Maximum Likelihood Estimation
  • Constraint π ∈ [0, 1]
  • Taking derivatives yields
    π = n₁ / (n₀ + n₁)
n outcomes (multinomial)
• Example: USA, Canada, India, UK, NZ
• Data likelihood
  p(X; π) = ∏_i πᵢ^{nᵢ}
• Maximum Likelihood Estimation
  • Constrained optimization problem: Σᵢ πᵢ = 1
  • Using the log-transform yields
    πᵢ = nᵢ / Σⱼ nⱼ
Tossing a Dice

(figure: MLE estimates of the face probabilities after 12, 24, 60 and 120 tosses)
Conjugate Priors
• Unless we have lots of data, estimates are weak
• Usually we have an idea of what to expect
  p(θ|X) ∝ p(X|θ) · p(θ)
  we might even have ‘seen’ such data before
• Solution: add ‘fake’ observations
  p(θ) ∝ p(X_fake|θ), hence p(θ|X) ∝ p(X|θ) p(X_fake|θ) = p(X ∪ X_fake|θ)
• Inference (generalized Laplace smoothing)
  (1/n) Σ_{i=1}^n φ(xᵢ)  →  (1/(n+m)) Σ_{i=1}^n φ(xᵢ) + (m/(n+m)) μ₀
  with fake count m and fake mean μ₀
Conjugate Prior in action
• Discrete Distribution (with mᵢ = m · [μ₀]ᵢ)
  p(x = i) = nᵢ/n  →  p(x = i) = (nᵢ + mᵢ)/(n + m)
• Tossing a dice

  Outcome          1      2      3      4      5      6
  Counts           3      6      2      1      4      4
  MLE           0.15   0.30   0.10   0.05   0.20   0.20
  MAP (m₀=6)    0.15   0.27   0.12   0.08   0.19   0.19
  MAP (m₀=100)  0.16   0.19   0.16   0.15   0.17   0.17

• Rule of thumb
  need 10 data points (or prior) per parameter
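The table above can be reproduced in a few lines; the sketch below implements p(x=i) = (nᵢ + mᵢ)/(n + m) with a uniform fake mean, mᵢ = m/6:

```python
counts = [3, 6, 2, 1, 4, 4]
n = sum(counts)  # 20 tosses

def estimate(counts, m=0.0):
    """p(x=i) = (n_i + m/k) / (n + m), uniform fake mean over k faces."""
    k = len(counts)
    return [(c + m / k) / (n + m) for c in counts]

print([round(p, 2) for p in estimate(counts)])         # MLE row
print([round(p, 2) for p in estimate(counts, m=6)])    # MAP, m0 = 6
print([round(p, 2) for p in estimate(counts, m=100)])  # MAP, m0 = 100
```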
Honest dice

(figure: MLE and MAP estimates for an honest die)

Tainted dice

(figure: MLE and MAP estimates for a tainted die; MAP is smoothed towards uniform)
Exponential Families
• Density function
  p(x; θ) = exp(⟨φ(x), θ⟩ − g(θ))
  where g(θ) = log Σ_{x′} exp(⟨φ(x′), θ⟩)
• Log partition function generates cumulants
  ∂_θ g(θ) = E[φ(x)]
  ∂²_θ g(θ) = Var[φ(x)]
• g is convex (second derivative is p.s.d.)
Examples

• Binomial Distribution: φ(x) = x
• Discrete Distribution: φ(x) = eₓ (eₓ is the unit vector for x)
• Gaussian: φ(x) = (x, ½ x xᵀ)
• Poisson (counting measure 1/x!): φ(x) = x
• Dirichlet, Beta, Gamma, Wishart, ...
Normal Distribution

Poisson Distribution
  p(x; λ) = λˣ e^{−λ} / x!

Beta Distribution
  p(x; α, β) = x^{α−1} (1 − x)^{β−1} / B(α, β)
Dirichlet Distribution




... this is a distribution over distributions ...
Maximum Likelihood
• Negative log-likelihood
  −log p(X; θ) = Σ_{i=1}^n [g(θ) − ⟨φ(xᵢ), θ⟩]
• Taking derivatives
  −∂_θ log p(X; θ) = n [E[φ(x)] − (1/n) Σ_{i=1}^n φ(xᵢ)]
  (model mean minus empirical average)

  We pick the parameter such that the distribution matches the empirical average.
Example: Gaussian Estimation
• Sufficient statistics: x, x²
• Mean and variance given by
  μ = E[x] and σ² = E[x²] − E²[x]
• Maximum Likelihood Estimate
  μ̂ = (1/n) Σ_{i=1}^n xᵢ and σ̂² = (1/n) Σ_{i=1}^n xᵢ² − μ̂²
• Maximum a Posteriori Estimate (smoother)
  μ̂ = (1/(n+n₀)) Σ_{i=1}^n xᵢ and σ̂² = (1/(n+n₀)) Σ_{i=1}^n xᵢ² + (n₀/(n+n₀)) · 1 − μ̂²
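A sketch of both estimators, assuming the n₀ fake observations have zero mean and unit variance, consistent with the MAP formula above:

```python
import random

def gauss_fit(xs, n0=0.0):
    """ML (n0 = 0) or smoothed MAP estimate of mean and variance."""
    n = len(xs)
    mu = sum(xs) / (n + n0)
    var = sum(x * x for x in xs) / (n + n0) + n0 / (n + n0) - mu ** 2
    return mu, var

random.seed(0)
xs = [random.gauss(2.0, 1.5) for _ in range(100)]
print(gauss_fit(xs))           # MLE
print(gauss_fit(xs, n0=10.0))  # MAP: shrunk towards mean 0, variance 1
```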
Collapsing
• Conjugate priors: p(θ) ∝ p(X_fake|θ)
  Hence we know how to compute the normalization
• Prediction
  p(x|X) = ∫ p(x|θ) p(θ|X) dθ
         ∝ ∫ p(x|θ) p(X|θ) p(X_fake|θ) dθ
         = ∫ p({x} ∪ X ∪ X_fake|θ) dθ
  Look up closed-form expansions for the conjugate pairs
  (Beta, binomial), (Dirichlet, multinomial), (Gamma, Poisson), (Wishart, Gauss)

  http://en.wikipedia.org/wiki/Exponential_family
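For the (Beta, binomial) pair the collapsed predictive has a familiar closed form; a sketch with hypothetical counts:

```python
def predictive_head(n1, n, a=1.0, b=1.0):
    """p(head | X) under a Beta(a, b) prior after n1 heads in n tosses:
    the integral over theta collapses to (n1 + a) / (n + a + b)."""
    return (n1 + a) / (n + a + b)

print(predictive_head(7, 10))            # 8/12 ~ 0.667, uniform prior
print(predictive_head(7, 10, a=5, b=5))  # 12/20 = 0.6, shrunk towards 0.5
```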
Directed Graphical Models
... some Web 2.0 service

(figure: MySQL → Website ← Apache)

• Joint distribution (assume a and m are independent)
  p(m, a, w) = p(w|m, a) p(m) p(a)
• Explaining away
  p(m, a|w) = p(w|m, a) p(m) p(a) / Σ_{m′,a′} p(w|m′, a′) p(m′) p(a′)
  a and m are dependent conditioned on w
... some Web 2.0 service

(figure: MySQL → Website ← Apache)

If the website is broken: at least one of the two services is broken (not independent).
If the website is working: MySQL is working and Apache is working.
Directed graphical model

(figure: variants of the m → w ← a network, plus a user → action node u)

• Easier estimation
  • 15 parameters for the full joint distribution
  • 1+1+3+1 for the factorizing distribution
• Causal relations
• Inference for unobserved variables
No loops allowed

p(c|e) p(e|c) is not a valid joint distribution;
use p(c|e) p(e) or p(e|c) p(c) instead.
Directed Graphical Model
• Joint probability distribution
  p(x) = ∏_i p(xᵢ | x_parents(i))
• Parameter estimation
  • If x is fully observed the likelihood breaks up
    log p(x|θ) = Σ_i log p(xᵢ | x_parents(i), θ)
  • If x is partially observed things get interesting:
    maximization, EM, variational, sampling ...
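A minimal sketch of the factorization, with a hypothetical three-node network in the spirit of the MySQL/Apache example (the CPT values are made up):

```python
parents = {"m": [], "a": [], "w": ["m", "a"]}
cpt = {
    "m": {(): 0.9},                      # p(mysql up)
    "a": {(): 0.8},                      # p(apache up)
    "w": {(1, 1): 0.99, (1, 0): 0.0,     # p(website up | m, a)
          (0, 1): 0.0, (0, 0): 0.0},
}

def joint(assignment):
    """p(x) = prod_i p(x_i | x_parents(i)), one factor per node."""
    total = 1.0
    for var, pa in parents.items():
        p_true = cpt[var][tuple(assignment[q] for q in pa)]
        total *= p_true if assignment[var] == 1 else 1.0 - p_true
    return total

print(joint({"m": 1, "a": 1, "w": 1}))   # 0.9 * 0.8 * 0.99 = 0.7128
```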
Clustering
Density Estimation (plate: θ → x)
  p(x, θ) = p(θ) ∏_{i=1}^n p(xᵢ|θ)
Clustering (plate: θ → x ← y)
  p(x, y, θ) = p(π) ∏_{k=1}^K p(θₖ) ∏_{i=1}^n p(yᵢ|π) p(xᵢ|θ, yᵢ)
Chains
Markov Chain (plate notation)

(figure: chain past → past → present → future → future)

Hidden Markov Chain

(figure: latent chain of the user’s mindset, each state emitting an observed user action)

user model for traversal through search results
Chains
Markov Chain (plate notation)
  p(x; θ) = p(x₀; θ) ∏_{i=1}^{n−1} p(xᵢ₊₁|xᵢ; θ)

Hidden Markov Chain (latent user mindset, observed user actions)
  p(x, y; θ) = p(x₀; θ) ∏_{i=1}^{n−1} p(xᵢ₊₁|xᵢ; θ) ∏_{i=1}^n p(yᵢ|xᵢ)

user model for traversal through search results
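A sketch of both chain likelihoods; the initial, transition, and emission tables are hypothetical:

```python
import math

p0 = [0.5, 0.5]                      # p(x_0)
trans = [[0.9, 0.1], [0.2, 0.8]]     # p(x_{i+1} | x_i)
emit = [[0.7, 0.3], [0.1, 0.9]]      # p(y_i | x_i)

def chain_loglik(x):
    """log p(x) = log p(x_0) + sum_i log p(x_{i+1} | x_i)."""
    ll = math.log(p0[x[0]])
    for a, b in zip(x, x[1:]):
        ll += math.log(trans[a][b])
    return ll

def hmm_loglik(x, y):
    """Chain log-likelihood plus one emission term per step."""
    return chain_loglik(x) + sum(math.log(emit[a][o]) for a, o in zip(x, y))

print(chain_loglik([0, 0, 1]))           # log(0.5 * 0.9 * 0.1)
print(hmm_loglik([0, 0, 1], [0, 1, 1]))  # adds the log emissions
```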
Factor Graphs

(figure: latent factors connected to observed effects)

• Observed effects
  Click behavior, queries, watched news, emails
• Latent factors
  User profile, news content, hot keywords, social connectivity graph, events
Recommender Systems
(news, SearchMonkey, answers, social ranking, OMG, personals)

(figure: u → r ← m, with intersecting plates, like nested for loops)

• Users u
• Movies m
• Ratings r (but only for a subset of users)
Challenges
• How to design models (domain expert)
  • Common (engineering) sense
  • Computational tractability
• Inference (statistics)
  • Easy for fully observed situations
  • Many algorithms if not fully observed
  • Dynamic programming / message passing
Summary - Part 2

• Probability theory to estimate events
• Conjugate priors and Laplace smoothing
• Conjugate prior = fantasy data
• Collapsing
• Directed graphical models
Part 3 - Clustering & Topic Models
Inference Algorithms
Clustering
Density Estimation (log-concave in θ: find θ)
  p(x, θ) = p(θ) ∏_{i=1}^n p(xᵢ|θ)
Clustering (general nonlinear)
  p(x, y, θ) = p(π) ∏_{k=1}^K p(θₖ) ∏_{i=1}^n p(yᵢ|π) p(xᵢ|θ, yᵢ)
Clustering
• Optimization problem
  maximize_θ Σ_y p(x, y, θ)
  maximize_θ log p(π) + Σ_{k=1}^K log p(θₖ) + Σ_{i=1}^n log Σ_{yᵢ∈Y} p(yᵢ|π) p(xᵢ|θ, yᵢ)
• Options
  • Direct nonconvex optimization (e.g. BFGS)
  • Sampling (draw from the joint distribution)
  • Variational approximation (concave lower bounds, aka EM algorithm)
Clustering
• Integrate out y (model: θ → x)
  • Nonconvex optimization problem
  • EM algorithm
• Integrate out θ (model: y → x)
  • Y is coupled
  • Sampling
  • Collapsed: p(y|x) ∝ p({x} | {xᵢ : yᵢ = y} ∪ X_fake) p(y|Y ∪ Y_fake)
Gibbs sampling
• Sampling:
  Draw an instance x from distribution p(x)
• Gibbs sampling:
  • In most cases direct sampling is not possible
  • Draw one set of variables at a time, e.g. for the 2×2 table
    (0.45, 0.05 / 0.05, 0.45):
    (b,g) → draw p(·,g) → (g,g) → draw p(g,·) → (g,g) → draw p(·,g) → (b,g) → draw p(b,·) → (b,b) → ...
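A sketch of this sampler for the dependent 2×2 table above; after many sweeps the visit frequencies approximate the joint:

```python
import random
from collections import Counter

# Joint table: p(0,0) = p(1,1) = 0.45, p(0,1) = p(1,0) = 0.05.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

def cond(fixed_idx, fixed_val):
    """Conditional distribution of the free coordinate given the other."""
    if fixed_idx == 1:                          # y fixed: p(x | y)
        w = [joint[(v, fixed_val)] for v in (0, 1)]
    else:                                       # x fixed: p(y | x)
        w = [joint[(fixed_val, v)] for v in (0, 1)]
    z = sum(w)
    return [wi / z for wi in w]

x, y, visits = 0, 1, Counter()
for _ in range(100_000):
    x = 0 if random.random() < cond(1, y)[0] else 1   # draw x | y
    y = 0 if random.random() < cond(0, x)[0] else 1   # draw y | x
    visits[x, y] += 1
print({k: round(v / 100_000, 3) for k, v in visits.items()})  # ~ joint
```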
Gibbs sampling for clustering

(figure, built up over several slides)
• random initialization
• sample cluster labels
• resample cluster model
• resample cluster labels
• resample cluster model
• ... iterate
(e.g. Mahout Dirichlet Process Clustering)

Inference Algorithm ≠ Model
Corollary: EM ≠ Clustering
Topic models
Grouping objects

(figure: documents about Singapore, grouped by theme: airline, university,
restaurant; or by country: USA, Australia, Singapore)

Topic Models

(figure: each document is a mixture of both, e.g. “Australia university”,
“Singapore airline”, “USA airline”, “Singapore university”, “USA food”,
“Singapore food”)
Clustering & Topic Models
Clustering: group objects by prototypes
Topics: decompose objects into prototypes
Clustering & Topic Models
clustering:
  α (prior) → θ (cluster probability) → y (cluster label) → x (instance)
Latent Dirichlet Allocation:
  α (prior) → θ (topic probability) → y (topic label) → x (instance)
Clustering & Topic Models

(cluster/topic distributions) × (membership) = documents

• clustering: (0, 1) matrix
• topic model: stochastic matrix
• LSI: arbitrary matrices
Topics in text




Latent Dirichlet Allocation; Blei, Ng, Jordan, JMLR 2003
Collapsed Gibbs Sampler
Joint Probability Distribution (slow)
  p(θ, z, ψ, x|α, β) = ∏_{k=1}^K p(ψₖ|β) ∏_{i=1}^m p(θᵢ|α) ∏_{i=1}^m ∏_{j=1}^{mᵢ} p(zᵢⱼ|θᵢ) p(xᵢⱼ|zᵢⱼ, ψ)
• Sample ψ independently, sample θ independently, sample z independently
  (α: prior; θᵢ: topic probability; zᵢⱼ: topic label; xᵢⱼ: instance; β: language prior; ψₖ: topics)
Collapsed Sampler (fast)
  p(z, x|α, β) = ∏_{i=1}^m p(zᵢ|α) ∏_{k=1}^K p({xᵢⱼ : zᵢⱼ = k}|β)
• θ and ψ are integrated out; sample z sequentially
Collapsed Sampler (Griffiths & Steyvers, 2005)
  p(z, x|α, β) = ∏_{i=1}^m p(zᵢ|α) ∏_{k=1}^K p({xᵢⱼ : zᵢⱼ = k}|β)
• Sampling update for the topic t of word w in document d:
  p(zᵢⱼ = t) ∝ [n⁻ⁱʲ(t, d) + αₜ] / [n⁻ⁱ(d) + Σₜ αₜ] · [n⁻ⁱʲ(t, w) + βₜ] / [n⁻ⁱ(t) + Σₜ βₜ]
  where the n⁻ⁱʲ counts exclude the current word.
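A toy collapsed Gibbs sweep implementing this update; the document-side denominator is constant in t and can be dropped. Corpus, K, and hyperparameters below are hypothetical:

```python
import random
from collections import defaultdict

docs = [["apple", "banana", "apple"], ["goal", "match", "goal", "team"]]
K, alpha, beta = 2, 0.5, 0.1
V = len({w for d in docs for w in d})

n_td = defaultdict(int)   # (topic, doc)  counts
n_tw = defaultdict(int)   # (topic, word) counts
n_t = defaultdict(int)    # topic counts
z = [[random.randrange(K) for _ in d] for d in docs]
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        t = z[d][i]
        n_td[t, d] += 1; n_tw[t, w] += 1; n_t[t] += 1

for sweep in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]                       # remove current assignment
            n_td[t, d] -= 1; n_tw[t, w] -= 1; n_t[t] -= 1
            p = [(n_td[k, d] + alpha) * (n_tw[k, w] + beta) / (n_t[k] + V * beta)
                 for k in range(K)]
            r, acc, t = random.random() * sum(p), 0.0, K - 1
            for k in range(K):                # draw from the unnormalized p
                acc += p[k]
                if r < acc:
                    t = k
                    break
            z[d][i] = t                       # add back under the new topic
            n_td[t, d] += 1; n_tw[t, w] += 1; n_t[t] += 1
print(z)
```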
Sequential Algorithm
• Collapsed Gibbs Sampler
 • For 1000 iterations do
   • For each document do
     • For each word in the document do
       • Resample topic for the word
       • Update local (document, topic) table
       • Update global (word, topic) table

               this kills parallelism
State of the art
UMass Mallet, UC Irvine, Google
• For 1000 iterations do
  • For each document do
    • For each word in the document do
      • Resample topic for the word
      • Update local (document, topic) table
      • Update CPU-local (word, topic) table
  • Update global (word, topic) table
Problems: table out of sync, memory inefficient, blocking, network bound;
the (word, topic) table changes rapidly.

Sampling decomposition:
  p(t|wᵢⱼ) ∝ αₜ βw / [n(t) + β̄]                          (slow)
           + n(t, d=i) βw / [n(t) + β̄]                   (moderately fast)
           + n(t, wᵢⱼ) [n(t, d=i) + αₜ] / [n(t) + β̄]     (changes rapidly)
Our Approach
• For 1000 iterations do (independently per computer)
  • For each thread/core do
    • For each document do
      • For each word in the document do
        • Resample topic for the word
        • Update local (document, topic) table
        • Generate computer-local (word, topic) message
    • In parallel update local (word, topic) table
  • In parallel update global (word, topic) table

network bound → concurrent cpu/hdd/net; memory inefficient → minimal view;
table out of sync → continuous sync; blocking → barrier free
Architecture details
Multicore Architecture (Intel Threading Building Blocks)

(pipeline: tokens → file combiner → samplers (in parallel) → count updater →
output to file, with diagnostics and optimization on the side; samplers and
updater share a joint state table over topics)

• Decouple multithreaded sampling and updating
  (almost) avoids stalling for locks in the sampler
• Joint state table
  • much less memory required
  • samplers synchronized (10 docs vs. millions delay)
• Hyperparameter update via stochastic gradient descent
• No need to keep documents in memory (streaming)
Cluster Architecture

(figure: samplers on each node talking to a shared store)

• Distributed (key,value) storage via memcached
• Background asynchronous synchronization
  • single word at a time to avoid deadlocks
  • no need to have a joint dictionary
  • uses disk, network, cpu simultaneously
Cluster Architecture

(figure: a sampler plus a local ICE instance on each node)

• Distributed (key,value) storage via ICE
• Background asynchronous synchronization
  • single word at a time to avoid deadlocks
  • no need to have a joint dictionary
  • uses disk, network, cpu simultaneously
Making it work
• Startup
 • Randomly initialize topics on each node
    (read from disk if already assigned - hotstart)
 • Sequential Monte Carlo for startup much faster
 • Aggregate changes on the fly
• Failover
 • State constantly being written to disk
    (worst case we lose 1 iteration out of 1000)
 • Restart via standard startup routine
• Achilles heel: need to restart from checkpoint if even
  a single machine dies.
Easily extensible
• Better language model (topical n-grams)
  can process millions of users (vs 1000s)
• Conditioning on side information (upstream)
  estimate topic based on authorship, source,
  joint user model ...
• Conditioning on dictionaries (downstream)
  integrate topics between different languages
• Time dependent sampler for user model
  approximate inference per episode
              Google LDA    Mallet        Irvine’08    Irvine’09       Yahoo LDA
Multicore     no            yes           yes          yes             yes
Cluster       MPI           no            MPI          point 2 point   memcached
State table   dictionary    separate      separate     separate        joint
              split         sparse                                     sparse
Schedule      synchronous   synchronous   synchronous  asynchronous    asynchronous
              exact         exact         exact        approximate     exact
                                                       messages
Speed
• 1M documents per day on 1 computer
  (1000 topics per doc, 1000 words per doc)
• 350k documents per day per node
  (context switches & memcached & stray reducers)
• 8 Million docs (Pubmed)
  (sampler does not burn in well - docs too short)
  • Irvine: 128 machines, 10 hours
  • Yahoo: 1 machine, 11 days
  • Yahoo: 20 machines, 9 hours
• 20 Million docs (Yahoo! News Articles)
  • Yahoo: 100 machines, 12 hours
Scalability
200k documents/computer

(figure: runtime in hours and initial topics per word (×10) vs. number of CPUs: 1, 10, 20, 50, 100)

Likelihood even improves with parallelism!
-3.295 (1 node)   -3.288 (10 nodes)   -3.287 (20 nodes)
The Competition

(figure: dataset size in millions of documents and cluster size for Google,
Irvine, Yahoo; throughput per hour: Google 150, Irvine 6.4k, Yahoo 50k)
Design Principles
Variable Replication
• Global shared variables x, y, z; each computer keeps a local copy y’ and synchronizes it
• Make local copy
  • Distributed (key,value) storage table for global copy
  • Do all bookkeeping locally (store old versions)
  • Sync local copies asynchronously using message passing
    (no global locks are needed)
• This is an approximation!
Asymmetric Message Passing
• Large global shared state space
  (essentially as large as the memory in the computer)
• Distribute the global copy over several machines
  (distributed key,value storage)

(figure: global state, with each node holding a current and an old local copy)
Out of core storage
• Very large state space
• Gibbs sampling requires us to traverse the data sequentially many times (think 1000x)
• Stream local data from disk and update the coupling variable each time local data is accessed
• This is exact

(pipeline as in the multicore architecture: tokens → file combiner → samplers → count updater → output to file)
Summary - Part 3

• Inference in graphical models
• Clustering
• Topic models
• Sampling
• Implementation details
Part 4 - Advanced Modeling
Chinese Restaurant Process

(figure: tables with dish parameters φ₁, φ₂, φ₃)
Problem
• How many clusters should we pick?
• How about a prior for infinitely many clusters?
• Finite model
  p(y|Y, α) = [n(y) + α_y] / [n + Σ_y′ α_y′]
• Infinite model
  Assume that the total smoother weight is constant:
  p(y|Y, α) = n(y) / (n + α)   and   p(new|Y, α) = α / (n + α)
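A sketch of draws from this prior; `crp` seats n customers one at a time and returns the table occupancies:

```python
import random

def crp(n, alpha=1.0):
    """Seat n customers: table j with prob m_j/(i+alpha),
    a new table with prob alpha/(i+alpha) -- the rich get richer."""
    tables = []                        # occupancy counts m_j
    for i in range(n):
        r = random.random() * (i + alpha)
        acc = 0.0
        for j, m in enumerate(tables):
            acc += m
            if r < acc:
                tables[j] += 1
                break
        else:
            tables.append(1)           # open a new table
    return tables

print(crp(100, alpha=2.0))  # e.g. a few large tables, many small ones
```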
Chinese Restaurant Metaphor

(figure: customers seated at tables serving dishes φ₁, φ₂, φ₃; the rich get richer)

Generative Process
• For data point xᵢ:
  • Choose table j ∝ mⱼ and sample xᵢ ~ f(φⱼ)
  • Choose a new table K+1 ∝ α
    • Sample φ_{K+1} ~ G₀ and sample xᵢ ~ f(φ_{K+1})

Pitman; Antoniak; Ishwaran; Jordan et al.; Teh et al.
Evolutionary Clustering


• Time series of objects, e.g. news stories
• Stories appear / disappear
• Want to keep track of clusters automatically
Recurrent Chinese Restaurant Process

(figure, built up over several slides)
• At T=1: tables φ₁,₁, φ₂,₁, φ₃,₁ with counts m′₁,₁=2, m′₂,₁=3, m′₃,₁=1
• At T=2: the T=1 counts act as pseudo-counts (prior) for the same tables
• Table parameters evolve: sample φ₁,₂ ~ P(·|φ₁,₁)
• Tables can die out (dead cluster) and new tables can appear (new cluster):
  at T=2 the active tables are φ₁,₂, φ₂,₂, φ₃,₁, φ₄,₂
Longer History

(figure: counts from earlier epochs are carried forward, e.g. m′₂,₃ at T=3;
clusters φ₁,₂, φ₂,₂, φ₄,₂ survive to T=3)
TDPM Generative Power
• DPM: W = ∞, λ = ∞
• TDPM: W = 4, λ = 0.4 (power-law cluster sizes)
• Independent DPMs: W = 0, λ arbitrary
User modeling

(figure: topic proportions per day: Baseball, Finance, Jobs, Dating)
Buying a camera

(figure: purchase intent over time; show ads now, before it is too late)
User modeling
Problem formulation

(figure: a user’s query stream groups into evolving intents)
• Cars: car deals, van, auto, price, used, inspection
• Art: movies, theatre, art gallery
• Jobs: job, hiring, salary
• Diet: diet, calories, recipe, chocolate
• Travel: flight, London, hotel, weather
• College: school supplies, loan, college
• Finance
User modeling
Problem formulation

Input
• Queries issued by the user or tags of watched content
• Snippet of page examined by user
• Time stamp of each action (day resolution)

Output
• Users’ daily distribution over intents
• Dynamic intent representation
Time dependent models

• LDA for topical model of users where
 • User interest distribution changes over time
 • Topics change over time
• This is like a Kalman filter except that
 • Don’t know what to track (a priori)
 • Can’t afford a Rauch-Tung-Striebel smoother
 • Much more messy than plain LDA
Graphical Model

(figure: plain LDA vs. the time-dependent model; priors α, α_{t−1}, α_t, α_{t+1};
time-dependent user interest θᵢ^t; topic labels z_ij; user actions w_ij;
time-dependent actions-per-topic distributions φₖ^t with priors β^t)
All

(figure: the prior for user actions at time t combines long-term (μ),
monthly (μ₂), weekly (μ₃) and short-term statistics;
example actions: food, recipe, part-time, Kelly, chicken, job, opening,
pizza, hiring, salary, cuisine, mileage)

Topics at time t:
• Diet: Recipe, Chocolate, Pizza, Food, Chicken, Milk, Butter, Powder
• Cars: Car, Blue, Book, Kelley, Prices, Small, Speed, large
• Job: job, Career, Business, Assistant, Hiring, Part-time, Receptionist
• Finance: Bank, Online, Credit, Card, debt, portfolio, Finance, Chase

Topics at time t+1 (short-term priors carry over):
• Car: Altima, Accord, Blue, Book, Kelley, Prices, Small, Speed
Generative Process

(figure: user actions such as “food chicken pizza mileage” and “car speed
offer Camry accord career” assigned to intents)

• For each user interaction:
  • Choose an intent from the local distribution
    • Sample a word from the topic’s word distribution
  • Choose a new intent ∝ α
    • Sample a new intent from the global distribution
    • Sample a word from the new topic’s word distribution

(figure: a global process (counts m, m′) feeds the per-user processes
(counts n, n′) of users 1-3 across times t, t+1, t+2, t+3)
Sample users

(figure: topic proportions per day for two users; user 1: Dating, Baseball,
Celebrity, Health; user 2: Baseball, Finance, Jobs, Dating)

Dating: women, men, dating, singles, personals, seeking, match
Baseball: League, baseball, basketball, doublehead, Bergesen, Griffey, bullpen, Greinke
Celebrity: Snooki, Tom Cruise, Katie Holmes, Pinkett, Kudrow, Hollywood
Health: skin, body, fingers, cells, toes, wrinkle, layers
Jobs: job, career, business, assistant, hiring, part-time, receptionist
Finance: financial, Thomson, chart, real, Stock, Trading, currency
Datasets
Data
ROC score improvement

(figure: ROC scores (roughly 50 to 62) on Dataset-2 for baseline, TLDA and
TLDA+Baseline, across user buckets)
LDA for user profiling
Each of the parallel workers:
1. Sample Z for its users
2. Write counts to memcached
   (barrier)
3. One worker collects the counts and samples; the others do nothing
   (barrier)
4. Read the merged counts back from memcached
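A toy sketch of this schedule using Python threads, with a shared dict standing in for memcached and a simple word count standing in for the sampling step:

```python
import threading
from collections import Counter

NUM_WORKERS = 4
store = {}                               # stands in for the (key,value) store
barrier = threading.Barrier(NUM_WORKERS)

def worker(rank, local_docs):
    # "Sample Z for users" stand-in: count words in the local shard
    local = Counter(w for doc in local_docs for w in doc)
    store[rank] = local                  # write counts to the store
    barrier.wait()
    if rank == 0:                        # one worker collects and samples
        store["global"] = sum((store[r] for r in range(NUM_WORKERS)), Counter())
    barrier.wait()                       # the others "do nothing" until here
    merged = store["global"]             # read merged counts back
    assert sum(merged.values()) >= sum(local.values())

shards = [[["a", "b"]], [["b", "c"]], [["c"]], [["a", "a"]]]
threads = [threading.Thread(target=worker, args=(r, shards[r]))
           for r in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store["global"])                   # Counter({'a': 3, 'b': 2, 'c': 2})
```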
News
News Stream
• Over 1 high quality news article per second
• Multiple sources (Reuters, AP, CNN, ...)
• Same story from multiple sources
• Stories are related

• Goals
  • Aggregate articles into a storyline
  • Analyze the storyline (topics, entities)
Clustering / RCRP
     • Assume active story
       distribution at time t
     • Draw story indicator
     • Draw words from story
       distribution
     • Down-weight story counts for
       next day

Ahmed & Xing, 2008
Clustering / RCRP
• Pro
 • Nonparametric model of story generation
   (no need to model frequency of stories)
 • No fixed number of stories
 • Efficient inference via collapsed sampler
• Con
 • We learn nothing!
 • No content analysis
Latent Dirichlet Allocation
       • Generate topic distribution
         per article
       • Draw topics per word from
         topic distribution
       • Draw words from topic specific
         word distribution

         Blei, Ng, Jordan, 2003
Latent Dirichlet Allocation

• Pro
 • Topical analysis of stories
 • Topical analysis of words (meaning, saliency)
 • More documents improve estimates
• Con
 • No clustering
More Issues
• Named entities are special, topics less
  (e.g. Tiger Woods and his mistresses)
• Some stories are strange
  (topical mixture is not enough - dirty models)
• Articles deviate from general story
  (Hierarchical DP)
Storylines
Amr Ahmed, Quirong Ho, Jake Eisenstein,
   Alex Smola, Choon Hui Teo, 2011
Storylines Model
          • Topic model
          • Topics per cluster
          • RCRP for cluster
          • Hierarchical DP for
            article
          • Separate model
            for named entities
          • Story specific
            correction
Storylines Model

(figure: storylines range from tightly-focused stories to high-level concepts)
The Graphical Model: Storylines Model

(figure: stories sit between tightly-focused content and high-level concepts)

Each story has:
• a distribution over words
• a distribution over topics
• a distribution over named entities

• A document’s topic mix is sampled from its story prior
• Words inside a document are either global or story specific
The Generative Process

(figure: generative process, built up over several slides)
Estimation
• Sequential Monte Carlo (Particle Filter)
 • For new time period draw stories s, topics z
       p(st+1 , zt+1 |x1...t+1 , s1...t , z1...t )
    using Gibbs Sampling for each particle
 • Reweight particle via

       p(xt+1 |x1...t , s1...t , z1...t )
• Resample (regenerate) particles if the l2 norm of the weight vector grows too large, i.e. the effective sample size drops (see the sketch below)
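A sketch of the reweight/resample step, assuming the predictive likelihoods p(xt+1 | past) have already been computed per particle; resampling triggers when the effective sample size 1/sum(w^2) falls below a threshold, which is the same event as the l2 norm of the weights growing too large.

import numpy as np

def reweight_and_resample(weights, likelihoods, frac=0.5, rng=None):
    rng = rng or np.random.default_rng()
    w = weights * likelihoods            # reweight by p(x_{t+1} | past)
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)           # small ESS <=> heavy l2 norm
    if ess < frac * len(w):              # regenerate the particle set
        idx = rng.choice(len(w), size=len(w), p=w)
        return idx, np.full(len(w), 1.0 / len(w))
    return np.arange(len(w)), w          # keep particles, updated weights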
Numbers ...
• TDT5 (Topic Detection and Tracking)
  macro-averaged minimum detection cost: 0.714
      time       entities     topics    story words

       0.84        0.90        0.86           0.75

  This is the best performance on TDT5!
• Yahoo News data
  ... beats all other clustering algorithms
Stories

(figure: example storylines extracted from the news stream)

Related Stories

(figure: storylines related to a given story)
Detecting Ideologies

    Ahmed and Xing, 2010
Problem Statement
Ideologies

Build a model to describe both collections of data

Visualization
• How does each ideology view mainstream events?
• On which topics do they differ?
• On which topics do they agree?
Problem Statement
Ideologies

Build a model to describe both collections of data

Visualization
Classification
• Given a new news article or a blog post, the system should infer:
 • From which side it was written
 • Justify its answer on a topical level (view on abortion, taxes, health care)

Problem Statement
Ideologies

Build a model to describe both collections of data

Visualization
Classification
Structured browsing
• Given a new news article or a blog post, the user can ask for:
 • Examples of other articles from the same ideology about the same topic
 • Documents that could exemplify alternative views from other ideologies
Building a factored model

(figure: shared topics β1 ... βk with ideology-specific views φ1,k and φ2,k and ideology distributions Ω1, Ω2; each word mixes the shared topic and its ideology's view with weights λ and 1-λ)
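One plausible reading of the λ, 1-λ arrows as a sketch (an assumption, not necessarily the paper's exact parameterization): the word distribution of topic k under ideology v mixes the shared topic beta_k with the ideology-specific view phi_{v,k}.

import numpy as np

def word_dist(beta_k, phi_vk, lam):
    """p(w | topic k, ideology v) = lam * beta_k + (1 - lam) * phi_vk."""
    return lam * beta_k + (1.0 - lam) * phi_vk

rng = np.random.default_rng(2)
V = 1000
beta_k = rng.dirichlet(np.ones(V))       # shared topic
phi_vk = rng.dirichlet(np.ones(V))       # ideology-specific view of it
p = word_dist(beta_k, phi_vk, lam=0.7)   # convex mix, still a distribution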
Datasets
• Bitterlemons
 • Middle East conflict; documents written by Israeli and Palestinian authors
 • ~300 documents from each view, average length 740
 • Multi-author collection
 • 80-20 split for test and train
• Political Blog-1
 • American political blogs (Democrat and Republican)
 • 2040 posts with average post length = 100 words
 • Follows the test and train split of (Yano et al., 2009)
• Political Blog-2 (tests generalization to a new writing style)
 • Same as 1 but 6 blogs, 3 from each side
 • ~14k posts with ~200 words per post
 • 4 blogs for training and 2 blogs for test
Example: Bitterlemons corpus

(figure: learned topics such as "US role", "Roadmap process", "Peace", and "Arab involvement", each shown with an Israeli-view and a Palestinian-view word list)
Generalization to New Blogs

(figure: results on blogs unseen at training time)
Getting Alternative Views
• Given a document written in one ideology, retrieve the equivalent document from the other side
• Baseline: SVM + cosine similarity
Can We Use Unlabeled Data?
• In theory this is simple
 • Add a step that samples the document view v
 • Doesn't mix in practice because of the tight coupling between v and (x1, x2, z)
• Solution
 • Sample v and (x1, x2, z) as a block using a Metropolis-Hastings step (skeleton below)
 • This is a huge proposal!
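A skeleton of such a block Metropolis-Hastings move; the joint density, the proposal, and the proposal density are passed in as model-specific functions, and only the accept/reject logic is shown.

import math, random

def mh_block_step(state, propose, log_joint, log_q):
    """Resample (v, x1, x2, z) jointly instead of coordinate-wise."""
    proposal = propose(state)                 # draw the whole block at once
    log_ratio = (log_joint(proposal) - log_joint(state)
                 + log_q(state, proposal)     # log q(old | new)
                 - log_q(proposal, state))    # log q(new | old)
    if math.log(random.random()) < min(0.0, log_ratio):
        return proposal                       # accept the block
    return state                              # reject, keep the old block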
Summary - Part 4
• Chinese Restaurant Process
• Recurrent CRP
• User modeling
• Storylines
• Ideology detection