A Tutorial on Spectral Clustering
            Part 2: Advanced/related Topics

                                                          Chris Ding
                                      Computational Research Division
                                   Lawrence Berkeley National Laboratory
                                          University of California




Advanced/related Topics
•  Spectral embedding: simplex cluster structure
•  Perturbation analysis
•  K-means clustering in embedded space
•  Equivalence of K-means clustering and PCA
•  Connectivity networks: scaled PCA & Green's function
•  Extension to bipartite graphs: correspondence analysis
•  Random walks and spectral clustering
•  Semi-definite programming and spectral clustering
•  Spectral ordering (distance-sensitive ordering)
•  Webpage spectral ranking: PageRank and HITS

Spectral Embedding: Simplex Cluster Structure

•  Compute K eigenvectors of the Laplacian.
•  Embed objects in the K-dim eigenspace.

What is the structure of the clusters?

Simplex Embedding Theorem. Assume the objects are well characterized by the spectral clustering objective functions. In the embedded space, objects aggregate to K distinct centroids:
•  Centroids are located on the K corners of a simplex
•  The simplex consists of K basis vectors + the coordinate origin
•  The simplex is rotated by an orthogonal transformation T
•  Columns of T are eigenvectors of a K × K embedding matrix Γ
(Ding, 2004)

K-way Clustering Objectives

$$J = \sum_{1 \le p < q \le K} \left[ \frac{s(C_p, C_q)}{\rho(C_p)} + \frac{s(C_p, C_q)}{\rho(C_q)} \right] = \sum_{k} \frac{s(C_k, G - C_k)}{\rho(C_k)}$$

$$\rho(C_k) = \begin{cases} n_k = |C_k| & \text{for Ratio Cut} \\[4pt] d(C_k) = \sum_{i \in C_k} d_i & \text{for Normalized Cut} \\[4pt] s(C_k, C_k) = \sum_{i \in C_k,\, j \in C_k} w_{ij} & \text{for MinMaxCut} \end{cases}$$

Here G − C_k is the graph complement of C_k.

Simplex Spectral Embedding Theorem
Simplex orthogonal transform matrix: $T = (t_1, \ldots, t_K)$

T is determined by $\tilde{\Gamma} t_k = \lambda_k t_k$, where the spectral perturbation matrix is $\tilde{\Gamma} = \Omega^{-1/2}\, \Gamma\, \Omega^{-1/2}$ and

$$\Gamma = \begin{pmatrix} h_{11} & -s_{12} & \cdots & -s_{1K} \\ -s_{21} & h_{22} & \cdots & -s_{2K} \\ \vdots & & \ddots & \vdots \\ -s_{K1} & -s_{K2} & \cdots & h_{KK} \end{pmatrix}$$

$$s_{pq} = s(C_p, C_q), \qquad h_{kk} = \sum_{p \,|\, p \ne k} s_{kp}, \qquad \Omega = \mathrm{diag}[\rho(C_1), \ldots, \rho(C_K)]$$

Properties of Spectral Embedding

•  Original basis vectors: $h_k = (0 \cdots 0, \overbrace{1 \cdots 1}^{n_k}, 0 \cdots 0)^T / n_k^{1/2}$
•  Dimension of embedding is K − 1: $(q_2, \ldots, q_K)$
   – $q_1 = (1, \ldots, 1)^T$ is constant ⇒ trivial
   – Eigenvalues of Γ (= eigenvalues of D − W)
   – Eigenvalues determine how well the clustering objective functions characterize the data
•  Exact solution for K = 2

2-way Spectral Embedding (Exact Solution)

Eigenvalues:
$$\lambda_{\mathrm{Rcut}} = \frac{s(C_1, C_2)}{n(C_1)} + \frac{s(C_1, C_2)}{n(C_2)}, \quad \lambda_{\mathrm{Ncut}} = \frac{s(C_1, C_2)}{d(C_1)} + \frac{s(C_1, C_2)}{d(C_2)}, \quad \lambda_{\mathrm{MMC}} = \frac{s(C_1, C_2)}{s(C_1, C_1)} + \frac{s(C_1, C_2)}{s(C_2, C_2)}$$

These recover the original 2-way clustering objectives.

For Normalized Cut, the orthogonal transform T rotates
$$h_1 = (1 \cdots 1, 0 \cdots 0)^T, \qquad h_2 = (0 \cdots 0, 1 \cdots 1)^T$$
into
$$q_1 = (1 \cdots 1)^T, \qquad q_2 = (a, \ldots, a, -b, \ldots, -b)^T$$

Spectral clustering is inherently consistent! (Ding et al., KDD'01)

Perturbation Analysis
$$W q = \lambda D q \quad\Longleftrightarrow\quad \hat{W} z = (D^{-1/2} W D^{-1/2})\, z = \lambda z, \qquad q = D^{-1/2} z$$

Assume the data has 3 dense clusters, sparsely connected:

$$W = \begin{pmatrix} W_{11} & W_{12} & W_{13} \\ W_{21} & W_{22} & W_{23} \\ W_{31} & W_{32} & W_{33} \end{pmatrix}$$

[Figure: three dense clusters C1, C2, C3 with sparse between-cluster links]

Off-diagonal blocks are between-cluster connections; they are assumed small and are treated as a perturbation. (Ding et al., KDD'01)

Perturbation Analysis
0th order:
$$W^{(0)} = \begin{pmatrix} W_{11} & & \\ & W_{22} & \\ & & W_{33} \end{pmatrix}, \qquad \hat{W}^{(0)} = \begin{pmatrix} \hat{W}^{(0)}_{11} & & \\ & \hat{W}^{(0)}_{22} & \\ & & \hat{W}^{(0)}_{33} \end{pmatrix}$$

1st order:
$$W^{(1)} = \begin{pmatrix} & W_{12} & W_{13} \\ W_{21} & & W_{23} \\ W_{31} & W_{32} & \end{pmatrix}, \qquad \hat{W}^{(1)} = \begin{pmatrix} \hat{W}_{11} - \hat{W}^{(0)}_{11} & \hat{W}_{12} & \hat{W}_{13} \\ \hat{W}_{21} & \hat{W}_{22} - \hat{W}^{(0)}_{22} & \hat{W}_{23} \\ \hat{W}_{31} & \hat{W}_{32} & \hat{W}_{33} - \hat{W}^{(0)}_{33} \end{pmatrix}$$

$$\hat{W}^{(0)}_{pq} = D_{pp}^{-1/2}\, W_{pq}\, D_{qq}^{-1/2}, \qquad \hat{W}_{pq} = (D_{p1} + D_{p2} + D_{p3})^{-1/2}\, W_{pq}\, (D_{q1} + D_{q2} + D_{q3})^{-1/2}$$

K-means clustering

•  Developed in the 1960s (Lloyd, MacQueen, etc.)
•  Computationally efficient (order mN)
•  Widely used in practice
   – Benchmark to evaluate other algorithms

Given n points in m dimensions: $X = (x_1, x_2, \ldots, x_n)$

K-means:
$$\min J_K = \sum_{k=1}^{K} \sum_{i \in C_k} \| x_i - c_k \|^2$$

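To make the objective concrete, below is a minimal NumPy sketch of Lloyd's algorithm for $J_K$ (points are stored as rows here; the initialization and iteration count are illustrative choices, not from the slides):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Lloyd's algorithm: alternately assign points to the nearest
    centroid and recompute centroids, monotonically decreasing J_K."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=K, replace=False)]   # centroids from data
    for _ in range(n_iter):
        # assignment step: nearest centroid in squared Euclidean distance
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        # update step: centroid = mean of its cluster (keep old if empty)
        C = np.array([X[labels == k].mean(0) if (labels == k).any() else C[k]
                      for k in range(K)])
    return labels, C
```
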
K-means Clustering in Spectral Embedded Space

The simplex spectral embedding theorem provides the theoretical basis for K-means clustering in the embedded eigenspace:
   – Cluster centroids are well separated (corners of the simplex)
   – K-means clustering is invariant under (i) coordinate rotation x → Tx and (ii) shift x → x + a
   – Thus the orthogonal transform T in the simplex embedding is unnecessary
•  Many variants of K-means exist (Ng et al., Bach & Jordan, Zha et al., Shi & Xu, etc.); a sketch of the basic pipeline follows.

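A minimal sketch of this pipeline, assuming a dense symmetric similarity matrix W and reusing the `kmeans` sketch above:

```python
import numpy as np

def spectral_kmeans(W, K, kmeans):
    """Embed via the K leading eigenvectors of D^{-1/2} W D^{-1/2}
    (i.e., W q = lambda D q with q = D^{-1/2} z), then run K-means.
    The simplex rotation T can be ignored: K-means is rotation-invariant."""
    d = W.sum(axis=1)
    inv_sqrt_d = 1.0 / np.sqrt(d)
    W_hat = inv_sqrt_d[:, None] * W * inv_sqrt_d[None, :]
    vals, Z = np.linalg.eigh(W_hat)        # eigenvalues in ascending order
    Q = inv_sqrt_d[:, None] * Z[:, -K:]    # K leading embedded coordinates
    labels, _ = kmeans(Q, K)
    return labels
```
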
We have proved:
    spectral embedding + K-means clustering is the appropriate method.

We now show:
    K-means itself is solved by PCA.

Equivalence of K-means Clustering and Principal Component Analysis

•  Cluster indicators specify the solution of K-means clustering
•  Principal components are eigenvectors of the Gram (kernel) matrix = data projections onto the principal directions of the covariance matrix
•  Optimal solution of K-means clustering: the continuous solution of the discrete cluster indicators of K-means is given by the principal components

(Zha et al., NIPS'01; Ding & He, 2003)

Principal Component Analysis (PCA)

•  Widely used in a large number of different fields
   – Best low-rank approximation (SVD theorem, Eckart-Young, 1936): noise reduction
   – Unsupervised dimension reduction
   – Many generalizations
•  The conventional perspective is inadequate to explain the effectiveness of PCA
•  New result: principal components are cluster indicators for a well-motivated clustering objective

Principal Component Analysis

n points in m dimensions: $X = (x_1, x_2, \ldots, x_n)$

Principal directions $u_k$: covariance $S = X X^T$, with $X X^T u_k = \lambda_k u_k$

Principal components $v_k$: Gram (kernel) matrix $X^T X$, with $X^T X v_k = \lambda_k v_k$

Singular value decomposition:
$$X = \sum_{k=1}^{m} \sqrt{\lambda_k}\, u_k v_k^T$$

2-way K-means Clustering

Cluster membership indicator:
$$q(i) = \begin{cases} +\sqrt{n_2 / (n_1 n)} & \text{if } i \in C_1 \\[4pt] -\sqrt{n_1 / (n_2 n)} & \text{if } i \in C_2 \end{cases}$$

$$J_K = n \langle x^2 \rangle - J_D, \qquad J_D = \frac{n_1 n_2}{n} \left[ \frac{2\, d(C_1, C_2)}{n_1 n_2} - \frac{d(C_1, C_1)}{n_1^2} - \frac{d(C_2, C_2)}{n_2^2} \right]$$

Define the distance matrix $D = (d_{ij})$, $d_{ij} = \| x_i - x_j \|^2$. Then
$$J_D = -q^T D q = -q^T \tilde{D} q = 2\, q^T X^T X q$$
where $\tilde{D}$ is the centered distance matrix, so
$$\min J_K \;\Rightarrow\; \max J_D$$

2-way K-means Clustering
The cluster indicator satisfies $\sum_i q(i) = 0$ and $\sum_i q^2(i) = 1$.

Relax the restriction that q(i) take discrete values; let it take continuous values in [−1, 1]. The solution for q is then an eigenvector of the Gram matrix.

Theorem: The (continuous) optimal solution of q is given by the principal component $v_1$. Clusters C1 and C2 are determined by:
$$C_1 = \{ i \mid v_1(i) < 0 \}, \qquad C_2 = \{ i \mid v_1(i) \ge 0 \}$$

Once C1 and C2 are computed, iterate K-means to convergence.

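As an illustration of the theorem, a minimal sketch that initializes a 2-way split from the first principal component (data stored with points as columns, matching the slides):

```python
import numpy as np

def two_way_pca_split(X):
    """Split points into two clusters by the sign of the first
    principal component v1 of the centered data."""
    Y = X - X.mean(axis=1, keepdims=True)      # center each feature
    # rows of Vt are right singular vectors; Vt[0] = v1 (length n)
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    v1 = Vt[0]
    C1 = np.where(v1 < 0)[0]
    C2 = np.where(v1 >= 0)[0]
    return C1, C2   # then iterate K-means to convergence
```
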
Multi-way K-means Clustering

Unsigned cluster membership indicators $h_1, \ldots, h_K$ (example with n = 4, K = 3):

            C1  C2  C3
          (  1   0   0 )
          (  1   0   0 )   = (h1, h2, h3)
          (  0   1   0 )
          (  0   0   1 )

Multi-way K-means Clustering
For K ≥ 2:
$$J_K = \sum_i x_i^2 - \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i, j \in C_k} x_i^T x_j$$

With the (unsigned) cluster membership indicators
$$h_k = (0 \cdots 0, \overbrace{1 \cdots 1}^{n_k}, 0 \cdots 0)^T / n_k^{1/2}$$
this becomes
$$J_K = \sum_i x_i^T x_i - \sum_{k=1}^{K} h_k^T X^T X h_k$$

Let $H_K = (h_1, \ldots, h_K)$. Then
$$J_K = \sum_i x_i^2 - \mathrm{Tr}(H_K^T X^T X H_K)$$

Multi-way K-means Clustering
Regularized relaxation of K-means clustering.

Redundancy in $h_1, \ldots, h_K$:
$$\sum_{k=1}^{K} n_k^{1/2} h_k = e = (1 \cdots 1)^T$$

Transform to signed indicator vectors $q_1, \ldots, q_K$ via a K × K orthogonal matrix T:
$$(q_1, \ldots, q_K) = (h_1, \ldots, h_K)\, T, \qquad Q_K = H_K T$$

Require the 1st column of T to be $(n_1^{1/2}, \ldots, n_K^{1/2})^T / n^{1/2}$.
Thus $q_1 = e / n^{1/2} = \text{const}$.

Regularized Relaxation of K-means Clustering

$$J_K = \mathrm{Tr}(Y^T Y) - \mathrm{Tr}(Q_{K-1}^T Y^T Y Q_{K-1}), \qquad Q_{K-1} = (q_2, \ldots, q_K)$$

(Regularized relaxation)
Theorem: The optimal solutions of $q_2, \ldots, q_K$ are given by the principal components $v_2, \ldots, v_K$. $J_K$ is bounded below by the total variance minus the sum of the K − 1 principal eigenvalues of the covariance:
$$n \langle y^2 \rangle - \sum_{k=1}^{K-1} \lambda_k \;<\; \min J_K \;<\; n \langle y^2 \rangle$$

Scaled PCA
Similarity matrix $S = (s_{ij})$ (e.g., generated from $X X^T$), with
$$D = \mathrm{diag}(d_1, \ldots, d_n), \qquad d_i = s_{i.}$$

Nonlinear re-scaling:
$$\tilde{S} = D^{-1/2} S D^{-1/2}, \qquad \tilde{s}_{ij} = s_{ij} / (s_{i.}\, s_{j.})^{1/2}$$

Apply SVD on $\tilde{S}$:
$$S = D^{1/2} \tilde{S} D^{1/2} = D^{1/2} \Big( \sum_k z_k \lambda_k z_k^T \Big) D^{1/2} = D \Big( \sum_k q_k \lambda_k q_k^T \Big) D$$

$q_k = D^{-1/2} z_k$ is the scaled principal component.

Subtract the trivial component $\lambda_0 = 1$, $z_0 = d^{1/2} / s_{..}^{1/2}$, $q_0 \propto \mathbf{1}$:
$$\Rightarrow\quad S - d\, d^T / s_{..} = D \sum_{k \ge 1} q_k \lambda_k q_k^T\, D$$

(Ding et al., 2002)

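A minimal sketch of the scaled-PCA computation for a dense symmetric similarity matrix S:

```python
import numpy as np

def scaled_pca(S, K):
    """Return the K leading non-trivial scaled principal components
    q_k = D^{-1/2} z_k and their eigenvalues lambda_k."""
    d = S.sum(axis=1)
    inv_sqrt_d = 1.0 / np.sqrt(d)
    S_tilde = inv_sqrt_d[:, None] * S * inv_sqrt_d[None, :]
    vals, Z = np.linalg.eigh(S_tilde)       # ascending eigenvalues
    vals, Z = vals[::-1], Z[:, ::-1]        # reorder to descending
    # vals[0] = 1 with z_0 ~ d^{1/2}: the trivial component, skipped
    Q = inv_sqrt_d[:, None] * Z
    return vals[1:K + 1], Q[:, 1:K + 1]
```
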
Optimality Properties of Scaled PCA
Scaled principal components have optimality properties:

Ordering
   – Adjacent objects along the order are similar
   – Far-away objects along the order are dissimilar
   – The optimal solution for the permutation indexes is given by scaled PCA

Clustering
   – Maximize within-cluster similarity
   – Minimize between-cluster similarity
   – The optimal solution for the cluster membership indicators is given by scaled PCA

Difficulty of K-way clustering
•  2-way clustering uses a single eigenvector
•  K-way clustering uses several eigenvectors
•  How to recover the 0-1 cluster indicators H?

The eigenvectors $Q = (q_1, \ldots, q_K)$ have both positive and negative entries, while the indicators $H = (h_1, \ldots, h_K)$ satisfy $Q = H T$.

Avoid computing the transformation T:
•  Do K-means, which is invariant under T
•  Compute the connectivity network $Q Q^T$, which cancels T

Connectivity Network
$$C_{ij} = \begin{cases} 1 & \text{if } i, j \text{ belong to the same cluster} \\ 0 & \text{otherwise} \end{cases}$$

SPCA provides:
$$C \cong D \sum_{k=1}^{K} q_k \lambda_k q_k^T\, D$$

Green's function:
$$C \approx G = \sum_{k=2}^{K} q_k \frac{1}{1 - \lambda_k}\, q_k^T$$

Projection matrix:
$$C \approx P \equiv \sum_{k=1}^{K} q_k q_k^T$$

(Ding et al., 2002)

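A one-line sketch of the projection-matrix approximation; Q holds the K leading (scaled) eigenvectors as columns:

```python
import numpy as np

def connectivity_network(Q):
    """P = sum_k q_k q_k^T = Q Q^T. The unknown rotation T cancels,
    since (Q T)(Q T)^T = Q Q^T; a large P[i, j] suggests that i and j
    belong to the same cluster."""
    return Q @ Q.T
```
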
Connectivity network
         •     Similar to Hopfield network
         •     Mathematical basis: projection matrix
         •     Show self-aggregation clearly
         •     Drawback: how to recover clusters
                – Apply K-means directly on C
                – Use linearized assignment with cluster crossing
                  and spectral ordering (ICML’04)

Connectivity Network: Example 1

[Figure: similarity matrix W and connectivity matrix; λ2 = 0.300, λ2 = 0.268]

Between-cluster connections are suppressed and within-cluster connections are enhanced: the effect of self-aggregation.

Connectivity of Internet Newsgroups

   NG2:  comp.graphics
   NG9:  rec.motorcycles
   NG10: rec.sport.baseball
   NG15: sci.space
   NG18: talk.politics.mideast

100 articles from each group; 1000 words; tf.idf weighting; cosine similarity.

   Spectral clustering: 89%
   Direct K-means:      66%

[Figure: cosine similarity matrix and connectivity matrix]

Spectral embedding is not topology preserving

700 3-D data points form 2 interlocked rings.
In eigenspace, the rings shrink and separate.

[Figure: the two interlocked rings and their spectral embedding]

Correspondence Analysis (CA)
      • Mainly used in graphical display of data
      • Popular in France (Benzécri, 1969)
      • Long history
   – Simultaneous row and column regression (Hirschfeld, 1935)
   – Reciprocal averaging (Richardson & Kuder, 1933; Horst, 1935; Fisher, 1940; Hill, 1974)
   – Canonical correlations, dual scaling, etc.
•  Formulation is a bit complicated ("convoluted", Jolliffe, 2002, p. 342)
•  "A neglected method" (Hill, 1974)

Scaled PCA on a Contingency Table ⇒ Correspondence Analysis

Nonlinear re-scaling:
$$\tilde{P} = D_r^{-1/2} P D_c^{-1/2}, \qquad \tilde{p}_{ij} = p_{ij} / (p_{i.}\, p_{.j})^{1/2}$$

Apply SVD on $\tilde{P}$ and subtract the trivial component:
$$P - r\, c^T / p_{..} = D_r \sum_{k \ge 1} f_k \lambda_k\, g_k^T\, D_c$$

$$r = (p_{1.}, \ldots, p_{n.})^T, \qquad c = (p_{.1}, \ldots, p_{.n})^T$$

$$f_k = D_r^{-1/2} u_k, \qquad g_k = D_c^{-1/2} v_k$$

are the scaled row and column principal components (standard coordinates in CA).

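A minimal sketch of correspondence analysis as scaled PCA, assuming the contingency table P has been normalized to sum to 1:

```python
import numpy as np

def correspondence_analysis(P, K):
    """Return the K leading non-trivial singular values and the
    scaled row/column principal components (standard coordinates)."""
    r = P.sum(axis=1)                        # row marginals p_i.
    c = P.sum(axis=0)                        # column marginals p_.j
    P_tilde = P / np.sqrt(np.outer(r, c))    # p_ij / (p_i. p_.j)^{1/2}
    U, lam, Vt = np.linalg.svd(P_tilde, full_matrices=False)
    # first triple (lam[0] = 1, u ~ r^{1/2}, v ~ c^{1/2}) is trivial
    F = U[:, 1:K + 1] / np.sqrt(r)[:, None]  # f_k = D_r^{-1/2} u_k
    G = Vt[1:K + 1].T / np.sqrt(c)[:, None]  # g_k = D_c^{-1/2} v_k
    return lam[1:K + 1], F, G
```
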
Information Retrieval

          Bell Lab tech memos
          5 comp-sci and 4 applied-math memo titles:
          C1: Human machine interface for lab ABC computer applications
          C2: A survey of user opinion of computer system response time
          C3: The EPS user interface management system
          C4: System and human system engineering testing of EPS
          C5: Relation of user-perceived response time to error management
          M1: The generation of random, binary, unordered trees
          M2: The intersection graph of paths in trees
          M3: Graph minors IV: widths of trees and well-quasi-ordering
          M4: Graph minors: A survey

Word-document matrix: row/col clustering

words       c4  c1  c3  c5  c2  m4  m3  m2  m1
human        1   1
EPS          1       1
interface        1   1
system       2       1       1
computer         1           1
user                 1   1   1
response                 1   1
time                     1   1
survey                       1   1
minors                           1   1
graph                            1   1   1
tree                                 1   1   1

Bipartite Graph: 3 Types of Connectivity Networks

$$Q_K Q_K^T = \begin{pmatrix} F_K F_K^T & F_K G_K^T \\ G_K F_K^T & G_K G_K^T \end{pmatrix}, \qquad Q_K = (q_1, \ldots, q_K) = \begin{pmatrix} F_K \\ G_K \end{pmatrix}$$

Row-row clustering:
$$F_K F_K^T = \begin{pmatrix} e_{r_1} e_{r_1}^T / 2 s_{11} & 0 \\ 0 & e_{r_2} e_{r_2}^T / 2 s_{22} \end{pmatrix}$$

Column-column clustering:
$$G_K G_K^T = \begin{pmatrix} e_{c_1} e_{c_1}^T / 2 s_{11} & 0 \\ 0 & e_{c_2} e_{c_2}^T / 2 s_{22} \end{pmatrix}$$

Row-column association:
$$F_K G_K^T = \begin{pmatrix} e_{r_1} e_{c_1}^T / 2 s_{11} & 0 \\ 0 & e_{r_2} e_{c_2}^T / 2 s_{22} \end{pmatrix}$$

Example

[Figure: original data matrix and its three connectivity networks: column-column GG^T, row-row FF^T, and row-column FG^T; λ2 = 0.456, λ2 = 0.477]

Internet Newsgroups

Simultaneous clustering of documents and words.

[Figure: bipartite connectivity network for the newsgroup data]

Random Walks and Normalized Cut

Similarity matrix W; $P = D^{-1} W$ is a stochastic matrix.

$$\pi^T P = \pi^T \;\Rightarrow\; \text{equilibrium distribution } \pi = d$$

$$P x = \lambda x \;\Rightarrow\; W x = \lambda D x \;\Rightarrow\; (D - W)\, x = (1 - \lambda)\, D x$$

Random walks between A and B:
$$J_{\mathrm{NormCut}} = \frac{P(A \to B)}{\pi(A)} + \frac{P(B \to A)}{\pi(B)}$$
(Meila & Shi, 2001)

PageRank: $P = \alpha\, L D_{out}^{-1} + (1 - \alpha)\, e e^T$

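A small numeric check of the equilibrium claim, on an illustrative 3-node similarity matrix:

```python
import numpy as np

W = np.array([[0., 2., 1.],
              [2., 0., 1.],
              [1., 1., 0.]])      # symmetric similarity matrix
d = W.sum(axis=1)
P = W / d[:, None]                # P = D^{-1} W, row-stochastic
pi = d / d.sum()                  # pi proportional to the degrees d
print(np.allclose(pi @ P, pi))    # True: pi^T P = pi^T
```
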
Semi-definite Programming for Normalized Cut

Normalized cut uses
$$y_k = D^{1/2} (0 \cdots 0, \overbrace{1 \cdots 1}^{n_k}, 0 \cdots 0)^T / \| D^{1/2} h_k \|, \qquad \tilde{W} = D^{-1/2} W D^{-1/2}$$

Optimize: $\min_Y \mathrm{Tr}(Y^T (I - \tilde{W})\, Y)$ subject to $Y^T Y = I$

$$\Rightarrow\; \mathrm{Tr}[(I - \tilde{W})\, Y Y^T] \;\Rightarrow\; \min_Z \mathrm{Tr}[(I - \tilde{W})\, Z], \qquad Z = Y Y^T$$

subject to $Z \ge 0$, $Z \succeq 0$, $Z d = d$, $\mathrm{Tr}\, Z = K$, $Z = Z^T$.

Compute Z via SDP; factor $Z = Y' Y'^T$; set $Y'' = D^{-1/2} Y'$; run K-means on $Y''$. (Xing & Jordan, 2003)

Z = connectivity network.

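A sketch of the SDP step using the cvxpy modeling library (not the solver used by the original authors); the factorization and K-means steps then follow as in the slide:

```python
import numpy as np
import cvxpy as cp

def sdp_normalized_cut_Z(W, K):
    """Solve the SDP relaxation: minimize Tr[(I - W_tilde) Z] over
    Z PSD, elementwise nonnegative, with Z d = d and Tr Z = K."""
    n = len(W)
    d = W.sum(axis=1)
    inv_sqrt_d = 1.0 / np.sqrt(d)
    W_tilde = inv_sqrt_d[:, None] * W * inv_sqrt_d[None, :]
    Z = cp.Variable((n, n), PSD=True)        # Z >= 0 in the PSD sense
    constraints = [Z >= 0,                   # elementwise nonnegative
                   Z @ d == d,               # Z d = d
                   cp.trace(Z) == K]         # Tr Z = K
    cp.Problem(cp.Minimize(cp.trace((np.eye(n) - W_tilde) @ Z)),
               constraints).solve(solver=cp.SCS)
    return Z.value                           # relaxed connectivity network
```
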
Spectral Ordering

Hill, 1970: spectral embedding. Find coordinates x to minimize
$$J = \sum_{ij} (x_i - x_j)^2 w_{ij} = x^T (D - W)\, x$$
Solutions are eigenvectors of the Laplacian.

Barnard, Pothen & Simon, 1993: envelope reduction of a sparse matrix. Find an ordering such that the envelope is minimized:
$$\min \sum_{ij} (i - j)^2 w_{ij} \;\Rightarrow\; \min \sum_{ij} (\pi_i - \pi_j)^2 w_{ij}$$

Distance-sensitive Ordering

Ordering is determined by permutation indexes $\pi(1, \ldots, n) = (\pi_1, \ldots, \pi_n)$.

Example with 4 variables: for a given ordering there are three distance-1 pairs, two distance-2 pairs, and one distance-3 pair:

$$\begin{pmatrix} 0 & 1 & 2 & 3 \\ & 0 & 1 & 2 \\ & & 0 & 1 \\ & & & 0 \end{pmatrix}$$

$$J_d(\pi) = \sum_{i=1}^{n-d} s_{\pi_i, \pi_{i+d}}, \qquad \min_\pi J, \quad J(\pi) = \sum_{d=1}^{n-1} d^2\, J_d(\pi)$$

The larger the distance, the larger the weight: large-distance similarities are penalized more heavily than small-distance similarities. (Ding & He, ICML'04)

Distance-sensitive Ordering

Theorem: The continuous optimal solution for the discrete inverse permutation indexes is given by the scaled principal component $q_1$.

The shifted and scaled inverse permutation indexes are
$$q_i = \frac{\pi_i^{-1} - (n+1)/2}{n/2} \in \Big\{ \frac{1-n}{n}, \frac{3-n}{n}, \ldots, \frac{n-1}{n} \Big\}$$

Relax the restriction on q and allow it to be continuous. The solution for q becomes the eigenvector of
$$(D - S)\, q = \lambda D q$$

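A minimal sketch of the resulting ordering step for a dense symmetric similarity matrix S:

```python
import numpy as np
from scipy.linalg import eigh

def distance_sensitive_order(S):
    """Order objects by the first non-trivial solution of
    (D - S) q = lambda D q; argsort of q gives the permutation."""
    D = np.diag(S.sum(axis=1))
    vals, Q = eigh(D - S, D)    # generalized eigenproblem, ascending
    q1 = Q[:, 1]                # skip the trivial constant eigenvector
    return np.argsort(q1)
```
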
Re-ordering of Genes and Tissues

$$r = \frac{J(\pi)}{J(\text{random})} = 0.18, \qquad r_{d=1} = \frac{J_{d=1}(\pi)}{J_{d=1}(\text{random})} = 3.39$$

[Figure: gene-expression matrix before and after distance-sensitive re-ordering]

(C. Ding, RECOMB 2002)

Webpage Spectral Ranking

Rank webpages from the hyperlink topology; L is the adjacency matrix of the web subgraph.

PageRank (Page & Brin): rank according to the principal eigenvector π (the equilibrium distribution):
$$\pi^T T = \pi^T, \qquad T = 0.8\, D_{out}^{-1} L + 0.2\, e e^T$$

HITS (Kleinberg): rank according to the principal eigenvector of the authority matrix:
$$(L^T L)\, q = \lambda q$$

Eigenvectors can be obtained in closed form.

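In practice the principal eigenvector can also be found by power iteration; a minimal sketch for HITS authority scores:

```python
import numpy as np

def hits_authority(L, n_iter=100):
    """Principal eigenvector of the authority matrix L^T L,
    computed by power iteration."""
    q = np.ones(L.shape[1])
    for _ in range(n_iter):
        q = L.T @ (L @ q)          # apply L^T L without forming it
        q /= np.linalg.norm(q)     # renormalize to avoid overflow
    return q                       # rank pages by descending score
```
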
Webpage Spectral Ranking: HITS (Kleinberg)

Assume the web graph is a fixed-degree-sequence random graph (Aiello, Chung & Lu, 2000).

Theorem: The eigenvalues of $L^T L$ satisfy
$$\lambda_1 > h_1 > \lambda_2 > h_2 > \cdots, \qquad h_i = d_i - \frac{d_i^2}{n-1}$$

Eigenvectors:
$$u_k = \Big( \frac{d_1}{\lambda_k - h_1}, \frac{d_2}{\lambda_k - h_2}, \ldots, \frac{d_n}{\lambda_k - h_n} \Big)^T$$

The principal eigenvector $u_1$ is monotonically decreasing if $d_1 > d_2 > d_3 > \cdots$
⇒ HITS ranking is identical to indegree ranking. (Ding et al., SIAM Review '04)

Webpage Spectral Ranking

PageRank: weight normalization. HITS: mutual reinforcement.
Combine PageRank and HITS, and generalize ⇒ ranking based on the similarity graph
$$S = L^T D_{out}^{-1} L$$

Random walks on this similarity graph have the equilibrium distribution
$$(d_1, d_2, \ldots, d_n)^T / 2E$$

PageRank ranking is identical to indegree ranking (a 1st-order approximation, due to the combination of PageRank & HITS). (Ding et al., SIGIR'02)

PCA: a Unified Framework for Clustering and Ordering

•  PCA is equivalent to K-means clustering
•  Scaled PCA has two optimality properties:
   – Distance-sensitive ordering
   – Min-max principle clustering
•  SPCA on a contingency table ⇒ Correspondence Analysis
   – Simultaneous ordering of rows and columns
   – Simultaneous clustering of rows and columns
•  Resolves open problems:
   – Relationship between Correspondence Analysis and PCA (open problem since the 1940s)
   – Relationship between PCA and K-means clustering (open problem since the 1960s)

Spectral Clustering: a rich spectrum of topics, a comprehensive framework for learning.

A tutorial & review of spectral clustering. The tutorial website will post all related papers (send your papers).

Acknowledgment

Hongyuan Zha, Penn State
Horst Simon, Lawrence Berkeley Lab
Ming Gu, UC Berkeley
Xiaofeng He, Lawrence Berkeley Lab
Michael Jordan, UC Berkeley
Michael Berry, U. Tennessee, Knoxville
Inderjit Dhillon, UT Austin
George Karypis, U. Minnesota
Haesun Park, U. Minnesota

Work supported by the Office of Science, Dept. of Energy.

Benchmark Calculations of Atomic Data for Modelling ApplicationsAstroAtom
 
Solar Cells Lecture 3: Modeling and Simulation of Photovoltaic Devices and Sy...
Solar Cells Lecture 3: Modeling and Simulation of Photovoltaic Devices and Sy...Solar Cells Lecture 3: Modeling and Simulation of Photovoltaic Devices and Sy...
Solar Cells Lecture 3: Modeling and Simulation of Photovoltaic Devices and Sy...Tuong Do
 
Probabilistic Control of Uncertain Linear Systems Using Stochastic Reachability
Probabilistic Control of Uncertain Linear Systems Using Stochastic ReachabilityProbabilistic Control of Uncertain Linear Systems Using Stochastic Reachability
Probabilistic Control of Uncertain Linear Systems Using Stochastic ReachabilityLeo Asselborn
 
SYMMETRIC BILINEAR CRYPTOGRAPHY ON ELLIPTIC CURVE AND LIE ALGEBRA
SYMMETRIC BILINEAR CRYPTOGRAPHY ON ELLIPTIC CURVE  AND LIE ALGEBRASYMMETRIC BILINEAR CRYPTOGRAPHY ON ELLIPTIC CURVE  AND LIE ALGEBRA
SYMMETRIC BILINEAR CRYPTOGRAPHY ON ELLIPTIC CURVE AND LIE ALGEBRABRNSS Publication Hub
 
The Effective Fragment Molecular Orbital Method
The Effective Fragment Molecular Orbital MethodThe Effective Fragment Molecular Orbital Method
The Effective Fragment Molecular Orbital Methodcsteinmann
 
Unbiased MCMC with couplings
Unbiased MCMC with couplingsUnbiased MCMC with couplings
Unbiased MCMC with couplingsPierre Jacob
 
Least squares support Vector Machine Classifier
Least squares support Vector Machine ClassifierLeast squares support Vector Machine Classifier
Least squares support Vector Machine ClassifierRaj Sikarwar
 
Passive network-redesign-ntua
Passive network-redesign-ntuaPassive network-redesign-ntua
Passive network-redesign-ntuaIEEE NTUA SB
 
How to design a linear control system
How to design a linear control systemHow to design a linear control system
How to design a linear control systemAlireza Mirzaei
 

Similaire à icml2004 tutorial on spectral clustering part II (20)

fauvel_igarss.pdf
fauvel_igarss.pdffauvel_igarss.pdf
fauvel_igarss.pdf
 
VahidAkbariTalk.pdf
VahidAkbariTalk.pdfVahidAkbariTalk.pdf
VahidAkbariTalk.pdf
 
VahidAkbariTalk_v3.pdf
VahidAkbariTalk_v3.pdfVahidAkbariTalk_v3.pdf
VahidAkbariTalk_v3.pdf
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysis
 
Basissets.pptx
Basissets.pptxBasissets.pptx
Basissets.pptx
 
Solar Cells Lecture 4: What is Different about Thin-Film Solar Cells?
Solar Cells Lecture 4: What is Different about Thin-Film Solar Cells?Solar Cells Lecture 4: What is Different about Thin-Film Solar Cells?
Solar Cells Lecture 4: What is Different about Thin-Film Solar Cells?
 
Nucleon electromagnetic form factors at high-momentum transfer from Lattice QCD
Nucleon electromagnetic form factors at high-momentum transfer from Lattice QCDNucleon electromagnetic form factors at high-momentum transfer from Lattice QCD
Nucleon electromagnetic form factors at high-momentum transfer from Lattice QCD
 
Chip Package Co-simulation
Chip Package Co-simulationChip Package Co-simulation
Chip Package Co-simulation
 
Benchmark Calculations of Atomic Data for Modelling Applications
 Benchmark Calculations of Atomic Data for Modelling Applications Benchmark Calculations of Atomic Data for Modelling Applications
Benchmark Calculations of Atomic Data for Modelling Applications
 
Solar Cells Lecture 3: Modeling and Simulation of Photovoltaic Devices and Sy...
Solar Cells Lecture 3: Modeling and Simulation of Photovoltaic Devices and Sy...Solar Cells Lecture 3: Modeling and Simulation of Photovoltaic Devices and Sy...
Solar Cells Lecture 3: Modeling and Simulation of Photovoltaic Devices and Sy...
 
MOLECULAR MODELLING
MOLECULAR MODELLINGMOLECULAR MODELLING
MOLECULAR MODELLING
 
Probabilistic Control of Uncertain Linear Systems Using Stochastic Reachability
Probabilistic Control of Uncertain Linear Systems Using Stochastic ReachabilityProbabilistic Control of Uncertain Linear Systems Using Stochastic Reachability
Probabilistic Control of Uncertain Linear Systems Using Stochastic Reachability
 
2019 GDRR: Blockchain Data Analytics - ChainNet: Learning on Blockchain Graph...
2019 GDRR: Blockchain Data Analytics - ChainNet: Learning on Blockchain Graph...2019 GDRR: Blockchain Data Analytics - ChainNet: Learning on Blockchain Graph...
2019 GDRR: Blockchain Data Analytics - ChainNet: Learning on Blockchain Graph...
 
SYMMETRIC BILINEAR CRYPTOGRAPHY ON ELLIPTIC CURVE AND LIE ALGEBRA
SYMMETRIC BILINEAR CRYPTOGRAPHY ON ELLIPTIC CURVE  AND LIE ALGEBRASYMMETRIC BILINEAR CRYPTOGRAPHY ON ELLIPTIC CURVE  AND LIE ALGEBRA
SYMMETRIC BILINEAR CRYPTOGRAPHY ON ELLIPTIC CURVE AND LIE ALGEBRA
 
The Effective Fragment Molecular Orbital Method
The Effective Fragment Molecular Orbital MethodThe Effective Fragment Molecular Orbital Method
The Effective Fragment Molecular Orbital Method
 
Unbiased MCMC with couplings
Unbiased MCMC with couplingsUnbiased MCMC with couplings
Unbiased MCMC with couplings
 
Least squares support Vector Machine Classifier
Least squares support Vector Machine ClassifierLeast squares support Vector Machine Classifier
Least squares support Vector Machine Classifier
 
Passive network-redesign-ntua
Passive network-redesign-ntuaPassive network-redesign-ntua
Passive network-redesign-ntua
 
How to design a linear control system
How to design a linear control systemHow to design a linear control system
How to design a linear control system
 
ECNG 6503 #1
ECNG 6503 #1 ECNG 6503 #1
ECNG 6503 #1
 

Plus de zukun

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009zukun
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVzukun
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Informationzukun
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statisticszukun
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibrationzukun
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionzukun
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluationzukun
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-softwarezukun
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptorszukun
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectorszukun
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-introzukun
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video searchzukun
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video searchzukun
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video searchzukun
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learningzukun
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionzukun
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick startzukun
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structureszukun
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities zukun
 

Plus de zukun (20)

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCV
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Information
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statistics
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 

Dernier

Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
 
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechWebinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechProduct School
 
2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdfThe Good Food Institute
 
Flow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameKapil Thakar
 
AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024Brian Pichman
 
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTSIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTxtailishbaloch
 
Technical SEO for Improved Accessibility WTS FEST
Technical SEO for Improved Accessibility  WTS FESTTechnical SEO for Improved Accessibility  WTS FEST
Technical SEO for Improved Accessibility WTS FESTBillieHyde
 
Where developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingWhere developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingFrancesco Corti
 
Oracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxOracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxSatishbabu Gunukula
 
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Alkin Tezuysal
 
How to release an Open Source Dataweave Library
How to release an Open Source Dataweave LibraryHow to release an Open Source Dataweave Library
How to release an Open Source Dataweave Libraryshyamraj55
 
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc
 
Keep Your Finger on the Pulse of Your Building's Performance with IES Live
Keep Your Finger on the Pulse of Your Building's Performance with IES LiveKeep Your Finger on the Pulse of Your Building's Performance with IES Live
Keep Your Finger on the Pulse of Your Building's Performance with IES LiveIES VE
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4DianaGray10
 
Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.IPLOOK Networks
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024Brian Pichman
 
Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl
 
Introduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationIntroduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationKnoldus Inc.
 

Dernier (20)

Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechWebinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
 
2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf
 
Flow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First Frame
 
AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024
 
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTSIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
 
Technical SEO for Improved Accessibility WTS FEST
Technical SEO for Improved Accessibility  WTS FESTTechnical SEO for Improved Accessibility  WTS FEST
Technical SEO for Improved Accessibility WTS FEST
 
Where developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingWhere developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is going
 
Oracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxOracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptx
 
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
 
How to release an Open Source Dataweave Library
How to release an Open Source Dataweave LibraryHow to release an Open Source Dataweave Library
How to release an Open Source Dataweave Library
 
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
 
Keep Your Finger on the Pulse of Your Building's Performance with IES Live
Keep Your Finger on the Pulse of Your Building's Performance with IES LiveKeep Your Finger on the Pulse of Your Building's Performance with IES Live
Keep Your Finger on the Pulse of Your Building's Performance with IES Live
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4
 
Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024
 
Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile Brochure
 
Introduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationIntroduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its application
 

icml2004 tutorial on spectral clustering part II

• 4. K-way Clustering Objectives
J = \sum_{1 \le p < q \le K} \left[ \frac{s(C_p, C_q)}{\rho(C_p)} + \frac{s(C_p, C_q)}{\rho(C_q)} \right] = \sum_k \frac{s(C_k, G - C_k)}{\rho(C_k)}
where
\rho(C_k) = n_k = |C_k|   (Ratio Cut)
\rho(C_k) = d(C_k) = \sum_{i \in C_k} d_i   (Normalized Cut)
\rho(C_k) = s(C_k, C_k) = \sum_{i \in C_k,\, j \in C_k} w_{ij}   (MinMaxCut)
G - C_k is the graph complement of C_k.
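
To make the three objectives concrete, here is a minimal Python sketch (an addition for this write-up, not from the slides; the function name and interface are hypothetical) that evaluates J for a given hard partition of a similarity matrix W:

import numpy as np

def cut_objective(W, labels, kind="ncut"):
    """J = sum_k s(C_k, G - C_k) / rho(C_k), with rho as on the slide."""
    d = W.sum(axis=1)
    J = 0.0
    for k in np.unique(labels):
        in_k = labels == k
        s_out = W[in_k][:, ~in_k].sum()   # s(C_k, G - C_k)
        if kind == "rcut":                # Ratio Cut: rho = n_k
            rho = in_k.sum()
        elif kind == "ncut":              # Normalized Cut: rho = d(C_k)
            rho = d[in_k].sum()
        else:                             # MinMaxCut: rho = s(C_k, C_k)
            rho = W[in_k][:, in_k].sum()
        J += s_out / rho
    return J
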
• 5. Simplex Spectral Embedding Theorem
The simplex orthogonal transform matrix T = (t_1, \ldots, t_K) is determined by
\Gamma t_k = \lambda_k t_k, \qquad \Gamma = \Omega^{-1/2} \bar{\Gamma} \Omega^{-1/2}   (spectral perturbation matrix)
\bar{\Gamma} = \begin{pmatrix} h_{11} & -s_{12} & \cdots & -s_{1K} \\ -s_{21} & h_{22} & \cdots & -s_{2K} \\ \vdots & & \ddots & \vdots \\ -s_{K1} & -s_{K2} & \cdots & h_{KK} \end{pmatrix}
s_{pq} = s(C_p, C_q), \qquad h_{kk} = \sum_{p \ne k} s_{kp}, \qquad \Omega = \mathrm{diag}[\rho(C_1), \ldots, \rho(C_K)]
• 6. Properties of Spectral Embedding
• Original basis vectors: h_k = (0 \cdots 0, \underbrace{1 \cdots 1}_{n_k}, 0 \cdots 0)^T / n_k^{1/2}
• Dimension of the embedding is K - 1: (q_2, \ldots, q_K)
– q_1 = (1, \ldots, 1)^T is the constant, trivial eigenvector
– Eigenvalues of \Gamma equal the eigenvalues of D - W
– The eigenvalues measure how well the clustering objective characterizes the data
• Exact solution for K = 2
• 7. 2-way Spectral Embedding (Exact Solution)
Eigenvalues:
\lambda_{Rcut} = \frac{s(C_1, C_2)}{n(C_1)} + \frac{s(C_1, C_2)}{n(C_2)}, \quad \lambda_{Ncut} = \frac{s(C_1, C_2)}{d(C_1)} + \frac{s(C_1, C_2)}{d(C_2)}, \quad \lambda_{MMC} = \frac{s(C_1, C_2)}{s(C_1, C_1)} + \frac{s(C_1, C_2)}{s(C_2, C_2)}
These recover the original 2-way clustering objectives.
For Normalized Cut, the orthogonal transform T rotates
h_1 = (1 \cdots 1, 0 \cdots 0)^T, \quad h_2 = (0 \cdots 0, 1 \cdots 1)^T
into
q_1 = (1 \cdots 1)^T, \quad q_2 = (a, \ldots, a, -b, \ldots, -b)^T
Spectral clustering is inherently self-consistent. (Ding et al., KDD'01)
• 8. Perturbation Analysis
W q = \lambda D q \;\Leftrightarrow\; \hat{W} z = (D^{-1/2} W D^{-1/2}) z = \lambda z, \qquad q = D^{-1/2} z
Assume the data have 3 dense clusters, sparsely connected:
W = \begin{pmatrix} W_{11} & W_{12} & W_{13} \\ W_{21} & W_{22} & W_{23} \\ W_{31} & W_{32} & W_{33} \end{pmatrix}
The off-diagonal blocks are between-cluster connections; they are assumed small and treated as a perturbation. (Ding et al., KDD'01)
• 9. Perturbation Analysis (continued)
0th order: \hat{W}^{(0)} = \mathrm{diag}(\hat{W}_{11}^{(0)}, \hat{W}_{22}^{(0)}, \hat{W}_{33}^{(0)})
1st order: \hat{W}^{(1)} = \begin{pmatrix} \hat{W}_{11} - \hat{W}_{11}^{(0)} & \hat{W}_{12} & \hat{W}_{13} \\ \hat{W}_{21} & \hat{W}_{22} - \hat{W}_{22}^{(0)} & \hat{W}_{23} \\ \hat{W}_{31} & \hat{W}_{32} & \hat{W}_{33} - \hat{W}_{33}^{(0)} \end{pmatrix}
with \hat{W}_{pq} = D_{pp}^{-1/2} W_{pq} D_{qq}^{-1/2} and
\hat{W}_{pq}^{(0)} = (D_{p1} + D_{p2} + D_{p3})^{-1/2} W_{pq} (D_{q1} + D_{q2} + D_{q3})^{-1/2}
• 10. K-means Clustering
• Developed in the 1960s (Lloyd, MacQueen, etc.)
• Computationally efficient (order mN)
• Widely used in practice
– Benchmark for evaluating other algorithms
Given n points in m dimensions, X = (x_1, x_2, \ldots, x_n), K-means solves
\min J_K = \sum_{k=1}^K \sum_{i \in C_k} \| x_i - c_k \|^2
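
A bare-bones Lloyd iteration for this objective, as a sketch (an assumed helper, not tutorial code); it alternates assignment and centroid-update steps until the centroids stop moving:

import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Minimize J_K = sum_k sum_{i in C_k} ||x_i - c_k||^2 by Lloyd iterations."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), K, replace=False)]            # initial centroids
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)                         # assignment step
        C_new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                          else C[k] for k in range(K)])    # keep centroid if cluster empties
        if np.allclose(C_new, C):
            break
        C = C_new                                          # update step
    return labels, C

Each pass costs on the order of m*n*K operations, which is the efficiency the slide refers to.
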
• 11. K-means Clustering in the Spectral Embedded Space
The simplex spectral embedding theorem provides the theoretical basis for K-means clustering in the embedded eigenspace:
– Cluster centroids are well separated (corners of the simplex)
– K-means clustering is invariant under (i) coordinate rotation x -> Tx and (ii) shift x -> x + a
– Thus the orthogonal transform T in the simplex embedding is unnecessary
• Many variants of K-means (Ng et al.; Bach & Jordan; Zha et al.; Shi & Xu; etc.)
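
Putting the pieces together, a sketch of the full pipeline (assumptions: W is a symmetric nonnegative similarity matrix with positive degrees; kmeans() is the helper sketched above):

import numpy as np

def spectral_kmeans(W, K, seed=0):
    """Embed with the K leading eigenvectors of D^{-1/2} W D^{-1/2}, then K-means.
    No rotation T is needed: K-means is invariant under it."""
    d = W.sum(axis=1)
    Dm12 = 1.0 / np.sqrt(d)
    Wn = Dm12[:, None] * W * Dm12[None, :]   # normalized similarity
    _, vecs = np.linalg.eigh(Wn)             # eigenvalues in ascending order
    Z = vecs[:, -K:]                         # K largest eigenvectors = embedding
    labels, _ = kmeans(Z, K, seed=seed)
    return labels
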
• 12. We have proved: spectral embedding + K-means clustering is the appropriate method.
We now show: K-means itself is solved by PCA.
• 13. Equivalence of K-means Clustering and Principal Component Analysis
• Cluster indicators specify the solution of K-means clustering
• Principal components are eigenvectors of the Gram (kernel) matrix, i.e. the data projections onto the principal directions of the covariance matrix
• Optimal solution of K-means clustering: the continuous solution of the discrete cluster indicators of K-means is given by the principal components
(Zha et al., NIPS'01; Ding & He, 2003)
• 14. Principal Component Analysis (PCA)
• Widely used across a large number of fields
– Best low-rank approximation (SVD theorem; Eckart & Young, 1936): noise reduction
– Unsupervised dimension reduction
– Many generalizations
• The conventional perspective is inadequate to explain the effectiveness of PCA
• New result: principal components are cluster indicators for a well-motivated clustering objective
• 15. Principal Component Analysis
n points in m dimensions: X = (x_1, x_2, \ldots, x_n)
Principal directions u_k: covariance S = XX^T, \quad XX^T u_k = \lambda_k u_k
Principal components v_k: Gram (kernel) matrix X^T X, \quad X^T X v_k = \lambda_k v_k
Singular value decomposition: X = \sum_{k=1}^m \sqrt{\lambda_k}\, u_k v_k^T
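
A short numerical check of these definitions (illustrative only, on synthetic data): the squared singular values of the centered data matrix coincide with the nonzero eigenvalues of the Gram matrix X^T X:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 40))                  # m = 5 dimensions, n = 40 points (columns)
X = X - X.mean(axis=1, keepdims=True)         # center, so XX^T is the covariance

U, svals, Vt = np.linalg.svd(X, full_matrices=False)   # u_k in U, v_k in rows of Vt
lam, V = np.linalg.eigh(X.T @ X)                       # Gram-matrix eigenvalues
print(np.allclose(np.sort(svals ** 2), np.sort(lam[lam > 1e-9])))   # True
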
• 16. 2-way K-means Clustering
Cluster membership indicator:
q(i) = \begin{cases} +\sqrt{n_2 / (n_1 n)} & \text{if } i \in C_1 \\ -\sqrt{n_1 / (n_2 n)} & \text{if } i \in C_2 \end{cases}
J_K = n \langle x^2 \rangle - J_D, \qquad J_D = \frac{n_1 n_2}{n} \left[ \frac{2\, d(C_1, C_2)}{n_1 n_2} - \frac{d(C_1, C_1)}{n_1^2} - \frac{d(C_2, C_2)}{n_2^2} \right]
Define the distance matrix D = (d_{ij}), \; d_{ij} = \| x_i - x_j \|^2. Then
J_D = -q^T D q = -q^T \tilde{D} q = 2 q^T X^T X q
where \tilde{D} is the centered distance matrix. Hence \min J_K \Rightarrow \max J_D.
• 17. 2-way K-means Clustering
The cluster indicator satisfies \sum_i q(i) = 0, \; \sum_i q^2(i) = 1.
Relax the restriction that q(i) take discrete values; let it take continuous values in [-1, 1]. The solution for q is then an eigenvector of the Gram matrix.
Theorem: the (continuous) optimal solution of q is given by the principal component v_1. Clusters C_1, C_2 are determined by
C_1 = \{ i \mid v_1(i) < 0 \}, \qquad C_2 = \{ i \mid v_1(i) \ge 0 \}
Once C_1, C_2 are computed, iterate K-means to convergence.
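
The theorem suggests a two-line clustering rule; a sketch (the helper name is made up), splitting by the sign of the first principal component and leaving the K-means refinement to the routine above:

import numpy as np

def two_way_pca_split(X):
    """C1 = {i : v1(i) < 0}, C2 = {i : v1(i) >= 0}, with v1 from centered data."""
    Xc = X - X.mean(axis=1, keepdims=True)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    v1 = Vt[0]                                # first principal component, length n
    return np.where(v1 < 0)[0], np.where(v1 >= 0)[0]
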
• 18. Multi-way K-means Clustering
Unsigned cluster membership indicators h_1, \ldots, h_K, e.g. for K = 3 clusters C_1, C_2, C_3 over four objects:
\begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = (h_1, h_2, h_3)
• 19. Multi-way K-means Clustering
For K \ge 2:
J_K = \sum_i x_i^2 - \sum_{k=1}^K \frac{1}{n_k} \sum_{i,j \in C_k} x_i^T x_j
(Unsigned) cluster membership indicators h_1, \ldots, h_K:
h_k = (0 \cdots 0, \underbrace{1 \cdots 1}_{n_k}, 0 \cdots 0)^T / n_k^{1/2}
J_K = \sum_i x_i^2 - \sum_{k=1}^K h_k^T X^T X h_k
Let H_K = (h_1, \ldots, h_K); then J_K = \sum_i x_i^2 - \mathrm{Tr}(H_K^T X^T X H_K)
• 20. Multi-way K-means Clustering: Regularized Relaxation
Redundancy in h_1, \ldots, h_K: \sum_{k=1}^K n_k^{1/2} h_k = e = (1 \cdots 1)^T
Transform to signed indicator vectors q_1, \ldots, q_K via a K x K orthogonal matrix T:
(q_1, \ldots, q_K) = (h_1, \ldots, h_K) T, \qquad \text{i.e.} \quad Q_K = H_K T
Require the 1st column of T to be (n_1^{1/2}, \ldots, n_K^{1/2})^T / n^{1/2}.
Thus q_1 = e / n^{1/2} = \text{const}.
• 21. Regularized Relaxation of K-means Clustering
J_K = \mathrm{Tr}(Y^T Y) - \mathrm{Tr}(Q_{K-1}^T Y^T Y Q_{K-1}), \qquad Q_{K-1} = (q_2, \ldots, q_K)
(regularized relaxation)
Theorem: the optimal solutions of q_2, \ldots, q_K are given by the principal components v_2, \ldots, v_K. J_K is bounded below by the total variance minus the sum of the K - 1 largest eigenvalues of the covariance:
n \langle y^2 \rangle - \sum_{k=1}^{K-1} \lambda_k \;<\; \min J_K \;<\; n \langle y^2 \rangle
• 22. Scaled PCA
Similarity matrix S = (s_{ij}) (generated from XX^T); D = \mathrm{diag}(d_1, \ldots, d_n), \; d_i = s_{i.}
Nonlinear re-scaling: \tilde{S} = D^{-1/2} S D^{-1/2}, \qquad \tilde{s}_{ij} = s_{ij} / (s_{i.} s_{j.})^{1/2}
Apply SVD on \tilde{S}:
S = D^{1/2} \tilde{S} D^{1/2} = D^{1/2} \sum_k z_k \lambda_k z_k^T D^{1/2} = D \sum_k q_k \lambda_k q_k^T D
q_k = D^{-1/2} z_k is the scaled principal component.
Subtract the trivial component \lambda_0 = 1, \; z_0 = d^{1/2} / s_{..}^{1/2}, \; q_0 = 1:
S - d d^T / s_{..} = D \sum_{k \ge 1} q_k \lambda_k q_k^T D
(Ding et al., 2002)
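
As a sketch of the computation (assuming a symmetric nonnegative S with positive row sums, so the trivial eigenvalue lambda = 1 is the largest):

import numpy as np

def scaled_pca(S, K):
    """Top-K nontrivial scaled principal components q_k = D^{-1/2} z_k."""
    d = S.sum(axis=1)
    Dm12 = 1.0 / np.sqrt(d)
    St = Dm12[:, None] * S * Dm12[None, :]    # S~ = D^{-1/2} S D^{-1/2}
    vals, Z = np.linalg.eigh(St)              # ascending; vals[-1] = 1 is trivial
    Q = Dm12[:, None] * Z                     # q_k = D^{-1/2} z_k
    return vals[-(K + 1):-1][::-1], Q[:, -(K + 1):-1][:, ::-1]
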
• 23. Optimality Properties of Scaled PCA
Scaled principal components have optimality properties:
Ordering
– Adjacent objects along the order are similar
– Far-away objects along the order are dissimilar
– The optimal solution for the permutation indexes is given by scaled PCA
Clustering
– Maximize within-cluster similarity
– Minimize between-cluster similarity
– The optimal solution for the cluster membership indicators is given by scaled PCA
• 24. Difficulty of K-way Clustering
• 2-way clustering uses a single eigenvector
• K-way clustering uses several eigenvectors
• How to recover the 0-1 cluster indicators H?
Eigenvectors Q = (q_1, \ldots, q_K) have both positive and negative entries; indicators H = (h_1, \ldots, h_K); Q = H T.
Avoid computing the transformation T:
• Do K-means, which is invariant under T
• Compute the connectivity network Q Q^T, which cancels T
• 25. Connectivity Network
C_{ij} = \begin{cases} 1 & \text{if } i, j \text{ belong to the same cluster} \\ 0 & \text{otherwise} \end{cases}
SPCA provides: C \cong D \sum_{k=1}^K q_k \lambda_k q_k^T D
Green's function: C \approx G = \sum_{k=2}^K \frac{q_k q_k^T}{1 - \lambda_k}
Projection matrix: C \approx P \equiv \sum_{k=1}^K q_k q_k^T
(Ding et al., 2002)
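
In the projection-matrix form this is only a few lines; a sketch (the half-of-diagonal threshold for declaring a same-cluster pair is my assumption, not from the slide):

import numpy as np

def connectivity_network(S, K):
    """C ~= sum_{k<=K} q_k q_k^T; invariant under any rotation Q -> QT."""
    d = S.sum(axis=1)
    Dm12 = 1.0 / np.sqrt(d)
    St = Dm12[:, None] * S * Dm12[None, :]
    _, Z = np.linalg.eigh(St)
    Q = Dm12[:, None] * Z[:, -K:]             # q_1, ..., q_K
    C = Q @ Q.T
    return C > 0.5 * C.diagonal()[:, None]    # assumed same-cluster threshold
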
• 26. Connectivity Network
• Similar to a Hopfield network
• Mathematical basis: projection matrix
• Shows self-aggregation clearly
• Drawback: how to recover the clusters
– Apply K-means directly on C
– Use linearized assignment with cluster crossing and spectral ordering (ICML'04)
• 27. Connectivity Network: Example 1
[Figure: similarity matrix W and the resulting connectivity matrix; \lambda_2 = 0.300, \lambda_2 = 0.268]
Between-cluster connections are suppressed; within-cluster connections are enhanced: the effect of self-aggregation.
• 28. Connectivity of Internet Newsgroups
NG2: comp.graphics; NG9: rec.motorcycles; NG10: rec.sport.baseball; NG15: sci.space; NG18: talk.politics.mideast.
100 articles from each group, 1000 words, tf.idf weighting, cosine similarity.
Spectral clustering: 89%. Direct K-means: 66%.
[Figure: cosine similarity matrix vs. connectivity matrix]
• 29. Spectral embedding is not topology preserving
700 3-D data points form 2 interlocking rings; in eigenspace, the rings shrink and separate.
[Figure: the two rings before and after embedding]
• 30. Correspondence Analysis (CA)
• Mainly used for graphical display of data
• Popular in France (Benzécri, 1969)
• Long history
– Simultaneous row and column regression (Hirschfeld, 1935)
– Reciprocal averaging (Richardson & Kuder, 1933; Horst, 1935; Fisher, 1940; Hill, 1974)
– Canonical correlations, dual scaling, etc.
• The formulation is a bit complicated ("convoluted", Jolliffe, 2002, p. 342)
• "A neglected method" (Hill, 1974)
• 31. Scaled PCA on a Contingency Table ⇒ Correspondence Analysis
Nonlinear re-scaling: \tilde{P} = D_r^{-1/2} P D_c^{-1/2}, \qquad \tilde{p}_{ij} = p_{ij} / (p_{i.} p_{.j})^{1/2}
Apply SVD on \tilde{P} and subtract the trivial component:
P - r c^T / p_{..} = D_r \sum_{k \ge 1} f_k \lambda_k g_k^T D_c
r = (p_{1.}, \ldots, p_{n.})^T, \qquad c = (p_{.1}, \ldots, p_{.n})^T
f_k = D_r^{-1/2} u_k and g_k = D_c^{-1/2} v_k are the scaled row and column principal components (standard coordinates in CA).
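
A compact CA sketch (assuming strictly positive row and column margins; subtracting r c^T before the SVD removes the trivial component in one step, equivalent to the slide's subtraction afterwards):

import numpy as np

def correspondence_analysis(N):
    """Row/column standard coordinates f_k, g_k of a contingency table N."""
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    Pt = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # rescaled, trivial part removed
    U, lam, Vt = np.linalg.svd(Pt, full_matrices=False)
    F = U / np.sqrt(r)[:, None]               # f_k = Dr^{-1/2} u_k
    G = Vt.T / np.sqrt(c)[:, None]            # g_k = Dc^{-1/2} v_k
    return lam, F, G

Applied to the word-document matrix two slides below, the leading pair (f_1, g_1) should separate the comp-sci memos from the graph-theory memos.
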
• 32. Information Retrieval
Bell Labs tech memos; 5 comp-sci and 4 applied-math memo titles:
C1: Human machine interface for lab ABC computer applications
C2: A survey of user opinion of computer system response time
C3: The EPS user interface management system
C4: System and human system engineering testing of EPS
C5: Relation of user-perceived response time to error management
M1: The generation of random, binary, unordered trees
M2: The intersection graph of paths in trees
M3: Graph minors IV: widths of trees and well-quasi-ordering
M4: Graph minors: A survey
• 33. Word-document matrix: row/column clustering
words       c4 c1 c3 c5 c2 m4 m3 m2 m1
human        1  1  .  .  .  .  .  .  .
EPS          1  .  1  .  .  .  .  .  .
interface    .  1  1  .  .  .  .  .  .
system       2  .  1  .  1  .  .  .  .
computer     .  1  .  .  1  .  .  .  .
user         .  .  1  1  1  .  .  .  .
response     .  .  .  1  1  .  .  .  .
time         .  .  .  1  1  .  .  .  .
survey       .  .  .  .  1  1  .  .  .
minors       .  .  .  .  .  1  1  .  .
graph        .  .  .  .  .  1  1  1  .
tree         .  .  .  .  .  .  1  1  1
• 34. Bipartite Graph: 3 Types of Connectivity Networks
Q_K = (q_1, \ldots, q_K) = \begin{pmatrix} F_K \\ G_K \end{pmatrix}, \qquad Q_K Q_K^T = \begin{pmatrix} F_K F_K^T & F_K G_K^T \\ G_K F_K^T & G_K G_K^T \end{pmatrix}
Row-row clustering: F_K F_K^T = \begin{pmatrix} e_{r_1} e_{r_1}^T / 2 s_{11} & 0 \\ 0 & e_{r_2} e_{r_2}^T / 2 s_{22} \end{pmatrix}
Column-column clustering: G_K G_K^T = \begin{pmatrix} e_{c_1} e_{c_1}^T / 2 s_{11} & 0 \\ 0 & e_{c_2} e_{c_2}^T / 2 s_{22} \end{pmatrix}
Row-column association: F_K G_K^T = \begin{pmatrix} e_{r_1} e_{c_1}^T / 2 s_{11} & 0 \\ 0 & e_{r_2} e_{c_2}^T / 2 s_{22} \end{pmatrix}
• 35. Example
[Figure: original data matrix; row-row network FF^T; column-column network GG^T; row-column network FG^T; \lambda_2 = 0.456, \lambda_2 = 0.477]
• 36. Internet Newsgroups
Simultaneous clustering of documents and words.
[Figure: document-word co-clustering structure]
• 37. Random Walks and Normalized Cut
Similarity matrix W; stochastic matrix P = D^{-1} W
\pi^T P = \pi^T \Rightarrow equilibrium distribution \pi = d (up to normalization)
P x = \lambda x \;\Rightarrow\; W x = \lambda D x \;\Rightarrow\; (D - W) x = (1 - \lambda) D x
Random walks between A and B:
J_{NormCut} = \frac{P(A \to B)}{\pi(A)} + \frac{P(B \to A)}{\pi(B)}
(Meila & Shi, 2001)
PageRank: P = \alpha L D_{out}^{-1} + (1 - \alpha)\, e e^T / n
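
These identities are easy to verify numerically; an illustrative check on a small random symmetric similarity matrix:

import numpy as np

rng = np.random.default_rng(0)
W = rng.random((6, 6)); W = W + W.T           # symmetric similarity
d = W.sum(axis=1)
P = W / d[:, None]                            # P = D^{-1} W, row-stochastic

pi = d / d.sum()                              # equilibrium distribution, pi ~ d
print(np.allclose(pi @ P, pi))                # pi^T P = pi^T  -> True

lam, X = np.linalg.eig(P)                     # P x = lambda x ...
x, l0 = X[:, 0], lam[0]
print(np.allclose(W @ x, l0 * (d * x)))       # ... <=> W x = lambda D x  -> True
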
• 38. Semi-definite Programming for Normalized Cut
Normalized Cut: y_k = D^{1/2} (0 \cdots 0, \underbrace{1 \cdots 1}_{n_k}, 0 \cdots 0)^T / \| D^{1/2} h_k \|, \qquad \tilde{W} = D^{-1/2} W D^{-1/2}
Optimize: \min_Y \mathrm{Tr}(Y^T (I - \tilde{W}) Y) subject to Y^T Y = I
\Rightarrow \mathrm{Tr}[(I - \tilde{W}) Y Y^T] \Rightarrow \min_Z \mathrm{Tr}[(I - \tilde{W}) Z], \qquad Z = Y Y^T
s.t. Z \ge 0, \; Z \succeq 0, \; Z d = d, \; \mathrm{Tr}\, Z = K, \; Z = Z^T
Compute Z via SDP; factor Z = Y' Y'^T; set Y'' = D^{-1/2} Y'; run K-means on Y''.
(Xing & Jordan, 2003). Z is a connectivity network.
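
A hedged sketch of this SDP relaxation using the cvxpy package (assuming an SDP-capable solver such as the bundled SCS; the function and variable names are mine):

import numpy as np
import cvxpy as cp

def ncut_sdp(W, K):
    d = W.sum(axis=1)
    Wt = W / np.sqrt(np.outer(d, d))          # D^{-1/2} W D^{-1/2}
    n = len(W)
    Z = cp.Variable((n, n), PSD=True)         # enforces Z = Z^T, Z psd
    cons = [Z >= 0, cp.trace(Z) == K, Z @ d == d]
    cp.Problem(cp.Minimize(cp.trace((np.eye(n) - Wt) @ Z)), cons).solve()
    return Z.value                            # then factor Z and run K-means
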
• 39. Spectral Ordering
Hill, 1970 (spectral embedding): find a coordinate x that minimizes
J = \sum_{i < j} (x_i - x_j)^2 w_{ij} = x^T (D - W) x
Solutions are eigenvectors of the Laplacian.
Barnard, Pothen & Simon, 1993 (envelope reduction of sparse matrices): find an ordering such that the envelope is minimized:
\min \sum_{ij} (i - j)^2 w_{ij} \;\Rightarrow\; \min_\pi \sum_{ij} (\pi_i - \pi_j)^2 w_{ij}
• 40. Distance-sensitive Ordering
An ordering is determined by permutation indexes \pi(1, \ldots, n) = (\pi_1, \ldots, \pi_n).
4-variable example: a given ordering has 3 pairs at distance d = 1, two at d = 2, one at d = 3:
\begin{pmatrix} 0 & 1 & 2 & 3 \\ & 0 & 1 & 2 \\ & & 0 & 1 \\ & & & 0 \end{pmatrix}
J_d(\pi) = \sum_{i=1}^{n-d} s_{\pi_i, \pi_{i+d}}, \qquad \min_\pi J, \quad J(\pi) = \sum_{d=1}^{n-1} d^2 J_d(\pi)
The larger the distance, the larger the weight: large-distance similarities are penalized more heavily than small-distance similarities. (Ding & He, ICML'04)
• 41. Distance-sensitive Ordering
Theorem: the continuous optimal solution for the discrete inverse permutation indexes is given by the scaled principal component q_1.
The shifted and scaled inverse permutation indexes are
q_i = \frac{\pi_i^{-1} - (n+1)/2}{n/2} \in \left\{ \frac{1-n}{n}, \frac{3-n}{n}, \ldots, \frac{n-1}{n} \right\}
Relax the restriction on q and allow it to be continuous; the solution for q becomes the eigenvector of
(D - S)\, q = \lambda D q
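
Since the relaxed solution is just a generalized eigenvector, the ordering itself is one argsort; a sketch:

import numpy as np

def spectral_order(S):
    """Sort by q1, the eigenvector of (D - S) q = lambda D q with 2nd-smallest lambda."""
    d = S.sum(axis=1)
    Dm12 = 1.0 / np.sqrt(d)
    St = Dm12[:, None] * S * Dm12[None, :]
    _, Z = np.linalg.eigh(np.eye(len(S)) - St)   # ascending eigenvalues
    q1 = Dm12 * Z[:, 1]                          # skip the trivial constant vector
    return np.argsort(q1)                        # permutation placing similar items nearby
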
• 42. Re-ordering of Genes and Tissues
r = J(\pi) / J(\text{random}) = 0.18, \qquad r_{d=1} = J_{d=1}(\pi) / J_{d=1}(\text{random}) = 3.39
[Figure: gene-tissue expression matrix before and after re-ordering]
(C. Ding, RECOMB 2002)
• 43. Webpage Spectral Ranking
Rank webpages from the hyperlink topology; L is the adjacency matrix of the web subgraph.
PageRank (Page & Brin): rank according to the principal eigenvector \pi (the equilibrium distribution):
\pi^T T = \pi^T, \qquad T = 0.8\, D_{out}^{-1} L + 0.2\, e e^T / n
HITS (Kleinberg): rank according to the principal eigenvector of the authority matrix:
(L^T L)\, q = \lambda q
The eigenvectors can be obtained in closed form.
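
Both rankings reduce to a few lines of power iteration; a sketch (the max(outdegree, 1) guard for dangling nodes is my addition):

import numpy as np

def pagerank(L, alpha=0.8, n_iter=100):
    """pi^T T = pi^T with T = alpha Dout^{-1} L + (1 - alpha) e e^T / n."""
    n = len(L)
    dout = np.maximum(L.sum(axis=1), 1)       # guard against dangling nodes
    T = alpha * (L / dout[:, None]) + (1 - alpha) / n
    pi = np.ones(n) / n
    for _ in range(n_iter):
        pi = pi @ T
        pi /= pi.sum()
    return pi

def hits_authority(L, n_iter=100):
    """Principal eigenvector of the authority matrix L^T L."""
    q = np.ones(L.shape[1])
    for _ in range(n_iter):
        q = L.T @ (L @ q)
        q /= np.linalg.norm(q)
    return q
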
• 44. Webpage Spectral Ranking
HITS (Kleinberg) ranking algorithm. Assume the web graph is a fixed degree sequence random graph (Aiello, Chung & Lu, 2000).
Theorem: the eigenvalues of L^T L satisfy
\lambda_1 \approx h_1 > \lambda_2 \approx h_2 > \cdots, \qquad h_i = d_i - \frac{d_i^2}{n - 1}
Eigenvectors: u_k = \left( \frac{d_1}{\lambda_k - h_1}, \frac{d_2}{\lambda_k - h_2}, \ldots, \frac{d_n}{\lambda_k - h_n} \right)^T
The principal eigenvector u_1 is monotonically decreasing if d_1 > d_2 > d_3 > \cdots
\Rightarrow HITS ranking is identical to indegree ranking. (Ding et al., SIAM Review '04)
• 45. Webpage Spectral Ranking
PageRank: weight normalization. HITS: mutual reinforcement.
Combine PageRank and HITS, and generalize ⇒ ranking based on a similarity graph
S = L^T D_{out}^{-1} L
Random walks on this similarity graph have the equilibrium distribution (d_1, d_2, \ldots, d_n)^T / 2|E|.
PageRank ranking is identical to indegree ranking (to 1st-order approximation, due to the combination of PageRank and HITS). (Ding et al., SIGIR'02)
• 46. PCA: a Unified Framework for Clustering and Ordering
• PCA is equivalent to K-means clustering
• Scaled PCA has two optimality properties
– Distance-sensitive ordering
– Min-max principle clustering
• SPCA on a contingency table ⇒ correspondence analysis
– Simultaneous ordering of rows and columns
– Simultaneous clustering of rows and columns
• Resolves open problems
– Relationship between correspondence analysis and PCA (open since the 1940s)
– Relationship between PCA and K-means clustering (open since the 1960s)
• 47. Spectral Clustering
• A rich spectrum of topics
• A comprehensive framework for learning
• A tutorial review of spectral clustering
• The tutorial website will post all related papers (send in your papers)
• 48. Acknowledgments
Hongyuan Zha, Penn State; Horst Simon, Lawrence Berkeley Lab; Ming Gu, UC Berkeley; Xiaofeng He, Lawrence Berkeley Lab; Michael Jordan, UC Berkeley; Michael Berry, U. Tennessee, Knoxville; Inderjit Dhillon, UT Austin; George Karypis, U. Minnesota; Haesun Park, U. Minnesota.
Work supported by the Office of Science, Dept. of Energy.