FAST MATRIX COMPUTATIONS FOR
                     PAIRWISE AND COLUMN-WISE KATZ
                          SCORES AND COMMUTE TIMES
                                                               David F. Gleich
                                                            Purdue University

                                                            Emory University
                                                            Math/CS Seminar
                                                             January 20, 2012

                           With Pooya Esfandiar, Francesco Bonchi, Chen Greif,
                                    Laks V. S. Lakshmanan, and Byung-Won On


David F. Gleich (Purdue)            Emory Math/CS Seminar                1 / 47
MAIN RESULTS
A – adjacency matrix
L – Laplacian matrix

Katz score: $K_{ij} = [(I - \alpha A)^{-1} - I]_{ij}$
Commute time: $C_{ij} = \mathrm{vol}(G)\,(e_i - e_j)^T L^{+} (e_i - e_j)$

For Katz
Compute one $K_{ij}$ fast
Compute top $K_{i,:}$ fast

For Commute
Compute one $C_{ij}$ fast
Estimate $C_{i,:}$

OUTLINE
Why study these measures?
Katz Rank and Commute Time
How else do people compute them?
Quadrature rules for pairwise scores
Sparse linear systems solves for top-k
As many results as we have time for…

WHY? LINK PREDICTION
Neighborhood based
Path based
Liben-Nowell and Kleinberg (2003, 2006) found that path-based link prediction was more efficient.

NOTE


All graphs are undirected

All graphs are connected




LEO KATZ




NOT QUITE, WIKIPEDIA
$A$: adjacency, $P$: random walk

PageRank: $(I - \alpha P)\,x = e$
Katz: $(I - \alpha A)\,k = e$

These are equivalent if the graph has constant degree.

WHAT KATZ ACTUALLY SAID
“we assume that each link independently has the same probability of being effective” …

“we conceive a constant $\alpha$, depending on the group and the context of the particular investigation, which has the force of a probability of effectiveness of a single link. A k-step chain then, has probability $\alpha^k$ of being effective.”

“We wish to find the column sums of the matrix”

Leo Katz 1953, A New Status Index Derived from Sociometric Analysis, Psychometrika 18(1):39-43

A MODERN TAKE
The Katz score (node-based) is
$k = \sum_{\ell \ge 1} \alpha^{\ell} A^{\ell} e$
The Katz score (edge-based) is
$K = \sum_{\ell \ge 1} \alpha^{\ell} A^{\ell}$

RETURNING TO THE MATRIX
$\sum_{\ell \ge 1} \alpha^{\ell} A^{\ell} = (I - \alpha A)^{-1} - I$

Carl Neumann: $(I - A)^{-1} = \sum_{k \ge 0} A^{k}$

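A minimal sketch of the Neumann-series view, assuming the Katz matrix is $\sum_{\ell \ge 1} \alpha^\ell A^\ell = (I - \alpha A)^{-1} - I$. The function names and the 4-node path graph are illustrative choices, not taken from the talk.

```python
import numpy as np

def katz_neumann(A, alpha, terms):
    """Truncated Neumann series: sum over l = 1..terms of alpha^l A^l."""
    K = np.zeros_like(A, dtype=float)
    power = np.eye(A.shape[0])
    for _ in range(terms):
        power = alpha * (power @ A)  # now holds alpha^l A^l
        K += power
    return K

def katz_exact(A, alpha):
    """Closed form (I - alpha A)^{-1} - I; needs alpha < 1/||A||_2."""
    n = A.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * A, np.eye(n)) - np.eye(n)

# 4-node path graph: undirected, connected, max degree 2
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
alpha = 0.25  # below 1/max-degree = 0.5, so the series converges

err_5 = np.abs(katz_neumann(A, alpha, 5) - katz_exact(A, alpha)).max()
err_50 = np.abs(katz_neumann(A, alpha, 50) - katz_exact(A, alpha)).max()
print(err_5, err_50)
```

The truncation error shrinks geometrically with the number of terms, which is why a few terms can be enough when $\alpha$ is small.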
Carl Neumann
I’ve heard the Neumann series called the “von Neumann” series more than I’d like! In fact, the von Neumann kernel of a graph should be named the “Neumann” kernel!
Wikipedia page

PROPERTIES OF KATZ’S MATRIX
$(I - \alpha A)$ is symmetric
$(I - \alpha A)^{-1}$ exists when $\alpha < 1/\|A\|_2$
$(I - \alpha A)$ is spd when $\alpha < 1/\|A\|_2$
Note that $\alpha < 1/\text{max-degree}$ suffices

COMMUTE TIME
Picture taken from Google images, seems to be Bay Bridge Traffic by Jim M. Goldstein.

COMMUTE TIME
Consider a uniform random walk on a graph: $P_{ij} = A_{ij}/d_i$

$H_{ij}$: expected number of steps to first reach node j when starting from node i. Also called the hitting time from node i to j, or the first transition time.

$C_{ij} = H_{ij} + H_{ji}$

SKIPPING DETAILS
$L = D - A$: graph Laplacian
$C_{ij} = \mathrm{vol}(G)\,(e_i - e_j)^T L^{+} (e_i - e_j)$
$e$ (the all-ones vector) is the only null-vector

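A small dense sketch of the linear-algebra formulation, assuming the standard identity $C_{ij} = \mathrm{vol}(G)\,(e_i - e_j)^T L^{+} (e_i - e_j)$ with $L = D - A$. It forms the full pseudoinverse, which is exactly the preprocessing the talk wants to avoid, so it is only a reference for small graphs. The function name and test graph are illustrative.

```python
import numpy as np

def commute_times(A):
    """All-pairs commute times from the Laplacian pseudoinverse (dense)."""
    d = A.sum(axis=1)          # degrees
    L = np.diag(d) - A         # graph Laplacian
    Lp = np.linalg.pinv(L)     # pseudoinverse L^+
    vol = d.sum()              # vol(G) = sum of degrees
    diag = np.diag(Lp)
    # (e_i - e_j)^T L^+ (e_i - e_j) = L+_ii + L+_jj - 2 L+_ij
    return vol * (diag[:, None] + diag[None, :] - 2.0 * Lp)

# 3-node path graph 0 - 1 - 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
C = commute_times(A)
print(C)
```

On this path, vol(G) = 4 and the effective resistances are 1 (adjacent nodes) and 2 (the two ends), so the commute times are 4 and 8.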
WHAT DO OTHER PEOPLE DO?
1) Just work with the linear algebra formulations

2) For Katz, truncate the Neumann series to a few (3-5) terms, or use MMQ (Benzi & Boito)

3) Use low-rank approximations from EVD(A) or EVD(L)

4) For commute, use Johnson-Lindenstrauss inspired random sampling

5) Approximately decompose into smaller problems

Liben-Nowell and Kleinberg CIKM2003, Acar et al. ICDM2009, Spielman and Srivastava STOC2008, Sarkar and Moore UAI2007, Wang et al. ICDM2007

THE PROBLEM
All of these techniques are preprocessing based because most people’s goal is to compute all the scores.

We want to avoid preprocessing the graph.

There are a few caveats here! For example, one could solve the system instead of looking for the matrix inverse.

WHY NO PREPROCESSING?
The graph is constantly changing as I rate new movies.

WHY NO PREPROCESSING?
Top-k predicted “links” are movies to watch!
Pairwise scores give user similarity

PAIR-WISE ALGORITHMS


PAIRWISE ALGORITHMS
Katz: $e_i^T (I - \alpha A)^{-1} e_j$
Commute: $(e_i - e_j)^T L^{+} (e_i - e_j)$

Golub and Meurant to the rescue!

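MMQ - THE BIG IDEA
Quadratic form: $v^T f(A)\, v$ (think $f(A) = A^{-1}$)
Weighted sum: $\sum_i f(\lambda_i)\, \tilde{v}_i^2$ with $A = U \Lambda U^T$, $\tilde{v} = U^T v$ ($A$ is s.p.d., use EVD)
Stieltjes integral: $\int f(\lambda)\, d\omega(\lambda)$ (“A tautology”)
Quadrature approximation: $\sum_j w_j f(\theta_j)$
Matrix equation: $\|v\|^2\, e_1^T f(T_k)\, e_1$ (Lanczos)
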
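LANCZOS
Given $A$ and $v$, k steps of the Lanczos method produce $Q_k$ and $T_k$ with
$A Q_k = Q_k T_k + r_k e_k^T$
where $Q_k$ has orthonormal columns, $T_k$ is symmetric tridiagonal, and $r_k$ is the residual vector.
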
PRACTICAL LANCZOS
Only need to store the last 2 vectors in $Q_k$
Updating requires O(matvec) work
In floating point, $Q_k$ is not orthogonal

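A minimal sketch of the three-term Lanczos recurrence that keeps only the last two basis vectors and returns just the entries of the tridiagonal matrix. The function name, the breakdown tolerance, and the 4-by-4 test matrix are illustrative assumptions.

```python
import numpy as np

def lanczos(A, v, k):
    """k steps of Lanczos on symmetric A from start vector v.

    Returns the diagonal (length <= k) and off-diagonal (one shorter)
    of the tridiagonal matrix T_k. Only two basis vectors are stored.
    """
    q_prev = np.zeros(len(v))
    q = v / np.linalg.norm(v)
    beta = 0.0
    alphas, betas = [], []
    for j in range(k):
        w = A @ q - beta * q_prev      # the single matvec of this step
        alpha = q @ w
        w = w - alpha * q
        alphas.append(alpha)
        beta = np.linalg.norm(w)
        if j == k - 1 or beta < 1e-14:  # finished, or invariant subspace
            break
        betas.append(beta)
        q_prev, q = q, w / beta
    return np.array(alphas), np.array(betas)

A = np.array([[4, 1, 0, 0],
              [1, 3, 1, 0],
              [0, 1, 2, 1],
              [0, 0, 1, 1]], dtype=float)
v = np.ones(4)
alphas, betas = lanczos(A, v, 4)
T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
print(np.linalg.eigvalsh(T))
print(np.linalg.eigvalsh(A))
```

With k equal to the matrix size and no breakdown, the eigenvalues of $T_k$ match those of $A$. For larger k in floating point the basis loses orthogonality, which this sketch does not correct.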
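MMQ PROCEDURE
Goal: lower and upper bounds on $v^T A^{-1} v$
Given: $A$, $v$, and eigenvalue bounds $l \le \lambda(A) \le u$

1. Run k steps of Lanczos on $A$ starting with $v$
2. Compute $\overline{T}_k$, $T_k$ with an additional eigenvalue at $u$, and set the lower bound to $\|v\|^2 e_1^T \overline{T}_k^{-1} e_1$. Corresponds to a Gauss-Radau rule, with $u$ as a prescribed node
3. Compute $\underline{T}_k$, $T_k$ with an additional eigenvalue at $l$, and set the upper bound to $\|v\|^2 e_1^T \underline{T}_k^{-1} e_1$. Corresponds to a Gauss-Radau rule, with $l$ as a prescribed node
4. Output the two values as lower and upper bounds on $v^T A^{-1} v$
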
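The procedure above can be sketched as follows. This is a hedged illustration of Gauss-Radau bounds in the style of Golub and Meurant, not the talk's implementation: it assumes an s.p.d. matrix, strict eigenvalue bounds `l` and `u`, no Lanczos breakdown, and dense solves on the small tridiagonal matrices. All function names are hypothetical.

```python
import numpy as np

def lanczos(A, v, k):
    """k Lanczos steps. Returns alphas[0..k-1] and betas[0..k-1], where
    betas[k-1] couples T_k to the next row. Assumes every beta > 0."""
    q_prev = np.zeros(len(v))
    q = v / np.linalg.norm(v)
    beta = 0.0
    alphas, betas = [], []
    for _ in range(k):
        w = A @ q - beta * q_prev
        alpha = q @ w
        w = w - alpha * q
        beta = np.linalg.norm(w)
        alphas.append(alpha)
        betas.append(beta)
        q_prev, q = q, w / beta
    return np.array(alphas), np.array(betas)

def radau_estimate(alphas, betas, node, vnorm2):
    """Extend T_k by one row so that `node` is an eigenvalue, then
    return ||v||^2 * e_1^T T_ext^{-1} e_1."""
    k = len(alphas)
    T = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
    rhs = np.zeros(k)
    rhs[-1] = betas[-1] ** 2
    delta = np.linalg.solve(T - node * np.eye(k), rhs)
    omega = node + delta[-1]           # new last diagonal entry
    T_ext = np.zeros((k + 1, k + 1))
    T_ext[:k, :k] = T
    T_ext[k - 1, k] = T_ext[k, k - 1] = betas[-1]
    T_ext[k, k] = omega
    e1 = np.zeros(k + 1)
    e1[0] = 1.0
    return vnorm2 * np.linalg.solve(T_ext, e1)[0]

def mmq_bounds(A, v, k, l, u):
    """Lower and upper bounds on v^T A^{-1} v after k Lanczos steps."""
    alphas, betas = lanczos(A, v, k)
    vnorm2 = v @ v
    lower = radau_estimate(alphas, betas, u, vnorm2)  # node at upper end
    upper = radau_estimate(alphas, betas, l, vnorm2)  # node at lower end
    return lower, upper

# Katz-type system M = I - alpha*G on a 6-node path graph (max degree 2)
n, alpha = 6, 0.25
G = np.diag(np.ones(n - 1), 1)
G = G + G.T
M = np.eye(n) - alpha * G
v = np.zeros(n)
v[0] = 1.0
# eigenvalues of M lie strictly inside [1 - 2*alpha, 1 + 2*alpha]
lower, upper = mmq_bounds(M, v, 4, l=1 - 2 * alpha, u=1 + 2 * alpha)
exact = v @ np.linalg.solve(M, v)
print(lower, exact, upper)
```

The gap `upper - lower` gives a stopping criterion that needs no knowledge of the exact value.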
PRACTICAL MMQ
Increase k to become more accurate
Bad eigenvalue bounds yield worse results
The two Gauss-Radau extensions of $T_k$ are easy to compute
$T_k^{-1}$ is not required; we can iteratively update the LU factorization of $T_k$

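ONE LAST STEP FOR KATZ
Katz: $e_i^T (I - \alpha A)^{-1} e_j$ is a bilinear form, not a quadratic form.
For symmetric $M$: $x^T M y = \frac{1}{4}\left[(x + y)^T M (x + y) - (x - y)^T M (x - y)\right]$
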
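A short sketch of turning a pairwise (bilinear) Katz score into two quadratic forms, assuming the polarization identity $x^T M y = \frac{1}{4}[(x+y)^T M (x+y) - (x-y)^T M (x-y)]$ for symmetric $M$. The quadratic form here uses a dense solve as a stand-in for a quadrature-based estimate; the helper name and test graph are hypothetical.

```python
import numpy as np

def bilinear_via_quadratic(quad, x, y):
    """x^T M y from two evaluations of the quadratic form z -> z^T M z."""
    return 0.25 * (quad(x + y) - quad(x - y))

# triangle 0-1-2 with a pendant node 3 attached to node 2 (max degree 3)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
alpha = 0.2                      # below 1/max-degree
M = np.eye(4) - alpha * A

def quad(z):
    # stand-in for a bounded quadratic-form estimate of z^T M^{-1} z
    return z @ np.linalg.solve(M, z)

i, j = 0, 3
e_i, e_j = np.eye(4)[i], np.eye(4)[j]
pair = bilinear_via_quadratic(quad, e_i, e_j)
print(pair, np.linalg.inv(M)[i, j])
```

When each quadratic form comes with lower and upper bounds, the bounds on the pairwise score follow by combining them with the signs in the identity.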
COLUMN-WISE ALGORITHMS

COLUMN-WISE COMMUTE-TIME
A column of $C$ requires the entire diagonal of $L^{+}$
Each vector is computed by a Lanczos-based CG algorithm.
Paige and Saunders, 1975

COLUMN-WISE COMMUTE TIME
The estimate following Hein et al. 2010 is a MUCH better and faster approximation.

KATZ SCORES ARE LOCALIZED
Up to 50 neighbors is 99.65% of the total mass

PARTICIPATION RATIOS
Participation ratios: “effective non-zeros” in a vector

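A small sketch of a participation ratio, assuming the common definition $(\sum_i x_i^2)^2 / \sum_i x_i^4$. Whether the talk uses exactly this formula is an assumption; the function name is illustrative.

```python
import numpy as np

def participation_ratio(x):
    """Effective number of non-zeros: (sum x_i^2)^2 / sum x_i^4."""
    x = np.asarray(x, dtype=float)
    return (x ** 2).sum() ** 2 / (x ** 4).sum()

uniform = participation_ratio(np.ones(10))      # mass spread evenly
spike = participation_ratio([0.0, 0.0, 1.0, 0.0])  # mass on one entry
print(uniform, spike)
```

Under this definition a vector with equal entries scores its length, and a vector with a single non-zero scores 1, so small values indicate a localized vector.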
TOP-K ALGORITHM FOR KATZ
Approximate $x$ with $(I - \alpha A)\,x = e_i$, where $e_i$ is sparse
Keep $x$ sparse too
Ideally, don’t “touch” all of $A$

INSPIRATION - PAGERANK
Approximate $x$ with $(I - \alpha P)\,x = v$, where $v$ is sparse
Keep $x$ sparse too? YES!
Ideally, don’t “touch” all of $P$? YES!
McSherry WWW2005, Berkhin 2007, Andersen et al. FOCS2008 – Thanks to Reid Andersen for telling me McSherry did this too.

THE ALGORITHM - MCSHERRY
For $A x = b$
Start with the Richardson iteration
$x^{(k+1)} = x^{(k)} + (b - A x^{(k)})$
Rewrite
$x^{(k+1)} = x^{(k)} + r^{(k)}$, $r^{(k+1)} = r^{(k)} - A r^{(k)}$
Richardson converges if $\rho(I - A) < 1$

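A minimal sketch of the Richardson iteration in residual form, assuming the updates $x \leftarrow x + r$ and $r \leftarrow r - A r$ for a system $Ax = b$. The example system is a Katz-type matrix on a small path graph; all names are illustrative.

```python
import numpy as np

def richardson(A, b, iters):
    """Richardson iteration for A x = b, tracking the residual r = b - A x."""
    x = np.zeros(len(b))
    r = np.array(b, dtype=float)
    for _ in range(iters):
        x = x + r        # add the whole residual to the iterate
        r = r - A @ r    # new residual: b - A(x + r) = r - A r
    return x, r

# Katz-type system (I - alpha*G) x = e_0 on a 4-node path graph
G = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
alpha = 0.25
M = np.eye(4) - alpha * G
b = np.array([1.0, 0.0, 0.0, 0.0])
x, r = richardson(M, b, 60)
print(x, np.abs(r).max())
```

Here $I - M = \alpha G$ has spectral radius below 1, so the residual contracts at every step.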
THE ALGORITHM
Note $b$ is sparse.
If $x^{(k+1)} = x^{(k)} + r^{(k)}_j e_j$, then $r^{(k+1)} = r^{(k)} - r^{(k)}_j A e_j$ is sparse.
Idea: only add one component of $r^{(k)}$ to $x^{(k)}$

THE ALGORITHM
For $A x = b$
Init: $x^{(0)} = 0$, $r^{(0)} = b$
$x^{(k+1)} = x^{(k)} + r^{(k)}_j e_j$
$r^{(k+1)} = r^{(k)} - r^{(k)}_j A e_j$
How to pick $j$?

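THE ALGORITHM FOR KATZ
For $(I - \alpha A)\,x = e_i$
Init: $x^{(0)} = 0$, $r^{(0)} = e_i$
$x^{(k+1)} = x^{(k)} + r^{(k)}_j e_j$
$r^{(k+1)} = r^{(k)} - r^{(k)}_j e_j + \alpha\, r^{(k)}_j A e_j$
Pick $j$ as $\arg\max_j r^{(k)}_j$
Storing the non-zeros of the residual in a heap makes picking the max log(n) time. See Andersen et al. FOCS2008 for more.
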
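A hedged sketch of the one-component-at-a-time idea for a Katz column, assuming the system $(I - \alpha A)x = e_i$ with $\alpha < 1/\text{max-degree}$ on an unweighted graph without self-loops. It uses Python's `heapq` with lazy deletion of stale entries as a stand-in for an updatable heap; the function name, tolerance, and graph are illustrative.

```python
import heapq
import numpy as np

def katz_column_push(adj, alpha, i, tol=1e-10, max_steps=100000):
    """Approximate x with (I - alpha*A) x = e_i, touching only reached nodes.

    adj maps each node to its list of neighbours. x and the residual r
    are stored sparsely as dicts.
    """
    x, r = {}, {i: 1.0}
    heap = [(-1.0, i)]                      # max-heap via negated residuals
    for _ in range(max_steps):
        # drop heap entries that no longer match the current residual
        while heap and r.get(heap[0][1], 0.0) != -heap[0][0]:
            heapq.heappop(heap)
        if not heap or -heap[0][0] < tol:
            break                           # largest residual is below tol
        neg, j = heapq.heappop(heap)
        rj = -neg
        x[j] = x.get(j, 0.0) + rj           # x <- x + r_j e_j
        r[j] = 0.0                          # r <- r - r_j e_j ...
        for u in adj[j]:                    # ... + alpha * r_j * A e_j
            r[u] = r.get(u, 0.0) + alpha * rj
            heapq.heappush(heap, (-r[u], u))
    return x

# 5-node path graph, max degree 2
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
alpha, i, n = 0.25, 0, 5
x = katz_column_push(adj, alpha, i)

# dense reference solve, only for checking the sketch
A = np.zeros((n, n))
for u, nbrs in adj.items():
    A[u, nbrs] = 1.0
exact = np.linalg.solve(np.eye(n) - alpha * A, np.eye(n)[i])
max_err = max(abs(exact[u] - x.get(u, 0.0)) for u in range(n))
print(max_err)
```

The Katz scores for node i are the entries of `x` minus the identity term at position i. Sorting the non-zeros of `x` gives a top-k list without ever forming the full matrix.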
CONVERGENCE?
If $\alpha < 1/\text{max-degree}$ then $\alpha A$ is sub-stochastic and the PageRank based proof applies because the matrix is diagonally dominant.

For $1/\text{max-degree} \le \alpha < 1/\|A\|_2$ and symmetric $A$, this algorithm is the Gauss-Southwell procedure and it still converges.

NONSYMMETRIC ALGORITHM?
Was this algorithm known in the non-symmetric case?

YES! It’s just Gauss-Seidel/Gauss-Southwell, but with a column vs. row orientation.

Known as “dynamic residual” to AMGers

RESULTS – DATA, PARAMETERS




                All unweighted, connected graphs
                     Easy                                    
                       Hard                            
KATZ BOUND CONVERGENCE

COMMUTE BOUND CONVERGENCE

KATZ SET CONVERGENCE
For arXiv graph.

TIMING

CONCLUSIONS
These algorithms are faster than many
alternatives.

For pairwise commute, stopping criteria
are simpler with bounds

For top-k problems, we often need less
than 1 matvec for good enough results

Paper at WAW2010 and J. Internet Mathematics
Slides online
Code is online already: www.cs.purdue.edu/homes/dgleich/codes/fast-katz-2011
By AngryDogDesign on DeviantArt

 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 

Fast pair-wise and node-wise algorithms for commute times and Katz scores

  • 1. FAST MATRIX COMPUTATIONS FOR PAIRWISE AND COLUMN-WISE KATZ SCORES AND COMMUTE TIMES David F. Gleich Purdue University Emory University Math/CS Seminar January 20, 2012 With Pooya Esfandiar, Francesco Bonchi, Chen Greif, Laks V. S. Lakshmanan, and Byung-Won On David F. Gleich (Purdue) Emory Math/CS Seminar 1 / 47
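  • 2. MAIN RESULTS: $A$: adjacency matrix; $L$: Laplacian matrix. Katz score: $K_{i,j} = \sum_{\ell \ge 1} \alpha^{\ell} (A^{\ell})_{i,j}$. Commute time: $C_{i,j} = \mathrm{vol}(G)\,(L^{+}_{i,i} + L^{+}_{j,j} - 2L^{+}_{i,j})$, where $L^{+}$ is the pseudoinverse of $L$. For Katz: compute one pairwise score fast; compute the top-k scores fast. For commute time: compute one pairwise score fast; estimate column-wise scores.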
  • 3. OUTLINE: Why study these measures? Katz rank and commute time. How else do people compute them? Quadrature rules for pairwise scores. Sparse linear system solves for top-k. As many results as we have time for…
  • 4. WHY? LINK PREDICTION: Neighborhood based vs. path based. Liben-Nowell and Kleinberg (2003, 2006) found that path-based link prediction was more efficient.
  • 5. NOTE: All graphs are undirected. All graphs are connected.
  • 6. LEO KATZ
  • 7. NOT QUITE, WIKIPEDIA: Katz is defined with the adjacency matrix $A$; PageRank is defined with the random walk matrix. These are equivalent if the graph has constant degree.
  • 8. WHAT KATZ ACTUALLY SAID: “we assume that each link independently has the same probability of being effective” … “we conceive a constant $\alpha$, depending on the group and the context of the particular investigation, which has the force of a probability of effectiveness of a single link. A k-step chain then, has probability $\alpha^k$ of being effective.” “We wish to find the column sums of the matrix” (Leo Katz 1953, A New Status Index Derived from Sociometric Analysis, Psychometrika 18(1):39-43)
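  • 9. A MODERN TAKE: With the matrix $K = \sum_{\ell \ge 1} \alpha^{\ell} A^{\ell}$, the Katz score (node-based) is a column sum of $K$. The Katz score (edge-based) is a single entry $K_{i,j}$.
  • 10. RETURNING TO THE MATRIX: The Neumann series (Carl Neumann) gives $\sum_{\ell \ge 0} \alpha^{\ell} A^{\ell} = (I - \alpha A)^{-1}$ when the series converges.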
  • 11. Carl Neumann: I’ve heard the Neumann series called the “von Neumann” series more than I’d like! In fact, the von Neumann kernel of a graph should be named the “Neumann” kernel! (Wikipedia page)
  • 12. PROPERTIES OF KATZ’S MATRIX: $(I - \alpha A)^{-1}$ is symmetric; it exists and is spd when $\alpha < 1/\lambda_{\max}(A)$. Note that $\alpha <$ 1/max-degree suffices.
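
The "1/max-degree suffices" remark can be spelled out in one line. The following is a sketch under stated assumptions rather than the slide's own derivation: $\rho(A)$ denotes the spectral radius and $d_{\max}$ the maximum degree of an unweighted, undirected graph (both symbols are introduced here for illustration).

```latex
\[
\sum_{\ell=0}^{\infty} \alpha^{\ell} A^{\ell} = (I - \alpha A)^{-1}
\quad\text{whenever}\quad \alpha\,\rho(A) < 1,
\qquad\text{and}\qquad
\rho(A) \le \|A\|_{\infty} = d_{\max},
\]
\[
\text{so}\quad \alpha < \frac{1}{d_{\max}}
\;\Longrightarrow\; \alpha\,\rho(A) < 1 .
\]
```

Since $A$ is symmetric with non-negative entries, $\rho(A) = \lambda_{\max}(A)$, so the same bound also keeps every eigenvalue $1 - \alpha\lambda$ of $I - \alpha A$ positive.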
  • 13. COMMUTE TIME: (Picture taken from Google images, seems to be Bay Bridge Traffic by Jim M. Goldstein.)
  • 14. COMMUTE TIME: Consider a uniform random walk on a graph. The expected number of steps to first reach node j when starting from node i, $H_{i,j}$, is also called the hitting time from node i to j, or the first transition time. The commute time is the round trip, $C_{i,j} = H_{i,j} + H_{j,i}$.
  • 15. SKIPPING DETAILS: $L$ is the graph Laplacian; the all-ones vector is the only null-vector.
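
To make the Laplacian formulation concrete, here is a minimal sketch that computes one commute time by solving $Lx = e_i - e_j$ with plain conjugate gradients and using $C_{i,j} = \mathrm{vol}(G)\,(x_i - x_j)$. The adjacency-list input format and the function names `laplacian_matvec` and `commute_time` are assumptions made for this illustration, not the authors' code.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def laplacian_matvec(adj, x):
    """Apply L = D - A for an undirected graph given as adjacency lists."""
    return [len(adj[u]) * x[u] - sum(x[v] for v in adj[u])
            for u in range(len(adj))]


def commute_time(adj, i, j, tol=1e-12, max_iter=1000):
    """Commute time between i and j on a connected, unweighted graph."""
    if i == j:
        return 0.0
    n = len(adj)
    b = [0.0] * n
    b[i], b[j] = 1.0, -1.0          # right-hand side e_i - e_j
    x = [0.0] * n
    r = b[:]                        # residual for the zero starting guess
    p = r[:]
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Lp = laplacian_matvec(adj, p)
        step = rs / dot(p, Lp)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * li for ri, li in zip(r, Lp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    volume = sum(len(nbrs) for nbrs in adj)   # sum of degrees
    # x is defined only up to a constant shift, which cancels in x[i] - x[j].
    return volume * (x[i] - x[j])


path = [[1], [0, 2], [1]]           # path graph 0 - 1 - 2
print(commute_time(path, 0, 2))
```

The right-hand side is orthogonal to the all-ones null-vector, which is why CG can be applied to the singular Laplacian here without forming a pseudoinverse.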
  • 16. WHAT DO OTHER PEOPLE DO? 1) Just work with the linear algebra formulations. 2) For Katz, truncate the Neumann series after a few (3-5) terms, or use MMQ (Benzi & Boito). 3) Use low-rank approximations from EVD(A) or EVD(L). 4) For commute, use Johnson-Lindenstrauss inspired random sampling. 5) Approximately decompose into smaller problems. (Liben-Nowell and Kleinberg CIKM2003, Acar et al. ICDM2009, Spielman and Srivastava STOC2008, Sarkar and Moore UAI2007, Wang et al. ICDM2007)
  • 17. THE PROBLEM: All of these techniques are preprocessing based because most people’s goal is to compute all the scores. We want to avoid preprocessing the graph. There are a few caveats here! For example, one could solve the linear system instead of computing the matrix inverse.
  • 18. WHY NO PREPROCESSING? The graph is constantly changing as I rate new movies.
  • 19. WHY NO PREPROCESSING? Top-k predicted “links” are movies to watch! Pairwise scores give user similarity.
  • 20. PAIR-WISE ALGORITHMS
  • 21. PAIRWISE ALGORITHMS: Katz; commute. Golub and Meurant to the rescue!
  • 22. MMQ - THE BIG IDEA: Quadratic form $u^T f(A) u$ (think $f(x) = 1/x$) → weighted sum $\sum_i f(\lambda_i)\,\tilde{u}_i^2$ ($A$ is s.p.d., use EVD) → Stieltjes integral $\int f(\lambda)\,d\omega(\lambda)$ (“a tautology”) → quadrature approximation $\sum_j \omega_j f(\theta_j)$ → matrix equation $\|u\|^2\, e_1^T f(T_k)\, e_1$ (Lanczos).
  • 23. LANCZOS: Given a symmetric matrix $A$ and a starting vector, k steps of the Lanczos method produce a basis $V_k$ and a tridiagonal matrix $T_k$ with $A V_k = V_k T_k + \beta_{k+1} v_{k+1} e_k^T$.
  • 24. PRACTICAL LANCZOS: Only need to store the last 2 vectors in $V_k$. Updating requires O(matvec) work. In floating point, $V_k$ is not orthogonal.
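
The storage remark above can be illustrated with a short sketch of the three-term Lanczos recurrence that keeps only the two most recent basis vectors and returns the diagonal and off-diagonal of $T_k$. The function name `lanczos`, the `matvec` callback, and the breakdown threshold are assumptions made for this example; it does no reorthogonalization.

```python
from math import sqrt


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lanczos(matvec, u, k):
    """Run up to k Lanczos steps on a symmetric operator, starting from u.

    Returns (alphas, betas): the diagonal and off-diagonal of T_k.
    Only the current and previous Lanczos vectors are stored.
    """
    norm_u = sqrt(dot(u, u))
    v_prev = [0.0] * len(u)
    v = [x / norm_u for x in u]
    alphas, betas = [], []
    beta = 0.0
    for step in range(k):
        w = matvec(v)                           # the one matvec per step
        alpha = dot(w, v)
        w = [wi - alpha * vi - beta * pi
             for wi, vi, pi in zip(w, v, v_prev)]
        alphas.append(alpha)
        beta = sqrt(dot(w, w))
        if step == k - 1 or beta < 1e-14:       # done, or invariant subspace
            break
        betas.append(beta)
        v_prev, v = v, [wi / beta for wi in w]
    return alphas, betas


# Hypothetical example: M = I - 0.3 * A for a small undirected graph.
adj = [[1, 2], [0, 2], [0, 1, 3], [2]]


def katz_matvec(x, alpha=0.3):
    return [x[u] - alpha * sum(x[v] for v in adj[u]) for u in range(len(adj))]


alphas, betas = lanczos(katz_matvec, [1.0, 0.0, 0.0, 0.0], 4)
print(alphas, betas)
```

When all n steps complete, $T_n$ is similar to the original matrix, so its trace matches; that makes a convenient sanity check on small examples.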
  • 25. MMQ PROCEDURE: Goal: lower and upper bounds on the quadratic form. Given: bounds $l$ and $u$ on the eigenvalues of the matrix. 1. Run k steps of Lanczos on the matrix, starting with the vector of the quadratic form. 2. Compute $T_k$ extended with an additional eigenvalue at $u$ and set the first estimate from it; this corresponds to a Gauss-Radau rule with $u$ as a prescribed node. 3. Compute $T_k$ extended with an additional eigenvalue at $l$ and set the second estimate from it; this corresponds to a Gauss-Radau rule with $l$ as a prescribed node. 4. Output the two estimates as lower and upper bounds on the quadratic form.
  • 26. PRACTICAL MMQ: Increase k to become more accurate. Bad eigenvalue bounds yield worse results. The two estimates are easy to compute. The full tridiagonal matrix is not required; we can iteratively update its LU factorization.
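
The Gauss-Radau steps above can be sketched for the special case $f(x) = 1/x$, that is, bounding $u^T A^{-1} u$ for a symmetric positive definite $A$. This is an illustration under assumptions, not the authors' implementation: the Lanczos coefficients are passed in directly, `betas` has the same length as `alphas` (its last entry couples step k to step k+1), and the names `inv_first_entry`, `radau_estimate`, and `mmq_bounds` are hypothetical.

```python
def inv_first_entry(alphas, betas):
    """(T^{-1})[0][0] for a symmetric tridiagonal T.

    alphas is the diagonal and betas the off-diagonal (one entry shorter).
    Eliminates from the bottom row upward, so T is never formed.
    """
    d = alphas[-1]
    for a, b in zip(reversed(alphas[:-1]), reversed(betas)):
        d = a - b * b / d
    return 1.0 / d


def radau_estimate(alphas, betas, node, norm_u_sq=1.0):
    """Gauss-Radau estimate of u^T A^{-1} u with one prescribed node."""
    # Last pivot of (T_k - node*I), obtained by elimination from the top.
    c = alphas[0] - node
    for a, b in zip(alphas[1:], betas[:-1]):
        c = (a - node) - b * b / c
    # New diagonal entry that forces `node` to be an eigenvalue of T_{k+1}.
    last = node + betas[-1] ** 2 / c
    return norm_u_sq * inv_first_entry(alphas + [last], betas)


def mmq_bounds(alphas, betas, lam_min, lam_max, norm_u_sq=1.0):
    """Lower and upper bounds on u^T A^{-1} u from k Lanczos steps."""
    lower = radau_estimate(alphas, betas, lam_max, norm_u_sq)
    upper = radau_estimate(alphas, betas, lam_min, norm_u_sq)
    return lower, upper


# A = tridiag(1, 3, 1) of size 4 with u = e_1: Lanczos reproduces A's own
# entries, so two steps give alphas = [3, 3] and betas = [1, 1].
lower, upper = mmq_bounds([3.0, 3.0], [1.0, 1.0], lam_min=1.0, lam_max=5.0)
exact = inv_first_entry([3.0] * 4, [1.0] * 3)
print(lower, exact, upper)
```

For $f(x) = 1/x$ the rule with the node at the upper eigenvalue bound underestimates and the rule with the node at the lower bound overestimates, which is why the two estimates bracket the true value. The eigenvalue bounds 1 and 5 in the example come from Gershgorin discs.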
  • 27. PRACTICAL MMQ
  • 28. ONE LAST STEP FOR KATZ
  • 29. COLUMN-WISE ALGORITHMS
  • 30. COLUMN-WISE COMMUTE-TIME: A column of commute times requires the entire diagonal of $L^{+}$. Each vector is computed by a Lanczos-based CG algorithm (Paige and Saunders, 1975).
  • 31. COLUMN-WISE COMMUTE TIME: An approximation following Hein et al. 2010 is MUCH better and faster.
  • 32. KATZ SCORES ARE LOCALIZED: Up to 50 neighbors is 99.65% of the total mass.
  • 33. PARTICIPATION RATIOS: the “effective non-zeros” in a vector.
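
As a small illustration of "effective non-zeros", the sketch below uses one common definition of the participation ratio, $(\sum_i x_i^2)^2 / \sum_i x_i^4$. The slide's exact formula did not survive in the transcript, so this definition and the function name `participation_ratio` are assumptions.

```python
def participation_ratio(x):
    """Effective number of non-zeros: (sum x_i^2)^2 / (sum x_i^4)."""
    s2 = sum(v * v for v in x)
    s4 = sum(v ** 4 for v in x)
    return 0.0 if s4 == 0 else s2 * s2 / s4


print(participation_ratio([1.0, 1.0, 1.0, 0.0, 0.0]))   # three equal entries
print(participation_ratio([5.0, 0.1, 0.1, 0.1]))        # dominated by one entry
```

A vector with m equal non-zero entries scores exactly m, while a vector dominated by one entry scores close to 1, which is the sense in which a localized Katz column has few effective non-zeros.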
  • 34. TOP-K ALGORITHM FOR KATZ: Approximate the solution of $(I - \alpha A)x = e_i$, where the right-hand side is sparse. Keep the approximate solution sparse too. Ideally, don’t “touch” all of $A$.
  • 35. INSPIRATION - PAGERANK: Approximate the solution of the PageRank linear system, where the right-hand side is sparse. Keep the approximate solution sparse too? YES! Ideally, don’t “touch” all of the matrix? YES! (McSherry WWW2005, Berkhin 2007, Anderson et al. FOCS2008.) Thanks to Reid Anderson for telling me McSherry did this too.
  • 36. THE ALGORITHM - MCSHERRY: For $Ax = b$, start with the Richardson iteration $x^{(k+1)} = x^{(k)} + r^{(k)}$, where $r^{(k)} = b - Ax^{(k)}$. Rewrite it as a residual update, $r^{(k+1)} = r^{(k)} - A r^{(k)}$. Richardson converges if $\rho(I - A) < 1$.
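
The Richardson iteration in residual form can be sketched for the Katz system $(I - \alpha A)x = e_i$, where the residual update simplifies to $r \leftarrow \alpha A r$. The adjacency-list input and the function name `richardson_katz` are assumptions for this illustration.

```python
def richardson_katz(adj, alpha, i, iters=200):
    """Richardson iteration for (I - alpha*A) x = e_i in residual form."""
    n = len(adj)
    x = [0.0] * n
    r = [0.0] * n
    r[i] = 1.0                                   # r = b - M x with x = 0
    for _ in range(iters):
        x = [xu + ru for xu, ru in zip(x, r)]    # x <- x + r
        # r <- r - (I - alpha*A) r = alpha * A r
        r = [alpha * sum(r[v] for v in adj[u]) for u in range(n)]
    return x, r


adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
x, r = richardson_katz(adj, alpha=0.2, i=0)
print(x)
```

Every step touches the whole residual vector, which is exactly the cost the one-component-at-a-time variant tries to avoid.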
  • 37. THE ALGORITHM: Note that the right-hand side is sparse. If the starting iterate is zero, then the initial residual is sparse. Idea: only add one component of the residual to the solution at a time.
  • 38. THE ALGORITHM: For $Ax = b$. Init: $x = 0$, $r = b$. How to pick the component $j$?
  • 39. THE ALGORITHM FOR KATZ: For $(I - \alpha A)x = e_i$. Init: $x = 0$, $r = e_i$. Pick $j$ as the index of the largest residual entry. Storing the non-zeros of the residual in a heap makes picking the max log(n) time. See Anderson et al. FOCS2008 for more.
  • 40. CONVERGENCE? If $\alpha <$ 1/max-degree, then $\alpha A$ is sub-stochastic and the PageRank-based proof applies because the matrix is diagonally dominant. For larger $\alpha$ with $I - \alpha A$ still positive definite and $A$ symmetric, this algorithm is the Gauss-Southwell procedure and it still converges.
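
The one-component-at-a-time idea with a heap can be sketched as follows. This is a minimal illustration under assumptions (adjacency lists, no self-loops, a lazy-deletion heap, and the hypothetical name `katz_push`), not the authors' released code.

```python
import heapq


def katz_push(adj, alpha, i, tol=1e-8, max_steps=100000):
    """Gauss-Southwell style solve of (I - alpha*A) x = e_i.

    x and r are stored as dicts, so only touched nodes take memory.
    Assumes an undirected graph with no self-loops.
    """
    x = {}
    r = {i: 1.0}
    heap = [(-1.0, i)]                 # max-heap via negated residuals
    for _ in range(max_steps):
        # Discard heap entries whose value no longer matches the residual.
        while heap and -heap[0][0] != r.get(heap[0][1], 0.0):
            heapq.heappop(heap)
        if not heap or -heap[0][0] < tol:
            break
        neg_rj, j = heapq.heappop(heap)
        rj = -neg_rj
        x[j] = x.get(j, 0.0) + rj      # move one residual entry into x
        r[j] = 0.0
        for v in adj[j]:               # only the neighbors of j are touched
            r[v] = r.get(v, 0.0) + alpha * rj
            heapq.heappush(heap, (-r[v], v))
    return x, r


adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
x, r = katz_push(adj, alpha=0.2, i=0, tol=1e-10)
print(sorted(x.items()))
```

Stale heap entries are skipped rather than updated in place, which keeps each push at logarithmic cost with the standard library heap. Throughout the run the identity $e_i - (I - \alpha A)x = r$ holds, so the largest remaining residual entry is a direct stopping criterion.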
  • 41. NONSYMMETRIC ALGORITHM? Was this algorithm known in the non-symmetric case? YES! It’s just Gauss-Seidel/Gauss-Southwell, but with a column vs. row orientation. Called dynamic residual to AMGers.
  • 42. RESULTS – DATA, PARAMETERS: All unweighted, connected graphs. (Table of graphs with “easy” and “hard” parameter settings.)
  • 43. KATZ BOUND CONVERGENCE
  • 44. COMMUTE BOUND CONVERG.
  • 45. KATZ SET CONVERGENCE: For arXiv graph.
  • 46. TIMING
  • 47. CONCLUSIONS: These algorithms are faster than many alternatives. For pairwise commute, stopping criteria are simpler with bounds. For top-k problems, we often need less than 1 matvec for good enough results.
  • 48. Paper at WAW2010 and J. Internet Mathematics. Slides online. Code is online already: www.cs.purdue.edu/homes/dgleich/codes/fast-katz-2011 (Image by AngryDogDesign on DeviantArt.)