SlideShare a Scribd company logo
1 of 47
Download to read offline
FAST MATRIX COMPUTATIONS FOR
                    PAIRWISE AND COLUMN-WISE KATZ
                         SCORES AND COMMUTE TIMES
                                                                   David F. Gleich
                                                                Purdue University

                                                            University of Chicago
                                   Statistical and Scientific Computing Seminar
                                                               October 6th, 2011

                           With Pooya Esfandiar, Francesco Bonchi, Chen Grief,
                                    Laks V. S. Lakshmanan, and Byung-Won On


David F. Gleich (Purdue)           Univ. Chicago SSCS Seminar                1 / 47
MAIN RESULTS
A – adjacency matrix                      For Katz
L – Laplacian matrix                      Compute one       fast
                                          Compute top       fast
Katz score :
                         
Commute time:
                  

For Commute
Compute one       fast



David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar              2 of 47
OUTLINE
Why study these measures?
Katz Rank and Commute Time
How else do people compute them?
Quadrature rules for pairwise scores
Sparse linear systems solves for top-k
As many results as we have time for…

David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar   3 of 47
WHY? LINK PREDICTION
                                                                                 Neighborhood based




                                                                                         Path based


                           Liben-Nowell and Kleinberg 2003, 2006 found that path based link prediction was more efficient
David F. Gleich (Purdue)                       Univ. Chicago SSCS Seminar                                         4 of 47
NOTE


All graphs are undirected

All graphs are connected




David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar   5 of 47
LEO KATZ




David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar   6 of 47
NOT QUITE, WIKIPEDIA
    : adjacency,                     : random walk

                 PageRank                      

                           Katz                

These are equivalent if     has constant
degree

David F. Gleich (Purdue)          Univ. Chicago SSCS Seminar   7 of 47
WHAT KATZ ACTUALLY SAID
                                           “we assume that each link independently has the
                                           same probability of being effective” …

                                           “we conceive a constant     , depending
                                           on the group and the context of the particular
                                           investigation, which has the force of a probability
                                           of effectiveness of a single link. A k-step chain
                                           then, has probability       of being effective.”

                                           “We wish to find the column sums of the matrix”




                           Leo Katz 1953, A New Status Index Derived from Sociometric Analysis, Psychometria 18(1):39-43
David F. Gleich (Purdue)                       Univ. Chicago SSCS Seminar                                        8 of 47
A MODERN TAKE
The Katz score (node-based) is

                                     
The Katz score (edge-based) is

                                     


David F. Gleich (Purdue)     Univ. Chicago SSCS Seminar   9 of 47
RETURNING TO THE MATRIX
                                           

                                                      



                Carl Neumann                        
                                                 
David F. Gleich (Purdue)            Univ. Chicago SSCS Seminar   10 of 47
Carl Neumann
                           I’ve heard the Neumann series called the “von Neumann”
                           series more than I’d like! In fact, the von Neumann kernel
                               of a graph should be named the “Neumann” kernel!


                                                                                        Wikipedia page
David F. Gleich (Purdue)                   Univ. Chicago SSCS Seminar                           11 / 47
PROPERTIES OF KATZ’S MATRIX
    is symmetric

    exists when                      

                is spd when                      

Note that         1/max-degree suffices


David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar   12 of 47
COMMUTE TIME
                           Picture taken from Google images, seems to be
                                Bay Bridge Traffic by Jim M. Goldstein.


David F. Gleich (Purdue)             Univ. Chicago SSCS Seminar            13 / 47
COMMUTE TIME
Consider a uniform random walk on a
graph                    
                                                        Also called the hitting
                                                        time from node i to j, or
                                                        the first transition time
                            
                                 




David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar                      14 of 47
SKIPPING DETAILS

                    : graph Laplacian

                               

              is the only null-vector



David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar   15 of 47
WHAT DO OTHER PEOPLE DO?
1) Just work with the linear algebra formulations

2) For Katz, Truncate the Neumann series as a few (3-5)
   terms

3) Use low-rank approximations from EVD(A) or EVD(L)

4) For commute, use Johnson-Lindenstrauss inspired
   random sampling

5) Approximately decompose into smaller problems

                                                    Liben-Nowell and Kleinberg CIKM2003, Acar et al. ICDM2009,
                           Spielman and Srivastava STOC2008, Sarkar and Moore UAI2007,Wang et al. ICDM2007
David F. Gleich (Purdue)                Univ. Chicago SSCS Seminar                                    16 of 47
THE PROBLEM

                             All of these techniques are
                            preprocessing based because
                           most people’s goal is to compute
                                     all the scores.

                                   We want to avoid
                                preprocessing the graph.

                            There are a few caveats here! i.e. one could solve the system instead of looking for the matrix inverse
David F. Gleich (Purdue)                              Univ. Chicago SSCS Seminar                                           17 of 47
WHY NO PREPROCESSING?




                           The graph is constantly changing
                                as I rate new movies.
David F. Gleich (Purdue)             Univ. Chicago SSCS Seminar   18 of 47
WHY NO PREPROCESSING?

                                                          Top-k predicted “links”
                                                          are movies to watch!




                           Pairwise scores give
                                  user similarity



David F. Gleich (Purdue)     Univ. Chicago SSCS Seminar                             19 of 47
PAIR-WISE ALGORITHMS


David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar   20 / 47
PAIRWISE ALGORITHMS
                Katz                  

Commute                                        

                              Golub and Meurant
                                to the rescue!




David F. Gleich (Purdue)       Univ. Chicago SSCS Seminar   21 of 47
MMQ - THE BIG IDEA
Quadratic form                                               Think                    
       


Weighted sum                                                 A is s.p.d. use EVD

       


Stieltjes integral                                           “A tautology”

       


Quadrature approximation                                          
        

Matrix equation                                              Lanczos
David F. Gleich (Purdue)    Univ. Chicago SSCS Seminar                          22 of 47
LANCZOS
        , k-steps of the Lanczos method
produce

                                    and                      


                                          =            
                                                                     




David F. Gleich (Purdue)         Univ. Chicago SSCS Seminar             23 of 47
PRACTICAL LANCZOS

Only need to store the last 2 vectors in      

Updating requires O(matvec) work

      is not orthogonal



David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar   24 of 47
MMQ PROCEDURE
Goal                        
Given                        

1. Run k-steps of Lanczos on     starting with    
2. Compute       ,     with an additional eigenvalue at     ,
        set                                       Correspond to a Gauss-Radau rule, with
                                                  u as a prescribed node
3. Compute     ,     with an additional eigenvalue at   , set
                                                  Correspond to a Gauss-Radau rule, with
                                                  l as a prescribed node
4. Output               as lower and upper bounds on    



David F. Gleich (Purdue)        Univ. Chicago SSCS Seminar                                 25 of 47
PRACTICAL MMQ
Increase k to become more accurate

Bad eigenvalue bounds yield worse results

      and     are easy to compute


    not required, we can iteratively
update it’s LU factorization
David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar   26 of 47
PRACTICAL MMQ




David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar   27 of 47
ONE LAST STEP FOR KATZ
                     Katz              

                                   

                                  
                                      



David F. Gleich (Purdue)        Univ. Chicago SSCS Seminar   28 of 47
COLUMN-WISE
          ALGORITHMS

David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar   29 / 47
COLUMN-WISE COMMUTE-TIME
                
      requires entire diagonal of     



                                           =            
                                                                      



                                                              Each vector     is
                                                              computed by the a
                                                              Lanczos based CG algorithm.

                                                                          Paige and Saunders, 1975
David F. Gleich (Purdue)         Univ. Chicago SSCS Seminar                                30 of 47
COLUMN-WISE COMMUTE TIME




                                                                     following
                                                   Hein et al. 2010 is a
                                                   MUCH better and faster
                                                   approximation.
David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar                            31
KATZ SCORES ARE LOCALIZED

                                                        Up to 50 neighbors is
                                                        99.65% of the total
                                                        mass




David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar                       32 of 47
PARTICIPATION RATIOS

Participation
   Ratios

 “effective
non-zeros”
in a vector


      



  David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar   33 of 47
TOP-K ALGORITHM FOR KATZ

Approximate    
                             
where     is sparse

Keep     sparse too
Ideally, don’t “touch” all of    


David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar   34 of 47
INSPIRATION - PAGERANK

 Approximate    
                              
 where     is sparse

 Keep     sparse too? YES!
 Ideally, don’t “touch” all of     ? YES!

McSherry WWW2005, Berkhin 2007, Anderson et al. FOCS2008 – Thanks to Reid Anderson for telling me McSherry did this too.
 David F. Gleich (Purdue)                        Univ. Chicago SSCS Seminar                                     35 of 47
THE ALGORITHM - MCSHERRY
For              
Start with the Richardson iteration
                                      
Rewrite
                                     

Richardson converges if                        


David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar   36 of 47
THE ALGORITHM
Note     is sparse.

If                 , then         is sparse.

Idea
  only add one component of         to        



David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar   37 of 47
THE ALGORITHM
For              

Init:                                
                
                

How to pick   ?

David F. Gleich (Purdue)      Univ. Chicago SSCS Seminar   38 of 47
THE ALGORITHM FOR KATZ
For                            

Init:                                  
               
                      

Pick   as max                               
   Storing the non-zeros of the residual in a heap makes picking the max log(n) time. See Anderson et al. FOCS2008 for more
David F. Gleich (Purdue)                         Univ. Chicago SSCS Seminar                                        39 of 47
CONVERGENCE?
If         1/max-degree then       is sub-
stochastic and the PageRank based proof
applies because the matrix is diagonally
dominant

For                       , then for symmetric     ,
this algorithm is the Gauss-Southwell
procedure and it still converges.


David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar   40 of 47
RESULTS – DATA, PARAMETERS




               All unweighted, connected graphs
                    Easy                                    
                      Hard                            
David F. Gleich (Purdue)       Univ. Chicago SSCS Seminar      41 of 47
KATZ BOUND CONVERGENCE




David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar   42 of 47
COMMUTE BOUND CONVERG.




David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar   43 of 47
KATZ SET CONVERGENCE




                                                        For arXiv graph.
David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar           44 of 47
TIMING




David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar   45 of 47
CONCLUSIONS
These algorithms are faster than many
alternatives.

For pairwise commute, stopping criteria
are simpler with bounds

For top-k problems, we often need less
than 1 matvec for good enough results

David F. Gleich (Purdue)   Univ. Chicago SSCS Seminar   46 of 47
Paper at WAW2010, J. Internet Mathematics

                                                    Slides should be online soon

                                                    Code is online already
                                                       www.cs.purdue.edu/homes/dgleich/
                                                       /codes/fast-katz-2011




        By AngryDogDesign on DeviantArt



David F. Gleich (Purdue)              Univ. Chicago SSCS Seminar                            47

More Related Content

Viewers also liked

The Future in One Picture
The Future in One PictureThe Future in One Picture
The Future in One PictureTim O'Reilly
 
Digital analytics & privacy: it's not the end of the world
Digital analytics & privacy: it's not the end of the worldDigital analytics & privacy: it's not the end of the world
Digital analytics & privacy: it's not the end of the worldOReillyStrata
 
Yusuf mapping the creative industries in jordan 15 11 2012
Yusuf mapping the creative industries in jordan 15 11 2012Yusuf mapping the creative industries in jordan 15 11 2012
Yusuf mapping the creative industries in jordan 15 11 2012Yusuf Mansur
 
Visual Conversations on Urban Futures - DRS 2016
Visual Conversations on Urban Futures - DRS 2016Visual Conversations on Urban Futures - DRS 2016
Visual Conversations on Urban Futures - DRS 2016serena pollastri
 
The DiSo Project and the Open Web
The DiSo Project and the Open WebThe DiSo Project and the Open Web
The DiSo Project and the Open WebChris Messina
 
Beyond the Buzz Words: Social Media & Mobile Fundraising
Beyond the Buzz Words: Social Media & Mobile FundraisingBeyond the Buzz Words: Social Media & Mobile Fundraising
Beyond the Buzz Words: Social Media & Mobile FundraisingArtez Interactive
 
Big Data in Texas: Then, Now, and Ahead
Big Data in Texas: Then, Now, and AheadBig Data in Texas: Then, Now, and Ahead
Big Data in Texas: Then, Now, and AheadPaco Nathan
 
Seattle Data Geeks: Hadoop and Beyond
Seattle Data Geeks: Hadoop and BeyondSeattle Data Geeks: Hadoop and Beyond
Seattle Data Geeks: Hadoop and BeyondPaco Nathan
 
When Ruby Meets Java - The Power of Torquebox
When Ruby Meets Java - The Power of TorqueboxWhen Ruby Meets Java - The Power of Torquebox
When Ruby Meets Java - The Power of Torqueboxrockyjaiswal
 
Insight from CloverPoint - 3D Asset and Facilities Management
Insight from CloverPoint - 3D Asset and Facilities ManagementInsight from CloverPoint - 3D Asset and Facilities Management
Insight from CloverPoint - 3D Asset and Facilities ManagementCloverpoint
 
Kotter Eight Step Plan
Kotter Eight Step PlanKotter Eight Step Plan
Kotter Eight Step PlanCPA Australia
 
Just Because You Can Doesn't Mean You Should
Just Because You Can Doesn't Mean You ShouldJust Because You Can Doesn't Mean You Should
Just Because You Can Doesn't Mean You ShouldOReillyWhere20
 
Brand Manage Camp: Winning With Social Media
Brand Manage Camp: Winning With Social MediaBrand Manage Camp: Winning With Social Media
Brand Manage Camp: Winning With Social MediaCharlene Li
 
Global Considerations for sCRM Strategy
Global Considerations for sCRM StrategyGlobal Considerations for sCRM Strategy
Global Considerations for sCRM StrategyJesus Hoyos
 

Viewers also liked (18)

Word clouds
Word cloudsWord clouds
Word clouds
 
The Future in One Picture
The Future in One PictureThe Future in One Picture
The Future in One Picture
 
Digital analytics & privacy: it's not the end of the world
Digital analytics & privacy: it's not the end of the worldDigital analytics & privacy: it's not the end of the world
Digital analytics & privacy: it's not the end of the world
 
Yusuf mapping the creative industries in jordan 15 11 2012
Yusuf mapping the creative industries in jordan 15 11 2012Yusuf mapping the creative industries in jordan 15 11 2012
Yusuf mapping the creative industries in jordan 15 11 2012
 
Visual Conversations on Urban Futures - DRS 2016
Visual Conversations on Urban Futures - DRS 2016Visual Conversations on Urban Futures - DRS 2016
Visual Conversations on Urban Futures - DRS 2016
 
The DiSo Project and the Open Web
The DiSo Project and the Open WebThe DiSo Project and the Open Web
The DiSo Project and the Open Web
 
Beyond the Buzz Words: Social Media & Mobile Fundraising
Beyond the Buzz Words: Social Media & Mobile FundraisingBeyond the Buzz Words: Social Media & Mobile Fundraising
Beyond the Buzz Words: Social Media & Mobile Fundraising
 
Cio Exchange08
Cio Exchange08Cio Exchange08
Cio Exchange08
 
Big Data in Texas: Then, Now, and Ahead
Big Data in Texas: Then, Now, and AheadBig Data in Texas: Then, Now, and Ahead
Big Data in Texas: Then, Now, and Ahead
 
Seattle Data Geeks: Hadoop and Beyond
Seattle Data Geeks: Hadoop and BeyondSeattle Data Geeks: Hadoop and Beyond
Seattle Data Geeks: Hadoop and Beyond
 
When Ruby Meets Java - The Power of Torquebox
When Ruby Meets Java - The Power of TorqueboxWhen Ruby Meets Java - The Power of Torquebox
When Ruby Meets Java - The Power of Torquebox
 
Insight from CloverPoint - 3D Asset and Facilities Management
Insight from CloverPoint - 3D Asset and Facilities ManagementInsight from CloverPoint - 3D Asset and Facilities Management
Insight from CloverPoint - 3D Asset and Facilities Management
 
Kotter Eight Step Plan
Kotter Eight Step PlanKotter Eight Step Plan
Kotter Eight Step Plan
 
Stanford Ee380
Stanford Ee380Stanford Ee380
Stanford Ee380
 
Just Because You Can Doesn't Mean You Should
Just Because You Can Doesn't Mean You ShouldJust Because You Can Doesn't Mean You Should
Just Because You Can Doesn't Mean You Should
 
Brand Manage Camp: Winning With Social Media
Brand Manage Camp: Winning With Social MediaBrand Manage Camp: Winning With Social Media
Brand Manage Camp: Winning With Social Media
 
Publishers “in” Libraries: New Agents, New Roles, New Challenges
Publishers “in” Libraries:New Agents, New Roles, New ChallengesPublishers “in” Libraries:New Agents, New Roles, New Challenges
Publishers “in” Libraries: New Agents, New Roles, New Challenges
 
Global Considerations for sCRM Strategy
Global Considerations for sCRM StrategyGlobal Considerations for sCRM Strategy
Global Considerations for sCRM Strategy
 

More from David Gleich

Engineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisEngineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisDavid Gleich
 
Correlation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networksCorrelation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networksDavid Gleich
 
Spectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresSpectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresDavid Gleich
 
Higher-order organization of complex networks
Higher-order organization of complex networksHigher-order organization of complex networks
Higher-order organization of complex networksDavid Gleich
 
Spacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisSpacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisDavid Gleich
 
Non-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansNon-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansDavid Gleich
 
Using Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningUsing Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningDavid Gleich
 
Spacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsSpacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsDavid Gleich
 
Localized methods in graph mining
Localized methods in graph miningLocalized methods in graph mining
Localized methods in graph miningDavid Gleich
 
PageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresPageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresDavid Gleich
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structuresDavid Gleich
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsDavid Gleich
 
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...David Gleich
 
Localized methods for diffusions in large graphs
Localized methods for diffusions in large graphsLocalized methods for diffusions in large graphs
Localized methods for diffusions in large graphsDavid Gleich
 
Anti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCutAnti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCutDavid Gleich
 
Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential David Gleich
 
Fast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and moreFast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and moreDavid Gleich
 
Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...David Gleich
 
MapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsMapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsDavid Gleich
 
Recommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLRecommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLDavid Gleich
 

More from David Gleich (20)

Engineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisEngineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network Analysis
 
Correlation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networksCorrelation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networks
 
Spectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresSpectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structures
 
Higher-order organization of complex networks
Higher-order organization of complex networksHigher-order organization of complex networks
Higher-order organization of complex networks
 
Spacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisSpacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysis
 
Non-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansNon-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-means
 
Using Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningUsing Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based Learning
 
Spacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsSpacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chains
 
Localized methods in graph mining
Localized methods in graph miningLocalized methods in graph mining
Localized methods in graph mining
 
PageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresPageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structures
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structures
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphs
 
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
 
Localized methods for diffusions in large graphs
Localized methods for diffusions in large graphsLocalized methods for diffusions in large graphs
Localized methods for diffusions in large graphs
 
Anti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCutAnti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCut
 
Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential
 
Fast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and moreFast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and more
 
Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...
 
MapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsMapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applications
 
Recommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLRecommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQL
 

Recently uploaded

THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxMusic 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxleah joy valeriano
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 

Recently uploaded (20)

THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxMusic 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 

Fast matrix computations for pair-wise and column-wise Katz scores and commute times

  • 1. FAST MATRIX COMPUTATIONS FOR PAIRWISE AND COLUMN-WISE KATZ SCORES AND COMMUTE TIMES David F. Gleich Purdue University University of Chicago Statistical and Scientific Computing Seminar October 6th, 2011 With Pooya Esfandiar, Francesco Bonchi, Chen Grief, Laks V. S. Lakshmanan, and Byung-Won On David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 1 / 47
  • 2. MAIN RESULTS A – adjacency matrix For Katz L – Laplacian matrix Compute one       fast Compute top       fast Katz score :                          Commute time:                 For Commute Compute one       fast David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 2 of 47
  • 3. OUTLINE Why study these measures? Katz Rank and Commute Time How else do people compute them? Quadrature rules for pairwise scores Sparse linear systems solves for top-k As many results as we have time for… David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 3 of 47
  • 4. WHY? LINK PREDICTION Neighborhood based Path based Liben-Nowell and Kleinberg 2003, 2006 found that path based link prediction was more efficient David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 4 of 47
  • 5. NOTE All graphs are undirected All graphs are connected David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 5 of 47
  • 6. LEO KATZ David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 6 of 47
  • 7. NOT QUITE, WIKIPEDIA     : adjacency,                     : random walk PageRank               Katz               These are equivalent if     has constant degree David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 7 of 47
  • 8. WHAT KATZ ACTUALLY SAID “we assume that each link independently has the same probability of being effective” … “we conceive a constant     , depending on the group and the context of the particular investigation, which has the force of a probability of effectiveness of a single link. A k-step chain then, has probability       of being effective.” “We wish to find the column sums of the matrix” Leo Katz 1953, A New Status Index Derived from Sociometric Analysis, Psychometria 18(1):39-43 David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 8 of 47
  • 9. A MODERN TAKE The Katz score (node-based) is            The Katz score (edge-based) is            David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 9 of 47
  • 10. RETURNING TO THE MATRIX                                         Carl Neumann                             David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 10 of 47
  • 11. Carl Neumann I’ve heard the Neumann series called the “von Neumann” series more than I’d like! In fact, the von Neumann kernel of a graph should be named the “Neumann” kernel! Wikipedia page David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 11 / 47
  • 12. PROPERTIES OF KATZ’S MATRIX     is symmetric     exists when                                       is spd when                       Note that         1/max-degree suffices David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 12 of 47
  • 13. COMMUTE TIME Picture taken from Google images, seems to be Bay Bridge Traffic by Jim M. Goldstein. David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 13 / 47
  • 14. COMMUTE TIME Consider a uniform random walk on a graph                     Also called the hitting time from node i to j, or the first transition time                                                       David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 14 of 47
  • 15. SKIPPING DETAILS                     : graph Laplacian                                               is the only null-vector David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 15 of 47
  • 16. WHAT DO OTHER PEOPLE DO? 1) Just work with the linear algebra formulations 2) For Katz, Truncate the Neumann series as a few (3-5) terms 3) Use low-rank approximations from EVD(A) or EVD(L) 4) For commute, use Johnson-Lindenstrauss inspired random sampling 5) Approximately decompose into smaller problems Liben-Nowell and Kleinberg CIKM2003, Acar et al. ICDM2009, Spielman and Srivastava STOC2008, Sarkar and Moore UAI2007,Wang et al. ICDM2007 David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 16 of 47
  • 17. THE PROBLEM All of these techniques are preprocessing based because most people’s goal is to compute all the scores. We want to avoid preprocessing the graph. There are a few caveats here! i.e. one could solve the system instead of looking for the matrix inverse David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 17 of 47
  • 18. WHY NO PREPROCESSING? The graph is constantly changing as I rate new movies. David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 18 of 47
  • 19. WHY NO PREPROCESSING? Top-k predicted “links” are movies to watch! Pairwise scores give user similarity David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 19 of 47
  • 20. PAIR-WISE ALGORITHMS David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 20 / 47
  • 21. PAIRWISE ALGORITHMS Katz             Commute                                         Golub and Meurant to the rescue! David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 21 of 47
  • 22. MMQ - THE BIG IDEA Quadratic form                 Think                        Weighted sum           A is s.p.d. use EVD    Stieltjes integral           “A tautology”    Quadrature approximation              Matrix equation         Lanczos David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 22 of 47
  • 23. LANCZOS         , k-steps of the Lanczos method produce                                     and                              =               David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 23 of 47
  • 24. PRACTICAL LANCZOS Only need to store the last 2 vectors in       Updating requires O(matvec) work       is not orthogonal David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 24 of 47
  • 25. MMQ PROCEDURE Goal                         Given                         1. Run k-steps of Lanczos on     starting with     2. Compute       ,     with an additional eigenvalue at     , set            Correspond to a Gauss-Radau rule, with u as a prescribed node 3. Compute     ,     with an additional eigenvalue at   , set           Correspond to a Gauss-Radau rule, with l as a prescribed node 4. Output               as lower and upper bounds on     David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 25 of 47
  • 26. PRACTICAL MMQ Increase k to become more accurate Bad eigenvalue bounds yield worse results       and     are easy to compute     not required, we can iteratively update it’s LU factorization David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 26 of 47
  • 27. PRACTICAL MMQ David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 27 of 47
  • 28. ONE LAST STEP FOR KATZ Katz                                                                              David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 28 of 47
  • 29. COLUMN-WISE ALGORITHMS David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 29 / 47
  • 30. COLUMN-WISE COMMUTE-TIME                        requires entire diagonal of             =                                      Each vector     is computed by the a                        Lanczos based CG algorithm. Paige and Saunders, 1975 David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 30 of 47
  • 31. COLUMN-WISE COMMUTE TIME   following Hein et al. 2010 is a MUCH better and faster approximation. David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 31
  • 32. KATZ SCORES ARE LOCALIZED Up to 50 neighbors is 99.65% of the total mass David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 32 of 47
  • 33. PARTICIPATION RATIOS Participation Ratios “effective non-zeros” in a vector        David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 33 of 47
  • 34. TOP-K ALGORITHM FOR KATZ Approximate                    where     is sparse Keep     sparse too Ideally, don’t “touch” all of     David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 34 of 47
  • 35. INSPIRATION - PAGERANK Approximate                    where     is sparse Keep     sparse too? YES! Ideally, don’t “touch” all of     ? YES! McSherry WWW2005, Berkhin 2007, Anderson et al. FOCS2008 – Thanks to Reid Anderson for telling me McSherry did this too. David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 35 of 47
  • 36. THE ALGORITHM - MCSHERRY For               Start with the Richardson iteration                             Rewrite                     Richardson converges if                         David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 36 of 47
  • 37. THE ALGORITHM Note     is sparse. If                 , then         is sparse. Idea only add one component of         to         David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 37 of 47
  • 38. THE ALGORITHM For               Init:                                                                   How to pick   ? David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 38 of 47
  • 39. THE ALGORITHM FOR KATZ For                             Init:                                                                          Pick   as max     Storing the non-zeros of the residual in a heap makes picking the max log(n) time. See Anderson et al. FOCS2008 for more David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 39 of 47
  • 40. CONVERGENCE? If         1/max-degree then       is sub- stochastic and the PageRank based proof applies because the matrix is diagonally dominant For                       , then for symmetric     , this algorithm is the Gauss-Southwell procedure and it still converges. David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 40 of 47
  • 41. RESULTS – DATA, PARAMETERS All unweighted, connected graphs Easy                                     Hard                             David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 41 of 47
  • 42. KATZ BOUND CONVERGENCE David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 42 of 47
  • 43. COMMUTE BOUND CONVERG. David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 43 of 47
  • 44. KATZ SET CONVERGENCE For arXiv graph. David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 44 of 47
  • 45. TIMING David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 45 of 47
  • 46. CONCLUSIONS These algorithms are faster than many alternatives. For pairwise commute, stopping criteria are simpler with bounds For top-k problems, we often need less than 1 matvec for good enough results David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 46 of 47
  • 47. Paper at WAW2010, J. Internet Mathematics Slides should be online soon Code is online already www.cs.purdue.edu/homes/dgleich/ /codes/fast-katz-2011 By AngryDogDesign on DeviantArt David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 47