Skew-symmetric matrix completion for rank aggregation
and other matrix computations
DAVID F. GLEICH
PURDUE UNIVERSITY
COMPUTER SCIENCE DEPARTMENT

February 24th, 12pm
Purdue ML Seminar
David Gleich, Purdue

Images copyright by their
respective owners

Matrix computations are the heart (and not brains) of many methods of computing.

Matrix computations
  Physics
  Statistics
  Engineering
  Graphics
  Databases
  …
  Machine learning

Matrix computations

$$
A = \begin{bmatrix}
A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\
A_{2,1} & A_{2,2} & \cdots & \vdots \\
\vdots & \ddots & \ddots & A_{m-1,n} \\
A_{m,1} & \cdots & A_{m,n-1} & A_{m,n}
\end{bmatrix}
$$

Linear systems: $Ax = b$
Least squares: $\min \|Ax - b\|$
Eigenvalues: $Ax = \lambda x$

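The three problems named on this slide can be tried on a tiny example. This is a minimal sketch with made-up matrices (`A`, `T`, `b`, `c` are illustrative values, not from the talk), using standard NumPy routines.

```python
import numpy as np

# Small illustrative data (assumed values, not from the slides).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

# Linear system: solve A x = b.
x = np.linalg.solve(A, b)

# Least squares: minimize ||T y - c|| for a tall matrix T.
T = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
c = np.array([1.0, 2.0, 2.0])
y, *_ = np.linalg.lstsq(T, c, rcond=None)

# Eigenvalues: A v = lambda v (A is symmetric, so eigh applies).
lam, V = np.linalg.eigh(A)
```

At the least-squares solution the residual is orthogonal to the columns of `T`, which is a quick way to check the result.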
NETWORK and MATRIX COMPUTATIONS
Why looking at networks of data as a matrix is a powerful and successful paradigm.

A new matrix-based sensitivity analysis of Google's PageRank.
PageRank: $(I - \alpha P)x = (1 - \alpha)v$
Presented at WAW2007, WWW2010
Published in the J. Internet Mathematics
Led to new results on uncertainty quantification in physical simulations, published in SIAM J. Matrix Analysis and SIAM J. Scientific Computing.
Patent Pending
Improved web-spam detection!
Collaborators: Paul Constantine, Gianluca Iaccarino (physical simulation)

SimRank, DiffusionRank, BlockRank, IsoRank, TrustRank, ItemRank, ObjectRank, ProteinRank, HostRank, SocialPageRank, Random walk with restart, FoodRank, FutureRank, GeneRank, TwitterRank

RAPr on Wikipedia
E[x(A)]             Std[x(A)]
United States       United States
C:Living people     C:Living people
France              C:Main topic classif.
United Kingdom      C:Contents
Germany             C:Ctgs. by country
England             United Kingdom
Canada              France
Japan               C:Fundamental
Poland              England
Australia           C:Ctgs. by topic

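The PageRank equation $(I - \alpha P)x = (1 - \alpha)v$ is an ordinary linear system. As a minimal sketch, assuming a hypothetical 3-page graph, a column-stochastic `P`, a uniform `v`, and `alpha = 0.85` (none of these values come from the slides):

```python
import numpy as np

alpha = 0.85  # assumed damping value, not from the slides

# Column-stochastic transition matrix of a hypothetical 3-page web graph.
P = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
v = np.full(3, 1.0 / 3.0)  # uniform teleportation vector

# Solve (I - alpha P) x = (1 - alpha) v directly.
x = np.linalg.solve(np.eye(3) - alpha * P, (1.0 - alpha) * v)
```

Because the columns of `P` and the entries of `v` each sum to one, the solution `x` is a probability vector.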
Network alignment

$$
\begin{array}{ll}
\text{maximize}_x & w^T x + \tfrac{1}{2} x^T S x \\
\text{subject to} & Ax \le e, \quad x_i \in \{0, 1\}
\end{array}
$$

Sparse network alignment problems
Bayati, Gerritsen, Gleich, Saberi, and Wang, ICDM2009
Bayati, Gleich, Saberi, and Wang, Submitted

Overlapping clusters for distributed computation
Andersen, Gleich, and Mirrokni, WSDM2012

[Figure: Relative Work (0 to 2) versus Volume Ratio (1 to 1.7). Curves: Swapping Probability (usroads), PageRank Communication (usroads), Swapping Probability (web-Google), PageRank Communication (web-Google). Reference: Metis Partitioner.]
Volume Ratio: How much more of the graph we need to store.

Local methods for massive network analysis
Gleich et al., J. Internet Mathematics, to appear.

Top-k algorithm for Katz
  Approximate the solution when the matrix is sparse.
  Keep the approximation sparse too.
  Ideally, don't "touch" all of the matrix.

Can solve these problems in milliseconds even with 100M edges!

DAVID F. GLEICH (PURDUE) & LEK-HENG LIM (UNIV. CHICAGO)
Rank aggregation

Which is a better list of good DVDs?

Standard rank aggregation               Nuclear norm based rank aggregation
(the mean rating)                       (not matrix completion on the Netflix rating matrix)
Lord of the Rings 3: The Return of …    Lord of the Rings 3: The Return of …
Lord of the Rings 1: The Fellowship     Lord of the Rings 1: The Fellowship
Lord of the Rings 2: The Two Towers     Lord of the Rings 2: The Two Towers
Lost: Season 1                          Star Wars V: Empire Strikes Back
Battlestar Galactica: Season 1          Raiders of the Lost Ark
Fullmetal Alchemist                     Star Wars IV: A New Hope
Trailer Park Boys: Season 4             Shawshank Redemption
Trailer Park Boys: Season 3             Star Wars VI: Return of the Jedi
Tenchi Muyo!                            Lord of the Rings 3: Bonus DVD
Shawshank Redemption                    The Godfather

Rank Aggregation
Given partial orders on subsets of items, rank aggregation is the problem of finding an overall ordering.
  Voting: Find the winning candidate
  Program committees: Find the best papers given reviews
  Dining: Find the best restaurant in Chicago

Ranking is really hard
  Ken Arrow: All rank aggregations involve some measure of compromise.
  John Kemeny: A good ranking is the "average" ranking under a permutation distance.
  Dwork, Kumar, Naor, Sivakumar: NP hard to compute Kemeny's ranking.

Given a hard problem, what do you do?
Numerically relax!
It'll probably be easier.
(Image: Embody chair, John Cantrell (flickr))

Suppose we had scores
Let $s_i$ be the score of the ith movie/song/paper/team to rank.
Suppose we can compare the ith to jth: $Y_{ij} = s_i - s_j$.
Then $Y = se^T - es^T$ is skew-symmetric, rank 2.
Also works for $s_i / s_j$ with an extra log.

Numerical ranking is intimately intertwined with skew-symmetric matrices.
Kemeny and Snell, Mathematical Models in Social Sciences (1978)

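A quick numerical check of the claim that score differences form a skew-symmetric, rank-2 matrix. The score vector `s` below is a made-up example; the log-of-ratios variant is included as well.

```python
import numpy as np

s = np.array([3.0, 1.0, 4.0, 1.5])    # hypothetical positive scores
e = np.ones_like(s)

# Pairwise differences: Y[i, j] = s[i] - s[j], i.e. Y = s e^T - e s^T.
Y = np.outer(s, e) - np.outer(e, s)
is_skew = np.allclose(Y, -Y.T)
rank = np.linalg.matrix_rank(Y)

# Ratios s[i] / s[j] become differences of log-scores after taking a log.
L = np.log(np.outer(s, 1.0 / s))
```

The rank is 2 whenever the scores are not all equal; with all scores equal, `Y` is the zero matrix.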
Using ratings as comparisons
Ratings induce various skew-symmetric matrices: Arithmetic Mean, Log-odds.
From David 1988 – The Method of Paired Comparisons

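One way ratings can induce a skew-symmetric matrix is sketched here, under the assumption that the "arithmetic mean" comparison averages the rating difference over users who rated both items. The function name `arithmetic_mean_comparisons`, the NaN convention for missing ratings, and the toy data are illustrative choices, not taken from the slides.

```python
import numpy as np

def arithmetic_mean_comparisons(R):
    """R is users x items with np.nan for missing ratings.

    Returns (Y, counts): Y[i, j] is the mean of R[u, i] - R[u, j] over the
    users u who rated both items (nan if there is no such user), and
    counts[i, j] is the number of such users.
    """
    M = ~np.isnan(R)                    # mask of observed ratings
    Z = np.where(M, R, 0.0)             # missing ratings replaced by 0
    Mf = M.astype(float)
    counts = Mf.T @ Mf                  # co-rating counts
    sums = Z.T @ Mf - Mf.T @ Z          # summed differences over co-raters
    with np.errstate(invalid="ignore", divide="ignore"):
        Y = np.where(counts > 0, sums / counts, np.nan)
    return Y, counts

# Toy ratings: 3 users, 3 items.
R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 2.0],
              [np.nan, 1.0, 2.0]])
Y, counts = arithmetic_mean_comparisons(R)
```

The `counts` matrix is what a trust threshold on the number of comparisons would later be applied to.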
Extracting the scores
Given $Y$ with all entries, then $s = \frac{1}{n} Y e$ is the Borda count, the least-squares solution for $s$ in $Y = se^T - es^T$.
How many $Y_{ij}$ do we have? Most.
Do we trust all $Y_{ij}$? Not really.

[Figure: Movie Pairs versus Number of Comparisons. Netflix data: 17k movies, 500k users, 100M ratings; 99.17% filled.]

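The Borda-count step can be checked directly: for centered scores, the row means of the full comparison matrix return the scores. The vector `s` is a made-up example that sums to zero.

```python
import numpy as np

s = np.array([2.0, -1.0, 0.5, -1.5])   # hypothetical centered scores (sum is 0)
n = s.size
e = np.ones(n)

Y = np.outer(s, e) - np.outer(e, s)    # all pairwise differences
s_hat = Y @ e / n                      # Borda count: row means of Y
order = np.argsort(-s_hat)             # items from best to worst
```

Since $Ye = ns - e(s^T e)$, the recovery is exact when $s^T e = 0$; for uncentered scores the row means return the scores shifted by their mean, which leaves the ordering unchanged.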
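Only partial info? Complete it!
Let $Y_{ij}$ be known for $(i,j) \in \Omega$. We trust these scores.
Goal: Find the simplest skew-symmetric matrix that matches the data.

noiseless:
$$
\begin{array}{ll}
\text{minimize} & \operatorname{rank}(X) \\
\text{subject to} & X = -X^T, \quad X_{ij} = Y_{ij} \text{ for all } (i,j) \in \Omega
\end{array}
$$

noisy:
$$
\begin{array}{ll}
\text{minimize} & \operatorname{rank}(X) \\
\text{subject to} & X = -X^T, \quad \sum_{(i,j) \in \Omega} (X_{ij} - Y_{ij})^2 \le \sigma
\end{array}
$$

Both of these are NP-hard too.
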
Solution: GO NUCLEAR!
From a French nuclear test in 1970, image from http://picdit.wordpress.com/2008/07/21/8-insane-nuclear-explosions/

The nuclear norm
The analog of the 1-norm or $\ell_1$ for matrices.

For vectors:
  $\min \|x\|_0$ subject to $Ax = b$ is NP-hard, while
  $\min \|x\|_1$ subject to $Ax = b$ is convex and gives the same answer "under appropriate circumstances".

For matrices:
  Let $A = U \Sigma V^T$ be the SVD.
  $\operatorname{rank}(A)$ is the number of nonzero singular values $\sigma_i$.
  $\|A\|_* = \sum_i \sigma_i$ is the best convex under-estimator of rank on the unit ball.

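The vector/matrix analogy can be made concrete with a small sketch: rank counts the nonzero singular values, as the 0-"norm" counts nonzero entries, and the nuclear norm sums them, as the 1-norm sums magnitudes. The vector `x` and scores `s` are made-up examples.

```python
import numpy as np

# Vector case: count of nonzeros versus sum of magnitudes.
x = np.array([0.0, 3.0, 0.0, -4.0])
l0 = np.count_nonzero(x)
l1 = np.abs(x).sum()

# Matrix case, on a rank-2 skew-symmetric matrix built from hypothetical scores.
s = np.array([3.0, 1.0, 4.0, 1.5])
Y = np.outer(s, np.ones(4)) - np.outer(np.ones(4), s)
sigma = np.linalg.svd(Y, compute_uv=False)        # singular values, descending
rank = int(np.sum(sigma > 1e-10 * sigma[0]))      # number of nonzero sigma_i
nuclear = sigma.sum()                             # nuclear norm ||Y||_*
```

NumPy also exposes the nuclear norm directly as `np.linalg.norm(Y, "nuc")`.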
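Only partial info? Complete it!
Let $Y_{ij}$ be known for $(i,j) \in \Omega$. We trust these scores.
Goal: Find the simplest skew-symmetric matrix that matches the data.

NP hard:
$$
\begin{array}{ll}
\text{minimize} & \operatorname{rank}(X) \\
\text{subject to} & X = -X^T, \quad X_{ij} = Y_{ij} \text{ for all } (i,j) \in \Omega
\end{array}
$$

Convex (heuristic):
$$
\begin{array}{ll}
\text{minimize} & \|X\|_* \\
\text{subject to} & X = -X^T, \quad X_{ij} = Y_{ij} \text{ for all } (i,j) \in \Omega
\end{array}
$$
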
Solving the nuclear norm problem
Use a LASSO formulation.
Jain et al. propose SVP for this problem without …

  1. …
  2. REPEAT
  3.   … = rank-k SVD of …
  4.   …
  5.   …
  6. UNTIL …

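To make the REPEAT/UNTIL loop concrete, here is a minimal sketch of a singular value projection (SVP) style iteration for matrix completion: take a gradient step on the misfit over the known entries, then project back to rank k with a truncated SVD. The step size of 1, the stopping tolerance, the function name `svp_complete`, and the random test problem are all assumptions made for illustration; they are not taken from the slides or from Jain et al.

```python
import numpy as np

def svp_complete(Y, mask, k=2, iters=200, tol=1e-8):
    """Rank-k completion of the entries of Y marked True in mask.

    Each pass takes a unit gradient step on 0.5 * ||P_Omega(X - Y)||_F^2 and
    projects onto rank-k matrices with a truncated SVD. Returns the final
    iterate and the history of misfit norms on the known entries.
    """
    Yobs = np.where(mask, Y, 0.0)
    X = np.zeros_like(Yobs)
    history = [np.linalg.norm(Yobs)]
    for _ in range(iters):
        G = np.where(mask, X - Yobs, 0.0)          # gradient on known entries
        U, sig, Vt = np.linalg.svd(X - G, full_matrices=False)
        X = (U[:, :k] * sig[:k]) @ Vt[:k, :]       # rank-k SVD projection
        history.append(np.linalg.norm(np.where(mask, X - Yobs, 0.0)))
        if history[-1] <= tol:
            break
    return X, history

# Hypothetical test problem: rank-2 skew-symmetric Y with about 60% of pairs known.
rng = np.random.default_rng(0)
n = 20
s = rng.standard_normal(n)
Y = np.outer(s, np.ones(n)) - np.outer(np.ones(n), s)
upper = np.triu(rng.random((n, n)) < 0.6, k=1)     # sample pairs with i < j
mask = upper | upper.T                             # keep the index set symmetric
X, history = svp_complete(Y, mask)
```

With a unit step each pass replaces the known entries by their data values before truncating, so the misfit on the known entries cannot increase from one pass to the next. Other step sizes change the convergence behavior.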
Skew-symmetric SVD
Let $A = -A^T$ be an $n \times n$ skew-symmetric matrix with eigenvalues $\imath\lambda_1, -\imath\lambda_1, \imath\lambda_2, -\imath\lambda_2, \ldots, \imath\lambda_j, -\imath\lambda_j$, where $\lambda_i > 0$ and $j = \lfloor n/2 \rfloor$. Then the SVD of $A$ is given by
$$
A = U \operatorname{diag}(\lambda_1, \lambda_1, \lambda_2, \lambda_2, \ldots, \lambda_j, \lambda_j) V^T
$$
for $U$ and $V$ given in the proof.
Proof: Use the Murnaghan-Wintner form and the SVD of a 2x2 skew-symmetric block.

This means that SVP will give us the skew-symmetric constraint "for free".

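The paired singular values, and the reason a rank-2 truncation stays skew-symmetric, can be observed numerically. This is a small sketch on a random 6 x 6 skew-symmetric matrix (the size and seed are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
A = B - B.T                                   # a random skew-symmetric matrix

# Eigenvalues are +/- i*lambda, so the sorted |imag| parts come in equal pairs.
eig_imag = np.sort(np.abs(np.linalg.eigvals(A).imag))[::-1]

# Singular values come in the same pairs.
U, sig, Vt = np.linalg.svd(A)

# Truncating after a complete pair keeps the result skew-symmetric.
A2 = (U[:, :2] * sig[:2]) @ Vt[:2, :]
```

Truncating in the middle of a pair (an odd rank) would not have this property in general, which is why an even rank is the natural choice here.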
Matrix completion
A fundamental question in matrix completion is: when do these problems (the NP hard one and the convex one) have the same solution?

Exact recovery results
David Gross showed how to recover Hermitian matrices, i.e. the conditions under which recovery is exact. (Gross, arXiv, 2010)
Note that $\imath Y$ is Hermitian. Thus our new result!

Instead we view the following theorem as providing intuition for the noisy problem.

Consider the operator basis for Hermitian matrices: $H = S \cup K \cup D$ where
$$S = \{ \tfrac{1}{\sqrt{2}} (e_i e_j^T + e_j e_i^T) : 1 \le i < j \le n \};$$
$$K = \{ \tfrac{\imath}{\sqrt{2}} (e_i e_j^T - e_j e_i^T) : 1 \le i < j \le n \};$$
$$D = \{ e_i e_i^T : 1 \le i \le n \}.$$

Theorem 5. Let $s$ be centered, i.e., $s^T e = 0$. Let $Y = se^T - es^T$ where $\theta = \max_i s_i^2 / (s^T s)$ and $\rho = ((\max_i s_i) - (\min_i s_i)) / \|s\|$. Also, let $\Omega \subset H$ be a random set of elements with size $|\Omega| \ge O(2 n \nu (1 + \beta) (\log n)^2)$ where $\nu = \max((n\theta + 1)/4, n\rho^2)$. Then the solution of
$$
\begin{array}{ll}
\text{minimize} & \|X\|_* \\
\text{subject to} & \operatorname{trace}(X^* W_i) = \operatorname{trace}((\imath Y)^* W_i), \quad W_i \in \Omega
\end{array}
$$
is equal to $\imath Y$ with probability at least $1 - n^{-\beta}$.

The proof of this theorem follows directly by Theorem 4 if …

[Figure: Fraction of trials recovered]

Recovery Discussion and Experiments
Confession: If …, then just look at differences from a connected set. Constants? Not very good.
Intuition for the truth.

The ranking algorithm
  0. INPUT ratings data and c (for trust on comparisons)
  1. Compute Y from the ratings
  2. Discard entries with fewer than c comparisons
  3. Set … to be indices and values of what's left
  4. … = SVP(…)
  5. OUTPUT …

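Steps 2 and 3 of the algorithm amount to thresholding on the number of comparisons and collecting what survives. A minimal sketch, assuming a comparison matrix `Y` and a matching matrix `counts` of how many users compared each pair (the function name `trusted_entries` and the toy numbers are hypothetical):

```python
import numpy as np

def trusted_entries(Y, counts, c):
    """Keep Y[i, j] only when at least c users compared items i and j.

    Returns the row indices, column indices, and values of the kept entries.
    """
    keep = counts >= c
    np.fill_diagonal(keep, False)        # ignore self-comparisons
    rows, cols = np.nonzero(keep)
    return rows, cols, Y[rows, cols]

# Toy comparison matrix and co-rating counts for 3 items.
Y = np.array([[0.0, 2.0, 1.0],
              [-2.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
counts = np.array([[5, 4, 1],
                   [4, 6, 3],
                   [1, 3, 4]])
rows, cols, vals = trusted_entries(Y, counts, c=3)
```

The index pairs and values returned here are the kind of input a completion routine such as SVP would consume in step 4.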
Synthetic evaluation: Item Response Model
The synthetic results came from a model inspired by Ho and Quinn [2008], with parameters for:
  - center rating for user $i$
  - sensitivity of user $i$
  - value of item $j$
  - error level in ratings

Sample ratings uniformly at random so that each user has a given expected number of ratings.

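The slide's model equation did not survive extraction, so the following is only a sketch of one plausible item-response generator. It assumes a linear form (center + sensitivity × item value + noise), rounding to 1-5 stars, and specific parameter ranges; all of these, and the name `sample_ratings`, are assumptions for illustration rather than the model used in the talk.

```python
import numpy as np

def sample_ratings(n_users, n_items, per_user, noise, seed=0):
    """Generate a users x items rating matrix with np.nan for unobserved entries.

    ASSUMED model: rating = round(a_i + b_i * t_j + noise * N(0, 1)), clipped to 1..5.
    """
    rng = np.random.default_rng(seed)
    a = rng.uniform(2.5, 3.5, n_users)     # center rating for user i (assumed range)
    b = rng.uniform(0.5, 1.5, n_users)     # sensitivity of user i (assumed range)
    t = rng.standard_normal(n_items)       # value of item j
    E = noise * rng.standard_normal((n_users, n_items))   # error in ratings
    raw = a[:, None] + b[:, None] * t[None, :] + E
    R = np.clip(np.rint(raw), 1, 5)        # discretize to 1-5 stars
    # Observe each entry independently so the expected count per user is per_user.
    seen = rng.random((n_users, n_items)) < per_user / n_items
    return np.where(seen, R, np.nan), t

R, t = sample_ratings(200, 50, per_user=10, noise=0.5)
observed = ~np.isnan(R)
```

The returned item values `t` give the ground-truth ordering against which a recovered ranking can be compared.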
Evaluation
[Figure 3: The performance of our algorithm (left). Panels: Nuclear norm ranking, Mean rating. Axes: Median Kendall's Tau (0.5 to 1) versus Error (0 to 1). Curves labeled 20, 10, 5, 2, 1.5.]

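The evaluation metric, Kendall's tau, measures how often two score vectors agree on the order of a pair. A minimal sketch of the tau-a variant follows; whether the talk used tau-a or a tie-corrected variant is not stated, so that choice is an assumption.

```python
import numpy as np

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs / total pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = x.size
    # Each unordered pair appears once in the strict upper triangle.
    return float(np.sum(np.triu(dx * dy, k=1)) / (n * (n - 1) / 2))

same = kendall_tau([1, 2, 3, 4], [1, 2, 3, 4])
reverse = kendall_tau([1, 2, 3, 4], [4, 3, 2, 1])
one_swap = kendall_tau([1, 2, 3], [1, 3, 2])
```

A value of 1 means identical orderings and -1 means exactly reversed orderings.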
Conclusions and Future Work
Our motto: "aggregate, then complete"
Rank aggregation with the nuclear norm is
  - principled
  - easy to compute
The results are much better than simple approaches.

Future work
  1. Additional comparison
  2. Noisy recovery! More realistic sampling.
  3. Skew-symmetric Lanczos based SVD?

Current research

Data driven surrogate functions
Beyond spectral methods for UQ

Graph spectra

Spectral spikes
[Figure: spectrum with spikes labeled 1.33 (two!); 1.5, 0.5; 1.5; 0.565741, 1.767592; 1.833; 0.725708, 1.607625; 1.5 (two)]

Google nuclear ranking gleich

 
Correlation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networksCorrelation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networksDavid Gleich
 
Spectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresSpectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresDavid Gleich
 
Higher-order organization of complex networks
Higher-order organization of complex networksHigher-order organization of complex networks
Higher-order organization of complex networksDavid Gleich
 
Spacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisSpacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisDavid Gleich
 
Non-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansNon-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansDavid Gleich
 
Using Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningUsing Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningDavid Gleich
 
Spacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsSpacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsDavid Gleich
 
Localized methods in graph mining
Localized methods in graph miningLocalized methods in graph mining
Localized methods in graph miningDavid Gleich
 
PageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresPageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresDavid Gleich
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structuresDavid Gleich
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsDavid Gleich
 
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...David Gleich
 
Localized methods for diffusions in large graphs
Localized methods for diffusions in large graphsLocalized methods for diffusions in large graphs
Localized methods for diffusions in large graphsDavid Gleich
 
Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential David Gleich
 
Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...David Gleich
 
MapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsMapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsDavid Gleich
 
Relaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksRelaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksDavid Gleich
 
Personalized PageRank based community detection
Personalized PageRank based community detectionPersonalized PageRank based community detection
Personalized PageRank based community detectionDavid Gleich
 
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...Vertex neighborhoods, low conductance cuts, and good seeds for local communit...
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...David Gleich
 

Plus de David Gleich (20)

Engineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisEngineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network Analysis
 
Correlation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networksCorrelation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networks
 
Spectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresSpectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structures
 
Higher-order organization of complex networks
Higher-order organization of complex networksHigher-order organization of complex networks
Higher-order organization of complex networks
 
Spacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisSpacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysis
 
Non-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansNon-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-means
 
Using Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningUsing Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based Learning
 
Spacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsSpacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chains
 
Localized methods in graph mining
Localized methods in graph miningLocalized methods in graph mining
Localized methods in graph mining
 
PageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresPageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structures
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structures
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphs
 
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
 
Localized methods for diffusions in large graphs
Localized methods for diffusions in large graphsLocalized methods for diffusions in large graphs
Localized methods for diffusions in large graphs
 
Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential
 
Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...
 
MapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsMapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applications
 
Relaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksRelaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networks
 
Personalized PageRank based community detection
Personalized PageRank based community detectionPersonalized PageRank based community detection
Personalized PageRank based community detection
 
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...Vertex neighborhoods, low conductance cuts, and good seeds for local communit...
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...
 

Dernier

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 

Dernier (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

Skew-symmetric matrix completion for rank aggregation

  • 1. Skew-symmetric matrix completion for rank aggregation and other matrix computations. David F. Gleich, Purdue University, Computer Science Department. February 24th, 12pm, Purdue ML Seminar.
  • 3. Skew-symmetric matrix completion for rank aggregation and other matrix computations. David F. Gleich, Purdue University, Computer Science Department. January 24th, 12pm.
  • 4. Images copyright by their respective owners
  • 5. Matrix computations are the heart (and not brains) of many methods of computing.
  • 6. Matrix computations: physics, statistics, engineering, graphics, databases, …, machine learning.
  • 7. Matrix computations. $A = \begin{bmatrix} A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\ A_{2,1} & A_{2,2} & \cdots & \vdots \\ \vdots & \ddots & \ddots & A_{m-1,n} \\ A_{m,1} & \cdots & A_{m,n-1} & A_{m,n} \end{bmatrix}$. Linear systems: $Ax = b$. Least squares: $\min \|Ax - b\|$. Eigenvalues: $Ax = \lambda x$.
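A minimal NumPy sketch of the three computations named on slide 7 (linear systems, least squares, eigenvalues). The matrices here are random placeholders chosen for illustration, not data from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear system: find x with A x = b (square A).
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)
x_lin = np.linalg.solve(A, b)

# Least squares: minimize ||T x - c|| for a tall matrix T.
T = rng.standard_normal((8, 3))
c = rng.standard_normal(8)
x_ls, *_ = np.linalg.lstsq(T, c, rcond=None)

# Eigenvalues: S v = lambda v. A symmetric S keeps the eigenvalues real.
S = A + A.T
lam, V = np.linalg.eigh(S)
```

At the least-squares solution the residual is orthogonal to the columns of `T`, which is a quick way to check the result.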
  • 8. NETWORK and MATRIX COMPUTATIONS Why looking at networks of data as a matrix is a powerful and successful paradigm.
  • 9. A new matrix-based sensitivity analysis of Google’s PageRank. PageRank: $(I - \alpha P)x = (1 - \alpha)v$. Presented at WAW2007, WWW2010. Published in J. Internet Mathematics. Led to new results on uncertainty quantification in physical simulations published in SIAM J. Matrix Analysis and SIAM J. Scientific Computing. Patent pending. Improved web-spam detection! Related methods: SimRank, DiffusionRank, BlockRank, IsoRank, TrustRank, ItemRank, ObjectRank, ProteinRank, HostRank, SocialPageRank, random walk with restart, FoodRank, FutureRank, GeneRank, TwitterRank. Figure: RAPr on Wikipedia, top pages by $\mathrm{E}[x(A)]$ and $\mathrm{Std}[x(A)]$. Collaborators: Paul Constantine, Gianluca Iaccarino (physical simulation).
  • 10. Network alignment. Maximize $w^T x + \frac{1}{2} x^T S x$ subject to $Ax \le e$, $x_{ij} \in \{0, 1\}$. Sparse network alignment problems; quadratic assignment approaches often ignore sparse $L$ (with few exceptions). Bayati, Gerritsen, Gleich, Saberi, and Wang, ICDM2009; Bayati, Gleich, Saberi, and Wang, submitted.
  • 11. Overlapping clusters for distributed computation. Andersen, Gleich, and Mirrokni, WSDM2012. Figure: relative work versus volume ratio (how much more of the graph we need to store), showing swapping probability and PageRank communication for usroads and web-Google, with the Metis partitioner as reference.
  • 12. Local methods for massive network analysis. Gleich et al., J. Internet Mathematics, to appear. Top-k algorithm for Katz: approximate … where … is sparse; keep … sparse too; ideally, don’t “touch” all of …. Can solve these problems in milliseconds even with 100M edges!
  • 13. Rank aggregation. David F. Gleich (Purdue) & Lek-Heng Lim (Univ. Chicago).
  • 14. Which is a better list of good DVDs? Standard rank aggregation (the mean rating): Lord of the Rings 3: The Return of …; Lord of the Rings 1: The Fellowship; Lord of the Rings 2: The Two Towers; Lost: Season 1; Battlestar Galactica: Season 1; Fullmetal Alchemist; Trailer Park Boys: Season 4; Trailer Park Boys: Season 3; Tenchi Muyo!; Shawshank Redemption. Nuclear norm based rank aggregation (not matrix completion on the Netflix rating matrix): Lord of the Rings 3: The Return of …; Lord of the Rings 1: The Fellowship; Lord of the Rings 2: The Two Towers; Star Wars V: Empire Strikes Back; Raiders of the Lost Ark; Star Wars IV: A New Hope; Shawshank Redemption; Star Wars VI: Return of the Jedi; Lord of the Rings 3: Bonus DVD; The Godfather.
  • 15. Rank aggregation. Given partial orders on subsets of items, rank aggregation is the problem of finding an overall ordering. Voting: find the winning candidate. Program committees: find the best papers given reviews. Dining: find the best restaurant in Chicago.
  • 16. Ranking is really hard. Ken Arrow: all rank aggregations involve some measure of compromise. John Kemeny: a good ranking is the “average” ranking under a permutation distance. Dwork, Kumar, Naor, Sivakumar: NP hard to compute Kemeny’s ranking.
  • 17. Given a hard problem, what do you do? Numerically relax! It’ll probably be easier. Image: Embody chair, John Cantrell (flickr).
  • 18. Suppose we had scores. Let $s_i$ be the score of the ith movie/song/paper/team to rank. Suppose we can compare the ith to jth: $Y_{ij} = s_i - s_j$. Then $Y = se^T - es^T$ is skew-symmetric, rank 2. Also works for $s_i / s_j$ with an extra log. Numerical ranking is intimately intertwined with skew-symmetric matrices. Kemeny and Snell, Mathematical Models in Social Sciences (1978).
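The claim on slide 18 can be checked numerically. This is a small sketch with made-up scores: it builds the matrix of pairwise score differences and confirms that it is skew-symmetric with rank 2.

```python
import numpy as np

# Hypothetical scores for four items (illustrative values only).
s = np.array([3.0, 1.0, 2.0, 5.0])
e = np.ones_like(s)

# Y[i, j] = s[i] - s[j], i.e. Y = s e^T - e s^T.
Y = np.outer(s, e) - np.outer(e, s)

is_skew = np.allclose(Y, -Y.T)      # Y = -Y^T
rank = np.linalg.matrix_rank(Y)     # two rank-1 terms, so rank 2
```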
  • 19. Using ratings as comparisons. Ratings induce various skew-symmetric matrices: arithmetic mean, log-odds. From David 1988, The Method of Paired Comparisons.
  • 20. Extracting the scores. Given $Y$ with all entries, then $s = \frac{1}{n} Y e$ is the Borda count, the least-squares solution to …. How many $Y_{ij}$ do we have? Most. Do we trust all $Y_{ij}$? Not really. Netflix data: 17k movies, 500k users, 100M ratings; 99.17% filled. Figure: movie pairs versus number of comparisons (log-log axes).
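A short sketch of the score extraction on slide 20, assuming the Borda count is the row average of a fully known difference matrix, $s = \frac{1}{n} Y e$. The scores are invented for illustration. The row average returns the scores shifted to have zero mean, which leaves the ranking unchanged.

```python
import numpy as np

# Hypothetical true scores (illustrative values only).
s = np.array([3.0, 1.0, 2.0, 5.0])
Y = np.subtract.outer(s, s)         # Y[i, j] = s[i] - s[j]

# Borda-style estimate: average each row of Y, i.e. (1/n) * Y @ e.
n = len(s)
s_hat = Y @ np.ones(n) / n

# Ranking from best to worst, by estimated score.
order = np.argsort(-s_hat)
```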
  • 21. Only partial info? Complete it! Let $Y_{ij}$ be known for $(i, j) \in \Omega$. We trust these scores. Goal: find the simplest skew-symmetric matrix that matches the data, in a noiseless and a noisy formulation. Both of these are NP-hard too.
  • 22. Solution: go nuclear! Image from a French nuclear test in 1970, from http://picdit.wordpress.com/2008/07/21/8-insane-nuclear-explosions/
  • 23. The nuclear norm. The analog of the 1-norm or $\ell_1$-norm for matrices. For vectors: minimizing $\|x\|_0$ is NP-hard, while minimizing $\|x\|_1$ is convex and gives the same answer “under appropriate circumstances”. For matrices: let $A = U \Sigma V^T$ be the SVD. Minimizing the rank is NP-hard, while the nuclear norm $\|A\|_* = \sum_i \sigma_i$ is the best convex under-estimator of rank on the unit ball.
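To make the definition on slide 23 concrete, this sketch computes the nuclear norm as the sum of singular values and compares it with NumPy's built-in `ord='nuc'` norm. The matrix is an arbitrary example.

```python
import numpy as np

# A small rank-2 example: singular values are 4, 3, 0.
A = np.diag([3.0, 4.0, 0.0])

sigma = np.linalg.svd(A, compute_uv=False)
nuc_from_svd = sigma.sum()                 # sum of singular values
nuc_builtin = np.linalg.norm(A, ord='nuc')
rank = np.linalg.matrix_rank(A)            # number of nonzero singular values
```

The rank counts the nonzero singular values, while the nuclear norm adds them up, in the same way that the 1-norm of a vector relaxes its count of nonzeros.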
  • 24. Only partial info? Complete it! Let $Y_{ij}$ be known for $(i, j) \in \Omega$. We trust these scores. Goal: find the simplest skew-symmetric matrix that matches the data. The rank problem is NP hard; the nuclear norm heuristic is convex.
  • 25. Solving the nuclear norm problem. Use a LASSO formulation. 1. … 2. REPEAT 3. … = rank-k SVD of … 4. … 5. … 6. UNTIL … Jain et al. propose SVP for this problem without ….
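The exact steps on slide 25 did not survive extraction, so the following is only a sketch of a singular value projection (SVP) style iteration under stated assumptions: take a gradient step on the squared error over the known entries, then project back onto rank-k matrices with a truncated SVD. The function name `svp`, the step size, the iteration count, and the test data are all hypothetical choices, not the authors' settings.

```python
import numpy as np

def svp(Y, mask, k=2, step=1.0, iters=200):
    """Sketch of singular value projection (assumed form, not the slide's exact steps).

    Y    : matrix holding the observed values (ignored where mask is False)
    mask : boolean matrix, True where an entry is observed
    """
    X = np.zeros_like(Y)
    for _ in range(iters):
        # Gradient step on 0.5 * ||mask * (X - Y)||_F^2.
        G = X - step * (mask * (X - Y))
        # Project onto rank-k matrices via the truncated SVD.
        U, sig, Vt = np.linalg.svd(G, full_matrices=False)
        X = (U[:, :k] * sig[:k]) @ Vt[:k, :]
    return X

rng = np.random.default_rng(1)
n = 30
s = rng.standard_normal(n)            # hypothetical scores
Y = np.subtract.outer(s, s)           # skew-symmetric, rank 2

# Observe a symmetric random pattern of off-diagonal pairs.
upper = np.triu(rng.random((n, n)) < 0.6, 1)
mask = upper | upper.T

X = svp(Y, mask)
res_init = np.linalg.norm(mask * Y)            # residual of the zero start
res_final = np.linalg.norm(mask * (X - Y))     # residual after SVP

# With every entry observed, one step already returns Y.
X_full = svp(Y, np.ones((n, n), dtype=bool), iters=5)
```

Because the observed pattern is symmetric and the data are skew-symmetric, every iterate stays skew-symmetric up to rounding, so no explicit constraint is imposed.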
  • 26. Skew-symmetric SVDs. Let $A = -A^T$ be an $n \times n$ skew-symmetric matrix with eigenvalues $\imath\lambda_1, -\imath\lambda_1, \ldots, \imath\lambda_j, -\imath\lambda_j$, where $\lambda_i > 0$ and $j = \lfloor n/2 \rfloor$. Then the SVD of $A$ is given by $A = U \,\mathrm{diag}(\lambda_1, \lambda_1, \ldots, \lambda_j, \lambda_j)\, V^T$ for $U$ and $V$ given in the proof. Proof: use the Murnaghan-Wintner form and the SVD of a 2x2 skew-symmetric block. This means that SVP will give us the skew-symmetric constraint “for free”.
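A quick numerical check of the structure described on slide 26, using a random skew-symmetric matrix as a stand-in: the eigenvalues are purely imaginary, and the singular values come in equal pairs that match the magnitudes of those eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((6, 6))
A = B - B.T                                   # skew-symmetric: A = -A^T

sig = np.linalg.svd(A, compute_uv=False)      # singular values, descending
eig = np.linalg.eigvals(A)                    # eigenvalues +/- i*lambda

pairs_match = np.allclose(sig[0::2], sig[1::2])   # sigma_1 = sigma_2, sigma_3 = sigma_4, ...
imag_sorted = np.sort(np.abs(eig.imag))[::-1]     # |lambda| values, descending
```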
  • 27. Matrix completion. A fundamental question in matrix completion is: when do these problems (the NP-hard rank problem and its convex relaxation) have the same solution?
  • 28. Exact recovery results. David Gross showed how to recover Hermitian matrices, i.e. the conditions under which we get the exact …. Gross, arXiv, 2010. Note that $\imath Y$ is Hermitian. Thus our new result! Consider the operator basis for Hermitian matrices: $H = S \cup K \cup D$ where $S = \{1/\sqrt{2}\,(e_i e_j^T + e_j e_i^T) : 1 \le i < j \le n\}$; $K = \{\imath/\sqrt{2}\,(e_i e_j^T - e_j e_i^T) : 1 \le i < j \le n\}$; $D = \{e_i e_i^T : 1 \le i \le n\}$. Theorem 5. Let $s$ be centered, i.e., $s^T e = 0$. Let $Y = se^T - es^T$ where $\theta = \max_i s_i^2/(s^T s)$ and $\rho = ((\max_i s_i) - (\min_i s_i))/\|s\|$. Also, let $\Omega \subset H$ be a random set of elements with size $|\Omega| \ge O(2n\nu(1 + \beta)(\log n)^2)$ where $\nu = \max((n\theta + 1)/4, n\rho^2)$. Then the solution of: minimize $\|X\|_*$ subject to $\mathrm{trace}(X^* W_i) = \mathrm{trace}((\imath Y)^* W_i)$, $W_i \in \Omega$, is equal to $\imath Y$ with probability at least $1 - n^{-\beta}$. The proof of this theorem follows directly by Theorem 4 if …. We view this theorem as providing intuition for the noisy problem. Figure: fraction of trials recovered.
  • 29. Recovery discussion and experiments. Confession: if …, then just look at differences from a connected set. Constants? Not very good. Intuition for the truth.
  • 30. Recovery experiments. Confession: if …, then just look at differences from a connected set. Constants? Not very good. Intuition for the truth.
  • 31. The ranking algorithm. 0. INPUT … (ratings data) and $c$ (for trust on comparisons). 1. Compute $Y$ from the ratings. 2. Discard entries with fewer than $c$ comparisons. 3. Set … to be indices and values of what’s left. 4. … = SVP(…). 5. OUTPUT ….
  • 32. Synthetic evaluation: item response model. The synthetic results came from a model inspired by Ho and Quinn [2008]. … - center rating for user $i$; … - sensitivity of user $i$; … - value of item $j$; … - error level in ratings. Sample ratings uniformly at random such that there are … expected ratings per user.
  • 33. Evaluation. Figure 3: The performance of our algorithm (left) …. Panels: nuclear norm ranking and mean rating; axes: median Kendall’s tau versus error; curves labeled 1.5, 2, 5, 10, 20.
  • 34. Conclusions and future work. Our motto: “aggregate, then complete”. Rank aggregation with the nuclear norm is principled and easy to compute. The results are much better than simple approaches. Future work: 1. Additional comparison. 2. Noisy recovery! More realistic sampling. 3. Skew-symmetric Lanczos based SVD?
  • 35. Current research
  • 36. Data driven surrogate functions. Beyond spectral methods for UQ.
  • 37. Graph spectra
  • 38. Spectral spikes. Figure: labeled spikes at 1.33 (two!), 1.5, 0.5, 1.5, 0.565741, 1.833, 1.767592, 0.725708, 1.607625, 1.5 (two).
  • 39. Google nuclear ranking gleich