Models and Algorithms for
                      PageRank Sensitivity

                           David F. Gleich
                            Stanford University

                              Ph.D. Oral Defense
                           Institute for Computational
                         and Mathematical Engineering

                                May 26, 2009




Gleich (Stanford)                                        Ph.D. Defense   1 / 41
Outline
                     PageRank intro

                        Sensitivity

                    Random sensitivity

                       Inner-Outer

                        Summary




Five years!
                      2004           2009
                    Firefox 1.0    Firefox 3.5
                    Wikipedia?     Wikipedia! YouTube! Hulu!
                    Facebook?      Facebook! flickr! Twitter!
                      Gmail?         Gmail! Google Maps!
                     Yahoo!         Yahoo?
                     3.0 GHz       3.0 GHz × 4
                     Google         Google




PageRank intro
A cartoon websearch primer

1. Crawl webpages
2. Analyze webpage text (information retrieval)
3. Analyze webpage links
4. Fit measures to human evaluations
5. Produce rankings
6. Continually update




PageRank by Google
The places we find the surfer most often are important pages.

[Figure: example directed graph on six nodes, numbered 1 to 6]

The Model
1. follow edges uniformly with probability α, and
2. randomly jump with probability 1 − α; we'll assume everywhere is equally likely



Some PageRank details
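[Figure: the six-node example graph, mapped to its matrix P]

$$
P = \begin{bmatrix}
1/6 & 1/2 & 0   & 0   & 0 & 0 \\
1/6 & 0   & 0   & 1/3 & 0 & 0 \\
1/6 & 1/2 & 0   & 1/3 & 0 & 0 \\
1/6 & 0   & 1/2 & 0   & 0 & 0 \\
1/6 & 0   & 1/2 & 1/3 & 0 & 1 \\
1/6 & 0   & 0   & 0   & 1 & 0
\end{bmatrix},
\qquad P_{ij} \ge 0, \quad e^T P = e^T
$$

"jump"
$$
v = \left[ \tfrac{1}{n} \; \cdots \; \tfrac{1}{n} \right]^T,
\qquad v_i \ge 0, \quad e^T v = 1
$$

Markov chain    $[\alpha P + (1 - \alpha) v e^T]\, x = x$; unique $x$ with $x_j \ge 0$ and $e^T x = 1$.
Linear system   $(I - \alpha P)\, x = (1 - \alpha) v$
Small detail    dangling nodes patched back to v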
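As a small numerical check of the two formulations on this slide, the sketch below builds the 6-by-6 matrix P of the example graph, solves the linear system, and confirms that the solution is also a fixed point of the Markov chain. The use of NumPy, the dense direct solve, and the choice α = 0.85 are assumptions made for illustration; a dense solve is only sensible for a toy graph like this one.

```python
import numpy as np

# Column-stochastic matrix P of the six-node example graph.
# Column 1 is uniform because node 1 is dangling and is patched back to v.
P = np.array([
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
])
n = P.shape[0]
e = np.ones(n)
v = e / n        # uniform jump vector
alpha = 0.85     # illustrative value only

# Linear system form: (I - alpha P) x = (1 - alpha) v
x = np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)

# Markov chain form: x should be a fixed point of alpha P + (1 - alpha) v e^T
M = alpha * P + (1 - alpha) * np.outer(v, e)
residual = np.abs(M @ x - x).sum()

print("PageRank vector:", np.round(x, 4))
print("sum of entries:", x.sum())
print("fixed-point residual (1-norm):", residual)
```

The two forms agree because e^T x = 1, so the rank-one term (1 − α) v e^T x reduces to (1 − α) v.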
Other uses for PageRank
What else people use PageRank to do

GeneRank
ProteinRank
IsoRank
Clustering (graph partitioning)
Sports ranking
Teaching

Use $(I - \alpha G D^{-1}) x = w$ to find "nearby" important genes.

[Figure: gene expression heat map, with gene identifiers along the vertical axis and columns numbered 10 to 70]

Morrison et al. GeneRank, 2005.
My other projects
Prior PageRank

  Parallel Krylov Methods
  Gleich, Zhukov, and Berkhin, Yahoo! Research Labs Technical Report, YRL-2004-038; Gleich and Zhukov, SuperComputing poster, 2005.
  Does existing software work for computing PageRank on a cluster?

  Approximate Personal PageRank
  Gleich and Polito, Internet Math. 3(3):257-294, 2007.
  Can you build a web search engine on your PC?

Ongoing

  Parameterized Matrix Problems (with Paul Constantine)
  A(s)x(s) = b(s)
  Come back here for his defense on Monday, June 1st at 1:30pm!

  Network Alignment (with Mohsen Bayati, Margot Gerritsen, Amin Saberi, and Ying Wang)

My Software

  Packages: MatlabBGL, libbvg, gaimc, vismatrix, parameterized matrix package (with Paul)
  Publications: Random α PageRank, Inner-Outer PageRank
Sensitivity
Which sensitivity?

Sensitivity to the links: examined and understood

Sensitivity to the jump: examined, understood, and useful

Sensitivity to α: less well understood
PageRank on Wikipedia
α = 0.50                     α = 0.85                       α = 0.99
United States                United States                  C:Contents
C:Living people              C:Main topic classif.          C:Main topic classif.
France                       C:Contents                     C:Fundamental
Germany                      C:Living people                United States
England                      C:Ctgs. by country             C:Wikipedia admin.
United Kingdom               United Kingdom                 P:List of portals
Canada                       C:Fundamental                  P:Contents/Portals
Japan                        C:Ctgs. by topic               C:Portals
Poland                       C:Wikipedia admin.             C:Society
Australia                    France                         C:Ctgs. by topic

Note: Top 10 articles on Wikipedia with highest PageRank
The PageRank function
Look at the PageRank vector as a function of α
$$(I - \alpha P)\, x(\alpha) = (1 - \alpha) v$$
and examine its derivative.

My Contributions
Gleich, Glynn, Golub, Greif, Dagstuhl proceedings, 2007.
  Compute the derivative with just simple PageRank solves.
  Empirically evaluated the derivative as a rank change predictor.

Others
Golub and Greif, 2004; Boldi et al., 2005; Berkhin, 2005; Langville and Meyer, 2006.
  PageRank becomes more sensitive as α → 1.
  PageRank vector at α = 1 well defined.

α matters!
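One way to see why the derivative needs nothing beyond PageRank-style solves: differentiating (I − αP)x(α) = (1 − α)v with respect to α gives (I − αP)x'(α) = Px(α) − v, a second system with the same matrix. The sketch below checks that identity against a central finite difference on the six-node example graph. This is only an illustration of the differentiated equation, not necessarily the technique of the cited paper; NumPy, the dense solves, the step size h, and the function names are all assumptions.

```python
import numpy as np

# Column-stochastic matrix of the six-node example graph.
P = np.array([
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
])
n = P.shape[0]
v = np.full(n, 1.0 / n)

def pagerank(alpha):
    """Solve (I - alpha P) x = (1 - alpha) v with a dense direct solve."""
    return np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)

def pagerank_derivative(alpha):
    """Solve (I - alpha P) x' = P x - v, obtained by differentiating in alpha."""
    x = pagerank(alpha)
    return np.linalg.solve(np.eye(n) - alpha * P, P @ x - v)

alpha = 0.85
dx = pagerank_derivative(alpha)

# Central finite difference as an independent check.
h = 1e-6
fd = (pagerank(alpha + h) - pagerank(alpha - h)) / (2 * h)

print("derivative:", np.round(dx, 4))
print("max difference from finite difference:", np.abs(dx - fd).max())
```

Because every x(α) sums to one, the entries of the derivative sum to zero: increasing α only moves rank between pages.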
Random sensitivity
What is alpha?
           Author                                  α
           Brin and Page (1998)                    0.85
           Najork et al. (2007)                    0.85
           Litvak et al. (2006)                    0.5
           Experiment (slide 20)                   0.375
           Algorithms (...)                        ≥ 0.85



For you, α is clear
Google wants PageRank for everyone




Multiple surfers
             Each person picks α from distribution A




                                                            ...

               ↓                                     ↓
            x(E [A])                             E [x(A)]

                       x(E [A]) ≠ E [x(A)]
Random alpha PageRank
                                        RAPr




Model PageRank as the random variables

                                        x(A)

and look at
                         E [x(A)] and Std [x(A)] .




                            Gleich and Constantine, Workshop on Algorithms on the Web Graph, 2007
What is A?
A ∼ Beta(a, b, l, r)

[Figure: densities on the interval from 0 to 1 for the following choices]
             Beta(0,0,0.6,0.9)
             Beta(2,16,0,1)
             Beta(1,1,0.1,0.9)
             Beta(−0.5,−0.5,0.2,0.7)
Alpha is
[Figure: histogram of α with a density fit and Beta(1.5,0.5); horizontal axis α from 0 to 1, vertical axis density from 0 to 2]

mean   0.375
mode   0.25

Data provided by Abraham Flaxman and Asela Gunawardana at Microsoft.
Example
[Figure: the six-node example graph beside one plot per component, x1 through x6, on a horizontal axis from 0 to 0.5]
What changes?
$x(A)$, with $A \sim \mathrm{Beta}(a, b, l, r)$ and $0 \le l < r \le 1$

1. $E[x_i(A)] \ge 0$ and $\| E[x(A)] \| = 1$;
   thus $E[x(A)]$ is a probability distribution.

2. $E[x(A)] = \sum_{\ell=0}^{\infty} E\left[ A^{\ell} - A^{\ell+1} \right] P^{\ell} v$;
   thus we can interpret $E[x(A)]$ in length-$\ell$ paths.

3. for page $i$ with no in-links, $x_i(A) = (1 - A)\, v_i$;
   thus $E[x_i(A)] = x_i(E[A])$ and $\mathrm{Std}[x_i(A)] = v_i \, \mathrm{Std}[A]$
   But is this one useful?
RAPr on Wikipedia
E [x(A)]                                  Std [x(A)]
United States                             United States
C:Living people                           C:Living people
France                                    C:Main topic classif.
United Kingdom                            C:Contents
Germany                                   C:Ctgs. by country
England                                   United Kingdom
Canada                                    France
Japan                                     C:Fundamental
Poland                                    England
Australia                                 C:Ctgs. by topic

Std vs. PageRank
Does it tell us more than just PageRank?
uk2006: 77M nodes and 2B edges

$$\mathrm{isim}(k) = \frac{1}{k} \sum_{i=1}^{k} \frac{1}{2i} \left| \mathrm{Diff}[Y(1{:}i), Z(1{:}i)] \right|$$

[Figure: Intersection Similarity (k) versus k, for k from 10^0 to 10^6 on a log scale; vertical axis from 0 (identical) to 1 (disjoint); curves for Std[x(A1)] vs. x(0.85), Std[x(A2)] vs. x(0.5), and Std[x(A3)] vs. x(0.85)]

Kendall's τ
τ(x(E1), S1) = +0.3
τ(x(E2), S2) = −0.5
τ(x(0.85), S3) = −0.2

A1 ∼ Beta(2, 16, [0, 1])
A2 ∼ Beta(1, 1, [0, 1])
A3 ∼ Beta(0.5, 1.5, [0, 1])
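The intersection similarity formula can be sketched in a few lines of Python. The sketch assumes Y and Z are ranked lists with the best item first, and that Diff means the symmetric difference of the two top-i prefixes; the function name and argument layout are illustrative choices.

```python
def isim(Y, Z, k):
    """Intersection similarity of two rankings over their top-k prefixes.

    Averages, over i = 1..k, the size of the symmetric difference between
    the top-i items of Y and of Z, normalized by 2i.
    Returns 0.0 for identical prefixes and 1.0 for disjoint ones.
    """
    total = 0.0
    for i in range(1, k + 1):
        top_y, top_z = set(Y[:i]), set(Z[:i])
        total += len(top_y ^ top_z) / (2 * i)
    return total / k


same = isim(["a", "b", "c"], ["a", "b", "c"], 3)
disjoint = isim(["a", "b"], ["c", "d"], 2)
swapped = isim([1, 2, 3], [1, 3, 2], 3)
print(same, disjoint, swapped)
```

Swapping the second and third items only affects the i = 2 term, so the score stays close to zero; this matches the plot's reading of 0 as identical and 1 as disjoint.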
Computation
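1. Monte Carlo
$$E[x(A)] \approx \frac{1}{N} \sum_{i=1}^{N} x(\alpha_i), \qquad \alpha_i \sim A$$

2. path damping
$$E[x(A)] \approx \sum_{\ell=0}^{N} E\left[ A^{\ell} - A^{\ell+1} \right] P^{\ell} v$$

3. quadrature
$$E[x(A)] = \int_{l}^{r} x(\alpha) \, d\rho(\alpha) \approx \sum_{i=1}^{N} x(\zeta_i) \, \omega_i$$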
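The three estimators can be compared on the six-node example graph with a short Python sketch. To sidestep the Beta parameterization, it assumes A is uniform on [l, r] with l = 0.6 and r = 0.9, so the moments E[A^k] have a closed form and Gauss-Legendre quadrature applies directly. The helper names, sample counts, and dense solves are illustrative assumptions, not the implementation behind the slides.

```python
import numpy as np

# Column-stochastic matrix of the six-node example graph.
P = np.array([
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
])
n = P.shape[0]
v = np.full(n, 1.0 / n)
l, r = 0.6, 0.9   # assumed support; A is taken uniform on [l, r]

def pagerank(alpha):
    """Solve (I - alpha P) x = (1 - alpha) v with a dense direct solve."""
    return np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)

def monte_carlo(N, seed=0):
    """Average N PageRank vectors at alphas sampled from A."""
    rng = np.random.default_rng(seed)
    alphas = rng.uniform(l, r, size=N)
    return np.mean([pagerank(a) for a in alphas], axis=0)

def moment(k):
    """E[A^k] for A uniform on [l, r]."""
    return (r ** (k + 1) - l ** (k + 1)) / ((k + 1) * (r - l))

def path_damping(N):
    """Truncated series: sum over k = 0..N of E[A^k - A^(k+1)] P^k v."""
    total = np.zeros(n)
    Pkv = v.copy()                      # holds P^k v
    for k in range(N + 1):
        total += (moment(k) - moment(k + 1)) * Pkv
        Pkv = P @ Pkv
    return total

def quadrature(N):
    """N-point Gauss-Legendre rule mapped from [-1, 1] to [l, r]."""
    nodes, weights = np.polynomial.legendre.leggauss(N)
    zetas = 0.5 * (r - l) * nodes + 0.5 * (r + l)
    omegas = 0.5 * weights              # uniform density folded in; sums to 1
    return sum(w * pagerank(z) for z, w in zip(zetas, omegas))

ex_mc = monte_carlo(2000)
ex_pd = path_damping(400)
ex_gq = quadrature(10)
print(np.round(ex_mc, 4))
print(np.round(ex_pd, 4))
print(np.round(ex_gq, 4))
```

In this toy setting the quadrature and path damping results agree to many digits, while the Monte Carlo estimate matches only to a few, which is the qualitative behavior one would expect from the three error rates.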
Time
cnr2000: 325k nodes and 3M edges

[Figure: log-log plot against Time (sec), from 10^−2 to 10^4, with the vertical axis running from 10^0 down to 10^−15; curves for Monte Carlo, Path Damping, and Quadrature]
Convergence theory
Method                 Conv.               Work Required                  What is N?
Monte Carlo            1/sqrt(N)           N PageRank systems             number of samples from A
Path Damping           r^(N+2) / N^(1+a)   N + 1 matrix vector products   terms of Neumann series
(without Std [x(A)])
Gaussian Quadrature    r^(2N)              N PageRank systems             number of quadrature points

a and r are parameters from Beta(a, b, l, r)
Webspam application
    Hosts of uk-2006 are labeled as spam, not-spam, other

                            P            R                        f       FP          FN
 Baseline                   0.694        0.558                    0.618   0.034       0.442

 Beta(0.5,1.5)              0.695        0.561                    0.621   0.034       0.439
 Beta(1,1)                  0.698        0.562                    0.622   0.033       0.438
 Beta(2,16)                 0.699        0.562                    0.623   0.033       0.438



Note: Bagged (10) J48 decision tree classifier in Weka, mean of 50 repetitions from 10-fold cross-validation of 4948 non-spam and 674 spam hosts (5622 total).
Becchetti et al. Link analysis for Web spam detection, 2008.
Inner-Outer
Motivation
                        Why another PageRank algorithm?

For the RAPr codes, we need (ordered from fancy to simple)
 1. reliable code
 2. fast code over a range of α’s
         → Use Matlab’s “\”
 3. code for big problems
         → Use a Gauss-Seidel or custom Richardson method
 4. code with only matvec products
         → Use the inner-outer iteration
 5. code with only 2 vectors of memory
         → Use the power method
Inner-Outer
Note: PageRank is easier when α is smaller
Thus: Solve PageRank with itself using β < α!

Outer
$$(I - \beta P)\, x^{(k+1)} = (\alpha - \beta) P x^{(k)} + (1 - \alpha) v \equiv f^{(k)}$$

Inner
$$y^{(j+1)} = \beta P y^{(j)} + (\alpha - \beta) P x^{(k)} + (1 - \alpha) v$$

A new parameter? What is β?      0.5
How many inner iterations?       Until a residual of $10^{-2}$

Gray, Greif, Lau, 2007.
Inner-Outer algorithm
 Input: P, v, α, τ, (β = 0.5, η = 10^−2)
 Output: x

 x ← v
 y ← Px
 while ‖αy + (1 − α)v − x‖₁ ≥ τ
     f ← (α − β)y + (1 − α)v
     repeat
         x ← f + βy
         y ← Px
     until ‖f + βy − x‖₁ < η
 end while
 x ← αy + (1 − α)v

 If 0 ≤ β ≤ α, convergence with any η.
 Uses only three vectors of memory.
 With β = 0.5, η = 10^−2: often faster than the power method (or just a titch slower).

Note: the inner loop checks its condition after doing one iteration.
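A direct Python transcription of the pseudocode may help when experimenting with β and η. It is a minimal sketch: the function name, the returned multiplication count, and the `max_mults` safety cap are additions for illustration, and the dense NumPy matrix stands in for whatever sparse matvec a real code would use.

```python
import numpy as np

def inner_outer(P, v, alpha, tol=1e-7, beta=0.5, eta=1e-2, max_mults=100000):
    """Inner-outer iteration, following the slide's pseudocode.

    P is column-stochastic, v is the jump vector, and tol plays the role
    of tau. max_mults is an added safety cap, not part of the pseudocode.
    Returns the PageRank vector and the number of products with P.
    """
    x = v.copy()
    y = P @ x
    mults = 1
    while np.abs(alpha * y + (1 - alpha) * v - x).sum() >= tol:
        f = (alpha - beta) * y + (1 - alpha) * v
        while True:  # repeat ... until: the test runs after one pass
            x = f + beta * y
            y = P @ x
            mults += 1
            if mults >= max_mults:
                raise RuntimeError("multiplication budget exhausted")
            if np.abs(f + beta * y - x).sum() < eta:
                break
    return alpha * y + (1 - alpha) * v, mults

# Six-node example graph (column-stochastic, dangling node 1 patched to v).
P = np.array([
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
])
n = P.shape[0]
v = np.full(n, 1.0 / n)
alpha = 0.85

x_io, mults = inner_outer(P, v, alpha, tol=1e-10)
x_direct = np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)
print("multiplications:", mults)
print("max difference from direct solve:", np.abs(x_io - x_direct).max())
```

Once the inner loop exits after a single pass, each outer step reduces to x ← αPx + (1 − α)v, which is exactly a power-method step; the sketch therefore keeps only x, y, and f besides the fixed inputs.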
Performance
[Figure: Residual versus Multiplication for the power method (power) and the inner-outer iteration (inout); two panels, wb-edu with α = 0.85 and wb-edu with α = 0.99; residual axis from 10^0 down to 10^−7, each panel with an inset of the first multiplications]

τ = 10^−7, β = 0.5, η = 10^−2;
wb-edu graph (9.8M nodes, 57.M edges)
Extensions

1. A large scale shared-memory parallel version on
   compressed web graphs
2. A Gauss-Seidel variant
3. A BiCG-STAB preconditioner
4. A conjecture about the performance of the iteration
5. Showed the algorithm converges for “any” β, η




                                           Gleich, Gray, Greif, Lau, submitted.
Convergence Result
Sketch of convergence result
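1. error after j steps of the inner iteration
$$f^{(j)} = \left( \alpha \beta^{j-1} P^{j} + \frac{\alpha - \beta}{\beta} \sum_{\ell=1}^{j-1} \beta^{\ell} P^{\ell} \right) f^{(0)}$$

2. upper bound error by
$$\| f^{(j)} \| \le \frac{(\alpha - \beta) + (1 - \alpha) \beta^{j}}{1 - \beta} \, \| f^{(0)} \|.$$

3. notice
$$\| f^{(j)} \| \le \alpha \, \| f^{(0)} \|, \qquad j \ge 1$$

4. hence, convergence as long as β ≤ α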
Summary
Conclusions


α matters
sensitivity is useful
everything is just PageRank




Contributions
 1. Derivative
 Gleich, Glynn, Golub, Greif, 2007.

        New technique to compute the derivative using just PageRank

2. RAPr
Constantine and Gleich, 2007; Constantine, Gleich, and Iaccarino, submitted.

       New PageRank model and sensitivity measure
       Range of algorithms and algorithmic analysis
       Empirically helpful for spam identification
       Robust software

3. Inner-Outer
Gleich, Gray, Greif, Lau, submitted.

       Improved convergence analysis
       Gauss-Seidel and preconditioning variants
       Shared-memory parallel implementation
       Robust software
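
For the derivative contribution (item 1), differentiating the PageRank system (I − αP)x(α) = (1 − α)v with respect to α gives (I − αP)x′(α) = Px(α) − v, which has the same matrix as the PageRank system but a different right-hand side. The sketch below illustrates only that identity on a made-up example. It is not necessarily the technique from the cited paper, it uses dense solves rather than PageRank iterations, and the names `pagerank`, `pagerank_derivative`, `P`, and `v` are assumptions for the example.

```python
import numpy as np

# Hypothetical 4-node column-stochastic matrix and uniform jump vector.
P = np.array([
    [0.0, 0.5, 0.0, 0.0],
    [1/3, 0.0, 0.0, 0.5],
    [1/3, 0.5, 0.0, 0.5],
    [1/3, 0.0, 1.0, 0.0],
])
v = np.full(4, 0.25)


def pagerank(P, v, alpha):
    """Solve (I - alpha P) x = (1 - alpha) v with a dense solve (small graphs only)."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)


def pagerank_derivative(P, v, alpha):
    """Solve (I - alpha P) x' = P x - v for the derivative of x with respect to alpha."""
    n = P.shape[0]
    x = pagerank(P, v, alpha)
    return np.linalg.solve(np.eye(n) - alpha * P, P @ x - v)


alpha = 0.85
dx = pagerank_derivative(P, v, alpha)

# Central finite difference as an independent check of the derivative.
h = 1e-6
fd = (pagerank(P, v, alpha + h) - pagerank(P, v, alpha - h)) / (2 * h)
```

Because x(α) sums to 1 for every α, the entries of `dx` sum to zero; the finite difference `fd` gives a second check.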


Thanks!


                    Michael Saunders (My Advisor)
                        Hector Garcia-Molina
                              Chen Greif
                              Art Owen
                             Amin Saberi




Thanks Gene!
Margot Gerritsen    Debbie Heimowitz
Peter Glynn         Jason Azicri
Walter Murray       Steven Fan
Reid Andersen       Paul Constantine
Pavel Berkhin       Michael Atkinson
Kevin Lang          Jeremy Kozdon
Amy Langville       Esteban Arcaute
Matthew Rasmussen
Sebastiano Vigna
                    Adam Guetz
                    Will Fong
Leonid Zhukov       Andrew Bradley
Indira Choudhury
Seth Tornborg
                    Nick Henderson
                    Chris Maes
Brian Tempero       Nicole Taheri
Prisilla Williams   Ying Wang
Deb Michael         Nick West
Mayita Romero       Kaustuv's Rum
Les Fletcher        Saeco Coffee Machine
Hugh Fletcher       Napa Valley
Lindsey Fletcher    Matlab
Jane Fletcher       superlu

THANK YOU
Ph.D. Defense: Models and Algorithms for PageRank sensitivity

  • 1. Models and Algorithms for PageRank Sensitivity David F. Gleich Stanford University Ph.D. Oral Defense Institute for Computational and Mathematical Engineering May 26, 2009 Gleich (Stanford) Ph.D. Defense 1 / 41
  • 2. Outline PageRank intro Sensitivity Random sensitivity Inner-Outer Summary Gleich (Stanford) Ph.D. Defense 2 / 41
  • 3. Five years! 2004 2009 Firefox 1.0 Firefox 3.5 Wikipedia? Wikipedia! YouTube! Hulu! Facebook? Facebook! flickr! Twitter! Gmail? Gmail! Google Maps! Yahoo! Yahoo? 3.0 GHz 3.0 GHz × 4 Google Google Gleich (Stanford) Ph.D. Defense 3 / 41
  • 4. PageRank intro Sensitivity PageRank intro Random sensitivity Slide 4 of 41 Inner-Outer Summary
  • 5. A cartoon websearch primer 1. Crawl webpages 2. Analyze webpage text (information retrieval) 3. Analyze webpage links 4. Fit measures to human evaluations 5. Produce rankings 6. Continually update Gleich (Stanford) PageRank intro Ph.D. Defense 5 / 41
  • 6. 1 2 to 3 Gleich (Stanford) PageRank intro Ph.D. Defense 6 / 41
  • 7. PageRank by Google The places we find the surfer most often are im- portant pages. 3 The Model 2 5 1. follow edges uniformly with 4 probability α, and 2. randomly jump with probability 1 6 1 − α, we’ll assume everywhere is equally likely Gleich (Stanford) PageRank intro Ph.D. Defense 7 / 41
  • 8. Some PageRank details 3   2 5 1/ 6 1/ 2 0 0 0 0 4  1/ 6 0 0 1/ 3 0 0 P j ≥0 →  1/ 6 1/ 2 0 1/ 3 0 0  1/ 6 0 1/ 2 0 0 0 eT P=eT 1/ 6 0 1/ 2 1/ 3 0 1 1/ 6 0 0 0 1 0 1 6 P T ≥0 “jump” → v=[1 n ... 1 n ] eT v=1 Markov chain αP + (1 − α)veT x = x unique x ⇒ j ≥ 0, eT x = 1. Linear system ( − αP)x = (1 − α)v Small detail dangling nodes patched back to v Gleich (Stanford) PageRank intro Ph.D. Defense 8 / 41
  • 9. Other uses for PageRank What else people use PageRank to do GeneRank ProteinRank NM_003748 NM_003862 Contig32125_RC U82987 AB037863 NM_020974 Contig55377_RC NM_003882 NM_000849 Contig48328_RC IsoRank Contig46223_RC NM_006117 NM_003239 NM_018401 AF257175 AF201951 NM_001282 Contig63102_RC NM_000286 Contig34634_RC NM_000320 AB033007 AL355708 NM_000017 NM_006763 AF148505 Contig57595 NM_001280 AJ224741 U45975 Contig49670_RC Contig753_RC Contig25055_RC Contig53646_RC Contig42421_RC Contig51749_RC AL137514 NM_004911 NM_000224 NM_013262 Contig41887_RC NM_004163 AB020689 NM_015416 Contig43747_RC NM_012429 AB033043 AL133619 NM_016569 NM_004480 NM_004798 Contig37063_RC NM_000507 AB037745 Contig50802_RC NM_001007 Contig53742_RC NM_018104 Contig51963 Contig53268_RC NM_012261 NM_020244 Contig55813_RC Contig27312_RC Contig44064_RC NM_002570 NM_002900 AL050090 NM_015417 Contig47405_RC NM_016337 Contig55829_RC Contig37598 Contig45347_RC NM_020675 NM_003234 AL080110 AL137295 Contig17359_RC NM_013296 NM_019013 AF052159 Contig55313_RC NM_002358 NM_004358 Contig50106_RC NM_005342 NM_014754 U58033 Contig64688 NM_001827 Contig3902_RC Contig41413_RC NM_015434 NM_014078 NM_018120 NM_001124 L27560 Contig45816_RC AL050021 NM_006115 NM_001333 NM_005496 Contig51519_RC Contig1778_RC NM_014363 NM_001905 NM_018454 NM_002811 NM_004603 AB032973 NM_006096 D25328 Contig46802_RC X94232 NM_018004 Contig8581_RC Clustering Contig55188_RC Contig50410 Contig53226_RC NM_012214 NM_006201 NM_006372 Contig13480_RC AL137502 Contig40128_RC NM_003676 NM_013437 Contig2504_RC AL133603 NM_012177 R70506_RC NM_003662 NM_018136 NM_000158 NM_018410 Contig21812_RC NM_004052 Contig4595 Contig60864_RC NM_003878 U96131 NM_005563 NM_018455 Contig44799_RC NM_003258 NM_004456 NM_003158 NM_014750 Contig25343_RC NM_005196 Contig57864_RC NM_014109 NM_002808 Contig58368_RC Contig46653_RC NM_004504 M21551 NM_014875 NM_001168 NM_003376 NM_018098 AF161553 NM_020166 NM_017779 NM_018265 AF155117 NM_004701 NM_006281 
Contig44289_RC NM_004336 Contig33814_RC (graph partitioning) NM_003600 NM_006265 NM_000291 NM_000096 NM_001673 NM_001216 NM_014968 NM_018354 NM_007036 NM_004702 Contig2399_RC NM_001809 Contig20217_RC NM_003981 NM_007203 NM_006681 AF055033 NM_014889 NM_020386 NM_000599 Contig56457_RC NM_005915 Contig24252_RC Contig55725_RC NM_002916 NM_014321 NM_006931 AL080079 Contig51464_RC NM_000788 NM_016448 X05610 NM_014791 Contig40831_RC AK000745 NM_015984 NM_016577 Contig32185_RC AF052162 AF073519 NM_003607 NM_006101 NM_003875 Contig25991 Contig35251_RC NM_004994 NM_000436 NM_002073 NM_002019 NM_000127 NM_020188 AL137718 Contig28552_RC Contig38288_RC AA555029_RC NM_016359 Contig46218_RC Contig63649_RC AL080059 10 20 30 40 50 60 70 Sports ranking Use ( − αGD−1 )x = w to find “nearby” important genes. Teaching Morrison et al. GeneRank, 2005. Gleich (Stanford) PageRank intro Ph.D. Defense 9 / 41
  • 10. My other projects Prior PageRank Parallel Krylov Methods Approximate Personal Gleich, Zhukov, and Berkhin , Yahoo! Research Labs PageRank Technical Report, YRL-2004-038; Gleich and Zhukov, Gleich and Polito, Internet Math. 3(3):257 294, SuperComputing poster, 2005. 2007. Does existing software work for computing PageRank Can you build a web search engine on your PC? on a cluster? Parameterized Matrix Ongoing Network Alignment Problems Come back here for (with Mohsen Bay- j Square j s r (with Paul Constantine) his defense on Monday, ati, Margot Gerritsen, June 1st at 1:30pm! Amin Saberi, and Ying A(s)x(s) = b(s) Wang) t t My Software Packages Publications MatlabBGL vismatrix Random α PageRank libbvg parameterized Inner-Outer PageRank matrix package gaimc (with Paul) Gleich (Stanford) PageRank intro Ph.D. Defense 10 / 41
  • 11. PageRank intro Sensitivity Sensitivity Random sensitivity Slide 11 of 41 Inner-Outer Summary
  • 12. Which sensitivity? Sensitivity to the links : examined and understood Sensitivity to the jump : examined, understood, and useful Sensitivity to α : less well understood Gleich (Stanford) Sensitivity Ph.D. Defense 12 / 41
  • 13. PageRank on Wikipedia α = 0.50 α = 0.85 α = 0.99 United States United States C:Contents C:Living people C:Main topic classif. C:Main topic classif. France C:Contents C:Fundamental Germany C:Living people United States England C:Ctgs. by country C:Wikipedia admin. United Kingdom United Kingdom P:List of portals Canada C:Fundamental P:Contents/Portals Japan C:Ctgs. by topic C:Portals Poland C:Wikipedia admin. C:Society Australia France C:Ctgs. by topic Note Top 10 articles on Wikipedia with highest PageRank Gleich (Stanford) Sensitivity Ph.D. Defense 13 / 41
  • 14. The PageRank function Look at the PageRank vector as a function of α ( − αP)x(α) = (1 − α)v and examine its derivative. My Contributions Gleich, Glynn, Golub, Greif, Dagstuhl proceedings, 2007. Others Compute the derivative with just PageRank becomes simple PageRank solves. more sensitive as α → 1. Empirically evaluated the PageRank vector at derivative as a rank change α = 1 well defined. predictor. α matters! Golub and Greif, 2004; Boldi et al., 2005; Berkhin, 2005; Langville and Meyer, 2006. Gleich (Stanford) Sensitivity Ph.D. Defense 14 / 41
  • 15. PageRank intro Random Sensitivity sensitivity Random sensitivity Slide 15 of 41 Inner-Outer Summary
  • 16. What is alpha? Author α Brin and Page (1998) 0.85 Najork et al. (2007) 0.85 Litvak et al. (2006) 0.5 Experiment (slide 20) 0.375 Algorithms (...) ≥ 0.85 For you, α is clear Google wants PageRank for everyone Gleich (Stanford) Random sensitivity Ph.D. Defense 16 / 41
  • 17. Multiple surfers Each person picks α from distribution A ... ↓ ↓ x(E [A]) E [x(A)] x(E [A]) = E [x(A)] Gleich (Stanford) Random sensitivity Ph.D. Defense 17 / 41
  • 18. Random alpha PageRank RAPr Model PageRank as the random variables x(A) and look at E [x(A)] and Std [x(A)] . Gleich and Constantine, Workshop on Algorithms on the Web Graph, 2007 Gleich (Stanford) Random sensitivity Ph.D. Defense 18 / 41
  • 19. What is A? Beta(0,0,0.6,0.9) Beta(2,16,0,1) Beta(1,1,0.1,0.9) Beta(−0.5,−0.5,0.2,0.7) 0 1 Bet ( , b, , r) Gleich (Stanford) Random sensitivity Ph.D. Defense 19 / 41
  • 20. Alpha is 2 Histogram 1.8 Density Fit Beta(1.5,0.5) 1.6 mean 0.375 1.4 mode 0.25 1.2 density 1 0.8 0.6 0.4 0.2 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 α Data provided by Abraham Flaxman and Asela Gunawardana at Microsoft. Gleich (Stanford) Random sensitivity Ph.D. Defense 20 / 41
  • 21. Example x1 3 x 2 2 5 x 3 4 x4 1 6 x 5 x 6 0 0.5 Gleich (Stanford) Random sensitivity Ph.D. Defense 21 / 41
  • 22. What changes? x(A) A ∼ Bet ( , b, , r) with 0 ≤ < r ≤ 1 1. E [ (A)] ≥ 0 and E [x(A)] = 1; thus E [x(A)] is a probability distribution. ∞ 2. E [x(A)] = ℓ=0 E Aℓ − Aℓ+1 Pℓ v; thus we can interpret E [x(A)] in length-ℓ paths. 3. for page with no in-links, (A) = (1 − A) ; thus E [ (A)] = (E [A]) and Std [ (A)] = Std [A] But is this one useful? Gleich (Stanford) Random sensitivity Ph.D. Defense 22 / 41
  • 23. RAPr on Wikipedia E [x(A)] Std [x(A)] United States United States C:Living people C:Living people France C:Main topic classif. United Kingdom C:Contents Germany C:Ctgs. by country England United Kingdom Canada France Japan C:Fundamental Poland England Australia C:Ctgs. by topic Gleich (Stanford) Random sensitivity Ph.D. Defense 23 / 41
  • 24. Std vs. PageRank Does it tell us more than just PageRank? uk2006 — 77M nodes and 2B edges 1 k 1 isim(k) = k =1 2 |Diff[Y(1: ), Z(1: )]| Disjoint 1 Std[x(A )] vs. x(0.85) 1 Std[x(A2)] vs. x(0.5) Kendall’s τ 0.8 τ(x(E1 ), S1 ) = +0.3 Intersection Similarity (k) Std[x(A )] vs. x(0.85) 3 0.6 τ(x(E2 ), S2 ) = −0.5 0.4 τ(x(0.85), S3 ) = −0.2 0.2 Identical 0 0 2 4 6 10 10 10 10 k A1 ∼ Bet (2, 16, [0, 1]) A2 ∼ Bet (1, 1, [0, 1]) A3 ∼ Bet (0.5, 1.5, [0, 1]) Gleich (Stanford) Random sensitivity Ph.D. Defense 24 / 41
  • 25. Computation 1. monte carlo 1 N E [x(A)] = N =1 x(α ) α ∼A 2. path damping N E [x(A)] ≈ =0 E A − A +1 P v 3. quadrature r N E [x(A)] = x(α) dρ(α) ≈ =1 x(ζ )ω Gleich (Stanford) Random sensitivity Ph.D. Defense 25 / 41
  • 26. Time cnr2000 — 325k nodes and 3M edges 0 10 −5 10 −10 10 Monte Carlo Path Damping Quadrature −15 10 −2 −1 0 1 2 3 4 10 10 10 10 10 10 10 Time (sec) Gleich (Stanford) Random sensitivity Ph.D. Defense 26 / 41
  • 27. Convergence theory Method Conv. Work Required What is N? 1 number of Monte Carlo N PageRank systems N samples from A Path Damping r N+2 N + 1 matrix vector terms of (without N1+ products Neumann series Std [x(A)]) number of Gaussian r 2N N PageRank systems quadrature Quadrature points and r are parameters from Bet ( , b, , r) Gleich (Stanford) Random sensitivity Ph.D. Defense 27 / 41
  • 28. Webspam application. Hosts of uk-2006 are labeled as spam, not-spam, or other.
     | P | R | f | FP | FN
    Baseline | 0.694 | 0.558 | 0.618 | 0.034 | 0.442
    Beta(0.5,1.5) | 0.695 | 0.561 | 0.621 | 0.034 | 0.439
    Beta(1,1) | 0.698 | 0.562 | 0.622 | 0.033 | 0.438
    Beta(2,16) | 0.699 | 0.562 | 0.623 | 0.033 | 0.438
    Note: Bagged (10) J48 decision tree classifier in Weka; mean of 50 repetitions from 10-fold cross-validation of 4948 non-spam and 674 spam hosts (5622 total). Becchetti et al., Link analysis for Web spam detection, 2008.
  • 29. Inner-Outer (section divider, slide 29 of 41). Outline: PageRank intro, Sensitivity, Random sensitivity, Inner-Outer, Summary.
  • 30. Motivation. Why another PageRank algorithm? For the RAPr codes, we need the following (solver choices run from fancy to simple):
    1. reliable code
    2. fast code over a range of α's → Use Matlab's "\"
    3. code for big problems → Use a Gauss-Seidel or custom Richardson method
    4. code with only matvec products → Use the inner-outer iteration
    5. code with only 2 vectors of memory → Use the power method
  • 31. Inner-Outer
    Note: PageRank is easier when α is smaller. Thus: solve PageRank with itself using β < α!
    Outer: $(I - \beta P)\, x^{(k+1)} = (\alpha - \beta) P x^{(k)} + (1 - \alpha) v \equiv f^{(k)}$
    Inner: $y^{(j+1)} = \beta P y^{(j)} + (\alpha - \beta) P x^{(k)} + (1 - \alpha) v$
    A new parameter? What is β? 0.5. How many inner iterations? Until a residual of $10^{-2}$.
    Gray, Greif, Lau, 2007.
  • 32. Inner-Outer algorithm
    Input: P, v, α, τ, (β = 0.5, η = 10^-2)
    Output: x
      x ← v
      y ← Px
      while ‖αy + (1 − α)v − x‖_1 ≥ τ
        f ← (α − β)y + (1 − α)v
        repeat
          x ← f + βy
          y ← Px
        until ‖f + βy − x‖_1 < η
      end while
      x ← αy + (1 − α)v
    Side notes: if 0 ≤ β ≤ α, convergence with any η; uses only three vectors of memory; with β = 0.5, η = 10^-2, often faster than the power method (or just a titch slower).
    Note: the inner loop checks its condition after doing one iteration.
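The pseudocode on this slide translates almost line for line into NumPy. The following is a minimal sketch, assuming a small dense column-stochastic P on a hypothetical 4-node graph; the function name `inner_outer` and the multiplication counter are additions for illustration, and a real implementation would use a sparse matrix.

```python
import numpy as np


def inner_outer(P, v, alpha, tol, beta=0.5, eta=1e-2):
    """Inner-outer iteration for (I - alpha P) x = (1 - alpha) v.

    Returns the solution estimate and the number of products with P.
    """
    x = v.copy()
    y = P @ x
    n_mult = 1
    # Outer loop: stop once the PageRank residual drops below tol.
    while np.linalg.norm(alpha * y + (1 - alpha) * v - x, 1) >= tol:
        f = (alpha - beta) * y + (1 - alpha) * v
        # Inner loop (repeat ... until): always performs at least one step.
        while True:
            x = f + beta * y
            y = P @ x
            n_mult += 1
            if np.linalg.norm(f + beta * y - x, 1) < eta:
                break
    # One final power-method step reusing the product already computed.
    x = alpha * y + (1 - alpha) * v
    return x, n_mult


# Hypothetical toy graph: edge (src, dst), no dangling nodes.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 0), (3, 2)]
n = 4
P = np.zeros((n, n))
for src, dst in edges:
    P[dst, src] = 1.0
P /= P.sum(axis=0)  # column-stochastic
v = np.full(n, 1.0 / n)

alpha = 0.85
x, n_mult = inner_outer(P, v, alpha, tol=1e-10)
x_direct = np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)
```

Only `x`, `y`, and `f` are updated inside the loops, which reflects the three-vector memory footprint noted on the slide (with `v` held as input).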
  • 33. Performance. [Figure: two panels, "wb-edu, α = 0.85" and "wb-edu, α = 0.99", each plotting Residual (log scale, 10^0 down to 10^-7) against Multiplication (up to 80 for α = 0.85, up to 1200 for α = 0.99) for the power and inout methods.] Parameters: τ = 10^-7, β = 0.5, η = 10^-2; wb-edu graph (9.8M nodes, 57.M edges).
  • 34. Extensions
    1. A large-scale shared-memory parallel version on compressed web graphs
    2. A Gauss-Seidel variant
    3. A BiCG-STAB preconditioner
    4. A conjecture about the performance of the iteration
    5. A proof that the algorithm converges for “any” β, η
    Gleich, Gray, Greif, Lau, submitted.
  • 35. Convergence Result. Sketch of convergence result:
    1. error after j steps of the inner iteration: $f^{(j)} = \left( \alpha \beta^{j-1} P^{j} + \frac{\alpha - \beta}{\beta} \sum_{\ell=1}^{j-1} \beta^{\ell} P^{\ell} \right) f^{(0)}$
    2. upper bound error by $\| f^{(j)} \| \le \frac{(\alpha - \beta) + (1 - \alpha) \beta^{j}}{1 - \beta} \, \| f^{(0)} \|$
    3. notice $\| f^{(j)} \| \le \alpha \, \| f^{(0)} \|$ for $j \ge 1$
    4. hence, convergence as long as β ≤ α
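The step from item 1 to item 2 is a geometric-series calculation. The sketch below assumes a norm in which $\|P\| = 1$ (for example the 1-norm with a column-stochastic $P$) and $0 < \beta \le \alpha < 1$, so that every coefficient is nonnegative; these assumptions are not spelled out on the slide.

```latex
\begin{align*}
\| f^{(j)} \|
  &\le \Big( \alpha \beta^{j-1}
        + (\alpha - \beta) \sum_{\ell=1}^{j-1} \beta^{\ell-1} \Big) \| f^{(0)} \| \\
  &=   \Big( \alpha \beta^{j-1}
        + (\alpha - \beta)\, \frac{1 - \beta^{j-1}}{1 - \beta} \Big) \| f^{(0)} \| \\
  &=   \frac{(\alpha - \beta) + (1 - \alpha)\, \beta^{j}}{1 - \beta} \, \| f^{(0)} \| .
\end{align*}
```

At $j = 1$ the factor equals $\frac{\alpha (1 - \beta)}{1 - \beta} = \alpha$, and it can only shrink as $j$ grows because $\beta^{j}$ decreases, which gives item 3.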
  • 36. Summary (section divider, slide 36 of 41). Outline: PageRank intro, Sensitivity, Random sensitivity, Inner-Outer, Summary.
  • 37. Conclusions: α matters; sensitivity is useful; everything is just PageRank.
  • 38. Contributions
    1. Derivative. Gleich, Glynn, Golub, Greif, 2007. New technique to compute the derivative using just PageRank.
    2. RAPr. Constantine and Gleich, 2007; Constantine, Gleich, and Iaccarino, submitted. New PageRank model and sensitivity measure. Range of algorithms and algorithmic analysis. Empirically helpful for spam identification. Robust software.
    3. Inner-Outer. Gleich, Gray, Greif, Lau, submitted. Improved convergence analysis. Gauss-Seidel and preconditioning variants. Shared-memory parallel implementation. Robust software.
  • 39. Thanks! Michael Saunders (My Advisor), Hector Garcia-Molina, Chen Greif, Art Owen, Amin Saberi.
  • 41. THANK YOU: Margot Gerritsen, Debbie Heimowitz, Peter Glynn, Jason Azicri, Walter Murray, Steven Fan, Reid Andersen, Paul Constantine, Pavel Berkhin, Michael Atkinson, Kevin Lang, Jeremy Kozdon, Amy Langville, Esteban Arcaute, Matthew Rasmussen, Sebastiano Vigna, Adam Guetz, Will Fong, Leonid Zhukov, Andrew Bradley, Indira Choudhury, Seth Tornborg, Nick Henderson, Chris Maes, Brian Tempero, Nicole Taheri, Prisilla Williams, Ying Wang, Deb Michael, Nick West, Mayita Romero, Kaustuv's Rum, Les Fletcher, Saeco Coffee Machine, Hugh Fletcher, Napa Valley, Lindsey Fletcher, Matlab, Jane Fletcher, superlu