Numerical computing &
Google’s PageRank

DAVID F. GLEICH, CS 197 PRESENTATION
Hey Katie, do you have a
  date for Valentine’s Day? 




It was
1234567890
in 2009.
Thanks Internet!
                                
http://school.discoveryeducation.com/clipart/clip/stk-fgr6.html
http://listsoplenty.com/pix/tag/cartoon
https://www.facebook.com/ProgrammersJokes
http://www.feld.com/wp/archives/2009/02/unix-time-1234567890-on-valentines-day.html
How did Google get started?
… with an idea … 
… on the shoulders of giants!
LEO KATZ
Vannevar Bush
“wholly new forms of
encyclopedias will appear,
ready made with a mesh of
associative trails running
through them, ready to be
dropped into the memex and
there amplified” 
-- “As We May Think,” The Atlantic, July 1945
Sir Tim Berners-Lee
“We should work towards a
universal linked information
system … to allow a place for
any information or reference
one felt was important and a
way of finding it afterwards.”
 -- Founding proposal for “the mesh”, 1989
… the mesh became the web 
… the web became a mess
... “finding it afterwards”? Hah!
Larry Page
Sergey Brin
•  Grad students at Stanford
•  Worked with Terry Winograd
   (artificial intelligence)
•  Created a web-search
   algorithm called “backrub”
•  Spun off a company, “Google” (a play on “googol”)
•  Worth about $20 billion each
A cartoon websearch primer
1.  Crawl webpages
2.  Analyze webpage text (information retrieval)
3.  Analyze webpage links
4.  Fit measures to human evaluations
5.  Produce rankings
6.  Continuously update
SportsIllustrated.com

BobsPortsIllustrated.com
What pages are
important?
Those that people visit a lot!
How do we check?
Create a model of how people
visit the web.
What pages are
important?
The Google random surfer
•  Follows a random link with
   probability alpha:
   “random clicks”
•  Goes anywhere with
   probability (1-alpha):
   “random jumps”
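One way to write the surfer's two rules as a single one-step probability is sketched here. The symbols A, deg, and n are notation introduced for this sketch, not taken from the slides: A_ij is 1 if page i links to page j and 0 otherwise, deg(i) is the number of links leaving page i, and n is the number of pages. It assumes every page has at least one outgoing link and that a random jump picks each page with equal probability.

```latex
\[
  \Pr(\text{next page} = j \mid \text{current page} = i)
    = \alpha \, \frac{A_{ij}}{\deg(i)} + \frac{1 - \alpha}{n}
\]
```

Summing over all pages j gives alpha + (1-alpha) = 1, so each page's outgoing probabilities form a valid distribution.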
This is a Markov chain!
Andrei Markov
•  Studied sequences of random
   variables.
•  The probability that the random
   variable takes a particular value
   only depends on its current value.
•  The “page id” is the “random
   variable” in the Markov chain!
Oskar Perron
Georg Frobenius
•  Simultaneously discovered
   when a Markov chain has an
   “average” 
•  The “average” of the web? It’s
   the probability of finding the
   random surfer at a page.
•  In 1907
What pages are
important?
Perron and Frobenius proved the
following algorithm always
converges to a solution…
set prob[i] = 0 for all pages
set p to a random page
for t = 1 to ...
  increment prob[p]
  if rand() < alpha,
    set p to a random neighbor of p
  else, set p to a random page
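The simulation above can be sketched in Python roughly as follows. This is a minimal sketch, not Google's code: the function name random_surfer, the dictionary input format, alpha = 0.85, the step count, and the three-page example graph are all assumptions made for illustration. It also assumes every page appears as a key, and it sends the surfer on a random jump from any page with no outgoing links, a case the slide does not cover. Unlike the slide, it divides the counts by the number of steps so the result reads as a probability.

```python
import random


def random_surfer(links, alpha=0.85, steps=100_000, seed=0):
    """Estimate how often the random surfer visits each page.

    links maps every page to the list of pages it links to
    (a hypothetical input format chosen for this sketch).
    """
    rng = random.Random(seed)
    pages = list(links)
    visits = {page: 0 for page in pages}
    p = rng.choice(pages)  # start at a random page
    for _ in range(steps):
        visits[p] += 1
        # "Random click" with probability alpha, "random jump" otherwise.
        # Assumption: a page with no outgoing links always jumps.
        if links[p] and rng.random() < alpha:
            p = rng.choice(links[p])
        else:
            p = rng.choice(pages)
    # Turn visit counts into visit frequencies.
    return {page: count / steps for page, count in visits.items()}


# Hypothetical three-page web: A links to B and C, B to C, C to A.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
freq = random_surfer(example_links)
```

In this example, page B has a single incoming link from a page that splits its clicks two ways, so the surfer should visit it less often than A or C.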
Richard von Mises
•  Created “the power method”
•  An efficient algorithm to
   “average” a Markov chain
•  It updated the probabilities of
   all pages at once.
“Praktische Verfahren der Gleichungsauflösung”
R. von Mises and H. Pollaczek-Geiringer, 1929
What pages are
important?
Using the von Mises method …

set prob[i] = 1/n for all pages
for t = 1 to about 80
  set newprob[i] = 0 for all pages
  for all links from page i to page j
    set newprob[j] += prob[i]/deg[i]
  for all pages i
    set prob[i] = alpha*newprob[i] +
                   (1-alpha)/n
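A runnable version of the update above might look like the following. It is a sketch under stated assumptions: the function name pagerank, the dictionary input format, alpha = 0.85, and the example graph are illustrative choices, not taken from the slides. Every page must appear as a key, and a page with no outgoing links is treated as if it linked to every page, which is one common convention and not something the slide specifies.

```python
def pagerank(links, alpha=0.85, iterations=80):
    """Power-method update from the slide, all pages at once.

    links maps every page to the list of pages it links to
    (a hypothetical input format chosen for this sketch).
    """
    pages = list(links)
    n = len(pages)
    prob = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        newprob = {page: 0.0 for page in pages}
        for i in pages:
            # Assumption: a page without links spreads its probability evenly.
            targets = links[i] or pages
            share = prob[i] / len(targets)
            for j in targets:
                newprob[j] += share
        # Mix the "random click" mass with the uniform "random jump" mass.
        prob = {page: alpha * newprob[page] + (1 - alpha) / n for page in pages}
    return prob


# Hypothetical three-page web: A links to B and C, B to C, C to A.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(example_links)
```

Each pass shrinks the remaining error by roughly a factor of alpha, which is why a fixed budget of about 80 passes is enough for alpha near 0.85.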
The algorithm underlying
Google’s analysis of the web is
from 1929!
Leo Katz
That’s not quite right, Wikipedia!
Leo Katz
A new status index (1953)
Leo Katz
A paper about how information spreads in groups … 
“For example, the information that the new high-
school principal is unmarried and handsome might
occasion a violent reaction in a ladies' garden club
and hardly a ripple of interest in a luncheon group of
the local chamber of commerce. On the other hand,
the luncheon group might be anything but apathetic
in its response to information concerning a fractional
change in credit buying restrictions announced by the
federal government.”
… there were many other
    shoulders too …
Gene Golub
Popularized numerical computing with
matrices via the informal “Golub thesis”:
“anything worth computing can be
stated as a matrix problem”
William Kahan
Formalized IEEE-754 floating point arithmetic.
Made it possible to compute with probabilities
as “real numbers” instead of discrete counts.
Credits



Most pictures taken from Google image search.
Original idea from Massimo Franceschet.
“PageRank: Standing on the shoulders of giants”

A history of PageRank from the numerical computing perspective
