Deriving “The Google Matrix”:
       $G = \alpha S + (1-\alpha)\frac{1}{n}\mathbf{e}\mathbf{e}^T$
             Lecture 4
 B.S. Physics 1993, University of Washington
 M.S. EE 1998, Washington State (four patents)
 10+ Years in Search Marketing
 Founder of SEMJ.org (Research Journal)
 Blogger for SemanticWeb.com
 President of Future Farm Inc.
   Build a focused crawler in:
    Java, Python, or Perl
 Point it at the MSU home page. Gather all the URLs and
  store them for later use. Check the crawl rules in
  http://www.montana.edu/robots.txt
 Store all the HTML and label each page with a DocID.
 Read Google’s paper. Next time: PageRank & the
  Google Matrix.
 Contest: Who can store the most unique URLs?
 Due Feb 7th (next week). Send your code and URL list.
    #!/usr/bin/env python3
    ### Basic web crawler in Python: fetch the URL given on the command line,
    ### print its HTML, its title, and every link in it.
    ## Uses urllib.request (the Python 3 successor of urllib2) and BeautifulSoup 4 (bs4)
    import sys                       # read the target URL from the command line
    import urllib.request
    from bs4 import BeautifulSoup

    #### Change the user-agent name sent with the request
    USER_AGENT = 'BadBot/1.0'
    print(USER_AGENT)                # print the name of the bot

    request = urllib.request.Request(sys.argv[1],
                                     headers={'User-Agent': USER_AGENT})
    with urllib.request.urlopen(request) as httpResponse:
        # store the HTML page in an object called htmlPage (raw bytes)
        htmlPage = httpResponse.read()
    # assumes UTF-8 for display; undecodable bytes are replaced
    print(htmlPage.decode('utf-8', errors='replace'))

    htmlDom = BeautifulSoup(htmlPage, 'html.parser')
    # dump page title (skipped if the page has no <title>)
    if htmlDom.title is not None:
        print(htmlDom.title.string)
    # dump all links in page
    for link in htmlDom.find_all('a', href=True):
        print(link['href'])
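The assignment asks you to store every page under a DocID and to count unique URLs. Below is a minimal sketch of that bookkeeping, assuming a few normalization rules that are not part of the assignment (resolve relative links against the page URL, drop `#fragment`s, lower-case the scheme and host); `normalize_url` and `DocStore` are hypothetical names, and a real crawler would persist pages to disk rather than keep them in memory.

```python
from urllib.parse import urldefrag, urljoin, urlsplit, urlunsplit


def normalize_url(base_url, href):
    """Hypothetical rule set: resolve href, drop #fragment, lower-case scheme and host."""
    absolute, _fragment = urldefrag(urljoin(base_url, href))
    parts = urlsplit(absolute)
    path = parts.path or "/"  # treat "http://host" and "http://host/" as one page
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


class DocStore:
    """Hypothetical in-memory store: one DocID per unique normalized URL."""

    def __init__(self):
        self.doc_ids = {}  # normalized URL -> DocID
        self.pages = {}    # DocID -> raw HTML

    def add(self, url, html):
        """Store html under url; return (doc_id, is_new)."""
        if url in self.doc_ids:
            return self.doc_ids[url], False
        doc_id = len(self.doc_ids)
        self.doc_ids[url] = doc_id
        self.pages[doc_id] = html
        return doc_id, True


store = DocStore()
base = "http://www.montana.edu/"
for href in ["/about", "about#top", "HTTP://WWW.MONTANA.EDU/about", "/news"]:
    url = normalize_url(base, href)
    print(store.add(url, "<html>...</html>"), url)
```

With these rules the first three hrefs collapse to the same DocID, so only two unique URLs are counted; stricter or looser normalization would change the contest count.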
 Heritrix, an open-source Java-based crawler:
  https://webarchive.jira.com/wiki/display/Heritrix/Heritrix
 http://www.robotstxt.org/robotstxt.html
 http://www.commoncrawl.org/
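A crawler should consult robots.txt before fetching a page. A minimal sketch using Python's standard `urllib.robotparser`; the robots.txt text here is a made-up sample for illustration, not the real contents of http://www.montana.edu/robots.txt.

```python
from urllib.robotparser import RobotFileParser


def build_rules(robots_txt):
    """Parse already-downloaded robots.txt text into a RobotFileParser."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


# Hypothetical robots.txt contents, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /private/
"""

rules = build_rules(SAMPLE)
for url in ["http://www.montana.edu/index.html",
            "http://www.montana.edu/private/grades.html"]:
    # can_fetch(useragent, url) -> True if the rules allow this agent to fetch url
    print(url, rules.can_fetch("BadBot/1.0", url))
```

Against a live site you would instead call `rules.set_url("http://www.montana.edu/robots.txt")` followed by `rules.read()`, then check `can_fetch` before each request.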
[Figure: example web graph with six pages, numbered 1 to 6]
$r(P_i) = \sum_{P_j \in B_{P_i}} \frac{r(P_j)}{|P_j|}$

• $r(P_i)$ is the PageRank of page $P_i$
• $|P_j|$ is the number of outlinks from page $P_j$
• $B_{P_i}$ is the set of pages pointing into $P_i$
   The $r(P_j)$ values of the inlinking pages are unknown, so we need a starting value.
     One option is to initialize every page to 1/n, where n is the number of pages:

   $r_0(P_i) = 1/n$ for all pages $P_i$

   The process is repeated until the values stabilize (this does not
    happen in all cases). Will this converge?
$r_{k+1}(P_i) = \sum_{P_j \in B_{P_i}} \frac{r_k(P_j)}{|P_j|}$

•   $r_{k+1}(P_i)$ is the PageRank of $P_i$ at iteration k + 1
•   $r_0(P_i) = 1/n$, where n is the number of nodes
•   $|P_j|$ is the number of outlinks from page $P_j$
•   $B_{P_i}$ is the set of pages pointing into $P_i$
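The iteration can be checked with exact fractions. This is a sketch under one loud assumption: the figure's arrows were lost, so the outlink lists below are a hypothetical link structure chosen because they reproduce the numbers in the iteration table that follows.

```python
from fractions import Fraction

# ASSUMPTION: hypothetical outlinks for pages P1..P6 (the figure's arrows were lost);
# chosen because they reproduce the iteration table.
OUTLINKS = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}


def inlinks(outlinks):
    """Build B_{P_i}: for each page, the list of pages that link to it."""
    back = {page: [] for page in outlinks}
    for src, targets in outlinks.items():
        for dst in targets:
            back[dst].append(src)
    return back


def iterate_rank(outlinks, iterations):
    """Apply r_{k+1}(P_i) = sum over P_j in B_{P_i} of r_k(P_j)/|P_j|, from r_0 = 1/n."""
    n = len(outlinks)
    back = inlinks(outlinks)
    r = {page: Fraction(1, n) for page in outlinks}  # r_0(P_i) = 1/n
    history = [r]
    for _ in range(iterations):
        r = {i: sum((r[j] / len(outlinks[j]) for j in back[i]), Fraction(0))
             for i in outlinks}
        history.append(r)
    return history


for k, r in enumerate(iterate_rank(OUTLINKS, 2)):
    print(k, {page: str(value) for page, value in r.items()})
```

`Fraction` reduces automatically, so 14/72 prints as 7/36. Note that the total rank drops below 1 after each step (5/6, then 50/72): P2 has no outlinks, so the rank it receives is never passed on.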
Iteration 0    Iteration 1     Iteration 2      Rank (k = 2)
r0(P1) = 1/6   r1(P1) = 1/18   r2(P1) = 1/36    5
r0(P2) = 1/6   r1(P2) = 5/36   r2(P2) = 1/18    4
r0(P3) = 1/6   r1(P3) = 1/12   r2(P3) = 1/36    5
r0(P4) = 1/6   r1(P4) = 1/4    r2(P4) = 17/72   1
r0(P5) = 1/6   r1(P5) = 5/36   r2(P5) = 11/72   3
r0(P6) = 1/6   r1(P6) = 1/6    r2(P6) = 14/72   2
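The Rank column orders pages by $r_2$, largest first, and tied pages share a rank: P1 and P3 both get 5 and no page gets 6. A small sketch, assuming standard competition ranking ("1224" style) is the intended tie rule; the $r_2$ values are copied from the table.

```python
from fractions import Fraction

# r_2 values copied from the table above.
R2 = {"P1": Fraction(1, 36), "P2": Fraction(1, 18), "P3": Fraction(1, 36),
      "P4": Fraction(17, 72), "P5": Fraction(11, 72), "P6": Fraction(14, 72)}


def competition_rank(scores):
    """Rank 1 = highest score; a page's rank is 1 + the number of strictly larger scores."""
    return {page: 1 + sum(other > score for other in scores.values())
            for page, score in scores.items()}


print(competition_rank(R2))
```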
$[n \times m] \cdot [m \times r] = [n \times r]$
•   Non-zero elements in row i of H mark the outlinking
    pages of page i

•   Non-zero elements in column i of H mark the inlinking
    pages of page i
$\pi^{(k+1)T} = \pi^{(k)T} H$

Where: $\pi^T$ is a 1 × n row vector
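In matrix form the same iteration is a row vector times H. A sketch with exact fractions, assuming the same hypothetical six-page link structure as above (indices 0..5 stand for P1..P6); it builds H with $H_{ij} = 1/|P_i|$ when page i links to page j, so row i holds the outlinks of page i and column j holds the inlinks of page j.

```python
from fractions import Fraction

# ASSUMPTION: hypothetical outlinks (indices 0..5 = P1..P6), chosen to reproduce the table.
OUTLINKS = [[1, 2], [], [0, 1, 4], [4, 5], [3, 5], [3]]


def hyperlink_matrix(outlinks):
    """H[i][j] = 1/|P_i| if page i links to page j, else 0."""
    n = len(outlinks)
    H = [[Fraction(0)] * n for _ in range(n)]
    for i, targets in enumerate(outlinks):
        for j in targets:
            H[i][j] = Fraction(1, len(targets))
    return H


def row_times_matrix(pi, M):
    """Return the 1 x n row vector pi^T M."""
    n = len(M[0])
    return [sum((pi[i] * M[i][j] for i in range(len(pi))), Fraction(0)) for j in range(n)]


H = hyperlink_matrix(OUTLINKS)
pi = [Fraction(1, 6)] * 6          # pi^(0)T = (1/n) e^T
for k in range(2):
    pi = row_times_matrix(pi, H)   # pi^(k+1)T = pi^(k)T H
print([str(x) for x in pi])
```

After two steps the vector matches the Iteration 2 column, which confirms that $\pi^{(k)T} H$ is the summation formula written as a matrix product. The row for P2 is all zeros because P2 is dangling.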
• Rank sinks & convergence
• Resembles work done on Markov chains
    • H = transition probability matrix
    • The iteration converges to a unique positive vector if H is:
      • Stochastic: each row sums to 1
      • Irreducible: from any state there is a non-zero probability of
        reaching any other state (possibly in more than one step)
      • Aperiodic: return paths to a state are not locked to multiples of a
        fixed cycle length (their lengths have greatest common divisor 1)
      • Primitive: irreducible and aperiodic
Markov property: the next state depends only on the current state (no memory)
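Two of these conditions can be checked directly on the link graph. A sketch, again assuming the hypothetical six-page link structure: H is stochastic only if no page is dangling, and irreducible only if every page can reach every other page. Both checks fail here, because P2 is dangling and {P4, P5, P6} is a rank sink that never links back out.

```python
# ASSUMPTION: hypothetical outlinks (indices 0..5 = P1..P6), chosen to reproduce the table.
OUTLINKS = [[1, 2], [], [0, 1, 4], [4, 5], [3, 5], [3]]


def is_stochastic(outlinks):
    """Rows of H (entries 1/|P_i|) sum to 1 exactly when every page has an outlink."""
    return all(len(targets) > 0 for targets in outlinks)


def reachable(outlinks, start):
    """Pages reachable from start by following links (depth-first search)."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in outlinks[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_irreducible(outlinks):
    """Irreducible = every page can reach every other page (strongly connected graph)."""
    n = len(outlinks)
    return all(len(reachable(outlinks, s)) == n for s in range(n))


print("stochastic:", is_stochastic(OUTLINKS))    # P2 is dangling
print("irreducible:", is_irreducible(OUTLINKS))  # {P4, P5, P6} is a rank sink
```

A pure 3-cycle such as `[[1], [2], [0]]` passes both checks yet is periodic (period 3), so it is not primitive; aperiodicity needs a separate test.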
•   “Random Surfer” model
    • The surfer follows hyperlinks at random.
    • Time spent on a page is proportional to its
      importance.
    • Dangling-node problem: the surfer gets stuck on a page
      with no outlinks (PDF files, images, etc.).
    • Fix: allow the surfer to “teleport”, i.e. make
      random jumps.
$S = H + a\left(\frac{1}{n}\mathbf{e}^T\right)$


Where: $a_i = 1$ if page i is dangling, otherwise
 $a_i = 0$.

  $\mathbf{e}^T$ (1 × 6) is a row vector of all 1’s; n = number of nodes
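The dangling-node fix only changes the zero rows of H. A sketch of $S = H + a(\frac{1}{n}\mathbf{e}^T)$ for the hypothetical six-page link structure used above: every dangling row (here only P2) becomes uniform 1/n, and every row of S then sums to 1.

```python
from fractions import Fraction

# ASSUMPTION: hypothetical outlinks (indices 0..5 = P1..P6), chosen to reproduce the table.
OUTLINKS = [[1, 2], [], [0, 1, 4], [4, 5], [3, 5], [3]]


def stochastic_matrix(outlinks):
    """S = H + a (1/n) e^T: rows of dangling pages (a_i = 1) become uniform 1/n."""
    n = len(outlinks)
    S = []
    for targets in outlinks:
        if targets:                    # a_i = 0: keep row i of H
            row = [Fraction(0)] * n
            for j in targets:
                row[j] = Fraction(1, len(targets))
        else:                          # a_i = 1: all-zero row of H plus (1/n) e^T
            row = [Fraction(1, n)] * n
        S.append(row)
    return S


S = stochastic_matrix(OUTLINKS)
print([str(x) for x in S[1]])          # P2's row: all 1/6
print(all(sum(row) == 1 for row in S)) # every row now sums to 1
```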
Serendipity? Page and Brin introduced an
 “adjustment”: the random surfer can “teleport”
 by typing a new destination into the browser.
• Teleportation matrix: $E = \frac{1}{n}\mathbf{e}\mathbf{e}^T$
• α controls the proportion of time the random
  surfer follows hyperlinks as opposed to
  teleporting. If α = 0.5, the surfer follows links half
  the time and teleports the other half.
• At α = 0.5, about 34 iterations are required to
  converge to a tolerance of $10^{-10}$.
• α was originally set to 0.85. As α → 1,
  computation time grows and the ranking becomes more sensitive (sensitivity issue).
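To see the effect of α, the sketch below forms $G = \alpha S + (1-\alpha)E$ densely for the hypothetical six-page link structure used earlier and runs the power method until the L1 change drops below $10^{-10}$. The slide's figure of about 34 iterations at α = 0.5 matches the usual estimate: the smallest k with $\alpha^k < 10^{-10}$ (which gives 142 at α = 0.85). On this small graph the actual counts are expected to come in below that estimate, but they still grow with α.

```python
import math

# ASSUMPTION: hypothetical outlinks (indices 0..5 = P1..P6), chosen to reproduce the table.
OUTLINKS = [[1, 2], [], [0, 1, 4], [4, 5], [3, 5], [3]]


def google_matrix(outlinks, alpha):
    """G = alpha*S + (1 - alpha)*E, with E = (1/n) e e^T (uniform teleportation)."""
    n = len(outlinks)
    G = []
    for targets in outlinks:
        # Row i of S: follow a random outlink, or jump uniformly if page i is dangling.
        s_row = [0.0] * n
        for j in targets:
            s_row[j] = 1.0 / len(targets)
        if not targets:
            s_row = [1.0 / n] * n
        G.append([alpha * s + (1 - alpha) / n for s in s_row])
    return G


def power_method(G, tol=1e-10, max_iter=1000):
    """Iterate pi^(k+1)T = pi^(k)T G from the uniform vector until the L1 change < tol."""
    n = len(G)
    pi = [1.0 / n] * n
    for k in range(1, max_iter + 1):
        new = [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new, k
        pi = new
    return pi, max_iter


def alpha_bound(alpha, tol=1e-10):
    """Smallest k with alpha**k < tol (estimate based on |lambda_2(G)| <= alpha)."""
    return math.ceil(math.log(tol) / math.log(alpha))


for alpha in (0.5, 0.85):
    pi, iters = power_method(google_matrix(OUTLINKS, alpha))
    print(alpha, iters, alpha_bound(alpha), [round(x, 4) for x in pi])
```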
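$G = \alpha S + (1-\alpha)\frac{1}{n}\mathbf{e}\mathbf{e}^T$
$\pi^{(k+1)T} = \pi^{(k)T} G$


*In 2002 this was the world’s largest matrix computation: the order of G
   was ~$8.1 \times 10^9$!
Equivalently, expanding $S = H + a\frac{1}{n}\mathbf{e}^T$:
$G = \alpha H + (\alpha a + (1-\alpha)\mathbf{e})\frac{1}{n}\mathbf{e}^T$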
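The expanded form matters at web scale: G is completely dense, but H is sparse and the remaining term is rank one. Multiplying out gives $\pi^{(k+1)T} = \alpha\,\pi^{(k)T} H + (\alpha\,\pi^{(k)T} a + (1-\alpha)\,\pi^{(k)T}\mathbf{e})\frac{1}{n}\mathbf{e}^T$, so each step only walks the stored links and adds one shared scalar to every entry. A sketch under the same hypothetical six-page link structure; the function name `pagerank_sparse` is illustrative.

```python
# ASSUMPTION: hypothetical outlinks (indices 0..5 = P1..P6), chosen to reproduce the table.
OUTLINKS = [[1, 2], [], [0, 1, 4], [4, 5], [3, 5], [3]]


def pagerank_sparse(outlinks, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power method on G = alpha*H + (alpha*a + (1-alpha)*e)(1/n)e^T without forming G."""
    n = len(outlinks)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [0.0] * n
        for i, targets in enumerate(outlinks):  # alpha * pi^T H, one link at a time
            for j in targets:
                new[j] += alpha * pi[i] / len(targets)
        dangling_mass = sum(pi[i] for i, t in enumerate(outlinks) if not t)  # pi^T a
        shared = (alpha * dangling_mass + (1 - alpha) * sum(pi)) / n         # rank-one part
        new = [x + shared for x in new]
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


pi = pagerank_sparse(OUTLINKS)
print([round(x, 4) for x in pi])
```

The cost per step is proportional to the number of links plus n, instead of $n^2$ for the dense G, which is what makes an order-$10^9$ computation feasible.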
PageRank and The Google Matrix
Author: Sean Golliher

Time Series Forecasting using Neural Nets (GNNNs)
Time Series Forecasting using Neural Nets (GNNNs)Time Series Forecasting using Neural Nets (GNNNs)
Time Series Forecasting using Neural Nets (GNNNs)Sean Golliher
 
A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction:
A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction:A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction:
A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction:Sean Golliher
 
Property Matching and Query Expansion on Linked Data Using Kullback-Leibler D...
Property Matching and Query Expansion on Linked Data Using Kullback-Leibler D...Property Matching and Query Expansion on Linked Data Using Kullback-Leibler D...
Property Matching and Query Expansion on Linked Data Using Kullback-Leibler D...Sean Golliher
 
Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)Sean Golliher
 
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494Sean Golliher
 
Lecture 7- Text Statistics and Document Parsing
Lecture 7- Text Statistics and Document ParsingLecture 7- Text Statistics and Document Parsing
Lecture 7- Text Statistics and Document ParsingSean Golliher
 
Information Retrieval, Encoding, Indexing, Big Table. Lecture 6 - Indexing
Information Retrieval, Encoding, Indexing, Big Table. Lecture 6  - IndexingInformation Retrieval, Encoding, Indexing, Big Table. Lecture 6  - Indexing
Information Retrieval, Encoding, Indexing, Big Table. Lecture 6 - IndexingSean Golliher
 
CSCI 494 - Lect. 3. Anatomy of Search Engines/Building a Crawler
CSCI 494 - Lect. 3. Anatomy of Search Engines/Building a CrawlerCSCI 494 - Lect. 3. Anatomy of Search Engines/Building a Crawler
CSCI 494 - Lect. 3. Anatomy of Search Engines/Building a CrawlerSean Golliher
 

Plus de Sean Golliher (9)

Time Series Forecasting using Neural Nets (GNNNs)
Time Series Forecasting using Neural Nets (GNNNs)Time Series Forecasting using Neural Nets (GNNNs)
Time Series Forecasting using Neural Nets (GNNNs)
 
A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction:
A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction:A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction:
A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction:
 
Goprez sg
Goprez  sgGoprez  sg
Goprez sg
 
Property Matching and Query Expansion on Linked Data Using Kullback-Leibler D...
Property Matching and Query Expansion on Linked Data Using Kullback-Leibler D...Property Matching and Query Expansion on Linked Data Using Kullback-Leibler D...
Property Matching and Query Expansion on Linked Data Using Kullback-Leibler D...
 
Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)
 
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
 
Lecture 7- Text Statistics and Document Parsing
Lecture 7- Text Statistics and Document ParsingLecture 7- Text Statistics and Document Parsing
Lecture 7- Text Statistics and Document Parsing
 
Information Retrieval, Encoding, Indexing, Big Table. Lecture 6 - Indexing
Information Retrieval, Encoding, Indexing, Big Table. Lecture 6  - IndexingInformation Retrieval, Encoding, Indexing, Big Table. Lecture 6  - Indexing
Information Retrieval, Encoding, Indexing, Big Table. Lecture 6 - Indexing
 
CSCI 494 - Lect. 3. Anatomy of Search Engines/Building a Crawler
CSCI 494 - Lect. 3. Anatomy of Search Engines/Building a CrawlerCSCI 494 - Lect. 3. Anatomy of Search Engines/Building a Crawler
CSCI 494 - Lect. 3. Anatomy of Search Engines/Building a Crawler
 

Dernier

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Dernier (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

PageRank and The Google Matrix

  • 1. Deriving “The Google Matrix”: G = αS + (1- α)1/neeT Lecture 4
  • 2.  B.S Physics 1993, University of Washington  M.S EE 1998, Washington State (four patents)  10+ Years in Search Marketing  Founder of SEMJ.org (Research Journal)  Blogger for SemanticWeb.com  President of Future Farm Inc.
  • 3. • Build a focused crawler in Java, Python, or Perl. • Point it at the MSU home page; gather all the URLs and store them for later use. Robots file: http://www.montana.edu/robots.txt • Store all the HTML and label each page with a DocID. • Read Google’s paper. Next time: PageRank and the Google Matrix. • Contest: who can store the most unique URLs? • Due Feb 7th (next week). Send your code and URL list.
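Python 3's standard library can check robots.txt rules before a crawler fetches anything. The minimal sketch below parses a hypothetical robots.txt held in memory (it is not montana.edu's actual file) and asks whether two user agents may fetch a URL. BadBot/1.0 is the user-agent name from the crawler slides; StudentBot/1.0 and the example.edu URLs are made up for illustration. Against a live site you would call set_url() and read() instead of parse().

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only (not montana.edu's file).
SAMPLE_ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

def make_robot_checker(robots_txt):
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    parser.modified()  # record a read time so can_fetch() trusts the parsed rules
    return parser

checker = make_robot_checker(SAMPLE_ROBOTS_TXT)
print(checker.can_fetch("BadBot/1.0", "http://www.example.edu/index.html"))         # False
print(checker.can_fetch("StudentBot/1.0", "http://www.example.edu/index.html"))     # True
print(checker.can_fetch("StudentBot/1.0", "http://www.example.edu/private/a.html")) # False
```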
  • 4.
        #!/usr/bin/python
        ### Basic web crawler (Python 2): fetch the URL given on the command line
        ## Uses urllib's FancyURLopener (custom user agent) and BeautifulSoup 3
        import sys  # read the URL from the command line
        from urllib import FancyURLopener
        from BeautifulSoup import BeautifulSoup

        #### change the user-agent name
        class MyOpener(FancyURLopener):
            version = 'BadBot/1.0'

        print MyOpener.version  # print the user-agent name
        # Open the URL through MyOpener so the custom user agent is actually sent;
        # urllib2.urlopen would ignore MyOpener and send its default agent.
        httpResponse = MyOpener().open(sys.argv[1])
  • 5.
        # store the HTML page in an object called htmlPage
        htmlPage = httpResponse.read()
        print htmlPage
        htmlDom = BeautifulSoup(htmlPage)
        # dump the page title, if the page has one
        if htmlDom.title is not None:
            print htmlDom.title.string
        # dump all links in the page
        allLinks = htmlDom.findAll('a', {'href': True})
        for link in allLinks:
            print link['href']
        # print the name of the bot
        print MyOpener.version
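On Python 3, urllib2 and BeautifulSoup 3 are not available. A minimal standard-library sketch of the same idea follows, with html.parser swapped in for BeautifulSoup and a breadth-first loop added toward the slide 3 assignment: each stored page gets a DocID in visit order, and the set of seen URLs doubles as the contest's unique-URL list. "Focused" is interpreted here as staying on the seed URL's host, which is an assumption; crawl, extract_links, fetch, and max_pages are hypothetical names.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen

USER_AGENT = "BadBot/1.0"  # same user-agent name as the slide's MyOpener

class LinkCollector(HTMLParser):
    """Collect href values of <a> tags, like findAll('a', {'href': True})."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def extract_links(html, base_url):
    """Return absolute URLs, with #fragments removed, for every link in the page."""
    collector = LinkCollector()
    collector.feed(html)
    return [urldefrag(urljoin(base_url, href))[0] for href in collector.hrefs]

def fetch(url):
    """Download one page, sending the custom user agent."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")

def crawl(seed, max_pages=50, fetch_page=fetch):
    """Breadth-first crawl restricted to the seed's host.

    Returns {doc_id: (url, html)}; DocIDs are assigned in visit order."""
    host = urlparse(seed).netloc
    frontier = deque([seed])
    seen = {seed}          # every unique URL discovered so far
    store = {}
    while frontier and len(store) < max_pages:
        url = frontier.popleft()
        try:
            html = fetch_page(url)
        except OSError:    # URLError and HTTPError are subclasses of OSError
            continue       # skip pages that fail to download
        store[len(store)] = (url, html)
        for link in extract_links(html, url):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                frontier.append(link)
    return store
```

For example, crawl("http://www.montana.edu/", max_pages=20) would return up to 20 pages keyed by DocID; the fetch_page parameter lets you substitute a stub for offline testing. The sketch does not consult robots.txt, so in practice each URL should be checked against the site's robots.txt rules before it is fetched.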
  • 6. • Heritrix, an open-source Java-based crawler: https://webarchive.jira.com/wiki/display/Heritrix/Heritrix • http://www.robotstxt.org/robotstxt.html • http://www.commoncrawl.org/
  • 7. [Figure: example link graph of six pages, numbered 1 to 6]
  • 8. $r(P_i) = \sum_{P_j \in B_{P_i}} \frac{r(P_j)}{|P_j|}$ • $r(P_i)$ is the PageRank of page $P_i$ • $|P_j|$ is the number of outlinks from page $P_j$ • $B_{P_i}$ is the set of pages pointing into $P_i$
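The formula needs two derived quantities: the inlink set $B_{P_i}$ and the outlink count $|P_j|$. The Python 3 sketch below computes the inlink sets from an adjacency list and evaluates the sum for one page. The slide 7 figure's arrows are not in this transcript, so the link structure in OUTLINKS is an assumption, chosen because it reproduces every value in the slide 11 table; inlink_sets and pagerank_sum are hypothetical helpers.

```python
from fractions import Fraction

# Assumed link structure (page -> pages it links to); a hypothetical reconstruction
# of the slide 7 graph that reproduces the slide 11 table.
OUTLINKS = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}

def inlink_sets(outlinks):
    """B_{P_i}: for each page i, the set of pages that link to it."""
    inlinks = {page: set() for page in outlinks}
    for source, targets in outlinks.items():
        for target in targets:
            inlinks[target].add(source)
    return inlinks

def pagerank_sum(page, ranks, outlinks, inlinks):
    """r(P_i) = sum over P_j in B_{P_i} of r(P_j) / |P_j|, with |P_j| = len(outlinks[j])."""
    return sum((ranks[j] / len(outlinks[j]) for j in inlinks[page]), Fraction(0))

B = inlink_sets(OUTLINKS)
uniform = {page: Fraction(1, 6) for page in OUTLINKS}
print(sorted(B[2]))                            # [1, 3]: P1 and P3 link to P2
print(pagerank_sum(4, uniform, OUTLINKS, B))   # 1/4
```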
  • 9. • The values $r(P_j)$ of the inlinking pages are unknown, so we need starting values. • We could initialize every value to $1/n$, where $n$ is the number of pages: $r_0(P_i) = 1/n$ for all pages $P_i$. • The process is repeated until the values stabilize (this will not happen in all cases). Will this converge?
  • 10. $r_{k+1}(P_i) = \sum_{P_j \in B_{P_i}} \frac{r_k(P_j)}{|P_j|}$ • $r_{k+1}(P_i)$ is the PageRank of $P_i$ at iteration $k+1$ • $r_0(P_i) = 1/n$, where $n$ is the number of pages (nodes) • $|P_j|$ is the number of outlinks from page $P_j$ • $B_{P_i}$ is the set of pages pointing into $P_i$
  • 11. PageRank values $r_k(P_i)$ by iteration, with the ranking at $k = 2$:
        Page   Iteration 0   Iteration 1   Iteration 2   Rank at k = 2
        P1     1/6           1/18          1/36          5
        P2     1/6           5/36          1/18          4
        P3     1/6           1/12          1/36          5
        P4     1/6           1/4           17/72         1
        P5     1/6           5/36          11/72         3
        P6     1/6           1/6           14/72         2
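Repeating the update from the uniform start reproduces the table above. The Python 3 sketch below uses exact fractions and an assumed link structure (the figure's arrows are not in this transcript; this graph is the one that reproduces these values). It distributes each page's rank over its outlinks, which adds up exactly the same terms as the sum over $B_{P_i}$; iterate_pagerank is a hypothetical helper.

```python
from fractions import Fraction

# Assumed link structure (page -> pages it links to); reproduces the slide 11 table.
OUTLINKS = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}

def iterate_pagerank(outlinks, iterations):
    """Return [r_0, r_1, ..., r_iterations] for r_{k+1}(P_i) = sum r_k(P_j)/|P_j|."""
    n = len(outlinks)
    ranks = {page: Fraction(1, n) for page in outlinks}   # r_0(P_i) = 1/n
    history = [ranks]
    for _ in range(iterations):
        new_ranks = {page: Fraction(0) for page in outlinks}
        for source, targets in outlinks.items():
            for target in targets:
                # P_source contributes r_k(P_source)/|P_source| to each page it links to
                new_ranks[target] += ranks[source] / len(targets)
        ranks = new_ranks
        history.append(ranks)
    return history

history = iterate_pagerank(OUTLINKS, 2)
for page in OUTLINKS:
    print(f"P{page}", *(str(h[page]) for h in history))
```

Printed fractions are in lowest terms, so 14/72 appears as 7/36.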
  • 12. $[n \times m] \cdot [m \times r] = [n \times r]$
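A small pure-Python illustration of the dimension rule: a 1×3 row times a 3×2 matrix gives a 1×2 result, and mismatched inner dimensions are rejected. matmul is a hypothetical helper, not a library function.

```python
def matmul(A, B):
    """Multiply an n x m matrix A by an m x r matrix B (both given as lists of rows)."""
    n, m = len(A), len(A[0])
    if len(B) != m:  # inner dimensions must match
        raise ValueError(f"inner dimensions differ: {n}x{m} times {len(B)}x{len(B[0])}")
    r = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(r)]
            for i in range(n)]

row = [[1, 2, 3]]               # 1 x 3
M = [[1, 0], [0, 1], [1, 1]]    # 3 x 2
print(matmul(row, M))           # [[4, 5]], a 1 x 2 result
```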
  • 13. • The nonzero elements in row i of H mark the pages that page i links to (its outlinking pages). • The nonzero elements in column i mark the pages that link to page i (its inlinking pages).
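To see this concretely, the sketch below builds H for an assumed six-page graph (the one that reproduces the slide 11 table), with $H_{ij} = 1/|P_i|$ when page $i$ links to page $j$, and reads off row 3 and column 2. Pages are numbered from 1 and link_matrix is a hypothetical helper.

```python
from fractions import Fraction

# Assumed link structure (page -> pages it links to); reproduces the slide 11 table.
OUTLINKS = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}

def link_matrix(outlinks):
    """H[i][j] = 1/|P_i| if page i links to page j, else 0 (0-based list indices)."""
    n = len(outlinks)
    H = [[Fraction(0)] * n for _ in range(n)]
    for i, targets in outlinks.items():
        for j in targets:
            H[i - 1][j - 1] = Fraction(1, len(targets))
    return H

H = link_matrix(OUTLINKS)
row_3 = [j + 1 for j, h in enumerate(H[2]) if h != 0]    # pages that P3 links to
col_2 = [i + 1 for i in range(len(H)) if H[i][1] != 0]   # pages that link to P2
print(row_3)  # [1, 2, 5]
print(col_2)  # [1, 3]
```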
  • 14.
  • 15. $\pi^{(k+1)T} = \pi^{(k)T} H$, where $\pi^{(k)T}$ is a 1×n row vector.
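In matrix form, one iteration is a row-vector-times-matrix product. This sketch, using an assumed six-page graph (the one that reproduces the slide 11 table) and exact fractions, applies $\pi^{(k+1)T} = \pi^{(k)T} H$ twice from the uniform vector; the results match the Iteration 1 and Iteration 2 columns of the slide 11 table, as the speaker notes say they should. The helper names are hypothetical.

```python
from fractions import Fraction

# Assumed link structure (page -> pages it links to); reproduces the slide 11 table.
OUTLINKS = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}

def link_matrix(outlinks):
    """H[i][j] = 1/|P_i| if page i links to page j, else 0 (0-based list indices)."""
    n = len(outlinks)
    H = [[Fraction(0)] * n for _ in range(n)]
    for i, targets in outlinks.items():
        for j in targets:
            H[i - 1][j - 1] = Fraction(1, len(targets))
    return H

def row_times_matrix(pi, M):
    """(pi^T M)_j = sum_i pi_i * M[i][j] for a 1 x n row vector pi."""
    return [sum(pi[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]

H = link_matrix(OUTLINKS)
pi = [Fraction(1, 6)] * 6           # pi^(0)T = (1/6, ..., 1/6)
iterates = [pi]
for _ in range(2):
    pi = row_times_matrix(pi, H)    # pi^(k+1)T = pi^(k)T H
    iterates.append(pi)
print([str(x) for x in iterates[1]])  # ['1/18', '5/36', '1/12', '1/4', '5/36', '1/6']
```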
  • 16. • Rank sinks and convergence • The iteration resembles work done on Markov chains. • H plays the role of a transition probability matrix. • The iteration converges to a unique positive vector if the matrix is: • Stochastic: each row sums to 1. • Irreducible: from any state there is a nonzero probability of reaching any other state, possibly in more than one step. • Aperiodic: returns to a state i are not restricted to multiples of some fixed period greater than 1. • Primitive: irreducible and aperiodic.
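An assumed six-page example (the graph that reproduces the slide 11 table) shows why H alone is not enough. Row P2 of H is all zeros, so H is not stochastic, and pages P4 to P6 link only to each other, forming a rank sink. This sketch runs 100 raw iterations in floating point: total rank leaks away through P2, and P1 to P3 end near zero. raw_step is a hypothetical helper.

```python
# Assumed link structure (page -> pages it links to); reproduces the slide 11 table.
OUTLINKS = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}

def raw_step(ranks, outlinks):
    """One step of r_{k+1}(P_i) = sum r_k(P_j)/|P_j|, with no fix for dangling pages."""
    new = dict.fromkeys(outlinks, 0.0)
    for source, targets in outlinks.items():
        for target in targets:
            new[target] += ranks[source] / len(targets)
    return new

# Each nonzero row of H holds |P_i| entries of 1/|P_i|, so it sums to 1; P2's row sums to 0.
row_sums = {page: 1 if targets else 0 for page, targets in OUTLINKS.items()}
print(row_sums)

ranks = dict.fromkeys(OUTLINKS, 1 / 6)
for _ in range(100):
    ranks = raw_step(ranks, OUTLINKS)
print(round(sum(ranks.values()), 4))                # total rank is now below 1
print({p: round(r, 4) for p, r in ranks.items()})   # P1 to P3 near 0; P4 to P6 hold the rest
```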
  • 17. Markov property: the next state depends only on the current state (no memory of earlier states).
  • 18. “Random Surfer” model • The surfer moves by following hyperlinks. • The proportion of time spent on a page is a measure of its importance. • The model must fix the “dangling node” problem: the surfer gets stuck on a node with no outlinks (PDF files, images, etc.). • The surfer therefore needs to be able to “teleport”, that is, make random jumps.
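The random surfer can be simulated directly. In this Monte Carlo sketch the surfer follows a random outlink with probability alpha = 0.85 (the value given on slide 23) and otherwise teleports to a uniformly random page; at a dangling page it always teleports. The visit frequencies approximate the PageRank vector. The six-page link structure is an assumption (it reproduces the slide 11 table), and simulate_surfer is a hypothetical helper.

```python
import random

# Assumed link structure (page -> pages it links to); reproduces the slide 11 table.
OUTLINKS = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}

def simulate_surfer(outlinks, alpha=0.85, steps=100_000, seed=0):
    """Return the fraction of steps the random surfer spends on each page."""
    rng = random.Random(seed)            # seeded so runs are repeatable
    pages = list(outlinks)
    visits = dict.fromkeys(pages, 0)
    page = rng.choice(pages)
    for _ in range(steps):
        visits[page] += 1
        targets = outlinks[page]
        if targets and rng.random() < alpha:
            page = rng.choice(targets)   # follow a hyperlink
        else:
            page = rng.choice(pages)     # teleport (always, from a dangling page)
    return {p: count / steps for p, count in visits.items()}

freq = simulate_surfer(OUTLINKS)
print({p: round(f, 3) for p, f in freq.items()})
print(max(freq, key=freq.get))   # the most-visited page (P4 for this graph)
```

Sending a dangling page's surfer to a uniformly random page is the same adjustment that the matrix S makes on slide 20.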
  • 19.
  • 20. $S = H + a\left(\frac{1}{n} e^{T}\right)$, where $a_i = 1$ if page $i$ is dangling and $0$ otherwise, $e^{T}$ (1×6 here) is a row vector of all ones, and $n$ is the number of nodes.
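Building S for an assumed six-page graph (the one that reproduces the slide 11 table): the indicator $a$ flags the all-zero row of P2, and adding $a \cdot \frac{1}{n} e^{T}$ replaces that row with 1/6 in every column, which makes every row sum to 1. The helper names are hypothetical.

```python
from fractions import Fraction

# Assumed link structure (page -> pages it links to); reproduces the slide 11 table.
OUTLINKS = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}

def link_matrix(outlinks):
    """H[i][j] = 1/|P_i| if page i links to page j, else 0 (0-based list indices)."""
    n = len(outlinks)
    H = [[Fraction(0)] * n for _ in range(n)]
    for i, targets in outlinks.items():
        for j in targets:
            H[i - 1][j - 1] = Fraction(1, len(targets))
    return H

def stochastic_adjustment(H):
    """S = H + a (1/n) e^T, where a_i = 1 exactly when row i of H is all zeros."""
    n = len(H)
    a = [1 if all(h == 0 for h in row) else 0 for row in H]   # dangling-node indicator
    S = [[H[i][j] + a[i] * Fraction(1, n) for j in range(n)] for i in range(n)]
    return S, a

S, a = stochastic_adjustment(link_matrix(OUTLINKS))
print(a)                                  # [0, 1, 0, 0, 0, 0]
print(all(sum(row) == 1 for row in S))    # True: every row of S sums to 1
```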
  • 21.
  • 22. Serendipity? Page and Brin introduced an “adjustment”: the random surfer can “teleport” by entering a new destination into the browser.
  • 23. • Teleportation matrix: $E = \frac{1}{n} e e^{T}$ • $\alpha$ controls the proportion of time the random surfer follows hyperlinks as opposed to teleporting. If $\alpha = 0.5$, the surfer spends half the time following links and half the time teleporting. • At $\alpha = 0.5$, about 34 iterations are required to converge to a tolerance of $10^{-10}$. • $\alpha$ was originally set to 0.85. As $\alpha \to 1$, computation time grows and sensitivity becomes an issue.
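The 34-iteration figure follows from a standard result: for the power method on G, the error shrinks roughly like $\alpha^k$, because the second-largest eigenvalue of G has magnitude at most $\alpha$. Solving $\alpha^k = 10^{-10}$ for $k$ gives the estimate below; it is a worst-case style estimate, and a particular graph may converge faster. iterations_needed is a hypothetical helper.

```python
import math

def iterations_needed(alpha, tol=1e-10):
    """Smallest k with alpha**k <= tol, i.e. k >= log(tol) / log(alpha)."""
    return math.ceil(math.log(tol) / math.log(alpha))

for alpha in (0.5, 0.85, 0.99):
    print(alpha, iterations_needed(alpha))   # 34, 142, and 2292 iterations
```

The growth from 34 to 142 to 2292 iterations is the computation-time cost of pushing $\alpha$ toward 1.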
  • 24. $G = \alpha S + (1-\alpha)\frac{1}{n} e e^{T}$
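This sketch builds G densely with exact fractions for an assumed six-page graph (the one that reproduces the slide 11 table) and $\alpha = 0.85$, then checks that G is stochastic and strictly positive. A positive stochastic matrix is irreducible and aperiodic, so the convergence conditions from slide 16 hold. google_matrix is a hypothetical helper.

```python
from fractions import Fraction

# Assumed link structure (page -> pages it links to); reproduces the slide 11 table.
OUTLINKS = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}

def google_matrix(outlinks, alpha):
    """G = alpha*S + (1 - alpha)*(1/n)*e*e^T, built densely for a small example."""
    n = len(outlinks)
    G = []
    for i in range(1, n + 1):
        targets = outlinks[i]
        if targets:   # row i of S equals row i of H
            s_row = [Fraction(1, len(targets)) if j in targets else Fraction(0)
                     for j in range(1, n + 1)]
        else:         # dangling page: row i of S is (1/n) e^T
            s_row = [Fraction(1, n)] * n
        G.append([alpha * s + (1 - alpha) * Fraction(1, n) for s in s_row])
    return G

G = google_matrix(OUTLINKS, Fraction(85, 100))
print(all(sum(row) == 1 for row in G))        # True: G is stochastic
print(all(g > 0 for row in G for g in row))   # True: G is strictly positive
```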
  • 25. $\pi^{(k+1)T} = \pi^{(k)T} G$ • In 2002 this was called the world’s largest matrix computation: the order of G in 2002 was about $8.1 \times 10^{9}$!
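A minimal power-method sketch for $\pi^{(k+1)T} = \pi^{(k)T} G$ on an assumed six-page graph (the one that reproduces the slide 11 table), with $\alpha = 0.85$ and a stopping rule of an L1 change below $10^{-10}$; the stopping rule and the helper names are illustrative choices. Forming G densely is fine for six pages but not at web scale, which is what the rewritten form of G on slide 26 addresses.

```python
# Assumed link structure (page -> pages it links to); reproduces the slide 11 table.
OUTLINKS = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}

def google_matrix(outlinks, alpha):
    """Dense G = alpha*S + (1 - alpha)/n, with dangling rows of S set to 1/n."""
    n = len(outlinks)
    G = []
    for i in range(1, n + 1):
        targets = outlinks[i]
        if targets:
            s_row = [1 / len(targets) if j in targets else 0.0 for j in range(1, n + 1)]
        else:
            s_row = [1 / n] * n          # dangling page
        G.append([alpha * s + (1 - alpha) / n for s in s_row])
    return G

def power_method(G, tol=1e-10, max_iter=10_000):
    """Iterate pi^(k+1)T = pi^(k)T G from the uniform vector until the L1 change < tol."""
    n = len(G)
    pi = [1 / n] * n
    for k in range(1, max_iter + 1):
        new = [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]
        change = sum(abs(x - y) for x, y in zip(new, pi))
        pi = new
        if change < tol:
            return pi, k
    raise RuntimeError("no convergence within max_iter iterations")

pi, iterations = power_method(google_matrix(OUTLINKS, 0.85))
print(iterations, [round(x, 4) for x in pi])
```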
  • 26. $G = \alpha H + \left(\alpha a + (1-\alpha) e\right)\frac{1}{n} e^{T}$
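Because $\pi^{(k)T} e = 1$, this form gives $\pi^{(k)T} G = \alpha \pi^{(k)T} H + \left(\alpha \pi^{(k)T} a + 1 - \alpha\right)\frac{1}{n} e^{T}$: each iteration needs only a sparse product with H plus one scalar, so the dense G never has to be stored. The sketch below checks this against the dense product for an assumed six-page graph (the one that reproduces the slide 11 table); sparse_step and dense_google_matrix are hypothetical names.

```python
# Assumed link structure (page -> pages it links to); reproduces the slide 11 table.
OUTLINKS = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}

def sparse_step(pi, outlinks, alpha):
    """pi^T G computed as alpha*pi^T H + (alpha*pi^T a + 1 - alpha)/n * e^T."""
    n = len(pi)
    dangling = sum(pi[i] for i in range(n) if not outlinks[i + 1])   # pi^T a
    new = [(alpha * dangling + 1 - alpha) / n] * n                   # scalar * (1/n) e^T
    for i in range(n):
        targets = outlinks[i + 1]
        for j in targets:
            new[j - 1] += alpha * pi[i] / len(targets)                 # alpha * pi^T H
    return new

def dense_google_matrix(outlinks, alpha):
    """Dense G = alpha*S + (1 - alpha)/n, used here only for comparison."""
    n = len(outlinks)
    G = []
    for i in range(1, n + 1):
        targets = outlinks[i]
        if targets:
            s_row = [1 / len(targets) if j in targets else 0.0 for j in range(1, n + 1)]
        else:
            s_row = [1 / n] * n
        G.append([alpha * s + (1 - alpha) / n for s in s_row])
    return G

alpha = 0.85
G = dense_google_matrix(OUTLINKS, alpha)
pi_sparse = pi_dense = [1 / 6] * 6
for _ in range(5):
    pi_sparse = sparse_step(pi_sparse, OUTLINKS, alpha)
    pi_dense = [sum(pi_dense[i] * G[i][j] for i in range(6)) for j in range(6)]
print(max(abs(x - y) for x, y in zip(pi_sparse, pi_dense)))   # ~0 (floating-point noise)
```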

Speaker notes

  1. Never taught this course in MT. Taught for MASCO last Jan.
  5. Hypertext Transfer Protocol…
  7. n rows and m columns. The inner dimensions must match.
  8. In this example, initialize the $\pi^{(0)}$ vector to [1/6, 1/6, 1/6, … ] and multiply it by H to get Iteration 1 in Table 4.1 of the book. This gives the same results as the PageRank formula.
  9. A11 could be the probability that we stay where we are. A12 is the probability that we go to state s2.
  10. The index i refers to rows only: if row i is all zeros, then $a_i = 1$. S has the same dimensions as H: a is 6 × 1 and $e^T$ is 1 × 6, so their product is a 6 × 6 matrix that is added to H. $e^T$ is all ones.
  12. The order of a matrix is m × n (rows × columns).
  13. Multiply this by $\pi^{(0)}$, which is a 1 × 6 vector [1/6, 1/1…. The result is a 1 × 6 PageRank vector. Interpretation: if one entry is 0.37, then 37% of the time is spent on that page.