2. CONTENT
What is PageRank?
Objective
PageRank for whole Dataset
Local Approximation of PageRank
Experiments and Results
Challenges and Issues
Conclusion and Future Scope
References
3. WHAT IS PAGERANK?
Named after Larry Page, co-founder of Google.
PageRank is an algorithm used by Google Search to rank websites in its search engine results.
It is a way of measuring the importance of web pages.
It works by counting the number and quality of links to a page to determine a rough estimate of how important the page is.
4. OBJECTIVE
In general, PageRank calculation requires a global computation over the whole graph.
But there are situations in which PageRank scores are required for just a small subset of the nodes.
Suppose a website owner wants to promote their website in search engine rankings in order to attract traffic from potential clients.
The owner is then interested only in the PageRank score of their own website, not in the PageRank scores of all other web pages.
5. OBJECTIVE
Global PageRank computation for the entire web graph is out of the question for most users, as it requires significant resources and know-how.
That is why local approximation of PageRank is required.
6. PAGERANK FOR WHOLE DATASET
We traversed the dataset and applied the algorithm proposed by Page and Brin directly to the whole set.
1. In that approach, the PageRank of each page is calculated from the backlinks pointing to that page.
2. The PageRank value of a page is divided equally among the forward links of that page. Each page it points to uses that share to calculate its own PageRank.
3. An additional factor, the damping factor, must also be included to make sure that the PageRank algorithm converges (especially when loops are present).
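Why step 3 matters can be seen on a tiny example. The sketch below assumes a hypothetical three-page graph in which pages B and C link only to each other; the function names `one_iteration` and `trace_b` are illustrative. It applies steps 1 and 2 synchronously: without damping (d = 1) the values inside the loop oscillate and never converge, while with d = 0.85 they settle.

```python
# Hypothetical graph: A -> B, B -> C, C -> B (B and C form a loop).
LINKS = {"A": ["B"], "B": ["C"], "C": ["B"]}


def one_iteration(pr, d):
    """One synchronous update: every page divides its current PageRank
    equally among its forward links (step 2); d is the damping factor (step 3)."""
    new = {page: 1 - d for page in LINKS}
    for page, targets in LINKS.items():
        share = pr[page] / len(targets)
        for target in targets:
            new[target] += d * share
    return new


def trace_b(d, iterations):
    """Return the PageRank of page B after each iteration, starting from 1."""
    pr = {page: 1.0 for page in LINKS}
    history = []
    for _ in range(iterations):
        pr = one_iteration(pr, d)
        history.append(round(pr["B"], 4))
    return history


print(trace_b(d=1.0, iterations=6))         # [2.0, 1.0, 2.0, 1.0, 2.0, 1.0]
print(trace_b(d=0.85, iterations=200)[-1])  # 1.4595
```

With d = 1 the whole value bounces between B and C forever; with d < 1 every iteration shrinks the disagreement by a factor of d, so the values converge.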
7. PAGERANK FOR WHOLE DATASET
Algorithm
(Proposed by Larry Page and Sergey Brin)
Iterate over the pages, calculating for each page X:
$$PR(X) = (1-d) + d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right)$$
until PR(X) no longer changes between successive iterations, for all pages.
Where,
-PR(X) is the PageRank of page X, with an initial value of 1,
-T1, ..., Tn are the pages that link to page X, and PR(Ti) is the PageRank of page Ti,
-C(Ti) is the number of forward (outgoing) links on page Ti, and
-d is a damping factor which can be set between 0 and 1 (commonly 0.85).
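The iteration above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments: the graph is a hypothetical dictionary mapping each page to its forward links, and because floating-point values rarely become exactly equal, the loop stops once the largest change falls below an assumed tolerance `tol`. Pages without forward links simply pass nothing on, which is also an assumption.

```python
def pagerank(links, d=0.85, tol=1e-9, max_iter=1000):
    """Iterate PR(X) = (1 - d) + d * sum(PR(T) / C(T)) over all pages.

    links maps each page to the list of pages it links to (its forward links).
    All pages start at PR = 1. The loop stops once no value changes by more
    than tol, a practical stand-in for "PR(X) no longer changes".
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {page: 1.0 for page in pages}
    for _ in range(max_iter):
        new = {page: 1.0 - d for page in pages}
        for page, targets in links.items():
            if not targets:  # no forward links: this page passes on nothing
                continue
            share = pr[page] / len(targets)  # PR(T) / C(T)
            for target in targets:
                new[target] += d * share
        delta = max(abs(new[page] - pr[page]) for page in pages)
        pr = new
        if delta < tol:
            break
    return pr


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
print({page: round(value, 4) for page, value in sorted(ranks.items())})
# {'A': 1.1634, 'B': 0.6444, 'C': 1.1922}
```

In this formulation, when every page has at least one forward link, the values sum to the number of pages (here 3), so an average page has PageRank 1.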
8. LOCAL PAGERANK APPROXIMATION
Given a node (page) u, we have to calculate its approximate PageRank:
The algorithm crawls the subgraph of radius r around u "backwards" (following in-links) in BFS order. For each node (page) v at layer t, the algorithm calculates the influence of v on u at radius t.
It sums the influence values over all crawled nodes, weighting radius t by a factor that shrinks geometrically with t (a power of the damping factor d).
For that, the algorithm uses the recursive property of influence: the influence of v on u at radius t equals the average influence of the out-neighbours of v on u at radius t−1.
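A minimal sketch of this backward crawl follows, under stated assumptions: the hypothetical dictionaries `in_links` and `out_degree` stand in for what a crawler would fetch, and each radius t is weighted by (1 − d)·d^t. That weighting is what one gets by unrolling the Page and Brin formula, so as r grows the estimate approaches the global PR(u) on the same scale; the referenced papers may normalise differently.

```python
from collections import defaultdict


def local_pagerank(target, in_links, out_degree, d=0.85, r=4):
    """Estimate PR(target) from the radius-r backward neighbourhood of target.

    in_links[x]   -> pages linking to x (what a backward crawl would fetch)
    out_degree[v] -> C(v), the number of forward links of page v
    The estimate is (1 - d) * sum over t <= r of d**t * (total influence at radius t).
    """
    layer = {target: 1.0}  # radius 0: only the target influences itself
    estimate = 1.0 - d
    for t in range(1, r + 1):
        next_layer = defaultdict(float)
        for w, influence_w in layer.items():
            for v in in_links.get(w, ()):
                # Summing influence_w / C(v) over v's out-neighbours w gives
                # their average influence at radius t - 1.
                next_layer[v] += influence_w / out_degree[v]
        layer = next_layer
        if not layer:  # no more pages to crawl backwards
            break
        estimate += (1.0 - d) * d**t * sum(layer.values())
    return estimate


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
in_links = defaultdict(list)
for source, targets in links.items():
    for dest in targets:
        in_links[dest].append(source)
out_degree = {page: len(targets) for page, targets in links.items()}

for r in (1, 2, 4, 50):
    print(r, round(local_pagerank("A", in_links, out_degree, r=r), 4))
```

For this graph the full PageRank of A, computed with the iterative formula, is about 1.1634. The local estimates approach it from below as r grows, since every added term is non-negative.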
9. LOCAL PAGERANK APPROXIMATION
Now we can take two approaches to choosing the value of r:
1. Run the algorithm with a fixed r that is known in advance to be large enough (an upper bound on the radius needed).
2. Run the algorithm without knowing r a priori, and stop it as soon as we notice that the PageRank estimate no longer changes by much.
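The second approach can be sketched by growing the radius one layer at a time. In this hypothetical `local_pagerank_adaptive`, the threshold `eps` and the cap `r_max` are illustrative parameters, and the same assumed (1 − d)·d^t weighting is used for each radius.

```python
from collections import defaultdict


def local_pagerank_adaptive(target, in_links, out_degree, d=0.85, eps=1e-4, r_max=50):
    """Grow the radius one layer at a time and stop once the estimate changes
    by less than eps; r_max caps the crawl as a fixed upper bound.

    Returns (estimate, radius actually used).
    """
    layer = {target: 1.0}
    estimate = 1.0 - d
    for t in range(1, r_max + 1):
        next_layer = defaultdict(float)
        for w, influence_w in layer.items():
            for v in in_links.get(w, ()):
                next_layer[v] += influence_w / out_degree[v]
        layer = next_layer
        increment = (1.0 - d) * d**t * sum(layer.values())
        estimate += increment
        if increment < eps:  # the estimate "does not change by much"
            return estimate, t
    return estimate, r_max


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
in_links = defaultdict(list)
for source, targets in links.items():
    for dest in targets:
        in_links[dest].append(source)
out_degree = {page: len(targets) for page, targets in links.items()}

print(local_pagerank_adaptive("A", in_links, out_degree))
```

This stopping rule is a heuristic: a small increment at radius t does not by itself guarantee that the contributions of larger radii are small, which is why the fixed cap is kept.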
12. CHALLENGES AND ISSUES
Loading small indexes into memory caused problems. We resolved this by increasing the heap size allocated to the Virtual Machine.
Deciding the threshold value when implementing pruning.
There is no single threshold value that works everywhere, because a suitable threshold varies widely with the PageRank values involved.
Choose wisely!
13. CONCLUSIONS AND FUTURE SCOPE
The normal procedure for calculating PageRank considers the whole dataset, which is time- and resource-consuming and not feasible in most situations.
Local approximation instead estimates the PageRank of a node by computing over a smaller graph around it, without calculating PageRank for all the nodes in the dataset.
The results obtained are close to the original PageRank values, with an average error rate of 15-20%.
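The slides do not define "average error rate". One common choice, assumed here purely for illustration, is the mean relative error |approx − exact| / exact over the evaluated pages. The function name and the values below are made up and are not the experimental results.

```python
def average_error_rate(exact, approx):
    """Mean relative error, in percent, over the pages present in approx.

    exact and approx map page -> PageRank value. This metric is an assumed
    definition, not necessarily the one behind the reported figure.
    """
    errors = [abs(approx[page] - exact[page]) / exact[page] for page in approx]
    return 100.0 * sum(errors) / len(errors)


# Made-up values, for illustration only.
exact = {"p1": 2.0, "p2": 0.5, "p3": 1.2}
approx = {"p1": 1.7, "p2": 0.45, "p3": 1.0}
print(round(average_error_rate(exact, approx), 2))  # 13.89
```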
14. CONCLUSIONS AND FUTURE SCOPE
The accuracy of the computed value depends on the radius chosen for the smaller graph.
The smaller the radius, the higher the error rate, and vice versa.
However, increasing the radius makes the cost grow exponentially, because the number of in-links we have to deal with becomes very large.
Generally, a value of r = 3 to 4 is used.
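A rough count shows why. Assuming, purely for illustration, that every page has k in-links and that the backward layers do not overlap, layer t contains k^t pages, so a crawl of radius r touches

```latex
\sum_{t=0}^{r} k^{t} = \frac{k^{r+1} - 1}{k - 1}
```

pages. With k = 10 this is 11,111 pages for r = 4, but 1,111,111 pages for r = 6.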
Pruning techniques can be used to afford a larger value of r: the procedure removes from layer t all nodes whose influence is below some threshold value T, so they are not expanded further.
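A minimal sketch of pruning, reusing the backward-crawl idea under the same assumed (1 − d)·d^t weighting; `local_pagerank_pruned` and its `threshold` parameter are hypothetical names. In this sketch a pruned node's own contribution at radius t is still counted and only its further expansion is skipped. That ordering is our assumption.

```python
from collections import defaultdict


def local_pagerank_pruned(target, in_links, out_degree, d=0.85, r=6, threshold=1e-3):
    """Backward-crawl estimate of PR(target) with pruning: nodes whose
    influence is below threshold are not expanded into the next layer,
    which keeps the crawl small enough to afford a larger radius r."""
    layer = {target: 1.0}
    estimate = 1.0 - d
    for t in range(1, r + 1):
        next_layer = defaultdict(float)
        for w, influence_w in layer.items():
            for v in in_links.get(w, ()):
                next_layer[v] += influence_w / out_degree[v]
        estimate += (1.0 - d) * d**t * sum(next_layer.values())
        # Pruning: only nodes with influence >= threshold are expanded further.
        layer = {v: inf for v, inf in next_layer.items() if inf >= threshold}
        if not layer:
            break
    return estimate


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
in_links = defaultdict(list)
for source, targets in links.items():
    for dest in targets:
        in_links[dest].append(source)
out_degree = {page: len(targets) for page, targets in links.items()}

full = local_pagerank_pruned("A", in_links, out_degree, r=3, threshold=0.0)
pruned = local_pagerank_pruned("A", in_links, out_degree, r=3, threshold=0.6)
print(round(full, 4), round(pruned, 4))  # 0.5322 0.4861
```

A higher threshold makes the crawl cheaper but lowers the estimate, which is why the choice of T is delicate.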
15. REFERENCES
Ziv Bar-Yossef and Li-Tal Mashiach (Google Haifa Engineering Center, Haifa, Israel), Local Approximation of PageRank and Reverse PageRank, October 26–30, 2008, ACM 978-1-59593-991-3/08/10
Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd, The PageRank Citation Ranking: Bringing Order to the Web, January 29, 1998, Stanford InfoLab
Yen-Yu Chen, Qingqing Gan, Torsten Suel, Local Methods for Estimating PageRank Values, CIKM'04, November 8–13, 2004