Comparative study of different ranking algorithms adopted by search engine

COMPARATIVE STUDY OF
DIFFERENT RANKING ALGORITHMS
ADOPTED BY SEARCH ENGINE

Under the guidance of,
Dr. Manoj Wadhwa

Presented by,
Shikha Taneja
12-MCS-110

MOTIVATION


When searching for information on the WWW, a user submits a query
to a search engine. The engine returns, as the query’s result, a list of
Web sites, which is usually a huge set. The ranking of these Web
sites is therefore very important. Because much information is
contained in the link structure of the WWW, information such as which
pages link to which others can be used to augment search algorithms.



It is important for any web search engine to rank pages so that
those containing the most useful data about the searched keyword or
subject are listed at higher positions for the searcher.



To provide the desired ordering of web pages, a search engine uses
a page ranking algorithm: a technique for ranking websites in its
search results.



With the development of the Internet and the popularity of the
World Wide Web, Web page ranking systems have drawn significant
attention.



Many Web search engines have been introduced so far, but they still
have difficulty providing completely relevant answers to general
queries.



The main reason is not the lack of data but rather an excess of data.

WHAT IS A SEARCH ENGINE?
A Web search engine is a tool that searches documents on the Web
for specified keywords and returns a list of the documents in
which the keywords were found.

INTRODUCTION


Early search engines mainly compared the content similarity of the
query and the indexed pages.



From 1996, it became clear that content similarity alone was no
longer sufficient.
• The number of pages grew rapidly in the mid-to-late 1990s.
• Content similarity is easily spammed.
• A page owner can repeat some words and add many related
words to boost the rankings of his pages and/or to make the
pages relevant to a large number of queries.



Starting around 1996, researchers began to work on the problem.
They turned to hyperlinks.



Web pages are connected through hyperlinks, which carry important
information.
• Some hyperlinks organize information within the same site.
• Other hyperlinks point to pages on other Web sites.



Those pages that are pointed to by many other pages are likely to
contain authoritative information.



During 1997-1998, the two most influential hyperlink-based search
algorithms, PageRank and HITS, were reported.

PAGE RANK








PageRank is an algorithm used by the Google web search
engine to rank websites in its search results.
PageRank works by counting the number and quality of
links to a page to produce a rough estimate of how
important the website is. The underlying assumption is
that more important websites are likely to receive more
links from other websites.
It is an effective way to prioritize the results of web
keyword searches.
[Figure: example of the PageRank indicator as found on the Google toolbar]
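The idea of counting both the number and the quality of links can be sketched as a simple iterative computation. This is a minimal illustration, not Google's implementation: the function name `pagerank`, the damping value 0.85, the iteration count, and the even redistribution of rank from pages without out-links are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iteratively estimate PageRank.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults (assumptions).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page with no out-links spreads its rank
                # evenly over all pages so that no rank is lost.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(example_links).items()):
    print(page, round(score, 4))
```

In this small graph, page C is linked to by three pages and ends up with the highest score, while D, which nobody links to, keeps only the base share.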

HITS ALGORITHM










HITS stands for “Hypertext Induced Topic Selection”. The algorithm
is used for rating and ranking websites based on link information
when identifying topic areas.
Unlike PageRank, which is a static ranking algorithm, HITS is search
query dependent.
It is a very popular and effective algorithm for ranking documents
based on the link information among a set of documents.
The authority value of a page is computed as the sum of the scaled
hub values of the pages that point to it.
The hub value of a page is the sum of the scaled authority values of
the pages it points to.

When the user issues a search query,
• HITS first expands the list of relevant pages returned by a
search engine and then produces two rankings of the
expanded set of pages: an authority ranking and a hub ranking.
Authority: Roughly, an authority is a page with many in-links.
• The idea is that the page may have good or authoritative
content on some topic, and
• thus many people trust it and link to it.
Hub: A hub is a page with many out-links.
• The page serves as an organizer of the information on a
particular topic, and
• points to many good authority pages on the topic.
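The mutual definition of authority and hub values can be sketched as an alternating update. This is a minimal illustration that assumes the expanded page set has already been built and is given as a link dictionary; the function name `hits`, the iteration count, and the choice of Euclidean normalization as the "scaling" step are assumptions.

```python
import math


def hits(links, iterations=50):
    """Return (authority, hub) scores for an already expanded page set.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub values of the pages pointing to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub: sum of the authority values of the pages it points to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return auth, hub


example_links = {"portal": ["news", "docs"], "blog": ["docs"]}
authority, hub_score = hits(example_links)
print("best authority:", max(authority, key=authority.get))
print("best hub:", max(hub_score, key=hub_score.get))
```

Here "docs" has the most in-links and becomes the top authority, while "portal" points to both authorities and becomes the top hub.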


SALSA





SALSA: the Stochastic Approach for Link-Structure
Analysis (Lempel, Moran 2001)
• Probabilistic extension of the HITS algorithm
• Combines ideas from both HITS and PageRank
• A random walk is carried out by following hyperlinks
both in the forward and in the backward direction
SALSA uses authority and hub scores.
SALSA creates a neighborhood graph using authority and
hub pages and links.
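The backward-then-forward random walk can be sketched for the authority side as follows. This is a simplified illustration under stated assumptions: the neighborhood graph is given directly as a link dictionary, only authority scores are computed (hub scores would use the mirrored walk), and the function name `salsa_authority` and the iteration count are hypothetical.

```python
def salsa_authority(links, iterations=100):
    """Authority scores from a two-step random walk.

    From an authority page, step backward along a random in-link to a
    hub, then forward along a random out-link of that hub.
    links maps each page to the list of pages it links to.
    """
    in_links = {}
    for src, targets in links.items():
        for t in targets:
            in_links.setdefault(t, []).append(src)
    authorities = list(in_links)
    if not authorities:
        return {}
    score = {a: 1.0 / len(authorities) for a in authorities}
    for _ in range(iterations):
        new_score = {a: 0.0 for a in authorities}
        for a in authorities:
            # Backward step: choose one of the pages linking to a.
            back = score[a] / len(in_links[a])
            for h in in_links[a]:
                # Forward step: choose one of the pages that h links to.
                forward = back / len(links[h])
                for t in links[h]:
                    new_score[t] += forward
        score = new_score
    return score


example_links = {"a": ["c"], "b": ["c", "d"]}
print(salsa_authority(example_links))
```

On this small connected example the scores settle at values proportional to the in-degree of each page: "c" has two in-links and "d" has one.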

WEIGHTED PAGERANK
ALGORITHM








The Weighted PageRank (WPR) algorithm is an extension of the
PageRank algorithm.
Rather than dividing the rank value of a page evenly among
the pages it links to, this algorithm allocates higher rank
values to the more significant pages.
Each outgoing link gets a value proportional to its
significance.
WPR takes into account the importance of both the inlinks
and outlinks of the pages and distributes rank scores based
on the popularity of the pages.
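A sketch of how rank can be split in proportion to significance is given below. The slides do not state a formula, so this follows one commonly cited formulation as an assumption: the share that page v passes to page u is weighted by u's fraction of the in-links (`w_in`) and of the out-links (`w_out`) among all pages that v links to. The function name, the damping value, and the un-normalized base score `1 - damping` are likewise assumptions.

```python
def weighted_pagerank(links, damping=0.85, iterations=100):
    """Weighted PageRank sketch.

    links maps each page to the list of distinct pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    out_deg = {p: len(links.get(p, [])) for p in pages}
    in_deg = {p: 0 for p in pages}
    for targets in links.values():
        for t in targets:
            in_deg[t] += 1
    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_rank = {p: 1.0 - damping for p in pages}
        for v, targets in links.items():
            total_in = sum(in_deg[p] for p in targets)
            total_out = sum(out_deg[p] for p in targets)
            for u in targets:
                # Popularity of u among the pages v links to,
                # measured by in-links and by out-links.
                w_in = in_deg[u] / total_in if total_in else 0.0
                w_out = out_deg[u] / total_out if total_out else 0.0
                new_rank[u] += damping * rank[v] * w_in * w_out
        rank = new_rank
    return rank


example_links = {"A": ["B", "C"], "D": ["B"], "B": ["A"], "C": ["A"]}
for page, score in sorted(weighted_pagerank(example_links).items()):
    print(page, round(score, 4))
```

Page A links to both B and C, but B has twice as many in-links as C, so B receives the larger share of A's rank instead of an even split.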

DISTANCE RANK ALGORITHM







The distance between pages is considered as a ranking factor.
The algorithm calculates the minimum average distance
between two or more web pages.
It adopts the PageRank property that the rank of each
page is computed as the weighted sum of the ranks of all
pages linking to that particular page.
Thus, a page has a high rank value if it has many
incoming links.

TOPIC SENSITIVE PAGE-RANK
ALGORITHM
This algorithm computes the score of a web page according
to the importance of the content available on that page.

Pages receiving only a few incoming links, but from closely
related web sites, are given much more consideration for
that topic. The result is a higher Topic-Sensitive PageRank
for that site, for that specific search query, despite a
lower PageRank under the current system.
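One way to make the score depend on a topic is to bias the random jump of PageRank toward pages known to belong to that topic. The sketch below illustrates this idea only; it is an assumption about how topic sensitivity can be realized, not the exact method used by any search engine. The function name, the `topic_pages` parameter, and the damping value are hypothetical.

```python
def topic_sensitive_pagerank(links, topic_pages, damping=0.85, iterations=100):
    """PageRank whose random jump lands only on pages of one topic.

    links maps each page to the list of pages it links to.
    topic_pages is the set of pages assumed to belong to the topic.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    topic = set(topic_pages) & pages
    if not topic:
        raise ValueError("topic_pages must contain at least one known page")
    teleport = {p: (1.0 / len(topic) if p in topic else 0.0) for p in pages}
    rank = dict(teleport)
    for _ in range(iterations):
        # The base share goes only to pages of the chosen topic.
        new_rank = {p: (1.0 - damping) * teleport[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page without out-links jumps back
                # to the topic pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] * teleport[p]
        rank = new_rank
    return rank


example_links = {
    "s1": ["s2"], "s2": ["s1"],   # two sports pages
    "c1": ["c2"], "c2": ["c1"],   # two cooking pages
    "hub": ["s1", "c1"],
}
sports = topic_sensitive_pagerank(example_links, {"s1"})
cooking = topic_sensitive_pagerank(example_links, {"c1"})
print(round(sports["s2"], 4), round(cooking["s2"], 4))
```

The same page "s2" scores well for the sports topic and receives nothing for the cooking topic, even though its link structure is identical in both runs.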


COMPARISON BETWEEN
DIFFERENT RANKING ALGORITHMS
| Criteria | PageRank | HITS | SALSA | Weighted PageRank | Distance Rank | Topic-Sensitive PageRank |
|---|---|---|---|---|---|---|
| Came into existence | 1998 | 1999 | 2001 | 2006 | 1998 | 2000 |
| Objective | An excellent way to prioritize the results of web keyword searches. | To rank documents based on the link information among a set of documents. | Perform a random walk alternating between hubs and authorities. | Weight of web page is calculated on the basis of inbound and outbound links and on the basis of weight of … | The algorithm calculates the minimum average distance between two web pages and more pages. | This algorithm computes the scores of web page according to the importance of content available on web page. |
| Input parameters | Back links | Content, back and forward links | Content, back links and forward links | Back links and forward links | Inbound links | Content, back link, forward link |
| Importance | High. Back links are considered. | Moderate. Hub and authority scores are utilized. | High. It weighs the entries according to their in- and out-degrees. | High. The pages are sorted according to the importance. | High. It is based on distance between the pages. | High. It computes importance score per topic. |
| Limitations | Query independent, dangling page | Topic drift and efficiency problem | Query dependent, handles spam but not as well as PageRank | Query independent, dangling page | Needs to work along with PageRank | Only available to text, images are not taken into account. |
| Search engine | Google | Clever | Google | Research model | Research model | Google |
| Quality of results | Medium | Less than PageRank | Less than PageRank | Higher than PageRank | Less than PageRank | High |

PROPOSED WORK


The proposed work on the PageRank algorithm includes an
implementation that addresses the dangling page problem.
Dangling pages are pages that have no outbound links, i.e.
pages that do not reference any other page. Dangling pages
cause problems when calculating the page rank of the
different pages of a website.
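Dangling pages can be detected directly from the link structure. The sketch below finds them and applies one commonly used remedy, treating each dangling page as if it linked to every other page. The slides do not say which remedy the proposed work uses, so this choice and both function names are assumptions made for illustration.

```python
def find_dangling_pages(links):
    """Return the pages that have no outbound links.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    return sorted(p for p in pages if not links.get(p))


def patch_dangling_pages(links):
    """Return a copy of links in which every dangling page links to all
    other pages (an assumed remedy, not the author's stated method)."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    patched = {}
    for p in pages:
        targets = list(links.get(p, []))
        if not targets:
            # A graph with a single page has no other page to link to,
            # so that page stays dangling.
            targets = sorted(pages - {p})
        patched[p] = targets
    return patched


example_links = {"A": ["B"], "B": ["C"]}
print(find_dangling_pages(example_links))
print(patch_dangling_pages(example_links)["C"])
```

After patching, rank that reaches page C is passed on to A and B instead of being lost at every iteration.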

REFERENCES
• Mridula Batra, Sachin Sharma, “Comparative Study of Page Rank Algorithm with Different
Ranking Algorithms Adopted by Search Engine for Website Ranking”, Int. J. Computer
Technology & Applications, Vol. 4 (1), 8-18, Jan-Feb 2013.

• Ankur Gupta, Rajni Jindal, “An Overview of Ranking Algorithm for Search Engines”,
INDIAcom-2008CFND, Feb 08-09, 2008.

• Alessio Signorini, “A Survey of Ranking Algorithms”, Department of Computer Science,
University of Iowa, September 11, 2005.

• Mitali Desai, Sanjaysinh Parmar, Nitesh Shah, Jitendra Upadhyay, “A Study of Different
Page Rank Algorithms: Issues”, International Journal of Computer Science Research &
Technology (IJCSRT), ISSN: 2321-8827, www.ijcsrt.org, IJCSRTV1IS040089, Vol. 1,
Issue 4, September 2013.

• Sergey Brin and Lawrence Page, “The Anatomy of a Large-Scale Hypertextual Web Search
Engine”.

• Marc Najork, “Comparing the Effectiveness of HITS and SALSA”, Microsoft Research,
1065 La Avenida, Mountain View, CA 94043, USA, najork@microsoft.com.

• Dilip Kumar Sharma, A. K. Sharma, “A Comparative Analysis of Web Page Ranking
Algorithms”, International Journal Computer Science and Engineering,
Vol. 02, No. 08, 2010, 2670-2676.

• R. Lempel and S. Moran, “SALSA: The Stochastic Approach for Link-Structure Analysis”.

• Allan Borodin, Gareth O. Roberts, Jeffrey S. Rosenthal and Panayiotis Tsaparas, “Link
Analysis Ranking: Algorithms, Theory, and Experiments”.
