Emrullah Delibas
▪ The Problem of Ranking
• Objectives, Challenges
▪ Early Assumptions & Approaches
▪ Link-Based Ranking Algorithms
• InDegree Algorithm
• Hubs and Authorities: HITS
• PageRank
• SALSA
• Hilltop
▪ Search Engine Spamming
▪ Problems with Non-textual Content
▪ “Cornell”
• Did the searcher want information about the university?
• The university’s hockey team?
• The Lab of Ornithology run by the university?
• Cornell College in Iowa?
• The Nobel-Prize-winning physicist Eric Cornell?
The same ranking of search results can’t be right for everyone.
▪ Objectives:
• To categorize webpages
• To find pages related to given pages
• To find duplicate websites
• To calculate the ‘quality’ of a web link
• To get the most ‘relevant’ web links for a given query
• To model human judgments indirectly
• …
▪ Challenges:
• Searching by itself is a hard problem for computers to solve in any setting
• The scale and complexity of the Web
• Problems of synonymy and polysemy
• The dynamic, constantly changing nature of Web content
• …
▪ Back in the 1990s, web search was based purely on the number of occurrences of a word in a document.
▪ Search was based only on the relevance of a document to the query.
Simply retrieving the relevant documents wasn’t sufficient, as the number of relevant documents could run into the millions.
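A minimal Python sketch of that early approach, assuming whitespace tokenization and a plain sum of raw counts over the query words. The function name `rank_by_term_count` and the sample documents are hypothetical, not taken from the slides:

```python
from collections import Counter


def rank_by_term_count(docs: dict[str, str], query: str) -> list[tuple[str, int]]:
    """Rank documents by the raw number of occurrences of the query terms.

    This mirrors the early assumption that relevance equals term frequency;
    links are ignored entirely.
    """
    terms = query.lower().split()
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        scores[doc_id] = sum(counts[t] for t in terms)
    # Highest count first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {
    "a": "cornell university campus in ithaca",
    "b": "cornell cornell cornell cornell",
    "c": "hockey team news",
}
print(rank_by_term_count(docs, "cornell university"))  # → [('b', 4), ('a', 2), ('c', 0)]
```

Document `b`, which only repeats the word, ranks first. That is exactly the weakness keyword stuffing exploits and that link-based ranking set out to address.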
▪ Links are assumed to be endorsements, with exceptions such as:
• Disagreement
• Self-citation
• Link to a popular document
▪ Hyperlinks contain information about human judgment of a site
▪ The more incoming links a site has, the more highly it is judged
▪ The Web is not a random network
- Bray, Tim. "Measuring the Web." Computer Networks and ISDN Systems 28.7 (1996): 993–1005.
- Marchiori, Massimo. "The quest for correct information on the Web: Hyper search engines." Computer Networks and ISDN Systems 29.8 (1997): 1225–1235.
▪ Hyperlinks are not random; they provide valuable information for:
• Link-based ranking
• Structure analysis
• Detection of communities
• Spam detection
• …
▪ The InDegree approach can be seen as the basis of every link analysis ranking algorithm.
▪ The link recommendation assumption is that by linking to another page, the author recommends it.
• So a page with many incoming links has been highly recommended.
▪ The ranking is based only on this authority (the number of incoming links); the authority of the linking pages is not weighted.
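The InDegree idea fits in a few lines. In this sketch (hypothetical function name; links given as `(source, target)` pairs) every in-link counts as one vote. Dropping self-links is an assumption added here as one simple response to the self-citation issue mentioned above, not something the slides prescribe:

```python
from collections import Counter


def indegree_rank(edges: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Rank pages by their number of incoming links (InDegree).

    Each edge (src, dst) means page src links to page dst. Every link is one
    vote, no matter how important the linking page is.
    """
    pages = {page for edge in edges for page in edge}
    # Count in-links, skipping self-links (assumption: self-citation is ignored).
    indegree = Counter(dst for src, dst in edges if src != dst)
    # Highest in-degree first; ties broken alphabetically.
    return sorted(((p, indegree[p]) for p in pages), key=lambda kv: (-kv[1], kv[0]))


edges = [("a", "c"), ("b", "c"), ("d", "c"), ("c", "a"), ("d", "a"), ("b", "b")]
print(indegree_rank(edges))  # → [('c', 3), ('a', 2), ('b', 0), ('d', 0)]
```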
HITS: Hyperlink-Induced Topic Search
▪ The basic idea is that relevant pages (“authorities”) are linked to by many other pages (“hubs”).
▪ The algorithm is now a part of the Ask search engine.
Jon Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632, 1999. A preliminary version appears in the Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms, Jan. 1998.
▪ It was developed by modeling how humans analyze a search process, rather than how machines answer a query by scanning a set of documents and returning the matches.
▪ For example:
• “top automobile makers in the world”
▪ Rules:
• A good hub points to many good authorities.
• A good authority is pointed to by many good hubs.
• Authorities and hubs have a mutual reinforcement relationship.
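▪ Objective: build a focused subgraph S_σ for the query σ such that
• (i) S_σ is relatively small
• (ii) S_σ is rich in relevant pages
• (iii) S_σ contains most (or many) of the strongest authorities
▪ Solution:
• Generate a root set Q_σ from a text-based search engine
• Expand the root set by adding pages that link to, or are linked from, its pages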
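A sketch of the root-set expansion step, assuming link data is available as two hypothetical lookup tables (`out_links`, `in_links`). Capping how many in-linking pages are added per root page follows Kleinberg's construction; the parameter name `max_in` and its default value are assumptions of this sketch:

```python
def expand_root_set(
    root: list[str],
    out_links: dict[str, list[str]],
    in_links: dict[str, list[str]],
    max_in: int = 50,
) -> set[str]:
    """Grow a query's root set into the focused base set.

    root      -- pages returned by a text-based search engine for the query
    out_links -- page -> pages it links to
    in_links  -- page -> pages that link to it
    max_in    -- at most this many in-linking pages are added per root page,
                 so very popular pages cannot blow up the base set
    """
    base = set(root)
    for page in root:
        base.update(out_links.get(page, []))  # every page the root page points to
        base.update(sorted(in_links.get(page, []))[:max_in])  # capped set of pages pointing to it
    return base


root = ["r1", "r2"]
out_links = {"r1": ["x"], "x": ["y"]}
in_links = {"r1": ["p3", "p1", "p2"], "r2": ["p4"]}
print(sorted(expand_root_set(root, out_links, in_links, max_in=2)))
```

Only one hop is added, so `y` (linked from `x`, not from a root page) stays out, and `p3` is dropped by the cap.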
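▪ Let the authority score of page i be x(i), and the hub score of page i be y(i).
▪ Mutually reinforcing relationship (j → i means page j links to page i):
• I step: $$x(i) = \sum_{j \,:\, j \to i} y(j)$$
• O step: $$y(i) = \sum_{j \,:\, i \to j} x(j)$$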
▪ Iterate: in each iteration (1st, 2nd, …), apply the I step and then the O step and normalize the scores; repeat until they converge.
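The I and O steps can be iterated directly. This sketch (hypothetical function name; a fixed iteration count instead of a convergence test) normalizes both score vectors to unit Euclidean length after every iteration, which keeps the values bounded:

```python
import math


def hits(edges: list[tuple[str, str]], iterations: int = 50) -> tuple[dict[str, float], dict[str, float]]:
    """Run HITS on a directed graph given as (src, dst) link pairs.

    Returns (authority, hub) score dicts, each scaled to unit Euclidean length.
    """
    pages = sorted({p for edge in edges for p in edge})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # I step: x(i) = sum of y(j) over all pages j that link to i
        auth = {p: 0.0 for p in pages}
        for src, dst in edges:
            auth[dst] += hub[src]
        # O step: y(i) = sum of x(j) over all pages j that i links to
        hub = {p: 0.0 for p in pages}
        for src, dst in edges:
            hub[src] += auth[dst]
        # Normalize both vectors so the scores stay bounded
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return auth, hub


edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1")]
auth, hub = hits(edges)
print(max(auth, key=auth.get), max(hub, key=hub.get))  # → a1 h1
```

Page `h1` links to both authorities, so it becomes the strongest hub, and `a1`, linked from both hubs, becomes the strongest authority: the mutual reinforcement described in the rules above.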
Drawbacks of HITS:
1. The base graph must be built “on the fly” for each query
2. It suffers from topic drift
3. It cannot detect advertisements
4. It can easily be spammed
5. Query-time evaluation is slow
PageRank: Heart of Google
▪ Proposed by Sergey Brin and Lawrence Page
▪ Uses a recursive scheme similar to Kleinberg’s HITS algorithm
▪ But the PageRank algorithm produces a ranking that is independent of the user’s query.
Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual Web search engine. In Proc. 7th International World Wide Web Conference, pages 107–117, 1998.
▪ A page is important if it is pointed to by other important pages.
▪ The PageRank of a page p_i is given as follows:
$$PR(p_i) = \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}$$
• where M(p_i) is the set of pages linking to p_i
• L(p_j) is the number of outbound links on page p_j.
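The formula can be evaluated by repeated substitution (power iteration). A minimal sketch, assuming every page has at least one outbound link and starting from a uniform distribution; the function name and the three-page graph are hypothetical:

```python
def simple_pagerank(links: dict[str, list[str]], iterations: int = 100) -> dict[str, float]:
    """Repeatedly apply PR(p_i) = sum over p_j in M(p_i) of PR(p_j) / L(p_j).

    links -- page -> list of pages it links to (its outbound links).
    Assumes every page has at least one outbound link.
    """
    pages = list(links)
    pr = {p: 1.0 / len(pages) for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        new = {p: 0.0 for p in pages}
        for pj, outs in links.items():
            share = pr[pj] / len(outs)  # PR(p_j) / L(p_j)
            for pi in outs:
                new[pi] += share  # p_j is in M(p_i) for each p_i it links to
        pr = new
    return pr


pr = simple_pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print({p: round(v, 3) for p, v in pr.items()})  # → {'A': 0.4, 'B': 0.2, 'C': 0.4}
```

Solving by hand gives the same fixed point: PR(A) = PR(C) and PR(B) = PR(A)/2, so with the ranks summing to 1 the result is 0.4, 0.2 and 0.4. On graphs with dead ends or spider traps this plain iteration breaks down, which is what the damping factor addresses.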
▪ The algorithm is relatively robust against spam
• since it is not easy for a webpage owner to add in-links to his/her page from other important pages.
▪ PageRank is a global measure and is query-independent.
▪ It favors older pages
• since new ones will not yet have many in-links.
▪ PageRank can easily be inflated using “link farms”
• However, search engines actively try to detect such manipulation while indexing.
▪ Rank Sinks: occur when pages in a network are caught in infinite link cycles.
▪ Spider Traps: occur when a group of pages has no links from within the group to pages outside it.
▪ Dangling Links: links that point to a page with no outgoing links.
▪ Dead Ends: pages with no outgoing links.
▪ Damping Factor
• Random jumps (teleportation):
$$PR(p_i) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}$$
– where N is the total number of pages
– typically d ≈ 0.85
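Adding the damping term to the PageRank iteration gives the sketch below. Treating dead ends as if they linked to every page is an assumption (one common convention, not taken from the slides) that keeps the ranks summing to 1. Setting `d=1.0` removes teleportation and exposes the spider-trap problem; the function name and example graph are hypothetical:

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85, iterations: int = 100) -> dict[str, float]:
    """Iterate PR(p_i) = (1 - d)/N + d * sum over p_j in M(p_i) of PR(p_j)/L(p_j).

    links -- page -> list of pages it links to; every linked page must be a key.
    Dead ends (no out-links) are treated as linking to every page, an assumed
    convention that keeps the total rank equal to 1.
    """
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank sitting on dead ends is spread evenly over all pages.
        dangling = sum(pr[p] for p in pages if not links[p])
        # Teleportation term (1 - d)/N plus each page's share of the dangling rank.
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        for pj, outs in links.items():
            for pi in outs:
                new[pi] += d * pr[pj] / len(outs)  # d * PR(p_j) / L(p_j)
        pr = new
    return pr


trap = {"A": ["B"], "B": ["C"], "C": ["C"]}  # C links only to itself: a spider trap
print({p: round(v, 4) for p, v in pagerank(trap, d=1.0).items()})   # no teleportation
print({p: round(v, 4) for p, v in pagerank(trap, d=0.85).items()})  # with teleportation
```

Without teleportation, C ends up with all of the rank. With d = 0.85, A keeps 0.05 and B keeps 0.0925, since random jumps keep reaching pages outside the trap.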
PageRank vs. HITS
▪ PageRank
• Computed for all stored web pages, prior to the query
• Computes authorities only
• Fast to compute
• No need for additional normalization
▪ HITS
• Performed on the subset generated by each query
• Computes authorities and hubs
• Easy to compute, but real-time execution is hard
• Needs normalization
Criteria: HITS vs. PageRank
▪ Complexity analysis: HITS O(kN²); PageRank O(n)
▪ Result quality: HITS lower than PageRank; PageRank medium
▪ Relevancy: HITS higher, since it uses the hyperlinks to give good results and also considers the content of the page; PageRank lower, since it ranks the pages at indexing time
▪ Neighborhood: HITS is applied to the local neighborhood of pages surrounding the results of a query; PageRank is applied to the entire Web
Grover, Nidhi, and Ritika Wason. "Comparative analysis of PageRank and HITS algorithms." International Journal of Engineering Research and Technology, Vol. 1, No. 8 (October 2012). ESRSA Publications, 2012.
▪ Keyword-Stuffing: overloading the website with relevant keywords.
▪ Text-Hiding: placing relevant content on the website that only search engines can see.
▪ Doorway-Page: a page that is heavily optimized for certain keywords and whose only purpose is to redirect visitors to the real website.
▪ Link-farms: websites that are optimized for certain keywords and contain nothing but a huge number of links to other websites.
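Keyword stuffing leaves a statistical footprint. As a naive illustration (not a method from the slides; real spam detection combines many signals), this sketch reports the share of all words taken by the most frequent terms. The tokenization regex and the function name are assumptions:

```python
import re
from collections import Counter


def keyword_density(text: str, top: int = 3) -> list[tuple[str, float]]:
    """Return the `top` most frequent words with their share of all words."""
    words = re.findall(r"[a-z0-9']+", text.lower())  # crude tokenizer (assumption)
    if not words:
        return []
    counts = Counter(words)
    return [(word, count / len(words)) for word, count in counts.most_common(top)]


stuffed = "Cheap flights cheap flights cheap hotels cheap"
for word, share in keyword_density(stuffed, top=2):
    print(f"{word}: {share:.0%}")
```

Here "cheap" makes up 4 of the 7 words (57%), a density far above what ordinary prose would show for a single term.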
▪ Flash: rarely processed by search engines.
▪ Java Applets: normally not processed.
▪ Videos and Images: not directly processable by search engines.
▪ Other Rich-Media Formats (e.g. Silverlight): typically not processed by search engines.


Ranking Algorithms and Search Engine Optimization Techniques
