Web Search Engines Chapter 27, Part C Based on Larson and Hearst’s slides at UC-Berkeley http://www.sims.berkeley.edu/courses/is202/f00/
Search Engine Characteristics
Web Search Queries
Directories vs. Search Engines
What about Ranking?
Relevance: Going Beyond IR
Web Search Architecture
Standard Web Search Engine Architecture: crawl the web; check for duplicates and store the documents; create an inverted index; search engine servers take the user query, look up DocIds in the inverted index, and show results to the user.
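The pipeline on this slide can be sketched end to end in a few lines. This is a toy illustration under stated assumptions, not any engine's actual design: the `crawled` dictionary stands in for a real crawl, exact-duplicate detection by SHA-256 content hash is an assumption (the slide only says "check for duplicates"), and the function names `build_engine` and `search` are hypothetical.

```python
import hashlib
from collections import defaultdict

def build_engine(crawled):
    """crawled: dict mapping URL -> page text (stand-in for a real crawl)."""
    seen_hashes = set()
    store = {}                      # DocId -> (url, text)
    index = defaultdict(set)        # term -> set of DocIds
    for url, text in crawled.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:   # exact duplicate of a stored page: skip it
            continue
        seen_hashes.add(digest)
        doc_id = len(store) + 1
        store[doc_id] = (url, text)
        for term in text.lower().split():
            index[term].add(doc_id)
    return store, index

def search(query, store, index):
    """Return URLs of the documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    doc_ids = set.intersection(*(index.get(t, set()) for t in terms))
    return [store[d][0] for d in sorted(doc_ids)]

crawled = {
    "http://a.example/": "web search engines",
    "http://b.example/": "web search engines",   # duplicate content
    "http://c.example/": "web crawlers",
}
store, index = build_engine(crawled)
print(search("web", store, index))
```

The query path only touches the index and the DocId-to-document store, which is why the slide draws the search engine servers separately from the crawler.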
Inverted Indexes the IR Way
How Inverted Files Are Created
Doc 1: Now is the time for all good men to come to the aid of their country
Doc 2: It was a dark and stormy night in the country manor. The time was past midnight
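A minimal sketch of building an inverted file from the two sample documents. The bullet text of these slides did not survive extraction, so the steps used here (tokenize, sort the term/document pairs, merge repeats into frequencies, group into postings lists) are the commonly taught ones and are an assumption about what the slides showed. The tokenizer is deliberately simple.

```python
from collections import Counter

docs = {
    1: "Now is the time for all good men to come to the aid of their country",
    2: "It was a dark and stormy night in the country manor. "
       "The time was past midnight",
}

def tokenize(text):
    # Lowercase and strip surrounding periods and commas.
    return [w.strip(".,").lower() for w in text.split() if w.strip(".,")]

def build_inverted_file(docs):
    # Steps 1 and 2: emit (term, doc_id) pairs and sort them by term, then doc.
    pairs = sorted((term, doc_id)
                   for doc_id, text in docs.items()
                   for term in tokenize(text))
    # Step 3: merge repeated pairs into a within-document frequency.
    freqs = Counter(pairs)
    # Step 4: group into term -> postings list of (doc_id, frequency).
    inverted = {}
    for (term, doc_id), tf in sorted(freqs.items()):
        inverted.setdefault(term, []).append((doc_id, tf))
    return inverted

inverted = build_inverted_file(docs)
print(inverted["country"])  # [(1, 1), (2, 1)]
print(inverted["the"])      # [(1, 2), (2, 2)]
```

Looking up a term now costs one dictionary access plus a scan of its postings list, instead of a scan over every document.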
Inverted indexes
Inverted Indexes for Web Search Engines
From description of the FAST search engine, by Knut Risvik: http://www.infonortics.com/searchengines/sh00/risvik_files/frame.htm
In this example, the data for the pages is partitioned across machines. Additionally, each partition is allocated multiple machines to handle the queries. Each row can handle 120 queries per second. Each column can handle 7M pages. To handle more queries, add another row.
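The row and column arithmetic can be made concrete with a small calculation. The two capacity constants come from the slide; the workload in the example (70M pages, 360 queries per second) is hypothetical, as is the function name `grid_size`.

```python
import math

PAGES_PER_COLUMN = 7_000_000   # from the slide: each column handles 7M pages
QPS_PER_ROW = 120              # from the slide: each row handles 120 queries/s

def grid_size(total_pages, peak_qps):
    """Return (rows, columns, machines) for a row-by-column machine grid."""
    columns = math.ceil(total_pages / PAGES_PER_COLUMN)  # index partitions
    rows = math.ceil(peak_qps / QPS_PER_ROW)             # replicas of each partition
    return rows, columns, rows * columns

# Hypothetical workload: 70M pages at a peak of 360 queries per second.
print(grid_size(70_000_000, 360))  # (3, 10, 30)
```

Growing the index adds columns, while growing the query load adds rows, so the two dimensions scale independently.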
Cascading Allocation of CPUs
Web Crawling
Web Crawlers
Web Crawling Algorithm
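The bullets of this slide did not survive extraction, so the following is a generic frontier-based (breadth-first) crawl, offered as an assumption rather than a reconstruction of the slide. To stay self-contained it walks a hypothetical in-memory link graph, `WEB`, instead of fetching pages over HTTP; the `fetch_links` callback marks where a real crawler would download and parse a page.

```python
from collections import deque

# Hypothetical link graph standing in for the web: URL -> outgoing links.
WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}

def crawl(seeds, fetch_links, max_pages=1000):
    """Breadth-first crawl from the seed URLs; returns URLs in visit order."""
    frontier = deque(seeds)      # URLs waiting to be fetched
    seen = set(seeds)            # URLs already queued, to avoid revisits
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)      # a real crawler would fetch and index here
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

print(crawl(["a"], lambda url: WEB.get(url, [])))  # ['a', 'b', 'c', 'd']
```

The `seen` set is what keeps the cycle between `a` and `c` from looping forever; a production crawler would also need politeness delays and robots.txt handling, which this sketch omits.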
Web Crawling Issues
Google: A Case Study
Google’s Indexing
Google’s Inverted Index
Lexicon (in-memory): one entry per word, holding wordid and #docs.
Postings (“inverted barrels”, on disk): each “barrel” (Barrel i, Barrel i+1, ...) contains postings for a range of wordids.
Entries are sorted by wordid, and the postings for each word are sorted by docid. Each posting holds docid, #hits, and the hits themselves (hit, hit, hit, ...).
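The two-level layout can be mimicked with ordinary Python containers. This is only a sketch: the words, wordids, barrel boundaries, and postings are made up, the barrels are plain dictionaries rather than disk files, and each hit is reduced to a word position, which is an assumption since the slide does not say what a hit contains.

```python
from bisect import bisect_right

# Lexicon (in memory): word -> (wordid, number of docs containing the word).
lexicon = {"web": (3, 2), "search": (7, 1), "engine": (12, 2)}

# Barrels: each covers a range of wordids and maps wordid -> postings sorted
# by docid. A posting is (docid, hits); here a hit is just a word position.
BARREL_STARTS = [0, 10]          # barrel 0: wordids 0-9, barrel 1: wordids 10+
barrels = [
    {3: [(1, [4, 9]), (5, [2])], 7: [(5, [3])]},
    {12: [(1, [10]), (8, [1, 6, 7])]},
]

def postings(word):
    """Look the word up in the lexicon, then read postings from its barrel."""
    if word not in lexicon:
        return []
    wordid, _ndocs = lexicon[word]
    # Find the last barrel whose starting wordid is <= this wordid.
    barrel = barrels[bisect_right(BARREL_STARTS, wordid) - 1]
    return barrel[wordid]

for docid, hits in postings("engine"):
    print(docid, len(hits), hits)
```

Keeping only the small lexicon in memory means a query needs one in-memory lookup per word before it touches the much larger postings data.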
Google
Link Analysis for Ranking Pages
PageRank
PR(A) = (1-d) + d (PR(A1)/C(A1) + … + PR(An)/C(An))
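The formula can be evaluated by simple iteration. The sketch below reads the symbols as in the usual statement of PageRank, since the slide's own bullet text was lost: A1…An are the pages that link to A, C(Ai) is the number of links going out of Ai, and d is a damping factor. The value d = 0.85, the starting value of 1.0, the fixed iteration count, and the three-page link graph are all assumptions chosen for illustration.

```python
def pagerank(links, d=0.85, iterations=50):
    """links: dict page -> list of pages it links to. Returns page -> PR."""
    pages = list(links)
    pr = {p: 1.0 for p in pages}          # arbitrary starting values
    for _ in range(iterations):
        new_pr = {}
        for a in pages:
            # Sum PR(Ai)/C(Ai) over every page Ai that links to A.
            incoming = sum(pr[ai] / len(links[ai])
                           for ai in pages if a in links[ai])
            new_pr[a] = (1 - d) + d * incoming
        pr = new_pr
    return pr

# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items()):
    print(page, round(score, 3))
```

In this graph C ends up ranked above B because it is linked from both A and B, while B only receives half of A's rank. With this form of the formula and no pages lacking outgoing links, the scores sum to the number of pages.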
PageRank: User Model
Web Search Statistics
Searches per Day
Web Search Engine Visits
Percentage of web users who visit the site shown
Search Engine Size (July 2000)
Does size matter?  You can’t access many hits anyhow.
Increasing numbers of indexed pages, self-reported
Web Coverage
From description of the FAST search engine, by Knut Risvik http://www.infonortics.com/searchengines/sh00/risvik_files/frame.htm
Directory sizes


Editor's notes

  1. These slides cover material from Chapter 27, but go well beyond it in depth of coverage.