The heart of a search engine - Inverted Index
The inverted index is the foundation of a search engine.
Any search engine, such as Yahoo or Google, has an inverted index at its core.
Steps involved in building and using an Inverted Index
1 Build an index by crawling the web.
2 Build an inverted index.
3 Look up the inverted index for relevant webpages.
www.npntraining.com/courses/big-data-and-hadoop.php
Step 01
Build an index by crawling the web
For example, to find out which sites mention Selenium, we first have to crawl the webpages and store each one along with its contents.
www.abc.com Training provided on Selenium BigData Hadoop
www.xyz.com Training provided on Apache Spark Scala J2EE
www.def.com Training provided on Java J2EE Python Selenium
This is an index of webpages and their contents
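The result of this step can be sketched in Python. This is only an illustration: it assumes the crawl has already fetched the page text, and the names `crawled_pages` and `build_forward_index` are hypothetical, not from the slides.

```python
# Hypothetical crawl results: (url, page text) pairs mirroring the example pages.
crawled_pages = [
    ("www.abc.com", "Training provided on Selenium BigData Hadoop"),
    ("www.xyz.com", "Training provided on Apache Spark Scala J2EE"),
    ("www.def.com", "Training provided on Java J2EE Python Selenium"),
]


def build_forward_index(pages):
    """Map each URL to the list of words found on that page."""
    # Assumption: words are separated by whitespace; no stemming or lowercasing.
    return {url: text.split() for url, text in pages}


forward_index = build_forward_index(crawled_pages)
for url, words in forward_index.items():
    print(url, words)
```

A real crawler would also fetch pages over HTTP and strip markup; that part is left out here.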
Step 02
Build an inverted index
www.abc.com Training provided on Selenium BigData Hadoop
www.xyz.com Training provided on Apache Spark Scala J2EE
www.def.com Training provided on Java J2EE Python Selenium
This is an index of webpages and their contents
Training www.abc.com, www.xyz.com, www.def.com
BigData www.abc.com
Spark www.xyz.com
J2EE www.xyz.com, www.def.com
This is an index of words to the webpages they appear in (only some of the words are shown)
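One possible way to invert the page index is sketched below in Python. The `forward_index` dictionary and the `invert` function are hypothetical names chosen for this illustration, and words are matched exactly as written.

```python
from collections import defaultdict

# Index of webpages to their words, as in the example above.
forward_index = {
    "www.abc.com": ["Training", "provided", "on", "Selenium", "BigData", "Hadoop"],
    "www.xyz.com": ["Training", "provided", "on", "Apache", "Spark", "Scala", "J2EE"],
    "www.def.com": ["Training", "provided", "on", "Java", "J2EE", "Python", "Selenium"],
}


def invert(index):
    """Build a word -> list of webpages mapping from a webpage -> words mapping."""
    inverted = defaultdict(list)
    for url, words in index.items():
        # dict.fromkeys drops repeated words on a page while keeping their order,
        # so each webpage is listed at most once per word.
        for word in dict.fromkeys(words):
            inverted[word].append(url)
    return dict(inverted)


inverted_index = invert(forward_index)
print(inverted_index["J2EE"])
```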
Step 03
Given a search term, look up the inverted index for the relevant webpages
Training www.abc.com, www.xyz.com, www.def.com
BigData www.abc.com
Spark www.xyz.com
J2EE www.xyz.com, www.def.com
For example, a search for J2EE returns www.xyz.com and www.def.com
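The lookup itself can be as simple as a dictionary access. The sketch below assumes an exact, case-sensitive match on a single search term; the `lookup` function is a hypothetical helper, and real search engines also normalize and rank results.

```python
# Excerpt of the inverted index from the example above.
inverted_index = {
    "Training": ["www.abc.com", "www.xyz.com", "www.def.com"],
    "BigData": ["www.abc.com"],
    "Spark": ["www.xyz.com"],
    "J2EE": ["www.xyz.com", "www.def.com"],
}


def lookup(index, term):
    """Return the webpages that contain `term`, or an empty list if it is not indexed."""
    return index.get(term, [])


print(lookup(inverted_index, "J2EE"))
print(lookup(inverted_index, "Kafka"))  # a term that is not in the index
```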
Building an Inverted Index
www.abc.com Selenium BigData Hadoop
www.xyz.com ApacheSpark Scala J2EE Selenium
www.def.com Java J2EE Python Selenium
Map
The map step emits one (word, webpage) pair for each word on each page:
Selenium www.abc.com
Selenium www.xyz.com
Selenium www.def.com
BigData www.abc.com
Hadoop www.abc.com
ApacheSpark www.xyz.com
Scala www.xyz.com
J2EE www.xyz.com
J2EE www.def.com
Java www.def.com
Python www.def.com
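The map step can be imitated in plain Python as follows. This is a sketch of the idea only, not Hadoop API code; `map_page` and `pages` are hypothetical names.

```python
# Input: each webpage with its words, as listed above.
pages = {
    "www.abc.com": "Selenium BigData Hadoop",
    "www.xyz.com": "ApacheSpark Scala J2EE Selenium",
    "www.def.com": "Java J2EE Python Selenium",
}


def map_page(url, contents):
    """Emit one (word, url) pair for every word on the page."""
    for word in contents.split():
        yield word, url


# Run the map function over every page and collect all emitted pairs.
pairs = [pair for url, contents in pages.items() for pair in map_page(url, contents)]
for word, url in pairs:
    print(word, url)
```

The pairs come out in page order here; grouping them by word is left to the reduce step.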
Reduce
The reduce step groups the pairs by word and collects the webpages for each word:
Selenium [www.abc.com www.xyz.com www.def.com]
BigData [www.abc.com]
Hadoop [www.abc.com]
ApacheSpark [www.xyz.com]
Scala [www.xyz.com]
J2EE [www.xyz.com www.def.com]
Java [www.def.com]
Python [www.def.com]
The final output joins each list of webpages with a | separator:
Selenium www.abc.com|www.xyz.com|www.def.com
BigData www.abc.com
Hadoop www.abc.com
ApacheSpark www.xyz.com
Scala www.xyz.com
J2EE www.xyz.com|www.def.com
Java www.def.com
Python www.def.com
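The grouping and the final | separated output can be sketched in plain Python. Again this only imitates the idea and is not Hadoop API code; `reduce_pairs` and `format_line` are hypothetical names, and the sort stands in for the shuffle that brings equal words together.

```python
from itertools import groupby
from operator import itemgetter

# (word, webpage) pairs as a map step would emit them, in page order.
pairs = [
    ("Selenium", "www.abc.com"), ("BigData", "www.abc.com"), ("Hadoop", "www.abc.com"),
    ("ApacheSpark", "www.xyz.com"), ("Scala", "www.xyz.com"), ("J2EE", "www.xyz.com"),
    ("Selenium", "www.xyz.com"), ("Java", "www.def.com"), ("J2EE", "www.def.com"),
    ("Python", "www.def.com"), ("Selenium", "www.def.com"),
]


def reduce_pairs(word_url_pairs):
    """Group the pairs by word and collect the webpages for each word."""
    by_word = itemgetter(0)
    grouped = {}
    # sorted() is stable, so webpages keep their original order within each word.
    for word, group in groupby(sorted(word_url_pairs, key=by_word), key=by_word):
        grouped[word] = [url for _, url in group]
    return grouped


def format_line(word, urls):
    """Render one output line: the word, then its webpages joined with '|'."""
    return f"{word} {'|'.join(urls)}"


inverted_index = reduce_pairs(pairs)
for word, urls in inverted_index.items():
    print(format_line(word, urls))
```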

Editor's notes

  1. Rather than creating an object inside a function, you pass it to the function.