Information Retrieval 
CSE 840
Presenter 
NADIA NAHAR 
BSSE 0327 
Topic 
DYNAMIC INDEXING 
Why Dynamic Indexing?
• Collections are not static
• Documents come in over time and need to be inserted
• Documents are often deleted and modified
• So the dictionary and postings lists need to be modified:
– Postings updates for terms already in the dictionary
– New terms added to the dictionary
Simplest approach
• Maintain a “big” main index
• New docs go into a “small” auxiliary index
• Search across both and merge the results
• Keep an invalidation bit-vector for deleted docs
• Filter the docs returned by a search through this invalidation bit-vector
• Documents are updated by deleting and reinserting them
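
The points above can be sketched in a few lines of Python. This is a minimal illustration only, not the presenter's implementation: the class name `TwoTierIndex`, the `aux_limit` merge threshold, and the use of in-memory dictionaries for both indexes are all assumptions made for the example.

```python
from collections import defaultdict


class TwoTierIndex:
    """Toy main + auxiliary inverted index with an invalidation bit-vector."""

    def __init__(self, aux_limit=2):
        self.main = defaultdict(list)   # term -> sorted doc IDs ("big" index)
        self.aux = defaultdict(list)    # term -> sorted doc IDs ("small" index)
        self.aux_docs = 0               # documents currently held in aux
        self.aux_limit = aux_limit      # hypothetical merge threshold
        self.invalid = []               # invalid[doc_id] is True once deleted
        self.next_id = 0

    def add(self, text):
        doc_id = self.next_id
        self.next_id += 1
        self.invalid.append(False)
        for term in set(text.lower().split()):
            self.aux[term].append(doc_id)   # IDs only grow, so lists stay sorted
        self.aux_docs += 1
        if self.aux_docs >= self.aux_limit:
            self.merge()
        return doc_id

    def delete(self, doc_id):
        self.invalid[doc_id] = True         # postings stay; results get filtered

    def update(self, doc_id, text):
        self.delete(doc_id)
        return self.add(text)               # reinserted under a fresh doc ID

    def merge(self):
        for term, postings in self.aux.items():
            # Skip docs that were deleted while still in the auxiliary index.
            self.main[term].extend(d for d in postings if not self.invalid[d])
        self.aux.clear()
        self.aux_docs = 0

    def search(self, term):
        term = term.lower()
        hits = self.main.get(term, []) + self.aux.get(term, [])
        return [d for d in hits if not self.invalid[d]]
```

A query consults both indexes and drops any doc ID whose bit is set, so a delete never has to rewrite a postings list right away.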
[Figure: two Search Result lists feeding a Merge Results step]
Issues with main and auxiliary indexes
• Frequent merges: the same postings are read and rewritten many times
• Poor performance during a merge
• Actually:
– Merging the auxiliary index into the main index is efficient if we keep a separate file for each postings list.
– The merge is then the same as a simple append.
– But then we would need a lot of files, which is inefficient for the OS.
Logarithmic merge
• Maintain a series of indexes, each twice as large as the previous one
– At any time, some of these powers of 2 are instantiated
• Keep the smallest (Z0) in memory
• Larger ones (I0, I1, …) on disk
• If Z0 gets too big (> n), write it to disk as I0
• Or, if I0 already exists, merge Z0 with I0 to form Z1
• Either write the merged Z1 to disk as I1 (if there is no I1)
• Or merge it with I1 to form Z2, and so on
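
The cascade described above can be sketched as follows. This is an illustrative toy under stated assumptions: the names `LogMergeIndex` and `merge_postings` are hypothetical, a Python dict stands in for each on-disk index, and Z0 is flushed as soon as it holds n postings.

```python
def merge_postings(a, b):
    """Merge two term -> sorted doc-ID list indexes into a new one."""
    merged = {term: list(ids) for term, ids in a.items()}
    for term, ids in b.items():
        merged[term] = sorted(set(merged.get(term, [])) | set(ids))
    return merged


class LogMergeIndex:
    def __init__(self, n=2):
        self.n = n          # capacity of Z0 in postings (assumed threshold)
        self.z0 = {}        # in-memory index Z0
        self.z0_size = 0
        self.disk = {}      # level i -> index I_i (a dict stands in for a file)

    def add_posting(self, term, doc_id):
        self.z0.setdefault(term, []).append(doc_id)
        self.z0_size += 1
        if self.z0_size >= self.n:
            self._flush()

    def _flush(self):
        z, level = self.z0, 0
        # While I_level exists, merge it with Z_level to form Z_(level+1).
        while level in self.disk:
            z = merge_postings(self.disk.pop(level), z)
            level += 1
        self.disk[level] = z            # write Z_level to disk as I_level
        self.z0, self.z0_size = {}, 0

    def search(self, term):
        # A query has to consult Z0 and every instantiated I_i.
        hits = set(self.z0.get(term, []))
        for index in self.disk.values():
            hits.update(index.get(term, []))
        return sorted(hits)
```

The set of levels present in `disk` behaves like a binary counter: each flush adds one and carries into the next level whenever that level is already occupied.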
Logarithmic merge
• Auxiliary and main index: index construction time is O(T^2), where T is the total number of postings, as each posting is touched in each merge.
• Logarithmic merge: each posting is merged O(log T) times, so the complexity is O(T log T)
• So logarithmic merge is much more efficient for index construction
• But query processing now requires merging the results of O(log T) indexes
– Whereas it is only O(1) indexes if you just have a main and an auxiliary index
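
To make the two bounds concrete, the small simulation below counts how many postings are written under each scheme. It is a rough cost model, not a benchmark: the function names are hypothetical, a merge is assumed to cost the size of its output, and `total` is assumed to be a multiple of the buffer size `n`.

```python
def aux_merge_cost(total, n):
    """Postings written when a full auxiliary index (size n) is merged into main."""
    cost, main = 0, 0
    for _ in range(total // n):
        main += n          # main absorbs the full auxiliary index
        cost += main       # the merge rewrites every posting in main
    return cost


def log_merge_cost(total, n):
    """Postings written under logarithmic merge with an in-memory buffer of n."""
    cost, levels = 0, set()    # levels holds i for each existing I_i
    for _ in range(total // n):
        size, level = n, 0
        while level in levels:     # merge with I_level to form the next Z
            levels.remove(level)
            size *= 2
            level += 1
            cost += size           # each pairwise merge writes its whole output
        if level == 0:
            cost += size           # no merge needed: plain write of Z0 as I0
        levels.add(level)
    return cost
```

Calling both functions with the same `total` and a small `n` shows the gap growing quickly as `total` increases, in line with the O(T^2) versus O(T log T) contrast above.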
Further issues with multiple indexes
• Collection-wide statistics are hard to maintain
• E.g., spell-correction: which of several corrected alternatives do we present to the user?
– Pick the one with the most hits
• How do we keep track of the top alternatives with multiple indexes and invalidation bit-vectors?
– One possibility: ignore everything but the main index for such ordering
Dynamic indexing at search engines
• All the large search engines now do dynamic indexing
• Their indices have frequent incremental changes
– News items, blogs, new topical web pages
• But (sometimes/typically) they also periodically reconstruct the index from scratch
– Query processing is then switched to the new index, and the old index is deleted
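
The rebuild-and-switch step in the last point might look like the sketch below. It is only an illustration of the idea: `SearchService` and `build_index` are hypothetical names, the index is a plain in-memory dict, and real search engines coordinate the switch across many machines.

```python
import threading


def build_index(docs):
    """Build a fresh term -> sorted doc-ID list index from scratch."""
    index = {}
    for doc_id, text in sorted(docs.items()):
        for term in set(text.lower().split()):
            index.setdefault(term, []).append(doc_id)
    return index


class SearchService:
    def __init__(self, docs):
        self._lock = threading.Lock()
        self._index = build_index(docs)

    def search(self, term):
        with self._lock:
            index = self._index          # grab the currently active index
        return index.get(term.lower(), [])

    def rebuild(self, docs):
        new_index = build_index(docs)    # built while queries still use the old one
        with self._lock:
            self._index = new_index      # switch query processing over
        # The old index is no longer referenced and can be reclaimed.
```

Queries keep running against the old index for the whole rebuild; only the final reference swap happens under the lock.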

Contenu connexe

Tendances

Introduction to Information Retrieval
Introduction to Information RetrievalIntroduction to Information Retrieval
Introduction to Information RetrievalRoi Blanco
 
Mining Data Streams
Mining Data StreamsMining Data Streams
Mining Data StreamsSujaAldrin
 
Data cube computation
Data cube computationData cube computation
Data cube computationRashmi Sheikh
 
4.2 spatial data mining
4.2 spatial data mining4.2 spatial data mining
4.2 spatial data miningKrish_ver2
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streamshktripathy
 
Association rule mining
Association rule miningAssociation rule mining
Association rule miningAcad
 
Information retrieval introduction
Information retrieval introductionInformation retrieval introduction
Information retrieval introductionnimmyjans4
 
Message and Stream Oriented Communication
Message and Stream Oriented CommunicationMessage and Stream Oriented Communication
Message and Stream Oriented CommunicationDilum Bandara
 
Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With ...
Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With ...Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With ...
Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With ...Simplilearn
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsDatamining Tools
 
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Simplilearn
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality ReductionSaad Elbeleidy
 
Lecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic WebLecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic WebMarina Santini
 
Neural Networks: Rosenblatt's Perceptron
Neural Networks: Rosenblatt's PerceptronNeural Networks: Rosenblatt's Perceptron
Neural Networks: Rosenblatt's PerceptronMostafa G. M. Mostafa
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLRamakant Soni
 
Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine LearningKnoldus Inc.
 
Feedforward neural network
Feedforward neural networkFeedforward neural network
Feedforward neural networkSopheaktra YONG
 
Activation functions
Activation functionsActivation functions
Activation functionsPRATEEK SAHU
 
Text clustering
Text clusteringText clustering
Text clusteringKU Leuven
 

Tendances (20)

Introduction to Information Retrieval
Introduction to Information RetrievalIntroduction to Information Retrieval
Introduction to Information Retrieval
 
Mining Data Streams
Mining Data StreamsMining Data Streams
Mining Data Streams
 
Data cube computation
Data cube computationData cube computation
Data cube computation
 
4.2 spatial data mining
4.2 spatial data mining4.2 spatial data mining
4.2 spatial data mining
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streams
 
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
 
Information retrieval introduction
Information retrieval introductionInformation retrieval introduction
Information retrieval introduction
 
Message and Stream Oriented Communication
Message and Stream Oriented CommunicationMessage and Stream Oriented Communication
Message and Stream Oriented Communication
 
Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With ...
Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With ...Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With ...
Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With ...
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlations
 
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Lecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic WebLecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic Web
 
Neural Networks: Rosenblatt's Perceptron
Neural Networks: Rosenblatt's PerceptronNeural Networks: Rosenblatt's Perceptron
Neural Networks: Rosenblatt's Perceptron
 
OLAP technology
OLAP technologyOLAP technology
OLAP technology
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine Learning
 
Feedforward neural network
Feedforward neural networkFeedforward neural network
Feedforward neural network
 
Activation functions
Activation functionsActivation functions
Activation functions
 
Text clustering
Text clusteringText clustering
Text clustering
 

En vedette

Diving deep into sentiment: Understanding fine-tuned CNNs for visual sentimen...
Diving deep into sentiment: Understanding fine-tuned CNNs for visual sentimen...Diving deep into sentiment: Understanding fine-tuned CNNs for visual sentimen...
Diving deep into sentiment: Understanding fine-tuned CNNs for visual sentimen...Universitat Politècnica de Catalunya
 
Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)
Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)
Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)Karel Minarik
 
Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Applying Machine Learning to Network Security Monitoring - BayThreat 2013Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Applying Machine Learning to Network Security Monitoring - BayThreat 2013Alex Pinto
 
The Unreasonable Benefits of Deep Learning
The Unreasonable Benefits of Deep LearningThe Unreasonable Benefits of Deep Learning
The Unreasonable Benefits of Deep Learningindico data
 
12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMSkoolkampus
 

En vedette (6)

Diving deep into sentiment: Understanding fine-tuned CNNs for visual sentimen...
Diving deep into sentiment: Understanding fine-tuned CNNs for visual sentimen...Diving deep into sentiment: Understanding fine-tuned CNNs for visual sentimen...
Diving deep into sentiment: Understanding fine-tuned CNNs for visual sentimen...
 
Html5
Html5Html5
Html5
 
Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)
Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)
Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)
 
Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Applying Machine Learning to Network Security Monitoring - BayThreat 2013Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Applying Machine Learning to Network Security Monitoring - BayThreat 2013
 
The Unreasonable Benefits of Deep Learning
The Unreasonable Benefits of Deep LearningThe Unreasonable Benefits of Deep Learning
The Unreasonable Benefits of Deep Learning
 
12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS
 

Similaire à Information retrieval dynamic indexing

Inb343 week2 sql server intro
Inb343 week2 sql server introInb343 week2 sql server intro
Inb343 week2 sql server introFredlive503
 
Hekaton introduction for .Net developers
Hekaton introduction for .Net developersHekaton introduction for .Net developers
Hekaton introduction for .Net developersShy Engelberg
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionDmitry Anoshin
 
Large Data Volume Salesforce experiences
Large Data Volume Salesforce experiencesLarge Data Volume Salesforce experiences
Large Data Volume Salesforce experiencesCidar Mendizabal
 
Data Bases - Introduction to data science
Data Bases - Introduction to data scienceData Bases - Introduction to data science
Data Bases - Introduction to data scienceFrank Kienle
 
Not Your Father's Database: How to Use Apache Spark Properly in Your Big Data...
Not Your Father's Database: How to Use Apache Spark Properly in Your Big Data...Not Your Father's Database: How to Use Apache Spark Properly in Your Big Data...
Not Your Father's Database: How to Use Apache Spark Properly in Your Big Data...Databricks
 
Not Your Father's Database by Vida Ha
Not Your Father's Database by Vida HaNot Your Father's Database by Vida Ha
Not Your Father's Database by Vida HaSpark Summit
 
Andrzej bialecki lr-2013-dublin
Andrzej bialecki lr-2013-dublinAndrzej bialecki lr-2013-dublin
Andrzej bialecki lr-2013-dublinlucenerevolution
 
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp0220140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02Francisco Gonçalves
 
Bender kuszmaul tutorial-xldb12
Bender kuszmaul tutorial-xldb12Bender kuszmaul tutorial-xldb12
Bender kuszmaul tutorial-xldb12Atner Yegorov
 
Data Structures and Algorithms for Big Databases
Data Structures and Algorithms for Big DatabasesData Structures and Algorithms for Big Databases
Data Structures and Algorithms for Big Databasesomnidba
 
BigQuery at AppsFlyer - past, present and future
BigQuery at AppsFlyer - past, present and futureBigQuery at AppsFlyer - past, present and future
BigQuery at AppsFlyer - past, present and futureNir Rubinstein
 
An In-Depth Look at Pinpointing and Addressing Sources of Performance Problem...
An In-Depth Look at Pinpointing and Addressing Sources of Performance Problem...An In-Depth Look at Pinpointing and Addressing Sources of Performance Problem...
An In-Depth Look at Pinpointing and Addressing Sources of Performance Problem...BI Brainz
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive QueriesOwen O'Malley
 
Pivot table essential learning 1
Pivot table   essential learning 1Pivot table   essential learning 1
Pivot table essential learning 1Vijay Perepa
 
Asper database presentation - Data Modeling Topics
Asper database presentation - Data Modeling TopicsAsper database presentation - Data Modeling Topics
Asper database presentation - Data Modeling TopicsTerry Bunio
 
Government and Education Webinar: SQL Server—Indexing for Performance
Government and Education Webinar: SQL Server—Indexing for PerformanceGovernment and Education Webinar: SQL Server—Indexing for Performance
Government and Education Webinar: SQL Server—Indexing for PerformanceSolarWinds
 
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...Lucidworks
 

Similaire à Information retrieval dynamic indexing (20)

Inb343 week2 sql server intro
Inb343 week2 sql server introInb343 week2 sql server intro
Inb343 week2 sql server intro
 
Hekaton introduction for .Net developers
Hekaton introduction for .Net developersHekaton introduction for .Net developers
Hekaton introduction for .Net developers
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
 
A12 vercelletto indexing_techniques
A12 vercelletto indexing_techniquesA12 vercelletto indexing_techniques
A12 vercelletto indexing_techniques
 
Redshift deep dive
Redshift deep diveRedshift deep dive
Redshift deep dive
 
Large Data Volume Salesforce experiences
Large Data Volume Salesforce experiencesLarge Data Volume Salesforce experiences
Large Data Volume Salesforce experiences
 
Data Bases - Introduction to data science
Data Bases - Introduction to data scienceData Bases - Introduction to data science
Data Bases - Introduction to data science
 
Not Your Father's Database: How to Use Apache Spark Properly in Your Big Data...
Not Your Father's Database: How to Use Apache Spark Properly in Your Big Data...Not Your Father's Database: How to Use Apache Spark Properly in Your Big Data...
Not Your Father's Database: How to Use Apache Spark Properly in Your Big Data...
 
Not Your Father's Database by Vida Ha
Not Your Father's Database by Vida HaNot Your Father's Database by Vida Ha
Not Your Father's Database by Vida Ha
 
Andrzej bialecki lr-2013-dublin
Andrzej bialecki lr-2013-dublinAndrzej bialecki lr-2013-dublin
Andrzej bialecki lr-2013-dublin
 
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp0220140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
 
Bender kuszmaul tutorial-xldb12
Bender kuszmaul tutorial-xldb12Bender kuszmaul tutorial-xldb12
Bender kuszmaul tutorial-xldb12
 
Data Structures and Algorithms for Big Databases
Data Structures and Algorithms for Big DatabasesData Structures and Algorithms for Big Databases
Data Structures and Algorithms for Big Databases
 
BigQuery at AppsFlyer - past, present and future
BigQuery at AppsFlyer - past, present and futureBigQuery at AppsFlyer - past, present and future
BigQuery at AppsFlyer - past, present and future
 
An In-Depth Look at Pinpointing and Addressing Sources of Performance Problem...
An In-Depth Look at Pinpointing and Addressing Sources of Performance Problem...An In-Depth Look at Pinpointing and Addressing Sources of Performance Problem...
An In-Depth Look at Pinpointing and Addressing Sources of Performance Problem...
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Pivot table essential learning 1
Pivot table   essential learning 1Pivot table   essential learning 1
Pivot table essential learning 1
 
Asper database presentation - Data Modeling Topics
Asper database presentation - Data Modeling TopicsAsper database presentation - Data Modeling Topics
Asper database presentation - Data Modeling Topics
 
Government and Education Webinar: SQL Server—Indexing for Performance
Government and Education Webinar: SQL Server—Indexing for PerformanceGovernment and Education Webinar: SQL Server—Indexing for Performance
Government and Education Webinar: SQL Server—Indexing for Performance
 
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
 

Plus de Nadia Nahar

Internship Final Report
Internship Final Report Internship Final Report
Internship Final Report Nadia Nahar
 
Deadlock detection
Deadlock detectionDeadlock detection
Deadlock detectionNadia Nahar
 
Remote Procedure Call
Remote Procedure CallRemote Procedure Call
Remote Procedure CallNadia Nahar
 
Final project report of a game
Final project report of a gameFinal project report of a game
Final project report of a gameNadia Nahar
 
Job Training Methods and Process
Job Training Methods and ProcessJob Training Methods and Process
Job Training Methods and ProcessNadia Nahar
 
Software Design Document
Software Design DocumentSoftware Design Document
Software Design DocumentNadia Nahar
 
Final document of software project
Final document of software projectFinal document of software project
Final document of software projectNadia Nahar
 
Component based software engineering
Component based software engineeringComponent based software engineering
Component based software engineeringNadia Nahar
 
Component level design
Component level designComponent level design
Component level designNadia Nahar
 
Architectural design presentation
Architectural design presentationArchitectural design presentation
Architectural design presentationNadia Nahar
 
Privacy act, bangladesh
Privacy act, bangladeshPrivacy act, bangladesh
Privacy act, bangladeshNadia Nahar
 
Long formal report
Long formal reportLong formal report
Long formal reportNadia Nahar
 
Adjusting the accounts
Adjusting the accountsAdjusting the accounts
Adjusting the accountsNadia Nahar
 
Southwest airlines takes off with better supply chain management
Southwest airlines takes off with better supply chain managementSouthwest airlines takes off with better supply chain management
Southwest airlines takes off with better supply chain managementNadia Nahar
 

Plus de Nadia Nahar (18)

Internship Final Report
Internship Final Report Internship Final Report
Internship Final Report
 
Test plan
Test planTest plan
Test plan
 
Deadlock detection
Deadlock detectionDeadlock detection
Deadlock detection
 
Remote Procedure Call
Remote Procedure CallRemote Procedure Call
Remote Procedure Call
 
Paper review
Paper reviewPaper review
Paper review
 
Final project report of a game
Final project report of a gameFinal project report of a game
Final project report of a game
 
Job Training Methods and Process
Job Training Methods and ProcessJob Training Methods and Process
Job Training Methods and Process
 
Software Design Document
Software Design DocumentSoftware Design Document
Software Design Document
 
Final document of software project
Final document of software projectFinal document of software project
Final document of software project
 
Component based software engineering
Component based software engineeringComponent based software engineering
Component based software engineering
 
Component level design
Component level designComponent level design
Component level design
 
Architectural design presentation
Architectural design presentationArchitectural design presentation
Architectural design presentation
 
Privacy act, bangladesh
Privacy act, bangladeshPrivacy act, bangladesh
Privacy act, bangladesh
 
Paper review
Paper reviewPaper review
Paper review
 
Long formal report
Long formal reportLong formal report
Long formal report
 
Psycology
PsycologyPsycology
Psycology
 
Adjusting the accounts
Adjusting the accountsAdjusting the accounts
Adjusting the accounts
 
Southwest airlines takes off with better supply chain management
Southwest airlines takes off with better supply chain managementSouthwest airlines takes off with better supply chain management
Southwest airlines takes off with better supply chain management
 

Dernier

Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 

Dernier (20)

Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
Information retrieval dynamic indexing

  • 2. Presenter: NADIA NAHAR, BSSE 0327
  • 4. Why Dynamic Indexing?
      • Collections are not static
      • Documents come in over time and need to be inserted
      • Documents are often deleted and modified
      • So the dictionary and postings lists need to be modified:
          – Postings updates for terms already in dictionary
          – New terms added to dictionary
  • 5. Simplest approach
      • Maintain “big” main index
      • New docs go into “small” auxiliary index
      • Search across both, merge results
      • Invalidation bit-vector for deleted docs
      • Filter the docs returned by a search through this invalidation bit-vector
      • Documents are updated by deleting and reinserting them
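The main/auxiliary scheme on this slide can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: the class name `TwoTierIndex`, its method names, and the whitespace tokenizer are all assumptions made for the example.

```python
from collections import defaultdict


class TwoTierIndex:
    """Toy main + auxiliary inverted index with an invalidation bit vector.

    All names here are hypothetical; both indexes live in memory for simplicity.
    """

    def __init__(self):
        self.main = defaultdict(list)  # "big" index: term -> sorted doc IDs
        self.aux = defaultdict(list)   # "small" index: term -> sorted doc IDs
        self.deleted = []              # deleted[doc_id] is True once invalidated
        self.next_id = 0

    def add(self, text):
        """New documents go into the auxiliary index only."""
        doc_id = self.next_id
        self.next_id += 1
        self.deleted.append(False)
        for term in set(text.lower().split()):
            self.aux[term].append(doc_id)  # IDs only grow, so lists stay sorted
        return doc_id

    def delete(self, doc_id):
        """Deletion just flips a bit; postings are left in place."""
        self.deleted[doc_id] = True

    def update(self, doc_id, text):
        """An update is a delete followed by a reinsert under a new ID."""
        self.delete(doc_id)
        return self.add(text)

    def search(self, term):
        """Search both indexes, merge, then filter through the bit vector."""
        # Every auxiliary ID is newer than every main ID, so concatenation is sorted.
        hits = self.main.get(term, []) + self.aux.get(term, [])
        return [d for d in hits if not self.deleted[d]]

    def merge(self):
        """Fold the auxiliary index into the main index and empty it."""
        for term, postings in self.aux.items():
            self.main[term].extend(postings)
        self.aux.clear()
```

A real merge would normally also drop invalidated postings and reset the bit vector; that step is left out here to keep the sketch short.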
  • 6. [Figure: two “Search Result” boxes feeding into “Merge Results”]
  • 9. Issues with main and auxiliary indexes
      • Problem of frequent merges: every posting gets touched over and over
      • Poor performance during merge
      • Actually:
          – Merging the auxiliary index into the main index is efficient if we keep a separate file for each postings list.
          – The merge is then just a simple append.
          – But then we would need a lot of files, which is inefficient for the OS.
  • 10. Logarithmic merge
      • Maintain a series of indexes, each twice as large as the previous one
          – At any time, some of these powers of 2 are instantiated
      • Keep smallest (Z0) in memory
      • Larger ones (I0, I1, …) on disk
      • If Z0 gets too big (> n), write to disk as I0
      • or merge with I0 (if I0 already exists) as Z1
      • Either write the merged Z1 to disk as I1 (if no I1 exists)
      • Or merge with I1 to form Z2
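The cascade described on this slide can be sketched as follows. It is only an illustration under stated assumptions: the class `LogMergeIndex` and its methods are hypothetical names, the "disk" indexes are plain Python lists held in a dict, and Z0 is flushed as soon as it holds n postings so that level i always holds exactly n·2^i postings.

```python
import heapq


class LogMergeIndex:
    """Toy logarithmic merge; self.disk[i] stands in for on-disk index I_i."""

    def __init__(self, n):
        self.n = n      # capacity of the in-memory index Z0, in postings
        self.z0 = []    # in-memory postings as (term, doc_id) pairs
        self.disk = {}  # level i -> sorted list of postings (assumed on disk)

    def add_posting(self, term, doc_id):
        self.z0.append((term, doc_id))
        if len(self.z0) >= self.n:  # assumption: flush when Z0 reaches n postings
            self._flush()

    def _flush(self):
        z = sorted(self.z0)  # Z0, sorted so it can be merged
        self.z0 = []
        level = 0
        # While I_level exists, merge it with Z_level to form Z_{level+1}.
        while level in self.disk:
            z = list(heapq.merge(self.disk.pop(level), z))
            level += 1
        self.disk[level] = z  # first free level: write Z_level out as I_level

    def lookup(self, term):
        """A query has to consult Z0 and every instantiated I_i."""
        parts = [self.z0] + list(self.disk.values())
        return sorted(d for part in parts for t, d in part if t == term)
```

With n = 2, adding six postings leaves I0 and I1 instantiated; two more postings trigger a cascade through both levels and leave a single I2 holding all eight.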
  • 13. Logarithmic merge
      • Auxiliary and main index: index construction time is O(T²), as each posting is touched in each merge.
      • Logarithmic merge: each posting is merged O(log T) times, so complexity is O(T log T)
      • So logarithmic merge is much more efficient for index construction
      • But query processing now requires the merging of O(log T) indexes
          – Whereas it is O(1) if you just have a main and auxiliary index
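The two bounds on this slide can be derived with a short count. The derivation below assumes T postings in total and an auxiliary (or Z0) capacity of n postings; treating n as a constant gives the slide's O(T²) and O(T log T).

```latex
% Main + auxiliary index: one merge per n new postings, T/n merges in total.
% The k-th merge rewrites the main index, which by then holds about kn postings.
\[
  \sum_{k=1}^{T/n} kn
  \;=\; n \cdot \frac{(T/n)\,(T/n + 1)}{2}
  \;=\; \Theta\!\left(\frac{T^{2}}{n}\right)
\]

% Logarithmic merge: a posting only ever moves from level i to level i+1,
% and the highest level is at most \log_2(T/n), so each posting is merged
% at most \log_2(T/n) times.
\[
  T \cdot \log_{2}\!\left(\frac{T}{n}\right)
  \;=\; O\!\left(T \log \frac{T}{n}\right)
\]
```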
  • 14. Further issues with multiple indexes
      • Collection-wide statistics are hard to maintain
      • E.g., spell-correction: which of several corrected alternatives do we present to the user?
          – pick the one with the most hits
      • How do we maintain the top ones with multiple indexes and invalidation bit vectors?
          – One possibility: ignore everything but the main index for such ordering
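To see why the hit counts on this slide are awkward, compare an exact count, which has to visit every index and the invalidation set, with the main-index-only shortcut. The function names and the tiny sample data are made up for illustration.

```python
def exact_hits(term, indexes, deleted):
    """Count live postings for `term` across every index (main, auxiliary, ...)."""
    return sum(
        1
        for index in indexes
        for doc_id in index.get(term, [])
        if doc_id not in deleted
    )


def main_only_hits(term, main_index):
    """Shortcut from the slide: ignore everything but the main index."""
    return len(main_index.get(term, []))


def best_correction(candidates, count):
    """Pick the candidate spelling with the most hits under `count`."""
    return max(candidates, key=count)


# Hypothetical data: term -> doc IDs, plus a set standing in for the bit vector.
main_index = {"retrieval": [0, 1], "retrieve": [2, 3, 4]}
aux_index = {"retrieval": [5, 6, 7]}
deleted = {3}
candidates = ["retrieval", "retrieve"]

exact = best_correction(
    candidates, lambda t: exact_hits(t, [main_index, aux_index], deleted)
)
approx = best_correction(candidates, lambda t: main_only_hits(t, main_index))
```

In this sample the exact count prefers "retrieval" (5 live hits against 2), while the main-only shortcut prefers "retrieve" (3 against 2). The shortcut is cheaper but can rank candidates differently when recent additions or deletions matter.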
  • 15. Dynamic indexing at search engines
      • All the large search engines now do dynamic indexing
      • Their indices have frequent incremental changes
          – News items, blogs, new topical web pages
      • But (sometimes/typically) they also periodically reconstruct the index from scratch
          – Query processing is then switched to the new index, and the old index is deleted