SlideShare une entreprise Scribd logo
1  sur  43
Optimising Xapian Richard Boulton UKUUG, Birmingham 9 th  August 2009
Xapian is a search engine
Xapian is  a search engine an information retrieval toolkit
Index my stuff
Find things
…  quickly …  quickly
Arbitrary Boolean restrictions Correct spelling mistaks Suggest search completions Find similar things Browse facets of things Find nearby things Get diverse results Find images which look similar Different sort orders Arbitrary weight influences
Mature ,[object Object]
Not going to talk about... ,[object Object]
Am going to talk about... Two types of optimisation ,[object Object],10
Making the most of hardware ,[object Object]
Limited memory
Database on a slow disk
Requirements ,[object Object]
Analysing the problem ,[object Object]
Precalculated searches?
Can't precalculate everything...
Calculate all single-term queries
Stored data ,[object Object],Felt 1 6 8 Pens 3 6 7 9
Single term search ,[object Object]
Remember the best Felt 1 6 8 Pens 3 6 7 9
AND search ,[object Object]
Hold it in memory:
Read next list
Merge it in:
Select the best 1 6 8 1 3 6 7 8 9
AND search ,[object Object]
Problem – no way to avoid reading all of the list
Better AND search ,[object Object]
Start with the shortest
Jump forward in second list to keep up with first
Keep only the best N items
Better AND search ,[object Object]
Start with the shortest
Jump  forward in second list to keep up with first
Keep only the best N items
OR search ,[object Object]
But, unlike AND, we can't skip items
So … make it into an AND
How?
OR search ,[object Object]
Track only those
Keep track of the lowest weight of those

Contenu connexe

Similaire à Optimising Xapian

Bender kuszmaul tutorial-xldb12
Bender kuszmaul tutorial-xldb12Bender kuszmaul tutorial-xldb12
Bender kuszmaul tutorial-xldb12
Atner Yegorov
 
Data Structures and Algorithms for Big Databases
Data Structures and Algorithms for Big DatabasesData Structures and Algorithms for Big Databases
Data Structures and Algorithms for Big Databases
omnidba
 
How To Buy Data Warehouse
How To Buy Data WarehouseHow To Buy Data Warehouse
How To Buy Data Warehouse
Eric Sun
 
Management Consulting Productivity Hacks
Management Consulting Productivity HacksManagement Consulting Productivity Hacks
Management Consulting Productivity Hacks
Asen Gyczew
 

Similaire à Optimising Xapian (20)

Unit 3 chapter-1managing-files-of-records
Unit 3 chapter-1managing-files-of-recordsUnit 3 chapter-1managing-files-of-records
Unit 3 chapter-1managing-files-of-records
 
Placement oriented data structures
Placement oriented data structuresPlacement oriented data structures
Placement oriented data structures
 
Advanced Cinahl
Advanced CinahlAdvanced Cinahl
Advanced Cinahl
 
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
 
L24Searching&Sorting.ppt
L24Searching&Sorting.pptL24Searching&Sorting.ppt
L24Searching&Sorting.ppt
 
L24Searching&Sorting.ppt
L24Searching&Sorting.pptL24Searching&Sorting.ppt
L24Searching&Sorting.ppt
 
AAXGF.ppt
AAXGF.pptAAXGF.ppt
AAXGF.ppt
 
Xgboost: A Scalable Tree Boosting System - Explained
Xgboost: A Scalable Tree Boosting System - ExplainedXgboost: A Scalable Tree Boosting System - Explained
Xgboost: A Scalable Tree Boosting System - Explained
 
Scalable Data Models with Elasticsearch
Scalable Data Models with ElasticsearchScalable Data Models with Elasticsearch
Scalable Data Models with Elasticsearch
 
Bender kuszmaul tutorial-xldb12
Bender kuszmaul tutorial-xldb12Bender kuszmaul tutorial-xldb12
Bender kuszmaul tutorial-xldb12
 
Data Structures and Algorithms for Big Databases
Data Structures and Algorithms for Big DatabasesData Structures and Algorithms for Big Databases
Data Structures and Algorithms for Big Databases
 
How To Buy Data Warehouse
How To Buy Data WarehouseHow To Buy Data Warehouse
How To Buy Data Warehouse
 
How To Research
How To ResearchHow To Research
How To Research
 
data science chapter-4,5,6
data science chapter-4,5,6data science chapter-4,5,6
data science chapter-4,5,6
 
Advanced PubMed (Productivity & Efficiency): Professional & Clinical Informat...
Advanced PubMed (Productivity & Efficiency): Professional & Clinical Informat...Advanced PubMed (Productivity & Efficiency): Professional & Clinical Informat...
Advanced PubMed (Productivity & Efficiency): Professional & Clinical Informat...
 
Management Consulting Productivity Hacks
Management Consulting Productivity HacksManagement Consulting Productivity Hacks
Management Consulting Productivity Hacks
 
Hadoop for Data Science
Hadoop for Data ScienceHadoop for Data Science
Hadoop for Data Science
 
Combining General and Genre-Specific Approaches to L2 Writing Instruction
Combining General and Genre-Specific Approaches to L2 Writing InstructionCombining General and Genre-Specific Approaches to L2 Writing Instruction
Combining General and Genre-Specific Approaches to L2 Writing Instruction
 
Data science and Hadoop
Data science and HadoopData science and Hadoop
Data science and Hadoop
 

Plus de Richard Boulton (6)

Improving relevance with log information
Improving relevance with log informationImproving relevance with log information
Improving relevance with log information
 
Making a simple question into a complicated query
Making a simple question into a complicated queryMaking a simple question into a complicated query
Making a simple question into a complicated query
 
Haystack
HaystackHaystack
Haystack
 
Search as a Service with Xapian - Search Solutions 2009
Search as a Service with Xapian - Search Solutions 2009Search as a Service with Xapian - Search Solutions 2009
Search as a Service with Xapian - Search Solutions 2009
 
Comparing open source search engines
Comparing open source search enginesComparing open source search engines
Comparing open source search engines
 
The Xapian Open Source Search Engine
The Xapian Open Source Search EngineThe Xapian Open Source Search Engine
The Xapian Open Source Search Engine
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 

Optimising Xapian