NWRUG
March 2009
  Sponsored by:
Search

• Sphinx + Thinking Sphinx (Will Jessop)
• Solr (Asa Calow)
• Ferret, and maybe Xapian (John Leach)
Sphinx + Thinking Sphinx

         Will Jessop
All you need is…
>> query = "The cat sat on the mat"
=> "The cat sat on the mat"

>> where = "(email like '%#{ query.split(/\s+/).map{|term| term.downcase }.join("%') OR (email like '%") }')"
=> "(email like '%the%') OR (email like '%cat%') OR (email like '%sat%') OR (email like '%on%') OR (email like '%the%') OR (email like '%mat')"

>> execute("select * from users where #{where}")
=> fail

Congratulations, you are now a l33t PHP programmer!

Job done!
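The query builder on this slide is deliberately broken: it interpolates user input straight into SQL, leaves the last term without its trailing `%`, and searches for `the` twice. A minimal sketch of a safer shape, assuming an ActiveRecord-style conditions array (a SQL fragment with `?` placeholders followed by the bind values) and a database whose `LIKE` treats backslash as the escape character; the helper name `email_like_conditions` is hypothetical.

```ruby
# Hypothetical helper: build an ActiveRecord-style conditions array
# ("SQL with ? placeholders", bind values...) instead of interpolating input.
def email_like_conditions(query)
  # Split on whitespace, normalize case, and drop empty and repeated terms.
  terms = query.split(/\s+/).map(&:downcase).reject(&:empty?).uniq
  return ["1 = 0"] if terms.empty? # nothing to search for: match no rows

  # Escape LIKE wildcards (and the escape character itself) in user input,
  # assuming the database treats backslash as LIKE's escape character.
  escaped = terms.map { |t| t.gsub(/[\\%_]/) { |c| "\\#{c}" } }

  sql = Array.new(escaped.size, "email LIKE ?").join(" OR ")
  [sql, *escaped.map { |t| "%#{t}%" }]
end

conditions = email_like_conditions("The cat sat on the mat")
puts conditions.first
# email LIKE ? OR email LIKE ? OR email LIKE ? OR email LIKE ? OR email LIKE ?
p conditions.drop(1)
# ["%the%", "%cat%", "%sat%", "%on%", "%mat%"]
```

In a Rails 2.x app of this talk's era the array could be passed as `User.find(:all, :conditions => email_like_conditions(params[:q]))`; newer Rails accepts `User.where(*email_like_conditions(q))`. This removes the injection hole and the missing `%`, but `LIKE '%term%'` still cannot use an ordinary index and offers no stemming or ranking, which is the point of the next slide.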
Why not use the DB?

• Building up SQL queries in
  code sucks
• Full-text indexing in DBs isn’t
  great either
• DBs are hard to scale
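The speaker notes list stop-word removal and stemming as the "hard stuff" a dedicated search engine already handles. A toy illustration of the core structure behind full-text engines, an inverted index mapping each normalized term to the documents that contain it; the stop-word list and the suffix-stripping "stemmer" are crude placeholders and are not how Sphinx itself is implemented.

```ruby
require "set"

# Tiny illustrative stop-word list; real engines ship much longer, per-language lists.
STOP_WORDS = Set.new(%w[a an and in of on or the])

# Normalize text into index terms: lowercase, split on non-alphanumerics,
# drop stop words, then apply a crude suffix-stripping stand-in for a stemmer.
def terms_for(text)
  text.downcase.scan(/[a-z0-9]+/)
      .reject { |w| STOP_WORDS.include?(w) }
      .map { |w| w.sub(/(ing|es|s)\z/, "") }
      .reject(&:empty?)
end

# Inverted index: each term maps to the set of document ids containing it,
# so a lookup touches only the matching postings instead of scanning every row.
class InvertedIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = Set.new }
  end

  def add(doc_id, text)
    terms_for(text).each { |term| @postings[term] << doc_id }
  end

  # AND semantics: ids of documents that contain every query term.
  def search(query)
    terms = terms_for(query)
    return [] if terms.empty?
    terms.map { |term| @postings.fetch(term, Set.new) }.reduce(:&).to_a.sort
  end
end

index = InvertedIndex.new
index.add(1, "The cat sat on the mat")
index.add(2, "Cats chasing mice")
index.add(3, "A dog sat on a log")

index.search("cat")      # => [1, 2]
index.search("cats sat") # => [1]
index.search("the")      # => [] (stop words only)
```

Because "Cats" and "cat" normalize to the same term, the first query finds both documents, something a plain `LIKE '%cat%'` only manages by accident and at the cost of a full scan.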
Sphinx is…

• Sphinx is a full-text search engine
• Open source (GPL version 2)
• Standalone
• Proven stable
• Performs well
Sphinx


Much better than Solr and Ferret
Maybe
Installing sphinx


  sudo port install sphinx
Out of the box

•   indexer - utility which creates fulltext indexes

•   searchd - daemon which enables external software to search fulltext
    indexes




                      Amongst other things
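Both `indexer` and `searchd` read a single `sphinx.conf`: a `source` block says how to pull rows from the database, an `index` block says where `indexer` writes the index files, and a `searchd` block configures the daemon. A minimal sketch that renders such a file from Ruby, roughly the job Thinking Sphinx automates for you; the helper name, database credentials, paths, and the `users` query are hypothetical placeholders, and directive names should be checked against the documentation for your Sphinx version (older releases configure the daemon with `port` rather than `listen`).

```ruby
# Hypothetical helper: render a minimal sphinx.conf indexing one MySQL table.
def sphinx_conf(db:, user:, password:, index_dir:, log_dir:)
  <<~CONF
    source users_src
    {
      type      = mysql
      sql_host  = localhost
      sql_user  = #{user}
      sql_pass  = #{password}
      sql_db    = #{db}
      # The first column must be a unique, positive integer document id.
      sql_query = SELECT id, name, email FROM users
    }

    index users
    {
      source = users_src
      path   = #{index_dir}/users
    }

    indexer
    {
      mem_limit = 64M
    }

    searchd
    {
      listen   = 127.0.0.1:9312
      log      = #{log_dir}/searchd.log
      pid_file = #{log_dir}/searchd.pid
    }
  CONF
end

puts sphinx_conf(db: "app_development", user: "root", password: "",
                 index_dir: "/tmp/sphinx", log_dir: "/tmp/sphinx")
```

With the output saved to a file, `indexer --config <file> --all` builds the index and `searchd --config <file>` starts the daemon that applications query.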
Using with your app
         Two Ruby on Rails APIs


• Ultra Sphinx
• Thinking Sphinx
Installing Thinking Sphinx

cd rails_app && script/plugin install git://github.com/freelancing-god/thinking-sphinx.git
Questions
Resources

• http://reinh.com/blog/2008/07/14/a-thinking-mans-sphinx.html
• http://ts.freelancing-gods.com/
• http://ts.freelancing-gods.com/rdoc/
• http://www.sphinxsearch.com/

2009 03 19 Search In Your Rails App


Editor's Notes

  1. - Welcome
     - Beer and pizza sponsored by Brightbox
     - Please consider talking!
  2. - So why not just use the DB?
  3. - Building SQL in code: easy to introduce mistakes
     - in with image
     - Someone has already handled the hard stuff: stop-word removal, stemming, that sort of thing
     - DBs are traditionally the hardest element of a stack to scale; let's not put more stuff there. One of the main points.
     - Luckily there are a bunch of alternatives, next slide
  4. …and Sphinx is one
     - Standalone: runs as a separate process
     - Written in C, small memory footprint
     - Stable
     - High indexing speed (up to 10 MB/sec on modern CPUs)
     - High search speed (average query under 0.1 sec on 2-4 GB text collections)
     - High scalability (up to 100 GB of text, up to 100 M documents on a single CPU)
     - Supports distributed searching (since v.0.9.6)
  5. - Most importantly… read
     - Don’t let anyone try to convince you otherwise with shady propaganda
  6. Installing sphinx is easy
  7. - indexer: builds indexes
     - searchd: where the magic happens
  8. - Not much use to us unless we can use it with our applications; we have two choices
     - Both widely used at EY
     - Differences
  9. Demo!
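Taking the indexing figure in note 4 at face value (it is quoted as a peak, "up to 10 MB/sec"), a back-of-envelope sketch of how long a full reindex of the 2-4 GB collections mentioned there would take at best; the helper `reindex_seconds` is hypothetical and assumes 1 GB = 1024 MB.

```ruby
# Hypothetical helper: best-case full reindex time at the quoted peak rate.
def reindex_seconds(corpus_gb, mb_per_sec = 10.0)
  corpus_gb * 1024 / mb_per_sec
end

[2, 4].each do |gb|
  secs = reindex_seconds(gb)
  puts format("%d GB: ~%.0f s (~%.1f min)", gb, secs, secs / 60)
end
# 2 GB: ~205 s (~3.4 min)
# 4 GB: ~410 s (~6.8 min)
```

Real throughput depends on the database, the disks, and the CPU, so treat these as lower bounds rather than predictions.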