SPHINX AND THINKING SPHINX
10 Minute Intro
HAYES DAVIS
Founder, Appozite
cheaptweet.com | @cheaptweet
@hayesdavis
SPHINX
• Open Source full-text search engine
• Designed around SQL
• Standalone daemon (searchd)

http://guardians.net/hawass/images/sphinx3.jpg
THINKING SPHINX
• Rails plugin
• Integrates Active Record with Sphinx
• Makes talking to Sphinx basically painless
BASIC IDEA
• Configure your indexes
• Index
• Query
• Repeat
CONFIGURING INDEXES
• Add indexes on your AR classes using define_index
• Fields (indexes) contain text you can search
• Attributes (has) allow you to sort and constrain your searches
• Careful! Column names aren’t symbols (write subject, not :subject)

    class Article < ActiveRecord::Base
      define_index do
        # fields
        indexes subject, :sortable => true
        indexes content
        indexes author.name, :as => :author, :sortable => true

        # attributes
        has author_id, created_at, updated_at
      end
    end
Run the indexer
rake thinking_sphinx:index
source twitterer_core_0
{
  type = mysql
  sql_host = 127.0.0.1
  sql_user = cheaptweet
  sql_pass = cheaptweet
  sql_db = cheaptweet_development2
  sql_query_pre = UPDATE `twitterer` SET `delta` = 0
  sql_query_pre = SET NAMES utf8
  sql_query = SELECT `twitterer`.`id` * 1 + 0 AS `id`, \
    CAST(`twitterer`.`screen_name` AS CHAR) AS `screen_name`, \
    CAST(`twitterer`.`name` AS CHAR) AS `name`, \
    CAST(`twitterer`.`description` AS CHAR) AS `description`, \
    CAST(`twitterer`.`url` AS CHAR) AS `url`, \
    CAST(`twitterer`.`location` AS CHAR) AS `location`, \
    `twitterer`.`id` AS `sphinx_internal_id`, \
    283224142 AS `class_crc`, \
    '283224142' AS `subclass_crcs`, \
    0 AS `sphinx_deleted` \
    FROM twitterer \
    WHERE `twitterer`.`id` >= $start AND `twitterer`.`id` <= $end \
      AND `twitterer`.`delta` = 0 \
    GROUP BY `twitterer`.`id` ORDER BY NULL
  sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1) FROM `twitterer` WHERE `twitterer`.`delta` = 0
  sql_attr_uint = sphinx_internal_id
  sql_attr_uint = class_crc
  sql_attr_uint = sphinx_deleted
  sql_attr_multi = uint subclass_crcs from field
  sql_query_info = SELECT * FROM `twitterer` WHERE `id` = (($id - 0) / 1)
}

index twitterer_core
{
  source = twitterer_core_0
  path = /Users/hayesdavis/Appozite/workspace/CheapTweet/data/sphinx/development/twitterer_core
  morphology = stem_en
  charset_type = utf-8
}




MORE ABOUT INDEXING
Thinking Sphinx generates a config file for Sphinx, like the one above, in which the indexes (aka “sources”) are defined. It’s a little complicated.
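
To make the "generated config" idea concrete, here is a minimal sketch of how a source block could be rendered from plain settings. It is not Thinking Sphinx's actual generator: render_source, the article_core_0 name and the blog_development database are all hypothetical. The one behaviour borrowed from the config above is that a setting with several values is written as a repeated key (as with sql_query_pre and sql_attr_uint).

```ruby
# Hypothetical helper, not part of Thinking Sphinx: renders one Sphinx
# "source" block from a Hash. Array values become repeated keys, the way
# sql_query_pre and sql_attr_uint repeat in the generated config.
def render_source(name, settings)
  lines = settings.flat_map do |key, value|
    Array(value).map { |v| "  #{key} = #{v}" }
  end
  (["source #{name}", "{"] + lines + ["}"]).join("\n")
end

# Illustrative settings only; names and values are assumptions.
config = render_source("article_core_0", {
  "type"          => "mysql",
  "sql_host"      => "127.0.0.1",
  "sql_db"        => "blog_development",
  "sql_query_pre" => ["UPDATE `articles` SET `delta` = 0", "SET NAMES utf8"],
  "sql_attr_uint" => ["sphinx_internal_id", "class_crc", "sphinx_deleted"]
})

puts config
```

The point is only that the file is derived output: the model definitions are the source of truth, and the config is regenerated from them.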
Start Sphinx
rake thinking_sphinx:start
# Searches all fields for "pants"
Article.search "pants"

# Conditions are allowed on fields but must be a hash
Article.search "pants", :conditions => {
  :subject => "How To Wear"
}

# Query attributes using :with
Article.search "pants", :with => {
  :author_id => 1, :created_at => 1.week.ago..Time.now
}

SEARCHING
Use the search method on AR classes
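
The three calls differ only in what they match against. A toy in-memory model may help show the difference. This is not Thinking Sphinx or Sphinx itself: Doc, DOCS and toy_search are hypothetical, and the substring matching is a stand-in for real full-text matching.

```ruby
# Toy model only: mimics how the query, :conditions and :with behave.
# Real Sphinx matches words (with stemming), not raw substrings.
Doc = Struct.new(:fields, :attributes)

DOCS = [
  Doc.new({ subject: "How To Wear Pants", content: "Start with one leg." },
          { author_id: 1, created_at: 3 }),
  Doc.new({ subject: "Pants Through History", content: "A long story." },
          { author_id: 2, created_at: 10 }),
  Doc.new({ subject: "Hats", content: "No pants here." },
          { author_id: 1, created_at: 20 })
]

def matches_text?(text, query)
  text.downcase.include?(query.downcase)
end

def toy_search(query, conditions: {}, with: {})
  DOCS.select do |doc|
    # The bare query may match in any field.
    doc.fields.values.any? { |text| matches_text?(text, query) } &&
      # :conditions restrict the text of one named field.
      conditions.all? { |field, q| matches_text?(doc.fields[field], q) } &&
      # :with filters attributes by exact value or by range.
      with.all? { |attr, wanted| wanted === doc.attributes[attr] }
  end
end

puts toy_search("pants").size                                           # 3
puts toy_search("pants", conditions: { subject: "How To Wear" }).size   # 1
puts toy_search("pants", with: { author_id: 1, created_at: 0..5 }).size # 1
```

Fields carry text and are searched; attributes carry values and are only filtered or sorted on, which is why a range works in :with but not in :conditions.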
BUT WAIT
HOW DO I KEEP INDEXES (ESPECIALLY BIG ONES) UP TO DATE?
DELTA INDEXES TO THE RESCUE
• Mini index of only the rows that have been updated
• Must be merged into the “core” index periodically or it’ll get slow
• Simplest approach: add a boolean delta column to the model
• Add set_property :delta => true to the define_index block
• The delta index is rebuilt on every model save, which can cause a performance hit
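
The core-plus-delta idea can be sketched with two hashes. ToyIndex below is a hypothetical illustration, not how Thinking Sphinx or Sphinx store anything: it only shows that saves touch the small delta, that queries consult both, and that a periodic merge empties the delta again.

```ruby
# Toy model of a core index plus a delta index (hypothetical class).
# Keys are row ids, values are the text indexed for that row.
class ToyIndex
  attr_reader :core, :delta

  def initialize(rows)
    @core  = rows.dup   # built by a full (slow) indexing run
    @delta = {}         # rebuilt cheaply whenever a row is saved
  end

  # Saving a row only touches the small delta index.
  def save(id, text)
    @delta[id] = text
  end

  # Delta entries override stale core entries at query time.
  def search(word)
    @core.merge(@delta).select { |_id, text| text.include?(word) }.keys.sort
  end

  # Periodic merge: fold the delta into core and start a fresh delta.
  def merge!
    @core  = @core.merge(@delta)
    @delta = {}
  end
end

index = ToyIndex.new({ 1 => "red pants", 2 => "blue hat" })
index.save(2, "blue pants")   # an update lands in the delta only
index.save(3, "green pants")  # so does a new row
p index.search("pants")       # => [1, 2, 3]
p index.delta.size            # => 2
index.merge!
p index.delta.size            # => 0
```

The longer the merge is postponed, the more rows pile up in the delta, which is the "or it’ll get slow" part of the slide.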
DEPLOYMENT & PRODUCTION
• Must schedule full re-indexing periodically
• Have god or monit keep an eye on things
• Consider adding some Capistrano (cap) tasks to help out with reindexing and restarting
TIPS, TRICKS, GOTCHAS
• The simplest delta indexing can lead to performance issues
• The indexer assumes you have sequential ids on your DB rows and iterates through them in chunks, which is very bad if you have big gaps
• Run full indexing as often as you can without hurting performance; it’s usually pretty fast
• You can hand-edit the config files if you need to tune them, but be careful not to regenerate them afterwards (regenerating overwrites your edits)
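
The cost of id gaps is easy to see with a little arithmetic. The sketch below assumes the indexer walks from the smallest to the largest id in fixed-size steps (the $start and $end placeholders in the generated sql_query, with the step size set by Sphinx's sql_range_step). range_chunks and the step of 100 are hypothetical, chosen only for illustration.

```ruby
# Sketch: how many id ranges a range-based indexer would visit, and how
# many of those contain no rows at all. Step size is an assumption.
def range_chunks(ids, step)
  min, max = ids.minmax
  total    = (max - min) / step + 1                        # ranges visited
  occupied = ids.map { |id| (id - min) / step }.uniq.size  # ranges with rows
  { total: total, empty: total - occupied }
end

dense  = (1..1_000).to_a                                 # 1,000 rows, no gaps
sparse = (1..500).to_a + (1_000_001..1_000_500).to_a     # 1,000 rows, one big gap

p range_chunks(dense, 100)
p range_chunks(sparse, 100)
```

Both tables hold 1,000 rows, but the sparse one makes the indexer issue thousands of range queries that return nothing, one per empty chunk.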
RESOURCES
Sphinx: http://www.sphinxsearch.com/
Thinking Sphinx: http://freelancing-god.github.com/ts/en/
Railscast: http://railscasts.com/episodes/120-thinking-sphinx
