Indexing Stuff && Things with Sphinx and Perl
Houston Perl Mongers
May 8th, 2014
Hosted by cPanel, Inc.
Brett Estrade <estrabd@gmail.com>
Sphinx
● full-text search indexer and daemon
● indexer - builds indexes
● searchd - services search requests
● very easy to install and configure
Sphinx Data Sources
● Directly from MySQL (MariaDB), PostgreSQL
○ Indexing data from arbitrary SQL queries
○ Excellent for fast reads of data behind expensive JOINs
● XMLPipe2
○ Generic intermediate XML format understood by Sphinx
Search Interface
● Native protocol (e.g., via Sphinx::Search)
● Also speaks the MySQL protocol (4.1)
○ The supported SQL subset is called SphinxQL
[Sample config screenshot; callouts: indexer data, named index for searchd, searchd config]
Client Example - Sphinx::Search
[Screenshot; callout: the search term, where an empty string returns “all”]
Search Results
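The client and results slides were screenshots. A minimal sketch of a native-protocol client using Sphinx::Search, assuming searchd listens on localhost:9312 and serves a hypothetical index named auctions with a hypothetical string attribute title. The search sub loads Sphinx::Search only when it is called, so the result-formatting helper can be tried without a running searchd.

```perl
use strict;
use warnings;

# Run a full-text query against searchd over the native protocol.
# Assumptions (hypothetical): searchd on localhost:9312, index "auctions".
sub search_auctions {
    my ($term, %opt) = @_;
    require Sphinx::Search;    # CPAN module; loaded only when a search runs
    my $sph = Sphinx::Search->new;
    $sph->SetServer($opt{host} // 'localhost', $opt{port} // 9312);
    $sph->SetLimits(0, $opt{limit} // 20);
    # An empty $term matches every document in the index ("returns all").
    my $res = $sph->Query($term, $opt{index} // 'auctions')
        or die 'Sphinx query failed: ' . $sph->GetLastError . "\n";
    return $res;
}

# Turn a Sphinx::Search result hashref into printable lines.
# Each match carries "doc" (the document id), "weight", and its attributes.
sub format_matches {
    my ($res) = @_;
    my @matches = @{ $res->{matches} || [] };
    my @lines   = (sprintf 'found %d (showing %d)', $res->{total_found}, scalar @matches);
    for my $m (@matches) {
        my $line = "doc=$m->{doc} weight=$m->{weight}";
        $line .= " $m->{title}" if defined $m->{title};    # hypothetical attribute
        push @lines, $line;
    }
    return @lines;
}

# Usage (needs a running searchd):
#   print "$_\n" for format_matches(search_auctions(''));
```

Keeping the network call and the formatting separate means only search_auctions depends on a live server.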
Some Common Use Cases
● Rebuild index from database regularly
● Incrementally add to existing index
● Query Sphinx for DB primary keys, make DB call for related rows
● Query Sphinx for wanted data (no DB at all) == my use case
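For the "primary keys, then DB call" pattern, Sphinx returns document ids that double as the table's primary keys, and the related rows are fetched from the database afterwards. A minimal sketch, assuming hypothetical table and column names (auctions, id, title, price) and an existing DBI handle $dbh; the SQL uses placeholders so the ids are bound rather than interpolated.

```perl
use strict;
use warnings;

# Collect document ids (= primary keys) from a Sphinx::Search result, in rank order.
sub ids_from_result {
    my ($res) = @_;
    return map { $_->{doc} } @{ $res->{matches} || [] };
}

# Build a placeholder query for the related rows; hypothetical table/columns.
# Returns nothing for an empty id list, since "IN ()" is not valid SQL.
sub rows_query {
    my ($table, @ids) = @_;
    return unless @ids;
    my $placeholders = join ', ', ('?') x @ids;
    return "SELECT id, title, price FROM $table WHERE id IN ($placeholders)";
}

# Usage with DBI:
#   my @ids = ids_from_result($res);
#   if (my $sql = rows_query('auctions', @ids)) {
#       my $rows = $dbh->selectall_arrayref($sql, { Slice => {} }, @ids);
#   }
```

The database does not return rows in IN-list order, so re-sort them by each id's position in @ids if Sphinx's relevance order matters.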
Real Life Examples
1. Indexing MariaDB
2. Filtering on string using CRC32
3. Creating sources w/Sphinx::XML::Pipe2
4. Dynamic config w/Sphinx::Config::Builder
Indexing MariaDB: ~2.25 Million Rows
● Use case - saving eBay auction data in a DB
● Providing a search interface to it
● Demo run of indexer
How to Filter on Strings
● Requires CRC32 hashing (strings to ints)
● When indexing, use MySQL’s CRC32 function
● Use Perl’s String::CRC32 to encode the string,
○ then set the filter on the resulting integer
And inside the client, Perl’s String::CRC32 encodes the string to the same integer MySQL’s CRC32 produced at index time
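A minimal sketch of the client side of the string filter. It swaps in the core module Compress::Zlib, whose crc32() computes the same standard CRC-32 as String::CRC32 and MySQL's CRC32(), so it runs without a CPAN install. It assumes the SQL source selects something like CRC32(seller) AS seller_crc and declares it with sql_attr_uint; seller and seller_crc are hypothetical names. Both sides must hash identical bytes, so letter case and character encoding have to match (encode text to bytes before hashing).

```perl
use strict;
use warnings;
use Compress::Zlib ();    # core module; crc32() is the standard CRC-32

# Hash a string to the unsigned 32-bit integer MySQL's CRC32() yields for the
# same bytes, so it can be used as a Sphinx integer filter value.
sub string_filter_value {
    my ($string) = @_;
    return Compress::Zlib::crc32($string);
}

# Usage with Sphinx::Search (needs searchd; "seller_crc" is hypothetical):
#   $sph->SetFilter('seller_crc', [ string_filter_value('some_seller') ]);
```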
Transforming Things to XMLPipe2
● XMLPipe2 is Sphinx’s generic data format
● Extract/Transform scripts -> XMLPipe2
● use Sphinx::XML::Pipe2; #’nuff said
Sample XMLPipe2 File
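The sample file was a screenshot. A minimal sketch of the XMLPipe2 layout, generated with core Perl only: this is not the Sphinx::XML::Pipe2 API, just the document format itself (an embedded sphinx:schema followed by sphinx:document elements). The schema (a full-text field title and an int attribute price) and the values are hypothetical.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Build an XMLPipe2 docset: an embedded schema, then one <sphinx:document>
# per hashref. Each doc needs a unique positive integer "id".
sub xmlpipe2 {
    my ($schema, @docs) = @_;
    my @attr_names = map { $_->[0] } @{ $schema->{attrs} };
    my @out = ('<?xml version="1.0" encoding="utf-8"?>', '<sphinx:docset>', '<sphinx:schema>');
    push @out, qq{<sphinx:field name="$_"/>} for @{ $schema->{fields} };
    push @out, qq{<sphinx:attr name="$_->[0]" type="$_->[1]"/>} for @{ $schema->{attrs} };
    push @out, '</sphinx:schema>';
    for my $doc (@docs) {
        die "document id must be a positive integer\n" unless $doc->{id} =~ /\A[1-9][0-9]*\z/;
        push @out, qq{<sphinx:document id="$doc->{id}">};
        for my $name (@{ $schema->{fields} }, @attr_names) {
            next unless defined $doc->{$name};
            push @out, "<$name>" . xml_escape($doc->{$name}) . "</$name>";
        }
        push @out, '</sphinx:document>';
    }
    push @out, '</sphinx:docset>';
    return join("\n", @out) . "\n";
}

# Usage: write the output to the file a source's xmlpipe_command reads.
#   print xmlpipe2({ fields => ['title'], attrs => [ ['price', 'int'] ] },
#                  { id => 1, title => 'Vintage lamp', price => 20 });
```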
Sample XMLPipe2 Source Conf Entry
Example XMLPipe2 Use Case
● Monitor ephemera, e.g., active eBay listings
● Don’t want to use a database
● Many data partitions (i.e., indexes)
○ e.g., by store, by category, etc.
○ > 250 (yikes!)
● Data partitions change over time (slowly)
Dynamic Indexing of XMLPipe2 Stuff
● Fact - Sphinx partitions data by indexes
● Problem - each index uses its own data file
○ data as XMLPipe2
● Challenge - how to manage a changing set of indexes?
Sphinx’s --config to the Rescue!
● Config files are typically static, right?
● Sphinx can run an executable via --config (its output becomes the config)
● indexer --config ./generate-config.pl --all
Sphinx::Config::Builder
● Module I created specifically for this case
○ uploaded to CPAN
● Why? No existing Sphinx config builders were a fit
● Module is low level and does what I need
○ i.e., dynamically builds an XMLPipe2-specific config
● A+ 100 Passing
○ http://cpantesters.org/distro/S/Sphinx-Config-Builder.html
Solution
● Expects XMLPipe2 data files to already exist
● Iterate over array of indexes to build
● Creates “source” entries for XMLPipe2 data
● Creates “index” entries for each “source”
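A minimal sketch of a generate-config.pl along the lines of the Solution slide, written in core Perl rather than with Sphinx::Config::Builder (whose API is not reproduced here). For each partition name it emits an xmlpipe2 source that cats the pre-built XML file and an index that reads from that source. The directory layout and partition names are hypothetical, and a complete config would also need indexer and searchd sections. Sphinx treats a config file whose first line is a #! shebang as a script and reads its output, so the file must be executable.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Per-partition template: %1$s = index name, %2$s = XML file, %3$s = index path.
my $template = <<'CONF';
source %1$s
{
    type            = xmlpipe2
    xmlpipe_command = cat %2$s
}

index %1$s
{
    source = %1$s
    path   = %3$s
}
CONF

# Build "source" + "index" sections for every partition name.
# The XMLPipe2 files are expected to exist already (hypothetical layout:
# <xml_dir>/<name>.xml); index files go under <data_dir>/<name>.
sub build_config {
    my (%opt) = @_;
    my @sections;
    for my $name (@{ $opt{indexes} }) {
        die "bad index name: $name\n" unless $name =~ /\A\w+\z/;
        push @sections, sprintf $template, $name, "$opt{xml_dir}/$name.xml", "$opt{data_dir}/$name";
    }
    return join "\n", @sections;
}

# Run as a config script, e.g. indexer --config ./generate-config.pl --all.
# The partition list is a hypothetical example; in practice it would come
# from wherever the partitions are tracked.
unless (caller) {
    print build_config(
        indexes  => [qw(store_1 store_2 category_42)],
        xml_dir  => '/var/data/xmlpipe2',
        data_dir => '/var/data/sphinx',
    );
}
```

Because the partition list is just an array, adding or dropping a store or category only changes the input to build_config; the config itself is regenerated on every indexer run.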
Demo
Tip of the Iceberg
● Sphinx has TONS of options and modes
● Tons of application areas
● Many clients, simple interface
● Super easy to install and maintain
Thank You!
● http://sphinxsearch.com/
● cpan://Sphinx::Search
● cpan://Sphinx::Config::Builder
● http://houston.pm.org
