Full Text Search with
     Apache Solr
        Pittaya Sroilong
      pittaya@gmail.com
Who am I?
Solr?
Not her!
But a search server
based on Lucene
Lucene?
Full-text search
     library
100% java
   :-(
Solr is based on
     Lucene
XML/HTTP, JSON
  interface
Open Source
Shields us from using
        Java
         :-)
Who uses Solr/Lucene?
What is our problem?
How do we
implement this?
SELECT * FROM post WHERE
topic LIKE '%aoi%' OR author
LIKE '%aoi%' ORDER BY id DESC
SELECT * FROM post WHERE
(topic LIKE '%aoi%' OR author
LIKE '%aoi%')
OR
(topic LIKE '%miyabi%' OR
author LIKE '%miyabi%')
ORDER BY id DESC
Full table scan
         =
Performance killer
No search scoring
RDBMS isn’t designed
    to do this
Use the right tool!
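Why a search library copes better: Lucene is built around an inverted index, which maps each term to the documents containing it, so a lookup does not have to scan every row. The toy sketch below only illustrates that idea and is nothing like Lucene's real implementation. The names `build_index` and `search`, the sample posts, and the "count matched terms" score are all assumptions made for illustration. Note that it matches whole words, not substrings as LIKE '%...%' does.

```python
from collections import defaultdict

def build_index(posts):
    """Map each lowercase word to the set of post ids that contain it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in text.lower().split():
            index[word].add(post_id)
    return index

def search(index, *terms):
    """Return ids matching ANY term, best first.

    The score is simply the number of query terms a post contains;
    real scoring in Lucene is far more elaborate.
    """
    scores = defaultdict(int)
    for term in terms:
        for post_id in index.get(term.lower(), ()):
            scores[post_id] += 1
    return sorted(scores, key=lambda pid: (-scores[pid], pid))

# Hypothetical sample data, for illustration only.
posts = {
    1: "solr tips",
    2: "lucene and solr scoring",
    3: "weekend football",
}
index = build_index(posts)
print(search(index, "solr", "lucene"))
```

Each lookup touches only the entries for the query terms, and posts that match more terms sort first, which is the "search scoring" the SQL version lacks.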
[Diagram: an Indexer sends index updates to Solr (built on Lucene); the Web App sends a query to Solr and gets the result back.]
1
Define schema.xml
<field name="id" type="string"
indexed="true" stored="true" />
<field name="fullname" type="string"
indexed="true" stored="true" />
<field name="position" type="string"
indexed="true" stored="true" />
<field name="tag" type="stringi"
indexed="true" stored="true"
multiValued="true" />
2
Deploy on any J2EE
    container
Tomcat, Jetty, etc.
3
Index documents
Document format
<add><doc>
 <field name="id">555</field>
 <field name="fullname">Kaka</field>
 <field name="position">Midfielder</field>
 <field name="tag">AC Milan</field>
 <field name="tag">Brazil</field>
</doc></add>
Post to Solr
http://<host>/solr/update
Any language that can
   do HTTP POST
PHP, Perl, Python
cURL
Commit
<commit />
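A minimal Python sketch of the index-then-commit step, using only the standard library. The helper names `build_add_xml` and `post_xml` are made up for this example, and `localhost:8983` is a placeholder host, not something the slides specify. The network calls are left commented out because they need a running Solr.

```python
import urllib.request
import xml.etree.ElementTree as ET

def build_add_xml(doc):
    """Build a Solr <add> message from a dict.

    A list value becomes one <field> element per item, which is how
    multiValued fields such as "tag" are sent.
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(item)
    return ET.tostring(add, encoding="unicode")

def post_xml(update_url, xml_body):
    """POST an XML message (an <add> or a <commit/>) to the update URL."""
    request = urllib.request.Request(
        update_url,
        data=xml_body.encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status

player = {
    "id": "555",
    "fullname": "Kaka",
    "position": "Midfielder",
    "tag": ["AC Milan", "Brazil"],
}
add_xml = build_add_xml(player)
print(add_xml)

# Requires a running Solr; the host is a placeholder.
# post_xml("http://localhost:8983/solr/update", add_xml)
# post_xml("http://localhost:8983/solr/update", "<commit/>")
```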
4
Search
Query from
http://<host>/solr/select
Use Solr query syntax
http://<host>/solr/select?
q=tag:madrid&start=0&rows=2
&fl=fullname,position,tag
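Building that URL by hand gets error-prone once queries contain spaces or punctuation. Here is a small sketch that lets `urllib.parse.urlencode` do the escaping. `build_select_url` is a hypothetical helper and the host is a placeholder; `q`, `start`, `rows`, `fl` and `wt` are the parameters used in these slides.

```python
from urllib.parse import urlencode

def build_select_url(base_url, query, start=0, rows=10, fields=None, wt=None):
    """Return a select URL with properly escaped query parameters."""
    params = {"q": query, "start": start, "rows": rows}
    if fields:
        params["fl"] = ",".join(fields)  # fl = list of fields to return
    if wt:
        params["wt"] = wt                # response format, e.g. "json"
    return base_url + "?" + urlencode(params)

url = build_select_url(
    "http://localhost:8983/solr/select",  # placeholder host
    "tag:madrid",
    rows=2,
    fields=["fullname", "position", "tag"],
)
print(url)
```

The printed URL has the colon and commas percent-encoded; Solr decodes them back to the same query.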
Response in XML or
JSON (configurable)
<response>
 <result numFound="46" start="0">
  <doc>
    <str name="fullname">Sergio Ramos</str>
    <str name="position">Defender</str>
    <str name="tag">Real Madrid</str>
    <str name="tag">Spain</str>
  </doc>
  <doc>
    <str name="fullname">Diego Forlan</str>
    <str name="position">Striker</str>
    <str name="tag">Atletico Madrid</str>
    <str name="tag">Uruguay</str>
  </doc>
 </result>
</response>
&wt=json
{
 "response": { "numFound": 46, "start": 0,
   "docs" : [
     { "fullname": "Sergio Ramos",
       "position": "Defender",
       "tag": ["Real Madrid", "Spain"] },
     { "fullname": "Diego Forlan",
       "position": "Striker",
       "tag": ["Atletico Madrid", "Uruguay"] }
   ]
 }
}
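Reading the JSON form from Python takes a few lines. This sketch assumes the body nests the hit count and documents under a top-level `response` key, as Solr's JSON writer does. `summarize` is a hypothetical helper, and the sample string stands in for a real HTTP response.

```python
import json

def summarize(json_text):
    """Return (total hits, list of fullname values) from a Solr JSON body."""
    body = json.loads(json_text)
    result = body["response"]
    names = [doc["fullname"] for doc in result["docs"]]
    return result["numFound"], names

# Stand-in for the text returned by a select request with wt=json.
sample = """
{"response": {"numFound": 46, "start": 0, "docs": [
  {"fullname": "Sergio Ramos", "position": "Defender",
   "tag": ["Real Madrid", "Spain"]},
  {"fullname": "Diego Forlan", "position": "Striker",
   "tag": ["Atletico Madrid", "Uruguay"]}
]}}
"""
total, names = summarize(sample)
print(total, names)
```

`numFound` is the total number of matches, while `docs` holds only the page selected by `start` and `rows`.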
Query examples
• David Pizarro
 • Equiv: David OR Pizarro
 • Default operator is “OR” (configurable)
 • Result: David Villa, David Pizarro, Claudio Pizarro, David Seaman
• +David +tag:Roma
 • Equiv: David AND tag:Roma
 • Result: David Pizarro
• +David +position:(Striker OR Midfielder)
 • Result: David Villa, David Pizarro
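The query strings above can be assembled in code instead of by string pasting. The two helpers below, `require_all` and `any_of`, are hypothetical names. This is a sketch only: it does not escape characters that are special in the query syntax, so it is not safe for raw user input as written.

```python
def require_all(*clauses):
    """Prefix every clause with '+', so each one must match (AND)."""
    return " ".join("+" + clause for clause in clauses)

def any_of(field, *values):
    """Build field:(a OR b ...), matching any of the given values."""
    return f"{field}:({' OR '.join(values)})"

q1 = require_all("David", "tag:Roma")
q2 = require_all("David", any_of("position", "Striker", "Midfielder"))
print(q1)
print(q2)
```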
Updating
Post new document to
http://<host>/solr/update
Deleting
<delete>
<id>345</id>
</delete>
<delete>
<query>tag:Brazil</query>
</delete>
<delete>
<query>*:*</query>
</delete>
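The delete messages can be generated the same way as documents. `delete_by_id` and `delete_by_query` are hypothetical helper names. The resulting XML is POSTed to the same update URL, and, as with adds, the change should become visible to searches only after a commit.

```python
import xml.etree.ElementTree as ET

def delete_by_id(doc_id):
    """Build a <delete> message that removes a single document by id."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")

def delete_by_query(query):
    """Build a <delete> message that removes every document matching query."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")

print(delete_by_id(345))
print(delete_by_query("tag:Brazil"))
print(delete_by_query("*:*"))  # matches everything: empties the index
```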
Thai support
fwdder.com
Sharing forward mails
Use a customized field type
   in schema.xml
<fieldType name="html_th" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
    <filter class="solr.ThaiWordFilterFactory" />
    <filter class="solr.StopFilterFactory"
            ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
<field name="id" type="string"
indexed="true" stored="true" />
<field name="title" type="html_th"
indexed="true" stored="true" />
<field name="detail" type="html_th"
indexed="true" stored="true" />
<field name="tag" type="stringi"
indexed="true" stored="true"
multiValued="true" />
<field name="userid" type="integer"
indexed="false" stored="true" />
Index analyzer
Debugging
&debugQuery=on
Further reading
•   http://lucene.apache.org/solr/
•   http://wiki.apache.org/solr
•   http://www.xml.com/pub/a/2006/08/09/solr-indexing-xml-with-lucene-andrest.html
•   http://lucene.apache.org/java/docs/scoring.html
Q&A

Using Apache Solr