Rapid Prototyping with Solr
presented by Erik Hatcher, Technical Staff, Lucid Imagination




Abstract

   Got data?  Let's make it searchable!  This interactive
   presentation will demonstrate getting documents into
   Solr quickly, provide some tips on adjusting Solr's
   schema to match your needs better, and finally showcase
   your data in a flexible search user interface.  We'll
   see how to rapidly leverage faceting, highlighting,
   spell checking, and debugging.  Even after all that,
   there will be enough time left to outline the next
   steps in developing your search application and taking
   it to production.




Why prototype?
   Demonstrate Solr can handle your needs
   Buy-in
   "Prototyping: faster than teaching a 9-year-old
    Ju-Jitsu"
   It's quick, easy, AND FUN!
   The User Interface is the app



Got Data?
   Files?
       Solr Cell

   Databases?
       Data Import Handler

   Feeds (Atom/RSS/XML)?
       Data Import Handler

   3rd party repositories?
       Lucene Connectors Framework
       custom indexing scripts using a Solr API

   CSV!!!
     CSV upload handler
UI
   Solritas (VelocityResponseWriter)
   http://localhost:8983/solr/itas
   Documentation:
       http://wiki.apache.org/solr/VelocityResponseWriter




LucidWorks for Solr
   great starting point
   built-in and pre-configured:
       Clustering
         Carrot2

       Search UI
         Solritas (VelocityResponseWriter)

         Server includes root context, handy for serving static files

       Better stemming
         KStem

       Tomcat, optionally


~/LucidWorks: start.sh
2010-05-21 08:53:49.595::INFO: Logging to STDERR via
org.mortbay.log.StdErrLog
2010-05-21 08:53:49.764::INFO: jetty-6.1.3
May 21, 2010 8:53:50 AM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
May 21, 2010 8:53:50 AM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: using system property solr.solr.home: /Users/erikhatcher/
LucidWorks/lucidworks/jetty/../solr
May 21, 2010 8:53:50 AM org.apache.solr.core.SolrResourceLoader <init>
INFO: Solr home set to '/Users/erikhatcher/LucidWorks/lucidworks/
jetty/../solr/'
.
.
.
May 21, 2010 8:53:51 AM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher Searcher@21fb3211 main


Your Data
First Name,Last Name,Company,Title,Work Country
Erik,Hatcher,Lucid Imagination,"Member, Technical Staff", USA
.
.
.




First try

curl "http://localhost:8983/solr/update/csv?stream.file=EuroCon2010.csv"

undefined field First Name




Schema: dynamic field flexibility


<dynamicField name="*_s"   type="string"   indexed="true"   stored="true"/>
<dynamicField name="*_t"   type="text"     indexed="true"   stored="true"/>
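A rough sketch of why these two declarations help: any incoming field name that ends in `_s` or `_t` picks up a type without being declared individually. The matcher below is a simplified illustration only (the helper name `resolve_field_type` is hypothetical, and Solr's own resolution also considers explicitly declared fields, which this ignores).

```python
from fnmatch import fnmatchcase

# Dynamic field patterns from the slide, mapped to their field types.
DYNAMIC_FIELDS = {"*_s": "string", "*_t": "text"}


def resolve_field_type(field_name, dynamic_fields=DYNAMIC_FIELDS):
    """Return the type of the first pattern matching field_name, else None.

    Simplified illustration; not Solr's actual lookup code.
    """
    for pattern, field_type in dynamic_fields.items():
        if fnmatchcase(field_name, pattern):
            return field_type
    return None


print(resolve_field_type("company_s"))   # string
print(resolve_field_type("title_t"))     # text
print(resolve_field_type("First Name"))  # None: no pattern matches the raw CSV header
```

The last call mirrors the "undefined field First Name" error from the first indexing attempt: the raw CSV header matches neither pattern.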




Mapping to dynamic fields


curl "http://localhost:8983/solr/update/csv?stream.file=EuroCon2010.csv&fieldnames=first_s,last_s,company_s,title_t,country_s&header=true"

Document [null] missing required field: id
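When iterating on field names it can be handy to generate the request URL rather than hand-edit it. The following is a minimal sketch, assuming the same endpoint and parameters as the curl command above; the helper name `csv_update_url` is made up for illustration, and nothing is sent to Solr.

```python
from urllib.parse import urlencode

# Endpoint taken from the slides; adjust host/port for your own install.
SOLR_CSV_UPDATE = "http://localhost:8983/solr/update/csv"


def csv_update_url(csv_file, fieldnames, commit=False):
    """Build a CSV update URL that renames the CSV columns via `fieldnames`."""
    params = [
        ("stream.file", csv_file),
        ("fieldnames", ",".join(fieldnames)),
        ("header", "true"),  # the first CSV line is a header row
    ]
    if commit:
        params.append(("commit", "true"))
    # Keep commas readable instead of percent-encoding them.
    return SOLR_CSV_UPDATE + "?" + urlencode(params, safe=",")


print(csv_update_url(
    "EuroCon2010.csv",
    ["first_s", "last_s", "company_s", "title_t", "country_s"],
))
```

Swapping one entry of the list for `id` gives the variant used on the next slide.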




Identifying uniqueKey, or not
curl "http://localhost:8983/solr/update/csv?stream.file=EuroCon2010.csv&fieldnames=first_s,id,company_s,title_t,country_s&header=true"

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int
name="QTime">40</int></lst>
</response>




http://localhost:8983/solr/itas




Schema tinkering
   Remove all example field definitions
   Uncomment and adjust catch-all dynamic field:
       <dynamicField name="*" type="string" multiValued="false"/>

   Ensure uniqueKey is appropriate
       Unusual in this data example:
       <!-- <uniqueKey>id</uniqueKey> -->

   Make every document/field fully searchable!
       <copyField source="*" dest="text"/>

   Then restart!
Issues with no uniqueKey
   Remove from solrconfig.xml references to:
       clustering component
       query elevation component
       data import handler

   Then restart!




Reindexing with cleaner field names
# Delete all documents
curl "http://localhost:8983/solr/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true"

# Index your data
curl "http://localhost:8983/solr/update/csv?commit=true&stream.file=EuroCon2010.csv&fieldnames=first,last,company,title,country&header=true"
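The percent-encoded `stream.body` in the delete command is hard to read. As a small sketch (the helper name `delete_all_url` is hypothetical), this shows that it is just the XML message `<delete><query>*:*</query></delete>` with the angle brackets encoded; it only builds the URL and does not contact Solr.

```python
from urllib.parse import quote

# Update endpoint taken from the slides.
SOLR_UPDATE = "http://localhost:8983/solr/update"


def delete_all_url():
    """Build the delete-everything URL shown on the slide."""
    body = "<delete><query>*:*</query></delete>"
    # Leave '*', ':' and '/' unescaped so the result matches the slide.
    encoded = quote(body, safe="*:/")
    return SOLR_UPDATE + "?stream.body=" + encoded + "&commit=true"


print(delete_all_url())
```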




Faceting
 http://localhost:8983/solr/itas?facet.field=country
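Outside the Velocity UI, the same facet counts can be read from a JSON response. The sketch below assumes the request was made with `wt=json` and Solr's default flat list layout for `facet_fields` (name, count, name, count, ...). The `sample` response and its counts are invented for illustration, not taken from the talk's data.

```python
def facet_counts(response, field):
    """Turn Solr's flat [name, count, name, count, ...] list into pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


# Hypothetical response fragment; real counts depend on your data.
sample = {
    "facet_counts": {
        "facet_fields": {"country": ["USA", 3, "Germany", 2]}
    }
}

print(facet_counts(sample, "country"))  # [('USA', 3), ('Germany', 2)]
```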




country normalization
http://localhost:8983/solr/update/csv?commit=true&stream.file=EuroCon2010.csv&fieldnames=first,last,company,title,country&header=true&f.country.map=Great+Britain:United+Kingdom
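The `f.country.map` parameter replaces one exact value with another at index time. Here is a minimal sketch of that idea, assuming a plain exact-match replacement; the helper names `map_param` and `apply_field_map` are hypothetical and only emulate the behaviour locally.

```python
from urllib.parse import urlencode


def map_param(field, source, target):
    """Build the per-field map parameter, e.g. f.country.map=A:B."""
    return urlencode({"f." + field + ".map": source + ":" + target}, safe=":")


def apply_field_map(value, mapping):
    """Emulate the mapping locally: replace value only on an exact match."""
    source, _, target = mapping.partition(":")
    return target if value == source else value


print(map_param("country", "Great Britain", "United Kingdom"))
# f.country.map=Great+Britain:United+Kingdom

print(apply_field_map("Great Britain", "Great Britain:United Kingdom"))  # United Kingdom
print(apply_field_map("USA", "Great Britain:United Kingdom"))            # USA
```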




UI treatments
   Customize request handler mappings
   Edit templates
       hit display
       header/footer
       style




Customize request handlers
  <requestHandler name="/browse" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="wt">velocity</str>
      <str name="v.template">browse</str>
      <str name="v.layout">layout</str>

      <str name="rows">10</str>
      <str name="fl">*,score</str>

      <str name="defType">lucene</str>
      <str name="q">*:*</str>
      <str name="debugQuery">true</str>
      <str name="hl">on</str>
      <str name="hl.fl">title</str>
      <str name="hl.fragsize">0</str>
      <str name="hl.alternateField">title</str>

      <str name="facet">on</str>
      <str name="facet.mincount">1</str>
      <str name="facet.missing">true</str>
    </lst>
    <lst name="appends">
      <str name="facet.field">country</str>
    </lst>
  </requestHandler>
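The handler above uses two kinds of parameter lists: `defaults` apply only when the request does not supply the parameter, while `appends` are added on top of whatever the request sends. This sketch models that merge in plain Python; the function name `effective_params` is hypothetical and this is an illustration of the idea, not Solr's code.

```python
def effective_params(request, defaults, appends):
    """Combine request parameters with handler defaults and appends.

    Each argument maps a parameter name to a list of values.
    """
    merged = {name: list(values) for name, values in defaults.items()}
    for name, values in request.items():
        merged[name] = list(values)  # request values override defaults
    for name, values in appends.items():
        merged[name] = merged.get(name, []) + list(values)  # always added
    return merged


defaults = {"q": ["*:*"], "rows": ["10"]}
appends = {"facet.field": ["country"]}
request = {"q": ["solr"], "facet.field": ["company"]}

print(effective_params(request, defaults, appends))
# {'q': ['solr'], 'rows': ['10'], 'facet.field': ['company', 'country']}
```

So a user can change `q` or `rows`, but the `country` facet is requested every time.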
hit.vm

<div class="result-document">
  <p>$doc.getFieldValue('first') $doc.getFieldValue('last')</p>
  <p>$!doc.getFieldValue('title'), $!doc.getFieldValue('company')</p>
  <p>$!doc.getFieldValue('country')</p>
</div>
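Note the difference between `$doc...` and `$!doc...` in the template: in Velocity's default (non-strict) mode a plain reference that evaluates to null is printed literally, while the quiet `$!` form prints nothing. The sketch below only emulates that rule for a dictionary-backed document; `velocity_ref` is a made-up helper, not part of any Velocity or Solr API.

```python
def velocity_ref(doc, field, quiet):
    """Emulate how a Velocity reference to a field value is rendered."""
    value = doc.get(field)
    if value is not None:
        return str(value)
    # Missing value: quiet references render empty, plain ones render literally.
    return "" if quiet else "$doc.getFieldValue('" + field + "')"


doc = {"first": "Erik", "last": "Hatcher"}  # no 'country' value

print(velocity_ref(doc, "first", quiet=False))   # Erik
print(velocity_ref(doc, "country", quiet=True))  # (empty string)
print(velocity_ref(doc, "country", quiet=False)) # $doc.getFieldValue('country')
```

This is why the optional fields (title, company, country) use `$!` in hit.vm.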




Voila!




Adding bells and whistles
   jQuery
       <script type="text/javascript" src="/solr/admin/jquery-1.2.3.min.js"></script>

   Let's add a tree map
       <script type="text/javascript" src="/scripts/treemap.js"></script>
       http://plugins.jquery.com/project/Treemap




tree map table
<script type="text/javascript">
  function onLoad() {
     jQuery("#treemap-country").treemap(640,480, {});
  }
</script>
----------------------------
<body onload="onLoad();">
----------------------------
<table id="treemap-country">
#foreach($facet in $response.getFacetField('country').values)
  <tr>
     <td>#if($facet.name)$esc.html($facet.name)#else&lt;Unspecified&gt;#end</td>
     <td>$facet.count</td>
     <td>#if($facet.name)$esc.html($facet.name)#{else}Unspecified#end</td>
  </tr>
#end
</table>
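To see what the `#foreach` loop produces, here is a rough Python equivalent that renders the same three-column rows from a list of (name, count) pairs. The function name `treemap_rows` and the sample counts are hypothetical; a `None` name stands in for documents without a country value.

```python
from html import escape


def treemap_rows(facets):
    """Render one table row per facet value, mirroring the template's columns."""
    rows = []
    for name, count in facets:
        first_cell = escape(name) if name else "&lt;Unspecified&gt;"
        last_cell = escape(name) if name else "Unspecified"
        rows.append(
            "  <tr><td>%s</td><td>%d</td><td>%s</td></tr>"
            % (first_cell, count, last_cell)
        )
    return "\n".join(rows)


# Hypothetical facet counts for illustration.
print(treemap_rows([("USA", 3), (None, 1)]))
```

Escaping the names, as `$esc.html` does in the template, keeps values such as `R&D` from breaking the markup.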

Tree map




Ajax fun: giveaways
   Add "static" templated page
   jQuery Ajax request
   snippet templated output




"static" Solritas page
 solrconfig.xml
 <requestHandler name="/giveaways" class="solr.DumpRequestHandler">
   <lst name="defaults">
     <str name="wt">velocity</str>
     <str name="v.template">giveaways</str>
     <str name="v.layout">layout</str>
   </lst>
 </requestHandler>

 giveaways.vm
 <input type="button" value="Pick a Winner" onClick="javascript:$('#winner').load('/solr/generate_winner?sort=random_' + new Date().getTime() + '+asc');">
 <h2>And the winner is...</h2>
 <center><font size="20"><div id="winner"></div></font></center>




fragment template
solrconfig.xml
<requestHandler name="/generate_winner" class="solr.SearchHandler">
  <!-- sort=random_... required -->
  <lst name="defaults">
    <str name="wt">velocity</str>
    <str name="v.template">winner</str>

    <str name="rows">1</str>
    <str name="fl">first,last</str>

    <str name="defType">lucene</str>
    <str name="q">*:* -company:"Lucid Imagination" -company:"Stone Circle Productions"</str>
  </lst>
</requestHandler>


winner.vm
#set($winner=$response.results.get(0))
$winner.getFieldValue('first') $winner.getFieldValue('last')
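The winner is picked by sorting on a field named `random_<seed>`, where the button's JavaScript uses the current time in milliseconds as the seed, so each click gets a different ordering. A minimal sketch of building the same relative URL follows; `winner_url` is a hypothetical helper, and it assumes the schema defines a `random_*` dynamic field backed by a random sort field type.

```python
import time


def winner_url(seed=None):
    """Build the /generate_winner URL with a random sort field."""
    if seed is None:
        # Milliseconds since the epoch, like JavaScript's new Date().getTime().
        seed = int(time.time() * 1000)
    return "/solr/generate_winner?sort=random_%d+asc" % seed


print(winner_url(42))  # /solr/generate_winner?sort=random_42+asc
print(winner_url())    # seed varies with the current time
```

With `rows=1` in the handler defaults, the first document of that random ordering is the winner.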

And the winner is...




Prototyping tools
   CSV update handler
   Schema Browser
   Solritas
   Solr Explorer
       https://issues.apache.org/jira/browse/SOLR-1163

   Solr Flare
       http://wiki.apache.org/solr/Flare



Refine, iterate, integrate
   What's next?
       script full & delta indexing processes
       adjust schema
         define fields, field types, analysis

       tweak configuration
         caches, indexing parameters

       deploy to staging/production environments




Test
   Performance
   Scalability
   Relevance
   Automate all of the above; establish baselines and avoid regressions





More Related Content

What's hot

Scala ActiveRecord
Scala ActiveRecordScala ActiveRecord
Scala ActiveRecordscalaconfjp
 
Building node.js applications with Database Jones
Building node.js applications with Database JonesBuilding node.js applications with Database Jones
Building node.js applications with Database JonesJohn David Duncan
 
Add Powerful Full Text Search to Your Web App with Solr
Add Powerful Full Text Search to Your Web App with SolrAdd Powerful Full Text Search to Your Web App with Solr
Add Powerful Full Text Search to Your Web App with Solradunne
 
Developing for Node.JS with MySQL and NoSQL
Developing for Node.JS with MySQL and NoSQLDeveloping for Node.JS with MySQL and NoSQL
Developing for Node.JS with MySQL and NoSQLJohn David Duncan
 
ERRest - The Next Steps
ERRest - The Next StepsERRest - The Next Steps
ERRest - The Next StepsWO Community
 
Web2py Code Lab
Web2py Code LabWeb2py Code Lab
Web2py Code LabColin Su
 
An introduction to SQLAlchemy
An introduction to SQLAlchemyAn introduction to SQLAlchemy
An introduction to SQLAlchemymengukagan
 
Web2py tutorial to create db driven application.
Web2py tutorial to create db driven application.Web2py tutorial to create db driven application.
Web2py tutorial to create db driven application.fRui Apps
 
PofEAA and SQLAlchemy
PofEAA and SQLAlchemyPofEAA and SQLAlchemy
PofEAA and SQLAlchemyInada Naoki
 
ORM2Pwn: Exploiting injections in Hibernate ORM
ORM2Pwn: Exploiting injections in Hibernate ORMORM2Pwn: Exploiting injections in Hibernate ORM
ORM2Pwn: Exploiting injections in Hibernate ORMMikhail Egorov
 
td_mxc_rubyrails_shin
td_mxc_rubyrails_shintd_mxc_rubyrails_shin
td_mxc_rubyrails_shintutorialsruby
 
Let ColdFusion ORM do the work for you!
Let ColdFusion ORM do the work for you!Let ColdFusion ORM do the work for you!
Let ColdFusion ORM do the work for you!Masha Edelen
 
Introduction to Apache solr
Introduction to Apache solrIntroduction to Apache solr
Introduction to Apache solrKnoldus Inc.
 
ShmooCon 2009 - (Re)Playing(Blind)Sql
ShmooCon 2009 - (Re)Playing(Blind)SqlShmooCon 2009 - (Re)Playing(Blind)Sql
ShmooCon 2009 - (Re)Playing(Blind)SqlChema Alonso
 
New methods for exploiting ORM injections in Java applications
New methods for exploiting ORM injections in Java applicationsNew methods for exploiting ORM injections in Java applications
New methods for exploiting ORM injections in Java applicationsMikhail Egorov
 

What's hot (20)

Scala ActiveRecord
Scala ActiveRecordScala ActiveRecord
Scala ActiveRecord
 
Building node.js applications with Database Jones
Building node.js applications with Database JonesBuilding node.js applications with Database Jones
Building node.js applications with Database Jones
 
Add Powerful Full Text Search to Your Web App with Solr
Add Powerful Full Text Search to Your Web App with SolrAdd Powerful Full Text Search to Your Web App with Solr
Add Powerful Full Text Search to Your Web App with Solr
 
ORM Injection
ORM InjectionORM Injection
ORM Injection
 
ERRest
ERRestERRest
ERRest
 
Developing for Node.JS with MySQL and NoSQL
Developing for Node.JS with MySQL and NoSQLDeveloping for Node.JS with MySQL and NoSQL
Developing for Node.JS with MySQL and NoSQL
 
ERRest - The Next Steps
ERRest - The Next StepsERRest - The Next Steps
ERRest - The Next Steps
 
Web2py Code Lab
Web2py Code LabWeb2py Code Lab
Web2py Code Lab
 
An introduction to SQLAlchemy
An introduction to SQLAlchemyAn introduction to SQLAlchemy
An introduction to SQLAlchemy
 
Web2py tutorial to create db driven application.
Web2py tutorial to create db driven application.Web2py tutorial to create db driven application.
Web2py tutorial to create db driven application.
 
PofEAA and SQLAlchemy
PofEAA and SQLAlchemyPofEAA and SQLAlchemy
PofEAA and SQLAlchemy
 
Alfredo-PUMEX
Alfredo-PUMEXAlfredo-PUMEX
Alfredo-PUMEX
 
ORM2Pwn: Exploiting injections in Hibernate ORM
ORM2Pwn: Exploiting injections in Hibernate ORMORM2Pwn: Exploiting injections in Hibernate ORM
ORM2Pwn: Exploiting injections in Hibernate ORM
 
RicoLiveGrid
RicoLiveGridRicoLiveGrid
RicoLiveGrid
 
td_mxc_rubyrails_shin
td_mxc_rubyrails_shintd_mxc_rubyrails_shin
td_mxc_rubyrails_shin
 
Let ColdFusion ORM do the work for you!
Let ColdFusion ORM do the work for you!Let ColdFusion ORM do the work for you!
Let ColdFusion ORM do the work for you!
 
Introduction to Apache solr
Introduction to Apache solrIntroduction to Apache solr
Introduction to Apache solr
 
XQuery Design Patterns
XQuery Design PatternsXQuery Design Patterns
XQuery Design Patterns
 
ShmooCon 2009 - (Re)Playing(Blind)Sql
ShmooCon 2009 - (Re)Playing(Blind)SqlShmooCon 2009 - (Re)Playing(Blind)Sql
ShmooCon 2009 - (Re)Playing(Blind)Sql
 
New methods for exploiting ORM injections in Java applications
New methods for exploiting ORM injections in Java applicationsNew methods for exploiting ORM injections in Java applications
New methods for exploiting ORM injections in Java applications
 

Similar to Rapidly Prototype Search with Solr

Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Beyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBeyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBertrand Delacretaz
 
[Coscup 2012] JavascriptMVC
[Coscup 2012] JavascriptMVC[Coscup 2012] JavascriptMVC
[Coscup 2012] JavascriptMVCAlive Kuo
 
Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Prajal Kulkarni
 
The Magic Revealed: Four Real-World Examples of Using the Client Object Model...
The Magic Revealed: Four Real-World Examples of Using the Client Object Model...The Magic Revealed: Four Real-World Examples of Using the Client Object Model...
The Magic Revealed: Four Real-World Examples of Using the Client Object Model...SPTechCon
 
112 portfpres.pdf
112 portfpres.pdf112 portfpres.pdf
112 portfpres.pdfsash236
 
Developing your first application using FIWARE
Developing your first application using FIWAREDeveloping your first application using FIWARE
Developing your first application using FIWAREFIWARE
 
General Principles of Web Security
General Principles of Web SecurityGeneral Principles of Web Security
General Principles of Web Securityjemond
 
XPages Blast - ILUG 2010
XPages Blast - ILUG 2010XPages Blast - ILUG 2010
XPages Blast - ILUG 2010Tim Clark
 
Rails and alternative ORMs
Rails and alternative ORMsRails and alternative ORMs
Rails and alternative ORMsJonathan Dahl
 
09 - express nodes on the right angle - vitaliy basyuk - it event 2013 (5)
09 - express nodes on the right angle - vitaliy basyuk - it event 2013 (5)09 - express nodes on the right angle - vitaliy basyuk - it event 2013 (5)
09 - express nodes on the right angle - vitaliy basyuk - it event 2013 (5)Igor Bronovskyy
 
All Things Open 2016 -- Database Programming for Newbies
All Things Open 2016 -- Database Programming for NewbiesAll Things Open 2016 -- Database Programming for Newbies
All Things Open 2016 -- Database Programming for NewbiesDave Stokes
 
OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)
OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)
OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)Pat Patterson
 
Building Your Own IoT Platform using FIWARE GEis
Building Your Own IoT Platform using FIWARE GEisBuilding Your Own IoT Platform using FIWARE GEis
Building Your Own IoT Platform using FIWARE GEisFIWARE
 
Innovative Specifications for Better Performance Logging and Monitoring
Innovative Specifications for Better Performance Logging and MonitoringInnovative Specifications for Better Performance Logging and Monitoring
Innovative Specifications for Better Performance Logging and MonitoringCary Millsap
 
The top 10 security issues in web applications
The top 10 security issues in web applicationsThe top 10 security issues in web applications
The top 10 security issues in web applicationsDevnology
 
MySQL Without the MySQL -- Oh My!
MySQL Without the MySQL -- Oh My!MySQL Without the MySQL -- Oh My!
MySQL Without the MySQL -- Oh My!Dave Stokes
 
Do you know what your drupal is doing? Observe it!
Do you know what your drupal is doing? Observe it!Do you know what your drupal is doing? Observe it!
Do you know what your drupal is doing? Observe it!Luca Lusso
 

Similar to Rapidly Prototype Search with Solr (20)

Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Beyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBeyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and Solr
 
[Coscup 2012] JavascriptMVC
[Coscup 2012] JavascriptMVC[Coscup 2012] JavascriptMVC
[Coscup 2012] JavascriptMVC
 
Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.
 
The Magic Revealed: Four Real-World Examples of Using the Client Object Model...
The Magic Revealed: Four Real-World Examples of Using the Client Object Model...The Magic Revealed: Four Real-World Examples of Using the Client Object Model...
The Magic Revealed: Four Real-World Examples of Using the Client Object Model...
 
112 portfpres.pdf
112 portfpres.pdf112 portfpres.pdf
112 portfpres.pdf
 
Developing your first application using FIWARE
Developing your first application using FIWAREDeveloping your first application using FIWARE
Developing your first application using FIWARE
 
General Principles of Web Security
General Principles of Web SecurityGeneral Principles of Web Security
General Principles of Web Security
 
XPages Blast - ILUG 2010
XPages Blast - ILUG 2010XPages Blast - ILUG 2010
XPages Blast - ILUG 2010
 
Rails and alternative ORMs
Rails and alternative ORMsRails and alternative ORMs
Rails and alternative ORMs
 
09 - express nodes on the right angle - vitaliy basyuk - it event 2013 (5)
09 - express nodes on the right angle - vitaliy basyuk - it event 2013 (5)09 - express nodes on the right angle - vitaliy basyuk - it event 2013 (5)
09 - express nodes on the right angle - vitaliy basyuk - it event 2013 (5)
 
Apache solr liferay
Apache solr liferayApache solr liferay
Apache solr liferay
 
All Things Open 2016 -- Database Programming for Newbies
All Things Open 2016 -- Database Programming for NewbiesAll Things Open 2016 -- Database Programming for Newbies
All Things Open 2016 -- Database Programming for Newbies
 
OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)
OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)
OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)
 
Building Your Own IoT Platform using FIWARE GEis
Building Your Own IoT Platform using FIWARE GEisBuilding Your Own IoT Platform using FIWARE GEis
Building Your Own IoT Platform using FIWARE GEis
 
Innovative Specifications for Better Performance Logging and Monitoring
Innovative Specifications for Better Performance Logging and MonitoringInnovative Specifications for Better Performance Logging and Monitoring
Innovative Specifications for Better Performance Logging and Monitoring
 
The top 10 security issues in web applications
The top 10 security issues in web applicationsThe top 10 security issues in web applications
The top 10 security issues in web applications
 
MySQL Without the MySQL -- Oh My!
MySQL Without the MySQL -- Oh My!MySQL Without the MySQL -- Oh My!
MySQL Without the MySQL -- Oh My!
 
Do you know what your drupal is doing? Observe it!
Do you know what your drupal is doing? Observe it!Do you know what your drupal is doing? Observe it!
Do you know what your drupal is doing? Observe it!
 

More from Erik Hatcher

Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Erik Hatcher
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksErik Hatcher
 
Solr Powered Libraries
Solr Powered LibrariesSolr Powered Libraries
Solr Powered LibrariesErik Hatcher
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query ParsingErik Hatcher
 
"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - Chicago"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - ChicagoErik Hatcher
 
Query Parsing - Tips and Tricks
Query Parsing - Tips and TricksQuery Parsing - Tips and Tricks
Query Parsing - Tips and TricksErik Hatcher
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0Erik Hatcher
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development TutorialErik Hatcher
 

More from Erik Hatcher (20)

Ted Talk
Ted TalkTed Talk
Ted Talk
 
Solr Payloads
Solr PayloadsSolr Payloads
Solr Payloads
 
it's just search
it's just searchit's just search
it's just search
 
Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis Tricks
 
Solr Powered Libraries
Solr Powered LibrariesSolr Powered Libraries
Solr Powered Libraries
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query Parsing
 
"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - Chicago"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - Chicago
 
Query Parsing - Tips and Tricks
Query Parsing - Tips and TricksQuery Parsing - Tips and Tricks
Query Parsing - Tips and Tricks
 
Solr 4
Solr 4Solr 4
Solr 4
 
Solr Recipes
Solr RecipesSolr Recipes
Solr Recipes
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Solr Flair
Solr FlairSolr Flair
Solr Flair
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
 

Recently uploaded

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 


Rapidly Prototype Search with Solr

  • 1. Rapid Prototyping with Solr presented by Erik Hatcher, Technical Staff, Lucid Imagination
  • 2. Abstract Got data?  Let's make it searchable!  This interactive presentation will demonstrate getting documents into Solr quickly, provide some tips in adjusting Solr's schema to match your needs better, and finally showcase your data in a flexible search user interface.  We'll see how to rapidly leverage faceting, highlighting, spell checking, and debugging.  Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.
  • 3. Why prototype?  Demonstrate Solr can handle your needs  Buy-in  "Prototyping: faster than teaching a 9-year-old Ju-Jitsu"  It's quick, easy, AND FUN!  The User Interface is the app
  • 4. Got Data?  Files?  Solr Cell  Databases?  Data Import Handler  Feeds (Atom/RSS/XML)?  Data Import Handler  3rd party repositories?  Lucene Connectors Framework  custom indexing scripts using a Solr API  CSV!!!  CSV upload handler
  • 5. UI  Solritas (VelocityResponseWriter)  http://localhost:8983/solr/itas  Documentation:  http://wiki.apache.org/solr/VelocityResponseWriter
  • 6. LucidWorks for Solr  great starting point  built-in and pre-configured:  Clustering  Carrot2  Search UI  Solritas (VelocityResponseWriter)  Server includes root context, handy for serving static files  Better stemming  KStem  Tomcat, optionally
  • 7. ~/LucidWorks: start.sh 2010-05-21 08:53:49.595::INFO: Logging to STDERR via org.mortbay.log.StdErrLog 2010-05-21 08:53:49.764::INFO: jetty-6.1.3 May 21, 2010 8:53:50 AM org.apache.solr.core.SolrResourceLoader locateSolrHome INFO: JNDI not configured for solr (NoInitialContextEx) May 21, 2010 8:53:50 AM org.apache.solr.core.SolrResourceLoader locateSolrHome INFO: using system property solr.solr.home: /Users/erikhatcher/LucidWorks/lucidworks/jetty/../solr May 21, 2010 8:53:50 AM org.apache.solr.core.SolrResourceLoader <init> INFO: Solr home set to '/Users/erikhatcher/LucidWorks/lucidworks/jetty/../solr/' . . . May 21, 2010 8:53:51 AM org.apache.solr.core.SolrCore registerSearcher INFO: [] Registered new searcher Searcher@21fb3211 main
  • 8. Your Data First Name,Last Name,Company,Title,Work Country Erik,Hatcher,Lucid Imagination,"Member, Technical Staff",USA . . .
  • 10. Schema: dynamic field flexibility <dynamicField name="*_s" type="string" indexed="true" stored="true"/> <dynamicField name="*_t" type="text" indexed="true" stored="true"/>
  • 11. Mapping to dynamic fields curl "http://localhost:8983/solr/update/csv?stream.file=EuroCon2010.csv&fieldnames=first_s,last_s,company_s,title_t,country_s&header=true" Response: Document [null] missing required field: id
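
The CSV upload call on slide 11 is easier to vary when the query string is assembled programmatically. The following is a minimal sketch, not part of the original deck: `csv_update_url` is a hypothetical helper, and the base URL and file name are simply the ones shown on the slide. It only builds the URL; sending it still requires a running Solr instance.

```python
from urllib.parse import urlencode

# Hypothetical helper (not a Solr API): builds the CSV update URL from slide 11.
def csv_update_url(base, csv_path, fieldnames, commit=False):
    params = {
        "stream.file": csv_path,             # file is read on the Solr server side
        "fieldnames": ",".join(fieldnames),  # one target field per CSV column
        "header": "true",                    # first CSV line is a header row
    }
    if commit:
        params["commit"] = "true"
    return f"{base}/update/csv?{urlencode(params)}"

url = csv_update_url(
    "http://localhost:8983/solr",
    "EuroCon2010.csv",
    ["first_s", "last_s", "company_s", "title_t", "country_s"],
)
print(url)
```

Note that `urlencode` percent-encodes the commas in `fieldnames` as `%2C`, which is equivalent to the literal commas used in the curl command.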
  • 12. Identifying uniqueKey, or not curl "http://localhost:8983/solr/update/csv?stream.file=EuroCon2010.csv&fieldnames=first_s,id,company_s,title_t,country_s&header=true" <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">40</int></lst> </response>
  • 14. Schema tinkering  Removed all example field definitions  Uncomment and adjust catch-all dynamic field:  <dynamicField name="*" type="string" multiValued="false"/>  Ensure uniqueKey is appropriate  Unusual in this data example:  <!-- <uniqueKey>id</uniqueKey> -->  Make every document/field fully searchable!  <copyField source="*" dest="text"/>  Then restart!
  • 15. Issues with no uniqueKey  Remove from solrconfig.xml references to:  clustering component  query elevation component  data import handler  Then restart!
  • 16. Reindexing with cleaner field names # Delete all documents curl "http://localhost:8983/solr/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true" # Index your data curl "http://localhost:8983/solr/update/csv?commit=true&stream.file=EuroCon2010.csv&fieldnames=first,last,company,title,country&header=true"
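
The percent-encoded `stream.body` on slide 16 is just the XML message `<delete><query>*:*</query></delete>`. As an illustration only (the helper name `delete_all_url` is hypothetical and not from the deck), this sketch shows how that encoded URL can be derived from the readable XML:

```python
from urllib.parse import quote

# Hypothetical helper: reproduces the delete-all URL from slide 16.
def delete_all_url(base):
    body = "<delete><query>*:*</query></delete>"
    # Keep '*', ':' and '/' literal so the result matches the slide;
    # '<' and '>' become %3C and %3E.
    encoded = quote(body, safe="*:/")
    return f"{base}/update?stream.body={encoded}&commit=true"

print(delete_all_url("http://localhost:8983/solr"))
```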
  • 19. UI treatments  Customize request handler mappings  Edit templates  hit display  header/footer  style
  • 20. Customize request handlers <requestHandler name="/browse" class="solr.SearchHandler"> <lst name="defaults"> <str name="wt">velocity</str> <str name="v.template">browse</str> <str name="v.layout">layout</str> <str name="rows">10</str> <str name="fl">*,score</str> <str name="defType">lucene</str> <str name="q">*:*</str> <str name="debugQuery">true</str> <str name="hl">on</str> <str name="hl.fl">title</str> <str name="hl.fragsize">0</str> <str name="hl.alternateField">title</str> <str name="facet">on</str> <str name="facet.mincount">1</str> <str name="facet.missing">true</str> </lst> <lst name="appends"> <str name="facet.field">country</str> </lst> </requestHandler>
  • 21. hit.vm <div class="result-document"> <p>$doc.getFieldValue('first') $doc.getFieldValue('last')</p> <p>$!doc.getFieldValue('title'), $!doc.getFieldValue('company')</p> <p>$!doc.getFieldValue('country')</p> </div>
  • 22. Voila!
  • 23. Adding bells and whistles  JQuery  <script type="text/javascript" src="/solr/admin/jquery-1.2.3.min.js"></script>  Let's add a tree map  <script type="text/javascript" src="/scripts/treemap.js"></script>  http://plugins.jquery.com/project/Treemap
  • 24. tree map table <script type="text/javascript"> function onLoad() { jQuery("#treemap-country").treemap(640,480, {}); } </script> ---------------------------- <body onload="onLoad();"> ---------------------------- <table id="treemap-country"> #foreach($facet in $response.getFacetField('country').values) <tr> <td>#if($facet.name) $esc.html($facet.name)#else&lt;Unspecified&gt;#end</td> <td>$facet.count</td> <td>#if($facet.name)$esc.html($facet.name)#{else}Unspecified#end</td> </tr> #end </table>
  • 25. Tree map
  • 26. Ajax fun: giveaways  Add "static" templated page  JQuery Ajax request  snippet templated output
  • 27. "static" Solritas page solrconfig.xml <requestHandler name="/giveaways" class="solr.DumpRequestHandler"> <lst name="defaults"> <str name="wt">velocity</str> <str name="v.template">giveaways</str> <str name="v.layout">layout</str> </lst> </requestHandler> giveaways.vm <input type="button" value="Pick a Winner" onClick="javascript:$('#winner').load('/solr/generate_winner?sort=random_' + new Date().getTime() + '+asc');"> <h2>And the winner is...</h2> <center><font size="20"><div id="winner"></div></font></center>
  • 28. fragment template solrconfig.xml <requestHandler name="/generate_winner" class="solr.SearchHandler"> <!-- sort=random_... required --> <lst name="defaults"> <str name="wt">velocity</str> <str name="v.template">winner</str> <str name="rows">1</str> <str name="fl">first,last</str> <str name="defType">lucene</str> <str name="q">*:* -company:"Lucid Imagination" -company:"Stone Circle Productions"</str> </lst> </requestHandler> winner.vm #set($winner=$response.results.get(0)) $winner.getFieldValue('first') $winner.getFieldValue('last')
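
Slides 27 and 28 pick a winner by sorting on a `random_<seed>` field, with the seed taken from the browser clock so that each click yields a different order. A rough sketch of the same request path built outside the browser follows. `winner_url` is a hypothetical helper, and it is an assumption here that the schema maps `random_*` names to a random sort field type, as the slide's "sort=random_... required" comment implies.

```python
import time
from urllib.parse import urlencode

# Hypothetical helper: mirrors the jQuery .load() call in giveaways.vm.
def winner_url(base, seed=None):
    # Default seed: current time in milliseconds, like new Date().getTime().
    if seed is None:
        seed = int(time.time() * 1000)
    # urlencode turns the space in "random_<seed> asc" into '+'.
    return f"{base}/generate_winner?{urlencode({'sort': f'random_{seed} asc'})}"

print(winner_url("/solr", seed=42))
```

Passing a fixed seed makes the request reproducible, which can help when checking the template output by hand.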
  • 29. And the winner is...
  • 30. Prototyping tools  CSV update handler  Schema Browser  Solritas  Solr Explorer  https://issues.apache.org/jira/browse/SOLR-1163  Solr Flare  http://wiki.apache.org/solr/Flare
  • 31. Refine, iterate, integrate  What's next?  script full & delta indexing processes  adjust schema  define fields, field types, analysis  tweak configuration  caches, indexing parameters  deploy to staging/production environments
  • 32. Test  Performance  Scalability  Relevance  Automate all of the above, start baselines and avoid regressions
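
One way to read "start baselines and avoid regressions" for relevance is to record a score per test query and flag any query whose score later drops. The sketch below is an illustration under stated assumptions, not something from the deck: precision at k is just one possible metric, and the function names, query strings, and document ids are all made up for the example.

```python
# Illustrative only: precision@k as a simple relevance score per query.
def precision_at_k(ranked_ids, relevant_ids, k):
    top = ranked_ids[:k]
    if not top:
        return 0.0
    return sum(1 for doc_id in top if doc_id in relevant_ids) / len(top)

# Returns the queries whose current score fell below the recorded baseline.
def regressions(baseline, current, tolerance=0.0):
    return sorted(
        query for query, score in baseline.items()
        if current.get(query, 0.0) < score - tolerance
    )

baseline = {"solr": 0.5, "lucene": 1.0}   # scores saved from an earlier run
current = {"solr": 0.5, "lucene": 0.75}   # scores from the latest run
print(regressions(baseline, current))
```

In practice the ranked ids would come from live search responses and the relevant ids from a hand-judged list; both are hard-coded here to keep the example self-contained.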