This document discusses searching with Apache Solr, including what it is, when to use it, and how to implement it. Solr is a powerful and feature-rich search platform that can be used when basic search systems are no longer sufficient, as it supports advanced search capabilities and large datasets. The document outlines how to set up Solr, including choosing a container, configuring the Solr configuration files, defining fields, indexing data, and implementing different types of searches.
44. Default Search: consider useful analyzers; potentially match on more fields; enrich or refine results with personal data; more in-depth results.
45. Advanced Search: offer user control; consider search storage (data size vs. additional queries); return more or fewer results (“Search entire document”, “Filter by Colour”).
48. We’re Hiring: NL (Vlissingen, Utrecht), UK (London, Sheffield, Liverpool). Speak to me at the end… pmatthews@ibuildings.com
49. Thank you. Resources links: http://www.delicious.com/paulm86/solr ; this talk: http://joind.in/3221 ; contact me: @paulmatthews86, http://about.me/paul.matthews
Speaker notes
Twitter: @paulmatthews86. Personal blog: 86p (technical and non-technical). Software Engineer at Ibuildings. Techportal: MongoDB; Solr (May 2011). Solr projects: a travel company and a media company.
This talk: What is Solr? When is the right time? Why search? How: start the journey (investigate, explain to the business, integrate). Who is this talk aimed at? Developers toying with search, using database search, or starting with search.
This talk: When is the right time (identifying it). Why: the benefits of search; the dark horse. How: start the journey (investigate, explain to the business, integrate). Who is this talk aimed at? Developers toying with search, using database search, or starting with search.
What is search? Text-based navigation to content or products; customers describing something; capturing queries; sorting and organising content. Examples: quick search, category listing, advanced search.
The Power of Search: From LIKE to Solr
First up: database LIKE queries.
Pros: little effort to use or understand. Cons: not good with user data; cannot handle queries longer than one word.
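The one-word limitation is easy to reproduce. This minimal sketch uses Python's built-in sqlite3 module as a stand-in for a MySQL table (an assumption; the products table and its rows are made up). The whole user input becomes a single substring pattern, so an extra word in the stored text, or a different word order in the query, stops rows from matching.

```python
import sqlite3

# In-memory table standing in for a product catalogue (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.executemany(
    "INSERT INTO products (name) VALUES (?)",
    [("Red leather shoes",), ("Red shoes",), ("Blue running shoes",)],
)

def like_search(term):
    # The entire user input becomes one substring pattern, e.g. '%red shoes%'.
    rows = conn.execute(
        "SELECT name FROM products WHERE name LIKE ? ORDER BY name",
        (f"%{term}%",),
    )
    return [name for (name,) in rows]

print(like_search("red shoes"))  # ['Red shoes']  ("Red leather shoes" is missed)
print(like_search("shoes red"))  # []  (word order breaks the match)
```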
Full text: lots of people use it (e.g. MySQL full-text search).
Pros: some power; convenient, since it lives in the database. Cons: feature-poor; slow.
Basic, easy-to-use, proper search
Pros: can be very fast; often simple to set up. Cons: feature-poor; less accurate; more application code? Examples: Google Custom Search Engine (crawls your site); Xapian (a simple search solution).
Pros: powerful; feature-rich; relatively simple; lots of plugins. Cons: could be overkill; written in a different language (Java).
Solr runs on Java and requires a servlet container, such as Tomcat, or Jetty running stand-alone. It is built on Lucene, a search library offering high-performance full-text search; Lucene is written in Java, and implementations in other languages are available.
Who? Traffic: not for Facebook, but works for the average site. Features: it has many, but there is no need to use them all. When? Designed in from the beginning; easily used to enrich site navigation; implemented as a post-live project; or integrated into existing open source software such as Drupal and Magento.
Spending time, effort or money on the search box: fixing bugs, endless tuning, adding functionality. Customers complaining: not finding content, high bounce rates, the site is slow, not finding the *right* content.
Large data sets (10,000 records). Speed: LIKE queries, MySQL full-text, overall site performance (check the slow query log?). Results: inaccurate or missing. Graceful degradation: important for quality, at low cost.
Is Solr right for me? Before answering, learn the terms, so you can find materials and communicate with people. Functionality: know the functionality most people use, so you don't reinvent the wheel.
The main two sources: database tables, via the Data Import Handler (easy, just configuration); and an API, so anything can publish to Solr, hooked into your content. Also CSV and XML, and Solr Cell for rich documents (PDF, MS Office).
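On the API route, a client can send documents to Solr in its XML update format: an `<add>` element holding one `<doc>` per document, with one `<field name="...">` per value. The sketch below only builds that message. The helper name `build_add_message` and the fields `id`, `title` and `tags` are assumptions. Sending it is left out; that is typically an HTTP POST to the core's update handler, followed by a commit.

```python
import xml.etree.ElementTree as ET

def build_add_message(docs):
    """Build a Solr XML update message: <add><doc><field name=...>...</field></doc></add>."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # Multi-valued fields are sent as repeated <field> elements.
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)  # ElementTree escapes &, < and > for us
    return ET.tostring(add, encoding="unicode")

# Hypothetical rows, e.g. pulled from a database table or a content API.
rows = [
    {"id": "article-1", "title": "Searching with Solr", "tags": ["solr", "search"]},
]
print(build_add_message(rows))
```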
Parse text to generate the index: this removes junk and improves matches. Half now, half later: doing analysis work at index time reduces the time spent searching.
Analyzer: groups the parsing actions together. It is important to apply the same, or similar, analysis when searching.
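A toy inverted index shows both the "half now, half later" split and why the query needs the same analysis as the documents. This is a minimal sketch, not how Lucene stores its index. `analyze`, `build_index` and `search` are hypothetical helpers, and the analyzer only splits on non-alphanumeric characters and lower-cases.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: split on non-alphanumeric characters, then lower-case."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

def build_index(docs):
    # Index time ("half now"): analyze every document once, up front.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    # Query time ("half later"): run the SAME analyzer on the query, otherwise
    # "SOLR" in a query would never meet the indexed token "solr".
    tokens = analyze(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())  # every query token must match
    return results

docs = {1: "Apache Solr is built on Lucene", 2: "Searching with SOLR"}
index = build_index(docs)
print(search(index, "solr"))         # {1, 2}
print(search(index, "Solr Lucene"))  # {1}
```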
Tokenizer: turns strings into tokens. Example tokenizers: Whitespace splits on whitespace; Keyword emits the entire input as a single token; Standard is general purpose and adds context (token types).
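A few lines of Python can approximate how the three tokenizers differ. These functions are rough stand-ins written for illustration, not Lucene's classes. In particular, the standard-like regex only keeps apostrophes and dots that sit between word characters, which is a simplification.

```python
import re

def whitespace_tokenizer(text):
    # Splits on whitespace only; punctuation stays attached to words.
    return text.split()

def keyword_tokenizer(text):
    # The entire input becomes one token (useful for IDs and exact values).
    return [text]

def standard_like_tokenizer(text):
    # Rough approximation: split on whitespace and punctuation, keeping
    # apostrophes and dots only when they sit between word characters.
    return re.findall(r"\w+(?:['.]\w+)*", text)

sample = "Paul's talk, in the U.K."
print(whitespace_tokenizer(sample))     # ["Paul's", 'talk,', 'in', 'the', 'U.K.']
print(keyword_tokenizer(sample))        # ["Paul's talk, in the U.K."]
print(standard_like_tokenizer(sample))  # ["Paul's", 'talk', 'in', 'the', 'U.K']
```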
Filters: transform tokens. Examples: Lower case; Stop, which filters out stop words such as a, if, to, and; Standard, which removes dots and ’s (it relies on context from the Standard tokenizer); Synonym.
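A filter chain is a sequence of token transformations applied in order. The sketch below reuses the stop words listed above and assumes a hypothetical synonym map. Lower-casing runs first so that the stop-word and synonym lookups see normalised tokens; swap the order and "To" slips past the stop filter.

```python
STOP_WORDS = {"a", "if", "to", "and"}                   # stop words from the note above
SYNONYMS = {"colour": ["color"], "tv": ["television"]}  # hypothetical synonym map

def lowercase_filter(tokens):
    return [t.lower() for t in tokens]

def stop_filter(tokens):
    # Exact, case-sensitive lookup, hence lower-casing first.
    return [t for t in tokens if t not in STOP_WORDS]

def synonym_filter(tokens):
    # Keep each original token and add its synonyms right after it.
    out = []
    for t in tokens:
        out.append(t)
        out.extend(SYNONYMS.get(t, []))
    return out

def run_chain(tokens, filters):
    for f in filters:
        tokens = f(tokens)
    return tokens

chain = [lowercase_filter, stop_filter, synonym_filter]
print(run_chain(["Guide", "to", "a", "Colour", "TV"], chain))
# ['guide', 'colour', 'color', 'tv', 'television']
```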
Hit highlighting. *Remember to set the delimiter: not everything is a web page.
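In Solr, the highlight delimiters are request parameters. This sketch builds such a query string with Python's urllib. `hl`, `hl.fl`, `hl.simple.pre` and `hl.simple.post` are standard Solr highlighting parameters. The `body` field, the helper name and the square brackets for a plain-text context are assumptions.

```python
from urllib.parse import urlencode

def highlight_params(query, field, pre="<em>", post="</em>"):
    # pre/post are the delimiters wrapped around each matched term in the
    # returned snippets; <em> suits HTML pages, but not every output is one.
    return urlencode({
        "q": query,
        "hl": "true",
        "hl.fl": field,
        "hl.simple.pre": pre,
        "hl.simple.post": post,
    })

# Plain-text output (an e-mail, a terminal) needs non-HTML delimiters.
print(highlight_params("solr", "body", pre="[", post="]"))
# q=solr&hl=true&hl.fl=body&hl.simple.pre=%5B&hl.simple.post=%5D
```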
Phrase queries: "search for a phrase". Wildcard queries: match with wildcards, ? for a single character and * for multiple characters. Fuzzy queries: based on Levenshtein distance, they match words similar to the term, e.g. word~. Proximity queries: words close together, e.g. "two words"~12. Range queries: between two values, inclusive with square brackets, started:[20110101 TO 20120101], or exclusive with curly braces, name:{Jeff TO Paul}.
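Fuzzy matching rests on edit distance. Below is the standard dynamic-programming Levenshtein algorithm, plus a filter over some made-up terms. How Solr turns the value after ~ into a threshold depends on the version: older releases take a similarity between 0 and 1, newer ones a maximum number of edits. The threshold of 1 used here is purely illustrative.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions that turn a into b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# A fuzzy query like colour~ matches indexed terms close to "colour".
terms = ["colour", "color", "collar", "flavour"]
close = [t for t in terms if levenshtein("colour", t) <= 1]
print(close)  # ['colour', 'color']
```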
Fields: search a single field to target the search, or several fields to build up queries.
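Targeting a field comes down to prefixing terms with `field:`, and combining several such clauses builds a larger query. The helpers `field_clause` and `build_query` in this minimal sketch are hypothetical. The sketch only quotes multi-word values; real code should also escape the other Lucene special characters.

```python
def field_clause(field, value):
    # Quote multi-word values so they are searched as a phrase.
    if " " in value:
        value = '"' + value.replace('"', '\\"') + '"'
    return f"{field}:{value}"

def build_query(criteria, operator="AND"):
    """Combine per-field criteria into one Lucene-syntax query string."""
    return f" {operator} ".join(field_clause(f, v) for f, v in criteria.items())

print(build_query({"title": "solr"}))                           # title:solr
print(build_query({"title": "apache solr", "author": "paul"}))  # title:"apache solr" AND author:paul
```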
Faceted search: set counts, filter data, multiple classifications.
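A faceted request asks Solr for per-value counts on some fields, while `fq` narrows the results. The sketch builds such a query string; `facet`, `facet.field` and `fq` are standard Solr parameters, while the `colour` and `brand` fields are assumptions. To show what the counts mean, it also computes them locally over a few made-up documents.

```python
from collections import Counter
from urllib.parse import urlencode

def facet_params(query, facet_fields, filters=()):
    params = [("q", query), ("facet", "true")]
    params += [("facet.field", f) for f in facet_fields]  # one count list per field
    params += [("fq", fq) for fq in filters]              # filter queries narrow the set
    return urlencode(params)

def facet_counts(docs, field):
    # What a facet count represents: how many matching docs carry each value.
    return Counter(doc[field] for doc in docs if field in doc)

docs = [{"colour": "red"}, {"colour": "blue"}, {"colour": "red"}]
print(facet_params("shoes", ["colour", "brand"], ["colour:red"]))
print(facet_counts(docs, "colour").most_common())  # [('red', 2), ('blue', 1)]
```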
Results are ordered by best match (relevance), or by any field.
Simultaneous update and search
Blog post to explain the steps: configure the container; configure Solr; index documents from any source; search (default search, advanced search).
Container setup: choose it, configure it, make it accessible.
Define the data: what is indexed and what is stored. This is integral to returning relevant search responses and requires tweaking to get right. Be conscious of space: the size of the index affects speed.
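In schema.xml, each field declares whether it is indexed (searchable) and stored (returned in responses). The sketch below generates such `<field>` declarations from a hypothetical spec, mainly to make that trade-off visible. The field names and the `text` and `string` type names are assumptions; they must match field types defined in your own schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical spec: (name, type, indexed, stored)
FIELDS = [
    ("id", "string", True, True),              # unique key: searchable and returned
    ("title", "text", True, True),             # searched and shown in results
    ("body", "text", True, False),             # searched but not returned: smaller index
    ("thumbnail_url", "string", False, True),  # only displayed, never searched
]

def schema_fields(spec):
    fields = ET.Element("fields")
    for name, ftype, indexed, stored in spec:
        # schema.xml expects lower-case "true"/"false" attribute values.
        ET.SubElement(fields, "field", name=name, type=ftype,
                      indexed=str(indexed).lower(), stored=str(stored).lower())
    return ET.tostring(fields, encoding="unicode")

print(schema_fields(FIELDS))
```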
Docs to schema spec. Indexing by database or API.
Default search: partial words (via analyzing?); search all fields, or possibly just the main ones. Response: return less data; stay clear of additional queries; consider caching.
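Here is one way to express these default-search ideas as request parameters. The dismax query parser spreads one search box over several fields via `qf`, and `fl` trims the response to the fields the results page needs. The field names, boost and `rows` value are assumptions. `cached_search` is a hypothetical stand-in for the real HTTP call, with `functools.lru_cache` keeping repeat queries from going back to Solr.

```python
from functools import lru_cache
from urllib.parse import urlencode

DEFAULT_QF = "title^2 body"     # search the main fields; ^2 boosts title matches
RETURN_FIELDS = "id,title,url"  # fl: only what the results page needs

def default_search_params(user_query, rows=10):
    return urlencode({
        "q": user_query,
        "defType": "dismax",
        "qf": DEFAULT_QF,
        "fl": RETURN_FIELDS,
        "rows": rows,
    })

@lru_cache(maxsize=1024)
def cached_search(user_query):
    # A real client would send these parameters to Solr here; this stand-in
    # just returns them. Repeated identical queries are served from the cache.
    return default_search_params(user_query)

print(default_search_params("red shoes"))
```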
Consider using stemming analyzers to return more results; match on more fields; use session data to affect results (consider the effect on caching); more response data is required.
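Session data can nudge results without filtering any out, for example with the dismax `bq` (boost query) parameter. This sketch rests on assumptions: the `recent_categories` session key, the `category` field and the boost of 5 are hypothetical. Because the response now varies per visitor, any cache of it has to include the session-derived parameters in its key.

```python
from urllib.parse import urlencode

def personalised_params(user_query, session):
    params = {"q": user_query, "defType": "dismax", "qf": "title^2 body"}
    # Hypothetical session key: categories this visitor browsed recently.
    recent = session.get("recent_categories", [])
    if recent:
        # bq boosts matching docs; documents outside these categories still match.
        params["bq"] = " ".join(f"category:{c}^5" for c in recent)
    return urlencode(params)

print(personalised_params("shoes", {"recent_categories": ["sport"]}))
```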
Users modify their search and specify fields. Storing extra data to enrich the results risks bloated storage; the trade-off is additional queries (tweak later?). Advanced search is for returning more or fewer results: search more of the document, or filter on a property.
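An advanced search form mostly maps onto extra parameters. An option that widens the searched fields changes `qf`, and a property filter such as colour becomes an `fq` clause. The form keys (`query`, `search_entire_document`, `filters`) and the field names in this sketch are hypothetical.

```python
from urllib.parse import urlencode

def advanced_search_params(form):
    """Map a hypothetical advanced-search form (a dict) onto Solr parameters."""
    params = [("q", form["query"]), ("defType", "dismax")]
    # "Search entire document": widen the searched fields beyond the title.
    fields = "title body" if form.get("search_entire_document") else "title"
    params.append(("qf", fields))
    # "Filter by colour" and similar property filters become fq clauses.
    for field, value in form.get("filters", {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return urlencode(params)

print(advanced_search_params({
    "query": "shoes",
    "search_entire_document": True,
    "filters": {"colour": "red"},
}))
```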