2. Presentation by Chris Caple
drupal.org username: reallyordinary
http://drupal.org/user/791914
Presented at the May 30, 2011 Toronto Drupal user group meetup
4. • very popular, extremely fast Java-based open source enterprise
search platform from the Apache Lucene project
• runs as a standalone full-text search server within a servlet
container such as Tomcat
• not an acronym - doesn’t stand for anything
• powers the search and navigation features on many of the
world’s largest sites
8. • so the point is - it’s great for large, high traffic sites
• it’s heavy duty, internet-scale stuff
• but it’ll also serve you well on smaller scale but ambitious
Drupal sites
10. • initially developed by CNET Networks in 2004 as an in-house
search platform called “Solar”
• CNET granted existing codebase to Apache Software
Foundation in 2006 - name changed to “Solr”
• in January 2007 Solr became a Lucene subproject
• in March 2010, Solr and Lucene-java merged
12. The Apache Lucene project develops open source search
software, including:
• Apache Lucene Core (formerly Lucene Java) - provides Java-
based indexing and search, plus spellchecking, hit highlighting,
and advanced analysis/tokenization capabilities
• Apache Solr
• Apache PyLucene - a Python port of Lucene Core
• Apache Open Relevance Project - collects and distributes
free materials for relevance testing & performance
14. • default Drupal search is decent for smaller sites
• doesn’t deal well with large amounts of content (say 10k+
nodes) - doesn’t scale; gets bogged down
• limited operators
• integrated - it runs and searches directly on the same database
• SQL was not designed as a searching language
• “Relational Database Management Systems (RDBMS) are
physically incapable of handling search well.”
15. • there are several modules that enhance core search by
providing features like faceted search and improved stemming
• but there’s no getting around its performance limitations and
lack of scalability
17. 1. Index and make searchable a really large amount of content -
from 10k+ nodes up into the millions
2. Provide faceted search-based navigation so users can find
content faster & more intuitively, drilling down into content by
date, author, tags, content type, & other attributes
3. Provide search autocomplete, spelling suggestions, and
content recommendations
18. 4. Provide a faster search experience than the default Drupal
search can
5. Give site visitors access to simple, easy to use advanced
search features without confronting them with the “advanced
search” page
6. Provide users with the ability to do location-based search - to
filter results by geographic location
7. Expose all attributes of nodes to search
19. 8. Place search functions on a completely separate server
[Diagram: Web server + PHP sends SQL to the database, and sends GET requests (to search) and POST requests (to index) to the separate Solr server]
Diagram adapted from Robert Douglass’ 2008 slide set - see Resources
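A rough sketch of the two kinds of request in the diagram, written in Python only for illustration (the Drupal module does this in PHP). It assumes Solr's example server at http://localhost:8983/solr with its standard select and update handlers; the field names "id" and "title" are hypothetical. Nothing is sent over the network - the functions only build the requests.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Assumption: Solr's bundled example server on its default port.
SOLR_BASE = "http://localhost:8983/solr"


def search_url(keywords, rows=10):
    """Build the GET URL the web server would request to run a search."""
    params = {"q": keywords, "rows": rows, "wt": "json"}
    return SOLR_BASE + "/select?" + urlencode(params)


def index_request(doc):
    """Build the (URL, XML body) pair the web server would POST to index a document."""
    fields = "".join(
        '<field name="%s">%s</field>'
        % (escape(name, {'"': "&quot;"}), escape(str(value)))
        for name, value in doc.items()
    )
    return SOLR_BASE + "/update", "<add><doc>%s</doc></add>" % fields


# Hypothetical document; real Drupal documents carry many more fields.
url, body = index_request({"id": "node/1", "title": "A & B"})
print(search_url("drupal solr"))
print(url)
print(body)
```

Keeping search behind plain HTTP like this is what lets the Solr server live on a different machine from the web server and the database.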
23. • faceted search is dynamic clustering of items or search results
into categories that let users drill down into search results (or
even skip searching entirely) by any value in a field
• each facet also shows the number of hits within the search
that match that category
• faceted search is also called faceted browsing, faceted
navigation, guided navigation and sometimes parametric search
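To make the idea concrete, here is a small in-memory sketch of facet counting and drill-down. It is not how Solr implements faceting (Solr computes counts from its index); the documents and field names are made up for illustration.

```python
from collections import Counter


def facet_counts(results, field):
    """Count how many results carry each value of the given field."""
    counts = Counter()
    for doc in results:
        value = doc.get(field)
        # A field may hold one value or a list of values (e.g. tags).
        values = value if isinstance(value, list) else [value]
        for v in values:
            if v is not None:
                counts[v] += 1
    return dict(counts)


def drill_down(results, field, value):
    """Keep only the results whose field contains the chosen facet value."""
    kept = []
    for doc in results:
        v = doc.get(field)
        if v == value or (isinstance(v, list) and value in v):
            kept.append(doc)
    return kept


# Hypothetical search results.
results = [
    {"title": "Intro to Solr", "type": "article", "tags": ["search", "solr"]},
    {"title": "Views 3 notes", "type": "article", "tags": ["views"]},
    {"title": "Meetup video", "type": "video", "tags": ["solr"]},
]

print(facet_counts(results, "type"))
print([doc["title"] for doc in drill_down(results, "tags", "solr")])
```

The counts are what a facet block shows next to each link; clicking a link corresponds to calling drill_down and recomputing the counts on the narrower result set.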
24. FACETED SEARCH EXAMPLE
diagram source: Lucid Imagination - http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr
28. You’ll need:
• Java 5 or higher
• PHP 5.2 for Drupal 6, but PHP 5.1.4 will work if you have the
PECL JSON extension or the Zend Framework JSON classes
29. 1. Go to the Apache Solr Search Integration project page
http://drupal.org/project/apachesolr
2. Install the module
3. Grab the Solr PHP library via svn OR get the bundled Acquia
Search download
4. Enable the module
5. Download Solr 1.4 and unpack it outside of the Drupal directory
30. 6. Rename the existing files schema.xml and solrconfig.xml in
apache-solr-nightly/example/solr/conf/ to *.bak to get them out
of the way
7. Copy the schema.xml and solrconfig.xml that come with the
Apache Solr Drupal module to take their place
8. Start Solr by opening a shell (Putty, Mac Terminal), going to
the apache-solr-nightly/example folder, and executing the
command java -jar start.jar
31. 9. Test that the Solr server is available at
http://localhost:8983/solr/admin
10. Make sure both the main Apache Solr Framework and
Apache Solr Search modules are enabled - if the Solr Search
module isn’t enabled, no indexing will occur
11. Run cron until your content is indexed
12. Enable blocks for facets
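Steps 6 and 7 can be scripted. The sketch below is one way to do it, assuming you pass in the two directories yourself: the Solr conf directory from step 6 and wherever the Apache Solr Drupal module was installed (the function name and both arguments are hypothetical, not part of any module).

```python
import shutil
from pathlib import Path

CONFIG_FILES = ("schema.xml", "solrconfig.xml")


def swap_solr_config(solr_conf_dir, module_dir):
    """Back up Solr's stock config files and copy in the Drupal module's versions."""
    solr_conf_dir = Path(solr_conf_dir)
    module_dir = Path(module_dir)
    for name in CONFIG_FILES:
        original = solr_conf_dir / name
        if original.exists():
            # Step 6: move the stock file out of the way as <name>.bak.
            # On Unix this overwrites an older .bak file if one exists.
            original.rename(solr_conf_dir / (name + ".bak"))
        # Step 7: copy in the file shipped with the Drupal module.
        shutil.copy(module_dir / name, solr_conf_dir / name)
```

A call would look like swap_solr_config("apache-solr-nightly/example/solr/conf", "path/to/apachesolr"), where the second path is a placeholder for your own module directory. Restart Solr afterwards so it picks up the new schema.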
33. • Apache Solr module depends on Drupal’s core Search
module
• when Solr is enabled, the Search module will also be enabled
• as soon as the core Search module is enabled it starts to
index all your nodes
• this takes time to run and fills up the database
(search_dataset, search_index... tables)
34. • if you’re installing Solr Search, you don’t need Drupal’s core
search form
• you replace it with the Solr one by going to the Solr module
settings and clicking “Make Apache Solr Search the default”
• this disables the core Search module’s form - but not the
indexing
35. • to disable the indexing - and save some CPU cycles and
database space - go to your site’s search settings at
admin/settings/search and set the “number of items to index per cron
run” to 0
Thanks to DrupalCoder.com for this tip - http://www.drupalcoder.com/blog/performance-tip-disable-drupals-core-search-indexer-when-using-apache-solr
37. • Solr Search indexing is triggered by cron runs
• default Drupal cron job triggers all cron tasks at the same time
• this can be a serious drag on performance and can cause cron
runs to fail if one or more tasks don’t finish in the allotted
cron period
• to get around this, use...
38. • Elysia Cron - http://drupal.org/project/elysia_cron
• expands cron capabilities - gives you crontab-like scheduling so
you can run different tasks at different times and frequencies
• so for example - set Solr Search to index 1000 nodes every
15 minutes, while other cron tasks are set to run once every
hour
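The scheduling idea can be sketched as a tiny simulation. This is not Elysia Cron's configuration format or code; the task names are hypothetical and the intervals are the ones from the example above.

```python
def due_tasks(schedule, minute):
    """Return the tasks whose interval (in minutes) falls due at this minute."""
    return sorted(name for name, every in schedule.items() if minute % every == 0)


def runs_per_hour(schedule):
    """Count how many times each task runs in a 60-minute window."""
    return {name: 60 // every for name, every in schedule.items()}


# Hypothetical task names: Solr indexing every 15 minutes, everything else hourly.
schedule = {"solr_index": 15, "other_cron_tasks": 60}

for minute in (15, 30, 45, 60):
    print(minute, due_tasks(schedule, minute))
print(runs_per_hour(schedule))
```

Only the top-of-the-hour run has to carry every task; the other three runs do nothing but Solr indexing, which is the point of splitting the schedule.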
39. • to get the fastest indexing on your server, experiment with
different numbers of items to index per cron run and different
cron run times until you find the max your server is capable of
handling
• ex: try indexing 1000 items per cron run and set the cron to
run every 5 minutes
• if you don’t get any errors, you’re good
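A quick way to compare settings is to estimate how long a full index would take. The sketch below counts one full cron interval per run, so treat the result as a rough estimate; the node counts are examples only.

```python
def indexing_time_minutes(total_nodes, items_per_run, minutes_between_runs):
    """Estimate the minutes needed to index everything at a given cron setting."""
    # Ceiling division: a partly filled final run still needs its own cron run.
    runs = (total_nodes + items_per_run - 1) // items_per_run
    return runs * minutes_between_runs


# 1000 items every 5 minutes, for a 10k-node site and a million-node site.
print(indexing_time_minutes(10_000, 1000, 5))
print(indexing_time_minutes(1_000_000, 1000, 5) / 60)

# The same 10k-node site at 1000 items every 15 minutes.
print(indexing_time_minutes(10_000, 1000, 15))
```

At 1000 items every 5 minutes, a 10k-node site indexes in under an hour, while a million nodes takes roughly three and a half days, which is why tuning these two numbers matters on large sites.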
41. • Solr Search integrates with Drush
• you can call Solr tasks from the Drush command line
• commands include...
42. • solr-delete-index
Deletes the contents of the index. Can take content types as
parameters
• solr-index
Sends content marked for (re)indexing to Solr. Same as running
cron once but without the other overhead
• solr-reindex
Marks content for reindexing. Can take content types as
parameters
• solr-search
Searches the site for keywords using Apache Solr
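If you script these commands, a thin wrapper keeps the command names in one place. This sketch only builds the command line; it assumes drush is on your PATH and that content types are passed as plain arguments after the command name, which is an assumption rather than something confirmed here. The content type "story" is just an example.

```python
import shlex

# The four commands listed above.
SOLR_DRUSH_COMMANDS = ("solr-delete-index", "solr-index", "solr-reindex", "solr-search")


def drush_command(command, *args):
    """Return the argument list for one of the Solr Drush commands."""
    if command not in SOLR_DRUSH_COMMANDS:
        raise ValueError("unknown Solr Drush command: %s" % command)
    return ["drush", command] + list(args)


def as_shell_line(argv):
    """Render an argument list as a shell-safe command line for logging."""
    return " ".join(shlex.quote(arg) for arg in argv)


# Assumption: content types are given as positional arguments.
reindex = drush_command("solr-reindex", "story")
print(as_shell_line(reindex))
print(as_shell_line(drush_command("solr-index")))
```

To actually run a command, pass the list to subprocess.run(reindex, check=True) from your Drupal site's directory.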
44. • Acquia has a hosted SaaS version of Solr that they call Acquia
Search
• it’s plug and play and available for Drupal 6 and 7
• gives you all the power of Solr without having to install any
software (beyond the Solr Drupal modules) or manage any
servers
• really easy to set up, really fast and robust, kind of pricey
• http://acquia.com/products-services/acquia-search
45. • you can get a 30 day free trial of Acquia Search at
http://acquia.com/trial
• easiest way to test drive Solr
47. • this is where it starts to get even more interesting
• Views 3 (still in alpha for Drupal 6 but in beta for Drupal 7)
allows you to make custom searches against the Solr index the
same way you currently make views against the MySQL
database
• ex: build a Solr search that just includes videos and MP3s and
render the results as a playlist
• ex: a Solr search that’s limited to the current user’s images,
displayed as a slideshow
48. • upshot: you can bypass the Drupal database and build your
content straight off the Solr index
• no database queries
• no complex views queries with tons of joins
• no node_load() calls for displaying the results
50. • best place to start learning is on the Solr Search docs page on
drupal.org at -
http://drupal.org/node/343467
• Robert Douglass did a great Solr presentation in 2008 - slides
are online at
http://www.slideshare.net/robertDouglass/apachesolr-presentation-from-do-it-with-drupal-presentation
• the book “Solr 1.4 Enterprise Search Server” is apparently
good - review here:
http://www.drupalcoder.com/blog/book-review-from-a-drupal-point-of-view-solr-14-enterprise-search-server
51. • great article by Robert Douglass - “Views 3 + Apache Solr +
Acquia Drupal = The Future of Search”
http://acquia.com/blog/views-3-apache-solr-acquia-drupal-future-search
• article - “Three things we learned from indexing a Drupal site
with millions of nodes in Apache Solr” -
http://www.drupalcoder.com/blog/three-things-we-learned-from-indexing-a-drupal-site-with-millions-of-nodes-in-apache-solr
• article - “Geospatial Apache Solr searching in Drupal 6 by
upgrading Solr to 3.1” -
http://thedrupalblog.com/geospatial-apache-solr-searching-drupal-6-upgrading-solr-31
52. • how to install Solr on Mac OS X Snow Leopard -
http://www.drupalcoder.com/blog/installing-apache-solr-in-tomcat-for-drupal-on-snow-leopard
• setting up Drupal 6 with Apache Solr on Tomcat 6 and
Ubuntu 9.10 -
http://www.nickveenhof.be/blog/setup-drupal-6-apache-solr-tomcat-6-and-ubuntu-910-karmic-koala
• Configuring Apache Solr Multi-core with Drupal and Tomcat
on Ubuntu 9.10 -
http://drupalconnect.com/blog/steve/configuring-apache-solr-multi-core-drupal-and-tomcat-ubuntu-910
53. • Jetty powered multicore Apache Solr and Drupal in Ubuntu
10.04 -
http://vladgh.com/blog/jetty-powered-multicore-apache-solr-and-drupal-ubuntu-1004
• Solr tutorials on the official Apache Solr site -
http://lucene.apache.org/solr/tutorial.html
• the official Apache Solr wiki -
http://wiki.apache.org/solr/FrontPage
• DrupalCamp Montreal 2009 video presentation on Solr -
http://yadadrop.com/drupal-video/drupal-apache-solr-setup-configuration-extensions-hooks