What good is content if nobody can find it? Many information sites are like icebergs, with only a limited amount of content directly accessible to users and the rest, the "underwater" portion, only available through searches. This talk shows how Rails web sites can take advantage of the world-class Apache SOLR search engine to provide sophisticated and customizable search features. We'll cover how to get started with SOLR, integrating with SOLR using the Sunspot gem, indexing, hit highlighting and other topics.
4. Part 1: Why?
“Why the Lucky Stiff” at a conference in 2006…
- innovative, eccentric, suddenly retired from public life…
5. Why Me?
Well, I’ve been doing Search since 1998…
• CheckPoint – Online tax information for CPAs
• Legislate – Everything you ever wanted to know
about legislation – more than a million docs
• National Council of Teachers of Mathematics –
Online journals
• Grab Networks – Searching news video summaries
• Pfizer – Drug documentation
7. Why We Need Search
• “A feature doesn’t exist if users can’t find it.”
‐ Jeff Atwood, co‐creator of Stack Overflow
‐ The same principle applies to content
• Content costs money
‐ If people can’t find it, your money is wasted
• The Long Tail
‐ More.content should be == More.traffic
9. Lucene is a Toolbox… SOLR is a Search Server…
Lucene is a toolbox:
• Indexing
• Searching
• Spell‐checking
• Hit Highlighting
• Advanced tokenization
SOLR is a search server:
• With APIs
• Hit Highlighting
• Faceted Searches
• Caching
• A Web Admin Interface
12. Timeline
• 1997: Lucene started on SourceForge by Doug Cutting
• Sept 2001: Lucene joins the Apache Jakarta product family
• 2004: SOLR created by Yonik Seeley at CNET Networks
• Feb 2005: Lucene becomes top-level Apache project
• 2006: SOLR donated to Apache Lucene by CNET
• 2007: SOLR leaves incubation
• 2010: Lucene / SOLR Merger
• Nov 2011: Lucene / SOLR 3.5 Released
13. Search Stack for Our Rails App
Rails Web Site
• SOLR, built on Lucene
‐ Dev/Test: Jetty
‐ Production: Tomcat
• Sunspot, built on rsolr
‐ Dev/Test: WEBrick
‐ Production: Apache with Phusion Passenger
14. Rsolr and Sunspot
Rsolr is a SOLR client…
• By Matt Mitchell
• Originated Sept 2009
• Reached 1.0 Jan 2011
• Now at Version 1.0.7
• Low‐level client
Sunspot provides Ruby‐style APIs…
• By Andy Lindeman, Nick Zadrozny & Mat Brown
• Originated Aug 2009
• Reached 1.0 Mar 2010
• Now at Version 1.3.1
• With Rails or just Ruby
• Drop‐in ActiveRecord support
16. A Simple Blog – With Search
• Generate a Rails App
‐ rails new demo1
• Configure Gemfile
‐ bundle install
source 'http://rubygems.org'
gem 'rails', '3.0.11'
gem 'sqlite3' # Default database
gem 'will_paginate'
gem 'sunspot_rails' # Sunspot for Rails
group :development do
  gem 'nifty-generators' # Better scaffolding
  gem 'sunspot_solr' # Pre-built SOLR
end
group :test do
  gem 'sunspot_solr' # Pre-built SOLR
  gem "mocha" # Testing tool
end
Gemfile
17. A Simple Blog (2)
• Scaffolding and Database
‐ rails g nifty:layout
‐ rails g nifty:scaffold Post title:string body:text featured:boolean
‐ rake db:migrate
‐ Also, remove public/index.html and point root to posts#index
• Populate the SQLite3 Database:
‐ sqlite3 development.sqlite3
> .read data.sql # Loads 100+ blog entries
> .exit
18. A Simple Blog (3)
• SOLR Config File
‐ rails generate sunspot_rails:install
‐ Creates config/sunspot.yml
production:
  solr:
    hostname: localhost
    port: 8983
    log_level: WARNING
development:
  solr:
    hostname: localhost
    port: 8982
    log_level: INFO
test:
  solr:
    hostname: localhost
    port: 8981
    log_level: WARNING
/config/sunspot.yml
• Make Post class searchable
class Post < ActiveRecord::Base
  attr_accessible :title, :body, :featured
  self.per_page = 10
  searchable do
    text :title, :body
    integer :id
    boolean :featured
  end
end
/app/models/post.rb
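Each environment in config/sunspot.yml points at its own SOLR port, so development and test indexes never collide. Here's a minimal plain-Ruby sketch of reading that file's structure. The `solr_url` helper and the "/solr" path are assumptions for illustration, not part of Sunspot's API.

```ruby
require 'yaml'

# Same shape as two of the config/sunspot.yml entries on this slide.
SUNSPOT_YAML = <<~YAML
  development:
    solr:
      hostname: localhost
      port: 8982
      log_level: INFO
  test:
    solr:
      hostname: localhost
      port: 8981
      log_level: WARNING
YAML

# Hypothetical helper: build a SOLR base URL for one environment.
# The "/solr" path is an assumption; the config file does not state it.
def solr_url(yaml_text, env)
  solr = YAML.load(yaml_text).fetch(env).fetch('solr')
  "http://#{solr['hostname']}:#{solr['port']}/solr"
end

puts solr_url(SUNSPOT_YAML, 'development')
```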
19. A Simple Blog (4)
• Start SOLR
‐ rake sunspot:solr:start
‐ Creates SOLR directory tree on first start
- solr
  - conf
  - data
    - development
    - test
  - pids
    - development
    - test
SOLR Directory
• Index Your Data
‐ rake sunspot:solr:reindex
solr/data
solr/pids
.gitignore
21. The Search Form
<%= form_tag(searches_path, :id => 'search-form', :method => :get) do |f| %>
  <span>
    <%= text_field_tag :query, params[:query] %>
    <%= submit_tag 'Search', :id => 'commit' %>
  </span>
<% end %>
/app/views/layouts/_search.html.erb
• Just a simple view partial…
• That’s rendered by the site’s layout
22. Search Results
resources :searches, :only => [:index]
config/routes.rb
<% if @posts.present? %>
  <%= will_paginate @posts, :container => false %>
  <table>
    <% for post in @posts %>
      <tr><td><%= raw(post.title) %></td></tr>
      <tr><td><%= post.body.truncate(300) %></td></tr>
    <% end %>
  </table>
  <%= page_entries_info @posts %>
  <%= will_paginate @posts, :container => false %>
<% else %>
  <p>No search results are available. Please try another search.</p>
<% end %>
app/views/searches/index.html.erb
23. Search Controller
class SearchesController < ApplicationController
  def index
    if params[:query].present?
      search = Post.search {
        fulltext params[:query]
        paginate :page => params[:page].present? ? params[:page] : 1,
                 :per_page => 10
      }
      @posts = search.results
    else
      @posts = nil
    end
  end
end
app/controllers/searches_controller.rb
• ActiveRecord Search Integration
• Pagination integrates w/ will_paginate gem
• Security: params[:query] must be scrubbed if it will be re-displayed…
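One way to "scrub" the query before showing it again is to HTML-escape it. This is a minimal sketch using Ruby's standard ERB::Util module. The `safe_query_for_display` helper name is made up for illustration, and escaping is only one of several possible scrubbing strategies.

```ruby
require 'erb'

# Hypothetical helper: escape user input before echoing it back into HTML,
# so markup typed into the search box is shown as text, not executed.
def safe_query_for_display(query)
  ERB::Util.html_escape(query.to_s)
end

puts safe_query_for_display('<script>alert(1)</script> git')
```

With this, a query containing a script tag comes back as harmless text with `&lt;` and `&gt;` in place of the angle brackets.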
30. Search Weighting
Titles seem more important…can we weight the
title higher than the body?
search = Post.search {
  fulltext params[:query]
  paginate :page => params[:page].present? ? params[:page] : 1,
           :per_page => 10
}
BEFORE
search = Post.search {
  fulltext params[:query] do
    boost_fields :title => 2.0
  end
  paginate :page => params[:page].present? ? params[:page] : 1,
           :per_page => 10
}
AFTER
31. Is There a Better Way?
The “boost” can be done at index time…
class Post < ActiveRecord::Base
  searchable do
    text :title, :boost => 2.0
    text :body
    integer :id
    boolean :featured
  end
end
/app/models/post.rb
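To get a feel for what a 2.0 boost on the title does to ranking, here's a toy scorer in plain Ruby. It is not how SOLR or Lucene actually computes relevance; the `ToyPost` struct, the `toy_score` function and the sample posts are all invented for illustration.

```ruby
# Plain-Ruby stand-in for a post (not ActiveRecord).
ToyPost = Struct.new(:id, :title, :body)

# Toy relevance: count whole-word matches per field, multiply by the boost.
def toy_score(post, term, boosts = { :title => 2.0, :body => 1.0 })
  boosts.sum do |field, boost|
    words = post.public_send(field).downcase.scan(/\w+/)
    words.count(term.downcase) * boost
  end
end

posts = [
  ToyPost.new(1, 'Lunch notes', 'We talked about git today.'),
  ToyPost.new(2, 'Git basics', 'A short intro.')
]

# Highest score first: the title match outranks the body match.
ranked = posts.sort_by { |post| -toy_score(post, 'git') }
ranked.each { |post| puts "Post ##{post.id}: #{toy_score(post, 'git')}" }
```

One match in each post, but the title match counts double, so post 2 comes out on top.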
32. What About Related Data?
text :comments do
  comments.map { |comment| comment.body }
end
/app/models/post.rb
• Could have used “acts_as_commentable” gem
• Your “document” is virtual
• You define it
• You can reference attributes, methods, etc.
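The "virtual document" idea can be sketched without Rails at all: the thing that gets indexed is whatever you assemble from the model and its related data. The structs and the `search_document` helper below are hypothetical stand-ins, not Sunspot's internals.

```ruby
# Plain-Ruby stand-ins for the Post and Comment models (not ActiveRecord).
ToyComment = Struct.new(:body)
BlogPost   = Struct.new(:title, :body, :comments)

# Hypothetical helper: flatten a post plus its comments into one "document".
def search_document(post)
  {
    :title    => post.title,
    :body     => post.body,
    :comments => post.comments.map { |comment| comment.body }
  }
end

post = BlogPost.new('Git tips', 'Use branches.',
                    [ToyComment.new('Great post'), ToyComment.new('Thanks!')])
p search_document(post)
```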
33. Filtering
search = Post.search {
  fulltext params[:query] do
    boost_fields :title => 2.0
  end
  with(:featured, true)
}
/app/controllers/searches_controller.rb
• Text fields are searched
• The boolean “featured” attribute is filtered
• Search returns only featured posts that match
the full‐text search criteria
34. Hit Highlighting
class Post < ActiveRecord::Base
  searchable do
    ...
    text :body, :stored => true
  end
end
/app/models/post.rb
@search = Post.search {
  fulltext params[:query] do
    highlight :body
  end
}
/app/controllers/searches_controller.rb
@search.hits.each do |hit|
  puts "Post ##{hit.primary_key}"
  hit.highlights(:body).each do |highlight|
    puts " " + highlight.format { |word| "*#{word}*" }
  end
end
/app/views/searches/index.html.erb
Post #1
 I use *git* on my project
Post #2
 the *git* utility is cool
OUTPUT
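If you want to see the formatting idea in isolation, here's a toy highlighter in plain Ruby. SOLR does the real work on the stored field; the `toy_highlight` function is an invented stand-in that only mimics the output format shown on this slide.

```ruby
# Toy highlighter: wraps each whole-word, case-insensitive match of the term.
def toy_highlight(text, term)
  text.gsub(/\b#{Regexp.escape(term)}\b/i) { |word| "*#{word}*" }
end

puts toy_highlight('I use git on my project', 'git')
puts toy_highlight('the git utility is cool', 'git')
```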
35. Authorization
Just because content is indexed doesn’t mean
a particular user is allowed to see it…
• Search‐enforced access control
‐ Access control data stored in index
‐ Can be used to filter results
(No code, but something to think about…)
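Still, a rough sketch helps make the idea concrete. Assume each indexed document carries a list of roles allowed to read it, and results are filtered against the current user's roles. Everything here (the `IndexedDoc` struct, the role names, the `visible_hits` function) is hypothetical. In a real setup the filter would more likely run inside the search itself, much like `with(:featured, true)` on the Filtering slide, rather than in Ruby afterwards.

```ruby
# Each indexed document carries the roles allowed to read it.
IndexedDoc = Struct.new(:id, :title, :allowed_roles)

# Hypothetical filter: keep only hits whose access data overlaps the user's roles.
def visible_hits(hits, user_roles)
  hits.select { |doc| (doc.allowed_roles & user_roles).any? }
end

hits = [
  IndexedDoc.new(1, 'Public release notes', [:guest, :staff]),
  IndexedDoc.new(2, 'Internal roadmap',     [:staff])
]

puts visible_hits(hits, [:guest]).map(&:id).inspect
puts visible_hits(hits, [:staff]).map(&:id).inspect
```

A guest sees only the public document, while staff see both.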
40. Questions?
Get the Code: https://github.com/dkeener/rails_solr_demo
Get the Slides: http://www.keenertech.com/presentations/rails_and_solr
I’m David Keener and you can find me at:
• Blog: http://www.keenertech.com
• Facebook: http://www.facebook.com/keenertech
• Twitter: dkeener2010
• Email: dkeener@keenertech.com
david.keener@gd-ais.com