2. Who am I?
Andrew Mleczko
Plone Integrator
Redturtle Technology (Ferrara/Italy)
andrew.mleczko@redturtle.net
3. so why do you need an external search engine?
4. why do you need an external search engine...
• Plone's portal_catalog is slow
with big sites (large number of
indexed objects)
• You want to reduce Plone
memory consumption (by
removing heavy indexes like
SearchableText)
• You want to query Plone's
content from external
applications
• You want to use advanced
search features
27. Conclusions
• multiple retrievers –
multiple populators
• we used only the collective.solr
SolrConnection API
• 120,000 books indexed so far in
Solr; querying and indexing are
extremely fast
30. tsearch2 main features
• Flexible and rich linguistic support
(dictionaries, stop words), thesaurus
• Full UTF-8 support
• Sophisticated ranking functions
with support of proximity and
structure information (rank, rank_cd)
• Rich query language with query
rewriting support
• Headline support (text fragments
with highlighted search terms)
• It is mature (5 years of development)
31. first steps with tsearch2
1. PostgreSQL >= 8.4
(but 8.3 will work as well)
2. COLUMN
ALTER TABLE content ADD
COLUMN search_vector tsvector;
3. INDEX
CREATE INDEX search_index ON
content USING gin(search_vector);
32. first steps with tsearch2
4. TRIGGER
CREATE FUNCTION fullsearch_trigger() RETURNS trigger AS $$
begin
  -- Rebuild the vector on every write; weight subject (A) above
  -- title (B) above description (C) so ranking can tell them apart.
  new.search_vector :=
    setweight(to_tsvector('pg_catalog.english', coalesce(new.subject,'')), 'A') ||
    setweight(to_tsvector('pg_catalog.english', coalesce(new.title,'')), 'B') ||
    setweight(to_tsvector('pg_catalog.english', coalesce(new.description,'')), 'C');
  return new;
end
$$ LANGUAGE plpgsql;
CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
  ON content FOR EACH ROW EXECUTE PROCEDURE fullsearch_trigger();
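Once the trigger keeps search_vector up to date, a ranked search over it might look like the sketch below. It is not taken from the talk: the helper name build_search_query and the choice of returned columns are assumptions for illustration, and the code only builds the query without connecting to a database. The SQL uses the content table and the search_vector, title and description columns from the steps above, together with the PostgreSQL functions plainto_tsquery, ts_rank_cd and ts_headline. With the default rank weights ({0.1, 0.2, 0.4, 1.0} for D, C, B, A), a match in subject counts for more than one in title or description.

```python
# Sketch only: builds a ranked full-text query for the "content" table and
# "search_vector" column from the slides. It does not connect to a database.

SEARCH_SQL = """
SELECT title,
       ts_rank_cd(search_vector, query) AS rank,
       ts_headline('pg_catalog.english', coalesce(description, ''), query) AS headline
FROM content,
     plainto_tsquery('pg_catalog.english', %s) AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT %s
"""


def build_search_query(terms, limit=10):
    """Return (sql, params) for a DB-API driver that uses %s placeholders.

    Hypothetical helper: the name and signature are illustrative only.
    """
    cleaned = " ".join(terms.split())  # collapse runs of whitespace
    if not cleaned:
        raise ValueError("search terms must not be empty")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # The user input travels as a bound parameter, never spliced into the SQL.
    return SEARCH_SQL, (cleaned, limit)
```

With a driver such as psycopg2, the result could be passed straight to the cursor, for example cursor.execute(*build_search_query("plone catalog")). Passing the terms as a bound parameter and letting plainto_tsquery parse them means the caller does not need to escape tsquery operators.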
38. Geco
• Started in 2009 for
Emilia-Romagna
• Multiple content types,
including video, polls, articles
and more
39. Geco
• 95 editors (Emilia-Romagna)
• 100,000 documents (Emilia-Romagna)
• This year: 2 other regions join
• Future: all 20 regions join the
project
• Every region has its own server
deployment
40. Objectives
✓ fast and efficient search engine
that can integrate multiple
different Plone sites
✓ search results should be ordered
by rank
✓ content should be serialized in
SQL so it can be reused by other
applications (ratings, comments)