Out of the box, Plone includes an integrated and powerful search engine with features such as live search and full text indexing. Sometimes this isn't enough or you need more robust search features to provide your site visitors with a more custom search experience.
In this talk, Six Feet Up CTO Calvin Hendryx-Parker goes into the details of implementing Solr with Plone for a large project. Solr is an enterprise search engine that can be deployed alongside Plone.
Some of the topics to be discussed include:
weighted search
thesaurus
spell check
flexible query parsing
faster search performance
and more...
Enterprise search in Plone using Solr
1. Enterprise Search in Plone using Solr
Calvin Hendryx-Parker
Plone Symposium East 2010
nowhere to go but
open source
sixfeetup.com
2. What is Solr?
• Java Based
• Full-Text Search
• Web Services API
• Standards Based Interfaces
• Scalable
• XML Configuration
• Extensible
sixfeetup.com/deploy
11. Solr Performance
• Wiktionary Dataset
• 49.5 million lines of XML
• 1.3 GB of data
• 1.7 million pages indexed in 5.5 hours
• ZODB Size after import 1.1GB
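As a rough back-of-the-envelope check, the figures above can be turned into an average indexing rate. This is only arithmetic on the slide's numbers, assuming the 5.5 hours covers the whole import; it comes out at roughly 86 pages per second.

```python
# Figures taken from the slide; the rate is a simple average over the whole run.
pages = 1_700_000
hours = 5.5

pages_per_second = pages / (hours * 3600)
print(round(pages_per_second, 1))
```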
13. collective.solr Issues
• Monkey Patching
• Relies on collective.indexing
• Duplicates all indexes
• Sub-Optimal Integration with Zope Transactions
• Relies on Thread Locals
14. What to do?
15. Reevaluate
16. Solr Integration as a Catalog Index
• No Monkey Patching
• Simpler Code
17. Enter alm.solrindex
• ZCatalog Index
• Doesn't depend on Plone
• Utilizes new foreign_connections Connection Method
• Pass through Solr Queries
• Direct access to the Solr Response
20. Sorting
• Still handled by the ZCatalog
• Could change in the future
21. alm.solrindex Field Handlers
• Handle Parsing Attributes for Indexing
• Translate field-specific queries to Solr
• Zope Utilities
22. Example Handler
class TextFieldHandler(DefaultFieldHandler):

    def parse_query(self, field, field_query):
        name = field.name
        request = {name: field_query}
        record = parseIndexRequest(request, name, ('query',))
        if not record.keys:
            return None
        query_str = ' '.join(record.keys)
        if not query_str:
            return None
        return {'q': u'+%s:%s' % (name, quote_query(query_str))}
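The handler above depends on Zope and alm.solrindex helpers, so it cannot run on its own. The following is a minimal standalone sketch of the same translation step, turning a field query into a Solr "q" parameter. Both `quote_query` and `parse_query` here are hypothetical stand-ins written for illustration; they are not the package's implementations, and the escaping rules are an assumption based on Lucene query syntax.

```python
import re

# Characters with special meaning in Lucene/Solr query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def quote_query(text):
    """Hypothetical stand-in: escape special characters and group the terms."""
    return "(%s)" % _SPECIAL.sub(r"\\\1", text)


def parse_query(field_name, field_query):
    """Translate a ZCatalog-style field query into Solr parameters.

    Accepts a bare string, a list of strings, or a {'query': ...} record.
    Returns None when there is nothing to search for.
    """
    if isinstance(field_query, dict):
        field_query = field_query.get("query", "")
    if isinstance(field_query, str):
        keys = [field_query]
    else:
        keys = list(field_query)
    query_str = " ".join(k for k in keys if k).strip()
    if not query_str:
        return None
    # The leading "+" marks the clause as required.
    return {"q": "+%s:%s" % (field_name, quote_query(query_str))}


print(parse_query("SearchableText", "plone solr"))
```

Returning None for an empty query mirrors the slide's handler, which lets the caller skip the field rather than send an empty clause to Solr.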
23. Other alm.solrindex Features
• GenericSetup Profile
• Tests
• Uses solrpy instead of the unsupported solr.py
24. Tips
• Can replace several ZCatalog indexes
• Remove any indexes you have replaced
• Use it for all Text Indexes
• Still Utilize the ZCatalog Indexes for Everything Else
25. Demo
• Project Gutenberg Data
26. Questions?
enterprise search server
XML, JSON, HTTP (REST)
Efficient Replication to other Solr Search Servers
Full Plugin Architecture
XML over HTTP
HTTP GET returns XML results
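To make the "HTTP GET returns XML" point concrete, here is a minimal sketch that builds a select URL and parses a response. The endpoint URL, the `Title` field, and the sample XML are assumptions for illustration; the sample is hand-written in the general shape of Solr's XML response format, not captured from a real server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed local Solr endpoint; adjust host, port, and path for a real install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(query, rows=10):
    """Return the GET URL for a Solr query."""
    return SOLR_SELECT_URL + "?" + urlencode({"q": query, "rows": rows})


# Hand-written sample response (illustrative only).
SAMPLE_RESPONSE = """<response>
  <result name="response" numFound="2" start="0">
    <doc><str name="Title">Plone</str></doc>
    <doc><str name="Title">Solr</str></doc>
  </result>
</response>"""


def parse_titles(xml_text):
    """Pull the Title field out of every <doc> in the result list."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("./result/doc/str[@name='Title']")]


print(build_select_url("SearchableText:plone"))
print(parse_titles(SAMPLE_RESPONSE))
```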
very similar to the ZCatalog
Numeric Types, Dynamic Fields, Unique Keys
rich set of debugging tools
17GB before packing
still using zope in the middle for this
did the same import with ZCTextIndex and ZCatalog and it crashed and burned at 900k items
designed to work with this package, which is a collection of more monkey patches
except SearchableText which it removes from the catalog
anytime you add an index to the catalog, you have to add it to Solr as well
global or thread local variables usually make it much harder to understand
horrible code readability
could lead to unexpected results, maybe your solr connection isn't there anymore
This is open source software so we can learn from past experiences of others.
Six Feet Up embarked on a large project where flexible search features were a great match for Solr.
We shouldn't throw away everything in collective.solr.
less indirection
allow us to re-factor our connection to Solr
only one needed per catalog
uses the solr schema to determine what columns to index
maintains a persistent connection to external db from a ZODB object
won't be dropped by object deactivations like _v_ (volatile) attributes
won't break when you pass control between threads like thread locals
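The thread-local pitfall can be shown with plain Python. This sketch only illustrates why a connection stored in a thread local is invisible from another thread; it is not how collective.solr or alm.solrindex is implemented, and the "connection" is a placeholder object.

```python
import threading

_local = threading.local()


def open_connection():
    # Stand-in for opening a Solr connection; a plain object is enough here.
    _local.connection = object()


def has_connection():
    return hasattr(_local, "connection")


# The connection is opened in the main thread...
open_connection()
seen_in_main = has_connection()

# ...but a second thread sees its own, empty thread-local namespace.
result = {}


def worker():
    result["seen_in_worker"] = has_connection()


t = threading.Thread(target=worker)
t.start()
t.join()

print(seen_in_main, result["seen_in_worker"])
```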
Pass Solr query parameters directly via the solr_param dictionary.
Requires that some value be passed for "q".
Can be used to access features like weighting of terms in a query.
Pass in a solr_callback function in your query and SolrIndex will call it passing the parsed Solr response object.
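A sketch of what such a pass-through query might look like, assuming the key names given in these notes (`solr_param`, `solr_callback`); check them against the alm.solrindex README before relying on them. The `build_catalog_query` helper and the `Title` and `SearchableText` field names are hypothetical, and the code only builds the query dictionary without touching a real catalog.

```python
def build_catalog_query(text, title_boost=2.0, callback=None):
    """Build catalog query arguments that weight Title matches more heavily.

    Uses Solr's term^boost syntax. The text is placed inside double quotes,
    so it must not itself contain quotes in this simplified sketch.
    """
    q = 'Title:"%s"^%s SearchableText:"%s"' % (text, title_boost, text)
    query = {"solr_param": {"q": q}}  # "q" is required, per the notes
    if callback is not None:
        query["solr_callback"] = callback
    return query


responses = []


def remember(response):
    # Would receive the parsed Solr response when the index runs the query.
    responses.append(response)


query = build_catalog_query("plone", callback=remember)
print(query["solr_param"]["q"])
```

In a running Zope process this dictionary would be handed to the catalog search; that step is left out here because it needs a live ZCatalog and Solr server.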
Not handled by SolrIndex at this time.
Just use the sort_on parameter like you normally would.
write your own and register them via ZCML
Avoids indexing the same attribute multiple times.
ZCatalog falls short on full text indexing when it comes to performance and features
Native ZCatalog indexes are faster than any network bound service.
ZCatalog Indexes share transaction aware ZODB caches.
32K Books
richer metadata than the wiktionary data