Solr Search Engine Integration
We have made a number of changes to the Magnolia Solr module, which we will highlight. These include full multi-site support, support for multiple Solr instances, control over which pages are indexed by means of template configurations, and Solr document field configurations. The result is a fully configurable module that is easy to maintain. After finishing our remaining to-dos we hope to publish the module to the Magnolia Forge.
Parameter-Based Image Transformations
As we focus more and more on responsive web designs that scale well across various viewports, we are experiencing a proliferation of image variations and increasingly complex frontend code to switch between them. In our previous CMS we could create image transformations with request parameters, and we decided to bring that feature to Magnolia. We will discuss the implementation and the design decisions.
Filesystem Image Variation Caching
Magnolia's Imaging module uses the JCR imaging workspace to cache rendered image variations. This has two disadvantages, slower performance and a larger backup, and no advantages that we are aware of. So we built a file-based image cache with a custom ImageStreamer implementation. The file system path mirrors the JCR path used for caching images: the path of the image node plus a reference to the site definition and the variation name. Because the Imaging servlet currently does not let you configure which ImageStreamer instance serves cached images, we created our own version of the servlet that uses our own ImageStreamer.
We've been using this for some time now and image variations are served noticeably faster, while our backup is significantly smaller.
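The path mapping described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the class name ImageCachePaths, the method cacheFile and the cache root directory are hypothetical, and only the ordering (image node path, then site, then variation name) is taken from the description.

```java
import java.nio.file.Path;

// Hypothetical helper: maps an image variation to a file below a cache root.
final class ImageCachePaths {
    private ImageCachePaths() {}

    static Path cacheFile(Path cacheRoot, String imageNodePath, String siteName, String variationName) {
        // JCR paths are absolute ("/dam/x.jpg"); drop the leading slash so the
        // path resolves below cacheRoot instead of replacing it.
        String relative = imageNodePath.startsWith("/") ? imageNodePath.substring(1) : imageNodePath;
        Path root = cacheRoot.normalize();
        Path target = root.resolve(relative).resolve(siteName).resolve(variationName).normalize();
        // Refuse paths that escape the cache root, for example via "..".
        if (!target.startsWith(root)) {
            throw new IllegalArgumentException("Path escapes cache root: " + imageNodePath);
        }
        return target;
    }
}
```

A custom ImageStreamer could look up this file first and only render the variation when the file does not exist yet.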
2. Feature overview
• Multi-site support.
• Solr Cloud support.
• Asynchronous indexing.
• Improved way to configure which pages are indexed.
• Template-based boosting modifier.
• Flexible page type resolving mechanism.
• Search result: page type to CSS mapping.
• Various Solr document field configuration enhancements:
o Multi-value flag to match the Solr document schema.
o Pluggable system for converting field data to a Solr document (Adders).
• Facets.
• Fake facet for period filtering.
3. Multi-site support
• Any number of named configurations.
• Link a site to a specific configuration.
• AdminCentral Solr page updated to trigger deletion of all documents of a specific site.
5. Asynchronous indexing
• Indexing is not part of the workflow.
• Creation of the Solr document and publication are done in a java.util.concurrent.ExecutorService.
• Faster activation.
• No error when indexing fails.
• Should be configurable.
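A minimal sketch of this approach, assuming a single worker thread. The class AsyncIndexer and the indexAction callback are hypothetical stand-ins for the code that builds the Solr document and sends it to the server. Failures are logged and swallowed, so activation never sees an indexing error.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical wrapper that moves indexing off the activation thread.
final class AsyncIndexer implements AutoCloseable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Consumer<String> indexAction;

    AsyncIndexer(Consumer<String> indexAction) {
        this.indexAction = indexAction;
    }

    // Returns immediately; the page is indexed later on the worker thread.
    Future<?> indexLater(String pagePath) {
        return executor.submit(() -> {
            try {
                indexAction.accept(pagePath);
            } catch (RuntimeException e) {
                // Indexing failures must not break activation: log and continue.
                System.err.println("Indexing failed for " + pagePath + ": " + e.getMessage());
            }
        });
    }

    // Stops accepting work and waits briefly for queued tasks to finish.
    @Override
    public void close() {
        executor.shutdown();
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

The trade-off is visible in the catch block: activation gets faster, but a failed indexing run only shows up in the log.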
6. Improved way to configure which pages are indexed
• Previously done with a parameter on the template definition.
• Two disadvantages:
o No clear overview of which templates are selected for indexing.
o Not possible to configure how pages with a given template are indexed.
• Added template configuration for templates to Website Document.
• Without this configuration pages are not indexed.
8. Template-based boosting modifier
• Property on template configuration.
• Allows you to favour pages of some type when scores are equal.
• Defaults to 1.0 (neutral).
9. Flexible page type resolving mechanism
• We want all documents to have a page type field.
• Depending on circumstances, the page type must be resolved differently:
o by path
o by template
o by some external consideration
• Introduced the PageTypeResolver interface. It can be set on the template configuration.
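The idea can be sketched as below. Only the interface name PageTypeResolver comes from the slides. The method signature, the two implementations and the fallback value are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Interface name from the slides; this signature is an assumption.
interface PageTypeResolver {
    String resolve(String pagePath, String templateId);
}

// Hypothetical resolver: the first matching path prefix decides the page type.
final class PathPrefixPageTypeResolver implements PageTypeResolver {
    private final Map<String, String> typesByPrefix;
    private final String fallback;

    PathPrefixPageTypeResolver(Map<String, String> typesByPrefix, String fallback) {
        this.typesByPrefix = new LinkedHashMap<>(typesByPrefix);
        this.fallback = fallback;
    }

    @Override
    public String resolve(String pagePath, String templateId) {
        for (Map.Entry<String, String> entry : typesByPrefix.entrySet()) {
            if (pagePath.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return fallback;
    }
}

// Hypothetical resolver: the template id decides the page type.
final class TemplatePageTypeResolver implements PageTypeResolver {
    private final Map<String, String> typesByTemplate;
    private final String fallback;

    TemplatePageTypeResolver(Map<String, String> typesByTemplate, String fallback) {
        this.typesByTemplate = new LinkedHashMap<>(typesByTemplate);
        this.fallback = fallback;
    }

    @Override
    public String resolve(String pagePath, String templateId) {
        return typesByTemplate.getOrDefault(templateId, fallback);
    }
}
```

Because every template configuration picks its own resolver, a resolver based on some external consideration only needs to implement the same single method.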
11. Search result: page type to CSS mapping
• Simple mapping of page types to CSS class names.
• The CSS class names are used when rendering the search result.
12. Field configuration: multi-value flag to match the Solr document schema
• In the Solr schema, fields can be multi-valued or not.
• Inserting a document with multiple values for a single-value field yields an error.
• When the multi-value property of a search field configuration is off, subsequent values for that field are ignored.
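A minimal sketch of that rule, assuming the flag is a simple boolean on the field configuration. The class FieldValues and the method forSchema are hypothetical names.

```java
import java.util.List;

// Hypothetical helper that trims field values to what the Solr schema accepts.
final class FieldValues {
    private FieldValues() {}

    static <T> List<T> forSchema(List<T> values, boolean multiValued) {
        if (multiValued || values.size() <= 1) {
            return values;
        }
        // Single-value field: keep the first value, ignore the rest.
        return values.subList(0, 1);
    }
}
```

Applying this before the document is sent avoids the error Solr raises when a single-value field receives several values.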
13. Field configuration: pluggable system for converting field data to a Solr document
• Standard values are not a problem (String, Number, Date, Boolean).
• More control is needed for special cases: images, HTML, categories, ...
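A pluggable converter of this kind could look like the sketch below. The slides only mention the name "Adders". The interface FieldAdder, its signature and both implementations are hypothetical, and a plain Map stands in for the Solr input document to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical adder contract: converts a raw value and adds it to the document.
interface FieldAdder {
    void add(Map<String, List<Object>> document, String fieldName, Object value);
}

// Standard values (String, Number, Date, Boolean) are added unchanged.
final class DefaultAdder implements FieldAdder {
    @Override
    public void add(Map<String, List<Object>> document, String fieldName, Object value) {
        document.computeIfAbsent(fieldName, key -> new ArrayList<>()).add(value);
    }
}

// Special case: strip markup so that only the text content is indexed.
// The regex is a rough approximation, not a real HTML parser.
final class HtmlTextAdder implements FieldAdder {
    @Override
    public void add(Map<String, List<Object>> document, String fieldName, Object value) {
        String text = String.valueOf(value)
                .replaceAll("<[^>]*>", " ")
                .replaceAll("\\s+", " ")
                .trim();
        document.computeIfAbsent(fieldName, key -> new ArrayList<>()).add(text);
    }
}
```

A field configuration would then name the adder to use, with DefaultAdder as the fallback when none is configured.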
15. Facets
• Facets: one of the coolest features in Solr.
• Added new configuration for facet fields.
• Maps Solr field names to display field names.
• New paragraph that shows the facets and re-submits the query, narrowing the search.
17. Fake facet for period filtering
• Date facets have fixed time intervals.
• Code added for configuring a set of date ranges.
• Configuration option still missing.
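One way to get such configurable periods is a Solr facet query per date range, using Solr date math. The sketch below assumes that approach; the slides do not say how the ranges are sent to Solr. The class PeriodFacets, the field name publishDate and the labels are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical builder: turns named periods into Solr facet queries.
final class PeriodFacets {
    private PeriodFacets() {}

    // periods maps a display label to a Solr date math lower bound,
    // for example "Last week" -> "NOW/DAY-7DAYS".
    static Map<String, String> facetQueries(String dateField, Map<String, String> periods) {
        Map<String, String> queries = new LinkedHashMap<>();
        for (Map.Entry<String, String> period : periods.entrySet()) {
            queries.put(period.getKey(), dateField + ":[" + period.getValue() + " TO NOW]");
        }
        return queries;
    }
}
```

Each resulting string can be passed as a facet.query parameter. The count Solr returns per query then acts as a facet value for that period, and the same string can be re-used as a filter query when the visitor selects it.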
19. Caveats: index time boosting
• Index time boosting is not supported by fields that omit norms.
• The template boosting modifier creates non-standard boost values even for fields with no boosting configuration.
• Now you have to set 'omitNorms' to 'true' when configuring those fields, so any boosting is disabled for these fields.
20. Todo: Solr server configuration
• The Solr server instances are configured in the repository.
• This is not nice when you have different servers for test, acceptance and production.
• Somehow externalize at least part of the configuration.
21. Todo: query time boosting
• Currently all boosting is index time.
• It is hard to tweak the boosting (reindexing required).
• Query time boosting should become an option.
• Performance?
22. Todo: facets and period filter
• Make the period filter part of the facet configuration.
• Probably move the facet configuration out of the field configuration.
23. Todo: indexing on activation
• Postponed activation and deactivation are not supported.
• Indexing should be part of the workflow.
• That precludes asynchronous indexing.
24. Ready to share?
• Create a separate module that depends on the Magnolia Solr module.
• Remove or generalize some VPRO-specific stuff:
o Class and package names.
o Hard-coded custom document fields.
o Obsolete code/features.
• Documentation.