Presented by Remi Mikalsen, Search Engineer, The Norwegian Centre for ICT in Education
Learn how utdanning.no leverages open source technologies to deliver a blazing-fast, multi-faceted, responsive search experience and a flexible and efficient feeds engine on top of Solr 3.6. Among the key open source projects that will be covered are Solr, Ajax-Solr, SolrPHPClient, Bootstrap, jQuery and Drupal. Notable highlights are ajaxified pivot facets, hierarchical facets with multiple parents, ajax autocomplete with edge n-grams and grouping, integrating our search widgets on any external website, custom Solr logging and using Solr to deliver Atom feeds. utdanning.no is a governmental website that collects, normalizes and publishes study information related to secondary school and higher education in Norway. With 1.2 million visitors each year and 12,000 indexed documents we focus on precise information and a high degree of usability for students, potential students and counselors.
3. Introduction
Remi Mikalsen
Search engineer, utdanning.no
«Utdanning.no is the official Norwegian national education and
career portal, and includes an overview of education in Norway
and more than 500 career descriptions» - utdanning.no
« [...] Our main goals are to improve the quality of education and
to improve learning outcomes and learning for children, pupils
and students through use of ICT in education» - iktsenteret.no
4. utdanning.no
Drupal 7 & Solr 3.6
~3 million visitors / year
~12,000 documents
~18,000,000 terms
~260 fields
~1 QPS (~9M searches / year)
~8 ms latency
14. Our goal
Students, counselors and teachers must find what they are looking for
How?
- Interaction design (IxD) vs graphical design
- User testing, user testing and user testing (and experience)
- Resulting in a GUI specification we must implement
15. Ajax-Solr is our JS framework:
https://github.com/evolvingweb/ajax-solr/wiki/reuters-tutorial
- manages all querying
- widgets for interaction with and displaying results
- events fire search requests, which update the widgets
We extended it heavily
- Developed all our widgets (10+)
- Added logging (async, via ajax, local and GA)
- Distributed configuration (server + client)
- Simplified initialization script
But it also works out of the box!
16. [Architecture diagram: Our Website, JS library (copy), Initialize (config), Default config, JS library (~1700 lines), ajax-solr (evolvingweb), Logger (~200 lines), Solr proxy (~85 lines), SolrPhpClient r60, Solr 3.6. A sample search box lists placeholder results for ACME Engineering, ACME Law and ACME Med.]
- Include JS library
- Initialize
- Set up HTML
- Search! (and log)
17. Site search – widgets & faceting
Ajax-Solr allows defining N widgets
«Everything» is a widget
A facet is an instance of a FacetWidget
Interaction with a widget may fire a query
All faceting is combined into one query
All widgets are updated after Solr response
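The request cycle described on this slide can be sketched without any library. The following is a standalone illustration of the pattern only, not the real Ajax-Solr API: `SketchManager`, `SketchFacetWidget`, the `fetchFn` transport and the response shape are all hypothetical names chosen for this example.

```javascript
// Standalone sketch of the manager/widget pattern; NOT the real Ajax-Solr API.
class SketchManager {
  constructor(fetchFn) {
    this.fetchFn = fetchFn; // hypothetical transport: (queryString) => response object
    this.widgets = [];
    this.filters = []; // fq values contributed by widget interaction
    this.response = null;
  }

  addWidget(widget) {
    widget.manager = this;
    this.widgets.push(widget);
  }

  addFilter(fq) {
    if (!this.filters.includes(fq)) this.filters.push(fq);
  }

  // Every facet widget contributes its field, so all faceting ends up in one query.
  buildQuery() {
    const parts = ['q=*:*', 'facet=true'];
    for (const w of this.widgets) {
      parts.push('facet.field=' + encodeURIComponent(w.field));
    }
    for (const fq of this.filters) {
      parts.push('fq=' + encodeURIComponent(fq));
    }
    return parts.join('&');
  }

  // One request per interaction; every widget is notified before and after.
  doRequest() {
    this.widgets.forEach((w) => w.beforeRequest());
    this.response = this.fetchFn(this.buildQuery());
    this.widgets.forEach((w) => w.afterRequest());
  }
}

class SketchFacetWidget {
  constructor(field) {
    this.field = field;
    this.counts = {};
    this.loading = false;
  }

  // Interaction with the widget adds a filter and fires a query.
  click(value) {
    this.manager.addFilter(this.field + ':' + value);
    this.manager.doRequest();
  }

  beforeRequest() {
    this.loading = true;
  }

  afterRequest() {
    this.loading = false;
    // Assumed response shape: { facets: { <field>: { <value>: <count> } } }
    this.counts = this.manager.response.facets[this.field] || {};
  }
}
```

Clicking a value in one widget sends a single query, after which every registered widget reads its own counts from the shared response.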
18. Some facet widgets we have developed
- Plain
Facet values and facet counts in a list
Multiple (AND) or single choice
- Hierarchical
Facet values and facet counts in a list
Clicking on a facet value drills down into the hierarchy; facet.prefix + fq
- Dropdown
Displays facet values in a dropdown list
Useful for mobile devices in our responsive theme
- Tagcloud
Facet values in a tagcloud
- Pivot facet
Our menu system
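For the hierarchical widget, the slide names `facet.prefix` + `fq` but not how the hierarchy is stored. The sketch below assumes one common encoding, where each document carries tokens of the form `depth/path` (for example `0/edu` and `1/edu/higher`); that encoding, the function name and the field name in the test are assumptions, not the site's actual schema.

```javascript
// Sketch: Solr parameters for drilling into a hierarchical facet.
// ASSUMPTION: values are indexed as "<depth>/<path>" tokens, e.g. "1/edu/higher".
function hierarchicalFacetParams(field, selectedPath) {
  const depth = selectedPath.length;
  const params = [
    ['facet', 'true'],
    ['facet.field', field],
  ];

  // Only return facet values one level below the current selection.
  const prefix = depth + '/' + selectedPath.map((segment) => segment + '/').join('');
  params.push(['f.' + field + '.facet.prefix', prefix]);

  // Restrict the result set to documents under the selected node.
  if (depth > 0) {
    const token = (depth - 1) + '/' + selectedPath.join('/');
    params.push(['fq', field + ':"' + token + '"']);
  }
  return params;
}
```

With an empty selection the prefix is `0/`, which lists the top-level values. Values containing quotes would need escaping before being placed in `fq`; that is left out here.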
19. Adding facets
Config
facets['interests'] = new facetobject('tagcloud', 'field_interests', '#interests');
facets['ispublic'] = new facetobject('plain', 'field_ispublic', '#ispublic');
config['facets'] = facets;
HTML
<ul id="interests"></ul>
<ul id="ispublic"></ul>
INITIALIZE
Manager.addFacets(config);
20. Example widget code
AjaxSolr.PlainFacetWidget = AjaxSolr.AbstractFacetWidget.extend({
  multivalue: true,
  target: null, // HTML target id
  field: null, // Solr field
  facet_display_limit: 5, // Max facet values to display before «See more»
  facet_field_sort: null, // Optional facet sort
  dependencies: null, // Conditional display of facet
  facet_display_more: 'See more',
  facet_display_less: 'See less',
  // ...
  init: function() { /* ... */ },
  beforeRequest: function() { /* ... */ },
  afterRequest: function() { /* ... */ }
});
23. Pivot faceting allows you to facet within the results of the parent facet
- http://wiki.apache.org/solr/SimpleFacetParameters
Slight problem: we don't run Solr 4.x (pivot faceting requires Solr 4.0 or later)!
25. Our solution
Solr document 1
<str name="ss_menu_1">orgmenu</str>
<str name="ss_menu_2">org</str>
Solr document 2
<str name="ss_menu_1">edumenu</str>
<str name="ss_menu_2">higher_ed</str>
Solr document 3
<str name="ss_menu_1">edumenu</str>
<str name="ss_menu_2">secondary</str>
Solr query when a top level menu tab is selected
fq={!tag=ss_menu_1}ss_menu_1:edumenu&
facet.field={!ex=ss_menu_1}ss_menu_1
Solr query when a sub-level menu tab is selected
fq={!tag=ss_menu_1}ss_menu_1:edumenu&
fq={!tag=ss_menu_1,ss_menu_2}ss_menu_2:higher_ed&
facet.field={!ex=ss_menu_1}ss_menu_1&
facet.field={!ex=ss_menu_2}ss_menu_2
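The two queries above follow a regular pattern that can be generated for any menu depth. This is a minimal sketch, assuming the menu levels map to an ordered list of fields; `menuFacetParams` is a hypothetical helper name, and the output is left unencoded to match the slide (a real request would URL-encode each value).

```javascript
// Sketch: build the tagged filters and excluding facets for a menu selection.
// fields:   ordered menu-level fields, e.g. ['ss_menu_1', 'ss_menu_2']
// selected: chosen value per level, top level first, e.g. ['edumenu', 'higher_ed']
function menuFacetParams(fields, selected) {
  const params = [];

  // The filter on level i is tagged with its own field and every level above it.
  selected.forEach((value, i) => {
    const tags = fields.slice(0, i + 1).join(',');
    params.push('fq={!tag=' + tags + '}' + fields[i] + ':' + value);
  });

  // Each selected level is faceted while excluding the filters tagged with its field.
  selected.forEach((_, i) => {
    params.push('facet.field={!ex=' + fields[i] + '}' + fields[i]);
  });

  return params;
}
```

Because the sub-level filter carries both tags, the `ss_menu_1` facet ignores both filters and keeps showing counts for every top-level tab, while the `ss_menu_2` facet ignores only the sub-level filter.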
26. Drawbacks
- Can be VERY slow on large indexes with many unique terms in the facet field
Why do we do it?
- Small index; 18M terms, 12K documents
- Pivot facet fields have very few distinct values (5-8)!
29. Our goal
Give our users the feeling that we've implemented a mind-reader
How?
With relevant, grouped suggestions* as they type in a search query
Do we succeed?
50% of our «clicks to content» from searches come from autocomplete
30. Implementing autocomplete is «easy»
1) Ajax
2) Detect keystrokes
3) Send one request per keystroke
4) Receive results, populate result list
Techniques we employ
- Minimal payload (reduced fl)
- But same boosts and qf as «normal» queries
- group=true, group.field=, group.limit=
- start_label^1.5 wild_label^1 wild_other^0.25
- Caching (jsonp, cache=true)
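The techniques above translate into a small set of request parameters. The sketch below uses the `qf` boosts from the slide; the slide leaves the `group.field`, `group.limit` and `fl` values blank, so they are passed in as options, and the values used in the usage example are purely hypothetical. It also assumes the request handler is already configured for a dismax-style parser so that `qf` applies. The jQuery-side jsonp caching is not shown.

```javascript
// Sketch: parameters for one autocomplete request (one per keystroke).
// options.fl, options.groupField and options.groupLimit are supplied by the
// caller; the slide does not state the values used in production.
function autocompleteParams(typed, options) {
  const params = new URLSearchParams();
  params.set('q', typed.trim());
  // Boosts taken from the slide: prefix matches on the label rank highest.
  params.set('qf', 'start_label^1.5 wild_label^1 wild_other^0.25');
  // Minimal payload: only the fields the suggestion list needs.
  params.set('fl', options.fl.join(','));
  // Grouped suggestions.
  params.set('group', 'true');
  params.set('group.field', options.groupField);
  params.set('group.limit', String(options.groupLimit));
  params.set('wt', 'json');
  return params;
}

// Usage with hypothetical field names:
const example = autocompleteParams('phys', {
  fl: ['id', 'label'],
  groupField: 'type',
  groupLimit: 3,
});
console.log(example.toString());
```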
35. Our goal
Let other sites search our data
How?
The exact same way we do ourselves
Do we succeed?
Two external sites are up and running and a third is on its way
36. [Architecture diagram: ACME Website, ACME config, Config (override), JS library (copy), Default config, JS library (~1700 lines), ajax-solr (evolvingweb), Logger (~200 lines), Solr proxy (~85 lines), SolrPhpClient r60, Solr 3.6. A sample search box lists placeholder results for ACME Engineering, ACME Law and ACME Med.]
- Register with us
- Include our JS library
- Set up config
- Set up HTML
- Search! (and log)
39. Site owners have full control
Add, edit and configure widgets
Query fields, boosts, etc.
Faceting
Styling
Pre-limit search to parts of our index
Because we eat our own dog food!
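One way to picture the override step is a recursive merge of the external site's config over the default config. This is a sketch of that idea only; `mergeConfig` and the config keys in the usage example are hypothetical, and the real library may combine configs differently.

```javascript
// Sketch: layer a site-specific config over the defaults without mutating either.
function isPlainObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

function mergeConfig(defaults, overrides) {
  const merged = Object.assign({}, defaults);
  for (const key of Object.keys(overrides)) {
    const base = merged[key];
    const override = overrides[key];
    // Nested objects are merged key by key; everything else is replaced.
    merged[key] = isPlainObject(base) && isPlainObject(override)
      ? mergeConfig(base, override)
      : override;
  }
  return merged;
}

// Usage with hypothetical keys: the site changes one boost and pre-limits
// its search to one part of the index.
const defaultConfig = { rows: 10, qf: { label: 2, body: 1 } };
const siteConfig = { qf: { label: 5 }, fq: ['ss_menu_1:edumenu'] };
const effectiveConfig = mergeConfig(defaultConfig, siteConfig);
```

Nested values that the site does not override are shared by reference with the defaults, so callers should treat the merged config as read-only.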
41. Our goal
Deliver data in bulk to partner organizations
How?
RESTful, searchable data endpoint that returns XML (Atom++)
Do we succeed?
Beta-partner up and running with stunning performance
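As an illustration of the feed idea, the sketch below turns one search result document into an Atom entry. It covers only the three elements Atom requires on every entry (`id`, `title`, `updated`); the document field names are assumptions, the «Atom++» extensions are not shown, and this is not the actual feeds engine.

```javascript
// Sketch: serialize one result document as an Atom <entry>.
// ASSUMPTION: the document exposes id, title and updated (an ISO 8601 timestamp).
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function atomEntry(doc) {
  return [
    '<entry>',
    '  <id>' + escapeXml(doc.id) + '</id>',
    '  <title>' + escapeXml(doc.title) + '</title>',
    '  <updated>' + escapeXml(doc.updated) + '</updated>',
    '</entry>',
  ].join('\n');
}
```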
46. How?
Logging back-end written in PHP that writes to a MySQL database
- called asynchronously from JS library
- called inline in Feeds engine
Google Analytics (ga.js)
- called from JS library (search words and categories)
What?
- Search terms
- Facets
- User interaction
- List of search results
- Stack latency (JS, PHP, Solr)
- Search domain
- Session
47. Why?
Most popular queries with no results?
Most popular queries?
How does QPS affect latency?
Follow a user through search (interaction design & user testing)
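The first question can be answered with a simple aggregation over the logged searches. The sketch below works on in-memory records rather than the MySQL log table, and the record fields `terms` and `numFound` are hypothetical names for the logged search terms and result count.

```javascript
// Sketch: most frequent queries that returned no results.
// entries: array of { terms: string, numFound: number } (hypothetical log records)
function topZeroResultQueries(entries, limit) {
  const counts = new Map();
  for (const entry of entries) {
    if (entry.numFound !== 0) continue;
    // Normalize so that "Nurse" and "nurse " count as the same query.
    const key = entry.terms.trim().toLowerCase();
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Sort by count descending, then alphabetically for a stable order.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}
```

Dropping the `numFound` check gives the plain «most popular queries» list.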
Displaying logs
Charts are generated with Google Chart Tools in Drupal
Other statistics can easily be explored with Drupal Views