Multi-faceted responsive search,
autocomplete, feeds engine and logging
Remi Mikalsen
Search Engineer, utdanning.no
Introduction
Remi Mikalsen
Search engineer, utdanning.no
«Utdanning.no is the official Norwegian national education and
career portal, and includes an overview of education in Norway
and more than 500 career descriptions» - utdanning.no
« [...] Our main goals are to improve the quality of education and
to improve learning outcomes and learning for children, pupils
and students through use of ICT in education» - iktsenteret.no
utdanning.no
Drupal 7 & Solr 3.6
~3 million visitors / year
~12,000 documents
~18,000,000 terms
~260 fields
~1 QPS (~9M searches / year)
~8 ms latency
Data integration in the CMS
Fetch data
- Universities, colleges and community colleges: ~30 different endpoints, ~3500 documents
- Folk high schools (non-academic): 1 national endpoint, ~650 documents
- Secondary schools: 1 national endpoint, ~1100 documents
- Higher education admissions (Samordna opptak): 1 national endpoint, ~1500 documents
- Secondary schools metadata (Grep): 1 national endpoint, ~650 documents
- Higher education metadata (NUS): 1 national endpoint, ~3500 documents
- Professions metadata (STYRK): 2 national endpoints, ~1000 documents
Transform & normalize
Drupal 7
- ER-model
- Added value: editorial staff (professions, interviews, education summaries, etc.), ~1500 documents
Solr 3.6
- De-normalized
- Searchable
Indexing
Drupal 7
Apache Solr Search
Integration 7.x-1.1
Customized
business logic
Solr 3.6
Pros
Basic Drupal integration
Track document changes
Some facet support
Easily extendable
Cons
Lacks deep introspecting
Little de-normalization
Hacky hierarchies (Drupal)
Note
Custom config files!
schema.xml
(mainly dynamic fields)
solrconfig.xml
(mainly a drupal request handler)
We added
Deep introspecting
Data de-normalization
Solid hierarchy support
Pivot facet support
Atomization
Manual partial re-index
schema.xml
- field types (auto)
- various copy fields
- better spell
- bucket fields
- autocomplete
Drupal DB
- Organization (school)
- Study program (several per organization)
Solr documents
- Organization (school) + all its Study programs
- Study program + Organization
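The de-normalization step could be sketched roughly as follows. This is only an illustration, not the actual indexing code: the in-memory arrays, the `orgId` property and the `denormalize` function are hypothetical, and only the field names `related_nodes` and `sm_offered_by` are taken from the Solr documents in this deck.

```javascript
// Hypothetical stand-ins for the normalized Drupal entities.
const organizations = [{ id: 394353, label: "ACME University" }];
const studyPrograms = [
  { id: 394354, label: "ACME Rocket Science", orgId: 394353 },
  { id: 394355, label: "ACME Law", orgId: 394353 },
];

// Build one flat Solr-style document per entity, copying related labels in.
function denormalize(orgs, programs) {
  const orgById = new Map(orgs.map((o) => [o.id, o]));

  // Organization + all its study programs
  const orgDocs = orgs.map((o) => ({
    id: String(o.id),
    bundle: "org",
    label: o.label,
    related_nodes: programs
      .filter((p) => p.orgId === o.id)
      .map((p) => p.label),
  }));

  // Study program + its organization
  const programDocs = programs.map((p) => ({
    id: String(p.id),
    bundle: "he",
    label: p.label,
    sm_offered_by: orgById.has(p.orgId) ? [orgById.get(p.orgId).label] : [],
  }));

  return orgDocs.concat(programDocs);
}

const docs = denormalize(organizations, studyPrograms);
console.log(JSON.stringify(docs, null, 2));
```

Copying labels into both document types trades index size for query simplicity: a search for a study program name can match the organization document without any join.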
<doc>
<str name="id">394353</str>
<bool name="bs_mainsearch">true</bool>
<str name="bundle">org</str>
<str name="bundle_name">Organization</str>
<str name="label">ACME University</str>
<str name="atom">[XML]</str>
<arr name="related_nodes">
<str>ACME Rocket Science</str>
<str>Study program 2</str>
<str>Study program N</str>
</arr>
<arr name="sm_geography_hierarchy">
<str>1>California</str>
<str>2>California>San Diego</str>
<str>3>California>San Diego>Gaslamp Quarter</str>
</arr>
<str name="ss_menu_1">orgmenu</str>
<str name="ss_menu_2">org</str>
</doc>
<doc>
<str name="id">394354</str>
<bool name="bs_mainsearch">true</bool>
<str name="bundle">he</str>
<str name="bundle_name">Higher Education</str>
<str name="label">ACME Rocket Science</str>
<str name="atom">[XML]</str>
<arr name="sm_offered_by">
<str>ACME University</str>
</arr>
<arr name="sm_study_area">
<str>Engineering</str>
<str>Science</str>
</arr>
<long name="its_field_semesters">8</long>
<str name="ss_menu_1">edumenu</str>
<str name="ss_menu_2">he</str>
</doc>
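The depth-prefixed values in `sm_geography_hierarchy` above can be generated from an ordered list of levels. A minimal sketch, assuming the levels are available as an array; the function name `hierarchyTokens` is made up for illustration:

```javascript
// Turn an ordered list of hierarchy levels into depth-prefixed tokens,
// in the same "<depth>><path>" format as sm_geography_hierarchy.
function hierarchyTokens(levels) {
  return levels.map(
    (_, i) => `${i + 1}>${levels.slice(0, i + 1).join(">")}`
  );
}

const tokens = hierarchyTokens(["California", "San Diego", "Gaslamp Quarter"]);
console.log(tokens);
```

Indexing every ancestor path as its own token is what lets a prefix-restricted facet list only the children of the currently selected level.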
Searching
- Site search
- Embedded search
- Feeds engine
Site search
Our goal
Students, counselors and teachers must find what they look for
How?
- Interaction design (IxD) vs graphical design
- User testing, user testing and user testing (and experience)
- Resulting in a GUI specification we must implement
Ajax-Solr is our JS framework:
https://github.com/evolvingweb/ajax-solr/wiki/reuters-tutorial
- manages all querying
- widgets for interaction with and displaying results
- events fire search requests, which update widgets
We extended it heavily
- Developed all our widgets (10+)
- Added logging (async, via ajax, local and GA)
- Distributed configuration (server + client)
- Simplified initialization script
But it also works out of the box!
Components (architecture diagram)
- Our website: initialize (config), JS library (copy), search box and result list
- JS library, ~1700 lines, built on ajax-solr (evolvingweb)
- Default config
- Logger, ~200 lines
- Solr proxy, ~85 lines, using SolrPhpClient r60
- Solr 3.6

- Include JS library
- Initialize
- Set up HTML
- Search! (and log)
Site search – widgets & faceting
Ajax Solr allows defining N widgets
«Everything» is a widget
A facet is an instance of a FacetWidget
Interaction with widgets may fire a query
All faceting is piped into one query
All widgets are updated after Solr response
Some facet widgets we have developed
- Plain
Facet values and facet counts in a list
Multiple (AND) or single choice
- Hierarchical
Facet values and facet counts in a list
Clicking on a facet value drills down into the hierarchy; facet.prefix + fq
- Dropdown
Displays facet values in a dropdown list
Useful for mobile devices in our responsive theme
- Tagcloud
Facet values in a tagcloud
- Pivot facet
Our menu system
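The hierarchical drill-down (facet.prefix + fq) can be sketched as below, assuming depth-prefixed values such as `1>California`. The function `drillDownParams` is hypothetical and is not part of Ajax Solr; `fq`, `facet.field` and `facet.prefix` are standard Solr parameters.

```javascript
// Given the currently selected hierarchy token (or null), compute the
// filter query and the facet prefix that lists only its direct children.
function drillDownParams(field, selectedToken) {
  if (!selectedToken) {
    // Nothing selected yet: list top-level values only.
    return { "facet.field": field, "facet.prefix": "1>" };
  }
  const sep = selectedToken.indexOf(">");
  const depth = parseInt(selectedToken.slice(0, sep), 10);
  const path = selectedToken.slice(sep + 1);
  return {
    // Assumes the token contains no double quotes; escape them otherwise.
    fq: `${field}:"${selectedToken}"`,
    "facet.field": field,
    "facet.prefix": `${depth + 1}>${path}>`,
  };
}

console.log(drillDownParams("sm_geography_hierarchy", null));
console.log(drillDownParams("sm_geography_hierarchy", "1>California"));
```

After clicking "California", the facet would then return only values starting with `2>California>`, such as `2>California>San Diego`.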
Adding facets
Config
facets['interests'] = new facetobject('tagcloud', 'field_interests', '#interests');
facets['ispublic'] = new facetobject('plain', 'field_ispublic', '#ispublic');
config['facets'] = facets;
HTML
<ul id="interests"></ul>
<ul id="ispublic"></ul>
INITIALIZE
Manager.addFacets(config);
Example widget code
AjaxSolr.PlainFacetWidget = AjaxSolr.AbstractFacetWidget.extend({
multivalue: true,
target: null, // HTML target id
field: null, // Solr-field
facet_display_limit: 5, // Max facets to display before «See more»
facet_field_sort: null, // Optional facet sort
dependencies: null, // Conditional display of facet
facet_display_more: 'See more',
facet_display_less: 'See less',
// ...
init: function() { /* ... */ },
beforeRequest: function() { /* ... */ },
afterRequest: function() { /* ... */ }
});
Site search – pivot facet
Pivot faceting allows you to facet within the results of the parent facet
- http://wiki.apache.org/solr/SimpleFacetParameters
Slight problem; we don't run Solr 4.x!
Problem
Menu facets shouldn't affect each other, but should affect the search result and other facets
Our solution
Solr document 1
<str name="ss_menu_1">orgmenu</str>
<str name="ss_menu_2">org</str>
Solr document 2
<str name="ss_menu_1">edumenu</str>
<str name="ss_menu_2">higher_ed</str>
Solr document 3
<str name="ss_menu_1">edumenu</str>
<str name="ss_menu_2">secondary</str>
Solr query when a top level menu tab is selected
fq={!tag=ss_menu_1}ss_menu_1:edumenu&
facet.field={!ex=ss_menu_1}ss_menu_1
Solr query when a sub-level menu tab is selected
fq={!tag=ss_menu_1}ss_menu_1:edumenu&
fq={!tag=ss_menu_1,ss_menu_2}ss_menu_2:higher_ed&
facet.field={!ex=ss_menu_1}ss_menu_1&
facet.field={!ex=ss_menu_2}ss_menu_2
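The two queries above could be assembled by a helper along these lines. It is a sketch only: `menuParams` is a made-up name, and adding `facet=true` is an assumption (the slides show only the `fq` and `facet.field` parameters).

```javascript
// Build the tagged filters and tag-excluding facets for the two-level menu.
// top: selected top-level tab (e.g. "edumenu"); sub: optional sub-level tab.
function menuParams(top, sub) {
  const params = new URLSearchParams();
  params.append("facet", "true"); // assumption: faceting enabled per request

  if (top) {
    params.append("fq", `{!tag=ss_menu_1}ss_menu_1:${top}`);
  }
  // ss_menu_1 counts ignore every filter tagged ss_menu_1.
  params.append("facet.field", "{!ex=ss_menu_1}ss_menu_1");

  if (top && sub) {
    // Tagged with both names, so the top-level counts ignore it too.
    params.append("fq", `{!tag=ss_menu_1,ss_menu_2}ss_menu_2:${sub}`);
    params.append("facet.field", "{!ex=ss_menu_2}ss_menu_2");
  }
  return params;
}

console.log(menuParams("edumenu").getAll("fq"));
console.log(menuParams("edumenu", "higher_ed").getAll("fq"));
```

Because the sub-level filter carries both tags, the top-level menu counts stay stable whichever sub-level tab is selected, while the search result is still narrowed by both filters.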
Drawbacks
- Can be VERY slow on large indexes with many unique terms in the facet
Why do we do it?
- Small index; 18M terms, 12K documents
- Pivot facet fields have very few distinct values (5-8)!
Site search - autocomplete
Our goal
Give our users the feeling that we've implemented a mind-reader
How?
With relevant, grouped suggestions* as they type in a search query
Do we succeed?
50% of our «clicks to content» from searches come from autocomplete
Implementing autocomplete is «easy»
1) Ajax
2) Detect keystrokes
3) Send one request per keystroke
4) Receive results, populate result list
Techniques we employ
- Minimal payload (reduced fl)
- But same boosts and qf as «normal» queries
- group=true, group.field=, group.limit=
- start_label^1.5 wild_label^1 wild_other^0.25
- Caching (jsonp, cache=true)
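An autocomplete request using these techniques might be assembled as below. This is a rough sketch with several assumptions: the boost string is used as `qf`, `defType=edismax` is assumed, the `fl` list and `group.limit` value are invented, and the grouping field is left to the caller because the slides do not name it.

```javascript
// Build the parameters for one autocomplete request.
// groupField is caller-supplied; the slides leave "group.field=" empty.
function autocompleteParams(typed, groupField) {
  return new URLSearchParams({
    q: typed,
    defType: "edismax", // assumption
    qf: "start_label^1.5 wild_label^1 wild_other^0.25", // boosts from the slide
    fl: "id,label", // assumption: minimal payload
    group: "true",
    "group.field": groupField,
    "group.limit": "3", // assumption
    wt: "json",
  });
}

// "bundle_name" is only a hypothetical choice of grouping field.
const acParams = autocompleteParams("rock", "bundle_name");
console.log(acParams.toString());
```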
Define field type
<fieldType name="startsWith" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
</analyzer>
</fieldType>
Define fields
<field name="start_label" type="startsWith" indexed="true" stored="false" multiValued="false"/>
Copy fields
<copyField source="label" dest="start_label"/>
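To see why `start_label` matches as the user types, the `startsWith` analyzer chain can be imitated in plain JavaScript. This only approximates what Solr does at index and query time, and the function names are made up for the illustration.

```javascript
// KeywordTokenizer + LowerCaseFilter + PatternReplaceFilter([^a-z] -> ""):
// the whole label becomes one lower-cased token with everything but a-z removed.
// Note that this pattern also strips digits and letters such as æ, ø and å.
function normalize(text) {
  return text.toLowerCase().replace(/[^a-z]/g, "");
}

// EdgeNGramFilter: all prefixes of the token between min and max length.
function edgeNGrams(term, min, max) {
  const grams = [];
  for (let n = min; n <= Math.min(max, term.length); n++) {
    grams.push(term.slice(0, n));
  }
  return grams;
}

// Index side: normalize, then expand into prefixes (2 to 25 characters).
function indexTerms(label) {
  return edgeNGrams(normalize(label), 2, 25);
}

// Query side: normalize only, no n-grams.
function queryTerm(typed) {
  return normalize(typed);
}

function matches(label, typed) {
  return indexTerms(label).includes(queryTerm(typed));
}

console.log(indexTerms("ACME Law"));
console.log(matches("ACME Law", "Acme L")); // typed prefix of the label
console.log(matches("ACME Law", "law")); // not a prefix of the whole label
```

Since the query side produces a single normalized token, a hit only requires that token to equal one of the indexed prefixes, so each keystroke is an ordinary term lookup rather than a wildcard query.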
Define field type
<fieldType name="wildCardType" class="solr.TextField" omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="70" side="front"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="70" side="back"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" ignoreCase="false"/>
<filter class="solr.NorwegianLightStemFilterFactory"/>
</analyzer>
</fieldType>
Define fields
<field name="wild_label" type="wildCardType" indexed="true" stored="false" multiValued="false"/>
<field name="wild_other" type="wildCardType" indexed="true" stored="false" multiValued="true"/>
Copy fields
<copyField source="label" dest="wild_label"/>
<copyField source="teaser" dest="wild_other"/>
<copyField source="body" dest="wild_other"/>
<copyField source="searchwords" dest="wild_other"/>
<copyField source="related_nodes" dest="wild_other"/>
Embedded search
Our goal
Let other sites search our data
How?
The exact same way we do ourselves
Do we succeed?
Two external sites are up and running and a third is on its way
Components (architecture diagram)
- ACME website: config (override), JS library (copy), search box and result list
- JS library, ~1700 lines, built on ajax-solr (evolvingweb)
- Default config and ACME config
- Logger, ~200 lines
- Solr proxy, ~85 lines, using SolrPhpClient r60
- Solr 3.6

- Register with us
- Include our JS library
- Set up config
- Set up HTML
- Search! (and log)
<html>
<head>
<title>ACME Website</title>
<!-- utdanning.no search framework -->
<script src="/js/jquery.js"></script>
<script src="http://example.com/solrservice/js-min/solr-search-full-min.js"></script>
<script src="/js/search-init.js"></script>
</head>
<body>
<!-- Search form -->
<form>
<input id="query" name="query" type="search" />
<input type="submit" value="Search" />
</form>
<!-- Search results -->
<div><ul class="hits" id="hits"></ul></div>
</body>
</html>
<script type="text/javascript">
// ACME mockup init-script
var Manager; // Search manager object
var uno_config = loadConfig('http://example.com/solrservice/.../acme.config');
// Fully customizable search configuration, e.g.:
uno_config['server']['qf'] = 'label^1.8 content^1.2';
// Search box widget
Manager.addPlainSearch(uno_config);
// Result list widget
Manager.addResults(uno_config);
Manager.finalizeConfig(uno_config);
Manager.doRequest(); // Optional
</script>
Site owners have full control
Add, edit and configure widgets
Query fields, boosts, etc.
Faceting
Styling
Pre-limit search to parts of our index
Because we eat our own dog food!
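The "default config plus site override" idea can be illustrated with a small recursive merge. This is an assumption about how such layering could work, not the library's actual code; `mergeConfig` and the default values shown are hypothetical.

```javascript
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Return a new config where site-specific values win over the defaults.
// Nested objects are merged key by key; everything else is replaced.
function mergeConfig(defaults, overrides) {
  const merged = { ...defaults };
  for (const [key, value] of Object.entries(overrides)) {
    merged[key] =
      isPlainObject(value) && isPlainObject(defaults[key])
        ? mergeConfig(defaults[key], value)
        : value;
  }
  return merged;
}

// Hypothetical defaults; only the qf override mirrors the mockup above.
const defaultConfig = { server: { qf: "label^1 content^1", rows: 10 } };
const acmeConfig = mergeConfig(defaultConfig, {
  server: { qf: "label^1.8 content^1.2" },
});
console.log(acmeConfig);
```

A site then only has to state what differs from the defaults, which keeps embedded configurations short.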
Feeds engine
Our goal
Deliver data in bulk to partner organizations
How?
RESTful searchable data endpoint that returns XML (Atom++)
Do we succeed?
Beta-partner up and running with stunning performance
Components (architecture diagram)
- Consumer: sends query
- Feeds engine, ~300 lines, with default config
- Logger, ~200 lines
- Solr proxy, ~85 lines, using SolrPhpClient r60
- Solr 3.6
Feeds engine
- Parses incoming query
- Loads config (filters, weights, ...)
- Transforms incoming + config to Solr URL
- Sends to Solr proxy
Solr Proxy
- Loads Solr PHP Client library
- Sends search request and parses response
- Returns results to Feeds engine
Feeds engine
- Loads logger and logs results
- Picks out ATOM from response
- Glues result inside an ATOM frame
- Display feed
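The "parse incoming query, load config, transform to Solr URL" steps might look roughly like this. The real engine is written in PHP; this JavaScript sketch only illustrates the flow. `feedConfigs`, `buildSolrUrl`, the `bundle:org` filter and the Solr base URL are hypothetical, and configured weights are left out. Numeric path segments (as in `/organizations/10/2`) are ignored here because the slides do not say what they mean.

```javascript
// Hypothetical per-feed configuration (filters only).
const feedConfigs = {
  organizations: { filters: ["bundle:org"] },
};

// Translate an incoming feed URL into a Solr select URL.
function buildSolrUrl(incomingUrl, solrBase) {
  const incoming = new URL(incomingUrl);
  const segments = incoming.pathname.split("/").filter(Boolean);
  const feedName = segments[segments.indexOf("atom") + 1];
  const config = feedConfigs[feedName];
  if (!config) {
    throw new Error(`Unknown feed: ${feedName}`);
  }

  const params = new URLSearchParams();
  params.set("q", incoming.searchParams.get("q") || "*:*");
  // Configured filters first, then filters supplied by the consumer.
  for (const fq of config.filters) params.append("fq", fq);
  for (const fq of incoming.searchParams.getAll("fq")) params.append("fq", fq);
  // Only the pre-built Atom fragment is needed from each document.
  params.set("fl", "atom");
  return `${solrBase}/select?${params.toString()}`;
}

const solrUrl = buildSolrUrl(
  "http://example.com/data/atom/organizations?fq=type:HE&q=law",
  "http://localhost:8983/solr"
);
console.log(solrUrl);
```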
http://example.com/data/atom/organizations
http://example.com/data/atom/organizations/10/2
http://example.com/data/atom/organizations?fq=type:HE
http://example.com/data/atom/organizations?fq=type:HE&q=law
Consume with a feed reader
Logging
How?
Logging back-end written in PHP that writes to a MySQL database
- called asynchronously from JS library
- called inline in Feeds engine
Google Analytics (ga.js)
- called from JS library (searchwords and categories)
What?
- Search terms
- Facets
- User interaction
- List of search results
- Stack latency (JS, PHP, Solr)
- Search domain
- Session
Why?
Most popular queries with no results?
Most popular queries?
How does QPS affect latency?
Follow a user through search (interaction design & user testing)
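A question such as "most popular queries with no results" can be answered with a simple aggregation over the logged searches. The sketch below works on in-memory records with hypothetical `query` and `numFound` properties; the real logs live in MySQL and their schema is not shown in the slides.

```javascript
// Count zero-result searches per normalized query, most frequent first.
function topZeroResultQueries(records, limit) {
  const counts = new Map();
  for (const record of records) {
    if (record.numFound !== 0) continue;
    const query = record.query.trim().toLowerCase();
    counts.set(query, (counts.get(query) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([query, count]) => ({ query, count }));
}

// Made-up log records for illustration.
const sampleLog = [
  { query: "Rocket", numFound: 0 },
  { query: "rocket ", numFound: 0 },
  { query: "law", numFound: 12 },
  { query: "medisin", numFound: 0 },
];
const zeroHits = topZeroResultQueries(sampleLog, 10);
console.log(zeroHits);
```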
Displaying logs
Charts are generated with Google Chart Tools in Drupal
Other statistics can easily be explored with Drupal Views
Demo (includes responsiveness)
http://utdanning.no/sok
http://utdanning.no/search
http://utdanning.no/solrservice/utdanning.no
Drupal 7
Apache Solr Search Integration
+ custom indexing
Omega theme (responsiveness with Drupal)
+ custom js
Ajax Solr
+ custom widgets
Solr Php Client r60
+ custom proxy
Bootstrap (responsiveness without Drupal)
jQuery
Google Chart Tools
CONTACT
Remi Mikalsen
remi.mikalsen@iktsenteret.no

Contenu connexe

Tendances

Intelligent crawling and indexing using lucene
Intelligent crawling and indexing using luceneIntelligent crawling and indexing using lucene
Intelligent crawling and indexing using lucene
Swapnil & Patil
 
Lucene BootCamp
Lucene BootCampLucene BootCamp
Lucene BootCamp
GokulD
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
Erik Hatcher
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
Rahul Jain
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes Workshop
Erik Hatcher
 

Tendances (20)

Intelligent crawling and indexing using lucene
Intelligent crawling and indexing using luceneIntelligent crawling and indexing using lucene
Intelligent crawling and indexing using lucene
 
Lucene BootCamp
Lucene BootCampLucene BootCamp
Lucene BootCamp
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Introduction To Apache Lucene
Introduction To Apache LuceneIntroduction To Apache Lucene
Introduction To Apache Lucene
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
 
Lucene and MySQL
Lucene and MySQLLucene and MySQL
Lucene and MySQL
 
Building a Search Engine Using Lucene
Building a Search Engine Using LuceneBuilding a Search Engine Using Lucene
Building a Search Engine Using Lucene
 
Apache Lucene intro - Breizhcamp 2015
Apache Lucene intro - Breizhcamp 2015Apache Lucene intro - Breizhcamp 2015
Apache Lucene intro - Breizhcamp 2015
 
Solr Architecture
Solr ArchitectureSolr Architecture
Solr Architecture
 
Azure search
Azure searchAzure search
Azure search
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes Workshop
 
Introduction to apache lucene
Introduction to apache luceneIntroduction to apache lucene
Introduction to apache lucene
 
Building your own search engine with Apache Solr
Building your own search engine with Apache SolrBuilding your own search engine with Apache Solr
Building your own search engine with Apache Solr
 
Munching & crunching - Lucene index post-processing
Munching & crunching - Lucene index post-processingMunching & crunching - Lucene index post-processing
Munching & crunching - Lucene index post-processing
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
 
Integrating the Solr search engine
Integrating the Solr search engineIntegrating the Solr search engine
Integrating the Solr search engine
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Lucene Introduction
Lucene IntroductionLucene Introduction
Lucene Introduction
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component plugin
 

Similaire à Multi faceted responsive search, autocomplete, feeds engine & logging

Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Sease
 
Angular js quickstart
Angular js quickstartAngular js quickstart
Angular js quickstart
LinkMe Srl
 
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
Chengjen Lee
 
Building a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrBuilding a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solr
lucenerevolution
 
SplunkLive! Analytics with Splunk Enterprise - Part 2
SplunkLive! Analytics with Splunk Enterprise - Part 2SplunkLive! Analytics with Splunk Enterprise - Part 2
SplunkLive! Analytics with Splunk Enterprise - Part 2
Splunk
 

Similaire à Multi faceted responsive search, autocomplete, feeds engine & logging (20)

Elasticsearch an overview
Elasticsearch   an overviewElasticsearch   an overview
Elasticsearch an overview
 
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
 
Broadleaf Presents Thymeleaf
Broadleaf Presents ThymeleafBroadleaf Presents Thymeleaf
Broadleaf Presents Thymeleaf
 
Angular js quickstart
Angular js quickstartAngular js quickstart
Angular js quickstart
 
ACM Bay Area Data Mining Workshop: Pattern, PMML, Hadoop
ACM Bay Area Data Mining Workshop: Pattern, PMML, HadoopACM Bay Area Data Mining Workshop: Pattern, PMML, Hadoop
ACM Bay Area Data Mining Workshop: Pattern, PMML, Hadoop
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
Lab manual asp.net
Lab manual asp.netLab manual asp.net
Lab manual asp.net
 
Elasticsearch first-steps
Elasticsearch first-stepsElasticsearch first-steps
Elasticsearch first-steps
 
Building a Scalable Inbox System with MongoDB and Java
Building a Scalable Inbox System with MongoDB and JavaBuilding a Scalable Inbox System with MongoDB and Java
Building a Scalable Inbox System with MongoDB and Java
 
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
 
Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solr
 
Building a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrBuilding a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solr
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
 
SplunkLive! Analytics with Splunk Enterprise - Part 2
SplunkLive! Analytics with Splunk Enterprise - Part 2SplunkLive! Analytics with Splunk Enterprise - Part 2
SplunkLive! Analytics with Splunk Enterprise - Part 2
 
Search Intelligence & MarkLogic Search API
Search Intelligence & MarkLogic Search APISearch Intelligence & MarkLogic Search API
Search Intelligence & MarkLogic Search API
 
GDI Seattle - Intro to JavaScript Class 4
GDI Seattle - Intro to JavaScript Class 4GDI Seattle - Intro to JavaScript Class 4
GDI Seattle - Intro to JavaScript Class 4
 
Built in filters
Built in filtersBuilt in filters
Built in filters
 
Cloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataCloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big Data
 
Supercharging your Organic CTR
Supercharging your Organic CTRSupercharging your Organic CTR
Supercharging your Organic CTR
 

Plus de lucenerevolution

Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
lucenerevolution
 

Plus de lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadoop
 
A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...
 
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting PlatformHow Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
 
Query Latency Optimization with Lucene
Query Latency Optimization with LuceneQuery Latency Optimization with Lucene
Query Latency Optimization with Lucene
 

Dernier

Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Krashi Coaching
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
fonyou31
 

Dernier (20)

Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 

Multi faceted responsive search, autocomplete, feeds engine & logging

  • 1. Multi-faceted responsive search, autocomplete, feeds engine and logging Remi Mikalsen Search Engineer, utdanning.no
  • 3. Introduction Remi Mikalsen Search engineer, utdanning.no «Utdanning.no is the official Norwegian national education and career portal, and includes an overview of education in Norway and more than 500 career descriptions» - utdanning.no « [...] Our main goals are to improve the quality of education and to improve learning outcomes and learning for children, pupils and students through use of ICT in education» - iktsenteret.no
  • 4. utdanning.no Drupal 7 & Solr 3.6 ~3 million visitors / year ~12,000 documents ~18,000,000 terms ~260 fields ~1 QPS (~9M searches / year) ~8 ms latency
  • 6. Universities, colleges and community colleges ~30 different endpoints ~3500 documents Folk high schools (non-academic) 1 national endpoint ~650 documents Secondary schools 1 national endpoint ~1100 documents Higher education admissions (Samordna opptak) 1 national endpoint ~1500 documents Secondary schools metadata (Grep) 1 national endpoint ~650 documents Higher education metadata (NUS) 1 national endpoint ~3500 documents Transform & normalize Drupal 7 ER-model Added value Editorial staff Professions, interviews, education summaries, etc. ~1500 documents Professions metadata (STYRK) 2 national endpoints ~1000 documents Fetch data Solr 3.6 De-normalized Searchable
  • 8. Drupal 7 Apache Solr Search Integration 7.x-1.1 Customized business logic Solr 3.6 Pros Basic Drupal integration Track document changes Some facet support Easily extendable Cons Lacks deep introspecting Little de-normalization Hacky hierarchies (Drupal) Note Custom config files! schema.xml (mainly dynamic fields) solrconfig.xml (mainly a drupal request handler) We added Deep introspecting Data de-normalization Solid hierarchy support Pivot facet support Atomization Manual partial re-index schema.xml - field types (auto) - various copy fields - better spell - bucket fields - autocomplete
  • 9. Organization (school) Study program Study program Study program Organization (school) + all its Study programs Drupal DB Solr documents Study program + Organization
  • 10. <doc> <str name="id">394353</str> <bool name="bs_mainsearch">true</bool> <str name="bundle">org</str> <str name="bundle_name">Organization</str> <str name="label">ACME University</str> <str name="atom">[XML]</str> <arr name="related_nodes"> <str>ACME Rocket Science</str> <str>Study program 2</str> <str>Study program N</str> </arr> <arr name="sm_geography_hierarchy"> <str>1>California</str> <str>2>California>San Diego</str> <str>3>California>San Diego>Gaslamp Quarter</str> </arr> <str name="ss_menu_1">orgmenu</str> <str name="ss_menu_2">org</str> </doc>
  • 11. <doc> <str name="id">394354</str> <bool name="bs_mainsearch">true</bool> <str name="bundle">he</str> <str name="bundle_name">Higher Education</str> <str name="label">ACME Rocket Science</str> <str name="atom">[XML]</str> <arr name="sm_offered_by"> <str>ACME University</str> </arr> <arr name="sm_study_area"> <str>Engineering</str> <str>Science</str> </arr> <long name="its_field_semesters">8</long> <str name="ss_menu_1">edumenu</str> <str name="ss_menu_2">he</str> </doc>
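The `sm_geography_hierarchy` values in the organization document on slide 10 follow a "depth>path" pattern, with one token per ancestor level. A minimal sketch of how such tokens could be generated at index time is shown here. The function name and signature are hypothetical, and the actual Drupal indexing code is not shown in the slides.

```javascript
// Build depth-prefixed hierarchy tokens such as "2>California>San Diego".
// The "depth>path" layout is copied from the sample document on slide 10;
// hierarchyTokens itself is a hypothetical helper.
function hierarchyTokens(path, separator = ">") {
  // For each level i, emit "<depth><sep><ancestor 1><sep>...<sep><ancestor i>".
  return path.map((_, i) => [i + 1, ...path.slice(0, i + 1)].join(separator));
}

const tokens = hierarchyTokens(["California", "San Diego", "Gaslamp Quarter"]);
console.log(tokens);
```

Indexing every ancestor level as its own token means a document deep in the hierarchy still matches a filter on any of its parents.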
  • 12. Searching - Site search - Embedded search - Feeds engine
  • 14. Our goal Students, counselors and teachers must find what they look for How? - Interaction design (IxD) vs graphical design - User testing, user testing and user testing (and experience) - Resulting in a GUI specification we must implement
  • 15. Ajax-Solr is our JS framework: https://github.com/evolvingweb/ajax-solr/wiki/reuters-tutorial - manages all querying - widgets for interaction with and displaying results - events fire search requests which update widgets We extended it heavily - Developed all our widgets (10+) - Added logging (async, via ajax, local and GA) - Distributed configuration (server + client) - Simplified initialization script But it also works out of the box!
  • 16. Logger ~200 lines JS library ~1700 lines Solr 3.6 Our Website Solr proxy ~85 lines ajax-solr evolvingweb SolrPhpClient r60 Default config Initialize (config) JS library (copy) Search ACME Engineering Lorum sollicitudin nunc id nibh blandit pellentesque ipsum. ACME Law Cras nunc id nibh blandit pellentesque sollicitudin. ACME Med Ipsum ollicitudin nunc id blandit nibh pellentesque nibh. - Include JS library - Initialize - Set up HTML - Search! (and log)
  • 17. Site search – widgets & faceting Ajax Solr allows defining N widgets «Everything» is a widget A facet is an instance of a FacetWidget Interaction with widgets may fire a query All faceting is piped into one query All widgets are updated after the Solr response
  • 18. Some facet widgets we have developed - Plain Facet values and facet counts in a list Multiple (AND) or single choice - Hierarchical Facet values and facet counts in a list Clicking on a facet value drills down into the hierarchy; facet.prefix + fq - Dropdown Displays facet values in a dropdown list Useful for mobile devices in our responsive theme - Tagcloud Facet values in a tagcloud - Pivot facet Our menu system
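The hierarchical widget's "facet.prefix + fq" drill-down can be sketched as follows, assuming depth-prefixed tokens like those in `sm_geography_hierarchy`. `drillDownParams` is a hypothetical helper, not an Ajax-Solr function. The only Solr features it relies on are `fq` and the per-field `f.<field>.facet.prefix` parameter.

```javascript
// Parameters for drilling into one node of a hierarchical facet.
// Assumes a multi-valued field holding tokens like "1>California" and
// "2>California>San Diego". drillDownParams is a hypothetical helper.
function drillDownParams(field, selectedPath, separator = ">") {
  const depth = selectedPath.length;
  const params = [["facet", "true"], ["facet.field", field]];
  if (depth === 0) {
    // Nothing selected yet: list only the top-level values.
    params.push([`f.${field}.facet.prefix`, `1${separator}`]);
    return params;
  }
  // Restrict the result set to the selected node...
  const selectedToken = [depth, ...selectedPath].join(separator);
  params.push(["fq", `${field}:"${selectedToken}"`]);
  // ...and list only its direct children as facet values.
  const childPrefix = [depth + 1, ...selectedPath].join(separator) + separator;
  params.push([`f.${field}.facet.prefix`, childPrefix]);
  return params;
}

const drill = drillDownParams("sm_geography_hierarchy", ["California"]);
console.log(drill);
```

Clicking "California" would then filter on `1>California` and show only facet values starting with `2>California>`.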
  • 19. Adding facets Config facets['interests'] = new facetobject('tagcloud', 'field_interests', '#interests'); facets['ispublic'] = new facetobject('plain', 'field_ispublic', '#ispublic'); config['facets'] = facets; HTML <ul id="interests"></ul> <ul id="ispublic"></ul> INITIALIZE Manager.addFacets(config);
  • 20. Example widget code AjaxSolr.PlainFacetWidget = AjaxSolr.AbstractFacetWidget.extend({ multivalue: true, target: null, // HTML target id field: null, // Solr-field facet_display_limit: 5, // Max facets to display before «See more» facet_field_sort: null, // Optional facet sort dependencies: null, // Conditional display of facet facet_display_more: 'See more', facet_display_less: 'See less', ... init: function() { ... }, beforeRequest: function() { ... }, afterRequest: function() { ... } });
  • 22. Site search – pivot facet
  • 23. Pivot faceting allows you to facet within the results of the parent facet - http://wiki.apache.org/solr/SimpleFacetParameters Slight problem; we don't run Solr 4.x!
  • 24. Problem Menu facets shouldn't affect each other, but they should affect the search result and other facets
  • 25. Our solution Solr document 1 <str name="ss_menu_1">orgmenu</str> <str name="ss_menu_2">org</str> Solr document 2 <str name="ss_menu_1">edumenu</str> <str name="ss_menu_2">higher_ed</str> Solr document 3 <str name="ss_menu_1">edumenu</str> <str name="ss_menu_2">secondary</str> Solr query when a top level menu tab is selected fq={!tag=ss_menu_1}ss_menu_1:edumenu& facet.field={!ex=ss_menu_1}ss_menu_1 Solr query when a sub-level menu tab is selected fq={!tag=ss_menu_1}ss_menu_1:edumenu& fq={!tag=ss_menu_1,ss_menu_2}ss_menu_2:higher_ed& facet.field={!ex=ss_menu_1}ss_menu_1& facet.field={!ex=ss_menu_2}ss_menu_2
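The two queries above can be generated from the current menu selection. The sketch below reproduces them with the field names and `{!tag}` / `{!ex}` local parameters taken from the slide. `menuParams` and `toQueryString` are hypothetical helpers, and the output is left unencoded for readability (a real request would URL-encode the values).

```javascript
// Build the fq / facet.field pairs for the two-level menu on slide 25.
// menuParams and toQueryString are hypothetical helpers.
function menuParams(top, sub) {
  const params = [];
  if (top) params.push(["fq", `{!tag=ss_menu_1}ss_menu_1:${top}`]);
  // The sub-level filter carries both tags, so the top-level facet
  // can exclude it together with the top-level filter.
  if (top && sub) params.push(["fq", `{!tag=ss_menu_1,ss_menu_2}ss_menu_2:${sub}`]);
  // Each menu facet excludes the filters tagged with its own field name.
  params.push(["facet.field", "{!ex=ss_menu_1}ss_menu_1"]);
  if (top && sub) params.push(["facet.field", "{!ex=ss_menu_2}ss_menu_2"]);
  return params;
}

// Unencoded on purpose, to match the slide; encode before sending.
const toQueryString = (pairs) => pairs.map(([k, v]) => `${k}=${v}`).join("&");

const topOnly = toQueryString(menuParams("edumenu"));
const topAndSub = toQueryString(menuParams("edumenu", "higher_ed"));
console.log(topOnly);
console.log(topAndSub);
```

Because the top-level facet ignores both menu filters, its tab counts stay stable whichever tab is open. The sub-level facet ignores only its own filter, so its counts are computed within the chosen top-level tab.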
  • 26. Drawbacks - Can be VERY slow on large indexes with many unique terms in the facet Why do we do it? - Small index; 18M terms, 12K documents - Pivot facet fields have very few distinct values (5-8)!
  • 28. Site search - autocomplete
  • 29. Our goal Give our users the feeling that we've implemented a mind-reader How? With relevant, grouped suggestions* as they type in a search query Do we succeed? 50% of our «clicks to content» from searches come from autocomplete
  • 30. Implementing autocomplete is «easy» 1) Ajax 2) Detect keystrokes 3) Send one request per keystroke 4) Receive results, populate result list Techniques we employ - Minimal payload (reduced fl) - But same boosts and qf as «normal» queries - group=true, group.field=, group.limit= - start_label^1.5 wild_label^1 wild_other^0.25 - Caching (jsonp, cache=true)
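As a rough illustration of the techniques listed above, the request sent per keystroke might be assembled like this. Several things here are assumptions: the boosted field list is treated as the `qf` value, the slide leaves `group.field` and `group.limit` empty so the caller must supply them, and `fl`, `wt` and the `bundle_name` grouping field in the usage line are illustrative only.

```javascript
// Assemble Solr parameters for one autocomplete request.
// autocompleteParams is a hypothetical helper; the grouping field and
// limit are not given on the slide and must be provided by the caller.
function autocompleteParams(typed, { groupField, groupLimit = 3, fl = "id,label" }) {
  return {
    q: typed.trim(),
    // Boosts from the slide, assumed here to be the qf value.
    qf: "start_label^1.5 wild_label^1 wild_other^0.25",
    fl, // reduced field list keeps the payload small
    group: "true",
    "group.field": groupField,
    "group.limit": String(groupLimit),
    wt: "json",
  };
}

// "bundle_name" is only an example grouping field.
const acParams = autocompleteParams("  rocket sc", { groupField: "bundle_name" });
const acQueryString = new URLSearchParams(acParams).toString();
console.log(acQueryString);
```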
  • 31. Define field type <fieldType name="startsWith" class="solr.TextField"> <analyzer type="index"> <tokenizer class="solr.KeywordTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/> <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" /> </analyzer> <analyzer type="query"> <tokenizer class="solr.KeywordTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/> </analyzer> </fieldType> Define fields <field name="start_label" type="startsWith" indexed="true" stored="false" multiValued="false"/> Copy fields <copyField source="label" dest="start_label"/>
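To see what the `startsWith` type does to a label, the analysis chain can be approximated in plain JavaScript. This is only a simulation for illustration, not Solr code, and the function names are hypothetical. It mirrors the filters in order: keyword tokenizer (whole input as one token), lowercase, strip everything outside a-z, then front edge n-grams of 2 to 25 characters at index time.

```javascript
// Approximate the "startsWith" analysis chain from slide 31.
function normalize(text) {
  // KeywordTokenizer keeps the whole input as a single token; then
  // lowercase and drop every character outside a-z.
  return text.toLowerCase().replace(/[^a-z]/g, "");
}

function indexGrams(text, min = 2, max = 25) {
  // Index-time only: emit the leading 2..25 characters of the token.
  const token = normalize(text);
  const grams = [];
  for (let n = min; n <= Math.min(max, token.length); n++) {
    grams.push(token.slice(0, n));
  }
  return grams;
}

function matchesStart(label, typed) {
  // The query side is normalized but not n-grammed, so it must equal
  // one of the indexed prefixes.
  return indexGrams(label).includes(normalize(typed));
}

console.log(indexGrams("ACME Rocket Science").slice(0, 5));
console.log(matchesStart("ACME Rocket Science", "Acme ro"));
console.log(matchesStart("ACME Rocket Science", "rocket"));
```

Because spaces and punctuation are stripped, "Acme ro" matches as the prefix `acmero`, while "rocket" does not match since it is not at the start of the label. Note that the `[^a-z]` pattern also removes digits and letters such as æ, ø and å.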
  • 32. Define field type <fieldType name="wildCardType" class="solr.TextField" omitNorms="true"> <analyzer type="index"> <tokenizer class="solr.KeywordTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="70" side="front"/> <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="70" side="back"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.KeywordTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/> <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" ignoreCase="false"/> <filter class="solr.NorwegianLightStemFilterFactory"/> </analyzer> </fieldType> Define fields <field name="wild_label" type="wildCardType" indexed="true" stored="false" multiValued="false"/> <field name="wild_other" type="wildCardType" indexed="true" stored="false" multiValued="true"/> Copy fields <copyField source="label" dest="wild_label"/> <copyField source="teaser" dest="wild_other"/> <copyField source="body" dest="wild_other"/> <copyField source="searchwords" dest="wild_other"/> <copyField source="related_nodes" dest="wild_other"/>
  • 35. Our goal Let other sites search our data How? The exact same way we do ourselves Do we succeed? Two external sites are up and running and a third is on its way
  • 36. Logger ~200 lines JS library ~1700 lines Solr 3.6 ACME Website Solr proxy ~85 lines ajax-solr evolvingweb ACME config SolrPhpClient r60 Default config Config (override) JS library (copy) Search ACME Engineering Lorum sollicitudin nunc id nibh blandit pellentesque ipsum. ACME Law Cras nunc id nibh blandit pellentesque sollicitudin. ACME Med Ipsum ollicitudin nunc id blandit nibh pellentesque nibh. - Register with us - Include our JS library - Set up config - Set up HTML - Search! (and log)
  • 37. <html> <head> <title>ACME Website</title> <!-- utdanning.no search framework --> <script src="/js/jquery.js"></script> <script src="http://example.com/solrservice/js-min/solr-search-full-min.js"></script> <script src="/js/search-init.js"></script> </head> <body> <!-- Search form --> <form> <input id="query" name="query" type="search" /> <input type="submit" value="Search" /> </form> <!-- Search results --> <div><ul class="hits" id="hits"></ul></div> </body> </html>
  • 38. <script type="text/javascript"> // ACME mockup init-script var Manager; // Search manager object var uno_config = loadConfig('http://example.com/solrservice/.../acme.config'); // Fully customizable search configuration, e.g.: uno_config['server']['qf'] = 'label^1.8 content^1.2'; // Search box widget Manager.addPlainSearch(uno_config); // Result list widget Manager.addResults(uno_config); Manager.finalizeConfig(uno_config); Manager.doRequest(); // Optional </script>
  • 39. Site owners have full control Add, edit and configure widgets Query fields, boosts, etc. Faceting Styling Pre-limit search to parts of our index Because we eat our own dog food!
  • 41. Our goal Deliver data in bulk to partner organizations How? RESTful searchable data endpoint that returns XML (Atom++) Do we succeed? Beta-partner up and running with stunning performance
  • 42. Consumer Query Default config Feeds engine ~300 lines Solr proxy ~85 lines Solr 3.6 Logger ~200 lines SolrPhpClient r60
  • 43. Feeds engine - Parses incoming query - Loads config (filters, weights, ...) - Transforms incoming + config to Solr URL - Sends to Solr proxy Solr Proxy - Loads Solr PHP Client library - Sends search request and parses response - Returns results to Feeds engine Feeds engine - Loads logger and logs results - Picks out ATOM from response - Glues result inside an ATOM frame - Displays the feed
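The "transforms incoming + config to Solr URL" step can be sketched roughly as below. The real feeds engine is written in PHP and its config format is not shown, so everything here is an assumption made for illustration: the `buildSolrUrl` helper, the config keys (`defaultRows`, `maxRows`, `qf`, `filters`), the base URL, and requesting only the stored `atom` field.

```javascript
// Merge a consumer's query with server-side config into a Solr URL.
// All names and config keys are hypothetical.
function buildSolrUrl(baseUrl, incoming, config) {
  const params = new URLSearchParams();
  params.set("q", incoming.q || "*:*");
  // Let the consumer ask for a page size, but cap it server-side.
  const rows = Math.min(Number(incoming.rows) || config.defaultRows, config.maxRows);
  params.set("rows", String(rows));
  if (config.qf) params.set("qf", config.qf); // weights from config
  for (const filter of config.filters || []) params.append("fq", filter);
  // Only the pre-rendered Atom fragment is needed to build the feed.
  params.set("fl", "atom");
  return `${baseUrl}?${params.toString()}`;
}

const feedUrl = buildSolrUrl(
  "http://localhost:8983/solr/select",
  { q: "rocket", rows: "500" },
  {
    defaultRows: 10,
    maxRows: 50,
    qf: "label^1.8 content^1.2",
    filters: ["bs_mainsearch:true", "bundle:he"],
  }
);
console.log(feedUrl);
```

Keeping filters and weights in server-side config means a feed consumer only ever supplies the query itself.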
  • 46. How? Logging back-end written in PHP that writes to a MySQL database - called asynchronously from JS library - called inline in Feeds engine Google Analytics (ga.js) - called from JS library (searchwords and categories) What? - Search terms - Facets - User interaction - List of search results - Stack latency (JS, PHP, Solr) - Search domain - Session
  • 47. Why? Most popular queries with no results? Most popular queries? How does QPS affect latency? Follow a user through search (interaction design & user testing) Displaying logs Charts are generated with Google Chart Tools in Drupal Other statistics can easily be explored with Drupal Views
  • 53. Drupal 7 Apache Solr Search Integration + custom indexing Omega theme (responsiveness with Drupal) + custom js Ajax Solr + custom widgets Solr Php Client r60 + custom proxy Bootstrap (responsiveness without Drupal) jQuery Google Chart Tools
  • 54. Remi Mikalsen remi.mikalsen@iktsenteret.no iktsenteret.no Multi-faceted responsive search, autocomplete, feeds engine and logging