Dublin, IE
2013.11.07

Trey Grainger

ENHANCING RELEVANCY THROUGH
PERSONALIZATION & SEMANTIC SEARCH

Search Technology Development Manager @ CareerBuilder
My Background

Trey Grainger

Search Technology Development Manager
@CareerBuilder.com

Relevant Background
•  Search & Recommendations
•  High-volume, Distributed Systems
•  NLP, Relevancy Tuning, User Group Testing, & Machine Learning

Other Projects
•  Co-author: Solr in Action
•  Founder and Chief Engineer @ … .com
Roadmap
•  I. How we use Solr @ CareerBuilder
•  II. Traditional Relevancy Scoring
•  III. Advanced Relevancy through functions
–  Factors as a linear function
–  Context-aware relevancy parameter weighting
•  IV. Personalization & Recommendations
–  Profile and Behavior-based
–  Solr as a recommendation engine
–  Collaborative Filtering
•  V. Semantic Search
–  Mining user-behavior for synonyms
–  Uncovering meaning through clustering
–  Latent Semantic Indexing overview
–  Document-based searching
–  Foreground vs. Background analysis
How we use Solr @ CareerBuilder
Search Scale @ CareerBuilder

•  Over 2.5 million new jobs each month
•  Over 60 million actively searchable resumes
•  ~300 globally distributed search servers
•  Thousands of unique, dynamically generated indexes
•  Over 1 billion actively searchable documents
•  Over 1 million searches an hour
Data Analytics
Data Analytics
Data Analytics (market supply)
Data Analytics (market demand)
Data Analytics (labor pressure: supply/demand)
Data Analytics (hiring comparison per market)
Traditional Search
Recommendations
Traditional Relevancy Scoring
Default Lucene Relevancy Algorithm (DefaultSimilarity)

$$\mathrm{score}(q,d) = \sum_{t \text{ in } q} \left( \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t)^2 \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,d) \right) \cdot \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q)$$

Where: t = term; d = document; q = query; f = field

$$\mathrm{tf}(t \text{ in } d) = \mathrm{numTermOccurrencesInDocument}^{1/2}$$
$$\mathrm{idf}(t) = 1 + \log\left(\frac{\mathrm{numDocs}}{\mathrm{docFreq} + 1}\right)$$
$$\mathrm{coord}(q,d) = \frac{\mathrm{numTermsInDocumentFromQuery}}{\mathrm{numTermsInQuery}}$$
$$\mathrm{queryNorm}(q) = \frac{1}{\mathrm{sumOfSquaredWeights}^{1/2}}$$
$$\mathrm{sumOfSquaredWeights} = q.\mathrm{getBoost}()^2 \cdot \sum_{t \text{ in } q} \left( \mathrm{idf}(t) \cdot t.\mathrm{getBoost}() \right)^2$$
$$\mathrm{norm}(t,d) = d.\mathrm{getBoost}() \cdot \mathrm{lengthNorm}(f) \cdot f.\mathrm{getBoost}()$$

*Source: Solr in Action, chapter 3
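To make the formula concrete, here is a minimal Python sketch of the scoring function exactly as written above. It assumes a single field, treats norm(t,d) as one precomputed number, and applies the query boost only inside sumOfSquaredWeights, as the formula does; real Lucene encodes norms lossily and computes weights internally, so absolute scores will differ. The helper names are hypothetical, not Lucene's API.

```python
import math

# Hypothetical helpers mirroring the DefaultSimilarity formula above (not Lucene's API).


def tf(freq: int) -> float:
    """tf(t in d) = numTermOccurrencesInDocument ^ 1/2."""
    return math.sqrt(freq)


def idf(doc_freq: int, num_docs: int) -> float:
    """idf(t) = 1 + log(numDocs / (docFreq + 1)), using the natural log."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def score(query_terms, doc_tfs, doc_freqs, num_docs,
          term_boosts=None, query_boost=1.0, norm=1.0):
    """Score one document for a multi-term query (single field, illustrative only)."""
    term_boosts = term_boosts or {}
    # sumOfSquaredWeights = q.getBoost()^2 * sum over t in q of (idf(t) * t.getBoost())^2
    sum_sq = query_boost ** 2 * sum(
        (idf(doc_freqs[t], num_docs) * term_boosts.get(t, 1.0)) ** 2
        for t in query_terms)
    query_norm = 1.0 / math.sqrt(sum_sq)
    matched = [t for t in query_terms if doc_tfs.get(t, 0) > 0]
    # coord(q, d) rewards documents that match more of the query terms
    coord = len(matched) / len(query_terms)
    total = sum(tf(doc_tfs[t]) * idf(doc_freqs[t], num_docs) ** 2
                * term_boosts.get(t, 1.0) * norm
                for t in matched)
    return total * coord * query_norm


# Two query terms, each appearing in 9 of 100 documents (invented numbers)
DOC_FREQS = {"solr": 9, "lucene": 9}
both = score(["solr", "lucene"], {"solr": 1, "lucene": 1}, DOC_FREQS, 100)
one = score(["solr", "lucene"], {"solr": 4}, DOC_FREQS, 100)
```

Even though the second document repeats "solr" four times, the document matching both terms wins, because tf is square-root damped and coord halves the score of a half match.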
TF * IDF
•  Term Frequency: “How well does a term describe a document?”
–  Measure: how often a term occurs per document
•  Inverse Document Frequency: “How important is a term overall?”
–  Measure: how rare the term is across all documents
Boosting documents and fields
•  Certain fields may be more important than other fields:
–  The Job Title and Skills may be more relevant than other aspects of the job:
/select?defType=edismax&qf=jobtitle^10 skills^5 jobrequirements^2 jobdescription^1
•  It’s possible to boost documents and fields at both index time and query time
•  If you need more fine-grained control (such as per-term index-time boosting), you can make use of payloads
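With the dismax/edismax parsers, each query term is matched against every qf field and the per-field scores are combined as a disjunction-max: the best weighted field score wins, plus tie times the others. The sketch below is a simplification that assumes the raw per-field scores for one term are already known; the example scores are invented, and in Lucene the ^ boosts are folded into query weights rather than multiplied afterwards. The function names are hypothetical.

```python
# Hypothetical helpers illustrating how qf field boosts combine (simplified dismax).


def parse_qf(qf: str) -> dict:
    """Parse 'field^boost field^boost ...' into {field: boost}; a missing boost means 1.0."""
    weights = {}
    for part in qf.split():
        field, _, boost = part.partition("^")
        weights[field] = float(boost) if boost else 1.0
    return weights


def dismax_score(field_scores: dict, weights: dict, tie: float = 0.0) -> float:
    """Disjunction-max over the weighted per-field scores for a single query term."""
    weighted = [s * weights[f] for f, s in field_scores.items() if f in weights]
    if not weighted:
        return 0.0
    best = max(weighted)
    # tie=0.0 keeps only the best field; tie=1.0 would sum all fields
    return best + tie * (sum(weighted) - best)


QF = parse_qf("jobtitle^10 skills^5 jobrequirements^2 jobdescription^1")
```

A weak title match (0.5 × 10 = 5.0) beats a stronger description match (2.0 × 1 = 2.0), which is exactly the intent of weighting jobtitle above jobdescription.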
Custom scoring with Payloads
•  In addition to boosting search terms and fields, content within fields can also be boosted differently using payloads (requires a custom scoring implementation):
design [1] / engineer [1] / really [ ] / great [ ] / job [ ] / ten [3] / years [3] / experience [3] / careerbuilder [2] / design [2], …
jobtitle: bucket=[1] boost=10; company: bucket=[2] boost=4;
jobdescription: bucket=[ ] weight=1; experience: bucket=[3] weight=1.5
We can pass in a parameter to Solr at query time specifying the boost to apply to each bucket, e.g. …&bucketWeights=1:10;2:4;3:1.5;default:1;
•  This allows us to map many relevancy buckets to search terms at index time and adjust the weighting at query time without having to search across hundreds of fields.
•  By making all scoring parameters overridable at query time, we are able to do A/B testing to consistently improve our relevancy model.
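The bucket mechanism can be sketched as follows. This is a hypothetical illustration, not CareerBuilder's custom scorer: parse_bucket_weights reads a bucketWeights string in the format shown above, and payload_score simply sums the weight of each matching token's bucket, falling back to default for tokens without a payload. A real implementation would read payloads from the index inside a custom Lucene scoring class.

```python
# Hypothetical sketch of query-time bucket weighting over index-time payload buckets.


def parse_bucket_weights(param: str) -> dict:
    """Parse '1:10;2:4;3:1.5;default:1;' into {'1': 10.0, ..., 'default': 1.0}."""
    weights = {}
    for entry in param.split(";"):
        if not entry:
            continue  # tolerate the trailing ';'
        bucket, weight = entry.split(":")
        weights[bucket] = float(weight)
    return weights


def payload_score(tokens, query_terms, weights) -> float:
    """tokens: (term, bucket) pairs as indexed; bucket None means no payload."""
    default = weights.get("default", 1.0)
    return sum(weights.get(bucket, default)
               for term, bucket in tokens if term in query_terms)


# Tokens and bucket ids from the example document above
TOKENS = [("design", "1"), ("engineer", "1"), ("really", None), ("great", None),
          ("job", None), ("ten", "3"), ("years", "3"), ("experience", "3"),
          ("careerbuilder", "2"), ("design", "2")]
WEIGHTS = parse_bucket_weights("1:10;2:4;3:1.5;default:1;")
```

Because the weights arrive with each request, two variants of bucketWeights can be served side by side without reindexing, which is what makes the A/B testing described above cheap.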
That’s great, but what about domain-specific knowledge?
•  News search: popularity and freshness drive relevance
•  Restaurant search: geographical proximity and price range are critical
•  Ecommerce: likelihood of a purchase is key
•  Movie search: more popular titles are generally more relevant
•  Job search: category of job, salary range, and geographical proximity matter

TF * IDF of keywords can’t hold its own against good domain-specific relevance factors!
Advanced Relevancy through Functions
Example of domain-specific relevancy calculation
News website:
/select?
fq=$myQuery&
q=_query_:"{!func}scale(query($myQuery),0,100)"              // 25%: keyword relevance
AND _query_:"{!func}div(100,map(geodist(),0,1,1))"             // 25%: geographic proximity
AND _query_:"{!func}recip(rord(publicationDate),0,100,100)"    // 25%: freshness
AND _query_:"{!func}scale(popularity,0,100)"&                  // 25%: popularity
myQuery="street festival"&
sfield=location&
pt=33.748,-84.391

*Example from chapter 16 of Solr in Action
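Each clause contributes up to roughly 100 points, so the final score is a linear blend. The Python sketch below mimics the Solr functions involved (scale, map, div, recip) over three invented documents with precomputed keyword scores, distances in km, rord values, and popularity; scale here uses the min and max of the documents passed in. One observation from reproducing it: recip(x,m,a,b) computes a/(m*x+b), so with the slide's m=0 the freshness clause evaluates to 100/100 = 1 for every document, and a non-zero m is needed for publication date to affect the ranking. All names are hypothetical.

```python
# Hypothetical Python stand-ins for the Solr functions used in the news query.


def scale(values, lo, hi):
    """Linearly rescale values so the observed min maps to lo and the max to hi."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [float(lo)] * len(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


def solr_map(x, lo, hi, target):
    """map(x,min,max,target): target if min <= x <= max, else x."""
    return target if lo <= x <= hi else x


def recip(x, m, a, b):
    """recip(x,m,a,b) = a / (m*x + b)."""
    return a / (m * x + b)


def news_scores(docs, recip_args=(0, 100, 100)):
    keyword = scale([d["keyword_score"] for d in docs], 0, 100)
    popularity = scale([d["popularity"] for d in docs], 0, 100)
    totals = []
    for i, d in enumerate(docs):
        proximity = 100 / solr_map(d["distance_km"], 0, 1, 1)  # div(100,map(geodist(),0,1,1))
        freshness = recip(d["rord"], *recip_args)             # recip(rord(publicationDate),...)
        totals.append(keyword[i] + proximity + freshness + popularity[i])
    return totals


# Invented example documents
DOCS = [
    {"keyword_score": 2.0, "distance_km": 0.5, "rord": 1, "popularity": 50},
    {"keyword_score": 1.0, "distance_km": 10.0, "rord": 2, "popularity": 100},
    {"keyword_score": 0.5, "distance_km": 50.0, "rord": 3, "popularity": 0},
]
```

The map(geodist(),0,1,1) trick caps proximity at 100 for anything within 1 km and avoids dividing by a zero distance.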
Fancy boosting functions
•  Separating “relevancy” and “filtering” from the query:
q=_val_:"$keywords"&fq={!cache=false v=$keywords}&keywords=solr
•  Keywords (50%) + distance (25%) + category (25%)
q=_val_:"scale(mul(query($keywords),1),0,50)" AND
_val_:"scale(sum($radiusInKm,mul(query($distance),-1)),0,25)" AND
_val_:"scale(mul(query($category),1),0,25)"
&keywords=solr
&radiusInKm=48.28
&distance=_val_:"geodist(latitudelongitude.latlon_is,33.77402,-84.29659)"
&category=jobtitle:"java developer"
&fq={!cache=false v=$keywords}
Context aware relevancy
Example: Willingness to relocate for a job
[Chart: software engineers vs. food service workers, by willingness to relocate (1% to 95%)]

Willingness to relocate
Software engineers in Chicago want jobs in these locations:

Willingness to relocate
Food service workers in Chicago want jobs in these locations:
Personalization & Recommendations
Beyond domain knowledge… consider per-user knowledge
•  John lives in Boston but wants to move to New York or possibly another big city. He is currently a sales manager but wants to move towards business development.
•  Irene is a bartender in Dublin and is only interested in food service jobs within 10 km of her location.
•  Irfan is a software engineer in Atlanta and is interested in software engineering jobs at a Big Data company. He is happy to move across the U.S. for the right job.
•  Jane is a nurse educator in Boston seeking between $40K and $60K working in the healthcare industry.
Query for Jane
Jane is a nurse educator in Boston seeking between $40K and $60K
working in the healthcare industry

http://localhost:8983/solr/jobs/select/?
fl=jobtitle,city,state,salary&
q=(
jobtitle:"nurse educator"^25 OR jobtitle:(nurse educator)^10
)
AND (
(city:"Boston" AND state:"MA")^15
OR state:"MA")
AND _val_:"map(salary, 40000, 60000, 10, 0)"

*Example from chapter 16 of Solr in Action
Search Results for Jane
{ ...
"response":{"numFound":22,"start":0,"docs":[
  {"jobtitle":"Clinical Educator (New England/ Boston)",
   "city":"Boston",
   "state":"MA",
   "salary":41503},
  {"jobtitle":"Nurse Educator",
   "city":"Braintree",
   "state":"MA",
   "salary":56183},
  {"jobtitle":"Nurse Educator",
   "city":"Brighton",
   "state":"MA",
   "salary":71359},
  …]}}

*Example documents available @ http://github.com/treygrainger/solr-in-action/
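The salary clause relies on the five-argument form of Solr's map function, map(x,min,max,target,default), which returns target when min <= x <= max and default otherwise. A tiny Python equivalent (hypothetical name solr_map) applied to the three salaries shown above suggests why the $71,359 posting still appears: in this query the salary clause only adjusts the score, adding 0 instead of 10, while the title and location clauses decide what matches.

```python
# Hypothetical Python equivalent of Solr's map() function query.


def solr_map(x, lo, hi, target, default=None):
    """map(x,min,max,target[,default]): target if min <= x <= max, else default (or x)."""
    if lo <= x <= hi:
        return target
    return x if default is None else default


SALARIES = {"Clinical Educator": 41503,
            "Nurse Educator (Braintree)": 56183,
            "Nurse Educator (Brighton)": 71359}
SALARY_BOOSTS = {job: solr_map(s, 40000, 60000, 10, 0) for job, s in SALARIES.items()}
```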
What did we just do?
•  We built a recommendation engine!
•  What is a recommendation engine?
–  A system that uses known information (or information derived from that known information) to automatically suggest relevant content
•  Our example was just an attribute-based recommendation… we’ll see that behavioral-based recommendation (i.e. collaborative filtering) is also possible.
Redefining “Search Engine”
•  “Lucene is a high-performance, full-featured text search engine library…”
Yes, but really…
•  “Lucene is a high-performance, fully-featured token matching and scoring library… which can perform full-text searching.”
Redefining “Search Engine”
or, in machine learning speak:
•  A Lucene index is a multi-dimensional sparse matrix… with very fast and powerful lookup capabilities.
•  Think of each field as a matrix containing each term mapped to each document
The Lucene Inverted Index (traditional text example)

What you SEND to Lucene/Solr:
Document   Content Field
doc1       once upon a time, in a land far, far away
doc2       the cow jumped over the moon.
doc3       the quick brown fox jumped over the lazy dog.
doc4       the cat in the hat
doc5       The brown cow said “moo” once.
…          …

How the content is INDEXED into Lucene/Solr (conceptually):
Term    Documents
a       doc1 [2x]
brown   doc3 [1x], doc5 [1x]
cat     doc4 [1x]
cow     doc2 [1x], doc5 [1x]
…       …
once    doc1 [1x], doc5 [1x]
over    doc2 [1x], doc3 [1x]
the     doc2 [2x], doc3 [2x], doc4 [2x], doc5 [1x]
…       …
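A minimal Python sketch of the conceptual index above: it lowercases, splits on non-letters, and records per-document term counts (the "[2x]" annotations), which is one row of a sparse term-by-document matrix per term. Real Lucene analysis chains, postings formats, and position data are far richer; this only reproduces the term-to-document mapping, and the helper names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical helpers; documents are the five examples above.
DOCS = {
    "doc1": "once upon a time, in a land far, far away",
    "doc2": "the cow jumped over the moon.",
    "doc3": "the quick brown fox jumped over the lazy dog.",
    "doc4": "the cat in the hat",
    "doc5": "The brown cow said \u201cmoo\u201d once.",
}


def tokenize(text: str) -> list:
    """Very simple analysis: lowercase and keep runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(docs: dict) -> dict:
    """Map each term to {doc_id: occurrence count}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return dict(index)


INDEX = build_inverted_index(DOCS)
```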
Matching text queries to text fields

/solr/select/?q=jobcontent:"software engineer"

Job Content Field
Term        Documents
…           …
engineer    doc1, doc3, doc4, doc5
mechanical  doc2, doc4, doc6
…           …
software    doc1, doc3, doc4, doc7, doc8
…           …

[Diagram: "engineer" only: doc5 · "software engineer" (both terms): doc1, doc3, doc4 · "software" only: doc7, doc8]
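Matching the phrase starts with the same lookup: fetch the posting list for each term and intersect them. The sketch below does a classic sorted merge-intersection over the posting lists shown above. It only checks that both terms occur; a real phrase query additionally uses stored term positions to verify that "software" is immediately followed by "engineer", so the true phrase hits may be a subset of the intersection. Names are hypothetical.

```python
# Hypothetical posting lists copied from the example above.
POSTINGS = {
    "engineer": ["doc1", "doc3", "doc4", "doc5"],
    "mechanical": ["doc2", "doc4", "doc6"],
    "software": ["doc1", "doc3", "doc4", "doc7", "doc8"],
}


def intersect(a: list, b: list) -> list:
    """Merge-intersect two sorted posting lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


BOTH = intersect(POSTINGS["software"], POSTINGS["engineer"])
ENGINEER_ONLY = sorted(set(POSTINGS["engineer"]) - set(BOTH))
SOFTWARE_ONLY = sorted(set(POSTINGS["software"]) - set(BOTH))
```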
Beyond Text Searching

•  Lucene/Solr is a search matching engine
•  When Lucene/Solr searches text, it is matching tokens in the query with tokens in the index
•  Anything that can be searched upon can form the basis of matching and scoring:
–  text, attributes, locations, results of functions, user behavior, classifications, etc.
Approaches to Recommendations
•  Content-based
–  Attribute based
e.g. income level, hobbies, location, experience
–  Hierarchical
e.g. “medical//nursing//oncology”, “animal//dog//terrier”
–  Textual Similarity
e.g. Solr’s MoreLikeThis Request Handler & Search Component
–  Concept Based
e.g. Solr => “software engineer”, “java”, “search”, “open source”
•  Collaborative Filtering
“Users who liked that also liked this…”
•  Hybrid Approaches
Collaborative Filtering
What you SEND to Lucene/Solr:
Document   “Users who bought this product” field
doc1       user1, user4, user5
doc2       user2, user3
doc3       user4
doc4       user4, user5
doc5       user4, user1
…          …

How the content is INDEXED into Lucene/Solr (conceptually):
Term    Documents
user1   doc1, doc5
user2   doc2
user3   doc2
user4   doc1, doc3, doc4, doc5
user5   doc1, doc4
…       …
Step 1: Find similar users who like the same documents

q=documentid:("doc1" OR "doc4")

Document   “Users who bought this product” field
doc1       user1, user4, user5
doc2       user2, user3
doc3       user4
doc4       user4, user5
doc5       user4, user1
…          …

*Source: Solr in Action, chapter 16

Matching documents:
doc1 → user1, user4, user5
doc4 → user4, user5

Top-scoring results (most similar users):
1) user4 (2 shared likes)
2) user5 (2 shared likes)
3) user1 (1 shared like)

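Step 1 can be reproduced in a few lines of Python: for the seed documents, count how many of them each user liked. This is a simplified stand-in for what the Solr request surfaces here (users ranked by shared likes); Solr's own scores would also include tf-idf factors, and the tie between user4 and user5 is broken alphabetically in this sketch. Names are hypothetical.

```python
from collections import Counter

# Hypothetical data: the "Users who bought this product" field from the example above
LIKES = {
    "doc1": ["user1", "user4", "user5"],
    "doc2": ["user2", "user3"],
    "doc3": ["user4"],
    "doc4": ["user4", "user5"],
    "doc5": ["user4", "user1"],
}


def similar_users(seed_docs, likes=LIKES):
    """Rank users by how many of the seed documents they also liked."""
    shared = Counter(user for doc in seed_docs for user in likes.get(doc, []))
    return sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))
```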
Step 2: Search for docs “liked” by those similar users

Most similar users:
1) user4 (2 shared likes)
2) user5 (2 shared likes)
3) user1 (1 shared like)

/solr/select/?q=userlikes:("user4"^2 OR "user5"^2 OR "user1"^1)

Term    Documents
user1   doc1, doc5
user2   doc2
user3   doc2
user4   doc1, doc3, doc4, doc5
user5   doc1, doc4
…       …

*Source: Solr in Action, chapter 16

Top recommended documents:
1) doc1 (matches user4, user5, user1)
2) doc4 (matches user4, user5)
3) doc5 (matches user4, user1)
4) doc3 (matches user4)
// doc2 does not match
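Step 2 in the same style: each similar user becomes a weighted clause (its shared-like count), and a document's score here is simply the sum of the weights of the users who liked it. That is a simplification of Lucene scoring, which would also apply tf-idf and coord, but for this data it yields the same ordering as the slide. Names are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: user -> documents they liked (the indexed view above)
USER_LIKES = {
    "user1": ["doc1", "doc5"],
    "user2": ["doc2"],
    "user3": ["doc2"],
    "user4": ["doc1", "doc3", "doc4", "doc5"],
    "user5": ["doc1", "doc4"],
}


def recommend(user_weights, user_likes=USER_LIKES):
    """Score each document by summing the boosts of the similar users who liked it."""
    scores = defaultdict(float)
    for user, weight in user_weights.items():
        for doc in user_likes.get(user, []):
            scores[doc] += weight
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


RECS = recommend({"user4": 2, "user5": 2, "user1": 1})
```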
Building up to personalization
•  Use what you have:
–  User’s keywords, IP address, searches, clicks, “likes” (purchases, job applications, comments, etc.)
–  Build up a dossier of information on your users
–  If a user gives you a profile (resume, social profile, etc.), even better.
For full coverage of building a recommendation engine in Solr…
•  See my talk from Lucene Revolution 2012 (Boston):
Personalized Search
•  Why limit yourself to JUST explicit search or JUST automated recommendations?
•  By augmenting your users’ explicit queries with information you know about them, you can personalize their search results.
•  Examples:
–  A known software engineer runs a blank job search in New York…
•  Why not show software engineering jobs higher in the results?
–  A new user runs a keyword-only search for nurse
•  Why not use the user’s IP address to boost documents geographically closer?
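One way to express these ideas in a request is with edismax boost parameters. The sketch below is a hypothetical helper: the jobtitle and location field names, the boost of 5, and the distance-decay function recip(geodist(),2,200,20) (the shape used in Solr's spatial search examples) are assumptions, and resolving an IP address to coordinates is left to whatever geo-IP lookup you already have.

```python
from urllib.parse import urlencode

# Hypothetical helper: field names and boost values are illustrative assumptions.


def personalized_params(keywords, profile=None, lat_lon=None):
    """Build edismax params that keep the user's query but boost by what we know about them."""
    params = [("defType", "edismax"), ("q", keywords.strip() or "*:*")]
    if profile and profile.get("jobtitle"):
        # Boost (rather than filter) jobs matching the user's known title
        params.append(("bq", 'jobtitle:"%s"^5' % profile["jobtitle"]))
    if lat_lon:
        lat, lon = lat_lon
        # Additive boost that decays with distance from the user's (e.g. IP-derived) location
        params += [("sfield", "location"), ("pt", "%s,%s" % (lat, lon)),
                   ("bf", "recip(geodist(),2,200,20)")]
    return params


blank_search = personalized_params("", profile={"jobtitle": "software engineer"})
nurse_search = personalized_params("nurse", lat_lon=(42.36, -71.06))
query_string = urlencode(nurse_search)  # ready to append to /select?
```

Using bq and bf instead of fq keeps every matching job in the results and only reorders them, so a wrong guess about the user degrades gracefully.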
Semantic Search
Not going to talk about…
•  Using the SynonymFilter
•  Automatic language detection
•  Stemming/lemmatization/multi-lingual search
•  Stopwords
(For all of the above, see the Solr Wiki, Reference Guide, or read Solr in Action)
•  Instead, we’re going to cover:
–  Mining user behavior to discover synonyms/related queries
–  Discovering related concepts using document clustering in Solr
–  Future work: Latent Semantic Indexing
–  Document to Document searching using More Like This
–  Foreground/Background corpus analysis
Automatic Synonym Discovery
•  Our primary approach: Search Co-occurrences
•  Strategy: Map/Reduce job which computes similar searches run for the same users
John searched for “java developer” and “j2ee”
Jane searched for “registered nurse” and “r.n.” and “prn”
Zeke searched for “java developer” and “scala” and “jvm”
•  By mining tens of millions of search terms per day, we get a list of top searches, with the corresponding top co-occurring searches.
•  We also tie each search term to the top category of jobs (e.g. java developer, truck driver, etc.), so that we know in what context people search for each term.
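The co-occurrence strategy can be sketched as a tiny in-memory map/reduce: the map step emits every ordered pair of distinct searches run by the same user, and the reduce step counts the pairs. This is an illustrative stand-in for the Hadoop job, run on the three example users above, with hypothetical function names; tying searches to job categories is left out.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical search logs taken from the example users above
SEARCH_LOGS = {
    "john": ["java developer", "j2ee"],
    "jane": ["registered nurse", "r.n.", "prn"],
    "zeke": ["java developer", "scala", "jvm"],
}


def map_searches(user, searches):
    """Map (keyed by user): emit (search, co-occurring search) for each ordered pair."""
    for a, b in permutations(sorted(set(s.lower() for s in searches)), 2):
        yield a, b


def reduce_pairs(pairs):
    """Reduce: count how often each search co-occurs with each other search."""
    related = defaultdict(Counter)
    for a, b in pairs:
        related[a][b] += 1
    return related


RELATED = reduce_pairs(pair for user, searches in SEARCH_LOGS.items()
                       for pair in map_searches(user, searches))
```

Over millions of users, RELATED[term].most_common(n) gives the ranked "related search terms" lists shown on the next slide.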
Example of “related search terms”

Example: “RN”:
registered nurse 6588,
rn registered nurse 4300,
nurse 2492,
nursing 912,
lpn 707,
healthcare 453,
rn case manager 446,
registered nurse rn 404,
director of nursing 321,
case manager 292

Example: “accounting”:
accountant 8880,
accounts payable 5235,
finance 3675,
accounting clerk 3651,
bookkeeper 3225,
controller 2898,
staff accountant 2866,
accounts receivable 2842
Future work on building conceptual links
Latent Semantic Indexing
•  Concept: Build a term-document matrix, perform singular value decomposition on that matrix to reduce the number of dimensions, and index the meaningful (i.e. blurred) terms on each document.
•  Why this matters: if done correctly, the search engine can automatically collapse terms by meaning, remove the useless and redundant ones, and form its own conceptual model of your domain space. This can be used to infuse more meaning into a document than just a keyword.
•  See blog posts and presentations by John Berryman and Doug Turnbull about their work on this. They’re leading the way on this right now (in the open-source community).
•  http://www.opensourceconnections.com/2013/08/25/semantic-search-with-solr-and-python-numpy
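To illustrate the SVD step, here is a pure-Python sketch on an invented five-term, five-document matrix. LSI work would normally call a linear-algebra library (for example NumPy's numpy.linalg.svd); to stay dependency-free, this version finds the top left singular vectors by power iteration on A·Aᵀ with deflation, since the singular values of A are the square roots of the eigenvalues of A·Aᵀ. The data and helper names are hypothetical.

```python
import math

# Hypothetical toy data and helper names, for illustration only.
TERMS = ["java", "j2ee", "scala", "nurse", "rn"]
# Term-document matrix A: rows are terms, columns are invented documents d1..d5
A = [
    [1, 1, 1, 0, 0],  # java
    [1, 0, 1, 0, 0],  # j2ee
    [0, 1, 1, 0, 0],  # scala
    [0, 0, 0, 1, 1],  # nurse
    [0, 0, 0, 1, 1],  # rn
]


def term_term(a):
    """Compute A * A^T (term-by-term co-occurrence)."""
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in a] for r1 in a]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def top_eigenpair(m, iters=500):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    v = [1.0] * len(m)
    for _ in range(iters):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(x * y for x, y in zip(v, matvec(m, v)))
    return lam, v


def lsi_concepts(a, k):
    """Top-k (singular value, term loadings) pairs of A via deflated power iteration."""
    m = term_term(a)
    concepts = []
    for _ in range(k):
        lam, v = top_eigenpair(m)
        concepts.append((math.sqrt(max(lam, 0.0)), v))
        # Deflate: subtract this concept so the next pass finds the next strongest one
        n = len(v)
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return concepts


CONCEPTS = lsi_concepts(A, 2)
```

The first concept loads only on java/j2ee/scala and the second only on nurse/rn; projecting documents onto these concepts gives the reduced, "blurred" representation that could then be indexed.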
Using Clustering to find semantic links
Setting up Clustering in solrconfig.xml
<searchComponent name="clustering" enable="true" class="solr.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">default</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <str name="MultilingualClustering.defaultLanguage">ENGLISH</str>
  </lst>
</searchComponent>

<requestHandler name="/clustering" enable="true" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <str name="clustering.engine">default</str>
    <bool name="clustering.results">true</bool>
    <str name="fl">*,score</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
Clustering Query

/solr/clustering/?q=(solr or lucene)
&rows=100
&carrot.title=titlefield
&carrot.snippet=titlefield
&LingoClusteringAlgorithm.desiredClusterCountBase=25
//clustering & grouping don’t currently play nicely
Allows you to dynamically identify “concepts” and their
prevalence within a user’s top search results
Clustering Results

Stage 1: Identify Concepts
Original Query: q=(solr or lucene)
// can be a user’s search, their job title, a list of skills,
// or any other keyword-rich data source

Clusters Identified:
Developer (22)
Java Developer (13)
Software (10)
Senior Java Developer (9)
Architect (6)
Software Engineer (6)
Web Developer (5)
Search (3)
Software Developer (3)
Systems (3)
Administrator (2)
Hadoop Engineer (2)
Java J2EE (2)
Search Development (2)
Software Architect (2)
Solutions Architect (2)
Stage 2: Use Semantic Links in your relevancy calculation
q=content:("Developer"^22 OR "Java Developer"^13 OR "Software"^10 OR "Senior Java Developer"^9 OR "Architect"^6 OR "Software Engineer"^6 OR "Web Developer"^5 OR "Search"^3 OR "Software Developer"^3 OR "Systems"^3 OR "Administrator"^2 OR "Hadoop Engineer"^2 OR "Java J2EE"^2 OR "Search Development"^2 OR "Software Architect"^2 OR "Solutions Architect"^2)

// You can also add the user’s location or the original keywords to the
// recommendations search if it helps result quality for your use case.
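Stage 2 is mechanical once you have the labels and document counts from Stage 1. A minimal Python sketch, assuming the clusters have already been pulled out of the clustering response as (label, count) pairs (the response parsing is not shown): each label becomes a quoted phrase boosted by its cluster size. The function name is hypothetical.

```python
# Hypothetical helper: turns Stage 1 cluster labels into the Stage 2 query string.


def concept_query(field: str, clusters) -> str:
    """Turn (label, doc_count) pairs into a boosted OR query over one field."""
    clauses = []
    for label, count in clusters:
        # Escape backslashes and double quotes so each label stays a single phrase
        phrase = label.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append('"%s"^%d' % (phrase, count))
    return "%s:(%s)" % (field, " OR ".join(clauses))


CLUSTERS = [("Developer", 22), ("Java Developer", 13), ("Software", 10),
            ("Senior Java Developer", 9), ("Architect", 6)]
Q = concept_query("content", CLUSTERS)
```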
Document to Document Searching

Goal: use an entire document as your Solr Query, recommending
other related documents.
Standard approach: More Like This Handler
Alternative Approach: Foreground vs. Background corpus analysis
More Like This (Query)
solrconfig.xml:
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />
Query:
/solr/jobs/mlt/?df=jobdescription&
fl=id,jobtitle&
rows=3&
q=J2EE&                              // recommendations based on top scoring doc
mlt.fl=jobtitle,jobdescription&      // inspect these fields for interesting terms
mlt.interestingTerms=details&        // return the interesting terms
mlt.boost=true

*Example from chapter 16 of Solr in Action
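The interesting terms this request returns can be approximated in a few lines. As a simplified sketch: each term in the source document is scored roughly by tf × idf (using the idf formula from earlier), and with mlt.boost=true each term's boost appears to be its score divided by the best score, which is why the top term shows 1.0. The document frequencies below are invented, the names are hypothetical, and Lucene's MoreLikeThis also applies thresholds (minimum term and document frequency, a cap on query terms) that this sketch only partly models.

```python
import math
import re
from collections import Counter

# Hypothetical simplification of MoreLikeThis term selection (not Lucene's code).


def idf(doc_freq: int, num_docs: int) -> float:
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def interesting_terms(text, doc_freqs, num_docs, max_terms=10, min_tf=1, stopwords=()):
    """Rank a document's terms by tf * idf, normalized so the best term scores 1.0."""
    tf = Counter(t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in stopwords)
    scored = {t: c * idf(doc_freqs.get(t, 0), num_docs)
              for t, c in tf.items() if c >= min_tf}
    if not scored:
        return []
    best = max(scored.values())
    ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:max_terms]
    return [(t, s / best) for t, s in ranked]


TEXT = "solr search solr enterprise search is the search"
DFS = {"solr": 5, "search": 50, "enterprise": 20, "is": 900, "the": 990}
TERMS = interesting_terms(TEXT, DFS, num_docs=1000)
```

Common words such as "is" and "the" still receive small non-zero weights unless they are filtered out, which matches the noise visible in the interesting-terms lists that follow.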
More Like This (Results)
{"match":{"numFound":122,"start":0,"docs":[
    {"id":"fc57931d42a7ccce3552c04f3db40af8dabc99dc",
     "jobtitle":"Senior Java / J2EE Developer"}]
 },
 "response":{"numFound":2225,"start":0,"docs":[
    {"id":"0e953179408d710679e5ddbd15ab0dfae52ffa6c",
     "jobtitle":"Sr Core Java Developer"},
    {"id":"5ce796c758ee30ed1b3da1fc52b0595c023de2db",
     "jobtitle":"Applications Developer"},
    {"id":"1e46dd6be1750fc50c18578b7791ad2378b90bdd",
     "jobtitle":"Java Architect/ Lead Java Developer WJAV Java - Java in Pittsburgh PA"}]},
 "interestingTerms":[
    "jobdescription:j2ee",1.0,
    "jobdescription:java",0.68131137,
    "jobdescription:senior",0.52161527,
    "jobtitle:developer",0.44706684,
    "jobdescription:source",0.2417754,
    "jobdescription:code",0.17976432,
    "jobdescription:is",0.17765637,
    "jobdescription:client",0.17331646,
    "jobdescription:our",0.11985878,
    "jobdescription:for",0.07928475,
    "jobdescription:a",0.07875194,
    "jobdescription:to",0.07741922,
    "jobdescription:and",0.07479082]}
More Like This (passing in external document)

/solr/jobs/mlt/?df=jobdescription&
fl=id,jobtitle&
mlt.fl=jobtitle,jobdescription&
mlt.interestingTerms=details&
mlt.boost=true&
stream.body=Solr is an open source enterprise search platform from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Providing distributed search and index replication, Solr is highly scalable. Solr is the most popular enterprise search engine. Solr 4 adds NoSQL features.
More Like This (Results)
{"response":{"numFound":2221,"start":0,"docs":[
    {"id":"eff5ac098d056a7ea6b1306986c3ae511f2d0d89",
     "jobtitle":"Enterprise Search Architect…"},
    {"id":"37abb52b6fe63d601e5457641d2cf5ae83fdc799",
     "jobtitle":"Sr. Java Developer"},
    {"id":"349091293478dfd3319472e920cf65657276bda4",
     "jobtitle":"Java Lucene Software Engineer"}]},
 "interestingTerms":[
    "jobdescription:search",1.0,
    "jobdescription:solr",0.9155779,
    "jobdescription:features",0.36472517,
    "jobdescription:enterprise",0.30173126,
    "jobdescription:is",0.17626463,
    "jobdescription:the",0.102924034,
    "jobdescription:and",0.098939896]}
CareerBuilder’s Alternative approach (“enhanced” More Like This)
I. Send document as content stream to Solr
II. Perform Language Identification on the content
III. Do language-specific parts of speech detection
•  Keep nouns, remove other parts of speech (removes noise)
IV. Do analysis of additional terms for statistical significance:
tf * idf OR foreground vs. background corpus comparison OR Both
Preferred statistical significance measure:

$$z = \frac{\mathrm{countFG}(x) - \mathrm{totalCountFG} \cdot \mathrm{probBG}(x)}{\sqrt{\mathrm{totalCountFG} \cdot \mathrm{probBG}(x) \cdot \left(1 - \mathrm{probBG}(x)\right)}}$$

V. Return top scoring terms
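A direct Python transcription of the z formula, plus a helper that ranks foreground terms against a background corpus. The counts in the example are invented for illustration and the names are hypothetical; terms that never appear in the background are skipped here because the denominator would be zero (how the production system handles them is not described).

```python
import math

# Hypothetical helpers implementing the foreground vs. background z-score above.


def z_score(count_fg: int, total_fg: int, prob_bg: float) -> float:
    """z = (countFG(x) - totalCountFG * probBG(x)) / sqrt(totalCountFG * probBG(x) * (1 - probBG(x)))."""
    expected = total_fg * prob_bg
    std_dev = math.sqrt(total_fg * prob_bg * (1.0 - prob_bg))
    return (count_fg - expected) / std_dev


def significant_terms(fg_counts, total_fg, bg_counts, total_bg, top_n=3):
    """Rank foreground terms by how much more often they occur than the background predicts."""
    scored = []
    for term, count in fg_counts.items():
        prob_bg = bg_counts.get(term, 0) / total_bg
        if 0.0 < prob_bg < 1.0:  # z is undefined when probBG is 0 or 1
            scored.append((term, z_score(count, total_fg, prob_bg)))
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:top_n]


# Invented counts: foreground = "software engineer" documents, background = all documents
FG = {"java": 120, "ruby": 40, "and": 300, "nurse": 2}
BG = {"java": 2000, "ruby": 500, "and": 30000, "nurse": 5000}
TOP = significant_terms(FG, 10000, BG, 1000000)
```

A stopword like "and" lands near z = 0 because it is as common in the foreground as the background predicts, while "nurse" goes strongly negative; no stopword list is needed to suppress either.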
Foreground vs. Background Corpus Comparison
/solr/doc2doc?fg=category:"software engineer"&bg=*:*&stream.body=java nurse and is are was were ruby php solr oncology part-time … other text in a really long document
Terms statistically more likely to appear in foreground query than background query:
java
ruby
php
We are essentially boosting terms which are more related to some known feature (and ignoring terms which are equally likely to appear in the background corpus).
Note: This method requires you to pre-classify your documents (which we do)… it doesn’t work with a document that hasn’t already been classified.
Pulling it all together

[Diagram: Traditional Search, Personalized Search, Semantic Search, and Recommendations, combining into “Profit!”]
Take-aways
•  Lucene’s inverted index is a sparse matrix useful for traditional search (keywords, locations, etc.), recommendations, and discovering links between terms/tokens
•  Traditional tf * idf keyword search is a good starting point, but the best relevancy lies in combining your domain knowledge (knowledge of users in aggregate) and user-specific knowledge into your own relevancy factors.
•  The ability to understand user queries (semantic search) further enhances the search experience, and you already have many tools at your fingertips for this.
Questions?

Trey Grainger
trey.grainger@careerbuilder.com
@treygrainger

Other presentations:
http://www.treygrainger.com

http://solrinaction.com

seanresume15-a
 
UX for Lean Startups May 23
UX for Lean Startups May 23UX for Lean Startups May 23
UX for Lean Startups May 23
 
Nayoon_Sams-Resume
Nayoon_Sams-ResumeNayoon_Sams-Resume
Nayoon_Sams-Resume
 
Transition to UX: Panel of Local UX Leaders Hosted by IxDA Chicago and Genera...
Transition to UX: Panel of Local UX Leaders Hosted by IxDA Chicago and Genera...Transition to UX: Panel of Local UX Leaders Hosted by IxDA Chicago and Genera...
Transition to UX: Panel of Local UX Leaders Hosted by IxDA Chicago and Genera...
 
Redesigning the Drupal Issue Queue (Codename Prairie: a Social Architecture P...
Redesigning the Drupal Issue Queue (Codename Prairie: a Social Architecture P...Redesigning the Drupal Issue Queue (Codename Prairie: a Social Architecture P...
Redesigning the Drupal Issue Queue (Codename Prairie: a Social Architecture P...
 
University web environments
University web environmentsUniversity web environments
University web environments
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Conversion Conference Chicago - Guerrilla UX Methods
Conversion Conference Chicago - Guerrilla UX MethodsConversion Conference Chicago - Guerrilla UX Methods
Conversion Conference Chicago - Guerrilla UX Methods
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
 
Resume
ResumeResume
Resume
 
IbrahimUpdated_resume
IbrahimUpdated_resumeIbrahimUpdated_resume
IbrahimUpdated_resume
 
Ezio Magarotto UI, UX, IA Resume
Ezio Magarotto UI, UX, IA ResumeEzio Magarotto UI, UX, IA Resume
Ezio Magarotto UI, UX, IA Resume
 
Bryan Daniel UX Portfolio
Bryan Daniel UX PortfolioBryan Daniel UX Portfolio
Bryan Daniel UX Portfolio
 
South Big Data Hub: Text Data Analysis Panel
South Big Data Hub: Text Data Analysis PanelSouth Big Data Hub: Text Data Analysis Panel
South Big Data Hub: Text Data Analysis Panel
 

Similaire à Enhancing relevancy through personalization & semantic search

CPGjobs information Packet
CPGjobs information PacketCPGjobs information Packet
CPGjobs information PacketMichael Carrillo
 
Mark Tortoricci - Talent42 2015
Mark Tortoricci - Talent42 2015Mark Tortoricci - Talent42 2015
Mark Tortoricci - Talent42 2015Talent42
 
Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solrTrey Grainger
 
Building a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrBuilding a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrlucenerevolution
 
Creating Consistency in ​ Compensation with Global Job Leveling
Creating Consistency in ​ Compensation with Global Job LevelingCreating Consistency in ​ Compensation with Global Job Leveling
Creating Consistency in ​ Compensation with Global Job LevelingPayScale, Inc.
 
Re-engineered Job Postings
Re-engineered Job PostingsRe-engineered Job Postings
Re-engineered Job PostingsRick Stomphorst
 
SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 CareerBuilder.com
 
TCS Interview Questions and Answers 2022 | How to Crack TCS Interview for Fre...
TCS Interview Questions and Answers 2022 | How to Crack TCS Interview for Fre...TCS Interview Questions and Answers 2022 | How to Crack TCS Interview for Fre...
TCS Interview Questions and Answers 2022 | How to Crack TCS Interview for Fre...Simplilearn
 
Graphs fun vjug2
Graphs fun vjug2Graphs fun vjug2
Graphs fun vjug2Neo4j
 
Fostering Long-Term Test Automation Success
Fostering Long-Term Test Automation SuccessFostering Long-Term Test Automation Success
Fostering Long-Term Test Automation SuccessJosiah Renaudin
 
How To Become A Big Data Engineer | Big Data Engineer Skills, Roles & Respons...
How To Become A Big Data Engineer | Big Data Engineer Skills, Roles & Respons...How To Become A Big Data Engineer | Big Data Engineer Skills, Roles & Respons...
How To Become A Big Data Engineer | Big Data Engineer Skills, Roles & Respons...Simplilearn
 
Using ML and Azure to improve Customer Lifetime Value
Using ML and Azure to improve Customer Lifetime ValueUsing ML and Azure to improve Customer Lifetime Value
Using ML and Azure to improve Customer Lifetime ValueNavin Albert
 
Graphs & Big Data - Philip Rathle and Andreas Kollegger @ Big Data Science Me...
Graphs & Big Data - Philip Rathle and Andreas Kollegger @ Big Data Science Me...Graphs & Big Data - Philip Rathle and Andreas Kollegger @ Big Data Science Me...
Graphs & Big Data - Philip Rathle and Andreas Kollegger @ Big Data Science Me...Neo4j
 
CI or FS Poly Cleared Job Fair Handbook | May 11
CI or FS Poly Cleared Job Fair Handbook | May 11CI or FS Poly Cleared Job Fair Handbook | May 11
CI or FS Poly Cleared Job Fair Handbook | May 11ClearedJobs.Net
 
Engineering Velocity @indeed eng presented on Sept 24 2014 at Beyond Agile
Engineering Velocity @indeed eng presented on Sept 24 2014 at Beyond AgileEngineering Velocity @indeed eng presented on Sept 24 2014 at Beyond Agile
Engineering Velocity @indeed eng presented on Sept 24 2014 at Beyond AgileKenAtIndeed
 
The Relevance of the Apache Solr Semantic Knowledge Graph
The Relevance of the Apache Solr Semantic Knowledge GraphThe Relevance of the Apache Solr Semantic Knowledge Graph
The Relevance of the Apache Solr Semantic Knowledge GraphTrey Grainger
 
Neo4j GraphTalk Düsseldorf - How Graphs revolutionise Identity & Access Manag...
Neo4j GraphTalk Düsseldorf - How Graphs revolutionise Identity & Access Manag...Neo4j GraphTalk Düsseldorf - How Graphs revolutionise Identity & Access Manag...
Neo4j GraphTalk Düsseldorf - How Graphs revolutionise Identity & Access Manag...Neo4j
 

Similaire à Enhancing relevancy through personalization & semantic search (20)

CPGjobs information Packet
CPGjobs information PacketCPGjobs information Packet
CPGjobs information Packet
 
Mark Tortoricci - Talent42 2015
Mark Tortoricci - Talent42 2015Mark Tortoricci - Talent42 2015
Mark Tortoricci - Talent42 2015
 
Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solr
 
Building a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrBuilding a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solr
 
Creating Consistency in ​ Compensation with Global Job Leveling
Creating Consistency in ​ Compensation with Global Job LevelingCreating Consistency in ​ Compensation with Global Job Leveling
Creating Consistency in ​ Compensation with Global Job Leveling
 
Re-engineered Job Postings
Re-engineered Job PostingsRe-engineered Job Postings
Re-engineered Job Postings
 
SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018
 
TCS Interview Questions and Answers 2022 | How to Crack TCS Interview for Fre...
TCS Interview Questions and Answers 2022 | How to Crack TCS Interview for Fre...TCS Interview Questions and Answers 2022 | How to Crack TCS Interview for Fre...
TCS Interview Questions and Answers 2022 | How to Crack TCS Interview for Fre...
 
Resume
ResumeResume
Resume
 
Graphs fun vjug2
Graphs fun vjug2Graphs fun vjug2
Graphs fun vjug2
 
Rahul dutta
Rahul duttaRahul dutta
Rahul dutta
 
Fostering Long-Term Test Automation Success
Fostering Long-Term Test Automation SuccessFostering Long-Term Test Automation Success
Fostering Long-Term Test Automation Success
 
How To Become A Big Data Engineer | Big Data Engineer Skills, Roles & Respons...
How To Become A Big Data Engineer | Big Data Engineer Skills, Roles & Respons...How To Become A Big Data Engineer | Big Data Engineer Skills, Roles & Respons...
How To Become A Big Data Engineer | Big Data Engineer Skills, Roles & Respons...
 
SearchLab
SearchLabSearchLab
SearchLab
 
Using ML and Azure to improve Customer Lifetime Value
Using ML and Azure to improve Customer Lifetime ValueUsing ML and Azure to improve Customer Lifetime Value
Using ML and Azure to improve Customer Lifetime Value
 
Graphs & Big Data - Philip Rathle and Andreas Kollegger @ Big Data Science Me...
Graphs & Big Data - Philip Rathle and Andreas Kollegger @ Big Data Science Me...Graphs & Big Data - Philip Rathle and Andreas Kollegger @ Big Data Science Me...
Graphs & Big Data - Philip Rathle and Andreas Kollegger @ Big Data Science Me...
 
CI or FS Poly Cleared Job Fair Handbook | May 11
CI or FS Poly Cleared Job Fair Handbook | May 11CI or FS Poly Cleared Job Fair Handbook | May 11
CI or FS Poly Cleared Job Fair Handbook | May 11
 
Engineering Velocity @indeed eng presented on Sept 24 2014 at Beyond Agile
Engineering Velocity @indeed eng presented on Sept 24 2014 at Beyond AgileEngineering Velocity @indeed eng presented on Sept 24 2014 at Beyond Agile
Engineering Velocity @indeed eng presented on Sept 24 2014 at Beyond Agile
 
The Relevance of the Apache Solr Semantic Knowledge Graph
The Relevance of the Apache Solr Semantic Knowledge GraphThe Relevance of the Apache Solr Semantic Knowledge Graph
The Relevance of the Apache Solr Semantic Knowledge Graph
 
Neo4j GraphTalk Düsseldorf - How Graphs revolutionise Identity & Access Manag...
Neo4j GraphTalk Düsseldorf - How Graphs revolutionise Identity & Access Manag...Neo4j GraphTalk Düsseldorf - How Graphs revolutionise Identity & Access Manag...
Neo4j GraphTalk Düsseldorf - How Graphs revolutionise Identity & Access Manag...
 

Plus de lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadooplucenerevolution
 
A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...lucenerevolution
 
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting PlatformHow Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting Platformlucenerevolution
 
Query Latency Optimization with Lucene
Query Latency Optimization with LuceneQuery Latency Optimization with Lucene
Query Latency Optimization with Lucenelucenerevolution
 

Plus de lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadoop
 
A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...
 
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting PlatformHow Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
 
Query Latency Optimization with Lucene
Query Latency Optimization with LuceneQuery Latency Optimization with Lucene
Query Latency Optimization with Lucene
 

Dernier

Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Brian Pichman
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-pyJamie (Taka) Wang
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 

Dernier (20)

Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 

Enhancing relevancy through personalization & semantic search

  • 1. ENHANCING RELEVANCY THROUGH PERSONALIZATION & SEMANTIC SEARCH. Trey Grainger, Search Technology Development Manager @ CareerBuilder. Dublin, IE, 2013.11.07
  • 2. My Background. Trey Grainger, Search Technology Development Manager @ CareerBuilder.com. Relevant background: • Search & Recommendations • High-volume, Distributed Systems • NLP, Relevancy Tuning, User Group Testing, & Machine Learning. Other projects: • Co-author: Solr in Action • Founder and Chief Engineer @ ….com
  • 3. Roadmap • I. How we use Solr @ CareerBuilder • II. Traditional Relevancy Scoring • III. Advanced Relevancy through functions (factors as a linear function; context-aware relevancy parameter weighting) • IV. Personalization & Recommendations (profile and behavior-based; Solr as a recommendation engine; collaborative filtering) • V. Semantic Search (mining user behavior for synonyms; uncovering meaning through clustering; Latent Semantic Indexing overview; document-based searching; foreground vs. background analysis)
  • 10. Data Analytics (labor pressure: supply/demand)
  • 11. Data Analytics (hiring comparison per market)
  • 15. Default Lucene Relevancy Algorithm (DefaultSimilarity):
  $$\mathrm{score}(q,d) = \Big( \sum_{t \in q} \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^2 \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,d) \Big) \cdot \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q)$$
  Where t = term; d = document; q = query; f = field, and:
  $$\mathrm{tf}(t \in d) = \mathrm{numTermOccurrencesInDocument}^{1/2}$$
  $$\mathrm{idf}(t) = 1 + \log\!\left(\frac{\mathrm{numDocs}}{\mathrm{docFreq} + 1}\right)$$
  $$\mathrm{coord}(q,d) = \frac{\mathrm{numTermsInDocumentFromQuery}}{\mathrm{numTermsInQuery}}$$
  $$\mathrm{queryNorm}(q) = \frac{1}{\mathrm{sumOfSquaredWeights}^{1/2}}$$
  $$\mathrm{sumOfSquaredWeights} = q.\mathrm{getBoost}()^2 \cdot \sum_{t \in q} \big(\mathrm{idf}(t) \cdot t.\mathrm{getBoost}()\big)^2$$
  $$\mathrm{norm}(t,d) = d.\mathrm{getBoost}() \cdot \mathrm{lengthNorm}(f) \cdot f.\mathrm{getBoost}()$$
  *Source: Solr in Action, chapter 3
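The formula can be checked by hand with a small Python sketch that follows the slide's definitions term by term for a single-field document. The lengthNorm(f) = 1/sqrt(number of terms in the field) used here is DefaultSimilarity's definition; unlike Lucene, the sketch keeps norms as exact floats instead of encoding them into a single byte, so absolute scores will differ slightly from what Solr reports. Function and parameter names are illustrative.

```python
import math

def tf(freq):
    # tf(t in d) = numTermOccurrencesInDocument ^ 1/2
    return math.sqrt(freq)

def idf(num_docs, doc_freq):
    # idf(t) = 1 + log(numDocs / (docFreq + 1)); natural log, like Java's Math.log
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def length_norm(field_length):
    # DefaultSimilarity: 1 / sqrt(number of terms in the field).
    # Lucene also encodes norms into one byte; this sketch keeps exact floats.
    return 1.0 / math.sqrt(field_length)

def score(query_terms, term_freqs, field_length, num_docs, doc_freqs,
          term_boosts=None, query_boost=1.0, doc_boost=1.0, field_boost=1.0):
    """Score one document (single field) against a list of query terms."""
    boosts = term_boosts or {}
    idfs = {t: idf(num_docs, doc_freqs.get(t, 0)) for t in query_terms}
    # norm(t, d) = d.getBoost() * lengthNorm(f) * f.getBoost()
    norm = doc_boost * length_norm(field_length) * field_boost
    # queryNorm(q) = 1 / sqrt(q.getBoost()^2 * sum((idf(t) * t.getBoost())^2))
    sum_sq = query_boost ** 2 * sum((idfs[t] * boosts.get(t, 1.0)) ** 2 for t in query_terms)
    query_norm = 1.0 / math.sqrt(sum_sq)
    matched = [t for t in query_terms if term_freqs.get(t, 0) > 0]
    # coord(q, d) = fraction of the query terms that occur in the document
    coord = len(matched) / len(query_terms)
    total = sum(tf(term_freqs[t]) * idfs[t] ** 2 * boosts.get(t, 1.0) * norm for t in matched)
    return total * coord * query_norm
```

Because idf is squared and coord penalizes partial matches, a document containing the rare term "solr" several times outscores one that only contains the common term "java".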
  • 16. TF * IDF •  Term Frequency: “How well a term describes a document?” –  Measure: how often a term occurs per document •  Inverse Document Frequency: “How important is a term overall?” –  Measure: how rare the term is across all documents
  • 17. Boosting documents and fields •  Certain fields may be more important than other fields: –  The Job Title and Skills may be more relevant than other aspects of the job: /select?qf=jobtitle^10 skills^5 jobrequirements^2 jobdescription^1 •  It’s possible to boost documents and fields at both index time and query time •  If you need more fine-grained control (such as per-term index-time boosting), you can make use of payloads
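A minimal Python sketch of assembling that request. qf is interpreted by Solr's dismax and edismax query parsers, so the sketch sets defType=edismax explicitly (an assumption: a request handler may already default to it). The keywords and the function name are illustrative.

```python
from urllib.parse import urlencode

def build_field_boosted_query(keywords, field_boosts, handler="/select"):
    """Build a query string that weights matches per field via qf=field^boost."""
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    params = {"q": keywords, "defType": "edismax", "qf": qf}
    # urlencode escapes '^' as %5E and spaces as '+'
    return handler + "?" + urlencode(params)

url = build_field_boosted_query(
    "java developer",
    {"jobtitle": 10, "skills": 5, "jobrequirements": 2, "jobdescription": 1},
)
```

Keeping the boosts in a dictionary makes it easy to vary them per request rather than hard-coding them in solrconfig.xml.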
  • 18. Custom scoring with Payloads • In addition to boosting search terms and fields, content within fields can also be boosted differently using payloads (requires a custom scoring implementation): design [1] / engineer [1] / really [ ] / great [ ] / job [ ] / ten [3] / years [3] / experience [3] / careerbuilder [2] / design [2], … jobtitle: bucket=[1] boost=10; company: bucket=[2] boost=4; jobdescription: bucket=[ ] weight=1; experience: bucket=[3] weight=1.5. We can pass a parameter to Solr at query time specifying the boost to apply to each bucket, e.g. …&bucketWeights=1:10;2:4;3:1.5;default:1; • This allows us to map many relevancy buckets to search terms at index time and adjust the weighting at query time without having to search across hundreds of fields. • By making all scoring parameters overridable at query time, we are able to do A/B testing to consistently improve our relevancy model.
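The arithmetic behind query-time bucket weights can be sketched in plain Python. This is not a Lucene payload Similarity: bucketWeights is the custom parameter from the slide, and representing the indexed field as (term, bucket) pairs plus the two helper functions are hypothetical stand-ins for the index-time payloads and the custom scorer.

```python
def parse_bucket_weights(spec):
    """Parse a spec such as '1:10;2:4;3:1.5;default:1;' into {bucket: weight}."""
    weights = {}
    for entry in spec.split(";"):
        if not entry.strip():
            continue  # tolerate the trailing ';'
        bucket, weight = entry.split(":", 1)
        weights[bucket.strip()] = float(weight)
    return weights

def payload_score(query_terms, tagged_tokens, weights):
    """Sum the query-time weight of every token whose term matches the query.

    Tokens without a bucket (None) or with an unknown bucket get the default weight.
    """
    default = weights.get("default", 1.0)
    return sum(weights.get(bucket, default)
               for term, bucket in tagged_tokens if term in query_terms)

# The token stream from the slide: (term, bucket) pairs
tokens = [("design", "1"), ("engineer", "1"), ("really", None), ("great", None),
          ("job", None), ("ten", "3"), ("years", "3"), ("experience", "3"),
          ("careerbuilder", "2"), ("design", "2")]
```

For the query "design experience", the two occurrences of "design" (buckets 1 and 2) and "experience" (bucket 3) contribute 10 + 4 + 1.5. Sending a different bucketWeights string changes the ranking without reindexing, which is what makes the A/B testing cheap.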
  • 19. That’s great, but what about domain-specific knowledge? •  News search: popularity and freshness drive relevance •  Restaurant search: geographical proximity and price range are critical •  Ecommerce: likelihood of a purchase is key •  Movie search: more popular titles are generally more relevant •  Job search: category of job, salary range, and geographical proximity matter •  TF * IDF of keywords can’t hold its own against good domain-specific relevance factors!
  • 21. Example of domain-specific relevancy calculation
    News website (each of the four function clauses is weighted at 25%):
    /select?
    fq=$myQuery&
    q=_query_:"{!func}scale(query($myQuery),0,100)"
    AND _query_:"{!func}div(100,map(geodist(),0,1,1))"
    AND _query_:"{!func}recip(rord(publicationDate),0,100,100)"
    AND _query_:"{!func}scale(popularity,0,100)"&
    myQuery="street festival"&
    sfield=location&
    pt=33.748,-84.391
    *Example from chapter 16 of Solr in Action
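The function queries above can be mimicked outside Solr to see how each factor behaves. This Python sketch reimplements the documented arithmetic of `scale` (normalizing across the whole result set), `recip` (a/(m*x + b)), and `map`, then sums four factors that each span roughly 0 to 100. The articles and their raw signals are hypothetical, `map_fn` is named to avoid shadowing Python's built-in `map`, and the freshness clause uses a hypothetical m=0.1 with age in days standing in for `rord(publicationDate)`.

```python
def scale(values, min_target, max_target):
    """Like Solr's scale(): map the smallest value to min_target, the largest to max_target."""
    lo, hi = min(values), max(values)
    if hi == lo:  # degenerate case: every value identical
        return [float(min_target)] * len(values)
    factor = (max_target - min_target) / (hi - lo)
    return [(v - lo) * factor + min_target for v in values]


def recip(x, m, a, b):
    """Like Solr's recip(): a / (m*x + b)."""
    return a / (m * x + b)


def map_fn(x, lo, hi, target, default=None):
    """Like Solr's map(): target if lo <= x <= hi, otherwise default (or x if no default)."""
    if lo <= x <= hi:
        return target
    return x if default is None else default


# Hypothetical raw signals for three news articles.
ARTICLES = {
    "a": {"keyword": 3.2, "km": 0.4, "age_days": 1, "popularity": 50},
    "b": {"keyword": 1.1, "km": 12.0, "age_days": 30, "popularity": 900},
    "c": {"keyword": 2.0, "km": 4.0, "age_days": 7, "popularity": 10},
}


def blended_scores(articles):
    """Sum four factors that each range over roughly 0..100 (25% weight apiece)."""
    ids = list(articles)
    keyword = scale([articles[i]["keyword"] for i in ids], 0, 100)
    popularity = scale([articles[i]["popularity"] for i in ids], 0, 100)
    scores = {}
    for i, kw, pop in zip(ids, keyword, popularity):
        art = articles[i]
        proximity = 100 / map_fn(art["km"], 0, 1, 1)       # div(100, map(geodist(),0,1,1))
        freshness = recip(art["age_days"], 0.1, 100, 100)  # hypothetical m=0.1
        scores[i] = kw + proximity + freshness + pop
    return scores


print(blended_scores(ARTICLES))
```

`map(geodist(),0,1,1)` turns any distance under 1 km into 1, so the division never hits zero and proximity caps at 100. Note that with m=0, as in the slide's `recip(...,0,100,100)`, recip evaluates to 100/100 = 1 for every document, so a nonzero m is needed for recency to influence the ranking.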
  • 22. Fancy boosting functions
    •  Separating “relevancy” and “filtering” from the query:
    q=_val_:"$keywords"&fq={!cache=false v=$keywords}&keywords=solr
    •  Keywords (50%) + distance (25%) + category (25%)
    q=_val_:"scale(mul(query($keywords),1),0,50)"
    AND _val_:"scale(sum($radiusInKm,mul(query($distance),-1)),0,25)"
    AND _val_:"scale(mul(query($category),1),0,25)"
    &keywords=solr
    &radiusInKm=48.28
    &distance=_val_:"geodist(latitudelongitude.latlon_is,33.77402,-84.29659)"
    &category=jobtitle:"java developer"
    &fq={!cache=false v=$keywords}
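The distance clause computes `radiusInKm - geodist(...)`, so closer jobs get larger values, and `scale` then maps that range onto 0 to 25. A Python sketch of the arithmetic, using a haversine great-circle distance (Solr's `geodist()` is haversine-based) with an approximate mean Earth radius of 6371 km. The three documents, their keyword/category scores, and their coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # approximate mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def scale(values, min_target, max_target):
    """Map the min..max of values onto min_target..max_target (like Solr's scale())."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [float(min_target)] * len(values)
    return [(v - lo) * (max_target - min_target) / (hi - lo) + min_target for v in values]


def blended_scores(docs, center, radius_km):
    """Keywords (50%) + closeness (25%) + category (25%), as in the slide."""
    keyword = scale([d["keyword"] for d in docs], 0, 50)
    closeness = scale(
        [radius_km - haversine_km(center[0], center[1], d["lat"], d["lon"]) for d in docs],
        0, 25)
    category = scale([d["category"] for d in docs], 0, 25)
    return [k + c + g for k, c, g in zip(keyword, closeness, category)]


CENTER = (33.77402, -84.29659)
DOCS = [
    {"keyword": 2.0, "lat": 33.77402, "lon": -84.29659, "category": 1.0},
    {"keyword": 1.0, "lat": 33.748, "lon": -84.391, "category": 0.0},
    {"keyword": 0.0, "lat": 34.0, "lon": -84.29659, "category": 0.0},
]
print(blended_scores(DOCS, CENTER, radius_km=48.28))
```

Because every factor is rescaled over the current result set, the 50/25/25 split holds regardless of the raw magnitudes of the keyword score or the distances.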
  • 23. Context aware relevancy Example: Willingness to relocate for a job [Chart comparing software engineers and food service workers]
  • 27. Beyond domain knowledge… consider per-user knowledge •  John lives in Boston but wants to move to New York or possibly another big city. He is currently a sales manager but wants to move towards business development. •  Irene is a bartender in Dublin and is only interested in jobs within 10 km of her location in the food service industry. •  Irfan is a software engineer in Atlanta and is interested in software engineering jobs at a Big Data company. He is happy to move across the U.S. for the right job. •  Jane is a nurse educator in Boston seeking between $40K and $60K working in the healthcare industry.
  • 28. Query for Jane
    Jane is a nurse educator in Boston seeking between $40K and $60K working in the healthcare industry
    http://localhost:8983/solr/jobs/select/?
    fl=jobtitle,city,state,salary&
    q=( jobtitle:"nurse educator"^25 OR jobtitle:(nurse educator)^10 )
    AND ( (city:"Boston" AND state:"MA")^15 OR state:"MA")
    AND _val_:"map(salary, 40000, 60000,10, 0)"
    *Example from chapter 16 of Solr in Action
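Generating such a query from a stored profile is mostly string assembly. A minimal Python sketch, assuming a hypothetical profile dict and helper names (`build_profile_query`, `salary_boost`); the boost values are the ones from the slide, and `salary_boost` mirrors what `map(salary,40000,60000,10,0)` returns. It only builds the URL and does not contact Solr.

```python
from urllib.parse import urlencode

# Hypothetical stored profile for Jane.
JANE = {"title": "nurse educator", "city": "Boston", "state": "MA",
        "min_salary": 40000, "max_salary": 60000}


def build_profile_query(p):
    """Assemble the slide's boosted query from a profile dict."""
    return (
        f'( jobtitle:"{p["title"]}"^25 OR jobtitle:({p["title"]})^10 )'
        f' AND ( (city:"{p["city"]}" AND state:"{p["state"]}")^15 OR state:"{p["state"]}" )'
        f' AND _val_:"map(salary,{p["min_salary"]},{p["max_salary"]},10,0)"'
    )


def salary_boost(salary, lo, hi, in_range=10, otherwise=0):
    """Same rule as map(salary, lo, hi, 10, 0): 10 inside the range, else 0."""
    return in_range if lo <= salary <= hi else otherwise


params = {"fl": "jobtitle,city,state,salary", "q": build_profile_query(JANE)}
url = "http://localhost:8983/solr/jobs/select/?" + urlencode(params)
print(url)
```

This explains the ordering in the results that follow: the Brighton posting at $71,359 still matches on title and state, but `map` gives it 0 instead of 10 for salary, so it ranks below the in-range jobs.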
  • 29. Search Results for Jane { ... "response":{"numFound":22,"start":0,"docs":[ {"jobtitle":"Clinical Educator (New England/ Boston)", "city":"Boston", "state":"MA", "salary":41503}, {"jobtitle":"Nurse Educator", "city":"Braintree", "state":"MA", "salary":56183}, {"jobtitle":"Nurse Educator", "city":"Brighton", "state":"MA", "salary":71359} …]}} *Example documents available @ http://github.com/treygrainger/solr-in-action/
  • 30. What did we just do? •  We built a recommendation engine! •  What is a recommendation engine? –  A system that uses known information (or information derived from that known information) to automatically suggest relevant content •  Our example was just an attribute-based recommendation… we’ll see that behavior-based recommendation (i.e., collaborative filtering) is also possible.
  • 31. Redefining “Search Engine” •  “Lucene is a high-performance, full-featured text search engine library…” Yes, but really… •  “Lucene is a high-performance, fully-featured token matching and scoring library… which can perform full-text searching.”
  • 32. Redefining “Search Engine” or, in machine learning speak: •  A Lucene index is a multi-dimensional sparse matrix… with very fast and powerful lookup capabilities. •  Think of each field as a matrix containing each term mapped to each document
  • 33. The Lucene Inverted Index (traditional text example)
    What you SEND to Lucene/Solr:
    Document | Content Field
    doc1 | once upon a time, in a land far, far away
    doc2 | the cow jumped over the moon.
    doc3 | the quick brown fox jumped over the lazy dog.
    doc4 | the cat in the hat
    doc5 | The brown cow said “moo” once.
    … | …
    How the content is INDEXED into Lucene/Solr (conceptually):
    Term | Documents
    a | doc1 [2x]
    brown | doc3 [1x], doc5 [1x]
    cat | doc4 [1x]
    cow | doc2 [1x], doc5 [1x]
    … | …
    once | doc1 [1x], doc5 [1x]
    over | doc2 [1x], doc3 [1x]
    the | doc2 [2x], doc3 [2x], doc4 [2x], doc5 [1x]
    … | …
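Viewing the index as a sparse term-by-document matrix, the conceptual indexing step can be sketched in a few lines of Python. Lowercasing and splitting on non-alphanumeric characters is an assumption standing in for a real Solr analyzer chain; each `index[term]` dict is one sparse row of the matrix, storing only the non-zero term frequencies.

```python
import re
from collections import Counter, defaultdict

DOCS = {
    "doc1": "once upon a time, in a land far, far away",
    "doc2": "the cow jumped over the moon.",
    "doc3": "the quick brown fox jumped over the lazy dog.",
    "doc4": "the cat in the hat",
    "doc5": 'The brown cow said "moo" once.',
}


def analyze(text):
    """Stand-in analyzer: lowercase, split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """term -> {doc_id: term frequency}: one sparse row of the term x doc matrix per term."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(analyze(text)).items():
            index[term][doc_id] = freq
    return index


index = build_inverted_index(DOCS)
for term in ["a", "brown", "cat", "cow", "once", "over", "the"]:
    print(term, index[term])
```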
  • 34. Matching text queries to text fields
    /solr/select/?q=jobcontent:"software engineer"
    Job Content Field
    Term | Documents
    … | …
    engineer | doc1, doc3, doc4, doc5
    … | …
    mechanical | doc2, doc4, doc6
    … | …
    software | doc1, doc3, doc4, doc7, doc8
    … | …
    [Diagram: the “software” and “engineer” document sets overlap on doc1, doc3, doc4; doc5 contains only “engineer”; doc7 and doc8 contain only “software”]
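Matching both terms reduces to intersecting their sorted posting lists, which yields doc1, doc3, and doc4; a phrase query additionally checks that the words are adjacent. A Python sketch using the slide's posting lists as integer doc ids. The term positions in `software_pos`/`engineer_pos` are hypothetical, since the slide doesn't show them, and are chosen so all three documents contain the exact phrase.

```python
def intersect(a, b):
    """Merge-intersect two sorted posting lists of doc ids."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def phrase_docs(first, second):
    """Docs where some position of `second` directly follows a position of `first`.

    first/second: {doc_id: [positions]} for each term.
    """
    hits = []
    for doc in intersect(sorted(first), sorted(second)):
        following = set(second[doc])
        if any(p + 1 in following for p in first[doc]):
            hits.append(doc)
    return hits


SOFTWARE = [1, 3, 4, 7, 8]
ENGINEER = [1, 3, 4, 5]
print(intersect(SOFTWARE, ENGINEER))

# Hypothetical positions of each term within each document.
software_pos = {1: [0], 3: [4], 4: [2], 7: [1], 8: [0]}
engineer_pos = {1: [1], 3: [5], 4: [3], 5: [0]}
print(phrase_docs(software_pos, engineer_pos))
```

A document containing both words far apart would pass the intersection but fail the adjacency check, which is the difference between `software AND engineer` and `"software engineer"`.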
  • 35. Beyond Text Searching •  Lucene/Solr is a search matching engine •  When Lucene/Solr searches text, it matches tokens in the query with tokens in the index •  Anything that can be searched upon can form the basis of matching and scoring: –  text, attributes, locations, results of functions, user behavior, classifications, etc.
  • 36. Approaches to Recommendations •  Content-based –  Attribute-based, e.g. income level, hobbies, location, experience –  Hierarchical, e.g. “medical//nursing//oncology”, “animal//dog//terrier” –  Textual similarity, e.g. Solr’s MoreLikeThis Request Handler & Search Handler –  Concept-based, e.g. Solr => “software engineer”, “java”, “search”, “open source” •  Collaborative Filtering: “Users who liked that also liked this…” •  Hybrid Approaches
  • 38. Step 1: Find similar users who like the same documents
    q=documentid:("doc1" OR "doc4")
    Document | “Users who bought this product” field
    doc1 | user1, user4, user5
    doc2 | user2, user3
    doc3 | user4
    doc4 | user4, user5
    doc5 | user4, user1
    … | …
    Matched: doc1 → user1, user4, user5; doc4 → user4, user5
    Top-scoring results (most similar users):
    1) user4 (2 shared likes)
    2) user5 (2 shared likes)
    3) user1 (1 shared like)
    *Source: Solr in Action, chapter 16
  • 39. Step 2: Search for docs “liked” by those similar users
    Most similar users:
    1) user4 (2 shared likes)
    2) user5 (2 shared likes)
    3) user1 (1 shared like)
    /solr/select/?q=userlikes:("user4"^2 OR "user5"^2 OR "user1"^1)
    Term | Documents
    user1 | doc1, doc5
    user2 | doc2
    user3 | doc2
    user4 | doc1, doc3, doc4, doc5
    user5 | doc1, doc4
    … | …
    Top recommended documents:
    1) doc1 (matches user4, user5, user1)
    2) doc4 (matches user4, user5)
    3) doc5 (matches user4, user1)
    4) doc3 (matches user4)
    // doc2 does not match
    *Source: Solr in Action, chapter 16
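The two steps can be reproduced in plain Python on the slide's data. This sketch approximates the boosted OR query by simply summing each similar user's weight per document; Solr's real scores would also fold in tf-idf, norms, and coord, so only the ordering is expected to match. Function names are illustrative.

```python
from collections import Counter, defaultdict

# "Users who liked this document" field, from the slide.
LIKED_BY = {
    "doc1": ["user1", "user4", "user5"],
    "doc2": ["user2", "user3"],
    "doc3": ["user4"],
    "doc4": ["user4", "user5"],
    "doc5": ["user4", "user1"],
}


def invert(liked_by):
    """Build the userlikes field: user -> documents that user liked."""
    userlikes = defaultdict(list)
    for doc, users in liked_by.items():
        for user in users:
            userlikes[user].append(doc)
    return userlikes


def similar_users(my_likes, liked_by):
    """Step 1: weight users by how many of my liked docs they also liked."""
    weights = Counter()
    for doc in my_likes:
        weights.update(liked_by.get(doc, []))
    return weights


def recommend(my_likes, liked_by):
    """Step 2: score docs by the summed weights of the similar users who liked them."""
    userlikes = invert(liked_by)
    scores = Counter()
    for user, weight in similar_users(my_likes, liked_by).items():
        for doc in userlikes[user]:
            scores[doc] += weight
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(similar_users(["doc1", "doc4"], LIKED_BY).most_common())
print(recommend(["doc1", "doc4"], LIKED_BY))
```

The output order (doc1, doc4, doc5, doc3, with doc2 absent) matches the slide's top recommended documents; in practice you might also filter out the documents the user already liked.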
  • 40. Building up to personalization •  Use what you have: –  User’s keywords, IP address, searches, clicks, “likes” (purchases, job applications, comments, etc.) –  Build up a dossier of information on your users –  If a user gives you a profile (resume, social profile, etc.), even better.
  • 41. For full coverage of building a recommendation engine in Solr… •  See my talk from Lucene Revolution 2012 (Boston):
  • 42. Personalized Search •  Why limit yourself to JUST explicit search or JUST automated recommendations? •  By augmenting your user’s explicit queries with information you know about them, you can personalize their search results. •  Examples: –  A known software engineer runs a blank job search in New York… •  Why not show software engineering jobs higher in the results? –  A new user runs a keyword-only search for nurse •  Why not use the user’s IP address to boost documents geographically closer?
  • 44. Not going to talk about… •  Using the SynonymFilter •  Automatic language detection •  Stemming/lemmatization/multi-lingual search •  Stopwords (For all of the above, see the Solr Wiki, Reference Guide, or read Solr in Action) •  Instead, we’re going to cover: –  Mining user behavior to discover synonyms/related queries –  Discovering related concepts using document clustering in Solr –  Future work: Latent Semantic Indexing –  Document to Document searching using More Like This –  Foreground/Background corpus analysis
  • 45. Automatic Synonym Discovery •  Our primary approach: Search Co-occurrences •  Strategy: a Map/Reduce job which computes similar searches run by the same users. John searched for “java developer” and “j2ee”. Jane searched for “registered nurse” and “r.n.” and “prn”. Zeke searched for “java developer” and “scala” and “jvm”. •  By mining tens of millions of searches per day, we get a list of top searches, with the corresponding top co-occurring searches. •  We also tie each search term to the top category of jobs (e.g., java developer, truck driver, etc.), so that we know in what context people search for each term.
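The co-occurrence job can be sketched as an in-memory stand-in for Map/Reduce: a map step that groups each user's searches and emits every ordered pair of distinct searches, and a reduce step that counts the pairs. The search log is built from the slide's three example users; normalizing searches by trimming and lowercasing is an assumption.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Search log from the slide's examples: (user, search).
SEARCH_LOG = [
    ("john", "java developer"), ("john", "j2ee"),
    ("jane", "registered nurse"), ("jane", "r.n."), ("jane", "prn"),
    ("zeke", "java developer"), ("zeke", "scala"), ("zeke", "jvm"),
]


def map_phase(log):
    """Group searches per user, then emit (search, co-occurring search) pairs."""
    by_user = defaultdict(set)
    for user, query in log:
        by_user[user].add(query.strip().lower())
    for searches in by_user.values():
        for a, b in combinations(sorted(searches), 2):
            yield a, b
            yield b, a


def reduce_phase(pairs):
    """Count how often each search co-occurs with each other search."""
    related = defaultdict(Counter)
    for search, other in pairs:
        related[search][other] += 1
    return related


related = reduce_phase(map_phase(SEARCH_LOG))
print(related["java developer"].most_common(3))
```

At production scale, the counts accumulate across millions of users, and `most_common(n)` on each search gives lists like the ones on the next slide.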
  • 46. Example of “related search terms” Example: “RN”: registered nurse 6588, rn registered nurse 4300, nurse 2492, nursing 912, lpn 707, healthcare 453, rn case manager 446, registered nurse rn 404, director of nursing 321, case manager 292 Example: “accounting”: accountant 8880, accounts payable 5235, finance 3675, accounting clerk 3651, bookkeeper 3225, controller 2898, staff accountant 2866, accounts receivable 2842
  • 47. Future work on building conceptual links: Latent Semantic Indexing •  Concept: Build a term-document matrix, perform singular value decomposition on that matrix to reduce the number of dimensions, and index the meaningful (i.e., blurred) terms on each document. •  Why this matters: if done correctly, the search engine can automatically collapse terms by meaning, remove the useless and redundant ones, and form its own conceptual model of your domain space. This can be used to infuse more meaning into a document than just its keywords. •  See blog posts and presentations by John Berryman and Doug Turnbull about their work on this. They’re leading the way on this right now (in the open-source community). •  http://www.opensourceconnections.com/2013/08/25/semantic-search-with-solr-and-python-numpy
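A tiny NumPy sketch (assuming NumPy is installed) shows the "collapse terms by meaning" effect: in a hypothetical corpus where "nurse" and "rn" never appear together but share the same context words, their raw term vectors are orthogonal, yet after a rank-2 truncated SVD they point in the same direction. This covers only the SVD step, not how blurred terms would be indexed back into Solr.

```python
import numpy as np

# Hypothetical toy corpus: "nurse" and "rn" never co-occur, but share context.
TERMS = ["nurse", "rn", "hospital", "patient", "java", "developer", "software"]
DOCS = [
    "nurse hospital patient",
    "rn hospital patient",
    "java developer",
    "java developer software",
]


def term_doc_matrix(terms, docs):
    """Raw term-frequency matrix: one row per term, one column per document."""
    row = {t: i for i, t in enumerate(terms)}
    a = np.zeros((len(terms), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.split():
            a[row[tok], j] += 1
    return a


def lsi_term_vectors(a, k):
    """Project terms into a k-dimensional concept space via truncated SVD."""
    u, s, _vt = np.linalg.svd(a, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


A = term_doc_matrix(TERMS, DOCS)
V = lsi_term_vectors(A, k=2)
print("raw nurse~rn:", cosine(A[0], A[1]))
print("lsi nurse~rn:", cosine(V[0], V[1]))
print("lsi nurse~java:", cosine(V[0], V[4]))
```

Keeping only the top k singular values discards the dimensions that distinguish "nurse" from "rn", which is the "blurring" the slide describes; choosing k is the main tuning decision.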
  • 48. Using Clustering to find semantic links
  • 49. Setting up Clustering in solrconfig.xml
    <searchComponent name="clustering" enable="true" class="solr.clustering.ClusteringComponent">
      <lst name="engine">
        <str name="name">default</str>
        <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
        <str name="MultilingualClustering.defaultLanguage">ENGLISH</str>
      </lst>
    </searchComponent>

    <requestHandler name="/clustering" enable="true" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="clustering.engine">default</str>
        <bool name="clustering.results">true</bool>
        <str name="fl">*,score</str>
      </lst>
      <arr name="last-components">
        <str>clustering</str>
      </arr>
    </requestHandler>
  • 50. Clustering Query /solr/clustering/?q=(solr or lucene) &rows=100 &carrot.title=titlefield &carrot.snippet=titlefield &LingoClusteringAlgorithm.desiredClusterCountBase=25 //clustering & grouping don’t currently play nicely Allows you to dynamically identify “concepts” and their prevalence within a user’s top search results
  • 51. Clustering Results
    Stage 1: Identify Concepts
    Original Query: q=(solr or lucene) // can be a user’s search, their job title, a list of skills, or any other keyword-rich data source
    Clusters Identified: Developer (22), Java Developer (13), Software (10), Senior Java Developer (9), Architect (6), Software Engineer (6), Web Developer (5), Search (3), Software Developer (3), Systems (3), Administrator (2), Hadoop Engineer (2), Java J2EE (2), Search Development (2), Software Architect (2), Solutions Architect (2)
  • 53. Document to Document Searching Goal: use an entire document as your Solr Query, recommending other related documents. Standard approach: More Like This Handler Alternative Approach: Foreground vs. Background corpus analysis
  • 54. More Like This (Query)
    solrconfig.xml: <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />
    Query:
    /solr/jobs/mlt/?df=jobdescription&
    fl=id,jobtitle&
    rows=3&
    q=J2EE& // recommendations based on top scoring doc
    mlt.fl=jobtitle,jobdescription& // inspect these fields for interesting terms
    mlt.interestingTerms=details& // return the interesting terms
    mlt.boost=true
    *Example from chapter 16 of Solr in Action
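Roughly, MoreLikeThis scores each term of the source document by tf * idf, keeps the top terms, and with `mlt.boost=true` boosts each one by its score divided by the top term's score, which is why the first interesting term always shows 1.0. A simplified Python sketch of that selection step; the document frequencies are hypothetical, and the real handler also applies thresholds such as `mlt.mintf` and `mlt.mindf`, which are left at 1 here.

```python
import math
from collections import Counter


def idf(doc_freq, num_docs):
    """Classic Lucene idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def interesting_terms(text, doc_freqs, num_docs, max_terms=25, min_tf=1, min_df=1):
    """Rank the source text's terms by tf * idf and return (term, boost) pairs,
    where boost = score / best score (mimicking mlt.boost=true)."""
    tf = Counter(text.lower().split())
    scored = []
    for term, freq in tf.items():
        df = doc_freqs.get(term, 0)
        if freq < min_tf or df < min_df:
            continue
        scored.append((term, freq * idf(df, num_docs)))
    scored.sort(key=lambda kv: kv[1], reverse=True)
    top = scored[:max_terms]
    if not top:
        return []
    best = top[0][1]
    return [(term, score / best) for term, score in top]


# Hypothetical collection statistics for a 1,000-document index.
DOC_FREQS = {"j2ee": 10, "java": 200, "developer": 500, "the": 1000}
print(interesting_terms("J2EE java j2ee developer the the the", DOC_FREQS, 1000))
```

Because raw term frequency multiplies idf, a frequent stopword like "the" can still outrank a meaningful term like "java"; the results on the next slide show the same leakage with "is", "the", and "and".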
  • 55. More Like This (Results)
    {"match":{"numFound":122,"start":0,"docs":[
      {"id":"fc57931d42a7ccce3552c04f3db40af8dabc99dc",
       "jobtitle":"Senior Java / J2EE Developer"}]
     },
     "response":{"numFound":2225,"start":0,"docs":[
      {"id":"0e953179408d710679e5ddbd15ab0dfae52ffa6c",
       "jobtitle":"Sr Core Java Developer"},
      {"id":"5ce796c758ee30ed1b3da1fc52b0595c023de2db",
       "jobtitle":"Applications Developer"},
      {"id":"1e46dd6be1750fc50c18578b7791ad2378b90bdd",
       "jobtitle":"Java Architect/ Lead Java Developer WJAV Java - Java in Pittsburgh PA"}]},
     "interestingTerms":[
      "jobdescription:j2ee",1.0,
      "jobdescription:java",0.68131137,
      "jobdescription:senior",0.52161527,
      "jobtitle:developer",0.44706684,
      "jobdescription:source",0.2417754,
      "jobdescription:code",0.17976432,
      "jobdescription:is",0.17765637,
      "jobdescription:client",0.17331646,
      "jobdescription:our",0.11985878,
      "jobdescription:for",0.07928475,
      "jobdescription:a",0.07875194,
      "jobdescription:to",0.07741922,
      "jobdescription:and",0.07479082]}
  • 56. More Like This (passing in external document)
    /solr/jobs/mlt/?
    df=jobdescription&
    fl=id,jobtitle&
    mlt.fl=jobtitle,jobdescription&
    mlt.interestingTerms=details&
    mlt.boost=true&
    stream.body=Solr is an open source enterprise search platform from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Providing distributed search and index replication, Solr is highly scalable. Solr is the most popular enterprise search engine. Solr 4 adds NoSQL features.
  • 57. More Like This (Results)
    {"response":{"numFound":2221,"start":0,"docs":[
      {"id":"eff5ac098d056a7ea6b1306986c3ae511f2d0d89",
       "jobtitle":"Enterprise Search Architect…"},
      {"id":"37abb52b6fe63d601e5457641d2cf5ae83fdc799",
       "jobtitle":"Sr. Java Developer"},
      {"id":"349091293478dfd3319472e920cf65657276bda4",
       "jobtitle":"Java Lucene Software Engineer"}]},
     "interestingTerms":[
      "jobdescription:search",1.0,
      "jobdescription:solr",0.9155779,
      "jobdescription:features",0.36472517,
      "jobdescription:enterprise",0.30173126,
      "jobdescription:is",0.17626463,
      "jobdescription:the",0.102924034,
      "jobdescription:and",0.098939896]}
  • 58. CareerBuilder’s Alternative approach (“enhanced” More Like This)
    I. Send document as content stream to Solr
    II. Perform language identification on the content
    III. Do language-specific part-of-speech detection •  Keep nouns, remove other parts of speech (removes noise)
    IV. Analyze the remaining terms for statistical significance: tf * idf OR foreground vs. background corpus comparison OR both
    Preferred statistical significance measure:
    $$z = \frac{\mathrm{countFG}(x) - \mathrm{totalCountFG} \cdot \mathrm{probBG}(x)}{\sqrt{\mathrm{totalCountFG} \cdot \mathrm{probBG}(x) \cdot \left( 1 - \mathrm{probBG}(x) \right)}}$$
    V. Return top-scoring terms
  • 59. Foreground vs. Background Corpus Comparison
    /solr/doc2doc?fg=category:"software engineer"&bg=*:*&stream.body=java nurse and is are was were ruby php solr oncology part-time … other text in a really long document
    Terms statistically more likely to appear in the foreground query than in the background query: java, ruby, php
    We are essentially boosting terms which are more related to some known feature (and ignoring terms which are equally likely to appear in the background corpus).
    Note: This method requires that you pre-classify your documents (which we do)… it doesn’t work with a document that hasn’t already been classified.
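The z-score from the previous slide can be applied to each term of the incoming document. This Python sketch assumes that countFG(x) and totalCountFG are document counts (foreground docs containing x, and all foreground docs) and that probBG(x) is the fraction of background docs containing x. The counts and the 2.0 cutoff are hypothetical, and `significant_terms` is an illustrative name, not the `/solr/doc2doc` handler.

```python
import math


def z_score(count_fg, total_fg, prob_bg):
    """z = (countFG(x) - totalCountFG * probBG(x)) /
           sqrt(totalCountFG * probBG(x) * (1 - probBG(x)))"""
    expected = total_fg * prob_bg
    return (count_fg - expected) / math.sqrt(total_fg * prob_bg * (1.0 - prob_bg))


def significant_terms(doc_terms, fg_counts, total_fg, bg_counts, total_bg, threshold=2.0):
    """Keep document terms that are over-represented in the foreground set."""
    scored = []
    for term in set(doc_terms):
        prob_bg = bg_counts.get(term, 0) / total_bg
        if not 0.0 < prob_bg < 1.0:
            continue  # z is undefined when probBG is 0 or 1
        z = z_score(fg_counts.get(term, 0), total_fg, prob_bg)
        if z >= threshold:
            scored.append((term, z))
    return sorted(scored, key=lambda kv: kv[1], reverse=True)


# Hypothetical document counts: 100 foreground docs (category:"software engineer")
# out of a 10,000-doc background (*:*).
FG = {"java": 60, "ruby": 30, "php": 25, "solr": 1, "nurse": 0, "oncology": 0,
      "part-time": 15, "and": 95, "is": 95, "are": 90, "was": 50, "were": 40}
BG = {"java": 1000, "ruby": 300, "php": 400, "solr": 50, "nurse": 800, "oncology": 100,
      "part-time": 2000, "and": 9500, "is": 9500, "are": 9000, "was": 5000, "were": 4000}
DOC = "java nurse and is are was were ruby php solr oncology part-time".split()
print(significant_terms(DOC, FG, 100, BG, 10000))
```

Stopwords such as "and" or "is" appear in the foreground about as often as the background predicts, so their z is near 0, while "nurse" is under-represented (negative z); only terms well above their expected count survive.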
  • 60. Pulling it all together [Diagram: Traditional Search + Personalized Search + Semantic Search + Recommendations = Profit!]
  • 61. Take-aways •  Lucene’s inverted index is a sparse matrix useful for traditional search (keywords, locations, etc.), recommendations, and discovering links between terms/tokens •  Traditional tf * idf keyword search is a good starting point, but the best relevancy lies in combining your domain knowledge (knowledge of users in aggregate) and user-specific knowledge into your own relevancy factors. •  The ability to understand user queries (semantic search) further enhances the search experience, and you already have many tools at your fingertips for this.