Search Quality at LinkedIn

Abhimanyu Lad
Senior Software Engineer
Recruiting Solutions

Satya Kanduri
Senior Software Engineer

Example query annotations:
verticals: people, jobs; intent: exploratory
tag: skill OR title; related skills: search, ranking, …
tag: company; id: 1337; industry: internet


SEARCH USE CASES

How do people use LinkedIn’s search?


PEOPLE SEARCH
Search for people by name


PEOPLE SEARCH
Search for people by other attributes


OUR GOAL
 Universal Search
– Single search box

 High Recall
– Spelling correction, synonym expansion, …

 High Precision
– Entity-oriented search: match things, not strings

QUERY UNDERSTANDING PIPELINE
Raw query → Spellcheck → Query Tagging → Vertical Intent Prediction → Query Expansion → Structured query + Annotations


SPELLING CORRECTION
Fix obvious typos

Help users spell names


SPELLING OUT THE DETAILS
N-grams
marissa => ma ar ri is ss sa

Metaphone
mark/marc => MRK

Co-occurrence counts
marissa:mayer = 1000

Data sources: people names, companies, titles, past queries

Example: [marisa meyer yahoo]
Candidates per token: marisa | marissa; meyer | mayer; yahoo
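As a rough illustration of how these signals could combine, the sketch below generates candidates by character-bigram overlap and picks the token sequence with the highest co-occurrence score. The vocabulary, the similarity threshold, and every count except marissa:mayer = 1000 are hypothetical, and the scoring rule is an assumption, not the production spellchecker.

```python
from itertools import product

def char_bigrams(word):
    """Return the set of overlapping character bigrams of `word`."""
    return {word[i:i + 2] for i in range(len(word) - 1)}

def bigram_similarity(a, b):
    """Jaccard overlap between the bigram sets of two words."""
    x, y = char_bigrams(a), char_bigrams(b)
    return len(x & y) / len(x | y) if x | y else 0.0

# Hypothetical vocabulary and counts; only marissa:mayer = 1000 is from the slide.
VOCAB = ["marissa", "marisa", "mayer", "meyer", "yahoo"]
COOCCUR = {
    ("marissa", "mayer"): 1000,
    ("mayer", "yahoo"): 400,
    ("marisa", "meyer"): 3,
    ("meyer", "yahoo"): 5,
}

def candidates(token, threshold=0.3):
    """Vocabulary words whose bigram overlap with `token` meets the threshold."""
    return [w for w in VOCAB if bigram_similarity(token, w) >= threshold]

def best_rewrite(query):
    """Pick the candidate sequence whose adjacent pairs co-occur most often."""
    options = [candidates(t) or [t] for t in query.split()]

    def score(seq):
        return sum(COOCCUR.get(pair, 0) for pair in zip(seq, seq[1:]))

    return " ".join(max(product(*options), key=score))

print(best_rewrite("marisa meyer yahoo"))
```

Enumerating every combination is fine for short queries; a longer query would call for a dynamic-programming search over the same lattice.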

PROBLEM: Corpus as well as query logs contain many spelling errors
Certain spelling errors are quite frequent, while genuine words (especially names) might be infrequent

SOLUTION: Use query chains to infer correct spelling

[product manger]

[marissa mayer]

[product manager]

CLICK

CLICK
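One way to read the query-chain idea: if a user issues a query, does not click, reformulates, and then clicks, the second query is evidence for the correct spelling of the first. The sketch below counts such rewrites. The session format (a list of `(query, clicked)` tuples in time order) is a hypothetical log shape chosen for illustration.

```python
from collections import Counter

def mine_corrections(sessions):
    """Count (query, next query) rewrites where only the second query got a click.

    Each session is a list of (query, clicked) tuples in time order.
    """
    counts = Counter()
    for session in sessions:
        for (q1, clicked1), (q2, clicked2) in zip(session, session[1:]):
            if q1 != q2 and not clicked1 and clicked2:
                counts[(q1, q2)] += 1
    return counts

# Hypothetical sessions for illustration.
sessions = [
    [("product manger", False), ("product manager", True)],
    [("product manger", False), ("product manager", True)],
    [("product manager", True)],
]
print(mine_corrections(sessions).most_common(1))
```

A real system would presumably also require the two queries to be similar strings, so that topic changes are not mistaken for corrections.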



QUERY TAGGING
IDENTIFYING ENTITIES IN THE QUERY

TITLE → TITLE-237: software engineer, software developer, programmer, …
CO → CO-1441: Google Inc. (Industry: Internet)
GEO → GEO-7583: Country: US, Lat: 42.3482 N, Long: 75.1890 W

(RECOGNIZED TAGS: NAME, TITLE, COMPANY, SCHOOL, GEO, SKILL)

QUERY TAGGING
MORE PRECISE MATCHING WITH DOCUMENTS

ENTITY-BASED FILTERING
[Screenshots: search results before and after entity-based filtering]

QUERY TAGGING : SEQUENTIAL MODEL
TRAINING

EMISSION PROBABILITIES
(Learned from user profiles)

TRANSITION PROBABILITIES
(Learned from query logs)


QUERY TAGGING : SEQUENTIAL MODEL
INFERENCE
Given a query, find the most likely sequence of tags
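The emission/transition vocabulary suggests a hidden-Markov-style tagger, for which the standard way to find the most likely tag sequence is the Viterbi algorithm. The sketch below assumes that reading. The tag set is cut down to two tags and all probabilities are made-up toy values, not numbers learned from profiles or query logs.

```python
import math

# Toy model: every probability here is hypothetical.
TAGS = ["TITLE", "COMPANY"]
START = {"TITLE": 0.6, "COMPANY": 0.4}
TRANS = {("TITLE", "TITLE"): 0.5, ("TITLE", "COMPANY"): 0.5,
         ("COMPANY", "TITLE"): 0.3, ("COMPANY", "COMPANY"): 0.7}
EMIT = {("TITLE", "software"): 0.3, ("TITLE", "engineer"): 0.4,
        ("COMPANY", "software"): 0.05, ("COMPANY", "google"): 0.5}

def viterbi(tokens, tags=TAGS, start=START, trans=TRANS, emit=EMIT, floor=1e-6):
    """Return the most likely tag sequence for a non-empty token list."""
    def logp(table, key):
        # Unseen events get a small floor probability instead of zero.
        return math.log(table.get(key, floor))

    # columns[i][t] = (best log-probability of a path ending in t, previous tag)
    columns = [{t: (logp(start, t) + logp(emit, (t, tokens[0])), None) for t in tags}]
    for token in tokens[1:]:
        prev = columns[-1]
        column = {}
        for t in tags:
            best_prev = max(tags, key=lambda s: prev[s][0] + logp(trans, (s, t)))
            score = prev[best_prev][0] + logp(trans, (best_prev, t)) + logp(emit, (t, token))
            column[t] = (score, best_prev)
        columns.append(column)

    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]

print(viterbi(["software", "engineer", "google"]))
```

Working in log space avoids underflow on longer queries, and the floor value stands in for whatever smoothing a trained model would use.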



VERTICAL INTENT PREDICTION

JOBS
PEOPLE
COMPANIES
(Probability distribution over verticals)


VERTICAL INTENT PREDICTION : SIGNALS
1. Past query counts in each vertical + Query tags
(TAG:COMPANY)

[Company]

(TAG:NAME)

[Name Search]

[Employees]

[Jobs]

2. Personalization: User’s search history
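A minimal sketch of how the two signals might be turned into a probability distribution over verticals: smooth the past query counts, then blend in a per-user distribution. The add-one smoothing, the linear blend, and the `weight` parameter are assumptions made for illustration; the slides do not describe the actual model.

```python
VERTICALS = ("people", "jobs", "companies")

def vertical_distribution(counts, prior=1.0):
    """Turn per-vertical query counts into a smoothed probability distribution."""
    total = sum(counts.get(v, 0) for v in VERTICALS) + prior * len(VERTICALS)
    return {v: (counts.get(v, 0) + prior) / total for v in VERTICALS}

def personalize(global_dist, user_dist, weight=0.3):
    """Blend the global distribution with one built from the user's own history."""
    return {v: (1 - weight) * global_dist[v] + weight * user_dist[v]
            for v in global_dist}

# Hypothetical counts: a query seen 7 times, always in the people vertical.
dist = vertical_distribution({"people": 7})
print(dist)
```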


QUERY EXPANSION
GOAL: Improve recall through synonym expansion

QUERY EXPANSION : NAME SYNONYMS

QUERY EXPANSION : JOB TITLE SYNONYMS

QUERY EXPANSION : SIGNALS
Trained using query chains:
[jon] → [jonathan] → CLICK
[programmer] → [developer] → CLICK
[software engineer] → [software developer] → CLICK

Symmetric but not transitive!
[francis] ⇔ [frank]
[franklin] ⇔ [frank]
[francis] ≠ [franklin]

Context based!
[software engineer] => [software developer]
[civil engineer] ≠ [civil developer]
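The two properties have a practical consequence for how synonyms are stored: pairs must be kept as direct links rather than merged into equivalence classes, and rewrites must be keyed on the whole phrase rather than on single tokens. The `SynonymTable` class below is a hypothetical structure sketched to show both points.

```python
from collections import defaultdict

class SynonymTable:
    """Symmetric, non-transitive synonym store keyed on whole phrases."""

    def __init__(self):
        self._direct = defaultdict(set)

    def add(self, a, b):
        # Symmetric: record the link in both directions.
        self._direct[a].add(b)
        self._direct[b].add(a)

    def expand(self, phrase):
        # Non-transitive: only direct neighbors, never neighbors of neighbors.
        return {phrase} | self._direct.get(phrase, set())

table = SynonymTable()
table.add("francis", "frank")
table.add("franklin", "frank")
table.add("software engineer", "software developer")

print(table.expand("frank"))
print(table.expand("francis"))
print(table.expand("civil engineer"))
```

A union-find or connected-components structure would wrongly put francis and franklin in the same class, and a token-level rule engineer => developer would wrongly rewrite civil engineer.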


QUERY UNDERSTANDING: SUMMARY
 High degree of structure in queries as well as corpus
(user profiles, job postings, companies, …)

 Query understanding allows us to optimally balance recall
and precision by supporting entity-oriented search
 Query tagging and query log analysis play a big role in
query understanding


BUT NAMES CAN BE AMBIGUOUS
kevin scott

≠

SEARCHING FOR A COMPANY’S EMPLOYEES

SEARCHING FOR PEOPLE WITH A SKILL

RANKING IS COMPLICATED
 Seemingly similar queries require dissimilar scoring
functions

 Personalization matters
– Multiple dimensions to personalize on
– Dimensions vary with query class

TRAINING
Documents for training → Features → Machine learning model
Human evaluation → Labels → Machine learning model

RELEVANCE DEPENDS ON WHO’S SEARCHING
What if the searcher is a job seeker?
Or a recruiter?
Or…

WE NEED USER FEATURES
 Non-personalized relevance model:
score = f(Document | Query)

 Personalized relevance model:
score = f(Document | Query, User)

COLLECTING RELEVANCE JUDGMENTS WON’T SCALE

TRAINING
Documents for training → Features → Machine learning model
Human evaluation, Search logs → Labels → Machine learning model

CLICKS AS TRAINING DATA
Approach: Clicked = Relevant, Not-Clicked = Not Relevant
[Figure: user eye scan direction]
• Good results not seen are marked Not Relevant. Unfairly penalized?

Approach: Clicked = Relevant, Skipped = Not Relevant
• Only penalize results that the user has seen but ignored
• Risks inverting the model by overweighting low-ranked results
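The "seen but ignored" rule can be sketched as follows, under the common assumption that the user scans from the top and has seen everything down to the lowest click. The function name and the data shapes are hypothetical.

```python
def label_impressions(results, clicked):
    """Label results 1 (clicked) or 0 (seen but skipped); unseen results get no label.

    Assumes a top-down scan that ends at the lowest clicked position.
    """
    positions = [i for i, r in enumerate(results) if r in clicked]
    if not positions:
        return {}  # no click, so no evidence about what was seen
    last_seen = max(positions)
    return {r: int(r in clicked) for r in results[:last_seen + 1]}

print(label_impressions(["a", "b", "c", "d", "e"], {"c"}))
```

Results below the last click ("d" and "e" here) produce no training example at all, which is what distinguishes this from the Not-Clicked = Not Relevant approach.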

FAIR PAIRS
• Fair Pairs: Randomize, Clicked = R, Skipped = NR
• Great at dealing with position bias
• Does not invert models
[Figure: adjacent result pairs, some flipped]
[Radlinski and Joachims, AAAI’06]
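A sketch of the randomization as I understand it from Radlinski and Joachims: adjacent results are grouped into pairs, each pair is flipped with probability one half before display, and a click on the lower result of a pair with no click on the upper one counts as a preference for the lower result. The function names and return shapes are hypothetical, and details may differ from both the paper and LinkedIn's implementation.

```python
import random

def fair_pairs_present(ranking, rng):
    """Randomly flip adjacent pairs; return (presented list, list of shown pairs)."""
    offset = rng.randint(0, 1)  # pair positions (1,2),(3,4),... or (2,3),(4,5),...
    presented = list(ranking)
    pairs = []
    for i in range(offset, len(presented) - 1, 2):
        if rng.random() < 0.5:
            presented[i], presented[i + 1] = presented[i + 1], presented[i]
        pairs.append((presented[i], presented[i + 1]))  # (shown upper, shown lower)
    return presented, pairs

def fair_pairs_preferences(pairs, clicked):
    """Return (preferred, other) for each pair where only the lower result was clicked."""
    return [(lower, upper) for upper, lower in pairs
            if lower in clicked and upper not in clicked]

presented, pairs = fair_pairs_present(list("abcdef"), random.Random(0))
print(presented, pairs)
```

Because each document is equally likely to appear in the upper or lower slot of its pair, position bias affects both orderings equally and cancels out in aggregate.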

EASY NEGATIVES
• Assumption: A decent current model would
push out bad results to the very end.
• Easy Negatives: Some of the results at the
end are picked up as negative examples

[Figure: result sets range from 2 pages to 90+ pages]
• Use strategies that sample across the feature space
• Searches with fewer results preferred
• Always sample from a given page, say page 10
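The fixed-page strategy is simple to sketch. The page size of 10 is an assumption, as is the choice to return nothing when the result set is shorter than the chosen page.

```python
def easy_negatives(ranked_results, page=10, page_size=10):
    """Return the results on one fixed late page as presumed negative examples."""
    start = (page - 1) * page_size
    return ranked_results[start:start + page_size]

# Hypothetical result list of 1000 document ids, best first.
print(easy_negatives(list(range(1000))))
```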

PUTTING IT ALL TOGETHER

 Human evaluation is not practical for personalized
searches
 Learn from user behavior
– Multiple heuristics depending on the need
– Different pros and cons

EFFICIENCY VS EXPRESSIVENESS
• Build a tree with logistic regression leaves.
• By restricting decision nodes to (Query, User) segments, only one regression model needs to be evaluated for each document.

Decision nodes: $x_2 = ?$, $x_4 = ?$
Leaf models:
$\beta_0 + \beta_1 T(x_1) + \dots + \beta_n x_n$
$\alpha_0 + \alpha_1 P(x_1) + \dots + \alpha_n Q(x_n)$
$\gamma_0 + \gamma_1 R(x_1) + \dots + \gamma_n Q(x_n)$

SCORING
New document → Features → Machine learning model → score → Ordered list

A SIMPLIFIED EXAMPLE
Name Query? → $\beta_0 + 0.85 \cdot \text{IndustryOverlap} + \dots + \beta_n x_n$
Skill Query? → $\alpha_0 + 0 \cdot \text{IndustryOverlap} + \dots + \alpha_n Q(x_n)$
Otherwise → $\gamma_0 + \gamma_1 R(x_1) + \dots + \gamma_n Q(x_n)$
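The idea of routing each query to a single logistic regression leaf can be sketched as below. Only the IndustryOverlap coefficients (0.85 for name queries, 0 for skill queries) come from the slide. The other feature names, weights, biases, and the tag-based routing rule are hypothetical, and the pairing of each question with a leaf is my reading of the slide layout.

```python
import math

# Per-segment weights; everything except the IndustryOverlap values is hypothetical.
SEGMENT_MODELS = {
    "name":  {"bias": -1.0, "IndustryOverlap": 0.85, "NameMatch": 2.0},
    "skill": {"bias": -1.0, "IndustryOverlap": 0.0, "SkillMatch": 1.5},
    "other": {"bias": -1.0},
}

def segment_of(query_tags):
    """Decision nodes look only at the query, so the segment is fixed per search."""
    if "NAME" in query_tags:
        return "name"
    if "SKILL" in query_tags:
        return "skill"
    return "other"

def score(query_tags, features):
    """Evaluate the single logistic regression leaf selected by the query segment."""
    weights = SEGMENT_MODELS[segment_of(query_tags)]
    z = weights["bias"] + sum(w * features.get(name, 0.0)
                              for name, w in weights.items() if name != "bias")
    return 1.0 / (1.0 + math.exp(-z))

print(score({"NAME"}, {"IndustryOverlap": 1.0, "NameMatch": 1.0}))
print(score({"SKILL"}, {"IndustryOverlap": 1.0, "SkillMatch": 1.0}))
```

Since the segment depends only on the query (and, in the full model, the user), it is chosen once per search, and each candidate document then costs one dot product.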

TEST, TEST, TEST
Interleaving

Model 1:     a b c d g h
Model 2:     b e a f g h
Interleaved: a b c e d f

[Radlinski et al., CIKM 2008]
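A sketch of team-draft interleaving, one of the methods associated with Radlinski et al.; the slides do not say which variant was used. The two models take turns contributing their best result not yet shown, and clicks are credited to the model that contributed the clicked result. Passing no random generator makes Model 1 ("A") win every tie, which is an assumption made here for reproducibility; a real experiment would flip a coin.

```python
import random

def team_draft_interleave(a, b, length, rng=None):
    """Interleave rankings `a` and `b`; return (merged list, team of each result)."""
    merged, teams = [], {}
    count = {"A": 0, "B": 0}
    rankings = {"A": a, "B": b}
    while len(merged) < length:
        # The team with fewer picks goes next; ties go to a coin flip
        # (or to A when no rng is given).
        if count["A"] != count["B"]:
            team = "A" if count["A"] < count["B"] else "B"
        else:
            team = "A" if rng is None or rng.random() < 0.5 else "B"
        pick = next((d for d in rankings[team] if d not in teams), None)
        if pick is None:
            # This team has nothing new to offer; try the other one, else stop.
            team = "B" if team == "A" else "A"
            pick = next((d for d in rankings[team] if d not in teams), None)
            if pick is None:
                break
        merged.append(pick)
        teams[pick] = team
        count[team] += 1
    return merged, teams

def credit(teams, clicked):
    """Count clicks per contributing model."""
    wins = {"A": 0, "B": 0}
    for doc in clicked:
        if doc in teams:
            wins[teams[doc]] += 1
    return wins

model_1 = ["a", "b", "c", "d", "g", "h"]
model_2 = ["b", "e", "a", "f", "g", "h"]
merged, teams = team_draft_interleave(model_1, model_2, length=6)
print(merged, credit(teams, {"c", "e", "d"}))
```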

SUMMARY
• Query understanding leverages the rich structure of LinkedIn’s content and information needs.
• Query tagging and rewriting allow us to deliver precision and recall.
• For ranking, personalization is both the biggest challenge and the core of our solution.
• Segmenting relevance models by query type helps us efficiently address the diversity of search needs.

Abhimanyu Lad
alad@linkedin.com
https://linkedin.com/in/abhilad

Satya Kanduri
skanduri@linkedin.com
https://linkedin.com/in/skanduri