Search Quality at LinkedIn
Abhimanyu Lad
Senior Software Engineer
Recruiting Solutions

Satya Kanduri
Senior Software Engineer
verticals:
people, jobs
intent: exploratory

tag: skill OR title
related skills:
search, ranking, …

tag: company
id: 1337
industry: internet

2
SEARCH USE CASES

How do people use LinkedIn’s search?

3
PEOPLE SEARCH
Search for people by name

4
PEOPLE SEARCH
Search for people by other attributes

5
EXPLORATORY PEOPLE SEARCH

6
JOB SEARCH

7
COMPANY SEARCH

8
AND MUCH MORE…

9
OUR GOAL
• Universal Search
– Single search box

• High Recall
– Spelling correction, synonym expansion, …

• High Precision
– Entity-oriented search: match things, not strings

10
QUERY UNDERSTANDING PIPELINE

11
QUERY UNDERSTANDING PIPELINE
Raw query

Spellcheck

Query Tagging

Vertical Intent Prediction

Query Expansion

Structured query
+
Annotations
13
SPELLING CORRECTION
Fix obvious typos

Help users spell names

14
SPELLING OUT THE DETAILS
Data sources: PEOPLE NAMES, COMPANIES, TITLES, PAST QUERIES

N-grams: marissa => ma ar ri is ss sa
Metaphone: mark/marc => MRK
Co-occurrence counts: marissa:mayer = 1000

Example: marisa meyer yahoo
Candidates: marisa / marissa, meyer / mayer, yahoo
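
A minimal sketch of how the n-gram and co-occurrence signals above could combine, assuming a small in-memory vocabulary. The function names, the Jaccard threshold, and every count except marissa:mayer = 1000 are hypothetical, and Metaphone is left out.

```python
import itertools

def bigrams(word):
    """Character bigrams, e.g. 'marissa' -> ma ar ri is ss sa."""
    return [word[i:i + 2] for i in range(len(word) - 1)]

def jaccard(a, b):
    """Overlap between the bigram sets of two words (0.0 to 1.0)."""
    sa, sb = set(bigrams(a)), set(bigrams(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def candidates(token, vocabulary, threshold=0.3):
    """Vocabulary words whose bigrams overlap enough with the token (threshold is an assumption)."""
    return [w for w in vocabulary if jaccard(token, w) >= threshold]

def correct(query, vocabulary, cooccurrence):
    """Pick the candidate combination with the highest adjacent-pair co-occurrence total."""
    tokens = query.split()
    options = [candidates(t, vocabulary) or [t] for t in tokens]
    best, best_score = tokens, -1
    for combo in itertools.product(*options):
        score = sum(cooccurrence.get(pair, 0) for pair in zip(combo, combo[1:]))
        if score > best_score:
            best, best_score = list(combo), score
    return " ".join(best)

# Hypothetical data; only marissa:mayer = 1000 comes from the slide.
vocabulary = ["marissa", "marisa", "mayer", "meyer", "yahoo"]
cooccurrence = {("marissa", "mayer"): 1000, ("marisa", "meyer"): 3, ("mayer", "yahoo"): 500}

print(correct("marisa meyer yahoo", vocabulary, cooccurrence))  # marissa mayer yahoo
```

Scoring whole combinations rather than single tokens is what lets the frequent pair marissa:mayer override the individually plausible spellings marisa and meyer.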
15
SPELLING OUT THE DETAILS
PROBLEM: Corpus as well as query logs contain many spelling errors

Certain spelling errors are quite frequent

While genuine words (especially names) might be infrequent

16
SPELLING OUT THE DETAILS
PROBLEM: Corpus as well as query logs contain many spelling errors
SOLUTION: Use query chains to infer correct spelling

[product manger]

[marissa mayer]

[product manager]

CLICK

CLICK
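
A rough sketch of mining query chains for spelling corrections. It assumes each session is a time-ordered list of (query, clicked) pairs, and that a query with no click followed immediately by a different query with a click is evidence of a rewrite. The session format and the function name are hypothetical.

```python
from collections import Counter

def mine_rewrites(sessions):
    """Count (abandoned query -> clicked query) pairs across sessions."""
    counts = Counter()
    for events in sessions:
        for (q1, clicked1), (q2, clicked2) in zip(events, events[1:]):
            # A reformulation that finally earned a click suggests q2 fixes q1.
            if not clicked1 and clicked2 and q1 != q2:
                counts[(q1, q2)] += 1
    return counts

sessions = [
    [("product manger", False), ("product manager", True)],
    [("marissa mayer", True)],  # clicked right away: no rewrite evidence
    [("product manger", False), ("product manager", True)],
]
print(mine_rewrites(sessions).most_common(1))
```

Because the evidence comes from clicks rather than raw frequency, a common misspelling does not get mistaken for a correct word just because it appears often in the logs.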

17
18
QUERY TAGGING
IDENTIFYING ENTITIES IN THE QUERY

TITLE: TITLE-237 (software engineer, software developer, programmer, …)
CO: CO-1441 (Google Inc.; Industry: Internet)
GEO: GEO-7583 (Country: US; Lat: 42.3482 N; Long: 75.1890 W)

(RECOGNIZED TAGS: NAME, TITLE, COMPANY, SCHOOL, GEO, SKILL )

19
QUERY TAGGING
IDENTIFYING ENTITIES IN THE QUERY

TITLE

CO

GEO

MORE PRECISE MATCHING WITH DOCUMENTS

20
ENTITY-BASED FILTERING
[Figure: search results BEFORE and AFTER entity-based filtering]

24
ENTITY-BASED SUGGESTIONS

25
26
QUERY TAGGING : SEQUENTIAL MODEL
TRAINING

EMISSION PROBABILITIES
(Learned from user profiles)

TRANSITION PROBABILITIES
(Learned from query logs)

27
QUERY TAGGING : SEQUENTIAL MODEL
INFERENCE
Given a query, find the most likely sequence of tags
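
The slides do not name the inference algorithm. For a sequential model with emission and transition probabilities, Viterbi decoding is the standard way to find the most likely tag sequence, so the sketch below assumes it. All probabilities are made-up toy values, and the `floor` smoothing for unseen events is also an assumption.

```python
import math

def viterbi(tokens, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely tag sequence for tokens under an HMM."""
    def lp(p):
        return math.log(max(p, floor))  # floor avoids log(0) for unseen events

    # best[i][tag] = (log-prob of best path ending in tag at position i, previous tag)
    best = [{t: (lp(start_p.get(t, 0)) + lp(emit_p[t].get(tokens[0], 0)), None)
             for t in tags}]
    for tok in tokens[1:]:
        row = {}
        for t in tags:
            prev, score = max(
                ((p, best[-1][p][0] + lp(trans_p[p].get(t, 0))) for p in tags),
                key=lambda x: x[1],
            )
            row[t] = (score + lp(emit_p[t].get(tok, 0)), prev)
        best.append(row)

    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for row in reversed(best[1:]):
        tag = row[tag][1]
        path.append(tag)
    return path[::-1]

tags = ["TITLE", "CO"]
start_p = {"TITLE": 0.6, "CO": 0.4}                          # toy values
trans_p = {"TITLE": {"TITLE": 0.6, "CO": 0.4},               # toy values
           "CO": {"TITLE": 0.5, "CO": 0.5}}
emit_p = {"TITLE": {"software": 0.4, "engineer": 0.5},       # toy values
          "CO": {"google": 0.8, "software": 0.05}}

print(viterbi(["software", "engineer", "google"], tags, start_p, trans_p, emit_p))
# ['TITLE', 'TITLE', 'CO']
```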

28
29
VERTICAL INTENT PREDICTION

JOBS
PEOPLE
COMPANIES
(Probability distribution over verticals)

30
VERTICAL INTENT PREDICTION : SIGNALS
1. Past query counts in each vertical + Query tags
(TAG:COMPANY)

[Company]

(TAG:NAME)

[Name Search]

[Employees]

[Jobs]

2. Personalization: User’s search history
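
One way the two signals could be turned into a probability distribution over verticals, as a sketch only: smoothed past query counts give a global distribution, which is then blended with a distribution built from the user's own history. The counts, the smoothing constant, and the blend weight are hypothetical.

```python
def vertical_distribution(counts, verticals, alpha=1.0):
    """Add-alpha smoothed distribution over verticals from raw counts."""
    total = sum(counts.get(v, 0) for v in verticals) + alpha * len(verticals)
    return {v: (counts.get(v, 0) + alpha) / total for v in verticals}

def blend(global_dist, user_dist, weight=0.3):
    """Mix the global signal with the user's search-history signal."""
    return {v: (1 - weight) * global_dist[v] + weight * user_dist[v]
            for v in global_dist}

verticals = ["PEOPLE", "JOBS", "COMPANIES"]
# Hypothetical counts for queries tagged TAG:NAME, and for one user's history.
global_dist = vertical_distribution({"PEOPLE": 7, "JOBS": 2}, verticals)
user_dist = vertical_distribution({"JOBS": 5}, verticals)

print(blend(global_dist, user_dist))
```

A convex combination of two distributions is itself a distribution, so the blended values still sum to 1.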
31
32
QUERY EXPANSION
GOAL: Improve recall through synonym expansion

33
QUERY EXPANSION : NAME SYNONYMS

34
QUERY EXPANSION : JOB TITLE SYNONYMS

35
QUERY EXPANSION : SIGNALS
Trained using query chains:
[jon]

[jonathan]

CLICK

[programmer]

[developer]

CLICK

[software engineer]

[software developer]

CLICK

Symmetric but not transitive!

Context based!

[francis] ⇔ [frank]
[franklin] ⇔ [frank]

[software engineer] => [software developer]
[civil engineer] ≠ [civil developer]

[francis] ≠ [franklin]
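
A small sketch of the two properties above, assuming synonyms are stored as learned pairs. Storing both directions makes the relation symmetric. Expanding by one hop only keeps it non-transitive. Keying on the whole phrase keeps it context based. The table layout and function names are hypothetical.

```python
from collections import defaultdict

def build_synonyms(pairs):
    """Store each learned pair in both directions (symmetric)."""
    table = defaultdict(set)
    for a, b in pairs:
        table[a].add(b)
        table[b].add(a)
    return table

def expand(phrase, table):
    """One-hop expansion only: synonyms of synonyms are not followed."""
    return {phrase} | table.get(phrase, set())

table = build_synonyms([
    ("francis", "frank"),
    ("franklin", "frank"),
    ("software engineer", "software developer"),  # keyed on the full phrase
])

print(expand("francis", table))         # contains frank but not franklin
print(expand("civil engineer", table))  # unchanged: no entry for this phrase
```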

36
37
QUERY UNDERSTANDING: SUMMARY
• High degree of structure in queries as well as corpus (user profiles, job postings, companies, …)

• Query understanding allows us to optimally balance recall and precision by supporting entity-oriented search

• Query tagging and query log analysis play a big role in query understanding

38
RANKING

39
WHAT’S IN A NAME QUERY?
BUT NAMES CAN BE AMBIGUOUS
kevin scott

≠
SEARCHING FOR A COMPANY’S EMPLOYEES
SEARCHING FOR PEOPLE WITH A SKILL
RANKING IS COMPLICATED
• Seemingly similar queries require dissimilar scoring functions

• Personalization matters
– Multiple dimensions to personalize on
– Dimensions vary with query class
TRAINING
[Figure: Documents for training => Features => Machine learning model; Human evaluation => Labels => Machine learning model]
ASSESSING RELEVANCE
RELEVANCE DEPENDS ON WHO’S SEARCHING
What if the searcher is a job seeker?
Or a recruiter?
Or…
THE QUERY IS NOT ENOUGH
WE NEED USER FEATURES
• Non-personalized relevance model:
score = f(Document | Query)

• Personalized relevance model:
score = f(Document | Query, User)
COLLECTING RELEVANCE JUDGMENTS WON’T SCALE
TRAINING
[Figure: Documents for training => Features => Machine learning model; Human evaluation / Search logs => Labels => Machine learning model]
CLICKS AS TRAINING DATA
Approach: Clicked = Relevant, Not-Clicked = Not Relevant

[Figure: user eye scan direction]

• Good results not seen are marked Not Relevant. Unfairly penalized?
CLICKS AS TRAINING DATA
Approach: Clicked = Relevant, Skipped = Not Relevant
• Only penalize results that the user has seen but ignored
• Risks inverting model by overweighing low-ranked results
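
A sketch of the Clicked/Skipped labeling, assuming the user scans results from the top down, so that everything at or above the last click counts as seen. Results below the last click get no label at all. The function name and the 1/0 encoding are hypothetical.

```python
def label_results(ranked_ids, clicked_ids):
    """Label results down to the last click: 1 = clicked, 0 = seen but skipped."""
    clicked = set(clicked_ids)
    positions = [i for i, doc in enumerate(ranked_ids) if doc in clicked]
    if not positions:
        return {}  # no click: nothing can be assumed about what was seen
    last = max(positions)
    return {doc: (1 if doc in clicked else 0) for doc in ranked_ids[:last + 1]}

print(label_results(["a", "b", "c", "d", "e"], ["c"]))
# {'a': 0, 'b': 0, 'c': 1}
```

Every negative produced this way was ranked above a positive, which is the source of the inversion risk noted above.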
FAIR PAIRS
• Fair Pairs: Randomize, Clicked = Relevant (R), Skipped = Not Relevant (NR)
• Great at dealing with position bias
• Does not invert models

[Figure: an adjacent pair of results shown Flipped]

[Radlinski and Joachims, AAAI’06]
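
A simplified sketch of the Fair Pairs idea, not the cited paper's exact procedure: group adjacent results into pairs, flip each pair with probability 0.5 before showing it, and record a preference only when the lower result of a shown pair is clicked and the upper one is not. The function names and the offset choice are assumptions made for illustration.

```python
import random

def fair_pairs_present(ranking, rng):
    """Return the list to show and the (top, bottom) pairs after random flips."""
    offset = rng.choice([0, 1])  # which adjacent results get paired up
    shown = list(ranking)
    pairs = []
    for i in range(offset, len(shown) - 1, 2):
        if rng.random() < 0.5:  # flip this pair half of the time
            shown[i], shown[i + 1] = shown[i + 1], shown[i]
        pairs.append((shown[i], shown[i + 1]))
    return shown, pairs

def fair_pairs_preferences(pairs, clicked):
    """(preferred, other) for pairs where only the bottom result was clicked."""
    clicked = set(clicked)
    return [(bottom, top) for top, bottom in pairs
            if bottom in clicked and top not in clicked]

rng = random.Random(0)
shown, pairs = fair_pairs_present(["a", "b", "c", "d", "e", "f"], rng)
print(shown, pairs)
print(fair_pairs_preferences([("a", "b"), ("c", "d")], {"b", "c"}))  # [('b', 'a')]
```

Since each result in a pair is equally likely to appear on top, a consistent preference for one of them cannot be explained by position alone.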
EASY NEGATIVES
• Assumption: A decent current model would push out bad results to the very end.
• Easy Negatives: Some of the results at the end are picked up as negative examples
EASY NEGATIVES
[Figure: result sets ranging from 2 pages to 90+ pages]

• Use strategies that sample across the feature space
• Searches with fewer results preferred
• Always sample from a given page, say page 10
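
The fixed-page strategy can be sketched as follows, assuming 10 results per page and a cap on how many negatives to keep per search. Both values and the function name are hypothetical.

```python
def easy_negatives(ranked_ids, page=10, page_size=10, max_negatives=5):
    """Take negative examples from one fixed deep page of the current ranking."""
    start = (page - 1) * page_size
    return ranked_ids[start:start + page_size][:max_negatives]

ranking = [f"doc{i}" for i in range(200)]
print(easy_negatives(ranking))            # doc90 .. doc94
print(easy_negatives(ranking[:20]))       # [] when the search has fewer than 10 pages
```

Sampling from the same page for every search avoids drawing most negatives from the few searches that return 90+ pages.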
PUTTING IT ALL TOGETHER

• Human evaluation is not practical for personalized searches
• Learn from user behavior
– Multiple heuristics depending on the need
– Different pros and cons
EFFICIENCY VS EXPRESSIVENESS
• Build tree with logistic regression leaves.
• By restricting decision nodes to (Query, User) segments, only one regression model needs to be evaluated for each document.

Decision nodes: X2=?, X4?
Leaf models:
$b_0 + b_1 T(x_1) + \dots + b_n x_n$
$a_0 + a_1 P(x_1) + \dots + a_n Q(x_n)$
$g_0 + g_1 R(x_1) + \dots + g_n Q(x_n)$

66
SCORING
[Figure: New document => Features => Machine learning model => score => Ordered list]
A SIMPLIFIED EXAMPLE
Name Query?
$b_0 + 0.85 \cdot \mathrm{IndustryOverlap} + \dots + b_n x_n$

Skill Query?
$a_0 + 0 \cdot \mathrm{IndustryOverlap} + \dots + a_n Q(x_n)$

$g_0 + g_1 R(x_1) + \dots + g_n Q(x_n)$
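
A minimal sketch of segment-specific scoring: the query class is decided once per query, and every document is then scored with that segment's single logistic regression. Only the IndustryOverlap weights (0.85 and 0) come from the slide. The bias terms, the other features, and all names are hypothetical.

```python
import math

# One logistic-regression leaf per query segment.
LEAVES = {
    "name": {"bias": -1.0, "industry_overlap": 0.85, "name_match": 2.0},
    "skill": {"bias": -0.5, "industry_overlap": 0.0, "skill_match": 1.5},
}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(query_class, features, leaves=LEAVES):
    """Score one document with the leaf model chosen by the query segment."""
    weights = leaves[query_class]
    z = weights["bias"] + sum(weights.get(name, 0.0) * value
                              for name, value in features.items())
    return sigmoid(z)

doc = {"industry_overlap": 1.0, "name_match": 1.0, "skill_match": 1.0}
print(score("name", doc), score("skill", doc))
```

Because the leaf depends only on the query and user, the tree is walked once per search rather than once per document.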

68
TEST, TEST, TEST
Interleaving

Model 1      Model 2      Interleaved
a            b            a
b            e            b
c            a            c
d            f            e
g            g            d
h            h            f
[Radlinski et al., CIKM 2008]
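
The slide does not name the interleaving variant. The sketch below assumes a team-draft style scheme in which the two models take turns contributing their best result not yet shown. It is deterministic for readability, with Model 1 always picking first, whereas a production version would randomize who picks first. The function names are hypothetical.

```python
from collections import Counter

def team_draft(ranking_a, ranking_b, length, a_first=True):
    """Alternate picks between two rankings; remember which model supplied each result."""
    interleaved, team = [], {}
    turn_a = a_first
    while len(interleaved) < length:
        source, name = (ranking_a, "A") if turn_a else (ranking_b, "B")
        pick = next((doc for doc in source if doc not in team), None)
        if pick is None:
            other = ranking_b if turn_a else ranking_a
            if all(doc in team for doc in other):
                break  # both rankings exhausted
        else:
            interleaved.append(pick)
            team[pick] = name
        turn_a = not turn_a
    return interleaved, team

def credit(team, clicked):
    """Count clicks per model; the model with more clicks wins the comparison."""
    return Counter(team[doc] for doc in clicked if doc in team)

model_1 = ["a", "b", "c", "d", "g", "h"]
model_2 = ["b", "e", "a", "f", "g", "h"]
interleaved, team = team_draft(model_1, model_2, length=6)
print(interleaved)                # ['a', 'b', 'c', 'e', 'd', 'f']
print(credit(team, ["e", "f"]))   # Counter({'B': 2})
```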
69
SUMMARY
• Query understanding leverages the rich structure of LinkedIn’s content and information needs.

• Query tagging and rewriting allow us to deliver precision and recall.

• For ranking, personalization is both the biggest challenge and the core of our solution.

• Segmenting relevance models by query type helps us efficiently address the diversity of search needs.
Abhimanyu Lad
alad@linkedin.com
https://linkedin.com/in/abhilad

Satya Kanduri
skanduri@linkedin.com
https://linkedin.com/in/skanduri
Genislab builds better products and faster go-to-market with Lean project man...
 

Search Quality at LinkedIn

Editor's notes

  1. There’s a high degree of structure in our users’ queries as well as in our corpus (i.e., user profiles, job postings, companies, etc.). Query understanding allows us to take advantage of this structure to do entity-oriented search and optimally balance recall and precision. Finally, our ability to understand and intelligently rewrite queries depends heavily on two things: query tagging (the ability to identify entities in the query) and query log analysis (analyzing how users reformulate their queries).
  2. Thanks, Abhi. Today I’ll be talking about some of the ranking challenges we face here at LinkedIn. Throughout this talk I’ll focus on People Search, but the challenges apply to all the search problems we strive to solve. To get a sense of the ranking problem, let’s take a look at some examples.
  3. One of the more frequent types of queries we see in people search is the name query. In this example, the query happens to be "richard branson". While there are other Richard Bransons on LinkedIn, the searcher was most likely looking for the founder of Virgin Group. To get this search right, we need only two things: the name has to match the query terms, and the ranking should be based on global popularity. That is pretty straightforward. Now, let’s take a look at another example.
  4. Multiple dimensions of personalization. If we look at the result sets, the ones on the left are mostly clustered around the San Francisco Bay Area and the ones on the right around the Atlanta area. We could be looking for any Kevin Scott local to us, but considering the global prior, we put the respective SVPs on top. kevin scott --> non-#1 results; start talking about features. This is another name query, but there are multiple Kevin Scotts on LinkedIn. Which of these two result sets is relevant? It’s hard to say. If I were issuing this search, I’d say the left result set is better, as I work at LinkedIn and live in the SF Bay Area. On the other hand, for someone who works at Home Depot and is located in the Atlanta area, the right result set is probably better. This example shows us two dimensions of personalization: (1) company and (2) location. Are there more factors we could personalize on?
  5. There is no query here; I chose to use facets in this case to select the results precisely. We’ve seen in the previous slide that personalization involves more than one dimension. Let’s look at another example to see whether any more dimensions influence personalization. Let’s say I am looking for someone working at NetApp. Apart from the location personalization that you can see in the second result, there are two other important dimensions of personalization: network distance on the first result and industry overlap on the third. So the question now is: by personalizing on all these dimensions (company, location, network distance, industry, etc.) for every query, can we obtain the best set of results? Let’s find out with another example.
  6. One of the unique value propositions of LinkedIn is searching for people with a skill. Not all features are useful in every query class. For instance, for skill searches, industry overlap didn’t turn out to be a significant feature, whereas for name searches it is one of the significant features. ballet --> all of the top-ranked results are in performing arts (whereas I am not in the performing arts industry). Most of these results are from the performing arts industry. As you can probably guess, I do not work in the performing arts industry and I possess no skills related to performing arts, so personalization based on industry is not applicable here. However, if you look carefully, the results are still personalized based on my network distance.
  7. To recap what we saw in the examples,
  8. Considering all these factors that we should take into account, how do we train a personalized machine learning model?
  9. Of course I have severely simplified the process, but this is just to give those of you who aren’t familiar with machine learning an idea of how machine-learned models are trained for ranking.
  10. Most of the work typically revolves around sampling documents, engineering features, and obtaining truth data. In my next few slides, we’ll explore different ways in which we can get labels. The more important part is data (the unreasonable effectiveness of data). Train a model for each of 270 million models.
  11. Let’s say a recruiter is looking for someone with the skill "oracle database": is this still the right result?
  12. Non-personalized (allude to others: conventional, traditional); we are always personalized. Satya’s notes: a conventional non-personalized model is a function of document and query. But in LinkedIn’s case, we have an additional “User” dimension due to personalization. As you can see, our score is a function of document, query, and user.
  13. Cannot use human labels
  14. Lower-ranked results not labeled negative. We are throwing our own ranking function under the bus... there might be a good reason they are ranked lower, but there might be a good result...
  15. Meeting notes (11/15/13 10:06): All the results we didn't evaluate look better than any of the results before the one that was clicked. If the original model was pretty good, that gives a lot of credit to the unseen ones.
  16. Sampling bias: the data is concentrated on the top results, so the model does not know how to differentiate really poor results.
  17. Meeting notes (11/15/13 10:25): Why is this okay given the unrepresentative sample?
  18. The best models for LTR are generally complex, like ensembles of trees. These models are expensive, especially in first-pass rankers, which often need to score hundreds of thousands of results for every query. The approach we use is to first train complex models, then use insights from those to train simple models. Don’t talk about NDCG. Expressiveness/complexity.
  19. Potentially score hundreds of documents… trade-off between expressiveness and evaluation
  20. The decision nodes can also be based on user segments, such as whether the user is a recruiter or a regular user. Industry overlap is not required in skill queries. As can be seen, we can avoid computing IndustryOverlap for a skill query.
  21. We test offline as much as possible, but online evaluation is the litmus test. Conventional ways to measure are CTR, MRR, P/R, etc. Interleaving; side-by-side; 2 result sets…
  22. Platform supports fast iteration
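
Notes 12 and 20 above describe a score that is a function of document, query, and user, with decision nodes that let the ranker skip features such as IndustryOverlap for skill queries. The following is a minimal Python sketch of that idea, not the ranker described in the talk: the class names, the feature functions, the "skill"/"name" query classes, and all weights are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class User:
    industry: str
    connections: frozenset  # member ids of first-degree connections (assumed)


@dataclass(frozen=True)
class Doc:
    member_id: int
    industry: str
    skills: frozenset


@dataclass(frozen=True)
class Query:
    text: str
    query_class: str  # hypothetical tag produced upstream: "skill" or "name"


# Hypothetical feature functions; each takes (document, query, user).
def industry_overlap(doc: Doc, query: Query, user: User) -> float:
    return 1.0 if doc.industry == user.industry else 0.0


def network_distance(doc: Doc, query: Query, user: User) -> float:
    return 1.0 if doc.member_id in user.connections else 0.0


def skill_match(doc: Doc, query: Query, user: User) -> float:
    return 1.0 if query.text in doc.skills else 0.0


FEATURES: Dict[str, Callable[[Doc, Query, User], float]] = {
    "industry_overlap": industry_overlap,
    "network_distance": network_distance,
    "skill_match": skill_match,
}


class LazyFeatures:
    """Computes a feature only when a decision branch asks for it."""

    def __init__(self, doc: Doc, query: Query, user: User) -> None:
        self._args = (doc, query, user)
        self._cache: Dict[str, float] = {}

    def get(self, name: str) -> float:
        if name not in self._cache:
            self._cache[name] = FEATURES[name](*self._args)
        return self._cache[name]

    @property
    def computed(self) -> List[str]:
        return list(self._cache)


def score(doc: Doc, query: Query, user: User) -> Tuple[float, List[str]]:
    """Return (score, features actually computed). Weights are made up."""
    f = LazyFeatures(doc, query, user)
    if query.query_class == "skill":
        # Skill branch: industry_overlap is never requested, so never computed.
        value = 2.0 * f.get("skill_match") + 1.0 * f.get("network_distance")
    else:
        # Name branch: industry overlap contributes to the score.
        value = 1.0 * f.get("industry_overlap") + 1.0 * f.get("network_distance")
    return value, f.computed


me = User(industry="internet", connections=frozenset({42}))
dancer = Doc(member_id=42, industry="performing arts", skills=frozenset({"ballet"}))
skill_score, skill_used = score(dancer, Query("ballet", "skill"), me)
name_score, name_used = score(dancer, Query("kevin scott", "name"), me)
```

In this sketch the branch on the query class plays the role of a decision node: for the "ballet" skill query the industry feature is never evaluated, while the network-distance feature still personalizes the result. A branch on a user segment (recruiter versus regular user) could be added in the same way.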