SlideShare une entreprise Scribd logo
1  sur  31
Télécharger pour lire hors ligne
Recruiting SolutionsRecruiting SolutionsRecruiting Solutions
formation Retrieval at LinkedIn
Shakti Sinha Daniel Tunkelang
Head, Search Relevance Head, Query Understanding
Shakti Daniel
Find and be Found:
Why do 225M+ people use LinkedIn?
2
Profile: the professional identity of record.
3
Job recommendations.
4
Publishing platform for professional content.
5
Search helps members find and be found.
6
Search for people,
7
Search for people, jobs,
8
Search for people, jobs, groups, and more.
9
Every search is personalized.
10
Let’s talk a bit about how it all works.
§  Query Understanding
§  Ranking
More at http://data.linkedin.com/search.
11
Query Understanding
12
Daniel Tunkelang
Head, Query Understanding
Pre-retrieval: segment and tag queries.
lucene software engineer
lucene “software engineer”
LinkedIn’s focus: entity-oriented search.
14
Company
Employees
Jobs
Name
Search
Query tagging: key to query understanding.
§  Using human judgments to evaluate tag precision.
–  Extremely accurate (> 99%) for identifying person names.
–  Harder to distinguish company vs. title vs. skill (e.g., oracle dba).
§  Comparing CTR for tag matches vs. non-matches.
–  Difference can be large enough to suggest filtering vs. ranking:
15
Detecting navigational vs. exploratory queries.
Pre-retrieval
§  Sequence of query tags.
Post-retrieval
§  Distribution of scores / features.
16
Click behavior
§  Title searches >50x more
likely to get 2+ clicks than
name searches.
Query expansion for exploratory queries.
17
software patent lawyer
Query expansions derived
from reformulations.
e.g., lawyer -> attorney
Understanding misspelled queries.
18
daniel tankalong infomation retrieval
marisa meyer ingenero eletrico
jonathan podemsky desenista industrail
Did you mean daniel tunkelang?
Did you mean marissa mayer?
Did you mean johnathan podemsky?
Did you mean information retrieval?
Did you mean ingeniero electrico?
Did you mean desenhista industrial?
Spelling out the details.
entity data
people, companies
successful queries
tunkelang =>
reformulations
marisa => marissa
n-grams
dublin => du ub bl li in
metaphones
mark/marc => MRK
word pairs
johnathan podemsky
INDEX
} {marisa meyer yoohoo
marissa
marisa
meyer
mayer
yahoo
yoohoo
19
Ranking
20
Shakti Sinha
Head, Search Relevance
LinkedIn search is personalized.
21
kevin scott
But global factors matter.
22
Relevant results can be in or out of network.
23
§  Searcher’s network matters for relevance.
–  Within network results have higher CTR.
§  But the network is not enough.
–  About two thirds of search clicks come from out of
network results.
Personalized machine-learned ranking.
24
§  Data point is a triple (searcher, query, document).
–  Searcher features are important!
§  Labels: Is this document relevant to the query and
the user?
–  Depends on the user’s network, location, etc.
–  Too much to ask random person to judge.
§  Training data has to be collected from search logs.
Search log data has biases.
25
§  Presentation bias
–  Results shown higher tend to get clicked more often.
–  Use FairPairs [Radlinski and Joachims, AAAI’06].
not flipped
flipped
flipped
Clicked!
✗
✔
✔
✗
✗
✗
training data
Search log data has biases.
26
§  Sample bias
–  User clicks or skips only what is shown.
–  What about low scoring results from existing model?
–  Add low-scoring results as ‘easy negatives’ so model
learns bad results not presented to user.
…
label 0
label 0
label 0
label 0
…
page 1 page 2 page 3 page n
27
How to train your model.
How to train your model.
28
§  Train simple models to resemble complex ones.
–  Build Additive Groves model [Sorokina et al, ECML ’07],
which is good at detecting interactions.
§  Build tree with logistic regression leaves.
§  By restricting tree to user and query features, only
regression model evaluated for each document.
β0 +β1 T(x1)+...+βn xn
α0 +α1 P(x1)+...+αnQ(xn )
X2=?
X10< 0.1234 ?
γ0 +γ1 R(x1)+...+γnQ(xn )
Take-Aways
§  LinkedIn’s search problem is unique because of deep role
of personalization – users are integral part of the corpus.
§  Query understanding allows us to optimize for entity-
oriented search against semi-structured content.
§  Ranking requires us to contextually apply global and
personalized user, query, and document features.
29
Thank you!
30
225,
Want to learn more?
§  Check out http://data.linkedin.com/search.
§  Contact us:
–  Shakti: ssinha@linkedin.com
http://linkedin.com/in/sdsinha
–  Daniel: dtunkelang@linkedin.com
http://linkedin.com/in/dtunkelang
–  Asif: amakhani@linkedin.com
http://linkedin.com/in/asifmakhani
§  Did we mention that we’re hiring?
31

Contenu connexe

Tendances

Context-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewContext-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewYONG ZHENG
 
Malware Detection in Android Applications
Malware Detection in Android ApplicationsMalware Detection in Android Applications
Malware Detection in Android Applicationsijtsrd
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender systemStanley Wang
 
【DL輪読会】Aspect-based Analysis of Advertising Appeals for Search Engine Advert...
【DL輪読会】Aspect-based Analysis of Advertising Appeals for Search  Engine Advert...【DL輪読会】Aspect-based Analysis of Advertising Appeals for Search  Engine Advert...
【DL輪読会】Aspect-based Analysis of Advertising Appeals for Search Engine Advert...Deep Learning JP
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixJaya Kawale
 
Better Search Through Query Understanding
Better Search Through Query UnderstandingBetter Search Through Query Understanding
Better Search Through Query UnderstandingDaniel Tunkelang
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixJustin Basilico
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information RetrievalNik Spirin
 
Activity Ranking in LinkedIn Feed
Activity Ranking in LinkedIn FeedActivity Ranking in LinkedIn Feed
Activity Ranking in LinkedIn FeedBodla Kumar
 
データサイエンス概論第一 5 時系列データの解析
データサイエンス概論第一 5 時系列データの解析データサイエンス概論第一 5 時系列データの解析
データサイエンス概論第一 5 時系列データの解析Seiichi Uchida
 
Interpretability beyond feature attribution quantitative testing with concept...
Interpretability beyond feature attribution quantitative testing with concept...Interpretability beyond feature attribution quantitative testing with concept...
Interpretability beyond feature attribution quantitative testing with concept...MLconf
 
Supervised vs Unsupervised vs Reinforcement Learning | Edureka
Supervised vs Unsupervised vs Reinforcement Learning | EdurekaSupervised vs Unsupervised vs Reinforcement Learning | Edureka
Supervised vs Unsupervised vs Reinforcement Learning | EdurekaEdureka!
 
NetworkXによる語彙ネットワークの可視化
NetworkXによる語彙ネットワークの可視化NetworkXによる語彙ネットワークの可視化
NetworkXによる語彙ネットワークの可視化Shintaro Takemura
 
ユーザーサイド情報検索システム
ユーザーサイド情報検索システムユーザーサイド情報検索システム
ユーザーサイド情報検索システムjoisino
 
Label propagation - Semisupervised Learning with Applications to NLP
Label propagation - Semisupervised Learning with Applications to NLPLabel propagation - Semisupervised Learning with Applications to NLP
Label propagation - Semisupervised Learning with Applications to NLPDavid Przybilla
 
Python基礎その2
Python基礎その2Python基礎その2
Python基礎その2大貴 末廣
 

Tendances (20)

Context-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewContext-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick View
 
Malware Detection in Android Applications
Malware Detection in Android ApplicationsMalware Detection in Android Applications
Malware Detection in Android Applications
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender system
 
【DL輪読会】Aspect-based Analysis of Advertising Appeals for Search Engine Advert...
【DL輪読会】Aspect-based Analysis of Advertising Appeals for Search  Engine Advert...【DL輪読会】Aspect-based Analysis of Advertising Appeals for Search  Engine Advert...
【DL輪読会】Aspect-based Analysis of Advertising Appeals for Search Engine Advert...
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at Netflix
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
 
Better Search Through Query Understanding
Better Search Through Query UnderstandingBetter Search Through Query Understanding
Better Search Through Query Understanding
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
 
Activity Ranking in LinkedIn Feed
Activity Ranking in LinkedIn FeedActivity Ranking in LinkedIn Feed
Activity Ranking in LinkedIn Feed
 
データサイエンス概論第一 5 時系列データの解析
データサイエンス概論第一 5 時系列データの解析データサイエンス概論第一 5 時系列データの解析
データサイエンス概論第一 5 時系列データの解析
 
Interpretability beyond feature attribution quantitative testing with concept...
Interpretability beyond feature attribution quantitative testing with concept...Interpretability beyond feature attribution quantitative testing with concept...
Interpretability beyond feature attribution quantitative testing with concept...
 
Supervised vs Unsupervised vs Reinforcement Learning | Edureka
Supervised vs Unsupervised vs Reinforcement Learning | EdurekaSupervised vs Unsupervised vs Reinforcement Learning | Edureka
Supervised vs Unsupervised vs Reinforcement Learning | Edureka
 
Google PageRank
Google PageRankGoogle PageRank
Google PageRank
 
NetworkXによる語彙ネットワークの可視化
NetworkXによる語彙ネットワークの可視化NetworkXによる語彙ネットワークの可視化
NetworkXによる語彙ネットワークの可視化
 
ユーザーサイド情報検索システム
ユーザーサイド情報検索システムユーザーサイド情報検索システム
ユーザーサイド情報検索システム
 
Label propagation - Semisupervised Learning with Applications to NLP
Label propagation - Semisupervised Learning with Applications to NLPLabel propagation - Semisupervised Learning with Applications to NLP
Label propagation - Semisupervised Learning with Applications to NLP
 
Python基礎その2
Python基礎その2Python基礎その2
Python基礎その2
 

En vedette

Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn
Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedInRecruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn
Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedInDaria Sorokina
 
Query Understanding: A Manifesto
Query Understanding: A ManifestoQuery Understanding: A Manifesto
Query Understanding: A ManifestoDaniel Tunkelang
 
Social Search in a Professional Context
Social Search in a Professional ContextSocial Search in a Professional Context
Social Search in a Professional ContextDaniel Tunkelang
 
Data Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityData Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityDaniel Tunkelang
 
Web science - How is it different?
Web science - How is it different?Web science - How is it different?
Web science - How is it different?Daniel Tunkelang
 
My Three Ex’s: A Data Science Approach for Applied Machine Learning
My Three Ex’s: A Data Science Approach for Applied Machine LearningMy Three Ex’s: A Data Science Approach for Applied Machine Learning
My Three Ex’s: A Data Science Approach for Applied Machine LearningDaniel Tunkelang
 
Where should you put your data scientists?
Where should you put your data scientists?Where should you put your data scientists?
Where should you put your data scientists?Daniel Tunkelang
 
The Top Skills That Can Get You Hired in 2017
The Top Skills That Can Get You Hired in 2017The Top Skills That Can Get You Hired in 2017
The Top Skills That Can Get You Hired in 2017LinkedIn
 
Design in Tech Report 2017
Design in Tech Report 2017Design in Tech Report 2017
Design in Tech Report 2017John Maeda
 

En vedette (10)

Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn
Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedInRecruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn
Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn
 
Enterprise Intelligence
Enterprise IntelligenceEnterprise Intelligence
Enterprise Intelligence
 
Query Understanding: A Manifesto
Query Understanding: A ManifestoQuery Understanding: A Manifesto
Query Understanding: A Manifesto
 
Social Search in a Professional Context
Social Search in a Professional ContextSocial Search in a Professional Context
Social Search in a Professional Context
 
Data Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityData Science: A Mindset for Productivity
Data Science: A Mindset for Productivity
 
Web science - How is it different?
Web science - How is it different?Web science - How is it different?
Web science - How is it different?
 
My Three Ex’s: A Data Science Approach for Applied Machine Learning
My Three Ex’s: A Data Science Approach for Applied Machine LearningMy Three Ex’s: A Data Science Approach for Applied Machine Learning
My Three Ex’s: A Data Science Approach for Applied Machine Learning
 
Where should you put your data scientists?
Where should you put your data scientists?Where should you put your data scientists?
Where should you put your data scientists?
 
The Top Skills That Can Get You Hired in 2017
The Top Skills That Can Get You Hired in 2017The Top Skills That Can Get You Hired in 2017
The Top Skills That Can Get You Hired in 2017
 
Design in Tech Report 2017
Design in Tech Report 2017Design in Tech Report 2017
Design in Tech Report 2017
 

Similaire à Find and be Found: Information Retrieval at LinkedIn

Personalizing Search at LinkedIn
Personalizing Search at LinkedInPersonalizing Search at LinkedIn
Personalizing Search at LinkedInViet Ha-Thuc
 
Keep calm presentation for cipd exhibition 2012
Keep calm presentation for cipd exhibition 2012Keep calm presentation for cipd exhibition 2012
Keep calm presentation for cipd exhibition 2012EasyWebRecruitment
 
smAlbany 2013 power resume_search presentation times union monster
smAlbany 2013 power resume_search presentation  times union monstersmAlbany 2013 power resume_search presentation  times union monster
smAlbany 2013 power resume_search presentation times union monsterLiberteks
 
LinkedIn Basics & Best Practices
LinkedIn Basics & Best Practices LinkedIn Basics & Best Practices
LinkedIn Basics & Best Practices Bruce Bennett
 
LinkedIn Basics and Best Practices July 2018
LinkedIn Basics and Best Practices July 2018LinkedIn Basics and Best Practices July 2018
LinkedIn Basics and Best Practices July 2018Bruce Bennett
 
Personal Brand Exploration I George Stefas
Personal Brand Exploration I George StefasPersonal Brand Exploration I George Stefas
Personal Brand Exploration I George StefasGeorge Stefas
 
Intermediate LinkedIn - November 2018
Intermediate LinkedIn - November 2018Intermediate LinkedIn - November 2018
Intermediate LinkedIn - November 2018Bruce Bennett
 
LinkedIn For Your Job Search
LinkedIn For Your Job SearchLinkedIn For Your Job Search
LinkedIn For Your Job SearchBruce Bennett
 
Linkedin for Danish University Students
Linkedin for Danish University StudentsLinkedin for Danish University Students
Linkedin for Danish University StudentsAndré Bjørn Nielsen
 
Referrals Get Hired - Speach 2013
Referrals Get Hired - Speach 2013Referrals Get Hired - Speach 2013
Referrals Get Hired - Speach 2013Jonathan Duarte
 
LinkedIn Basics and Best Practices
LinkedIn Basics and Best PracticesLinkedIn Basics and Best Practices
LinkedIn Basics and Best PracticesBruce Bennett
 
LinkedIn Basics & Best Practices
LinkedIn Basics & Best Practices LinkedIn Basics & Best Practices
LinkedIn Basics & Best Practices Bruce Bennett
 
LinkedIn for Your Job Search
LinkedIn for Your Job SearchLinkedIn for Your Job Search
LinkedIn for Your Job SearchBruce Bennett
 
Quarterly Product Release Webinar: Q1 Edition
Quarterly Product Release Webinar: Q1 EditionQuarterly Product Release Webinar: Q1 Edition
Quarterly Product Release Webinar: Q1 EditionLinkedIn Talent Solutions
 
New LinkedIn Recruiter Product Enhancements | North America Webcast
New LinkedIn Recruiter Product Enhancements | North America WebcastNew LinkedIn Recruiter Product Enhancements | North America Webcast
New LinkedIn Recruiter Product Enhancements | North America WebcastLinkedIn Talent Solutions
 
The art of intranet search
The art of intranet searchThe art of intranet search
The art of intranet searchSam Marshall
 

Similaire à Find and be Found: Information Retrieval at LinkedIn (20)

Personalizing Search at LinkedIn
Personalizing Search at LinkedInPersonalizing Search at LinkedIn
Personalizing Search at LinkedIn
 
Keep calm presentation for cipd exhibition 2012
Keep calm presentation for cipd exhibition 2012Keep calm presentation for cipd exhibition 2012
Keep calm presentation for cipd exhibition 2012
 
smAlbany 2013 power resume_search presentation times union monster
smAlbany 2013 power resume_search presentation  times union monstersmAlbany 2013 power resume_search presentation  times union monster
smAlbany 2013 power resume_search presentation times union monster
 
LinkedIn Basics & Best Practices
LinkedIn Basics & Best Practices LinkedIn Basics & Best Practices
LinkedIn Basics & Best Practices
 
LinkedIn Basics and Best Practices July 2018
LinkedIn Basics and Best Practices July 2018LinkedIn Basics and Best Practices July 2018
LinkedIn Basics and Best Practices July 2018
 
Personal Brand Exploration I George Stefas
Personal Brand Exploration I George StefasPersonal Brand Exploration I George Stefas
Personal Brand Exploration I George Stefas
 
Questions on sourcing
Questions on sourcingQuestions on sourcing
Questions on sourcing
 
Intermediate LinkedIn - November 2018
Intermediate LinkedIn - November 2018Intermediate LinkedIn - November 2018
Intermediate LinkedIn - November 2018
 
LinkedIn For Your Job Search
LinkedIn For Your Job SearchLinkedIn For Your Job Search
LinkedIn For Your Job Search
 
Linkedin for Danish University Students
Linkedin for Danish University StudentsLinkedin for Danish University Students
Linkedin for Danish University Students
 
Referrals Get Hired - Speach 2013
Referrals Get Hired - Speach 2013Referrals Get Hired - Speach 2013
Referrals Get Hired - Speach 2013
 
LinkedIn Hiring Playbook
LinkedIn Hiring PlaybookLinkedIn Hiring Playbook
LinkedIn Hiring Playbook
 
Smb hiring playbook
Smb hiring playbookSmb hiring playbook
Smb hiring playbook
 
LinkedIn Basics and Best Practices
LinkedIn Basics and Best PracticesLinkedIn Basics and Best Practices
LinkedIn Basics and Best Practices
 
LinkedIn Basics & Best Practices
LinkedIn Basics & Best Practices LinkedIn Basics & Best Practices
LinkedIn Basics & Best Practices
 
LinkedIn for Your Job Search
LinkedIn for Your Job SearchLinkedIn for Your Job Search
LinkedIn for Your Job Search
 
Quarterly Product Release Webinar: Q1 Edition
Quarterly Product Release Webinar: Q1 EditionQuarterly Product Release Webinar: Q1 Edition
Quarterly Product Release Webinar: Q1 Edition
 
New LinkedIn Recruiter Product Enhancements | North America Webcast
New LinkedIn Recruiter Product Enhancements | North America WebcastNew LinkedIn Recruiter Product Enhancements | North America Webcast
New LinkedIn Recruiter Product Enhancements | North America Webcast
 
The art of intranet search
The art of intranet searchThe art of intranet search
The art of intranet search
 
Toronto | ConnectIn 2013
Toronto | ConnectIn 2013Toronto | ConnectIn 2013
Toronto | ConnectIn 2013
 

Plus de Daniel Tunkelang

Query Understanding and Ecommerce
Query Understanding and EcommerceQuery Understanding and Ecommerce
Query Understanding and EcommerceDaniel Tunkelang
 
Semantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce QueriesSemantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce QueriesDaniel Tunkelang
 
Helping Searchers Satisfice through Query Understanding
Helping Searchers Satisfice through Query UnderstandingHelping Searchers Satisfice through Query Understanding
Helping Searchers Satisfice through Query UnderstandingDaniel Tunkelang
 
Search as Communication: Lessons from a Personal Journey
Search as Communication: Lessons from a Personal JourneySearch as Communication: Lessons from a Personal Journey
Search as Communication: Lessons from a Personal JourneyDaniel Tunkelang
 
Enterprise Search: How do we get there from here?
Enterprise Search: How do we get there from here?Enterprise Search: How do we get there from here?
Enterprise Search: How do we get there from here?Daniel Tunkelang
 
Big Data, We Have a Communication Problem
Big Data, We Have a Communication Problem Big Data, We Have a Communication Problem
Big Data, We Have a Communication Problem Daniel Tunkelang
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data ScientistDaniel Tunkelang
 
Information, Attention, and Trust: A Hierarchy of Needs
Information, Attention, and Trust: A Hierarchy of NeedsInformation, Attention, and Trust: A Hierarchy of Needs
Information, Attention, and Trust: A Hierarchy of NeedsDaniel Tunkelang
 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The PeopleDaniel Tunkelang
 
Content, Connections, and Context
Content, Connections, and ContextContent, Connections, and Context
Content, Connections, and ContextDaniel Tunkelang
 
Scale, Structure, and Semantics
Scale, Structure, and SemanticsScale, Structure, and Semantics
Scale, Structure, and SemanticsDaniel Tunkelang
 
Strata 2012: Humans, Machines, and the Dimensions of Microwork
Strata 2012: Humans, Machines, and the Dimensions of MicroworkStrata 2012: Humans, Machines, and the Dimensions of Microwork
Strata 2012: Humans, Machines, and the Dimensions of MicroworkDaniel Tunkelang
 
Recommendations as a Conversation with the User
Recommendations as a Conversation with the UserRecommendations as a Conversation with the User
Recommendations as a Conversation with the UserDaniel Tunkelang
 
Keeping It Professional: Relevance, Recommendations, and Reputation at LinkedIn
Keeping It Professional: Relevance, Recommendations, and Reputation at LinkedInKeeping It Professional: Relevance, Recommendations, and Reputation at LinkedIn
Keeping It Professional: Relevance, Recommendations, and Reputation at LinkedInDaniel Tunkelang
 
The War on Attention Poverty: Measuring Twitter Authority
The War on Attention Poverty: Measuring Twitter AuthorityThe War on Attention Poverty: Measuring Twitter Authority
The War on Attention Poverty: Measuring Twitter AuthorityDaniel Tunkelang
 
Enabling Exploration Through Text Analytics
Enabling Exploration Through Text AnalyticsEnabling Exploration Through Text Analytics
Enabling Exploration Through Text AnalyticsDaniel Tunkelang
 

Plus de Daniel Tunkelang (20)

Query Understanding and Ecommerce
Query Understanding and EcommerceQuery Understanding and Ecommerce
Query Understanding and Ecommerce
 
Semantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce QueriesSemantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce Queries
 
Helping Searchers Satisfice through Query Understanding
Helping Searchers Satisfice through Query UnderstandingHelping Searchers Satisfice through Query Understanding
Helping Searchers Satisfice through Query Understanding
 
MMM, Search!
MMM, Search!MMM, Search!
MMM, Search!
 
Search as Communication: Lessons from a Personal Journey
Search as Communication: Lessons from a Personal JourneySearch as Communication: Lessons from a Personal Journey
Search as Communication: Lessons from a Personal Journey
 
Enterprise Search: How do we get there from here?
Enterprise Search: How do we get there from here?Enterprise Search: How do we get there from here?
Enterprise Search: How do we get there from here?
 
Big Data, We Have a Communication Problem
Big Data, We Have a Communication Problem Big Data, We Have a Communication Problem
Big Data, We Have a Communication Problem
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data Scientist
 
Information, Attention, and Trust: A Hierarchy of Needs
Information, Attention, and Trust: A Hierarchy of NeedsInformation, Attention, and Trust: A Hierarchy of Needs
Information, Attention, and Trust: A Hierarchy of Needs
 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The People
 
Content, Connections, and Context
Content, Connections, and ContextContent, Connections, and Context
Content, Connections, and Context
 
Scale, Structure, and Semantics
Scale, Structure, and SemanticsScale, Structure, and Semantics
Scale, Structure, and Semantics
 
Strata 2012: Humans, Machines, and the Dimensions of Microwork
Strata 2012: Humans, Machines, and the Dimensions of MicroworkStrata 2012: Humans, Machines, and the Dimensions of Microwork
Strata 2012: Humans, Machines, and the Dimensions of Microwork
 
Recommendations as a Conversation with the User
Recommendations as a Conversation with the UserRecommendations as a Conversation with the User
Recommendations as a Conversation with the User
 
Keeping It Professional: Relevance, Recommendations, and Reputation at LinkedIn
Keeping It Professional: Relevance, Recommendations, and Reputation at LinkedInKeeping It Professional: Relevance, Recommendations, and Reputation at LinkedIn
Keeping It Professional: Relevance, Recommendations, and Reputation at LinkedIn
 
The War on Attention Poverty: Measuring Twitter Authority
The War on Attention Poverty: Measuring Twitter AuthorityThe War on Attention Poverty: Measuring Twitter Authority
The War on Attention Poverty: Measuring Twitter Authority
 
Design for Interaction
Design for InteractionDesign for Interaction
Design for Interaction
 
Enabling Exploration Through Text Analytics
Enabling Exploration Through Text AnalyticsEnabling Exploration Through Text Analytics
Enabling Exploration Through Text Analytics
 
exploring semantic means
exploring semantic meansexploring semantic means
exploring semantic means
 
Set Retrieval 2.0
Set Retrieval 2.0Set Retrieval 2.0
Set Retrieval 2.0
 

Dernier

FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Dernier (20)

FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Find and be Found: Information Retrieval at LinkedIn

  • 1. Recruiting SolutionsRecruiting SolutionsRecruiting Solutions formation Retrieval at LinkedIn Shakti Sinha Daniel Tunkelang Head, Search Relevance Head, Query Understanding Shakti Daniel Find and be Found:
  • 2. Why do 225M+ people use LinkedIn? 2
  • 3. Profile: the professional identity of record. 3
  • 5. Publishing platform for professional content. 5
  • 6. Search helps members find and be found. 6
  • 9. Search for people, jobs, groups, and more. 9
  • 10. Every search is personalized. 10
  • 11. Let’s talk a bit about how it all works. §  Query Understanding §  Ranking More at http://data.linkedin.com/search. 11
  • 13. Pre-retrieval: segment and tag queries. lucene software engineer lucene “software engineer”
  • 14. LinkedIn’s focus: entity-oriented search. 14 Company Employees Jobs Name Search
  • 15. Query tagging: key to query understanding. §  Using human judgments to evaluate tag precision. –  Extremely accurate (> 99%) for identifying person names. –  Harder to distinguish company vs. title vs. skill (e.g., oracle dba). §  Comparing CTR for tag matches vs. non-matches. –  Difference can be large enough to suggest filtering vs. ranking: 15
  • 16. Detecting navigational vs. exploratory queries. Pre-retrieval §  Sequence of query tags. Post-retrieval §  Distribution of scores / features. 16 Click behavior §  Title searches >50x more likely to get 2+ clicks than name searches.
  • 17. Query expansion for exploratory queries. 17 software patent lawyer Query expansions derived from reformulations. e.g., lawyer -> attorney
  • 18. Understanding misspelled queries. 18 daniel tankalong infomation retrieval marisa meyer ingenero eletrico jonathan podemsky desenista industrail Did you mean daniel tunkelang? Did you mean marissa mayer? Did you mean johnathan podemsky? Did you mean information retrieval? Did you mean ingeniero electrico? Did you mean desenhista industrial?
  • 19. Spelling out the details. entity data people, companies successful queries tunkelang => reformulations marisa => marissa n-grams dublin => du ub bl li in metaphones mark/marc => MRK word pairs johnathan podemsky INDEX } {marisa meyer yoohoo marissa marisa meyer mayer yahoo yoohoo 19
  • 21. LinkedIn search is personalized. 21 kevin scott
  • 22. But global factors matter. 22
  • 23. Relevant results can be in or out of network. 23 §  Searcher’s network matters for relevance. –  Within network results have higher CTR. §  But the network is not enough. –  About two thirds of search clicks come from out of network results.
  • 24. Personalized machine-learned ranking. 24 §  Data point is a triple (searcher, query, document). –  Searcher features are important! §  Labels: Is this document relevant to the query and the user? –  Depends on the user’s network, location, etc. –  Too much to ask random person to judge. §  Training data has to be collected from search logs.
  • 25. Search log data has biases. 25 §  Presentation bias –  Results shown higher tend to get clicked more often. –  Use FairPairs [Radlinski and Joachims, AAAI’06]. not flipped flipped flipped Clicked! ✗ ✔ ✔ ✗ ✗ ✗ training data
  • 26. Search log data has biases. 26 §  Sample bias –  User clicks or skips only what is shown. –  What about low scoring results from existing model? –  Add low-scoring results as ‘easy negatives’ so model learns bad results not presented to user. … label 0 label 0 label 0 label 0 … page 1 page 2 page 3 page n
  • 27. 27 How to train your model.
  • 28. How to train your model. 28 §  Train simple models to resemble complex ones. –  Build Additive Groves model [Sorokina et al, ECML ’07], which is good at detecting interactions. §  Build tree with logistic regression leaves. §  By restricting tree to user and query features, only regression model evaluated for each document. β0 +β1 T(x1)+...+βn xn α0 +α1 P(x1)+...+αnQ(xn ) X2=? X10< 0.1234 ? γ0 +γ1 R(x1)+...+γnQ(xn )
  • 29. Take-Aways §  LinkedIn’s search problem is unique because of deep role of personalization – users are integral part of the corpus. §  Query understanding allows us to optimize for entity- oriented search against semi-structured content. §  Ranking requires us to contextually apply global and personalized user, query, and document features. 29
  • 31. Want to learn more? §  Check out http://data.linkedin.com/search. §  Contact us: –  Shakti: ssinha@linkedin.com http://linkedin.com/in/sdsinha –  Daniel: dtunkelang@linkedin.com http://linkedin.com/in/dtunkelang –  Asif: amakhani@linkedin.com http://linkedin.com/in/asifmakhani §  Did we mention that we’re hiring? 31